# For Loops

author: Simon Ejdemyr
date: January, 2016
summary: For loops can be useful when you want to iterate a process in R, e.g., to run a simulation. This tutorial explains how to write for loops and shows how to use them to run Monte Carlo simulations. For loops are neat, but it’s worth emphasizing that you should avoid them and instead use vectorization, which is much faster, when possible.

## Writing a for loop

Let’s start with a very simple example. Say you have the following vector `v1` with individuals’ heights in centimeters:


```
v1 <- c(175, 182, 150, 187, 165)
```

We can convert the values in `v1` from centimeters to meters using a for loop:


```
v2 <- rep(NA, 5)          # create vector v2 with NA values
for (i in 1:5) {          # loop over elements in v1 and store in v2
  v2[i] <- v1[i] / 100
}
v2                        # v2 after the for loop
```

- 1.75
- 1.82
- 1.5
- 1.87
- 1.65

Note that we could have done this using vectorization, which is more compact and faster (when we have a lot of data):


```
v2 <- v1 / 100
```
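To make the speed claim concrete, here is a minimal sketch that times both approaches on a larger vector. The vector `heights_cm` and its size are made up for illustration, and the timings will differ from machine to machine, so no specific numbers are shown.

```r
# Hypothetical larger input: 100,000 simulated heights (not from the tutorial)
set.seed(1)
heights_cm <- runif(1e5, min = 150, max = 200)

# Loop version, mirroring the structure used above
loop_time <- system.time({
  heights_m_loop <- rep(NA, length(heights_cm))
  for (i in seq_along(heights_cm)) {
    heights_m_loop[i] <- heights_cm[i] / 100
  }
})

# Vectorized version
vec_time <- system.time({
  heights_m_vec <- heights_cm / 100
})

loop_time["elapsed"]                        # seconds spent in the loop
vec_time["elapsed"]                         # seconds spent in the vectorized call
all.equal(heights_m_loop, heights_m_vec)    # both approaches give the same result
```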

However, the example illustrates the following points about writing a for loop:

- Begin by creating an object that can store the results of your for loop. In the example above, we created `v2` for this purpose. With vectors, we need to specify how many elements we want to eventually store, in this case 5. (This is not necessary if you store the results in a list.)
- The basic structure of the loop usually is `for(i in 1:n) { ... }`, where the commands to run on each iteration go between the braces. Here `n` represents the number of times you want to iterate the loop. The loop will run from 1 to `n` in integer steps. If you instead wanted the loop to iterate from 1 to `n` but skip every other number, you could use `seq(1, n, by = 2)` in place of `1:n`.
- Inside the loop, store the result of each iteration in the object you created (here, `v2`).

Here’s a more general approach accomplishing the same thing, but where we keep the number of iterations flexible depending on how many elements `v1` contains:


```
v1 <- c(175, 182, 150, 187, 165)
n <- length(v1)
v2 <- rep(NA, n)
for (i in 1:n) {
  v2[i] <- v1[i] / 100
}
v2
```

- 1.75
- 1.82
- 1.5
- 1.87
- 1.65
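Two variations mentioned in the points above, skipping every other index with `seq()` and storing results in a list, can be sketched as follows. The names `v2_odd`, `v2_list`, and `empty` are illustrative only. The last part uses `seq_along()`, a base R alternative to `1:n` that is not part of the original example; it is shown because `1:n` counts backwards when `n` is 0.

```r
v1 <- c(175, 182, 150, 187, 165)
n <- length(v1)

# (1) Skip every other element: the loop visits i = 1, 3, 5
v2_odd <- rep(NA, n)
for (i in seq(1, n, by = 2)) {
  v2_odd[i] <- v1[i] / 100
}
v2_odd    # positions 2 and 4 were never visited, so they remain NA

# (2) Store results in a list: no length needs to be specified up front
v2_list <- list()
for (i in seq_along(v1)) {
  v2_list[[i]] <- v1[i] / 100
}
unlist(v2_list)    # collapse the list back into a numeric vector

# (3) Why seq_along() can be safer than 1:n for an empty input
empty <- numeric(0)
1:length(empty)     # gives 1 0, so a loop would run twice
seq_along(empty)    # gives integer(0), so a loop would not run at all
```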

Of course, you can store outputs from the for loop in a vector within a data frame. Say we had the following data frame with names and heights:


```
ppl <- data.frame(person = letters[1:5], height_cm = v1)
ppl
```

| person | height_cm |
|---|---|
| a | 175 |
| b | 182 |
| c | 150 |
| d | 187 |
| e | 165 |

Let’s add a variable that expresses height in inches instead:


```
ppl$height_inch <- NA    # add variable of NAs
n <- nrow(ppl)           # get number of observations to loop over
for (i in 1:n) {
  ppl$height_inch[i] <- ppl$height_cm[i] * 0.393701
}
ppl
```

| person | height_cm | height_inch |
|---|---|---|
| a | 175 | 68.89768 |
| b | 182 | 71.65358 |
| c | 150 | 59.05515 |
| d | 187 | 73.62209 |
| e | 165 | 64.96067 |

Note that when you assign a single constant or `NA` to a variable within a data frame, R (correctly) assumes that you want to assign this value to every element of the variable, so you don’t need to specify how many times you want to repeat `NA` in `ppl$height_inch <- NA`.
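As a small sketch of this behavior, the code below rebuilds the same `ppl` data frame, shows that a single `NA` fills all five rows, and then performs the inch conversion without a loop. The vectorized line is an alternative to the loop shown earlier, in keeping with the advice to prefer vectorization when possible.

```r
# Same data frame as in the example above
ppl <- data.frame(person = letters[1:5],
                  height_cm = c(175, 182, 150, 187, 165))

ppl$height_inch <- NA          # a single NA is repeated for every row
length(ppl$height_inch)        # 5, one value per row

# Vectorized conversion: no loop and no pre-filled column required
ppl$height_inch <- ppl$height_cm * 0.393701
ppl
```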

## An application

For loops can be used to carry out Monte Carlo simulations. In the example below, we’ll draw repeated samples from a population, calculate the mean for each sample, and test whether we on average do a good job of estimating the population mean.

Say the population consists of 10 individuals with the following heights:


```
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
mean(v) #population mean
```

Unfortunately, for whatever reason, we do not know the heights of all of these individuals. We can only (randomly) sample 5 of them. From this random sample of five individuals we estimate the mean height of all 10 individuals. We can draw a sample of 5 from `v` and take the mean of this sample using the following code:


```
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
smpl <- sample(v, 5)
mean(smpl)
```
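Because `sample()` draws at random (without replacement by default), the code above returns a different mean on each run. If you want reproducible draws, one option is to call `set.seed()` first. The seed value 123 below is arbitrary, and `smpl_a` and `smpl_b` are illustrative names.

```r
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)

set.seed(123)               # arbitrary seed, chosen only for illustration
smpl_a <- sample(v, 5)      # 5 individuals drawn without replacement

set.seed(123)               # resetting the seed reproduces the same draw
smpl_b <- sample(v, 5)

identical(smpl_a, smpl_b)   # TRUE: same seed, same sample
mean(smpl_a)                # sample mean for this particular draw
```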

Would we on average expect to estimate the mean of the population accurately? Let’s use a Monte Carlo simulation to find out. We’ll draw 10,000 random samples of five from `v` and take the mean of each of these samples. With an unbiased estimator we would, on average, expect the sample estimate to equal the population parameter of interest.


```
n <- 10000
smpl_means <- rep(NA, n)
for (i in 1:n) {
  smpl <- sample(v, 5)
  smpl_means[i] <- mean(smpl)
}
mean(smpl_means)
```
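To judge the result, compare the average of the simulated sample means with the population mean. The sketch below does this and, as an alternative to the explicit loop, uses base R’s `replicate()` to run the same simulation in one line. The seed value is arbitrary and is set only so that the run is reproducible.

```r
set.seed(2016)   # arbitrary seed, only for reproducibility
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
n <- 10000

# replicate() evaluates the expression n times and collects the results
smpl_means <- replicate(n, mean(sample(v, 5)))

mean(smpl_means)             # average of the 10,000 sample means
mean(v)                      # population mean, for comparison
mean(smpl_means) - mean(v)   # should be close to zero for an unbiased estimator
```

If the difference is small relative to the spread of the sample means, the simulation supports the claim that the sample mean is an unbiased estimator of the population mean.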