For Loops¶
author: Simon Ejdemyr
date: January, 2016
summary: For loops can be useful when you want to iterate a process in R, for example to run a simulation. This tutorial explains how to write for loops and shows how to use them to run Monte Carlo simulations. For loops are neat, but it’s worth emphasizing that you should avoid them and instead use vectorization, which is much faster, when possible.
Writing a for loop¶
Let’s start with a very simple example. Say you have the following vector v1 with individuals’ heights in centimeters:
[1]:
v1 <- c(175, 182, 150, 187, 165)
We can convert the values in v1 from centimeters to meters using a for loop:
[2]:
v2 <- rep(NA, 5)  # create vector v2 with NA values
for(i in 1:5) {   # loop over elements in v1 and store in v2
  v2[i] <- v1[i] / 100
}
v2                # v2 after the for loop
- 1.75
- 1.82
- 1.5
- 1.87
- 1.65
Note that we could have done this using vectorization, which is more compact and faster (when we have a lot of data):
[3]:
v2 <- v1 / 100
However, the example illustrates the following points about writing a for loop:
- Begin by creating an object that can store the results of your for loop. In the example above, we created v2 for this purpose. With a vector, it is best to specify up front how many elements you want to eventually store, in this case 5. (R will grow a vector or a list as you assign to it, but preallocating avoids repeated copying.)
- The basic structure of the loop is usually for(i in 1:n) { ... }, where the braces hold the commands to run on each iteration. Here n represents the number of times you want to iterate the loop. The loop will run from 1 to n by an integer count. If you instead wanted the loop to iterate from 1 to n but skip every other number, you could use seq(1, n, by = 2) in place of 1:n.
- Inside the loop, save the output of each iteration to an element of the object you created in the first step (in this case v2).
Here’s a more general approach accomplishing the same thing, but where we keep the number of iterations flexible depending on how many elements v1 contains:
[4]:
v1 <- c(175, 182, 150, 187, 165)
n <- length(v1)
v2 <- rep(NA, n)
for(i in 1:n) {
  v2[i] <- v1[i] / 100
}
v2
- 1.75
- 1.82
- 1.5
- 1.87
- 1.65
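One caveat with 1:n deserves a short sketch. If n happens to be 0 (for example, because the input vector is empty), 1:n evaluates to c(1, 0) and the loop body runs twice instead of not at all. Base R's seq_len() and seq_along() avoid this. The names empty, runs_colon, runs_seq_len and odd_idx below are hypothetical and used only for illustration.

```r
# Hypothetical empty input, used only to illustrate the edge case
empty <- numeric(0)
n <- length(empty)

# 1:n counts down from 1 to 0 when n is 0, so the body runs twice
runs_colon <- 0
for (i in 1:n) {
  runs_colon <- runs_colon + 1
}

# seq_len(n) is empty when n is 0, so the body never runs
runs_seq_len <- 0
for (i in seq_len(n)) {
  runs_seq_len <- runs_seq_len + 1
}

# Same centimeter-to-meter loop as above, written with seq_along()
v1 <- c(175, 182, 150, 187, 165)
v2 <- rep(NA, length(v1))
for (i in seq_along(v1)) {
  v2[i] <- v1[i] / 100
}

# Skipping every other index with seq(): positions 1, 3, 5
odd_idx <- seq(1, length(v1), by = 2)
```

For vectors that are never empty the two forms behave the same, so this is a defensive habit rather than a correction to the loop above.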
Of course, you can store outputs from the for loop in a vector within a data frame. Say we had the following data frame with names and heights:
[5]:
ppl <- data.frame(person = letters[1:5], height_cm = v1)
ppl
person | height_cm |
---|---|
a | 175 |
b | 182 |
c | 150 |
d | 187 |
e | 165 |
Let’s add a variable that expresses height in inches instead:
[6]:
ppl$height_inch <- NA #add variable of NAs
n <- nrow(ppl) #get number of observations to loop over
for(i in 1:n){
  ppl$height_inch[i] <- ppl$height_cm[i] * 0.393701
}
ppl
person | height_cm | height_inch |
---|---|---|
a | 175 | 68.89768 |
b | 182 | 71.65358 |
c | 150 | 59.05515 |
d | 187 | 73.62209 |
e | 165 | 64.96067 |
Note that when you assign a single value, such as a constant or NA, to a variable within a data frame, R (correctly) assumes that you want to assign it to every element of the variable, so you don’t need to specify how many times to repeat NA in ppl$height_inch <- NA.
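As with the first example, this loop can be replaced by a vectorized expression. The sketch below rebuilds ppl and compares both approaches; the column name height_inch_vec is a hypothetical name introduced only for the comparison.

```r
v1 <- c(175, 182, 150, 187, 165)
ppl <- data.frame(person = letters[1:5], height_cm = v1)

# Loop version, as above
ppl$height_inch <- NA
for (i in seq_len(nrow(ppl))) {
  ppl$height_inch[i] <- ppl$height_cm[i] * 0.393701
}

# Vectorized version: the multiplication applies to every row at once
ppl$height_inch_vec <- ppl$height_cm * 0.393701

# Check that both columns hold the same values
all.equal(ppl$height_inch, ppl$height_inch_vec)
```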
An application¶
For loops can be used to carry out Monte Carlo simulations. In the example below, we’ll draw repeated samples from a population, calculate the mean for each sample, and test whether we on average do a good job of estimating the population mean.
Say the population consists of 10 individuals with the following heights:
[7]:
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
mean(v) #population mean
Unfortunately, for whatever reason, we do not know the heights of all of these individuals. We can only (randomly) sample 5 of them. From this random sample of five individuals we estimate the mean height of all 10 individuals. We can draw a sample of 5 from v and take the mean of this sample using the following code:
[8]:
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
smpl <- sample(v, 5)
mean(smpl)
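Two details of sample() matter here. By default it draws without replacement, so no individual is picked twice, and the result changes on every run unless the random seed is fixed. Because the value 165 occurs twice in v, the sketch below samples positions rather than values to make the first point visible. The seed 123 and the names idx_a, idx_b, smpl_a and smpl_b are assumptions chosen only for illustration.

```r
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)

set.seed(123)                  # arbitrary seed (an assumption), makes the draw repeatable
idx_a <- sample(length(v), 5)  # draw 5 positions out of 10, without replacement
smpl_a <- v[idx_a]

set.seed(123)                  # same seed, so the same positions are drawn again
idx_b <- sample(length(v), 5)
smpl_b <- v[idx_b]

identical(smpl_a, smpl_b)      # do the two draws match?
anyDuplicated(idx_a) == 0      # was any position drawn twice?
mean(smpl_a)                   # sample mean for this draw
```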
Would we on average expect to estimate the mean of the population accurately? Let’s use a Monte Carlo simulation to find out. We’ll draw 10,000 random samples of five from v and take the mean of each of these samples. With an unbiased estimator we would, on average, expect the sample estimate to equal the population parameter of interest.
[9]:
n <- 10000
smpl_means <- rep(NA, n)
for(i in 1:n){
  smpl <- sample(v, 5)
  smpl_means[i] <- mean(smpl)
}
mean(smpl_means)
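To read the result, compare the simulated average with the population mean. The sketch below does that and also shows replicate(), a base R shortcut for this kind of loop. The seed and the names smpl_means_rep and bias are assumptions added for illustration.

```r
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
n <- 10000

set.seed(2016)  # arbitrary seed (an assumption) so the run is repeatable

# Loop version, as above
smpl_means <- rep(NA, n)
for (i in seq_len(n)) {
  smpl_means[i] <- mean(sample(v, 5))
}

# Same simulation with replicate(), which handles the storage for us
smpl_means_rep <- replicate(n, mean(sample(v, 5)))

# Difference between the average sample mean and the population mean;
# for an unbiased estimator this should be close to zero
bias <- mean(smpl_means) - mean(v)
bias
mean(smpl_means_rep) - mean(v)
```

Individual sample means vary a good deal from draw to draw; it is their average over many draws that should sit close to the population mean.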