For Loops

author: Simon Ejdemyr
date: January 2016
summary: For loops can be useful when you want to iterate a process in R, for example to run a simulation. This tutorial explains how to write for loops and shows how to use them to run Monte Carlo simulations. For loops are neat, but it’s worth emphasizing that you should avoid them and use vectorization, which is much faster, whenever possible.

Writing a for loop

Let’s start with a very simple example. Say you have the following vector v1 of individuals’ heights in centimeters:

[1]:
v1 <- c(175, 182, 150, 187, 165)

We can convert the values in v1 from centimeters to meters using a for loop:

[2]:
v2 <- rep(NA, 5)               #create vector v2 with NA values
for(i in 1:5) {                #loop over elements in v1 and store in v2
    v2[i] <- v1[i] / 100
}
v2                             #v2 after the for loop
  1. 1.75
  2. 1.82
  3. 1.5
  4. 1.87
  5. 1.65

Note that we could have done this using vectorization, which is more compact and faster (when we have a lot of data):

[3]:
v2 <- v1 / 100
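To get a rough sense of the speed difference, the sketch below times both approaches on a larger input. The vector big_cm and its size of one million elements are assumptions made up for illustration, and the timings will differ from machine to machine.

```r
# Hypothetical larger input: one million simulated heights in centimeters
big_cm <- runif(1e6, min = 150, max = 200)

# Loop version, with the result vector preallocated
loop_m <- numeric(length(big_cm))
loop_time <- system.time(
  for (i in seq_along(big_cm)) {
    loop_m[i] <- big_cm[i] / 100
  }
)

# Vectorized version
vec_time <- system.time(vec_m <- big_cm / 100)

identical(loop_m, vec_m)   # both approaches give the same values
loop_time["elapsed"]       # seconds spent in the loop
vec_time["elapsed"]        # seconds spent in the vectorized call
```

With only five elements the difference is not noticeable; it matters as the data grow.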

However, the example illustrates the following points about writing a for loop:

  1. Begin by creating an object that can store the results of your for loop. In the example above, we created v2 for this purpose. With vectors, we specify up front how many elements we eventually want to store, in this case 5. (This is not necessary if you store the results in a list.)
  2. The basic structure of the loop usually is:

     for(i in 1:n) {
         #commands to execute for iteration i
     }

     Here n represents the number of times you want to iterate the loop. The loop runs from 1 to n in steps of 1. If you instead wanted the loop to iterate from 1 to n but skip every other number, you could use seq(1, n, by = 2) in place of 1:n.
  3. Within the for loop, save the output of each iteration to an element of the vector (or list) we created initially (in this case v2).

Here’s a more general approach accomplishing the same thing, but where we keep the number of iterations flexible depending on how many elements v1 contains:

[4]:
v1 <- c(175, 182, 150, 187, 165)
n <- length(v1)
v2 <- rep(NA, n)
for(i in 1:n) {
    v2[i] <- v1[i] / 100
}
v2
  1. 1.75
  2. 1.82
  3. 1.5
  4. 1.87
  5. 1.65
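One caveat with 1:n is worth a short sketch: if the input happens to be empty, n is 0 and 1:0 counts down from 1 to 0, so the loop body still runs. Looping over seq_along() avoids this. The function name cm_to_m below is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper wrapping the loop from the example above
cm_to_m <- function(cm) {
  m <- rep(NA_real_, length(cm))   # preallocate the result vector
  for (i in seq_along(cm)) {       # seq_along(cm) is empty when cm is empty
    m[i] <- cm[i] / 100
  }
  m
}

cm_to_m(c(175, 182, 150, 187, 165))
cm_to_m(numeric(0))                # empty result; the loop body never runs

# By contrast, 1:0 counts down, so a 1:n loop would run twice when n is 0
1:0
```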

You can also store the outputs of a for loop in a column of a data frame. Say we had the following data frame with names and heights:

[5]:
ppl <- data.frame(person = letters[1:5], height_cm = v1)
ppl
person  height_cm
a       175
b       182
c       150
d       187
e       165

Let’s add a variable that expresses height in inches instead:

[6]:
ppl$height_inch <- NA                                     #add variable of NAs
n <- nrow(ppl)                                            #get number of observations to loop over
for(i in 1:n){
    ppl$height_inch[i] <- ppl$height_cm[i] * 0.393701
}
ppl
person  height_cm  height_inch
a       175        68.89768
b       182        71.65358
c       150        59.05515
d       187        73.62209
e       165        64.96067

Note that when you assign a single value (a constant or NA) to a column of a data frame, R recycles that value to every row, so you don’t need to specify how many times to repeat NA in ppl$height_inch <- NA.
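As with the vector example, the conversion to inches can be written without a loop. The sketch below rebuilds the data frame so that it runs on its own; the column name measured is made up purely to show a single value being recycled.

```r
ppl <- data.frame(person = letters[1:5],
                  height_cm = c(175, 182, 150, 187, 165))

# Vectorized: the whole column is converted in one step
ppl$height_inch <- ppl$height_cm * 0.393701

# Recycling: a single value assigned to a column is repeated for every row
ppl$measured <- TRUE   # 'measured' is a made-up column for illustration

ppl
```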

An application

For loops can be used to carry out Monte Carlo simulations. In the example below, we’ll draw repeated samples from a population, calculate the mean for each sample, and test whether we on average do a good job of estimating the population mean.

Say the population consists of 10 individuals with the following heights:

[7]:
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
mean(v)    #population mean
175.6

Unfortunately, for whatever reason, we do not know the heights of all of these individuals. We can only (randomly) sample 5 of them. From this random sample of five individuals we estimate the mean height of all 10 individuals. We can draw a sample of 5 from v and take the mean of this sample using the following code (because the sample is random, your result will differ from run to run):

[8]:
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)
smpl <- sample(v, 5)
mean(smpl)
174.4
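If you want the same random sample every time the code runs, you can set the random seed before sampling. This is a minimal sketch; the seed value 123 is arbitrary and chosen only for illustration.

```r
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)

set.seed(123)            # 123 is an arbitrary seed chosen for illustration
smpl_a <- sample(v, 5)   # draws 5 values without replacement (the default)

set.seed(123)            # resetting the seed reproduces the same draw
smpl_b <- sample(v, 5)

identical(smpl_a, smpl_b)   # the two draws match
mean(smpl_a)
```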

Would we on average expect to estimate the mean of the population accurately? Let’s use a Monte Carlo simulation to find out. We’ll draw 10,000 random samples of five from v and take the mean of each of these samples. With an unbiased estimator we would, on average, expect the sample estimate to equal the population parameter of interest.

[9]:
n <- 10000
smpl_means <- rep(NA, n)
for(i in 1:n){
    smpl <- sample(v, 5)
    smpl_means[i] <- mean(smpl)
}

mean(smpl_means)
175.65338
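The simulated average is close to the population mean of 175.6, as we would expect from an unbiased estimator. Because the population here is tiny, this can also be checked exactly. The sketch below is not part of the original simulation: it uses combn() to enumerate every possible sample of five and replicate() as a more compact way to write the same loop. The seed value 2016 is arbitrary.

```r
v <- c(175, 182, 150, 187, 165, 177, 200, 198, 157, 165)

# Exact check: means of all choose(10, 5) possible samples of five
all_means <- combn(v, 5, FUN = mean)
length(all_means)                     # number of distinct samples
all.equal(mean(all_means), mean(v))   # average of sample means vs. population mean

# Compact version of the simulation loop using replicate()
set.seed(2016)                        # arbitrary seed, for reproducibility only
smpl_means <- replicate(10000, mean(sample(v, 5)))
mean(smpl_means)
```

The exact average over all possible samples equals the population mean, while the simulated average only approximates it and gets closer as the number of draws grows.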