Introduction
In this, the first in a series of tutorials, we’ll cover the very basics of R. If you’ve programmed before, you can skip much of this. But regardless of your background, we hope you’ll find this and subsequent tutorials useful for learning R’s many tools for graphing, statistical analysis, and data collection and management, or what we might collectively call “data science.”
Installing R
First, download R free here!
After you’ve downloaded R itself, you will probably also want to download a program called RStudio (installation instructions here; note that you need to already have R installed to use RStudio). RStudio is a “helper” program that makes it easier to write code for R; it is what is referred to as an “integrated development environment” (IDE). There are a lot of other IDEs, but RStudio is one of the easiest to use and one of the most popular.
Interacting with R
R has a text-based interface, which means you can’t ask it to do things by clicking buttons or using drop-down menus. Instead, it has a command prompt where you type messages to R. The place where you can type is marked by a little right arrow symbol (>). Just type your message after that right arrow and hit return. In the screenshot below, for example, I’m about to ask R to print the phrase “Hello!” (though I haven’t hit return, so it hasn’t done anything yet):
After you type a command to R, hit return and R will try to do what you’ve asked of it. If I hit return after print("Hello!"), for example, R will print out the phrase “Hello!”:
You will notice that after it printed out “Hello!”, a new right arrow appeared. That’s R’s way of saying it’s done doing the last thing you asked it to do.
Code Examples On This Site
On this website, you’ll find that code examples don’t look quite like they do when you’re typing in R yourself. Instead, you’ll see code appear in grey blocks with a number on the left side. Below these blocks, you will see the output R has returned after running that code. For example, here’s that same "Hello!" line in the style used on this site:
[57]:
"Hello!"
In addition, some code will include “comments”. Comments are notes placed in someone’s code to explain what’s going on to other programmers. In R, comments always start with a #, which tells R that the text that follows is not something it should try to execute. Comments will always appear in italics and in yellow.
[59]:
# This is a comment. In the next line, I'll add 2 and 3.
2 + 3
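A comment doesn’t have to take up a whole line. As a small sketch, R also ignores everything after a # that appears partway through a line, so you can put a short note right next to the code it describes:

```r
# A full-line comment: R skips this entire line
3 * 4  # An end-of-line comment: R runs 3 * 4 and ignores this note
```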
Basic Math in R
Now that we’ve learned how to pass commands to R, we can start asking R to do things for us. For example, R can do all the normal math operations you are familiar with:
[60]:
# Addition
2 + 2
[62]:
# Multiplication
2 * 3
[63]:
# Division
4 / 2
[64]:
# And even exponentiation (e.g. 2 raised to the third power)
2 ^ 3
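Subtraction works the same way, and you can combine several operations on one line. R follows the usual mathematical order of operations (exponents, then multiplication and division, then addition and subtraction), so use parentheses when you want something calculated first. A few illustrative examples:

```r
# Subtraction
7 - 4

# Multiplication happens before addition, so this is 2 + 12
2 + 3 * 4

# Parentheses force the addition to happen first, so this is 5 * 4
(2 + 3) * 4
```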
Variables
Congratulations! You now know how to do math in R!
If we want to do more than use R as a calculator, though, we need to be able to not only do math problems, but also store the results of our calculations so we can reuse them in the future, or combine the results of lots of different calculations. In the examples above, R did the math we asked it to do, and printed out the results, but it didn’t keep a copy of those results anywhere.
In order to store the value of our calculations, we need to assign them to a variable. Once we’ve assigned a value to a variable, we can then recall that value any time by invoking the variable. For example, let’s calculate the weight of a velociraptor in pounds.
First, let’s store the weight of a velociraptor in kilograms (estimated to be 113 kg) in a variable called velociraptor_weight_in_kg:
[24]:
velociraptor_weight_in_kg = 113
Basically, all I’ve done is given a name (the variable name) to a value (in this case, 113). Now any time I use that variable name, R knows that that variable name is just a stand-in for 113. For example, if you just type a variable name in R, it will tell you the value associated with that variable:
[14]:
velociraptor_weight_in_kg
And now we can do calculations with that variable. There are about 2.2 pounds in a kilogram, so to get a velociraptor’s weight in pounds, we can just multiply our weight-in-kg variable by that conversion factor:
[16]:
velociraptor_weight_in_kg * 2.2
We can also do math with multiple variables, because really anywhere you see a variable, you can just imagine that the value associated with the variable is there instead.
For example, suppose I have two pet dinosaurs, and my partner has three dinosaurs. If we got married, how many dinosaurs would we have? Let’s do this super-complicated math using variables.
[30]:
nick_pet_dinosaurs = 2
adriane_pet_dinosaurs = 3
nick_pet_dinosaurs + adriane_pet_dinosaurs
And if we wanted to, we could also store that new value in a new variable called family_pet_dinosaurs.
[31]:
family_pet_dinosaurs = nick_pet_dinosaurs + adriane_pet_dinosaurs
family_pet_dinosaurs
One important thing about variables is that you can change the value associated with a variable. Suppose that while walking to work, I stumbled upon a truly adorable Nigersaurus and couldn’t resist adopting her.
(Photo Credit: Matt Martyniuk (Dinoguy2))
If that happened, we’d need to increase the number of pets I have by 1!
[32]:
nick_pet_dinosaurs = nick_pet_dinosaurs + 1
Now if we ask R for the value of nick_pet_dinosaurs, we’ll see it has increased by 1:
[65]:
nick_pet_dinosaurs
Note there’s something a little weird about the order in which things happen here: when we assign something to a variable by writing variable_name = [some expression], R evaluates the expression on the right first, then assigns the result of that expression to the variable on the left-hand side.
Given how we normally read left-to-right, this can be a little confusing. So what R did here was:
- first, it calculated nick_pet_dinosaurs + 1 (which is the same as 2 + 1),
- then it assigned the value of that expression (3) to the variable nick_pet_dinosaurs, replacing the old value of 2.
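To make that order of steps explicit, here is a sketch that splits the update into two lines using a hypothetical helper variable, new_value. You don’t need it in practice, since the single line nick_pet_dinosaurs = nick_pet_dinosaurs + 1 does the same thing:

```r
nick_pet_dinosaurs = 2

# Step 1: R evaluates the right-hand side using the current value (2)
new_value = nick_pet_dinosaurs + 1

# Step 2: R stores that result under the name on the left, replacing the old 2
nick_pet_dinosaurs = new_value

nick_pet_dinosaurs  # now 3
```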
Variable Exercises
OK, this is a great time to pause and try a few exercises for yourself.
Let’s suppose that you have a dinosaur zoo. In your zoo, you have two T-Rexes, three Unaysaurus, and five Spinosaurus.
- Create variables for the number of each dino you have, called my_trexes, my_unas, and my_spinos.
- Now use those variables to calculate how many total dinosaurs you have.
- Oh no! One of your T-Rexes got out and ate an Unaysaurus. Decrease the value of my_unas by one.
- Double oh no! Your T-Rexes were male and female, and they just had a baby! Increase your number of T-Rexes by one!
- Sadly, one of your Spinosauruses died of old age. :( Decrease your count of Spinosauruses by one.
- How many dinos do you have now? You’ve probably lost count of all these changes, but thankfully they’re all stored in variables, so you can just add them all up!
Answers to exercises can be found here, but only go to that page if you get REALLY stuck! The best way to learn to program is to try things out and see what works, so don’t deny yourself the learning opportunity that process provides by looking at answers too quickly!
There are two ways to assign a value to a variable in R. One is with a single equal sign (=), and the other is with the two symbols that make an arrow (<-). So the following two commands are exactly the same:
[34]:
x = 72
x <- 72
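One way to convince yourself the two forms are interchangeable is to assign the same value both ways and ask R whether the results match. The sketch below uses identical(), a built-in function that returns TRUE when two values are exactly the same; the variable names x and y are just for illustration:

```r
x = 72    # assignment with an equal sign
y <- 72   # assignment with an arrow
identical(x, y)  # TRUE: both variables hold the same value
```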
Types of Data
Up till now, we’ve only been working with numbers, but R is actually equipped to work with a number of different kinds of data. In the course of this tutorial we’ll introduce all of them, but there are really three main ones to be aware of:
- numeric and integer: The main data types for numbers. These two types are slightly different (integers are restricted to, well, integers; numerics can be integers or numbers with decimals), but you can think of them as interchangeable for now.
- character: Text data, like a person’s name or a quote from a book. Written with double quotes (") before and after (or single quotes (') before and after, if you’d prefer).
- logical: Data that only takes on the values true and false. Written TRUE and FALSE.
If you’re ever unsure of the type of a variable (or, more precisely, of the type of the value associated with a variable), you can ask R with the class() function:
[41]:
pi = 3.1416
class(pi)
[42]:
mystery_novel = "'Twas a dark and stormy night"
class(mystery_novel)
[44]:
my_logical = TRUE
class(my_logical)
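The numeric/integer distinction is easy to miss, because R stores a plain number like 5 as numeric even when it has no decimal part. As a sketch, adding an L suffix is one way to ask R for an integer explicitly, and single and double quotes produce the same character value:

```r
class(5)     # "numeric": whole numbers are numeric by default
class(5L)    # "integer": the L suffix requests an integer

# Single and double quotes create exactly the same text
identical('dark and stormy', "dark and stormy")  # TRUE
class('dark and stormy')                         # "character"
```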
Characters
The value of characters will become evident when we start working with real data, when our datasets will include things like country names, capitals, or open-ended survey responses. In all these situations, we will use the character type to store text.
Note that if a variable is a character, R will treat it like text even if it looks like a number, and you won’t be able to do things like add or multiply it. For example:
a = 5
b = "7"
a + b
This code will generate an error:
Error in a + b: non-numeric argument to binary operator
That’s because + is only defined for numbers, and R sees "7" as text. I’ll show you how to deal with this silly situation below.
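Before doing math, you can check whether a value is actually stored as a number. A minimal sketch using R’s built-in is.numeric() and is.character() functions, which answer with TRUE or FALSE:

```r
a = 5
b = "7"

is.numeric(a)    # TRUE: a holds a number
is.numeric(b)    # FALSE: the quotes make b text, even though it looks like 7
is.character(b)  # TRUE
```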
Logicals
The use of logicals is less obvious at the moment, but they will eventually prove very important. For now, it’s enough to know they exist, and that the place you are most likely to encounter them is when you write tests. For example:
[48]:
7 > 5
[49]:
4 < 3
This is also a good time to introduce the double-equal sign. Because we use = to assign values to variables, we can’t use it to test if two things are equal. To ask R to evaluate whether two things are equal, therefore, we use a double-equal sign (==). For example:
[50]:
a = 5
b = 5
a == b
[51]:
c = 7
a == c
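Because a test like a == c produces a value, you can store its result in a variable and check its type just like any other value. The sketch below also contrasts the single and double equal signs; the variable name a_equals_c is just illustrative:

```r
a = 5            # single = : assigns 5 to a
c = 7
a == 5           # double == : asks whether a equals 5 (TRUE)

a_equals_c = (a == c)   # store the result of the test
a_equals_c              # FALSE
class(a_equals_c)       # "logical"
```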
A Brief Introduction to Functions
One last note before we finish off this section: one of the most powerful things about a language like R is that it’s full of pre-made tools called functions. A function is basically a little program inside R. Functions can do everything from simple operations (like adding up numbers) to extremely complicated ones (like fitting a machine learning model).
The idea of a function is that it takes in a set of arguments as an input, and then it returns a result. To use a function, you type out the function name, then place the arguments you want to pass to the function inside parentheses. For example, there’s a function called as.numeric() that’s designed to convert a variable that is a character (like "7") to a numeric value (7) so we can do arithmetic with that value.
So to use it, we pass the variable b to the function as.numeric. The function will then take the value associated with b ("7"), convert it to a numeric value, and return the converted value. For example:
[53]:
a = 5
b = "7"
b_as_a_numeric = as.numeric(b)
a + b_as_a_numeric
Note that as.numeric didn’t actually change the value of the variable b; instead, it returned the converted value, and we assigned it to the variable b_as_a_numeric. If we look at b, we will see it is still "7":
[54]:
b
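Because as.numeric() returns a value, you don’t have to store that value in a variable before using it. As a sketch, you can place the function call directly inside a larger expression, and b still stays unchanged:

```r
a = 5
b = "7"

# R runs as.numeric(b) first, then adds its result (7) to a
a + as.numeric(b)   # 12

b   # still the text "7"
```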
But as.numeric won’t work on everything. If you pass as.numeric a character that doesn’t look like a number, it’s smart enough to know that there’s no way to convert it, so instead of a number, it returns NA (along with a warning). NA is R’s way of marking a value that is missing or unknown. For example:
silly_example = "This doesn't look like a number at all!"
as.numeric(silly_example)
Warning message in eval(expr, envir, enclos):
“NAs introduced by coercion”
<NA>
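Since a failed conversion turns into NA, it can help to check for NA before relying on the result. A minimal sketch using the built-in is.na() function, with suppressWarnings() used here only to hide the expected “NAs introduced by coercion” warning:

```r
# A string that cannot be converted becomes NA
converted = suppressWarnings(as.numeric("This doesn't look like a number at all!"))
is.na(converted)           # TRUE: the conversion failed

# A string that looks like a number converts normally
is.na(as.numeric("42"))    # FALSE

# NA spreads through math: anything combined with an unknown value is unknown
converted + 1              # NA
```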
We’ll talk much more about functions in the future, but for now it’s enough to recognize that they are little programs, and that they operate by accepting inputs (the things placed inside the parentheses that follow the function name) and returning a result to you, which you can look at or assign to a variable for later use.
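Some functions take more than one argument, separated by commas. For example, R’s built-in round() function takes a number and, optionally, how many decimal places to keep (its digits argument, which defaults to 0). A small illustration:

```r
round(3.14159)              # 3: with no digits argument, rounds to a whole number
round(3.14159, digits = 2)  # 3.14

# The result can be stored in a variable like any other value
pi_rounded = round(3.14159, digits = 2)
```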
Exercises for Data Types