Chapter 1 A quick introduction to R

1.1 Using R as a big calculator

1.1.1 Basic arithmetic

The Get up and running chapter showed that R could handle familiar arithmetic operations: division, multiplication, addition and subtraction. If we want to add or subtract two numbers, we place the + or - symbol in between two numbers and hit Enter. R will read the arithmetic expression, evaluate it, and print the result to the Console. This works as you’d expect:

Addition

3 + 2
## [1] 5

Subtraction

5 - 1
## [1] 4

Multiplication and division are no different. However, we can’t use x or ÷ symbols for these operations. Instead, use * and / to multiply and divide:

Multiplication

7 * 2
## [1] 14

Division

3 / 2
## [1] 1.5

We can also exponentiate numbers, i.e. raise one number to the power of another. Use the ^ operator to do this:

Exponentiation

4^2
## [1] 16

This raises 4 to the power of 2 (i.e. we squared it). In general, we can raise a number x to the power of y using x^y. Neither x nor y needs to be a whole number.
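To make the point about non-whole numbers concrete, here is a small sketch with values chosen purely for illustration. It shows a fractional exponent, a negative exponent, and a case where both numbers are fractional:

```r
# A fractional exponent: 9 to the power of one half is the square root of 9
9^0.5
## [1] 3

# A negative exponent: 2^-3 is the same as 1 / 2^3
2^-3
## [1] 0.125

# Both the base and the exponent can be fractional
0.25^1.5
## [1] 0.125
```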

Operators?

What does ‘operator’ mean? An operator is simply a symbol (or sequence of symbols) that does something specific with one or more inputs. For example, operators like /, *, + and - carry out arithmetic calculations with pairs of numbers. Operators are one of the basic building blocks of a programming language like R.

1.1.2 Combining arithmetic operations

We can also combine arithmetic operations. Assume we want to subtract 6 from \(2^3\). The expression to perform this calculation is:

2^3 - 6
## [1] 2

Simple enough, but what if we had wanted to carry out a slightly longer calculation that required the last answer to then be divided by 2? This is the wrong way to do this:

2^3 - 6 / 2
## [1] 5

The answer we expect here is \(1\). So what happened? R evaluated \(6/2\) first and then subtracted this answer from \(2^3\).

If that’s obvious, great. If not, it’s time to learn about order of precedence. R uses a standard set of rules to decide the order in which calculations feed into one another, so that any expression can be evaluated unambiguously. It uses the same order as most other programming languages, which thankfully is the same one we all learn in mathematics classes at school. The order of precedence is:

  1. exponents and roots (also, ‘powers’ or ‘orders’)

  2. division and multiplication (equal precedence, evaluated left to right)

  3. addition and subtraction (equal precedence, evaluated left to right)
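A few quick checks of these rules may help. The numbers below are arbitrary and only meant as a sketch. The last pair shows one wrinkle worth knowing about in R: the ^ operator is applied before a leading minus sign, so -2^2 is read as -(2^2).

```r
# Exponentiation happens before multiplication
2 * 3^2
## [1] 18

# Division and multiplication sit at the same level and run left to right
8 / 4 * 2
## [1] 4

# Addition and subtraction also run left to right
10 - 4 + 3
## [1] 9

# The ^ operator is applied before a leading minus sign
-2^2
## [1] -4
(-2)^2
## [1] 4
```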

BODMAS and friends

If you find it difficult to remember the standard order of precedence, there are a load of mnemonics that can help.

We need to control the order of evaluation to arrive at the answer we were looking for in the above example. Do this by grouping together calculations inside parentheses, i.e. ‘round brackets’ ( and ). Here’s the expression we should have used:

(2^3 - 6) / 2
## [1] 1

We can use more than one pair of parentheses to control the order of evaluation in more complex calculations. The order of evaluation then happens ‘inside-out’. For example, if we want to find the cube root of 2 (i.e. \(2^{1/3}\)) rather than \(2^3\) in that last calculation, we would instead write:

(2^(1/3) - 6) / 2
## [1] -2.370039

The parentheses around the 1/3 are needed to ensure this is evaluated prior to being used as the exponent.

Working efficiently at the Console

Working at the Console soon gets tedious if we have to retype similar things over and over again. There is no need to do this, though. Place the cursor at the prompt and hit the up arrow. What happens? This brings back the last expression sent to R’s interpreter. Hit the up arrow again to see the last-but-one expression, and so on. We go back down the list using the down arrow. Once we’re at the line we need, we use the left and right arrows to move around the expression and the delete key to remove the parts we want to change. Once an expression has been edited we can hit Enter to send it to R again. Try it.

1.2 Problematic calculations

Now is a good time to highlight how R handles certain kinds of awkward numerical calculations. One of these involves division of a non-zero number by 0. Strictly speaking, this is undefined in ordinary arithmetic, but it is often convenient to treat the result as A Very Large Number: infinity. Some programming languages will respond to an attempt to do this with an error. R is a bit more forgiving:

1 / 0
## [1] Inf

R has a special built-in value that allows it to handle this kind of result. This is Inf, which stands for ‘infinity’.

The other special kind of value we sometimes run into is generated by calculations that don’t have a well-defined numerical result. For example, look what happens when we try to divide 0 by 0:

0 / 0
## [1] NaN

The NaN in this result stands for ‘Not a Number’. R produces NaN because \(0 / 0\) is not defined mathematically: it produces something that is Not a Number.

The reason we are pointing out Inf and NaN is not that we expect to use them. It’s important to know what they represent because they often arise due to a mistake somewhere in our code. It’s hard to track down such mistakes if we don’t know how Inf and NaN arise.
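As a rough illustration of how these values can creep into a calculation and how we might spot them, consider the sketch below. It uses is.nan() and is.finite(), two base R functions that are not otherwise covered in this section; they simply answer TRUE or FALSE.

```r
# Dividing a negative number by 0 gives negative infinity
-1 / 0
## [1] -Inf

# Inf can take part in further arithmetic...
Inf + 1
## [1] Inf

# ...but some calculations involving Inf have no well-defined answer
Inf - Inf
## [1] NaN

# Checking a result before relying on it
is.nan(0 / 0)
## [1] TRUE
is.finite(1 / 0)
## [1] FALSE
```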

R as a fancy calculator

What we’ve seen so far is that we can interact with R via the so-called REPL: the read-evaluate-print loop. R takes user input (e.g. 1 / 0), evaluates it (1 / 0 = Inf), prints the results (## [1] Inf), and then waits for the next input (e.g. 0 / 0). This facility is handy because it means we can use R interactively, working through a set of calculations line-by-line.

1.3 Storing and reusing results

We’ve not yet tried to do anything remotely complicated or interesting beyond using parentheses to construct longer calculations. This approach is acceptable when a calculation is straightforward, but it quickly becomes unwieldy for dealing with anything more complicated.

The best way to see what we mean is by working through a simple example—solving a quadratic equation. You probably remember these from school. A quadratic equation looks like this:

\[ax^2 + bx + c = 0\]

If we know the values of \(a\), \(b\) and \(c\) then we can solve this equation to find the values of \(x\) that satisfy it. Here’s the well-known formula for these solutions:

\[ x = \frac{-b\pm\sqrt{b^2-4ac}}{2a} \]

We can use R to calculate these solutions for us. Say that we want to find the solutions to the quadratic equation when \(a=1\), \(b=6\) and \(c=5\). We have to turn the above equation into a pair of R expressions:

Solution 1

(-6 + (6^2 -4 * 1 * 5)^(1/2)) / (2 * 1)
## [1] -1

Solution 2

(-6 - (6^2 -4 * 1 * 5)^(1/2)) / (2 * 1)
## [1] -5

The output tells us that the two values of \(x\) that satisfy this particular quadratic equation are -1 and -5.
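One way to convince ourselves that these answers are right is to substitute each one back into the quadratic that the formula solves, which for these numbers is \(x^2 + 6x + 5 = 0\). This is only a sanity check, sketched here with the values typed out by hand. Both expressions should evaluate to zero:

```r
# Substitute x = -1 into 1 * x^2 + 6 * x + 5
1 * (-1)^2 + 6 * (-1) + 5
## [1] 0

# Substitute x = -5 into 1 * x^2 + 6 * x + 5
1 * (-5)^2 + 6 * (-5) + 5
## [1] 0
```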

But what should we do if we now need to solve a different quadratic equation? Working at the Console, we could bring up the expressions we typed (using the up arrow) and change the numbers to match the new values of \(a\), \(b\) and \(c\). However, editing expressions like this is tedious, and more importantly, it’s error-prone because we have to make sure we substitute the new numbers into precisely the right positions.

A partial solution to this problem is to store the values of \(a\), \(b\) and \(c\) in some way so that we only have to change them once. We’ll see why this is useful in a moment.

First, we need to learn how to store results in R. The key to this is to use the assignment operator, written as an arrow pointing to the left, <-. Sticking with the current example, we need to store the numbers 1, 6 and 5. We do this by typing out three expressions, one after another, each time hitting Enter to get R to evaluate it:

a <- 1
b <- 6
c <- 5

The exact sequence <- defines the assignment operator. R won’t recognise it as assignment if we try to include a space between the < and - symbols.
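A short sketch shows why the space matters. With a space in the middle, R reads the < and - symbols separately, as a ‘less than’ comparison with a negative number. The name x is used here purely for illustration:

```r
x <- 5     # assignment: x is now associated with 5 and nothing is printed
x < - 1    # with a space, this asks whether x is less than -1
## [1] FALSE
x          # x has not been changed
## [1] 5
```

No error is reported in the second line, which is why this slip can be hard to spot.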

Notice that R didn’t print anything to the screen. So what actually happened? We asked R to first evaluate the expression on the right-hand side of each <- (just a number in this case) and then assign the result of that evaluation instead of printing it. Each result has a name associated with it, which appears on the left-hand side of the <-.

The net result of all this is that we have stored the numbers 1, 6 and 5 somewhere in R and associated them with the letters a, b and c, respectively. We can check whether this assignment business has worked by looking at the Environment tab in the top right RStudio window. There should be three ‘names’ listed in that tab now (a, b and c) along with the associated numbers 1, 6 and 5.

What does this mean in practical terms? Look at what happens if we now type the letter a into the Console and hit Enter:

a
## [1] 1

It looks the same as if we had typed the number 1 directly into the Console. We stored the output from three separate R expressions, associating each with a name so that we can access it again¹. Whenever we use the assignment operator <- we are telling R to keep whatever kind of value results from the calculation on the right-hand side of <-, giving it the name on the left-hand side so that we can access it later.

Why is this useful? Let’s imagine we want to do more than one thing with our three numbers. If we want to know their sum or their product we can now use:

Sum

a + b + c
## [1] 12

Product

a * b * c
## [1] 30

So… once we’ve stored a result and associated it with a name we can reuse it whenever needed. Returning to our example, we can now calculate the solutions to the quadratic equation by typing these two expressions into the Console:

Solution 1

(-b + (b^2 -4 * a * c)^(1/2)) / (2 * a)
## [1] -1

Solution 2

(-b - (b^2 -4 * a * c)^(1/2)) / (2 * a)
## [1] -5

Imagine we’d now like to find the solutions to a different quadratic equation where \(a=1\), \(b=5\) and \(c=5\). We only changed the value of \(b\) here. To find the new solutions we have to do two things. First, we change the value associated with b:

b <- 5

…then we bring up those lines that calculate the solutions to the quadratic equation and run them, one after the other:

(-b + (b^2 -4 * a * c)^(1/2)) / (2 * a)
## [1] -1.381966
(-b - (b^2 -4 * a * c)^(1/2)) / (2 * a)
## [1] -3.618034

We don’t have to retype those expressions. We can use the up arrow to bring each one back to the prompt and hit Enter. This is much simpler than editing the expressions.

More importantly, we are beginning to see the benefits of using something like R—we can break down complex calculations into a series of steps, storing and reusing intermediate results as required.
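As a sketch of what storing intermediate results might look like for the quadratic example, we could compute the part of the formula shared by both solutions once and then reuse it. The names disc and root are our own choices for this illustration, and the values of a, b and c are the ones used most recently above:

```r
a <- 1
b <- 5
c <- 5

# The quantity under the square root is needed by both solutions
disc <- b^2 - 4 * a * c
# Its square root, computed once and stored
root <- disc^(1/2)

# Solution 1
(-b + root) / (2 * a)
## [1] -1.381966

# Solution 2
(-b - root) / (2 * a)
## [1] -3.618034
```

Each expression is now shorter and easier to check than the original one-line versions.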

RStudio shortcut

We use the assignment operator <- all the time when working with R. Because it’s inefficient to type the < and - characters repeatedly, RStudio has a built-in shortcut for typing the assignment operator.

The shortcut is ‘Alt + -’. Try it now. Move the cursor to the Console, hold down the Alt key (‘Option’ on a Mac), and press the - sign key. RStudio will auto-magically insert <-. If you only learn one RStudio shortcut, learn this one! It will save you a lot of time in the long run.

1.4 How does assignment work?

When we use the assignment operator <- to associate names and values, we refer to this as creating or modifying a variable. This is much less tedious than using words like ‘associate’, ‘value’, and ‘name’ all the time. Why is it called a variable? What happens when we run these lines:

Create myvar and print out its value

myvar <- 1
myvar
## [1] 1

Modify myvar and print out its new value

myvar <- 7
myvar
## [1] 7

The first time we used <- with myvar on the left-hand side, we created a variable myvar associated with the value 1. We then printed out the value associated with myvar. The second assignment, myvar <- 7, modified the value of myvar to be 7 (and we printed this out again). This is why we refer to myvar as a variable: we can change its value as we please.

What happened to the old value associated with myvar? In short, it is gone, kaput, lost… forever. The moment we assign a new value to myvar the old one is destroyed and can no longer be accessed. Remember this.

Keep in mind that the expression on the right-hand side of <- can be any kind of calculation and the variable can have any (valid) name we like. For example, if we want to perform the calculation (1 + 2^3) / (2 + 7) and associate the result with the word answer, we would do this:

answer <- (1 + 2^3) / (2 + 7)
answer
## [1] 1

Any expression can be used on the right-hand side of the assignment operator as long as it generates an output of some kind. For example, we can create new variables from others:

newvar <- 2 * answer

What happened here? Start at the right-hand side of <-. The expression on this side contained the variable answer so R went to see if answer actually exists. It does, so it then substituted the value associated with answer into the calculation and assigned the resulting value of 2 to newvar.
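It may help to see that newvar holds the result of the calculation rather than the calculation itself. In the sketch below, which reuses the names from the example above, changing answer afterwards has no effect on newvar:

```r
answer <- (1 + 2^3) / (2 + 7)
newvar <- 2 * answer
newvar
## [1] 2

# Changing answer afterwards does not alter newvar
answer <- 10
newvar
## [1] 2
```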

Finally, look at what happens if we copy a variable using the assignment operator:

myvar <- 7
mycopy <- myvar

We now have a pair of variables, myvar and mycopy, associated with the number 7. Each of these is associated with a different copy of this number. If we change the value associated with one of these variables it does not change the value of the other, as this shows:

myvar <- 10
myvar
## [1] 10
mycopy
## [1] 7

R always behaves like this unless we work hard to alter this behaviour. Remember that: every time we assign one variable to another, the new variable behaves like a completely new, independent copy of the associated value. That probably doesn’t seem like an obvious or important point, but trust us, it is. It will be critical to remember this behaviour when we start learning how to manipulate data sets.

1.5 Global environment

Whenever we associate a name with a value we create a copy of both these things somewhere in the computer’s memory. In R the “somewhere” is called an environment. We aren’t going to get into a discussion of R’s many different kinds of environments—that’s an advanced topic well beyond the scope of this book. The one environment we do need to be aware of is the Global Environment.

Whenever we perform an assignment in the Console the variable we create is placed into the Global Environment. The set of variables currently in existence is listed in the Environment tab in RStudio. Take a look. There are two columns in the Environment tab: the first shows the names of the variables, the second summarises their values.
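The same information is available from the Console. As a minimal sketch, the base R function ls() returns the names of the variables that currently exist, and rm() removes one. The name temp_value is made up for this example, and %in% simply asks whether a name appears in the list:

```r
temp_value <- 42          # a throwaway variable created for this sketch

# ls() returns the names of the variables in the current environment
"temp_value" %in% ls()
## [1] TRUE

# rm() removes a variable, after which it is no longer listed
rm(temp_value)
"temp_value" %in% ls()
## [1] FALSE
```

Typing ls() on its own prints every name, so its output depends on what has been created in the session so far.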

The Global Environment is temporary

By default, R will try to save everything in the Global Environment when we close it down and restore everything when we start the next R session. It does this by writing a copy of the Global Environment to disk. In theory, this means we can close down R, reopen it, and pick things up from where we left off. Don’t rely on this behaviour! It just increases the risk of making a mistake.

1.6 Naming rules and conventions

We don’t have to use a single letter to name things in R. We could use the words tom, dick and harry in place of a, b and c. It might be confusing to use them, but tom, dick and harry are all legal names as far as R is concerned:

  • A legal name in R is any sequence of letters, numbers, ., or _, but the sequence of characters we use must begin with a letter (or with a ., although the note below explains why that is best avoided). Both upper and lower case letters are allowed. For example, num_1, num.1, num1, NUM1, myNum1 are all legal names, but 1num and _num1 are not because they begin with 1 and _.

  • R is case sensitive—it treats upper and lower case letters as different characters. This means R treats num and Num as distinct names. Forgetting about case sensitivity is a good way to create errors when using R. Try to remember that.
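Both rules can be checked at the Console. The sketch below first shows that num and Num are separate variables, then uses the base R function make.names(), which rewrites any illegal name into a legal one. A name that comes back unchanged was legal to begin with. The function is not otherwise used in this chapter; it is just a convenient way to test a name.

```r
num <- 1
Num <- 2     # a different variable, because R is case sensitive
num
## [1] 1
Num
## [1] 2

# Legal names are returned as they are; illegal ones are altered
make.names(c("num_1", "num.1", "NUM1", "1num", "_num1"))
## [1] "num_1"  "num.1"  "NUM1"   "X1num"  "X_num1"
```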

Don’t begin a name with .

We are allowed to begin a name with a ., but this usually is A Bad Idea. Why? Because variables whose names begin with . are hidden from view in the Global Environment: the values they refer to exist but they are invisible. This behaviour exists to allow R to create invisible variables that control how it behaves. This is useful, but it isn’t really meant to be used by the average user.
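The hiding behaviour can be seen with a short sketch. The name .hidden_value is invented for this example, and ls() is the base R function that lists the names of existing variables:

```r
.hidden_value <- 1        # a name chosen for this sketch

# The variable is not listed by default...
".hidden_value" %in% ls()
## [1] FALSE

# ...but it appears if we ask for all names
".hidden_value" %in% ls(all.names = TRUE)
## [1] TRUE

# The value itself is still there
.hidden_value
## [1] 1
```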


  1. Technically, this is called binding the name to a value. You don’t need to remember this.