Chapter 3 Vectors

3.1 Introduction

This chapter has three goals. First, we want to learn how to work with a vector, one of the basic structures used to represent data in R-land. Second, we’ll learn how to work with vectors by using numeric vectors to perform simple calculations. Finally, we’ll introduce a couple of different kinds of vectors—character vectors and logical vectors. This material provides a foundation for working with real data sets in the next block.

Data structures?

The term ‘data structure’ is used to describe conventions or rules for organising and storing data on a computer. Computer languages use many different kinds of data structures. Fortunately, we only need to learn about a couple of relatively simple ones to use R for data analysis: ‘vectors’ and ‘data frames’. This chapter will consider vectors. The next chapter will look at collections of vectors (a.k.a. data frames).

3.2 Atomic vectors

We’ll start with a definition, even though it probably won’t make much sense yet: a vector is a 1-dimensional data structure for storing a set of values, each accessible by its position in the vector. The simplest kind of vectors in R are called atomic vectors⁴.

There are different kinds of atomic vector, but their defining, common feature is that it can only contain data of one ‘type’. They might contain all integers (e.g. 2, 4, 6, …) or all characters (e.g. “A”, “B”, “C”), but they can’t mix and match integers and characters (e.g. “A”, 2, “C”, 5).

The word ‘atomic’ in the name refers to the fact that an atomic vector can’t be broken down into anything simpler—they are the simplest kind of data structure R knows. Even when working with a single number we’re actually dealing with an atomic vector. Here’s the very first expression we evaluated in the Introduction to R chapter:

1 + 1

## [1] 2

Look at the output. What is that [1] at the beginning? It’s actually a clue that the output resulting from 1 + 1 is an atomic vector. We can verify this with the is.atomic functions. First, make a variable called x with the result of the 1 + 1 calculation:

x <- 1 + 1
x

## [1] 2

Then use is.atomic to check whether x really is an atomic vector:

is.atomic(x)

## [1] TRUE

Atomic vectors really are the simplest kind of data structure in R. Unlike many other languages, there is simply no way to represent just a number. Instead, a single number is always stored as a vector of length one⁵.

3.3 Numeric vectors

A lot of work in R involves numeric vectors. After all, data analysis is all about numbers. Here’s one simple way to construct a numeric vector (and print it out):

numvec <- numeric(length = 50)
numvec

##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [39] 0 0 0 0 0 0 0 0 0 0 0 0

What happened? We made a numeric vector with 50 elements, each of which is the number 0. The word ‘element’ is used to refer to the values that reside in a vector.

When we create a vector but don’t assign it to a name using <- R just prints it for us. Notice what happens when a vector is printed to the screen. Since a length-50 vector can’t fit on one line, it was printed over two. At the beginning of each line there is a [X]: the number X gives the position of the elements printed at the beginning of each line.

If we need to check that we really have made a numeric vector, we can use the is.numeric⁶ function to do this:

is.numeric(numvec)

## [1] TRUE

This confirms numvec is numeric by returning TRUE (a value of FALSE would mean that numvec is some other kind of object).

Keep in mind that R won’t always print the exact values of the elements of a vector. For example, when R prints a numeric vector, it only prints the elements to 7 significant figures by default. We can see this by printing the built in constant pi to the Console:

pi

## [1] 3.141593

The actual value stored in pi is much more precise than this. We can see this by printing pi again using the print function:

print(pi, digits = 16)

## [1] 3.141592653589793

Different kinds of numbers

Roughly speaking, R stores numbers in two different ways depending of whether they are whole numbers (“integers”) or numbers containing decimal points (“doubles” – don’t ask). We’re not going to worry about this difference. Most of the time, the distinction is invisible to users, so it is easier to think in terms of numeric vectors. We can mix and match integers and doubles in R without having to worry about how it is storing the numbers.

3.4 Constructing numeric vectors

We just saw to make a numeric vector of zeros using the numeric function. This is arguably not a particularly useful skill because we usually need to work vectors of particular values (not just 0). A useful function for creating custom vectors is the c function. Take a look at this example:

c(1.1, 2.3, 4.0, 5.7)

## [1] 1.1 2.3 4.0 5.7

The ‘c’ in the function name stands for ‘combine’. The c function takes a variable number of arguments, each of which must be a vector of some kind, and combines these into a single vector. We supplied the c function with four arguments, each of which was a single number (i.e. a length-one vector). The c function combines these to generate a vector of length 4. Simple.

Now look at this example:

vec1 <- c(1.1, 2.3)
vec2 <- c(4.0, 5.7, 3.6)
c(vec1, vec2)

## [1] 1.1 2.3 4.0 5.7 3.6

This shows that we can use the c function to combine two or more vectors of any length. We combined a length-2 vector with a length-3 vector to produce a new length-5 vector.

The `c` function is odd

Notice that we did not have to name the arguments in those two examples—there were no = involved. The c function is an example of one of those flexible functions that breaks the simple rules of thumb for using arguments that we set out earlier: it can take a variable number of arguments that do not have predefined names. This behaviour is necessary for c to be of any use: to be useful it needs to be flexible enough to take any combination of arguments.

3.5 Named vectors

What happens if use named arguments with c? Take a look at this:

namedv <- c(a = 1, b = 2, c = 3)
namedv

## a b c 
## 1 2 3

What happened? R used the argument names to set the names of each element in the vector. The resulting vector is still a 1-dimensional data structure. When printed to the Console, each element’s value is printed along with its associated name above it. We can extract the names from a named vector using the names function:

names(namedv)

## [1] "a" "b" "c"

Being able to name the elements of a vector is useful because it makes it easier to identify and extract the bits we need.

3.6 Vectorised operations

All the simple arithmetic operators (e.g. + and -) and many mathematical functions are vectorised in R. When we use a vectorised function it operates on vectors on an element-by-element basis. We’ll make a couple of simple vectors to illustrate what we mean:

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x

##  [1]  1  2  3  4  5  6  7  8  9 10

y <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
y

##  [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

This constructed two length-10 numeric vectors, called x and y, where x is a sequence from 1 to 10 and y is a sequence from 0.1 to 1.0. x and y are the same length. Now look at what happens when we add these using +:

x + y

##  [1]  1.1  2.2  3.3  4.4  5.5  6.6  7.7  8.8  9.9 11.0

When R evaluates the expression x + y it does this by adding the first element of x to the first element of y, the second element of x to the second element of y, and so on, working through all ten elements of x and y. That’s what is meant by a vectorised operation.

Vectorisation is implemented in all the standard mathematical functions. For example, the round function rounds each element of a numeric vector to the nearest integer by default:

round(y)

##  [1] 0 0 0 0 0 1 1 1 1 1

The same behaviour is seen with other mathematical functions like sin, cos, exp, and log—they apply the relevant function to each element of a numeric vector.

It is important to realise that not all functions are vectorised. For example, the sum function takes a vector of numbers and adds them up:

sum(y)

## [1] 5.5

Although sum obviously works on a numeric vector it is not ‘vectorised’ in the sense that it works element-by-element to return an output vector of the same length as its main argument. It just returns a single number—the sum total of the elements of its input.

Vectorisation is not the norm

R’s vectorised behaviour may seem like the obvious thing to do, but most computer languages do not work like this. In other languages, we typically have to write a much more complicated expression to do something so simple. This is one reason R is such a good data analysis language: vectorisation allows us to express repetitious calculations in a simple, intuitive way. This behaviour can save a lot of time.

3.7 Other kinds of atomic vectors

The data we collect and analyse are often in the form of numbers. It comes as no surprise, therefore, that we work with numeric vectors a lot in R. Nonetheless, we also need to use other kinds of vectors, either to represent different types of data, or to help us manipulate our data. This section introduces two new types of atomic vector to help us do this: character vectors and logical vectors.

3.7.1 Character vectors

Each element of a character vectors is what is known as a “character string” (or “string” if we are feeling lazy). That term “character string” refers to a sequence of characters, such as “Treatment 1”, “University of Sheffield”, “Population Density”. A character vector is an atomic vector that stores an ordered collection of one or more character strings.

If we want to construct a character vector in R, we have to place double (") or single (') quotation marks around the characters. For example, we can print the name “Dylan” to the Console like this:

"Dylan"

## [1] "Dylan"

Notice the [1]. This shows that what we just printed is an atomic vector of some kind. We know it’s a character vector because the output is printed with double quotes around the value. We often need to make simple character vectors containing only one value—for example, to set the values of arguments to a function.

The quotation marks are not optional—they tell R we want to treat whatever is inside them as a literal value. The quoting is important. If we try to do the same thing as above without the quotes, we end up with an error:

Dylan

## Error in eval(expr, envir, enclos): object 'Dylan' not found

What happened? When the interpreter sees the word Dylan without quotes it assumes that this must be the name of a variable, so it goes in search of it in the global environment. We haven’t made a variable called Dylan, so there is no way to evaluate the expression and R spits out an error to let us know there’s a problem.

Character vectors are typically constructed to represent data of some kind. The c function is one starting point for this kind of thing:

# make a length-3 character vector
my_name <- c("Dylan", "Zachary", "Childs")
my_name

## [1] "Dylan"   "Zachary" "Childs"

Here we made a length-3 character vector, with elements corresponding to a first name, middle name, and last name. If we want to extract one or more elements from a character vector by their position

Take note, this is not equivalent to the above :

my_name <- c("Dylan Zachary Childs")
my_name

## [1] "Dylan Zachary Childs"

This length-1 character vector’s only element is a single character string containing the first, middle and last name separated by spaces. We didn’t even need to use the the c function here because we were only ever working with a length-1 character vector. i.e. we could have typed "Dylan Zachary Childs" and we would have ended up with exactly the same text printed at the Console.

3.7.2 Logical vectors

The elements of logical vectors only take two values: TRUE or FALSE. Don’t let the simplicity of logical vectors fool you. They’re very useful. As with other kinds of atomic vectors, the c function can be used to construct a logical vector:

l_vec <- c(TRUE, FALSE)
l_vec

## [1]  TRUE FALSE

So why are logical vectors useful? Their allow us to represent the results of questions such as, “is x greater than y” or “is x equal to y”. The results of such comparisons may then be used to carry out various kinds of subsetting operations.

Before we can look at how to use logical vectors to evaluate comparisons, we need to introduce relational operators. These sound fancy, but they are very simple: we use relational operators to evaluate the relative value of vector elements. Six are available in R:

x < y: is x less than y?
x > y: is x greater than y?
x <= y: is x less than or equal to y?
x >= y: is x greater than or equal to y?
x == y: is x equal to y?
x != y: is x not equal to y?

The easiest way to understand how these work is by example. We need a couple of numeric variables first:

x <- c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
y <- c(3, 6, 9, 12, 15, 18, 21, 24, 27, 30)
x

##  [1] 11 12 13 14 15 16 17 18 19 20

##  [1]  3  6  9 12 15 18 21 24 27 30

Now, if we need to evaluate and represent a question like, “is x greater than than y”, we can use either < or >:

x > y

##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

The x > y expression produces a logical vector, with TRUE values associated with elements in x are less than y, and FALSE otherwise. In this example, x is less than y until we reach the value of 15 in each sequence. Notice too that relational operators are vectorised: they work on an element by element basis.

What does the == operator do? It compares the elements of two vectors to determine if they are exactly equal:

x == y

##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

The output of this comparison is true only for one element, the number 15, which is at the 5^th position in both x and y. The != operator is essentially the opposite of ==. It identifies cases where two elements are not exactly equal. We could step through each of the different relational operators, but hopefully, they are self-explanatory at this point (if not, experiment with them).

`=` and `==` are not the same

If we want to test for equivalence between the elements of two vectors we must use double equals (==), not single equals (=). Forgetting to use == instead of = is a very common source of mistakes. The = symbol already has a use in R—assigning name-value pairs—so it can’t also be used to compare vectors because this would lead to ambiguity in our R scripts. Using = when you meant to use == is a very common mistake. If you make it, this will lead to all kinds of difficult-to-comprehend problems with your scripts. Try to remember the difference!

The other common vector is called a “list”. Lists are very useful but we won’t cover them in this book.↩︎
The same is true for things like sets of characters ("dog", "cat", "fish", …) and logical values (TRUE or FALSE) discussed in the next two chapters.↩︎
This may not look like the most useful function in the world, but sometimes we need functions like is.numeric to understand what R is doing or root out mistakes in our scripts.↩︎