Exploratory Data Analysis with R
What you will learn
Aims
Topics
Technologies
What is R?
What is RStudio?
How to use this book
Text, instructions, and explanations
R code and output
Get up and running
Different ways to run RStudio
Installing R and RStudio locally
Installing RStudio
A quick look at RStudio
Working at the Console in RStudio
I Introduction to R
1 A quick introduction to R
1.1 Using R as a big calculator
1.1.1 Basic arithmetic
1.1.2 Combining arithmetic operations
1.2 Problematic calculations
1.3 Storing and reusing results
1.4 How does assignment work?
1.5 Global environment
1.6 Naming rules and conventions
2 Using functions
2.1 Introduction
2.2 Functions and arguments
2.3 Evaluating arguments and returning results
2.4 Specifying function arguments
2.5 Combining functions
2.6 Functions do not have ‘side effects’
3 Vectors
3.1 Introduction
3.2 Atomic vectors
3.3 Numeric vectors
3.4 Constructing numeric vectors
3.5 Named vectors
3.6 Vectorised operations
3.7 Other kinds of atomic vectors
3.7.1 Character vectors
3.7.2 Logical vectors
4 Data frames
4.1 Introduction
4.2 Data frames
4.3 Exploring data frames
4.4 Extracting and adding a single variable
5 Packages
5.1 The R package system
5.2 Task views
5.3 Using packages
5.3.1 Viewing installed packages
5.3.2 Installing packages
5.3.3 Loading and attaching packages
5.3.4 Don’t use RStudio for loading packages!
5.3.5 An analogy
5.4 Package data
5.5 The tidyverse ecosystem of packages
II Data Wrangling
6 Getting ready to use dplyr
6.1 Introduction
6.2 Tidy data
6.3 Penguins! 🐧+📊= 😃
6.4 Um… tibbles?
6.5 Missing values
6.6 Introducing dplyr
6.6.1 A first look at dplyr
6.6.2 dplyr pseudocode
7 Working with variables
7.1 Introduction
7.1.1 Getting ready
7.2 Subset variables with select
7.2.1 Alternative ways to identify variables with select
7.2.2 Renaming variables with select and rename
7.3 Creating variables with mutate
7.3.1 Transforming and dropping variables
8 Working with observations
8.1 Introduction
8.1.1 Getting ready
8.2 Relational and logical operators
8.3 Subset observations with filter
8.4 Reordering observations with arrange
9 Summarising and grouping
9.1 Summarising variables with summarise
9.1.1 More complicated calculations with summarise
9.2 Grouped operations using group_by
9.2.1 More than one grouping variable
9.2.2 Using group_by with other verbs
9.3 Removing grouping information
10 Building pipelines
10.1 Why do we need ‘pipes’?
10.2 Using pipes (%>%)
11 Helper functions
11.1 Introduction
11.2 Working with select
11.3 Working with mutate and transmute
11.4 Working with filter
11.5 Working with summarise
III Exploring Data
12 Exploratory data analysis
12.1 Introduction
12.2 Statistical variables and data
12.2.1 Numeric vs categorical variables
12.2.2 Ratio vs interval scales
12.3 Populations and samples
12.3.1 Sample distributions
12.3.2 Associations
12.4 Types of EDA
12.5 A primer of descriptive statistics
12.5.1 Numeric variables
12.5.2 Categorical variables
12.5.3 Associations
13 Introduction to ggplot2
13.1 The anatomy of ggplot2
13.1.1 Layers
13.1.2 Scales
13.1.3 Coordinate system
13.1.4 Faceting
13.2 A quick introduction to ggplot2
13.2.1 Making a start
13.3 A standard way of using ggplot2
13.3.1 How should we format ggplot2 code?
13.4 Increasing the information density…
13.4.1 …via aesthetic mappings
13.4.2 …via facets
13.4.3 …via multiple layers
14 Customising plots
14.1 Geom properties
14.1.1 Relationship between aesthetic mappings and geom properties
14.2 Plot scales
14.3 Labels
14.4 Themes
14.5 Advice for making plots
15 Exploring one variable
15.1 Exploring numerical variables
15.1.1 What kind of numeric variable?
15.1.2 Histograms
15.1.3 Dot plots
15.2 Exploring categorical variables
15.2.1 What kind of categorical variable?
15.2.2 Bar plots
16 Exploring associations
16.1 Associations between numeric variables
16.2 Associations between categorical variables
16.3 Categorical-numerical associations
16.3.1 Alternatives to box and whiskers plots
16.4 Multivariate associations
17 Doing more with ggplot2
17.1 Comparing descriptive statistics
17.1.1 Error bars
17.1.2 Alternatives to bar plots
17.2 Adding text annotations
17.3 Saving plots
17.4 Multi-panel plots
Supplementary Material
A Getting help
A.1 Introduction
A.2 Browsing the help system
A.3 Searching for help files
A.4 Navigating help files
A.5 Vignettes
B Managing projects, scripts and data files
References
References