6 R programming
This chapter teaches you the R skills that are needed to pass PA.
6.1 Notebook chunks
On the Exam, you will start with an .Rmd (R Markdown) template, which organizes code into R Notebooks. Within each notebook, code is organized into chunks.
# This is a chunk
Your time is valuable. Throughout this book, I will include useful keyboard shortcuts.
Shortcut: To run everything in a chunk quickly, press CTRL + SHIFT + ENTER. To create a new chunk, use CTRL + ALT + I.
6.2 Basic operations
The usual math operations apply.
# Addition
1 + 2
## [1] 3
# Subtraction
3 - 2
## [1] 1
# Multiplication
2 * 2
## [1] 4
# Division
4 / 2
## [1] 2
# Exponentiation
2^3
## [1] 8
There are two assignment operators: = and <-. The latter is preferred because it is specific to assigning a value to a variable. The = operator is also used for specifying arguments in functions (see the functions section).
Shortcut: ALT + - creates a <-.
# Variable assignment
y <- 2
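To illustrate the difference between the two operators, here is a minimal sketch using only base R. The variable name my_pi is made up for this example. Inside a function call, = names an argument, whereas <- performs an assignment before the value is passed along.

```r
# `=` inside a call matches values to the arguments named x and digits;
# it does not create variables with those names.
round(x = 3.14159, digits = 2)
## [1] 3.14

# `<-` inside a call assigns first, then passes the value to the function.
# (Shown only to illustrate the difference; assigning on its own line is clearer.)
round(my_pi <- 3.14159, digits = 2)
## [1] 3.14
my_pi
## [1] 3.14159
```

Keeping <- for assignment and = for arguments makes it obvious at a glance which one is happening.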
# Equality
4 == 2
## [1] FALSE
5 == 5
## [1] TRUE
3.14 > 3
## [1] TRUE
3.14 >= 3
## [1] TRUE
Vectors can be added just like numbers. The c() function, whose name stands for “concatenate,” creates vectors.
x <- c(1, 2)
y <- c(3, 4)
x + y
## [1] 4 6
x * y
## [1] 3 8
z <- x + y
z^2
## [1] 16 36
z / 2
## [1] 2 3
z + 3
## [1] 7 9
I already mentioned numeric types. There are also character (string) types, factor types, and boolean types.
character <- "The"
character_vector <- c("The", "Quick")
Character vectors can be combined with the paste() function.
a <- "The"
b <- "Quick"
c <- "Brown"
d <- "Fox"
paste(a, b, c, d)
## [1] "The Quick Brown Fox"
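As a short sketch of how paste() can be controlled, the example below uses its sep and collapse arguments and the related paste0() function, all from base R. The vector name words is made up for this illustration.

```r
words <- c("The", "Quick", "Brown", "Fox")

# sep controls what goes between the separate arguments
paste("The", "Quick", sep = "-")
## [1] "The-Quick"

# paste0() behaves like paste() with sep = ""
paste0("The", "Quick")
## [1] "TheQuick"

# collapse joins the elements of a single vector into one string
paste(words, collapse = " ")
## [1] "The Quick Brown Fox"
```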
Factors look like character vectors but can only contain a finite number of predefined values, called “levels.” The factor below has only one level because it was built from a single value.
factor <- as.factor(character)
levels(factor)
## [1] "The"
By default, R puts the levels of a factor in alphabetical order (Q comes before T).
factor_vector <- as.factor(character_vector)
levels(factor_vector)
## [1] "Quick" "The"
In building linear models, the order of the factor levels matters. By default, R treats the first level as the “reference level” or “base level” of a GLM; the recommended practice is to set it to the level that has the most observations. This will be covered in the section on linear models.
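As a preview, one way to move the most common level to the front is base R's relevel(). This is only a sketch: the factor region and its values are made up, and the section on linear models may take a different route.

```r
# Hypothetical factor in which "b" is the most common level
region <- factor(c("b", "a", "b", "c", "b"))
levels(region)
## [1] "a" "b" "c"

# Find the level with the most observations
most_common <- names(which.max(table(region)))
most_common
## [1] "b"

# relevel() moves that level to the front, making it the reference level
region <- relevel(region, ref = most_common)
levels(region)
## [1] "b" "a" "c"
```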
Booleans are just TRUE and FALSE values. R understands T and TRUE in the same way, but the latter is preferred because T is an ordinary variable that can be overwritten. When doing math, Booleans are converted to 0/1 values, where 1 is equivalent to TRUE and 0 to FALSE.
bool_true <- TRUE
bool_false <- FALSE
bool_true * bool_false
## [1] 0
The same conversion happens with addition.
bool_true + 1
## [1] 2
Vectors work in the same way.
bool_vect <- c(TRUE, TRUE, FALSE)
sum(bool_vect)
## [1] 2
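A common use of this conversion is counting how many elements meet a condition. The sketch below uses made-up ages; sum() counts the TRUE values and mean() gives the proportion of them.

```r
ages <- c(25, 35, 45, 55)  # made-up values for illustration

# A comparison on a vector returns a Boolean vector
over_40 <- ages > 40
over_40
## [1] FALSE FALSE  TRUE  TRUE

# TRUE counts as 1, so sum() counts the matches
sum(over_40)
## [1] 2

# ... and mean() gives the share of matches
mean(over_40)
## [1] 0.5
```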
Vectors are indexed using [. If you are only extracting a single element, you should use [[ for clarity.
abc <- c("a", "b", "c")
abc[[1]]
## [1] "a"
abc[[2]]
## [1] "b"
abc[c(1, 3)]
## [1] "a" "c"
abc[c(1, 2)]
## [1] "a" "b"
abc[-2]
## [1] "a" "c"
abc[-c(2, 3)]
## [1] "a"
6.3 Lists
Lists are vectors that can hold mixed object types.
my_list <- list(TRUE, "Character", 3.14)
my_list
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] "Character"
##
## [[3]]
## [1] 3.14
Lists can be named.
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)
my_list
## $bool
## [1] TRUE
##
## $character
## [1] "character"
##
## $numeric
## [1] 3.14
The $ operator indexes named lists by element name.
my_list$numeric
## [1] 3.14
my_list$numeric + 5
## [1] 8.14
Lists can also be indexed using [[.
my_list[[1]]
## [1] TRUE
my_list[[2]]
## [1] "character"
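With lists, the difference between [[ and [ matters more than it does for plain vectors. The sketch below re-creates the named list from above so that it runs on its own: [[ extracts the element itself, while [ returns a shorter list that still wraps the element.

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

# [[ extracts the element itself
class(my_list[["numeric"]])
## [1] "numeric"

# [ returns a list containing that element
class(my_list["numeric"])
## [1] "list"

# Arithmetic therefore works on the [[ result;
# my_list["numeric"] + 5 would fail because a list is not a number
my_list[["numeric"]] + 5
## [1] 8.14
```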
Lists can contain vectors, other lists, and any other object.
everything <- list(vector = c(1, 2, 3),
                   character = c("a", "b", "c"),
                   list = my_list)
everything
## $vector
## [1] 1 2 3
##
## $character
## [1] "a" "b" "c"
##
## $list
## $list$bool
## [1] TRUE
##
## $list$character
## [1] "character"
##
## $list$numeric
## [1] 3.14
To find out the type of an object, use class, str, or summary.
class(x)
## [1] "numeric"
class(everything)
## [1] "list"
str(everything)
## List of 3
## $ vector : num [1:3] 1 2 3
## $ character: chr [1:3] "a" "b" "c"
## $ list :List of 3
## ..$ bool : logi TRUE
## ..$ character: chr "character"
## ..$ numeric : num 3.14
summary(everything)
## Length Class Mode
## vector 3 -none- numeric
## character 3 -none- character
## list 3 -none- list
6.4 Functions
You only need to understand the very basics of functions. The big picture is that understanding functions helps you to understand everything in R, because R has a strongly functional style: nearly everything you do is a function call. This differs from object-oriented or procedural languages such as Python, Java, VBA, and C, and from SQL, which is a declarative query language built around set operations.
Functions do things. The convention is to name a function as a verb. The function make_rainbows() would create a rainbow. The function summarise_vectors() would summarise vectors. Functions may or may not have an input and output.
If you need to do something in R, there is a high probability that someone has already written a function to do it. That being said, creating simple functions is quite helpful.
Here is an example that has a side effect of printing the input:
greet_me <- function(my_name) {
  print(paste0("Hello, ", my_name))
}
greet_me("Future Actuary")
## [1] "Hello, Future Actuary"
A function that returns something
When returning the last evaluated expression, the call to return() is optional. In fact, it is discouraged by convention: the tidyverse style guide, for example, reserves return() for early returns.
add_together <- function(x, y) {
  x + y
}
add_together(2, 5)
## [1] 7
add_together <- function(x, y) {
  # Works, but bad practice
  return(x + y)
}
add_together(2, 5)
## [1] 7
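One situation where an explicit return() is commonly used is leaving a function early. The sketch below is a made-up helper, safe_divide(), written only to show the pattern; it is not taken from any package.

```r
# Hypothetical helper: divide, but stop early on a zero denominator
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)  # early exit: an explicit return() is needed here
  }
  x / y  # last evaluated expression, returned implicitly
}

safe_divide(6, 3)
## [1] 2
safe_divide(6, 0)
## [1] NA
```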
Binary operations in R are vectorized. In other words, they are applied element-wise.
x_vector <- c(1, 2, 3)
y_vector <- c(4, 5, 6)
add_together(x_vector, y_vector)
## [1] 5 7 9
Many functions in R actually return lists! This is why many R objects can be indexed with the dollar sign.
library(ExamPAData)
model <- lm(charges ~ age, data = health_insurance)
model$coefficients
## (Intercept) age
## 3165.8850 257.7226
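If ExamPAData is not installed, the same idea can be checked with the mtcars data set that ships with R. This is a sketch under that substitution: mtcars and the name fit stand in for the exam data and are not part of the original example.

```r
# mtcars is a built-in data set (datasets package), used here as a stand-in
fit <- lm(mpg ~ wt, data = mtcars)

# The fitted model has its own class, but underneath it is a list
class(fit)
## [1] "lm"
is.list(fit)
## [1] TRUE

# names() lists the components that $ can extract
"coefficients" %in% names(fit)
## [1] TRUE
names(fit$coefficients)
## [1] "(Intercept)" "wt"
```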
Here’s a function that returns a list.
sum_multiply <- function(x, y) {
  sum <- x + y
  product <- x * y
  list("Sum" = sum, "Product" = product)
}
result <- sum_multiply(2, 3)
result$Sum
## [1] 5
result$Product
## [1] 6
6.5 Data frames
You can think of a data frame as a table that is implemented as a list of vectors.
df <- data.frame(
  age = c(25, 35),
  has_fsa = c(FALSE, TRUE)
)
df
## age has_fsa
## 1 25 FALSE
## 2 35 TRUE
To index columns in a data frame, use the same $ operator that indexes a list.
df$age
## [1] 25 35
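Because a data frame is a list of vectors, the list tools from earlier in the chapter work on it too. The sketch below re-creates df so that it runs on its own and uses only base R functions.

```r
df <- data.frame(
  age = c(25, 35),
  has_fsa = c(FALSE, TRUE)
)

# A data frame passes the list check
is.list(df)
## [1] TRUE

# Its elements are the columns
names(df)
## [1] "age"     "has_fsa"
length(df)  # number of columns
## [1] 2

# [[ extracts a column as a vector, just as it does for a list
df[["has_fsa"]]
## [1] FALSE  TRUE
```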
To find the number of rows and columns, use dim.
dim(df)
## [1] 2 2
To summarize each column, use summary.
summary(df)
##       age        has_fsa
##  Min.   :25.0   Mode :logical
##  1st Qu.:27.5   FALSE:1
##  Median :30.0   TRUE :1
##  Mean   :30.0
##  3rd Qu.:32.5
##  Max.   :35.0