Exam PA Study Manual
Preface
1 Join the Online Course
2 The exam
3 Prometric Demo
4 Introduction
5 Getting started
5.1 Download the data
5.2 Download ISLR
5.3 New users
6 R programming
6.1 Notebook chunks
6.2 Basic operations
6.3 Lists
6.4 Functions
6.5 Data frames
6.6 Pipes
6.7 The SOA’s code doesn’t use pipes or dplyr, so can I skip learning this?
7 Data manipulation
7.1 Garbage in; garbage out 🗑
7.2 Be a detective 🔎
7.3 A picture is worth a thousand words 📷
7.4 Factor or numeric ❓
7.5 73.6% of statistics are false 😱
7.6 How to save time with dplyr
7.7 Look at the data
7.8 Transform the data
7.9 Exercises
7.10 Answers to exercises
8 Visualization
8.1 Create a plot object (ggplot)
8.2 Add a plot
8.3 Data manipulation chaining
9 Introduction to modeling
9.1 Modeling vocabulary
9.2 Modeling notation
9.3 Ordinary Least Squares (OLS)
9.4 Regression vs. classification
9.5 Regression metrics
9.6 Example
9.6.1 Assumptions of OLS
9.6.2 Assumptions of GLMs
9.7 Advantages and disadvantages
9.8 GLMs for regression
9.9 Interpretation of coefficients
9.9.1 Identity link
9.9.2 Log link
9.10 Other links
9.11 Binary target
9.12 Count target
9.13 Link functions
9.14 Interpretation of coefficients
9.14.1 Logit
9.14.2 Probit, Cauchit, Cloglog
9.15 Example
9.16 Area Under the ROC Curve (AUC)
9.17 Additional reading
9.18 Residuals
9.18.1 Raw residuals
9.18.2 Deviance residuals
9.19 Example
9.20 Log transforms of continuous predictors
9.21 Reference levels
9.22 Interactions
9.23 Offsets
9.24 Tweedie regression
9.25 Combinations of Link Functions and Target Distributions
9.25.1 Gaussian Response with Log Link
9.25.2 Gaussian Response with Inverse Link
9.25.3 Gaussian Response with Identity Link
9.25.4 Gaussian Response with Log Link and Negative Values
9.25.5 Gamma Response with Log Link
9.25.6 Gamma with Inverse Link
9.26 Stepwise subset selection
9.27 Penalized Linear Models
9.28 Ridge Regression
9.29 Lasso
9.30 Elastic Net
9.31 Advantages and disadvantages
9.32 Example: Ridge Regression
9.33 Example: The Lasso
9.34 References
10 Tree-based models
10.1 Decision Trees
10.1.1 Advantages and disadvantages
10.2 Ensemble learning
10.2.1 Bagging
10.2.2 Boosting
10.3 Random Forests
10.3.1 Example
10.3.2 Variable Importance
10.3.3 Partial dependence
10.3.4 Advantages and disadvantages
10.4 Gradient Boosted Trees
10.4.1 AdaBoost
10.4.2 Gradient Boosting
10.4.3 Notation
10.4.4 Parameters
10.4.5 Example
10.4.6 Advantages and disadvantages
10.5 Exercises
10.5.1 1. RF with randomForest
10.5.2 2. RF tuning with caret
10.5.3 3. Tuning a GBM with caret
11 Unsupervised Learning
11.1 Principal Component Analysis (PCA)
11.1.1 Example: PCA on US Arrests
11.1.2 Example: PCA on Cancer Cells
11.2 Clustering
11.2.1 K-Means Clustering
11.3 Hierarchical Clustering
11.3.1 Example: Clustering Cancer Cells
11.3.2 References
12 References
Predictive Analytics for Actuaries
1 Join the Online Course