9 Generalized Linear Models (GLMs)

GLMs are a broad category of models. Ordinary Least Squares and Logistic Regression are both examples of GLMs.

9.0.1 Assumptions of OLS

We assume that the target is Gaussian with a mean equal to the linear predictor. This can be broken down into two parts:

  1. A random component: The target variable \(Y|X\) is normally distributed with mean \(\mu = \mu(X) = E(Y|X)\)

  2. A link between the target and the covariates (also known as the systematic component): \(\mu(X) = X\beta\)

This says that each observation follows a normal distribution with a mean equal to the linear predictor. Another way of saying this is that “after we adjust for the data, the error is normally distributed and the variance is constant.” If \(I\) is an n-by-n identity matrix and \(\sigma^2 I\) is the covariance matrix, then

\[ \mathbf{Y}|\mathbf{X} \sim N(\mathbf{X\beta}, \sigma^2 \mathbf{I}) \]
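As a quick sanity check of this special case, here is a minimal sketch in Python using the statsmodels package (simulated data; the variable names are illustrative and not from the exam materials). Fitting the same design matrix with OLS and with a Gaussian-family GLM using its default identity link gives the same coefficient estimates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))                 # intercept plus two covariates
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.0, size=n)                 # Y|X ~ N(X beta, sigma^2 I)

ols_fit = sm.OLS(y, X).fit()
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link is the default

# Both fits recover essentially the same coefficient estimates.
print(ols_fit.params)
print(glm_fit.params)
```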

9.0.2 Assumptions of GLMs

GLMs are more general, which means that they are more flexible. We relax these two assumptions by saying that the model is defined by

  1. A random component: \(Y|X \sim \text{some exponential family distribution}\)

  2. A link between the random component and the covariates:

\[g(\mu(X)) = X\beta\] where \(g\) is called the link function and \(\mu = E[Y|X]\).

Each observation follows some exponential family distribution (Gamma, Inverse Gaussian, Poisson, Binomial, etc.), and that distribution has a mean which is related to the linear predictor through the link function. There is also a dispersion parameter, but that is more detail than is needed here. For an explanation, see Ch. 2.2 of CAS Monograph 5.
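To make the two components concrete, here is a minimal sketch in Python with statsmodels (simulated data with arbitrary parameter values; the CamelCase link-class names follow recent statsmodels releases, while older versions use lowercase names such as links.log()). The random component is Gamma and the link is the log, so \(\log(\mu) = X\beta\).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
X = sm.add_constant(rng.uniform(size=(n, 1)))   # intercept plus one covariate
eta = X @ np.array([0.5, 1.2])                  # linear predictor X beta
mu = np.exp(eta)                                # log link: mu = exp(X beta)
shape = 2.0
y = rng.gamma(shape, mu / shape)                # Gamma random component with E[Y|X] = mu

fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(fit.params)   # estimates of beta on the log scale
print(fit.scale)    # estimated dispersion parameter
```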

9.1 Advantages and disadvantages

There is usually at least one question on the PA exam which asks you to “list some of the advantages and disadvantages of using this particular model,” so here is one such list. It is unlikely that the grader will take off points for including too many comments, so a good strategy is to include everything that comes to mind.

GLM Advantages

  • Easy to interpret
  • Can easily be deployed in spreadsheet format
  • Handles different response/target distributions
  • Is commonly used in insurance ratemaking

GLM Disadvantages

  • Does not select features (without stepwise selection)
  • Strict assumptions around distribution shape and randomness of error terms
  • Predictor variables need to be uncorrelated
  • Unable to detect non-linearity directly (although this can manually be addressed through feature engineering)
  • Sensitive to outliers
  • Low predictive power

9.2 GLMs for regression

For regression problems, we try to match the actual distribution of the target to the distribution family used in the GLM. The most likely choices are:

  • Gaussian (Normal)
  • Gamma
  • Inverse Gaussian
  • Poisson
  • Binomial

The choice of target distribution should be similar to the actual distribution of \(Y\). For instance, if \(Y\) is never less than zero, then using the Gaussian distribution is not ideal because this can allow for negative values. If the distribution is right-skewed, then the Gamma or Inverse Gaussian may be appropriate because they are also right-skewed.

Notice that the top three distributions are continuous but the bottom two are discrete.
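As a rough illustration of matching the family to the shape of \(Y\), here is a sketch in Python with statsmodels (simulated, strictly positive, right-skewed data; not from the source text) comparing a Gaussian fit against a Gamma fit with a log link.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
X = sm.add_constant(rng.normal(size=(n, 1)))
mu = np.exp(X @ np.array([0.2, 0.8]))        # positive, right-skewed target mean
y = rng.gamma(2.0, mu / 2.0)                 # Y is never less than zero

gauss_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()   # identity link
gamma_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# The Gamma/log model guarantees positive fitted means; the Gaussian/identity model does not.
print(gauss_fit.fittedvalues.min(), gamma_fit.fittedvalues.min())
print(gauss_fit.aic, gamma_fit.aic)          # a lower AIC suggests a better-matching family
```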

There are five link functions for a continuous \(Y\), although the choice of distribution family will typically rule out several of these immediately. The linear predictor (a.k.a. the systematic component) is \(z\), and the link function is how this connects to the expected value of the response.

\[z = X\beta = g(\mu)\]

If the target distribution must have a positive mean, such as the Inverse Gaussian or Gamma, then the Identity or Inverse links are poor choices because they allow the implied mean to be negative; the mean can range over \((-\infty, \infty)\). The other link functions force the mean to be positive.
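A small sketch in Python with statsmodels (simulated data; the extreme prediction point is only there to force extrapolation) shows how the link restricts the implied mean: an identity link can produce a negative mean, while a log link cannot.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 1, size=n)
X = sm.add_constant(x)
y = rng.gamma(2.0, np.exp(0.1 + 0.5 * x) / 2.0)   # strictly positive target

identity_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Identity())).fit()
log_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Predict at a covariate value far outside the training range of [0, 1].
X_new = sm.add_constant(np.array([-50.0]), has_constant="add")
print(identity_fit.predict(X_new))   # can go negative: mu = X beta is unconstrained
print(log_fit.predict(X_new))        # always positive: mu = exp(X beta)
```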

9.3 Interpretation of coefficients

The interpretation of a GLM’s coefficients depends on the choice of link function.
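For example (a sketch in Python with statsmodels on simulated data), under a log link a one-unit increase in a predictor multiplies the expected response by \(e^{\beta}\), whereas under an identity link it would add \(\beta\).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
x = rng.integers(0, 2, size=n).astype(float)   # a binary rating variable
X = sm.add_constant(x)
mu = np.exp(1.0 + 0.3 * x)                     # true multiplicative effect is exp(0.3)
y = rng.gamma(2.0, mu / 2.0)

fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(np.exp(fit.params[1]))   # roughly 1.35: the factor applied to E[Y] when x goes from 0 to 1
```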