OLS: Estimator and diagnostics

MACS 33001 University of Chicago

\[\newcommand{\E}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\se}{\text{se}} \newcommand{\Lagr}{\mathcal{L}} \newcommand{\lagr}{\mathcal{l}}\]

Regression

  • Regression
  • Response variable \(Y\)
  • Covariate \(X\)
  • Regression function

    \[r(x) = \E (Y | X = x) = \int y f(y|x) dy\]

  • Estimate \(r(x)\) from the sample

    \[(Y_1, X_1), \ldots, (Y_n, X_n) \sim F_{X,Y}\]

Simple linear regression

\[r(x) = \beta_0 + \beta_1 x\]

  • \(X_i\) is one-dimensional
  • \(r(x)\) assumed to be linear
  • \(\Var (\epsilon_i | X = x) = \sigma^2\) does not depend on \(x\)

Linear regression model

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

  • \(\E (\epsilon_i | X_i) = 0\)
  • \(\Var (\epsilon_i | X_i) = \sigma^2\)
  • Unknown \(\beta_0\), \(\beta_1\), and \(\sigma^2\)
  • \(\hat{\beta}_0\) and \(\hat{\beta}_1\)
  • Fitted line

    \[\hat{r}(x) = \hat{\beta}_0 + \hat{\beta}_1 x\]

  • Fitted values

    \[\hat{Y}_i = \hat{r}(X_i)\]

  • Residuals

    \[\hat{\epsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\]

  • Residual sum of squares

    \[RSS = \sum_{i=1}^n \hat{\epsilon}_i^2\]

Estimation strategy

  • Unbiased
    • \(\E(\hat{\beta}) = \beta\)
  • Efficient
    • \(\min \left( \Var(\hat{\beta}) \right)\)

Least squares estimator

  • Values \(\hat{\beta}_0, \hat{\beta}_1\) that minimize the residual sum of squares

    \[\min(RSS)\]

\[ \begin{aligned} \hat \beta_1 & = \dfrac{ \sum_{i=1}^n Y_iX_i - \bar{Y}_n \sum_{i=1}^nX_i}{ \sum_{i=1}^n X_i^2 - \bar{X}_n\sum_{i=1}^nX_i}\\ \hat \beta_0 & = \bar{Y}_n - \hat{\beta}_1 \bar{X}_n \\ \hat{\sigma}^2 &= \left( \frac{1}{n - 2} \right) \sum_{i=1}^n \hat{\epsilon}_i^2 \end{aligned} \]
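
A minimal sketch of these closed-form estimators in R, on simulated data (the data and object names are illustrative, not from the slides):

set.seed(123)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)                       # true beta_0 = 2, beta_1 = 3

b1 <- (sum(y * x) - mean(y) * sum(x)) /
  (sum(x^2) - mean(x) * sum(x))                 # hat(beta)_1
b0 <- mean(y) - b1 * mean(x)                    # hat(beta)_0
sigma2 <- sum((y - (b0 + b1 * x))^2) / (n - 2)  # hat(sigma)^2
c(b0, b1)                                       # matches coef(lm(y ~ x))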

Multivariate formulation

\[\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{u}\]

  • \(\mathbf{Y}\): \(N\times 1\) vector
  • \(\mathbf{X}\): \(N \times K\) matrix
  • \(\boldsymbol{\beta}\): \(K \times 1\) vector
  • \(\mathbf{u}\): \(N\times 1\) vector
  • \(i \in \{1,\ldots,N \}\)
  • \(k \in \{1,\ldots,K \}\)

Multivariate formulation

\[\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
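
The same estimator in matrix form, reusing the simulated x and y from the sketch above:

X <- cbind(1, x)                      # N x K design matrix with intercept
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
drop(beta_hat)                        # identical to coef(lm(y ~ x))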

Maximum likelihood estimator

  • \(\epsilon_i | X_i \sim N(0, \sigma^2)\)

    \[Y_i | X_i \sim N(\mu_i, \sigma^2)\]

    • \(\mu_i = \beta_0 + \beta_1 X_i\)
  • Likelihood function

    \[ \begin{align} \prod_{i=1}^n f(X_i, Y_i) &= \prod_{i=1}^n f_X(X_i) f_{Y | X} (Y_i | X_i) \\ &= \prod_{i=1}^n f_X(X_i) \times \prod_{i=1}^n f_{Y | X} (Y_i | X_i) \\ &= \Lagr_1 \times \Lagr_2 \end{align} \]

    \[ \begin{align} \Lagr_1 &= \prod_{i=1}^n f_X(X_i) \\ \Lagr_2 &= \prod_{i=1}^n f_{Y | X} (Y_i | X_i) \end{align} \]

  • Conditional likelihood

    \[ \begin{align} \Lagr_2 &\equiv \Lagr(\beta_0, \beta_1, \sigma^2) \\ &= \prod_{i=1}^n f_{Y | X}(Y_i | X_i) \\ &\propto \frac{1}{\sigma} \exp \left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \mu_i)^2 \right\} \end{align} \]

  • Conditional log-likelihood

    \[\lagr(\beta_0, \beta_1, \sigma^2) = -n \log(\sigma) - \frac{1}{2\sigma^2} \sum_{i=1}^n \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2\]

    • \(\max \left( \lagr(\beta_0, \beta_1, \sigma^2) \right)\)
    • Equivalent to \(\min(RSS)\)

      \[RSS = \sum_{i=1}^n \hat{\epsilon}_i^2 = \sum_{i=1}^n \left( Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \right)^2\]
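
As a hedged sketch (reusing the simulated x and y from above), maximizing the conditional log-likelihood numerically recovers the least-squares estimates:

negll <- function(par) {
  b0 <- par[1]; b1 <- par[2]; sigma <- exp(par[3])  # exp() keeps sigma > 0
  -sum(dnorm(y, mean = b0 + b1 * x, sd = sigma, log = TRUE))
}
optim(c(0, 0, 0), negll)$par[1:2]    # approximately coef(lm(y ~ x))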

Properties of the least squares estimator

  • Conditional on \(X^n = (X_1, \ldots, X_n)\)
  • \(\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^T\)

    \[ \begin{align} \E (\hat{\beta} | X^n) &= \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \\ \Var (\hat{\beta} | X^n) &= \frac{\sigma^2}{n s_X^2} \begin{pmatrix} \frac{1}{n} \sum_{i=1}^n X_i^2 & -\bar{X}_n \\ -\bar{X}_n & 1 \end{pmatrix} \end{align} \]

    \[s_X^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2\]

Properties of the least squares estimator

\[ \begin{align} \widehat{\se} (\hat{\beta}_0) &= \frac{\hat{\sigma}}{s_X \sqrt{n}} \sqrt{\frac{ \sum_{i=1}^n X_i^2}{n}} \\ \widehat{\se} (\hat{\beta}_1) &= \frac{\hat{\sigma}}{s_X \sqrt{n}} \end{align} \]

Properties of the least squares estimator

  1. Consistency

    \[\hat{\beta}_0 \xrightarrow{P} \beta_0 \, \text{and} \, \hat{\beta}_1 \xrightarrow{P} \beta_1\]

  2. Asymptotic normality

    \[\frac{\hat{\beta}_0 - \beta_0}{\widehat{\se}(\hat{\beta}_0)} \leadsto N(0,1) \, \text{and} \, \frac{\hat{\beta}_1 - \beta_1}{\widehat{\se}(\hat{\beta}_1)} \leadsto N(0,1)\]

  3. Approximate \(1 - \alpha\) confidence intervals for \(\beta_0\) and \(\beta_1\) are

    \[\hat{\beta}_0 \pm z_{\alpha / 2} \widehat{\se}(\hat{\beta}_0) \, \text{and} \, \hat{\beta}_1 \pm z_{\alpha / 2} \widehat{\se}(\hat{\beta}_1)\]

  4. The Wald test for testing \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\) is reject \(H_0\) if \(\mid W \mid > z_{\alpha / 2}\) where

    \[W = \frac{\hat{\beta}_1}{\widehat{\se}(\hat{\beta}_1)}\]
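
A sketch of properties 3 and 4 in R (the model and data are the simulated ones from above, not from the slides):

mod <- lm(y ~ x)
est <- coef(summary(mod))["x", ]

# Approximate 95% confidence interval for beta_1
est["Estimate"] + c(-1, 1) * qnorm(0.975) * est["Std. Error"]

# Wald statistic and two-sided p-value for H0: beta_1 = 0
W <- est["Estimate"] / est["Std. Error"]
2 * pnorm(-abs(W))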

Assumptions of linear regression models

  • Linearity
  • Constant variance
  • Normality
  • Independence
  • Fixed \(X\)
  • \(X\) is not invariant

Linearity

\[\E(\epsilon_i) \equiv \E(\epsilon_i | X_i) = 0\]

\[ \begin{aligned} \mu_i \equiv \E(Y_i) \equiv \E(Y | X_i) &= \E(\beta_0 + \beta_1 X_i + \epsilon_i) \\ \mu_i &= \beta_0 + \beta_1 X_i + \E(\epsilon_i) \\ \mu_i &= \beta_0 + \beta_1 X_i + 0 \\ \mu_i &= \beta_0 + \beta_1 X_i \end{aligned} \]

Constant variance

  • The variance of the errors is the same regardless of the values of \(X\)

    \[\Var(\epsilon_i | X_i) = \sigma^2\]

Normality

  • The errors are assumed to be normally distributed

    \[\epsilon_i \mid X_i \sim N(0, \sigma^2)\]

Independence

  • Observations are sampled independently from one another
  • Any pair of errors \(\epsilon_i\) and \(\epsilon_j\) are independent for \(i \neq j\).

Fixed \(X\)

  • \(X\) is assumed to be fixed or measured without error and independent of the error
  • Experimental design
  • Observational study

    \[\epsilon_i \sim N(0, \sigma^2), \text{for } i = 1, \dots, n\]

\(X\) is not invariant

  • \(X\) must vary across observations: if \(X_i\) is constant, \(\sum_{i=1}^n (X_i - \bar{X}_n)^2 = 0\) and the slope cannot be estimated

Handling violations of assumptions

  • Ooopsie
  • Inference can be
    • Tricky
    • Biased
    • Inefficient
    • Error prone
  • Move to more robust inferential method
  • Diagnose assumption violations and solve them in the OLS framework

Unusual and influential data

  • Outliers
    • Unusual observations
    • Due to \(Y_i\)
    • Due to \(X_i\)
    • Some combination of the two
  • Potential disproportionate influence on the regression model

Terms

  • Outlier
  • Leverage
  • Discrepancy
  • Influence

    \[\text{Influence} = \text{Leverage} \times \text{Discrepancy}\]

Measuring leverage

  • Hat statistic

    \[h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_{j} - \bar{X})^2}\]

  • Measures the contribution of observation \(Y_i\) to the fitted value \(\hat{Y}_j\)
  • Solely a function of \(X\)
  • Larger values indicate higher leverage
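
In R, hat values come from hatvalues(); a sketch with a stand-in model (mtcars here, not the Supreme Court data used below):

mod <- lm(mpg ~ disp + wt + cyl, data = mtcars)
h <- hatvalues(mod)
h_bar <- length(coef(mod)) / nrow(mtcars)  # (k + 1) / n
which(h > 2 * h_bar)                       # observations with noteworthy leverage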

Measuring discrepancy

  • Use the residuals?
  • Standardized residual

    \[\hat{\epsilon}_i ' \equiv \frac{\hat{\epsilon}_i}{S_{E} \sqrt{1 - h_i}}\]

    • \(S_E = \sqrt{\frac{\sum_{i=1}^n \hat{\epsilon}_i^2}{n - k - 1}}\)
  • Studentized residual

    \[\hat{\epsilon}_i^{\ast} \equiv \frac{\hat{\epsilon}_i}{S_{E(-i)} \sqrt{1 - h_i}}\]
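
Both flavors of residual are built into R; a sketch using the same stand-in model:

mod <- lm(mpg ~ disp + wt + cyl, data = mtcars)
rstandard(mod)                 # standardized residuals
rstudent(mod)                  # studentized (leave-one-out) residuals
which(abs(rstudent(mod)) > 2)  # discrepant observations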

Measuring influence

  • \(\text{DFBETA}_{ij}\)

    \[D_{ij} = \hat{\beta}_{j} - \hat{\beta}_{j(-i)}, \text{for } i=1, \dots, n \text{ and } j = 0, \dots, k\]

  • \(\text{DFBETAS}_{ij}\)

    \[D^{\ast}_{ij} = \frac{D_{ij}}{\widehat{\se}_{(-i)}(\hat{\beta}_{j})}\]

  • Cook’s D

    \[D_i = \frac{\hat{\epsilon}^{'2}_i}{k + 1} \times \frac{h_i}{1 - h_i}\]
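
All three influence measures have base-R helpers; a sketch, again with the stand-in mtcars model:

mod <- lm(mpg ~ disp + wt + cyl, data = mtcars)
dfbeta(mod)                  # DFBETA: coefficient change when i is deleted
dfbetas(mod)                 # DFBETAS: DFBETA scaled by the deleted-i standard error
d <- cooks.distance(mod)     # Cook's D
n <- nrow(mtcars); k <- length(coef(mod)) - 1
which(d > 4 / (n - k - 1))   # influential by the rule of thumb used below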

Visualizing leverage, discrepancy, and influence

  • Number of federal laws struck down by the U.S. Supreme Court in each Congress
  1. Age
  2. Tenure
  3. Unified

OLS model

## # A tibble: 4 x 5
##   term        estimate std.error statistic    p.value
##   <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept) -12.1       2.54       -4.76 0.00000657
## 2 age           0.219     0.0448      4.88 0.00000401
## 3 tenure       -0.0669    0.0643     -1.04 0.300     
## 4 unified       0.718     0.458       1.57 0.121

Outliers?

Hat-values

  • Anything exceeding twice the average \(\bar{h} = \frac{k + 1}{n}\) is noteworthy

    ## # A tibble: 9 x 10
    ##   Congress congress nulls   age tenure unified  year    hat student  cooksd
    ##   <chr>       <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>  <dbl>   <dbl>   <dbl>
    ## 1 1st             1     0  49.8  0.800       1  1789 0.0974   0.330 0.00296
    ## 2 3rd             3     0  52.8  4.20        0  1793 0.113    0.511 0.00841
    ## 3 12th           12     0  49    6.60        1  1811 0.0802   0.669 0.00980
    ## 4 17th           17     0  59   16.6         1  1821 0.0887  -0.253 0.00157
    ## 5 20th           20     0  61.7 17.4         1  1827 0.0790  -0.577 0.00719
    ## 6 23rd           23     0  64   18.4         1  1833 0.0819  -0.844 0.0159 
    ## 7 34th           34     0  64   14.6         0  1855 0.0782  -0.561 0.00671
    ## 8 36th           36     0  68.7 17.8         0  1859 0.102   -1.07  0.0326 
    ## 9 99th           99     3  71.9 16.7         0  1985 0.0912   0.295 0.00221

Studentized residuals

  • Anything outside of the range \([-2,2]\) is discrepant

    ## # A tibble: 7 x 10
    ##   Congress congress nulls   age tenure unified  year    hat student cooksd
    ##   <chr>       <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>  <dbl>   <dbl>  <dbl>
    ## 1 67th           67     6  66     9          1  1921 0.0361    2.14 0.0415
    ## 2 74th           74    10  71.1  14.2        1  1935 0.0514    4.42 0.223 
    ## 3 90th           90     6  64.7  13.3        1  1967 0.0195    2.49 0.0292
    ## 4 91st           91     6  65.1  13          1  1969 0.0189    2.42 0.0269
    ## 5 92nd           92     5  62     9.20       1  1971 0.0146    2.05 0.0150
    ## 6 98th           98     7  69.9  14.7        0  1983 0.0730    3.02 0.165 
    ## 7 104th         104     8  60.6  12.5        1  1995 0.0208    4.48 0.0897

Influence

  • \(D_i > \frac{4}{n - k - 1}\)

    ## # A tibble: 4 x 10
    ##   Congress congress nulls   age tenure unified  year    hat student cooksd
    ##   <chr>       <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>  <dbl>   <dbl>  <dbl>
    ## 1 67th           67     6  66      9         1  1921 0.0361    2.14 0.0415
    ## 2 74th           74    10  71.1   14.2       1  1935 0.0514    4.42 0.223 
    ## 3 98th           98     7  69.9   14.7       0  1983 0.0730    3.02 0.165 
    ## 4 104th         104     8  60.6   12.5       1  1995 0.0208    4.48 0.0897

How to treat unusual observations

  • Mistake
  • Weird observation
  • Why is it weird?

Omit unusual observations

## [1] 0.232
## [1] 0.258
## [1] 1.68
## [1] 1.29

Non-normally distributed errors

\[\epsilon_i | X_i \sim N(0, \sigma^2)\]

  • By the CLT, inference based on the least-squares estimator is approximately valid under broad conditions
  • Violation does not affect validity
  • Violation affects efficiency

Detecting non-normally distributed errors

## # A tibble: 3,997 x 4
##      age sex    compositeHourlyWages yearsEducation
##    <int> <chr>                 <dbl>          <int>
##  1    40 Male                  10.6              15
##  2    19 Male                  11                13
##  3    46 Male                  17.8              14
##  4    50 Female                14                16
##  5    31 Male                   8.2              15
##  6    30 Female                17.0              13
##  7    61 Female                 6.7              12
##  8    46 Female                14                14
##  9    43 Male                  19.2              18
## 10    17 Male                   7.25             11
## # ... with 3,987 more rows
## # A tibble: 4 x 5
##   term           estimate std.error statistic   p.value
##   <chr>             <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)      -8.12    0.599       -13.6 5.27e- 41
## 2 sexMale           3.47    0.207        16.8 4.04e- 61
## 3 yearsEducation    0.930   0.0343       27.1 5.47e-149
## 4 age               0.261   0.00866      30.2 3.42e-180

## [1]  967 3911

Fixing non-normally distributed errors

  • Power transformations
  • Log transformations
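
A sketch of the log transformation applied to the wage model (assuming the data frame is called wages; the variable names match the output above):

mod_log <- lm(log(compositeHourlyWages) ~ sex + yearsEducation + age,
              data = wages)
summary(mod_log)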

Fixing non-normally distributed errors

## # A tibble: 4 x 5
##   term           estimate std.error statistic   p.value
##   <chr>             <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)      1.10    0.0380        28.9 1.97e-167
## 2 sexMale          0.224   0.0131        17.1 2.16e- 63
## 3 yearsEducation   0.0559  0.00217       25.7 2.95e-135
## 4 age              0.0182  0.000549      33.1 4.50e-212

## [1] 2760 3267

Non-constant error variance

  • Homoscedasticity

    \[\text{Var}(\epsilon_i) = \sigma^2\]

    \[\widehat{\se}(\hat{\beta}_{j}) = \sqrt{\hat{\sigma}^{2} (X'X)^{-1}_{jj}}\]

  • Heteroscedasticity

Detecting heteroscedasticity

\[Y_i = 2 + 3X_i + \epsilon_i\]

\[\epsilon_i \sim N(0,1)\]

Detecting heteroscedasticity

\[Y_i = 2 + 3X_i + \epsilon_i\]

\[\epsilon_i \sim N \left( 0, \frac{X_i}{2} \right)\]
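
A sketch reproducing both simulations (the object names sim_homo_mod and sim_hetero_mod are assumptions matching the Breusch-Pagan output below):

set.seed(123)
n <- 1000
x <- runif(n, 0, 10)

sim_homo   <- data.frame(x, y = 2 + 3 * x + rnorm(n, 0, 1))
sim_hetero <- data.frame(x, y = 2 + 3 * x + rnorm(n, 0, sqrt(x / 2)))  # Var = x/2

sim_homo_mod   <- lm(y ~ x, data = sim_homo)
sim_hetero_mod <- lm(y ~ x, data = sim_hetero)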

Breusch-Pagan test

  • Estimate an OLS model and obtain the squared residuals \(\hat{\epsilon}^2\)
  • Regress \(\hat{\epsilon}^2\) against:
    • All the \(k\) variables you think might be causing the heteroscedasticity
  • Calculate \(R^2_{\hat{\epsilon}^2}\) for the residual model and multiply it by the number of observations \(n\)
    • Under the null of homoscedasticity, \(n R^2_{\hat{\epsilon}^2}\) follows a \(\chi^2_{k}\) distribution
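
In R the whole procedure is wrapped in lmtest::bptest(); a sketch using the simulated models defined above:

library(lmtest)
bptest(sim_homo_mod)     # large p-value: no evidence of heteroscedasticity
bptest(sim_hetero_mod)   # tiny p-value: reject homoscedasticity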

Breusch-Pagan test

## 
##  studentized Breusch-Pagan test
## 
## data:  sim_homo_mod
## BP = 0.3, df = 1, p-value = 0.6
## 
##  studentized Breusch-Pagan test
## 
## data:  sim_hetero_mod
## BP = 200, df = 1, p-value <2e-16

Weighted least squares regression

  • \(\epsilon_i \sim N(0, \sigma_i^2)\)

    \[ \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \sigma_n^2 \\ \end{bmatrix} \]

  • \(w_i = \frac{1}{\sigma_i^2}\)

    \[ \mathbf{W} = \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & 0 & 0 \\ 0 & \frac{1}{\sigma_2^2} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma_n^2} \\ \end{bmatrix} \]

  • Traditional regression estimator

    \[\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}\]

  • Substitute with weighting matrix

    \[\hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{W} \mathbf{X})^{-1} \mathbf{X}' \mathbf{W} \mathbf{y}\]

    \[\hat{\sigma}^2 = \frac{\sum_{i=1}^n w_i \hat{\epsilon}_i^2}{n}\]

Estimating the weights \(W_i\)

  1. Use the residuals from a preliminary OLS regression to obtain estimates of the error variance within different subsets of observations
  2. Model the weights as a function of observable variables in the model
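
A sketch of approach 2, modeling the error variance from a preliminary OLS fit (reusing sim_hetero from above; the log-variance specification is an assumption):

ols <- lm(y ~ x, data = sim_hetero)
aux <- lm(log(resid(ols)^2) ~ x, data = sim_hetero)  # variance as a function of x
w   <- 1 / exp(fitted(aux))                          # w_i = 1 / sigma_i^2
wls <- lm(y ~ x, data = sim_hetero, weights = w)
summary(wls)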

Weighted least squares regression

## # A tibble: 4 x 5
##   term           estimate std.error statistic   p.value
##   <chr>             <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)      -8.12    0.599       -13.6 5.27e- 41
## 2 sexMale           3.47    0.207        16.8 4.04e- 61
## 3 yearsEducation    0.930   0.0343       27.1 5.47e-149
## 4 age               0.261   0.00866      30.2 3.42e-180
## # A tibble: 4 x 5
##   term           estimate std.error statistic p.value
##   <chr>             <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)      -8.10   0.0160       -506.       0
## 2 sexMale           3.47   0.00977       356.       0
## 3 yearsEducation    0.927  0.00142       653.       0
## 4 age               0.261  0.000170     1534.       0

Corrections for the variance-covariance estimates

  • Correct standard errors only
  • Huber-White (robust) standard errors
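
A sketch of Huber-White standard errors with the sandwich and lmtest packages (the HC1 flavor is an assumption; the slides do not say which correction was used):

library(sandwich)
library(lmtest)
mod <- lm(y ~ x, data = sim_hetero)   # reusing the heteroscedastic simulation
coeftest(mod, vcov = vcovHC(mod, type = "HC1"))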

Corrections for the variance-covariance estimates

## # A tibble: 4 x 6
##   term           estimate std.error statistic   p.value std.error.rob
##   <chr>             <dbl>     <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)      -8.12    0.599       -13.6 5.27e- 41       0.636  
## 2 sexMale           3.47    0.207        16.8 4.04e- 61       0.207  
## 3 yearsEducation    0.930   0.0343       27.1 5.47e-149       0.0385 
## 4 age               0.261   0.00866      30.2 3.42e-180       0.00881

Non-linearity in the data

  • Assume \(\E (\epsilon_i) = 0\)
  • Implies the regression line accurately reflects the relationship between \(X\) and \(Y\)
  • Nonlinearity
    • Nonlinear relationship
    • Conditional relationship

Partial residual

\[\hat{\epsilon}_i^{(j)} = \hat{\epsilon}_i + \hat{\beta}_j X_{ij}\]

Partial residual plots
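
Partial-residual (component-plus-residual) plots can be drawn with car::crPlots(), or computed by hand from the formula above; a sketch with the stand-in mtcars model:

library(car)
mod <- lm(mpg ~ disp + wt + cyl, data = mtcars)
crPlots(mod)                                             # one panel per predictor
pr_disp <- resid(mod) + coef(mod)["disp"] * mtcars$disp  # partial residual for disp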

Correcting for nonlinearity

\[\log(\text{Wage}) = \beta_0 + \beta_1(\text{Male}) + \beta_2 \text{Age} + \beta_3 \text{Age}^2 + \beta_4 \text{Education}^2\]

## # A tibble: 5 x 5
##   term                 estimate std.error statistic   p.value
##   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)          0.397    0.0578         6.87 7.62e- 12
## 2 sexMale              0.221    0.0124        17.8  3.21e- 68
## 3 I(yearsEducation^2)  0.00181  0.0000786     23.0  1.19e-109
## 4 age                  0.0830   0.00319       26.0  2.93e-138
## 5 I(age^2)            -0.000852 0.0000410    -20.8  3.85e- 91

Correcting for nonlinearity

  • Partial residuals

    \[\hat{\epsilon}_i^{\text{Age}} = 0.083 \times \text{Age}_i -0.0008524 \times \text{Age}^2_i + \hat{\epsilon}_i\]

    \[\hat{\epsilon}_i^{\text{Education}} = 0.002 \times \text{Education}^2_i + \hat{\epsilon}_i\]

  • Partial fits

    \[\hat{Y}_i^{(\text{Age})} = 0.083 \times \text{Age}_i -0.0008524 \times \text{Age}^2_i\]

    \[\hat{Y}_i^{(\text{Education})} = 0.002 \times \text{Education}^2_i\]

Collinearity

A state of a model where explanatory variables are correlated with one another

Perfect collinearity

## 
## Call:
## lm(formula = mpg ~ disp + wt + cyl, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.403 -1.403 -0.495  1.339  6.072 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 41.10768    2.84243   14.46  1.6e-14 ***
## disp         0.00747    0.01184    0.63   0.5332    
## wt          -3.63568    1.04014   -3.50   0.0016 ** 
## cyl         -1.78494    0.60711   -2.94   0.0065 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.59 on 28 degrees of freedom
## Multiple R-squared:  0.833,  Adjusted R-squared:  0.815 
## F-statistic: 46.4 on 3 and 28 DF,  p-value: 5.4e-11

Perfect collinearity

## 
## Call:
## lm(formula = mpg ~ disp + wt + cyl + disp_mean, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.403 -1.403 -0.495  1.339  6.072 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 41.10768    2.84243   14.46  1.6e-14 ***
## disp         0.00747    0.01184    0.63   0.5332    
## wt          -3.63568    1.04014   -3.50   0.0016 ** 
## cyl         -1.78494    0.60711   -2.94   0.0065 ** 
## disp_mean         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.59 on 28 degrees of freedom
## Multiple R-squared:  0.833,  Adjusted R-squared:  0.815 
## F-statistic: 46.4 on 3 and 28 DF,  p-value: 5.4e-11
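
A hedged guess at how disp_mean was constructed: mean-centering disp makes it an exact linear combination of the intercept and disp, so lm() drops it:

mtcars$disp_mean <- mtcars$disp - mean(mtcars$disp)  # perfectly collinear with disp
summary(lm(mpg ~ disp + wt + cyl + disp_mean, data = mtcars))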

Less-than-perfect collinearity

## # A tibble: 3 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) -173.     43.8         -3.96 9.01e-  5
## 2 age           -2.29    0.672       -3.41 7.23e-  4
## 3 limit          0.173   0.00503     34.5  1.63e-121

Less-than-perfect collinearity

## # A tibble: 3 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) -378.       45.3       -8.34  1.21e-15
## 2 limit          0.0245    0.0638     0.384 7.01e- 1
## 3 rating         2.20      0.952      2.31  2.13e- 2

Correlation matrix

Scatterplot matrix

Variance inflation factor (VIF)

  • Ratio of the variance of \(\hat{\beta}_j\) when fitting the full model to the variance of \(\hat{\beta}_j\) when \(X_j\) is the only predictor in the model
  • Rule of thumb: a VIF greater than 10 indicates problematic collinearity (see the sketch below)
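
VIFs come from car::vif(); a sketch assuming a data frame credit with the variables from the output below:

library(car)
vif(lm(balance ~ age + limit, data = credit))     # low VIFs: age and limit
vif(lm(balance ~ limit + rating, data = credit))  # huge VIFs: limit and rating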

Variance inflation factor (VIF)

##   age limit 
##  1.01  1.01
##  limit rating 
##    160    160

Fixing multicollinearity

  • Drop one or more of the collinear variables from the model - NO!
  • Add more data
  • Transform the variables
  • Shrinkage methods