OLS: Interaction terms

MACS 33001 University of Chicago

\[\newcommand{\E}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\se}{\text{se}} \newcommand{\Lagr}{\mathcal{L}} \newcommand{\lagr}{\mathcal{l}}\]

Additive model

\[Y = \beta_0 + \beta_1 X + \beta_2 Z + \epsilon_i\]

Additive model

\[\E[Y] = \beta_0 + \beta_1 X + \beta_2 Z\]

\[\frac{\partial \E[Y]}{\partial X} = \beta_1\]

\[\frac{\partial \E[Y]}{\partial Z} = \beta_2\]

Multiplicative interaction model

\[Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + \epsilon_i\]

  • “Direct effects”: \(\beta_1\) and \(\beta_2\)
  • Constitutive terms: \(X\) and \(Z\)
  • Interaction term: \(XZ\), with coefficient \(\beta_3\)

Multiplicative interaction model

\[ \begin{align} \E[Y] & = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ \\ & = \beta_0 + \beta_2 Z + (\beta_1 + \beta_3 Z) X \end{align} \]

\[\frac{\partial \E[Y]}{\partial X} = \beta_1 + \beta_3 Z\]

Writing \(\psi_1 = \beta_1 + \beta_3 Z\) for this conditional effect of \(X\):

\[\E[Y] = \beta_0 + \beta_2 Z + \psi_1 X\]

\[ \begin{align} \E[Y] & = \beta_0 + \beta_1 X + (\beta_2 + \beta_3 X) Z \\ & = \beta_0 + \beta_1 X + \psi_2 Z \end{align} \]
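
As a minimal sketch of how the conditional slope \(\psi_1 = \beta_1 + \beta_3 Z\) moves with \(Z\), the R snippet below evaluates it at a few values. The function name `marginal_effect_x` and the coefficient values are hypothetical, chosen only for illustration.

```r
# Conditional (marginal) effect of X at a given value of Z:
# psi_1 = beta_1 + beta_3 * Z
marginal_effect_x <- function(beta_1, beta_3, z) {
  beta_1 + beta_3 * z
}

# Hypothetical coefficients, not estimates from any data
beta_1 <- 2
beta_3 <- -0.5

z_values <- c(0, 2, 4, 6)
psi_1 <- marginal_effect_x(beta_1, beta_3, z_values)

data.frame(Z = z_values, psi_1 = psi_1)
```

With these made-up values the effect of \(X\) is positive at \(Z = 0\), zero at \(Z = 4\), and negative beyond that, so no single number summarizes "the" effect of \(X\).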

Multiplicative interaction model

  • Conditional impact: the effect of \(X\) on \(Y\) depends on the value of \(Z\), and vice versa
  • If \(Z = 0\), then:

    \[ \begin{align} \E[Y] & = \beta_0 + \beta_1 X + \beta_2 (0) + \beta_3 X (0) \\ & = \beta_0 + \beta_1 X \end{align} \]

  • If \(X = 0\), then:

    \[ \begin{align} \E[Y] & = \beta_0 + \beta_1 (0) + \beta_2 Z + \beta_3 (0) Z \\ & = \beta_0 + \beta_2 Z \end{align} \]
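  • \(\psi_1 = \beta_1\) only when \(Z = 0\), and \(\psi_2 = \beta_2\) only when \(X = 0\)
  • \(+\beta_3\) means the effect of \(X\) increases as \(Z\) increases; \(-\beta_3\) means it decreases (and symmetrically for the effect of \(Z\) as \(X\) changes)
  • \(\psi_1\) and \(\psi_2\), not \(\beta_1\) and \(\beta_2\) alone, are the conditional effects of interest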

Conducting inference

  • Obtaining estimates of parameters

    \[\hat{\psi}_1 = \hat{\beta}_1 + \hat{\beta}_3 Z\] \[\hat{\psi}_2 = \hat{\beta}_2 + \hat{\beta}_3 X\]

  • Obtaining estimates of standard errors

Conducting inference

  1. \(\text{Var}(aX) = a^2 \text{Var}(X)\)
  2. \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2 \text{Cov}(X,Y)\)
  3. \(\text{Cov}(X, aY) = a \text{Cov}(X,Y)\)
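
Applying these three rules to \(\hat{\psi}_1 = \hat{\beta}_1 + \hat{\beta}_3 Z\), with \(Z\) treated as a fixed constant, gives the variance of the conditional effect step by step:

\[ \begin{align} \text{Var}(\hat{\psi}_1) & = \text{Var}(\hat{\beta}_1 + Z \hat{\beta}_3) \\ & = \text{Var}(\hat{\beta}_1) + \text{Var}(Z \hat{\beta}_3) + 2 \text{Cov}(\hat{\beta}_1, Z \hat{\beta}_3) \\ & = \text{Var}(\hat{\beta}_1) + Z^2 \text{Var}(\hat{\beta}_3) + 2 Z \text{Cov}(\hat{\beta}_1, \hat{\beta}_3) \end{align} \]

The second line uses rule 2, and the third uses rules 1 and 3. The same steps with \(\hat{\beta}_2\) and \(X\) give the variance of \(\hat{\psi}_2\).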

Conducting inference

\[\widehat{\text{Var}(\hat{\psi}_1)} = \widehat{\text{Var} (\hat{\beta}_1)} + Z^2 \widehat{\text{Var} (\hat{\beta}_3)} + 2 Z \widehat{\text{Cov} (\hat{\beta}_1, \hat{\beta}_3)}\]

\[\widehat{\text{Var}(\hat{\psi}_2)} = \widehat{\text{Var} (\hat{\beta}_2)} + X^2 \widehat{\text{Var} (\hat{\beta}_3)} + 2 X \widehat{\text{Cov} (\hat{\beta}_2, \hat{\beta}_3)}\]

  • Both depend on the estimated variances and covariances of \(\hat{\beta}_1\), \(\hat{\beta}_2\), and/or \(\hat{\beta}_3\)
  • Both also depend on the level/value of the interacted variable
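
A minimal sketch of this calculation in base R, assuming simulated data and hypothetical names (`sim`, `fit`, `z0`): it pulls the needed variances and covariance from `vcov()` and evaluates the standard error of \(\hat{\psi}_1\) at one chosen value of \(Z\). The last lines cross-check the result by recentering \(Z\) at that value, after which the coefficient on \(X\) is itself \(\hat{\psi}_1\).

```r
set.seed(1234)

# Simulated data with a known interaction (hypothetical values)
n <- 500
x <- rnorm(n)
z <- runif(n, min = 0, max = 10)
y <- 1 + 2 * x + 0.5 * z - 0.3 * x * z + rnorm(n)
sim <- data.frame(y, x, z)

fit <- lm(y ~ x * z, data = sim)
b <- coef(fit)
V <- vcov(fit)

# Marginal effect of x and its standard error at a chosen value of z
z0 <- 5
psi_1 <- b[["x"]] + b[["x:z"]] * z0
se_psi_1 <- sqrt(V["x", "x"] + z0^2 * V["x:z", "x:z"] + 2 * z0 * V["x", "x:z"])

c(estimate = psi_1, std.error = se_psi_1)

# Cross-check: after recentering z at z0, the coefficient on x is psi_1
sim$z_c <- sim$z - z0
fit_c <- lm(y ~ x * z_c, data = sim)
summary(fit_c)$coefficients["x", c("Estimate", "Std. Error")]
```

Changing `z0` shows how both the estimate and its standard error vary over the range of the interacted variable.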

Two dichotomous covariates

\[Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_1 D_2 + \epsilon_i\]

\[ \begin{align} \E[Y | D_1 = 0, D_2 = 0] & = \beta_0 \\ \E[Y | D_1 = 1, D_2 = 0] & = \beta_0 + \beta_1 \\ \E[Y | D_1 = 0, D_2 = 1] & = \beta_0 + \beta_2 \\ \E[Y | D_1 = 1, D_2 = 1] & = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\ \end{align} \]
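
Differencing these four conditional means isolates \(\beta_3\) as a difference in differences (a worked step that uses only the equations above):

\[ \begin{align} \beta_3 = & \left( \E[Y | D_1 = 1, D_2 = 1] - \E[Y | D_1 = 0, D_2 = 1] \right) \\ & - \left( \E[Y | D_1 = 1, D_2 = 0] - \E[Y | D_1 = 0, D_2 = 0] \right) \end{align} \]

The first bracket equals \(\beta_1 + \beta_3\) and the second equals \(\beta_1\), so \(\beta_3\) is how much the effect of \(D_1\) changes when \(D_2\) switches from 0 to 1.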

One dichotomous and one continuous covariate

\[Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 XD + \epsilon_i\]

\[ \begin{align} \E[Y | X, D = 0] & = \beta_0 + \beta_1 X \\ \E[Y | X, D = 1] & = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X \end{align} \]

Two continuous covariates

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon_i\]

Quadratic, cubic, and other polynomial effects

\[Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon\]

\[\frac{\partial \E[Y]}{\partial X} = \beta_1 + 2 \beta_2 X\]
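
Setting this marginal effect to zero locates the turning point of the quadratic (a worked step, assuming \(\beta_2 \neq 0\); \(X^*\) is notation introduced here):

\[ \beta_1 + 2 \beta_2 X^* = 0 \quad \Rightarrow \quad X^* = -\frac{\beta_1}{2 \beta_2} \]

The turning point is a maximum when \(\beta_2 < 0\) and a minimum when \(\beta_2 > 0\). Whether \(X^*\) falls inside the observed range of \(X\) determines whether the relationship actually changes direction in the data.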

Higher-order interaction terms

\[ \begin{align} Y = \beta_0 &+ \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 \\ & + \beta_5 X_1 X_3 + \beta_6 X_2 X_3 + \beta_7 X_1 X_2 X_3 + \epsilon \end{align} \]
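
Differentiating the three-way model with respect to \(X_1\) shows how many quantities its effect now depends on (a worked step from the equation above):

\[\frac{\partial \E[Y]}{\partial X_1} = \beta_1 + \beta_4 X_2 + \beta_5 X_3 + \beta_7 X_2 X_3\]

The effect of \(X_1\) varies with \(X_2\), with \(X_3\), and with their product, so it has to be evaluated at specific pairs of values of \(X_2\) and \(X_3\).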

Higher-order interaction terms

\[ \begin{align} Y = \beta_0 &+ \beta_1 X + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X D_1 \\ & + \beta_5 X D_2 + \beta_6 D_1 D_2 + \beta_7 X D_1 D_2 + \epsilon \end{align} \]
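
With two dichotomous moderators, the slope on \(X\) takes one of four values, one for each combination of \(D_1\) and \(D_2\) (a worked step from the equation above):

\[ \begin{align} \frac{\partial \E[Y | D_1 = 0, D_2 = 0]}{\partial X} & = \beta_1 \\ \frac{\partial \E[Y | D_1 = 1, D_2 = 0]}{\partial X} & = \beta_1 + \beta_4 \\ \frac{\partial \E[Y | D_1 = 0, D_2 = 1]}{\partial X} & = \beta_1 + \beta_5 \\ \frac{\partial \E[Y | D_1 = 1, D_2 = 1]}{\partial X} & = \beta_1 + \beta_4 + \beta_5 + \beta_7 \end{align} \]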

Key rules

  • Don’t omit the “direct effects”
  • Zero should be meaningful
  • Rescaling the variables doesn’t guarantee statistical significance
  • Flexible alternatives
  • Interpreting three(+)-way interactions
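
To see why zero should be meaningful, and what shifting a variable does and does not change, consider recentering \(Z\) at an arbitrary constant \(c\) in the two-variable model (a worked sketch; \(\tilde{Z} = Z - c\) is notation introduced here):

\[ \begin{align} \E[Y] & = \beta_0 + \beta_1 X + \beta_2 (\tilde{Z} + c) + \beta_3 X (\tilde{Z} + c) \\ & = (\beta_0 + \beta_2 c) + (\beta_1 + \beta_3 c) X + \beta_2 \tilde{Z} + \beta_3 X \tilde{Z} \end{align} \]

The coefficient on \(X\) becomes \(\beta_1 + \beta_3 c\), the effect of \(X\) when \(Z = c\), while \(\beta_3\) is unchanged. Recentering changes which conditional effect the coefficient on \(X\) reports, not the underlying model.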

Estimating models with multiplicative interactions

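  • Obama feeling thermometer (ObamaTherm), 0 to 100
  • Respondent conservatism (RConserv), 1 to 7
  • Perceived conservatism of Obama (ObamaConserv), 1 to 7
  • Republican indicator (GOP), 0 or 1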

Obama data

##    ObamaTherm       RConserv     ObamaConserv       GOP      
##  Min.   :  0.0   Min.   :1.00   Min.   :1.00   Min.   :0.00  
##  1st Qu.: 50.0   1st Qu.:2.00   1st Qu.:2.00   1st Qu.:0.00  
##  Median : 75.0   Median :5.00   Median :2.00   Median :0.00  
##  Mean   : 69.6   Mean   :4.24   Mean   :2.98   Mean   :0.24  
##  3rd Qu.:100.0   3rd Qu.:6.00   3rd Qu.:4.00   3rd Qu.:0.00  
##  Max.   :100.0   Max.   :7.00   Max.   :7.00   Max.   :1.00

Basic linear model

## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    93.4      1.57       59.4 0.      
## 2 RConserv       -4.10     0.368     -11.2 9.48e-28
## 3 GOP           -26.5      1.59      -16.7 2.82e-57
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic   p.value    df logLik    AIC
## *     <dbl>         <dbl> <dbl>     <dbl>     <dbl> <int>  <dbl>  <dbl>
## 1     0.325         0.324  23.1      336. 9.28e-120     3 -6365. 12738.
## # ... with 3 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>

Dichotomous interaction

\[ \begin{align} \text{Obama} = \beta_0 &+ \beta_1 (\text{RConserv}) \\ & + \beta_2 (\text{GOP})\\ & + \beta_3 (\text{RConserv}) (\text{GOP}) \\ & + \epsilon \end{align} \]

Dichotomous interaction

## # A tibble: 4 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     92.3      1.64      56.2  0.      
## 2 RConserv        -3.81     0.388     -9.81 5.26e-22
## 3 GOP            -11.1      6.68      -1.66 9.79e- 2
## 4 RConserv:GOP    -2.86     1.20      -2.38 1.75e- 2
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic   p.value    df logLik    AIC
## *     <dbl>         <dbl> <dbl>     <dbl>     <dbl> <int>  <dbl>  <dbl>
## 1     0.328         0.326  23.0      226. 1.15e-119     4 -6362. 12735.
## # ... with 3 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>

Dichotomous interaction

  • GOP = 0

    \[ \begin{align} E(\text{Obama}) = 92.255 & -3.805 (\text{RConserv}) -11.069 (0)\\ & -2.856 (\text{RConserv} \times 0) \\ = 92.255 & -3.805 (\text{RConserv}) \end{align} \]

  • GOP = 1

    \[ \begin{align} E(\text{Obama}) & = (92.255 -11.069 (1)) + (-3.805 -2.856 (1)) (\text{RConserv}) \\ & = 81.186 -6.661 (\text{RConserv}) \end{align} \]

Separate models

## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    92.3      1.60       57.7 0.      
## 2 RConserv       -3.81     0.378     -10.1 7.87e-23
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    81.2       6.98     11.6  1.90e-26
## 2 RConserv       -6.66      1.22     -5.44 1.04e- 7
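
A minimal sketch, on simulated data with hypothetical names (`sim`, `fit_int`, `fit_d0`, `fit_d1`), of why the separate regressions reproduce the intercepts and slopes implied by the interaction model. The point estimates coincide; the standard errors generally do not, because the interaction model pools the residual variance across both groups.

```r
set.seed(42)

# Simulated data (hypothetical values): continuous x, dichotomous d
n <- 400
x <- runif(n, min = 1, max = 7)
d <- rbinom(n, size = 1, prob = 0.3)
y <- 90 - 4 * x - 10 * d - 3 * x * d + rnorm(n, sd = 20)
sim <- data.frame(y, x, d)

# One model with the interaction
fit_int <- lm(y ~ x * d, data = sim)
b <- coef(fit_int)

# Implied intercept and slope within each group
implied_d0 <- c(b[["(Intercept)"]], b[["x"]])
implied_d1 <- c(b[["(Intercept)"]] + b[["d"]], b[["x"]] + b[["x:d"]])

# Separate regressions on each subset
fit_d0 <- lm(y ~ x, data = subset(sim, d == 0))
fit_d1 <- lm(y ~ x, data = subset(sim, d == 1))

# Rows should agree pairwise (columns: intercept, slope)
rbind(implied_d0, separate_d0 = unname(coef(fit_d0)),
      implied_d1, separate_d1 = unname(coef(fit_d1)))
```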

Causal direction

Calculating standard errors

\[ \begin{align} \text{Obama} = \beta_0 &+ (\beta_1 + \beta_3 \text{GOP}) (\text{RConserv}) \\ & + \beta_2 (\text{GOP}) + \epsilon \\ = &\beta_0 + \psi_1 (\text{RConserv}) + \beta_2 (\text{GOP}) + \epsilon \end{align} \]

  • Point estimate (GOP = 1)

    ## [1] -6.66
  • Standard error

    \[\hat{\sigma}_{\hat{\psi}_1} = \sqrt{\widehat{\text{Var}(\hat{\beta}_1)} + (\text{GOP})^2 \widehat{\text{Var}(\hat{\beta}_3)} + 2 (\text{GOP}) \widehat{\text{Cov}(\hat{\beta}_1, \hat{\beta}_3)}}\]

    ##              (Intercept) RConserv    GOP RConserv:GOP
    ## (Intercept)        2.691   -0.574 -2.691        0.574
    ## RConserv          -0.574    0.151  0.574       -0.151
    ## GOP               -2.691    0.574 44.677       -7.797
    ## RConserv:GOP       0.574   -0.151 -7.797        1.442
    ## [1] 1.14
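
As a sketch of the arithmetic behind this standard error, the snippet below plugs the rounded entries of the printed variance-covariance matrix into the formula for GOP = 1. The variable names are hypothetical, and because the inputs are rounded the result is only approximate.

```r
# Entries copied from the printed (rounded) variance-covariance matrix
var_b1    <- 0.151   # Var(RConserv)
var_b3    <- 1.442   # Var(RConserv:GOP)
cov_b1_b3 <- -0.151  # Cov(RConserv, RConserv:GOP)

# Standard error of the conditional effect of RConserv when GOP = 1
gop <- 1
se_psi_1 <- sqrt(var_b1 + gop^2 * var_b3 + 2 * gop * cov_b1_b3)
round(se_psi_1, 2)  # 1.14
```

Setting `gop <- 0` reduces the expression to `sqrt(var_b1)`, the standard error of the RConserv coefficient itself.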

Hypothesis testing

library(car)  # provides linearHypothesis()
# H0: the effect of RConserv is zero when GOP = 1
linearHypothesis(obama_ideo_gop, "RConserv + RConserv:GOP")
## Linear hypothesis test
## 
## Hypothesis:
## RConserv  + RConserv:GOP = 0
## 
## Model 1: restricted model
## Model 2: ObamaTherm ~ RConserv * GOP
## 
##   Res.Df    RSS Df Sum of Sq    F  Pr(>F)    
## 1   1394 757039                              
## 2   1393 738815  1     18225 34.4 5.7e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(obama_ideo_gop, "GOP + 7 * RConserv:GOP")
## Linear hypothesis test
## 
## Hypothesis:
## GOP  + 7 RConserv:GOP = 0
## 
## Model 1: restricted model
## Model 2: ObamaTherm ~ RConserv * GOP
## 
##   Res.Df    RSS Df Sum of Sq   F Pr(>F)    
## 1   1394 821850                            
## 2   1393 738815  1     83036 157 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
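
As a rough check on the first test, using the rounded estimates and variance-covariance entries printed earlier: for a single linear restriction the F statistic equals the squared t statistic, so it can be approximated from \(\hat{\psi}_1\) at GOP = 1 and its estimated variance.

\[ \begin{align} \hat{\psi}_1 & = -3.805 - 2.856 = -6.661 \\ \widehat{\text{Var}(\hat{\psi}_1)} & \approx 0.151 + 1.442 + 2 (-0.151) = 1.291 \\ F & = \frac{\hat{\psi}_1^2}{\widehat{\text{Var}(\hat{\psi}_1)}} \approx \frac{44.37}{1.291} \approx 34.4 \end{align} \]

This agrees with the reported F of 34.4 up to rounding.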

Continuous interaction

\[ \begin{align} \text{Obama} = \beta_0 &+ \beta_1 (\text{RConserv}) \\ & + \beta_2 (\text{ObamaConserv})\\ & + \beta_3 (\text{RConserv}) (\text{ObamaConserv}) \\ & + \epsilon \end{align} \]

Continuous interaction

## # A tibble: 4 x 5
##   term                  estimate std.error statistic   p.value
##   <chr>                    <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)             117.       2.97      39.4  8.00e-229
## 2 RConserv                -14.9      0.600    -24.9  2.52e-113
## 3 ObamaConserv             -6.73     0.929     -7.25 7.06e- 13
## 4 RConserv:ObamaConserv     2.81     0.182     15.4  1.53e- 49
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic   p.value    df logLik    AIC
## *     <dbl>         <dbl> <dbl>     <dbl>     <dbl> <int>  <dbl>  <dbl>
## 1     0.451         0.450  20.8      381. 8.30e-181     4 -6221. 12452.
## # ... with 3 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>
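
With two continuous covariates the conditional effect of RConserv takes a different value at every level of ObamaConserv. The sketch below evaluates \(\hat{\beta}_1 + \hat{\beta}_3 \times \text{ObamaConserv}\) over the observed 1 to 7 range, using the rounded coefficients from the table above; the variable names are hypothetical and the results are approximate because of that rounding.

```r
# Rounded coefficients copied from the fitted model output
b_rconserv    <- -14.9  # RConserv
b_interaction <- 2.81   # RConserv:ObamaConserv

# Conditional effect of RConserv at each value of ObamaConserv
obama_conserv <- 1:7
effect_rconserv <- b_rconserv + b_interaction * obama_conserv

data.frame(ObamaConserv = obama_conserv,
           effect_of_RConserv = round(effect_rconserv, 2))
```

The estimated effect is strongly negative at low values of ObamaConserv and turns positive between 5 and 6, which is why a single coefficient cannot summarize it. Standard errors at each value would follow the variance formula for \(\hat{\psi}_1\).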

Predicted values plots
