1 Preliminaries

Set seed for reproducibility

Load packages we will need in this tutorial. Make sure you have them installed. If not, you can install them using install.packages("package_name").

library(here)
library(sandwich)
library(lmtest)

To avoid clutter, I have coded a couple of functions elsewhwere that we can outsource and use here right away. These functions will help us generate data for the Monte Carlo simulations, and plot the results of our simulations.

source(here("scripts/gen_data.R"))
source(here("scripts/plots_estimates.R"))

1.1 Introduction

The goal of this tutorial is to illustrate how to perform heteroskedasticity-robust inference in the context of the linear regression model. I will pay special attention to the different approaches to compute heteroskedasticity-robust standard errors as seen in class, and how they affect the inference on the OLS estimates.

1.2 Generate the Data and Estimate the model

Let us generate data according to model 2 (see the gen_data function for details). We will then estimate the OLS model using the generated data.

data <- gen_data(n = 1000, dgp = "model 2")
Y <- data$Y
X <- data$X

Base R includes an intercept by default. We need to remove it from our model because gen_data.R returns a matrix with an intercept.

fitModel <- lm(Y ~ X - 1)
summary(fitModel)
## 
## Call:
## lm(formula = Y ~ X - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8948 -0.6204  0.0040  0.6261  3.4329 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## XIntercept  0.45611    0.06257   7.290  6.3e-13 ***
## XX1         2.02521    0.03045  66.513  < 2e-16 ***
## XX2        -0.97326    0.10958  -8.882  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9896 on 997 degrees of freedom
## Multiple R-squared:  0.8191, Adjusted R-squared:  0.8186 
## F-statistic:  1505 on 3 and 997 DF,  p-value: < 2.2e-16

Alternatively, we could have specified the model using the matrix X directly, which is useful when we want to include only specific columns of X in our model.

fitModel_b <- lm(data$Y ~ data$X[,2:3])
summary(fitModel_b)
## 
## Call:
## lm(formula = data$Y ~ data$X[, 2:3])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8948 -0.6204  0.0040  0.6261  3.4329 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.45611    0.06257   7.290  6.3e-13 ***
## data$X[, 2:3]X1  2.02521    0.03045  66.513  < 2e-16 ***
## data$X[, 2:3]X2 -0.97326    0.10958  -8.882  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9896 on 997 degrees of freedom
## Multiple R-squared:  0.8189, Adjusted R-squared:  0.8186 
## F-statistic:  2254 on 2 and 997 DF,  p-value: < 2.2e-16

1.3 Heteroskedasticity-Consistent (HC) covariance matrix estimators

To adjust for heteroskedasticity, you can use the sandwich package to obtain robust standard errors.

hc0_se <- sqrt(diag(vcovHC(fitModel, type = "HC0")))
hc1_se <- sqrt(diag(vcovHC(fitModel, type = "HC1")))
hc2_se <- sqrt(diag(vcovHC(fitModel, type = "HC2")))
hc3_se <- sqrt(diag(vcovHC(fitModel, type = "HC3")))

Display the robust standard errors

robust_se <- data.frame(HC0 = hc0_se, HC1 = hc1_se, HC2 = hc2_se, HC3 = hc3_se)
print(robust_se)
##                   HC0        HC1        HC2        HC3
## XIntercept 0.06072025 0.06081153 0.06083106 0.06094216
## XX1        0.03349143 0.03354178 0.03359335 0.03369580
## XX2        0.11033993 0.11050581 0.11056648 0.11079373

The previous formulas provide the estimates using the estimators HC0, HC1, HC2, and HC3, we covered in class. For example, if want to derive the HC0 estimator by hand:

  1. Calculate the OLS residuals
residuals <- Y - X %*% betaHat
residuals <- as.vector(residuals)
  1. Calculate the variance of the OLS estimator using the residuals
vHat0 <- solve(t(X) %*% X) %*% t(X) %*% diag(residuals^2) %*% X %*% solve(t(X) %*% X)
  1. Calculate the standard errors from the variance-covariance matrix
robust_se$HC0hand <- sqrt(diag(vHat0))

Display the robust standard errors

print(robust_se)
##                   HC0        HC1        HC2        HC3    HC0hand
## XIntercept 0.06072025 0.06081153 0.06083106 0.06094216 0.06074967
## XX1        0.03349143 0.03354178 0.03359335 0.03369580 0.03361936
## XX2        0.11033993 0.11050581 0.11056648 0.11079373 0.11068945

To incorporate heteroskedasticity-consistent (HC) estimators when reporting standard errors in the summary() output, you can use the coeftest() function from the lmtest package along with the vcovHC() function from the sandwich package. This approach will allow you to display the model coefficients with robust standard errors.

1.3.1 Summary of the model with standard errors adjusted for heteroskedasticity

HC0 type

coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC0"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060720  7.5117 1.294e-13 ***
## XX1         2.025214   0.033491 60.4696 < 2.2e-16 ***
## XX2        -0.973262   0.110340 -8.8206 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HC1 type

coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC1"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060812  7.5005 1.404e-13 ***
## XX1         2.025214   0.033542 60.3788 < 2.2e-16 ***
## XX2        -0.973262   0.110506 -8.8073 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HC2 type

coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC2"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060831  7.4981 1.429e-13 ***
## XX1         2.025214   0.033593 60.2862 < 2.2e-16 ***
## XX2        -0.973262   0.110566 -8.8025 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HC3 type

coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC3"))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060942  7.4844 1.577e-13 ***
## XX1         2.025214   0.033696 60.1029 < 2.2e-16 ***
## XX2        -0.973262   0.110794 -8.7845 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summarize the model with different HC estimators using the summarize_with_robust_se() function we created in the scripts/plots_estimates.R file. This function will display the coefficients, standard errors, t-values, and p-values for each HC type.

summarize_with_robust_se(fitModel)
## 
## HC0 robust standard errors:
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060720  7.5117 1.294e-13 ***
## XX1         2.025214   0.033491 60.4696 < 2.2e-16 ***
## XX2        -0.973262   0.110340 -8.8206 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## HC1 robust standard errors:
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060812  7.5005 1.404e-13 ***
## XX1         2.025214   0.033542 60.3788 < 2.2e-16 ***
## XX2        -0.973262   0.110506 -8.8073 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## HC2 robust standard errors:
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060831  7.4981 1.429e-13 ***
## XX1         2.025214   0.033593 60.2862 < 2.2e-16 ***
## XX2        -0.973262   0.110566 -8.8025 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## HC3 robust standard errors:
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## XIntercept  0.456115   0.060942  7.4844 1.577e-13 ***
## XX1         2.025214   0.033696 60.1029 < 2.2e-16 ***
## XX2        -0.973262   0.110794 -8.7845 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1