Set seed for reproducibility
Load packages we will need in this tutorial. Make sure you have them
installed. If not, you can install them using
install.packages("package_name").
library(here)
library(sandwich)
library(lmtest)
To avoid clutter, I have coded a couple of functions elsewhwere that we can outsource and use here right away. These functions will help us generate data for the Monte Carlo simulations, and plot the results of our simulations.
source(here("scripts/gen_data.R"))
source(here("scripts/plots_estimates.R"))
The goal of this tutorial is to illustrate how to perform heteroskedasticity-robust inference in the context of the linear regression model. I will pay special attention to the different approaches to compute heteroskedasticity-robust standard errors as seen in class, and how they affect the inference on the OLS estimates.
Let us generate data according to model 2 (see the
gen_data function for details). We will then estimate the
OLS model using the generated data.
data <- gen_data(n = 1000, dgp = "model 2")
Y <- data$Y
X <- data$X
Base R includes an intercept by default. We need to remove it from
our model because gen_data.R returns a matrix with an
intercept.
fitModel <- lm(Y ~ X - 1)
summary(fitModel)
##
## Call:
## lm(formula = Y ~ X - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8948 -0.6204 0.0040 0.6261 3.4329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.45611 0.06257 7.290 6.3e-13 ***
## XX1 2.02521 0.03045 66.513 < 2e-16 ***
## XX2 -0.97326 0.10958 -8.882 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9896 on 997 degrees of freedom
## Multiple R-squared: 0.8191, Adjusted R-squared: 0.8186
## F-statistic: 1505 on 3 and 997 DF, p-value: < 2.2e-16
Alternatively, we could have specified the model using the matrix
X directly, which is useful when we want to include only
specific columns of X in our model.
fitModel_b <- lm(data$Y ~ data$X[,2:3])
summary(fitModel_b)
##
## Call:
## lm(formula = data$Y ~ data$X[, 2:3])
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8948 -0.6204 0.0040 0.6261 3.4329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.45611 0.06257 7.290 6.3e-13 ***
## data$X[, 2:3]X1 2.02521 0.03045 66.513 < 2e-16 ***
## data$X[, 2:3]X2 -0.97326 0.10958 -8.882 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9896 on 997 degrees of freedom
## Multiple R-squared: 0.8189, Adjusted R-squared: 0.8186
## F-statistic: 2254 on 2 and 997 DF, p-value: < 2.2e-16
To adjust for heteroskedasticity, you can use the
sandwich package to obtain robust standard errors.
hc0_se <- sqrt(diag(vcovHC(fitModel, type = "HC0")))
hc1_se <- sqrt(diag(vcovHC(fitModel, type = "HC1")))
hc2_se <- sqrt(diag(vcovHC(fitModel, type = "HC2")))
hc3_se <- sqrt(diag(vcovHC(fitModel, type = "HC3")))
Display the robust standard errors
robust_se <- data.frame(HC0 = hc0_se, HC1 = hc1_se, HC2 = hc2_se, HC3 = hc3_se)
print(robust_se)
## HC0 HC1 HC2 HC3
## XIntercept 0.06072025 0.06081153 0.06083106 0.06094216
## XX1 0.03349143 0.03354178 0.03359335 0.03369580
## XX2 0.11033993 0.11050581 0.11056648 0.11079373
The previous formulas provide the estimates using the estimators HC0, HC1, HC2, and HC3, we covered in class. For example, if want to derive the HC0 estimator by hand:
residuals <- Y - X %*% betaHat
residuals <- as.vector(residuals)
vHat0 <- solve(t(X) %*% X) %*% t(X) %*% diag(residuals^2) %*% X %*% solve(t(X) %*% X)
robust_se$HC0hand <- sqrt(diag(vHat0))
Display the robust standard errors
print(robust_se)
## HC0 HC1 HC2 HC3 HC0hand
## XIntercept 0.06072025 0.06081153 0.06083106 0.06094216 0.06074967
## XX1 0.03349143 0.03354178 0.03359335 0.03369580 0.03361936
## XX2 0.11033993 0.11050581 0.11056648 0.11079373 0.11068945
To incorporate heteroskedasticity-consistent (HC) estimators when
reporting standard errors in the summary() output, you can
use the coeftest() function from the lmtest package along
with the vcovHC() function from the sandwich
package. This approach will allow you to display the model coefficients
with robust standard errors.
HC0 type
coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060720 7.5117 1.294e-13 ***
## XX1 2.025214 0.033491 60.4696 < 2.2e-16 ***
## XX2 -0.973262 0.110340 -8.8206 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
HC1 type
coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060812 7.5005 1.404e-13 ***
## XX1 2.025214 0.033542 60.3788 < 2.2e-16 ***
## XX2 -0.973262 0.110506 -8.8073 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
HC2 type
coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC2"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060831 7.4981 1.429e-13 ***
## XX1 2.025214 0.033593 60.2862 < 2.2e-16 ***
## XX2 -0.973262 0.110566 -8.8025 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
HC3 type
coeftest(fitModel, vcov = vcovHC(fitModel, type = "HC3"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060942 7.4844 1.577e-13 ***
## XX1 2.025214 0.033696 60.1029 < 2.2e-16 ***
## XX2 -0.973262 0.110794 -8.7845 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Summarize the model with different HC estimators using the
summarize_with_robust_se() function we created in the
scripts/plots_estimates.R file. This function will display
the coefficients, standard errors, t-values, and p-values for each HC
type.
summarize_with_robust_se(fitModel)
##
## HC0 robust standard errors:
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060720 7.5117 1.294e-13 ***
## XX1 2.025214 0.033491 60.4696 < 2.2e-16 ***
## XX2 -0.973262 0.110340 -8.8206 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## HC1 robust standard errors:
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060812 7.5005 1.404e-13 ***
## XX1 2.025214 0.033542 60.3788 < 2.2e-16 ***
## XX2 -0.973262 0.110506 -8.8073 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## HC2 robust standard errors:
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060831 7.4981 1.429e-13 ***
## XX1 2.025214 0.033593 60.2862 < 2.2e-16 ***
## XX2 -0.973262 0.110566 -8.8025 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## HC3 robust standard errors:
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## XIntercept 0.456115 0.060942 7.4844 1.577e-13 ***
## XX1 2.025214 0.033696 60.1029 < 2.2e-16 ***
## XX2 -0.973262 0.110794 -8.7845 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1