Set seed for reproducibility
Load packages we will need in this tutorial. Make sure you have them
installed. If not, you can install them using
install.packages("package_name").
library(here)
library(Matrix)
library(ggplot2)
library(dplyr)
To avoid clutter, I have coded a couple of functions elsewhwere that we can outsource and use here right away. These functions will help us generate data for the Monte Carlo simulations, and plot the results of our simulations.
source(here("scripts/gen_data.R"))
source(here("scripts/plots_estimates.R"))
The goal of this tutorial is to illustrate the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) in the context of the linear regression model with heteroskedastic errors. To this end, I will simulate data from a linear regression model and show how the OLS estimates behave as the sample size increases.
Let us set the parameters of the data generating process (DGP) we
will use for our simulations. In particular, we will generate data
according to model 2 (see the gen_data function for
details).
sample_sizes <- c(50, 100, 1000) # Sample sizes for the simulations
num_simulations <- 5000 # Number of simulations to run
simulate_ols <- matrix(NA, nrow = length(sample_sizes), ncol = num_simulations) # Initialize a matrix to store OLS estimates
for (t in seq_along(sample_sizes)) {
for (s in 1:num_simulations) {
data <- gen_data(n = sample_sizes[t], dgp = "model 2")
Y <- data$Y
X <- data$X
betaHat <- solve(t(X) %*% X) %*% t(X) %*% Y # Compute OLS estimates using matrix algebra
simulate_ols[t, s] <- betaHat[2]
}
}
For each simulation draw, betaHat is a 3-dimensional
vector containing the \(\hat{\beta}\)
estimates associated to a constant and two covariates. The
simulate_ols matrix stores the estimated slope coefficient
of covariate \(X_2\) across
simulations.
Under mild assumptions we have seen in class, we expect that as the sample size increases, \(\hat{\beta}\) concentrates around the true parameter \(\beta\). Below is a visualization of the estimated slope coefficient across sample sizes.
my_plots(sample_sizes, simulate_ols)
plot_asympt(sample_sizes, simulate_ols)
We have also seen that under a finite second moment assumption (plus some regularity conditions), \(\sqrt{n}(\hat{\beta}-\beta)\) converges in distribution to a normal random vector with mean 0 and variance \(\Omega\).
simulate_ols_stand <- matrix(NA, nrow = length(sample_sizes), ncol = num_simulations)
for(t in seq_along(sample_sizes)) {
simulate_ols_stand[t, ] <- (simulate_ols[t, ] - 2) * sqrt(sample_sizes[t])
}
plot_asympt(sample_sizes, simulate_ols_stand)