1 Preliminaries

Set seed for reproducibility

Load packages we will need in this tutorial. Make sure you have them installed. If not, you can install them using install.packages("package_name").

library(here)
library(Matrix)
library(ggplot2)
library(dplyr)

To avoid clutter, I have coded a couple of functions elsewhwere that we can outsource and use here right away. These functions will help us generate data for the Monte Carlo simulations, and plot the results of our simulations.

source(here("scripts/gen_data.R"))
source(here("scripts/plots_estimates.R"))

1.1 Introduction

The goal of this tutorial is to illustrate the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) in the context of the linear regression model with heteroskedastic errors. To this end, I will simulate data from a linear regression model and show how the OLS estimates behave as the sample size increases.

1.2 Monte Carlo Design

Let us set the parameters of the data generating process (DGP) we will use for our simulations. In particular, we will generate data according to model 2 (see the gen_data function for details).

sample_sizes <- c(50, 100, 1000) # Sample sizes for the simulations
num_simulations <- 5000 # Number of simulations to run
simulate_ols <- matrix(NA, nrow = length(sample_sizes), ncol = num_simulations) # Initialize a matrix to store OLS estimates

for (t in seq_along(sample_sizes)) {
  for (s in 1:num_simulations) {
    data <- gen_data(n = sample_sizes[t], dgp = "model 2")
    Y <- data$Y
    X <- data$X
    betaHat <- solve(t(X) %*% X) %*% t(X) %*% Y # Compute OLS estimates using matrix algebra
    simulate_ols[t, s] <- betaHat[2]
  }
}

For each simulation draw, betaHat is a 3-dimensional vector containing the \(\hat{\beta}\) estimates associated to a constant and two covariates. The simulate_ols matrix stores the estimated slope coefficient of covariate \(X_2\) across simulations.

1.3 Law of Large Numbers (LLN)

Under mild assumptions we have seen in class, we expect that as the sample size increases, \(\hat{\beta}\) concentrates around the true parameter \(\beta\). Below is a visualization of the estimated slope coefficient across sample sizes.

my_plots(sample_sizes, simulate_ols)

plot_asympt(sample_sizes, simulate_ols)

1.4 Central Limit Theorem (CLT)

We have also seen that under a finite second moment assumption (plus some regularity conditions), \(\sqrt{n}(\hat{\beta}-\beta)\) converges in distribution to a normal random vector with mean 0 and variance \(\Omega\).

simulate_ols_stand <- matrix(NA, nrow = length(sample_sizes), ncol = num_simulations)
for(t in seq_along(sample_sizes)) {
  simulate_ols_stand[t, ] <- (simulate_ols[t, ] - 2) * sqrt(sample_sizes[t])
}

plot_asympt(sample_sizes, simulate_ols_stand)

Large Sample Behavior in the Linear Regression Model

Mauricio Olivares

1 Preliminaries

1.1 Introduction

1.2 Monte Carlo Design

1.3 Law of Large Numbers (LLN)

1.4 Central Limit Theorem (CLT)