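# read.dta() comes from the foreign package; the pipe and data verbs come from dplyr
library(foreign)
library(dplyr)

raw_data <- read.dta("data/Butler_Broockman_AJPS_2011_public.dta",
                     convert.factors = FALSE)

data <- raw_data %>%
  # keep only emails that did not ask about primary elections (no partisan signal)
  filter(treat_noprimary == 1) %>%
  select(reply_atall, treat_deshawn, leg_party, leg_white) %>%
  mutate(
    reply_atall   = as.numeric(reply_atall),
    treat_deshawn = as.numeric(treat_deshawn),
    leg_party     = as.factor(leg_party),
    leg_white     = as.factor(leg_white)
  ) %>%
  # drop rows with a missing value in any analysis variable
  filter(!is.na(reply_atall) & !is.na(treat_deshawn)
         & !is.na(leg_party) & !is.na(leg_white))

2 Covariate Adjustment in RCTs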
2.1 Motivation
Even in a perfectly randomized experiment, we observe a single realization of the assignment mechanism. This means that — by chance — the treated and control groups may be slightly imbalanced on pre-treatment characteristics (covariates).
Key insight: Including pre-treatment covariates in the regression does not change the consistency of the estimator (randomization already takes care of that), but it can substantially reduce variance, yielding tighter confidence intervals.
Covariate adjustment is most valuable when the included covariates are strong predictors of the outcome. If covariates explain little variation in the outcome, there is little precision gain.
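To make the precision argument concrete, here is a small simulation sketch. It does not use the Butler and Broockman data: the sample size, the true effect of 0.5, and the covariate coefficient of 2 are all assumptions chosen for illustration. It repeats a randomized experiment many times and compares the spread of the unadjusted and adjusted estimates.

```r
set.seed(42)

# Simulate one randomized experiment and return both treatment-effect estimates.
# n, tau and rho are illustrative assumptions, not values from the paper.
simulate_once <- function(n = 500, tau = 0.5, rho = 2) {
  x <- rnorm(n)                      # pre-treatment covariate
  d <- rbinom(n, 1, 0.5)             # randomized treatment assignment
  y <- tau * d + rho * x + rnorm(n)  # outcome depends strongly on x
  c(
    unadjusted = unname(coef(lm(y ~ d))["d"]),
    adjusted   = unname(coef(lm(y ~ d + x))["d"])
  )
}

# 2 x 1000 matrix: one column per simulated experiment
estimates <- replicate(1000, simulate_once())

sim_means <- rowMeans(estimates)      # both should be close to tau
sim_sds   <- apply(estimates, 1, sd)  # sampling variability of each estimator

print(round(sim_means, 3))
print(round(sim_sds, 3))
```

Both estimators are centered on the true effect, but the adjusted one varies much less across replications because the covariate absorbs most of the outcome variance. Lowering rho toward zero shrinks the gain, which is the point made above about weak predictors.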
2.2 The Butler & Broockman (2011) Experiment
Butler and Broockman (2011) conducted a large-scale field experiment to study racial discrimination in political representation. They sent emails to state legislators across the United States. The emails were identical in content but were signed with names that are distinctly associated with either Black or white Americans.
- Treatment (treat_deshawn = 1): Email signed with a Black-sounding name (e.g., DeShawn Jackson)
- Control (treat_deshawn = 0): Email signed with a white-sounding name (e.g., Jake Mueller)
- Outcome (reply_atall): Binary indicator: did the legislator reply at all?
The key question: Does the racial signal in the name affect the probability of receiving a reply?
2.3 Data Carpentry
We restrict the sample to emails whose fictitious senders did not signal a partisan preference, i.e., did not ask for help registering for future primary elections (treat_noprimary == 1), and select the relevant variables.
Let’s take a quick look at the data:
head(data) %>%
  kbl(caption = "First rows of the analysis dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)

| | reply_atall | treat_deshawn | leg_party | leg_white |
|---|---|---|---|---|
| 3 | 0 | 1 | R | 1 |
| 4 | 0 | 1 | R | 1 |
| 5 | 0 | 0 | D | 1 |
| 12 | 1 | 0 | R | 1 |
| 16 | 1 | 1 | R | 1 |
| 27 | 0 | 0 | D | 1 |
A brief summary of the outcome and treatment variables:
data %>%
  group_by(treat_deshawn) %>%
  summarise(
    n = n(),
    mean_reply = mean(reply_atall),
    sd_reply = sd(reply_atall)
  ) %>%
  mutate(treat_deshawn = ifelse(treat_deshawn == 1, "Black name", "White name")) %>%
  rename(`Treatment group` = treat_deshawn,
         `N` = n,
         `Mean reply rate` = mean_reply,
         `SD reply rate` = sd_reply) %>%
  kbl(digits = 3,
      caption = "Reply rates by treatment group") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE)

| Treatment group | N | Mean reply rate | SD reply rate |
|---|---|---|---|
| White name | 812 | 0.605 | 0.489 |
| Black name | 806 | 0.553 | 0.497 |
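The summary table already contains enough information for a rough estimate of the effect and its uncertainty. The sketch below plugs the rounded table values into the usual two-sample formula with unequal variances. The variable names are ours, and because the inputs are rounded to three digits the result is only approximate.

```r
# Rounded values copied from the summary table above
n_white <- 812; mean_white <- 0.605; sd_white <- 0.489
n_black <- 806; mean_black <- 0.553; sd_black <- 0.497

# Difference in reply rates (Black name minus white name)
diff_means <- mean_black - mean_white

# Standard error of a difference between two independent sample means
se_diff <- sqrt(sd_white^2 / n_white + sd_black^2 / n_black)

cat("Difference in means:", round(diff_means, 3), "\n")
cat("Standard error:     ", round(se_diff, 4), "\n")
cat("t-statistic:        ", round(diff_means / se_diff, 2), "\n")
```

This gives a difference of about -0.052 with a standard error of about 0.0245, so the gap in reply rates is roughly twice its standard error.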
2.4 Estimation
2.4.1 Without Covariate Adjustment
The simplest approach is to regress the outcome on the treatment indicator alone:
\[ \texttt{reply\_atall}_i = \alpha + \beta \cdot \texttt{treat\_deshawn}_i + \varepsilon_i \]
The OLS estimate of \(\beta\) estimates the Average Partial Effect (APE), which in a randomized experiment coincides with the Average Treatment Effect (ATE).
reg_no_covariates <- lm(reply_atall ~ treat_deshawn, data = data)

Because the only regressor is a binary indicator, the OLS estimate \(\hat{\beta}\) is numerically identical to the difference in sample means between the treated and control groups:
\[\hat{\beta}_{\text{OLS}} = \bar{Y}_1 - \bar{Y}_0\]
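A short derivation shows where this identity comes from. The shorthand is introduced here and is not part of the original notation: \(D_i\) stands for treat_deshawn, \(Y_i\) for reply_atall, \(\bar{Y}\) for the overall sample mean, and \(\hat{p} = n_1 / n\) for the share of treated observations.

\[
\begin{aligned}
\hat{\beta}_{\text{OLS}}
  &= \frac{\frac{1}{n}\sum_i (D_i - \hat{p})(Y_i - \bar{Y})}{\frac{1}{n}\sum_i (D_i - \hat{p})^2}, \\
\frac{1}{n}\sum_i (D_i - \hat{p})^2
  &= \hat{p}(1 - \hat{p}), \\
\frac{1}{n}\sum_i (D_i - \hat{p})(Y_i - \bar{Y})
  &= \hat{p}\,\bar{Y}_1 - \hat{p}\left[\hat{p}\,\bar{Y}_1 + (1 - \hat{p})\,\bar{Y}_0\right]
   = \hat{p}(1 - \hat{p})\left(\bar{Y}_1 - \bar{Y}_0\right).
\end{aligned}
\]

Dividing the last line by \(\hat{p}(1 - \hat{p})\) gives \(\bar{Y}_1 - \bar{Y}_0\). The identity is purely algebraic; randomization is what lets us interpret the difference causally.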
Let’s verify this:
ape <- mean(data$reply_atall[data$treat_deshawn == 1]) -
  mean(data$reply_atall[data$treat_deshawn == 0])
cat("Simple difference in means:", round(ape, 6), "\n")
cat("OLS coefficient: ", round(coef(reg_no_covariates)[2], 6), "\n")

Simple difference in means: -0.05133
OLS coefficient: -0.05133
They are identical — as expected.
2.4.2 With Covariate Adjustment
Now we add two pre-treatment covariates:
- leg_party: the legislator’s party affiliation
- leg_white: whether the legislator is white
\[ \texttt{reply\_atall}_i = \alpha + \beta \cdot \texttt{treat\_deshawn}_i + \gamma_1 \cdot \texttt{leg\_party}_i + \gamma_2 \cdot \texttt{leg\_white}_i + \varepsilon_i \]
reg_covariates <- lm(reply_atall ~ treat_deshawn + leg_party + leg_white,
                     data = data)

2.5 Results
The table below puts both specifications side by side for comparison.
modelsummary(
  list(
    "No Adjustment" = reg_no_covariates,
    "With Covariates" = reg_covariates
  ),
  # keys must match the coefficient names produced by lm()
  coef_rename = c(
    treat_deshawn = "Black name (treatment)",
    leg_partyR = "Republican",
    leg_white1 = "White legislator"
  ),
  stars = c("*" = .1, "**" = .05, "***" = .01),
  gof_map = c("nobs", "r.squared", "adj.r.squared"),
  title = "Effect of Sender's Race on Legislator Reply Rate",
  notes = "Outcome: binary indicator for whether the legislator replied at all."
)

| | No Adjustment | With Covariates |
|---|---|---|
| (Intercept) | 0.605*** | 0.421*** |
| | (0.017) | (0.037) |
| Black name (treatment) | -0.051** | -0.048** |
| | (0.025) | (0.024) |
| Republican | | 0.060** |
| | | (0.025) |
| White legislator | | 0.178*** |
| | | (0.038) |
| Num.Obs. | 1618 | 1618 |
| R2 | 0.003 | 0.024 |
| R2 Adj. | 0.002 | 0.022 |

Note: * p < 0.1, ** p < 0.05, *** p < 0.01. Standard errors in parentheses. Outcome: binary indicator for whether the legislator replied at all.
2.6 Interpreting the Results
- The point estimate barely changes when we add covariates (from -0.051 to -0.048). This is expected in a well-randomized experiment, and it shows that the treatment effect estimate is robust.
- Precision improves only slightly: the standard error falls from 0.025 to 0.024. Legislator characteristics (leg_party, leg_white) do predict reply behavior, but they explain little of its variation (R2 of 0.024), so the gain is modest.
- The treatment effect is negative: legislators are about 5 percentage points less likely to reply to emails signed with a Black-sounding name. This is evidence of racial discrimination in political representation.
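How large a precision gain should we expect from these covariates? A back-of-the-envelope check uses the R2 values from the results table. It rests on two simplifying assumptions that are ours rather than part of the original analysis: homoskedastic errors, and a treatment indicator that is uncorrelated with the covariates. Under those assumptions the standard error of the treatment coefficient scales with the square root of the unexplained share of variance.

```r
# R-squared values reported in the results table
r2_unadjusted <- 0.003
r2_adjusted   <- 0.024

# Residual variance is proportional to (1 - R2), so the standard error
# is proportional to sqrt(1 - R2) under the stated assumptions.
se_ratio <- sqrt((1 - r2_adjusted) / (1 - r2_unadjusted))

cat("Predicted SE ratio (adjusted / unadjusted):", round(se_ratio, 3), "\n")
cat("Predicted reduction in SE (percent):       ", round(100 * (1 - se_ratio), 1), "\n")
```

The predicted reduction is roughly 1 percent, which matches the small change in the reported standard errors. Covariates that explained a large share of the variation in reply rates would move this ratio much further below one.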
Lin (2013) shows that a better covariate adjustment strategy is to interact the treatment indicator with mean-centered covariates. This estimator is asymptotically at least as efficient as both the unadjusted difference in means and simple additive covariate adjustment. In practice, when the treatment and control groups are of similar size, as they are here, the difference from additive adjustment is typically small, but it is worth knowing about.
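The following sketch illustrates the interacted specification on simulated data. Everything in it is a hypothetical stand-in rather than the Butler and Broockman sample: the covariate x, the 30 percent treatment share, and the true average effect of 0.5 are assumptions chosen so that centering visibly matters. Lin pairs this estimator with heteroskedasticity-robust standard errors, which the sketch omits to stay in base R.

```r
set.seed(123)

# Simulated stand-in data (hypothetical, for illustration only)
n <- 2000
x <- rnorm(n, mean = 2)   # pre-treatment covariate with a nonzero mean
d <- rbinom(n, 1, 0.3)    # randomized treatment with unequal group sizes
# The effect varies with x; averaged over x, the true ATE is 0.5
y <- 1 + 0.5 * d + x + 1.5 * d * (x - 2) + rnorm(n)
sim <- data.frame(y, d, x)

# Center the covariate at its sample mean, then interact it with treatment
sim$x_c <- sim$x - mean(sim$x)
fit_additive   <- lm(y ~ d + x, data = sim)
fit_interacted <- lm(y ~ d * x_c, data = sim)

# Without centering, the coefficient on d is the effect at x = 0, not the ATE
fit_uncentered <- lm(y ~ d * x, data = sim)
ate_from_uncentered <- coef(fit_uncentered)["d"] +
  coef(fit_uncentered)["d:x"] * mean(sim$x)

cat("Additive adjustment:         ", round(coef(fit_additive)["d"], 3), "\n")
cat("Interacted, centered:        ", round(coef(fit_interacted)["d"], 3), "\n")
cat("Interacted, uncentered:      ", round(coef(fit_uncentered)["d"], 3), "\n")
cat("Uncentered, averaged over x: ", round(ate_from_uncentered, 3), "\n")
```

Centering is what makes the coefficient on d interpretable as the average effect: the centered coefficient equals the uncentered fit evaluated at the sample mean of x, while the raw uncentered coefficient is far from 0.5. The estimatr package provides lm_lin() as a packaged version of this specification with robust standard errors.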