Potential Outcomes: From Theory to Code

Applied Causal Analysis — Coding Exercise

Mauricio Olivares

LMU Munich · Chair of Statistics and Econometrics

2026-04-20

Part I: The Fundamental Problem

What a researcher wishes they could see

Recall from the lecture: for each unit \(i\), we have two potential outcomes

\[Y_i = D_i \cdot Y_i(1) + (1 - D_i) \cdot Y_i(0)\]

but we only ever observe one of \(Y_i(0)\) or \(Y_i(1)\).

Let us make this concrete with a small example.

Generating the Full Schedule of Potential Outcomes

Think of the health care example. We observe \(n = 10\) patients.

n_small <- 10

# Individual treatment effects: β_i ~ N(-0.5, 1)
# Negative on average: hospitalization helps (lower score = better health)
beta <- rnorm(n_small, mean = -0.5, sd = 1)

# Y(0): health score if NOT hospitalised (higher = worse health)
Y0 <- rnorm(n_small, mean = 3, sd = 0.5)

# Y(1): health score if hospitalised
Y1 <- Y0 + beta

Note

\(\beta_i = Y_i(1) - Y_i(0)\) is the individual treatment effect — the quantity we can never directly observe.

The Full Potential Outcomes Table

This is what a god-like researcher would see — both columns at once:

\(i\) \(Y_i(0)\) \(Y_i(1)\) \(\beta_i\)
1 2.05 0.61 -1.44
2 3.46 2.92 -0.54
3 2.78 3.11 0.33
4 3.07 2.14 -0.94
5 2.29 1.47 -0.81
6 2.45 -0.18 -2.63
7 3.64 5.65 2.01
8 3.64 2.02 -1.63
9 2.35 2.02 -0.33
10 2.89 2.96 0.08

Now Assign Treatment — and Watch Data Disappear

# Random assignment: flip a fair coin
D <- rbinom(n_small, 1, 0.5)

# Observed outcome: only ONE potential outcome is revealed
Y_obs <- ifelse(D == 1, Y1, Y0)
\(i\) \(D_i\) \(Y_i(0)\) \(Y_i(1)\) \(Y_i^{obs}\)
1 0 2.05 NA 2.05
2 1 NA 2.92 2.92
3 0 2.78 NA 2.78
4 1 NA 2.14 2.14
5 0 2.29 NA 2.29
6 0 2.45 NA 2.45
7 0 3.64 NA 3.64
8 0 3.64 NA 3.64
9 1 NA 2.02 2.02
10 1 NA 2.96 2.96

Every ? is a missing counterfactual — this is what makes causal inference a missing data problem.

The True ATE — and Why We Can’t Compute It

# True ATE: we know this only because we simulated the data
ATE_true <- mean(beta)
round(ATE_true, 3)
[1] -0.591
# What a researcher actually computes from observed data
APE <- mean(Y_obs[D == 1]) - mean(Y_obs[D == 0])
round(APE, 3)
[1] -0.3

Warning

With only \(n = 10\) units and random assignment, these two numbers may already differ noticeably. Why? Small samples, not bias. Let us scale up.

Part II: Selection Bias

Scaling Up — and Introducing Selection

We now use \(n = 1{,}000\) units and introduce selection on levels: sicker patients (higher \(Y_i(0)\)) are more likely to seek hospitalisation.

n <- 1000

# Individual treatment effects
beta  <- rnorm(n, mean = -0.5, sd = 1)

# Y(0): baseline health (higher = worse)
Y0 <- rnorm(n, mean = 3, sd = 1)

# Y(1): health if hospitalised
Y1 <- Y0 + beta

A Biased Assignment Mechanism

# Selection on levels: probability of treatment increases with Y(0)
# i.e., sicker people are more likely to go to hospital
p_treat <- plogis(-2 + 1.5 * Y0)   # logistic link
D_sel   <- rbinom(n, 1, p_treat)

# Observed outcomes
Y_sel <- ifelse(D_sel == 1, Y1, Y0)
Fraction treated: 0.85 
Mean Y(0) among treated:   3.148 
Mean Y(0) among untreated: 1.966 

Important

The treated have a higher baseline \(Y_i(0)\) — they are sicker to begin with. This is exactly selection on levels.

The Bias Decomposition — in Code

Recall from the lecture:

\[\underbrace{\mathbb{E}[Y_i|D_i=1] - \mathbb{E}[Y_i|D_i=0]}_{\text{APE (observed)}} = \underbrace{\mathbb{E}[Y_i(1)-Y_i(0)|D_i=1]}_{\text{ATT}} + \underbrace{\mathbb{E}[Y_i(0)|D_i=1] - \mathbb{E}[Y_i(0)|D_i=0]}_{\text{selection bias}}\]

APE_sel <- mean(Y_sel[D_sel == 1]) - mean(Y_sel[D_sel == 0])
ATT_sel <- mean(Y1[D_sel == 1])    - mean(Y0[D_sel == 1])
SB_sel  <- mean(Y0[D_sel == 1])    - mean(Y0[D_sel == 0])
ATE_true <- mean(beta)

The Numbers

Quantity Value
APE (observed) 0.678
ATT -0.504
Selection Bias 1.182
ATT + Selection Bias 0.678
True ATE -0.501

Note

ATT + Selection Bias = APE ✓ — the decomposition holds exactly.

The selection bias is positive: hospitalized patients are sicker to begin with, inflating the observed health gap and masking the true (negative) treatment effect.

Visualising the Selection Problem

The two distributions of \(Y_i(0)\) are shifted — the treated were sicker before treatment.

Part III: Random Assignment as the Fix

Randomly Assigning Treatment

# Fair coin flip — independent of Y(0) and Y(1)
D_rand <- rbinom(n, 1, 0.5)

# Observed outcomes
Y_rand <- ifelse(D_rand == 1, Y1, Y0)
Mean Y(0) among treated:   2.969 
Mean Y(0) among untreated: 2.97 

Tip

Under random assignment, the two groups have similar baseline \(Y_i(0)\). Selection bias vanishes.

The Decomposition Under Random Assignment

APE_rand <- mean(Y_rand[D_rand == 1]) - mean(Y_rand[D_rand == 0])
ATT_rand <- mean(Y1[D_rand == 1])     - mean(Y0[D_rand == 1])
SB_rand  <- mean(Y0[D_rand == 1])     - mean(Y0[D_rand == 0])
ATU_rand <- mean(Y1[D_rand == 0])     - mean(Y0[D_rand == 0])
Quantity Selection on Levels Random Assignment
APE (observed) 0.678 -0.482
ATT -0.504 -0.481
ATU NA -0.520
True ATE -0.501 -0.501
Selection Bias 1.182 -0.001
ATT + Selection Bias 0.678 -0.482

Side-by-Side: What Randomisation Fixes

The distributions of \(Y_i(0)\) now overlap almost perfectly. Selection bias \(\approx 0\).

Randomisation Works In Expectation

Does it always work in a single draw? Let us check across many replications:

R       <- 2000   # replications
APE_sim <- numeric(R)

for (r in 1:R) {
  D_r       <- rbinom(n, 1, 0.5)
  Y_r       <- ifelse(D_r == 1, Y1, Y0)
  APE_sim[r] <- mean(Y_r[D_r == 1]) - mean(Y_r[D_r == 0])
}
Mean of APE across 2000 replications: -0.5007 
True ATE:                               -0.5012 

Sampling Distribution of the APE

The APE is centred on the true ATE. In any single study there is sampling variability — but the estimator is unbiased.

Part IV: Selection on Gains

A Different Kind of Selection

In-class exercise (Lecture 3, Slide 19)

Show that in general \(\text{ATE} \neq \text{ATT}\) due to selection on gains, but that \(\text{ATE} = \text{ATT}\) under random assignment.

Selection on gains means individuals sort into treatment based on how much they benefit:

\[\mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 1] \neq \mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 0]\]

Generating Selection on Gains

# Those who benefit MORE (more negative beta) are more likely to seek treatment
p_gains <- plogis(2 - 3 * beta)    # treatment prob increases as β_i decreases
D_gains <- rbinom(n, 1, p_gains)
Y_gains <- ifelse(D_gains == 1, Y1, Y0)
Mean β_i among treated:   -0.761 
Mean β_i among untreated: 0.722 
True ATE:                 -0.501 

The treated have more negative \(\beta_i\) on average — they are self-selecting because they gain more from treatment.

ATE vs ATT vs ATU Under Selection on Gains

ATE_gains <- mean(beta)
ATT_gains <- mean(beta[D_gains == 1])
ATU_gains <- mean(beta[D_gains == 0])
APE_gains <- mean(Y_gains[D_gains == 1]) - mean(Y_gains[D_gains == 0])
SB_gains  <- mean(Y0[D_gains == 1]) - mean(Y0[D_gains == 0])
Quantity Value
True ATE -0.501
ATT -0.761
ATU 0.722
APE (observed) -0.710
Selection Bias (levels) 0.051

\(\text{ATE} \neq \text{ATT} \neq \text{ATU}\) — each answers a different causal question.

Random Assignment Restores ATE = ATT

ATT_r <- mean(beta[D_rand == 1])
ATU_r <- mean(beta[D_rand == 0])
ATE_r <- mean(beta)
Quantity Selection on Gains Random Assignment
ATE -0.501 -0.501
ATT -0.761 -0.481
ATU 0.722 -0.520

Under random assignment, \(D_i \perp (Y_i(0), Y_i(1))\), so who ends up treated is unrelated to their gains — and ATE \(\approx\) ATT \(\approx\) ATU. ✓

Part V: QTE Identification

Beyond Means — Quantile Treatment Effects

Recall from Lecture 3: under random assignment,

\[\underbrace{F_1^{-1}(\tau) - F_0^{-1}(\tau)}_{\text{QTE}_\tau,\ \text{unobserved}} = \underbrace{F^{-1}(\tau \mid D_i=1) - F^{-1}(\tau \mid D_i=0)}_{\text{observed}}\]

Let us verify this numerically.

Computing QTEs

quantiles <- seq(0.1, 0.9, by = 0.05)

# Unobserved QTE: quantiles of potential outcome distributions
qte_true <- quantile(Y1, quantiles) - quantile(Y0, quantiles)

# Observed under random assignment
qte_obs  <- quantile(Y_rand[D_rand == 1], quantiles) -
            quantile(Y_rand[D_rand == 0], quantiles)

# Observed under selection on levels (biased)
qte_sel  <- quantile(Y_sel[D_sel == 1], quantiles) -
            quantile(Y_sel[D_sel == 0], quantiles)

QTE: Random Assignment Identifies the Truth

The observed QTE under random assignment (blue) tracks the true QTE closely. Under selection on levels (red) it is systematically distorted.

Summary

What We Did — and Why It Matters

  1. The fundamental problem — with \(n=10\) we saw the missing counterfactual table directly: every ? is what makes causal inference hard

  2. Selection on levels — sicker patients self-select into hospitals; the bias decomposition \(\text{APE} = \text{ATT} + \text{Selection Bias}\) holds exactly in the data

  3. Random assignment — the fix: \(D_i \perp (Y_i(0), Y_i(1))\) kills selection bias, and the APE is unbiased for the ATE across replications

  4. Selection on gains — a subtler problem: ATE \(\neq\) ATT \(\neq\) ATU; random assignment equalises all three

  5. QTE identification — under random assignment, observed quantile differences recover the true QTE; under selection they do not

Looking Ahead

The next step: estimation and inference — how do we attach uncertainty to these estimates?

  • Standard errors for the difference-in-means
  • Permutation tests (no asymptotic approximation needed)
  • Bootstrap inference
  • Covariate adjustment for efficiency gains

Reading for next class: Lecture 3 slides, sections on covariate balance and the CATE.