2 Permutation Tests

2.1 Setup and Hypothesis

Consider a sample of $n = 8$ units (beer cups in our example). In class, we represented the two brands as 0 or 1, so each cup $i$ has a true type $W_i \in \{0,1\}$. The lady tasting beer reports her (educated) guess for each cup $i$, which is also binary: $Y_i \in \{0,1\}$. Suppose the observed reports $Y$ and the true types $W$ (unknown to the lady) are:

\[Y = (0,1,1,1,0,1,0,0), \qquad W = (1,1,1,1,0,0,0,0).\]

The null hypothesis is that $Y$ and $W$ are statistically independent:

\[H_0: Y \perp W\]

meaning that the lady cannot distinguish the two labels (her guesses are completely uninformative of the true types).

What can we randomize in this experiment?

While the vector $Y$ is treated as a random variable, once the reporting is made, we treat it as fixed. What is random and entirely up to the experiment design is the sequence of types $W$. This allows us to construct an exact null distribution by varying $W$ over all its possible realizations, with no asymptotic approximation required.
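The randomization described here can be sketched in a few lines of R. This is only an illustration under the assumption that the design fixes four cups of each brand; the name `W_random` and the seed are hypothetical choices, not part of the original notes.

```r
# Reports are held fixed once they have been made
Y <- c(0, 1, 1, 1, 0, 1, 0, 0)

# Draw one random assignment: shuffle a vector with four 1s and four 0s.
set.seed(1)  # arbitrary seed, used only for reproducibility
W_random <- sample(rep(c(1L, 0L), each = 4))

cat("Fixed reports Y:     ", Y, "\n")
cat("Random assignment W: ", W_random, "\n")
cat("Number of 1s in W:   ", sum(W_random), "\n")
```

Re-running the `sample()` call without the seed produces a different assignment each time, while `Y` never changes. That is the sense in which only $W$ is randomized.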

2.2 Test Statistic

A natural test statistic:

\[T_n(W, Y) \;=\; \sum_{i=1}^n W_i Y_i,\]

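which counts the cups of type 1 that the lady also reports as 1, that is, her correct guesses among the type-1 cups. Because both $W$ and $Y$ contain exactly four 1s here, this count determines the total number of agreements, which equals $2T_n$. Large values of $T_n$ therefore provide evidence against $H_0$.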

Observed value. With the data above:

\[T_n^{\mathrm{obs}} = W^\top Y = 1\cdot 0 + 1\cdot 1 + 1\cdot 1 + 1\cdot 1 + 0\cdot 0 + 0\cdot 1 + 0\cdot 0 + 0\cdot 0 = 3.\]

Code

Y <- c(0, 1, 1, 1, 0, 1, 0, 0)
W <- c(1, 1, 1, 1, 0, 0, 0, 0)
T_obs <- sum(W * Y)
cat("Observed test statistic: T_obs =", T_obs)

Observed test statistic: T_obs = 3

2.3 The Permutation Group $\mathbf{G}_n$

Under $H_0$, every vector $w \in \{0,1\}^8$ with exactly four 1s and four 0s was equally likely to have been the realized treatment assignment (compare this with the way we did it in class). Define:

\[\mathbf{G}_n \;=\; \bigl\{ w \in \{0,1\}^8 : \textstyle\sum_i w_i = 4 \bigr\}.\]

The cardinality of $\mathbf{G}_n$ is $M := \#\{\mathbf{G}_n\} = \binom{8}{4} = 70$.

Code

# Generate all elements of G_n: vectors of length 8 with exactly four 1s
G_n <- as.data.frame(t(combn(8, 4))) |>
  apply(1, function(idx) {
    w <- integer(8)
    w[idx] <- 1L
    w
  }) |>
  t()

cat("Number of elements in G_n:", nrow(G_n))

Number of elements in G_n: 70

2.3.1 All 70 permutation vectors

The table below displays the complete set $\mathbf{G}_n$. Each row is a distinct treatment assignment $w \in \mathbf{G}_n$, and the last column reports $T_n(w, Y) = \sum_i w_i Y_i$.

Code

Y <- c(0, 1, 1, 1, 0, 1, 0, 0)

# Compute T_n(w, Y) for each permutation
T_vals <- apply(G_n, 1, function(w) sum(w * Y))

# Build display table
perm_df <- as.data.frame(G_n)
colnames(perm_df) <- paste0("w", 1:8)
perm_df$T_n <- T_vals
perm_df$Permutation <- seq_len(nrow(perm_df))
perm_df <- perm_df[, c("Permutation", paste0("w", 1:8), "T_n")]

knitr::kable(
  perm_df,
  align = "c",
  caption = "All 70 elements of $\\mathbf{G}_n$ and the corresponding test statistic $T_n(w, Y)$."
)

All 70 elements of $\mathbf{G}_n$ and the corresponding test statistic $T_n(w, Y)$.
Permutation	w1	w2	w3	w4	w5	w6	w7	w8	T_n
1	1	1	1	1	0	0	0	0	3
2	1	1	1	0	1	0	0	0	2
3	1	1	1	0	0	1	0	0	3
4	1	1	1	0	0	0	1	0	2
5	1	1	1	0	0	0	0	1	2
6	1	1	0	1	1	0	0	0	2
7	1	1	0	1	0	1	0	0	3
8	1	1	0	1	0	0	1	0	2
9	1	1	0	1	0	0	0	1	2
10	1	1	0	0	1	1	0	0	2
11	1	1	0	0	1	0	1	0	1
12	1	1	0	0	1	0	0	1	1
13	1	1	0	0	0	1	1	0	2
14	1	1	0	0	0	1	0	1	2
15	1	1	0	0	0	0	1	1	1
16	1	0	1	1	1	0	0	0	2
17	1	0	1	1	0	1	0	0	3
18	1	0	1	1	0	0	1	0	2
19	1	0	1	1	0	0	0	1	2
20	1	0	1	0	1	1	0	0	2
21	1	0	1	0	1	0	1	0	1
22	1	0	1	0	1	0	0	1	1
23	1	0	1	0	0	1	1	0	2
24	1	0	1	0	0	1	0	1	2
25	1	0	1	0	0	0	1	1	1
26	1	0	0	1	1	1	0	0	2
27	1	0	0	1	1	0	1	0	1
28	1	0	0	1	1	0	0	1	1
29	1	0	0	1	0	1	1	0	2
30	1	0	0	1	0	1	0	1	2
31	1	0	0	1	0	0	1	1	1
32	1	0	0	0	1	1	1	0	1
33	1	0	0	0	1	1	0	1	1
34	1	0	0	0	1	0	1	1	0
35	1	0	0	0	0	1	1	1	1
36	0	1	1	1	1	0	0	0	3
37	0	1	1	1	0	1	0	0	4
38	0	1	1	1	0	0	1	0	3
39	0	1	1	1	0	0	0	1	3
40	0	1	1	0	1	1	0	0	3
41	0	1	1	0	1	0	1	0	2
42	0	1	1	0	1	0	0	1	2
43	0	1	1	0	0	1	1	0	3
44	0	1	1	0	0	1	0	1	3
45	0	1	1	0	0	0	1	1	2
46	0	1	0	1	1	1	0	0	3
47	0	1	0	1	1	0	1	0	2
48	0	1	0	1	1	0	0	1	2
49	0	1	0	1	0	1	1	0	3
50	0	1	0	1	0	1	0	1	3
51	0	1	0	1	0	0	1	1	2
52	0	1	0	0	1	1	1	0	2
53	0	1	0	0	1	1	0	1	2
54	0	1	0	0	1	0	1	1	1
55	0	1	0	0	0	1	1	1	2
56	0	0	1	1	1	1	0	0	3
57	0	0	1	1	1	0	1	0	2
58	0	0	1	1	1	0	0	1	2
59	0	0	1	1	0	1	1	0	3
60	0	0	1	1	0	1	0	1	3
61	0	0	1	1	0	0	1	1	2
62	0	0	1	0	1	1	1	0	2
63	0	0	1	0	1	1	0	1	2
64	0	0	1	0	1	0	1	1	1
65	0	0	1	0	0	1	1	1	2
66	0	0	0	1	1	1	1	0	2
67	0	0	0	1	1	1	0	1	2
68	0	0	0	1	1	0	1	1	1
69	0	0	0	1	0	1	1	1	2
70	0	0	0	0	1	1	1	1	1

2.4 Distribution of $T_n$

Under $H_0$ with a uniformly random assignment over $\mathbf{G}_n$, the test statistic $T_n$ follows a Hypergeometric distribution:

\[T_n \;\sim\; \mathrm{Hypergeometric}(n, k, m), \quad n = 8,\; k = 4,\; m = 4.\]

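Here $n$ is the number of cups, $k$ the number of cups with $W_i = 1$, and $m$ the number of cups with $Y_i = 1$. The distribution arises because $T_n = \sum_{i=1}^n W_i Y_i$ counts the overlap between a randomly drawn set of 4 treated indices and the fixed set of 4 units with $Y_i = 1$.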
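As a cross-check, the hypergeometric probabilities $\mathrm{P}[T_n = t] = \binom{4}{t}\binom{4}{4-t} / \binom{8}{4}$ can be compared with a brute-force count over all assignments. The sketch below is self-contained; the variable names (`n_units`, `n_treated`, `n_ones`, `T_all`) are illustrative. Note that R's `dhyper(x, m, n, k)` uses its own argument names, which do not line up with the $(n, k, m)$ notation of these notes.

```r
# Fixed reports from the notes
Y <- c(0, 1, 1, 1, 0, 1, 0, 0)
n_units   <- length(Y)   # 8 cups
n_treated <- 4           # number of cups with W_i = 1
n_ones    <- sum(Y)      # number of cups with Y_i = 1

# Brute force: T_n for every choice of 4 treated cups out of 8
T_all <- combn(n_units, n_treated, function(idx) sum(Y[idx]))
brute <- as.vector(table(factor(T_all, levels = 0:4))) / length(T_all)

# Closed form. In dhyper(x, m, n, k):
#   m = units with Y_i = 1, n = units with Y_i = 0, k = number of treated units
closed <- dhyper(0:4, m = n_ones, n = n_units - n_ones, k = n_treated)

print(rbind(t = 0:4, brute = round(brute, 4), closed = round(closed, 4)))
```

The two rows of probabilities should agree, which is what the hypergeometric claim asserts.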

Code

library(tidyverse)  # count(), mutate(), and ggplot() come from the tidyverse

dist_df <- data.frame(t = T_vals) |>
  count(t) |>
  mutate(
    prob      = n / sum(n),
    cum_prob  = cumsum(prob),
    label     = paste0(n, "/70")
  )

ggplot(dist_df, aes(x = factor(t), y = prob)) +
  geom_col(fill = "#185FA5", alpha = 0.85, width = 0.55) +
  geom_text(aes(label = label), vjust = -0.5, size = 3.5, color = "#333333") +
  geom_vline(xintercept = which(levels(factor(dist_df$t)) == as.character(T_obs)),
             linetype = "dashed", color = "#D85A30", linewidth = 0.8) +
  annotate("text", x = which(levels(factor(dist_df$t)) == as.character(T_obs)) + 0.3,
           y = max(dist_df$prob) * 0.9,
           label = expression(T[n]^obs == 3),
           color = "#D85A30", size = 4, hjust = 0) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = expansion(mult = c(0, 0.12))) +
  labs(
    x     = expression(T[n]),
    y     = "Probability",
    title = expression("Distribution of " * T[n] * " under " * H[0]),
    caption = "Hypergeometric(n=8, k=4, m=4). Dashed line marks the observed statistic."
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title   = element_text(face = "bold", size = 13),
    panel.grid.major.x = element_blank()
  )

2.4.1 Tabulated probabilities

Since we know the distribution of $T_n$, we can tabulate the probabilities.

Code

knitr::kable(
  dist_df |> select(t, n, prob, cum_prob) |>
    rename(`$t$` = t, `Count` = n, `$\\mathrm{P}[T_n = t]$` = prob,
           `$\\mathrm{P}[T_n \\leq t]$` = cum_prob) |>
    mutate(across(where(is.numeric), ~round(., 4))),
  align = "c",
  caption = "Distribution of $T_n$."
)

Distribution of $T_n$.
$t$	Count	$\mathrm{P}[T_n = t]$	$\mathrm{P}[T_n \leq t]$
0	1	0.0143	0.0143
1	16	0.2286	0.2429
2	36	0.5143	0.7571
3	16	0.2286	0.9857
4	1	0.0143	1.0000

With the distribution in hand, we can also calculate the critical value $\hat{c}_n(1-\alpha)$ for any fixed nominal level $\alpha\in(0,1)$: it is the smallest $t$ in the support of $T_n$ at which the cumulative distribution function (CDF) is at least $1 - \alpha$:

\[\hat{c}_n(1 - \alpha) = \inf\bigl\{ t \in \{0, \ldots, 4\} : \mathrm{P}[T_n \leq t] \geq 1 - \alpha \bigr\}.\]

For $\alpha = 0.05$, $\hat{c}_n(0.95)=3$. Since $T_n^{\mathrm{obs}} = 3 \not> 3$, we do not reject $H_0$ at level $\alpha = 0.05$.
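The CDF-based definition can be evaluated directly from the hypergeometric distribution. The following is a minimal sketch, assuming four cups with $Y_i = 1$ and four treated cups as in the example; the names `c_hat_cdf` and `c_hat_q` are illustrative.

```r
alpha   <- 0.05
support <- 0:4

# CDF of T_n under H_0. In phyper(q, m, n, k):
#   m = 4 cups with Y_i = 1, n = 4 cups with Y_i = 0, k = 4 treated cups
cdf <- phyper(support, m = 4, n = 4, k = 4)

# Smallest t in the support with P[T_n <= t] >= 1 - alpha
c_hat_cdf <- min(support[cdf >= 1 - alpha])

# qhyper() implements the same "smallest x with F(x) >= p" quantile
c_hat_q <- qhyper(1 - alpha, m = 4, n = 4, k = 4)

cat("c_hat from CDF table:", c_hat_cdf, "| from qhyper:", c_hat_q, "\n")
```

Both routes give the critical value 3 at $\alpha = 0.05$, matching the tabulated CDF: $\mathrm{P}[T_n \leq 2] = 0.7571 < 0.95 \leq 0.9857 = \mathrm{P}[T_n \leq 3]$.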

2.5 Another look at the critical values

Let $T_n^{(1)} \leq T_n^{(2)} \leq \cdots \leq T_n^{(70)}$ denote the order statistics of $\{T_n(w, Y) : w \in \mathbf{G}_n\}$.

Fix a nominal level $\alpha \in (0,1)$. Define:

\[k \;=\; M - \lfloor M\alpha \rfloor,\]

where $\lfloor \cdot \rfloor$ denotes the floor function. The permutation critical value is then defined as the $k$-th order statistic:

\[\hat{c}_n(1 - \alpha) \;:=\; T_n^{(k)}.\]

For $\alpha = 0.05$:

\[k = 70 - \lfloor 70 \times 0.05 \rfloor = 70 - 3 = 67, \qquad \hat{c}_n(0.95) = T_n^{(67)} = 3.\]
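To see why the 67th order statistic equals 3, it helps to lay out the sorted values using the counts $(1, 16, 36, 16, 1)$ from the tabulated distribution of $T_n$:

\[T_n^{(1)} = 0, \quad T_n^{(2)} = \cdots = T_n^{(17)} = 1, \quad T_n^{(18)} = \cdots = T_n^{(53)} = 2, \quad T_n^{(54)} = \cdots = T_n^{(69)} = 3, \quad T_n^{(70)} = 4.\]

Since $54 \leq 67 \leq 69$, position $k = 67$ falls in the block of 3s, so $T_n^{(67)} = 3$.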

Code

alpha <- 0.05
M     <- nrow(G_n)
k     <- M - floor(M * alpha)
c_hat <- sort(T_vals)[k]
cat(sprintf("alpha = %.2f | M = %d | k = %d | c_hat(1-alpha) = %d\n",
            alpha, M, k, c_hat))

alpha = 0.05 | M = 70 | k = 67 | c_hat(1-alpha) = 3

So we obtained the same critical value by brute force.

Decision rule. Reject $H_0$ if and only if $T_n^{\mathrm{obs}} > \hat{c}_n(1-\alpha)$.

Since $T_n^{\mathrm{obs}} = 3 \not> 3$, we do not reject $H_0$ at level $\alpha = 0.05$.

Exact size control

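The permutation test controls size in finite samples, with no asymptotic approximation. That is, for any $\alpha \in (0,1)$ and regardless of the distribution of $Y$:

\[\mathrm{P}_{H_0}\bigl[T_n > \hat{c}_n(1-\alpha)\bigr] \;\leq\; \alpha.\]

This follows from the fact that, under $H_0$, each of the $M = 70$ assignments $w \in \mathbf{G}_n$ is equally likely. By construction, at most $\lfloor M\alpha \rfloor$ of the values $T_n(w, Y)$ lie strictly above $\hat c_n(1-\alpha) = T_n^{(k)}$, so the tail probability is at most $\lfloor M\alpha \rfloor / M \leq \alpha$, with equality when none of the values above position $k$ tie with $T_n^{(k)}$. In this example ties make the test conservative: at $\alpha = 0.05$ the bound is $3/70$, but $\mathrm{P}_{H_0}[T_n > 3] = 1/70 \approx 0.014$.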
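The size bound can be checked numerically by computing the actual rejection probability for several nominal levels. This is a sketch under the example's setup; the helper `actual_size()` and the grid `alphas` are hypothetical names introduced only for illustration.

```r
Y     <- c(0, 1, 1, 1, 0, 1, 0, 0)
T_all <- combn(8, 4, function(idx) sum(Y[idx]))  # all 70 values of T_n(w, Y)
M     <- length(T_all)

# Rejection probability under H_0 for a given nominal level alpha
actual_size <- function(alpha) {
  k     <- M - floor(M * alpha)
  c_hat <- sort(T_all)[k]
  mean(T_all > c_hat)   # share of assignments with T_n strictly above c_hat
}

alphas <- c(0.01, 0.05, 0.10, 0.20, 0.25)
sizes  <- sapply(alphas, actual_size)
print(data.frame(alpha = alphas,
                 bound = floor(M * alphas) / M,
                 size  = round(sizes, 4)))
```

In every row the actual size stays at or below the bound $\lfloor M\alpha \rfloor / M$, which in turn stays at or below $\alpha$. At $\alpha = 0.05$ the size is $1/70$.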

2.6 Interactive Simulation

The widget below lets you explore how the permutation distribution and critical value change as you modify the outcome vector $Y$ and the significance level $\alpha$. The treatment assignment is fixed at $W = (1,1,1,1,0,0,0,0)$.

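[Interactive widget: Permutation Test Explorer. Inputs: toggle buttons for the outcome vector $Y$ and a slider for the significance level $\alpha$ (default 0.05). Outputs: the observed statistic $T_n^{\mathrm{obs}}$, the critical value $\hat{c}_n(1-\alpha)$, the index $k = M - \lfloor M\alpha \rfloor$, the decision, a bar chart of the permutation distribution, and a table with columns $t$, Count, $\mathrm{P}[T_n = t]$, $\mathrm{P}[T_n \leq t]$.]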

How to use the widget

Toggle $Y_i$: click any of the eight buttons to flip the corresponding outcome between 0 and 1.
Adjust $\alpha$: drag the slider to change the nominal level.
Press Update to recompute the permutation distribution, the critical value, and the decision.
Bars shaded in red correspond to values of $T_n$ that fall in the rejection region $\{t : t > \hat{c}_n(1-\alpha)\}$.
The row highlighted in yellow in the table marks the critical value.
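For readers working from a static copy of these notes, the widget's computation can be approximated in R. The function below is a hypothetical helper (`perm_test` is not part of the original notes) that follows the same steps: enumerate all assignments, sort the statistics, pick the $k$-th order statistic, and apply the decision rule.

```r
perm_test <- function(Y, W = c(1, 1, 1, 1, 0, 0, 0, 0), alpha = 0.05) {
  n <- length(W)
  # T_n(w, Y) for every assignment w with the same number of 1s as W
  T_all <- combn(n, sum(W), function(idx) sum(Y[idx]))
  M     <- length(T_all)
  k     <- M - floor(M * alpha)
  c_hat <- sort(T_all)[k]          # k-th order statistic
  T_obs <- sum(W * Y)
  list(T_obs = T_obs, c_hat = c_hat, k = k, M = M, reject = T_obs > c_hat)
}

# Reports from the notes
res <- perm_test(Y = c(0, 1, 1, 1, 0, 1, 0, 0))
str(res)

# A lady who labels every cup correctly
res_perfect <- perm_test(Y = c(1, 1, 1, 1, 0, 0, 0, 0))
str(res_perfect)
```

With the reports from the notes the function does not reject at $\alpha = 0.05$. With a perfect set of reports, $T_n^{\mathrm{obs}} = 4$ exceeds the critical value 3 and the test rejects.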

2.7 Key Takeaways

1. Exact finite-sample validity. The permutation test controls size at level $\alpha$ for any finite $n$, with no distributional assumption on $Y$ and no asymptotic approximation.
2. The critical value is an order statistic. $\hat{c}_n(1-\alpha) = T_n^{(k)}$ with $k = M - \lfloor M\alpha \rfloor$ is the $k$-th smallest value of the statistic across all $M$ assignments in $\mathbf{G}_n$.
3. The permutation distribution is discrete. For small samples, the attainable rejection probabilities are restricted to multiples of $1/M$. In this example, $M = 70$, so the finest achievable level is $1/70 \approx 0.014$.
4. Equivalence with the hypergeometric. Under $H_0$, $T_n = \sum_i W_i Y_i \sim \mathrm{Hypergeometric}(n, k, m)$. The permutation distribution is thus fully characterized by $n$, the number of treated units, and the number of $Y_i = 1$.
5. Insufficient power in small samples. With $n = 8$ and $\alpha = 0.05$, the critical value is $\hat{c}_n(0.95) = 3$ and $T_n^{\mathrm{obs}} = 3$, so we fail to reject $H_0$. Rejecting at this level requires $T_n = 4$, that is, identifying all four type-1 cups correctly. This reflects a genuine power limitation of exact tests with very small samples.
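The points about discreteness and power can be made concrete by listing every rejection probability that a rule of the form "reject when $T_n > c$" can attain in this example. This is a short sketch using the hypergeometric upper tail; the name `tail_prob` is illustrative.

```r
support <- 0:4

# P[T_n > c] under H_0 for each threshold c.
# In phyper(q, m, n, k): m = 4 cups with Y_i = 1, n = 4 with Y_i = 0, k = 4 treated.
tail_prob <- phyper(support, m = 4, n = 4, k = 4, lower.tail = FALSE)

print(data.frame(c         = support,
                 count     = round(tail_prob * 70),
                 tail_prob = round(tail_prob, 4)))
```

The only nonzero attainable level below $17/70 \approx 0.243$ is $1/70 \approx 0.014$, so any nominal level between those two values, including $\alpha = 0.05$, leads to the same rejection region $\{T_n = 4\}$.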

