Latin-Square Designs and ANOVA#
Classical Latin-square designs let you test whether several factors matter, after removing two known nuisance sources of variation, using a remarkably small number of runs. They are the natural generalization of randomized complete-block designs to more than one blocking variable [Cochran and Cox, 1957, Fisher, 1935].
When to use a Latin-style design#
You have one experimental factor whose effect you want to estimate (call it the treatment) and two or more nuisance factors you want to block out: time of day, operator, batch of raw material, position in a plate, …
A \(k \times k\) Latin square uses \(k^2\) runs to study one treatment across two blocking factors at \(k\) levels each. A Graeco-Latin square adds a second treatment at the same total cost. A hyper-Graeco-Latin square adds a third. All four designs assume no interactions between the factors — they are additive-effects designs. If you suspect interactions matter, use a factorial design instead.
discopt.doe provides:
Function |
Factors |
Total runs |
|---|---|---|
|
1 |
\(k\) |
|
2 |
\(k^2\) (RCB) |
|
3 |
\(k^2\) |
|
4 |
\(k^2\) |
|
5 |
\(k^2\) |
anova_report then produces an additive-effects F-table from the
completed runs — Type-I sums of squares, F-statistics, p-values, with
support for two-way interactions if you want to test the no-interaction
assumption explicitly.
Plan#
Build a 4×4 Latin square design programmatically; inspect the randomization and balance.
Synthesize a response with known true effects and one blocking nuisance, then recover the effects via
anova_report.Show how to add a Graeco-Latin square (4 factors at the same cost).
End with a quick CLI round-trip showing the same workflow from
discopt doe new latin-squarethroughdiscopt doe anova.
import os
os.environ["JAX_PLATFORMS"] = "cpu"
import numpy as np
import matplotlib.pyplot as plt
from discopt.doe import (
AnovaTable,
anova_report,
graeco_latin_square,
hyper_graeco_latin_square,
latin_square,
latin_square_design,
)
rng = np.random.default_rng(0)
1. A 4×4 Latin square#
A Latin square is a \(k \times k\) matrix of \(k\) symbols such that every symbol appears exactly once in each row and once in each column. With \(k = 4\), the unique 4×4 standard square is::
A B C D
B A D C
C D A B
D C B A
latin_square(k) returns a randomized version. The rows correspond
to one nuisance factor, the columns to a second, and the symbol
inside each cell to the treatment of interest.
square = latin_square(4, seed=1)
print("Latin square (randomized):")
print(np.array(square))
print()
print("Each row contains each symbol once:")
print({i: sorted(set(row)) for i, row in enumerate(square)})
Latin square (randomized):
[[0 1 3 2]
[3 2 0 1]
[2 3 1 0]
[1 0 2 3]]
Each row contains each symbol once:
{0: [0, 1, 2, 3], 1: [0, 1, 2, 3], 2: [0, 1, 2, 3], 3: [0, 1, 2, 3]}
Materializing the run sheet#
For data analysis we want one row per experiment, with explicit
columns for the row block, column block, and treatment. The helper
latin_square_design does that — it returns balanced row dicts
with optional replicates.
design = latin_square_design(
factors={
"row": ["morning", "noon", "afternoon", "evening"],
"column": ["bench-1", "bench-2", "bench-3", "bench-4"],
"treatment": ["A", "B", "C", "D"],
},
replicates=1,
seed=2,
)
print(f"{len(design.rows)} runs from a 4x4 Latin square")
print()
for r in design.rows[:6]:
print(r)
print("...")
16 runs from a 4x4 Latin square
{'row': 'evening', 'column': 'bench-1', 'treatment': 'A', 'replicate': 0, 'run_order': 0}
{'row': 'afternoon', 'column': 'bench-4', 'treatment': 'A', 'replicate': 0, 'run_order': 1}
{'row': 'noon', 'column': 'bench-4', 'treatment': 'C', 'replicate': 0, 'run_order': 2}
{'row': 'evening', 'column': 'bench-2', 'treatment': 'C', 'replicate': 0, 'run_order': 3}
{'row': 'noon', 'column': 'bench-3', 'treatment': 'A', 'replicate': 0, 'run_order': 4}
{'row': 'morning', 'column': 'bench-1', 'treatment': 'C', 'replicate': 0, 'run_order': 5}
...
Balance check: every treatment appears the same number of times in each block. That’s exactly what makes ANOVA on a Latin square clean — the main effects are orthogonal.
from collections import Counter
print("treatment counts by row block:")
for block in ["morning", "noon", "afternoon", "evening"]:
cnt = Counter(r["treatment"] for r in design.rows if r["row"] == block)
print(f" {block:10s}: {dict(cnt)}")
treatment counts by row block:
morning : {'C': 1, 'B': 1, 'D': 1, 'A': 1}
noon : {'C': 1, 'A': 1, 'B': 1, 'D': 1}
afternoon : {'A': 1, 'D': 1, 'B': 1, 'C': 1}
evening : {'A': 1, 'C': 1, 'B': 1, 'D': 1}
2. Synthesize responses and recover the effects#
We simulate a response with
a treatment effect: A=0, B=2, C=5, D=3,
a row-block nuisance: morning=0, noon=1, afternoon=-1, evening=2,
a column-block nuisance: 0, 0.5, -0.5, 1,
Gaussian noise with σ=0.5.
If the design works, anova_report should declare the treatment
factor significant and let the blocks soak up their nuisance
variance.
true_treatment = {"A": 0.0, "B": 2.0, "C": 5.0, "D": 3.0}
true_row = {"morning": 0.0, "noon": 1.0, "afternoon": -1.0, "evening": 2.0}
true_column = {"bench-1": 0.0, "bench-2": 0.5, "bench-3": -0.5, "bench-4": 1.0}
rng = np.random.default_rng(3)
rows_with_response = []
for r in design.rows:
y = (
true_treatment[r["treatment"]]
+ true_row[r["row"]]
+ true_column[r["column"]]
+ 0.5 * rng.normal()
)
rows_with_response.append({**r, "y": float(y)})
# Look at the first few synthetic measurements
for r in rows_with_response[:5]:
print(r)
{'row': 'evening', 'column': 'bench-1', 'treatment': 'A', 'replicate': 0, 'run_order': 0, 'y': 3.0204595606925912}
{'row': 'afternoon', 'column': 'bench-4', 'treatment': 'A', 'replicate': 0, 'run_order': 1, 'y': -1.2778325156570909}
{'row': 'noon', 'column': 'bench-4', 'treatment': 'C', 'replicate': 0, 'run_order': 2, 'y': 7.209049423362889}
{'row': 'evening', 'column': 'bench-2', 'treatment': 'C', 'replicate': 0, 'run_order': 3, 'y': 7.216115196936035}
{'row': 'noon', 'column': 'bench-3', 'treatment': 'A', 'replicate': 0, 'run_order': 4, 'y': 0.27367535394477704}
Running ANOVA#
anova_report(rows, response, factors=None) auto-detects the factor
columns (everything except the response and bookkeeping columns like
replicate, run_order, _*). It returns an AnovaTable whose
summary() method prints the familiar F-table.
table: AnovaTable = anova_report(
rows_with_response,
response="y",
factors=["row", "column", "treatment"],
)
print(table.summary())
Source SS df MS F p
-------------------------------------------------------------------------
row 31.6961 3 10.5654 24.900 0.0008736
column 3.5697 3 1.1899 2.804 0.1307
treatment 53.3333 3 17.7778 41.897 0.0002033
Residual 2.5459 6 0.4243 --- ---
Total 91.1450 15 0.0000 --- ---
Reading the table:
treatmentis highly significant (large F, tiny p-value) — the design correctly recovered the synthetic effect.The two blocks (
row,column) absorbed their nuisance variance; if either one’s effect had been small you would see a high p-value there, and you could justify dropping the block from the model.Residual MS is close to \(\sigma^2 = 0.25\), the variance we used for the noise.
You can also pull out the per-effect dataclasses for programmatic use::
[e.source for e in table.rows]
print("Per-effect summary:")
for e in table.rows:
f_str = f"{e.f:7.2f}" if e.f is not None else " --"
p_str = f"{e.p:.4e}" if e.p is not None else " --"
print(f" {e.source:14s} SS={e.ss:7.3f} df={e.df:2d} F={f_str} p={p_str}")
Per-effect summary:
row SS= 31.696 df= 3 F= 24.90 p=8.7363e-04
column SS= 3.570 df= 3 F= 2.80 p=1.3067e-01
treatment SS= 53.333 df= 3 F= 41.90 p=2.0332e-04
Residual SS= 2.546 df= 6 F= -- p= --
Total SS= 91.145 df=15 F= -- p= --
Testing the no-interaction assumption#
A Latin square cannot estimate interactions in general (the design
doesn’t have enough degrees of freedom). But if you have
replicates — independent repeats of the entire square — you can
add a replicate column and request specific 2-way interactions via
the interactions= argument.
design_rep = latin_square_design(
factors={
"row": ["morning", "noon", "afternoon", "evening"],
"column": ["bench-1", "bench-2", "bench-3", "bench-4"],
"treatment": ["A", "B", "C", "D"],
},
replicates=2,
seed=4,
)
rng = np.random.default_rng(5)
rows_rep = []
for r in design_rep.rows:
y = (
true_treatment[r["treatment"]]
+ true_row[r["row"]]
+ true_column[r["column"]]
+ 0.5 * rng.normal()
)
rows_rep.append({**r, "y": float(y)})
table_int = anova_report(
rows_rep, response="y",
factors=["row", "column", "treatment"],
interactions=[("row", "treatment")],
)
print(table_int.summary())
Source SS df MS F p
-------------------------------------------------------------------------
row 33.9749 3 11.3250 --- ---
column 9.6312 3 3.2104 --- ---
treatment 107.1763 3 35.7254 --- ---
row:treatment 4.7066 9 0.5230 --- ---
Residual -0.6587 13 -0.0507 --- ---
Total 154.8304 31 0.0000 --- ---
The row × treatment interaction is non-significant — exactly
what we’d expect, because the synthetic data was generated from an
additive model. If you saw a significant interaction here, the
no-interaction assumption that underlies the Latin square would be
violated and you should consider switching to a factorial design.
3. Graeco-Latin squares: four factors at the same cost#
A Graeco-Latin square is two mutually-orthogonal Latin squares superimposed. The second symbol set adds a fourth factor without any extra runs — orthogonality means the two treatments are still balanced against each other and against both blocks.
graeco_latin_square(k) returns a list of two MOLS. The convenience
wrapper latin_square_design accepts four factor columns and routes
to it automatically.
gl_design = latin_square_design(
factors={
"row": ["t1", "t2", "t3", "t4"],
"column": ["b1", "b2", "b3", "b4"],
"treatment_A": ["A1", "A2", "A3", "A4"],
"treatment_B": ["B1", "B2", "B3", "B4"],
},
seed=6,
)
print(f"{len(gl_design.rows)} runs — still 16, now with 4 factors")
print()
for r in gl_design.rows[:6]:
print(r)
16 runs — still 16, now with 4 factors
{'row': 't4', 'column': 'b2', 'treatment_A': 'A4', 'treatment_B': 'B1', 'replicate': 0, 'run_order': 0}
{'row': 't1', 'column': 'b2', 'treatment_A': 'A2', 'treatment_B': 'B4', 'replicate': 0, 'run_order': 1}
{'row': 't3', 'column': 'b1', 'treatment_A': 'A4', 'treatment_B': 'B4', 'replicate': 0, 'run_order': 2}
{'row': 't1', 'column': 'b4', 'treatment_A': 'A4', 'treatment_B': 'B3', 'replicate': 0, 'run_order': 3}
{'row': 't2', 'column': 'b3', 'treatment_A': 'A4', 'treatment_B': 'B2', 'replicate': 0, 'run_order': 4}
{'row': 't3', 'column': 'b3', 'treatment_A': 'A2', 'treatment_B': 'B3', 'replicate': 0, 'run_order': 5}
For \(k=2\) no MOLS exist; for \(k=6\) Euler showed none exist either
(the famous “36 officers” non-result). The generator raises
ValueError in those cases — try graeco_latin_square(6) and you’ll
see why this design doesn’t scale to arbitrary \(k\).
A hyper_graeco_latin_square(k) adds a third orthogonal symbol set,
giving five factors at \(k^2\) runs. This works for \(k \in \{4, 5,
7, 8, 9, ...\}\) — any prime power \(\geq 4\). The generator
returns three MOLS; combine them with latin_square_design for an
additive-effects design across one treatment + two blocks + three
“extra” factors.
# Hyper-Graeco-Latin for k=5 -> 25 runs covering 5 factors
hg = hyper_graeco_latin_square(5)
print(f"hyper-Graeco-Latin k=5: {len(hg)} mutually-orthogonal squares")
print(f"each square is {len(hg[0])}x{len(hg[0])}")
hyper-Graeco-Latin k=5: 3 mutually-orthogonal squares
each square is 5x5
4. CLI round trip#
The same workflow is exposed from the command line. The CLI writes
to an .xlsx workbook so a wet-lab user can fill in the response
column by hand and re-run discopt doe anova later.
discopt doe new latin-square \
--level "row:morning,noon,afternoon,evening" \
--level "column:bench-1,bench-2,bench-3,bench-4" \
--level "treatment:A,B,C,D" \
--replicates 1 \
--out latin.xlsx
# … fill in the y column in latin.xlsx …
discopt doe anova latin.xlsx
# or, with interactions:
discopt doe anova latin.xlsx --interaction "row:treatment"
The same --level NAME:v1,v2,… syntax works for graeco-latin
(four factors) and hyper-graeco-latin (five factors).
Summary#
A \(k \times k\) Latin square gives you one treatment + two blocks at \(k^2\) runs.
Graeco-Latin → 3 nuisance factors / 1 treatment, hyper-Graeco-Latin → 4 nuisance / 1 treatment, same \(k^2\) runs.
All four are additive-effects designs — the analysis assumes no interactions. Test that assumption with replicates +
interactions=if you’re unsure.anova_reportis the same function for any balanced design; it auto-detects factor columns and handles unbalanced data with a warning.