Design of Experiments from the Command Line#
discopt.doe ships a full Fisher-information-matrix-based design API,
but writing a Python Experiment subclass and stitching the
design \(\to\) measure \(\to\) fit \(\to\) extend loop together by hand
is a lot to ask of an experimentalist whose lab notebook is a
spreadsheet. The discopt doe CLI closes that gap by wrapping the
same machinery around a single .xlsx workbook that moves between
the bench and the terminal.
Classical response-surface methodology [Box et al., 2005], optimal-design theory [Atkinson et al., 2007], and the model-based framing of parameter-precision DoE [Franceschini and Macchietto, 2008] all trace back to Fisher’s foundational treatment of randomized experimentation [Fisher, 1935]. The CLI gives you a five-verb loop on top of that legacy:
discopt doe templates # list built-in templates
discopt doe new TEMPLATE ... -o file # initial D-optimal batch
discopt doe status file # completed vs pending runs
discopt doe fit file # OLS fit + FIM update
discopt doe extend file --n M # next D-optimal batch
Every verb also accepts --json, so an LLM agent (or a future GUI)
can drive the same workflow without parsing prose. This notebook
walks through the loop end-to-end against a synthetic ground-truth
response surface so the recovered parameters can be checked
against the truth.
import os
os.environ["JAX_PLATFORMS"] = "cpu"
os.environ["JAX_ENABLE_X64"] = "1"
import json
from pathlib import Path
import numpy as np
import openpyxl
from discopt.doe.cli import (
ExtendParams,
NewParams,
do_extend,
do_fit,
do_new,
do_status,
do_templates,
)
1. Templates#
do_templates() is the in-process equivalent of
discopt doe templates --json. Four built-in templates cover the
classical response-surface menu — all linear in parameters, so the
first batch needs no parameter prior.
info = do_templates()
for t in info["templates"]:
print(f" {t['name']:<24} {t['description']}")
2. New campaign — initial D-optimal batch#
The example below is a two-factor full-quadratic response surface
(temperature and pH for yield in a notional chemistry lab).
do_new writes a workbook with n=6 pending runs placed at the
D-optimal corners of the factor space, plus a metadata sheet that
lets every subsequent command reconstruct the experiment from the
file alone.
wb_path = Path("campaign.xlsx")
if wb_path.exists():
wb_path.unlink()
new_out = do_new(
NewParams(
output=wb_path,
n=10,
inputs=[("temp", 50.0, 100.0), ("ph", 3.0, 9.0)],
response_name="yield",
measurement_error=0.05,
criterion="determinant",
seed=0,
n_starts=5,
template="response-surface-2d",
)
)
print("template :", new_out["template"])
print("parameter_names:", new_out["parameter_names"])
print("workbook_path :", new_out["workbook_path"])
print("next_command :", new_out["next_command"])
print("\nInitial design points:")
for row in new_out["designs"]:
print(f" run {row['run_id']:>2}: temp={row['temp']:6.2f} ph={row['ph']:4.2f}")
3. Run the experiments#
In the lab you would open campaign.xlsx, execute each run, type
the measured yield value into the runs sheet, and save the file.
Here we substitute a known ground-truth response with a small
amount of noise so the downstream fits can be validated.
truth = {
"b0": 12.0,
"b1": 0.30,
"b2": -0.80,
"b11": -0.002,
"b22": 0.05,
"b12": 0.01,
}
def predict(temp, ph):
return (
truth["b0"]
+ truth["b1"] * temp
+ truth["b2"] * ph
+ truth["b11"] * temp * temp
+ truth["b22"] * ph * ph
+ truth["b12"] * temp * ph
)
rng = np.random.default_rng(42)
wb = openpyxl.load_workbook(wb_path)
runs = wb["runs"]
headers = [c.value for c in runs[1]]
i_temp = headers.index("temp")
i_ph = headers.index("ph")
i_y = headers.index("yield")
for row in runs.iter_rows(min_row=2):
if row[0].value is None:
continue
t, p = float(row[i_temp].value), float(row[i_ph].value)
row[i_y].value = float(predict(t, p) + rng.normal(0, 0.05))
wb.save(wb_path)
status = do_status({"workbook": str(wb_path)})
print(
f"{status['n_completed']} completed, "
f"{status['n_pending']} pending, {status['n_total']} total."
)
4. Fit parameters#
do_fit reads the completed rows, performs an ordinary-least-
squares fit against the template’s design matrix, updates the
parameters and fim sheets, and returns a structured payload
with point estimates and the log-determinant of the FIM (the
objective optimized by the D-optimal criterion
[Atkinson et al., 2007]).
fit_out = do_fit({"workbook": str(wb_path)})
estimates = {p["name"]: p["estimate"] for p in fit_out["parameters"]}
print(f"n_observations : {fit_out['n_observations']}")
print(f"log_det_fim : {fit_out['log_det_fim']:.3f}\n")
print(f"{'name':<6} {'truth':>10} {'estimate':>10}")
for name, true_val in truth.items():
print(f"{name:<6} {true_val:>10.4f} {estimates[name]:>10.4f}")
5. Extend — next D-optimal batch#
do_extend loads the persisted FIM from the workbook, uses it as a
prior, and appends a fresh batch of D-optimal runs that maximize
information given what we already learned. The new rows show up
in the runs sheet with the next batch index and blank response
cells, ready for the lab.
ext_out = do_extend(ExtendParams(workbook=wb_path, n=4, n_starts=5))
print(f"batch : {ext_out['batch']}")
print(f"new run ids : {ext_out['new_run_ids']}\n")
for row in ext_out["designs"]:
print(f" run {row['run_id']:>2}: temp={row['temp']:6.2f} ph={row['ph']:4.2f}")
status = do_status({"workbook": str(wb_path)})
print(
f"\nworkbook now: {status['n_completed']} completed, "
f"{status['n_pending']} pending, {status['n_total']} total."
)
6. JSON output for agents and GUIs#
The do_* functions return plain dictionaries; the CLI’s --json
flag is just a serializer over that contract. An LLM agent (or a
Streamlit / NiceGUI / Marimo front-end) can read next_command and
status fields to drive the loop without scraping text.
print(json.dumps(do_status({"workbook": str(wb_path)}), indent=2, default=str))
7. The escape hatch — custom Experiment subclasses#
When the four built-in templates are not enough — nonlinear
kinetics, dynamic models, mechanistic chemistry — the same five
verbs accept an --module pkg.mod:MyExperiment argument that
imports any discopt.estimate.Experiment subclass. The workbook
stores the import path plus an initial parameter guess, and
extend keeps using the FIM machinery against the custom model.
Fitting nonlinear-in-parameters models from the CLI is still on
the roadmap; for now the escape hatch covers design (new /
extend) and the Python discopt.estimate API covers fitting.
8. Cleanup#
wb_path.unlink(missing_ok=True)
See also#
Model-Based Design of Experiments — the underlying Python API and the math.
Parameter Estimation — fitting nonlinear-in-parameters models from data.
Identifiability and Estimability — diagnosing whether a model can be estimated from a given dataset.