Head-to-head: discopt vs. SUSPECT#
discopt ships a sound structure detector — it labels expressions CONVEX /
CONCAVE / AFFINE, classifies monotonicity, and computes interval bounds, each
verdict backed by a proof (an interval Hessian, interval gradient, or interval
value enclosure). The natural question is how it stacks up against the reference
implementation in the field: SUSPECT Ceccon et al. [2020], the MINLP
special-structure detector from Misener’s group, referenced directly in the
project’s convexity roadmap (issue #38).
This notebook documents a direct, agree/disagree head-to-head of the two detectors over a single shared corpus of 40 curated instances (43 compared items), on the three verdict axes SUSPECT reports.
Axis |
|
What a positive verdict means |
|---|---|---|
Convexity |
|
A proof of curvature via the interval-Hessian αBB test [Adjiman et al., 1998] |
Monotonicity |
|
A proof: the interval gradient lies in the nonneg/nonpos orthant on the box |
Interval bounds |
|
A sound forward natural-range enclosure |
Methodology: one corpus, identical mathematics#
Both tools are driven from the same neutral expression AST (scripts/suspect_oracle/corpus.py),
rendered into a discopt model and a Pyomo model by two thin renderers. So the
comparison is genuinely on identical math, not two independently hand-written
models that might silently differ.
SUSPECT itself cannot run in discopt’s environment — cog-suspect 2.1.3 is
unmaintained and imports np.float (removed in NumPy ≥ 1.24) and Pyomo’s
ReciprocalExpression (removed in Pyomo ≥ 6.2), neither compatible with
discopt’s NumPy 2.x / Pyomo 6.10. Its verdicts are therefore recorded once,
out of process in a pinned environment, into the committed golden file
suspect_verdicts.json. This notebook (like the live parity tests) imports only
the neutral corpus and the discopt renderer — never SUSPECT — and compares
discopt’s live verdicts against that golden file.
A soundness asymmetry shapes what we assert. discopt’s verdicts are
rigorous: convexity rests on a sound interval Hessian [Moore, 1966, Neumaier, 1990]
built by forward-mode interval AD [Griewank and Walther, 2008] whose minimum
eigenvalue is lower-bounded by an interval Gershgorin theorem [Gershgorin, 1931],
giving a second-order proof of convexity [Boyd and Vandenberghe, 2004]. SUSPECT’s convexity
axis is likewise sound, so there we assert mutual no-contradiction. But
SUSPECT’s monotonicity and bounds are heuristic and occasionally unsound (its
interval cosine ignores interior critical points, so it declares sin monotone
— and mis-bounds it — on [-3, 3], where it is neither). So on those axes the
tests rest on discopt’s own sampling-validated soundness plus
no-contradiction, rather than treating SUSPECT as ground truth.
The convexity head-to-head, computed live#
The cell below loads the shared corpus and the committed SUSPECT golden verdicts,
runs discopt’s classify_expr over every item, normalises both sides to
≤-form, and buckets each comparison into agree / discopt-stronger /
suspect-stronger / contradiction — the exact logic the CI parity test
enforces.
import json
import sys
import warnings
from collections import Counter
from pathlib import Path
# Locate the shared oracle directory (search upward from the working dir).
def _find_oracle() -> Path:
for base in [Path.cwd(), *Path.cwd().parents]:
cand = base / "scripts" / "suspect_oracle"
if (cand / "corpus.py").exists():
return cand
raise FileNotFoundError("scripts/suspect_oracle not found")
oracle = _find_oracle()
sys.path.insert(0, str(oracle))
golden = json.loads((oracle / "suspect_verdicts.json").read_text())
print("SUSPECT golden file:", golden["_meta"])
SUSPECT golden file: {'n_errors': 0, 'n_instances': 40, 'pyomo': '6.1.2', 'tool': 'cog-suspect', 'tool_version': '2.1.3'}
from corpus import INSTANCES
from discopt._jax.convexity import Curvature, classify_expr
from discopt.modeling.core import ObjectiveSense
from render_discopt import build_discopt
_NEG = {
Curvature.CONVEX: Curvature.CONCAVE,
Curvature.CONCAVE: Curvature.CONVEX,
Curvature.AFFINE: Curvature.AFFINE,
Curvature.UNKNOWN: Curvature.UNKNOWN,
}
_TOK = {
Curvature.CONVEX: "convex",
Curvature.CONCAVE: "concave",
Curvature.AFFINE: "affine",
Curvature.UNKNOWN: "unknown",
}
def _norm(t): # SUSPECT calls affine "linear"
return "affine" if t == "linear" else t
def _compatible(a, b): # affine is both convex and concave
return a == b or ("affine" in (a, b) and "unknown" not in (a, b))
def _category(d, s):
if _compatible(d, s):
return "agree"
if d != "unknown" and s == "unknown":
return "discopt_stronger"
if s != "unknown" and d == "unknown":
return "suspect_stronger"
return "contradiction"
verdicts = golden["verdicts"]
counts = Counter()
disagreements = []
with warnings.catch_warnings():
warnings.simplefilter("ignore") # wide-box interval overflow is an abstention signal
for inst in INSTANCES:
name = inst["name"]
model, cnames = build_discopt(inst)
g = verdicts[name]
items = []
if model._objective is not None and g.get("objective") is not None:
curv = classify_expr(model._objective.expression, model)
if model._objective.sense == ObjectiveSense.MAXIMIZE:
curv = _NEG[curv] # normalise "max f" to "min -f"
items.append((f"{name}::objective", _TOK[curv], _norm(g["objective"]["convexity"])))
for con, cn in zip(model._constraints, cnames):
curv = classify_expr(con.body, model)
if con.sense == ">=":
curv = _NEG[curv] # normalise to <=, as SUSPECT does
items.append((f"{name}::{cn}", _TOK[curv], _norm(g["constraints"][cn]["convexity"])))
for key, d, s in items:
cat = _category(d, s)
counts[cat] += 1
if cat != "agree":
disagreements.append((key, d, s, cat))
total = sum(counts.values())
print(f"compared {total} items across {len(INSTANCES)} instances\n")
for cat in ("agree", "discopt_stronger", "suspect_stronger", "contradiction"):
print(f" {cat:18s} {counts[cat]}")
compared 43 items across 40 instances
agree 33
discopt_stronger 5
suspect_stronger 5
contradiction 0
agree dominates, and — the soundness-critical line — there are zero
contradictions: the two sound detectors never call the same body convex on one
side and concave on the other. The disagreements are all one tool proving
curvature the other abstains on:
print("Where the two detectors differ:\n")
for key, d, s in [(k, d, s) for k, d, s, c in disagreements if c == "discopt_stronger"]:
print(f" discopt-stronger {key:38s} discopt={d:8s} suspect={s}")
print()
for key, d, s in [(k, d, s) for k, d, s, c in disagreements if c == "suspect_stronger"]:
print(f" suspect-stronger {key:38s} discopt={d:8s} suspect={s}")
Where the two detectors differ:
discopt-stronger euclidean_norm::objective discopt=convex suspect=unknown
discopt-stronger quad_over_affine::objective discopt=convex suspect=unknown
discopt-stronger exp_perspective::objective discopt=convex suspect=unknown
discopt-stronger quad_over_affine_epigraph::qoa discopt=convex suspect=unknown
discopt-stronger norm_le::soc discopt=convex suspect=unknown
suspect-stronger sin_convex_branch::objective discopt=unknown suspect=convex
suspect-stronger cos_concave_branch::objective discopt=unknown suspect=convex
suspect-stronger asin_convex_branch::objective discopt=unknown suspect=convex
suspect-stronger acos_concave_branch::objective discopt=unknown suspect=convex
suspect-stronger atan_concave_branch::objective discopt=unknown suspect=convex
Reading the disagreements#
The 5 discopt-stronger items are cone primitives SUSPECT 2.1.3 does not
recognise, proven by discopt’s special-pattern recognizers:
euclidean_norm—sqrtof a PSD quadratic (the 2-norm),quad_over_affine—x²/ywithy > 0,exp_perspective—y·exp(x/y), the perspective ofexp,quad_over_affine_epigraph— annlp_cvx_108-style fractional epigraph,norm_le— a second-order-cone constraint.
The 5 SUSPECT-stronger items are SUSPECT’s bound-aware trig / inverse-trig
rules — on a sign-restricted branch it proves sin convex on [π, 2π], cos
concave on [-π/2, π/2], and asin/acos/atan curvature on (0,1) / x>0,
which discopt’s detector currently leaves UNKNOWN. These are tracked
detector gaps: the parity test pins them in KNOWN_SUSPECT_STRONGER, so a newly
appearing gap fails loudly rather than being silently tolerated, and closing one
(adding the bound-aware trig curvature rule) is a clean follow-up.
Monotonicity and interval bounds#
The other two axes are compared the same way (see
python/tests/test_monotonicity_suspect_parity.py and
test_fbbt_bounds_suspect_parity.py), but because SUSPECT is heuristic there,
every discopt verdict is also validated against a dense numeric sampling of
the real body — the comparison rests on discopt’s own soundness, not on
trusting SUSPECT.
Axis |
agree |
discopt-stronger |
suspect-stronger |
contradiction |
|---|---|---|---|---|
Convexity |
33 |
5 |
5 |
0 |
Monotonicity |
40 |
0 |
3 |
0 |
Bounds |
— sound on all 43 items, 0 disjoint enclosures — |
On monotonicity the 3 SUSPECT-stronger items are instructive: sqrt_concave and
cubic are sound discopt conservatism (a domain-edge / sub-ULP abstention),
while sine is a case where SUSPECT claims monotonicity that is false — and
discopt correctly abstains. For bounds, discopt’s forward enclosure is tighter
on quadratics / powers / tan (which SUSPECT leaves unbounded) and looser on
constraint bodies (where SUSPECT folds in the RHS via full FBBT); the two
enclosures always overlap.
Performance: the sparse interval-Hessian#
The convexity certificate’s cost is dominated by the interval Hessian. A naive
implementation materialises a dense n×n interval matrix at every node of the
expression DAG — O(N·n²) for an N-term separable expression, even though such
a Hessian has only O(N) nonzeros. discopt instead carries the gradient and
Hessian as sparse maps of scalar intervals (a missing key is an exact
structural zero) and materialises the single n×n matrix once, at the root, only
when the Gershgorin eigenvalue bound [Gershgorin, 1931] needs it. This turns
certify_convex from O(N·n²) to O(N) while preserving the proof (outward
rounding everywhere, the αBB interval-Hessian foundation of [Adjiman et al., 1998]
intact).
On a separable convex sum over N variables, per-node cost goes from climbing
with n (the O(N·n²) signature) to flat:
N nodes |
dense µs/node |
sparse µs/node |
total speedup |
|---|---|---|---|
13 |
26.6 |
14.2 |
1.9× |
193 |
65.2 |
13.2 |
4.9× |
385 |
176.9 |
13.7 |
12.9× |
769 |
625.0 |
14.0 |
44.5× |
The harness that produced these numbers is scripts/detector_timing.py. The
upshot for the roadmap: the cost that looked like the case for a Rust port was a
representation choice fixable in Python, so the port is not warranted on current
evidence.
Takeaways#
On a shared corpus,
discoptand SUSPECT agree wherever both are sound, with zero contradictions — independent corroboration thatdiscopt’s detector is correct in the wild.discoptis stronger on cone primitives (norms, quadratic-over-affine, perspectives, SOC) that SUSPECT 2.1.3 misses.SUSPECT is stronger on bound-aware trig curvature — five tracked, pinned detector gaps that are clean follow-ups.
Where SUSPECT’s heuristics are unsound (monotonicity/bounds of trig),
discoptcorrectly abstains; its verdicts are validated by numeric sampling.
The comparison is enforced in CI by the three *_suspect_parity.py suites, with
the full write-up in scripts/suspect_oracle/README.md.