Head-to-head: discopt vs. SUSPECT

Head-to-head: `discopt` vs. SUSPECT#

discopt ships a sound structure detector — it labels expressions CONVEX / CONCAVE / AFFINE, classifies monotonicity, and computes interval bounds, each verdict backed by a proof (an interval Hessian, interval gradient, or interval value enclosure). The natural question is how it stacks up against the reference implementation in the field: SUSPECT Ceccon et al. [2020], the MINLP special-structure detector from Misener’s group, referenced directly in the project’s convexity roadmap (issue #38).

This notebook documents a direct, agree/disagree head-to-head of the two detectors over a single shared corpus of 40 curated instances (43 compared items), on the three verdict axes SUSPECT reports.

Axis	`discopt` API	What a positive verdict means
Convexity	`convexity.classify_expr`	A proof of curvature via the interval-Hessian αBB test [Adjiman et al., 1998]
Monotonicity	`monotonicity.classify_monotonicity`	A proof: the interval gradient lies in the nonneg/nonpos orthant on the box
Interval bounds	`convexity.interval_eval.evaluate_interval`	A sound forward natural-range enclosure

Methodology: one corpus, identical mathematics#

Both tools are driven from the same neutral expression AST (scripts/suspect_oracle/corpus.py), rendered into a discopt model and a Pyomo model by two thin renderers. So the comparison is genuinely on identical math, not two independently hand-written models that might silently differ.

SUSPECT itself cannot run in discopt’s environment — cog-suspect 2.1.3 is unmaintained and imports np.float (removed in NumPy ≥ 1.24) and Pyomo’s ReciprocalExpression (removed in Pyomo ≥ 6.2), neither compatible with discopt’s NumPy 2.x / Pyomo 6.10. Its verdicts are therefore recorded once, out of process in a pinned environment, into the committed golden file suspect_verdicts.json. This notebook (like the live parity tests) imports only the neutral corpus and the discopt renderer — never SUSPECT — and compares discopt’s live verdicts against that golden file.

A soundness asymmetry shapes what we assert. discopt’s verdicts are rigorous: convexity rests on a sound interval Hessian [Moore, 1966, Neumaier, 1990] built by forward-mode interval AD [Griewank and Walther, 2008] whose minimum eigenvalue is lower-bounded by an interval Gershgorin theorem [Gershgorin, 1931], giving a second-order proof of convexity [Boyd and Vandenberghe, 2004]. SUSPECT’s convexity axis is likewise sound, so there we assert mutual no-contradiction. But SUSPECT’s monotonicity and bounds are heuristic and occasionally unsound (its interval cosine ignores interior critical points, so it declares sin monotone — and mis-bounds it — on [-3, 3], where it is neither). So on those axes the tests rest on discopt’s own sampling-validated soundness plus no-contradiction, rather than treating SUSPECT as ground truth.

The convexity head-to-head, computed live#

The cell below loads the shared corpus and the committed SUSPECT golden verdicts, runs discopt’s classify_expr over every item, normalises both sides to ≤-form, and buckets each comparison into agree / discopt-stronger / suspect-stronger / contradiction — the exact logic the CI parity test enforces.

import json
import sys
import warnings
from collections import Counter
from pathlib import Path

# Locate the shared oracle directory (search upward from the working dir).
def _find_oracle() -> Path:
    for base in [Path.cwd(), *Path.cwd().parents]:
        cand = base / "scripts" / "suspect_oracle"
        if (cand / "corpus.py").exists():
            return cand
    raise FileNotFoundError("scripts/suspect_oracle not found")

oracle = _find_oracle()
sys.path.insert(0, str(oracle))
golden = json.loads((oracle / "suspect_verdicts.json").read_text())
print("SUSPECT golden file:", golden["_meta"])

SUSPECT golden file: {'n_errors': 0, 'n_instances': 40, 'pyomo': '6.1.2', 'tool': 'cog-suspect', 'tool_version': '2.1.3'}

from corpus import INSTANCES
from discopt._jax.convexity import Curvature, classify_expr
from discopt.modeling.core import ObjectiveSense
from render_discopt import build_discopt

_NEG = {
    Curvature.CONVEX: Curvature.CONCAVE,
    Curvature.CONCAVE: Curvature.CONVEX,
    Curvature.AFFINE: Curvature.AFFINE,
    Curvature.UNKNOWN: Curvature.UNKNOWN,
}
_TOK = {
    Curvature.CONVEX: "convex",
    Curvature.CONCAVE: "concave",
    Curvature.AFFINE: "affine",
    Curvature.UNKNOWN: "unknown",
}

def _norm(t):           # SUSPECT calls affine "linear"
    return "affine" if t == "linear" else t

def _compatible(a, b):  # affine is both convex and concave
    return a == b or ("affine" in (a, b) and "unknown" not in (a, b))

def _category(d, s):
    if _compatible(d, s):
        return "agree"
    if d != "unknown" and s == "unknown":
        return "discopt_stronger"
    if s != "unknown" and d == "unknown":
        return "suspect_stronger"
    return "contradiction"

verdicts = golden["verdicts"]
counts = Counter()
disagreements = []

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # wide-box interval overflow is an abstention signal
    for inst in INSTANCES:
        name = inst["name"]
        model, cnames = build_discopt(inst)
        g = verdicts[name]
        items = []
        if model._objective is not None and g.get("objective") is not None:
            curv = classify_expr(model._objective.expression, model)
            if model._objective.sense == ObjectiveSense.MAXIMIZE:
                curv = _NEG[curv]  # normalise "max f" to "min -f"
            items.append((f"{name}::objective", _TOK[curv], _norm(g["objective"]["convexity"])))
        for con, cn in zip(model._constraints, cnames):
            curv = classify_expr(con.body, model)
            if con.sense == ">=":
                curv = _NEG[curv]  # normalise to <=, as SUSPECT does
            items.append((f"{name}::{cn}", _TOK[curv], _norm(g["constraints"][cn]["convexity"])))
        for key, d, s in items:
            cat = _category(d, s)
            counts[cat] += 1
            if cat != "agree":
                disagreements.append((key, d, s, cat))

total = sum(counts.values())
print(f"compared {total} items across {len(INSTANCES)} instances\n")
for cat in ("agree", "discopt_stronger", "suspect_stronger", "contradiction"):
    print(f"  {cat:18s} {counts[cat]}")

compared 43 items across 40 instances

  agree              33
  discopt_stronger   5
  suspect_stronger   5
  contradiction      0

agree dominates, and — the soundness-critical line — there are zero contradictions: the two sound detectors never call the same body convex on one side and concave on the other. The disagreements are all one tool proving curvature the other abstains on:

print("Where the two detectors differ:\n")
for key, d, s in [(k, d, s) for k, d, s, c in disagreements if c == "discopt_stronger"]:
    print(f"  discopt-stronger  {key:38s}  discopt={d:8s} suspect={s}")
print()
for key, d, s in [(k, d, s) for k, d, s, c in disagreements if c == "suspect_stronger"]:
    print(f"  suspect-stronger  {key:38s}  discopt={d:8s} suspect={s}")

Where the two detectors differ:

  discopt-stronger  euclidean_norm::objective               discopt=convex   suspect=unknown
  discopt-stronger  quad_over_affine::objective             discopt=convex   suspect=unknown
  discopt-stronger  exp_perspective::objective              discopt=convex   suspect=unknown
  discopt-stronger  quad_over_affine_epigraph::qoa          discopt=convex   suspect=unknown
  discopt-stronger  norm_le::soc                            discopt=convex   suspect=unknown

  suspect-stronger  sin_convex_branch::objective            discopt=unknown  suspect=convex
  suspect-stronger  cos_concave_branch::objective           discopt=unknown  suspect=convex
  suspect-stronger  asin_convex_branch::objective           discopt=unknown  suspect=convex
  suspect-stronger  acos_concave_branch::objective          discopt=unknown  suspect=convex
  suspect-stronger  atan_concave_branch::objective          discopt=unknown  suspect=convex

Reading the disagreements#

The 5 discopt-stronger items are cone primitives SUSPECT 2.1.3 does not recognise, proven by discopt’s special-pattern recognizers:

euclidean_norm — sqrt of a PSD quadratic (the 2-norm),
quad_over_affine — x²/y with y > 0,
exp_perspective — y·exp(x/y), the perspective of exp,
quad_over_affine_epigraph — an nlp_cvx_108-style fractional epigraph,
norm_le — a second-order-cone constraint.

The 5 SUSPECT-stronger items are SUSPECT’s bound-aware trig / inverse-trig rules — on a sign-restricted branch it proves sin convex on [π, 2π], cos concave on [-π/2, π/2], and asin/acos/atan curvature on (0,1) / x>0, which discopt’s detector currently leaves UNKNOWN. These are tracked detector gaps: the parity test pins them in KNOWN_SUSPECT_STRONGER, so a newly appearing gap fails loudly rather than being silently tolerated, and closing one (adding the bound-aware trig curvature rule) is a clean follow-up.

Monotonicity and interval bounds#

The other two axes are compared the same way (see python/tests/test_monotonicity_suspect_parity.py and test_fbbt_bounds_suspect_parity.py), but because SUSPECT is heuristic there, every discopt verdict is also validated against a dense numeric sampling of the real body — the comparison rests on discopt’s own soundness, not on trusting SUSPECT.

Axis	agree	discopt-stronger	suspect-stronger	contradiction
Convexity	33	5	5	0
Monotonicity	40	0	3	0
Bounds	— sound on all 43 items, 0 disjoint enclosures —

On monotonicity the 3 SUSPECT-stronger items are instructive: sqrt_concave and cubic are sound discopt conservatism (a domain-edge / sub-ULP abstention), while sine is a case where SUSPECT claims monotonicity that is false — and discopt correctly abstains. For bounds, discopt’s forward enclosure is tighter on quadratics / powers / tan (which SUSPECT leaves unbounded) and looser on constraint bodies (where SUSPECT folds in the RHS via full FBBT); the two enclosures always overlap.

Performance: the sparse interval-Hessian#

The convexity certificate’s cost is dominated by the interval Hessian. A naive implementation materialises a dense n×n interval matrix at every node of the expression DAG — O(N·n²) for an N-term separable expression, even though such a Hessian has only O(N) nonzeros. discopt instead carries the gradient and Hessian as sparse maps of scalar intervals (a missing key is an exact structural zero) and materialises the single n×n matrix once, at the root, only when the Gershgorin eigenvalue bound [Gershgorin, 1931] needs it. This turns certify_convex from O(N·n²) to O(N) while preserving the proof (outward rounding everywhere, the αBB interval-Hessian foundation of [Adjiman et al., 1998] intact).

On a separable convex sum over N variables, per-node cost goes from climbing with n (the O(N·n²) signature) to flat:

N nodes	dense µs/node	sparse µs/node	total speedup
13	26.6	14.2	1.9×
193	65.2	13.2	4.9×
385	176.9	13.7	12.9×
769	625.0	14.0	44.5×

The harness that produced these numbers is scripts/detector_timing.py. The upshot for the roadmap: the cost that looked like the case for a Rust port was a representation choice fixable in Python, so the port is not warranted on current evidence.

Takeaways#

On a shared corpus, discopt and SUSPECT agree wherever both are sound, with zero contradictions — independent corroboration that discopt’s detector is correct in the wild.
discopt is stronger on cone primitives (norms, quadratic-over-affine, perspectives, SOC) that SUSPECT 2.1.3 misses.
SUSPECT is stronger on bound-aware trig curvature — five tracked, pinned detector gaps that are clean follow-ups.
Where SUSPECT’s heuristics are unsound (monotonicity/bounds of trig), discopt correctly abstains; its verdicts are validated by numeric sampling.

The comparison is enforced in CI by the three *_suspect_parity.py suites, with the full write-up in scripts/suspect_oracle/README.md.