discopt Advanced Features#
This notebook showcases the advanced capabilities that distinguish discopt from traditional MINLP solvers:
Differentiable optimization — Compute sensitivities of optimal solutions to parameters
GPU acceleration — Batch relaxation evaluation with JAX vmap
Piecewise McCormick relaxations — Tighter convex underestimators for nonlinear terms
GNN branching — Learned variable selection in Branch & Bound
Cutting planes — RLT and outer approximation cuts
Multi-start heuristics — Feasibility pump and diverse starting points
CUTEst interface — Loading standard NLP test problems
Prerequisites: familiar with the basics from notebooks/quickstart.ipynb.
import os
os.environ["JAX_PLATFORMS"] = "cpu"
os.environ["JAX_ENABLE_X64"] = "1"
import discopt.modeling as dm
import numpy as np
print("discopt loaded successfully")
discopt loaded successfully
1. Differentiable Optimization#
discopt can differentiate the optimal objective value with respect to model parameters. This is useful for sensitivity analysis, bilevel optimization, and end-to-end learning.
Level 1: Envelope Theorem#
For a parametric NLP \(\min_x f(x; p)\) subject to \(g(x; p) \leq 0\), the sensitivity of the optimal value is given by the envelope theorem [Fiacco, 1983]:
where \(\lambda^*\) are the optimal dual variables (Lagrange multipliers) returned by Ipopt. No additional linear system solve is needed.
Example: how does optimal cost change with the price parameter?#
from discopt._jax.differentiable import differentiable_solve
m = dm.Model("parametric_cost")
x = m.continuous("x", lb=0, ub=5)
y = m.continuous("y", lb=0, ub=5)
price = m.parameter("price", value=3.0)
m.minimize(price * x + y**2)
m.subject_to(x + y >= 2)
result = differentiable_solve(m)
print(f"Status: {result.status}")
print(f"Objective: {result.objective:.6f}")
print(f"x* = {result.x['x'].item():.6f}")
print(f"y* = {result.x['y'].item():.6f}")
print(f"d(obj*)/d(price) = {result.gradient(price):.6f}")
print()
print("Interpretation: increasing price by 1 changes optimal cost by ~d(obj*)/d(price)")
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
Status: optimal
Objective: 3.750000
x* = 0.500000
y* = 1.500000
d(obj*)/d(price) = 0.500000
Interpretation: increasing price by 1 changes optimal cost by ~d(obj*)/d(price)
Tracing sensitivity across a range of parameter values#
Let’s sweep the price parameter and see how the optimal objective changes.
prices = np.linspace(0.5, 5.0, 10)
objectives = []
sensitivities = []
for p_val in prices:
price.value = np.float64(p_val)
res = differentiable_solve(m)
objectives.append(res.objective)
sensitivities.append(res.gradient(price))
print(f"{'Price':>8s} {'Obj*':>10s} {'dObj*/dPrice':>14s}")
print("-" * 34)
for p, o, s in zip(prices, objectives, sensitivities):
print(f"{p:8.2f} {o:10.4f} {s:14.6f}")
Price Obj* dObj*/dPrice
----------------------------------
0.50 0.9375 1.750000
1.00 1.7500 1.500000
1.50 2.4375 1.250000
2.00 3.0000 1.000000
2.50 3.4375 0.750000
3.00 3.7500 0.500000
3.50 3.9375 0.250000
4.00 4.0000 0.000047
4.50 4.0000 -0.000000
5.00 4.0000 -0.000000
Level 3: Implicit Differentiation via KKT System#
For more accurate sensitivities (including dx*/dp, not just dobj*/dp), use L3 implicit differentiation through the KKT conditions [Agrawal et al., 2019, Amos and Kolter, 2017].
The L3 result also caches the KKT factorization for sIPOPT-style sensitivity queries [Pirnay et al., 2012] — see the dedicated Sensitivity Analysis notebook for approximate_resolve, dual_sensitivity, reduced_hessian, and sensitivity methods.
from discopt._jax.differentiable import differentiable_solve_l3
# Reset parameter
price.value = np.float64(3.0)
result_l3 = differentiable_solve_l3(m)
print(f"Status: {result_l3.status}")
print(f"Objective: {result_l3.objective:.6f}")
print()
print("L1 gradient (envelope theorem):")
print(f" d(obj*)/d(price) = {result_l3.gradient(price):.6f}")
print()
print("L3 gradient (implicit differentiation):")
print(f" d(obj*)/d(price) = {result_l3.implicit_gradient(price):.6f}")
print()
sm = result_l3.sensitivity_matrix()
if sm is not None:
print(f"Full dx*/dp matrix shape: {sm.shape}")
print(f" dx*/d(price) = {sm[:, 0]}")
Status: optimal
Objective: 3.750000
L1 gradient (envelope theorem):
d(obj*)/d(price) = 0.500000
L3 gradient (implicit differentiation):
d(obj*)/d(price) = 0.500000
Full dx*/dp matrix shape: (2, 1)
dx*/d(price) = [-0.5 0.5]
JAX-native differentiable solve (custom_jvp)#
For embedding optimization inside a larger JAX computation, use _make_jax_differentiable_solve which returns a function compatible with jax.grad.
import jax
from discopt._jax.differentiable import _flatten_params, _make_jax_differentiable_solve
# Build a differentiable solve function
price.value = np.float64(3.0)
solve_fn = _make_jax_differentiable_solve(m)
# Get the flat parameter vector
p_flat = _flatten_params(m)
print(f"Parameter vector: {p_flat}")
# Evaluate and differentiate with jax.grad
obj_star = solve_fn(p_flat)
grad_fn = jax.grad(solve_fn)
grad_val = grad_fn(p_flat)
print(f"Optimal objective via JAX: {float(obj_star):.6f}")
print(f"Gradient via jax.grad: {float(grad_val[0]):.6f}")
Parameter vector: [3.]
Optimal objective via JAX: 3.750000
Gradient via jax.grad: 0.500001
2. GPU Acceleration#
discopt’s JAX-based architecture enables GPU acceleration for:
Automatic differentiation (gradient, Hessian, Jacobian computation)
McCormick relaxation evaluation via
jax.vmapBatch NLP solving with the pure-JAX IPM [Nocedal and Wright, 2006]
To use GPU, set JAX_PLATFORMS=gpu at the top of your script (before importing JAX). On macOS, only CPU is available.
os.environ["JAX_PLATFORMS"] = "gpu" # Use NVIDIA GPU
The nlp_solver="ipm" backend is a pure-JAX IPM that supports jax.vmap for solving multiple NLP relaxations simultaneously.
# Demonstrate batch IPM: solve the same problem with different bounds
# This is what happens internally during Branch & Bound
import time
m = dm.Model("batch_demo")
x = m.continuous("x", lb=0, ub=5)
y = m.continuous("y", lb=0, ub=5)
z = m.binary("z")
m.minimize(x**2 + y**2 + z)
m.subject_to(x + y >= 1)
# With nlp_solver="ipm" and batch_size > 1, the solver uses vmap'd IPM
t0 = time.perf_counter()
result_ipm = m.solve(nlp_solver="ipm", batch_size=8)
t_ipm = time.perf_counter() - t0
print("IPM backend:")
print(f" Status: {result_ipm.status}")
print(f" Objective: {result_ipm.objective}")
print(f" Time: {t_ipm:.3f}s")
print(f" Nodes: {result_ipm.node_count}")
IPM backend:
Status: optimal
Objective: 0.500000000011067
Time: 0.416s
Nodes: 1
3. Piecewise McCormick Relaxations#
Standard McCormick relaxations [McCormick, 1976] can be loose for highly nonlinear functions. Piecewise McCormick [Bergamini et al., 2005, Castro, 2015] divides the variable domain into \(k\) partitions and computes tighter relaxations on each piece.
Use the partitions parameter in solve():
partitions=0(default): standard McCormick relaxationpartitions=kfor \(k > 0\): \(k\) partitions per variable for tighter bounds
Tighter relaxations mean smaller gaps and fewer Branch & Bound nodes, at the cost of more work per node.
# Compare standard vs. piecewise McCormick on a nonlinear MINLP
m = dm.Model("mccormick_demo")
x = m.continuous("x", lb=0.1, ub=5)
y = m.continuous("y", lb=0.1, ub=5)
z = m.binary("z")
m.minimize(dm.exp(x) * y + z)
m.subject_to(x + y >= 2)
m.subject_to(x * y <= 5 * z + 1)
# Standard relaxation
result_std = m.solve(partitions=0, max_nodes=500)
print("Standard McCormick:")
print(f" Status: {result_std.status}, Obj: {result_std.objective}, Nodes: {result_std.node_count}")
# Piecewise McCormick with 4 partitions
result_pw = m.solve(partitions=4, max_nodes=500)
print("\nPiecewise McCormick (k=4):")
print(f" Status: {result_pw.status}, Obj: {result_pw.objective}, Nodes: {result_pw.node_count}")
Standard McCormick:
Status: optimal, Obj: 0.6685893748848494, Nodes: 1
Piecewise McCormick (k=4):
Status: optimal, Obj: 0.6685893748848494, Nodes: 1
4. GNN Branching#
discopt includes a Graph Neural Network (GNN) for learned variable selection during Branch & Bound, following the architecture of Gasse et al. [2019].
The GNN operates on a bipartite variable-constraint graph:
Variable nodes: features include value, bounds, fractionality
Constraint nodes: features include slack, violation
Edges: encode which variables appear in which constraints
After 2 rounds of message passing, the GNN scores each integer variable. The variable with the highest score is selected for branching.
Use branching_policy="gnn" to enable it. Currently this runs inference with random weights (no pre-trained model), serving as a demonstration of the architecture. With trained weights it can significantly outperform most-fractional branching.
m = dm.Model("gnn_branching_demo")
x1 = m.continuous("x1", lb=0, ub=5)
x2 = m.continuous("x2", lb=0, ub=5)
y1 = m.binary("y1")
y2 = m.binary("y2")
m.minimize(x1**2 + x2**2 + 2 * y1 + 3 * y2)
m.subject_to(x1 + x2 >= 1)
m.subject_to(x1 <= 4 * y1)
m.subject_to(x2 <= 4 * y2)
# Standard branching
result_frac = m.solve(branching_policy="fractional", max_nodes=200)
print("Fractional branching:")
print(
f" Status: {result_frac.status}, Obj: {result_frac.objective}, Nodes: {result_frac.node_count}"
)
# GNN branching (inference with random weights for demonstration)
result_gnn = m.solve(branching_policy="gnn", max_nodes=200)
print("\nGNN branching:")
print(f" Status: {result_gnn.status}, Obj: {result_gnn.objective}, Nodes: {result_gnn.node_count}")
Fractional branching:
Status: optimal, Obj: 3.000000000065412, Nodes: 7
GNN branching:
Status: optimal, Obj: 3.000000000065412, Nodes: 7
5. Cutting Planes#
discopt supports two types of cutting planes to tighten relaxations:
RLT (Reformulation-Linearization Technique) [Sherali and Adams, 1990]: McCormick linearization envelopes for bilinear terms \(x_i \cdot x_j\)
OA (Outer Approximation) [Duran and Grossmann, 1986, Fletcher and Leyffer, 1994]: gradient-based tangent hyperplanes at NLP relaxation solutions
Enable with cutting_planes=True. Cuts are accumulated across B&B iterations.
m = dm.Model("cutting_planes_demo")
x1 = m.continuous("x1", lb=0, ub=5)
x2 = m.continuous("x2", lb=0, ub=5)
y = m.binary("y")
m.minimize(x1**2 + x2**2 + y)
m.subject_to(x1 * x2 >= 1) # bilinear term -> RLT cuts apply
m.subject_to(x1 + x2 <= 5 * y)
# Without cutting planes
r1 = m.solve(cutting_planes=False, max_nodes=200)
print("Without cutting planes:")
print(f" Status: {r1.status}, Obj: {r1.objective}, Nodes: {r1.node_count}")
# With cutting planes
r2 = m.solve(cutting_planes=True, max_nodes=200)
print("\nWith cutting planes (RLT + OA):")
print(f" Status: {r2.status}, Obj: {r2.objective}, Nodes: {r2.node_count}")
Without cutting planes:
Status: optimal, Obj: 2.9999896593231137, Nodes: 13
With cutting planes (RLT + OA):
Status: optimal, Obj: 2.9999896593231137, Nodes: 13
6. Multi-Start Heuristics#
For difficult MINLP problems, finding a good initial feasible solution is critical. discopt provides two heuristic strategies:
Multi-start NLP#
Launches NLP solves from diverse starting points generated by stratified random sampling. Tracks the best feasible (and integer-feasible) solution found.
Feasibility Pump#
The feasibility pump [Bonami et al., 2008, Fischetti et al., 2005] iteratively rounds and re-solves to find integer-feasible solutions:
Round integer variables to the nearest integer
Fix integer variables and re-solve the NLP for continuous variables
If feasible, return. Otherwise perturb and retry.
from discopt._jax.primal_heuristics import MultiStartNLP
m = dm.Model("multistart_demo")
x = m.continuous("x", lb=-5, ub=5)
y = m.continuous("y", lb=-5, ub=5)
# Rosenbrock: notoriously hard for single-start solvers due to banana valley
m.minimize((1 - x) ** 2 + 100 * (y - x**2) ** 2)
ms = MultiStartNLP(m, n_starts=16, seed=42)
ms_result = ms.solve()
print("Multi-start results:")
print(f" Starts launched: {ms_result.n_starts}")
print(f" Feasible solutions: {ms_result.n_feasible}")
print(f" Best objective: {ms_result.best_objective:.8f} (expected: 0.0)")
if ms_result.all_objectives:
print(f" Worst among feasible: {max(ms_result.all_objectives):.4f}")
print(f" Best among feasible: {min(ms_result.all_objectives):.8f}")
Multi-start results:
Starts launched: 16
Feasible solutions: 16
Best objective: 0.00000000 (expected: 0.0)
Worst among feasible: 0.0000
Best among feasible: 0.00000000
from discopt._jax.nlp_evaluator import NLPEvaluator
from discopt._jax.primal_heuristics import feasibility_pump
from discopt.solvers.nlp_ipopt import solve_nlp
# Build an MINLP and get the NLP relaxation solution
m2 = dm.Model("pump_demo")
x = m2.continuous("x", lb=0, ub=5)
n = m2.integer("n", lb=0, ub=3)
m2.minimize((x - 1.7) ** 2 + (n - 2) ** 2)
m2.subject_to(x + n <= 4)
# Solve the NLP relaxation (ignoring integrality)
evaluator = NLPEvaluator(m2)
lb, ub = evaluator.variable_bounds
x0 = 0.5 * (np.clip(lb, -10, 10) + np.clip(ub, -10, 10))
nlp_result = solve_nlp(evaluator, x0, options={"print_level": 0})
print(f"NLP relaxation: x={nlp_result.x[0]:.4f}, n={nlp_result.x[1]:.4f} (fractional!)")
# Run feasibility pump to get an integer-feasible solution
x_feas = feasibility_pump(m2, nlp_result.x)
if x_feas is not None:
print(f"Feasibility pump: x={x_feas[0]:.4f}, n={x_feas[1]:.4f} (integer-feasible)")
else:
print("Feasibility pump did not find an integer-feasible solution")
NLP relaxation: x=1.7000, n=2.0000 (fractional!)
Feasibility pump: x=1.7000, n=2.0000 (integer-feasible)
7. CUTEst Interface#
discopt can load and solve problems from the CUTEst [Gould et al., 2015] collection, the standard test set for nonlinear optimization with over 1500 problems.
Requirements: pycutest is an optional dependency.
pip install discopt[cutest]
You also need gfortran and the CUTEst/SIFDecode libraries. See the pycutest documentation for setup.
API overview#
from discopt.interfaces.cutest import CUTEstProblem, list_cutest_problems
# List available problems matching criteria
names = list_cutest_problems(constraints="U", max_n=10) # unconstrained, <= 10 vars
# Load a specific problem
prob = CUTEstProblem("ROSENBR")
print(prob.info) # metadata: n, m, classification
print(prob.x0) # starting point
print(prob.bl, prob.bu) # variable bounds
# Create an evaluator for use with discopt's solvers
evaluator = prob.to_evaluator()
obj = evaluator.evaluate_objective(prob.x0)
grad = evaluator.evaluate_gradient(prob.x0)
The NLPEvaluatorFromCUTEst provides the same interface as the JAX-based NLPEvaluator, so CUTEst problems can be solved with Ipopt or the pure-JAX IPM.
# Demonstrate the CUTEst interface (only runs if pycutest + CUTEst are installed)
try:
from discopt.interfaces.cutest import CUTEstProblem
prob = CUTEstProblem("ROSENBR")
print(f"Problem: {prob.name}")
print(f" Variables: {prob.n}")
print(f" Constraints: {prob.m}")
print(f" x0 = {prob.x0}")
print(f" Classification: {prob.info.classification}")
evaluator = prob.to_evaluator()
obj = evaluator.evaluate_objective(prob.x0)
grad = evaluator.evaluate_gradient(prob.x0)
print(f" f(x0) = {obj:.6f}")
print(f" ||grad|| = {np.linalg.norm(grad):.6f}")
except ImportError:
print("pycutest is not installed. Install with: pip install discopt[cutest]")
print("Skipping CUTEst demo.")
except RuntimeError as e:
print(f"CUTEst libraries not found: {e}")
print("Install CUTEst and set the CUTEST environment variable.")
print("See: https://jfowkes.github.io/pycutest/")
pycutest is not installed. Install with: pip install discopt[cutest]
Skipping CUTEst demo.
Summary of Advanced Solver Options#
All advanced features are controlled through m.solve() parameters or dedicated API functions:
Feature |
How to Use |
|---|---|
Differentiable solving (L1) |
|
Differentiable solving (L3) |
|
sIPOPT sensitivity queries |
|
JAX-native differentiation |
|
GPU acceleration |
|
Pure-JAX IPM |
|
Batch NLP solving |
|
Piecewise McCormick |
|
GNN branching |
|
Cutting planes |
|
Multi-start |
|
Feasibility pump |
|
CUTEst problems |
|
See the Sensitivity Analysis notebook for detailed examples of the sIPOPT-style features [Pirnay et al., 2012].