discopt Advanced Features

discopt Advanced Features#

This notebook showcases the advanced capabilities that distinguish discopt from traditional MINLP solvers:

Differentiable optimization — Compute sensitivities of optimal solutions to parameters
GPU acceleration — Batch relaxation evaluation with JAX vmap
Piecewise McCormick relaxations — Tighter convex underestimators for nonlinear terms
GNN branching — Learned variable selection in Branch & Bound
Cutting planes — RLT and outer approximation cuts
Multi-start heuristics — Feasibility pump and diverse starting points
CUTEst interface — Loading standard NLP test problems

Prerequisites: familiar with the basics from notebooks/quickstart.ipynb.

import os

os.environ["JAX_PLATFORMS"] = "cpu"
os.environ["JAX_ENABLE_X64"] = "1"

import discopt.modeling as dm
import numpy as np

print("discopt loaded successfully")

discopt loaded successfully

1. Differentiable Optimization#

discopt can differentiate the optimal objective value with respect to model parameters. This is useful for sensitivity analysis, bilevel optimization, and end-to-end learning.

Level 1: Envelope Theorem#

For a parametric NLP \(\min_x f(x; p)\) subject to \(g(x; p) \leq 0\), the sensitivity of the optimal value is given by the envelope theorem [Fiacco, 1983]:

\[\frac{d f^*}{d p} = \frac{\partial L}{\partial p}\bigg|_{x^*, \lambda^*} = \frac{\partial f}{\partial p}\bigg|_{x^*} + \lambda^{*T} \frac{\partial g}{\partial p}\bigg|_{x^*}\]

where \(\lambda^*\) are the optimal dual variables (Lagrange multipliers) returned by Ipopt. No additional linear system solve is needed.

Example: how does optimal cost change with the price parameter?#

\[\min_{x, y} \; p \cdot x + y^2 \quad \text{s.t.} \quad x + y \geq 2, \; x, y \in [0, 5]\]

from discopt._jax.differentiable import differentiable_solve

m = dm.Model("parametric_cost")
x = m.continuous("x", lb=0, ub=5)
y = m.continuous("y", lb=0, ub=5)
price = m.parameter("price", value=3.0)

m.minimize(price * x + y**2)
m.subject_to(x + y >= 2)

result = differentiable_solve(m)
print(f"Status:    {result.status}")
print(f"Objective: {result.objective:.6f}")
print(f"x* = {result.x['x'].item():.6f}")
print(f"y* = {result.x['y'].item():.6f}")
print(f"d(obj*)/d(price) = {result.gradient(price):.6f}")
print()
print("Interpretation: increasing price by 1 changes optimal cost by ~d(obj*)/d(price)")

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

Status:    optimal
Objective: 3.750000
x* = 0.500000
y* = 1.500000
d(obj*)/d(price) = 0.500000

Interpretation: increasing price by 1 changes optimal cost by ~d(obj*)/d(price)

Tracing sensitivity across a range of parameter values#

Let’s sweep the price parameter and see how the optimal objective changes.

prices = np.linspace(0.5, 5.0, 10)
objectives = []
sensitivities = []

for p_val in prices:
    price.value = np.float64(p_val)
    res = differentiable_solve(m)
    objectives.append(res.objective)
    sensitivities.append(res.gradient(price))

print(f"{'Price':>8s} {'Obj*':>10s} {'dObj*/dPrice':>14s}")
print("-" * 34)
for p, o, s in zip(prices, objectives, sensitivities):
    print(f"{p:8.2f} {o:10.4f} {s:14.6f}")

   Price       Obj*   dObj*/dPrice
----------------------------------
50     0.9375       1.750000
00     1.7500       1.500000
50     2.4375       1.250000
00     3.0000       1.000000
50     3.4375       0.750000
00     3.7500       0.500000
50     3.9375       0.250000
00     4.0000       0.000047
50     4.0000      -0.000000
00     4.0000      -0.000000

Level 3: Implicit Differentiation via KKT System#

For more accurate sensitivities (including dx*/dp, not just dobj*/dp), use L3 implicit differentiation through the KKT conditions [Agrawal et al., 2019, Amos and Kolter, 2017].

\[\begin{split}\begin{bmatrix} H_{xx} & J_a^T \\ J_a & 0 \end{bmatrix} \begin{bmatrix} dx/dp \\ d\lambda/dp \end{bmatrix} = \begin{bmatrix} -H_{xp} \\ -\partial g_a/\partial p \end{bmatrix}\end{split}\]

The L3 result also caches the KKT factorization for sIPOPT-style sensitivity queries [Pirnay et al., 2012] — see the dedicated Sensitivity Analysis notebook for approximate_resolve, dual_sensitivity, reduced_hessian, and sensitivity methods.

from discopt._jax.differentiable import differentiable_solve_l3

# Reset parameter
price.value = np.float64(3.0)

result_l3 = differentiable_solve_l3(m)
print(f"Status:         {result_l3.status}")
print(f"Objective:      {result_l3.objective:.6f}")
print()
print("L1 gradient (envelope theorem):")
print(f"  d(obj*)/d(price) = {result_l3.gradient(price):.6f}")
print()
print("L3 gradient (implicit differentiation):")
print(f"  d(obj*)/d(price) = {result_l3.implicit_gradient(price):.6f}")
print()
sm = result_l3.sensitivity_matrix()
if sm is not None:
    print(f"Full dx*/dp matrix shape: {sm.shape}")
    print(f"  dx*/d(price) = {sm[:, 0]}")

Status:         optimal
Objective:      3.750000

L1 gradient (envelope theorem):
  d(obj*)/d(price) = 0.500000

L3 gradient (implicit differentiation):
  d(obj*)/d(price) = 0.500000

Full dx*/dp matrix shape: (2, 1)
  dx*/d(price) = [-0.5  0.5]

JAX-native differentiable solve (custom_jvp)#

For embedding optimization inside a larger JAX computation, use _make_jax_differentiable_solve which returns a function compatible with jax.grad.

import jax
from discopt._jax.differentiable import _flatten_params, _make_jax_differentiable_solve

# Build a differentiable solve function
price.value = np.float64(3.0)
solve_fn = _make_jax_differentiable_solve(m)

# Get the flat parameter vector
p_flat = _flatten_params(m)
print(f"Parameter vector: {p_flat}")

# Evaluate and differentiate with jax.grad
obj_star = solve_fn(p_flat)
grad_fn = jax.grad(solve_fn)
grad_val = grad_fn(p_flat)

print(f"Optimal objective via JAX: {float(obj_star):.6f}")
print(f"Gradient via jax.grad:     {float(grad_val[0]):.6f}")

Parameter vector: [3.]

Optimal objective via JAX: 3.750000
Gradient via jax.grad:     0.500001

2. GPU Acceleration#

discopt’s JAX-based architecture enables GPU acceleration for:

Automatic differentiation (gradient, Hessian, Jacobian computation)
McCormick relaxation evaluation via jax.vmap
Batch NLP solving with the pure-JAX IPM [Nocedal and Wright, 2006]

To use GPU, set JAX_PLATFORMS=gpu at the top of your script (before importing JAX). On macOS, only CPU is available.

os.environ["JAX_PLATFORMS"] = "gpu"  # Use NVIDIA GPU

The nlp_solver="ipm" backend is a pure-JAX IPM that supports jax.vmap for solving multiple NLP relaxations simultaneously.

# Demonstrate batch IPM: solve the same problem with different bounds
# This is what happens internally during Branch & Bound
import time

m = dm.Model("batch_demo")
x = m.continuous("x", lb=0, ub=5)
y = m.continuous("y", lb=0, ub=5)
z = m.binary("z")

m.minimize(x**2 + y**2 + z)
m.subject_to(x + y >= 1)

# With nlp_solver="ipm" and batch_size > 1, the solver uses vmap'd IPM
t0 = time.perf_counter()
result_ipm = m.solve(nlp_solver="ipm", batch_size=8)
t_ipm = time.perf_counter() - t0

print("IPM backend:")
print(f"  Status:    {result_ipm.status}")
print(f"  Objective: {result_ipm.objective}")
print(f"  Time:      {t_ipm:.3f}s")
print(f"  Nodes:     {result_ipm.node_count}")

IPM backend:
  Status:    optimal
  Objective: 0.500000000011067
  Time:      0.416s
  Nodes:     1

3. Piecewise McCormick Relaxations#

Standard McCormick relaxations [McCormick, 1976] can be loose for highly nonlinear functions. Piecewise McCormick [Bergamini et al., 2005, Castro, 2015] divides the variable domain into \(k\) partitions and computes tighter relaxations on each piece.

Use the partitions parameter in solve():

partitions=0 (default): standard McCormick relaxation
partitions=k for \(k > 0\): \(k\) partitions per variable for tighter bounds

Tighter relaxations mean smaller gaps and fewer Branch & Bound nodes, at the cost of more work per node.

# Compare standard vs. piecewise McCormick on a nonlinear MINLP
m = dm.Model("mccormick_demo")
x = m.continuous("x", lb=0.1, ub=5)
y = m.continuous("y", lb=0.1, ub=5)
z = m.binary("z")

m.minimize(dm.exp(x) * y + z)
m.subject_to(x + y >= 2)
m.subject_to(x * y <= 5 * z + 1)

# Standard relaxation
result_std = m.solve(partitions=0, max_nodes=500)
print("Standard McCormick:")
print(f"  Status: {result_std.status}, Obj: {result_std.objective}, Nodes: {result_std.node_count}")

# Piecewise McCormick with 4 partitions
result_pw = m.solve(partitions=4, max_nodes=500)
print("\nPiecewise McCormick (k=4):")
print(f"  Status: {result_pw.status}, Obj: {result_pw.objective}, Nodes: {result_pw.node_count}")

Standard McCormick:
  Status: optimal, Obj: 0.6685893748848494, Nodes: 1

Piecewise McCormick (k=4):
  Status: optimal, Obj: 0.6685893748848494, Nodes: 1

4. GNN Branching#

discopt includes a Graph Neural Network (GNN) for learned variable selection during Branch & Bound, following the architecture of Gasse et al. [2019].

The GNN operates on a bipartite variable-constraint graph:

Variable nodes: features include value, bounds, fractionality
Constraint nodes: features include slack, violation
Edges: encode which variables appear in which constraints

After 2 rounds of message passing, the GNN scores each integer variable. The variable with the highest score is selected for branching.

Use branching_policy="gnn" to enable it. Currently this runs inference with random weights (no pre-trained model), serving as a demonstration of the architecture. With trained weights it can significantly outperform most-fractional branching.

m = dm.Model("gnn_branching_demo")
x1 = m.continuous("x1", lb=0, ub=5)
x2 = m.continuous("x2", lb=0, ub=5)
y1 = m.binary("y1")
y2 = m.binary("y2")

m.minimize(x1**2 + x2**2 + 2 * y1 + 3 * y2)
m.subject_to(x1 + x2 >= 1)
m.subject_to(x1 <= 4 * y1)
m.subject_to(x2 <= 4 * y2)

# Standard branching
result_frac = m.solve(branching_policy="fractional", max_nodes=200)
print("Fractional branching:")
print(
    f"  Status: {result_frac.status}, Obj: {result_frac.objective}, Nodes: {result_frac.node_count}"
)

# GNN branching (inference with random weights for demonstration)
result_gnn = m.solve(branching_policy="gnn", max_nodes=200)
print("\nGNN branching:")
print(f"  Status: {result_gnn.status}, Obj: {result_gnn.objective}, Nodes: {result_gnn.node_count}")

Fractional branching:
  Status: optimal, Obj: 3.000000000065412, Nodes: 7

GNN branching:
  Status: optimal, Obj: 3.000000000065412, Nodes: 7

5. Cutting Planes#

discopt supports two types of cutting planes to tighten relaxations:

RLT (Reformulation-Linearization Technique) [Sherali and Adams, 1990]: McCormick linearization envelopes for bilinear terms \(x_i \cdot x_j\)
OA (Outer Approximation) [Duran and Grossmann, 1986, Fletcher and Leyffer, 1994]: gradient-based tangent hyperplanes at NLP relaxation solutions

Enable with cutting_planes=True. Cuts are accumulated across B&B iterations.

m = dm.Model("cutting_planes_demo")
x1 = m.continuous("x1", lb=0, ub=5)
x2 = m.continuous("x2", lb=0, ub=5)
y = m.binary("y")

m.minimize(x1**2 + x2**2 + y)
m.subject_to(x1 * x2 >= 1)  # bilinear term -> RLT cuts apply
m.subject_to(x1 + x2 <= 5 * y)

# Without cutting planes
r1 = m.solve(cutting_planes=False, max_nodes=200)
print("Without cutting planes:")
print(f"  Status: {r1.status}, Obj: {r1.objective}, Nodes: {r1.node_count}")

# With cutting planes
r2 = m.solve(cutting_planes=True, max_nodes=200)
print("\nWith cutting planes (RLT + OA):")
print(f"  Status: {r2.status}, Obj: {r2.objective}, Nodes: {r2.node_count}")

Without cutting planes:
  Status: optimal, Obj: 2.9999896593231137, Nodes: 13

With cutting planes (RLT + OA):
  Status: optimal, Obj: 2.9999896593231137, Nodes: 13

6. Multi-Start Heuristics#

For difficult MINLP problems, finding a good initial feasible solution is critical. discopt provides two heuristic strategies:

Multi-start NLP#

Launches NLP solves from diverse starting points generated by stratified random sampling. Tracks the best feasible (and integer-feasible) solution found.

Feasibility Pump#

The feasibility pump [Bonami et al., 2008, Fischetti et al., 2005] iteratively rounds and re-solves to find integer-feasible solutions:

Round integer variables to the nearest integer
Fix integer variables and re-solve the NLP for continuous variables
If feasible, return. Otherwise perturb and retry.

from discopt._jax.primal_heuristics import MultiStartNLP

m = dm.Model("multistart_demo")
x = m.continuous("x", lb=-5, ub=5)
y = m.continuous("y", lb=-5, ub=5)

# Rosenbrock: notoriously hard for single-start solvers due to banana valley
m.minimize((1 - x) ** 2 + 100 * (y - x**2) ** 2)

ms = MultiStartNLP(m, n_starts=16, seed=42)
ms_result = ms.solve()

print("Multi-start results:")
print(f"  Starts launched:     {ms_result.n_starts}")
print(f"  Feasible solutions:  {ms_result.n_feasible}")
print(f"  Best objective:      {ms_result.best_objective:.8f}  (expected: 0.0)")
if ms_result.all_objectives:
    print(f"  Worst among feasible: {max(ms_result.all_objectives):.4f}")
    print(f"  Best among feasible:  {min(ms_result.all_objectives):.8f}")

Multi-start results:
  Starts launched:     16
  Feasible solutions:  16
  Best objective:      0.00000000  (expected: 0.0)
  Worst among feasible: 0.0000
  Best among feasible:  0.00000000

from discopt._jax.nlp_evaluator import NLPEvaluator
from discopt._jax.primal_heuristics import feasibility_pump
from discopt.solvers.nlp_ipopt import solve_nlp

# Build an MINLP and get the NLP relaxation solution
m2 = dm.Model("pump_demo")
x = m2.continuous("x", lb=0, ub=5)
n = m2.integer("n", lb=0, ub=3)

m2.minimize((x - 1.7) ** 2 + (n - 2) ** 2)
m2.subject_to(x + n <= 4)

# Solve the NLP relaxation (ignoring integrality)
evaluator = NLPEvaluator(m2)
lb, ub = evaluator.variable_bounds
x0 = 0.5 * (np.clip(lb, -10, 10) + np.clip(ub, -10, 10))
nlp_result = solve_nlp(evaluator, x0, options={"print_level": 0})

print(f"NLP relaxation: x={nlp_result.x[0]:.4f}, n={nlp_result.x[1]:.4f} (fractional!)")

# Run feasibility pump to get an integer-feasible solution
x_feas = feasibility_pump(m2, nlp_result.x)
if x_feas is not None:
    print(f"Feasibility pump: x={x_feas[0]:.4f}, n={x_feas[1]:.4f} (integer-feasible)")
else:
    print("Feasibility pump did not find an integer-feasible solution")

NLP relaxation: x=1.7000, n=2.0000 (fractional!)

Feasibility pump: x=1.7000, n=2.0000 (integer-feasible)

7. CUTEst Interface#

discopt can load and solve problems from the CUTEst [Gould et al., 2015] collection, the standard test set for nonlinear optimization with over 1500 problems.

Requirements: pycutest is an optional dependency.

pip install discopt[cutest]

You also need gfortran and the CUTEst/SIFDecode libraries. See the pycutest documentation for setup.

API overview#

from discopt.interfaces.cutest import CUTEstProblem, list_cutest_problems

# List available problems matching criteria
names = list_cutest_problems(constraints="U", max_n=10)  # unconstrained, <= 10 vars

# Load a specific problem
prob = CUTEstProblem("ROSENBR")
print(prob.info)       # metadata: n, m, classification
print(prob.x0)         # starting point
print(prob.bl, prob.bu)  # variable bounds

# Create an evaluator for use with discopt's solvers
evaluator = prob.to_evaluator()
obj = evaluator.evaluate_objective(prob.x0)
grad = evaluator.evaluate_gradient(prob.x0)

The NLPEvaluatorFromCUTEst provides the same interface as the JAX-based NLPEvaluator, so CUTEst problems can be solved with Ipopt or the pure-JAX IPM.

# Demonstrate the CUTEst interface (only runs if pycutest + CUTEst are installed)
try:
    from discopt.interfaces.cutest import CUTEstProblem

    prob = CUTEstProblem("ROSENBR")
    print(f"Problem: {prob.name}")
    print(f"  Variables:   {prob.n}")
    print(f"  Constraints: {prob.m}")
    print(f"  x0 = {prob.x0}")
    print(f"  Classification: {prob.info.classification}")

    evaluator = prob.to_evaluator()
    obj = evaluator.evaluate_objective(prob.x0)
    grad = evaluator.evaluate_gradient(prob.x0)
    print(f"  f(x0)    = {obj:.6f}")
    print(f"  ||grad|| = {np.linalg.norm(grad):.6f}")

except ImportError:
    print("pycutest is not installed. Install with: pip install discopt[cutest]")
    print("Skipping CUTEst demo.")
except RuntimeError as e:
    print(f"CUTEst libraries not found: {e}")
    print("Install CUTEst and set the CUTEST environment variable.")
    print("See: https://jfowkes.github.io/pycutest/")

pycutest is not installed. Install with: pip install discopt[cutest]
Skipping CUTEst demo.

Summary of Advanced Solver Options#

All advanced features are controlled through m.solve() parameters or dedicated API functions:

Feature	How to Use
Differentiable solving (L1)	`differentiable_solve(model)`
Differentiable solving (L3)	`differentiable_solve_l3(model)`
sIPOPT sensitivity queries	`result.approximate_resolve()`, `result.dual_sensitivity()`, etc.
JAX-native differentiation	`_make_jax_differentiable_solve(model)`
GPU acceleration	`os.environ["JAX_PLATFORMS"] = "gpu"`
Pure-JAX IPM	`m.solve(nlp_solver="ipm")`
Batch NLP solving	`m.solve(nlp_solver="ipm", batch_size=16)`
Piecewise McCormick	`m.solve(partitions=4)`
GNN branching	`m.solve(branching_policy="gnn")`
Cutting planes	`m.solve(cutting_planes=True)`
Multi-start	`MultiStartNLP(model, n_starts=64).solve()`
Feasibility pump	`feasibility_pump(model, x_nlp)`
CUTEst problems	`CUTEstProblem("ROSENBR").to_evaluator()`

See the Sensitivity Analysis notebook for detailed examples of the sIPOPT-style features [Pirnay et al., 2012].