Neural Network Embedding#
This tutorial shows how to embed a trained neural network as algebraic constraints in a discopt optimization model, enabling optimization over ML surrogates with global optimality guarantees.
The approach follows the ideas from OMLT [Ceccon et al., 2022] and the strong MIP formulations of [Anderson et al., 2020], adapted to discopt’s expression DAG, JAX automatic differentiation, and McCormick relaxations for spatial branch-and-bound.
Overview#
The discopt.nn module provides:
NetworkDefinition: An intermediate representation for sequential feedforward networks.NNFormulation: A builder that embeds the network into adiscopt.Modelas variables and constraints.Three formulation strategies:
full_space— Explicit pre/post-activation variables with smooth activation constraints (sigmoid, tanh, softplus).relu_bigm— Big-M MILP formulation for ReLU networks using binary variables.reduced_space— Nested expressions with no intermediate variables (small networks).
load_onnx: Load trained models from ONNX format (covers scikit-learn, PyTorch, Keras exports).
Example: Optimize over a trained neural network#
We’ll train a small neural network to approximate a function, then find the input that minimizes the network’s output subject to constraints.
import discopt.modeling as dm
import numpy as np
from discopt.nn import (
Activation,
DenseLayer,
NetworkDefinition,
NNFormulation,
)
Step 1: Define a trained network#
Here we manually specify a small 2-layer ReLU network.
In practice, you would train this with scikit-learn, PyTorch, etc.
and load via load_onnx().
# A small 2-input, 4-hidden, 1-output ReLU network
np.random.seed(0)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.random.randn(4) * 0.1
W2 = np.random.randn(4, 1) * 0.5
b2 = np.random.randn(1) * 0.1
net = NetworkDefinition(
layers=[
DenseLayer(W1, b1, Activation.RELU),
DenseLayer(W2, b2, Activation.LINEAR),
],
input_bounds=(np.array([-2.0, -2.0]), np.array([2.0, 2.0])),
)
print(f"Network: {net.input_size} -> {net.layers[0].n_outputs} -> {net.output_size}")
print(f"Forward pass at [0, 0]: {net.forward(np.array([0.0, 0.0]))}")
Network: 2 -> 4 -> 1
Forward pass at [0, 0]: [0.17936536]
Step 2: Embed in a discopt model with big-M formulation#
The relu_bigm strategy introduces binary variables for each ReLU neuron
where the pre-activation value can be positive or negative.
Interval arithmetic bound propagation computes tight big-M constants.
m = dm.Model("nn_optimization")
nn = NNFormulation(m, net, strategy="relu_bigm", prefix="surrogate")
nn.formulate()
# Minimize the network output
m.minimize(nn.outputs[0])
# Add constraints on the inputs
m.subject_to(nn.inputs[0] + nn.inputs[1] >= -1.0, name="sum_lb")
print(m.summary())
Model: nn_optimization
Variables: 15 (11 continuous, 4 integer/binary)
Constraints: 22
Objective: minimize surrogate_zhat_1[0]
Parameters: 0
result = m.solve(time_limit=60)
print(f"Status: {result.status}")
print(f"Optimal objective: {result.objective:.6f}")
print(f"Optimal inputs: {result.value(nn.inputs)}")
print(f"Network output: {result.value(nn.outputs)}")
# Verify against numpy forward pass
x_opt = result.value(nn.inputs)
nn_check = net.forward(x_opt)
print(f"NumPy forward check: {nn_check}")
Status: optimal
Optimal objective: 0.149408
Optimal inputs: [-2. 1.90020791]
Network output: [0.14940791]
NumPy forward check: [0.14940791]
Step 3: Compare with smooth formulation#
For networks with smooth activations (sigmoid, tanh, softplus),
the full_space strategy creates nonlinear constraints that get
McCormick relaxations and JAX automatic differentiation.
# Same network structure but with tanh activation
net_smooth = NetworkDefinition(
layers=[
DenseLayer(W1, b1, Activation.TANH),
DenseLayer(W2, b2, Activation.LINEAR),
],
input_bounds=(np.array([-2.0, -2.0]), np.array([2.0, 2.0])),
)
m2 = dm.Model("smooth_nn")
nn2 = NNFormulation(m2, net_smooth, strategy="full_space")
nn2.formulate()
m2.minimize(nn2.outputs[0])
result2 = m2.solve(time_limit=60)
print(f"Status: {result2.status}")
print(f"Optimal objective: {result2.objective:.6f}")
print(f"Optimal inputs: {result2.value(nn2.inputs)}")
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
Status: optimal
Optimal objective: -0.579405
Optimal inputs: [-1.99999999 -0.90869585]
Formulation strategies#
Strategy |
Activations |
Variables |
Method |
|---|---|---|---|
|
sigmoid, tanh, softplus |
Explicit per neuron |
NLP with McCormick relaxations |
|
ReLU (+ smooth) |
Binary per ReLU neuron |
MILP big-M constraints |
|
All |
None (nested expressions) |
NLP, best for small nets |
Choose relu_bigm for ReLU networks (most common in practice).
Choose full_space for smooth networks when you want tight McCormick
relaxations for global optimization. Choose reduced_space for very
small networks where compilation overhead matters.