Neural Network Embedding#

This tutorial shows how to embed a trained neural network as algebraic constraints in a discopt optimization model, enabling optimization over ML surrogates with global optimality guarantees.

The approach follows the ideas from OMLT [Ceccon et al., 2022] and the strong MIP formulations of [Anderson et al., 2020], adapted to discopt’s expression DAG, JAX automatic differentiation, and McCormick relaxations for spatial branch-and-bound.

Overview#

The discopt.nn module provides:

  • NetworkDefinition: An intermediate representation for sequential feedforward networks.

  • NNFormulation: A builder that embeds the network into a discopt.Model as variables and constraints.

  • Three formulation strategies:

    • full_space — Explicit pre/post-activation variables with smooth activation constraints (sigmoid, tanh, softplus).

    • relu_bigm — Big-M MILP formulation for ReLU networks using binary variables.

    • reduced_space — Nested expressions with no intermediate variables (small networks).

  • load_onnx: Load trained models from ONNX format (covers scikit-learn, PyTorch, Keras exports).

Example: Optimize over a trained neural network#

We’ll train a small neural network to approximate a function, then find the input that minimizes the network’s output subject to constraints.

import discopt.modeling as dm
import numpy as np
from discopt.nn import (
    Activation,
    DenseLayer,
    NetworkDefinition,
    NNFormulation,
)

Step 1: Define a trained network#

Here we manually specify a small 2-layer ReLU network. In practice, you would train this with scikit-learn, PyTorch, etc. and load via load_onnx().

# A small 2-input, 4-hidden, 1-output ReLU network
np.random.seed(0)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.random.randn(4) * 0.1
W2 = np.random.randn(4, 1) * 0.5
b2 = np.random.randn(1) * 0.1

net = NetworkDefinition(
    layers=[
        DenseLayer(W1, b1, Activation.RELU),
        DenseLayer(W2, b2, Activation.LINEAR),
    ],
    input_bounds=(np.array([-2.0, -2.0]), np.array([2.0, 2.0])),
)

print(f"Network: {net.input_size} -> {net.layers[0].n_outputs} -> {net.output_size}")
print(f"Forward pass at [0, 0]: {net.forward(np.array([0.0, 0.0]))}")
Network: 2 -> 4 -> 1
Forward pass at [0, 0]: [0.17936536]

Step 2: Embed in a discopt model with big-M formulation#

The relu_bigm strategy introduces binary variables for each ReLU neuron where the pre-activation value can be positive or negative. Interval arithmetic bound propagation computes tight big-M constants.

m = dm.Model("nn_optimization")

nn = NNFormulation(m, net, strategy="relu_bigm", prefix="surrogate")
nn.formulate()

# Minimize the network output
m.minimize(nn.outputs[0])

# Add constraints on the inputs
m.subject_to(nn.inputs[0] + nn.inputs[1] >= -1.0, name="sum_lb")

print(m.summary())
Model: nn_optimization
  Variables: 15 (11 continuous, 4 integer/binary)
  Constraints: 22
  Objective: minimize surrogate_zhat_1[0]
  Parameters: 0
result = m.solve(time_limit=60)
print(f"Status: {result.status}")
print(f"Optimal objective: {result.objective:.6f}")
print(f"Optimal inputs: {result.value(nn.inputs)}")
print(f"Network output: {result.value(nn.outputs)}")

# Verify against numpy forward pass
x_opt = result.value(nn.inputs)
nn_check = net.forward(x_opt)
print(f"NumPy forward check: {nn_check}")
Status: optimal
Optimal objective: 0.149408
Optimal inputs: [-2.          1.90020791]
Network output: [0.14940791]
NumPy forward check: [0.14940791]

Step 3: Compare with smooth formulation#

For networks with smooth activations (sigmoid, tanh, softplus), the full_space strategy creates nonlinear constraints that get McCormick relaxations and JAX automatic differentiation.

# Same network structure but with tanh activation
net_smooth = NetworkDefinition(
    layers=[
        DenseLayer(W1, b1, Activation.TANH),
        DenseLayer(W2, b2, Activation.LINEAR),
    ],
    input_bounds=(np.array([-2.0, -2.0]), np.array([2.0, 2.0])),
)

m2 = dm.Model("smooth_nn")
nn2 = NNFormulation(m2, net_smooth, strategy="full_space")
nn2.formulate()
m2.minimize(nn2.outputs[0])

result2 = m2.solve(time_limit=60)
print(f"Status: {result2.status}")
print(f"Optimal objective: {result2.objective:.6f}")
print(f"Optimal inputs: {result2.value(nn2.inputs)}")
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

Status: optimal
Optimal objective: -0.579405
Optimal inputs: [-1.99999999 -0.90869585]

Formulation strategies#

Strategy

Activations

Variables

Method

full_space

sigmoid, tanh, softplus

Explicit per neuron

NLP with McCormick relaxations

relu_bigm

ReLU (+ smooth)

Binary per ReLU neuron

MILP big-M constraints

reduced_space

All

None (nested expressions)

NLP, best for small nets

Choose relu_bigm for ReLU networks (most common in practice). Choose full_space for smooth networks when you want tight McCormick relaxations for global optimization. Choose reduced_space for very small networks where compilation overhead matters.