
Introduction to optimization#

  • KEYWORDS: scipy.optimize.minimize


Function extrema#

It is pretty common to need to find extreme values of a function in engineering analysis. An extreme value is a maximum or minimum of a function, and we seek them when we want to maximize a profit function, minimize a cost function, identify a maximum safe operating condition, etc.

Let’s consider an example function with a graphical solution approach. We want a quantitative estimate of the minimum in this function.

The function is \(f(x) = x^2 + \exp(-5 x^2)\)

Think about what this function should look like. One term is quadratic, and one is exponentially decaying.

import numpy as np
import matplotlib.pyplot as plt


def f(x):
    return x**2 + np.exp(-5 * x**2)


x = np.linspace(-2, 2)
y = f(x)
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y");

You can see there is a minimum near +/- 0.6. We can find the minimum in a crude kind of way by finding the index of the minimum value in the y-array, and then getting the corresponding value of the x-array. You control the accuracy of this answer by the number of points you discretize the function over.


y
array([4.        , 3.68013329, 3.37359438, 3.08038338, 2.80050062,
       2.53394733, 2.28072752, 2.04085335, 1.81435899, 1.60133337,
       1.40198984, 1.21679943, 1.04671548, 0.89349729, 0.76008211,
       0.65085188, 0.57152145, 0.52831859, 0.52624337, 0.56656244,
       0.64424335, 0.74650297, 0.85365611, 0.94276622, 0.99337071,
       0.99337071, 0.94276622, 0.85365611, 0.74650297, 0.64424335,
       0.56656244, 0.52624337, 0.52831859, 0.57152145, 0.65085188,
       0.76008211, 0.89349729, 1.04671548, 1.21679943, 1.40198984,
       1.60133337, 1.81435899, 2.04085335, 2.28072752, 2.53394733,
       2.80050062, 3.08038338, 3.37359438, 3.68013329, 4.        ])
min(y)  # this is the min value in y
0.5262433682550042

It is helpful to find the index associated with the minimum, so that we can find the corresponding x value.

np.argmin(y)
18
x[np.argmin(y)], y[np.argmin(y)]
(-0.5306122448979593, 0.5262433682550042)

As always, you should plot your results, or do some analysis to make sure they make sense to you.

plt.plot(x, y, x[np.argmin(y)], y[np.argmin(y)], "ro");

You can get a more accurate solution by using a finer grid of points (here we also focus on the positive side of the domain).

x = np.linspace(0, 2, 15000)
y = f(x)
i = np.argmin(y)  # gives us the index of the minimum in y
print(x[i])  # estimate for the x where y is a min
plt.plot(x, y, x[np.argmin(y)], y[np.argmin(y)], "ro");
0.5673711580772052

What are the pros and cons of this method?

Pros:

  1. It is easy.

  2. You see the whole domain you are looking at, and it is easy to see how many extrema there are.

Cons:

  1. Lots of function evaluations. Imagine if it took a long time to compute each value.

  2. Somewhat tedious.

  3. Not so easy to reproduce.

  4. Not scalable to large problems; your time to do this becomes a limiting factor.

Find the derivative, and solve for where it is zero#

We can also derive the first derivative:

\(f'(x) = 2 x + e^{-5 x^2} (-10 x)\)

and solve for where it is zero using scipy.optimize.root.

def fp(x):
    return 2 * x + np.exp(-5 * x**2) * (-10 * x)


from scipy.optimize import root

sol = root(fp, 0.01)
sol
 message: The solution converged.
 success: True
  status: 1
     fun: [ 0.000e+00]
       x: [ 0.000e+00]
    nfev: 7
    fjac: [[-1.000e+00]]
       r: [ 8.000e+00]
     qtf: [ 2.465e-32]


We need to see that the second derivative is positive to be sure we are at a minimum. We approximate that by looking at values on each side of our solution; at a minimum, each of these values should be larger than the value at the solution. Here we see they are smaller, which means we found a maximum.

d = 0.001
f(sol.x - d), f(sol.x), f(sol.x + d)  # this is a maximum at x=0
(array([0.999996]), array([1.]), array([0.999996]))
plt.plot(x, y, sol.x, f(sol.x), "ro");

Let us try again with a different guess.

sol = root(fp, 0.5)
sol
 message: The solution converged.
 success: True
  status: 1
     fun: [ 2.665e-15]
       x: [ 5.674e-01]
    nfev: 6
    fjac: [[-1.000e+00]]
       r: [-6.438e+00]
     qtf: [-4.850e-09]
f(sol.x - d) > f(sol.x), f(sol.x + d) > f(sol.x)  # this is a minimum; both nearby values are greater than the solution
(array([ True]), array([ True]))
plt.plot(x, y, sol.x, f(sol.x), "ro");

This approach depends on your ability to correctly derive and implement the derivative. It is good to know you can solve this problem by more than one method, so here we use a numerical derivative instead, to check our analytical derivative. You can check the convergence of the numerical derivative by varying the step size dx.

Note

We are approximating the derivative here, so the solution is also approximate, and you should check it to see if it is accurate enough for you.

from numdifftools import Derivative


def ypd(x):
    return Derivative(f)(x)


ans = root(ypd, 0)

print(f"""at x=0: y'(0) = {ypd(ans.x)}
The second derivative is {Derivative(f, n=2)(ans.x)}""")
# second derivative is negative => maximum
at x=0: y'(0) = [0.]
The second derivative is [-8.]
ans = root(ypd, 0.5)
print(f"""at x={ans.x}: y'(0) = {ypd(ans.x)}
The second derivative is {Derivative(f, n=2)(ans.x)}""")
# second derivative is positive => minimum
at x=[0.56735137]: y'(x) = [6.85722723e-14]
The second derivative is [6.43775165]

These results agree with the analytical ones within tolerance. This is not a beautiful solution, but it is hard to argue with success here!

Newton-Raphson method of minima finding#

Finding a minimum is like finding \(f'(x) = 0\). We already know how to use Newton’s method to solve \(g(x) = 0\):

\(x_{new} = x_{old} - \frac{g(x_{old})}{g'(x_{old})}\).

We can substitute \(g(x) = f'(x)\) into this to get a formula for finding the minimum of a function.

To use the Newton-Raphson method to get the minimum, we use an iterative approach with:

\(x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}\).

We have to derive these formulas if we want to use analytical derivatives:

\(y' = 2 x + e^{-5 x^2} (-10 x)\)

\(y'' = 2 + e^{-5 x^2} (-10 x)^2 - 10 e^{-5 x^2}\)
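
Here is a minimal sketch of that Newton-Raphson iteration using the analytical derivatives above (it reuses fp from earlier and defines fpp for the second derivative; the starting guess of 0.2 and the 1e-6 tolerance are arbitrary choices here).

def fpp(x):
    "Analytical second derivative of f."
    return 2 + np.exp(-5 * x**2) * (-10 * x) ** 2 - 10 * np.exp(-5 * x**2)


x0 = 0.2
for i in range(15):
    if np.abs(fp(x0)) <= 1e-6:  # stop when f'(x0) is approximately zero
        break
    x0 = x0 - fp(x0) / fpp(x0)  # Newton-Raphson step on f'(x) = 0

x0, f(x0)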

Alternatively, we can estimate the derivatives numerically using numdifftools.Derivative. This has the downside of numerical instability for dx that is too small, or low accuracy if it is too large, and the need to check if you made a good choice for it. On the plus side, it avoids making mistakes in the derivative derivation and implementation.

from numdifftools import Derivative

x0 = 0.2

for i in range(15):
    yp = Derivative(f)(x0)  # first derivative
    ypp = Derivative(f, n=2)(x0)  # second derivative
    xnew = x0 - yp / ypp
    fnew = f(xnew)

    if np.abs(yp) <= 1e-6:  # this y' \approx 0
        break
    x0 = xnew

xnew, fnew, yp, i
(0.5673513747994342, 0.5218875824868201, array(-8.71627422e-12), 5)

This answer also agrees with the earlier ones to at least 5 decimal places. This iteration is the gist of what happens inside root.

As we have seen many times, finding minima is such a common task that there are dedicated functions available for doing it. One of them is scipy.optimize.minimize. This has a similar signature to scipy.optimize.root: you give it a function and an initial guess, and it iteratively searches for a minimum.

scipy.optimize.minimize#

from scipy.optimize import minimize

minimize?

Here is the basic use of minimize. As always, we should plot the answer where feasible to make sure it is the minimum we wanted.

def f(x):
    return x**2 + np.exp(-5 * x**2)


guess = 0.1
sol = minimize(f, guess)
sol
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.521887582486822
        x: [ 5.674e-01]
      nit: 3
      jac: [-1.118e-07]
 hess_inv: [[ 1.556e-01]]
     nfev: 14
     njev: 7

sol.jac is the derivative of your function at the solution. It should be “close” to zero at an unconstrained minimum.

x = np.linspace(0, 2)
y = f(x)

plt.plot(x, y, "b-")
plt.plot(sol.x, sol.fun, "ro")
plt.xlabel("x")
plt.ylabel("y")
plt.legend(["f(x)", "fmin"]);

Note this answer is only the same in the first 4 decimal places. Remember that these iterative approaches stop when a tolerance is met. Check the defaults on minimize!
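
For example, you can request a tighter stopping tolerance (a sketch; the exact meaning of tol depends on the solver, so check the documentation for the method you are using).

sol_tight = minimize(f, 0.1, tol=1e-10)  # tighter tolerance than the default
sol_tight.x, sol_tight.fun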

Note

There are some legacy minimization functions like scipy.optimize.fmin that also work, but they are no longer preferred and are less flexible than minimize.

Multiple minima#

It is possible for functions to have more than one minimum. In this case, your guess will determine which minimum is found. Here is an example where there is a minimum near 2.2, and one near 4.5.

def h(x):
    return 2 + np.cos(x) + np.cos(2 * x - 0.5) / 2


x = np.linspace(0, 2 * np.pi)

plt.plot(x, h(x))
plt.xlabel("x")
plt.ylabel("h(x)");

This guess finds the one near 2.2:

minimize(h, 2)
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 1.0448871783746692
        x: [ 2.261e+00]
      nit: 3
      jac: [-1.490e-08]
 hess_inv: [[ 5.234e-01]]
     nfev: 10
     njev: 5

and this guess finds the one near 4.5

minimize(h, 4)
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 1.4758979742813436
        x: [ 4.355e+00]
      nit: 5
      jac: [-9.090e-07]
 hess_inv: [[ 9.473e-01]]
     nfev: 14
     njev: 7

It does not appear possible to provide multiple initial guesses, so you need to use a list comprehension (or a loop) for this.

[minimize(h, x).x for x in np.linspace(0, 6, 10)]
[array([-1.92772842]),
 array([2.26106178]),
 array([2.26106266]),
 array([2.26106175]),
 array([2.26106197]),
 array([2.26106154]),
 array([4.355456]),
 array([4.35544928]),
 array([4.35544955]),
 array([2.26106163])]

You have to decide which one is better for the problem at hand. If this were a cost function, the one at the lower cost is probably better! Note that all we can say here is which one is lower in the interval we are looking at. By inspection of the function, you can see it will be periodic, so there will be many other minima that also exist.
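
For example (a minimal sketch), you can keep the full solution objects and pick the one with the lowest function value in the interval:

sols = [minimize(h, g) for g in np.linspace(0, 6, 10)]
best = min(sols, key=lambda s: s.fun)  # the solution with the lowest h(x)
best.x, best.fun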

Finding maxima#

minimize is for finding minima. We can use it to find maxima though, by finding the minima of \(-f(x)\). You can see here that when we plot \(-h(x)\), the minima become maxima, and vice versa. Now you can see there are two definite minima, one near zero, and one near 3.5, which correspond to the maxima of \(h(x)\).

plt.plot(x, h(x), x, -h(x))
plt.xlabel("x")
plt.ylabel("-h(x)");

The standard way to do this with minimize is to define an optional sign argument in your function that defaults to one. Then, when we call minimize, we pass -1 as the sign so that we find the minimum of -h(x). Afterwards, we evaluate h(x) at that x-value to get the actual value of the maximum. It is not strictly necessary to do this; you can also manually pass around the sign and try to keep it straight.

Tip

We use a parameterized objective function to make maximization easier.

Here is an example to find the maximum near 3.5.

def h(x, sign=1):
    return sign * (2 + np.cos(x) + np.cos(2 * x - 0.5) / 2)


sol = minimize(h, 3.5, args=(-1,))  # set sign=-1 here to minimize -h(x); note args must be a tuple, hence (-1,)
print(
    f"the maximum is {h(sol.x)[0]:1.2f} at x={sol.x[0]:1.2f}"
)  # sign defaults to 1 here, so we get the maximum value

plt.plot(x, h(x))
plt.plot(sol.x, h(sol.x), "ro")
plt.xlabel("x")
plt.ylabel("h(x)");
the maximum is 1.56 at x=3.64
def negh(x):
    return -1 * (2 + np.cos(x) + np.cos(2 * x - 0.5) / 2)


sol = minimize(negh, 3.5)
print(f"the maximum is {-1 * negh(sol.x)[0]:1.2f} at x={sol.x[0]:1.2f}")
the maximum is 1.56 at x=3.64

With the optional sign argument, we can find minima and maxima with one function.

def h(x, sign=1):
    return sign * (2 + np.cos(x) + np.cos(2 * x - 0.5) / 2)


sol1 = minimize(h, 2)  # sign=1 by default so we find a minimum
sol2 = minimize(h, 4)  # sign=1 by default so we find a minimum
sol3 = minimize(h, 3.5, args=(-1,))  # sign=-1, get maxima
print(
    f"the maximum is {h(sol3.x)[0]:1.2f} at x={sol3.x[0]:1.2f}"
)  # sign defaults to 1 here, so we get the maximum value

plt.plot(x, h(x))
plt.plot(sol1.x, h(sol1.x), "ro")
plt.plot(sol2.x, h(sol2.x), "bo")
plt.plot(sol3.x, h(sol3.x), "gs")
plt.xlabel("x")
plt.ylabel("h(x)");
the maximum is 1.56 at x=3.64

The function is periodic though, and depending on your guess you can find some surprising results. Here we make many guesses in the domain. Surprisingly, some solutions are found outside of it! We can visualize the solutions found in a histogram.

SOL = [minimize(h, g, args=(-1,)) for g in np.linspace(0, 6, 50)]
plt.hist([sol.x[0] for sol in SOL], bins=20);
plt.plot(x, h(x))
# the real function is periodic
x0 = np.linspace(-5, 0)
plt.plot(x0, h(x0), 'b--')
x1 = np.linspace(6, 40)
plt.plot(x1, h(x1), 'b--')
plt.plot([sol.x[0] for sol in SOL], [-sol.fun for sol in SOL], "ro");

Once again, you have to decide which maximum is relevant here.

Application to maximizing profit in a PFR#

Compound X with concentration \(C_{X0} = 2.5\) kmol/m^3 at a flow rate of 12 m^3/min is converted to Y in a first-order reaction with a rate constant of 30 1/min in a tubular reactor. The value of Y is $1.5/kmol. The cost of operation is $2.50 per minute per m^3 of reactor. Find the reactor volume that maximizes the profit (profit is the value of the products minus the operating costs).

First, consider why there is a maximum. At low volumes the operating cost is low, but the production of Y is also low. At high volumes, you maximize production of Y, so you have the most value, but the operating costs go up (to infinity for complete conversion!). Somewhere in between there is a maximum.

Here are some relevant constants.

cost = 2.5  # dollar/min/m**3
y_value = 1.5  # dollar / kmol

Cx0 = 2.5  # kmol / m**3
v0 = 12.0  # m**3 / min

k = 30.0  # 1/min

To compute the profit as a function of reactor volume, we need to compute how much Y is produced, then multiply that by the value of Y and subtract the operating cost. To compute how much Y is produced, we write mole balances on X and Y and integrate them up to the reactor volume to get the exit molar flow rates of X and Y. I like to write mole balances like this.

def dFdV(V, F):
    "PFR mole balances on X and Y."
    Fx, Fy = F
    Cx = Fx / v0
    rx = -k * Cx
    ry = -rx

    dFdX = rx
    dFdY = ry
    return [dFdX, dFdY]


# initial conditions
F0 = [Cx0 * v0, 0.0]  # Fx0  # Fy0

Now, we can write a profit function. It will take a V as the argument, integrate the PFR to that volume to find the molar exit flow rates, and then compute the profit.

import numpy as np
from scipy.integrate import solve_ivp


def profit(V, sign=1):
    Vspan = (0, V)
    sol = solve_ivp(dFdV, Vspan, F0)
    Fx, Fy = sol.y
    Fy_exit = Fy[-1]
    return sign * (Fy_exit * y_value - cost * V)
profit(10)
19.999999935332582

It is always a good idea to plot the profit function. Below we use a list comprehension because the profit function is not vectorized, which means we cannot pass an array of volumes in and get an array of profits out. Two alternatives to the list comprehension are shown first, and a vectorized wrapper is sketched after them.

# alternative 1 to list comprehension - accumulate by appending to a list
Vspan = np.linspace(0, 10, 5)

PA = []
for V in Vspan:
    PA += [profit(V)]  # append a list to the list
PA
[0.0,
 38.662803773064404,
 32.49983056415947,
 26.249999599753814,
 19.999999935332582]
# alternative 2 to list comprehension - use indexing to set values in an array
PA2 = np.zeros(Vspan.shape)
for i, V in enumerate(Vspan):
    PA2[i] = profit(V)  # brackets in this line are for indexing

PA2
array([ 0.        , 38.66280377, 32.49983056, 26.2499996 , 19.99999994])
# plot each point, one at a time, and keep track of the best point seen so far
maxP = 0.0
maxV = 0
for V in Vspan:
    plt.plot(V, profit(V), "r.")
    if profit(V) > maxP:
        maxP = profit(V)
        maxV = V
maxV, maxP
(2.5, 38.662803773064404)
Vspan = np.linspace(0, 4)
profit_array = [profit(V) for V in Vspan]  # these brackets are list comprehension

import matplotlib.pyplot as plt

plt.plot(Vspan, profit_array)
plt.xlabel("V")
plt.ylabel("profit");
Vspan[np.argmax(profit_array)]  # approximate V where the maximum is
1.5510204081632653
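
As another option (a sketch), np.vectorize can wrap the scalar profit function so it accepts an array of volumes. This is only a convenience; under the hood it still calls profit once per volume, so it is not faster than the loop or list comprehension.

vprofit = np.vectorize(profit)  # elementwise wrapper around the scalar profit function
profit_array2 = vprofit(Vspan)
Vspan[np.argmax(profit_array2)]  # should reproduce the estimate above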

You can see from this plot there is a maximum near V=1.5. We can use that as a guess for minimize.

from scipy.optimize import minimize

sol = minimize(profit, 1.5, args=(-1,))  # minimize -profit, which is equivalent to maximizing it

print(f"The optimal volume is {sol.x[0]:1.2f} m^3 with a profit of ${profit(sol.x[0]):1.2f}/min.")
The optimal volume is 1.52 m^3 with a profit of $40.19/min.
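
As a sanity check (derived by hand for this simple first-order case, not part of the original solution), the exit molar flow of Y is \(F_Y = F_{X0} (1 - e^{-k V / v_0})\), so setting the derivative of the profit to zero gives a closed-form optimal volume that should agree with the numerical answer above.

# Analytical check: d(profit)/dV = y_value * Fx0 * (k / v0) * exp(-k V / v0) - cost = 0
Fx0 = Cx0 * v0
V_analytical = (v0 / k) * np.log(y_value * Fx0 * k / (cost * v0))
print(f"Analytical optimum: V = {V_analytical:1.3f} m^3, profit = ${profit(V_analytical):1.2f}/min")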

This problem highlights the opportunities we have to integrate many ideas together to solve complex problems: integration of an ODE, nonlinear algebra/minimization, and graphical estimates of the solution.

Challenge: Can you solve this with an event and solve_ivp?
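
One possible line of attack (a sketch, not necessarily the intended solution): the profit stops increasing where the marginal value of producing Y equals the operating cost, so an event function returning y_value * ry - cost (the derivative of the profit with respect to volume) crosses zero at the optimal volume and can terminate the integration there.

def marginal_profit(V, F):
    "Event: d(profit)/dV = 0 at the optimal volume."
    Fx, Fy = F
    ry = k * Fx / v0  # rate of production of Y
    return y_value * ry - cost


marginal_profit.terminal = True  # stop the integration at the event

sol = solve_ivp(dFdV, (0, 10), F0, events=marginal_profit)
print(f"Optimal volume from the event: {sol.t_events[0][0]:1.2f} m^3")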

Summary#

Today we introduced the concept of finding minima/maxima in functions. This is an iterative process, much like finding the roots of a nonlinear function. You can think of it as finding the zeros of the derivative of a nonlinear function! This method is the root of many important optimization problems including regression.

scipy.optimize.minimize is the preferred function for doing minimization. There are other more specific ones described at https://docs.scipy.org/doc/scipy/reference/optimize.html, but minimize has a more consistent interface and provides almost all the functionality of those other methods.
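
For example, scipy.optimize.minimize_scalar is one of those more specific functions for one-dimensional problems. Here is a sketch of its bounded mode applied to the function from earlier; it needs an interval rather than an initial guess.

from scipy.optimize import minimize_scalar

# Derivative-free, bounded minimization on an interval
res = minimize_scalar(f, bounds=(0, 2), method="bounded")
res.x, res.fun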

Next time, we will look at how to apply minimization to regression problems.

from jupyterquiz import display_quiz
display_quiz('.quiz.json')