Uncertainty quantification in nonlinear regression#

  • KEYWORDS: scipy.optimize.minimize

Uncertainty estimates from curve_fit and scipy.optimize.minimize#

We previously examined how to estimate uncertainty from the covariance matrix returned from curve_fit. Recall we need the diagonal of the covariance matrix, which is estimated during the fitting. The covariance matrix is related to the inverse Hessian matrix. We will explore how these are related here.

We will consider fitting a line to the following data.

import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.,    2.5,   5.,    7.5,  10. ])
y = np.array([1.14,    1.91,  2.48,  2.2,  4.0])

def model(x, m, b):
    return m * x + b

p, pcov = curve_fit(model, x, y, [0.2, 0.1])
print(p)
pcov
[0.2404     1.14399999]
array([[ 0.00430672, -0.0215336 ],
       [-0.0215336 ,  0.161502  ]])
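
As a reminder, the parameter standard errors are the square roots of the diagonal elements of this covariance matrix. A quick check with the pcov above (the name perr is just for illustration):

perr = np.sqrt(np.diag(pcov))  # standard errors of m and b
print(perr)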

scipy.optimize.minimize does not return the covariance matrix; with some of the methods, it returns an estimate of the inverse Hessian matrix. In theory, the covariance matrix and the inverse Hessian are related by \(cov = 0.5 H^{-1}\). Note this relationship is specific to the minimization of the summed squared errors.

from scipy.optimize import minimize

def model(pars, x):
    x = np.array(x)
    m, b = pars
    return m * x + b

def objective(pars):
    errs = y - model(pars, x)
    return np.sum(errs**2)

sol = minimize(objective, [0.2, 1])
print(sol.x)
0.5 * sol.hess_inv
[0.2404     1.14399997]
array([[ 0.01019113, -0.06596866],
       [-0.06596866,  0.49131361]])

That doesn’t look very good. But remember that it is an estimate of the Hessian, and we need to be careful about its accuracy. The minimizer terminates when the solution reaches the tolerance, not when the Hessian is accurate! If we tighten the tolerance, we get a more accurate result.

sol = minimize(objective, [0.2, 1], tol=1e-9)
print(sol.x)
print(0.5 * sol.hess_inv)
[0.24039999 1.144     ]
[[ 0.00426772 -0.02217504]
 [-0.02217504  0.16724067]]

With the increased accuracy, you can see the covariance is approximately equal to 1/2 the inverse Hessian. That means you can use it to estimate the uncertainty in the same way we did with curve_fit.
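
For example, a minimal sketch of the parameter standard errors from this solution, analogous to taking the square root of the diagonal of pcov from curve_fit:

# Standard errors estimated from 0.5 * inverse Hessian of the SSE objective.
perr = np.sqrt(np.diag(0.5 * sol.hess_inv))
print(perr)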

Not all solvers generate the inverse Hessian matrix; for example, SLSQP does not. In that case you have a few options. One is to derive the Hessian analytically. Another is to compute it numerically with a package such as numdifftools (which you may have to install). A third, which we return to later, is automatic differentiation. Either way, you have to compute the Hessian of the objective function that is being minimized.

Now, similar to what we did with scipy.misc.derivative, we can write a function and then use numdifftools to get the Hessian of the function. Here, we define the sum of the squared errors function, then create a Hessian function for that. We can use the Hessian function to evaluate the Hessian at the parameters at the minimum. We use numpy.linalg.inv to get the inverse of the Hessian to compute the covariance.

import numdifftools as nd

def f(pars):
    m, b = pars
    return np.sum((y - (m * x + b))**2)

H = nd.Hessian(f)  # H is an executable function now that takes one argument, the pars.
0.5 * np.linalg.inv(H(p))
array([[ 0.004, -0.02 ],
       [-0.02 ,  0.15 ]])

numdifftools (https://pypi.org/project/numdifftools/) is a numerical differentiation package. It is more sophisticated than scipy.misc.derivative but is fundamentally still a numerical approximation to the derivatives. Now you can use these to estimate the uncertainties even for optimizers that don’t provide the estimated inverse Hessian.
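
For instance, here is a sketch of that workflow for SLSQP, which does not return hess_inv; the covariance is built from the numdifftools Hessian of the same objective function (the variable names are illustrative):

slsqp_sol = minimize(objective, [0.2, 1], method='SLSQP')

# Covariance estimated from the numerical Hessian at the SLSQP solution.
cov = 0.5 * np.linalg.inv(nd.Hessian(objective)(slsqp_sol.x))
print(slsqp_sol.x, np.sqrt(np.diag(cov)))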

Later we will learn about one more approach to getting the derivatives that is used in machine learning called automatic differentiation.

Effects of outliers on regression#

Outliers can have a significant effect on the fit of a model to data. Let’s consider this example, where we want to fit a line to some data that has an outlier in it. This is just a linear regression, and we start out using numpy.polyfit.

import numpy as np
import matplotlib.pyplot as plt

x = [0.,      2.5,   5.,    7.5,  10. ]
y = [1.14,    1.91,  2.48,  2.2,  4.0]
#                            ^
#                            |
#                         outlier

p = np.polyfit(x, y, 1)
print(p)
xfit = np.linspace(0, 10)

plt.plot(x, y, 'bo')
plt.plot(xfit, np.polyval(p, xfit));
plt.xlabel('x')
plt.ylabel('y');
[0.2404 1.144 ]
(figure: data points and the linear least squares fit)

You can see that the fitted line is “dragged” towards the outlier. We say that least squares minimization is not robust to outliers.

This may be undesirable: if you believe a point is an outlier, perhaps due to experimental error, then that point affects the model more than the other points you believe to be more accurate.

Today we will consider a variety of approaches to minimize the effects of outliers. We first begin by re-examining how these parameters are obtained. Here, we illustrate that the results from polyfit are equivalent to minimizing the summed squared errors between the model and the data.

from scipy.optimize import minimize

def model(pars, x):
    x = np.array(x)
    m, b = pars
    return m * x + b

def objective(pars):
    errs = y - model(pars, x)
    return np.sum(errs**2)

minimize(objective, [0.2, 1])
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.8075100000000082
        x: [ 2.404e-01  1.144e+00]
      nit: 2
      jac: [ 5.290e-07 -3.055e-07]
 hess_inv: [[ 2.038e-02 -1.319e-01]
            [-1.319e-01  9.826e-01]]
     nfev: 15
     njev: 5

The problem is that we are minimizing the squared errors, \(\sum \epsilon_i^2\), which puts more weight on large errors than on small errors.

Least squares regression is also called L2 norm regression; that is, we minimize the L2 norm of the error vector.
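
To make that concrete, here is a small sketch (l2_objective is just an illustrative name) showing that minimizing the L2 norm of the error vector gives the same parameters, since taking the square root does not move the minimum:

def l2_objective(pars):
    errs = y - model(pars, x)
    return np.linalg.norm(errs)  # L2 norm = sqrt of the summed squared errors

print(minimize(l2_objective, [0.2, 1]).x)  # same parameters as above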

Minimizing the summed absolute errors#

We can choose to minimize another objective function, for example the summed absolute value of the errors. This will reduce the emphasis on large errors. This is also called L1 norm regression.

def objective(pars):
    errs = y - model(pars, x)
    return np.sum(np.abs(errs))

L1_sol = minimize(objective, [0.2, 1])
print(L1_sol.x)
plt.plot(x, y, 'bo')
plt.plot(xfit, model(L1_sol.x, xfit));
[0.26845682 1.14      ]
(figure: data points and the L1 norm fit)

There is a historical reason this is not done a lot: the absolute value function has a discontinuity in its first derivative at the origin, which can be problematic in some optimization algorithms. It is obviously not a problem here, and you can see that the outlier has less of an effect on the fitted line in this case.
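
If the non-smooth corner does cause trouble for a particular optimizer, one common workaround (not used in these notes, shown only as a sketch) is to replace \(|\epsilon|\) with the smooth approximation \(\sqrt{\epsilon^2 + \delta}\) for a small \(\delta\):

def smooth_l1_objective(pars, delta=1e-6):
    errs = y - model(pars, x)
    # sqrt(errs**2 + delta) approximates |errs| but is differentiable at zero.
    return np.sum(np.sqrt(errs**2 + delta))

print(minimize(smooth_l1_objective, [0.2, 1]).x)  # close to the L1 solution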

Finally, we can generalize these ideas to something called Lp norm regression, where we seek to minimize:

\(\sum |\epsilon_i|^p\)

In this paper a value of \(p=1.5\) is recommended for general use. Note this is less than two, and greater than one, so it is expected to have an intermediate effect compared to L1 and L2 norm regression.

def objective(pars):
    p = 1.5
    errs = y - model(pars, x)
    return np.sum(np.abs(errs)**p)

Lp_sol = minimize(objective, [0.2, 1])
print(Lp_sol.x)
plt.plot(x, y, 'bo')
plt.plot(xfit, model(Lp_sol.x, xfit));
[0.25741034 1.15352086]
(figure: data points and the Lp norm fit with p = 1.5)

The downside of these approaches is that they complicate the analysis of uncertainty. The uncertainty analysis we have considered so far is only formally correct when we minimize the summed squared errors. It is only approximately correct when something else is minimized.

Robust regression approaches#

An alternative approach to least squares or absolute error minimization is called robust regression (see Applied Regression Analysis, 3rd edition, Draper and Smith, chapter 25). This is a class of methods that uses a different metric to minimize in the objective function.

The simplest approach is to minimize the median of the squared error. Note that minimizing the sum of squared errors is practically like minimizing the average or mean squared error. If you have a symmetric distribution of errors, then the mean and median are practically the same. If there is an outlier, however, the mean will be skewed towards the outlier, while the median will be at a position that splits the distribution in half, and is closer to what you believe the mean to be.

Here we show that given an asymmetric distribution, the median is smaller than the mean.

errs = np.array([0.1, 0.01, 0.05, 0.02, 0.8])
s = errs**2
plt.hist(s, density=True)
plt.axvline(np.mean(s), color='r')
plt.axvline(np.median(s), color='k')
plt.legend(['mean', 'median'])
plt.xlabel('squared error');
(figure: histogram of the squared errors with the mean (red) and median (black) marked)
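
Printing the two values makes the difference explicit:

print('mean:  ', np.mean(s))
print('median:', np.median(s))  # the outlier pulls the mean well above the median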

Least Median regression#

It is straightforward to modify the objective function to minimize the median of the squared errors.

def objective(pars):
    errs = y - model(pars, x)
    return np.median(errs**2)

LMS_sol = minimize(objective, [0.2, 1])
print(LMS_sol.x)
plt.plot(x, y, 'bo')
plt.plot(xfit, model(LMS_sol.x, xfit));
[0.26804924 1.18981534]
(figure: data points and the least median squares fit)

Weighted nonlinear regression#

Outliers often are associated with larger uncertainties about their values. An alternative approach to the methods described above is to use weights to say how important each data point is. This example is adapted from https://www.mathworks.com/help/stats/examples/weighted-nonlinear-regression.html.

import numpy as np
import matplotlib.pyplot as plt

x = [1,   2,   3,   5,   7,   10]
y = [109, 149, 149, 191, 213, 224]
plt.plot(x, y, 'bo')
plt.xlabel('Incubation (days)')
plt.ylabel('BOD');
(figure: BOD vs. incubation time data)

The aim of this work is to fit the nonlinear model \(y = a (1 - e^{-b x})\) to this data. We first consider a standard minimization of the summed squared errors. Inspection of the model suggests that at large \(x\), \(a\) is the plateau value, which we can read from the graph. For the value of \(b\), we might estimate a half-life of about one day and solve \(110 = 240(1 - e^{-b})\) for \(b\).

-np.log(-(110 / 240 - 1))
np.float64(0.6131044728864088)
def model(pars, x):
    a, b = pars
    x = np.array(x)
    return a * (1 - np.exp(-b * x))

def objective(pars):
    errs = y - model(pars, x)
    return np.sum(errs**2)


guesses = [240, 0.6]

from scipy.optimize import minimize

sol = minimize(objective, guesses)
pars = sol.x

plt.plot(x, y, 'bo')
xfit = np.linspace(0, 10)
plt.plot(xfit, model(pars, xfit))
plt.xlabel('Incubation (days)')
plt.ylabel('BOD');
(figure: BOD data and the unweighted least squares fit)

The fit generally goes through the data, but it is not clear whether the point near \(x = 2\) is a small outlier that is skewing the fit, perhaps leading to an inaccurate asymptote at long times.

Suppose, however, that these data points represent averages from multiple measurements, and we only measured the first two points once, and the rest of the points 5 times. In this case, we might want to put more weight on the points we measured multiple times.

We achieve this by modifying the objective function, in this case multiplying each error by the number of times the measurement was made. This makes reducing the errors on the points we measured many times more important than reducing the errors on the points we measured only once.

w = np.array([1, 1, 5, 5, 5, 5])

def objective(pars):
    errs = (y - model(pars, x)) * w
    return np.sum(errs**2)


guesses = [240, 0.5]

from scipy.optimize import minimize

sol = minimize(objective, guesses)
pars = sol.x
print(pars)
plt.plot(x, y, 'bo')
xfit = np.linspace(0, 10)
plt.plot(xfit, model(pars, xfit))
plt.xlabel('Incubation (days)')
plt.ylabel('BOD');
[230.77020888   0.35563066]
(figure: BOD data and the weighted least squares fit)

The result is that the model now fits the repeated measurements more closely than the two points we measured only once.

There are many ways you could choose to weight the points depending on what you know about them. If you have uncertainties about the measured data, you can weight the points accordingly, e.g. defining the weights as inversely proportional to the uncertainty.
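
As a sketch of that idea (the sigma values below are made up for illustration), one might weight each residual by the reciprocal of its estimated uncertainty:

# Hypothetical standard deviations for each measured point (illustrative only).
sigma = np.array([10.0, 10.0, 4.0, 4.0, 4.0, 4.0])

def objective(pars):
    errs = (y - model(pars, x)) / sigma  # less certain points count less
    return np.sum(errs**2)

sol = minimize(objective, [240, 0.5])
print(sol.x)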

Summary#

Regression is an important technical skill required in modern engineering. It is the method which we use to convert data into models. Sometimes it is the parameters that are important, e.g. when they represent properties of a system that we are interested in. Sometimes it is the model that is interesting, e.g. when we need to use it for optimization or predictions.

At the core, regression involves minimization of some error function. The standard method is to minimize the summed squared error between the model and the data. There are some benefits to this method: it is straightforward, and there are well-established methods to estimate the uncertainty in the parameters. However, it is known to be sensitive to outliers.

A variety of alternative approaches exist to reduce the influence of outliers, including minimizing the summed absolute errors, robust regression methods, and weighted regression methods. It is not always obvious which method is right; that takes experience and an understanding of what you know about the model, the data, and the goals of the regression.