Interpolation

Interpolation#

KEYWORDS: numpy.linalg.solve, scipy.interpolate.interp1d, numpy.linalg.eigvals, numpy.linalg.eig, numpy.argsort

Interpolating between data points#

It is a common need to interpolate between data points, especially when we don’t have knowledge of the function relating the data. There are a variety of approaches you can use for interpolation. We will consider a few approaches that use linear algebra here. Given \(N\) points, construct an \(N^{th}\) order polynomial that goes through all the points, and that can be used to estimate new values between the points.

First we consider some data.

import numpy as np

x = np.array([1.2, 2.9, 4.1])
y = np.array([4.4, 5.5, 8.9])

import matplotlib.pyplot as plt

plt.plot(x, y);

../_images/ec519ef6b58748cb6c6d67e0e12b1b79a642a7e30f5d80d710f2afbdf8f4bf21.png

We would like an equation like \(y(x) = a_2 x^2 + a_1 x + a_0\). If we write these out for each data point we get:

\(y_0 = a_2 x_0^2 + a_1 x_0 + a_0\)

\(y_1 = a_2 x_1^2 + a_1 x_1 + a_0\)

and so on. Here, the things we don’t know are the parameters \(\mathbf{a} [a_2, a_1, a_0]\). We can write these as:

\(\mathbf{X} \mathbf{a} = \mathbf{y}\)

Where \(\mathbf{X} = [\mathbf{x^2}, \mathbf{x}, \mathbf{1}]\), and is called a Vandermonde matrix. We can readily create these with numpy.vander.

np.vander([2, 3, 4], 3)

array([[ 4,  2,  1],
       [ 9,  3,  1],
       [16,  4,  1]])

The first column is \(x^2\), the second column is \(x\), and the last column is all ones. To compute the polynomial coefficients, we just make the \(\mathbf{X}\) array and solve the equations.

X = np.vander(x, 3)
a = np.linalg.solve(X, y)
a

array([ 0.75388776, -2.443881  ,  6.24705882])

np.polyfit(x, y, 2)

array([ 0.75388776, -2.443881  ,  6.24705882])

Now, we can use the parameters to compute new values.

xfit = np.linspace(-2, 5)
Y = np.vander(xfit, 3) @ a

plt.plot(x, y, 'bo', xfit, Y);
plt.legend(['data', 'fit']);

../_images/ee2accfc0d708d094883475d21cc4cc95517ed4d23ab529ede7592ea3f1749fc.png

What we have done here is fit an N^th order polynomial to \(N\) data points. There is a possibility that we have overfit this data, and extrapolation is not reliable. However, interpolation by this method may be useful. We will return to this for larger data sets where \(N\) is much larger than the order of the polynomial when we talk about linear regression next week.

Interpolation libraries#

There are several interpolating functions available in scipy.interpolate. These are usually more flexible and convenient than writing your own interpolating code. They are more sophisticated, and have some features you should be aware of.

from scipy.interpolate import interp1d

Linear interpolation is the default, and we have to explicitly allow extrapolation.

xfit = np.linspace(0, 5)
Y = interp1d(x, y, kind='quadratic', bounds_error=False)
print(Y(xfit))
plt.plot(x, y, 'bo', xfit, Y(xfit));

from scipy.optimize import minimize
minimize(Y, 8)

[       nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan
 4.38491214 4.33177969 4.2943467  4.27261317 4.26657909 4.27624446
 4.30160929 4.34267356 4.3994373  4.47190048 4.56006312 4.66392522
 4.78348677 4.91874777 5.06970822 5.23636813 5.41872749 5.61678631
 5.83054458 6.0600023  6.30515947 6.5660161  6.84257219 7.13482772
 7.44278271 7.76643716 8.10579106 8.46084441 8.83159721        nan
        nan        nan        nan        nan        nan        nan
        nan        nan]

  message: NaN result encountered.
  success: False
   status: 3
      fun: nan
        x: [ 8.000e+00]
      nit: 0
      jac: [       nan]
 hess_inv: [[1]]
     nfev: 2
     njev: 1

../_images/b0acad1ec47da5e03a0903a55b56f3e183af9ef2d5bc2ce1104cb17034631d60.png

def obj(x):
    return Y(x) - 7

from scipy.optimize import fsolve
fsolve(obj, 7)

/tmp/ipykernel_2687/3098855599.py:5: RuntimeWarning: The iteration is not making good progress, as measured by the 
 improvement from the last ten iterations.
  fsolve(obj, 7)

array([7.])

We can also specify quadratic spline interpolation.

Y = interp1d(x, y, kind='quadratic', bounds_error=False, fill_value='extrapolate')

plt.plot(x, y, 'bo', xfit, Y(xfit));

../_images/25c3ff6438ee13d708a5932d1c79cbb29d4c51a14c12925e5710c8c154e8d8f9.png

With more data points you can also use cubic interpolation, which fits a cubic polynomial between the points, and ensures smoothness and continuity of the derivatives at the endpoints.

Note that you have to make some decisions about how to interpolate. These functions can introduce wiggles that are not real. Especially when there are step or sharp changes in values.

x = np.array([1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 2, 1, 1, 0, 0, 0])

s = interp1d(x, y, kind='cubic')
X = np.linspace(1, 7)
Y = s(X)

plt.plot(x, y, 'bo', X, Y);

../_images/5cb1e06982e4a11707ab957889ed1cedf17dc159143d5f96d0d5097d735f1b85.png

def f(xx):
    z = interp1d(x, y, kind='linear')(xx) * (xx >= 5)
    z += interp1d(x, y, kind='cubic')(xx) * (xx < 5)
    return z

plt.plot(X, f(X));

../_images/5667e1db6822c114994ef338079c06eaabac057c20b34648eda2fe849f931849.png

plt.plot(x, y, 'bo')

xfit = np.linspace(0, 7)
p3 = np.polyfit(x, y, 3)  # polyfit is a global fit of one polynomial
plt.plot(xfit, np.polyval(p3, xfit));

../_images/105215c6ae6bc2712861a717de031b20e57f8b7f490472af6f7f2bea6a386eb8.png

Interpolation is a kind of data driven model for developing a mathematical model from data that can be used to predict new values. These models are not based on physics, but they can be used for predicting new values, estimating derivatives, integrals, etc. Of course, you must be careful with extrapolation; all polynomials tend to ± infinity eventually, which is probably not physically relevant in most cases.

There are multidimensional interpolation functions in scipy.interpolate,

Eigenvalues#

Eigenvalues and eigenvectors form an important class of linear algebra problems. They are an unusual class of problems though. Recall that we can interpret the equation \(\mathbf{A}\mathbf{x} = \mathbf{b}\) as a linear transformation of the vector \(\mathbf{x}\) into the vector \(\mathbf{b}\). This will in general lead to rotation and stretching of the input vector. Sometimes though the new vector \(\mathbf{b}\) is simply a rescaling of the original vector, i.e. \(\mathbf{b} = \lambda \mathbf{x}\). λ is the scaling factor, and it just changes the magnitude of the \(\mathbf{x}\) vector. In this case, we call λ an \eigenvalue\, and \(\mathbf{x}\) and \eigenvector\ of the matrix \(\mathbf{A}\).

When you see a problem in the form:

\(\mathbf{A}\mathbf{x} = \lambda \mathbf{x}\)

It is called an eigenvalue problem. It is conventional to write it in the following form:

\((\mathbf{A} - \lambda \mathbf{I})\mathbf{x} = \mathbf{0}\)

Based on this equation, since \(\mathbf{x}\) can be anything, it is necessary for the determinant of the matrix on the left to be zero. The eigenvalues of \(\mathbf{A}\) are the ones that are solutions to

\(det(\mathbf{A} - \lambda \mathbf{I}) = 0\)

Computing the determinant leads to a characteristic polynomial in λ, and the roots of this polynomial are the eigenvalues of the matrix.

For an \(N \times N\) array there will be \(N\) eigenvalues, although some may be degenerate. The eigenvalues can be real or complex. For some matrices, we know some properties of the eigenvalues. We will consider some of them here.

For example, the eigenvalues of a symmetric matrix are always real. We can make a symmetric matrix with some algebra:

A = np.random.rand(3,3)
A += A.T  # This makes a symmetric matrix
A

array([[0.01208479, 0.44741897, 0.41062104],
       [0.44741897, 0.95874323, 0.18326069],
       [0.41062104, 0.18326069, 1.4907775 ]])

We get the eigenvalues with numpy.linalg.eigvals.

A = np.random.rand(3,3)
A += A.T  # This makes a symmetric matrix
np.linalg.eigvals(A)  # these should always be real

array([ 2.6253406 , -0.30967186,  0.25076246])

np.linalg.eigvals(np.random.rand(3, 3)) # not a symmetric matrix, can have complex eigenvalues, or all real values

array([1.61801431, 0.35677781, 0.21071087])

You can see these are all real.

The trace of a matrix is the sum of the diagonal elements. You can do this manually, or use numpy.trace.

np.sum(np.diag(A)), np.trace(A)

(np.float64(2.5664311992730493), np.float64(2.5664311992730493))

It is a property that the sum of the eigenvalues is equal to the trace:

\(trace \mathbf{A} = \sum \lambda_k\)

np.sum(np.linalg.eigvals(A))

np.float64(2.5664311992730497)

It is also true that the product of the eigenvalues is equal to the determinant:

\(det \mathbf{A} = \prod \lambda_k\)

np.prod(np.linalg.eigvals(A)), np.linalg.det(A)

(np.float64(-0.20386839897582312), np.float64(-0.20386839897582337))

We can also see the eigenvectors. The numpy.linalg.eig function returns both eigenvalues and eigenvectors. The eigenvectors are in columns

e, v = np.linalg.eig(A)  # e is the eigenvalues, v are the eigenvectors
e, v

(array([ 2.6253406 , -0.30967186,  0.25076246]),
 array([[-0.56318452, -0.78192401,  0.26724153],
        [-0.69063639,  0.26783811, -0.67177684],
        [-0.45370098,  0.56290104,  0.69086746]]))

These eigenvectors have the property that the are normalized to unit length:

# Both rows and columns have norm = 1
np.linalg.norm(v[0]), np.linalg.norm(v[:, 0])

(np.float64(1.0), np.float64(1.0))

A @ v[0], e[0] * v[0]  # This is the row

(array([-0.95831361, -1.46026576, -0.8461751 ]),
 array([-1.47855117, -2.05281686,  0.70160005]))

A @ v[:, 0], e[0] * v[:, 0] # this is the column. A x = lambda x
# This is correct, the eigenvectors are in the columns.

(array([-1.47855117, -1.81315576, -1.19111959]),
 array([-1.47855117, -1.81315576, -1.19111959]))

[np.linalg.norm(v[:, i]) for i in [0, 1, 2]]

[np.float64(1.0), np.float64(0.9999999999999999), np.float64(1.0)]

The eigenvectors are in columns in the order corresponding to the order of the eigenvalues (these are not necessarily sorted). Here, we show that the eigenvalue/eigenvector pairs satisfy \(\mathbf{A} \mathbf{v} = \lambda \mathbf{v}\).

[np.allclose(A @ v[:, 0], e[0] * v[:, 0]),
 np.allclose(A @ v[:, 1], e[1] * v[:, 1]),
 np.allclose(A @ v[:, 2], e[2] * v[:, 2])]

[True, True, True]

If you mix and match these, they do not satisfy the equations.

[np.allclose(A @ v[:, 0], e[1] * v[:, 2]),
 np.allclose(A @ v[:, 1], e[0] * v[:, 1]),
 np.allclose(A @ v[:, 2], e[2] * v[:, 0])]

[False, False, False]

The eigenvalues are not sorted. It is often useful to know the smallest, or largest eigenvalue, and to have the eigenvalues sorted. The tricky point to consider is the eigenvectors have to be sorted in the same order. It is also tricky that the eigenvectors are stored in columns, but sorting is done on rows. You can simply transpose the eigenvector array, sort on rows, and then transpose it back to columns.

i = np.argsort(e)

sorted_e = e[i]
sorted_v = v.T[i].T
sorted_e, sorted_v

(array([-0.30967186,  0.25076246,  2.6253406 ]),
 array([[-0.78192401,  0.26724153, -0.56318452],
        [ 0.26783811, -0.67177684, -0.69063639],
        [ 0.56290104,  0.69086746, -0.45370098]]))

As always it is a good idea to check that we did not mess up:

for i, se in enumerate(sorted_e):
    sv = sorted_v[:, i]
    print(np.allclose(A @ sv, se * sv))

True
True
True

Application to roots of a polynomial#

The eigenvalues of a matrix are related to the roots of the characteristic polynomial of the matrix. We can leverage this to find the roots of a polynomial by constructing a matrix that has as its characteristic polynomial the polynomial we want the roots for. Then, the roots of the polynomial are just the eigenvalues of that matrix.

This example is adapted from http://www.math.utah.edu/~gustafso/s2018/2270/labs/lab7-polyroot-qrmethod.pdf

First, we construct the companion matrix. For the polynomial \(p(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} + x^n\) we construct:

\(C = \left[\begin{array}{ccccc} 0 & 1 & 0 & ... & 0\\ 0 & 0 & 1 & ... & 0\\ ... & ... & ... & \ddots & \vdots \\ 0 & 0 & 0 & ... & 1\\ -a_0 & -a_1 & -a_2 & ... & -a_{n-1} \end{array}\right]\)

Then, the eigenvalues of this matrix are equal to the roots of the polynomial. This matrix has ones on the diagonal above the main diagonal, and the coefficients up to the leading power on the bottom row. Note the coefficients are in the opposite order as we usually define them for np.roots.

The main diagonal has N elements in it, and the diagonal above that has N-2 elements in it.

There are a few ways to reverse the coefficients, here we use numpy.flipud which reverses the elements.

Let \(p(x) = 4 x^2 + 3x - 1\). We write the coefficient vector in the same order as used in np.roots.

p = np.array([4, 3, -1])
N = len(p)

C = np.diag(np.ones(N - 2), 1)
C

array([[0., 1.],
       [0., 0.]])

np.flipud(-p[1:] / p[0])

array([ 0.25, -0.75])

-p[1:] / p[0]

array([-0.75,  0.25])

C[-1, :] = np.flipud(-p[1:] / p[0])
C

array([[ 0.  ,  1.  ],
       [ 0.25, -0.75]])

Now the roots are found as the eigenvalues of the matrix.

np.linalg.eigvals(C), np.roots(p)

(array([ 0.25, -1.  ]), array([-1.  ,  0.25]))

This is essentially what the np.roots function does, although it uses a slightly different way to define the companion matrix.

import numpy as np

p = [4, 3, -1]
np.roots(p)

array([-1.  ,  0.25])

The order of the roots is not important; they may or may not be sorted.

Applications to optimization#

We can use eigenvalues to detect what kind of stationary point (f’(x) = 0) we are at. We have to know the Hessian matrix at the stationary point. The eigenvalues of this matrix tell us about the stationary point.

If all the eigenvalues are all positive, the matrix is called positive definite, and it means the stationary point is a minimum.
If all the eigenvalues are negative, the matrix is called negative definite, and it means the stationary point is a maximum.
If the signs of the eigenvalues are mixed then the stationary point is a saddle point.
If there are zeros, it is inconclusive, and further analysis is needed.

Let’s consider an example:

from scipy.optimize import minimize

def f(X):
    x, y = X
    return 2 * x**2 + 2 * x * y + 2 * y**2 - 6 * x

sol = minimize(f, [0, 0])
sol

  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: -5.999999999999998
        x: [ 2.000e+00 -1.000e+00]
      nit: 4
      jac: [ 1.192e-07  5.960e-08]
 hess_inv: [[ 3.333e-01 -1.667e-01]
            [-1.667e-01  3.333e-01]]
     nfev: 18
     njev: 6

We get an estimate of the inverse hessian here, so we convert it to a hessian first.

h = np.linalg.inv(sol['hess_inv'])
h

array([[3.99999959, 1.99999994],
       [1.99999994, 4.00000009]])

Now we check the eigenvalues:

np.linalg.eigvals(h)

array([1.9999999 , 5.99999978])

We have two positive eigenvalues, so the Hessian is positive definite, and we are at a minimum.

We can also use tools to compute the Hessian more directly (of course you can derive the partial derivatives by hand also):

import numdifftools as nd
H = nd.Hessian(f)
np.linalg.eigvals(H(sol.x))

array([2., 6.])

Note the order of the eigenvalues is not important.

We will see more about numerical tools for computing Hessians and derivatives next.

Summary#

Today we introduced the ideas behind interpolation which is a data-drive approach to model building that involves locally fitting functions to a few data points. We also introduced eigenvalues and eigenvectors, and some applications of how they are used.

Next week we will conclude linear algebra with linear regression.