* Visualizing uncertainty in linear regression
:PROPERTIES:
:categories: data analysis, uncertainty
:date: 2013/07/18 19:13:40
:updated: 2013/07/18 19:13:40
:END:
In this example, we show how to visualize uncertainty in a fit. The idea is to fit a model to [[http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm][data]], and get the uncertainty in the model parameters. Then we sample the parameters according to the normal distribution, and plot the corresponding distribution of models. We use transparent lines and allow the overlap to indicate the density of the fits.
The data is stored in a text file, PT.txt, with the following structure:
#+BEGIN_EXAMPLE
 Run          Ambient                            Fitted
 Order  Day  Temperature  Temperature  Pressure    Value    Residual
  1      1      23.820      54.749      225.066   222.920     2.146
...
#+END_EXAMPLE
We need to read the data in and perform a regression analysis on P vs. T. The temperature and pressure are the fourth and fifth columns of the file. Python starts counting at 0, so we want the columns with indices 3 and 4.
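As a quick check of the zero-based indexing, here is a minimal sketch that parses only the two header lines and the single data row shown in the excerpt, held in an in-memory string rather than the real file. The variable names =sample= and =row= are illustrative and not part of the original analysis.

```python
import io
import numpy as np

# Two header lines plus the one data row shown in the excerpt above.
sample = io.StringIO(
    "Run Ambient Fitted\n"
    "Order Day Temperature Temperature Pressure Value Residual\n"
    "1 1 23.820 54.749 225.066 222.920 2.146\n"
)

# skiprows=2 skips the header lines; ndmin=2 keeps a 2-D array
# even though there is only a single row here.
row = np.loadtxt(sample, skiprows=2, ndmin=2)

T = row[:, 3]  # fourth column: temperature
P = row[:, 4]  # fifth column: pressure
print(T, P)
```

Index 2 would pick up the ambient temperature instead, which is why the off-by-one matters here.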
#+BEGIN_SRC python
import numpy as np
import matplotlib.pyplot as plt
from pycse import regress
data = np.loadtxt('../../pycse/data/PT.txt', skiprows=2)
T = data[:, 3]
P = data[:, 4]
A = np.column_stack([T**0, T])
p, pint, se = regress(A, P, 0.05)
print(p, pint, se)
plt.plot(T, P, 'k.')
plt.plot(T, np.dot(A, p))
# Now we plot the distribution of possible lines
N = 2000
B = np.random.normal(p[0], se[0], N)
M = np.random.normal(p[1], se[1], N)
x = np.array([min(T), max(T)])
for b, m in zip(B, M):
    # one faint line per sampled (intercept, slope) pair
    plt.plot(x, m*x + b, '-', color='gray', alpha=0.02)
plt.savefig('images/plotting-uncertainty.png')
#+END_SRC
#+RESULTS:
: [ 7.74899739 3.93014044] [[ 2.97964903 12.51834576]
: [ 3.82740876 4.03287211]] [ 2.35384765 0.05070183]
[[./images/plotting-uncertainty.png]]
Here you can see 2000 different lines that have some probability of being correct. The darkest gray is near the fit, as expected: the darker the gray, the more probable the line. This is a qualitative way of judging the quality of the fit.
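The code above draws the intercept and the slope independently of each other. In a straight-line fit the two estimates are generally correlated, so one possible variation is to sample them jointly from the parameter covariance matrix. The sketch below is an illustration under stated assumptions, not the method used above: it uses made-up data (not PT.txt) and =np.polyfit= with =cov=True= instead of =pycse.regress=, and it summarizes the sampled lines as a pointwise 95% band.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data for illustration only (not the PT.txt measurements).
x = np.linspace(20.0, 60.0, 30)
y = 8.0 + 4.0 * x + rng.normal(0.0, 4.0, x.size)

# Straight-line fit; cov=True also returns the parameter covariance matrix.
# For degree 1 the coefficients are ordered [slope, intercept].
coeffs, cov = np.polyfit(x, y, 1, cov=True)

# Draw slope and intercept together so their correlation is preserved.
N = 2000
samples = rng.multivariate_normal(coeffs, cov, size=N)  # shape (N, 2)

# Evaluate every sampled line on a grid and take a pointwise 95% band.
xs = np.linspace(x.min(), x.max(), 50)
lines = samples[:, [0]] * xs + samples[:, [1]]  # shape (N, 50)
lower, upper = np.percentile(lines, [2.5, 97.5], axis=0)
fit = np.polyval(coeffs, xs)

print("correlation between slope and intercept:",
      cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
```

The band could be drawn with =plt.fill_between(xs, lower, upper, alpha=0.3)=, or the rows of =lines= could be plotted as transparent gray lines exactly as in the example above.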
Note that this is not the prediction error that we are plotting. The prediction error is the uncertainty in where a predicted y-value lies.
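To make the distinction concrete, the following sketch compares the two standard errors for a straight-line fit using the textbook formulas for simple linear regression. It again uses made-up data rather than PT.txt, and the names =se_line= and =se_pred= are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data for illustration only (not the PT.txt measurements).
x = np.linspace(20.0, 60.0, 30)
y = 8.0 + 4.0 * x + rng.normal(0.0, 4.0, x.size)

n = x.size
slope, intercept = np.polyfit(x, y, 1)

# Residual standard error with n - 2 degrees of freedom.
resid = y - (slope * x + intercept)
s = np.sqrt(np.sum(resid**2) / (n - 2))
sxx = np.sum((x - x.mean())**2)

# Points at which to compare the two uncertainties.
x0 = np.linspace(x.min(), x.max(), 5)
leverage = 1.0 / n + (x0 - x.mean())**2 / sxx

se_line = s * np.sqrt(leverage)        # uncertainty of the fitted line at x0
se_pred = s * np.sqrt(1.0 + leverage)  # uncertainty of a new observation at x0

for xi, a, b in zip(x0, se_line, se_pred):
    print(f"x0={xi:6.2f}  se_line={a:.3f}  se_pred={b:.3f}")
```

The prediction standard error is always the larger of the two, because it adds the scatter of an individual measurement to the uncertainty of the line itself.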