<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Visualizing uncertainty in linear regression</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/07/18/Visualizing-uncertainty-in-linear-regression</link>
      <pubDate>Thu, 18 Jul 2013 19:13:40 EDT</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[uncertainty]]></category>
      <guid isPermaLink="false">6et-kvuDQR-6PXnXSkJyua0xEhc=</guid>
      <description>Visualizing uncertainty in linear regression</description>
      <content:encoded><![CDATA[




&lt;p&gt;
In this example, we show how to visualize uncertainty in a fit. The idea is to fit a model to &lt;a href="http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm"&gt;data&lt;/a&gt; and get the uncertainty in the model parameters. Then we sample the parameters according to the normal distribution and plot the corresponding distribution of models. We use transparent lines and let the overlap indicate the density of the fits.
&lt;/p&gt;

&lt;p&gt;
The data is stored in a text file, PT.txt, with the following structure:
&lt;/p&gt;

&lt;pre class="example"&gt;
Run          Ambient                            Fitted
 Order  Day  Temperature  Temperature  Pressure    Value    Residual
  1      1      23.820      54.749      225.066   222.920     2.146
...
&lt;/pre&gt;

&lt;p&gt;
We need to read the data in and perform a regression analysis of P vs. T. In Python we start counting at 0, so we actually want columns 3 and 4.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; pycse &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; regress

data = np.loadtxt(&lt;span style="color: #228b22;"&gt;'../../pycse/data/PT.txt'&lt;/span&gt;, skiprows=2)
T = data[:, 3]
P = data[:, 4]

A = np.column_stack([T**0, T])

p, pint, se = regress(A, P, 0.05)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; p, pint, se
plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;)
plt.plot(T, np.dot(A, p))

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# Now we plot the distribution of possible lines&lt;/span&gt;
N = 2000
B = np.random.normal(p[0], se[0], N)
M = np.random.normal(p[1], se[1], N)
x = np.array([&lt;span style="color: #8b0000;"&gt;min&lt;/span&gt;(T), &lt;span style="color: #8b0000;"&gt;max&lt;/span&gt;(T)])

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; b,m &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(B, M):
    plt.plot(x, m*x + b, &lt;span style="color: #228b22;"&gt;'-'&lt;/span&gt;, color=&lt;span style="color: #228b22;"&gt;'gray'&lt;/span&gt;, alpha=0.02)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/plotting-uncertainty.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[ 7.74899739  3.93014044] [[  2.97964903  12.51834576]
 [  3.82740876   4.03287211]] [ 2.35384765  0.05070183]
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/plotting-uncertainty.png"&gt;&lt;/p&gt;

&lt;p&gt;
Here you can see 2000 different lines that have some probability of being correct. The darkest gray is near the fit, as expected; the darker the gray, the more probable the line. This is a qualitative way of judging the quality of the fit.
&lt;/p&gt;

&lt;p&gt;
Note that this is not the prediction error we are plotting; the prediction error is the uncertainty in where a predicted y-value would lie.
&lt;/p&gt;
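&lt;p&gt;
To make that distinction concrete, here is a minimal numpy sketch (using hypothetical data, not the PT.txt data above) that compares the spread of the fitted line at one x-value with the spread of a predicted observation there; the prediction adds the residual noise on top of the parameter uncertainty.
&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a line with noise (an illustrative stand-in for PT.txt)
T = np.linspace(20, 80, 30)
P = 7.75 + 3.93 * T + rng.normal(0, 2.0, T.size)

# Ordinary least-squares fit and residual standard deviation
A = np.column_stack([np.ones_like(T), T])
p, *_ = np.linalg.lstsq(A, P, rcond=None)
sigma = np.sqrt(np.sum((P - A @ p) ** 2) / (T.size - 2))

# Spread of the fitted *line* at T = 50: parameter uncertainty only
N = 5000
cov = sigma**2 * np.linalg.inv(A.T @ A)
samples = rng.multivariate_normal(p, cov, N)
y_fit = samples @ np.array([1.0, 50.0])

# Spread of a *predicted observation* at T = 50: add the residual noise
y_pred = y_fit + rng.normal(0, sigma, N)

print(y_fit.std(), y_pred.std())  # the prediction spread is larger
```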
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/07/18/Visualizing-uncertainty-in-linear-regression.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Uncertainty in polynomial roots - Part II</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/07/06/Uncertainty-in-polynomial-roots-Part-II</link>
      <pubDate>Sat, 06 Jul 2013 15:31:38 EDT</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[uncertainty]]></category>
      <guid isPermaLink="false">McvWDyZQgz4sfhRBKgJxaDZOLjA=</guid>
      <description>Uncertainty in polynomial roots - Part II</description>
      <content:encoded><![CDATA[


&lt;p&gt;
We previously looked at uncertainty in polynomial roots where we had an analytical formula for the roots of the polynomial, and we knew the uncertainties in the polynomial parameters. It would be inconvenient to try this for a cubic polynomial, although there are formulas for the roots. Closed-form formulas exist up to 4&lt;sup&gt;th&lt;/sup&gt; order, but by the Abel-Ruffini theorem there is no general algebraic formula for the roots of a 5&lt;sup&gt;th&lt;/sup&gt; order polynomial or higher. 
&lt;/p&gt;

&lt;p&gt;
Unfortunately, we cannot use the uncertainties package out of the box here.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
c, b, a = [-0.99526746, -0.011546,    1.00188999]
sc, sb, sa = [ 0.0249142,   0.00860025,  0.00510128]

A = u.ufloat((a, sa))
B = u.ufloat((b, sb))
C = u.ufloat((c, sc))

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.roots([A, B, C])
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Traceback (most recent call last):
  File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
  File "c:\Users\jkitchin\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\lib\polynomial.py", line 218, in roots
    p = p.astype(float)
  File "c:\Users\jkitchin\AppData\Local\Enthought\Canopy\User\lib\site-packages\uncertainties\__init__.py", line 1257, in raise_error
    % (self.__class__, coercion_type))
TypeError: can't convert an affine function (&amp;lt;class 'uncertainties.Variable'&amp;gt;) to float; use x.nominal_value
&lt;/pre&gt;

&lt;p&gt;
To make some progress, we have to understand how the &lt;a href="https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/polynomial.py#L149"&gt;numpy.roots&lt;/a&gt; function works. It constructs a &lt;a href="http://en.wikipedia.org/wiki/Companion_matrix"&gt;Companion matrix&lt;/a&gt;, and the eigenvalues of that matrix are the same as the roots of the polynomial.  
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

c0, c1, c2 = [-0.99526746, -0.011546,    1.00188999]

p = np.array([c2, c1, c0])
N = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(p)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# we construct the companion matrix like this&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# see https://github.com/numpy/numpy/blob/v1.7.0/numpy/lib/polynomial.py#L220&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# for this code.&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# build companion matrix and find its eigenvalues (the roots)&lt;/span&gt;
A = np.diag(np.ones((N-2,), p.dtype), -1)
A[0, :] = -p[1:] / p[0]

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; A

roots = np.linalg.eigvals(A)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; roots
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[[ 0.01152422  0.99338996]
 [ 1.          0.        ]]
[ 1.00246827 -0.99094405]
&lt;/pre&gt;

&lt;p&gt;
This definition of the companion matrix is a little different from the one &lt;a href="http://en.wikipedia.org/wiki/Companion_matrix"&gt;here&lt;/a&gt;, primarily in the scaling of the coefficients. That does not change the eigenvalues, and hence does not change the roots. 
&lt;/p&gt;
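&lt;p&gt;
As a quick check (with a hypothetical cubic), we can confirm that the eigenvalues of this scaled companion matrix agree with numpy.roots:
&lt;/p&gt;

```python
import numpy as np

# Hypothetical cubic: 2x^3 - 3x^2 - 11x + 6 = (x - 3)(2x - 1)(x + 2)
p = np.array([2.0, -3.0, -11.0, 6.0])

# Companion matrix normalized by the leading coefficient, as numpy.roots does
N = len(p)
A = np.diag(np.ones(N - 2), -1)
A[0, :] = -p[1:] / p[0]

eig_roots = np.sort(np.linalg.eigvals(A))
np_roots = np.sort(np.roots(p))

print(eig_roots)
print(np_roots)
```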

&lt;p&gt;
Now, we have a path to estimate the uncertainty in the roots. Since we know the polynomial coefficients and their uncertainties from the fit, we can use Monte Carlo sampling to estimate the uncertainty in the roots. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

c, b, a = [-0.99526746, -0.011546,    1.00188999]
sc, sb, sa = [ 0.0249142,   0.00860025,  0.00510128]

NSAMPLES = 100000
A = np.random.normal(a, sa, (NSAMPLES, ))
B = np.random.normal(b, sb, (NSAMPLES, ))
C = np.random.normal(c, sc, (NSAMPLES, ))

roots = [[] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; i &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;range&lt;/span&gt;(NSAMPLES)]

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; i &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;range&lt;/span&gt;(NSAMPLES):
    p = np.array([A[i], B[i], C[i]])
    N = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(p)
    
    M = np.diag(np.ones((N-2,), p.dtype), -1)
    M[0, :] = -p[1:] / p[0]
    r = np.linalg.eigvals(M)
    r.sort()  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# there is no telling what order the values come out in&lt;/span&gt;
    roots[i] = r
    
avg = np.average(roots, axis=0)
std = np.std(roots, axis=0)

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; r, s &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(avg, std):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'{0: f} +/- {1: f}'&lt;/span&gt;.format(r, s)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
-0.990949 +/-  0.013435
 1.002443 +/-  0.013462
&lt;/pre&gt;

&lt;p&gt;
Compared to our previous approach with the uncertainties package where we got:
&lt;/p&gt;

&lt;pre class="example"&gt;
: -0.990944048037+/-0.0134208013339
:  1.00246826738 +/-0.0134477390832
&lt;/pre&gt;

&lt;p&gt;
the agreement is quite good! The advantage of this approach is that we do not have to know the formula for the roots of higher order polynomials to estimate the uncertainty in the roots. The downside is we have to evaluate the eigenvalues of a matrix a large number of times to get good estimates of the uncertainty. For high-order polynomials this could be problematic. I do not currently see a way around this, unless it becomes possible to get the uncertainties package to propagate through the numpy.linalg.eigvals function.
&lt;/p&gt;

&lt;p&gt;
There are some other potential problems with this approach. It assumes that the accuracy of the eigenvalue solver is much better than the uncertainty in the polynomial parameters, so you have to use some judgment with these uncertainties. We are also approximating the uncertainties of a nonlinear problem; in other words, the uncertainties of the roots do not depend linearly on the uncertainties of the polynomial coefficients.  
&lt;/p&gt;
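&lt;p&gt;
For this particular fit the linear approximation happens to be very good. As a sanity check, using the coefficients and standard errors from the fit above, we can compare a linearized propagation (via a numerical gradient of the root formula) with the Monte Carlo spread:
&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(2)

# Coefficients and standard errors from the quadratic fit in this post
c, b, a = -0.99526746, -0.011546, 1.00188999
sc, sb, sa = 0.0249142, 0.00860025, 0.00510128

def root(a, b, c):
    # the positive branch of the quadratic formula
    return (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)

# Linearized propagation: central-difference gradient of the root
h = 1e-6
g = np.array([(root(a + h, b, c) - root(a - h, b, c)) / (2 * h),
              (root(a, b + h, c) - root(a, b - h, c)) / (2 * h),
              (root(a, b, c + h) - root(a, b, c - h)) / (2 * h)])
linear_std = np.sqrt(np.sum((g * np.array([sa, sb, sc])) ** 2))

# Monte Carlo propagation
N = 200000
mc_std = root(rng.normal(a, sa, N),
              rng.normal(b, sb, N),
              rng.normal(c, sc, N)).std()

print(linear_std, mc_std)  # both should be near 0.0134
```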

&lt;p&gt;
It is possible to &lt;a href="http://pythonhosted.org/uncertainties/user_guide.html#making-custom-functions-accept-numbers-with-uncertainties"&gt;wrap&lt;/a&gt; some functions with uncertainties, but so far only functions that return a single number. Here is an example of getting the n&lt;sup&gt;th&lt;/sup&gt; root and its uncertainty.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

@u.wrap
&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;f&lt;/span&gt;(n=0, *P):
    &lt;span style="color: #228b22;"&gt;''' compute the nth root of the polynomial P and the uncertainty of the root'''&lt;/span&gt;
    p =  np.array(P)
    N = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(p)
    
    M = np.diag(np.ones((N-2,), p.dtype), -1)
    M[0, :] = -p[1:] / p[0]
    r = np.linalg.eigvals(M)
    r.sort()  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# there is no telling what order the values come out in&lt;/span&gt;
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; r[n]

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# our polynomial coefficients and standard errors&lt;/span&gt;
c, b, a = [-0.99526746, -0.011546,    1.00188999]
sc, sb, sa = [ 0.0249142,   0.00860025,  0.00510128]

A = u.ufloat((a, sa))
B = u.ufloat((b, sb))
C = u.ufloat((c, sc))

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; result &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; [f(n, A, B, C) &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; n &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; [0, 1]]:
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; result
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
-0.990944048037+/-0.013420800377
1.00246826738+/-0.0134477388218
&lt;/pre&gt;

&lt;p&gt;
It is good to see this is the same result we got earlier, with &lt;i&gt;a lot less work&lt;/i&gt; (although we do have to solve it for each root, which is a bit redundant)! It is a bit more abstract though, and requires a specific formulation of the function for the wrapper to work.
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/07/06/Uncertainty-in-polynomial-roots---Part-II.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Uncertainty in polynomial roots</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/07/05/Uncertainty-in-polynomial-roots</link>
      <pubDate>Fri, 05 Jul 2013 09:10:09 EDT</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[uncertainty]]></category>
      <guid isPermaLink="false">6-Z6PdLBsxMl0CJ9rnMQKQEamfI=</guid>
      <description>Uncertainty in polynomial roots</description>
      <content:encoded><![CDATA[



&lt;p&gt;
Polynomials are convenient for fitting to data. Frequently we need to derive some properties of the data from the fit, e.g. the minimum value, or the slope, etc&amp;#x2026; Since we are fitting data, there is uncertainty in the polynomial parameters, and corresponding uncertainty in any properties derived from those parameters. 
&lt;/p&gt;

&lt;p&gt;
Here is our data.
&lt;/p&gt;

&lt;table id="data" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col class="right"/&gt;

&lt;col class="right"/&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="right"&gt;-3.00&lt;/td&gt;
&lt;td class="right"&gt;8.10&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;-2.33&lt;/td&gt;
&lt;td class="right"&gt;4.49&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;-1.67&lt;/td&gt;
&lt;td class="right"&gt;1.73&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;-1.00&lt;/td&gt;
&lt;td class="right"&gt;-0.02&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;-0.33&lt;/td&gt;
&lt;td class="right"&gt;-0.90&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;0.33&lt;/td&gt;
&lt;td class="right"&gt;-0.83&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1.00&lt;/td&gt;
&lt;td class="right"&gt;0.04&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1.67&lt;/td&gt;
&lt;td class="right"&gt;1.78&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;2.33&lt;/td&gt;
&lt;td class="right"&gt;4.43&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;3.00&lt;/td&gt;
&lt;td class="right"&gt;7.95&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

x = [a[0] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; a &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]
y = [a[1] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; a &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]
plt.plot(x, y)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/uncertain-roots.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="/img/./images/uncertain-roots.png"&gt;&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; pycse &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; regress

x = np.array([a[0] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; a &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])
y = [a[1] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; a &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]

A = np.column_stack([x**0, x**1, x**2])

p, pint, se = regress(A, y, alpha=0.05)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; p

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; pint

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; se

plt.plot(x, y, &lt;span style="color: #228b22;"&gt;'bo '&lt;/span&gt;)

xfit = np.linspace(x.min(), x.max())
plt.plot(xfit, np.dot(np.column_stack([xfit**0, xfit**1, xfit**2]), p), &lt;span style="color: #228b22;"&gt;'b-'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/uncertain-roots-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[-0.99526746 -0.011546    1.00188999]
[[-1.05418017 -0.93635474]
 [-0.03188236  0.00879037]
 [ 0.98982737  1.01395261]]
[ 0.0249142   0.00860025  0.00510128]
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/uncertain-roots-1.png"&gt;&lt;/p&gt;

&lt;p&gt;
Since this is a quadratic equation, we know the roots analytically: \(x = \frac{-b \pm \sqrt{b^2 - 4 a c}}{2 a}\). So, we can use the uncertainties package to directly compute the uncertainties in the roots. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

c, b, a = [-0.99526746, -0.011546,    1.00188999]
sc, sb, sa = [ 0.0249142,   0.00860025,  0.00510128]

A = u.ufloat((a, sa))
B = u.ufloat((b, sb))
C = u.ufloat((c, sc))

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# np.sqrt does not work with uncertainity&lt;/span&gt;
r1 = (-B + (B**2 - 4 * A * C)**0.5) / (2 * A)
r2 = (-B - (B**2 - 4 * A * C)**0.5) / (2 * A)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; r1
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; r2
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.00246826738+/-0.0134477390832
-0.990944048037+/-0.0134208013339
&lt;/pre&gt;

&lt;p&gt;
The minimum is also straightforward to analyze here. The derivative of the polynomial is \(2 a x + b\) and it is equal to zero at \(x = -b / (2 a)\).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

c, b, a = [-0.99526746, -0.011546,    1.00188999]
sc, sb, sa = [ 0.0249142,   0.00860025,  0.00510128]

A = u.ufloat((a, sa))
B = u.ufloat((b, sb))

zero = -B / (2 * A)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'The minimum is at {0}.'&lt;/span&gt;.format(zero)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
The minimum is at 0.00576210967034+/-0.00429211341136.
&lt;/pre&gt;

&lt;p&gt;
You can see there is uncertainty in both the roots of the original polynomial, as well as the minimum of the data. The approach here worked well because the polynomials were low order (quadratic or linear) where we know the formulas for the roots. Consequently, we can take advantage of the uncertainties module with little effort to propagate the errors. For higher order polynomials, we would probably have to do some wrapping of functions to propagate uncertainties.
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/07/05/Uncertainty-in-polynomial-roots.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Estimating where two functions intersect using data</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/07/04/Estimating-where-two-functions-intersect-using-data</link>
      <pubDate>Thu, 04 Jul 2013 14:38:07 EDT</pubDate>
      <category><![CDATA[data analysis]]></category>
      <guid isPermaLink="false">JMMs3AViG7GstYqrPY7Mux6TAwM=</guid>
      <description>Estimating where two functions intersect using data</description>
      <content:encoded><![CDATA[



&lt;p&gt;
Suppose we have two functions described by this data:
&lt;/p&gt;

&lt;table id="data" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"&gt;


&lt;colgroup&gt;
&lt;col class="right"/&gt;

&lt;col class="right"/&gt;

&lt;col class="right"/&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope="col" class="right"&gt;T(K)&lt;/th&gt;
&lt;th scope="col" class="right"&gt;E1&lt;/th&gt;
&lt;th scope="col" class="right"&gt;E2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class="right"&gt;300&lt;/td&gt;
&lt;td class="right"&gt;-208&lt;/td&gt;
&lt;td class="right"&gt;-218&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;400&lt;/td&gt;
&lt;td class="right"&gt;-212&lt;/td&gt;
&lt;td class="right"&gt;-221&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;500&lt;/td&gt;
&lt;td class="right"&gt;-215&lt;/td&gt;
&lt;td class="right"&gt;-220&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;600&lt;/td&gt;
&lt;td class="right"&gt;-218&lt;/td&gt;
&lt;td class="right"&gt;-222&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;700&lt;/td&gt;
&lt;td class="right"&gt;-220&lt;/td&gt;
&lt;td class="right"&gt;-222&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;800&lt;/td&gt;
&lt;td class="right"&gt;-223&lt;/td&gt;
&lt;td class="right"&gt;-224&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;900&lt;/td&gt;
&lt;td class="right"&gt;-227&lt;/td&gt;
&lt;td class="right"&gt;-225&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1000&lt;/td&gt;
&lt;td class="right"&gt;-229&lt;/td&gt;
&lt;td class="right"&gt;-227&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1100&lt;/td&gt;
&lt;td class="right"&gt;-233&lt;/td&gt;
&lt;td class="right"&gt;-228&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1200&lt;/td&gt;
&lt;td class="right"&gt;-235&lt;/td&gt;
&lt;td class="right"&gt;-227&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class="right"&gt;1300&lt;/td&gt;
&lt;td class="right"&gt;-240&lt;/td&gt;
&lt;td class="right"&gt;-229&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
We want to determine the temperature at which they intersect, and more importantly what the uncertainty on the intersection is. There is noise in the data, which means there is uncertainty in any function that could be fit to it, and that uncertainty would propagate to the intersection. Let us examine the data.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

T = [x[0] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]
E1 = [x[1] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]
E2 = [x[2] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data]

plt.plot(T, E1, T, E2)
plt.legend([&lt;span style="color: #228b22;"&gt;'E1'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'E2'&lt;/span&gt;])
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/intersection-0.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="/img/./images/intersection-0.png"&gt;&lt;/p&gt;

&lt;p&gt;
Our strategy is going to be to fit functions to each data set, and get the confidence intervals on the parameters of the fit. Then, we will solve the equations to find where they are equal to each other and propagate the uncertainties in the parameters to the answer.
&lt;/p&gt;

&lt;p&gt;
These functions look approximately linear, so we will fit lines to each function. We use the regress function in pycse to get the uncertainties on the fits. Then, we use the uncertainties package to propagate the uncertainties in the analytical solution to the intersection of two lines.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; pycse &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; regress
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

T = np.array([x[0] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])
E1 = np.array([x[1] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])
E2 = np.array([x[2] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;columns of the x-values for a line: constant, T&lt;/span&gt;
A = np.column_stack([T**0, T])

p1, pint1, se1 = regress(A, E1, alpha=0.05)

p2, pint2, se2 = regress(A, E2, alpha=0.05)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Now we have two lines: y1 = m1*T + b1 and y2 = m2*T + b2&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;they intersect at m1*T + b1 = m2*T + b2&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;or at T = (b2 - b1) / (m1 - m2)&lt;/span&gt;
b1 = u.ufloat((p1[0], se1[0]))
m1 = u.ufloat((p1[1], se1[1]))

b2 = u.ufloat((p2[0], se2[0]))
m2 = u.ufloat((p2[1], se2[1]))

T_intersection = (b2 - b1) / (m1 - m2)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; T_intersection

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;plot the data, the fits and the intersection and \pm 2 \sigma.&lt;/span&gt;
plt.plot(T, E1, &lt;span style="color: #228b22;"&gt;'bo '&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'E1'&lt;/span&gt;)
plt.plot(T, np.dot(A,p1), &lt;span style="color: #228b22;"&gt;'b-'&lt;/span&gt;)
plt.plot(T, E2, &lt;span style="color: #228b22;"&gt;'ro '&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'E2'&lt;/span&gt;)
plt.plot(T, np.dot(A,p2), &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;)

plt.plot(T_intersection.nominal_value,
        (b1 + m1*T_intersection).nominal_value, &lt;span style="color: #228b22;"&gt;'go'&lt;/span&gt;,
        ms=13, alpha=0.2, label=&lt;span style="color: #228b22;"&gt;'Intersection'&lt;/span&gt;)
plt.plot([T_intersection.nominal_value - 2*T_intersection.std_dev(),
          T_intersection.nominal_value + 2*T_intersection.std_dev()],
         [(b1 + m1*T_intersection).nominal_value, 
          (b1 + m1*T_intersection).nominal_value],
         &lt;span style="color: #228b22;"&gt;'g-'&lt;/span&gt;, lw=3, label=&lt;span style="color: #228b22;"&gt;'$\pm 2 \sigma$'&lt;/span&gt;)
       
plt.legend(loc=&lt;span style="color: #228b22;"&gt;'best'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/intersection-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
813.698630137+/-62.407180552
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/intersection-1.png"&gt;&lt;/p&gt;

&lt;p&gt;
You can see there is a substantial uncertainty in the temperature, at approximately the 95% confidence level (&amp;plusmn; 2 &amp;sigma;).
&lt;/p&gt;


&lt;p&gt;
&lt;span class="underline"&gt;Update 7-7-2013&lt;/span&gt;
&lt;/p&gt;

&lt;p&gt;
After a suggestion from Prateek, here we subtract the two data sets, fit a line to that data, and then use fsolve to find the zero. We &lt;a href="http://pythonhosted.org/uncertainties/user_guide.html#making-custom-functions-accept-numbers-with-uncertainties"&gt;wrap&lt;/a&gt; fsolve in the uncertainties package to directly get the uncertainty on the root. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; pycse &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; regress
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve


T = np.array([x[0] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])
E1 = np.array([x[1] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])
E2 = np.array([x[2] &lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; x &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; data])

E = E1 - E2

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;columns of the x-values for a line: constant, T&lt;/span&gt;
A = np.column_stack([T**0, T])

p, pint, se = regress(A, E, alpha=0.05)

b = u.ufloat((p[0], se[0]))
m = u.ufloat((p[1], se[1]))

@u.wrap
&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;f&lt;/span&gt;(b, m):
    X, = fsolve(&lt;span style="color: #8b0000;"&gt;lambda&lt;/span&gt; x: b + m * x, 800)
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; X

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; f(b, m)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
813.698630137+/-54.0386903923
&lt;/pre&gt;

&lt;p&gt;
Interestingly, this uncertainty is a little smaller than the one computed previously. Note that we have to wrap the function in a particular way: it must return a single float, and take arguments that carry uncertainty. We define the model (a line in this case) in a lambda function inside the wrapped function.
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/07/04/Estimating-where-two-functions-intersect-using-data.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Constrained fits to data</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/06/11/Constrained-fits-to-data</link>
      <pubDate>Tue, 11 Jun 2013 19:39:59 EDT</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[optimization]]></category>
      <guid isPermaLink="false">tkPJI7DfYBi--ElDtj1PvwvMk08=</guid>
      <description>Constrained fits to data</description>
      <content:encoded><![CDATA[


&lt;p&gt;
Our objective here is to fit a quadratic function, in the least squares sense, to some data, but we want to constrain the fit so that the function has specific values at the end-points. The application is fitting a function to the lattice constant of an alloy at different compositions. We constrain the fit because we know the lattice constants of the pure metals, which are the end-points of the fit, and we want these to be correct. 
&lt;/p&gt;

&lt;p&gt;
We define the alloy composition in terms of the mole fraction of one species, e.g. \(A_xB_{1-x}\). For \(x=0\), the alloy is pure B, whereas for \(x=1\) the alloy is pure A. According to Vegard's law the lattice constant is a linear composition weighted average of the pure component lattice constants, but sometimes small deviations are observed. Here we will fit a quadratic function that is constrained to give the pure metal component lattice constants at the end points. 
&lt;/p&gt;

&lt;p&gt;
The quadratic function is \(y = a x^2 + b x + c\). One constraint is at \(x=0\), where \(y = c\), so \(c\) is the lattice constant of pure B. The second constraint is at \(x=1\), where \(y = a + b + c\), which must equal the lattice constant of pure A. Thus, there is only one degree of freedom: \(c = LC_B\) and \(b = LC_A - c - a\), so \(a\) is our only variable.
&lt;/p&gt;
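&lt;p&gt;
A quick check (a minimal sketch using the same lattice constants as the code below) confirms that the end-point constraints hold for any value of \(a\):
&lt;/p&gt;

```python
def func(a, x, LC_A=3.6, LC_B=3.9):
    # c and b are fixed by the end-point constraints
    c = LC_B
    b = LC_A - c - a
    return a * x**2 + b * x + c

# the end-points are reproduced exactly for any choice of a
for a in (-1.0, 0.0, 0.5):
    print(func(a, 0.0), func(a, 1.0))
```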

&lt;p&gt;
We will solve this problem by minimizing the summed squared error between the fit and the data. We use the &lt;code&gt;fmin&lt;/code&gt; function in &lt;code&gt;scipy.optimize&lt;/code&gt;. First we create a fit function that encodes the constraints. Then we create an objective function that will be minimized. We have to make a guess about the value of \(a\) that minimizes the summed squared error. A line fits the data moderately well, so we guess a small value, i.e. near zero, for \(a\). Here is the solution.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Data to fit to&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;x=0 is pure B&lt;/span&gt;
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;x=1 is pure A&lt;/span&gt;
X = np.array([0.0, 0.1,  0.25, 0.5,  0.6,  0.8,  1.0])
Y = np.array([3.9, 3.89, 3.87, 3.78, 3.75, 3.69, 3.6])

&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;func&lt;/span&gt;(a, XX):
    LC_A = 3.6
    LC_B = 3.9

    c = LC_B
    b = LC_A - c - a

    yfit = a * XX**2 + b * XX + c
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; yfit

&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;objective&lt;/span&gt;(a):
    &lt;span style="color: #228b22;"&gt;'function to minimize'&lt;/span&gt;
    SSE = np.sum((Y - func(a, X))**2)
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; SSE


&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fmin

a_fit = fmin(objective, 0)
plt.plot(X, Y, &lt;span style="color: #228b22;"&gt;'bo '&lt;/span&gt;)

x = np.linspace(0, 1)
plt.plot(x, func(a_fit, x))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/constrained-quadratic-fit.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Optimization terminated successfully.
         Current function value: 0.000445
         Iterations: 19
         Function evaluations: 38
&lt;/pre&gt;

&lt;p&gt;
Here is the result:
&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/./images/constrained-quadratic-fit.png"&gt;&lt;/p&gt;

&lt;p&gt;
You can see that the fit goes through the end-points as prescribed. 
&lt;/p&gt;
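&lt;p&gt;
Because the constraints leave the model linear in the single parameter \(a\), the same fit can also be obtained in closed form, without an iterative optimizer. Here is a minimal sketch with the same data; it should agree with the fmin result:
&lt;/p&gt;

```python
import numpy as np

X = np.array([0.0, 0.1, 0.25, 0.5, 0.6, 0.8, 1.0])
Y = np.array([3.9, 3.89, 3.87, 3.78, 3.75, 3.69, 3.6])
LC_A, LC_B = 3.6, 3.9

# with c = LC_B and b = LC_A - c - a, the model rearranges to
# y = a*(x**2 - x) + (LC_A - c)*x + c, which is linear in a
g = X**2 - X
d = Y - LC_B - (LC_A - LC_B) * X

# least-squares solution for a single linear parameter
a = (g @ d) / (g @ g)
SSE = np.sum((d - a * g)**2)
print(a, SSE)
```

The summed squared error matches the value reported by fmin above.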
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/06/11/Constrained-fits-to-data.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Peak finding in Raman spectroscopy</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/27/Peak-finding-in-Raman-spectroscopy</link>
      <pubDate>Wed, 27 Feb 2013 10:55:57 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <guid isPermaLink="false">NK2O2N0NBPSKTUWhZZVAXTLPj0k=</guid>
      <description>Peak finding in Raman spectroscopy</description>
      <content:encoded><![CDATA[


&lt;div id="table-of-contents"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;div id="text-table-of-contents"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#sec-1"&gt;1. Summary notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;
Raman spectroscopy is a vibrational spectroscopy. The data typically come as discrete intensity vs. wavenumber measurements. Sometimes it is necessary to identify the precise location of a peak. In this post, we will use spline smoothing to construct a smooth interpolating function of the data, and then use the roots of its first derivative to identify peak positions.
&lt;/p&gt;

&lt;p&gt;
This example was originally worked out in Matlab at &lt;a href="http://matlab.cheme.cmu.edu/2012/08/27/peak-finding-in-raman-spectroscopy/"&gt;http://matlab.cheme.cmu.edu/2012/08/27/peak-finding-in-raman-spectroscopy/&lt;/a&gt; 
&lt;/p&gt;


&lt;p&gt;
Let us take a look at the raw data.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;w&lt;/span&gt;, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;i&lt;/span&gt; = np.loadtxt(&lt;span style="color: #228b22;"&gt;'data/raman.txt'&lt;/span&gt;, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;usecols&lt;/span&gt;=(&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;0&lt;/span&gt;, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;1&lt;/span&gt;), &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;unpack&lt;/span&gt;=&lt;span style="color: #8b0000;"&gt;True&lt;/span&gt;)

plt.plot(w, i)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Raman shift (cm$^{-1}$)'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Intensity (counts)'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/raman-1.png'&lt;/span&gt;)
plt.show()
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x10b1d3190&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x10b1b1b10&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x10bc7f310&amp;gt;
&lt;/pre&gt;


&lt;div class="figure"&gt;
&lt;p&gt;&lt;img src="/media/2013-02-27-Peak-finding-in-Raman-spectroscopy/raman-1.png"&gt; 
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
The next thing to do is narrow our focus to the region of interest, between 1340 cm^{-1} and 1360 cm^{-1}.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;ind&lt;/span&gt; = (w &amp;gt; &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;1340&lt;/span&gt;) &amp;amp; (w &amp;lt; &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;1360&lt;/span&gt;)
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;w1&lt;/span&gt; = w[ind]
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;i1&lt;/span&gt; = i[ind]

plt.plot(w1, i1, &lt;span style="color: #228b22;"&gt;'b. '&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Raman shift (cm$^{-1}$)'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Intensity (counts)'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/raman-2.png'&lt;/span&gt;)
plt.show()
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x10bc7a4d0&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x10bc08090&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x10bc49710&amp;gt;
&lt;/pre&gt;


&lt;div class="figure"&gt;
&lt;p&gt;&lt;img src="/media/2013-02-27-Peak-finding-in-Raman-spectroscopy/raman-2.png"&gt; 
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Next we consider a scipy.interpolate.UnivariateSpline. This function "smooths" the data.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.interpolate &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; UnivariateSpline

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;s is a "smoothing" factor&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;sp&lt;/span&gt; = UnivariateSpline(w1, i1, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;k&lt;/span&gt;=&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;4&lt;/span&gt;, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;s&lt;/span&gt;=&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;2000&lt;/span&gt;)

plt.plot(w1, i1, &lt;span style="color: #228b22;"&gt;'b. '&lt;/span&gt;)
plt.plot(w1, sp(w1), &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Raman shift (cm$^{-1}$)'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Intensity (counts)'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/raman-3.png'&lt;/span&gt;)
plt.show()
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; ... &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x1105633d0&amp;gt;]
[&amp;lt;matplotlib.lines.Line2D object at 0x10dd70250&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x10dd65f10&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x1105409d0&amp;gt;
&lt;/pre&gt;


&lt;div class="figure"&gt;
&lt;p&gt;&lt;img src="/media/2013-02-27-Peak-finding-in-Raman-spectroscopy/raman-3.png"&gt; 
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Note that the UnivariateSpline function returns a "callable" function! Our next goal is to find the places where there are peaks, which is where the first derivative of the data is equal to zero. It is easy to get the first derivative of a UnivariateSpline with its derivative method, as shown below.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;# &lt;span style="color: #ff0000; font-weight: bold;"&gt;get the first derivative evaluated at all the points&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;d1s&lt;/span&gt; = sp.derivative()

&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;d1&lt;/span&gt; = d1s(w1)

# &lt;span style="color: #ff0000; font-weight: bold;"&gt;the roots of the first derivative correspond to minima and maxima&lt;/span&gt;
&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;minmax&lt;/span&gt; = d1s.roots()
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(&lt;span style="color: #228b22;"&gt;'Roots = {}'&lt;/span&gt;.format(minmax))

plt.clf()
plt.plot(w1, d1, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;label&lt;/span&gt;=&lt;span style="color: #228b22;"&gt;'first derivative'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Raman shift (cm$^{-1}$)'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'First derivative'&lt;/span&gt;)
plt.grid()

plt.plot(minmax, d1s(minmax), &lt;span style="color: #228b22;"&gt;'ro '&lt;/span&gt;, &lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;label&lt;/span&gt;=&lt;span style="color: #228b22;"&gt;'zeros'&lt;/span&gt;)
plt.legend(&lt;span style="color: #000000; background-color: #cccccc; font-weight: bold;"&gt;loc&lt;/span&gt;=&lt;span style="color: #228b22;"&gt;'best'&lt;/span&gt;)

plt.plot(w1, i1, &lt;span style="color: #228b22;"&gt;'b. '&lt;/span&gt;)
plt.plot(w1, sp(w1), &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Raman shift (cm$^{-1}$)'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Intensity (counts)'&lt;/span&gt;)
plt.plot(minmax, sp(minmax), &lt;span style="color: #228b22;"&gt;'ro '&lt;/span&gt;)

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/raman-4.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Roots = [ 1346.4623087   1347.42700893  1348.16689639]
&lt;/pre&gt;


&lt;div class="figure"&gt;
&lt;p&gt;&lt;img src="/media/2013-02-27-Peak-finding-in-Raman-spectroscopy/raman-4.png"&gt; 
&lt;/p&gt;
&lt;/div&gt;



&lt;p&gt;
In the end, we have illustrated how to construct a spline smoothing interpolation function and to find maxima in the function, including generating some initial guesses. There is more art to this than you might like, since you have to judge how much smoothing is enough or too much. With too much, you may smooth peaks out. With too little, noise may be mistaken for peaks.
&lt;/p&gt;
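&lt;p&gt;
The sensitivity to the smoothing factor is easy to explore on synthetic data, where the true peak position is known. This sketch uses a made-up Gaussian peak (not the Raman data above) and recovers its position from the roots of the spline derivative:
&lt;/p&gt;

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# synthetic peak near 1350 with noise (made-up data for illustration)
rng = np.random.default_rng(0)
w = np.linspace(1340, 1360, 200)
i = 100 * np.exp(-(w - 1350)**2 / 4) + rng.normal(0, 2, w.size)

# k=4 so the derivative is a cubic spline, for which roots() is available;
# s is chosen near (number of points) * (noise variance)
sp = UnivariateSpline(w, i, k=4, s=w.size * 4)
extrema = sp.derivative().roots()
peak = extrema[np.argmax(sp(extrema))]
print(peak)
```

With too small an s the derivative has many spurious roots; with too large an s the peak broadens and its apparent position can shift.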

&lt;div id="outline-container-sec-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Summary notes&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
Using org-mode with :session allows a large script to be broken up into mini sections. However, it only seems to work with the default python mode in Emacs, and it does not work with emacs-for-python or the latest python-mode. I also do not really like the output style, e.g. the output from the plotting commands.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2014 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/27/Peak-finding-in-Raman-spectroscopy.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Org-mode version = 8.2.7c&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Graphical methods to help get initial guesses for multivariate nonlinear regression</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Graphical-methods-to-help-get-initial-guesses-for-multivariate-nonlinear-regression</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[plotting]]></category>
      <guid isPermaLink="false">hc41rt6G_ccSfgMQLPZz_EZhnt0=</guid>
      <description>Graphical methods to help get initial guesses for multivariate nonlinear regression</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/10/09/graphical-methods-to-help-get-initial-guesses-for-multivariate-nonlinear-regression/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Fit the model \(f(x_1, x_2; a, b) = a x_1 + x_2^b\) to the data given below. This model has two independent variables and two parameters.
&lt;/p&gt;

&lt;p&gt;
We want to do a nonlinear fit to find a and b that minimize the summed squared errors between the model predictions and the data. With only two parameters, we can graph how the summed squared error varies with them, which may help us get initial guesses. Let us assume the parameters lie in a range; here we choose 0 to 5. In other problems you would adjust this as needed.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; mpl_toolkits.mplot3d &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; Axes3D
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [0.2, 0.4, 0.8, 0.9, 1.1, 2.1]
X = np.column_stack([x1, x2]) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# independent variables&lt;/span&gt;

f = [ 3.3079,    6.6358,   10.3143,   13.6492,   17.2755,   23.6271]

fig = plt.figure()
ax = fig.gca(projection = &lt;span style="color: #228b22;"&gt;'3d'&lt;/span&gt;)

ax.plot(x1, x2, f)
ax.set_xlabel(&lt;span style="color: #228b22;"&gt;'x1'&lt;/span&gt;)
ax.set_ylabel(&lt;span style="color: #228b22;"&gt;'x2'&lt;/span&gt;)
ax.set_zlabel(&lt;span style="color: #228b22;"&gt;'f(x1,x2)'&lt;/span&gt;)

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/graphical-mulvar-1.png'&lt;/span&gt;)


arange = np.linspace(0,5);
brange = np.linspace(0,5);

A,B = np.meshgrid(arange, brange)

&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;model&lt;/span&gt;(X, a, b):
    &lt;span style="color: #228b22;"&gt;'Nested function for the model'&lt;/span&gt;
    x1 = X[:, 0]
    x2 = X[:, 1]
    
    f = a * x1 + x2**b
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; f

&lt;span style="color: #8b0000;"&gt;@np&lt;/span&gt;.vectorize
&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;errfunc&lt;/span&gt;(a, b):
    &lt;span style="color: #ff0000; font-weight: bold;"&gt;# function for the summed squared error&lt;/span&gt;
    fit = model(X, a, b)
    sse = np.sum((fit - f)**2)
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; sse

SSE = errfunc(A, B)

plt.clf()
plt.contourf(A, B, SSE, 50)
plt.plot([3.2], [2.1], &lt;span style="color: #228b22;"&gt;'ro'&lt;/span&gt;)
plt.text(3.4, 2.2, &lt;span style="color: #228b22;"&gt;'Minimum near here'&lt;/span&gt;, color=&lt;span style="color: #228b22;"&gt;'r'&lt;/span&gt;)

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/graphical-mulvar-2.png'&lt;/span&gt;)

guesses = [3.18, 2.02]

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; curve_fit

popt, pcov = curve_fit(model, X, f, guesses)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; popt

plt.plot([popt[0]], [popt[1]], &lt;span style="color: #228b22;"&gt;'r*'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/graphical-mulvar-3.png'&lt;/span&gt;)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; model(X, *popt)

fig = plt.figure()
ax = fig.gca(projection = &lt;span style="color: #228b22;"&gt;'3d'&lt;/span&gt;)

ax.plot(x1, x2, f, &lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'data'&lt;/span&gt;)
ax.plot(x1, x2, model(X, *popt), &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'fit'&lt;/span&gt;)
ax.set_xlabel(&lt;span style="color: #228b22;"&gt;'x1'&lt;/span&gt;)
ax.set_ylabel(&lt;span style="color: #228b22;"&gt;'x2'&lt;/span&gt;)
ax.set_zlabel(&lt;span style="color: #228b22;"&gt;'f(x1,x2)'&lt;/span&gt;)

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/graphical-mulvar-4.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[ 3.21694798  1.9728254 ]
[  3.25873623   6.59792994  10.29473657  13.68011436  17.29161001
  23.62366445]
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/graphical-mulvar-1.png"&gt;&lt;p&gt;

&lt;p&gt;&lt;img src="/img/./images/graphical-mulvar-2.png"&gt;&lt;p&gt;

&lt;p&gt;&lt;img src="/img/./images/graphical-mulvar-3.png"&gt;&lt;p&gt;

&lt;p&gt;&lt;img src="/img/./images/graphical-mulvar-4.png"&gt;&lt;p&gt;

&lt;p&gt;
It can be difficult to figure out initial guesses for nonlinear fitting problems. For one and two dimensional systems, graphical techniques may be useful to visualize how the summed squared error between the model and data depends on the parameters.
&lt;/p&gt;
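&lt;p&gt;
Rather than reading the guess off the contour plot by eye, you can also take the grid point with the smallest summed squared error as the initial guess directly. A minimal sketch with the same data and model:
&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import curve_fit

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.2, 0.4, 0.8, 0.9, 1.1, 2.1])
X = np.column_stack([x1, x2])
f = np.array([3.3079, 6.6358, 10.3143, 13.6492, 17.2755, 23.6271])

def model(X, a, b):
    return a * X[:, 0] + X[:, 1]**b

# summed squared error on a 101 x 101 grid; rows index b, columns index a
arange = np.linspace(0, 5, 101)
brange = np.linspace(0, 5, 101)
SSE = np.array([[np.sum((model(X, a, b) - f)**2) for a in arange]
                for b in brange])

# the grid minimum is the initial guess for the nonlinear fit
ib, ia = np.unravel_index(np.argmin(SSE), SSE.shape)
guess = [arange[ia], brange[ib]]

popt, pcov = curve_fit(model, X, f, p0=guess)
print(guess, popt)
```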
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Graphical-methods-to-help-get-initial-guesses-for-multivariate-nonlinear-regression.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Model selection</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Model-selection</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">rva6twbQxMfmVwYOCZ2cgugbYrw=</guid>
      <description>Model selection</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/10/01/model-selection/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
adapted from &lt;a href="http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm" &gt;http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
In this example, we show some ways to choose which of several models fit data the best. We have data for the total pressure and temperature of a fixed amount of a gas in a tank that was measured over the course of several days. We want to select a model that relates the pressure to the gas temperature.
&lt;/p&gt;

&lt;p&gt;
The data is stored in a text file download PT.txt , with the following structure:
&lt;/p&gt;

&lt;pre class="example"&gt;
Run          Ambient                            Fitted
 Order  Day  Temperature  Temperature  Pressure    Value    Residual
  1      1      23.820      54.749      225.066   222.920     2.146
...
&lt;/pre&gt;

&lt;p&gt;
We need to read the data in, and perform a regression analysis on P vs. T. In python we start counting at 0, so we actually want columns 3 and 4.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

data = np.loadtxt(&lt;span style="color: #228b22;"&gt;'data/PT.txt'&lt;/span&gt;, skiprows=2)
T = data[:, 3]
P = data[:, 4]

plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x00000000084398D0&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x000000000841F6A0&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x0000000008423DD8&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-1.png"&gt;&lt;p&gt;

&lt;p&gt;
It appears the data is roughly linear, and we know from the ideal gas law that PV = nRT, or P = nR/V*T, which says P should be linearly correlated with T. Note that the temperature data is in degC, not in K, so it is not expected that P=0 at T = 0. We will use linear algebra to compute the line coefficients. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T**0, T]).T
b = P

x, res, rank, s = np.linalg.lstsq(A, b)
intercept, slope = x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'b, m ='&lt;/span&gt;, intercept, slope

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2., n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'CI = '&lt;/span&gt;,CI
&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
b, m = 7.74899739238 3.93014043824
CI =  [ 4.76511545  0.1026405 ]
[2.98388194638 12.5141128384]
[3.82749994079 4.03278093569]
&lt;/pre&gt;

&lt;p&gt;
The confidence interval on the intercept is large, but it does not contain zero at the 95% confidence level.
&lt;/p&gt;

&lt;p&gt;
The R^2 value roughly measures the fraction of the variation in the data that is described by the model. Hence, a value close to one means the model describes nearly all of the variation, apart from random noise.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A, x))**2)
R2 = 1 - SSerr/SStot
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; R2
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.993715411798
&lt;/pre&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;plt.figure(); plt.clf()
plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;, T, np.dot(A, x), &lt;span style="color: #228b22;"&gt;'b-'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.title(&lt;span style="color: #228b22;"&gt;'R^2 = {0:1.3f}'&lt;/span&gt;.format(R2))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-2.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;lt;matplotlib.figure.Figure object at 0x0000000008423860&amp;gt;
[&amp;lt;matplotlib.lines.Line2D object at 0x00000000085BE780&amp;gt;, &amp;lt;matplotlib.lines.Line2D object at 0x00000000085BE940&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x0000000008449898&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x000000000844CCF8&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x000000000844ED30&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-2.png"&gt;&lt;p&gt;

&lt;p&gt;
The fit looks good, and R^2 is near one, but is it a good model? There are a few ways to examine this. We want to make sure that there are no systematic trends in the errors between the fit and the data, and we want to make sure there are not hidden correlations with other variables. The residuals are the error between the fit and the data. The residuals should not show any patterns when plotted against any variables, and they do not in this case.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;residuals = P - np.dot(A, x)

plt.figure()

f, (ax1, ax2, ax3) = plt.subplots(3)

ax1.plot(T,residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
ax1.set_xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)


run_order = data[:, 0]
ax2.plot(run_order, residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
ax2.set_xlabel(&lt;span style="color: #228b22;"&gt;'run order'&lt;/span&gt;)

ambientT = data[:, 2]
ax3.plot(ambientT, residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
ax3.set_xlabel(&lt;span style="color: #228b22;"&gt;'ambient temperature'&lt;/span&gt;)

plt.tight_layout() &lt;span style="color: #ff0000; font-weight: bold;"&gt;# make sure plots do not overlap&lt;/span&gt;

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-3.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;lt;matplotlib.figure.Figure object at 0x00000000085C21D0&amp;gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x0000000008861CC0&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x00000000085D3A58&amp;gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x0000000008861E80&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x00000000085EC5F8&amp;gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x0000000008861C88&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x0000000008846828&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-3.png"&gt;&lt;p&gt;

&lt;p&gt;
There may be some correlation of the residuals with the run order, which could indicate an experimental source of error.
&lt;/p&gt;

&lt;p&gt;
We assume all the errors are uncorrelated with each other. We can use a lag plot to assess this, where we plot residual[i] vs residual[i-1], i.e. we look for correlations between adjacent residuals. This plot should look random, with no correlations if the model is good.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;plt.figure(); plt.clf()
plt.plot(residuals[1:], residuals[:-1],&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'residual[i]'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'residual[i-1]'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-correlated-residuals.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;lt;matplotlib.figure.Figure object at 0x000000000886EB00&amp;gt;
[&amp;lt;matplotlib.lines.Line2D object at 0x0000000008A02908&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x00000000089E8198&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x00000000089EB908&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-correlated-residuals.png"&gt;&lt;p&gt;

&lt;p&gt;
It is hard to argue there is any correlation here. 
&lt;/p&gt;
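&lt;p&gt;
The visual check can be backed up with a number. Here is a minimal sketch (not from the original post, and using synthetic stand-in residuals rather than the fit residuals above) that quantifies the lag-1 autocorrelation with np.corrcoef; values near zero support the conclusion that adjacent residuals are uncorrelated. It assumes modern NumPy and Python 3.
&lt;/p&gt;

```python
import numpy as np

# Synthetic stand-in residuals; in the post these come from the linear fit.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 0.1, size=20)

# Correlation coefficient between residual[i] and residual[i-1]
# over all adjacent pairs.
r_lag1 = np.corrcoef(residuals[1:], residuals[:-1])[0, 1]
print(r_lag1)  # expected near zero when adjacent residuals are uncorrelated
```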

&lt;p&gt;
Let's consider a quadratic model instead.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T**0, T, T**2]).T
b = P

x, res, rank, s = np.linalg.lstsq(A, b)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; x

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2., n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'CI = '&lt;/span&gt;,CI
&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)


ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A,x))**2)
R2 = 1 - SSerr/SStot
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'R^2 = {0}'&lt;/span&gt;.format(R2)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [  9.00353031e+00   3.86669879e+00   7.26244301e-04]
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; CI =  [  1.38030344e+01   6.62100654e-01   7.48516727e-03]
... ... [-4.79950412123 22.8065647329]
[3.20459813681 4.52879944409]
[-0.00675892296907 0.00821141157035]
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; R^2 = 0.993721969407
&lt;/pre&gt;

&lt;p&gt;
You can see that the confidence intervals on the constant and the T^2 term both include zero. That is a good indication these parameters are not significant. The R^2 value is also essentially no better than the one from the linear fit, so the additional parameter does not improve the goodness of fit. This is an example of overfitting the data. Since the constant in this model is apparently not significant, let us consider the simplest model, one with the intercept fixed at zero.
&lt;/p&gt;
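&lt;p&gt;
To see how an insignificant parameter shows up, here is a self-contained sketch (synthetic data, not the data set used above; modern NumPy and Python 3 assumed) that fits a quadratic to data generated from a straight line, using the same confidence-interval recipe as above. The interval on the quadratic coefficient typically straddles zero.
&lt;/p&gt;

```python
import numpy as np
from scipy.stats.distributions import t

# Synthetic, truly linear data standing in for the post's P-T data.
rng = np.random.default_rng(1)
T = np.linspace(20, 50, 8)
P = 4.0 * T + 10.0 + rng.normal(0, 0.5, T.size)

A = np.vstack([T**0, T, T**2]).T
x, res, rank, sv = np.linalg.lstsq(A, P, rcond=None)

n, k = len(P), len(x)
sigma2 = np.sum((P - A @ x)**2) / (n - k)        # estimated error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
CI = t.ppf(1 - 0.05 / 2.0, n - k) * se           # 95% half-widths

lo, hi = x[2] - CI[2], x[2] + CI[2]
print(lo, hi)  # interval on the quadratic term; typically contains zero
```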

&lt;p&gt;
Let us consider a model with intercept = 0, P = alpha*T. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T]).T
b = P

x, res, rank, s = np.linalg.lstsq(A, b)

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2.0, n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)

plt.figure()
plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;, T, np.dot(A, x))
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.legend([&lt;span style="color: #228b22;"&gt;'data'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'fit'&lt;/span&gt;])

ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A,x))**2)
R2 = 1 - SSerr/SStot
plt.title(&lt;span style="color: #228b22;"&gt;'R^2 = {0:1.3f}'&lt;/span&gt;.format(R2))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-no-intercept.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; ... ... [4.05680124495 4.12308349899]
&amp;lt;matplotlib.figure.Figure object at 0x0000000008870BE0&amp;gt;
[&amp;lt;matplotlib.lines.Line2D object at 0x00000000089F4550&amp;gt;, &amp;lt;matplotlib.lines.Line2D object at 0x00000000089F4208&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x0000000008A13630&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x0000000008A16DA0&amp;gt;
&amp;lt;matplotlib.legend.Legend object at 0x00000000089EFD30&amp;gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;lt;matplotlib.text.Text object at 0x000000000B26C0B8&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-no-intercept.png"&gt;&lt;/p&gt;

&lt;p&gt;
The fit is visually still pretty good, and the R^2 value is only slightly worse. Let us examine the residuals again.
&lt;/p&gt;


&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;residuals = P - np.dot(A,x)

plt.figure()
plt.plot(T,residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'residuals'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-no-incpt-resid.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;lt;matplotlib.figure.Figure object at 0x0000000008A0F5C0&amp;gt;
[&amp;lt;matplotlib.lines.Line2D object at 0x000000000B29B0F0&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x000000000B276FD0&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x000000000B283780&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-no-incpt-resid.png"&gt;&lt;p&gt;

&lt;p&gt;
You can see a slight downward trend in the residuals as the temperature increases. This may indicate a deficiency in the model with no intercept. For the ideal gas law with T in degC: \(PV = nR(T+273)\), or \(P = (nR/V) T + 273 nR/V\), so the intercept is expected to be nonzero, specifically \(273 nR/V\). Since the molar density of a gas is quite small, the intercept is close to, but not equal to, zero. That is why the fit still looks reasonable, but is not as good as letting the intercept be a fitting parameter. That is the deficiency in our model.
&lt;/p&gt;

&lt;p&gt;
In the end, it is hard to justify a model more complex than a line in this case. 
&lt;/p&gt;
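&lt;p&gt;
A more quantitative way to penalize the extra parameter is an information criterion such as AIC, which is not used in the original post. A minimal sketch on synthetic linear data (modern NumPy and Python 3 assumed): the quadratic lowers the residual sum of squares slightly, but pays a penalty for the extra parameter, so the line usually wins.
&lt;/p&gt;

```python
import numpy as np

def aic(rss, n, k):
    """Akaike information criterion for a least-squares fit; lower is better."""
    return n * np.log(rss / n) + 2 * k

# Synthetic, truly linear data standing in for the post's P-T data.
rng = np.random.default_rng(2)
T = np.linspace(20, 50, 8)
P = 4.0 * T + 10.0 + rng.normal(0, 0.5, T.size)

scores = {}
for degree in (1, 2):
    p = np.polyfit(T, P, degree)
    rss = np.sum((P - np.polyval(p, T))**2)
    scores[degree] = aic(rss, len(P), degree + 1)
    print(degree, scores[degree])
```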
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Model-selection.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Linear least squares fitting with linear algebra</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Linear-least-squares-fitting-with-linear-algebra</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[linear algebra]]></category>
      <guid isPermaLink="false">pouOC1bSgsp9CbyPQLG4lOITSjw=</guid>
      <description>Linear least squares fitting with linear algebra</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/09/24/linear-least-squares-fitting-with-linear-algebra/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
The idea here is to formulate a set of linear equations that is easy to solve. We  can express the equations in terms of our unknown fitting parameters \(p_i\) as:
&lt;/p&gt;

&lt;pre class="example"&gt;
x1^0*p0 + x1*p1 = y1
x2^0*p0 + x2*p1 = y2
x3^0*p0 + x3*p1 = y3
etc...
&lt;/pre&gt;

&lt;p&gt;
Which we write in matrix form as \(A p = y\) where \(A\) is a matrix of column vectors, e.g. [1, x_i]. \(A\) is not a square matrix, so we cannot solve it as written. Instead, we form \(A^T A p = A^T y\) and solve that set of equations.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
x = np.array([0, 0.5, 1, 1.5, 2.0, 3.0, 4.0, 6.0, 10])
y = np.array([0, -0.157, -0.315, -0.472, -0.629, -0.942, -1.255, -1.884, -3.147])

A = np.column_stack([x**0, x])

M = np.dot(A.T, A)
b = np.dot(A.T, y)

i1, slope1 = np.dot(np.linalg.inv(M), b)
i2, slope2 = np.linalg.solve(M, b) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# an alternative approach.&lt;/span&gt;

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; i1, slope1
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; i2, slope2

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# plot data and fit&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

plt.plot(x, y, &lt;span style="color: #228b22;"&gt;'bo'&lt;/span&gt;)
plt.plot(x, np.dot(A, [i1, slope1]), &lt;span style="color: #228b22;"&gt;'r--'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'x'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'y'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/la-line-fit.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.00062457337884 -0.3145221843
0.00062457337884 -0.3145221843
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/la-line-fit.png"&gt;&lt;p&gt;

&lt;p&gt;
This method can be readily extended to fitting any polynomial model, or other linear model that is fit in a least squares sense. This method does not provide confidence intervals.
&lt;/p&gt;
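&lt;p&gt;
For completeness, np.linalg.lstsq solves the same least-squares problem without explicitly forming \(A^T A\), which squares the condition number of the problem. This sketch (an addition, not from the original post; modern NumPy and Python 3 assumed) uses the same data and reproduces the normal-equations result.
&lt;/p&gt;

```python
import numpy as np

x = np.array([0, 0.5, 1, 1.5, 2.0, 3.0, 4.0, 6.0, 10])
y = np.array([0, -0.157, -0.315, -0.472, -0.629, -0.942, -1.255, -1.884, -3.147])

A = np.column_stack([x**0, x])

# lstsq uses an orthogonal factorization internally, avoiding the
# conditioning penalty of forming A^T A explicitly.
p, residual, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(p)  # ~[0.000625, -0.31452], matching the normal-equations solution
```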
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Linear-least-squares-fitting-with-linear-algebra.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Fit a line to numerical data</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Fit-a-line-to-numerical-data</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <guid isPermaLink="false">_V8UPZpDmEPwsmNZomF8kJip6jw=</guid>
      <description>Fit a line to numerical data</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/08/04/fit-a-line-to-numerical-data/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
We want to fit a line to this data:
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;x = [0, 0.5, 1, 1.5, 2.0, 3.0, 4.0, 6.0, 10]
y = [0, -0.157, -0.315, -0.472, -0.629, -0.942, -1.255, -1.884, -3.147]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
We use the polyfit(x, y, n) command where n is the polynomial order, n=1 for a line.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

p = np.polyfit(x, y, 1)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; p
slope, intercept = p
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; slope, intercept
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [-0.31452218  0.00062457]
&amp;gt;&amp;gt;&amp;gt; -0.3145221843 0.00062457337884
&lt;/pre&gt;

&lt;p&gt;
To show the fit, we can use numpy.polyval to evaluate the fit at many points.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

xfit = np.linspace(0, 10)
yfit = np.polyval(p, xfit)

plt.plot(x, y, &lt;span style="color: #228b22;"&gt;'bo'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'raw data'&lt;/span&gt;)
plt.plot(xfit, yfit, &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'fit'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'x'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'y'&lt;/span&gt;)
plt.legend()
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/linefit-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
&amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; &amp;gt;&amp;gt;&amp;gt; [&amp;lt;matplotlib.lines.Line2D object at 0x053C1790&amp;gt;]
[&amp;lt;matplotlib.lines.Line2D object at 0x0313C610&amp;gt;]
&amp;lt;matplotlib.text.Text object at 0x052A4950&amp;gt;
&amp;lt;matplotlib.text.Text object at 0x052B9A10&amp;gt;
&amp;lt;matplotlib.legend.Legend object at 0x053C1CD0&amp;gt;
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/linefit-1.png"&gt;&lt;p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Fit-a-line-to-numerical-data.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
