<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Uncertainty in implicit functions</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/03/08/Uncertainty-in-implicit-functions</link>
      <pubDate>Fri, 08 Mar 2013 17:04:02 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">uZ3z5IZ_T0zBrn5IN31tAWkXmdY=</guid>
      <description>Uncertainty in implicit functions</description>
      <content:encoded><![CDATA[


&lt;p&gt;
Suppose we have an equation \(y = e^{a y}\) that we want to solve, where \(a\) is a constant with some uncertainty. What is the uncertainty in the solution \(y\)?
&lt;/p&gt;

&lt;p&gt;
Finding a solution is not difficult. Estimating the uncertainty in the solution, however, is not easy, since we do not have an explicit function to propagate errors through. Let us examine the solution first.
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve

a = 0.20

&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;f&lt;/span&gt;(y):
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; y - np.exp(a * y)

sol, = fsolve(f, 1)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; sol
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.2958555091
&lt;/pre&gt;

&lt;p&gt;
A way to estimate the uncertainty is by Monte Carlo simulation. We solve the equation many times, using values sampled from the uncertainty distribution. Here we assume that the \(a\) parameter is normally distributed with a mean of 0.2 and a standard deviation of 0.02. We solve the equation 10000 times for different values of \(a\) sampled from that normal distribution. That gives us a distribution of solutions that we can analyze statistically to get the mean and standard deviation.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve
N = 10000

A = np.random.normal(0.2, 0.02, size=N)

sol = np.zeros(A.shape)

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; i, a &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;enumerate&lt;/span&gt;(A):
    s, = fsolve(&lt;span style="color: #8b0000;"&gt;lambda&lt;/span&gt; y:y - np.exp(a * y), 1)
    sol[i] = s

ybar = np.mean(sol)
s_y = np.std(sol)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; ybar, s_y, s_y / ybar

&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
count, bins, ignored = plt.hist(sol)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/implicit-uncertainty.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.29887470397 0.0465110111613 0.0358086973433
&lt;/pre&gt;

&lt;p&gt;
We get approximately the same answer for the mean, and you can see here that the distribution of solution values is not quite normal. We compute the standard deviation anyway, and find that it is about 3.6% of the mean. It would be nice to have some analytical method to estimate this uncertainty. So far I have not figured that out.
&lt;/p&gt;
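
&lt;p&gt;
One candidate worth checking is a first-order estimate from implicit differentiation. This is only a sketch, and it assumes the uncertainty in \(a\) is small enough that a linear approximation is adequate. Writing \(F(y, a) = y - e^{a y} = 0\), we get \(\frac{dy}{da} = -\frac{\partial F/\partial a}{\partial F/\partial y} = \frac{y e^{a y}}{1 - a e^{a y}}\), and then \(\sigma_y \approx |dy/da| \sigma_a\). The names sigma_a, dyda and sigma_y are introduced here for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import numpy as np
from scipy.optimize import fsolve

a, sigma_a = 0.20, 0.02  # same mean and standard deviation as above

# Solve y - exp(a*y) = 0 at the mean value of a
y, = fsolve(lambda yy: yy - np.exp(a * yy), 1)

# Implicit differentiation of F(y, a) = y - exp(a*y) = 0:
# dy/da = -(dF/da) / (dF/dy) = y*exp(a*y) / (1 - a*exp(a*y))
dyda = y * np.exp(a * y) / (1 - a * np.exp(a * y))

# First-order (linear) estimate of the standard deviation of y
sigma_y = abs(dyda) * sigma_a

print('{0} {1} {2}'.format(y, sigma_y, sigma_y / y))
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Because this is a linear estimate, it cannot capture the skew in the simulated distribution, so it should be read as a check on the Monte Carlo standard deviation and not as a replacement for it.
&lt;/p&gt;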

&lt;p&gt;
This method could be relevant to estimating the uncertainty in the friction factor for turbulent flow (\(Re &gt; 2100\)). In that case we have the implicit equation \(\frac{1}{\sqrt{f_F}}=4.0 \log_{10}(Re \sqrt{f_F})-0.4\). Uncertainties in the Reynolds number would lead to uncertainties in the friction factor. Whether those uncertainties are larger than the uncertainties from the original correlation would require some investigation.
&lt;/p&gt;
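
&lt;p&gt;
A minimal sketch of what that Monte Carlo calculation might look like follows. The Reynolds number of 1e5 and its 5% standard deviation are made-up values chosen only for illustration, the logarithm is taken as base 10, and the helper name friction_factor is hypothetical.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import numpy as np
from scipy.optimize import fsolve

np.random.seed(0)  # fixed seed so the sketch is repeatable

def friction_factor(Re):
    'Solve 1/sqrt(f) = 4.0*log10(Re*sqrt(f)) - 0.4 for the friction factor f'
    def residual(f):
        return 1.0 / np.sqrt(f) - 4.0 * np.log10(Re * np.sqrt(f)) + 0.4
    f, = fsolve(residual, 0.005)
    return f

# Assumed, illustrative numbers: Re = 1e5 with a 5% standard deviation
RE = np.random.normal(1.0e5, 5.0e3, size=1000)
F = np.array([friction_factor(Re) for Re in RE])

fbar = np.mean(F)
s_f = np.std(F)
print('{0} {1} {2}'.format(fbar, s_f, s_f / fbar))
&lt;/pre&gt;
&lt;/div&gt;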
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/03/08/Uncertainty-in-implicit-functions.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Another approach to error propagation</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/03/07/Another-approach-to-error-propagation</link>
      <pubDate>Thu, 07 Mar 2013 09:26:06 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">OBnsNAW8FQRy-i_4fZrjIjNTIRA=</guid>
      <description>Another approach to error propagation</description>
      <content:encoded><![CDATA[


&lt;p&gt;
In the previous section we examined an analytical approach to error propagation, and a simulation-based approach. There is another approach to error propagation, using the uncertainties module (&lt;a href="https://pypi.python.org/pypi/uncertainties/" &gt;https://pypi.python.org/pypi/uncertainties/&lt;/a&gt;). You have to install this package, e.g. &lt;code&gt;pip install uncertainties&lt;/code&gt;. After that, the module provides new classes of numbers and functions that incorporate uncertainty and propagate the uncertainty through the functions. In the examples that follow, we repeat the calculations from the previous section using the uncertainties module.
&lt;/p&gt;

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;Addition and subtraction&lt;/span&gt;
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

A = u.ufloat((2.5, 0.4))
B = u.ufloat((4.1, 0.3))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; A + B
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; A - B
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
6.6+/-0.5
-1.6+/-0.5
&lt;/pre&gt;

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;Multiplication and division&lt;/span&gt;
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;F = u.ufloat((25, 1))
x = u.ufloat((6.4, 0.4))

t = F * x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; t

d = F / x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; d
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
160.0+/-11.8726576637
3.90625+/-0.289859806243
&lt;/pre&gt;
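
&lt;p&gt;
As a sanity check, the uncertainties reported so far can be reproduced by hand with the usual quadrature rules. This sketch assumes the variables are independent, and the names s_sum, s_t and s_d are introduced here for illustration: absolute uncertainties add in quadrature for sums and differences, and relative uncertainties add in quadrature for products and quotients.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import math

# Sum or difference: absolute uncertainties add in quadrature
s_sum = math.sqrt(0.4**2 + 0.3**2)

# Product or quotient: relative uncertainties add in quadrature
F, sF = 25.0, 1.0
x, sx = 6.4, 0.4
rel = math.sqrt((sF / F)**2 + (sx / x)**2)
s_t = (F * x) * rel   # uncertainty in F * x
s_d = (F / x) * rel   # uncertainty in F / x

print('{0} {1} {2}'.format(s_sum, s_t, s_d))
&lt;/pre&gt;
&lt;/div&gt;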

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;Exponentiation&lt;/span&gt;
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;t = u.ufloat((2.03, 0.0203))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; t**5

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; uncertainties.umath &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; sqrt
A = u.ufloat((16.07, 0.06))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; sqrt(A)
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;print np.sqrt(A) # this does not work&lt;/span&gt;

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; unumpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; unp
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; unp.sqrt(A)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
34.4730881243+/-1.72365440621
4.00874045057+/-0.00748364738749
4.00874045057+/-0.00748364738749
&lt;/pre&gt;

&lt;p&gt;
Note that in the last example, we had to either import a function from uncertainties.umath or use the unumpy module, a version of the numpy functions that handles uncertainty. This may be a limitation of the uncertainties package, as not all functions in arbitrary modules can be covered. Note, however, that you can wrap a function to make it handle uncertainty like this.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

wrapped_sqrt = u.wrap(np.sqrt)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; wrapped_sqrt(A)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
4.00874045057+/-0.00748364738774
&lt;/pre&gt;

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;Propagation of errors in an integral&lt;/span&gt;
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u

x = np.array([u.ufloat((1, 0.01)), 
              u.ufloat((2, 0.1)),
              u.ufloat((3, 0.1))])

y = 2 * x

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.trapz(x, y)
&lt;/pre&gt;
&lt;/div&gt;
&lt;pre class="example"&gt;
8.0+/-0.600333240792
&lt;/pre&gt;

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;Chain rule in error propagation&lt;/span&gt;
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;v0 = u.ufloat((1.2, 0.02))
a = u.ufloat((3.0, 0.3))
t = u.ufloat((12.0, 0.12))

v = v0 + a * t
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; v
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
37.2+/-3.61801050303
&lt;/pre&gt;
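
&lt;p&gt;
The same number can be recovered by hand, which is a useful check on what the module is doing. For \(v = v_0 + a t\) the partial derivatives are \(\partial v/\partial v_0 = 1\), \(\partial v/\partial a = t\) and \(\partial v/\partial t = a\), so, assuming the three inputs are independent, \(\sigma_v^2 = \sigma_{v_0}^2 + (t \sigma_a)^2 + (a \sigma_t)^2\). The names s_v0, s_a, s_t and s_v below are introduced for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import math

v0, s_v0 = 1.2, 0.02
a, s_a = 3.0, 0.3
t, s_t = 12.0, 0.12

# v = v0 + a*t, so dv/dv0 = 1, dv/da = t, dv/dt = a
v = v0 + a * t
s_v = math.sqrt(s_v0**2 + (t * s_a)**2 + (a * s_t)**2)

print('{0} {1}'.format(v, s_v))
&lt;/pre&gt;
&lt;/div&gt;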

&lt;p&gt;
&lt;span style="text-decoration:underline;"&gt;A real example?&lt;/span&gt;
This is what I would set up for a real working example. We try to compute the exit concentration from a CSTR. The idea is to wrap the &amp;ldquo;external&amp;rdquo; fsolve function using the &lt;code&gt;uncertainties.wrap&lt;/code&gt; function, which handles the uncertainties. Unfortunately, it does not work, and it is not clear why. But see the following discussion for a fix.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve

Fa0 = u.ufloat((5.0, 0.05))
v0 = u.ufloat((10., 0.1))

V = u.ufloat((66000.0, 100))  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;reactor volume L^3&lt;/span&gt;
k = u.ufloat((3.0, 0.2))      &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;rate constant L/mol/h&lt;/span&gt;

&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;func&lt;/span&gt;(Ca):
    &lt;span style="color: #228b22;"&gt;"Mole balance for a CSTR. Solve this equation for func(Ca)=0"&lt;/span&gt;
    Fa = v0 * Ca     &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;exit molar flow of A&lt;/span&gt;
    ra = -k * Ca**2  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;rate of reaction of A mol/L/h&lt;/span&gt;
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; Fa0 - Fa + V * ra

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;CA guess: assume that 90% is reacted away&lt;/span&gt;
CA_guess = 0.1 * Fa0 / v0

wrapped_fsolve = u.wrap(fsolve)
CA_sol = wrapped_fsolve(func, CA_guess)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'The exit concentration is {0} mol/L'&lt;/span&gt;.format(CA_sol)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
The exit concentration is NotImplemented mol/L
&lt;/pre&gt;

&lt;p&gt;
I got a note from the author of the uncertainties package explaining the cryptic error above, and a solution for it. The error arises because fsolve does not know how to deal with uncertainties. The idea is to create a function that returns a float, when everything is given as a float. Then, we wrap the fsolve call, and finally wrap the wrapped fsolve call! 
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step 1. Write the function to solve with arguments for all quantities that carry uncertainty. This function may be called with uncertainties, or with floats.
&lt;/li&gt;

&lt;li&gt;Step 2. Wrap the call to fsolve in a function that takes all the parameters as arguments, and that returns the solution.
&lt;/li&gt;

&lt;li&gt;Step 3. Use uncertainties.wrap to wrap the function in Step 2 to get the answer with uncertainties.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Here is the code that does work:
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; uncertainties &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; u
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve

Fa0 = u.ufloat((5.0, 0.05))
v0 = u.ufloat((10., 0.1))

V = u.ufloat((66000.0, 100.0))  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;reactor volume L^3&lt;/span&gt;
k = u.ufloat((3.0, 0.2))      &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;rate constant L/mol/h&lt;/span&gt;

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Step 1&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;func&lt;/span&gt;(Ca, v0, k, Fa0, V):
    &lt;span style="color: #228b22;"&gt;"Mole balance for a CSTR. Solve this equation for func(Ca)=0"&lt;/span&gt;
    Fa = v0 * Ca     &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;exit molar flow of A&lt;/span&gt;
    ra = -k * Ca**2  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;rate of reaction of A mol/L/h&lt;/span&gt;
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; Fa0 - Fa + V * ra

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Step 2&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;Ca_solve&lt;/span&gt;(v0, k, Fa0, V): 
    &lt;span style="color: #228b22;"&gt;'wrap fsolve to pass parameters as float or ufloat'&lt;/span&gt;
    &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;this line is a little fragile. You must put [0] at the end or&lt;/span&gt;
    &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;you get the NotImplemented result&lt;/span&gt;
    sol = fsolve(func, 0.1 * Fa0 / v0, args=(v0, k, Fa0, V))[0]
    &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; sol

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Step 3&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; u.wrap(Ca_solve)(v0, k, Fa0, V)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.005+/-0.000167764327667
&lt;/pre&gt;

&lt;p&gt;
It would take some practice to get used to this, but the payoff is that you have an &amp;ldquo;automatic&amp;rdquo; error propagation method.
&lt;/p&gt;

&lt;p&gt;
Being ever the skeptic, let us compare the result above to the Monte Carlo approach to error estimation below.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.optimize &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; fsolve

N = 10000
Fa0 = np.random.normal(5, 0.05, (1, N))
v0 = np.random.normal(10.0, 0.1, (1, N))
V =  np.random.normal(66000, 100, (1,N))
k = np.random.normal(3.0, 0.2, (1, N))

SOL = np.zeros((1, N))

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; i &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;range&lt;/span&gt;(N):
    &lt;span style="color: #8b0000;"&gt;def&lt;/span&gt; &lt;span style="color: #8b2323;"&gt;func&lt;/span&gt;(Ca):
        &lt;span style="color: #8b0000;"&gt;return&lt;/span&gt; Fa0[0,i] - v0[0,i] * Ca + V[0,i] * (-k[0,i] * Ca**2)
    SOL[0,i] = fsolve(func, 0.1 * Fa0[0,i] / v0[0,i])[0]

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'Ca(exit) = {0}+/-{1}'&lt;/span&gt;.format(np.mean(SOL), np.std(SOL))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
Ca(exit) = 0.00500829453185+/-0.000169103578901
&lt;/pre&gt;

&lt;p&gt;
I am pretty content those are the same!
&lt;/p&gt;
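
&lt;p&gt;
For this particular mole balance there is a third check available, because the equation is a quadratic in Ca and has a closed-form positive root. The sketch below uses that root together with first-order propagation from central-difference derivatives, assuming the four parameters are independent. The helper name Ca_exact and the step size are choices made here for illustration.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import numpy as np

def Ca_exact(v0, k, Fa0, V):
    'Positive root of V*k*Ca**2 + v0*Ca - Fa0 = 0'
    return (-v0 + np.sqrt(v0**2 + 4.0 * V * k * Fa0)) / (2.0 * V * k)

means = np.array([10.0, 3.0, 5.0, 66000.0])   # v0, k, Fa0, V
sigmas = np.array([0.1, 0.2, 0.05, 100.0])    # their standard deviations

Ca = Ca_exact(*means)

# First-order propagation with central-difference partial derivatives
var = 0.0
for i in range(len(means)):
    h = 1e-6 * means[i]
    hi, lo = means.copy(), means.copy()
    hi[i] += h
    lo[i] -= h
    dCa = (Ca_exact(*hi) - Ca_exact(*lo)) / (2 * h)
    var += (dCa * sigmas[i])**2

sigma_Ca = np.sqrt(var)
print('Ca(exit) = {0}+/-{1}'.format(Ca, sigma_Ca))
&lt;/pre&gt;
&lt;/div&gt;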

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
The uncertainties module is pretty amazing. It automatically propagates errors through a pretty broad range of computations. It is a little tricky for third-party packages, but it seems doable.
&lt;/p&gt;

&lt;p&gt;
Read more about the package at &lt;a href="http://pythonhosted.org/uncertainties/index.html" &gt;http://pythonhosted.org/uncertainties/index.html&lt;/a&gt;.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/03/07/Another-approach-to-error-propagation.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Model selection</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Model-selection</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">rva6twbQxMfmVwYOCZ2cgugbYrw=</guid>
      <description>Model selection</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/10/01/model-selection/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
adapted from &lt;a href="http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm" &gt;http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd44.htm&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
In this example, we show some ways to choose which of several models fits the data best. We have data for the total pressure and temperature of a fixed amount of a gas in a tank that was measured over the course of several days. We want to select a model that relates the pressure to the gas temperature.
&lt;/p&gt;

&lt;p&gt;
The data is stored in a text file, PT.txt, with the following structure:
&lt;/p&gt;

&lt;pre class="example"&gt;
Run          Ambient                            Fitted
 Order  Day  Temperature  Temperature  Pressure    Value    Residual
  1      1      23.820      54.749      225.066   222.920     2.146
...
&lt;/pre&gt;

&lt;p&gt;
We need to read the data in, and perform a regression analysis on P vs. T. In Python we start counting at 0, so we actually want columns 3 and 4.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

data = np.loadtxt(&lt;span style="color: #228b22;"&gt;'data/PT.txt'&lt;/span&gt;, skiprows=2)
T = data[:, 3]
P = data[:, 4]

plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;img src="/img/./images/model-selection-1.png"&gt;&lt;/p&gt;

&lt;p&gt;
It appears the data is roughly linear, and we know from the ideal gas law that PV = nRT, or P = nR/V*T, which says P should be linearly correlated with T. Note that the temperature data is in degC, not in K, so it is not expected that P = 0 at T = 0. We will use linear algebra to compute the line coefficients.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T**0, T]).T
b = P

x, res, rank, s = np.linalg.lstsq(A, b)
intercept, slope = x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'b, m ='&lt;/span&gt;, intercept, slope

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2., n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'CI = '&lt;/span&gt;,CI
&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
b, m = 7.74899739238 3.93014043824
CI =  [ 4.76511545  0.1026405 ]
[2.98388194638 12.5141128384]
[3.82749994079 4.03278093569]
&lt;/pre&gt;

&lt;p&gt;
The confidence interval on the intercept is large, but it does not contain zero at the 95% confidence level.
&lt;/p&gt;

&lt;p&gt;
The R^2 value accounts roughly for the fraction of variation in the data that can be described by the model. Hence, a value close to one means nearly all the variations are described by the model, except for random variations.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A, x))**2)
R2 = 1 - SSerr/SStot
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; R2
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.993715411798
&lt;/pre&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;plt.figure(); plt.clf()
plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k.'&lt;/span&gt;, T, np.dot(A, x), &lt;span style="color: #228b22;"&gt;'b-'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.title(&lt;span style="color: #228b22;"&gt;'R^2 = {0:1.3f}'&lt;/span&gt;.format(R2))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-2.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;img src="/img/./images/model-selection-2.png"&gt;&lt;/p&gt;

&lt;p&gt;
The fit looks good, and R^2 is near one, but is it a good model? There are a few ways to examine this. We want to make sure that there are no systematic trends in the errors between the fit and the data, and we want to make sure there are no hidden correlations with other variables. The residuals are the errors between the fit and the data. The residuals should not show any patterns when plotted against any variables, and they do not in this case.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;residuals = P - np.dot(A, x)

plt.figure()

f, (ax1, ax2, ax3) = plt.subplots(3)

ax1.plot(T,residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
ax1.set_xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)


run_order = data[:, 0]
ax2.plot(run_order, residuals,&lt;span style="color: #228b22;"&gt;'ko '&lt;/span&gt;)
ax2.set_xlabel(&lt;span style="color: #228b22;"&gt;'run order'&lt;/span&gt;)

ambientT = data[:, 2]
ax3.plot(ambientT, residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
ax3.set_xlabel(&lt;span style="color: #228b22;"&gt;'ambient temperature'&lt;/span&gt;)

plt.tight_layout() &lt;span style="color: #ff0000; font-weight: bold;"&gt;# make sure plots do not overlap&lt;/span&gt;

plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-3.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;img src="/img/./images/model-selection-3.png"&gt;&lt;/p&gt;

&lt;p&gt;
There may be some correlations in the residuals with the run order. That could indicate an experimental source of error.
&lt;/p&gt;

&lt;p&gt;
We assume all the errors are uncorrelated with each other. We can use a lag plot to assess this, where we plot residual[i] vs residual[i-1], i.e. we look for correlations between adjacent residuals. This plot should look random, with no correlations if the model is good.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;plt.figure(); plt.clf()
plt.plot(residuals[1:], residuals[:-1],&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'residual[i]'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'residual[i-1]'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-correlated-residuals.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;img src="/img/./images/model-selection-correlated-residuals.png"&gt;&lt;/p&gt;

&lt;p&gt;
It is hard to argue there is any correlation here. 
&lt;/p&gt;
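
&lt;p&gt;
If a number is preferred over a visual judgment, one option is the lag-1 correlation coefficient of the residuals. The sketch below is an illustration only: the helper name lag1_correlation is hypothetical, and the demo values are made up so that the result is easy to predict (adjacent values alternate in sign, so the coefficient should come out close to -1).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;import numpy as np

def lag1_correlation(r):
    'Pearson correlation between r[i] and r[i-1]'
    r = np.asarray(r, dtype=float)
    return np.corrcoef(r[1:], r[:-1])[0, 1]

# Made-up residuals that alternate in sign, so adjacent values are
# perfectly anti-correlated
demo = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho = lag1_correlation(demo)
print(rho)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Calling lag1_correlation(residuals) on the residuals computed earlier would give the corresponding number for this fit; a value near zero would support the visual impression from the lag plot.
&lt;/p&gt;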

&lt;p&gt;
Let us consider a quadratic model instead.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T**0, T, T**2]).T
b = P

x, res, rank, s = np.linalg.lstsq(A, b)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; x

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2., n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'CI = '&lt;/span&gt;,CI
&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)


ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A,x))**2)
R2 = 1 - SSerr/SStot
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'R^2 = {0}'&lt;/span&gt;.format(R2)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[  9.00353031e+00   3.86669879e+00   7.26244301e-04]
CI =  [  1.38030344e+01   6.62100654e-01   7.48516727e-03]
[-4.79950412123 22.8065647329]
[3.20459813681 4.52879944409]
[-0.00675892296907 0.00821141157035]
R^2 = 0.993721969407
&lt;/pre&gt;

&lt;p&gt;
You can see that the confidence intervals on the constant and on the T^2 term both include zero. That is a good indication that the additional T^2 parameter is not significant. You can also see that the R^2 value is not better than the one from a linear fit, so adding a parameter does not increase the goodness of fit. This is an example of overfitting the data. Since the constant in this model is apparently not significant either, let us consider the simplest model with a fixed intercept of zero.
&lt;/p&gt;
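&lt;p&gt;
The confidence-interval recipe used in these fits can be collected into one small function. What follows is a minimal sketch written for Python 3, not the code used to produce the results in this post: the function name lstsq_confidence_intervals and the synthetic T_demo and P_demo data are assumptions made for illustration only.
&lt;/p&gt;

```python
import numpy as np
from scipy.stats import t


def lstsq_confidence_intervals(A, b, alpha=0.05):
    """Return least-squares parameters and the half-widths of their confidence intervals."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    n, k = A.shape
    sigma2 = np.sum((b - A @ x)**2) / (n - k)   # residual variance
    C = sigma2 * np.linalg.inv(A.T @ A)         # parameter covariance matrix
    se = np.sqrt(np.diag(C))                    # standard errors
    ci = t.ppf(1.0 - alpha / 2.0, n - k) * se   # student-t multiplier times se
    return x, ci


# Hypothetical data: a straight line plus noise (not the data from this post)
rng = np.random.default_rng(0)
T_demo = np.linspace(0.0, 100.0, 11)
P_demo = 10.0 + 4.0 * T_demo + rng.normal(0.0, 2.0, T_demo.size)

A_demo = np.vstack([T_demo**0, T_demo, T_demo**2]).T
x_demo, ci_demo = lstsq_confidence_intervals(A_demo, P_demo)

# A parameter whose interval contains zero is not statistically significant
includes_zero = np.less_equal(np.abs(x_demo), ci_demo)
for name, beta, ci, flag in zip(['constant', 'T', 'T^2'], x_demo, ci_demo, includes_zero):
    print('{0}: [{1:.4g}, {2:.4g}] includes zero: {3}'.format(name, beta - ci, beta + ci, flag))
```

&lt;p&gt;
Passing a design matrix with fewer columns to the same function gives the intervals for a simpler model, which makes it easy to compare models side by side.
&lt;/p&gt;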

&lt;p&gt;
Let us consider a model with intercept = 0, P = alpha*T. 
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;A = np.vstack([T]).T
b = P;

x, res, rank, s = np.linalg.lstsq(A, b)

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(b)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(x)

sigma2 = np.sum((b - np.dot(A,x))**2) / (n - k)

C = sigma2 * np.linalg.inv(np.dot(A.T, A))
se = np.sqrt(np.diag(C))

&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
alpha = 0.05

sT = t.ppf(1-alpha/2.0, n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(x, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'[{0} {1}]'&lt;/span&gt;.format(beta - ci, beta + ci)

plt.figure()
plt.plot(T, P, &lt;span style="color: #228b22;"&gt;'k. '&lt;/span&gt;, T, np.dot(A, x))
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Pressure'&lt;/span&gt;)
plt.legend([&lt;span style="color: #228b22;"&gt;'data'&lt;/span&gt;, &lt;span style="color: #228b22;"&gt;'fit'&lt;/span&gt;])

ybar = np.mean(P)
SStot = np.sum((P - ybar)**2)
SSerr = np.sum((P - np.dot(A,x))**2)
R2 = 1 - SSerr/SStot
plt.title(&lt;span style="color: #228b22;"&gt;'R^2 = {0:1.3f}'&lt;/span&gt;.format(R2))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-no-intercept.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[4.05680124495 4.12308349899]
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/model-selection-no-intercept.png"&gt;&lt;/p&gt;

&lt;p&gt;
The fit is visually still pretty good, and the R^2 value is only slightly worse. Let us examine the residuals again.
&lt;/p&gt;


&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;residuals = P - np.dot(A,x)

plt.figure()
plt.plot(T,residuals,&lt;span style="color: #228b22;"&gt;'ko'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Temperature'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'residuals'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/model-selection-no-incpt-resid.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;&lt;img src="/img/./images/model-selection-no-incpt-resid.png"&gt;&lt;/p&gt;

&lt;p&gt;
You can see a slight trend of decreasing value of the residuals as the Temperature increases. This may indicate a deficiency in the model with no intercept. For the ideal gas law in degC: \(PV = nR(T+273)\) or \(P = nR/V*T + 273*nR/V\), so the intercept is expected to be non-zero in this case. Specifically, we expect the intercept to be 273*R*n/V. Since the molar density of a gas is pretty small, the intercept may be close to, but not equal to zero. That is why the fit still looks ok, but is not as good as letting the intercept be a fitting parameter. That is an example of the deficiency in our model.
&lt;/p&gt;

&lt;p&gt;
In the end, it is hard to justify a model more complex than a line in this case. 
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Model-selection.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Random thoughts</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Random-thoughts</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <category><![CDATA[math]]></category>
      <guid isPermaLink="false">WxC4iFvd_IIy4a7cK_25x-C-2HQ=</guid>
      <description>Random thoughts</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/09/04/random-thoughts/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Random numbers are used in a variety of simulation methods, most notably Monte Carlo simulations. In a later example, we will see how we can use random numbers for error propagation analysis. First, we discuss two types of pseudorandom numbers we can use in Python: uniformly distributed and normally distributed numbers.
&lt;/p&gt;

&lt;p&gt;
Say you are the gambling type, and bet your friend $5 the next random number will be greater than 0.49. Let us ask Python to roll the random number generator for us.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

n = np.random.uniform()
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'n = {0}'&lt;/span&gt;.format(n)

&lt;span style="color: #8b0000;"&gt;if&lt;/span&gt; n &amp;gt; 0.49:
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'You win!'&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;else:&lt;/span&gt;
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'you lose.'&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
n = 0.381896986693
you lose.
&lt;/pre&gt;

&lt;p&gt;
The odds of winning this bet are slightly stacked in your favor: there is only a 49% chance your friend wins, but a 51% chance that you win. Let us play the game many times and see how often you win and how often your friend wins. First, let us generate a bunch of numbers and look at the distribution with a histogram.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

N = 10000
games = np.random.uniform(size=N)

wins = np.sum(games &amp;gt; 0.49)
losses = N - wins

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'You won {0} times ({1:%})'&lt;/span&gt;.format(wins, &lt;span style="color: #8b0000;"&gt;float&lt;/span&gt;(wins) / N)

&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
count, bins, ignored = plt.hist(games)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/random-thoughts-1.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
You won 5090 times (50.900000%)
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/random-thoughts-1.png"&gt;&lt;/p&gt;

&lt;p&gt;
As you can see, you won slightly more often than you lost.
&lt;/p&gt;
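&lt;p&gt;
How far from 51% should we expect a run of 10000 games to land? Each game is an independent win-or-lose trial, so the number of wins follows a binomial distribution. The sketch below, written for Python 3 and assuming NumPy 1.17 or later for default_rng, compares the expected count and its standard deviation with one simulated run. The seed is arbitrary and only there to make the sketch repeatable.
&lt;/p&gt;

```python
import numpy as np

p_win = 0.51   # probability that a uniform draw on [0, 1) exceeds 0.49
N = 10000      # number of games, as in the example above

expected_wins = N * p_win
std_wins = np.sqrt(N * p_win * (1 - p_win))   # binomial standard deviation

# Seeded generator (an assumption, used only for reproducibility)
rng = np.random.default_rng(0)
wins = np.count_nonzero(np.greater(rng.uniform(size=N), 0.49))

print('expected {0:.0f} +- {1:.0f} wins, simulated {2}'.format(expected_wins, std_wins, wins))
```

&lt;p&gt;
The standard deviation is about 50 wins, so a result such as 5090 wins is well within the expected scatter around 5100.
&lt;/p&gt;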

&lt;p&gt;
It is possible to get random integers. Here are a few examples of getting a random integer between 1 and 100. You might do this to get random indices of a list, for example.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.random.random_integers(1, 100)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.random.random_integers(1, 100, 3)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.random.random_integers(1, 100, (2,2))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
96
[ 95  49 100]
[[69 54]
 [41 93]]
&lt;/pre&gt;
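&lt;p&gt;
The same kind of draws can also be sketched with NumPy's newer Generator interface. This assumes Python 3 and NumPy 1.17 or later. Note that the upper bound of Generator.integers is exclusive unless endpoint=True is given. The seed is arbitrary.
&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)   # the seed is arbitrary

print(rng.integers(1, 101))                  # high is exclusive by default
print(rng.integers(1, 100, endpoint=True))   # endpoint=True makes high inclusive
print(rng.integers(1, 101, size=3))
print(rng.integers(1, 101, size=(2, 2)))

# A larger batch, to check that every draw lies between 1 and 100
draws = rng.integers(1, 101, size=1000)
print(draws.min(), draws.max())
```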

&lt;p&gt;
The normal distribution is defined by \(f(x)=\frac{1}{\sqrt{2\pi \sigma^2}} \exp (-\frac{(x-\mu)^2}{2\sigma^2})\) where \(\mu\) is the mean value, and \(\sigma\) is the standard deviation. In the standard normal distribution, \(\mu=0\) and \(\sigma=1\).
&lt;/p&gt;
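&lt;p&gt;
As a quick check of the formula, the density can be written out directly and compared with scipy.stats.norm.pdf. This is a minimal sketch for Python 3. The helper name normal_pdf is an assumption made for illustration.
&lt;/p&gt;

```python
import numpy as np
from scipy.stats import norm


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal probability density written out above."""
    return 1.0 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(x - mu)**2 / (2 * sigma**2))


x = np.linspace(-3, 3, 7)
print(normal_pdf(x))

# The hand-written formula should agree with the library implementation
print(np.allclose(normal_pdf(x, 1.0, 0.5), norm.pdf(x, loc=1.0, scale=0.5)))
```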

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

mu = 1
sigma = 0.5
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.random.normal(mu, sigma)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.random.normal(mu, sigma, 2)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.04225842065
[ 0.58105204  0.64853157]
&lt;/pre&gt;

&lt;p&gt;
Let us compare the sampled distribution to the analytical distribution. We generate a large set of samples, and estimate the probability density in each bin using the matplotlib.pyplot.hist command.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

mu = 0; sigma = 1

N = 5000
samples = np.random.normal(mu, sigma, N)

counts, bins, ignored = plt.hist(samples, 50, density=&lt;span style="color: #8b0000;"&gt;True&lt;/span&gt;)

plt.plot(bins, 1.0/np.sqrt(2 * np.pi * sigma**2)*np.exp(-((bins - mu)**2)/(2*sigma**2)))
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/random-thoughts-2.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;img src="/img/./images/random-thoughts-2.png"&gt;&lt;/p&gt;

&lt;p&gt;
What fraction of points lie between plus and minus one standard deviation of the mean?
&lt;/p&gt;

&lt;p&gt;
samples &amp;gt;= mu-sigma returns a boolean array that is True where the inequality holds and False where it does not. (samples &amp;gt;= mu-sigma) &amp;amp; (samples &amp;lt;= mu+sigma) returns an array that is True only where both inequalities hold. Summing this array counts the True entries, since True counts as 1. Dividing by the total number of samples then gives the fraction of samples that lie between mu-sigma and mu+sigma.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

mu = 0; sigma = 1

N = 5000
samples = np.random.normal(mu, sigma, N)

a = np.sum((samples &amp;gt;= (mu - sigma)) &amp;amp; (samples &amp;lt;= (mu + sigma))) / &lt;span style="color: #8b0000;"&gt;float&lt;/span&gt;(N) 
b = np.sum((samples &amp;gt;= (mu - 2*sigma)) &amp;amp; (samples &amp;lt;= (mu + 2*sigma))) / &lt;span style="color: #8b0000;"&gt;float&lt;/span&gt;(N) 
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'{0:%} of samples are within +- 1 standard deviation of the mean'&lt;/span&gt;.format(a)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'{0:%} of samples are within +- 2 standard deviations of the mean'&lt;/span&gt;.format(b)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
67.500000% of samples are within +- 1 standard deviation of the mean
95.580000% of samples are within +- 2 standard deviations of the mean
&lt;/pre&gt;
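&lt;p&gt;
These sampled fractions can be compared with the exact values for a normal distribution, where the fraction within k standard deviations of the mean is erf(k/sqrt(2)). A minimal sketch using only the standard library follows. The helper name fraction_within is an assumption made for illustration.
&lt;/p&gt;

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print('{0:%} of a normal distribution lies within +- {1} sigma'.format(fraction_within(k), k))
```

&lt;p&gt;
The exact values are about 68.3% and 95.4% for one and two standard deviations, so the sampled fractions above are within about one percentage point of them, as expected for a sample of 5000 points.
&lt;/p&gt;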

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;p&gt;
We only considered the numpy.random functions here, and not all of them. There are many distributions of random numbers to choose from. There are also random numbers in the python random module. Remember these are only &lt;a href="http://en.wikipedia.org/wiki/Pseudorandom_number_generator" &gt;pseudorandom&lt;/a&gt; numbers, but they are still useful for many applications.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Random-thoughts.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Are averages different</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Are-averages-different</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">ns9GzZce9hWygHMEDbuGQ7eAES4=</guid>
      <description>Are averages different</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2012/01/28/are-two-averages-different/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Adapted from &lt;a href="http://stattrek.com/ap-statistics-4/unpaired-means.aspx" &gt;http://stattrek.com/ap-statistics-4/unpaired-means.aspx&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Class A had 30 students who received an average test score of 78, with a standard deviation of 10. Class B had 25 students who received an average test score of 85, with a standard deviation of 15. We want to know if the difference in these averages is statistically significant. Note that we only have estimates of the true average and standard deviation for each class, and there is uncertainty in those estimates. As a result, we are unsure if the averages are really different. It could have just been luck that a few students in class B did better.
&lt;/p&gt;

&lt;p&gt;
The hypothesis:
&lt;/p&gt;

&lt;p&gt;
The true averages are the same. We need to perform a two-sample t-test of the hypothesis that \(\mu_1 - \mu_2 = 0\) (this is often called the null hypothesis). We use a two-tailed test because we do not care if the difference is positive or negative; either way, the averages are not the same.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

n1 = 30  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;students in class A&lt;/span&gt;
x1 = 78.0  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;average grade in class A&lt;/span&gt;
s1 = 10.0  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;std dev of exam grade in class A&lt;/span&gt;

n2 = 25  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;students in class B&lt;/span&gt;
x2 = 85.0  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;average grade in class B&lt;/span&gt;
s2 = 15.0  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;std dev of exam grade in class B&lt;/span&gt;

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;the standard error of the difference between the two averages. &lt;/span&gt;
SE = np.sqrt(s1**2 / n1 + s2**2 / n2)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;compute DOF&lt;/span&gt;
DF = (n1 - 1) + (n2 - 1)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
See the discussion at &lt;a href="http://stattrek.com/Help/Glossary.aspx?Target=Two-sample%20t-test" &gt;http://stattrek.com/Help/Glossary.aspx?Target=Two-sample%20t-test&lt;/a&gt; for a more complex definition of degrees of freedom. Here we simply subtract one from each sample size to account for the estimation of the average of each sample.
&lt;/p&gt;
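&lt;p&gt;
One commonly used form of the more complex definition is the Welch-Satterthwaite approximation, which weights each sample by its own variance. The sketch below, written for Python 3, computes it by hand and also calls scipy.stats.ttest_ind_from_stats with equal_var=False, which performs Welch's t-test from summary statistics. The name DF_welch is an assumption made for illustration; this is a cross-check, not the approach taken in the rest of this post.
&lt;/p&gt;

```python
from scipy.stats import ttest_ind_from_stats

n1, x1, s1 = 30, 78.0, 10.0   # class A, from the example above
n2, x2, s2 = 25, 85.0, 15.0   # class B, from the example above

# Welch-Satterthwaite approximation to the degrees of freedom
v1 = s1**2 / n1
v2 = s2**2 / n2
DF_welch = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print('Welch degrees of freedom = {0:.2f}'.format(DF_welch))

# equal_var=False requests Welch's t-test from the summary statistics
result = ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=False)
print('t = {0:.4f}, two-sided p = {1:.4f}'.format(result.statistic, result.pvalue))
```

&lt;p&gt;
The approximation gives fewer degrees of freedom than the simple count of 53, which widens the t-distribution slightly and makes the test a little more conservative.
&lt;/p&gt;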


&lt;p&gt;
Compute the t-score for our data
&lt;/p&gt;

&lt;p&gt;
The difference between two averages determined from small sample numbers follows the t-distribution. The t-score is the difference between the observed difference of the means and the hypothesized difference of the means, normalized by the standard error. We compute the absolute value of the t-score so that it is positive, which is convenient later.
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;tscore = np.abs(((x1 - x2) - 0) / SE)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; tscore
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.99323179108
&lt;/pre&gt;

&lt;p&gt;
Interpretation
&lt;/p&gt;

&lt;p&gt;
One way to determine whether the difference is significant is to ask: does our computed difference in averages fall within a confidence range of the hypothesized value (zero)? If it does, then we can attribute the difference to statistical variations at that confidence level. If it does not, we can say that statistical variations do not account for the difference at that confidence level, and hence we conclude that the averages are different.
&lt;/p&gt;

&lt;p&gt;
Let us compute the t-value that corresponds to a 95% confidence level for a mean of zero with the degrees of freedom computed earlier. This means that 95% of the t-scores we expect to get will fall within \(\pm\) t95.
&lt;/p&gt;


&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t

ci = 0.95;
alpha = 1 - ci;
t95 = t.ppf(1.0 - alpha/2.0, DF)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; t95
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
2.00574599354
&lt;/pre&gt;

&lt;p&gt;
Since tscore &amp;lt; t95, we conclude that at the 95% confidence level we cannot say these averages are statistically different because our computed t-score falls in the expected range of deviations. Note that our t-score is very close to the 95% limit. Let us consider a slightly lower confidence level.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;ci = 0.94
alpha = 1 - ci;
t94 = t.ppf(1.0 - alpha/2.0, DF)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; t94
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.92191364181
&lt;/pre&gt;

&lt;p&gt;
At the 94% confidence level, however, tscore &amp;gt; t94, which means we can say with 94% confidence that the two averages are different; class B performed better than class A did. Alternatively, there is only about a 6% chance we are wrong about that statement.
&lt;/p&gt;

&lt;p&gt;
Another way to get there
&lt;/p&gt;

&lt;p&gt;
An alternative way to get the confidence that the averages are different is to directly compute it from the cumulative t-distribution function. We compute the difference between all the t-values less than tscore and the t-values less than -tscore, which is the fraction of measurements that are between them. You can see here that we are practically 95% sure that the averages are different.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;f = t.cdf(tscore, DF) - t.cdf(-tscore, DF)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; f
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.948605075732
&lt;/pre&gt;
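&lt;p&gt;
The same number is often reported the other way around, as a two-tailed p-value: the probability of seeing a t-score at least this large in magnitude if the true averages were equal. A minimal sketch for Python 3, repeating the inputs from above so that it runs on its own, uses the survival function t.sf:
&lt;/p&gt;

```python
import numpy as np
from scipy.stats import t

# Values repeated from the example above
n1, s1, x1 = 30, 10.0, 78.0
n2, s2, x2 = 25, 15.0, 85.0
SE = np.sqrt(s1**2 / n1 + s2**2 / n2)
DF = (n1 - 1) + (n2 - 1)
tscore = np.abs((x1 - x2) / SE)

# Two-tailed p-value: probability of |t| at least this large under the null hypothesis
p_value = 2.0 * t.sf(tscore, DF)
print('p = {0:.4f}'.format(p_value))
print('1 - p = {0:.4f}'.format(1.0 - p_value))
```

&lt;p&gt;
Here 1 - p is the same quantity as f computed above, so a p-value just above 0.05 is another way of saying that we are just short of 95% confident the averages differ.
&lt;/p&gt;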
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Are-averages-different.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Basic statistics</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Basic-statistics</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">xXhTkngAd2XzJ6UHTJLZQUjHS7o=</guid>
      <description>Basic statistics</description>
      <content:encoded><![CDATA[


&lt;p&gt;
Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.
&lt;/p&gt;

&lt;p&gt;
This is a recipe for computing the confidence interval. The strategy is:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the average
&lt;/li&gt;
&lt;li&gt;Compute the standard deviation of your data
&lt;/li&gt;
&lt;li&gt;Define the confidence interval, e.g. 95% = 0.95
&lt;/li&gt;
&lt;li&gt;Compute the student-t multiplier. This is a function of the confidence
interval you specify, and the number of data points you have minus 1. You
subtract 1 because one degree of freedom is lost from calculating the
average.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The confidence interval is defined as ybar +- T_multiplier*std/sqrt(n).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t

y = [8.1, 8.0, 8.1]

ybar = np.mean(y)
s = np.std(y)

ci = 0.95
alpha = 1.0 - ci

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(y)
T_multiplier = t.ppf(1-alpha/2.0, n-1)

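&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;np.std defaults to ddof=0, so s/sqrt(n-1) here equals std(y, ddof=1)/sqrt(n)&lt;/span&gt;
ci95 = T_multiplier * s / np.sqrt(n-1)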

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; [ybar - ci95, ybar + ci95]
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
[7.9232449090029595, 8.210088424330376]
&lt;/pre&gt;

&lt;p&gt;
We are 95% confident that the true average lies in the interval above.
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Basic-statistics.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Introduction to statistical data analysis</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Introduction-to-statistical-data-analysis</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">ISkaFgVZO_3LI30XvquFpMp7mwE=</guid>
      <description>Introduction to statistical data analysis</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/08/27/introduction-to-statistical-data-analysis/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np

y = [8.1, 8.0, 8.1]

ybar = np.mean(y)
s = np.std(y, ddof=1)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; ybar, s
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
8.06666666667 0.057735026919
&lt;/pre&gt;

&lt;p&gt;
Interestingly, we have to specify the divisor in numpy.std with the ddof argument. Matlab's std divides by n - 1 by default (equivalent to ddof=1), whereas numpy.std defaults to ddof=0 and divides by n.
&lt;/p&gt;
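&lt;p&gt;
The effect of ddof is easy to see on the three measurements used here. A minimal sketch for Python 3 follows. The names s_population and s_sample are assumptions made for illustration.
&lt;/p&gt;

```python
import numpy as np

y = [8.1, 8.0, 8.1]
n = len(y)

s_population = np.std(y)        # ddof=0: divide by n
s_sample = np.std(y, ddof=1)    # ddof=1: divide by n - 1

print('ddof=0: {0:.6f}, ddof=1: {1:.6f}'.format(s_population, s_sample))

# The two differ only by the factor sqrt(n / (n - 1))
print(np.isclose(s_sample, s_population * np.sqrt(n / (n - 1.0))))
```

&lt;p&gt;
For small samples such as this one the difference is substantial, so it matters which convention a confidence-interval formula assumes.
&lt;/p&gt;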

&lt;p&gt;
Here is the principle of computing a confidence interval.
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compute the average
&lt;/li&gt;
&lt;li&gt;Compute the standard deviation of your data
&lt;/li&gt;
&lt;li&gt;Define the confidence interval, e.g. 95% = 0.95
&lt;/li&gt;
&lt;li&gt;Compute the student-t multiplier. This is a function of the
confidence interval you specify, and the number of data points
you have minus 1. You subtract 1 because one degree of freedom
is lost from calculating the average.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The confidence interval is defined as ybar +- T_multiplier*std/sqrt(n).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t
ci = 0.95
alpha = 1.0 - ci

n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(y)
T_multiplier = t.ppf(1.0 - alpha / 2.0, n - 1)

ci95 = T_multiplier * s / np.sqrt(n)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'T_multiplier = {0}'&lt;/span&gt;.format(T_multiplier)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'ci95 = {0}'&lt;/span&gt;.format(ci95)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'The true average is between {0} and {1} at a 95% confidence level'&lt;/span&gt;.format(ybar - ci95, ybar + ci95)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
T_multiplier = 4.30265272991
ci95 = 0.143421757664
The true average is between 7.923244909 and 8.21008842433 at a 95% confidence level
&lt;/pre&gt;
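&lt;p&gt;
As a cross-check, scipy can compute the same interval in one call. The sketch below is written for Python 3 and uses scipy.stats.sem for the standard error of the mean and t.interval for the interval itself. The confidence level is passed positionally because the name of that argument has changed between SciPy versions.
&lt;/p&gt;

```python
import numpy as np
from scipy.stats import sem, t

y = [8.1, 8.0, 8.1]
n = len(y)
ybar = np.mean(y)

# sem uses ddof=1 by default, so it equals std(y, ddof=1) / sqrt(n)
standard_error = sem(y)

# Confidence level passed positionally; n - 1 is the degrees of freedom
lower, upper = t.interval(0.95, n - 1, loc=ybar, scale=standard_error)
print('[{0}, {1}]'.format(lower, upper))
```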
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Introduction-to-statistical-data-analysis.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Numerical propagation of errors</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/16/Numerical-propagation-of-errors</link>
      <pubDate>Sat, 16 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">UCGBQdUnIIcd1L1zyNuNidjLmW8=</guid>
      <description>Numerical propagation of errors</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/09/05/numerical-propogation-of-errors/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Propagation of errors is essential to understanding how the uncertainty in a parameter affects computations that use that parameter. The uncertainty propagates by a set of rules into your solution. These rules are not easy to remember, or apply to complicated situations, and are only approximate for equations that are nonlinear in the parameters.
&lt;/p&gt;

&lt;p&gt;
We will use a Monte Carlo simulation to illustrate error propagation. The idea is to generate a distribution of possible parameter values, and to evaluate your equation for each parameter value. Then, we perform statistical analysis on the results to determine the standard error of the results.
&lt;/p&gt;

&lt;p&gt;
We will assume all parameters are defined by a normal distribution with known mean and standard deviation.
&lt;/p&gt;
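&lt;p&gt;
The procedure just described can be sketched as one small helper. This is a minimal illustration for Python 3, assuming NumPy 1.17 or later and independent, normally distributed parameters. The function name monte_carlo_std, its arguments and the seed are assumptions made for illustration, not part of the examples that follow.
&lt;/p&gt;

```python
import numpy as np


def monte_carlo_std(func, mus, sigmas, n_samples=100000, seed=0):
    """Estimate the standard deviation of func(*params) for independent normal params."""
    rng = np.random.default_rng(seed)
    # One array of samples per parameter, drawn from its own normal distribution
    samples = [rng.normal(mu, sigma, n_samples) for mu, sigma in zip(mus, sigmas)]
    # Evaluate the equation for every sampled parameter set and summarize the spread
    return np.std(func(*samples))


# Sum of two parameters with means 2.5 and 4.1 and standard deviations 0.4 and 0.3
std_sum = monte_carlo_std(lambda a, b: a + b, [2.5, 4.1], [0.4, 0.3])
print('std of the sum = {0:.3f}'.format(std_sum))
```

&lt;p&gt;
Because func is an ordinary Python callable, the same helper works for any expression that accepts NumPy arrays, including ones with no simple analytical propagation rule.
&lt;/p&gt;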

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Addition and subtraction&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

N = 10**6 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;number of samples of parameters (an integer, since it is used as an array size)&lt;/span&gt;

A_mu = 2.5; A_sigma = 0.4
B_mu = 4.1; B_sigma = 0.3

A = np.random.normal(A_mu, A_sigma, size=(1, N))
B = np.random.normal(B_mu, B_sigma, size=(1, N))

p = A + B
m = A - B

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.std(p)
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.std(m)

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; np.sqrt(A_sigma**2 + B_sigma**2) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;the analytical std dev&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.500505424616
0.500113385681
0.5
&lt;/pre&gt;
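
&lt;p&gt;
For reference, the analytical value comes from the standard rule that the variances of uncorrelated errors add, for a difference as well as for a sum. With the numbers used here:
\[\sigma_{A \pm B} = \sqrt{\sigma_A^2 + \sigma_B^2} = \sqrt{0.4^2 + 0.3^2} = \sqrt{0.25} = 0.5\]
Both Monte Carlo estimates agree with this value to about three significant figures.
&lt;/p&gt;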
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-2" class="outline-2"&gt;
&lt;h2 id="sec-2"&gt;&lt;span class="section-number-2"&gt;2&lt;/span&gt; Multiplication&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;F_mu = 25.0; F_sigma = 1;
x_mu = 6.4; x_sigma = 0.4;

F = np.random.normal(F_mu, F_sigma, size=(1, N))
x = np.random.normal(x_mu, x_sigma, size=(1, N))

t = F * x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(t))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt((F_sigma / F_mu)**2 + (x_sigma / x_mu)**2) * F_mu * x_mu)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
11.8900166284
11.8726576637
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-3" class="outline-2"&gt;
&lt;h2 id="sec-3"&gt;&lt;span class="section-number-2"&gt;3&lt;/span&gt; Division&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
This is really like multiplication: F / x = F * (1 / x).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;d = F / x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(d))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt((F_sigma / F_mu)**2 + (x_sigma / x_mu)**2) * F_mu / x_mu)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.293757533168
0.289859806243
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-4" class="outline-2"&gt;
&lt;h2 id="sec-4"&gt;&lt;span class="section-number-2"&gt;4&lt;/span&gt; Exponents&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
This rule differs from the multiplication rule, even though A^2 = A*A, because in the previous examples we assumed the errors in A and B for A*B were uncorrelated. In A*A, both factors carry the same error, so the errors are perfectly correlated and a different rule applies: to first order, the relative error in A^n is |n| times the relative error in A.
&lt;/p&gt;
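
&lt;p&gt;
A minimal sketch of this distinction, not from the original post: it compares A*A with A*B, where B is a hypothetical independent variable drawn from the same distribution as A. The seeded generator and the variable names are assumptions made for illustration.
&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for repeatability)
N = 10**6

A_mu, A_sigma = 16.07, 0.06

A = rng.normal(A_mu, A_sigma, N)
B = rng.normal(A_mu, A_sigma, N)  # hypothetical independent copy of A

std_power = np.std(A * A)    # both factors share the same error
std_product = np.std(A * B)  # the two errors are uncorrelated

# First-order analytical estimates for each case
rule_power = 2 * A_sigma / A_mu * A_mu**2
rule_product = np.sqrt(2) * A_sigma / A_mu * A_mu**2

print(std_power, rule_power)
print(std_product, rule_product)
```

&lt;p&gt;
Under these assumptions the spread of A*A should be larger than that of A*B by a factor of about sqrt(2), because perfectly correlated errors add linearly rather than in quadrature.
&lt;/p&gt;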

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;t_mu = 2.03; t_sigma = 0.01*t_mu; &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;1% error&lt;/span&gt;
A_mu = 16.07; A_sigma = 0.06;

t = np.random.normal(t_mu, t_sigma, size=(1, N))
A = np.random.normal(A_mu, A_sigma, size=(1, N))

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Compute t^5 and sqrt(A) with error propagation&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(t**5))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;((5 * t_sigma / t_mu) * t_mu**5)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.72454836176
1.72365440621
&lt;/pre&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(np.sqrt(A)))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(1.0 / 2.0 * A_sigma / A_mu * np.sqrt(A_mu))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.00748903477329
0.00748364738749
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-5" class="outline-2"&gt;
&lt;h2 id="sec-5"&gt;&lt;span class="section-number-2"&gt;5&lt;/span&gt; The chain rule in error propagation&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
Let v = vo + a*t, with uncertainties in vo, a, and t.
&lt;/p&gt;
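
&lt;p&gt;
For reference, the analytical value this example compares against follows from the standard first-order propagation formula for uncorrelated errors. Here \(\sigma_{v_o}\), \(\sigma_a\), and \(\sigma_t\) denote the standard deviations of vo, a, and t (notation introduced here for illustration), and the partial derivatives are evaluated at the mean values:
&lt;/p&gt;

&lt;p&gt;
\[\sigma_v^2 \approx \left(\frac{\partial v}{\partial v_o}\right)^2 \sigma_{v_o}^2 + \left(\frac{\partial v}{\partial a}\right)^2 \sigma_a^2 + \left(\frac{\partial v}{\partial t}\right)^2 \sigma_t^2 = \sigma_{v_o}^2 + t^2 \sigma_a^2 + a^2 \sigma_t^2\]
&lt;/p&gt;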

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;vo_mu = 1.2; vo_sigma = 0.02;
a_mu = 3.0;  a_sigma  = 0.3;
t_mu = 12.0; t_sigma  = 0.12;

vo = np.random.normal(vo_mu, vo_sigma, (1, N))
a = np.random.normal(a_mu, a_sigma, (1, N))
t = np.random.normal(t_mu, t_sigma, (1, N))

v = vo + a*t

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(v))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt(vo_sigma**2 + t_mu**2 * a_sigma**2 + a_mu**2 * t_sigma**2))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
3.62232509326
3.61801050303
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-6" class="outline-2"&gt;
&lt;h2 id="sec-6"&gt;&lt;span class="section-number-2"&gt;6&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;

&lt;p&gt;
You can numerically perform error propagation analysis if you know the underlying distribution of errors on the parameters in your equations. One benefit of numerical propagation is that you do not have to remember the error propagation rules, and you can look directly at the distribution of the result in nonlinear cases. Some limitations of this approach include:
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You have to know the distribution of the errors in the parameters.
&lt;/li&gt;
&lt;li&gt;With the independent sampling used here, you have to assume the errors in the parameters are uncorrelated.
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/16/Numerical-propagation-of-errors.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Numerical propagation of errors</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/16/Numerical-propogation-of-errors</link>
      <pubDate>Sat, 16 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">ahPgbhuYz2mK3iDz3T1xljFRSDs=</guid>
      <description>Numerical propagation of errors</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/09/05/numerical-propogation-of-errors/" &gt;Matlab post&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
Propagation of errors is essential to understanding how the uncertainty in a parameter affects computations that use that parameter. The uncertainty propagates by a set of rules into your solution. These rules are not easy to remember or to apply in complicated situations, and they are only approximate for equations that are nonlinear in the parameters.
&lt;/p&gt;

&lt;p&gt;
We will use a Monte Carlo simulation to illustrate error propagation. The idea is to generate a distribution of possible parameter values and to evaluate your equation for each parameter value. Then, we perform statistical analysis on the results to determine their standard deviation, which is the propagated uncertainty.
&lt;/p&gt;

&lt;p&gt;
We will assume all parameters are defined by a normal distribution with known mean and standard deviation.
&lt;/p&gt;
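
&lt;p&gt;
The procedure just described can be sketched as a small helper. This is an illustration only: the function name mc_propagate, its arguments, and the seeded generator are hypothetical and are not part of the original post. It assumes independent, normally distributed parameters.
&lt;/p&gt;

```python
import numpy as np


def mc_propagate(func, mus, sigmas, n_samples=100000, seed=0):
    """Return the mean and standard deviation of func over sampled parameters.

    Each parameter is drawn independently from a normal distribution
    with the given mean and standard deviation (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    samples = [rng.normal(mu, sigma, n_samples)
               for mu, sigma in zip(mus, sigmas)]
    results = func(*samples)  # evaluate the equation for every sample
    return np.mean(results), np.std(results)


# Example: the sum of two uncertain parameters
mean_p, std_p = mc_propagate(lambda A, B: A + B, [2.5, 4.1], [0.4, 0.3])
print(mean_p, std_p)
```

&lt;p&gt;
Passing the equation as a function keeps the sampling and the statistics in one place, so the same helper could be reused for products, quotients, or powers.
&lt;/p&gt;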

&lt;div id="outline-container-1" class="outline-2"&gt;
&lt;h2 id="sec-1"&gt;&lt;span class="section-number-2"&gt;1&lt;/span&gt; Addition and subtraction&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-1"&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt

N = int(1e6) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;number of samples of parameters (an integer, as required by size=)&lt;/span&gt;

A_mu = 2.5; A_sigma = 0.4
B_mu = 4.1; B_sigma = 0.3

A = np.random.normal(A_mu, A_sigma, size=(1, N))
B = np.random.normal(B_mu, B_sigma, size=(1, N))

p = A + B
m = A - B

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(p))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(m))

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt(A_sigma**2 + B_sigma**2)) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;the analytical std dev&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.500505424616
0.500113385681
0.5
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-2" class="outline-2"&gt;
&lt;h2 id="sec-2"&gt;&lt;span class="section-number-2"&gt;2&lt;/span&gt; Multiplication&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-2"&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;F_mu = 25.0; F_sigma = 1;
x_mu = 6.4; x_sigma = 0.4;

F = np.random.normal(F_mu, F_sigma, size=(1, N))
x = np.random.normal(x_mu, x_sigma, size=(1, N))

t = F * x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(t))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt((F_sigma / F_mu)**2 + (x_sigma / x_mu)**2) * F_mu * x_mu)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
11.8900166284
11.8726576637
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-3" class="outline-2"&gt;
&lt;h2 id="sec-3"&gt;&lt;span class="section-number-2"&gt;3&lt;/span&gt; Division&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-3"&gt;
&lt;p&gt;
This is really like multiplication: F / x = F * (1 / x).
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;d = F / x
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(d))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt((F_sigma / F_mu)**2 + (x_sigma / x_mu)**2) * F_mu / x_mu)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.293757533168
0.289859806243
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-4" class="outline-2"&gt;
&lt;h2 id="sec-4"&gt;&lt;span class="section-number-2"&gt;4&lt;/span&gt; Exponents&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-4"&gt;
&lt;p&gt;
This rule differs from the multiplication rule, even though A^2 = A*A, because in the previous examples we assumed the errors in A and B for A*B were uncorrelated. In A*A, both factors carry the same error, so the errors are perfectly correlated and a different rule applies: to first order, the relative error in A^n is |n| times the relative error in A.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;t_mu = 2.03; t_sigma = 0.01*t_mu; &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;1% error&lt;/span&gt;
A_mu = 16.07; A_sigma = 0.06;

t = np.random.normal(t_mu, t_sigma, size=(1, N))
A = np.random.normal(A_mu, A_sigma, size=(1, N))

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Compute t^5 and sqrt(A) with error propagation&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(t**5))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;((5 * t_sigma / t_mu) * t_mu**5)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
1.72454836176
1.72365440621
&lt;/pre&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(np.sqrt(A)))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(1.0 / 2.0 * A_sigma / A_mu * np.sqrt(A_mu))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
0.00748903477329
0.00748364738749
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-5" class="outline-2"&gt;
&lt;h2 id="sec-5"&gt;&lt;span class="section-number-2"&gt;5&lt;/span&gt; The chain rule in error propagation&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-5"&gt;
&lt;p&gt;
Let v = vo + a*t, with uncertainties in vo, a, and t.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;vo_mu = 1.2; vo_sigma = 0.02;
a_mu = 3.0;  a_sigma  = 0.3;
t_mu = 12.0; t_sigma  = 0.12;

vo = np.random.normal(vo_mu, vo_sigma, (1, N))
a = np.random.normal(a_mu, a_sigma, (1, N))
t = np.random.normal(t_mu, t_sigma, (1, N))

v = vo + a*t

&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.std(v))
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(np.sqrt(vo_sigma**2 + t_mu**2 * a_sigma**2 + a_mu**2 * t_sigma**2))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
3.62232509326
3.61801050303
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id="outline-container-6" class="outline-2"&gt;
&lt;h2 id="sec-6"&gt;&lt;span class="section-number-2"&gt;6&lt;/span&gt; Summary&lt;/h2&gt;
&lt;div class="outline-text-2" id="text-6"&gt;

&lt;p&gt;
You can numerically perform error propagation analysis if you know the underlying distribution of errors on the parameters in your equations. One benefit of numerical propagation is that you do not have to remember the error propagation rules, and you can look directly at the distribution of the result in nonlinear cases. Some limitations of this approach include:
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You have to know the distribution of the errors in the parameters.
&lt;/li&gt;
&lt;li&gt;With the independent sampling used here, you have to assume the errors in the parameters are uncorrelated.
&lt;/li&gt;
&lt;/ol&gt;
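
&lt;p&gt;
The independent draws used in the examples build in the uncorrelated assumption. If the covariance between parameters were known, one possible extension (not part of the original post) would be to draw correlated samples from a multivariate normal distribution. The sketch below assumes a hypothetical correlation coefficient rho between the errors in A and B and uses a seeded generator for repeatability.
&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for repeatability)
N = 10**6

A_mu, A_sigma = 2.5, 0.4
B_mu, B_sigma = 4.1, 0.3
rho = 0.5  # hypothetical correlation coefficient between the errors in A and B

# Covariance matrix built from the standard deviations and rho
cov_AB = rho * A_sigma * B_sigma
cov = np.array([[A_sigma**2, cov_AB],
                [cov_AB, B_sigma**2]])

# Each row of samples is one correlated (A, B) pair
samples = rng.multivariate_normal([A_mu, B_mu], cov, size=N)
A, B = samples[:, 0], samples[:, 1]

std_sum = np.std(A + B)
# Analytical std dev of a sum with a covariance term
rule_sum = np.sqrt(A_sigma**2 + B_sigma**2 + 2 * cov_AB)

print(std_sum, rule_sum)
```

&lt;p&gt;
With rho set to zero the covariance term vanishes and the analytical value reduces to the uncorrelated rule for addition.
&lt;/p&gt;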
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/16/Numerical-propogation-of-errors.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
    <item>
      <title>Confidence interval on an average</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/10/Confidence-interval-on-an-average</link>
      <pubDate>Sun, 10 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[statistics]]></category>
      <guid isPermaLink="false">L4UUVQckmqvVxsk3NLeMhwbP5AE=</guid>
      <description>Confidence interval on an average</description>
      <content:encoded><![CDATA[


&lt;p&gt;
scipy has a statistical package available for getting statistical distributions. This is useful for computing confidence intervals using the Student's t distribution. Here is an example of computing a 95% confidence interval on an average.
&lt;/p&gt;
&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t

n = 10 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;number of measurements&lt;/span&gt;
dof = n - 1 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;degrees of freedom&lt;/span&gt;
avg_x = 16.1 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;average measurement&lt;/span&gt;
std_x = 0.01 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;standard deviation of measurements&lt;/span&gt;

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# &lt;/span&gt;&lt;span style="color: #ff0000; font-weight: bold;"&gt;Find 95% confidence interval on the average&lt;/span&gt;

alpha = 1.0 - 0.95

conf_interval = t.ppf(1-alpha/2.0, dof) * std_x / np.sqrt(n)

s = [&lt;span style="color: #228b22;"&gt;'We are 95% confident the true average'&lt;/span&gt;,
       &lt;span style="color: #228b22;"&gt;' is between {0:1.3f} and {1:1.3f}'&lt;/span&gt;]
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt;(&lt;span style="color: #228b22;"&gt;''&lt;/span&gt;.join(s).format(avg_x - conf_interval, avg_x + conf_interval))
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
We are 95% confident the true average is between 16.093 and 16.107
&lt;/pre&gt;
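
&lt;p&gt;
The half-width computed above uses std_x / sqrt(n), which is the form of a confidence interval on the average. For comparison, here is a sketch of a prediction interval for a single additional measurement, which is wider by a factor of sqrt(n + 1). It assumes normally distributed measurements; the names t_crit, ci_half, and pi_half are introduced here for illustration.
&lt;/p&gt;

```python
import numpy as np
from scipy.stats import t

n = 10        # number of measurements
dof = n - 1   # degrees of freedom
avg_x = 16.1  # average measurement
std_x = 0.01  # sample standard deviation of the measurements
alpha = 1.0 - 0.95

t_crit = t.ppf(1 - alpha / 2.0, dof)  # two-sided critical value

# Half-width of the confidence interval on the average
ci_half = t_crit * std_x / np.sqrt(n)

# Half-width of the prediction interval for one additional measurement
pi_half = t_crit * std_x * np.sqrt(1 + 1.0 / n)

print('Confidence interval on the average: {0:1.3f} to {1:1.3f}'.format(
    avg_x - ci_half, avg_x + ci_half))
print('Prediction interval for the next measurement: {0:1.3f} to {1:1.3f}'.format(
    avg_x - pi_half, avg_x + pi_half))
```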
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;/p&gt;&lt;p&gt;&lt;a href="/org/2013/02/10/Confidence-interval-on-an-average.org"&gt;org-mode source&lt;/a&gt;&lt;/p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
