<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <atom:link href="http://kitchingroup.cheme.cmu.edu/blog/feed/index.xml" rel="self" type="application/rss+xml" />
    <title>The Kitchin Research Group</title>
    <link>https://kitchingroup.cheme.cmu.edu/blog</link>
    <description>Chemical Engineering at Carnegie Mellon University</description>
    <pubDate>Sat, 01 Nov 2025 13:47:46 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    
    <item>
      <title>Linear regression with confidence intervals.</title>
      <link>https://kitchingroup.cheme.cmu.edu/blog/2013/02/18/Linear-regression-with-confidence-intervals</link>
      <pubDate>Mon, 18 Feb 2013 09:00:00 EST</pubDate>
      <category><![CDATA[data analysis]]></category>
      <category><![CDATA[linear regression]]></category>
      <category><![CDATA[confidence interval]]></category>
      <guid isPermaLink="false">ouU1Cp122xYewitLr1OY7aToQLI=</guid>
      <description>Linear regression with confidence intervals.</description>
      <content:encoded><![CDATA[


&lt;p&gt;
&lt;a href="http://matlab.cheme.cmu.edu/2011/08/28/linear-regression-with-confidence-intervals/" &gt;Matlab post&lt;/a&gt;
Fit a fourth order polynomial to this data and determine the confidence interval for each parameter. Data from example 5-1 in Fogler, Elements of Chemical Reaction Engineering.
&lt;/p&gt;

&lt;p&gt;
We want the equation \(Ca(t) = b0 + b1*t + b2*t^2 + b3*t^3 + b4*t^4\) fit to the data in the least squares sense. We can write this in a linear algebra form as: T*p = Ca where T is a matrix of columns [1 t t^2 t^3 t^4], and p is a column vector of the fitting parameters. We want to solve for the p vector and estimate the confidence intervals.
&lt;/p&gt;

&lt;div class="org-src-container"&gt;

&lt;pre class="src src-python"&gt;&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; numpy &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; np
&lt;span style="color: #8b0000;"&gt;from&lt;/span&gt; scipy.stats.distributions &lt;span style="color: #8b0000;"&gt;import&lt;/span&gt;  t

time = np.array([0.0, 50.0, 100.0, 150.0, 200.0, 250.0, 300.0])
Ca = np.array([50.0, 38.0, 30.6, 25.6, 22.2, 19.5, 17.4])*1e-3

T = np.column_stack([time**0, time, time**2, time**3, time**4])

p, res, rank, s = np.linalg.lstsq(T, Ca)
&lt;span style="color: #ff0000; font-weight: bold;"&gt;# the parameters are now in p&lt;/span&gt;

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# compute the confidence intervals&lt;/span&gt;
n = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(Ca)
k = &lt;span style="color: #8b0000;"&gt;len&lt;/span&gt;(p)

sigma2 = np.sum((Ca - np.dot(T, p))**2) / (n - k)  &lt;span style="color: #ff0000; font-weight: bold;"&gt;# RMSE&lt;/span&gt;

C = sigma2 * np.linalg.inv(np.dot(T.T, T)) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# covariance matrix&lt;/span&gt;
se = np.sqrt(np.diag(C)) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# standard error&lt;/span&gt;

alpha = 0.05 &lt;span style="color: #ff0000; font-weight: bold;"&gt;# 100*(1 - alpha) confidence level&lt;/span&gt;

sT = t.ppf(1.0 - alpha/2.0, n - k) &lt;span style="color: #ff0000; font-weight: bold;"&gt;# student T multiplier&lt;/span&gt;
CI = sT * se

&lt;span style="color: #8b0000;"&gt;for&lt;/span&gt; beta, ci &lt;span style="color: #8b0000;"&gt;in&lt;/span&gt; &lt;span style="color: #8b0000;"&gt;zip&lt;/span&gt;(p, CI):
    &lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'{2: 1.2e} [{0: 1.4e} {1: 1.4e}]'&lt;/span&gt;.format(beta - ci, beta + ci, beta)

SS_tot = np.sum((Ca - np.mean(Ca))**2)
SS_err = np.sum((np.dot(T, p) - Ca)**2)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;#  http://en.wikipedia.org/wiki/Coefficient_of_determination&lt;/span&gt;
Rsq = 1 - SS_err/SS_tot
&lt;span style="color: #8b0000;"&gt;print&lt;/span&gt; &lt;span style="color: #228b22;"&gt;'R^2 = {0}'&lt;/span&gt;.format(Rsq)

&lt;span style="color: #ff0000; font-weight: bold;"&gt;# plot fit&lt;/span&gt;
&lt;span style="color: #8b0000;"&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style="color: #8b0000;"&gt;as&lt;/span&gt; plt
plt.plot(time, Ca, &lt;span style="color: #228b22;"&gt;'bo'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'raw data'&lt;/span&gt;)
plt.plot(time, np.dot(T, p), &lt;span style="color: #228b22;"&gt;'r-'&lt;/span&gt;, label=&lt;span style="color: #228b22;"&gt;'fit'&lt;/span&gt;)
plt.xlabel(&lt;span style="color: #228b22;"&gt;'Time'&lt;/span&gt;)
plt.ylabel(&lt;span style="color: #228b22;"&gt;'Ca (mol/L)'&lt;/span&gt;)
plt.legend(loc=&lt;span style="color: #228b22;"&gt;'best'&lt;/span&gt;)
plt.savefig(&lt;span style="color: #228b22;"&gt;'images/linregress-conf.png'&lt;/span&gt;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class="example"&gt;
 5.00e-02 [ 4.9680e-02  5.0300e-02]
-2.98e-04 [-3.1546e-04 -2.8023e-04]
 1.34e-06 [ 1.0715e-06  1.6155e-06]
-3.48e-09 [-4.9032e-09 -2.0665e-09]
 3.70e-12 [ 1.3501e-12  6.0439e-12]
R^2 = 0.999986967246
&lt;/pre&gt;

&lt;p&gt;&lt;img src="/img/./images/linregress-conf.png"&gt;&lt;p&gt;

&lt;p&gt;
A fourth order polynomial fits the data well, with a good R^2 value. All of the parameters appear to be significant, i.e. zero is not included in any of the parameter confidence intervals. This does not mean this is the best model for the data, just that the model fits well.
&lt;/p&gt;
&lt;p&gt;Copyright (C) 2013 by John Kitchin. See the &lt;a href="/copying.html"&gt;License&lt;/a&gt; for information about copying.&lt;p&gt;&lt;p&gt;&lt;a href="/org/2013/02/18/Linear-regression-with-confidence-intervals..org"&gt;org-mode source&lt;/a&gt;&lt;p&gt;]]></content:encoded>
    </item>
  </channel>
</rss>
