Introduction to statistical data analysis

Posted February 18, 2013 at 09:00 AM | categories: statistics

Updated February 27, 2013 at 02:34 PM

Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.

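import numpy as np

# three repeated measurements of the same quantity
y = [8.1, 8.0, 8.1]

ybar = np.mean(y)        # average of the measurements
s = np.std(y, ddof=1)    # sample standard deviation (divisor is n - 1)

print(ybar, s)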

8.06666666667 0.057735026919

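Interestingly, we have to specify the divisor in numpy.std with the ddof argument: the function divides by N - ddof. The default is ddof=0, which divides by N and gives the population standard deviation. Matlab's std divides by N - 1 by default, which corresponds to ddof=1 and is the sample standard deviation we want here.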
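To make the effect of the divisor concrete, here is a minimal sketch that compares both settings of ddof against the formulas written out by hand. The variable names ss, s_population and s_sample are only illustrative and do not appear elsewhere in this post.

```python
import numpy as np

y = [8.1, 8.0, 8.1]
n = len(y)

# sum of squared deviations from the average
ss = sum((yi - np.mean(y)) ** 2 for yi in y)

s_population = np.std(y)        # ddof=0 (the default): divide by n
s_sample = np.std(y, ddof=1)    # ddof=1: divide by n - 1

# both results should agree with the hand-written formulas
assert np.isclose(s_population, np.sqrt(ss / n))
assert np.isclose(s_sample, np.sqrt(ss / (n - 1)))

print(s_population, s_sample)
```

With only three data points the difference is large: roughly 0.047 with the default versus 0.058 with ddof=1.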

Here is the procedure for computing a confidence interval.

1. Compute the average.
2. Compute the standard deviation of your data.
3. Define the confidence level, e.g. 95% = 0.95.
4. Compute the student-t multiplier. This is a function of the confidence level you specify and the number of data points you have minus 1. You subtract 1 because one degree of freedom is lost from calculating the average.

The confidence interval is defined as ybar ± T_multiplier * s / sqrt(n), where s is the standard deviation of the data and n is the number of data points.
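The student-t multiplier depends on both the confidence level and the number of data points. The sketch below tabulates it for a few sample sizes, chosen arbitrarily for illustration, and compares it with the large-sample limit from the normal distribution. The names multipliers and z are assumptions of this example.

```python
from scipy.stats import norm, t

ci = 0.95
alpha = 1.0 - ci

# two-sided multiplier for several (arbitrary) sample sizes
multipliers = {n: t.ppf(1.0 - alpha / 2.0, n - 1) for n in (3, 5, 10, 30, 100)}
for n, T in multipliers.items():
    print('n = {0:3d}  T_multiplier = {1:.3f}'.format(n, T))

# the multiplier approaches this value as n grows
z = norm.ppf(1.0 - alpha / 2.0)
print('normal limit  = {0:.3f}'.format(z))
```

The multiplier is large for small samples (about 4.30 for n = 3) and shrinks toward about 1.96 as n increases, which is why a few extra measurements tighten the interval so much.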

from scipy.stats import t

ci = 0.95           # confidence level
alpha = 1.0 - ci

n = len(y)
# two-sided multiplier with n - 1 degrees of freedom
T_multiplier = t.ppf(1.0 - alpha / 2.0, n - 1)

# half-width of the confidence interval
ci95 = T_multiplier * s / np.sqrt(n)

print('T_multiplier = {0}'.format(T_multiplier))
print('ci95 = {0}'.format(ci95))
print('The true average is between {0} and {1} at a 95% confidence level'.format(ybar - ci95, ybar + ci95))

T_multiplier = 4.30265272991
ci95 = 0.143421757664
The true average is between 7.923244909 and 8.21008842433 at a 95% confidence level
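The whole calculation can be collected into one reusable function. This is only a sketch: the name mean_confidence_interval and its signature are hypothetical and not part of numpy or scipy. As a cross-check, the result is compared with scipy's own t.interval, called with the confidence level and degrees of freedom as positional arguments.

```python
import numpy as np
from scipy.stats import t


def mean_confidence_interval(data, ci=0.95):
    """Return (average, half_width) of the two-sided student-t interval.

    Hypothetical helper written for this example.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if n < 2:
        raise ValueError('need at least two measurements')
    ybar = data.mean()
    s = data.std(ddof=1)  # sample standard deviation
    T_multiplier = t.ppf(1.0 - (1.0 - ci) / 2.0, n - 1)
    return ybar, T_multiplier * s / np.sqrt(n)


y = [8.1, 8.0, 8.1]
ybar, half_width = mean_confidence_interval(y)
print(ybar - half_width, ybar + half_width)

# cross-check against scipy's built-in interval
sem = np.std(y, ddof=1) / np.sqrt(len(y))  # standard error of the average
low, high = t.interval(0.95, len(y) - 1, loc=ybar, scale=sem)
assert np.isclose(low, ybar - half_width)
assert np.isclose(high, ybar + half_width)
```

Returning the half-width rather than the two bounds keeps the result in the same ybar ± half_width form used above.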


The Kitchin Research Group

Chemical Engineering at Carnegie Mellon University
