## Introduction to statistical data analysis

| categories: statistics | tags:

Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.

import numpy as np

y = [8.1, 8.0, 8.1]

ybar = np.mean(y)
s = np.std(y, ddof=1)

print ybar, s

>>> >>> >>> >>> >>> >>> 8.06666666667 0.057735026919


Interesting, we have to specify the divisor in numpy.std by the ddof argument. The default for this in Matlab is 1, the default for this function is 0.

Here is the principle of computing a confidence interval.

1. compute the average
2. Compute the standard deviation of your data
3. Define the confidence interval, e.g. 95% = 0.95
4. compute the student-t multiplier. This is a function of the confidence interval you specify, and the number of data points you have minus 1. You subtract 1 because one degree of freedom is lost from calculating the average.

The confidence interval is defined as ybar +- T_multiplier*std/sqrt(n).

from scipy.stats.distributions import  t
ci = 0.95
alpha = 1.0 - ci

n = len(y)
T_multiplier = t.ppf(1.0 - alpha / 2.0, n - 1)

ci95 = T_multiplier * s / np.sqrt(n)

print 'T_multiplier = {0}'.format(T_multiplier)
print 'ci95 = {0}'.format(ci95)
print 'The true average is between {0} and {1} at a 95% confidence level'.format(ybar - ci95, ybar + ci95)

>>> >>> >>> >>> >>> >>> >>> >>> T_multiplier = 4.30265272991
ci95 = 0.143421757664
The true average is between 7.923244909 and 8.21008842433 at a 95% confidence level