## Introduction to statistical data analysis

Posted February 18, 2013 at 09:00 AM | categories: statistics

Updated February 27, 2013 at 02:34 PM

Given several measurements of a single quantity, determine the average value of the measurements, the standard deviation of the measurements and the 95% confidence interval for the average.

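    import numpy as np

    y = [8.1, 8.0, 8.1]    # repeated measurements of one quantity
    ybar = np.mean(y)      # sample average
    s = np.std(y, ddof=1)  # sample standard deviation (divides by n - 1)
    print(ybar, s)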

    8.06666666667 0.057735026919

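Interestingly, we have to specify the divisor in `numpy.std` with the `ddof` ("delta degrees of freedom") argument: the divisor is n - ddof. Matlab's `std` divides by n - 1 by default, which corresponds to `ddof=1`, whereas `numpy.std` defaults to `ddof=0` and divides by n.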
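To make the effect of the divisor concrete, here is a minimal sketch that computes both versions of the standard deviation for the same data and checks them against the formulas written out by hand. The variable names `ss`, `s_population` and `s_sample` are introduced here only for illustration.

```python
import numpy as np

y = [8.1, 8.0, 8.1]
n = len(y)
ybar = np.mean(y)

# Sum of squared deviations from the average
ss = np.sum((np.asarray(y) - ybar) ** 2)

s_population = np.std(y)      # ddof=0 (NumPy default): divide by n
s_sample = np.std(y, ddof=1)  # ddof=1: divide by n - 1

# Both values agree with the hand-written formulas
assert np.isclose(s_population, np.sqrt(ss / n))
assert np.isclose(s_sample, np.sqrt(ss / (n - 1)))

print(s_population, s_sample)
```

With only three data points the two estimates differ noticeably, so the choice of `ddof` matters most for small samples like this one.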

Here is the procedure for computing a confidence interval.

- Compute the average.
- Compute the standard deviation of your data.
- Define the confidence level, e.g. 95% = 0.95.
- Compute the Student's t multiplier. This is a function of the confidence level you specify and the number of data points you have minus 1. You subtract 1 because one degree of freedom is lost in calculating the average.

The confidence interval is defined as ybar ± T_multiplier * s / sqrt(n), where s is the sample standard deviation and n is the number of measurements.
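To see how the multiplier depends on the number of data points, the following sketch tabulates it at a 95% confidence level for a few sample sizes. The sample sizes are arbitrary choices for illustration, and the comparison with the normal distribution is an addition not made in the original post.

```python
from scipy.stats import t, norm

ci = 0.95
alpha = 1.0 - ci

# Two-sided Student-t multiplier for several (illustrative) sample sizes
sample_sizes = [3, 5, 10, 30, 100]
multipliers = {n: t.ppf(1.0 - alpha / 2.0, n - 1) for n in sample_sizes}

for n in sample_sizes:
    print('n = {0:3d}  T_multiplier = {1:.4f}'.format(n, multipliers[n]))

# Large-sample limit: the corresponding quantile of the normal distribution
z = norm.ppf(1.0 - alpha / 2.0)
print('normal limit = {0:.4f}'.format(z))
```

The multiplier shrinks toward the normal-distribution value as n grows, which is why a small sample gives a much wider interval for the same scatter in the data.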

    from scipy.stats.distributions import t

    ci = 0.95  # confidence level
    alpha = 1.0 - ci

    n = len(y)
    # two-sided Student-t multiplier with n - 1 degrees of freedom
    T_multiplier = t.ppf(1.0 - alpha / 2.0, n - 1)

    # half-width of the confidence interval
    ci95 = T_multiplier * s / np.sqrt(n)

    print('T_multiplier = {0}'.format(T_multiplier))
    print('ci95 = {0}'.format(ci95))
    print('The true average is between {0} and {1} at a 95% confidence level'.format(ybar - ci95, ybar + ci95))

    T_multiplier = 4.30265272991
    ci95 = 0.143421757664
    The true average is between 7.923244909 and 8.21008842433 at a 95% confidence level
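As a cross-check, the same interval can be obtained from SciPy directly. This is a sketch of an alternative route, not the method used above: it assumes `scipy.stats.sem` (which uses `ddof=1` by default) for the standard error and `scipy.stats.t.interval` for the interval, with the confidence level passed positionally.

```python
import numpy as np
from scipy import stats

y = [8.1, 8.0, 8.1]
n = len(y)
ybar = np.mean(y)

# Standard error of the mean: s / sqrt(n), with s computed using ddof=1
se = stats.sem(y)

# Manual interval, following the steps above
T_multiplier = stats.t.ppf(0.975, n - 1)
manual = (ybar - T_multiplier * se, ybar + T_multiplier * se)

# Same interval from SciPy; the confidence level is the first argument
lower, upper = stats.t.interval(0.95, n - 1, loc=ybar, scale=se)

print(manual)
print((lower, upper))
```

Both routes should give the same bounds, so the one-line call is a convenient check on the step-by-step calculation.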

Copyright (C) 2013 by John Kitchin. See the License for information about copying.