Class A had 30 students who received an average test score of 78, with a standard deviation of 10. Class B had 25 students who received an average test score of 85, with a standard deviation of 15. We want to know if the difference in these averages is statistically significant. Note that we only have estimates of the true average and standard deviation for each class, and there is uncertainty in those estimates. As a result, we are unsure if the averages are really different. It could have just been luck that a few students in class B did better.
The hypothesis is that the true averages are the same. We need to perform a two-sample t-test of the hypothesis that \(\mu_1 - \mu_2 = 0\) (this is often called the null hypothesis). We use a two-tailed test because we do not care if the difference is positive or negative; either way, the averages are not the same.
import numpy as np

n1 = 30    # students in class A
x1 = 78.0  # average grade in class A
s1 = 10.0  # std dev of exam grade in class A

n2 = 25    # students in class B
x2 = 85.0  # average grade in class B
s2 = 15.0  # std dev of exam grade in class B

# the standard error of the difference between the two averages.
SE = np.sqrt(s1**2 / n1 + s2**2 / n2)

# compute DOF
DF = (n1 - 1) + (n2 - 1)
See the discussion at http://stattrek.com/Help/Glossary.aspx?Target=Two-sample%20t-test for a more complex definition of degrees of freedom. Here we simply subtract one from each sample size to account for the estimation of the average of each sample.
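For comparison, one commonly used "more complex" definition is the Welch-Satterthwaite approximation, which weights each sample by its variance. The sketch below is not part of the calculation in this post; it only illustrates how the two definitions differ for these data. The names `v1`, `v2`, `DF_welch` and `DF_simple` are introduced here for illustration.

```python
n1, s1 = 30, 10.0  # class A: sample size, std dev
n2, s2 = 25, 15.0  # class B: sample size, std dev

# variance of each sample mean
v1 = s1**2 / n1
v2 = s2**2 / n2

# Welch-Satterthwaite approximation to the degrees of freedom
DF_welch = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# the simple definition used in this post
DF_simple = (n1 - 1) + (n2 - 1)

print(DF_welch, DF_simple)
```

For these data the approximation gives about 40.5 degrees of freedom, compared with 53 from the simple definition. Fewer degrees of freedom give a slightly wider t-distribution, so the simple definition is the less conservative choice here.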
Compute the t-score for our data
The difference between two averages determined from small samples follows the t-distribution. The t-score is the difference between the observed difference of the means and the hypothesized difference of the means, normalized by the standard error. We take the absolute value of the t-score so that it is positive, which is convenient later.
tscore = np.abs(((x1 - x2) - 0) / SE)
print(tscore)
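In symbols, the standard error and t-score used here can be written as follows. The notation \(\bar{x}_i\) for the sample averages (`x1`, `x2` in the code) and \(d_0\) for the hypothesized difference are assumptions made for this sketch; the numbers are the ones given for the two classes.

```latex
\[
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
   = \sqrt{\frac{10^2}{30} + \frac{15^2}{25}} \approx 3.512
\]
\[
t = \frac{\left| (\bar{x}_1 - \bar{x}_2) - d_0 \right|}{SE}
  = \frac{\left| (78 - 85) - 0 \right|}{3.512} \approx 1.993
\]
```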
One way to determine whether the difference is significant is to ask: does our computed t-score fall within the range we would expect, at some confidence level, if the true difference were the hypothesized value (zero)? If it does, then we can attribute the difference to statistical variation at that confidence level. If it does not, we can say that statistical variation does not account for the difference at that confidence level, and hence we conclude that the averages are different.
Let us compute the t-value that corresponds to a 95% confidence level for a mean of zero with the degrees of freedom computed earlier. This means that, if the true averages are the same, 95% of the t-scores we expect to get will fall within \(\pm\) t95.
from scipy.stats.distributions import t

ci = 0.95
alpha = 1 - ci
t95 = t.ppf(1.0 - alpha / 2.0, DF)
print(t95)
2.00574599354
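As a quick sanity check on what this critical value means, the sketch below recomputes it and confirms that the t-distribution places 95% of its probability between -t95 and +t95. It is self-contained, so the degrees of freedom are written out directly; the name `coverage` is introduced here for illustration.

```python
from scipy.stats import t

DF = (30 - 1) + (25 - 1)  # degrees of freedom, as computed earlier
ci = 0.95
alpha = 1 - ci

# two-tailed critical value: alpha/2 of the probability lies in each tail
t95 = t.ppf(1.0 - alpha / 2.0, DF)

# fraction of the t-distribution between -t95 and +t95
coverage = t.cdf(t95, DF) - t.cdf(-t95, DF)

print(t95, coverage)
```

The coverage should come back as 0.95 to within floating-point error, which is the sense in which t95 bounds the expected range of t-scores.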
Since tscore < t95, we conclude that at the 95% confidence level we cannot say these averages are statistically different, because our computed t-score falls in the expected range of deviations. Note that our t-score is very close to the 95% limit. Let us consider a lower confidence level.
ci = 0.94
alpha = 1 - ci
t94 = t.ppf(1.0 - alpha / 2.0, DF)
print(t94)
1.92191364181
At the 94% confidence level, however, tscore > t94, which means we can say with 94% confidence that the two averages are different; class B performed better than class A did. Put another way, if the true averages were the same, there would be only about a 6% chance of seeing a difference this large.
Another way to get there
An alternative way to get the confidence that the averages are different is to compute it directly from the cumulative t-distribution function. We take the difference between the cumulative probability at tscore and the cumulative probability at -tscore, which is the fraction of t-scores expected to fall between them. The result is just under 95%, so we are practically 95% sure that the averages are different.
f = t.cdf(tscore, DF) - t.cdf(-tscore, DF)
print(f)
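The same result can be stated as a two-sided p-value, which is one minus the fraction computed above. The sketch below recomputes everything from the summary statistics so that it runs on its own, and then cross-checks against `scipy.stats.ttest_ind_from_stats` with `equal_var=False` (Welch's test). Note the assumption involved: Welch's test uses the same t-score but a different, smaller number of degrees of freedom than the simple definition in this post, so its p-value is expected to differ slightly. The names `p_two_sided` and `res` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import t, ttest_ind_from_stats

n1, x1, s1 = 30, 78.0, 10.0  # class A: size, average, std dev
n2, x2, s2 = 25, 85.0, 15.0  # class B: size, average, std dev

SE = np.sqrt(s1**2 / n1 + s2**2 / n2)
DF = (n1 - 1) + (n2 - 1)
tscore = np.abs((x1 - x2) / SE)

# two-sided p-value: probability in both tails beyond +/- tscore
p_two_sided = 2.0 * t.sf(tscore, DF)

# fraction of the distribution between -tscore and +tscore, as above
f = t.cdf(tscore, DF) - t.cdf(-tscore, DF)
print(p_two_sided, 1.0 - f)

# cross-check: Welch's t-test computed from the summary statistics
res = ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=False)
print(res.statistic, res.pvalue)
```

The statistic reported by SciPy is negative because it keeps the sign of x1 - x2; its magnitude matches tscore. Both p-values land between 0.05 and 0.06, consistent with the conclusion that the difference is significant at the 94% level but not quite at the 95% level.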
Copyright (C) 2013 by John Kitchin. See the License for information about copying.