
17.11. "Equal Variances" and "Unequal Variances" in Two-Sample Inferences

Applies to:
StatTools 6.x/7.x

In StatTools, I'm selecting a confidence interval or hypothesis test about the difference in means of two independent samples. StatTools gives two columns of results, headed "Equal Variances" and "Unequal Variances". What do those mean?

Here's the short answer: just use the Unequal Variances column. Unless you want more details, you can stop reading now.

More details:

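The test statistic for the difference of sample means follows a Student's t distribution, at least approximately. As you know, there are infinitely many t distributions, each one determined by its degrees of freedom. For one-sample inferences, df = n − 1. For two-sample inferences, the general formula for degrees of freedom (often called the Welch–Satterthwaite formula) is

$$\text{df} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

where s1 and s2 are the sample standard deviations and n1 and n2 are the sample sizes.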

However, if you know that the population variances are equal, you can use df = n1 + n2 − 2. (Note: population variances, not sample variances.) That is never lower than the degrees of freedom computed by the general formula, and usually a bit higher. Higher degrees of freedom translate to a lower critical t and, for a given t statistic, a lower p-value. In turn, that means your confidence interval is usually a bit narrower and you are more likely to be able to reject the null hypothesis. ("Usually" rather than "always", because pooling also changes the standard error of the difference, which can push the result either way when the sample sizes differ.)
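To see the two degrees-of-freedom formulas side by side, here is a minimal Python sketch. The function names `welch_df` and `pooled_df` and the sample sizes and variances are hypothetical, chosen only for illustration; this is not StatTools's own code.

```python
def welch_df(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """General (Welch-Satterthwaite) df, used when variances are not assumed equal.

    s1_sq and s2_sq are the sample variances; n1 and n2 are the sample sizes.
    """
    a = s1_sq / n1  # squared standard error contributed by sample 1
    b = s2_sq / n2  # squared standard error contributed by sample 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def pooled_df(n1: int, n2: int) -> int:
    """df when the population variances are assumed equal."""
    return n1 + n2 - 2


# Hypothetical sample sizes and sample variances, for illustration only.
n1, s1_sq = 12, 4.0
n2, s2_sq = 20, 9.0

print(f"Unequal variances df: {welch_df(s1_sq, n1, s2_sq, n2):.2f}")
print(f"Equal variances df:   {pooled_df(n1, n2)}")
```

The general formula can never exceed n1 + n2 − 2; it equals it only when s1²/(n1(n1 − 1)) = s2²/(n2(n2 − 1)), for example with equal sample sizes and equal sample variances.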

Some books and calculators use the term "pooling": if the population variances are equal, you can "pool" the two sample variances into a single estimate of the common population variance, instead of estimating each population's variance separately.
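The difference between the two columns can be sketched in Python with only the standard library. This is a minimal illustration, assuming the textbook pooled t test and Welch's t test; the function name `two_sample_t`, its return layout, and the sample data are hypothetical and are not how StatTools computes or labels its output internally.

```python
import math
from statistics import mean, variance


def two_sample_t(x: list[float], y: list[float]) -> dict[str, tuple[float, float]]:
    """Return (t statistic, degrees of freedom) for both versions of the test."""
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)

    # "Equal Variances": pool the two sample variances into one estimate.
    sp_sq = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    equal = (diff / se_pooled, float(n1 + n2 - 2))

    # "Unequal Variances": keep the two variance estimates separate (Welch).
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    unequal = (diff / se_welch, df_welch)

    return {"Equal Variances": equal, "Unequal Variances": unequal}


# Hypothetical data, for illustration only.
x = [12.1, 14.3, 11.8, 13.5, 12.9, 15.0]
y = [10.2, 11.7, 9.8, 12.4, 10.9, 11.1, 13.0, 10.5]
for column, (t, df) in two_sample_t(x, y).items():
    print(f"{column:18s} t = {t:.3f}, df = {df:.2f}")
```

When the two samples are the same size, the pooled and unpooled standard errors are identical, so the two t statistics agree and only the degrees of freedom differ.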

But how can you know if the population variances are equal? Well, there's the rub: in the vast majority of cases you can't know. You can perform an F test, but even if you get a large p-value in the F test you have only failed to reject the hypothesis that the population variances are equal; you haven't proved it. Also, an F test requires that both populations be normally distributed, not just approximately normal as with a t test, and you virtually never know for sure that the populations are normal. For these reasons, the whole idea of pooling is controversial, and some textbooks don't even mention it as a possibility.
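For reference, the F statistic itself is just a ratio of sample variances. Below is a minimal Python sketch, assuming the common convention of putting the larger sample variance in the numerator; the function name `variance_ratio_f` is hypothetical. Turning F into a p-value needs the F distribution's upper tail (for example SciPy's `scipy.stats.f.sf(F, dfn, dfd)`, doubled for a two-sided test under this convention), which the standard library does not provide, so the sketch stops at the statistic and its degrees of freedom.

```python
from statistics import variance


def variance_ratio_f(x: list[float], y: list[float]) -> tuple[float, int, int]:
    """F statistic for H0: equal population variances.

    Assumed convention: the larger sample variance goes in the numerator,
    so F >= 1. Returns (F, numerator df, denominator df).
    """
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


# Hypothetical data, for illustration only.
f_stat, dfn, dfd = variance_ratio_f([12.1, 14.3, 11.8, 13.5, 12.9, 15.0],
                                    [10.2, 11.7, 9.8, 12.4, 10.9, 11.1, 13.0, 10.5])
print(f"F = {f_stat:.3f} with ({dfn}, {dfd}) degrees of freedom")
```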

Finally, even after you go through all that, pooling or not ("Equal Variances" column or "Unequal Variances" column in StatTools results) usually makes only a minor difference. The conservative choice is to use the "Unequal Variances" column, meaning that the data sets are not pooled. This doesn't require you to make assumptions that you can't really be sure of, and it almost never makes much of a change in your results.

Last edited: 2018-10-22
