Analysis of Variance (ANOVA)
Note on Notation
Once again, we will use different notations for the various things $F$ can denote:
The test and distribution’s name:
The distribution:
The test statistic:
The pdf:
The cdf:
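Cheat Sheet
Description: Tests whether the variances of two normally distributed samples, $X$ and $Y$, are equal
Statistic: $\sigma_X^2 / \sigma_Y^2$ (ratio of variances)
Distribution: $F(n_X - 1, n_Y - 1)$ ($n_X$ and $n_Y$ are the two sample sizes)
Sidedness: Two-sided
Null Hypothesis: $\sigma_X^2 = \sigma_Y^2$
Alternative Hypothesis: $\sigma_X^2 \neq \sigma_Y^2$
Test Statistic: $F = s_X^2 / s_Y^2$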
Description
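In the last section, we went over how the $\chi^2$ distribution is derived from summing squared standard normal random variables. The $F$ distribution, in turn, is derived from dividing two $\chi^2$ distributions. Specifically, if $U \sim \chi^2(d_1)$ and $V \sim \chi^2(d_2)$ are independent, then

$$\frac{U / d_1}{V / d_2} \sim F(d_1, d_2).$$

These $\chi^2$ distributions might seem to come from nowhere in this test, but recall the definition of $s^2$, the sample variance:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Since $(n - 1) s^2 / \sigma^2 \sim \chi^2(n - 1)$ under the assumptions of the test, the ratio of the two sample variances, $s_X^2 / s_Y^2$, computed from samples of sizes $n_X$ and $n_Y$, follows an $F(n_X - 1, n_Y - 1)$ distribution when the two population variances are equal.
One thing to note is that, unlike other tests that assume that samples are normally distributed, the $F$-test is extremely sensitive to violations of normality. Thus, a larger sample size helps much less here than it does for other tests: the $F$-test for equal variances can stay unreliable for non-normal samples even when the samples are large.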
Another thing to note is that, since $F$-tests are only two-sided, we can’t determine the “direction” of the result. In this test, for example, the result doesn’t tell you which of the two population variances, $\sigma_X^2$ or $\sigma_Y^2$, is the larger one.
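To make the test concrete, here is a minimal sketch in plain Python. The function names and the data are hypothetical, chosen only for illustration. Instead of looking up the exact $F$ cdf, the sketch estimates the two-sided p-value by Monte Carlo simulation: under the null hypothesis the ratio of sample variances does not depend on the common mean or variance, so we can simulate it from standard normal samples of the same sizes.

```python
import random
from statistics import variance  # sample variance (divides by n - 1)


def f_statistic(x, y):
    """Ratio of the two sample variances, F = s_X^2 / s_Y^2."""
    return variance(x) / variance(y)


def two_sided_p_value(x, y, n_sim=5000, seed=0):
    """Monte Carlo estimate of the two-sided p-value.

    This is a simulation-based stand-in for the exact F cdf: it draws
    normal samples with equal variances (the null hypothesis) and checks
    how often the simulated ratio falls below the observed one.
    """
    rng = random.Random(seed)
    f_obs = f_statistic(x, y)
    below = 0
    for _ in range(n_sim):
        sim_x = [rng.gauss(0.0, 1.0) for _ in x]
        sim_y = [rng.gauss(0.0, 1.0) for _ in y]
        if f_statistic(sim_x, sim_y) <= f_obs:
            below += 1
    lower_tail = below / n_sim
    # Two-sided: double the smaller tail, capped at 1.
    return min(1.0, 2.0 * min(lower_tail, 1.0 - lower_tail))


# Hypothetical measurements, made up for this example.
x = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
y = [4.9, 5.0, 5.2, 4.8, 5.1, 5.0]

f_obs = f_statistic(x, y)
p = two_sided_p_value(x, y)
print(f"F = {f_obs:.2f}, estimated two-sided p-value = {p:.4f}")
```

Note that a small p-value here only says the variances differ. To see which one is larger, we have to look at whether the observed ratio is above or below 1.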
Test for Comparison of Multiple Means (Omnibus Test of Means)
Cheat Sheet
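Description: Tests whether the means of $k$ normally distributed samples, $X_1, \dots, X_k$, with $n_1, \dots, n_k$ observations each ($N = n_1 + \dots + n_k$ total), differ in at least one pairwise comparison
Statistic: $\mu_i - \mu_j$ (difference of means between any two groups)
Distribution: $F(k - 1, N - k)$ ($k$ groups, $N$ total observations)
Sidedness: Two-sided
Null Hypothesis: $\mu_1 = \mu_2 = \dots = \mu_k$
Alternative Hypothesis: $\mu_i \neq \mu_j$ for at least one pair $(i, j)$ where $i \neq j$
Test Statistic: $F = s_B^2 / s_W^2$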
Description
The $s_B^2$ and $s_W^2$ in $F = s_B^2 / s_W^2$ denote the “explained variance” and “unexplained variance” respectively. They are also called the “between-group variability” and “within-group variability”. The idea is that we can use the ratio of how much the group means vary around the overall mean to how much the observations vary within their own groups as a proxy to determine if there is a difference of means between groups. This is because, if all groups share the same mean, both quantities estimate the same underlying variance, so we would expect the ratio to be close to 1.
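As for definitions, the explained variance $s_B^2$ is defined as:

$$s_B^2 = \frac{1}{k - 1} \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

where $\bar{x}_i$ is the mean of the $i$th sample, $n_i$ is its number of observations, $k$ is the number of samples, and $\bar{x}$ is the mean of all samples when combined. The unexplained variance $s_W^2$ is defined as:

$$s_W^2 = \frac{1}{N - k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

where $x_{ij}$ is the $j$th observation in the $i$th sample and $N$ is the total number of observations.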
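The two definitions translate almost line by line into code. The following is a minimal sketch in plain Python; the function name `anova_f` and the toy data are hypothetical, and the groups were picked so the arithmetic is easy to check by hand.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Explained (between-group) variability, with k - 1 degrees of freedom.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    explained = ss_between / (k - 1)

    # Unexplained (within-group) variability, with N - k degrees of freedom.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    unexplained = ss_within / (n_total - k)

    return explained / unexplained


# Toy data, made up for this example: group means are 2, 3 and 6.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = anova_f(groups)
print(f_stat)
```

For this toy data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so the statistic comes out to approximately 13 (up to floating-point rounding). When all groups have identical means the numerator is zero and so is the statistic.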
This test is useful because running multiple $t$-tests exponentially increases the probability of getting false positives (also called “type I errors”). “Exponentially” here is not a placeholder for “a lot.” If each test independently has false-positive probability $\alpha$, the probability of never getting a false positive in $m$ tests is $(1 - \alpha)^m$, which tends to zero as $m \to \infty$. The $F$-test doesn’t have this issue since it’s an “omnibus test”, meaning it tests all of these hypotheses “all at once.”
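To get a feel for how quickly this adds up, here is a small sketch that evaluates the probability of at least one false positive, $1 - (1 - \alpha)^m$. The function name is hypothetical, and the formula assumes the $m$ tests are independent, which is only an approximation for pairwise $t$-tests that share groups.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


# 45 is the number of pairwise comparisons among 10 groups (10 choose 2).
for m in (1, 3, 10, 45):
    rate = familywise_error_rate(0.05, m)
    print(f"{m:>2} tests at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

With $\alpha = 0.05$, ten independent tests already carry roughly a 40% chance of at least one false positive.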