Statistics & AB Testing

Fundamentals of Hypothesis Testing

In this section, you’ll learn what a hypothesis test is, when to use it, and how to calculate it.

A hypothesis is a formalized guess about a property of a population, such as the value of one of its parameters. A hypothesis test is a statistical tool that allows you to make an inference about a population from a sample drawn from that population.

Hypothesis tests are organized into a null hypothesis ($H_0$) and an alternative hypothesis ($H_A$, also written $H_1$).

The goal of hypothesis testing is to determine whether there is enough evidence in the sample data to reject the null hypothesis in favor of the alternative hypothesis.

Use Case

For example, you might hypothesize that a new drug will reduce blood pressure in a population of patients with high blood pressure. You would then split your patients into two groups, giving one of them the drug and one of them a placebo and measuring their blood pressure.

The hypotheses might look like this:

$H_0$: The drug had no impact on patients’ high blood pressure.

$H_A$: The drug did have an impact on patients’ high blood pressure.

You would then calculate the test statistic to determine if the data obtained from the sample is enough to reject the null in favor of the alternative. Different hypothesis tests use different test statistics, depending on what you’re comparing.

The Null Hypothesis is No Effect

Note that the null and alternative hypotheses are set up in this way, with the hypothesis you’re interested in testing set as the alternative hypothesis.

That is, your guess of the value of the variable is the alternative hypothesis. The null hypothesis is often that there was no effect, no difference, or no change in the population.

The alternative hypothesis is the effect, difference, or change you’re trying to support, and you collect data to see if you can reject the null in favor of the alternative.

The Null and Alternative Hypothesis

Together, the null hypothesis $H_0$ and the alternative hypothesis $H_A$ represent all possible outcomes being tested.

A common hypothesis test is to check if some population parameter $\theta$ is different from a given value, denoted $\theta_0$.

For example, you might want to check if the mean of a population ($\mu$) is different from a set number, such as 5. Here $\mu$ is the parameter being tested; the test statistic is computed from the sample, for instance from the sample mean. The hypotheses look like this:

$H_0: \mu = 5$

$H_A: \mu \neq 5$

Or you may want to see if the mean $\mu$ is greater than 5.

$H_0: \mu \leq 5$

$H_A: \mu > 5$

Notice that your guess is in the alternative hypothesis, not the null! Let’s look at why.

Why $H_0$ Is No Effect

In statistics, you can never be absolutely sure of an outcome. So you would never say that you accept a hypothesis. Instead, you say that there’s enough evidence to reject a hypothesis.

Your data can’t prove a hypothesis. However, you can show how unlikely it would be to get the data you collected if the null hypothesis were true.

For example, if out of 20 people you surveyed, 13 said that they like coffee more than tea, you can’t say this definitively proves that more people like coffee than tea. But, you can calculate how likely it would be to get this result if, in reality, only 50% of people prefer coffee to tea.
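The coffee survey can be worked through as a small calculation. The sketch below assumes each respondent answers independently and uses an exact one-sided binomial tail probability, which is one reasonable choice of test here rather than the only one; the function name `binomial_upper_tail` is made up for illustration.

```python
from math import comb

def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )

# Null hypothesis: only 50% of people prefer coffee to tea.
# Observed data: 13 of the 20 people surveyed prefer coffee.
p_one_sided = binomial_upper_tail(13, 20, p=0.5)
print(round(p_one_sided, 4))  # 0.1316
```

Under the null, a result at least this lopsided would show up roughly 13% of the time, so 13 out of 20 on its own is weak evidence that more people prefer coffee.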

Statistical Significance of a Hypothesis Test

A hypothesis test has a significance level, denoted by $\alpha$, which takes values in (0, 1). The significance level is not the same thing as the power of the test (the probability of rejecting $H_0$ when it is false) or the p-value (which is computed from the data and then compared against $\alpha$).

The significance level is the probability of rejecting $H_0$ when $H_0$ is actually true.

The significance level is chosen by you, the researcher, before carrying out the hypothesis test. Commonly used values are 0.05, 0.01, or 0.10. The lower the significance level, the harder it is to reject the null hypothesis, and the stronger the evidence has to be for a result to count as statistically significant.

Using a Significance Level

After a significance level is chosen, the next step is to calculate the test statistic. The test statistic depends on the kind of test being done: for example, a t-test, a Chi-square test, or an F-test.

The test statistic computed from the sample data is then compared to a threshold (the critical value) of that same statistic that corresponds to the previously chosen significance level $\alpha$. If the test statistic is more extreme than the threshold, that is, it falls in the rejection region, then you can reject the null hypothesis at the chosen significance level, for example, 0.05 or 0.01.

Alternatively, you can compute a p-value directly. If the p-value computed from the sample data is below your previously chosen significance level, you reject the null hypothesis in favor of the alternative.

When you reject the null in favor of the alternative, it means that the data you collected is unlikely to have looked that way if the null hypothesis were true. This is what it means to have a statistically significant result.
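To make the two decision rules concrete, here is a minimal sketch of a two-sided one-sample z-test of $H_0: \mu = 5$. It assumes the population standard deviation is known (`sigma=1.0`), which is rarely true in practice and is used here only to keep the example within Python's standard library. The sample values and the function name `z_test_two_sided` are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_two_sided(sample, mu0, sigma, alpha=0.05):
    """Return (z statistic, critical value, p-value) for H0: mu = mu0."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Critical value: |z| beyond this falls in the rejection region.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, critical, p_value

# Hypothetical measurements, made up for this example.
sample = [5.9, 6.1, 5.4, 6.3, 5.8, 6.0, 5.2, 6.4, 5.7, 6.2]
alpha = 0.05
z, critical, p_value = z_test_two_sided(sample, mu0=5, sigma=1.0, alpha=alpha)

reject_by_statistic = abs(z) > critical
reject_by_p_value = p_value < alpha
print(z, critical, p_value, reject_by_statistic, reject_by_p_value)
```

Both rules lead to the same decision, because the p-value drops below $\alpha$ exactly when the statistic passes the critical value.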

Type I and Type II Error

The type of error where you reject $H_0$ when it is true is called a type I error (read: “type one error”) or a false positive.

Similarly, the type of error where you fail to reject $H_0$ when it is false is called a type II error (read: “type two error”) or a false negative.
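A simulation can illustrate both error types. The sketch below repeatedly draws samples and applies a two-sided z-test with a known standard deviation; the sample size (30), the true means (5 and 5.3), and the helper name `rejects_null` are assumptions chosen purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mu = mu0, assuming sigma is known."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

rng = random.Random(0)  # fixed seed so the run is repeatable
alpha, trials, n = 0.05, 5000, 30

# H0 is true (real mean is 5): every rejection is a type I error.
type_i_rate = sum(
    rejects_null([rng.gauss(5.0, 1.0) for _ in range(n)], 5, 1.0, alpha)
    for _ in range(trials)
) / trials

# H0 is false (real mean is 5.3): every non-rejection is a type II error.
type_ii_rate = sum(
    not rejects_null([rng.gauss(5.3, 1.0) for _ in range(n)], 5, 1.0, alpha)
    for _ in range(trials)
) / trials

print(type_i_rate, type_ii_rate)
```

The type I rate should land close to the chosen $\alpha$, while the type II rate depends on the sample size and on how far the true mean is from the null value.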

One-Sided and Two-Sided Tests

The sidedness of a test is the number of directions in which $H_0$ can be false. A test can be one-sided or two-sided.

A two-sided test:

$H_0: \theta = 0$

$H_1: \theta \neq 0$

$H_0$ can be rejected by concluding either that $\theta > 0$ or that $\theta < 0$, making it a two-sided test.

A one-sided test:

$H_0: \theta \leq 0$

$H_1: \theta > 0$

The only way to reject $H_0$ is to conclude $\theta > 0$, so this is a one-sided test.

Practically, the main difference between one-sided and two-sided tests is that a two-sided test splits $\alpha$ between the two tails of the distribution, so each tail’s rejection region has probability $\alpha/2$. This accounts for the two ways to reject $H_0$.
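The effect of sidedness on the rejection threshold can be seen with a standard normal test statistic. This is a sketch under the assumption that the statistic is a z-score; the observed value `z = 1.80` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-sided: all of alpha sits in the upper tail.
one_sided_critical = std_normal.inv_cdf(1 - alpha)      # about 1.645
# Two-sided: alpha/2 sits in each tail.
two_sided_critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960

z = 1.80  # hypothetical observed z statistic
reject_one_sided = z > one_sided_critical
reject_two_sided = abs(z) > two_sided_critical
print(reject_one_sided, reject_two_sided)  # True False
```

The same data can reject $H_0$ in a one-sided test and fail to reject it in a two-sided one, which is why the sidedness should be fixed before looking at the data.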

How to Carry Out a Hypothesis Test

The following reviews the steps you’d carry out when using a hypothesis test.

  1. Formulate your hypotheses
    • Null Hypothesis $H_0$: The default hypothesis, which often states that there is no effect, no difference, or no change in the population.
    • Alternative Hypothesis $H_A$: The opposite of the null hypothesis, suggesting an effect, difference, or change. It is what you are trying to support with the data collected.
  2. Choose a Significance Level α
    • The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is actually true. Commonly used values are 0.05, 0.01, or 0.10.
    • The smaller the significance level, the harder it will be to reject the null hypothesis.
    • Conversely, the smaller the significance level is, the more confidence you have in the alternative hypothesis if you end up rejecting the null. This is statistical significance.
  3. Collect and Analyze Data
    • Collect a sample of data and calculate the test statistic. This could involve calculating means, standard deviations, or conducting tests like t-tests or chi-square tests.
  4. Make a Decision
    • Compare the p-value with the significance level (or, equivalently, the test statistic with its critical value). If the p-value is less than or equal to α, you reject the null hypothesis. If the p-value is greater than α, you fail to reject the null hypothesis.
  5. Draw Conclusions
    • If you reject the null hypothesis, you conclude that there is evidence to support the claim of the alternative hypothesis.
    • If you fail to reject the null hypothesis, you do not have enough evidence to support the alternative hypothesis.
    • You never say that you accept the alternative hypothesis, only that you reject the null hypothesis or fail to reject the null.
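The five steps above can be sketched end to end for the drug and placebo example. The blood pressure changes below are made-up numbers, and the test used is a permutation test on the difference in group means. It is chosen here because it needs only the standard library; a t-test is another common choice for this comparison. The function name `permutation_p_value` is hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel patients at random, as if H0 were true
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)

# Step 1: H0 = the drug has no effect; HA = the drug has an effect.
# Step 2: choose the significance level before looking at the data.
alpha = 0.05
# Step 3: collect data (hypothetical change in blood pressure, in mmHg).
drug = [-12, -9, -15, -7, -11, -14, -8, -10]
placebo = [-2, 1, -4, 0, -3, 2, -1, -5]
p_value = permutation_p_value(drug, placebo)
# Step 4: make a decision by comparing the p-value with alpha.
reject_null = p_value <= alpha
# Step 5: state the conclusion without claiming proof.
print("reject H0" if reject_null else "fail to reject H0", p_value)
```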