Statistics & AB Testing


The Z and t test

The $Z$ and $t$ tests are basic tests that are most often used to test whether a mean is equal to, greater than, or less than some hypothesized value. As a rule of thumb, $Z$-tests are used when the sample size is greater than 30, whereas $t$-tests are used with sample sizes less than 30.

$Z$ tests

Cheat Sheet

  • Description: Tests if the mean $\mu$ is equal to/less than/greater than $\mu_0$
  • Statistic: $\mu$ (mean)
  • Distribution: $\mathcal{N}(\mu,\sigma^2)$ (normal)
  • Sidedness: Either
  • Null Hypothesis: $H_0: \mu = \mu_0$ (two-sided), $\mu \geq\mu_0,\ \mu\leq\mu_0$ (one-sided)
  • Alternative Hypothesis: $H_a: \mu \neq \mu_0$ (two-sided), $\mu\lt\mu_0,\ \mu\gt\mu_0$ (one-sided)
  • Test Statistic: $Z=\frac{\hat{\mu}-\mu_0}{s/\sqrt{n}}\quad (s=\sigma \ \text{if known})$

Description

The $Z$-test compares the mean of a sample ($\hat{\mu}$, also denoted as $\bar{x}$) to some value, $\mu_0$. It’s the go-to test for questions about the mean of some population. For example, if you had a sample of the GPA of students, you could run a $Z$-test to see if the students are doing well on average.

It assumes that the sample follows a normal distribution with some mean $\mu$ and variance $\sigma^2$.

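In a two-sided $Z$-test, we want to check if $\mu\neq\mu_0$. In a one-sided $Z$-test, we want to test either that $\mu\lt\mu_0$ or that $\mu\gt\mu_0$. In both cases the hypotheses are statements about the population mean $\mu$, and the sample mean $\hat{\mu}$ is the evidence we use to evaluate them.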

The mean of the sample $\vec{x}=\{x_1,x_2,\dots,x_n\}$ is defined by the familiar notion of the arithmetic mean of a set of numbers:

$$ \hat{\mu}_{\vec{x}}=\frac{1}{n}\sum_{i=1}^nx_i $$

The test statistic of the $Z$-test, called the $Z$-value or $Z$-score, is defined as

$$ Z=\frac{\hat{\mu}-\mu_0}{\sigma/\sqrt{n}} $$

where $\sigma$ is the standard deviation of the assumed sample distribution $\mathcal{N}(\mu,\sigma^2)$. Most of the time, $\sigma$ is not known in advance, so it is estimated through the sample standard deviation, $s$.

$$ s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\hat{\mu})^2} $$

This is because $s^2$ is an unbiased estimator of $\sigma^2$, meaning that $\mathbb{E}[s^2]=\sigma^2$. By the way, this is why the $1/(n-1)$ term is included: using the more intuitive constant of $1/n$ results in a biased estimator of $\sigma^2$, whose expected value is $\frac{n-1}{n}\sigma^2\neq\sigma^2$.
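To make the difference between the two normalizing constants concrete, here is a minimal sketch using only Python's standard library. The sample values are hypothetical and were picked only so the arithmetic is easy to check by hand.

```python
import math
import statistics

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)
mu_hat = statistics.mean(sample)

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - mu_hat) ** 2 for x in sample)

s_n_minus_1 = math.sqrt(squared_deviations / (n - 1))  # 1/(n-1) version (s above)
s_n = math.sqrt(squared_deviations / n)                # 1/n version

# The standard library implements the same two conventions.
assert math.isclose(s_n_minus_1, statistics.stdev(sample))
assert math.isclose(s_n, statistics.pstdev(sample))

print(f"mean={mu_hat}, s (n-1)={s_n_minus_1:.4f}, s (n)={s_n:.4f}")
```

The $1/(n-1)$ version is always slightly larger than the $1/n$ version, and the gap shrinks as $n$ grows.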

Note how $Z$ fits the general form of a test statistic we talked about in the previous section.

$$ T=\frac{\hat{\theta}-\theta_0}{\sqrt{\frac{\mathsf{Var}\left[\hat{\theta}|\theta_0\right]}{n}}},\quad \begin{matrix} \hat{\theta}:=\hat{\mu}\\ \theta_0:=\mu_0\\ \mathsf{Var}\left[\hat{\theta}|\theta_0\right]:=\sigma^2 \end{matrix} $$

With these substitutions the denominator becomes $\sqrt{\sigma^2/n}=\sigma/\sqrt{n}$, which recovers $Z$.

What is the decision function for the $Z$-test? Well, recall the general form of a decision function.

$$ \mathcal{D}(T,F,\alpha,s)=\begin{cases} \text{reject} \ H_0 \ \text{if} \ 1-F(T)\lt\alpha/s\\ \text{accept} \ H_0 \ \text{if} \ 1-F(T)\geq\alpha/s \end{cases} $$

Remember, $s$ here refers to the sidedness of the test, which is 1 or 2, not the sample standard deviation.

Our test statistic is $Z$, and our assumed distribution is normal, so our cdf is $F=\Phi$. So our decision function is:

$$ \mathcal{D}(Z,\Phi,\alpha,s)=\begin{cases} \text{reject} \ H_0 \ \text{if} \ 1-\Phi(Z)\lt\alpha/s\\ \text{accept} \ H_0 \ \text{if} \ 1-\Phi(Z)\geq\alpha/s \end{cases} $$
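The decision rule above can be sketched in a few lines of Python. The function name `z_test_rejects` and its parameters are illustrative choices, not part of the course material. The sketch mirrors the rule exactly as written, so it only looks at the upper tail; handling a lower-tail alternative or a negative $Z$ in a two-sided test (for example via $-Z$ or $|Z|$) is deliberately left out.

```python
from statistics import NormalDist


def z_test_rejects(z: float, alpha: float = 0.05, sides: int = 1) -> bool:
    """Return True when H0 is rejected under the rule 1 - Phi(Z) < alpha / s."""
    if sides not in (1, 2):
        raise ValueError("sides must be 1 or 2")
    upper_tail = 1.0 - NormalDist().cdf(z)  # 1 - Phi(Z) for the standard normal
    return upper_tail < alpha / sides


# The same Z-score can lead to different decisions depending on sidedness.
print(z_test_rejects(1.8, alpha=0.05, sides=1))
print(z_test_rejects(1.8, alpha=0.05, sides=2))
```

Dividing $\alpha$ by the sidedness is what makes a two-sided test stricter: each tail only gets half of the significance level.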

Note: Throughout this course and in (almost all) real interviews, you will not be expected to calculate cdfs by hand. You can just describe what conclusions you would draw if you were given the value of $F(\theta)$.

Example

Let’s go back to the question of student GPAs. Let’s say our sample is

$$ 3.2, 2.9, 3.7, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8, 3.6, 2.6, 3.5, 2.4, 3.4, 2.3\\ 2.2, 4.0, 2.1, 3.8, 2.9, 3.7, 2.5, 3.1, 3.6, 2.7, 3.0, 3.3, 2.8, 3.5, 2.6, 3.4\\ 3.2, 2.3, 3.9, 2.2, 4.0, 2.1, 3.7, 2.9, 3.6, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8 $$

and we want to know if students have at least a 3.0 on average. Our hypotheses would be:

$$ H_0: \mu \leq 3.0 $$

$$ H_a: \mu\gt3.0 $$

Since there are $n=48$ observations here, we can use a one-sided $Z$-test to test these hypotheses. Here $\hat{\mu}=3.064583$ and $s=0.5459676$, so our $Z$-statistic is

$$ Z=\frac{\hat{\mu}-\mu_0}{s/\sqrt{n}}=\frac{3.064583-3}{0.5459676/\sqrt{48}}=0.819543 $$

If we set $\alpha=0.05$, since $1-\Phi(0.819543)\approx 0.2$, we fail to reject $H_0$. Thus, we do not have statistically significant evidence to say that the average student has a GPA higher than 3.0.
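The worked example can be reproduced with a short script. This is a sketch using only the standard library (`statistics.NormalDist` supplies $\Phi$); the variable names are illustrative.

```python
import math
import statistics
from statistics import NormalDist

# The 48 GPAs from the example, one row per line.
gpas = [
    3.2, 2.9, 3.7, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8, 3.6, 2.6, 3.5, 2.4, 3.4, 2.3,
    2.2, 4.0, 2.1, 3.8, 2.9, 3.7, 2.5, 3.1, 3.6, 2.7, 3.0, 3.3, 2.8, 3.5, 2.6, 3.4,
    3.2, 2.3, 3.9, 2.2, 4.0, 2.1, 3.7, 2.9, 3.6, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8,
]
mu_0 = 3.0    # hypothesized mean
alpha = 0.05  # significance level

n = len(gpas)
mu_hat = statistics.mean(gpas)
s = statistics.stdev(gpas)                # sample standard deviation, 1/(n-1)
z = (mu_hat - mu_0) / (s / math.sqrt(n))  # Z-statistic
p_value = 1.0 - NormalDist().cdf(z)       # one-sided (upper-tail) p-value
reject_h0 = p_value < alpha

print(f"n={n}, mean={mu_hat:.6f}, s={s:.7f}, Z={z:.6f}, p={p_value:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```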

$t$ tests

Note on notation

The $t$-test is a bit of a notational nightmare to describe because $t$ is used to denote

  • The name of the test and distribution (we will use $t$)
  • The test statistic of the test (we will use $\tau$)
  • The notation of the $t$-distribution (we will use $\mathcal{T}$)
  • The pdf of the $t$-distribution (we will use $\varphi_t$)
  • The cdf of the $t$-distribution (we will use $\Phi_t$)

To avoid confusion, we’ll use the notation given in parentheses above to refer to these elements in an unambiguous way.

Cheat sheet

  • Description: Tests if the mean $\mu$ is equal to/less than/greater than $\mu_0$ for small samples ($n\lesssim30$)
  • Statistic: $\mu$ (mean)
  • Distribution: $\mathcal{T}(n-1)$ ($t$)
  • Sidedness: Either
  • Null Hypothesis: $H_0: \mu = \mu_0$ (two-sided), $\mu \geq\mu_0,\ \mu\leq\mu_0$ (one-sided)
  • Alternative Hypothesis: $H_a: \mu \neq \mu_0$ (two-sided), $\mu\lt\mu_0,\ \mu\gt\mu_0$ (one-sided)
  • Test Statistic: $\tau=\frac{\hat{\mu}-\mu_0}{s/\sqrt{n}}\quad (s=\sigma \ \text{if known})$

Description

The $t$-test is a modified version of the $Z$-test that allows for more conservative inferences when the sample size is small. Generally, a sample size of $n=30$ is considered the cut-off between the two tests, with the $t$-test used for samples with a size lower than 30. Thus the $t$-test is useful in situations where data collection is costly, like if we are collecting data via manual observation in a scientific experiment.

The only change to the decision function of the $t$-test compared to the $Z$-test is the assumed distribution of the test statistic. Instead of a normal distribution, it is assumed to follow a $t$-distribution, which itself is a modification of the normal distribution.

The $t$-distribution does have a closed-form formula, but it is so complicated that it’s not very useful. It’s more important to focus on its shape and single parameter, $\nu$, called the degrees of freedom. Because of this, the distribution $\mathcal{T}(\nu)$ can be called a $t$-distribution with $\nu$ degrees of freedom. Importantly, as $\nu$ gets large, the $t$-distribution resembles a standard normal distribution more and more. That is

$$ \lim_{\nu\rightarrow\infty}\varphi_t(x|\nu)=\varphi(x)\quad\text{and}\quad \lim_{\nu\rightarrow\infty}\Phi_t(x|\nu)=\Phi(x) $$

[Animation showing the t distribution converging to the standard normal distribution]

Image Credit to T.J. Kyner

As you can see in the above animation, when $\nu \lt 30$ the $t$-distribution has very “long tails”. This represents the fact that in small sample sizes, we have a much higher probability of getting “extreme results” that deviate greatly from the mean. These long tails are how the $t$-test controls for small sample sizes; they make it harder to reject $H_0$ compared to the $Z$-test. The same effect shows up in the variance of the $t$-distribution, which is $\nu/(\nu-2)$ for $\nu\gt2$: it is greater than 1 (the variance of the standard normal) but converges to 1 as $\nu$ gets larger.
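The tail behaviour and the variance can be checked numerically. The sketch below evaluates the standard closed-form density of the $t$-distribution directly with `math.lgamma`; the helper names `t_pdf` and `t_variance` are illustrative. It compares the $t$ density to the standard normal density at $x=3$, a point well out in the tail.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, nu: float) -> float:
    """Density of the t-distribution with nu degrees of freedom."""
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_variance(nu: float) -> float:
    """Variance nu / (nu - 2), which is only finite for nu > 2."""
    if nu <= 2:
        raise ValueError("variance is only finite for nu > 2")
    return nu / (nu - 2)


standard_normal = NormalDist()
for nu in (3, 15, 30, 1000):
    # A ratio above 1 means the t-distribution puts more density in the tail.
    tail_ratio = t_pdf(3.0, nu) / standard_normal.pdf(3.0)
    print(f"nu={nu:>4}: density ratio at x=3 is {tail_ratio:.2f}, "
          f"variance is {t_variance(nu):.3f}")
```

Both the density ratio and the variance shrink toward 1 as $\nu$ grows, which is the convergence described above.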

Because of this convergence, some professionals think it’s better to just always use a $t$-test, since it will be equivalent to the $Z$-test for large sample sizes.

In the $t$-test, $\nu$ is set to $n-1$.

Example

Let’s only consider the GPAs of the first row of students from the $Z$-test example:

$$ 3.2, 2.9, 3.7, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8, 3.6, 2.6, 3.5, 2.4, 3.4, 2.3 $$

Here

$$ n = 16,\quad\nu=15,\quad\hat{\mu}=3.05,\quad s=0.4760952 $$

So

$$ \tau=\frac{3.05-3}{0.4760952/\sqrt{16}}=0.42008\rightarrow 1-\Phi_t(\tau)\approx0.34 $$

Since $0.34\not\lt0.05$, we still fail to reject $H_0$ and make the same conclusion as before.

In case you’re curious, if we used the entire sample, our $p$-value from this $t$-test would be about 0.2, just like the $Z$-test!
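Both $t$-test results can be reproduced without a statistics package. The sketch below is one possible approach, not the course's method: it approximates $\Phi_t$ by integrating the $t$ density with Simpson's rule, and the helper names (`t_pdf`, `t_cdf`, `one_sided_t_test`) are illustrative. In practice a library routine for the $t$ cdf would normally be used instead.

```python
import math
import statistics


def t_pdf(x: float, nu: float) -> float:
    """Density of the t-distribution with nu degrees of freedom."""
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x: float, nu: float, steps: int = 2000) -> float:
    """Approximate Phi_t(x | nu) with Simpson's rule on [0, |x|] plus symmetry."""
    if x == 0:
        return 0.5
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def one_sided_t_test(sample, mu_0):
    """Return (tau, upper-tail p-value) for H_a: mu > mu_0 with nu = n - 1."""
    n = len(sample)
    tau = (statistics.mean(sample) - mu_0) / (statistics.stdev(sample) / math.sqrt(n))
    return tau, 1.0 - t_cdf(tau, n - 1)


first_row = [3.2, 2.9, 3.7, 2.5, 3.1, 3.8, 2.7, 3.0,
             3.3, 2.8, 3.6, 2.6, 3.5, 2.4, 3.4, 2.3]
other_rows = [2.2, 4.0, 2.1, 3.8, 2.9, 3.7, 2.5, 3.1,
              3.6, 2.7, 3.0, 3.3, 2.8, 3.5, 2.6, 3.4,
              3.2, 2.3, 3.9, 2.2, 4.0, 2.1, 3.7, 2.9,
              3.6, 2.5, 3.1, 3.8, 2.7, 3.0, 3.3, 2.8]

tau_small, p_small = one_sided_t_test(first_row, 3.0)
tau_full, p_full = one_sided_t_test(first_row + other_rows, 3.0)
print(f"n=16: tau={tau_small:.5f}, p={p_small:.2f}")
print(f"n=48: tau={tau_full:.5f}, p={p_full:.2f}")
```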
