### Statistics & AB Testing


## Non-parametric Tests

So far, all of our tests have been parametric, meaning they make assumptions about the distribution of the sample. In the $Z$, $t$, and $F$ tests, we assumed that the sample followed a normal distribution. In the proportions and $\chi^2$ tests, we assumed that the samples followed a binomial or multinomial distribution, respectively.

But sometimes, particularly with small samples, it is not fair to make these assumptions. For this reason, we have non-parametric tests, which do not assume that the sample follows any particular distribution.

There are many non-parametric tests; we will go over two of the most popular ones: the $U$ test and the paired signed-rank test.

Please note that there are ways to calculate $p$-values for the test statistics of non-parametric tests, but we don't describe them here because the calculations are esoteric.

# $U$ Test

## Cheat Sheet

• Description: Tests whether the median of one sample ($\vec{x}_1$) is different from/more than/less than the median of another, independent sample ($\vec{x}_2$)
• Statistic: $m_1-m_2$ (difference of medians)
• Sidedness: Either
• Null Hypothesis: $H_0: m_1=m_2$ (two-sided), $m_1\leq m_2,m_1\geq m_2$ (one-sided)
• Alternative Hypothesis: $H_a: m_1\neq m_2$ (two-sided), $m_1\gt m_2,m_1\lt m_2$ (one-sided)
• Test Statistic: $U=\min(U_1,U_2)$ where $U_1 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S(x_{1,i},x_{2,j}),\quad S(x,y)=\begin{cases}1,\quad\text{if} \ x\gt y\\\\ 0.5,\quad\text{if} \ x=y\\\\ 0,\quad\text{if} \ x\lt y\end{cases}$

$U_2 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S_{-1}(x_{1,i},x_{2,j}),\quad S _{-1}(x,y)=\begin{cases}0,\quad\text{if} \ x\gt y\\\\ 0.5,\quad\text{if} \ x=y\\\\ 1,\quad\text{if} \ x\lt y\end{cases}$

## Description

The idea behind this test is that $U_1$ and $U_2$, each divided by the number of pairs $n_1n_2$, are proxies for $\mathbb{P}(X_1\gt X_2)$ and $\mathbb{P}(X_1\lt X_2)$, respectively. In fact, the hypotheses of the $U$-test can be restated as

$H_0:\mathbb{P}(X_1\gt X_2)=\mathbb{P}(X_1\lt X_2)$

$H_a:\mathbb{P}(X_1\gt X_2)\neq\mathbb{P}(X_1\lt X_2)$

This ties back to the medians because the median is defined as the value $m$ such that $\mathbb{P}(X\lt m)=0.5$.
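As a minimal sketch, the pairwise definitions of $U_1$ and $U_2$ from the cheat sheet can be computed directly in Python. The function name `u_statistics` and the data are made up for illustration; this is not an optimized implementation.

```python
def u_statistics(x1, x2):
    """Return (U1, U2, U) using the pairwise definition, ties counting 0.5."""
    u1 = 0.0
    u2 = 0.0
    for a in x1:
        for b in x2:
            if a > b:
                u1 += 1.0
            elif a < b:
                u2 += 1.0
            else:
                # A tie contributes one half to each statistic
                u1 += 0.5
                u2 += 0.5
    return u1, u2, min(u1, u2)


# Hypothetical data, chosen only to illustrate the calculation
x1 = [1.1, 3.4, 5.0, 7.2]
x2 = [2.0, 3.4, 6.1]

u1, u2, u = u_statistics(x1, x2)
n_pairs = len(x1) * len(x2)

print(u1, u2, u)      # 6.5 5.5 5.5
print(u1 / n_pairs)   # sample proxy for P(X1 > X2), with ties split evenly
```

Note that every pair contributes exactly 1 to $U_1+U_2$, so $U_1+U_2=n_1n_2$ (here $6.5+5.5=12$).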

As stated before, there is a way to calculate a cdf for $U$ and test it against a significance level $\alpha$, but it is beyond this course’s scope and better left to software.
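For very small samples, one way to see where such a $p$-value comes from is to enumerate the null distribution of $U$ by brute force. The sketch below assumes that, under $H_0$, every way of splitting the pooled observations into groups of sizes $n_1$ and $n_2$ is equally likely. The function names are hypothetical, and in practice you would rely on statistical software rather than this enumeration.

```python
from itertools import combinations


def u_min(x1, x2):
    """U = min(U1, U2) from the pairwise definition (ties count 0.5)."""
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x1 for b in x2)
    # Every pair contributes 1 to U1 + U2, so U2 = n1 * n2 - U1
    return min(u1, len(x1) * len(x2) - u1)


def exact_two_sided_p(x1, x2):
    """Fraction of all group splits whose U is at least as extreme as observed."""
    pooled = list(x1) + list(x2)
    n1 = len(x1)
    observed = u_min(x1, x2)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small values of U = min(U1, U2) are the extreme ones
        if u_min(g1, g2) <= observed:
            extreme += 1
    return extreme / total


# Hypothetical, perfectly separated samples: only 2 of the 20 splits give U = 0
print(exact_two_sided_p([1, 2, 3], [4, 5, 6]))  # 0.1
```

The number of splits grows very quickly with the sample sizes, which is one reason the calculation is usually left to software.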

# Paired Signed-rank Test

## Cheat Sheet

• Description: Tests whether the median of a sample at one point in time ($\vec{x}$) is different from/more than/less than the median of the paired sample taken at a different point in time ($\vec{x}'$)

• Statistic: $m-m'$ (difference of medians)

• Sidedness: Either

• Null Hypothesis: $H_0: m=m'$ (two-sided), $m\leq m',m\geq m'$ (one-sided)

• Alternative Hypothesis: $H_a: m\neq m'$ (two-sided), $m\gt m',m\lt m'$ (one-sided)

• Test Statistic: $W=\sum_{i=1}^n\text{sgn}(\Delta x_i)R_{|\vec{x}-\vec{x}'|}(|\Delta x_i|)$, where $\Delta x_i=x_i-x'_i$ and the ranks are taken over the absolute differences

## Description

The function $R$ in the $W$-statistic is called the rank function. It returns the position (counting from 1) of $x\in\vec{x}$ when $\vec{x}$ is sorted in ascending order. For example, if $\vec{x}=[5,3,8]$, then the sorted vector is $[3,5,8]$, so $R_{\vec{x}}(5)=2$.

The $\text{sgn}$ function (read “sign”) in $W$ takes the “sign” of its input. It is defined as:

$\text{sgn}(x)=\begin{cases}1\ \text{if}\ x\gt 0\\\\ -1\ \text{if} \ x\lt 0\\\\ 0\ \text{if} \ x=0 \end{cases}$

So $W$ contains information about the signs and relative ranks of the differences between the observations at the times of $\vec{x}$ and $\vec{x}'$.
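The pieces above can be put together in a short Python sketch. It assumes the standard Wilcoxon convention of ranking the absolute differences, and it makes two further assumptions that this section does not specify: zero differences are dropped before ranking, and tied absolute differences share their average rank. The function name `signed_rank_statistic` and the data are hypothetical.

```python
def signed_rank_statistic(x, x_prime):
    """W = sum over i of sgn(d_i) * rank(|d_i|), where d_i = x_i - x'_i."""
    diffs = [a - b for a, b in zip(x, x_prime)]
    # Assumption: zero differences are dropped before ranking
    nonzero = [d for d in diffs if d != 0]
    # Indices of the differences, ordered by absolute value (ascending)
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    pos = 0
    while pos < len(order):
        # Extend `end` over a run of tied absolute differences
        end = pos
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[pos]])):
            end += 1
        # Assumption: tied values share the average of their 1-based ranks
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return sum((1 if d > 0 else -1) * r for d, r in zip(nonzero, ranks))


# Hypothetical before/after measurements on the same five subjects
x = [10, 12, 9, 15, 14]
x_prime = [12, 11, 9, 10, 11]

# Differences: -2, 1, 0, 5, 3 -> ranks of |d| (zero dropped): 2, 1, 4, 3
print(signed_rank_statistic(x, x_prime))  # 6.0
```

A value of $W$ far from zero suggests that the differences lean in one direction. Library implementations such as `scipy.stats.wilcoxon` may report a related statistic (for example, the smaller of the positive and negative rank sums) rather than this signed sum.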
