Statistics and Web Analytics – Hypothesis Testing
Most people using web analytics don’t need the complicated mathematics that a company like NASA uses, but you might still be asked to apply some common statistical methods. In my earlier post, I discussed predictive analysis in web analytics. Today, I will introduce a statistical tool – hypothesis testing – which is commonly used in A/B testing.
Hypothesis testing is a tool for making decisions based on experimental data. It systematically quantifies how confident you can be in the result of a statistical experiment. For example, you might want to test whether a coin is fair. So you run an experiment: you flip the coin 100 times and get 52 heads and 48 tails. Is that close enough to fair from a statistical point of view? That is exactly the kind of question hypothesis testing answers.
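To make the experiment concrete, here is a minimal sketch that simulates 100 coin flips using Python’s standard library. The seed is arbitrary and only makes the run repeatable; any run of a fair coin will land somewhere near a 50/50 split.

```python
import random

# Arbitrary seed so the simulated experiment is repeatable.
random.seed(42)

# Flip a fair coin 100 times, as in the example above.
flips = [random.choice(["H", "T"]) for _ in range(100)]
heads = flips.count("H")
tails = flips.count("T")
print(heads, tails)  # heads vs. tails out of 100 flips
```

Run it a few times with different seeds and you will see counts like 52/48 or 47/53 – small deviations from 50/50 are entirely normal, which is why we need a principled way to decide when a deviation is too large.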
Most hypothesis testing uses a null hypothesis. The null hypothesis, denoted H0, typically proposes a default position, such as “the coin is fair”. It is paired with an alternative hypothesis, denoted H1, such as “the coin is biased”. Conventionally, the null hypothesis is the default position we assume to be true unless the data say otherwise, and the main goal of hypothesis testing is to tell us whether we have enough evidence to reject it.
Turning to statistics
After stating the relevant null and alternative hypotheses, we need to translate them into statistical terms. In the coin example, if the coin is fair, the probability of getting heads should be 50%. But our experiment gave 52%. Is that just normal variation, given that we only flipped the coin 100 times?
Statistically, if the probability of an outcome is very low, we treat it as practically impossible. So if the probability of seeing a deviation at least as large as the one between our experimental results and the hypothesis is small enough, we reject the hypothesis. The reasoning: if something that should almost never happen shows up in a small set of observations, there is probably something wrong with the hypothesis.
Now let’s put this in mathematical terms. The null hypothesis in the coin example can be expressed as
H0: p0 = 0.5
A 95% confidence level means we reject the null hypothesis if the sample proportion falls outside the central 95% of the area under the normal curve. This corresponds to approximately 1.96 standard deviations from the mean.
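The 1.96 value is not magic; it can be computed directly. A quick check using only the Python standard library (the `statistics.NormalDist` class, Python 3.8+): for a two-tailed test at 95% confidence, 2.5% of the probability sits in each tail, so the critical value is the 97.5th percentile of the standard normal distribution.

```python
from statistics import NormalDist

# Two-tailed test at 95% confidence: 2.5% in each tail,
# so look up the 97.5th percentile of the standard normal.
critical = NormalDist().inv_cdf(0.975)
print(round(critical, 2))  # 1.96
```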
Then we use a z-test to get the z-score, which tells us how many standard deviations away from the expected mean our sample is. It is calculated as

z = (p − p0) / √(p0(1 − p0) / N)

where p is the sample mean, p0 is the expected mean, and N is the sample size. For the coin example, p0 = 0.5 and N = 100.
In our experiment we flipped the coin 100 times and got 52 heads, so the sample mean is p = 0.52. Plugging into the formula above gives z = (0.52 − 0.5) / √(0.25 / 100) = 0.4, well within 1.96 standard deviations, so we fail to reject the null hypothesis and conclude the coin appears fair. If we test a second coin and get 60 heads, then z = 2.0 > 1.96, so we reject the null hypothesis and say that coin 2 is biased.
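The calculation above is easy to wrap in a small helper. The following sketch implements the z-score formula from this post in plain Python (the function name `z_score` is my own choice, not from the original post) and applies it to both coins:

```python
import math

def z_score(heads, flips, p0=0.5):
    """How many standard deviations the observed proportion of heads
    is away from the expected proportion p0, for a given sample size."""
    p = heads / flips
    return (p - p0) / math.sqrt(p0 * (1 - p0) / flips)

# Coin 1: 52 heads out of 100 flips -> z ≈ 0.4, within ±1.96, fail to reject H0.
print(round(z_score(52, 100), 2))
# Coin 2: 60 heads out of 100 flips -> z ≈ 2.0, beyond 1.96, reject H0.
print(round(z_score(60, 100), 2))
```

Comparing each z-score against the 1.96 threshold reproduces the conclusions above: coin 1 looks fair, coin 2 looks biased.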
Of course, real-world applications can be much more complex than coin flipping. Fortunately, hypothesis testing for A/B testing is usually quite similar to the coin example. In my next post, I will show you how to apply this hypothesis-testing knowledge to A/B testing to determine whether new features actually affect user behaviour.