Hypothesis Testing with Python

In this article we are going to use Python to test whether a coin is fair. We will do this by making use of the statsmodels package to perform a hypothesis test.

The problems we will be solving will be of the form:

I flipped a coin 100 times and it landed on heads 55 times. Can I conclude the coin is biased towards heads?

Python provides many excellent tools for working with data and statistics. These include libraries like pandas, numpy, scipy, matplotlib and, in the case of today’s task, statsmodels. To use these tools you will need to either have them installed as part of your Python setup, or install them yourself, generally using pip.
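
For example, if statsmodels is not already available in your environment, it can usually be installed from the command line with:

pip install statsmodels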

In general, when working with maths and statistics, there is the ability to apply formulas and make meaningful inferences, and the ability to understand how the formulas work. Both of these are important, but depending on the task at hand, one may be more appropriate than the other.

In the spirit of the first of these, I’m just going to give you some Python code which will get the job done. You can easily modify the code to solve many related problems without necessarily understanding much about what is going on “under the hood”.

Python Program to Test Whether a Coin is Fair

The code below is fairly self-explanatory. You can tweak variables such as sample_success and significance to explore the results for different samples at different levels of confidence.

import statsmodels.api as sm

significance = 0.05 # Confidence level is 1 - significance level. Here it is 95%.

# In our sample, 55% of flips landed heads

sample_success = 550
sample_size = 1000

# Our null hypothesis (Ho) is 50% for heads
# The alternate hypothesis (Ha) is that the proportion of heads is > 50%

null_hypothesis = 0.50

test_statistic, p_value = sm.stats.proportions_ztest(count=sample_success, nobs=sample_size, value=null_hypothesis, alternative='larger')

# Results

print(f"z_statistic: {test_statistic:.3f}, p_value: {p_value:.3f}") if p_value > significance: print("Failed to reject the null hypothesis.") else: print(f"Reject the null hypothesis - accept the alternative hypothesis at {significance} level of significance.")

Output for sample_success = 550:

z_statistic: 3.178, p_value: 0.001
Reject the null hypothesis - accept the alternative hypothesis at 0.05 level of significance.
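
For comparison, you can plug in the sample from the opening question (55 heads in 100 flips). The proportion of heads is the same, but the smaller sample provides much weaker evidence:

test_statistic, p_value = sm.stats.proportions_ztest(count=55, nobs=100, value=0.50, alternative='larger')
print(f"z_statistic: {test_statistic:.3f}, p_value: {p_value:.3f}")
# Expect approximately z_statistic: 1.005, p_value: 0.157 - not significant at the 0.05 level

In that case we would fail to reject the null hypothesis, even though the observed proportion of heads is still 55%.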

If you want to try some variations on the basic hypothesis, you can change the alternative argument for sm.stats.proportions_ztest():

  • for Ha < Ho use alternative='smaller'
  • for Ha > Ho use alternative='larger'
  • for Ha != Ho use alternative='two-sided'

The last one is for when you suspect the coin is biased but you don’t know in which direction.
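
For instance, a two-sided version of the test above, using the same sample and only changing the alternative argument, might look like this:

test_statistic, p_value = sm.stats.proportions_ztest(count=550, nobs=1000, value=0.50, alternative='two-sided')
print(f"z_statistic: {test_statistic:.3f}, p_value: {p_value:.3f}")

For this z-test, the two-sided p-value is simply double the one-sided value, since extreme results in both directions count as evidence against the null hypothesis.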

Hypothesis Test Explanation

To fully understand how hypothesis testing works takes some study and practice. However, the basic concept is quite simple. If you’ve ever come across proof by contradiction in maths, for example in the proof of the infinitude of prime numbers, or in geometric proofs, you’ll be familiar with the basic idea. In terms of this specific example, the steps are:

  • Assume the coin is fair (This is called the null hypothesis, or H₀)
  • State the alternative hypothesis (Hₐ) – in this case that the coin is biased towards heads
  • Count how many times the coin lands heads-up out of a given number of flips
  • Calculate how far this result lies from the expected value from the null hypothesis (this is called the z-score)
  • If it is very unlikely that such an extreme value would occur (low p-value), reject the null hypothesis and accept the alternative hypothesis with the specified level of confidence
  • Otherwise we “fail to reject the null hypothesis”, meaning the result is not statistically significant

A p-value is a measure of the probability that an observed difference could have occurred just by random chance. The lower the p-value, the greater the statistical significance of the observed difference.

The confidence level is 1 minus the significance level (also known as the alpha level). So, if your significance level is 0.05, the corresponding confidence level is 95%.
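
To connect these ideas to the output above, here is a minimal sketch of the same calculation done by hand with scipy. It assumes, as proportions_ztest does by default, that the standard error is based on the sample proportion rather than the null value, which is why it reproduces the z-statistic of 3.178.

from math import sqrt
from scipy.stats import norm

p_hat = 550 / 1000   # observed proportion of heads
p_null = 0.50        # proportion of heads under the null hypothesis
n = 1000             # number of flips

# Standard error of the observed proportion (based on the sample proportion)
std_error = sqrt(p_hat * (1 - p_hat) / n)

# z-score: how many standard errors the observed proportion lies above the null value
z = (p_hat - p_null) / std_error

# One-sided p-value: probability of a result at least this extreme if the coin is fair
p_value = 1 - norm.cdf(z)

print(f"z: {z:.3f}, p: {p_value:.3f}")   # approximately z: 3.178, p: 0.001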


This article has shown how to use Python to investigate whether a coin is fair or biased, based on how many times it lands head-up in a given sample. I hope you found it interesting and helpful.

Happy computing!
