Updated: Jun 3, 2020
Let's do a project on an interesting statistical problem called the birthday paradox. We go back to the Renaissance and randomly select famous artists, scientists, and mathematicians. In this project, we will work through the math behind the birthday paradox in a fun way. Along the way, you will also learn about Renaissance figures and their contributions to humanity.
The objective of this project is to show that with just 23 randomly selected people, there is roughly a 50-50 chance that at least 2 of them share a birthday. When we increase the group to 75 people, the chance rises above 99.9%.
According to Wikipedia, the birthday paradox, also known as the birthday problem, states that in a random group of 23 people, there is about a 50% chance that two people have the same birthday. In a room of 75, there's a 99.9% chance of two people matching. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367, since there are only 366 possible birthdays once February 29 is included.
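The quoted figures can be checked numerically. Below is a minimal Python sketch, assuming every birthday is independent and equally likely on each day of the year. The function name `p_shared_birthday` and its `days` parameter are illustrative choices, not part of the original project. It multiplies together the chance that each new person avoids all birthdays already taken, then takes the complement.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes independent birthdays spread uniformly over `days` days.
    """
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= max(days - k, 0) / days
    return 1.0 - p_all_distinct


for group_size in (23, 75):
    print(group_size, round(p_shared_birthday(group_size), 4))

# Pigeonhole case: 367 people cannot all fit into 366 distinct birthdays.
print(367, p_shared_birthday(367, days=366))
```

Under these assumptions, 23 people give slightly more than 50% and 75 people give more than 99.9%. The `days=366` call shows the pigeonhole case, where one factor in the product becomes zero.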
Real-world applications of the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function. The same model is used to estimate the approximate risk of a hash collision among the hashes of a population of a given size.
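To make the link to hash collisions concrete, here is a rough Python sketch of the standard birthday-bound approximation, assuming values are drawn uniformly at random from `n` possibilities. The function names `collision_probability` and `draws_for_even_odds` are hypothetical, and the formulas are approximations rather than exact results.

```python
import math


def collision_probability(k, n):
    """Approximate chance of at least one collision among k values drawn
    uniformly from n possibilities: 1 - exp(-k(k-1) / (2n))."""
    return -math.expm1(-k * (k - 1) / (2 * n))


def draws_for_even_odds(n):
    """Approximate number of draws needed for a 50% collision chance:
    sqrt(2 * n * ln 2)."""
    return math.sqrt(2 * n * math.log(2))


print(collision_probability(23, 365))   # birthdays: 23 people, 365 days
print(draws_for_even_odds(365))         # close to the familiar 23
print(draws_for_even_odds(2 ** 64))     # a hypothetical 64-bit hash
```

The point of the sketch is the square-root scaling: the number of draws needed for even odds grows with the square root of the number of possible values, which is why a collision turns up far sooner than intuition suggests.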
Before starting the project, let's go through some basics.
Let's calculate the probability of 2 people sharing the same birthday. Suppose we have person X and person Y. For simplicity, we ignore leap years. If we ask person X first, he can be born on any day of the year, so his probability is 365/365 = 1. Person Y must then be born on the same day as X, so his probability is 1/365. We want both events to occur, so we multiply their probabilities to get the combined probability:

P(2 people share a birthday) = (365/365) * (1/365) ≈ 0.00274

Using the complement rule, let's now look at the probability that no one shares a birthday. The event "at least 2 people share a birthday" and the event "no one shares a birthday" cannot both happen, and together they cover all possible cases. Hence the sum of these 2 probabilities is 1:

P(at least 2 people share a birthday) + P(no one shares a birthday) = 1

We can rewrite this as:

P(at least 2 people share a birthday) = 1 - P(no one shares a birthday)
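The two-person calculation can be reproduced with exact fractions. This is a small Python sketch under the same assumption as the text (365 equally likely days, leap years ignored). The variable names are illustrative.

```python
from fractions import Fraction

DAYS = 365  # leap years ignored, as in the text

p_x_any_day = Fraction(DAYS, DAYS)  # person X can be born on any day
p_y_same_day = Fraction(1, DAYS)    # person Y must match X's day

p_share = p_x_any_day * p_y_same_day  # both events must occur
p_no_share = 1 - p_share              # complement rule

print(p_share, float(p_share))
print(p_no_share, float(p_no_share))
```

Using `Fraction` keeps the arithmetic exact, so the two probabilities sum to exactly 1 with no floating-point rounding.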
Computer with Internet Access
1. Create a dataset, i.e. a simple spreadsheet or a CSV file. Use Google to research different scientists, artists, and mathematicians from the Renaissance period and note down your favorite people in the format below.
Artist Name, Birthday
Leonardo da Vinci, Apr 15
Nicolaus Copernicus, Feb 19
and so on...
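If you save the dataset as a CSV file, it can be loaded with a few lines of Python. The sketch below assumes the exact header and the "Mon DD" date style shown above. The helper names `parse_birthday` and `load_birthdays` are hypothetical, and an in-memory string stands in for a real file.

```python
import csv
import io

# Three-letter English month abbreviations, as used in the sample rows.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_birthday(text):
    """Turn a string like 'Apr 15' into a (month, day) tuple."""
    month_abbr, day = text.split()
    return MONTHS[month_abbr], int(day)


def load_birthdays(handle):
    """Read (name, (month, day)) pairs from an open CSV file or stream."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [
        (row["Artist Name"].strip(), parse_birthday(row["Birthday"].strip()))
        for row in reader
    ]


# In-memory stand-in for the CSV file, using the two rows from the text.
sample = io.StringIO(
    "Artist Name, Birthday\n"
    "Leonardo da Vinci, Apr 15\n"
    "Nicolaus Copernicus, Feb 19\n"
)
artists = load_birthdays(sample)
print(artists)
```

Storing each birthday as a (month, day) pair leaves out the year, which is all that matters when checking for shared birthdays.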
2. Let's work through an example with 30 artists: randomly select 30 artists, along with their birthdays, from your dataset.
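The random selection can be done by hand, or with a short Python sketch like the one below. It assumes the dataset is a list of (name, (month, day)) rows. The function name `pick_artists`, the `seed` parameter, and the placeholder rows are all illustrative and would be replaced by your own data.

```python
import random


def pick_artists(rows, k=30, seed=None):
    """Return k distinct rows chosen uniformly at random.

    Passing a seed makes the selection repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Placeholder rows standing in for a real dataset; not actual artists.
placeholder_rows = [
    (f"Artist {i}", (1 + i % 12, 1 + i % 28)) for i in range(100)
]

chosen = pick_artists(placeholder_rows, k=30, seed=2020)
print(len(chosen))
```

`random.sample` picks without replacement, so no artist is selected twice. It raises `ValueError` if the dataset holds fewer than `k` rows.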
3. Calculate the probability of at least 2 artists having the same birthday.
P (1st artist birthday) = 365/365
P (2nd artist birthday) = 364/365
P (3rd artist birthday) = 363/365
P (4th artist birthday) = 362/365
...
P (28th artist birthday) = 338/365
P (29th artist birthday) = 337/