## Wednesday, November 25, 2009

First a message from David Johnson for proposals on locations for SODA 2012 both in and outside the US.

Here's an interesting approach to the birthday paradox using variances.

Suppose we have m people who have birthdays spread uniformly over n days. We want to bound m such that the probability that there are are at least two people with the same birthay is about one-half.

For 1 ≤ i < j ≤ m, let Ai,j be a random variable taking value 1 if the ith and jth person share the same birthday and zero otherwise. Let A be the sum of the Ai,j. At least two people have the same birthday if A ≥ 1.

E(Ai,j) = 1/n so by linearity of expectations, E(A) = m(m-1)/2n. By Markov's inequality, Prob(A &ge 1) ≤ E(A) so if m(m-1)/2n ≤ 1/2 (approximately m ≤ n1/2), the probability that two people have the same birthday is less than 1/2.

How about the other direction? For that we need to compute the variance. Var(Ai,j) = E(Ai,j2)-E2(Ai,j) = 1/n-1/n2 = (n-1)/n2.

Ai,j and Au,v are independent random variables, obvious if {i,j}∩{u,v} = ∅ but still true even if they share an index: Prob(Ai,jAi,v = 1) = Prob(The ith, jith and vth person all share the same birthday) = 1/n2 = Prob(Ai,j=1)Prob(Ai,v=1).

The variance of a sum of pairwise independent random variables is the sum of the variances so we have Var(A) = m(m-1)(n-1)/2n2.

Since A is integral we have Prob(A < 1) = Prob(A = 0) ≤ Prob(|A-E(A)| ≥ E(A)) ≤ Var(A)/E2(A) by Chebyshev's inequality. After simplifying we get Prob(A < 1) ≤ 2(n-1)/(m(m-1)) or approximately 2n/m2. Setting that equal to 1/2 says that if m ≥ 2n1/2 the probability that everyone has different birthdays is at most 1/2.

If m is the value that gives probability one-half that we have at least two people with the same birthday, we get n1/2 ≤ m ≤ 2n1/2, a factor of 2 difference. Not bad for a simple variance calculation.

Plugging in n = 365 into the exact formulas we get 19.612 ≤ m ≤ 38.661 where the real answer is about m = 23.

Enjoy the Thanksgiving holiday. We'll be back on Monday.

1. Thanksgiving was approx. a month ago!

2. well, the birthday problem aint really new neither is the approach so i aint sure wat this post aint about

3. Seems David Johnson is behind for about a year, next SODA in Austin is SODA 2010 ;-)

4. cool way of using basically the same inequality but with the unknown variable once in the numerator and once in the denominator

5. that should be obvious...

6. The chebyshev's inequality has greater and equal in the probability statement. Is it simply OK to remove the equality here?

7. Lance, you have some typos in the second part of the argument. In particular, you should be proving an upper bound on Prob(A = 0), not Prob(A >= 1) again.

8. last anon: all right, all right, but give us a break. it's thanks-giving. instead of typo-giving you should be giving something else.

9. hey waowao, this is supposed to be a mathematically and scientifically centered blog, having errors in math is really bad and worth pointing out whatever day of the week it is

10. hey lance something you could explain a little bit. Why the need to advertise for wolfram and all the clunky wolfram alpha utilities.
I find it a little bit disappointing ... Seems like Wolfram and Co clearly won you over to his side.

11. Would this not be dedicated to "Thanksgiving holiday", then I would ask "Lance -- what is the message?"

Birthday paradox is no paradox -- it is just counting. We count with weights, forget what we count -- and here is a "paradox" ... I wonder how people (also my students) find this "strange". Markov, Chebysachev = trivial counting. Chernoff = a bit mmore delicate counting. Nothing more.