Thursday, June 12, 2014

Wassily Hoeffding 1914-1991

In honor of the 100th anniversary of the birth of Wassily Hoeffding today, let's recall his incredibly useful inequality.
Prob(|X-pn|>εn) < 2e-2ε2n

where X is the random variable representing the number of heads from n independent coin flips each with a probability p of being heads. Informally the sum of many independent events will, with extremely high probability, be very close to the expected value. Notice that as long as ε = 1/o(√n) the probability goes down exponentially in n and this is tight as one expects X to be about O(√n) away from pn for constant p.

Back in the 80's, probabilistic computation started to play a major role in algorithms, cryptography, learning theory and interactive proofs. To analyze these computations one needed to deal with Chernoff bounds and it was difficult to find clean and easy statements of the bounds one could use in proofs. One of my officemates in grad school, Bob Sloan, took it upon himself to write up a short document giving Hoeffding's inequality and a few generalizations. Having that write-up was like holding Excalibur in my hands, ready to do battle with probabilistic complexity.

Later Alon and Spencer gave a nice treatment in their Probabilistic Methods text and these days you can get all the formulas you want on Wikipedia.


  1. Does anyone know the history behind why we call this Hoeffding's bound and not Bernstein's? Bernstein's paper was in the 1920s and seems to also have the result, at least according to what's on wiki:

    S.N.Bernstein, "On a modification of Chebyshev’s inequality and of the error formula of Laplace" vol. 4, #5 (original publication: Ann. Sci. Inst. Sav. Ukraine, Sect. Math. 1, 1924)

    1. Enough with this canard that we name things after their discoverer. Naming conventions in math/CS are fairly random. For very many results we can find earlier references that foretell if not totally pre-discover the result. It's the nature of the scientific endeavour that many people will converge to the same result independently and that some of these efforts will get more attention than others leading to naming rights.

    2. The Columbus Principle: things are named after the last person who discovered them (Columbus was the last person to ``discover'' america).

      Trivia: Who first phrased the term `The Columbus Princeiple' ?

    3. Shouldn't the question be who _last_ phrased the term `The Columbus Principle'? ;)

  2. Hoeffding's inequality is, I believe, more general. It says that if X=X_1+X_2+...+X_n for a_i <= X_i <= b_i for real numbers a_i and b_i, then P(E[X]-X >= epsilon * n) <= exp(2*n^2*epsilon^2 / ((a_1-b_1)^2+...+(a_n-b_n)^2)). (Sorry for poor formatting.)

    (Disclaimer: I read Hoeffding's original paper several months ago, but I never read Bernstein's paper you refer to, so I'm not sure what exactly he published. Based on Wikipedia, it seems his result is less general.)