Monday, March 13, 2006

March Madness

It happens every spring, America's favorite binary tree, the NCAA Men's Basketball Tournament Bracket was announced Sunday night. This is a single elimination tournament; win and move up the tree. In offices across America, people print and fill out these brackets guessing the winners of each game.

How do you score a person's predictions given the final outcome of the games? One could simply give one point for each game, other pools double the points in each round so each round is worth a total 32 points. Or one could base the score on seeds—there are four regions where each has 16 teams in some predetermined order of strength. Some pools give more points to predicting an upset, like seed 13 beating seed 4.

Is there a mathematically ideal way to score the predictions in the tournament yet simple enough for the average American office worker to understand? Billions of dollars are wagered on the NCAA tourney, so creating the perfect scheme can make quite a splash in the world of office pools.


  1. Minimum number of inversions between a permutation that could have led to the predicted tree and a permutation that could have led to the actual tree (assuming stronger team always beats weaker one)?

    Not that I have any idea offhand how to calculate that, let alone whether the calculation could be simple enough to be done by hand. But who needs to do things by hand when we have computers?

  2. This paper might be relevant:

    "March Madness is NP-hard"
    David Liben-Nowell, Moses Liskov, abhi shelat, Adam Smith, and Grant Wang

  3. The guesses simply define a probabilistic model for the outcome. The likelihood of each model given the outcome must be your score.

    Obviously I need to define a prior on the models. This would of course depend on the seeding. Unlike the usual setting where models "close" to the seeding have a higher prior, I recommend models far from the seeding should have a higher prior. This will give you more credit for predicting an upset! Of course the prior can be a tunable parameter depending on how risk-averse the office workers are. I'll leave the exact details to someone who knows a little more about this stuff.