Thursday, September 28, 2006

Prediction Markets Map of Senate Races

David Pennock, Yiling Chen and I created a Prediction Markets Map of the 2006 US Senate Elections. When you load the map it downloads the latest prices from Tradesports and colors each state appropriately, a mix of red (Republican), blue (Democrat) and green (other candidates). We created the map because predictions based on market prices tend to do better than experts or polls. As Wisdom of Crowds author James Surowiecki wrote recently in the New Yorker.
In the past few years, a host of prediction markets, as they're usually called, have appeared online, offering people the chance to speculate on subjects ranging from the box-office performance of Hollywood films to the outcome of Presidential elections and the spread of bird flu. These markets' forecasts have proved remarkably accurate—just as bettors collectively do an exceptionally good job of predicting sports results. (In 2004, for instance, Tradesports, a Dublin-based prediction market, called thirty-three out of thirty-four races in the Senate correctly, and called all fifty states correctly in the results for the electoral college.)
Scaling colors according to probabilities is a surprisingly hard problem. It relates to human perception: we want our eyes to distinguish colors even when the underlying probabilities are not that far apart. For the map I used a transformation based on the XYZ color space that seems to give reasonable, though far from perfect, results. Even small projects like this can bring up complex issues from outside the theory world.
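The rough idea looks something like the following sketch, which blends red, blue and green by probability in a linear-light space and then gamma-encodes the result. This is only a crude stand-in for the XYZ-based transformation the map actually uses; the function name and gamma value are just for illustration.

```python
def probability_to_rgb(p_rep, p_dem, p_other, gamma=2.2):
    """Blend red (Republican), blue (Democrat) and green (other) by win
    probability.  Mixing in a linear-light space and applying a gamma
    curve is a crude approximation of a perceptual color transformation."""
    r, g, b = p_rep, p_other, p_dem                  # linear-light mix (R, G, B)
    encode = lambda c: round(255 * c ** (1.0 / gamma))
    return encode(r), encode(g), encode(b)

print(probability_to_rgb(0.55, 0.40, 0.05))  # a reddish purple for a leaning-Republican race
```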

Wednesday, September 27, 2006

Sponsored Search Panel Discussion

The last Electronic Commerce conference in June had a workshop on sponsored search. Sponsored search ads are the ones you see when you run a Google or Yahoo search. People bid on keywords to position their ads, and how to model and run these auctions is an exciting question both theoretically and practically.
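For readers outside the area, the most commonly studied model is the generalized second-price (GSP) auction: advertisers are ranked by bid and each winner pays, per click, the bid of the advertiser ranked just below. Here is a minimal sketch of that rank-by-bid version (real systems also weight bids by ad quality and use reserve prices, which this ignores).

```python
def gsp_auction(bids, slot_ctrs):
    """Minimal generalized second-price auction: rank advertisers by bid,
    assign slots in order of decreasing click-through rate, and charge each
    winner the next-highest bid per click."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    outcome = []
    for slot, (bidder, bid) in enumerate(ranked[:len(slot_ctrs)]):
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        outcome.append((bidder, slot, slot_ctrs[slot], price))
    return outcome

# Two slots with decreasing click-through rates.
print(gsp_auction({"a": 3.00, "b": 2.50, "c": 1.00}, slot_ctrs=[0.10, 0.05]))
```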

The workshop had a panel discussion "Models for Sponsored Search: What are the Right Questions?" organized by Rakesh Vohra and yours truly. Panelists were Kamal Jain (Microsoft), David Pennock (Yahoo!), Michael Schwarz (Yahoo, UC Berkeley) and Rakesh Vohra (Kellogg School at Northwestern). The panel was moderated by Jason Hartline.

Jason recorded the discussion which I converted to MP3 (98 minutes, 17 MB).

Alternatively you can read the transcript (PDF, postscript). Thanks to Nina Balcan, Jianqing Chen, Nikhil Devanur, and Anuj Kumar for their hard work transcribing the audio.

Tuesday, September 26, 2006

Contributions of Co-Authors

Guest Post from Jérémy Barbay.

In multi-author publications from other fields (biology, physics), the order of the authors' names roughly indicates the importance of each author's contribution, while this is not the case in CS. I was told that this was to avoid meaningless fights about the order, but in my personal experience it creates other conflicts, mainly the frustration of the main author toward co-authors who did not "contribute enough."

The fact is, not recognizing differences in relative contributions encourages two quite negative behaviors:

  1. Some authors will expect all co-authors to contribute an equal share to the paper, and be disappointed and frustrated when that is not the case. This is bad either way: having a frustrated co-author is not a nice experience, but on the other hand trying to balance the work so that each co-author contributes the same amount does not necessarily correspond to the balance of skills for the particular publication, and may not be the most efficient way to work on a paper.
  2. Authors have an incentive to play the game of the "minimum contribution deserving authorship," with those highest in the academic hierarchy pushing the other authors to contribute more in exchange for immaterial promises (such as future recommendations, or "teaching" by experience). A senior faculty member once explained to me that "students have to accept to eat a lot of shit (meaning, do a lot of stupid work), and young faculty to eat some shit but to pass most of it down to their students."
Both of those behaviors are bad for the community in general. I have seen them literally destroy a brilliant Ph.D. student, and I see their perverse effects on some promising younger students. Ranking the names of the authors is but a poor palliative: there are many measures of contribution on which to argue, and I still see frustration among my fellow biologists and physicists.

One obvious solution would be to add a paragraph at the end of each paper, similar to the commonly included "acknowledgements" paragraph, describing the contribution of each author. It would be a natural way to create an incentive for each author to contribute as much as the paper is worth to them, reversing the "minimum contribution deserving authorship" incentive, and would remove most of the frustration. It would also remove the ambiguity about "who did what," which we have to resolve explicitly anyway when writing recommendation letters or applying for a "best student paper" award. From a game-theoretic point of view (for the little game theory that I know), it would make the publication mechanism "truthful," and anybody opposing this mechanism would risk looking bad.

Now, I assume that this solution has some hidden drawback(s) (other than taking an additional five lines in each publication), as otherwise some field or other would already have adopted it. Or is it just that senior faculty members would not support such a measure?

Monday, September 25, 2006

Extended Abstracts

A non-theorist saw the upcoming STOC call for papers and noticed the following line.
Authors should submit an extended abstract (not a full paper).
She asked why we theorists only wanted extended abstracts and why she couldn't submit a full paper if it met the page limit.

An abstract just states the main results of a research paper. An extended abstract gives a little motivation, background and a hint of the proofs. The extended abstract serves not as a paper in itself but as an advertisement for the journal paper to come. If one had a full paper published in the proceedings then one couldn't publish the same results in a journal later on. That's the way conferences work.

Actually that's the way conferences work in all fields except computer science. In CS conferences play a more important role than journals; in most cases people judge the quality of a paper based more on the conference than the journal. Many conference papers never see a journal at all and many journal submissions are only slight variations of the conference proceedings version.

You really need to send something closer to a full paper to STOC if you hope to get accepted. You need to convince the committee that the results are new, important, correct and nontrivial. Most people submit a rough draft of a final paper, using the appendix if needed to put in their proofs.

Why do we keep the fiction of the extended abstract? Ethical and copyright rules prohibit us from publishing the same results in multiple places. We call the proceedings version an "extended abstract" so we can believe we don't violate these rules.

Non-theory conferences don't even pretend. The EC Conference, for example, solicits full papers though they do include the following note.

To accommodate the publishing traditions of different fields, authors may instead submit working papers that are under review or nearly ready for journal review. These submissions will be subject to review and considered for presentation at the conference but not for publication in the proceedings. These submissions need not conform to the conference paper format. Abstracts of accepted working papers will be included in the proceedings and must be coupled with a URL that points to the full paper and will be reliable for at least two years.

Sunday, September 24, 2006

Favorite Theorems: Parity

August Edition

In the 70's we had essentially no super-polynomial lower bounds for natural problems on general circuits, or even on circuits beyond depth 2. Two papers kick-started circuit complexity by independently showing that no polynomial-size constant-depth family of circuits can compute the parity function (whether the input has an odd number of ones).

Merrick Furst, James Saxe and Michael Sipser, Parity, Circuits, and the Polynomial-Time Hierarchy, FOCS 1981, MST 1984.

Miklós Ajtai, Σ11-Formulae on Finite Structures, APAL, 1983.

Furst et al. developed random restrictions: randomly choose a set of variables and set each of them to 0 or 1 at random. This creates a circuit on a smaller set of variables. For the parity function any such restricted circuit computes parity (or its negation) on the unrestricted variables. They argue that the restriction turns a polynomial-size constant-depth circuit for parity into a polynomial-size circuit for parity of smaller depth. This leads to a contradiction because one can easily show that parity requires exponential-size depth-2 circuits.
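As a quick sanity check on the parity-survives-restriction observation, here is a toy sketch; it only illustrates that one step, not the circuit-size argument itself, and the function names are just for illustration.

```python
import random

def random_restriction(n, p=0.5):
    """Independently fix each of n variables to a random bit with
    probability p; return the fixed assignments and the free variables."""
    fixed, free = {}, []
    for i in range(n):
        if random.random() < p:
            fixed[i] = random.randint(0, 1)
        else:
            free.append(i)
    return fixed, free

n = 10
fixed, free = random_restriction(n)
offset = sum(fixed.values()) % 2
# Under the restriction, the parity of all n bits equals the parity of the
# free bits XOR the constant offset, so the restricted function is still
# parity (or its negation) on the surviving variables.
print(len(free), "free variables, constant offset", offset)
```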

Ajtai showed lower bounds against expressions described by first-order formula, equivalent to a uniform version of constant-depth circuits. I believe Ajtai's proof carries to the nonuniform case as well.

Furst et al. also show that super-quasipolynomial bounds for constant-depth parity circuits would imply an oracle relative to which the polynomial-time hierarchy differs from PSPACE. They don't achieve these stronger bounds but Yao does a few years later.

Håstad used a more clever random restriction and a much more difficult analysis to create a switching lemma that implies essentially tight exponential lower bounds for parity on constant-depth circuits. Razborov gives a simpler proof of the switching lemma. Paul Beame has a nice self-contained write-up of Razborov's proof and Fortnow and Laplante have a version using Kolmogorov complexity. The simplest proof of the original Furst-Saxe-Sipser-Ajtai result comes from the lower bounds on circuits with modulo p gates for some fixed prime p due to Razborov (in Russian) and Smolensky. Alas, Razborov and Smolensky's 1987 papers mark the last major super-polynomial lower bounds for general circuit models.

Thursday, September 21, 2006

Short Takes

Boaz Barak reports that Bernard Chazelle has updated his essay The Algorithm: Idiom of Modern Science complete with bells, whistles, cartoons, PCP's and zero knowledge.

Sanjeev Arora and Boaz are nearing completion of their complexity book and they need your help. Please send them comments big and small. They particularly need you readers who are not (yet) experts in complexity to gauge the readability of the text.

The Royal Society has placed all of their journals online with free access through December. Their archives go back to '65 (that's 1665) in case you want to read Ben Franklin's paper on electricity.

Wednesday, September 20, 2006

Don't Get Mad, Get a Lawyer

Shing-Tung Yau doesn't like his portrayal in Sylvia Nasar and David Gruber's New Yorker article on the Poincaré conjecture. So Yau, who has apparently discovered the American way, hires a lawyer who sends a letter to the authors and the fact checker. A PR firm emails me (and Scott) a press release.
Dr. Yau has demanded that The New Yorker and Nasar make a prominent correction of the errors in the article, and apologize for an insulting illustration that accompanied it.

"Beyond repairing the damage to my own reputation, we seek to minimize the damage done to the mathematics community itself, which is ludicrously portrayed as contentious rather than cooperative and more competitive than collegial," Dr. Yau said. "Mathematicians from the foremost institutions – from Beijing to Berkeley – have been appalled at the fictionalizing of our profession."

The old "you are not just attacking me, you are attacking all of mathematics" argument. Most of the mainstream media has not picked up this part of the story. The Boston Herald did and they also published the New Yorker's response.
"Manifold Destiny," a 10,000-word article by Sylvia Nasar and David Gruber published in the August 28, 2006 issue of The New Yorker, is the product of more than four months of thorough, careful reporting and meticulous fact-checking. Ms. Nasar and Mr. Gruber spent over twenty hours interviewing Dr. Yau; they conducted approximately 100 other interviews with people in the field; corresponded by email with Dr. Yau and many others; and traveled to China where they conducted interviews and attended speeches and events discussed in the article. In addition, the magazine's fact-checkers spoke with Dr. Yau for approximately eight hours, they examined notes, tapes, and documents gathered by the authors, and the checkers conducted their own thorough research. Contrary to Dr. Yau's assertions, the article is nuanced and fair, and was prepared using ethical standards of journalism. Dr. Yau, his supporters and his point of view were given ample space in the article.
Whatever the merits of Yau's claims (a reliable source tells me the article is mostly a reasonable and accurate report of events), Yau only hurts his reputation with this newest action. Yau should have written a short letter to the New Yorker editor with a pointer to a detailed discussion on his web site. Instead, by having an argumentative letter written by a lawyer with the implicit threat of a lawsuit, Yau only encourages the very portrayal he tries so hard to fight against.

Tuesday, September 19, 2006

Some New Geniuses

The MacArthur Foundation named their 2006 Geniuses including
  • Luis von Ahn, a Carnegie Mellon cryptographer who has worked in steganography but is best known for CAPTCHAs and the ESP Game.
  • Recent Fields Medalist Terence Tao.
  • Claire Tomlin, a Stanford aeronautical engineer and Berkeley EECS Ph.D. From the announcement.
    Much of Tomlin's research concentrates on aeronautical applications of hybrid systems research, particularly aircraft flight control and air traffic conflict resolution. As the number of variables increases and their interactions become more complex, it becomes ever more difficult to guarantee that systems will always be within safe limits. Tomlin has developed practical algorithms for determining when unsafe conditions may arise, and for establishing feedback control laws for a hybrid system guaranteed to remain within a safe subset of all reachable states.
Congratulations to all!

Monday, September 18, 2006

Back to School Night

My daughter started middle school and as parents we attended back-to-school night. In the evening we followed our daughter's schedule and saw her various classrooms and teachers.

One teacher talked about the need to improve the students' "keyboarding" skills. Too bad the typing lessons I took in high school have gone to waste.

In the math classroom the teacher had two problems on the board and asked the parents to solve them.

  1. 1/2 × 2/3 × 3/4 × 4/5 × 5/6 × 6/7 × 7/8
  2. What are the next three letters in this sequence: T F S E
It truly felt like I had gone back to sixth grade, the little math nerd who knew the answers but didn't want to show off.
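(For the record, and assuming the intended readings: the first product telescopes, each numerator canceling the previous denominator, leaving 1/8; and reading T F S E as Two, Four, Six, Eight makes the next three letters T, T, F, for Ten, Twelve, Fourteen.)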

I was caught a little off guard by how much the other parents struggled with the questions, at least those who tried. No wonder math education suffers when the parents, most of whom have professional careers, don't use many math techniques past the sixth-grade level. How can they stress the importance of continuing mathematical education when they don't use it themselves?

Or do they? Math teaches more than just how to multiply fractions and solve equations. Learning math involves seeing how to tackle a problem, logically analyzing it and finding the right approach and then applying that approach. The tools students learn in math class will help them in their careers even if the problems don't involve numbers at all.

Thursday, September 14, 2006

Magic Numbers Revisited

Google for Yankees Magic Number and a short way down the list you'll find my 2004 post The Beauty of the Magic Number.

As I write this the Yankees have 88 wins and Boston has 68 losses, which means the Yankees' current magic number is 162+1-(88+68)=7. This means any combination of 7 Yankees wins and Red Sox losses would guarantee that the Yankees win their division.

Yesterday I got the following email.

I'm in disagreement with my brother—and, apparently, all major media—regarding the Yankees' magic number. My reasoning is that if the Red Sox win all their remaining games, that's 94 victories. Since the Yankees lead the season series and will thus win the division in the case of a tie with Boston, they'll clinch the division with 94 wins. They have 88, so only need six more…thus, a magic number of six.

The formula you reference (with the "+1"), which I guess others use as well, doesn't account for the fact that a team only needs to tie for first if they've won the season series vs. the other team.

The standard magic number is for winning the division outright. According to Wikipedia.
In the event of a tie in the standings at the close of the regular season, league rules provide for a one-game playoff (with the home field determined by coin flip) to determine which of two teams participate in the Division Series. If three teams are involved in a tie, a two-game playoff may be played. If two teams are tied, but a tiebreaker would result in both participating in the Division Series anyway (due to one being division champion and the other being wild card), then no playoff is played and seedings are determined by head-to-head record.
Since the Red Sox are unlikely to get the wild card this year, the Yankees would have to win a playoff game in case of a tie, so they would need the extra win anyway and the magic number of 7 is valid.

The situation is different in the AL Central where the second place team will likely win the wild card. Detroit has 87 wins and Minnesota has 60 losses and Detroit has the better head-to-head. So Detroit only needs 162-(87+60)=15 wins and/or Twins losses to clinch the division since they win the division in case of a tie. No "+1" here.

Now if the Twins fade and my White Sox (with 62 losses) get into contention then Detroit needs 162+1-(87+62)=14 wins and/or White Sox losses to clinch over the White Sox. Why? Because the White Sox will have the better head-to-head over Detroit.
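To put the rule in one place, here's a quick sketch of the calculation described above (the function and parameter names are just for illustration).

```python
def magic_number(games, leader_wins, rival_losses, leader_wins_tie):
    """Wins and/or rival losses the leader needs to clinch the division.
    If the leader takes the division on a tie (head-to-head tiebreaker,
    no playoff game needed), the usual "+1" disappears."""
    base = games - (leader_wins + rival_losses)
    return base if leader_wins_tie else base + 1

# Yankees over Red Sox: a tie forces a one-game playoff, so keep the +1.
print(magic_number(162, 88, 68, leader_wins_tie=False))  # 7
# Tigers over Twins: Detroit wins the division outright on a tie.
print(magic_number(162, 87, 60, leader_wins_tie=True))   # 15
```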

Nothing like Major League Baseball and their wild-card rules to ruin the beauty of the magic number.

Wednesday, September 13, 2006

Putting the Pieces Together

A student proves what I feel is quite a nice result but the student laments that the proof was easy: they had simply put together various tools from earlier papers. Guess what? That's called research. Every mathematical result is simply a combination of logical statements put together in the right way. But unless P=NP we cannot automate this process. Our job is to figure out how to combine results and techniques we already know to prove things we didn't know before.

Most proofs seem simple once we've proved them. With some notable exceptions, every proof has (at most) one new idea, the rest just connecting the dots through techniques we've seen before.

From the outside, or through the eyes of a young graduate student, research seems like a magical process in which mathematicians somehow pull proofs out of a hat. Successful theorists are not magicians, just people who have read and understood techniques from a variety of sources and find new ways to put those techniques together to solve the problem at hand.

Tuesday, September 12, 2006

Styles of Research

In the seventh Complexitycast, Bill Gasarch returns to talk about approaches to research. MP3 (22 minutes, 3.9MB)

Monday, September 11, 2006

Memories of 9/11

We all have 9/11 stories; here's mine, more of a 9/12 story.

Prelude. I was packing up my hotel room in Würzburg after the 1993 STACS conference. I flipped on the TV, the hotel only had German stations, and saw police and fire engines around the World Trade Center and caught the word "explosion". It wasn't until I reached the airport later that day that I learned the towers suffered no significant damage.

In 2001 I worked for the NEC Research Institute in New Jersey. In September I attended the first EQIS workshop in Tokyo and the following week visited the quantum computing group near the University of Tokyo. On the 11th I took a day trip to visit the quantum computing group at the NEC Lab in Tsukuba. The first plane hit the towers at 9:45 PM Japan time after I had returned to Tokyo and I went to bed that night unaware of the events unfolding back in the states. I learned the news when I booted up my laptop the following morning. I then turned on the Japanese TV (I never get CNN when I really need it) and saw the shocking images all at once.

Tokyo had a normal day on September 12 when I wanted the world to stop. I was the only remaining American visiting the group and I felt quite alone. During lunch one of the non-Japanese asked what America had done to deserve this. I almost slugged him.

I gave a talk as scheduled in the afternoon still in quite a daze. That night I insisted on doing something completely mindless and the Japanese complied, taking me to a small Karaoke bar.

I returned to the states on the 16th, three days later than my original schedule. As the plane approached Newark I had a clear view of southern Manhattan and saw a plume of smoke still rising from where the World Trade Center had stood. The America I was returning to was vastly different from the one I had left two weeks earlier.

As many of you know we lost a member of our community that fateful day, Danny Lewin, an MIT graduate student and co-founder of Akamai. The STOC best student paper award now bears his name.

Friday, September 08, 2006

Conferences and More

FOCS registration is now available. Early registration deadline is September 18, hotel rooms held until September 29.

SODA accepted papers have been posted. STACS submission deadline (September 17) is fast approaching.

A few calls for papers for next summer's FCRC conferences: Complexity and Electronic Commerce. Still waiting on some others including STOC and COLT.

Harvard theory professor and former dean Harry Lewis (of Lewis and Papadimitriou fame) will be on BookTV on CSPAN 2 Sunday at 1:00 PM EDT talking about his new book Excellence Without a Soul: How a Great University Forgot Education. You can also stream CSPAN 2 and sometime after the program airs the video will be available.

Thanks to all who sent me pointers.

Thursday, September 07, 2006

The Haystack

Derandomizing an RP algorithm is like finding hay in a haystack. In that vein comes the following cartoon from the Danish paper Politiken (via Peter Bro Miltersen).

Haystack: What's with you people? Why are you all obsessed with that stupid little thing!? What about wonderful me!? Love me! Give me love!
Caption: The haystack is upset that the needle gets all the attention.

Wednesday, September 06, 2006

A Continuum of Quality

Claire's post on French Universities brings up an interesting point. In most countries universities are almost all funded and run by the national government. The country divides the universities into a small number of groups and tries, with varying degrees of success, to keep the quality within each group about the same. This system keeps the quality of the elite schools reasonably strong, but a student who just misses admission at that level, like the Grandes Écoles in France or one of the IITs in India, ends up with a weaker education and lesser job prospects down the road.

The United States, outside of the military academies, does not run any universities directly. Rather most universities are either run privately or by individual states. These schools form a near continuum of quality from the best research schools in the world down to small community colleges. We have no large gaps: if a student just fails to get accepted to school A then they can go to school B with nearly as strong a program. Also universities in the US often have strength in different areas; one can find schools whose CS departments have a much greater reputation than their universities as a whole (and vice versa).

Since US universities report to different masters, they compete, trying to increase the reputation of their schools both informally and in published lists like the US News Rankings. Some of this competition leads to spending not directly related to education or research, such as athletic programs and student amenities. But much effort does go into recruiting top faculty and students, including from outside the US.

Fifty computer science departments have a stated goal of becoming a top ten department in the long term. 80% of them will fail but the process will strengthen CS research across the board.

Tuesday, September 05, 2006

The Box and The Net

What invention of the second half of the 20th century had the single greatest effect on global commerce? In a neat new book, The Box, Marc Levinson makes a strong argument for the shipping container, those boring rectangular boxes you often see carried by trucks and trains as well as by large ships that carry thousands at a time. Of course that is the point, as one can move these containers easily and quickly between trucks, trains and ships. Containers allowed ports to be highly automated, and very large single-purpose container ships were built, driving shipping costs to negligible amounts. Containers greatly reduced theft at ports, particularly of small electronics. Only with container ships did producing small electronics overseas become financially feasible on a large scale, which led, in part, to the computer revolution that fuels our field.

Not everything about the container world rings positive. Most dock workers and many domestic manufacturing workers have lost their jobs. Cities and countries that didn't modernize their docks for container ships got left behind. We can't inspect every container arriving in the US, some of which might contain illegal or deadly weapons. Used containers litter parts of our landscape. Nevertheless containers have greatly increased wealth in several undeveloped countries while delivering low-cost goods to many developed countries.

The Internet shares many parallels with shipping containers. It matters little whether a shipping container holds stuffed animals, MP3 players or automobile parts; likewise the Internet doesn't care whether it delivers email, pictures, music, movies, blogs or the like. The Internet has allowed many service jobs to move to developing countries. And one can hide malicious programs and websites that appear to be something else on the Internet more easily than one can hide a bomb in a shipping container.

Increased commerce has led to bigger and bigger ships and calls to expand or replace the Panama Canal, now too small and crowded to handle all of the shipping from Asia to the East Coast. Likewise we have to continually increase the capacity and speed of the Internet to meet growing demands. As we learn to take advantage of new technologies, especially those that help move goods and information quickly and cheaply, we find uses for them beyond expectations, and we have to cope with expanding the good uses while limiting the bad.

Friday, September 01, 2006

Reception for FOCS 2007?

A request from Claire Kenyon.

We are finalizing the budget for FOCS 2007 in Providence. The traditional Saturday evening reception will cost at least $53 per person. The alternative is to skip the reception altogether. I would like to poll the Theory community and hear what they would prefer. Can you give your comments or preferences?

Favorite Theorems: Unique Witnesses

July Edition

This is the July edition of Favorite Theorems.

Perhaps you could easily find a satisfying assignment of a Boolean formula when only one solution exists, somehow using the uniqueness to cleverly guide your search. Valiant and Vazirani show that any such procedure would give a randomized algorithm for general satisfiability and put the class NP in RP.

NP is as easy as detecting unique solutions, Leslie Valiant and Vijay Vazirani, STOC 1985, TCS 1986.

Valiant and Vazirani show there is some efficient randomized algorithm f mapping Boolean formulas to Boolean formulas such that for all formulas φ,

  • f(φ) will always have the same set of variables as φ, and every satisfying assignment of f(φ) is a satisfying assignment of φ. In particular if φ is not satisfiable then f(φ) is never satisfiable.
  • If φ is satisfiable then Pr(f(φ) has exactly one satisfying assignment) ≥ 1/(4n), where n is the number of variables of φ.
By repeating the process a polynomial number of times, with all but exponentially small probability at least one of the formulas produced by f on a satisfiable φ will have a unique satisfying assignment.

Valiant and Vazirani's proof repeatedly cuts the solution set with random hyperplanes, each roughly halving the number of solutions, and argues that with some reasonable probability this process leaves a single solution. The techniques are quite general and one can get similar results for nearly every known NP problem.
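Here is a toy version of that step in Python, run on an explicitly listed solution set rather than on formulas; in the actual reduction the hyperplanes become parity constraints conjoined to φ, and the parameters below are just for illustration.

```python
import random

def restrict_to_random_hyperplanes(solutions, n):
    """Guess k, then intersect the solution set with k random GF(2)
    hyperplanes a.x = b.  When k is close to log2 of the number of
    solutions, exactly one solution survives with constant probability;
    having to guess k accounts for the 1/n-type factor in the bound."""
    k = random.randint(1, n)
    surviving = list(solutions)
    for _ in range(k):
        a = [random.randint(0, 1) for _ in range(n)]
        b = random.randint(0, 1)
        surviving = [x for x in surviving
                     if sum(ai * xi for ai, xi in zip(a, x)) % 2 == b]
    return surviving

random.seed(0)
n = 6
sols = list({tuple(random.randint(0, 1) for _ in range(n)) for _ in range(8)})
trials = 1000
isolated = sum(len(restrict_to_random_hyperplanes(sols, n)) == 1 for _ in range(trials))
print(isolated / trials)  # a constant fraction of trials isolate a single solution
```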

Mulmuley, Vazirani and Vazirani prove an "isolating lemma" which one can use to give an alternate proof of a similar theorem: put random weights on the edges of a graph and with some reasonable probability there is a unique maximum-weight clique. Mulmuley et al. use the isolating lemma to give a simple randomized parallel algorithm for maximum matching.
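The lemma itself is easy to check empirically: give each element of a ground set an independent random weight in {1,...,2m} and any fixed family of subsets has a unique minimum-weight member with probability at least 1/2. A small sketch (the family and parameters here are arbitrary choices for illustration):

```python
import random
from itertools import combinations

def unique_min_fraction(m=8, family_size=20, trials=2000):
    """Estimate how often a fixed family of subsets of an m-element ground
    set has a unique minimum-weight member when each element gets an
    independent uniform weight in {1,...,2m}."""
    family = random.sample(list(combinations(range(m), 3)), family_size)
    hits = 0
    for _ in range(trials):
        w = {e: random.randint(1, 2 * m) for e in range(m)}
        weights = [sum(w[e] for e in s) for s in family]
        if weights.count(min(weights)) == 1:
            hits += 1
    return hits / trials

print(unique_min_fraction())  # the lemma guarantees at least 1/2; typically higher
```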

Besides the immediate hardness of finding unique solutions, Valiant-Vazirani has had an impact on complexity in many ways. Just a few:

  • One can use Valiant-Vazirani to find a satisfying assignment of a formula using non-adaptive queries to SAT.
  • Valiant-Vazirani gives the base case of Toda's theorem.
  • Similar ideas show how to create a low-degree polynomial that correctly computes the OR function on most inputs (which one can extend to AC0 by recursion).