Thursday, June 15, 2023

Randomized Acceptances

NeurIPS recently released their 2021 consistency report, a sequel to the 2014 experiment. While the conference has grown dramatically, the results remain "consistent": about 23% disagreement between two separate program committee groups. As before, I don't find this too surprising--different committee members have different tastes.

Roughly, conference submissions fall into three categories:

  1. Clearly strong papers
  2. Clear rejects
  3. A bunch that could go either way.
A typical program committee quickly sorts out the first two groups and then painfully spends considerable time arguing over the others.

What if instead we took a different approach? Accept all the strong papers and reject the weak ones. Choose the rest randomly, either with a uniform distribution or one weighted by the ranking. Maybe reduce the probability for authors who submit multiple papers.
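A minimal sketch of such a lottery, with hypothetical scores, cutoffs, and slot counts (nothing here reflects any conference's actual process):

```python
import random

def select_papers(papers, accept_cutoff, reject_cutoff, num_slots, seed=None):
    """Accept the clear accepts, reject the clear rejects, and fill the
    remaining slots by a weighted lottery over the middle group.

    `papers` maps paper id -> committee score in [0, 1]; the scores and
    cutoffs are illustrative placeholders, not any real rubric.
    """
    rng = random.Random(seed)
    accepts = [p for p, s in papers.items() if s >= accept_cutoff]
    middle = {p: s for p, s in papers.items()
              if reject_cutoff <= s < accept_cutoff}
    remaining = num_slots - len(accepts)
    if remaining > 0 and middle:
        ids = list(middle)
        # Weight by score so higher-ranked borderline papers are more likely.
        weights = [middle[p] for p in ids]
        k = min(remaining, len(ids))
        chosen = []
        # Sample without replacement, proportional to weight.
        while len(chosen) < k:
            pick = rng.choices(ids, weights=weights, k=1)[0]
            idx = ids.index(pick)
            ids.pop(idx)
            weights.pop(idx)
            chosen.append(pick)
        accepts.extend(chosen)
    return sorted(accepts)
```

A uniform lottery is the special case where all middle-group weights are equal; penalizing multiple submissions would just mean scaling down an author's weights before sampling.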

Choosing randomly reduces biases and can increase diversity, if there is diversity in the submissions. Knowing there is randomness in the process allows those with rejected papers to blame the randomness, and those whose papers get in to claim they were in the first group. Randomness encourages more submissions and is fair over time.

Note we're just acknowledging the randomness in the process instead of pretending there is a perfect linear order to the papers that only a lengthy program committee discussion can suss out.

We should do the same for grant proposals--all worthy proposals should get a chance to be funded.

I doubt any of this will ever happen. People would rather trust human decisions with all their inconsistencies over pure randomness. 


  1. Isn't there a difference between "subjective" and "random"? If you replace the former with the latter, then something essential will change.

    1. If it is "essential", then why does it take the committee so much time to decide? Instead of picking the papers at random, you could pick one committee member at random to make the rest of the decisions.

  2. Reminds me of this short rhyme by Piet Hein

    “Whenever you're called on to make up your mind,
    and you're hampered by not having any,
    the best way to solve the dilemma, you'll find,
    is simply by spinning a penny.
    No - not so that chance shall decide the affair
    while you're passively standing there moping;
    but the moment the penny is up in the air,
    you suddenly know what you're hoping. ”

  3. This myth of program committees “debating” and “deciding” is just that: a myth. In my experience as reviewer and author, quite often the base reviews are terrible, both objectively (poor understanding, factually wrong, or adversarial positioning bordering on malicious) and subjectively (a focus on some weird notions of “novelty” and “significance” that boil down to “it’s just less effort to reject for a made-up reason”). The metareview summary is then a blatant cut-and-paste of all reviews with equal weighting--“Mixed reviews, with concerns about novelty”--which makes it clear that no actual judgment has been expended, not just in glancing through the paper (which seems infeasible) but even in assessing review quality and coherence.

    TL;DR: the noise is more in the base reviews and the rubber-stamping by ACs than in the committee itself.

  4. The random approach would be something to consider only if there really are "the 3 categories", and the "middle" category is not, say, 20x larger than the "good" one. Also, I think that even if there were agreement that, after the initial assignment to categories, we sample randomly from the "middle" one, the committee would spend as much time as it does now deciding on papers on the boundary between categories anyway.

  5. I’ve advocated for this system for years; people usually think I’m joking.

    The PC’s job should be to assign probabilities to papers. Most submissions would get either 0s or 1s, but a significant fraction would fall in between. Probabilities might be bumped up for student papers, or bumped down for authors with more 1s, but to a first approximation, the PC decides on probabilities, everyone shakes hands and goes home, and then the program is generated randomly from the resulting distribution. The final probabilities are strictly confidential, perhaps not even revealed to the PC.

    The same system should also be used for admissions, for faculty interviews, and for grant proposals.

  6. As we generate more papers, the resource constraint has become quality reviews and opinions from experts. The justifications for rejecting papers that are scientifically above threshold (and do not have issues with writing, addressing related work, etc.) are twofold: (i) slots in the physical conference, and (ii) quality maintenance. ML conferences have addressed the slot issue by using posters and spotlight presentations, and this should be feasible for theory conferences too. I feel that randomization is a cop-out because it mainly solves the slot problem, which can be easily addressed in various ways, and most authors would be happy to have their paper accepted in poster form and move on. This is unlike admissions (students, faculty), where the slots are hard constraints. Quality maintenance is where people are unsure, because we cling to the older models.

  7. Some countries have actually chosen to pick grants at random.

  8. Why would you want to "encourage more submissions," which will presumably be weaker on average than the current set of submissions?

  9. This would actually lower diversity, assuming there are agents who argue/advocate for pushing through the 'mid' papers they identify as diverse.

  10. The idea is good but we should also understand and think about second order effects and mitigate them.

    More submissions can lead to more resources needed for reviews.

    Explainability goes down, no one likes random rejections.

    Recognition goes down; your paper is randomly selected among many similar ones.

  11. There is a joke going around that people are going to ask LLMs to write emails for them based on a list of key points, and receivers will ask an LLM to summarize them back into a list of points.
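Comment 5's scheme--the PC assigns probabilities and the program is drawn from the resulting distribution--could be sketched as an independent coin flip per paper (a hypothetical illustration; the probabilities are made up):

```python
import random

def draw_program(probabilities, seed=None):
    """Given PC-assigned acceptance probabilities (paper id -> p in [0, 1]),
    draw the final program with one independent coin flip per paper.
    Unlike a fixed-slot lottery, the program size varies around its
    expectation, the sum of the probabilities."""
    rng = random.Random(seed)
    return sorted(p for p, prob in probabilities.items()
                  if rng.random() < prob)
```

Papers assigned probability 1 are always in and those assigned 0 are always out, so only the in-between fraction is actually left to chance.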