Wednesday, May 22, 2024

Peer Review

Daniel Lemire wrote a blog post, "Peer Review is Not the Gold Standard in Science". I wonder who was claiming it was. There is a whole section on peer review in an online Responsible Conduct in Research course we are required to take, which discusses its challenges: "honesty, objectivity, quality control, confidentiality and security, fairness, bias, conflicts of interest, editorial independence, and professionalism". With apologies to Winston Churchill, peer review is the worst form of measuring academic quality, except for all of the others.

Peer review requires answering two questions.

  1. Has the research been done properly?
  2. What is the value of the research?

For theoretical research, the first comes down to checking the proofs, which sounds like an objective check. Here we have a "gold standard": formalizing the proof so it can be verified in a proof system like Lean. That's a heavy burden, so we generally only require authors to give enough details to make it clear that we could formalize the proof given enough time. That becomes subjective, and reviewers, especially for conferences, may not have the time or inclination to check the details of a 40-page proof. Maybe one day AI can take a well-written informal proof and formalize it for a proof system.
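To give a rough sense of what "formalizing" means, here is a toy sketch in Lean 4 using only the core library. The theorem names are made up for illustration, and real research proofs are of course far longer than this. The point is that the proof checker accepts or rejects each statement mechanically, with no reviewer judgment involved.

```lean
-- Lean 4, core library only. The names below are illustrative, not from any paper.

-- A concrete arithmetic fact, checked by computation.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A general statement about all natural numbers, justified by citing
-- an existing library lemma (Nat.add_comm).
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A small logical statement: from proofs of p and q, build a proof of p ∧ q.
theorem and_intro_example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
  ⟨hp, hq⟩
```

If any step were wrong, say claiming 2 + 2 = 5 by rfl, Lean would refuse to accept the file. That is the sense in which a formalized proof leaves nothing for a reviewer to take on trust.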

But the second question is almost entirely subjective. How does the work advance previous research? What value does it give to a field and how does it set up future research? Different researchers will give different opinions. And then there are the people who consciously or unconsciously cheat, whether by helping their friends get papers accepted or by joining citation rings. As we focus on metrics to judge researchers, too many people will game the system to pump up those metrics.

In 2023, NeurIPS had over 13,000 submissions for 3500 slots. Even with the best of reviewers' intentions, it's impossible to maintain any sense of consistency for these high-volume conferences.

Despite the problems with peer review, you'd hate to use a different system, say delegating the reviewing to some AI process, even if it could lead to more consistency. I suspect many reviews are being delegated anyway.

Peer review grew in importance as journals and conferences had to make choices to fill limited proceedings. These days we have the capacity to distribute every paper. So perhaps the best form of measuring academic quality is no review at all.


  1. The problem with no peer review (or with numbers like NeurIPS's, which cause peer review to be of low enough quality that it is very random) is attention. Who is going to use or build on work if it isn't distinguishable amid the thousands of (accepted) papers? From what I have heard, dueling social media posts have replaced solid peer review for attention: if you have a good paper, you post about it. Those with the bigger social media following get the lion's share of credit, even though multiple others probably have similar ideas in their papers. That seems at least as unfair...

  2. We currently rely on decision making by delegation.

    It is not the only possible way. You can have conference attendees vote on which papers they want to hear about.

    You can weight the votes by the perceived reputation of the voters in the community.

    For correctness and other discussions about papers, you can use OpenReview, open and public online. If a paper is of utmost interest, it would draw the attention of more experts, and its various parts would be checked, discussed, and commented on.

    There is potential for a lot of experimentation and innovation.

  3. Reviewers are not obligated to check proofs. If checking the details of a proof helps the reviewer evaluate the paper, then they may do that. The author is responsible for correctness. The reviewer is evaluating whether it should be published. Of course, an incorrect proof should not be published, but that level of checking is not the reviewer's job.