Thursday, May 08, 2025

Using AI for Reviews

I reviewed a paper recently and had to agree not to use AI in any aspect of the reviewing process. So I didn't, but it felt strange, as if I had agreed not to use a calculator to check the calculations in a paper. Large language models aren't perfect, but they've gotten very good, and while we shouldn't trust them to catch every issue in a paper, they are certainly worth listening to. What we shouldn't do is have AI write the review with little or no human oversight, and the journal probably wanted me to check the box to ensure I wouldn't do just that, though I'm sure some reviewers do and check the box anyway.

I've been playing with OpenAI's o3 model and color me impressed, especially when it comes to complexity. It solves all my old homework problems and cuts through purported P v NP proofs like butter. I've tried it on some of my favorite open problems, where it doesn't make new progress, but it doesn't invent fake proofs and does a good job laying out the state of the art, some of which I didn't even know about beforehand.

We now have AI at the level of new graduate students, and we should treat it as such. Sometimes we give grad students papers to review for conferences, but we need to look over what they say afterwards, and we should treat these new AI systems the same way. Just because o3 can't find a bug doesn't mean there isn't one. The analogy isn't perfect: we give students papers to review so they can learn the state of the art and become better critical thinkers, in addition to getting help with our reviews.

We do have a privacy issue. Most papers under review are not for public consumption, and if uploaded into a large language model they could become training data and be revealed when someone asks a relevant question. If we use AI for reviewing, we should ideally use a system that doesn't train on our inputs, but both the probability of leakage and the amount of damage are low, so I wouldn't worry too much about it.

If you are an author, have AI review your paper before you submit it. Make sure you ask for a critical review with concrete suggestions. Maybe in the future we'd require all submitted papers to be AI-certified. It would make conference reviewers' jobs less onerous.
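
If you want to try it, here's a minimal sketch of what that might look like through OpenAI's Python API; the model name and file name are placeholders, so adapt them to whatever system you have access to. As I understand it, API inputs aren't used for training by default, which also helps with the privacy concern above.

```python
# A minimal sketch of a pre-submission "critical review" prompt.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model and file names are placeholders, not a recommendation.
from openai import OpenAI

client = OpenAI()

with open("draft.txt") as f:  # your paper in plain text (or LaTeX source)
    paper = f.read()

instructions = (
    "Act as a critical but constructive conference reviewer. "
    "Point out errors, gaps in proofs, unclear definitions, and missing related work, "
    "and suggest concrete improvements. Critique, don't summarize."
)

response = client.chat.completions.create(
    model="o3",  # placeholder: use whatever model you have access to
    messages=[{"role": "user", "content": instructions + "\n\n" + paper}],
)

print(response.choices[0].message.content)
```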

For now, neither humans alone nor AI alone is the best way to do conference reviews. When you do a review, working with an AI system as an assistant will lead to a stronger review. I suspect in the future, perhaps not that far off, AI alone might do a better job. We're not there yet, but we're further along than you'd probably expect.

9 comments:

  1. You have no right to hand the submitted paper's content over to the AI thieves.

  2. Many AI systems from the more serious players have an option in the paid version to not use your data for training.

    This exists in ChatGPT as well; you can disable it in the settings.

  3. That wasn't me (I sometimes forget to put my name in, sorry), but it's exactly right.

    Copyright isn't perfect*, but intellectual property (stuff that humans, not random text generators, create) needs to be protected, and our AI bro friends are not particularly respectful of other people's property rights.

    *: Thanks to (as I understand it) Disney insisting on lengthening the protection term, said term is way too long. Also, who owns the copyright can be hard to determine. As a translator, all of my work was work for hire and not mine, which was fine. But scientists and academics are also largely producing "works for hire", since they get paid a salary by their university or employer.

  4. Rodney Brooks makes some interesting points about AI https://rodneybrooks.com/predictions-scorecard-2025-january-01/ and so does Ed Zitron https://www.wheresyoured.at/wheres-the-money/

  5. If you actually believe AI is currently at the level of a grad student, most faculty would behave completely differently than they currently do. Faculty would no longer bother recruiting grad students (or decrease the extent to which they do drastically), and instead fulfill the function of human grad students just by prompting your favorite chat bots to have research discussions or solve problems. This would be much more efficient.

    Currently faculty don’t do that. How do you square this irrational behavior with your claim?

    Replies
    1. I'm talking about a model released last week, and it takes years for technology to change behavior. Also, AI can't (yet) do the significant original research we require for PhD dissertations.

      More importantly, the main role of the PhD program is educational: to train the next generation of computer scientists. I have supervised many students whose research did not contribute to my own, as well as many whose research did.

  6. I really disagree with this post. I am obviously blown away by the ability of LLMs to synthesize known knowledge, answer many homework questions, etc. Someday (even soon), what you suggest might be possible. However, I have yet to be convinced that they can solve things they haven’t seen a million examples of: I say this because I regularly write tricky but still easy homework questions that fool all the fancy new models. What strikes me as a truly bad idea is to start rearranging our research tastes to align with something that is trained to predict the next word based on the content of the internet.

    This is a sore spot for me because I am 95% sure I received a review from an LLM on a conceptual paper in which we suggested a new model/parametrization of difficulty of approximation. The review was lukewarm, neither supportive nor damning. The dominant vibe was its inability to be an arbiter of taste, one way or another. On the flip side, it’s very conceivable to me that AI reviewers are game-able. Do we know for certain that there aren’t ways to change a paper superficially to make it likable to an LLM? If I fill my paper with grandiose statements about the implications of my theorem, do we really trust the model to distinguish between bluster and true achievement? I have heard similar complaints from many others. To your point that you can have AI play the role of an assistant, I would push back and say it is too tempting to trust and be biased by what it has to say.

    A final point is that if we hand over the responsibility of evaluating new work to the machine, then what are we even doing in a field that usually has no practical implications (let’s be honest). To produce theorems with the use of a computer that humans evaluate as beautiful is one thing; to produce theorems that a computer evaluates as beautiful and humans don’t understand strikes me as a waste of everybody’s time.

    All to say, I think we should emphatically forbid the use of AI for academic reviewing.

  7. There was an article about companies using AI in customer support, which made it much less efficient and pleasant for the customers.

    But who cares, they can lay off people and save money and they can claim to be doing AI and sell the hype to investors.

    If these systems were so good, they would first replace the CEOs, CFOs, and COOs.

  8. While we're at it, why not have AI evaluate tenure cases and watch job talks for us? Think of the time savings!
