Wednesday, May 27, 2026

Authorship in the AI Age

The technical paper for the Erdős Unit Distance Problem lists only "OpenAI" as an author. When Bill posted on Sunday about the Erdős distance problems, he mentioned the names of OpenAI researchers who prompted and checked the proof. Sebastien Bubeck of OpenAI reached out directly, and later in a comment, to say this was really an OpenAI-wide achievement and that we shouldn't single out just the people at the end of the funnel. I agree with Bubeck for this paper, but it brings up some challenging authorship issues for the future.

Most publishers follow the position of the Committee on Publication Ethics that AI tools cannot be listed as an author of a paper.

AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work. As non-legal entities, they cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements.

The paper doesn't list the model as an author, rather the organization (OpenAI) that produced it. The closest analog might be the paper that verified the Higgs boson, which has only "ATLAS Collaboration" on the author line and lists the nearly three thousand authors at the end of the paper. If OpenAI were to publish their paper, they should list everyone who worked on the model with at least a corresponding author to handle the legal issues mentioned in the COPE position.

There are also all the mathematicians whose work was fed into the training data, but I wouldn't list them even if we could. We don't add as authors those whose work we rely on; rather, we cite them, and the OpenAI paper does have a healthy references section.

Shortly after the OpenAI announcement, Princeton Professor Will Sawin improved the lower bound and gave an explicit exponent for it in a paper that currently has him as sole author. Often, but not always, when a paper improves a bound of a recently announced result, the papers get merged. I don't expect that to happen here, but if it did I would have no idea how to handle the authorship of the combined paper beyond just listing Sawin and OpenAI as the authors.

OpenAI used an internal model for this work, but one would hope they will eventually release this model to the public. If someone outside of OpenAI uses a released model to answer another open math question, even with a single prompt, do they need to add OpenAI as a co-author? I think not, as the model now becomes a tool when released. The researcher should disclose the role the model played, but the model's creators no longer need to be authors.

How about a thought experiment? Suppose you are working with a student by email on a paper and you put in equal effort so you will both be co-authors. Then you discover the student was mostly just feeding your emails to ChatGPT and sending you back the responses. Now who should be on the author list? The student should come off, but it would feel strange writing this as a solely authored paper.

What if someone builds an AI mathematician that doesn't take any prompts? It just reads papers and occasionally writes its own. Who is the author? That "someone". What if they just used a regular AI agent and gave it the prompt "You are a research mathematician. Go read papers, prove theorems and write up your results." Maybe we'll have AI math journals filled with papers written, edited and refereed by AI models. I wonder how we'll COPE with that.

8 comments:

  1. @Lance, thanks for this post and for addressing some of the key thoughts shared when I made the inquiry in comments about transparency.

  2. Did Erdős pose his problems to get at their solutions (AI certainly excelled at deriving a negative answer to one of his famous problems, and a human would feel triumphant if they had discovered such a counter-example) or with the intention of motivating new lines of mathematical reasoning (which does not seem to be the case, per the assessment of the authors in the companion paper)?
    In a big-picture sense both are equally useless human endeavors whose main purpose was to serve as recreational brain teasers for people who have the luxury of time to expend on such efforts rather than scrambling for basic survival needs.
    One of the big favors that AI has done is to reflect back to humans that machines can "waste time" in far more effective ways than humans ever can, lending a primal sense of urgency to the question of what exactly people are needed for on this Earth.

  3. A new complication. Anthropic just announced an alternative disproof of the conjecture (with a weaker bound than Sawin's proof). So now we have "Claude" as an author :)

  4. Someone publishes a paper based on pure simulations on randomness (say, largest prime). The author is the person. Here it should be the person who gave the prompt to the AI.

    Replies
    1. According to the paper, the prompt itself was generated by AI. So, I suppose the person that wrote the prompt that wrote the prompt should be the author.

  5. I found the excuse to not list the humans total bs.

    There are a lot of papers that get published and a lot of people work on them and all of them get cited. They could have done that here.

    The reason they did not is purely deceptive marketing pre-IPO.

  6. There are two different aspects here. One is the right way to give credit for a discovery. The other is the impact of the cost of discovery going down significantly, which will strain various processes we previously had.

    Francis Fukuyama shared some thoughts recently that one of the challenges we are having in liberal democracies is that various social and government processes designed with a particular load expectation are now failing under a massive increase in load. He gave as an example the requirement for agencies to review and address all comments from citizens.

    We will need to step back and revisit the processes and systems and the purposes they were intended to serve.

    Publishing papers with author names served various purposes that might not make sense once proving important new results and verifying them becomes easy. It is important to remember that the scientific publication system is a relatively recent development; it was common not to publish at the time of Gauss, ... At some point it might not be valuable enough to pay many people to do research. Research might also become centralized in the hands of a few who have the financial resources to afford more compute, increasing inequality in research and in society in general.

    Technology tends initially to be a concentrator rather than an equalizer. It might increase total utility and production, but in a very uneven way.

    Lots of new challenges that we are going to need to deal with during this AI transition.

  7. This post really piqued our curiosity about the ethical dimensions of AI authorship, so much so that we ended up doing a full write-up on our ethics blog (link in name)! The 411 of our argument is that academics debating this issue need to clarify whether they care about this as an **ethical** (as COPE's name suggests) or **legal** (as the content of the COPE statement implies) matter.

    Needless to say, as an ethics consultancy, we think the ethical aspects are more important, because laws and legal regimes change, but right and wrong are timeless!
