Computational Complexity: What is "PhD-Level Intelligence"?

Wednesday, September 17, 2025

What is "PhD-Level Intelligence"?

When announcing Open-AI's latest release last month, Sam Altman said "GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert." Before we discuss whether GPT-5 got there, what does "PhD-Level intelligence" even mean?

We could just dismiss the idea but I'd rather try to formulate a reasonable definition, which I would expect from a good PhD student. It's not about knowing stuff, which we can always look up, but the ability to talk and engage about current research. Here is my suggestion.

The ability to understand a (well-presented) research paper or talk in the field.

The word "field" has narrowed over time as knowledge has become more specialized, but since the claim is that GPT-5 is an expert over all fields, that doesn't matter. The word "understand" causes more problems, it is hard to define for humans let alone machines.

In many PhD programs, there's an oral exam where we have previously given the candidate a list of research papers and they are expected to answer questions about these papers. If we claim a LLM has "PhD-level" knowledge, I'd expect the LLM to pass this test.

Does GPT-5 get there? I did an experiment with two recent papers, one showing Dijkstra's algorithm was optimal and another showing Dijkstra is not optimal. I used the GPT 5 Thinking model and GPT 5 Pro on the last question about new directions. A little more technical answers than I would have liked but it would likely have passed the oral exam. A good PhD student may work harder to get a more intuitive idea of the paper in order to understand it, and later on extend it.

You could ask for far more--getting a PhD requires significant original research, and LLMs for the most part haven't gotten there (yet). I've not had luck getting any large-language model to make real progress on open questions and haven't seen many successful examples from other people trying to do the same.

So while large-language models might have PhD-level expertise they can't replace PhD students who actually do the research.

19 comments:

David Marcus8:48 AM, September 17, 2025
I think the purpose of the oral exam is to test understanding. This works pretty well for humans, since humans can't remember lots of stuff without understanding it. By the way, when I took my orals, it was based on entire courses, not specific papers.
ReplyDelete
Replies
Anonymous9:41 AM, September 17, 2025
I think it's reasonably easy to define it: PhD-level intelligence is an entity who can be directed to write a PhD thesis (I'll take math/TCS as the illustrating example, the story for humanities is an exercise for the reader). You tell the bot "this is some research paper, try to generalize it to this case", then the bot asks the kinda clarifying questions a bright student might ask, then the bot writes the paper with the generalized theorem.

Of course, this would depend on the PhD advisor, as a Fields Medalist might ask the student to do less "incremental"-ish contributions, but I would be satisfied with the minimum effort PhD thesis that contains results publishable in non-predatory math journals.
ReplyDelete
Replies
Anonymous10:28 AM, September 17, 2025
The newer Dijkstra paper cites the older one so I would expect a LLM to adequately navigate the tension there.
ReplyDelete
Replies
Jim Hefferon11:59 AM, September 17, 2025
Is there a sentence missing from the paragraph that begins with "Does"? There is a period after the word "about" that makes me wonder.
ReplyDelete
Replies
Dave Doty12:39 PM, September 17, 2025
I similarly interpret the "PhD-level intelligence" claims as being a bit overblown... though not by much. Similar to your observation, I think GPT5 and Claude Opus 4.1 are as skilled as any PhD student in what they can accomplish over a very short time horizon. But the key with these models now is getting them to be coherent over a long time horizon. Even the best PhD student requires several years to hammer away on a few problems in a coherent way, slowly learning and building ideas. In any 30-minute conversation, GPT5 and Claude 4 seem to me to be as insightful and skilled as any good PhD student. But lacking the ability to focus on one single problem for months to produce a paper, and then to repeat that a few more times to keep learning and cement the skills, is currently out of reach.

It's similar to Claude Code being very useful for implementing certain features in software quickly, and with guidance from a human software engineer who's paying close attention and course-correcting, one can build a large complex app faster than one human alone. But currently it seems to be out of reach for the LLM to build the large, complex app completely autonomously, for similar reasons that it cannot focus long enough on one mathematical problem to do original paper-worthy research.

I know this "time-horizon" idea is a focus of the major labs (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) so I'll be curious if the next versions will continue making progress on time-horizons. Even if current models lose focus after a short time, certainly I don't see any argument why it's fundamentally difficult to scale time horizons to months, though maybe fundamentally difficult with the current transformer architecture, with its quadratic dependence of computation speed on context window length. So perhaps some new architectural discovery is needed for this problem to be solved.
ReplyDelete
Replies
E9:24 PM, September 17, 2025
Hard to say what entails a good oral exam. Recall Terry Tao's disappointing performance during his oral exam? What was he asked, and why it a disappointment? Would ChatGPT have performed better?

There's a real risk of performance and brain activity atrophy using ChatGPT; we've seen analogous technologies that supposedly "help/assist/support" tedious or menial tasks and really all they did is actually rob us of certain brain activities that have critical (yet un-quantifiable) second and third order impact on critical/creative thinking.

Getting back to your point, I'd assume that ChatGPT can actually help the clueless PhD student in attacking open problems. Surely it won't provide a flawless nor correct resolution of famous conjectures, but "smaller problems" perhaps more concrete problems it might be able to crack and hence deprive you of your mental workout! Welcome, Miss Atrophia!

What about Solitude is the true school of genius ... in which Chat did this phrase walk out of the room?
ReplyDelete
Replies
David in Tokyo3:07 AM, September 18, 2025
"The word "understand" causes more problems, it is hard to define for humans let alone machines."

Sure. But. We normally think of "understanding", at it's absolute mininum, to be some sort of _symbolic_ reasoning (that is reasoning about named concepts*), and LLMs don't do that. If you ask an LLM to multiply 2 integers, the probability of the answer being wrong increases with the length of the integers. Huh? Is this a joke? The LLM technology can't deal with multiplication as a concept, because it doesn't deal with concepts (it deals with tokens). So they look up the answer. In no reasonable sense of the word "understand" do LLMs "understand multiplication".

Ditto on rotations. (See the discussion of tic tac toe on Andrew Gelman's blog.) Long story short: if you can't reason about rotations, you can't do group theory, which is rather basic math, so it's insane to claim that LLMs "solve math problems".

As always, if you ask "What can X do?" (or "Can X do Y?") the answer should always be informed by a basic understanding of the operation of X, and, in particular, shouldn't be something X cannot, in principle, do. (Or has been demonstrated not to do.)

*: Again, back in the 70s and 80s we thought we were figuring out how to do this. It turned out to be harder than we thought. The historical position of the LLM technology is to get around this failure without actually doing the work of figuring out how to do symbolic conceptual reasoning. Which is why some of us see it as It's an off-ramp from any sort of reasonable path towards progress on understanding what understanding actually is.

ReplyDelete
Replies
Anonymous2:06 AM, September 19, 2025
The commercially available ones might not, but private ones, which are much more compute intensive, are getting there, e.g. Deepmind's co-scientist.

The idea behind them is generate a bunch of ideas and then do a search in the tree based on them. It is essentially AlohaGo's algorithm adopted to use LLM. You still need a verifier to discard bad branches, but overall with a lot of compute, I would not be surprised if they can solve new problems.

Whether it is going to be cheaper than hiring a PhD student, they don't seem to be close to that. But the costs in AI have been going down fast so who knows what we would have 2 years from now.

Same is true about programming, if you can have a verifier. The problem of course is that program verification is computationally hard, and proof search for correctness using the same technique is still very expensive.

When Deepmind says they used their systems to solve some new problem, what they mean is that they spent a large chunk of Google's massive compute resources for a few months on something with hundreds of top notch engineers and ML researchers helping it to have a few successful runs.
ReplyDelete
Replies
Anonymous2:13 AM, September 19, 2025
ChatGPT 5 was pretty underwhelming overall.

There is a lot of hype in the industry right now, afterall they are making billions from investors by selling the dream of AGI.

The model quality itself seems to have mostly plateaued for the past year, GPT 5 is not much better than GPT 4 or o3.

That is why there is a lot of interest in how to use the models in certain new ways, like thinking models (which are essentially the same model, but generating and then being used to comment on what it generated in a loop, etc.).

ReplyDelete
Replies
Latif Salum12:08 AM, September 23, 2025
"Logic will get you from A to B. Imagination will take you everywhere."
Albert Einstein.

Can AI imagine?
ReplyDelete
Replies

Add comment