[Image: Watching Mathematicians at Work (AI generated)]
The Smithsonian Natural History Museum has a FossiLab where visitors can peek through windows to watch scientists prepare fossils for conservation. Maybe we should have a similar exhibit at math museums or universities. How else can we learn what mathematicians do?
In 2025, artificial intelligence has achieved gold medal status at the International Mathematical Olympiad, but so far it has contributed only modestly to finding new theorems. Of course, finding and proving new theorems requires a different set of skills than solving competition problems, but it goes further than that.
The Internet has considerable text and video on how to solve math competition problems that machine learning systems can train on. Mathematical research papers, on the other hand, usually contain little more than theorems and proofs, maybe some intuition. Rarely do papers go into the thinking process and the false steps one takes before finding the proof. For some problems I've spent weeks proving a theorem, but only the last day's work gets written up.
Now I doubt many mathematicians would give up their privacy and time to train AI systems to take over their jobs, but just suppose we wanted to do so. We could equip every mathematician with a camera, recording every mathematical conversation and everything they write, especially the ideas that don't pan out. We could transcribe it all and feed it into an ML system. But it probably wouldn't be enough.
The trouble is that most mathematical breakthroughs just happen inside people's heads. If you ask a mathematician how they came up with the clever idea that led to a major new result, they can rarely truly explain the process behind it. Not unlike neural nets.
If machines can't learn to prove theorems by watching mathematicians, perhaps they'll have to take the route mathematicians do: a grad school slog toward PhD research, learning from endless failure.

If I remember correctly, Terry Tao has said several times that to train AI to do research, we should have a database of failed attempts.
Following the "gold medal status" link, it says, "We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions." For a human, you would just hand them the test.
Since LLMs are best described as "loose grammar engines", and since they don't understand addition, I'm skeptical that Gemini understands math, regardless of its purported score on the IMO. Maybe you don't have to understand math to be a useful assistant. But many tasks require understanding.
The "gold medal" bit is more than a bit funky.
First: "for example, in the recent 2025 IMO, 72 gold medals were awarded (score >= 35), "
The LLM was one of a 117-way tie for 67th place. (Or something like that, maybe a 67-way tie for 47th place.)
It found online solutions for the five problems everyone solved, and didn't for the one only the smart kids solved.
To quote Peter Shor (yes, that Peter Shor):
"I have noticed that LLMs perform really well on math questions that have a bunch of solutions on the Internet. But if you ask them a question that only has one or two solutions posted, or one without any known solutions, they tend to fail miserably.
So they seem to be cribbing from existing papers, but they are really, really good at it. But this means the dreams of AI proponents that AI will now solve all our scientific problems are not likely to be realized."
Not so long ago, the primary responsibility of academics toward society (which has assiduously funded their efforts over many millennia) was to serve as guardians of rational thought and to vigorously repudiate charlatanry.
It now seems that most everyone wants to "take it easy" and "reap the moola while it lasts".
Seems problematic at many levels...
If humans can concoct "novel ideas" after exploring false leads and dead ends inside their heads (not just humans; all animals can do this to different extents), what exactly is preventing these AI apparitions from doing the same?
Why have AIs observe how humans falter and recover to replicate this ability (even if such a thing were possible) instead of formulating their own approaches to perform exploration?
We have numerous examples of mathematicians who proved major results all on their own, without much formal training or "watching how others do it"; that was the very definition of a standout mathematician in the past.
Going even further, asking a system that intrinsically defies any attempt at proving its internal correctness and consistency to magically begin proving theorems is simply outlandish; sure, the same can be said of humans as well, but we know that error-prone mathematicians are quickly discredited as quacks.
You may already know this, but Timothy Gowers has recorded himself solving a few math problems because it might help people understand the mathematical process. https://www.youtube.com/@TimothyGowers0
The underlying reason why all artificial reasoning (learning) approaches are doomed to fail is the vast difference in distributional characteristics between "positive" (i.e., correct) and "negative" (i.e., incorrect) reasoning (data) samples; we can easily infer that the positive side (say, cat) has certain well-defined, consistent characteristics (however complex) that facilitate modeling, while the negative side (NOT-cat) is manifold more complicated (essentially undefined).
Nothing new here; this is an age-old conflict between generative modelers (of which deep-learning models are only the latest flavor) and discriminative modelers, who have pretty much given up hope due to the intractability of modeling the "negative" side.
Also why (as Marcus mentions above) AI models crave carefully curated (in general, *correctly* annotated) data samples and essentially discard the negative samples (other than to gather some generic feature vectors).
The only system that undeniably demonstrates the ability to incorporate both sides is life; because life interacts directly with the real-world so knows what represents (in fact, can imagine and construct) negative examples in the real-world.
We are not going to get anywhere close with deep-learning (or anything else) that is completely devoid of real-world experience.
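A minimal sketch of that positive-vs-negative asymmetry, using made-up two-dimensional "feature" data (the class shapes and numbers below are purely illustrative, not anything from the discussion above): a single fitted model summarizes the positive class compactly, while the same model family tells you almost nothing about its complement.

```python
# Toy illustration of the positive/negative asymmetry (made-up 2-D "feature" data).
import numpy as np

rng = np.random.default_rng(0)

# "cat": samples drawn from one tight, well-defined distribution.
cats = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(500, 2))

# "NOT-cat": everything else -- other classes, clutter, arbitrary noise.
not_cats = np.vstack([
    rng.normal(loc=[-3.0, 1.0], scale=0.5, size=(200, 2)),  # some other class
    rng.uniform(low=-10.0, high=10.0, size=(200, 2)),        # arbitrary clutter
    rng.normal(loc=[6.0, -4.0], scale=2.0, size=(200, 2)),   # yet another mode
])

def gaussian_fit_spread(samples):
    """Fit a single Gaussian and return the determinant of its covariance,
    a crude measure of how much space the fitted model has to cover."""
    cov = np.cov(samples, rowvar=False)
    return np.linalg.det(cov)

# The positive class is compact and easy to summarize; the complement forces
# the same model family to smear itself over essentially the whole space.
print("spread of fitted 'cat' model:    ", gaussian_fit_spread(cats))
print("spread of fitted 'NOT-cat' model:", gaussian_fit_spread(not_cats))
```

With these toy numbers the complement's fitted spread comes out orders of magnitude larger than the positive class's, which is the intractability the comment points to.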
Reminds me of a previous guest post that discussed different settings for the mathematical mind to turn creative in different ways. Interesting stuff.
In my case this would be a webcam of me crying at a desk and occasionally knocking over a stack of books.
Do you believe LLM-based systems will be able to resolve P vs. NP in the next 5 years? If not, why not?
No, I don't think we'll see it resolved in the next five years. Given that we don't even have a good approach to P vs. NP, I doubt even AI will find the new tools needed to prove P ≠ NP. But I'd be happy to be proven wrong.
No. LLMs, when they work, are good at putting together phrases that are already out there, but not good at getting the NEW ideas which will be needed.
https://www.linkedin.com/pulse/ai-bubble-kamal-jain-vnxtc