Slide from Lev Reyzin's JMM talk "Problems in AI and ML for Mathematicians". Reyzin is paraphrasing Telgarsky. Posted with permission.
We Decide Our Future: Mathematics in the Age of AI
With little computational complexity at the conference, I attended many of the AI talks. You could divide them into Math for AI and AI for Math. Mathematics of course plays critical roles in machine learning optimization, and several talks focused on provably good learning algorithms, though they overemphasized the importance of such guarantees. Do you get on a plane because you understand the physics or because air travel has a strong safety record?
But let's focus on AI for Math, which generated the most discussion and angst.
I started the conference sitting on a panel on the challenges of peer review in mathematics. Math doesn't have the replication crisis that dogs other scientific communities. But math does have papers with very specialized, long, detailed proofs and few qualified referees willing to check them over with care.
We're not there yet, but in the "near future" we ought to have AI systems that can verify well-written proofs by compiling them using proof assistants like Lean. Referees could spend less time on checking proofs and more on deciding whether the model, theorem and/or techniques merit publication. We might get to the point that you couldn't even submit a paper to a journal until the proofs have been verified.
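To make the idea concrete, here is a toy example of the kind of machine-checkable statement a referee could hand off to a proof assistant. This is a minimal sketch in Lean 4, assuming a standard toolchain; the theorem and its name are my own illustrative choices, not from any talk:

    -- Claim: the sum of two even natural numbers is even.
    -- If this file compiles, the proof is certified correct.
    theorem even_add_even (m n : Nat)
        (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
        ∃ k, m + n = 2 * k :=
      let ⟨a, ha⟩ := hm
      let ⟨b, hb⟩ := hn
      ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩  -- 2*a + 2*b = 2*(a+b)

A referee would still judge whether the statement itself is interesting and correctly captures the informal claim; the machine only certifies the proof.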
It's what comes next that really spooks mathematicians. While AI has made significant progress in solving competition problems with DeepMind's AlphaProof and OpenAI's o3, it has only played minor roles in developing new ideas for theorems. Eventually AI systems will find critical steps for publishable theorems. Who do we give the credit to? When AI systems become stronger than typical PhD students, what kinds of problems do we give the students?
We'll get plenty of AI slop, flooding journals with mathematical papers that technically prove new theorems but don't offer any particularly interesting or illuminating insights. But we'll also get mathematicians who can unleash an army of virtual grad students to find new connections between different subfields, or make significant progress on major open problems in the field.
Some mathematicians don't want AI trained on their papers and using their techniques and theorems, even though they wouldn't have problems with other mathematicians doing so. In any case, they might not have much choice as many publishers are making deals with AI companies.
The future of mathematicians might follow that of artists and programmers. The best will do fine since they can find proofs beyond what an AI can do. Mathematicians who know how to ask the right questions can harness AI to make themselves far more productive. All mathematicians will have to navigate this brave new world.
"At least they'll still need us to teach the courses," one mathematician told me. Don't be so sure.
Attending a lecture is like going to the theater, watching a recorded lecture from home is like watching a movie, and of course you can also read a (text)book. As long as people go to theaters, there will be lecturers. But I agree that large parts of education could go to AI - hopefully grading homework assignments will be among them...
It's a strange experience to constantly hear how AI will take over all intellectual work, while all my experiences of actually using it show it can't be used for anything beyond bad art and platitudes.
Watching AI mania and its champions is as sad as watching the LA wildfires eviscerate vast stretches of land where people populated known wildfire-prone regions; both will only end when they have fully exhausted their resources, one fuel, the other capital.
Not LLMs on their own. AlphaProof, for example, builds upon technologies used for AlphaZero, which certainly plays a mean game of chess.
Do we have a definition of 'intelligence' other than replicate what all living organisms (and indeed all the cells inside them) do already (and have been doing constantly for >> 0.5B yrs)?
How does one claim an algorithm for a problem that defies a definition?
It's too complicated; intelligence is whatever AI savants say it should be. Nothing to see here, keep calm and carry on.
Actually, the neural net stuff has fizzled in chess. The world computer chess championships require each program to run on exactly the same hardware (a generic peecee of some sort), and the last time I checked, an ordinary alpha-beta program won, with the trailing 5 or 6 top programs consisting of one neural net, one combined neural net/alpha-beta, and a bunch of alpha-beta programs. (I seem to recall there was some BS about AlphaZero, running on a server farm, beating Stockfish (one of the better chess programs) running on a peecee.)
Since the Go board has the same geometry as "neural nets", neural net Go programs are amazing. Here, Zen 7 (a commercially available non-GPU program) running on a recent high-end peecee plays really nasty vicious top-level human Go, and KataGo (a reimplementation of AlphaGo, plus a few generations of improvements), with the help of a midrange GPU (3080), trounces it horrifically.
But, yes. Gradient descent search does neat things in high-dimensional search spaces, and the head of Google's team is one seriously flipping brilliant bloke (see AlphaFold). But I'm still not fond of calling gradient descent "AI".
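For readers who haven't seen it, here is a minimal Python sketch of the gradient descent being described; the objective (a simple quadratic), dimension, step size, and iteration count are all arbitrary illustrative choices:

    import random

    def gradient_descent(dim=1000, lr=0.1, steps=100):
        # start from a random point in [-1, 1]^dim
        x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(steps):
            g = [2.0 * xi for xi in x]                  # gradient of f(x) = sum(xi^2)
            x = [xi - lr * gi for xi, gi in zip(x, g)]  # step downhill
        return sum(xi * xi for xi in x)                 # f at the final point

    print(gradient_descent())  # prints a value near 0

Modern systems differ in scale and in the functions being minimized, but the core loop is this simple.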
Stockfish is the strongest chess engine. It uses a hybrid system. See https://en.wikipedia.org/wiki/Top_Chess_Engine_Championship and https://en.wikipedia.org/wiki/Stockfish_(chess)
GothamChess has a more recent video where ChatGPT at least played legal moves.
Correction: In the more recent video, ChatGPT started out playing legal moves, but then played a whole bunch of illegal moves. E.g., jumping a rook over a pawn to capture, having a knight capture a pawn it was next to, putting pieces it didn't have onto the board, e.g., giving itself a new rook, putting a new pawn on the seventh rank so it could queen it next move, putting a new pawn on the eighth rank so it immediately became a queen. Despite allowing it to play these moves, Stockfish still beat it.
@David Marcus -- the Wikipedia article you linked states that "In July 2023, Stockfish removed the hand-crafted evaluation and transitioned to a fully neural network-based approach."
Regardless, is it fair to think about the hybrid approach as a heuristic/approximation for the neural-network based approach? E.g. neural nets will always win, given sufficient resources (data, power, compute), but when we're resource constrained, we take shortcuts and optimize?
In a similar way that a merge sort might use a bubble sort for a small enough dataset?
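That analogy can be made concrete. Here is a minimal Python sketch of a merge sort that hands small subarrays to bubble sort; the threshold of 16 and all names are arbitrary illustrative choices:

    def bubble_sort(a):
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    def hybrid_merge_sort(a, threshold=16):
        if len(a) <= threshold:
            return bubble_sort(a)  # cheap shortcut on small inputs
        mid = len(a) // 2
        left = hybrid_merge_sort(a[:mid], threshold)
        right = hybrid_merge_sort(a[mid:], threshold)
        # merge the two sorted halves
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]

Real libraries make the same trade; Python's built-in Timsort, for instance, switches to insertion sort on short runs.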
I missed that Stockfish had moved to a fully neural-network evaluation. I don't know any more than what the Wikipedia article says.
Delete"Referees could spend less time on checking proofs". I always thought it wasn't the referee's job to check the proof, except in special circumstances like solving a major problem.
That's true in CS theory conferences, but math journals still want their proofs checked. Some journals will start with a quick review process where an expert will check that a result is interesting before sending it out for a proof check.
https://www.ams.org/journals/notices/202409/rnoti-p1183.pdf
Delete"They don't need me to carefully check their proofs"
"Given that you're already reviewing a paper, why not also check the details of the proofs? If you have the time and inclination for this, by all means. But as your career progresses and your responsibilities accumulate, you will have less time while simultaneously finding yourself qualified to review more papers. At some point, certifying correctness will become infeasible. Sure, authors will be grateful if you notice a mistake in a proof, just like they will be grateful if you point out a typo, but this isn't your job."
That's one person's take. No one in the room during my peer review panel suggested not checking the proofs.
You should suggest it.
https://maa.org/guide-for-referees/
Delete"We ask referees to review the articles and notes to help ensure that the math is accurate. However, it is the author, not the referee who is ultimately responsible for the content of his/her article. Referees are encouraged to offer constructive suggestions but not to serve as boards of certification or as ghostwriters for careless authors."
This is for the MAA journals, which are probably quicker to review than more technical journals. But it does not say "check all the proofs".
This is ridiculous. In math it is paramount for journal reviewers to check proofs. Often submitted papers are thrown back because they use a result from another paper that hasn't been published yet, so the claimed results might be false. This is of course not a perfect system, since accepted papers might also contain flaws, but it gives an extra level of confidence.
When one formalises a proof in a proof assistant, it is still important to check that the definitions and theorem statements match what is intended or written down with pen and paper. At scale this is a social activity of maintaining good libraries of mathematics and TCS. I have been working towards this, but have a hard time convincing my fellow TCS folks that they actually need to care about the rigour of their proofs. This has to be a community effort, similar to mathlib.
I know a case where someone solved a conjecture AS STATED but it was nothing like what was INTENDED. The paper should have advertised itself as showing a flaw in the conjecture, not as solving the conjecture. AI may have this problem: solving conjectures as stated, not as intended.
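A hypothetical Lean 4 sketch of how that stated-versus-intended gap can bite in formalization (all names here are invented for illustration): a typo makes a definition unsatisfiable, so any claim about it becomes vacuously provable, and the proof assistant certifies it anyway.

    -- Suppose we meant to define some interesting property, but a typo
    -- leaves the definition unsatisfiable (no n satisfies n < n).
    def Interesting (n : Nat) : Prop := n < n

    -- Every statement about Interesting numbers is now vacuously true,
    -- and Lean certifies this "solution" without complaint.
    theorem all_interesting_are_odd (n : Nat) (h : Interesting n) :
        n % 2 = 1 :=
      absurd h (Nat.lt_irrefl n)

The proof is formally valid; the problem is that the definition never captured the intended conjecture in the first place.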
ReplyDeleteRelevant comic that just came out:
https://www.smbc-comics.com/comic/applied
That makes the point that it goes the other way as well. As more math applications of AI emerge, the additional interactions with AI via rigorous symbolic formulations will also go a long way toward improving AI.
Here's some fallout from that meeting that you didn't mention:
https://siliconreckoner.substack.com/p/the-frontier-math-scandal
TLDR: It seems that OpenAI trained their latest model on data they promised they wouldn't. It still only got 25%, but that would have been a humongously amazing score had they not seen most of the test data in advance. Oops. Theranos-level sleaze comes to AI.
(It irritates me no end that someone else got to make the Theranos comparison before I did.)
Since this is a site devoted to computational complexity, I would be interested to see what you think about the article by Dean and Naibo at https://arxiv.org/abs/2408.03345, "an updated version of a traditional argument that limitative results from computability and complexity theory show that proof discovery is an inherently difficult problem," which draws implications for AI proof discovery.
We need to be careful not to confuse the fact that we can't find proofs of all mathematical truths (which we've known since Gödel) with the question of whether AI can help us find new approaches or insights into proofs of new theorems. We don't need an AI that's an oracle for the halting problem, just an AI that's a strong PhD student. We're not there yet, but getting close.