AI and Vacation
I'm back from my German vacation. This was my first AI vacation, by which I mean the first trip where I used AI to navigate a foreign country. Taking a picture of a hand-written menu board, not just to translate the dishes, but to get a description of each. Visiting a palace with a German-speaking tour guide and translating in real time. Even the more mundane task of taking pictures of buildings to learn about them.
On the TV I saw something about Künstliche Intelligenz, and Chatty told me it was German for Artificial Intelligence. The Germans use KI instead of AI partly because AI can be confused with Ei (egg). At least that's what AI tells me, so it must be true.
AI and Math
At the beginning of my vacation, Google announced that they achieved gold medal status in the International Mathematical Olympiad. An impressive achievement, though Terry Tao makes a good point that comparing an AI system to time-constrained high-school students is not an apples-to-apples comparison.
I already find talking to AI about math topics quite useful, though it's like talking to an early PhD student. Sometimes they just say things that aren't correct, but usually they are. The reasoning models are particularly good at finding holes in P v NP proofs. For example, here's the conclusion of ChatGPT o3-pro's review of the paper from Eric's guest post.
The paper is a reminder that lower‑bound proofs live or die on the exact breadth of the algorithmic model they exclude—too narrow and the result is unsurprising, too broad and the proof tends to break. At present this work sits in the first category.
What I want to see is AI come up with a solution to an open math problem, a true new insight beyond just some optimized search. I'm not looking for P ≠ NP, just some result that would be publishable in a respectable conference or journal, even just a new completeness result. We haven't really seen that yet, but I suspect we will soon and then we can figure out where math goes from there.
AI and Bill
In his presidents question and solution posts, Bill states that AI had failed to find the right answer to his problem. Back in June, I saw Bill's draft post and tried using AI to solve it.
AI initially failed the test but for a good reason. Bill's initial draft post had Ford and Dole in Group Two because they received LLBs instead of JDs. In the past the LLB was the professional law degree. Yale didn't change to JD until 1971. Ford got his LLB from Yale in 1941.
When I removed Ford and Dole, ChatGPT o3-pro correctly gave the reason for the partition, though it did take over 13 minutes.
Every name in Group One spent time in a law school—most completed a J.D. or LL.B., and the two exceptions (Al Gore and, in some accounts, Lloyd Bentsen) still enrolled in law studies.
Examples: Walter Mondale (University of Minnesota Law), JD Vance (Yale Law), Al Gore (attended Vanderbilt Law).
No one in Group Two ever attended a law school; their highest formal education is in fields such as engineering (Jimmy Carter), economics (George H. W. Bush), business (Donald Trump), political science (Paul Ryan), or acting (Ronald Reagan).
So the distinguishing property is legal education: every Group One figure went to law school, while none in Group Two did.
Another commenter got a similar result with ChatGPT o4-mini-high. I just tried it on Gemini 2.5-Pro and it also gave the correct response, this time in seconds.
On the other hand, E tried several base models and none of them succeeded. The lesson: if you want to solve a tricky problem, pony up the $20/month and use a reasoning model.
2.5 pro might have pulled the information from your website.
ReplyDelete"What I want to see is AI come up with a solution to an open math problem.... We haven't really seen that yet". How about the problem whether a unit sphere in R^11 can be touched by 593 other non-overlapping unit spheres? This was open until recently, the answer is Yes, the construction found by AlphaEvolve https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
It's a bit subjective, but it seems to me that this is more about optimizing a search space than coming up with a new proof approach.
It seems there are still a few bugs in the system. Well, more accurately, it seems there are some new bugs in the system.
Seriously hilarious Mastodon thread at:
https://mastodon.mit.edu/@kjhealy@mastodon.social/114990301788026630
(tldr; GPT5 gets something wrong that GPT4 seems to do OK on. As always, the GPT folks will gussy up the front end to make sure it doesn't do this any more, so by the time you read this, there'll be a hack in place to prevent it from happening.)
But it shows that LLMs not only can't multiply, they can't count, either.
(The above isn't a cheap shot: it's a technically accurate description of the underlying technology.)
The bottom line is that it seems seriously nuts to claim that a system that can neither count nor multiply reliably can "solve math problems".
Really. This whole thing is seriously nuts.
Last time I ate a blueberry there were no "b"s or any other letters inside of it. Likewise, GPT doesn't get the word "blueberry" but a word vector of the concept.
If you want GPT to count the number of characters of a string, ask it to write a program to do so.
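For instance, here's a minimal sketch, in Python, of the kind of program it might produce (the function name and the letter-counting example are just illustrative):

# Count how many times a letter appears in a word -- the kind of small
# program an LLM can write even though it can't reliably "see" individual
# letters in its tokenized input.
def count_letter(word: str, letter: str) -> int:
    return sum(1 for ch in word if ch == letter)

print(count_letter("blueberry", "b"))  # prints 2

Running the program sidesteps the tokenization issue entirely: the counting happens in the interpreter, not in the model.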
The idea that "GPT doesn't get the word 'blueberry'" as an excuse for the stupidity is interesting. It's right there in the input, so of course it has the word blueberry. That it can't see the input (or the output! (see the 16-letter words post in that thread)) really should be clueing you in to the point that the whole LLM idea is ridiculous. (Well, it's seriously kewl, but it's ridiculous for the things it's claimed to do. Which is the problem.)
To reiterate: not only doesn't it "know" what the input or the output means, it doesn't even know what they are. (Thanks for this. It points out the complete insanity of this game beautifully.)
FWIW, LLMs don't do "concepts" in the sense that anyone who has done linguistics, philosophy, or 1970s/1980s AI would use the term. There is no grounding in reality, just statistical correlations. Sure, lots of silly hot air has been spilt by philosophers, but they all at least understood that concepts were about reality. LLMs don't do reality.
On the programming bit, though, I think you mean "ask it to regurgitate some sample code that counts strings and see if it works". I thought you were a computer science type: it's one of the basic results of the field that "writing code" isn't possible. Finding sample code and letting you debug it isn't "writing code", no matter how good it is at finding the sample code.
But, seriously, can't you see how insane it is that we've got a whole industry based on a technology that can't even count reliably?
Here is how I write code: I write it. I compile it to catch syntax errors (like a spell checker). I then proofread it to make sure that it is correct. Then I try running it to see whether I made a mistake. (Pretty similar to the way I write math, but I can't "run" the math.) I doubt the LLMs are doing it the way I do.
I'm confused that people seem to be using LLMs to help with the "write it" part. Even assuming an LLM can help with that part, they seem to only be able to "write" fairly small self-contained programs. Almost all of my programming involves adding functionality to programs that have thousands of lines of code.
To pile on the ranting, here are a few lines lifted from my inbox (Technology Review, which I'm often irritated with (I prefer Science), but that's another rant):
(The last line is why it's important to keep ranting.)
>>>>>>
4 A man suffered psychosis after ChatGPT suggested he take sodium bromide
The 60-year old ended up with bromism. (Ars Technica)
+ He’d been taking it for three months before he went to the ER. (The Independent)
+ AI companies have stopped warning you that their chatbots aren’t doctors. (MIT Technology Review)
<<<<<<
Double standards, I say! For many folks here, clamouring for valid proofs is the only prize, but here we are imputing that a purely empirical approach is a "big step" towards the holy grail of "machine reasoning", whatever that means.
How to formulate a proof for a claim like "X (say a cat) is intelligent"? Of course, I don't know how to do that in general.
A) All life (down to the microscopic levels, including its cellular machinery) is intelligent and is realized without violating any laws of physics. Why so? Because it has to be, for life's survival in a physical world is absolutely predicated on its intelligence. I acknowledge this is a bit of circular reasoning, but anyone can surely tell a living thing from a non-living thing even though they can't explain why. Intelligence is simply a "dual formulation" of life itself; you can't have one without the other.
B) How to prove a machine is intelligent? Turing test? Math olympiad scores? Automatic translation? Tons of people using this and singing its praise? All of that is empirical and in no way goes towards constituting a "proof" of its intelligence.
Long story short, given how egregiously, in fact sadistically, cruel humans are, it's not even the case that greater intelligence correlates with greater survivability; from the fossil record it's anti-correlated.
According to you, "intelligence" is a synonym for "being alive". I don't think that's the usual meaning of the word.
@DMarcus I understand your point; I'll refer readers to "How Life Works" by Philip Ball, the most lucidly written text on cells, which was an eye-opener for me in glimpsing how much happens inside a cell to stay alive and reproduce. As you can tell, we can clearly see the magic happen at the macroscopic level, but for the most part we can't infer what's actually happening with all the complex molecules and their finely tuned orchestration. So if I roll back the time, I'm saying you don't really see intelligent and non-intelligent life, but only life that stays alive by intelligently manipulating available resources at the cellular level, likely down to the organelle level, and so on.
In this sense I'm defining intelligence as the ability to do just that: utilize available physical resources to stay alive and make more cells. I do grant the point that there could be other types of intelligence, so to speak, but I don't really see what those would be like.
David M.
Agreed. But if you see scientists and philosophers trying to figure out what intelligence is, you see them flailing and failing massively. The latest stupidity was some blokes arguing that squid (or was it octopuses?) have distributed intelligence across their tentacles. Now, squid and octopuses are seriously kewl beasts, but intelligent? That's really silly.
Here, I see intelligence as the ability to do some sort of conceptual reasoning tied to reality, and to modify that reasoning when new information comes in. I've owned cats (and now feed two neighborhood cats when they come around asking for treats), but it's real hard to figure out if they have a concept for "human". My SO is new to cats, so when Pipa first started coming around, I made sure that it was said SO, not me, who gave Pipa her favorite treat. And now Pipa ignores me and looks for said SO when she comes around. There's some glimmer of "intelligence" there, but is it more than Pavlovian/Skinnerian processing? I don't know. (But she's a cute cat. (Yes, I know. All cats are cute.))
My point is that intelligence is hard. But to get back to the subject at hand, the whole point of the LLM idea was to generate decent-looking text _without doing the work of intelligence and reasoning and dealing with reality_. It's not just intellectually vacuous, it is, in basic underlying principle, by definition, anti-intellectual.
I've seen some examples of octopuses doing things that suggest they have some intelligence. When I was in high school and college, we had a dog. I think he was intelligent. The best example I had was that there was a strip of our yard that was between our fence and the street. I'd sometimes take him with me when I was doing some gardening there. When I started doing this, I explained/showed him that it was fine to walk on the grass, but he must not go into the street ("here good", step into street, "here bad"). I'd then watch him. He'd walk down that strip of grass for a hundred feet and walk right up to the edge, but never go in the street. Obviously, he was still a dog. But, I thought he was more intelligent than some people I've known.