AI and Vacation
I'm back from my German vacation. This was my first AI vacation, by which I mean the first trip where I used AI to navigate a foreign country. Taking a picture of a hand-written menu board, not just to translate the dishes, but to get a description of each. Visiting a palace with a German-speaking tour guide and translating in real time. Even the more mundane task of taking pictures of buildings to learn about them.
On the TV I saw something about Künstliche Intelligenz, and Chatty told me it was German for Artificial Intelligence. The Germans use KI instead of AI partly because AI can be confused with Ei (egg). At least that's what AI tells me, so it must be true.
AI and Math
At the beginning of my vacation, Google announced that they achieved gold medal status in the International Mathematical Olympiad. An impressive achievement, though Terry Tao makes a good point that comparing an AI system to time-constrained high-school students is not an apples-to-apples comparison.
I already find talking to AI about math topics quite useful, though it's like talking to an early PhD student. Sometimes they just say things that aren't correct, but usually they are. The reasoning models are particularly good at finding holes in P v NP proofs. For example, here's the conclusion of ChatGPT o3-pro's review of the paper from Eric's guest post.
The paper is a reminder that lower‑bound proofs live or die on the exact breadth of the algorithmic model they exclude—too narrow and the result is unsurprising, too broad and the proof tends to break. At present this work sits in the first category.
What I want to see is AI come up with a solution to an open math problem, a true new insight beyond just some optimized search. I'm not looking for P ≠ NP, just some result that would be publishable in a respectable conference or journal, even just a new completeness result. We haven't really seen that yet, but I suspect we will soon and then we can figure out where math goes from there.
AI and Bill
In his presidents question and solution posts, Bill states that AI had failed to find the right answer to his problem. Back in June, I saw Bill's draft post and tried using AI to solve it.
AI initially failed the test but for a good reason. Bill's initial draft post had Ford and Dole in Group Two because they received LLBs instead of JDs. In the past the LLB was the professional law degree. Yale didn't change to JD until 1971. Ford got his LLB from Yale in 1941.
When I removed Ford and Dole, ChatGPT o3-pro correctly gave the reason for the partition, though it did take over 13 minutes.
Every name in Group One spent time in a law school—most completed a J.D. or LL.B., and the two exceptions (Al Gore and, in some accounts, Lloyd Bentsen) still enrolled in law studies.
Examples: Walter Mondale (University of Minnesota Law), JD Vance (Yale Law), Al Gore (attended Vanderbilt Law).
No one in Group Two ever attended a law school; their highest formal education is in fields such as engineering (Jimmy Carter), economics (George H. W. Bush), business (Donald Trump), political science (Paul Ryan), or acting (Ronald Reagan).
So the distinguishing property is legal education: every Group One figure went to law school, while none in Group Two did.
Another commenter got a similar result with ChatGPT o4-mini-high. I just tried it on Gemini 2.5-Pro and it also gave the correct response, this time in seconds.
On the other hand, E tried several base models and none of them succeeded. The lesson: if you want to solve a tricky problem, pony up the $20/month and use a reasoning model.
2.5 pro might have pulled the information from your website.
ReplyDelete"What I want to see is AI come up with a solution to an open math problem.... We haven't really seen that yet". How about the problem whether a unit sphere in R^11 can be touched by 593 other non-overlapping unit spheres? This was open until recently, the answer is Yes, the construction found by AlphaEvolve https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
It's a bit subjective, but it seems to me that this is more about optimizing a search space than coming up with a new proof approach.
It seems there are still a few bugs in the system. Well, more accurately, it seems there are some new bugs in the system.
Seriously hilarious Mastodon thread at:
https://mastodon.mit.edu/@kjhealy@mastodon.social/114990301788026630
(tldr; GPT5 gets something wrong that GPT4 seems to do OK on. As always, the GPT folks will gussy up the front end to make sure it doesn't do this any more, so by the time you read this, there'll be a hack in place to prevent it from happening.)
But it shows that LLMs not only can't multiply, they can't count, either.
(The above isn't a cheap shot: it's a technically accurate description of the underlying technology.)
The bottom line is that it seems seriously nuts to claim that a system that can neither count nor multiply reliably can "solve math problems".
Really. This whole thing is seriously nuts.
Last time I ate a blueberry there were no "b"s or any other letters inside of it. Likewise, GPT doesn't get the word "blueberry" but a word vector of the concept.
If you want GPT to count the number of characters of a string, ask it to write a program to do so.
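For instance, here's a minimal sketch, in Python, of the kind of program it might produce (the function name and the letter-counting example are just illustrative):

# Count how many times a letter appears in a word -- the kind of small
# program an LLM can write even though it can't reliably "see" individual
# letters in its tokenized input.
def count_letter(word: str, letter: str) -> int:
    return sum(1 for ch in word if ch == letter)

print(count_letter("blueberry", "b"))  # prints 2

Running the program sidesteps the tokenization issue entirely: the counting happens in the interpreter, not in the model.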
The idea that "GPT doesn't get the word 'blueberry'" as an excuse for the stupidity is interesting. It's right there in the input, so of course it has the word blueberry. That it can't see the input (or the output! (see the 16-letter words post in that thread)) really should be clueing you in to the point that the whole LLM idea is ridiculous. (Well, it's seriously kewl, but it's ridiculous for the things it's claimed to do. Which is the problem.)
To reiterate: not only doesn't it "know" what the input or the output means, it doesn't even know what they are. (Thanks for this. It points out the complete insanity of this game beautifully.)
FWIW, LLMs don't do "concepts" in the sense that anyone who has done linguistics, philosophy, or 1970s/1980s AI would use the term. There is no grounding in reality, just statistical correlations. Sure, lots of silly hot air has been spilt by philosophers, but they all at least understood that concepts were about reality. LLMs don't do reality.
On the programming bit, though, I think you mean "ask it to regurgitate some sample code that counts strings and see if it works". I thought you were a computer science type: it's one of the basic results of the field that "writing code" isn't possible. Finding sample code and letting you debug it isn't "writing code", no matter how good it is at finding the sample code.
But, seriously, can't you see how insane it is that we've got a whole industry based on a technology that can't even count reliably?
Here is how I write code: I write it. I compile it to catch syntax errors (like a spell checker). I then proofread it to make sure that it is correct. Then I try running it to see whether I made a mistake. (Pretty similar to the way I write math, but I can't "run" the math.) I doubt the LLMs are doing it the way I do.
I'm confused that people seem to be using LLMs to help with the "write it" part. Even assuming an LLM can help with that part, they seem to only be able to "write" fairly small self-contained programs. Almost all of my programming involves adding functionality to programs that have thousands of lines of code.
To pile on the ranting, here are a few lines lifted from my inbox (Technology Review, which I'm often irritated with (I prefer Science), but that's another rant):
(The last line is why it's important to keep ranting.)
>>>>>>
4 A man suffered psychosis after ChatGPT suggested he take sodium bromide
The 60-year old ended up with bromism. (Ars Technica)
+ He’d been taking it for three months before he went to the ER. (The Independent)
+ AI companies have stopped warning you that their chatbots aren’t doctors. (MIT Technology Review)
<<<<<<
Double standards, I say! For many folks here, clamouring for valid proofs is the only prize, but here we are imputing that a purely empirical approach is a "big step" towards the holy grail of "machine reasoning", whatever that means.
How to formulate a proof for a claim like "X (say a cat) is intelligent"? Of course, I don't know how to do that in general.
A) All life (down to the microscopic levels, including its cellular machinery) is intelligent and is realized without violating any laws of physics. Why so? Because it has to be, for life's survival in a physical world is absolutely predicated on its intelligence. I acknowledge this is a bit of circular reasoning, but anyone can surely tell a living thing from a non-living thing even though they can't explain why. Intelligence is simply a "dual formulation" of life itself; you can't have one without the other.
B) How to prove a machine is intelligent? Turing test? Math olympiad scores? Automatic translation? Tons of people using this and singing its praise? All of that is empirical and in no way goes towards constituting a "proof" of its intelligence.
Long story short, given how egregiously, in fact sadistically, cruel humans are, it's not even the case that greater intelligence correlates with greater survivability; from the fossil record it's anti-correlated.
According to you, "intelligence" is a synonym for "being alive". I don't think that's the usual meaning of the word.
@DMarcus I understand your point; I'll refer readers to "How Life Works" by Philip Ball, the most lucidly written text on cells, which was an eye-opener for me in glimpsing how much happens inside a cell to stay alive and reproduce. As you can tell, we can clearly see the magic happen at the macroscopic level, but for the most part we can't infer what's actually happening with all the complex molecules and their finely tuned orchestration. So if I roll back the time, I'm saying you don't really see intelligent and non-intelligent life, but only life that stays alive by intelligently manipulating available resources at the cellular level, likely down to the organelle level, and so on.
In this sense I'm defining intelligence as the ability to do just that: utilize available physical resources to stay alive and make more cells. I do grant the point that there could be other types of intelligence, so to speak, but I don't really see what those would be like.
David M.
Agreed. But if you see scientists and philosophers trying to figure out what intelligence is, you see them flailing and failing massively. The latest stupidity was some blokes arguing that squid (or was it octopuses?) have distributed intelligence across their tentacles. Now, squid and octopuses are seriously kewl beasts, but intelligent? That's really silly.
Here, I see intelligence as the ability to do some sort of conceptual reasoning tied to reality, and to modify that reasoning when new information comes in. I've owned cats (and now feed two neighborhood cats when they come around asking for treats), but it's real hard to figure out if they have a concept for "human". My SO is new to cats, so when Pipa first started coming around, I made sure that it was said SO, not me, who gave Pipa her favorite treat. And now Pipa ignores me and looks for said SO when she comes around. There's some glimmer of "intelligence" there, but is it more than Pavlovian/Skinnerian processing? I don't know. (But she's a cute cat. (Yes, I know. All cats are cute.))
My point is that intelligence is hard. But to get back to the subject at hand, the whole point of the LLM idea was to generate decent-looking text _without doing the work of intelligence and reasoning and dealing with reality_. It's not just intellectually vacuous, it is, in basic underlying principle, by definition, anti-intellectual.
I've seen some examples of octopuses doing things that suggest they have some intelligence. When I was in high school and college, we had a dog. I think he was intelligent. The best example I had was that there was a strip of our yard that was between our fence and the street. I'd sometimes take him with me when I was doing some gardening there. When I started doing this, I explained/showed him that it was fine to walk on the grass, but he must not go into the street ("here good", step into street, "here bad"). I'd then watch him. He'd walk down that strip of grass for a hundred feet and walk right up to the edge, but never go in the street. Obviously, he was still a dog. But, I thought he was more intelligent than some people I've known.