Computational Complexity and other fun stuff in math and computer science from Lance Fortnow and Bill Gasarch
Sunday, August 10, 2025
My Tom L post inspired a mathematical definition of Rabbithole
BILL: Uh- I was only kidding.
Thursday, August 07, 2025
AI and ...
AI and Vacation
I'm back from my German vacation. This was my first AI vacation, by which I mean the first time I used AI to navigate a foreign country. Taking a picture of a hand-written menu board, not just to translate the dishes but to get a description of each. Visiting a palace with a German-speaking tour guide and translating in real time. Even the more mundane: taking pictures of buildings to learn about them.
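(If you're curious what the menu trick looks like as code, here's a minimal sketch against OpenAI's Python SDK. The model name, file name, and prompt are my illustrative choices; on vacation I just used the ChatGPT app.)

```python
# Minimal sketch: send a photo of a hand-written menu to a vision-capable
# model and ask for a translation plus a description of each dish.
# Assumes OPENAI_API_KEY is set in the environment; "menu.jpg" and the
# model name are illustrative.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the menu photo as base64 for an inline data URL.
with open("menu.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Translate this hand-written German menu into English "
                     "and briefly describe each dish."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```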
On the TV I saw something about Künstliche Intelligenz, and Chatty told me it was German for Artificial Intelligence. The Germans use KI instead of AI partly because AI can be confused with Ei (egg). At least that's what AI tells me, so it must be true.
AI and Math
At the beginning of my vacation, Google announced that they achieved gold-medal status in the International Mathematical Olympiad. An impressive achievement, though Terry Tao makes a good point that comparing an AI system to time-constrained high-school students is not an apples-to-apples comparison.
I already find talking to AI about math topics quite useful, though it's like talking to an early PhD student: sometimes they say things that aren't correct, but usually they are. The reasoning models are particularly good at finding holes in P vs NP proofs. For example, here's the conclusion of ChatGPT o3-pro's review of the paper from Eric's guest post.
The paper is a reminder that lower‑bound proofs live or die on the exact breadth of the algorithmic model they exclude—too narrow and the result is unsurprising, too broad and the proof tends to break. At present this work sits in the first category.
What I want to see is AI come up with a solution to an open math problem, a true new insight beyond just some optimized search. I'm not looking for P ≠ NP, just some result that would be publishable in a respectable conference or journal, even just a new completeness result. We haven't really seen that yet, but I suspect we will soon and then we can figure out where math goes from there.
AI and Bill
In his presidents question and its solution, Bill states that AI had failed to find the right answer to his problem. Back in June, I saw Bill's draft post and tried using AI to solve it.
AI initially failed the test, but for a good reason. Bill's initial draft had Ford and Dole in Group Two because they received LLBs instead of JDs. In the past the LLB was the professional law degree; Yale didn't change to the JD until 1971, and Ford got his LLB from Yale in 1941.
When I removed Ford and Dole, ChatGPT o3-pro correctly gave the reason for the partition, though it did take over 13 minutes.
Every name in Group One spent time in a law school—most completed a J.D. or LL.B., and the two exceptions (Al Gore and, in some accounts, Lloyd Bentsen) still enrolled in law studies. Examples: Walter Mondale (University of Minnesota Law), JD Vance (Yale Law), Al Gore (attended Vanderbilt Law). No one in Group Two ever attended a law school; their highest formal education is in fields such as engineering (Jimmy Carter), economics (George H. W. Bush), business (Donald Trump), political science (Paul Ryan), or acting (Ronald Reagan).
So the distinguishing property is legal education: every Group One figure went to law school, while none in Group Two did.
Another commenter got a similar result with ChatGPT o4-mini-high. I just tried it on Gemini 2.5-Pro and it also gave the correct response, this time in seconds.
On the other hand, E tried several base models and none of them succeeded. The lesson: if you want to solve a tricky problem, pony up the $20/month and use a reasoning model.
Monday, August 04, 2025
Some thoughts on journals, refereeing, and the P vs NP problem
A guest post by Eric Allender prompted by an (incorrect) P ≠ NP proof recently published in Springer Nature's Frontiers of Computer Science.
For a time, I served as Editor-in-Chief of ACM Transactions on Computation Theory, and in this role I had to deal regularly with submissions that claimed to resolve the P vs NP problem. Finding referees for these papers was sometimes challenging, so I frequently ended up reviewing them myself. Dealing with such submissions involves enough overhead that ToCT, J.ACM and ACM Transactions on Algorithms limit the frequency with which authors can submit work of this sort. But for every submission, on any topic, the overall process was the same: Find someone on the Editorial Board who has sufficient expertise to find referees and make knowledgeable use of their recommendations. If there was no such person on the Editorial Board, then it was always the case that the submission was out of scope for the journal.
These thoughts are brought to mind by a recent case, where it seems to me that the editorial process broke down.
Springer publishes several high-quality academic journals that deal with Theoretical Computer Science. One Springer journal, Frontiers of Computer Science, recently published an article entitled SAT Requires Exhaustive Search, where one of the authors is a Deputy Editor-in-Chief of the journal. The abstract of the article states that it proves a result that is stronger than P not equal to NP. The Editorial Board of the journal has some members who are expert in computational complexity theory. However, all the ones whom I know personally have asserted that they had no knowledge of this paper, and that they were not involved at all in handling the paper.
When Ryan Williams and I learned about the publication of this article, we drafted a comment, which we sent to the Editor-in-Chief. We recommended that the paper be retracted, in which case there would be no need to publish our comment. However, the Editor-in-Chief declined to retract the article, saying that he could find no evidence of misconduct, and thus we have been assured that an edited version of our comment will appear in the journal.
Our comment calls attention to some shortcomings in the proof of the main theorem (similar to shortcomings in several other failed attempts to separate P from NP). But we were able to say more. Often, when one is looking for bugs in a purported proof, one has to deal with the situation where the claimed theorem is probably true, and the only problem is that the proof is not convincing. However, the main theorem in their paper (Theorem 3.2) states that a particular constraint satisfaction problem requires time more than \(d^{cn}\) for any constant \(c<1\) (where \(d\) is the domain size, and \(n\) is the number of variables). In particular, their purported proof claims that this holds even when \(k=2\) (meaning that each constraint has at most 2 variables). Yet Ryan Williams presented an algorithm more than two decades ago that runs in time \(O(d^{0.8n})\) in this special case, contradicting the lower bound claimed in the article.
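To spell out why these are contradictory, here is a sketch of the standard analysis of Williams' algorithm: split the \(n\) variables into three blocks of \(n/3\) each, so each block has \(d^{n/3}\) partial assignments, and reduce the 2-CSP to finding a triangle in a tripartite graph whose vertices are those partial assignments. Fast matrix multiplication then finds a triangle in time
\[
O\!\left(\left(d^{n/3}\right)^{\omega}\right) = O\!\left(d^{\omega n/3}\right) \le O\!\left(d^{0.8n}\right),
\]
since the matrix multiplication exponent satisfies \(\omega < 2.4\). Taking, say, \(c = 0.9 < 1\), the algorithm runs in time well below the \(d^{cn}\) bound that Theorem 3.2 claims no algorithm can beat.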
The article contains an appendix, with glowing testimonies from various researchers; the lead-in to the appendix also contains enthusiastic comments from Gregory Chaitin. I contacted Gregory Chaitin, and he asserts that he did not read the paper, and that he was quoted out of context.
The edited version of the comment that Ryan Williams and I wrote, which will supposedly appear in Frontiers of Computer Science soon, differs from the version linked here (our original submission) primarily in one respect: Our closing paragraph was removed. Here is that paragraph:
Finally, it is our opinion that the publication of this article is a complete embarrassment to this journal and its publisher. We believe that, at the very least, the paper should be withdrawn, and Springer should conduct an investigation to understand how such a paper could have made it through the peer review process.
Update (Lance, 8/5/25): The authors of the paper in question have asked me to link to their reply to the Allender-Williams comment. I do so with no endorsement.
Eric Allender asked me to note that their claim about Ryan’s algorithm requiring exponential space is misleading; the amount of space is not more than the runtime of the algorithm, which is less than the lower bound that they claim. (Their theorem does not state that \(d^{cn}\) time is required under the additional assumption that the algorithm use only \(n^{O(1)}\) space.)