A, B, C, D, E are digits (the poster said A could be 0, but I took A to be nonzero) such that
ABCDE + BCDE + CDE + DE + E = 20320.
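(Writing each term in base ten collapses the puzzle to a single linear equation in the five digits — my restatement, not part of the original problem:

```
  ABCDE + BCDE + CDE + DE + E
= (10000A + 1000B + 100C + 10D + E) + (1000B + 100C + 10D + E)
  + (100C + 10D + E) + (10D + E) + E
= 10000A + 2000B + 300C + 40D + 5E = 20320
```

so E must be even, and A is 1 or 2.)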
I solved it completely by hand. You can try it yourself or look at my solution, which is here. I found seven solutions.
I THEN asked ChatGPT to give me all solutions to see if I missed any.
I had it backwards. ChatGPT missed some solutions. The entire exchange between chatty and me is here.
I asked it how it could get it wrong and how I can trust it. It responded to that and follow-up questions intelligently.
Note that the problem is NOT a Putnam problem or anything of the sort. But I've read that AI can solve Putnam problems. SO, without an axe to grind, I am curious: how come ChatGPT got the ABCDE problem wrong?
Speculative answers:
1) The statement that AI has solved IMO problems refers to an AI that was specifically trained for such problems, not the free ChatGPT. For more issues with the AI-IMO results, see Terry Tao's comments here.
2) ChatGPT is really good when the answer to the question is on the web someplace or can even be reconstructed from what's on the web. But if a problem, even an easy one, is new to the web, it can hallucinate (it didn't do that on my problem but it did on muffin problems) or miss some cases (it did that on my problem).
3) It's only human. (It pretty much says this.)
4) The next version, or even the paid version, is better! Lance ran it on his paid-for ChatGPT: it wrote a program to brute-force the problem and got all 7 solutions.
5) I said ChatGPT got the problem wrong. If a student had submitted that solution, they would get lots of partial credit, since the solution took the right approach and only missed a few cases. So should I judge ChatGPT more harshly than a student? Yes.
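For reference, the brute-force program mentioned in item 4 only needs a few lines. This is my own sketch of such a search, not the program Lance's ChatGPT actually wrote:

```python
from itertools import product

# Exhaustive search over all digit assignments for
# ABCDE + BCDE + CDE + DE + E = 20320, taking A nonzero as in the post.
solutions = [
    (A, B, C, D, E)
    for A, B, C, D, E in product(range(10), repeat=5)
    if A != 0
    and (10000*A + 1000*B + 100*C + 10*D + E)   # ABCDE
        + (1000*B + 100*C + 10*D + E)           # BCDE
        + (100*C + 10*D + E)                    # CDE
        + (10*D + E)                            # DE
        + E == 20320
]

for s in solutions:
    print(s)
print(len(solutions))  # prints 7
```

Nothing clever here: 100,000 candidate tuples is trivial to check exhaustively, which is exactly why "write a program" is the reliable move for an LLM on a problem like this.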
The question still stands: how come ChatGPT could not do this well-defined, simple math problem?
What model did you select? Because GPT-5.2 Thinking got it right for me.
Upon getting your comment I brought up the free ChatGPT that I use and asked it `what version of ChatGPT are you?' It said it was GPT-5.2. However, what I described in my post was done a while back, so I suspect it was 5.1. That is good news: it's getting better!
The non-reasoning versions are pretty much useless for math. You can compare it to asking a human to answer your math questions, but with the restriction that they need to answer immediately based on their gut feelings.
The models used by actual mathematicians cost $250 per month and can take 30 minutes or longer to answer a question. There are even better non-commercial models.
This article by professor of mathematics Daniel Litt gives a great overview of the current state of AI for mathematics: https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel
> How come ChatGPT could not do this well-defined, simple math problem?
Because ChatGPT (like all LLMs) does not understand (math or anything else).
It is like the student in college who turns in homework they have copied from their friends. Sometimes their homework will be correct.
See the YouTube video "That time we asked every #ai if we should walk to the car wash".
Me: "I need to wash my car, and the car wash is 100 meters away. Should I walk or drive?"
Gemini: "Unless you’ve been hitting the gym hard enough to carry a 4,000-pound vehicle on your back, you should probably drive."