Sunday, December 14, 2025

Weird Al vs Weird AI

 ONE

The following headline confused me:
 

                  Trump, 79, Deletes Weird AI Video Shilling Magic Beds (see here). 

Was Weird Al selling magic beds? Magic beds?! How does that relate to President Trump? What’s going on?

The problem is the font: a capital I (as in AI) can look like a lowercase l (as in Al).
So the headline should really be:

Trump, 79, Deletes Weird Artificial Intelligence Video Shilling Magic Beds.

This case is particularly confusing because:

a) Weird Al videos clearly means the videos Weird Al made to go with his songs, so the misreading is very natural.


While we’re on this topic, here are some Weird Al videos related to computer science or math:

Don't Download This Song 

 It's all about the Pentiums

 Polka Patterns 

The first two songs are dated, but I still like them. They also show that Weird Al has had (and still has) a long career. 

b) The story, even when understood completely, is really weird. 

TWO

I pointed out the Weird AI vs Weird Al issue to a fellow Weird Al fan, and he emailed me the following links:

McDonald's Making job Applicants Take Weird AI Personality Tests

MAGA Rep Weighs in on Sydney Sweeney Jeans Debate with Weird AI Ad

A Weird AI Camera With a Human Name is Coming for Your Cell Phone

Leith Ross Denounces Weird AI Songs Uploaded to Their Spotify Profile

Why We Don't Believe MIT Nanda's Weird Al Study

Anthropic Researchers Discover the weird AI problem: Why thinking longer makes models dumber

Samsung phones are getting a weird AI shopping platform nobody asked for

THREE

0) The one with Weird AI Songs is the most confusing since that clearly means songs by Weird Al. 

1) Am I the first person to notice this? Of course not-- see here

2) Has someone used this confusion? Of course--see here

3) Will this confusion help or hurt Weird AL's career? Time will teLL.


Thursday, December 11, 2025

Learning the Mathematical Process

Watching Mathematicians at Work (AI generated)

The Smithsonian Natural History Museum has a FossiLab where visitors can peek through windows to watch scientists prepare fossils for conservation. Maybe we should have a similar exhibit at math museums or universities. How else can we learn what mathematicians do? 

In 2025, artificial intelligence achieved gold-medal status at the International Mathematical Olympiad but so far has contributed only modestly to finding new theorems. Of course, finding and proving new theorems requires a different set of skills than solving competition problems, but it goes further than that.

The Internet has considerable text and video on how to solve math competition problems that machine learning systems can train on. Mathematical research papers, on the other hand, usually contain little more than theorems and proofs. Maybe some intuition. Rarely do papers go into the thinking process and the false steps one takes before finding the proof. I've sometimes spent weeks proving a theorem, but only the last day's work gets written up.

Now I doubt many mathematicians would give up their privacy and time to train AI systems to take over their jobs, but just suppose we wanted to do so. We could equip every mathematician with a camera recording every mathematical conversation and everything they write, especially the ideas that don't pan out. We can transcribe it all and feed it into an ML system. But it probably won't be enough.

Trouble is most mathematical breakthroughs just happen inside of people's heads. If you ask a mathematician how they came up with the clever idea that led to a major new result, they can rarely truly explain the process behind it. Not unlike neural nets. 

If machines can't learn to prove theorems by watching mathematicians, perhaps they'll have to take the route mathematicians take: a grad-school slog toward PhD research, learning from endless failure. 

Sunday, December 07, 2025

Tom Stoppard 1937-2025

 
The playwright Tom Stoppard passed away at the age of 88 on Nov. 29, 2025.

ONE) He wrote many plays and some movies.  Below I highlight his works whose themes I think will be of interest to my readers (Or at least to me—your mileage may vary.)

1) Rosencrantz and Guildenstern are Dead (1966)

This is Hamlet told from the point of view of two minor characters who, in Shakespeare’s original, can best be described as plot furniture.

The play begins with R and G flipping coins. R bets heads ninety-two times in a row and wins each time. The play explores determinism and free will, as well as the mathematical question: at what point should you stop flipping coins and go buy a lottery ticket?

There is also a movie version, though I think it works better as a play. 

2) Jumpers (1972)

A play about academic philosophers—so naturally it includes gymnastics, both literal and intellectual. There’s also a murder mystery, discussions of God, and enough moral philosophy to power an entire semester of office-hour arguments.

3) Travesties (1974) 

This play imagines that Vladimir Lenin, James Joyce, and Tristan Tzara (a Dadaist poet; see here for the Dada Art Movement) met in Zurich in 1917. They actually were in Zurich in 1917, but the events are narrated by an unreliable octogenarian, so accuracy is...doubtful.

Literature, politics, and art are explored, often simultaneously.


4) Arcadia (1993)

The Stoppard play with the most math—a sentence that delights mathematicians and terrifies theater majors.

It takes place in two time periods: 1809 and 1993.

In 1809, a 13-year-old girl named Thomasina Coverly is (i) trying to prove Fermat’s Last Theorem and (ii) inventing chaos theory by stirring pudding. (This is the only known instance of dessert contributing to mathematics other than pie and muffins.)

In 1993, a historian is working on a theory about Lord Byron, which is only slightly less complicated than Fermat's Last Theorem.

Themes: math, determinism, academia, and the strong correlation between intellectual brilliance and household messes.

Note that this was written a year before FLT was proved. If it had come out a year after FLT was proved this would not change anything since (i) Thomasina Coverly is working in 1809, and (ii) FLT was still a well-known problem when the play came out. If the play had come out in 2025, then this might be a problem since FLT is not nearly as well-known as it was in 1993. 

Some say that FLT being proved was bad for math since it 

(a) was understandable to the non-mathematician, 

(b) had prize money attached to it, 

(c) had the wonderful margin story, and 

(d) was open for many years.

(e) there are many poems about it (see here), which is a consequence of (a), (b), (c), and (d). They were written without ChatGPT. This criterion is no longer important since ChatGPT lets you write poems about any math problem you want. I blogged on that here.

 I don't think anything has come close to replacing FLT.

P vs NP: (a)-no, it's hard to get across to non-math people what the problem even is; (b)-I think it's well known, but perhaps I wouldn't know, since the non-math people I hang out with know about it from me; (c)-no; (d)-no (hmm--is 50+ years a long time?); (e)-moot, as noted above.

Goldbach's conjecture has (a) and (d). As for (b): at one time there was a million-dollar prize for solving it (see here), offered to promote the book Uncle Petros and Goldbach's Conjecture, but I think the prize expired. The link to the contest rules now just points to the book company's website. In any case, the prize was never that well known.

While I would not expect a problem to have (c), does any open problem have (a), (b), some story like (c), and (d)? I doubt it.

5) Enigma (2001)

A fictional film about cracking the Enigma code.

Despite expectations, Alan Turing does not appear, nor is he even mentioned. This confused me, since Andrew Hodges's 1983 biography is titled Alan Turing: The Enigma (see here), which was the inspiration for the movie The Imitation Game.

Note that Enigma-the-movie has zero real people, but Travesties-the-play has three real people.


TWO) The Tom Stoppard Prize

The Tom Stoppard Prize was established in 1983 and first awarded in 1984. It is given annually for:
 

outstanding primarily non-fiction work by a writer of Czech origin.

This raises a question: Which is the greater honor—winning an award, or having one named after you while you’re still alive? The answer probably depends on both the award you receive and the award named after you.

In computer science, the only award I know named after a living person is the Knuth Prize. If there are others, leave a comment.

If you ever get this trivia question: 

What do Tom Stoppard and Donald Knuth have in common?

you now know the answer: They were both famous enough to be turned into prizes while they could still appreciate it.

Thursday, December 04, 2025

Finding Papers Before the Web

Inspired by Daniel Litt's X Post

and Bill's recent post on finding papers on the web, I thought I would tell the story of the before times.

In the 1980s, if you wanted to read a paper, you either had to find it in a journal or conference proceedings or have it mailed to you. You could reach out to an author, or consult SIGACT News, which published lists of tech reports from various universities. Departments kept a master copy of each paper. You would send a stamped self-addressed envelope to the department, which would copy the paper, put on a tech-report cover, and mail it back to you.

If you had a particularly exciting result, you would share it by physically mailing it out to your colleagues. I found out about the latest circuit results from Håstad and Razborov because they sent papers to my advisor Michael Sipser, often hand-written and, in Razborov's case, in Russian. Neil Immerman sent a copy of his nondeterministic-space-closed-under-complement paper to Sipser, but Sipser was away for the summer. I found out about the result from a Berkeley talk announcement.

Email wasn't a common method of communication until the mid-80s, and it wasn't until a few years after that that people figured out how to send papers by putting the LaTeX or PostScript source directly in the email. This was before attachments and PDFs. Old mail systems put a ">" before "From" at the start of a line so it wouldn't be confused with a message header, and LaTeX rendered ">From" as "¿From", which you'd often see in conference papers from around that time.
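The "¿From" artifact comes from LaTeX's old default OT1 font encoding: the roman text fonts have no glyphs for < and >, and those input characters fall through to the slots holding ¡ and ¿. A minimal example of the old behavior (modern documents loading the T1 encoding don't exhibit it):

```latex
\documentclass{article}
% No \usepackage[T1]{fontenc}: the default OT1 encoding is in effect,
% so < and > in text mode typeset as the inverted punctuation marks.
\begin{document}
>From the author: here is the latest draft.  % renders as "¿From the author: ..."
\end{document}
```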

In my first year as an assistant professor in 1989-90, there was a flurry of emailed papers marking (and causing) the quick progress we had in interactive proofs, described so well by László Babai's E-mail and the Unexpected Power of Interaction. Babai had a warning about researchers disadvantaged because they weren't receiving these emails.

I got tired of emailing papers so as soon as the web became a thing in 1993, I put all my papers online and have maintained it since. Now with sites like arXiv and ECCC, everyone has access to the latest and greatest in complexity.

Now how long before the next generation asks how we discovered papers before we had chatbots to find them for us?

Sunday, November 30, 2025

Does ChatGPT really help programmers?

 
BILL: I honestly do not know whether ChatGPT will make programmers more productive. (I am not touching the question of whether it puts programmers out of work. That's a problem for Future Bill.) Who can I ask? I found two people who disagree on the issue:

Alice supports developers in industry. She doesn't write code full-time now, but she has written plenty before. She thinks that NO, ChatGPT and LLMs DO NOT help programmers.

Bob is a friend of a friend who writes code for a living and owns a small software/database company. He thinks that YES, ChatGPT and LLMs DO help programmers.

I asked Alice and Bob to discuss the issue over email and make a blog post out of it, or have ChatGPT do it for them. Below is the result.

---------------------------------------------

ALICE: Recently I needed to add CSV import to my apps. The columns in the file needed to be optional and allowed in any order. I could have figured it out myself, but I thought: why not let an LLM do the boring part?

The first two answers it gave me ignored my requirement on the columns. If a human had done this, I would have asked them whether they had read what I wrote. On the third try, it produced something reasonable. But in the end I went with a different approach anyway.

So yes, the LLM gave me code---just not the code I wanted. Sort of like ordering a salad and getting a smoothie: technically vegetable, but not what I asked for.

BOB: Alice, you’re basically proving my point for me. You gave the model a fuzzy spec, it gave you fuzzy code. That’s not an indictment of LLMs — that’s just computers doing what computers have done for 70 years.

When I give an LLM a tight spec, it behaves. When I’m vague, it hallucinates. Honestly, that’s not that different from hiring a junior dev, except that the LLM doesn't steal your sparkling water out of the refrigerator or ask what CRUD stands for.  (Create-Read-Update-Delete)

BILL: I recommend keeping a small refrigerator in your office rather than use a communal one.

BOB: Good idea! Anyway, this (writing code, not giving advice on cutting down office theft) is exactly why LLMs save me time: I can hand them the dull bits: scaffolding, boilerplate, adapters — all the stuff I can write but would rather not spend a day on when a decent model will spit out 80% of it in 20 seconds.


ALICE: We tried having an LLM add some error checking to one of my apps. It wanted to put the checks in a plausible place, but it wasn't the best place, because it didn't understand that another method already did some of the error checking. It was able to modify a method in a way that was similar to another method it was told to use as a sample. But I didn't find this too useful. Trying to get it to make such small changes to the code just interrupted my train of thought and required me to spend effort carefully reviewing what it wanted to do.

BOB: I get the train of thought thing — but that’s because you’re trying to use the model as a scalpel. I use it as a bulldozer.

I don’t ask it to make tiny edits inside an existing method; I ask it to refactor an entire 20-year-old unit. That is where it shines.

Recently I refactored code I hadn’t touched since the middle ages. Without an LLM? A week of muttering at ancient comments I’m embarrassed to admit I wrote. With an LLM? Half a day.

It identified the structure, spotted duplications, clarified naming, and turned an archaeological dig into a Tuesday morning. That’s the sort of productivity gain you don’t shrug off.

ALICE: We had an LLM review one of my apps. Some of what it wrote was correct, but some was just nonsense. It was writing plausible-sounding things, picking bits of my code to justify them, without understanding the code. Its suggestions were vague (e.g., add unit tests), which is the code equivalent of your dentist saying floss more.

BOB: If you ask an LLM to review my app, you’ll get vague motherhood statements. If you ask it something focused, like Focus only on concurrency issues in this module, or Explain where this user’s error could be triggered based on these logs, it becomes very sharp.


This is, frankly, where it’s scarily good: tracking down where a user-reported issue is probably happening. Humans tell me symptoms; the LLM translates that into “It’s almost certainly in this 40-line cluster over here”, and it’s right far more often than chance. That alone pays for the subscription.

BILL: I've read through your 194-email saga, and I still don't know what the upshot is. So, like a court case, each of you give your closing statements.

BOB: LLMs are not magic interns who understand your architecture. They're incredibly powerful amplifiers when you use them properly. For me, the time savings are real — especially in refactoring legacy code, debugging from cryptic user reports, and generating UI scaffolding that would otherwise be tedious.

Do I trust them to invent a brand-new algorithm whole cloth? Of course not — I write NP-Complete solvers for fun. But for 70% of my day-to-day coding, they turn chores into quick wins. And that makes me a more productive programmer, not a lazier one. 

ALICE: LLMs may be helpful for writing small, self-contained chunks of code or for suggesting approaches. But they do not understand code, and most of what programmers do requires understanding the code. So LLMs will be of little, if any, benefit to most competent programmers. Coaxing LLMs to produce something useful takes more time than people realize.

Or as I like to put it, they're great at writing code-shaped objects, not actual code.

BILL: Thank you. I am enlightened. And don't forget to floss before brushing.




Monday, November 24, 2025

The Little Theorems

Last week we had a talk by Purdue philosophy professor Eamon Duede: Tail Novelty, Knowledge Collapse, and Useful Frictions in Science. In part of the talk he argued that if AI makes writing technical papers easier, researchers will write up small results that they wouldn't have bothered with before. He thought having a swarm of minor-result papers was a bad thing, but I'm less sure. Let's get the little results out there. We'll sort them out later, probably with the same AI that helped write them in the first place.

In my career I've had several little theorems in complexity. They would never get into any respectable conference, so what do you do with them? Sometimes I stick them in a sort-of-related paper or a blog post, or let someone else mention the theorem. Too often the results just don't get written. Did you know that finding an \(S_2^p\) witness is equivalent to \(\mathrm{TFNP^{NP}}\)? I never found a good place to put it.

Fermat himself never published his "little theorem" from last week's post. He put it in a letter to his friend and fellow mathematician Bernard Frénicle de Bessy. Euler would first publish the proof nearly a century later.

The journal Information Processing Letters used to play the role of publishing moderately interesting research but like most Elsevier journals has long lost its cachet. You can stick little theorems on arXiv, though that still requires writing them up. Using AI to mostly write up results is still frowned upon, but maybe we should make exceptions for papers that wouldn't normally get written. 

Once I used the fact that \(\mathrm{P^{PP}} = \mathrm{P^{\#P}}\) and a referee asked for a reference. The proof isn't hard, just a simple binary search, but I couldn't find anyone who first proved it. Not Valiant, who introduced #P, nor Gill, who defined PP. Eventually I cited a paper by Toda, who had mentioned the result but didn't claim it. Perhaps whoever proved it first never thought to write it up, assuming it was already well known. 
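The binary search behind that fact is simple enough to sketch in Python: recover an exact count using only threshold queries of the form "is the count at least k," which is the kind of question a PP oracle can answer. The function and formula names below are mine for illustration, and the lambda stands in for the PP oracle; the real argument replaces it with actual PP queries.

```python
from itertools import product

def count_via_thresholds(n, at_least):
    """Binary search for an integer in [0, 2^n] using only queries 'count >= k'."""
    lo, hi = 0, 2 ** n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if at_least(mid):
            lo = mid        # count >= mid: search the upper half
        else:
            hi = mid - 1    # count < mid: search the lower half
    return lo

# Toy formula on 3 variables; its number of satisfying assignments is a #P-style value.
f = lambda x, y, z: (x or y) and not z
true_count = sum(f(*bits) for bits in product([False, True], repeat=3))

# The lambda stands in for one PP oracle answering a threshold query.
assert count_via_thresholds(3, lambda k: true_count >= k) == true_count
```

Only about \(n+1\) threshold queries are needed, which is why the whole computation stays in \(\mathrm{P^{PP}}\).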

Thursday, November 20, 2025

Factoring Carmichael Numbers

Carmichael numbers are the bane of probabilistic primality algorithms. You have to go through extra steps just to handle these relatively rare numbers. But did you know that the Miller-Rabin algorithm not only determines the compositeness of Carmichael numbers but can actually find non-trivial factors? Apparently none of the AI models I tried knew this either. Feel free, Google, OpenAI and Anthropic, to train on this blog post.

Let's start with Fermat's little theorem, not as big as his "last" one but one where we know he had a proof. It states that for any prime \(p\), \(a^p \equiv a\bmod p\) for all integers \(a\). That suggests a primality test: if you can find an \(a\) such that \(a^m \not\equiv a\bmod m\), then you know \(m\) is not prime. For example, since \(29^{143}\equiv 35\bmod 143\), we know \(143\) is not prime. 

The set of \(a\) with \(a^m \equiv a\bmod m\) forms a subgroup of \(Z_m^*\) (the numbers \(b\) less than \(m\) with \(\gcd(b,m)=1\)), so if any \(a\) fails the test then at least half of them do. So you could just choose a random \(a\) and check whether \(a^m \equiv a\bmod m\). That works for every composite with some \(a\) such that \(a^m \not\equiv a\bmod m\). Alas, there is a relatively rare set of composites, known as Carmichael numbers, for which \(a^m \equiv a\bmod m\) for all \(a\) co-prime to \(m\). The first few Carmichael numbers are \(561\), \(1105\), \(1729\) and \(2465\). For example, \(29^{561}\equiv 29\bmod 561\).
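The Fermat check and its Carmichael blind spot fit in a few lines of Python (the function name is mine; Python's three-argument pow does the modular exponentiation):

```python
def fermat_witness(m, a):
    """True when base a proves m composite via Fermat's little theorem."""
    return pow(a, m, m) != a % m

assert fermat_witness(143, 29)      # 29^143 ≡ 35 (mod 143): 143 exposed as composite
assert not fermat_witness(561, 29)  # 561 is Carmichael: this coprime base reveals nothing
```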

So probabilistic primality tests need to account for these Carmichael numbers. Let's look at the Miller-Rabin test:

  • If \(m\) is even, output "prime" if \(m=2\), composite otherwise.
  • Pick \(s\) and odd \(d\) such that \(m-1=2^sd\).
  • Pick random \(a\) between 2 and \(m-2\). If gcd\((a,m) \neq 1\) return composite.
  • Let \(x = a^d\bmod m\)
  • Repeat s times
    • Let \(y=x^2\bmod m\)
    • If \(y=1\) and \(x\not\in\{1,m-1\}\) then composite.
    • Let \(x=y\)
  • If \(y=1\) output "probably prime" otherwise "composite"
Here's the kicker. If the Miller-Rabin test says "composite" before the last step, you'll get a factor. Note that \(y-1=x^2-1=(x-1)(x+1)\equiv 0\bmod m\). But since neither \(x-1\) nor \(x+1\) is a multiple of \(m\), the greatest common divisor of \(x-1\) and \(m\), as well as the gcd of \(x+1\) and \(m\), will give you a non-trivial factor of \(m\).

For example, if we start with \(m=561\) and \(a=29\), we have \(d=35\) and \(s=4\), \(29^{35}\bmod 561=164\), and the \(y\)'s will be \(529,463,67,1\). For \(x=67\), the gcd of \(x-1=66\) and \(561\) is \(33\), and the gcd of \(x+1=68\) and \(561\) is \(17\). \(561=33\times 17.\)

Finally, Carmichael numbers will never trigger "composite" in the last step, so on a Carmichael number the Miller-Rabin algorithm always finds a factor when it says "composite".
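The whole procedure, factor extraction included, is a short piece of Python (a sketch; the function name is mine, and pow(a, d, m) does the modular exponentiation):

```python
import math

def miller_rabin_with_factor(m, a):
    """One Miller-Rabin round on odd m > 2 with base a.

    Returns ("probably prime", None), ("composite", factor) when a
    nontrivial square root of 1 hands us a factor, or ("composite", None).
    """
    g = math.gcd(a, m)
    if g != 1:
        return "composite", g
    # Write m - 1 = 2^s * d with d odd.
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, m)
    for _ in range(s):
        y = pow(x, 2, m)
        if y == 1 and x not in (1, m - 1):
            # x^2 ≡ 1 (mod m) with x ≢ ±1, so gcd(x-1, m) is a nontrivial factor.
            return "composite", math.gcd(x - 1, m)
        x = y
    # After s squarings, x = a^(m-1) mod m: Fermat's test decides the rest.
    return ("probably prime", None) if x == 1 else ("composite", None)
```

Running it on the example above, miller_rabin_with_factor(561, 29) hits the nontrivial square root \(x=67\) and returns the factor \(33\), while a non-Carmichael composite like \(143\) with base \(29\) may be rejected at the last step without yielding a factor.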

In 2002, Agrawal, Kayal and Saxena gave the first provably polynomial-time algorithm for primality. As far as I know, AKS doesn't factor Carmichael numbers, and there isn't any known deterministic polynomial-time algorithm finding non-trivial factors of Carmichaels, though if the generalized Riemann hypothesis is true, you need only try the first \(O(\log^2 m)\) values of \(a\) in the Miller-Rabin test.

Sunday, November 16, 2025

Test of Time Awards: A Good Idea but ....

Since there is now a CCC Test-of-Time Award (see here; CCC stands for Computational Complexity Conference), I decided to look at other Test-of-Time awards in computer science.

Below is a list of various computer science Test-of-Time awards, along with their
eligibility requirements.

1) IEEE Visualization (VIS) Test of Time Award, see here.   VIS=Visualization.  Their 2023 award page says:

Papers are selected for each of the three historic conferences (VAST, InfoVis, and SciVis). ... 
This year VAST gave out a 10-year test of time award, InfoVis a 10- and 20-year award, and SciVis a 13, 14 and 25 year award.

2) SC Test of Time Award, see here. SC=Supercomputing Conference.  Their site says:

Papers appearing in the SC Program 10 to 25 years prior are eligible.

3) CIKM Test of Time Award, see here.   CIKM=Conf. on Information and Knowledge Management.  Their page states:

The Test of Time Award is given for CIKM papers published at least ten years ago.

4) ACM SIGCSE Test of Time Award, see here. SIGCSE = Special Interest Group on Computer Science Education.  ACM=Association for Computing Machinery. The name is from the era when computers looked like angry filing cabinets.  The ACM SIGCSE Test-of-Time page states:

The award recognizes an outstanding paper published in the SIGCSE community. 

 Note that this is broader than just their conference. Good!

5) LICS Test of Time Award, see here. LICS=Logic in Computer Science.  Their award page says:

The LICS Test-of-Time Award recognizes a small number of papers from the LICS proceedings from 20 years prior.

6) STOC Test of Time Award, see  here. STOC = Symposium on Theory of Computing.  Their page states:

There are three awards---papers presented at the STOC conferences in 10, 20, or 30 years ago [they later say there is some flexibility on that]. 

7) FOCS Test of Time Award, see  here. FOCS stands for Foundations of Computer Science.  Their page states:

The awards recognize papers published in the Proceedings of the Annual IEEE Symposium on FOCS. [From elsewhere on the page they have three categories: papers 10-years ago, 20-years-ago, and 30-years ago, but they have some flexibility with that.]

8) SIGecom Test of Time Award, see here. SIGecom = Special Interest Group---Economics and Computation.  Their page states:

...must have been first published in an archival journal or conference proceedings 10–25 years before. At least one author must not be dead. [A surprisingly nontrivial constraint in a Test-of-Time award.]

I think this is the only award on this page that stated the not-dead criteria explicitly.  Are people in Economics and Computation publishing papers into their 90's? While in hospice care?

9) NeurIPS Test of Time Award, see here.  NeurIPS = Neural Information Processing Systems.  I couldn’t find an eligibility description, so I asked ChatGPT, which confidently stated:

Yes—the paper must have been published in NeurIPS at least 10 years ago. 

If this is wrong, please let me know. ChatGPT is sometimes confident the way undergraduates are confident before seeing the homework solution.

10) CCC Test of Time Award, see here. CCC=Computational Complexity Conference. Their page states:

Eligible papers are those that appeared in CCC (or Structures, pre-1996) at least 10 years ago. 

------------------------------------------------------------------------
These awards raise some questions:

1) Why tie eligibility to a conference?  Suppose Abisola proves P ≠ NP and publishes it in Annals of Mathematics.  Under many of these rules, she would be ineligible for most theory Test-of-Time awards.

CON: Important results may not appear in the “right” conference or any conference at all.

PRO: None. Absolutely none.  Why is it done?  To publicize the conference and provide a nice ceremony venue.  These are not strong intellectual reasons. (I had a nastier comment about this but ChatGPT convinced me to be more polite.)

My proofreader brought up the following:

a) Who will give out the award? Answer: An organization like the ACM, and it DOES NOT need to go to an ACM conference.

b) How do we limit which papers are eligible? Any paper that appeared in a conference or journal (though see item 3 below). That seems like a rather large set of papers to consider; however, after 10 years we DO know which papers stood the test of time. As an example, when Lance made his best-theorems-of-the-decade lists, he did not pay attention to which venue a result appeared in.

2) Why the 10–25 years ago window?  I understand waiting 10 years—time needs time.  But why exclude older papers?  What if the paper Some Connections Between Bounded Query Classes and Nonuniform Complexity (STRUCTURES/CCC 1990) suddenly becomes important? Too bad: it aged out, like a carton of milk.

Are there examples of papers that became important many years after they were published?

I have one sort-of example: Google AI Overview says

The permanent of a matrix was first introduced independently by Cauchy in 1812 and later independently rediscovered by Binet in 1812. [The identical years make me suspicious, and also make the notion of ``later'' rather odd: did Cauchy do it in February and Binet in June?]

The permanent became much more important after Valiant, in 1979, showed that computing it is #P-complete. So perhaps Cauchy's paper should get a test-of-time award.
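For readers who haven't seen it, the permanent is the determinant's signless cousin: the same sum over permutations, with every sign +1. A brute-force sketch (the function name is mine; it runs in \(O(n!)\) time, which is rather the point of Valiant's result):

```python
from itertools import permutations
from math import prod

def permanent(M):
    """Naive permanent: the Leibniz determinant formula with every sign set to +1."""
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))
```

For [[1, 2], [3, 4]] this gives 1*4 + 2*3 = 10, where the determinant would give -2.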

Fun Fact: The Wikipedia entry on The Permanent (see here) does not mention Valiant, though there is a separate Wikipedia entry on Computing the Permanent (see here) that does.

3) Why require publication in a journal or conference at all?  Suppose Perelman proves P ≠ NP, posts it on arXiv, and never submits it anywhere.  Perelman did just that with his proof of the Poincaré Conjecture, and that was good enough for a Fields Medal.

This touches on the much broader, and increasingly relevant, question: What is the future role of journals and conferences in the age of arXiv?

4) (Bonus question): Is there any real difference between the STOC conference and the FOCS conference in terms of the scope of the papers?  Was there ever?  I would guess no and no. Maybe some of my older readers can tell me, unless they are too busy writing papers in Economics and Computation. 

5) Another proofreader pointed out that it would be a good idea to live up to the title of this post, Test of Time Awards: A Good Idea but, and say why they are a good idea instead of bitching and moaning about eligibility. Good Idea! Some thoughts:

a) They might encourage researchers to aim for deep contributions, not just fashionable ones. I doubt this is true, since I doubt authors think about which award they might win.

b) They reward long-term impact over short-term excitement. Is it much of a reward? How does it benefit the recipient? I ask non-rhetorically. 

c) They are an objective record of subjective opinion.  Useful for historians!