Google Analytics and Mathjax

Saturday, April 01, 2023

Who's on April First


Carlos May waving to the crowd on April 1, 1972

Instead of the usual April Fools’ Day post, I present one of the best April Fools Day stunts ever. Here’s the text from an old Parade Magazine clipping I dug up recently that was published on April 1, 1985.

When it comes to innovative and wacky ideas in baseball, Bill Veeck was a true legend. As a team owner and promoter, Veeck was known for his creative approach to the sport, from planting ivy on the walls at Wrigley Field to his famous "exploding scoreboard" at Comiskey Park. But did you know about the time Veeck pulled off an unforgettable April Fools' stunt by having the 1972 Chicago White Sox wear the names from the classic "Who's on First?" sketch?

It was April 1, 1972, and the Chicago White Sox were getting ready to play a game that would go down in history. Bill Veeck had decided to pay homage to the iconic comedy routine by Bud Abbott and Lou Costello, considered by many the greatest comedy sketch ever performed. For those unfamiliar with the sketch, it revolves around a series of misunderstandings based on the names of the players on a fictional baseball team. The names sound like common phrases, leading to a hilariously confusing conversation.

In Veeck's version of the stunt, the White Sox players would take the field with the names of the "Who's on First?" team on the back of their jerseys. The players, initially skeptical of the idea, eventually embraced the spirit of April Fools' Day and played along.

As the game commenced, fans were treated to a scene straight out of the Abbott and Costello routine. Instead of their usual names, the players' jerseys featured names like "Who," "What," "I Don't Know," "Why," "Because," "Tomorrow," and "Today." Here was the starting lineup:

  1. Who - First Base: Dick Allen
  2. What - Second Base: Mike Andrews
  3. I Don't Know - Third Base: Bill Melton
  4. Why - Left Field: Carlos May
  5. Because - Center Field: Ken Berry
  6. Abbott - Right Field: Jay Johnstone
  7. I Don't Care - Shortstop: Luis Aparicio
  8. Today - Catcher: Ed Herrmann
  9. Tomorrow - Pitcher: Wilbur Wood
The right fielder is never named in the sketch. Pat Kelly pinch hit for Johnstone in the 6th wearing “Costello”. 

The confusion was not only limited to the fans in the stadium. The opposing team and the umpires struggled to keep track of the game, often leading to comical misunderstandings on the field. For instance, the umpire might have shouted, "Who's out!" only to be met with the response, "No, Who's on first!"

Though some traditional baseball fans were initially taken aback by the stunt, the majority embraced the humor, making the game one of the most memorable in White Sox history. It was a testament to Veeck's genius that he could seamlessly blend comedy with the sport he loved.

The "Who's on First?" game became a cherished part of baseball lore and added to the legend of Bill Veeck. It demonstrated his willingness to think outside the box, engage fans, and remind everyone that, at its core, baseball should be a source of fun and entertainment.

The 1972 Chicago White Sox "Who's on First?" April Fools' Day game captured the spirit of Bill Veeck's inventive approach to baseball. As we celebrate April Fools' Day this year, let's remember the time when the White Sox took the field with the most confusing lineup in baseball history and showed us all that, sometimes, laughter truly is the best medicine.

Wednesday, March 29, 2023

Alan Turing, The Opera


Last Thursday I attended the world premier of The Life and Death(s) of Alan Turing, a new production from Chicago Opera Theater composed by Justine Chen with the libretto (text) from David Simpatico.

The opera takes a mostly chronological trip through his life in seven scenes, focusing less on the computer science and more on Turing's struggle with homosexuality and his prosecution. Turing does fiddle with an old computer throughout the opera. The opera ends with a different take on his death (spoiler alert), where he attempts to upload his consciousness to escape his chemically castrated body. 

Between the scenes, a chorus chants random numbers and words as they floated on a scrim, in what the composer calls "chat clouds". 

These “chat clouds” transport the listeners with a sonic approximation of internet chatter, filled with information that brings them to the next moment. The aural depiction of these "chat clouds" was inspired by the animated movies and television series of Masamune Shirow's cyberpunk manga Ghost in the Shell. Another sonic influence was Walt Disney’s fantastical Snow White, one of Alan’s greatest obsessions.

I found them reminiscent of the Philip Glass' knee play from Einstein on the Beach. I really enjoyed these sequences, though another academic I ran into during intermission felt otherwise.

Over all I enjoyed the music and the opera, particularly the courtroom scene where Turing gave an impassioned defense though it turns out all in his head. The cast did well across the board, especially Jonathan Michie who sang Turing. 

Librettist David Simpatico (wearing jacking), Composer Justine Chen (in gown)
conductor Lidiya Yankovskaya behind her and Jonathan Michie (in red robe)

One can't help compare this opera to The (R)evolution of Steve Jobs, that I saw in Atlanta last year. Both operas chart a metaphysical journey of a computing giant through various important moments of their lives. During a Q&A I asked Simpatico about the two and he said he purposely avoided the Jobs opera so as to not affect how he wrote this one. Probably for the best.

Sunday, March 26, 2023

The SIGACT Book Review column list of books it wants reviewed

I am posting this for Nick Tran who is the current SIGACT Book Review Editor (before him it was Fred Green for about 6 years, and before him it was me (Bill Gasarch) for 18 years. Nobody should have the job for more than 6 years. No TV show should to on more than 6 years. The 6-year rule probably has other applications.)

Nick asked me to post the list of books that need reviewing. It is most of this post. 

If you spot one you want to review then email him (email address later)  the name of the  book you want to review and your postal address so he can send it to you or have it sent to you. Here are his specs:

Reviews of recently published or bucket-list books of interest to the TCS community are welcome. Manuscripts (NOTE FROM BILL `manuscripts'? Really? Sounds like the kind of thing you would FAX or postal mail) should be between 3 and 6 pages and include a brief introduction, a detailed content summary, an assessment of the work, and a recommendation to the book's targeted audience. 

Nick's email is

The books are: 


Knebl, H. (2020).  Algorithms and Data Structures: Foundations and Probabilistic Methods for Design and Analysis. Springer.

Roughgarden, T. (2022). Algorithms Illuminated: Omnibus Edition. Cambridge University Press.


Amaral Turkman, M., Paulino, C., & Müller, P. (2019). Computational Bayesian Statistics: An Introduction (Institute of Mathematical Statistics Textbooks). Cambridge University Press.

Nakajima, S., Watanabe, K., & Sugiyama, M. (2019). Variational Bayesian Learning Theory. Cambridge University Press.

Hidary, J. D. (2021). Quantum Computing: An Applied Approach (2nd ed.). Springer.

Apt, K. R., & Hoare, T. (Eds.). (2022). Edsger Wybe Dijkstra: His Life, Work, and Legacy (ACM Books). Morgan & Claypool.

Burton, E., Goldsmith, J., Mattei, N., Siler, C., & Swiatek, S. (2023). Computing and Technology Ethics: Engaging through Science Fiction. The MIT Press.


O’Regan, G. (2020). Mathematics in Computing: An Accessible Guide to Historical, Foundational and Application Contexts. Springer Publishing.

Rosenberg, A. L., & Trystram, D. (2020). Understand Mathematics, Understand Computing: Discrete Mathematics That All Computing Students Should Know. Springer Publishing.

Liben-Nowell, D. (2022). Connecting Discrete Mathematics and Computer Science (2nd ed.). Cambridge University Press.


Oorschot, P. . C. (2020). Computer Security and the Internet: Tools and Jewels (Information Security and Cryptography). Springer.


Golumbic, M. C., & Sainte-Laguë, A. (2021). The Zeroth Book of Graph Theory: An Annotated Translation of Les Réseaux (ou Graphes)—André Sainte-Laguë (1926) (Lecture Notes in Mathematics). Springer.

Beineke, L., Golumbic, M., & Wilson, R. (Eds.). (2021). Topics in Algorithmic Graph Theory (Encyclopedia of Mathematics and its Applications).  Cambridge University Press.


Nielson, F., & Nielson, R. H. (2019). Formal Methods: An Appetizer. Springer.

Sanders, P., Mehlhorn, K., Dietzfelbinger, M., & Dementiev, R. (2019). Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox. Springer.


Kurgalin, S., & Borzunov, S. (2022). Algebra and Geometry with Python. Springer.


Thursday, March 23, 2023

A Strange Hiring Season

In my fall jobs post, I had trouble predicting this season's CS faculty job market but I didn't expect this:  strong supply and strong demand, something we haven't seen since the early '80s when the then young CS departments were ramping up.

We have a confluence of two forces that have both strengthened since my post back in November: The layoffs and hiring freezes at the major internet companies (Alphabet, Amazon, Apple, Meta and Microsoft) contrasted with major excitement in machine learning. I wrote the jobs post before ChatGPT was released and Amazon and Meta have since announced even more layoffs.

ML-mania is leading to very strong demand from undergrad and grad students for computing degrees. Across the US we have a record number of open faculty slots in computing as universities try to meet that need.

Meanwhile, PhDs who might have gone to industry are going on the academic job market instead. Also some tech researchers who have been laid off or spooked by the layoffs are considering academic jobs.

Between these two forces we will likely have a record number of faculty hires this year but we may see fewer senior people switching universities because good departments can fill their needs with junior faculty.

There is a mismatch of area. There is a big demand in CS departments to hire in machine learning because that is where the student interest is. ML is not where the big companies are cutting back. If you are on the market this year, position your research to make it relevant to learning, or at least that you are willing to teach ML courses.

By the way, this post is for computer science. Count yourself extremely lucky if you can get a tenure-track job in a non-computing field.

Sunday, March 19, 2023

New Upper Bound on R(k). WOW!

 R(k) is the least n such that for all 2-colorings of the edges of \(K_n\) there is a monochromatic \(K_k\)

(so there are k vertices such that the coloring restricted to the edges between those k vertices is constant)

Here is some history. If I miss a reference or a result, let me know in the comments

\(R(k) \le 2^{2k} = 4^k \) seems to be folklore and is a very old result. Proof could be shown to HS students. I have. Or 9 year old's who are good at math. I have. 

\(R(k)\le {2k-2 \choose k-1}\) which is approx \(\frac{4^k}{\sqrt k}\) was shown by Erdos and Szekeres in 1935. Proof could be shown to HS students. I have. Or 9 year old's who are good at math. I have. 

Thomson in 1988 got

$$R(k) \le \frac{4^k}{k^A\sqrt{\log k}}$$

for some A. This paper is behind paywalls so will be lost to history and I cannot comment on if the proof is easy or hard. (If you know of a link it, let me know and I will provide it). 

Conlon in 2006 got the denom to be super-poly. We omit the result but the paper is here. Proof used the next method of quasi-randomness. 

Sah  (his paper is here) optimized Conlon's technique to get 

$$R(k) \le 4^{k-c(\log k)^2}.$$

(According to the paper I will point to soon that has the major result, Sah's result is the best one can do with the Thomason-Conlon technique. In Complexity theory we may have formalized that and called it a barrier result.)

These results were interesting and used interesting techniques. However, the best known lower bound is  \(R(k) \ge 2^{k/2}\) (I've left out constants) so the REAL questions are

a) Can the 4 be replaced with a smaller number in the upper bound?

b) Is there a number a so that R(k) is essentially \(2^{ak}\)?  

We note in passing that the lower bound was proven by the Prob Method. That last statement obscures the history--- the Prob Method was invented by Erdos TO PROVE the lower bound. I've seen the prob method called an application of Ramsey Theory which is not quite right. Its more like Ramsey INSPIRED a technique that is used A LOT, including things that are actually practical. 

But  back to our story. SO, while all of the above results were interesting, were they making progress towards the REAL question? We can now say YES as progress HAS been made and DID use some of the techniques.

On March 16, 2023 Campos, Griffthis, Morris, Sahasrabudhe posted a paper on arxiv, here, that showed

$$R(k) \le (4-\epsilon)^k$$

 for some (quite small) epsilon. 

Here are some thoughts which are influenced by my email correspondence with Jacob Fox (an eminent combinatorist) on this topic.

1) I am NOT surprised that the result is true. 

2) I AM surprised that it's been proven. Lots of brilliant people have worked on this problem for many years so... .why now? Since its been open for around 70 years, I thought it would take another 70 years to crack.

3) Will this result lead to better results? I hope so!

4) Does R(k) have a nice upper bound? Not clear. The following is unlikely though something like it may be possible:

R(k) roughly\( (3.5)^k\) for k a power of a Fibonacci prime

R(k) roughly\( (3.8)^k \)othewise

5) (Jacob pointed me to this) There is a paper on Book-Ramsey Numbers that also (on page 2) notes a connection to Ramsey Numbers. The paper is here. A conjecture on Book-Ramsey will lead to a big improvement on Ramsey. Those kinds of results are odd since I can't tell if the upshot is

Lets work hard on Book-Ramsey so we can get better bounds on Ramsey!


Book-Ramsey is HARD so lets give up. (Who first said if at first you don't succeed, quit. Why make a damn fool of yourself? ?) 

6) The paper with the new result is 57 pages. It looks dense. I will wait for the movie.  I may have to wait a long time. 

Thursday, March 16, 2023

Identities in Computational Complexity

Guest post by Josh Grochow

On the birdsite, Jay Cummings tweeted:

And it got me thinking about identities in computational complexity. At first I thought: how many could there be? 10? After some thought and with some help, there turn out to be quite a few more than that, and I think they make a pretty good list!

Rules/comments on the list:

  • Some are relatively simple alternative characterizations, such as the various definitions of PH; most are pretty nontrivial theorems.
  • I didn't include existentially quantified oracle results (such as "there exists A such that PA=NPA"), despite them being some of my favorite results, because there'd be waaaay too many of those. Probably thousands? Ask Lance. In some circles he's known as the "oracle oracle" - as in, he's who you go ask if you want to know if a certain oracle exists. I did include identities involving oracles where one side of the identity didn't have an oracle, such as EXP=NPRKt or AlmostP=BPP (AlmostP is the class of languages L such that {A : L is in PA} has measure 1).
  • I also didn't include some important conditional equalities, such as "EXP in P/poly iff EXP=MA" or "NP in BPP iff NP=RP". I guess it's not really an "identity" if it's conditional. Games are only fun if the rules are at least a little constraining!
  • There are some surprising containments that either aren't equalities, or aren't known to be equalities, and I didn't include those either, despite some of them being very important. For example, BPP in Σ2P, other depth reduction results (such as the chasms at depth 4 and 3), and Beigel-Tarui/Yao.

One could teach a class based on a list like this and cover a lot of good ground, but I think it'd be sad to leave out any lower bounds. Actually, by my estimate, if you were teaching from "scratch" (say, starting after an undergrad algorithms class), this list, and all its implied prerequisite material, is already probably the equivalent of about 3-4 semester-long classes!

What other complexity identities did I miss?

Finally, without further ado, a list of (some) identities in computational complexity:


CLS=PPAD∩PLS [from Paul Goldberg @paulwgoldberg]

quasi-poly Frege =quasi-poly noncommutative formula IPS

Space classes




DET=LGapL (see Update 1)

Interactive Proofs

NP=PCP(O(log n), 3)


AM = MAM = AMAM = ... [From Albert Atserias @atserias]


Alternating Turing machines

Σk P = ∃ Pik-1 P = NPΣk-1 P = ΣkTIME(poly(n)) [from Lance]



Circuit classes within P


ACC0=branching programs over solvable monoids


SAC1 = LogCFL [from Michael Levet @Michael_Levet]

REG = DSPACE(O(1)) = NSPACE(O(1)) [from Levet]

REG=DTIME1tape(o(n log n))

ACk = logk time on a CRCW PRAM




QMAlog (1)=BQP [from Martin Schwarz @martin_schwarz]

QMAlog (poly)=QCMA [ditto] 

QAC0f = QNC0f [from Ale `Scinawa' Luongo @scinawa]

QMA(k) = QMA(2) for any k ≥ 2 [from Sarvagya Upadhyay @sarvagya82]

Algebraic complexity





For bilinear maps, tensor rank=Theta(algebraic circuit size)

Logical characterizations




P = FO + LFP on ordered structures [thanks to Lance, and Michael Levet]

Kolmogorov-random strings as oracles



P=COMP∩{dtt reducible to RKU}





  1. March 17th update, h/t Eric Allender: Using Cook's original definition of DET as problems NC1-reducible to the integer determinant, apparently this equality is not known! See Eric's recent guest column in SIGACT News for details and a $1000-prize related to this question, along with many other interesting open questions.

Monday, March 13, 2023

Problems we assume are hard. Are they? ALSO- revised version of Demaine-Gasarch-Hajiaghayi is posted!

 (The latest version of 

Computational Intractability: A Guide to Algorithmic Lower Bounds

by Demaine-Gasarch-Hajiaghayi is posted here. Its new and improved: (1) we have made all or most of the corrections send to us by proofreaders,  (2) there is a chapter on quantum computing, (3) there is an index.  Feel free to read it and send us corrections! 

This post is related to it in that most of the problems-assumed-hard mentioned in this post were the basis for chapters of the book.)

We think SAT is hard because (1) its NPC, and (2) many years of effort have failed to get it into P. 

Imagine a world where we didn't have the Cook-Levin Theorem. We may still think SAT is hard. We may even take it as a hardness assumption to prove other things hard by showing SAT \le A. We may also be curious if they are equivalent and have lots of reductions A \le SAT. The reductions might be Karp or Cook. 

You do not have to imagine this world! We already have it- in different contexts. In other areas of complexity theory there are problems that are assumed hard, but for which there is no analog of Cook-Levin. Are they hard? They seem to be- but the evidence is empirical. Not that there's anything wrong with that. 

I will list out problems that 

a) we assume are hard, for some definition of  we and hard.

b) we have NO proof of that and NO completeness or hardness result.  ADDED LATER- a commenter wanted me to clarify this.

 For SAT there is a well defined set of problems, NP, defined independent of any particular problem (that is, NP was NOT defined as all sets Karp-Red to TSP or anything of the sort) and by Cook-Levin we have 

if SAT is in P then NP is contained in P.

For FPT (the class the commenter was interested in) there IS the result

if weighed k-SAT is in FPT then W[1] is contained in FPT.

but W[1] is DEFINED as the set of problems FPT reducible to Weft-1 circuits (some books use a different basic problems) which IN MY OPINION is not a natural class. One may disagree with this of course. 

COUNTERAUGMENT: but W[1] Was defined as all problems FPT-reducible to Weft-1 circuits. And this is not the same 

(I do not include hardness assumptions for crypto because that's not quite the same thing. Also there are just so many of them!) 

I am sure I missed some- which is where YOU come in! Please leave comments with additional problems that I forgot (or perhaps did not know) to include. 

1) 1vs2 cycle: Given a graph that you are guaranteed is 1 cycle or the union of 2 cycles, determine which is the case. This is assumed to be hard to parallelize (we omit details of defining that formally).  This has been used for lower bounds in parallelism. See here.

2) 3SUM: Given x1,..,xn an array of integers, are there 3 that add to 0? There is an O(n^2) algorithm. The hardness assumption is that, for all epsilon, there is no O(n^{2-\epsilon}) algorithm. This assumption has been used to get lower bounds in comp. geom. See the Wikipedia entry here or the introduction of this paper here. (ADDED LATER, a 2-pager on some algorithms for 3SUM including some randomized ones: here. A commenter asked about randomized algorithms. Perhaps the conjecture should be that no randomized algorithm is sub quadratic.) 

4) APSP (All Pairs Shortest Path) Given a graph G, find for each pair of vertices  the length of the shortest path. There is an O(n^3) algorithm. The hardness assumption is that, for all epsilon, there is no O(n^{3-epsilon}) algorithm. This assumption has been used to get lower bounds on graph problems. For more details see the introduction of this paper: here

5)  Weighted-SAT-k: Given a Boolean formula (it can be taken to be in 2CNF form) is there a satisfying assignment that has exactly k of the variables set to TRUE. This is assumed to not be fixed parameter tractable (that is no function f such that this problem is in  O(f(k)n^{O(1)}) time). Problems that are FPT-equiv to it are called W[1]-complete and are all thought to not be in FPT. W[2], W[t] are also defined but we omit this. W[1]-complete has also been defined in other ways, but I can't seem to find a Cook-Levin type theorem for them. 

6) Graph Isom.  One of he few problems that are in NP but thought to not be NP-complete and, at least for now, is not in P. Babai has shown its in quasi-poly time (n^{(log n)^{O(1)}). There is a notion of GI-hard: problems that, if they are in P then GI is in P. See he Wikipedia entry here. Most of the GI-hard problems are variants of GI, e.g., graph isom. for directed graphs. GI could be in P without unbelievable consequences for complexity and without terrifying consequences for cryptography. 

7) Unique Games Conj: I won't define it formally here, see the Wikipedia entry here. From UGC you get several approximation results are optimal. Does that argue for UGC being true-- having consequences we believe? I would say YES since in some cases the algorithm that gives a good approx has an alpha-approx for some weird number alpha, and assuming UGC you get the SAME alpha as a lower bound. 

8) Existential  Theory of the Reals. Easier for you to read the Wikipedia entry here. It is used for problems that are inbetween NP and PSPACE. (ADDED LATER: One of the commenters says that Blum-Shub-Smale showed ETR is complete for the real version of  NP, so this item should not be on my list.) 

9) Orthogonal vector conjecture. See this paper: here. This is used to show problems are not in subquadratic time. 

Possible research directions and thoughts

a) Try to prove a Cook-Levin type theorem for one of these problems.

b) Build classes analogous to the poly-hiearchy on one of these problems.

c) Ask bounded-query questions. For example: Are k queries to 3SUM more powerful than k-1 (This is a VERY Gasarchian question.) 

d) Try to prove that one of these problems is actually hard. That seems hard. Perhaps on a weaker model (thats prob already been done for at least one of them.)

Monday, March 06, 2023

Peer Review

I ran into a partner of a computer scientist at a social event who asked me "Is the publication system in CS screwed up or really screwed up?" If you don't know my response you haven't been reading this blog long.

Today let's talk about peer review. Kevin McCurley and Adam Mastroianni have recent, not so positive, takes on this topic. 

Peer review came out of a system where we had limited slots in journals and, in computer science, conferences and we had to make tough decisions. Journals and conferences would gain a reputation based somewhat on how difficult it was to get papers published there.

Now we have basically unlimited space to publish your results. And you definitely should do so, posting your papers on your own webpage, and a paper archive site like arXiv or ECCC. The research community would flourish in a world where everyone posts their paper online for public comment, people can promote their favorite papers on social media and we have a TikTok-system for recommending papers to you.

So why do we still need paper review? Mostly because we have to review researchers for jobs and grants, and with the questioning the value of recommendation letters, publication quality and quantity has become a stronger proxy for measuring people for better or for worse.

First of all, peer review is a cornerstone of science. Would you rather have papers reviewed by faceless bureaucrats who know little about the subject area? Or papers only ranked by manipulable statistics like citations.  

But the way we apply peer review, to decide acceptances in conferences, just adds too much randomness to the system. CS conferences have multiplied and continue to get increased submissions as the field grows. It's just impossible to maintain any sort of uniformity in quality of acceptances. Or too often, we find conference committees and funding panels playing it safe rather than take risks with new research. With so much randomness, it's best to try many papers instead of focusing on a stronger result, leading to too much incremental research, especially in academia. 

For hiring, promotion, tenure and funding decisions, we too often rely on short cuts, such as the number of papers accepted to major conferences. Those who don't win the conference lottery get disillusioned and often leave academia for industry and no one wins.

Thursday, March 02, 2023

Goodbye Dilbert

Scott Adams, creator of Dilbert, had a racist rant in a video he posted last week. As a result most newspapers that carried the comic strip are dropping Dilbert, including our local Chicago Tribune. I fully support these moves. Much as I believe in separating the art from the artist, it's different when the artist is living and profiting from their art.

So we need to say to Dilbert, making the end of an era. Dilbert started in 1989 as a strip that captured the absurdities of the work place in an anonymous tech company, predating movies like Office Space and shows like Better Off Ted and Silicon Valley. I used Dilbert strips (with permission) in my book, namely this strip to introduce Kolmogorov complexity and this strip to describe my research area. Just call me Dan.

Farewell to Dilbert, Dogbert, Wally, Alice, Asok, the pointy-haired boss and the rest. I won't miss Scott Adams, but I will miss his creations.

Monday, February 27, 2023

I wish we had less students in a Class. Demographics says I may get my wish.

 According to this article, in the near future LESS people will be going to college. There is even a name for this upcoming shift: The Enrollment Cliff. Why?

Is it Covid-related?  Is it that College has gotten to expensive? To liberal? To much cancel culture?  To many dead white males in the core? The core is to multicultural? Online learning is stealing our students? 

No. The reason is actually very boring and does not serve anyone's political agenda. (thats not quite right).  Or any agenda. And you can probably guess the cause from the title of this blog post.

For some years up until 2007 the birth rate was slowly dropping. Then there was a large drop in the birth rate after the recession of 2007, and the birth rate has never really recovered. And the recession might not have that much to do with it-- the long term move from an agricultural society (where kids are an economic gain) to an industrial one (where, after child labor laws and the expense of college, kids are an economic loss- though that can be debated) has resulted in a very long term decline in births. 

And from personal experience, I know (a) very few people who have 4 or more kids, (b) there is NO stigma about having 0 kids as there once was.  Of course the sample size of people I know may be skewed. 

ANYWAY, what will this mean for colleges? 

a) Harvard, Yale, etc will not be affected. Plenty of people will still apply. Note that they draw from all of American and also internationally. 

b) Colleges that draw from a local area may be affected a lot since they depend on locals, and that population may be shrinking. 

c) Schools in between Harvard and Small colleges- hard to say. 

d) The sports betting places paying schools to allow them to promote on campus (and in some cases helping them promote it) may find far less students to sucker into this loser's game. See my blog on this topic here

Univ of MD has around 4000 Computer Science majors (depending on who tells you this its either a brag or a complaint). In the Spring of 2023 there are three lectures of Discrete math of sizes 240, 270, and 90. Each of those also has recitations of  30 (or so) each. If the decline is gradual (either from demographics or from the CS majors bubble finally bursting, or from the other reasons above) then I am sure we can handle it. If it declines very suddenly we may have a problem adjusting. 

One caveat to this that I've heard is that immigration will save us. Maybe. But America is politically going in the opposite direction. The counterargument of without immigration there will be less students going to college is not that compelling to most Americans. There are other more intelligent and compelling pro-immigration arguments. However, American politics is no longer interested in compelling and logical arguments. (The notion that it once was may be nostalgia for a time that never was.) 

Thursday, February 23, 2023

The Virtual Grad Student

Martin Haug, who is working on a LaTeX alternative Typst, asked me if I had updates on a LaTeX rant from 2011. I haven't seen any new serious backward compatibility problems. We have easier collaboration through on-line editors like Overleaf. We have got closer to WSYWIG thanks to quick compiling but still not at the level of Word or Google Docs. The big problem of user friendliness remains. There's a reason LaTeX has its own Stack Exchange

But we live in a new machine learning world. Can we use generative AI to make LaTeX easier to use?

Mandatory Disclaimer: Generative AI can sometimes create inaccurate, inappropriate or previously-published material. You are ultimately responsible for the contents of your paper no matter how you produced it.

Since I sometimes think of LaTeX as a programming language for papers, I tweeted

Thanks for the responses. The answer to the question is yes, GitHub Copilot works for LaTeX if you edit LaTeX in a programming environment like VS Code, Neovim or Jet Brains. It helps with formatting of formulas and pictures, less so on the text itself. I made a video so you can see how it works.

Latext AI offers a chrome extension that will let you generate text via GPT in Overleaf based on a prompt or previous text, though Latext requires a subscription after a one-week trial. You can also just cut and paste between any text editor and ChatGPT.

ChatGPT notoriously makes up references if you ask for them. Can we have a good system that finds relevant articles to cite and adds them automatically into your bibliography?

Ideally all these should work together seamlessly, suggestions that happen as you type. A true co-pilot for research papers.

There are many more tools out there, feel free to add them to the comments. I expect the integration to improve over time as we develop new APIs and models.

I look forward to the days of a virtual grad student: Here's a research goal and an idea to get there. Now go figure out the details and write the paper. 

It will be a long wait.

Sunday, February 19, 2023

It is more important than ever to teach your students probability (even non-stem students)

(This topic was also covered here.) 

You are a college president. An online betting company says  We will give you X dollars if you allow us to promote online gambling at your University.

I suspect you would say NO.

Too late- it's already happening. A link to a NY times article about this is: here. I urge you to read the entire article. It's worse than it sounds. 

My thoughts

0) I wondered if  a company needed permission to promote a product on a campus. I am not sure of the answer; however, in some cases a school HELPED with the promotion: 

a) During a game there are announcements reminding students that they can place a sports bet! It's easy! It's fun!

b) Links on the schools website to sports gambling sites

c) References to sports betting in emails that goto students.

This is WAY BEYOND  allowing a company to promote.

1) Some points from the article 

Some aspects of the deals also appear to violate the gambling industry's own rules against marketing to underage people. The ``Responsible Marketing Code'' published by the American Gaming Association, the umbrella group for the industry, says sports betting should not be advertised on college campuses. 

``We are not seeing enough oversight, transparency, and education to support the rollout of these kinds of deals'' said Michael Goldman who teaches sports marketing at the Univ of San. Fran. 

During the pandemic, many universities struggled financially ...To fill those holes public and private universities nationwide have been struggling to line up new revenue sources, including by arranging sponsorship deals. (MY THOUGHTS- They don't quite say it, but it seems like the extra money is going back to sports programs. I would be happier if it went into academics- and to be fair, maybe some of it does.) 

2) Online gambling is more addictive than in-person gambling. And it's easier since you don't have to leave your dorm room to do it. 

3) The school gets money and  teaches the students that everything is for sale. So it's a win-win (I am kidding.) 

4) Should a college take  money to allow the promotion of tobacco or alcohol or (if it becomes legal) heroin? I see NO difference between those and online gambling. (See here)

5) I am in favor of all of those things being legal (maybe not heroin but I am open to debate on that)  however, there is a big difference between making something legal, and promoting it.  

6) Silver Lining: This may encourage more students, even non-STEM students, to learn probability. Either advertise it honestly:

Take Probability to find out that Sports Betting is a Loser's Game

Or advertise it dishonestly

Take Probability to find out how you can win at Sports Betting!


Thursday, February 16, 2023

Blurry JPEG or Frozen Concentrate

Ted Chiang in a recent New Yorker article likened ChatGPT to a blurry JPEG, i.e. a "lossy compression" of the web. It's a good article but the analogy isn't quite right, there's a different kind of compression happening. Think of all human written knowledge as a random example of what could have been generated and we remove the randomness, like water is removed to make concentrated orange juice. We then add water (or randomness) to get back some version of the original. 

Lossless compression, like gzip, gives a compressed version of some data with the ability to reconstruct it exactly. It corresponds nicely to Kolmogorov complexity where K(x) is the smallest program p that generates the string x. p is a lossless compression of x.

Lossy compression, like JPEG, often allows much higher compression but with some error. In Kolmogorov terms you are trading off the size of the program p and some error function between x and the output of p. Most compression programs for pictures, music and video use algorithms designed for the specific medium. You can also use machine learning to get lossy compression by training both the compression and decompression algorithms.

Lossy compression tries to recreate the original picture. Generative AI, like ChatGPT, takes a different approach. Let's consider Wikipedia as this is the example used by Chiang. For any specific topic, there are many different ways to write a Wikipedia article, as good as or better than the article that currently exists. ChatGPT doesn't need to recreate anything close to the original article, just one that explains topic well. What we want is a description of a program p that corresponds to a set of possible Wikipedia articles, of which the real article is a random example of this set. An ideal version of ChatGPT would choose a random article from this set. Dall-E, generative AI for art, works a similar way, creating art that is a random example of what art might have been. 

In terms of Kolmogorov complexity, this corresponds to the Kolmogorov Structure Function, basically the smallest program p such that p describes a set S of size m that contains x. with |p| + log m ≈ K(x). The string x is just a random element of S, you can get a string like it by picking an element of S at random.

There is no recursive algorithm that will find p and we also need to limit ourselves to p that are computationally efficient, which means that generative AI algorithms may never be ideal and will sometimes make mistakes. That doesn't mean we shouldn't use them just that we need to be wary of their limitations. As the saying goes "All models are wrong, but some are useful".

Sunday, February 12, 2023

When is a paper `Easily Available' ?

I was looking at the paper 

                                PSPACE-Completeness of reversible deterministic systems

by Erik Demaine, Robert Hearn,  Dylan Hendrickson, and Jayson Lynch (see here) and came across the following fascinating result which I paraphrase:

The problem of, given balls on a pool table (though it can be one you devise which is not the standard one) and each balls initial position and velocity, and a particular ball and place, it is PSPACE complete to determine if that ball ever gets to that place. 

Demaine et al. stated that this was proven by Edward Fredkin and Tommaso Toffoli in 1982 (see here for a link to the 1982 paper, not behind a paywall). Demaine et al. gave an easier proof with some nice properties. (Just in case the link goes away I downloaded the paper to my files and you can find it here.) 

I needed the bib reference for the FT-1982 paper and rather than copy it from Demaine et al. I wanted to cut-and-paste, so I looked for it in DBLP. I didn't find the 1982 paper but I did find a book from 2002 that reprinted it. The book, Collision-based computing, has a website here. The book itself is behind a paywall.

On the website is the following curious statement:

[This book] Gives a state-of-the-art overview of an emerging topic, on which there is little published literature at the moment. [The book] Includes 2 classic paper, both of which are widely referred to but are NOT EASILY AVAILABLE (E. Fredkin and T. Toffoli: Conservative Logic, and N . Margolous Physics-Like Models of Computation). 

The caps are mine.

Not easily available? I found a link in less than a minute, and I used it above when I pointed to the paper. 

But the book IS behind a paywall. 

Perhaps Springer does not know that the article is easily available. That would be odd since the place I found the article is also a Springer website. 

The notion of EASILY AVAILABLE is very odd. While not quite related, it reminds me of when MIT Press had to pay a few thousand dollars for permission (that might not be the legal term) to reprint Turing's 1936 paper where he defined Turing Machines (he didn't call them that), which is on line here (and other places), for Harry Lewis's book Ideas that created the future. 

Thursday, February 09, 2023

Why Can't Little Chatty Do Math?

Despite OpenAI's claim that ChatGPT has improved mathematical capabilities, we don't get far multiplying large numbers.

L:What is 866739766 * 745762645?  C:647733560997969470

Typical for ChatGPT, the answer passes the smell test. It has the right number of digits and has correct first and last couple of digits. But the real answer is 646382140418841070,  quite different from the number given.

As far as I know, multiplication isn't known to be in TC0, the complexity class that roughly corresponds to neural nets. [Note Added: Multiplication is in TC0. See comments.] Also functions learned by deep learning can often be inverted by deep learning. So if AI can learn how to multiply, it might also learn how to factor. 

But what about addition? Addition is known to be in TC0 and ChatGPT performs better.

The correct answer is 1612502411, only one digit off but still wrong. The TC0 algorithm needs to do some tricks for carry lookahead that is probably hard to learn. Addition is easier if you work from right to left, but ChatGPT has trouble reversing numbers. There's a limit to its self-attention.

ChatGPT can't multiply but it does know how to write a program to multiply.

It still claims the result will be the same as before. Running the program gives the correct answer 646382140418841070. 

ChatGPT is run on a general purpose computer, so one could hope a later version that could determine when its given a math question, write a program and run it. That's probably too dangerous--we would want to avoid a code injection vulnerability. But still it could use an API to WolframAlpha or some other math engine. Or a chess engine to play chess. Etc. 

Monday, February 06, 2023

After you are notified that an article is accepted...

 After just one round of referees reports

(they send me the reports, I made the corrections, they were happy) I got email saying my paper on proving the primes are infinite FROM Schur's theorem in Ramsey was ACCEPTED. Yeah! Now What? 

1) The journal send me an email with a link GOOD FOR ONLY FIFTY DAYS to help me publicize the article. Here is the link:,H-cWw6X

Will this really help? The article is already on arxiv. (ADDED LATER: the link on arxiv is here.)  Also, I can blog about it, but how do non-bloggers publicize their work? Do they need to? 

(ADDED LATER: A commenter wanted to know why I am publishing in an Elsevier journal. This was a memorial issue in honor of Landon Rabern (see here) a combinatorist who died young.  I was invited to submit an article.) 

QUESTION: Is this common practice? If so, what do you do with those links? Email them to all of your the people who should care about the article?

2) I got some forms to fill out that asked how many offprints I wanted. While my readers can probably guess what that means, I will remind you: paper copies of the article. I filled out the form:

I want 0 of them.

They still wanted to know the address to send the 0 copies to, so I gave that as well.

Does anyone actually get offprints anymore? That seems so 1990's. With everything on the web I tend to email people who want article pointers. In fact, that happens rarely - either nobody wants to read my articles (quite possible) or they find them on my website (quite possible). 

In 1991 when I went up for tenure the dept wanted 15 copies of every article I wrote so they could send my letter writers (and others) all my stuff. Rumor is that the Governor of Maryland got a copy of every article I ever wrote. I hoped he was a fan of oracle constructions. 

In 1998 when I went up for full prof they did not do this, assuming that the letter writers could find what the needed on the web. I do wonder about that- it might have been a nice courtesy to send them stuff directly and that would be a use for offprints. Depends on if my letter writers prefer reading online or on paper. They could of course print out my papers, but again- as a courtesy perhaps we should have supplied the papers. 

QUESTION: Do you order a non-zero number of offprints and if so why? 

3) The journal offered to have my article to be open access at their site for a price. I did not do this as, again, the article is already on arxiv.

QUESTION: Is there a reason to have your article formally open-access given that its already on arixv? 

4) One of my co-authors on a different article asked me When will it appear IN PRINT?  I can't imagine caring about that. Its already on arxiv and I doubt having it in a journal behind paywalls will increase its visibility AT ALL. The only reason to care about when it appears IN PRINT is so I can update my resume from TO APPEAR to the actual volume and number and year. 

QUESTION: Aside from updating your resume do you care when an article that was accepted appears IN PRINT? And if so why? 

Thursday, February 02, 2023


Nature laid out their ground rules for large language models like ChatGPT including

No LLM tool will be accepted as a credited author on a research paper. That is because any attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.

Let's focus on the last word "responsibility". What does that mean for an author? It means we can hold an author, or set of authors, responsible for any issues in the paper such as

  • The proofs, calculations, code, formulas, measurements, statistics, and other details of the research.
  • Any interpretation or conclusions made in the article
  • Properly citing related work, especially work that calls into question the novelty of this research
  • The article does not contain text identical or very similar to previous work.
  • Anything else described in the article.
The authors should take reasonable measures to ensure that a paper is free from any issues above. Nobody is perfect and if you make a mistake in a paper, you should, as with all mistakes, take responsibility and acknowledge the problems, do everything you can to rectify the issues, such as publishing a corrigendum if needed, and work to ensure you won't make similar mistakes in the future.

Mistakes can arise outside of an author's actions. Perhaps a computer chip makes faulty calculations, you relied on a faulty theorem in another paper, your main result appeared in a paper fifteen years ago in an obscure journal, a LaTeX package for the journal created some mistakes in the formulas or a student who helped with the research or exposition took a lazy way out, or you put too much trust in AI generative text. Nevertheless the responsibility remains with the authors. 

Could an AI ever take responsibility for an academic paper? Would a toaster ever take responsibility for burning my breakfast?


Tuesday, January 31, 2023

Why does pi come up so often? I don't know either but ...

 Here is how history DID unfold:

1) People noticed that the ratio of the circumference to the diameter of ANY circle is always the same, it's a number between 3 and 4, roughly 3.14 or as my Dad would say, exactly 22/7 (see this blog post). On the web (see here) is a claim that Euler first used the symbol pi since it is the first letter of a Greek word for Circumference. I've seen other sites that claim someone less famous and its the first letter of a Greek work for Perimeter

But in any case pi was known (though not called that) a LONG time ago.

2) MUCH LATER people noticed

1/1^2 + 1/2^2 + 1/3^2 + ... = pi^2/6 (Euler showed this in 1735, see here. That site says that it was Basel's problem to determine the sum. Yeah for Basel--- his name lives on even after his problem was solved! This did not happen for Baudet and Vazsonyi. If you don't know what they conjectured--- well, that proves my point. ADDED LATER: commenter Robert pointed out that Basel is NOT a person's name but a cities name. I am delighted to know that!) 

1 - 1/3 + 1/5 - 1/7 + 1/9 ... = pi/4 (Leibniz showed this in 1676, see here. A good Quora site on that sum is here.)

(There are other series where pi comes up as well.) 

3) People wonder Why did pi, which has to do with CIRCLES, end up in INFINITE SERIES?

What if history unfolded the other way: 

1) People noticed

1/1^2 + 1/2^2 + 1/3^2 + ... =a^2/6 (Euler did that in 1735, see here.) 

1 - 1/3 + 1/5 - 1/7 + 1/9 ... = b/4 (Leibniz showed this, see here. A good Quora site on that sum is here.)

and they noticce that a=b and is between 3 and 4, closer to 3. They decide to call it pi for no good reason. 

(There are other series where pi comes up as well.) 

2) MUCH LATER people noticed that this same funny constant pi was the ratio of circumference to diameter in any circle. 

3) People wonder Why did pi, which has to do with INFINITE SERIES, end up in CIRCLES?

The following comic captures the dichotomy: here

Thursday, January 26, 2023

Back to the 90's

The current state of computer science reminds me of the early excitement of the Internet in the mid-90's. By the beginning of the 90's, computers landed in many homes and most offices. Computers had modems, connected through the phone lines. Various online service networks, CompuServe, Prodigy, AOL, MSN, started up which provided some basic information and communication but were mostly walled gardens. 

But then came the mosaic web browser in 1993. There weren't many web pages and they looked awful. There is a special place in hell for whomever invented the <blink> tag. But the ability to link to other pages, local or anywhere else on the web, was a game changer. I quickly created a web page so people could download my papers, to try it out and because I got tired of responding to email requests for them. People experimented with all different kinds of things on the web. Companies tried to figure out what to put on web pages. The online service networks reluctantly put browsers on their sites.

In the mid-90's the Internet was exciting but messy. Search engines would brag about the number of pages they searched but the ranking lacked relevance. Every CS job talk in every subarea, including theory, focused on the Internet. VC money flowed into Internet-related companies no matter how silly. It wasn't until Google using the PageRank algorithm gave us a good search engine, the dot-com bust drove out the bad companies and cloud computing gave us good platforms that we got to the Internet we have today, for better or for worse.

We're at that messy stage with machine learning. We can see the game-changing potential of the technology but far too many problems limit our ability to use them. VC money flows into new ML startups while our traditional Internet companies are shedding jobs. Will the transformers paper be this generation's PageRank or do we need another trick to take the next step? If the Internet is any guide, we'll get past this early stage, the market will shake out, only to develop even more challenges down the line.

Sunday, January 22, 2023

The Betty White Award for 2022

In Dec 2021 I noted in this post, which was my 1000th post ever (according to Ken Regan, see here) that Betty White had the misfortune of dying on Dec 31, 2021, so AFTER the People we said goodbye to in 2021 articles had already appeared. 

That might not be quite right since when she died it was Jan 1 SOMEWHERE in the world. I learned a new phrase- AWE- AnyWhere on Earth.  But no, she was not mentioned in the People we said goodbye to in 2022 articles.

The Betty White award goes to a celebrity that had the same fate- dying too late in the year to be mentioned in those articles. I had not thought of what criteria I would use if there is more than one option, and indeed, this year there are three candidates that I know of. 

Pele was the greatest soccer player of all time.  Died Dec 29, 2022, at the age of 82. 

Barbara Walters was a famous broadcast journalist. Died Dec 30, 2022, at the age of 93.

Pope Emeritus Benedict  was a prior Pope (obviously). Died Dec 31, 2022 at the age of 95. He died at 9:34AM Vatican Time. I do not think he died on Jan 1, 2023 AWE though that would not disqualify him since he surely will not be in the 2023 People we say goodbye to in 2022 articles.

So which one should get the award?

The term famous means famous when they died. They were all much more famous some years ago than they are now. I am using famous when they died as a criteria. 

a) I am not sure who is more famous internationally- Pele or Pope Emeritus Benedict. Walters is not famous internationally. 

b) Walters is more famous in America. I think Benedict is second, though its hard to tell.  Frankly none of the three are that famous in America. Walters was at one time. Fame is fleeting!

c) Benedict died older and later in the year.

d) Of the three, one was prominent in one of worlds largest religions. The others were a broadcast journalist and a former Pope. 

Rather than try to find a well defined criteria, I will give it the Betty White award to all three of them.

ADDED LATER: Lance has tweeted a poll so you can vote on who you think should have won the ward. The poll has you vote for one of the three, so you can't vote for two of the three, or (as I would have done) vote for all three. 

Note that  Martin Davis avoided being considered for the award since he died on Jan 1, 2023. 

Thursday, January 19, 2023


I'm sure many of you long-time readers are asking, "Why all this big focus on machine learning in your posts and tweets? You are the 'Computational Complexity' blog! You've barely said a word about meta-complexity."

So what is meta-complexity? From what I can tell the term goes back a few years but really came into wide use in computational complexity in the past year. The Computational Complexity Conference held an invited talk on meta-complexity by Rahul Santhanam, and the Simons Institute is hosting a research program this spring on the topic.

As the name suggests, meta-complexity studies the complexity of computing the complexity of various problems. It's a term that encompasses recent research into the Minimum Circuit Value Problem (given the truth-table of a Boolean function, find the size of the smallest circuit that computes it) and the complexity of time-bounded Kolmogorov complexity. 

To quote from the Simons page

Meta-complexity provides a unifying framework for a variety of important tasks in several important areas of computer science, including computational complexity, proof complexity, cryptography, and learning theory. These areas are all intimately linked, but only recently are these links being made explicit and studied more closely. For example, learning can be interpreted as solving search versions of the Minimum Circuit Size Problem and related problems. Basing primitives such as one-way functions and indistinguishability obfuscation on standard complexity assumptions is one of the main objectives in theoretical cryptography. Important recent directions involving meta-complexity within proof complexity, such as lifting and automatability, strengthen analogies and connections between proof complexity and circuit complexity. In addition, independence results such as the natural proofs framework have intuitive interpretations in terms of meta-complexity. These connections have led to several recent breakthroughs, including quasi-polynomial time PAC-learning algorithms for constant-depth circuits with parity gates, new worst-case to average-case reductions for NP problems, a new complexity-theoretic characterization of one-way functions, and the NP-hardness of automating resolution.

Not to mention the theorem of the year, Shuichi Hirahara's proof that determining the minimum circuit of a partially specified function is NP-complete. 

When you get down to it meta-complexity is all about learning, determining the complexity of finding programs. You cannot escape it.

To dive deeper into meta-complexity check out the videos of the Simons meta-complexity bootcamp. 

Monday, January 16, 2023

ChatGPT tried to write an obit for Martin Davis. What it got right, wrong, and what to make of it.

When Martin Davis passed away Lance emailed me what he got from using ChatGPT to do an obit. Here it is and I also note what it got wrong.


Born in 1928 in Brooklyn, New York, Davis received his bachelor's degree from Brooklyn College in 1950 [WRONG-1948] and his Ph.D. from the University of Chicago in 1953 [WRONG- it was Princeton in 1950].  He went on to have a distinguished career, with positions at the Institute for Advanced Study, the University of California, Berkeley, and New York University, where he spent the majority of his career.

Davis made significant contributions to the field of mathematics, particularly in the areas of computational complexity theory, recursive function theory, and mathematical logic. His work on the Davis-Putnam-Robinson-Matiyasevich algorithm [ADDED LATER BASED ON A COMMENT: WRONG-The DPRM theorem is that there is NO algorithm. It is not an algorithm.] helped establish the connection between logic and computers [WRONG- THE CONNECTION BETWEEN LOGIC AND COMPUTERS WAS ALREADY WELL KNOWN. CONNECTION BETWEEN COMPUTABILITY AND POLYNOMIALS WOULD HAVE BEEN BETTER] and has had major implications for the field of computer science.

In addition to his academic pursuits, Davis was also a dedicated teacher and mentor. He was a popular and engaging speaker, known for his wit and clarity in explaining complex mathematical concepts. He inspired and encouraged countless students throughout his career and left a lasting impact on the mathematical community.

Davis was the recipient of many prestigious awards and honors, including the ACM Turing Award [WRONG- DAVIS NEVER WON A TURING AWARD], the highest honor in computer science, in 1994. He was also a member of the National Academy of Sciences and the American Academy of Arts and Sciences.  [I WAS ABLE TO VERIFY THAT HE WAS IN THE AMERICAN ACAD. I WAS NOT ABLE TO VERIFY THAT HE WAS IN THE NATIONAL ACAD, BUT I DO NOT KNOW.]

Martin Davis will be greatly missed by his colleagues, students, and friends.  His contributions to mathematics and computer science will not be forgotten and will continue to inspire future generations of mathematicians and scientists.


This raises some questions

a) It got some factual things wrong. This surprised me. Its also a cautionary note: when you using ChatGPT for a first draft you need to check factual information. 

b) I was going to take the obit and modify it for my post. I found that I could not do this, not because of the mistakes in it, but because its not what I wanted to emphasize. See my obit post here.

c) Will this make bloggers jobs  and other jobs easier by just posting modified versions of things ChatGPT outputs? This is both a hope (less work) and a fear (no work). Options: 

c1) Much like farming, there will be a transition period where people are out of work but in the long term its a good thing that instead of 90% of people working on farms its 3%. So people transitioned (or the next generation did) to other jobs. (One could argue if this is really a good thing, though I've heard farming was VERY HARD WORK so people are happier not being farmers.)

c2) The jobs that are going away, there will NOT be replacement jobs and we are looking at an economic and psychological catastrophe. 

d) Even before ChatGPT I had heard of using a program to output a sports story about little league games for very local papers. 

e) The obvious fear: will students have ChatGPT produce their papers for them? In its current state the students would have to modify the paper. A few ways this could go:

e1) Analog to calculators: Students used to have to memorize multiplication up to 20x20 but now we let t hem use calculators. Students used to have to write papers, now we let them use ChatGPT and modify. This may be forced on us as opposed to something we want to do.

e2) There will be a fierce fight where teachers what students to NOT use ChatGPT but its hard to stop them. 

e3) ChatGPT will get much better. Even so, there will still be some things its bad at. Students won't know which is which, or won't care.

e4) I am tempted to say it won't affect math but I think it might for, say, standard proofs by induction, standard calculations from calculus (we already have programs that can differentiate and integrate- has that affected how Calculus is taught or graded?). 

Thursday, January 12, 2023

Semantic Search for the Blog

 As Google started to limit academic storage, I started looking at Google Takeout and started wondering what I could do with all that data. I downloaded all the posts from the blog, since we use Google's blogger, and ran them through OpenAI's Ada Embedding. The Ada embedding maps text up to 8192 words into a point on the 1536-dimensional unit sphere. You can measure the similarity between two embeddings via a simple dot product, giving you the cosine of the angle between them.

So I created a semantic search for the blog. Go ahead and try it out.

Search for  

You can enter a search term, phrase, or the full URL (including https) of a blog post. It will return a list of the 5 closest posts, with the percentage match, computed as the square of the cosine. I don't have a mechanism for automatically updating the files, so you'll only see posts from 2022 and earlier.

This was an Open AI-assisted affair, as I used ChatGPT and GitHub co-pilot to help with the python and pandas data frames. It took me longer to figure out how to create a web application so you can try the search. Similarity match doesn't work like normal searches, for example if you search for a city like "Detroit", you'll get posts that mention other cities. Some other oddities, like "mad" seems to match "Madhu". It probably says something about me that my most happy post is not about some great new theorem but about baseball.

Monday, January 09, 2023

Martin Davis Passed Away on Jan 1, 2023

As you probably already know from other sources, Martin Davis passed away on Jan 1, 2023, at the age of 94. His wife Virginia died a few hours later.

He majored in math at Brooklyn College and graduated in 1948.

He got his PhD under Alonzo Church at the Princeton in 1950.

Two years for a PhD seems fast!

His Phd was on Recursion theory.

In it he conjectured that Hilbert's tenth problem (below) is undecidable.

He is known for the following (if you know more, please leave a comment).

1) Hilbert's Tenth Problem.

Hilbert posed 23 problems in the year 1900 for mathematicians to work on

over the next 100 years. Hilbert's tenth problem, in modern terminology, was

Find an algorithm that will, given a polynomial p \in Z[x_1,...,x_n],

determine it has a Diophantine solution (that is, a_1,...,a_n\in Z

such that  p(a_1,...,a_n)=0).

 In Hilbert's article he did say in a preface to all of the problems. Here is the exact quote:

Occasionally it happens that we seek the solution under insufficient hypotheses or in an incorrect sense, and for this reason do not succeed. The problem then arises: to show the impossibility of the solution under the given hypotheses or in the sense contemplated.

Hilbert had hoped that this problem would lead to deep results in number theory. And it has to some extend. However this went from being a problem in number theory to a problem in logic. That might not be quite right: the result did use number theory.

In 1961 Davis-Putnam-Robinson showed that the problem is undecidable IF you also allow exponentials.  This may have been a turning point for the conventional wisdom to shift from `Probably Solvable' to `Probably Unsolvable.'

Martin Davis predicted that the H10 would be shown undecidable by a young Russian by the end of the decade. He was correct. Yuri Matiyasevich did indeed prove H10 undecidable in 1970. By all account Davis was delighted. When the result is cited usually all four people are credited which is as it should be.  He wrote an excellent exposition of the complete proof from soup to nuts in: 

Hilbert's tenth problem is unsolvable, American Math Monthly, Volume 80, No 4, 233-269.

When I first heard of this result I assumed that the number of variables and the degree to get undecidability was huge. I was wrong.  I wrote a survey of H10 emphasizing what happens if you bound the degree and the number of variables, see here

2) SAT Solvers. Davis-Putnam-Logemann-Loveland outlined a SAT Solver, or really a class os SAT Solvers. While I doubt it was the first SAT Solver, it was an early one that tried to cut down on the time needed.

3) He wrote the following books:

Computability and Unsolvability (1958, reprinted 1982)

Applied Non-Standard Analysis (1977, reprinted 2014)

Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science (1994 (with Elaine Weyuker)

The Universal Computer: The Road from Leibniz to Turing (2000, reprinted as

Engines of Logic: Mathematicians and the origin of the computer)

The Undecidable: Basic Paper on undecidable propositions, unsolvable problems and computable functions (2004)

4) He was a recursion theorist who could actually program a Turing Machine to really do things. There are still some people who do that- getting the UTM down to a small number of states, and the work (the Wolfram Challenge), and the Busy Beaver Function (see Scott Aaronson's open problems column here,

but I think this is done less by academic recursion theorists than it used to be. I do not have my students in Automata Theory write any TM code. Clyde Kruskal, who had that same course from Martin Davis, thinks that I should. 

4) For more on Martin Davis see this obit here and/or his Wikipedia page here

Tuesday, January 03, 2023

Positional Encoding

Given the excitement over ChatGPT, I spent part of the winter recess trying to understand the underlying technology of Transformers. After trying various tutorials, I found the best explanation comes in the original 2017 paper, Attention is All you Need. This is my attempt to figure out positional encoding in transformers with some naive questions. Let me know if I'm completely off base.

Earlier models like Recurrent Neural Nets and Convolutional NNs worked under the assumption that information close by was more likely to be correlated. Machine learning seems to improve as the models make fewer and fewer assumptions about the data and transformers use positional data with no predisposed favoritism to nearby inputs.

The attention paper uses the following encoding for positions. Let α(j) = 10000-j/d where d is the dimension of the input embedding, i.e. the number of numbers used to represent each word of the input. We encode position p as d/2 pairs of numbers cos(p α(j)) and sin(p α(j)), for j ranging from 1 to d/2. They chose this function because relative positions are easy to address. We can address a relative position of a fixed k by a linear combination of cos(p α(j)) and sin(p α(j)) using the addition rules of cos and sin.

cos ((p+k) α(j)) = cos(k α(j)) cos(p α(j))  - sin(k α(j)) sin(p α(j)) 

sin ((p+k) α(j)) = cos(k α(j)) sin(p α(j))  + sin(k α(j)) cos(p α(j)) 

The d-dimensional vector of position encodings is added to the input embedding.

Why is the position encodings added to the input embedding?

I scoured the Internet and can't seem to find a good reason for this, other than it seems to work. Wouldn't the linear combinations to handle relative positions muddle up the input embedding? Since the input embedding is learned, perhaps some parts of the embedding are made zero or very small so the positional embedding stands out. Why not concatenate the two, have separate inputs for the input embedding and the positions? You wouldn't need to fully double the dimension since you would no longer need to match the dimension of the input encoding.

Why not use complex numbers?

I see cos(p α(j)) and sin(p α(j)) and immediately think of them as the real and imaginary parts of ep α(j) i. So why not just do the positional encodings as complex numbers? This paper suggests multiplying ep α(j) i with the input embedding, i.e., the input is embedding into the amplitude and the position by the phase. That makes more sense. You can now multiply by ep α(k) i to get the relative position j+k without affecting the input embedding.

I don't see a good reason not to use complex numbers for transformers, given that most learning packages and GPUs can handle complex numbers just fine. Even if you don't want to use complex numbers you could multiply the sin and cos versions of the positional encoding instead of adding to achieve a similar effect.

How about positional encoding for outputs?

Transformers output words in order but that makes it harder to relate outputs that are far apart. So why not give positional encoding to the outputs. A post-processor could then put the outputs in the correct order. More generally, how about outputting a program that produces the real output? We know transformers can generate code, and this way can handle operations that transformers normally struggle with, like multiplication or sorting.