## Tuesday, January 31, 2023

### Why does pi come up so often? I don't know either but ...

Here is how history DID unfold:

1) People noticed that the ratio of the circumference to the diameter of ANY circle is always the same, it's a number between 3 and 4, roughly 3.14 or as my Dad would say, exactly 22/7 (see this blog post). On the web (see here) is a claim that Euler first used the symbol pi since it is the first letter of a Greek word for Circumference. I've seen other sites that claim someone less famous and its the first letter of a Greek work for Perimeter

But in any case pi was known (though not called that) a LONG time ago.

2) MUCH LATER people noticed

1/1^2 + 1/2^2 + 1/3^2 + ... = pi^2/6 (Euler showed this in 1735, see here. That site says that it was Basel's problem to determine the sum. Yeah for Basel--- his name lives on even after his problem was solved! This did not happen for Baudet and Vazsonyi. If you don't know what they conjectured--- well, that proves my point. ADDED LATER: commenter Robert pointed out that Basel is NOT a person's name but a cities name. I am delighted to know that!)

1 - 1/3 + 1/5 - 1/7 + 1/9 ... = pi/4 (Leibniz showed this in 1676, see here. A good Quora site on that sum is here.)

(There are other series where pi comes up as well.)

3) People wonder Why did pi, which has to do with CIRCLES, end up in INFINITE SERIES?

What if history unfolded the other way:

1) People noticed

1/1^2 + 1/2^2 + 1/3^2 + ... =a^2/6 (Euler did that in 1735, see here.)

1 - 1/3 + 1/5 - 1/7 + 1/9 ... = b/4 (Leibniz showed this, see here. A good Quora site on that sum is here.)

and they noticce that a=b and is between 3 and 4, closer to 3. They decide to call it pi for no good reason.

(There are other series where pi comes up as well.)

2) MUCH LATER people noticed that this same funny constant pi was the ratio of circumference to diameter in any circle.

3) People wonder Why did pi, which has to do with INFINITE SERIES, end up in CIRCLES?

The following comic captures the dichotomy: here

## Thursday, January 26, 2023

### Back to the 90's

The current state of computer science reminds me of the early excitement of the Internet in the mid-90's. By the beginning of the 90's, computers landed in many homes and most offices. Computers had modems, connected through the phone lines. Various online service networks, CompuServe, Prodigy, AOL, MSN, started up which provided some basic information and communication but were mostly walled gardens.

But then came the mosaic web browser in 1993. There weren't many web pages and they looked awful. There is a special place in hell for whomever invented the <blink> tag. But the ability to link to other pages, local or anywhere else on the web, was a game changer. I quickly created a web page so people could download my papers, to try it out and because I got tired of responding to email requests for them. People experimented with all different kinds of things on the web. Companies tried to figure out what to put on web pages. The online service networks reluctantly put browsers on their sites.

In the mid-90's the Internet was exciting but messy. Search engines would brag about the number of pages they searched but the ranking lacked relevance. Every CS job talk in every subarea, including theory, focused on the Internet. VC money flowed into Internet-related companies no matter how silly. It wasn't until Google using the PageRank algorithm gave us a good search engine, the dot-com bust drove out the bad companies and cloud computing gave us good platforms that we got to the Internet we have today, for better or for worse.

We're at that messy stage with machine learning. We can see the game-changing potential of the technology but far too many problems limit our ability to use them. VC money flows into new ML startups while our traditional Internet companies are shedding jobs. Will the transformers paper be this generation's PageRank or do we need another trick to take the next step? If the Internet is any guide, we'll get past this early stage, the market will shake out, only to develop even more challenges down the line.

## Sunday, January 22, 2023

### The Betty White Award for 2022

In Dec 2021 I noted in this post, which was my 1000th post ever (according to Ken Regan, see here) that Betty White had the misfortune of dying on Dec 31, 2021, so AFTER the People we said goodbye to in 2021 articles had already appeared.

That might not be quite right since when she died it was Jan 1 SOMEWHERE in the world. I learned a new phrase- AWE- AnyWhere on Earth.  But no, she was not mentioned in the People we said goodbye to in 2022 articles.

The Betty White award goes to a celebrity that had the same fate- dying too late in the year to be mentioned in those articles. I had not thought of what criteria I would use if there is more than one option, and indeed, this year there are three candidates that I know of.

Pele was the greatest soccer player of all time.  Died Dec 29, 2022, at the age of 82.

Barbara Walters was a famous broadcast journalist. Died Dec 30, 2022, at the age of 93.

Pope Emeritus Benedict  was a prior Pope (obviously). Died Dec 31, 2022 at the age of 95. He died at 9:34AM Vatican Time. I do not think he died on Jan 1, 2023 AWE though that would not disqualify him since he surely will not be in the 2023 People we say goodbye to in 2022 articles.

So which one should get the award?

The term famous means famous when they died. They were all much more famous some years ago than they are now. I am using famous when they died as a criteria.

a) I am not sure who is more famous internationally- Pele or Pope Emeritus Benedict. Walters is not famous internationally.

b) Walters is more famous in America. I think Benedict is second, though its hard to tell.  Frankly none of the three are that famous in America. Walters was at one time. Fame is fleeting!

c) Benedict died older and later in the year.

d) Of the three, one was prominent in one of worlds largest religions. The others were a broadcast journalist and a former Pope.

Rather than try to find a well defined criteria, I will give it the Betty White award to all three of them.

ADDED LATER: Lance has tweeted a poll so you can vote on who you think should have won the ward. The poll has you vote for one of the three, so you can't vote for two of the three, or (as I would have done) vote for all three.

Note that  Martin Davis avoided being considered for the award since he died on Jan 1, 2023.

## Thursday, January 19, 2023

### Meta-Complexity

I'm sure many of you long-time readers are asking, "Why all this big focus on machine learning in your posts and tweets? You are the 'Computational Complexity' blog! You've barely said a word about meta-complexity."

So what is meta-complexity? From what I can tell the term goes back a few years but really came into wide use in computational complexity in the past year. The Computational Complexity Conference held an invited talk on meta-complexity by Rahul Santhanam, and the Simons Institute is hosting a research program this spring on the topic.

As the name suggests, meta-complexity studies the complexity of computing the complexity of various problems. It's a term that encompasses recent research into the Minimum Circuit Value Problem (given the truth-table of a Boolean function, find the size of the smallest circuit that computes it) and the complexity of time-bounded Kolmogorov complexity.

To quote from the Simons page

Meta-complexity provides a unifying framework for a variety of important tasks in several important areas of computer science, including computational complexity, proof complexity, cryptography, and learning theory. These areas are all intimately linked, but only recently are these links being made explicit and studied more closely. For example, learning can be interpreted as solving search versions of the Minimum Circuit Size Problem and related problems. Basing primitives such as one-way functions and indistinguishability obfuscation on standard complexity assumptions is one of the main objectives in theoretical cryptography. Important recent directions involving meta-complexity within proof complexity, such as lifting and automatability, strengthen analogies and connections between proof complexity and circuit complexity. In addition, independence results such as the natural proofs framework have intuitive interpretations in terms of meta-complexity. These connections have led to several recent breakthroughs, including quasi-polynomial time PAC-learning algorithms for constant-depth circuits with parity gates, new worst-case to average-case reductions for NP problems, a new complexity-theoretic characterization of one-way functions, and the NP-hardness of automating resolution.

Not to mention the theorem of the year, Shuichi Hirahara's proof that determining the minimum circuit of a partially specified function is NP-complete.

When you get down to it meta-complexity is all about learning, determining the complexity of finding programs. You cannot escape it.

To dive deeper into meta-complexity check out the videos of the Simons meta-complexity bootcamp.

## Monday, January 16, 2023

### ChatGPT tried to write an obit for Martin Davis. What it got right, wrong, and what to make of it.

When Martin Davis passed away Lance emailed me what he got from using ChatGPT to do an obit. Here it is and I also note what it got wrong.

-------------------------------------------------------------------------

Born in 1928 in Brooklyn, New York, Davis received his bachelor's degree from Brooklyn College in 1950 [WRONG-1948] and his Ph.D. from the University of Chicago in 1953 [WRONG- it was Princeton in 1950].  He went on to have a distinguished career, with positions at the Institute for Advanced Study, the University of California, Berkeley, and New York University, where he spent the majority of his career.

Davis made significant contributions to the field of mathematics, particularly in the areas of computational complexity theory, recursive function theory, and mathematical logic. His work on the Davis-Putnam-Robinson-Matiyasevich algorithm [ADDED LATER BASED ON A COMMENT: WRONG-The DPRM theorem is that there is NO algorithm. It is not an algorithm.] helped establish the connection between logic and computers [WRONG- THE CONNECTION BETWEEN LOGIC AND COMPUTERS WAS ALREADY WELL KNOWN. CONNECTION BETWEEN COMPUTABILITY AND POLYNOMIALS WOULD HAVE BEEN BETTER] and has had major implications for the field of computer science.

In addition to his academic pursuits, Davis was also a dedicated teacher and mentor. He was a popular and engaging speaker, known for his wit and clarity in explaining complex mathematical concepts. He inspired and encouraged countless students throughout his career and left a lasting impact on the mathematical community.

Davis was the recipient of many prestigious awards and honors, including the ACM Turing Award [WRONG- DAVIS NEVER WON A TURING AWARD], the highest honor in computer science, in 1994. He was also a member of the National Academy of Sciences and the American Academy of Arts and Sciences.  [I WAS ABLE TO VERIFY THAT HE WAS IN THE AMERICAN ACAD. I WAS NOT ABLE TO VERIFY THAT HE WAS IN THE NATIONAL ACAD, BUT I DO NOT KNOW.]

Martin Davis will be greatly missed by his colleagues, students, and friends.  His contributions to mathematics and computer science will not be forgotten and will continue to inspire future generations of mathematicians and scientists.

-------------------------------------------------------------------------

This raises some questions

a) It got some factual things wrong. This surprised me. Its also a cautionary note: when you using ChatGPT for a first draft you need to check factual information.

b) I was going to take the obit and modify it for my post. I found that I could not do this, not because of the mistakes in it, but because its not what I wanted to emphasize. See my obit post here.

c) Will this make bloggers jobs  and other jobs easier by just posting modified versions of things ChatGPT outputs? This is both a hope (less work) and a fear (no work). Options:

c1) Much like farming, there will be a transition period where people are out of work but in the long term its a good thing that instead of 90% of people working on farms its 3%. So people transitioned (or the next generation did) to other jobs. (One could argue if this is really a good thing, though I've heard farming was VERY HARD WORK so people are happier not being farmers.)

c2) The jobs that are going away, there will NOT be replacement jobs and we are looking at an economic and psychological catastrophe.

d) Even before ChatGPT I had heard of using a program to output a sports story about little league games for very local papers.

e) The obvious fear: will students have ChatGPT produce their papers for them? In its current state the students would have to modify the paper. A few ways this could go:

e1) Analog to calculators: Students used to have to memorize multiplication up to 20x20 but now we let t hem use calculators. Students used to have to write papers, now we let them use ChatGPT and modify. This may be forced on us as opposed to something we want to do.

e2) There will be a fierce fight where teachers what students to NOT use ChatGPT but its hard to stop them.

e3) ChatGPT will get much better. Even so, there will still be some things its bad at. Students won't know which is which, or won't care.

e4) I am tempted to say it won't affect math but I think it might for, say, standard proofs by induction, standard calculations from calculus (we already have programs that can differentiate and integrate- has that affected how Calculus is taught or graded?).

## Thursday, January 12, 2023

### Semantic Search for the Blog

As Google started to limit academic storage, I started looking at Google Takeout and started wondering what I could do with all that data. I downloaded all the posts from the blog, since we use Google's blogger, and ran them through OpenAI's Ada Embedding. The Ada embedding maps text up to 8192 words into a point on the 1536-dimensional unit sphere. You can measure the similarity between two embeddings via a simple dot product, giving you the cosine of the angle between them.

So I created a semantic search for the blog. Go ahead and try it out.

Search for

You can enter a search term, phrase, or the full URL (including https) of a blog post. It will return a list of the 5 closest posts, with the percentage match, computed as the square of the cosine. I don't have a mechanism for automatically updating the files, so you'll only see posts from 2022 and earlier.

This was an Open AI-assisted affair, as I used ChatGPT and GitHub co-pilot to help with the python and pandas data frames. It took me longer to figure out how to create a web application so you can try the search. Similarity match doesn't work like normal searches, for example if you search for a city like "Detroit", you'll get posts that mention other cities. Some other oddities, like "mad" seems to match "Madhu". It probably says something about me that my most happy post is not about some great new theorem but about baseball.

## Monday, January 09, 2023

### Martin Davis Passed Away on Jan 1, 2023

As you probably already know from other sources, Martin Davis passed away on Jan 1, 2023, at the age of 94. His wife Virginia died a few hours later.

He majored in math at Brooklyn College and graduated in 1948.

He got his PhD under Alonzo Church at the Princeton in 1950.

Two years for a PhD seems fast!

His Phd was on Recursion theory.

In it he conjectured that Hilbert's tenth problem (below) is undecidable.

1) Hilbert's Tenth Problem.

Hilbert posed 23 problems in the year 1900 for mathematicians to work on

over the next 100 years. Hilbert's tenth problem, in modern terminology, was

Find an algorithm that will, given a polynomial p \in Z[x_1,...,x_n],

determine it has a Diophantine solution (that is, a_1,...,a_n\in Z

such that  p(a_1,...,a_n)=0).

In Hilbert's article he did say in a preface to all of the problems. Here is the exact quote:

Occasionally it happens that we seek the solution under insufficient hypotheses or in an incorrect sense, and for this reason do not succeed. The problem then arises: to show the impossibility of the solution under the given hypotheses or in the sense contemplated.

Hilbert had hoped that this problem would lead to deep results in number theory. And it has to some extend. However this went from being a problem in number theory to a problem in logic. That might not be quite right: the result did use number theory.

In 1961 Davis-Putnam-Robinson showed that the problem is undecidable IF you also allow exponentials.  This may have been a turning point for the conventional wisdom to shift from Probably Solvable' to Probably Unsolvable.'

Martin Davis predicted that the H10 would be shown undecidable by a young Russian by the end of the decade. He was correct. Yuri Matiyasevich did indeed prove H10 undecidable in 1970. By all account Davis was delighted. When the result is cited usually all four people are credited which is as it should be.  He wrote an excellent exposition of the complete proof from soup to nuts in:

Hilbert's tenth problem is unsolvable, American Math Monthly, Volume 80, No 4, 233-269.

When I first heard of this result I assumed that the number of variables and the degree to get undecidability was huge. I was wrong.  I wrote a survey of H10 emphasizing what happens if you bound the degree and the number of variables, see here

2) SAT Solvers. Davis-Putnam-Logemann-Loveland outlined a SAT Solver, or really a class os SAT Solvers. While I doubt it was the first SAT Solver, it was an early one that tried to cut down on the time needed.

3) He wrote the following books:

Computability and Unsolvability (1958, reprinted 1982)

Applied Non-Standard Analysis (1977, reprinted 2014)

Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science (1994 (with Elaine Weyuker)

The Universal Computer: The Road from Leibniz to Turing (2000, reprinted as

Engines of Logic: Mathematicians and the origin of the computer)

The Undecidable: Basic Paper on undecidable propositions, unsolvable problems and computable functions (2004)

4) He was a recursion theorist who could actually program a Turing Machine to really do things. There are still some people who do that- getting the UTM down to a small number of states, and the work (the Wolfram Challenge), and the Busy Beaver Function (see Scott Aaronson's open problems column here,

but I think this is done less by academic recursion theorists than it used to be. I do not have my students in Automata Theory write any TM code. Clyde Kruskal, who had that same course from Martin Davis, thinks that I should.

4) For more on Martin Davis see this obit here and/or his Wikipedia page here

## Tuesday, January 03, 2023

### Positional Encoding

Given the excitement over ChatGPT, I spent part of the winter recess trying to understand the underlying technology of Transformers. After trying various tutorials, I found the best explanation comes in the original 2017 paper, Attention is All you Need. This is my attempt to figure out positional encoding in transformers with some naive questions. Let me know if I'm completely off base.

Earlier models like Recurrent Neural Nets and Convolutional NNs worked under the assumption that information close by was more likely to be correlated. Machine learning seems to improve as the models make fewer and fewer assumptions about the data and transformers use positional data with no predisposed favoritism to nearby inputs.

The attention paper uses the following encoding for positions. Let α(j) = 10000-j/d where d is the dimension of the input embedding, i.e. the number of numbers used to represent each word of the input. We encode position p as d/2 pairs of numbers cos(p α(j)) and sin(p α(j)), for j ranging from 1 to d/2. They chose this function because relative positions are easy to address. We can address a relative position of a fixed k by a linear combination of cos(p α(j)) and sin(p α(j)) using the addition rules of cos and sin.

cos ((p+k) α(j)) = cos(k α(j)) cos(p α(j))  - sin(k α(j)) sin(p α(j))

sin ((p+k) α(j)) = cos(k α(j)) sin(p α(j))  + sin(k α(j)) cos(p α(j))

The d-dimensional vector of position encodings is added to the input embedding.

Why is the position encodings added to the input embedding?

I scoured the Internet and can't seem to find a good reason for this, other than it seems to work. Wouldn't the linear combinations to handle relative positions muddle up the input embedding? Since the input embedding is learned, perhaps some parts of the embedding are made zero or very small so the positional embedding stands out. Why not concatenate the two, have separate inputs for the input embedding and the positions? You wouldn't need to fully double the dimension since you would no longer need to match the dimension of the input encoding.

Why not use complex numbers?

I see cos(p α(j)) and sin(p α(j)) and immediately think of them as the real and imaginary parts of ep α(j) i. So why not just do the positional encodings as complex numbers? This paper suggests multiplying ep α(j) i with the input embedding, i.e., the input is embedding into the amplitude and the position by the phase. That makes more sense. You can now multiply by ep α(k) i to get the relative position j+k without affecting the input embedding.

I don't see a good reason not to use complex numbers for transformers, given that most learning packages and GPUs can handle complex numbers just fine. Even if you don't want to use complex numbers you could multiply the sin and cos versions of the positional encoding instead of adding to achieve a similar effect.

How about positional encoding for outputs?

Transformers output words in order but that makes it harder to relate outputs that are far apart. So why not give positional encoding to the outputs. A post-processor could then put the outputs in the correct order. More generally, how about outputting a program that produces the real output? We know transformers can generate code, and this way can handle operations that transformers normally struggle with, like multiplication or sorting.