Wednesday, February 28, 2024

A Quantum State

Illinois' most famous citizen working on a quantum computer

The governor of Illinois, JB Pritzker, unveiled his budget last week including $500 million for quantum computing research. Is this the best way to spend my tax dollars?

As long-time readers know, I have strong doubts about the real-world applications of quantum computing and the hype for the field. But the article does not suggest any applications of quantum computing, rather

Pritzker says he's optimistic that the Illinois state legislature will embrace his proposal as a catalyst for job creation and investment attraction.

That does make sense. Investing in quantum may very well bring in extra federal and corporate investment into quantum in Chicago. At the least it will bring in smart people to Illinois to fill research roles. And it's not if this money would go to any other scientific endeavor if we don't put it into quantum.

So it makes sense financially and scientifically even if these machines don't actually solve any real-world problems. Quantum winter will eventually come but might as well take advantage of the hype while it's still there. Or should we?

A physicist colleague strongly supports Illinois spending half a billion on quantum. He lives in Indiana. 

Sunday, February 25, 2024

When is it worth the time and effort to verify a proof FORMALLY?

(This post was inspired by Lance's tweet and later post on part of IP=PSPACE being formally verified.) 

We now have the means to verify that a proof (prob just some proofs)  is correct (one could quibble about verifying the verifier but we will not address that here). 

The Coq proof assistant (see here). This was used to verify the proof of the four-color theorem, see here.

The Lean Programming Language and Proof Verifier (see here). It has been used to verify Marton's Conjecture which had been proven by Gowers-Green-Tao (see here for a quanta article, and here for Terry Tao's blog post about it.) 

The HOL (Higher Order Logic)  is a family of interactive theorem proving systems (see here). The Wikipedia entry for HOL (proof assistant) (see here) says: 

The CakeML project developed a formally proven compiler for ML [ML is a programming language, not to be confused with Machine Learning.]

The ISABELLE is an HOL system. See here. The Wikipedia page says that it is used to verify hardware and software. It has recently been used to verify the Sumcheck Protocol which was developed to help prove IP=PSPACE (See here.)One of the authors of the IP=PSPACE paper, Lance Fortnow, said of this

Now I can sleep soundly at night.

The Kepler Conjecture was proven by Thomas Hales. (Alfred Hales is the co-author of the Hales-Jewitt. I do not know if Thomas and Alfred Hales are related, and Wikipedia does not discuss mathematicians blood-relatives, only their academic ancestors.)  The proof was... long. Over a period of many years it was verified by HOL light and Isabelle. See here. (Note- the paper itself says it used HOL-light and Isabelle, but the Wikipedia site on Lean says that Hales used Lean.) 

I am sure there are other proof assistants and other theorems verified by them and the ones above. 

My question is 

Which proofs should we be spending time and effort verifying? 

Random Thoughts

1) The time and effort is now much less than it used to be so perhaps the question is moot.

2) Proofs that were done with the help of a program (e.g., the four-color theorem) should be verified.

3) The theorem has to have a certain level of importance that is hard to define, but item 1 might make that a less compelling criteria. 

4) A proof in an area where there have been mistakes made before should be verified. That is the justification given in the paper about verifying part of the IP=PSPACE proof.

5) A rather complicated and long proof should be verified. That was the case for Marton's Conjecture. 

6) A cautionary note: Usually when a long and hard proof comes out, people try to find an easier proof. Once a theorem is proven (a) we know its true (hmmm..) and (b) we have some idea how the proof goes. Hence an easier proof may be possible. In some cases just a better write up and presentation is done which is also a good idea. I wonder if after having a verifier say YES people will stop bothering getting  easier proofs.

7) Sometimes the people involved with the original proof also were involved in the verification (Kepler Conjecture, Marton's Conjecture) and sometimes not (IP=PSPACE, 4-color theorem). 

8) When I teach my Ramsey Theory class I try to find easier proofs, or at least better write ups, of what is in the literature. I sometimes end up with well-written but WRONG proofs. The students, who are very good, act as my VERIFIER. This does not always work:  there are still some students who, 10  years ago, believed an  incorrect proof of the a-ary Canonical Ramsey Theorem, though they went on to live normal lives nonetheless. Would it be worth it for me to use a formal verifier? I would guess no, but again, see item (1). 

9) Might it one day be ROUTINE that BEFORE submitting an article you use a verifier? Will using a verifier be as easy as using LaTeX? 

10) The use of these tools for verifying code--- does that put a dent in the argument by Demillo-Lipton-Perlis (their paper is here, and I had two blog posts on it here and here) that verifying software is impossible?'

11) HOL stands for High Order Logic. I could not find out what Isabelle, Lean, or Coq stand for. I notice that they all use a Cap letter then small letters so perhaps they don't stand for anything. 

Wednesday, February 21, 2024

Sumchecks and Snarks

Last summer as I lamented that my research didn't have real world implications, one of the comments mentioned the sumcheck protocol used for zero-knowledge SNARKs. I tried to figure out the connection back then but got lost in technical papers. 

With the formal verification of the sumcheck protocol announced last week I tried again this time using Google's new Gemini Ultra. Gemini gave a readable explanation. Let me try to summarize here.

Sumcheck comes from the paper where Howard Karloff, Carsten Lund, Noam Nisan and I showed that co-NP has interactive proofs. It was Carsten who first came up with the sumcheck protocol in fall of 1989. Here's a simple version:

Suppose there is a hard to compute polynomial p of low-degree d and a prover claims p(0)+p(1) = v. The prover gives a degree-d polynomial q to the verifier claiming q = p. The verifier checks q(0)+q(1) = v and picks an r from a large range. If q was actually p then p(r) = q(r). But if p(0)+p(1) \(\neq\) v then p and q are different polynomials and with high probability p(r) \(\neq\) q(r) since p and q can only agree on at most d points.

The protocol reduces checking a sum to checking a single randomly chosen location. Our first proof used the sumcheck protocol to reduce checking the permanent of a nxn matrix to a (n-1)x(n-1) matrix and then we repeat until the matrix is small enough to compute the permanent directly.

Now let's break down the properties of zkSNARK (zero-knowledge succinct non-interactive arguments of knowledge). 

  • Efficient: A prover who knows the secret input can generate the proof efficiently.
  • Zero-Knowledge: A prover can convince a verifier that they know a secret input that satisfies a statement without revealing anything else about that secret.
  • Succinct: The proof itself is extremely small and easy to verify.
  • Non-Interactive: No repeated interaction between prover and verifier is necessary. The prover generates one proof, and anyone can verify it.

Our original sumcheck protocol is not all efficient, the prover needs to solve #P-hard problems, but in SNARKs we assume the prover already knows the problem's solution.

Our protocol requires significant interaction, in particular that r is chosen after q is set. But for SNARKs since we can assume the prover is efficient, we can use r as a hash of q. It should be difficult to find a q different than p such that q(hash(q)) \(\neq\) p(hash(q)).

Our protocol is not at all zero-knowledge, if p = q then the protocol reveals p(0) and p(1). Lots more tricks needed to make it zero-knowledge and succinct but I leave that to the interested reader.

Sunday, February 18, 2024

ChatGPT thinks Graph Isomorphism has real applications. Is it right?

Lance did a post on Babai's result on Graph Isomorphism (see here). I then did a post asking if Graph Isomorphism has real applications (see here).  Lance proofread my post (There were some typos! Really!) and then he was inspired to ask ChatGPT what it thought. The conversation is below (I added some fonts, and my opinion clearly marked,  but nothing else).  I leave it to the reader to determine what ChatGPT got right and got wrong. In areas that I am unfamiliar with I tend to think that ChatGPT is right, perhaps a case of Gell-man Amnesia (see here).

LANCE: What are some applications of Graph Isomorphism?

ChatGPT:

Isomorphism, a concept from the field of graph theory in mathematics and computer science, refers to the problem of determining whether two graphs are isomorphic, meaning there is a one-to-one correspondence between their vertex sets that preserves adjacency. Despite its theoretical underpinnings, graph isomorphism has practical applications in various domains:

Chemical Informatics: It's crucial for identifying chemical compounds. Since chemical structures can be represented as graphs (with atoms as vertices and bonds as edges), determining if two graphs are isomorphic can indicate whether two compounds are structurally identical.

BILL'S OPINION: I suspect simple heuristics work fine. Also I suspect that chemicals are more complicated than graphs. More generally, instead of asking Does GI have any real applications? I should have asked Do the last 45 years of graph isomorphism algorithms produced by our community have any applications? I suspect the answer is no.

Pattern Recognition: In image and pattern recognition tasks, objects can be represented as graphs, and finding isomorphic graphs helps in identifying similar patterns or shapes within different datasets.

BILL'S OPINION: I suspect that in real world pattern recognition, especially for images, approx matches is what you really want. What makes GI hard/interesting is that you need exact Isomorphism. So I doubt GI really helps for Pattern Recognition.

Network Analysis: Graph isomorphism is used to analyze and compare different network structures in social networks, biological networks (like neural connections and protein interactions), and the Internet, to understand their underlying properties and dynamics.

BILL'S OPINION: This sounds plausible to me. The people doing the studying are other academics, however, that can still count. 

Computer Vision: For object recognition and classification, where objects within an image can be represented as graphs, and their identification involves finding isomorphic subgraphs within larger scene graphs.

BILL'S OPINION: Same issue as with Pattern matching. 

Cryptography: Graph isomorphism problems can serve as the basis for cryptographic protocols, including public key cryptography, due to the computational difficulty of solving graph isomorphism for general cases.

BILL'S OPINION: Just NO.  One reason factoring is so useful for crypto is that one can easily generate hard instances. For GI you cannot (at least not yet). I think ChatGPT confused GI being in Zero-Knowledge with GI actually being used for Crypto. More on this later.

Database Indexing: In databases, especially graph databases, graph isomorphism helps in query optimization and efficient data retrieval by identifying identical or similar graph structures.

BILL'S OPINION: I can believe this! Gell-man Amnesia? 

These applications demonstrate the wide-ranging impact of graph isomorphism beyond theoretical computer science, influencing fields where structural and relational data analysis is crucial.


LANCE: Can you give me specific examples of these?

(BILL: I won't comment on these, except in one case,  as it would be the same comments as above.) 

ChatGPT:

Certainly! Here are specific examples for the practical applications of graph isomorphism mentioned:

Chemical Informatics: A classic example is the identification of isomers in chemistry. Isomers are compounds with the same molecular formula but different structures. Graph isomorphism algorithms can distinguish between different isomers by comparing their graph representations, where vertices represent atoms and edges represent chemical bonds. For instance, butane and isobutane are structural isomers with different connectivity among the same set of atoms, which can be identified as distinct through graph isomorphism.

Pattern Recognition: In character recognition systems, letters or symbols can be represented as graphs, where the strokes are edges and intersections are vertices. Graph isomorphism algorithms can then be used to match a scanned, handwritten, or printed character against a database of graph-represented characters to identify the letter or symbol.

Network Analysis: In social network analysis, researchers might use graph isomorphism to identify similar subnetworks within larger networks. For example, detecting communities or cliques within social networks that have identical or similar interaction patterns, which can be crucial for understanding social dynamics or the spread of information.

Computer Vision: An application in computer vision could be recognizing objects regardless of their orientation in an image. By representing objects as graphs, where vertices correspond to object features and edges to spatial relationships between these features, graph isomorphism can help in identifying an object by finding a match between its graph representation and that of a known object.

Cryptography: A specific use case in cryptography is the design of cryptographic primitives based on the Graph Isomorphism problem, such as the Goldreich-Goldwasser-Halevi (GGH) cryptosystem. This cryptosystem constructs a public key from a hard instance of a lattice problem, which is closely related to finding isomorphisms between noisy graph representations, leveraging the computational difficulty of graph isomorphism for security.

BILL: The pointer to the Wikipedia page on the GGH system is here. It has nothing to do with graph isomorphism. So this one is complete garbage.

Database Indexing: In the context of graph databases used for storing knowledge graphs or social network data, an example application is the optimization of query processing. When a user queries the database to find data patterns that match a specific graph structure, graph isomorphism algorithms can help quickly locate all instances of the data that are isomorphic to the query graph, enhancing search efficiency and performance.

These examples highlight how graph isomorphism plays a crucial role in solving real-world problems across different domains by leveraging the structure and properties of graphs.

LANCE: Give me a specific research paper in Chemistry that uses graph isomorphism

ChatGPT
A notable research paper that applies graph isomorphism in the field of chemistry is:

Title: "Development and Application of a Novel Graph-Theoretical Approach for Molecular Similarity Analysis"

Authors: Raymond E. Carhart, David H. Smith, Robert Venkataraghavan


Journal: Journal of Chemical Information and Computer Sciences

Year: 1985, Volume: 25, Issue: 1, Pages: 64–73, DOI: 10.1021/ci00045a002

Abstract:

In this paper, the authors introduce a novel method for analyzing molecular similarity based on graph-theoretical concepts. They present an algorithm, known as the MCS (Maximum Common Subgraph) algorithm, which is used to identify the largest subgraph common to two chemical compounds represented as graphs. This process relies on graph isomorphism to determine the structural similarity between molecules. The approach was groundbreaking at the time because it provided a quantitative method for assessing molecular similarity, which is crucial for drug discovery and the development of chemical databases.

The significance of this work lies in its application of graph isomorphism to solve a practical problem in chemistry, demonstrating the interdisciplinary nature of graph theory and its potential to impact research beyond theoretical computer science. This paper has been influential in the fields of cheminformatics and drug design, where molecular similarity plays a key role in identifying potential therapeutic agents.

BILL'S COMMENT: This information from ChatGPT would be more impressive if this paper actually existed. 

(ADDED LATER- there is some debate in the comments about if the paper exists. 

DOES NOT EXIST: Bill could not find it on the web.

DOES NOT EXIST: Bill went to the website of that journal and did a search and did not find it. 

DOES NOT EXIST: The DOI does not point to it.

DOES EXIST: Some slide packet pointed to it as a reference. 

If someone leaves a comment with a pointer to it, or emails me a pointer to it, I will update this post.)





Thursday, February 15, 2024

Change

An academic field often organizes itself pulling ideas from its own field. For example, students on the economics faculty job search can signal at most two schools, without giving any other rules on how the signals should be used, allowing equilibrium behavior to rule.

In computer science, our field is decentralized without any single organization setting policies or much coordination. We're very data oriented, so we do a good job like with the Taulbee Survey. And because we're a field based on programming and logic, we tend to outsource decision to specific rules, formulas for ranking departments, judging faculty productivity (h-index, major conference publications, course evaluations) and graduation requirements.

Computing itself has changed over the last decade with the growth of artificial intelligence and learning from data. We see many examples when computers trained on data often dramatically out-perform those built on logic, certainly in say machine translation, facial recognition and spam detection for example.

Will the changes in computing change the way we think about how we run our field? Probably, but over a long period of time. People don't change their ways, but eventually the people change.

Sunday, February 11, 2024

Are there any REAL applications of Graph Isomorphism?

Lance's post on Babai's result on Graph Isomorphism (henceforth GI) inspired some random thoughts on GI. (Lance's post is here.) 

1) Here is a conversation with someone who I will call DAVE. This conversation took place in the late 1980's. 

BILL: You got your ugrad degree in engineering and then went to CS grad school. You are a pragmatic guy. So why did you work on graph isomorphism?

DAVE: My advisor was working on it. The 1980's was a good time for GI: the problems restricted to bounded genus, bounded degree, and bounded eigenvalue multiplicity were proven to be in P. Tools from Linear Algebra and Group Theory were being used. People thought that GI might be shown to be in P within the next five  year. Then it all stopped for quite some time. 

But back to your question about GI being practical, I asked my advisor for real applications of GI. He said:  

Organic Chemists use graph isomorphism to match chemical structures.

By the time I found out this wasn't true, I already had my PhD. I quickly switched to Computational Geometry which really does have applications. Maybe. 

2) So why was graph isomorphism studied? Much to my surprise, a survey by Grohe and Schweitzer (see here) says that 

Graph isomorphism as a computational problem first appeared in the chemical documentation literature of he 1950's (for example Ray and Kirsh) as the problem of matching a molecular graph against a database of such graphs.

(The article by Ray and Kirsh is here.) 

So at one time it was thought that GI would apply to Chemistry. Did it? I suspect some simple algorithms and heuristics  did, but (1) chemicals are more complicated than graphs, and (2) by the time we are talking about graph isomorphism of bounded eigenvalue multiplicity, the algorithms got to complicated to really use. BUT these are just my suspicions (what is the difference between a suspicion and a guess?) so I welcome comments or corrections.

2) The same article also says:

Applications [of GI] spans a broad field of areas ranging from Chemistry to computer vision. Closely related is the problem of detecting symmetries of graphs and of general combinatorial structures. Again this has many application domains, for example, combinatorial optimization, the generation of combinatorial structures, and the computation of normal forms. 

That sounds impressive, but I would like to know if the applications are to the real world, or are they do other theory things. 

3) The prominence of GI in the CS theory literature is because its one of the few natural problems that is in NP, not in P, but not known to be (and unlikely to be) NP-complete.

4) Graph Isomorphism IS a natural problem. What is an unnatural problem? 

BILL: Do you find the following interesting: There is an r.e. set that is not decidable and not Turing-equivalent to HALT.

DARLING: Yes. Unless it was some dumb-ass set that people like you constructed for the whole point of being r.e., not decidable, and not Turing-equivalent to HALT. 

BILL: You nailed it!
 
5) So, is GI in P? This is truly open question in that it really could go either direction. There is no real consensus. 
 
6) I have heard that Babai's result is as far as current techniques can take us. So we await a new idea.

Wednesday, February 07, 2024

Favorite Theorems: Graph Isomorphism

We start our favorite theorems of the last decade with a blockbuster improvement in a long-standing problem. 

Graph Isomorphism in Quasipolynomial Time by LászlĂł Babai

The graph isomorphism problem takes two graphs G and H both on n nodes and asks if there a permutation \(\sigma\) such that \((i,j)\) is an edge of G if and only if \((\sigma(i),\sigma(j))\) is an edge of H. While we can solve graph isomorphism in practice, until Babai's paper we had no known subexponential algorithm for the problem.

Graph Isomorphism has a long history in complexity. It has its own self-named complexity class in the zoo. It's one of the few decision problems in NP not known to be NP-complete or in P. Graph Isomorphism and its complement are the poster child problems for complexity classes AM, SZK and SPP, public-coin proof systems, and the minimum-size circuit problem. Köbler, Schöning and Torán wrote a whole book on the structural complexity of graph isomorphism.

Graph Isomorphism has the kind of group structure that suggests a quick quantum algorithm, but that remains an annoying open problem. 

Babai found a simple algorithm and proof based on Johnson Graphs. Who am I kidding? You would need several months to work through the paper even for an expert group theorist. 

Babai has been working on graph isomorphism since the 80's. In November 2015, he announced a talk with that title and the theory world went crazy. They needed a larger room. We invited Babai down to Georgia Tech to give a talk. A week before he sends me an email saying the proof no longer gave the quasipolynomial time bound and asks if he should still give the talk. We say of course and he gives the talk on January 9, 2017, describing the problem with the proof. And then he announces "It's been fixed". I witnessed history in the making. 

Sunday, February 04, 2024

The advantage of working on obscue things

 I got an email from an organization that wants to publicize one of my papers. Which paper did they want to publicize?

1) If the organization was Quanta, they are KNOWN so I would trust it. Indeed, they have contacted me about my paper proving primes infinite from Ramsey theory (they contacted all of the authors of papers that did that). The article they published is here and I blogged about those proofs here.

2) Someone writing an article on my muffin work for a recreational journal contacted me- also believable.

3) But the email I got was from an organization I never heard of and they wanted to publicize An NP-Complete Problem in Grid Coloring. This  is not a topic of general interest, even among complexity theorist or Ramsey theorists.  Indeed, the paper has three authors and only two of them care about it. Hence I suspected the organization was a scam or quasi-scam. I looked them up and YES, at some point they would have wanted money for their efforts. 

I want to say none of my work is of general interest but that may depend on what is meant by general and by interest. Suffice to say that if a scammer claims they want to publicize my work and picks a random paper to ask me about, the probability that its one where I will say Yes, I can see that one being worth getting to a wider audience is negligible. Indeed, Darling was amazed that Quanta cared about using Ramsey Theory to prove primes infinite. She may have a point there.

But if I worked in Quantum or ML or Quantum ML  then it would be harder for me to say for a random paper they can't possibly think that this was wider appeal, so its obviously a scam. 

That is the advantage of working on obscure things.