Wednesday, February 21, 2024

Sumchecks and Snarks

Last summer as I lamented that my research didn't have real world implications, one of the comments mentioned the sumcheck protocol used for zero-knowledge SNARKs. I tried to figure out the connection back then but got lost in technical papers. 

With the formal verification of the sumcheck protocol announced last week I tried again this time using Google's new Gemini Ultra. Gemini gave a readable explanation. Let me try to summarize here.

Sumcheck comes from the paper where Howard Karloff, Carsten Lund, Noam Nisan and I showed that co-NP has interactive proofs. It was Carsten who first came up with the sumcheck protocol in fall of 1989. Here's a simple version:

Suppose there is a hard to compute polynomial p of low-degree d and a prover claims p(0)+p(1) = v. The prover gives a degree-d polynomial q to the verifier claiming q = p. The verifier checks q(0)+q(1) = v and picks an r from a large range. If q was actually p then p(r) = q(r). But if p(0)+p(1) \(\neq\) v then p and q are different polynomials and with high probability p(r) \(\neq\) q(r) since p and q can only agree on at most d points.

The protocol reduces checking a sum to checking a single randomly chosen location. Our first proof used the sumcheck protocol to reduce checking the permanent of a nxn matrix to a (n-1)x(n-1) matrix and then we repeat until the matrix is small enough to compute the permanent directly.

Now let's break down the properties of zkSNARK (zero-knowledge succinct non-interactive arguments of knowledge). 

  • Efficient: A prover who knows the secret input can generate the proof efficiently.
  • Zero-Knowledge: A prover can convince a verifier that they know a secret input that satisfies a statement without revealing anything else about that secret.
  • Succinct: The proof itself is extremely small and easy to verify.
  • Non-Interactive: No repeated interaction between prover and verifier is necessary. The prover generates one proof, and anyone can verify it.

Our original sumcheck protocol is not all efficient, the prover needs to solve #P-hard problems, but in SNARKs we assume the prover already knows the problem's solution.

Our protocol requires significant interaction, in particular that r is chosen after q is set. But for SNARKs since we can assume the prover is efficient, we can use r as a hash of q. It should be difficult to find a q different than p such that q(hash(q)) \(\neq\) p(hash(q)).

Our protocol is not at all zero-knowledge, if p = q then the protocol reveals p(0) and p(1). Lots more tricks needed to make it zero-knowledge and succinct but I leave that to the interested reader.

Sunday, February 18, 2024

ChatGPT thinks Graph Isomorphism has real applications. Is it right?

Lance did a post on Babai's result on Graph Isomorphism (see here). I then did a post asking if Graph Isomorphism has real applications (see here).  Lance proofread my post (There were some typos! Really!) and then he was inspired to ask ChatGPT what it thought. The conversation is below (I added some fonts, and my opinion clearly marked,  but nothing else).  I leave it to the reader to determine what ChatGPT got right and got wrong. In areas that I am unfamiliar with I tend to think that ChatGPT is right, perhaps a case of Gell-man Amnesia (see here).

LANCE: What are some applications of Graph Isomorphism?

ChatGPT:

Isomorphism, a concept from the field of graph theory in mathematics and computer science, refers to the problem of determining whether two graphs are isomorphic, meaning there is a one-to-one correspondence between their vertex sets that preserves adjacency. Despite its theoretical underpinnings, graph isomorphism has practical applications in various domains:

Chemical Informatics: It's crucial for identifying chemical compounds. Since chemical structures can be represented as graphs (with atoms as vertices and bonds as edges), determining if two graphs are isomorphic can indicate whether two compounds are structurally identical.

BILL'S OPINION: I suspect simple heuristics work fine. Also I suspect that chemicals are more complicated than graphs. More generally, instead of asking Does GI have any real applications? I should have asked Do the last 45 years of graph isomorphism algorithms produced by our community have any applications? I suspect the answer is no.

Pattern Recognition: In image and pattern recognition tasks, objects can be represented as graphs, and finding isomorphic graphs helps in identifying similar patterns or shapes within different datasets.

BILL'S OPINION: I suspect that in real world pattern recognition, especially for images, approx matches is what you really want. What makes GI hard/interesting is that you need exact Isomorphism. So I doubt GI really helps for Pattern Recognition.

Network Analysis: Graph isomorphism is used to analyze and compare different network structures in social networks, biological networks (like neural connections and protein interactions), and the Internet, to understand their underlying properties and dynamics.

BILL'S OPINION: This sounds plausible to me. The people doing the studying are other academics, however, that can still count. 

Computer Vision: For object recognition and classification, where objects within an image can be represented as graphs, and their identification involves finding isomorphic subgraphs within larger scene graphs.

BILL'S OPINION: Same issue as with Pattern matching. 

Cryptography: Graph isomorphism problems can serve as the basis for cryptographic protocols, including public key cryptography, due to the computational difficulty of solving graph isomorphism for general cases.

BILL'S OPINION: Just NO.  One reason factoring is so useful for crypto is that one can easily generate hard instances. For GI you cannot (at least not yet). I think ChatGPT confused GI being in Zero-Knowledge with GI actually being used for Crypto. More on this later.

Database Indexing: In databases, especially graph databases, graph isomorphism helps in query optimization and efficient data retrieval by identifying identical or similar graph structures.

BILL'S OPINION: I can believe this! Gell-man Amnesia? 

These applications demonstrate the wide-ranging impact of graph isomorphism beyond theoretical computer science, influencing fields where structural and relational data analysis is crucial.


LANCE: Can you give me specific examples of these?

(BILL: I won't comment on these, except in one case,  as it would be the same comments as above.) 

ChatGPT:

Certainly! Here are specific examples for the practical applications of graph isomorphism mentioned:

Chemical Informatics: A classic example is the identification of isomers in chemistry. Isomers are compounds with the same molecular formula but different structures. Graph isomorphism algorithms can distinguish between different isomers by comparing their graph representations, where vertices represent atoms and edges represent chemical bonds. For instance, butane and isobutane are structural isomers with different connectivity among the same set of atoms, which can be identified as distinct through graph isomorphism.

Pattern Recognition: In character recognition systems, letters or symbols can be represented as graphs, where the strokes are edges and intersections are vertices. Graph isomorphism algorithms can then be used to match a scanned, handwritten, or printed character against a database of graph-represented characters to identify the letter or symbol.

Network Analysis: In social network analysis, researchers might use graph isomorphism to identify similar subnetworks within larger networks. For example, detecting communities or cliques within social networks that have identical or similar interaction patterns, which can be crucial for understanding social dynamics or the spread of information.

Computer Vision: An application in computer vision could be recognizing objects regardless of their orientation in an image. By representing objects as graphs, where vertices correspond to object features and edges to spatial relationships between these features, graph isomorphism can help in identifying an object by finding a match between its graph representation and that of a known object.

Cryptography: A specific use case in cryptography is the design of cryptographic primitives based on the Graph Isomorphism problem, such as the Goldreich-Goldwasser-Halevi (GGH) cryptosystem. This cryptosystem constructs a public key from a hard instance of a lattice problem, which is closely related to finding isomorphisms between noisy graph representations, leveraging the computational difficulty of graph isomorphism for security.

BILL: The pointer to the Wikipedia page on the GGH system is here. It has nothing to do with graph isomorphism. So this one is complete garbage.

Database Indexing: In the context of graph databases used for storing knowledge graphs or social network data, an example application is the optimization of query processing. When a user queries the database to find data patterns that match a specific graph structure, graph isomorphism algorithms can help quickly locate all instances of the data that are isomorphic to the query graph, enhancing search efficiency and performance.

These examples highlight how graph isomorphism plays a crucial role in solving real-world problems across different domains by leveraging the structure and properties of graphs.

LANCE: Give me a specific research paper in Chemistry that uses graph isomorphism

ChatGPT
A notable research paper that applies graph isomorphism in the field of chemistry is:

Title: "Development and Application of a Novel Graph-Theoretical Approach for Molecular Similarity Analysis"

Authors: Raymond E. Carhart, David H. Smith, Robert Venkataraghavan


Journal: Journal of Chemical Information and Computer Sciences

Year: 1985, Volume: 25, Issue: 1, Pages: 64–73, DOI: 10.1021/ci00045a002

Abstract:

In this paper, the authors introduce a novel method for analyzing molecular similarity based on graph-theoretical concepts. They present an algorithm, known as the MCS (Maximum Common Subgraph) algorithm, which is used to identify the largest subgraph common to two chemical compounds represented as graphs. This process relies on graph isomorphism to determine the structural similarity between molecules. The approach was groundbreaking at the time because it provided a quantitative method for assessing molecular similarity, which is crucial for drug discovery and the development of chemical databases.

The significance of this work lies in its application of graph isomorphism to solve a practical problem in chemistry, demonstrating the interdisciplinary nature of graph theory and its potential to impact research beyond theoretical computer science. This paper has been influential in the fields of cheminformatics and drug design, where molecular similarity plays a key role in identifying potential therapeutic agents.

BILL'S COMMENT: This information from ChatGPT would be more impressive if this paper actually existed. 

(ADDED LATER- there is some debate in the comments about if the paper exists. 

DOES NOT EXIST: Bill could not find it on the web.

DOES NOT EXIST: Bill went to the website of that journal and did a search and did not find it. 

DOES NOT EXIST: The DOI does not point to it.

DOES EXIST: Some slide packet pointed to it as a reference. 

If someone leaves a comment with a pointer to it, or emails me a pointer to it, I will update this post.)





Thursday, February 15, 2024

Change

An academic field often organizes itself pulling ideas from its own field. For example, students on the economics faculty job search can signal at most two schools, without giving any other rules on how the signals should be used, allowing equilibrium behavior to rule.

In computer science, our field is decentralized without any single organization setting policies or much coordination. We're very data oriented, so we do a good job like with the Taulbee Survey. And because we're a field based on programming and logic, we tend to outsource decision to specific rules, formulas for ranking departments, judging faculty productivity (h-index, major conference publications, course evaluations) and graduation requirements.

Computing itself has changed over the last decade with the growth of artificial intelligence and learning from data. We see many examples when computers trained on data often dramatically out-perform those built on logic, certainly in say machine translation, facial recognition and spam detection for example.

Will the changes in computing change the way we think about how we run our field? Probably, but over a long period of time. People don't change their ways, but eventually the people change.

Sunday, February 11, 2024

Are there any REAL applications of Graph Isomorphism?

Lance's post on Babai's result on Graph Isomorphism (henceforth GI) inspired some random thoughts on GI. (Lance's post is here.) 

1) Here is a conversation with someone who I will call DAVE. This conversation took place in the late 1980's. 

BILL: You got your ugrad degree in engineering and then went to CS grad school. You are a pragmatic guy. So why did you work on graph isomorphism?

DAVE: My advisor was working on it. The 1980's was a good time for GI: the problems restricted to bounded genus, bounded degree, and bounded eigenvalue multiplicity were proven to be in P. Tools from Linear Algebra and Group Theory were being used. People thought that GI might be shown to be in P within the next five  year. Then it all stopped for quite some time. 

But back to your question about GI being practical, I asked my advisor for real applications of GI. He said:  

Organic Chemists use graph isomorphism to match chemical structures.

By the time I found out this wasn't true, I already had my PhD. I quickly switched to Computational Geometry which really does have applications. Maybe. 

2) So why was graph isomorphism studied? Much to my surprise, a survey by Grohe and Schweitzer (see here) says that 

Graph isomorphism as a computational problem first appeared in the chemical documentation literature of he 1950's (for example Ray and Kirsh) as the problem of matching a molecular graph against a database of such graphs.

(The article by Ray and Kirsh is here.) 

So at one time it was thought that GI would apply to Chemistry. Did it? I suspect some simple algorithms and heuristics  did, but (1) chemicals are more complicated than graphs, and (2) by the time we are talking about graph isomorphism of bounded eigenvalue multiplicity, the algorithms got to complicated to really use. BUT these are just my suspicions (what is the difference between a suspicion and a guess?) so I welcome comments or corrections.

2) The same article also says:

Applications [of GI] spans a broad field of areas ranging from Chemistry to computer vision. Closely related is the problem of detecting symmetries of graphs and of general combinatorial structures. Again this has many application domains, for example, combinatorial optimization, the generation of combinatorial structures, and the computation of normal forms. 

That sounds impressive, but I would like to know if the applications are to the real world, or are they do other theory things. 

3) The prominence of GI in the CS theory literature is because its one of the few natural problems that is in NP, not in P, but not known to be (and unlikely to be) NP-complete.

4) Graph Isomorphism IS a natural problem. What is an unnatural problem? 

BILL: Do you find the following interesting: There is an r.e. set that is not decidable and not Turing-equivalent to HALT.

DARLING: Yes. Unless it was some dumb-ass set that people like you constructed for the whole point of being r.e., not decidable, and not Turing-equivalent to HALT. 

BILL: You nailed it!
 
5) So, is GI in P? This is truly open question in that it really could go either direction. There is no real consensus. 
 
6) I have heard that Babai's result is as far as current techniques can take us. So we await a new idea.

Wednesday, February 07, 2024

Favorite Theorems: Graph Isomorphism

We start our favorite theorems of the last decade with a blockbuster improvement in a long-standing problem. 

Graph Isomorphism in Quasipolynomial Time by László Babai

The graph isomorphism problem takes two graphs G and H both on n nodes and asks if there a permutation \(\sigma\) such that \((i,j)\) is an edge of G if and only if \((\sigma(i),\sigma(j))\) is an edge of H. While we can solve graph isomorphism in practice, until Babai's paper we had no known subexponential algorithm for the problem.

Graph Isomorphism has a long history in complexity. It has its own self-named complexity class in the zoo. It's one of the few decision problems in NP not known to be NP-complete or in P. Graph Isomorphism and its complement are the poster child problems for complexity classes AM, SZK and SPP, public-coin proof systems, and the minimum-size circuit problem. Köbler, Schöning and Torán wrote a whole book on the structural complexity of graph isomorphism.

Graph Isomorphism has the kind of group structure that suggests a quick quantum algorithm, but that remains an annoying open problem. 

Babai found a simple algorithm and proof based on Johnson Graphs. Who am I kidding? You would need several months to work through the paper even for an expert group theorist. 

Babai has been working on graph isomorphism since the 80's. In November 2015, he announced a talk with that title and the theory world went crazy. They needed a larger room. We invited Babai down to Georgia Tech to give a talk. A week before he sends me an email saying the proof no longer gave the quasipolynomial time bound and asks if he should still give the talk. We say of course and he gives the talk on January 9, 2017, describing the problem with the proof. And then he announces "It's been fixed". I witnessed history in the making. 

Sunday, February 04, 2024

The advantage of working on obscue things

 I got an email from an organization that wants to publicize one of my papers. Which paper did they want to publicize?

1) If the organization was Quanta, they are KNOWN so I would trust it. Indeed, they have contacted me about my paper proving primes infinite from Ramsey theory (they contacted all of the authors of papers that did that). The article they published is here and I blogged about those proofs here.

2) Someone writing an article on my muffin work for a recreational journal contacted me- also believable.

3) But the email I got was from an organization I never heard of and they wanted to publicize An NP-Complete Problem in Grid Coloring. This  is not a topic of general interest, even among complexity theorist or Ramsey theorists.  Indeed, the paper has three authors and only two of them care about it. Hence I suspected the organization was a scam or quasi-scam. I looked them up and YES, at some point they would have wanted money for their efforts. 

I want to say none of my work is of general interest but that may depend on what is meant by general and by interest. Suffice to say that if a scammer claims they want to publicize my work and picks a random paper to ask me about, the probability that its one where I will say Yes, I can see that one being worth getting to a wider audience is negligible. Indeed, Darling was amazed that Quanta cared about using Ramsey Theory to prove primes infinite. She may have a point there.

But if I worked in Quantum or ML or Quantum ML  then it would be harder for me to say for a random paper they can't possibly think that this was wider appeal, so its obviously a scam. 

That is the advantage of working on obscure things.

Wednesday, January 31, 2024

University Challenges

Seems like US universities have been in the news quite a bit recently. You'd think for the great academics and research. Alas, no.

I decided to make a list of the not so unrelated topics. The list got long and still from from complete.

  • Public perception of universities has been dropping
  • Enrollment: Up in 2023 for the first time since the pandemic. But still down 900,000 undergrads since five years ago and the demographic cliff is just a couple years away.
  • Fiscal Challenges even at Penn State and University of Chicago.
  • On the other hand are those with huge endowments.
  • Where have the men gone? While CS is still predominantly male, men make up only about 40% of undergrads on four-year colleges. The percentages are lower for African-American and Hispanic men.
  • The increase in teaching faculty over tenure-track. 
  • AI - How best to use it to help the educational process without letting students cheat, and where do you draw the line. 
  • State government exerting control. We are seeing a number of conservative states pushing back against DEI and the perceived liberal bias.
  • Affirmative Action - How do universities maintain a diverse student body in the wake the supreme court ruling last summer.
  • Admissions policies that favor the rich, notably legacy admissions and sports other than football and basketball, where wealthier kids have the time, training and equipment to succeed.
  • SAT - Most schools have eliminated the SAT exam but should they bring it back
  • College is seen more as a place to build career skills. STEM fields especially in computing have seen huge gains in enrollment while humanities and social sciences have been decimated. How should colleges respond?
  • Competition from certificate programs, online degrees, apprenticeships and boot camps.
  • Athletics - Chasing ever-increasing broadcast revenue has restructured conferences (Goodbye Pac-12). Name-Image-Likeness has made some college athletes, notably in football and basketball, significant money. Alumni collectives and easier transfer rules have turned college student athletes into free agents. Meanwhile some lower revenue sports are getting cut.
  • Colleges as a political football as college graduates trend democratic.

The Israel-Gaza-Hamas conflict alone has supercharged a number of issues.
  • When and how should universities take a stand on political issues. 
  • Congressional hearings into university policies
  • Free speech, especially when it creates disruption, makes people feel unsafe and leads to discrimination. Where do you draw the line?
  • Activist donors and boards. As universities rely more on "net-high worth individuals", these large donors can hold considerable sway, including pushing out presidents. 
So as a college professor or graduate student, how do you deal with all of the above? Best to ignore it all and focus on your research and teaching.

Sunday, January 28, 2024

Certifying a Number is in a set A using Polynomials

 (This post was done with the help of Max Burkes and Larry Washington.)


During this post \(N= \{0,1,2,\ldots \}\) and  \(N^+=\{1,2,3,\ldots \}\).

Recall: Hilbert's 10th problem was to (in todays terms) find an algorithm that would, on input a polynomial \(p(x_1,\ldots,x_n)\in Z[x]\), determine if there are integers \(a_1,\ldots,a_n\) such that \(p(a_1,\ldots,a_n)=0\).

From the combined work of Martin Davis, Yuri Matiyasevich, Hillary Putnam, and Julia Robinson it was shown that there is no such algorithm.  I have a survey on the work done since then, see here
The following is a corollary of their work:


Main Theorem:
Let \(A\subseteq N^+\) be an r.e. set. There is a polynomial \(p(y_0,y_1,\ldots,y_n)\in Z[y_0,y_1,\ldots,y_n]\) such that

\((x\in A) \hbox{ iff } (\exists a_1,\ldots,a_n\in {\sf N})[(p(x,a_1,\ldots,a_n)=0) \wedge (x> 0)] \}.\)

Random Thoughts:

1) Actual examples of polynomials \(p\) are of the form

\(p_1(y_0,y_1,\ldots,y_n)^2 + p_2(y_0,y_1,\ldots,y_n)^2 + \cdots + p_m(y_0,y_1,\ldots,y_n)^2\)

as a way of saying that we want \(a_1,\ldots,a_n\) such that the following are all true simultaneously:

\(p_1(x,a_1,\ldots,a_n)=0\), \(p_2(x,a_1,\ldots,a_n)=0\), \(\ldots\), \(p_m(x,a_1,\ldots,a_n)=0\),

2) The condition \(x>0\) can be phrased

\((\exists z_1,z_2,z_3,z_4)[x-1-z_1^2-z_2^2-z_3^2-z_4^2=0].\)

This phrasing uses that every natural number is the sum of 4 squares.


The Main theorem gives a ways to certify that \(x\in A\): Find \(a_1,\ldots,a_n\in Z\)
such that \(p(x,a_1,\ldots,a_n)=0\).

Can we really find such \(a_1,\ldots,a_n\)?

A High School student, Max Burkes, working with my math colleague Larry Washington, worked on the problem of finding \(a_1,\ldots,a_n\).

Not much is known on this type of problem. We will see why soon. Here is a list of what is known.

1) Jones, Sato, Wada, Wiens (see here) obtained a 26-variable polynomial \(q(x_1,\ldots,x_{26})\in Z[x_1,\ldots,x_{26}]\) such that

\(x\in \hbox{PRIMES iff } (\exists a_1,\ldots a_{26}\in N)[(q(a_1,\ldots,a_{26}=x) \wedge (x>0)].\)

To obtain a polynomial that fits the main theorem take

\((x,x_1,\ldots,x_{26},z_1,z_2,z_3,z_4)=(x-q(x_1,\ldots,x_{26}))^2+(x-z_1^2+z_2^2+z_3^2+z_4^2)^2.\)

Jones et al. wrote the polynomial \(q\) using as variables \(a,\ldots,z\) which is cute since thats all of the letters in the English Alphabet. See their paper pointed to above, or see Max's paper here.

2) Nachiketa Gupta, in his Masters Thesis, (see here) tried to obtain the the 26 numbers\ (a_1,\ldots,a_{26}\) such that \(q(a_1,\ldots,a_{26})=2\) where \(q\) is the polynomial that Jones et al. came up with.  Nachiketa Gupta found  22 of them.  The other 4 are, like the odds of getting a Royal Fizzbin, astronomical.  Could todays computers (21 years later) or AI or Quantum or Quantum AI obtain those four numbers?  No, the numbers are just to big.

3) There is a 19-variable polynomial \(p\) from the Main Theorem for the set

\(\{ (x,y,k) \colon x^k=y\}.\)

See Max's paper here Page 2 and 3, equations 1 to 13. The polynomial \(p\) is the sum of squares of those equations. So for example \(r(x,y,z)=1\) becomes \((r(x,y,z)-1)^2\).

Max Burkes found the needed numbers to prove \(1^1=1\) and \(2^2=4.\) The numbers for the \(2^2=4\) are quite large, though they can be written down (as he did).

 
Some Random Thoughts:

1) It is good to know some of these values, but we really can't go much further.

2) Open Question: Can we obtain polynomials for primes and other r.e. sets so that the numbers used are not that large. Tangible goals:

(a) Get a complete verification-via-polynomials that 2 is prime.

(b) The numbers to verify that \(2^3=8\).

3) In a 1974 book about progress on Hilbert's problems (I reviewed it in this book rev col, see here) there is a chapter on Hilbert's 10 problem by Davis-Matiyasevich-Robinson that notes the following. Using the polynomial for primes, there is a constant \(c\) such that, for all primes \(p\) there is a computation that shows \(p\) is prime in \( \le c \) operations. The article did not mention that the operations are on enormous numbers.

OPEN: Is there some way to verify a prime with a constant number of operations using numbers that are not quite so enormous.