Sunday, February 11, 2024

Are there any REAL applications of Graph Isomorphism?

Lance's post on Babai's result on Graph Isomorphism (henceforth GI) inspired some random thoughts on GI. (Lance's post is here.) 

1) Here is a conversation with someone who I will call DAVE. This conversation took place in the late 1980's. 

BILL: You got your ugrad degree in engineering and then went to CS grad school. You are a pragmatic guy. So why did you work on graph isomorphism?

DAVE: My advisor was working on it. The 1980's was a good time for GI: the problems restricted to bounded genus, bounded degree, and bounded eigenvalue multiplicity were proven to be in P. Tools from Linear Algebra and Group Theory were being used. People thought that GI might be shown to be in P within the next five  year. Then it all stopped for quite some time. 

But back to your question about GI being practical, I asked my advisor for real applications of GI. He said:  

Organic Chemists use graph isomorphism to match chemical structures.

By the time I found out this wasn't true, I already had my PhD. I quickly switched to Computational Geometry which really does have applications. Maybe. 

2) So why was graph isomorphism studied? Much to my surprise, a survey by Grohe and Schweitzer (see here) says that 

Graph isomorphism as a computational problem first appeared in the chemical documentation literature of he 1950's (for example Ray and Kirsh) as the problem of matching a molecular graph against a database of such graphs.

(The article by Ray and Kirsh is here.) 

So at one time it was thought that GI would apply to Chemistry. Did it? I suspect some simple algorithms and heuristics  did, but (1) chemicals are more complicated than graphs, and (2) by the time we are talking about graph isomorphism of bounded eigenvalue multiplicity, the algorithms got to complicated to really use. BUT these are just my suspicions (what is the difference between a suspicion and a guess?) so I welcome comments or corrections.

2) The same article also says:

Applications [of GI] spans a broad field of areas ranging from Chemistry to computer vision. Closely related is the problem of detecting symmetries of graphs and of general combinatorial structures. Again this has many application domains, for example, combinatorial optimization, the generation of combinatorial structures, and the computation of normal forms. 

That sounds impressive, but I would like to know if the applications are to the real world, or are they do other theory things. 

3) The prominence of GI in the CS theory literature is because its one of the few natural problems that is in NP, not in P, but not known to be (and unlikely to be) NP-complete.

4) Graph Isomorphism IS a natural problem. What is an unnatural problem? 

BILL: Do you find the following interesting: There is an r.e. set that is not decidable and not Turing-equivalent to HALT.

DARLING: Yes. Unless it was some dumb-ass set that people like you constructed for the whole point of being r.e., not decidable, and not Turing-equivalent to HALT. 

BILL: You nailed it!
 
5) So, is GI in P? This is truly open question in that it really could go either direction. There is no real consensus. 
 
6) I have heard that Babai's result is as far as current techniques can take us. So we await a new idea.

14 comments:

  1. "So at one time it was thought that GI would apply to Chemistry. Did it? I suspect some simple algorithms and heuristics did, but (1) chemicals are more complicated than graphs"
    No problem, the isomorphism of complicated (but finite) structures can be reduced to graph isomorphism. The question you should ask yourself is whether there are real world applications for determining whether two different descriptions of a complicated (but finite) structure actually describe the same structure. I guess the answer is: "yes, but it would be even more practical to get a unique string for each single structure, such that one could hash or do other types of quick database searches". This even more useful operation is called "computing a canonical label". The famous nauty and Traces can do this. In fact, they can do graph canonization (finding a canonical form of a given graph), typically described by a canonical order of the nodes. It is not obvious whether Babai's result transfers to graph canonization, but I have seen remarks that at least it should transfer to "computing a canonical label".

    ReplyDelete
  2. As far as a "practical application", I use graph isomorphism, specifically its implementation in the Python package networkx (https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.isomorphism.is_isomorphic.html#networkx.algorithms.isomorphism.is_isomorphic), to code up auto-graded homework assignments for my Theory of Computation course.

    (Note: networkx does not use Babai's algorithm, it uses something called "vf2"... not sure if that's because Babai's constants are too large to be practical, or if there's some other reason.)

    For example, I ask the students to do the subset construction to convert an NFA into an equivalent DFA, and I use this package to test, regardless of the names they chose to assign to the DFA states, whether it is isomorphic to the DFA that results from the "official" subset construction algorithm.

    I'm not sure if grading problems in a course on Theoretical Computer Science, using algorithms developed in Theoretical Computer Science, counts as a practical application of Theoretical Computer Science, but it all feels very practical to me as that autograding code hums away, and the students get their DFAs graded without us having to read over it.

    Full disclosure: even if Graph Isomorphism required n! time to solve, since we only give the students 4-state NFAs to convert, we could test the resulting 16-state DFAs for isomorphism using the naive brute-force algorithm taking time 16-factorial. But if we did ever decide to be cruel and ask them to convert 20-state NFAs into 20!-state DFAs, we could in principle test those 20!-state DFAs in a reasonable about of time.

    ReplyDelete
    Replies
    1. Really? 16! ~ 20 trillion. Sounds slow. (My heuristic is that usually anything beyond 12! ~ 500 billion is out of scope without a cluster.)

      Delete
  3. One moment, LMGTFY :-)

    I haven't checked carefully, but the following paper seems to make use of heuristics for graph isomorphism (and contains many images of non-isomorphic graphs along with impressive formulas towards the end):
    M Borinsky, O Schnetz: Recursive computation of Feynman periods. Journal of High Energy Physics, 2022(8):1-83
    https://link.springer.com/content/pdf/10.1007/JHEP08(2022)291.pdf

    To dig up that paper (and many others of similar nature), I used google scholar to find papers that cite Brendan McKays heuristic that implements GI - practical graph isomorphism, II. (IIRC this had been the "latest and greatest" practical implementation some years ago.)

    ReplyDelete
  4. Hi Bill,

    I'm currently in the process of interviewing computational chemists to get to the bottom of this, as one chapter of a book I'm writing (called "Practical Math for Programmers" subtitle: "A Tour of Math in Production Software"). Would you be open to reading that draft and providing feedback?

    ReplyDelete
    Replies
    1. (This is Bill) Email me privately to discuss.

      Delete
  5. There are three reasons why GI is important, two of them have already been mentioned in your post and the comments.

    1. GI is a universal problem for finite structures in the sense that isomorphism of any finite structure (including groups) can be reduced to GI.

    2. If GI is not in P, then it resolves the P vs NP question, which could lead to provable one way functions, which are critical for cryptography, which is critical for computer security.

    If GI is in P, it could show potential vulnerabilities of factoring and other cryptographic functions.

    3. (If you build it, they will come.) The resolution of GI, which has been open for decades, must need new techniques. These would be directly or indirectly useful for other problems, including cryptography.

    ReplyDelete
  6. How does GI play a role in this theorem?

    ReplyDelete
  7. There is too much misguided emphasis on solving real-world stuff using exact computation when all we can even ever hope for is a very crude approximation (a fact well known since Newton and Leibniz's days).
    A two-link simple pendulum is a chaotic system from a physicists perspective but a two-year old with a zillion joints (in an inverted pendulum configuration) can do stairs, ski, skate, bicycle, swim and what not; this person will have occasional trips and falls (the underlying system is no doubt 'chaotic') but will carry on and keep doing all this and so much more till they're 100 years old.

    ReplyDelete
  8. There is practical code for Graph Isomorphism (Nauty, see http://users.cecs.anu.edu.au/~bdm/nauty/), which doesn't involve any of the theory of the provably fast algorithms that works very well in practice. Less capable programs were used for years in digital design where edits of circuit layouts would need to be checked quickly to see if no errors were introduced. I know it has been used in formal verification to find automorphisms of logical structures to avoid repeating the same work repeatedly.

    This practical utility has nothing to do with the recent fast algorithms, which handle "corner" cases that seemed really hard.

    ReplyDelete
    Replies
    1. Simply writing code to solve a problem doesnt necessarily mean it's applied. I would love to hear about the specific places where it's used in formal verification. I was under the impression that was more on the side of SAT solving.

      Delete
    2. See, e.g.: https://doi.org/10.1109/ICCAD.1988.122520

      Delete
    3. I was able to find one software package that mentions the use of GeminiII: https://www.staticfreesoft.com/

      Looks like it's quite old (last release was maybe in the 2000's?). Meanwhile, I found this on Wikipedia (https://en.wikipedia.org/wiki/Layout_Versus_Schematic):

      > The need for such programs [comparing circuit layout to schematic] was recognized relatively early in the history of ICs, and programs to perform this comparison were written as early as 1975.[1] These early programs operated mainly on the level of graph isomorphism, checking whether the schematic and layout were indeed identical. With the advent of digital logic, this was too restrictive, since exactly the same function can be implemented in many different (and non-isomorphic) ways. Therefore, LVS has been augmented by formal equivalence checking, which checks whether two circuits perform exactly the same function without demanding isomorphism.

      This matches my intuition that modern hardware synthesis tools (like, say Yosys or ABC) use SAT solving and formal methods to check for logical equivalence. It's also a sort of "worst case" for an applied math technique when the reason the technique was abandoned is because the technique was not sufficient to solve the underlying problem.

      Delete
  9. Update: I found one!

    Integer linear solvers use nauty/sassy to handle ILPs with highly symmetric feasible regions, so as to prune the branch-and-bound search tree (eliminating equivalent symmetric subproblems) and speed up the search. See Symmetry Handling in Mixed-Integer Programming by Jim Ostrowski (doi:10.1002/9780470400531.eorms0863) and the topic of "Orbital branching". I confirmed these techniques are applied in the open source solver packages SCIP, CBC/Coin-OR.

    ReplyDelete