tag:blogger.com,1999:blog-3722233.post334620401324981723..comments2024-08-02T16:56:41.757-05:00Comments on Computational Complexity: ChatGPT thinks Graph Isomorphism has real applications. Is it right? Lance Fortnowhttp://www.blogger.com/profile/06752030912874378610noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-3722233.post-8154292860643406152024-02-24T14:58:48.071-06:002024-02-24T14:58:48.071-06:00https://www.youtube.com/watch?v=y6dUkCxlrd8https://www.youtube.com/watch?v=y6dUkCxlrd8j2kunhttps://www.blogger.com/profile/08041921049916424212noreply@blogger.comtag:blogger.com,1999:blog-3722233.post-13772795658067913922024-02-21T11:21:23.528-06:002024-02-21T11:21:23.528-06:00Especially ChatGPT-3.5 has a tendency to confabula...Especially ChatGPT-3.5 has a tendency to confabulate references, ChatGPT-4 is better, but for such exact/discrete questions it is wise not to ask a lossy compressed language model :)<br /><br />The paper it tried to refer to seems to be: https://pubs.acs.org/doi/pdf/10.1021/ci00046a002 "Atom pairs as molecular features in structure-activity studies: definition and applications" Raymond E. Carhart, Dennis H. Smith, and R. Venkataraghavan (J. Chem. Inf. Comput. Sci. 1985, 25, 2, 64–73)<br /><br />So the idea was in the relevant latent direction, but the execution less so.<br /><br />Really appreciated your Frozen Kolmogorov Concentrate intuition pump for language models.<br /><br />I once used subgraph isomorphism to detect credit card/transaction fraud, but mostly focused on graph edit distance because real-life graphs often not exactly the same, but can be very similar.Anonhttps://www.blogger.com/profile/01360755535659830635noreply@blogger.comtag:blogger.com,1999:blog-3722233.post-16803947083048573802024-02-19T15:01:11.399-06:002024-02-19T15:01:11.399-06:00Regarding "Graph canonicalization" as &q...Regarding "Graph canonicalization" as "give a string/hash identifier"<br />(see j2kun's comment) versus "canonisation" as "canonical order of a graph". The abstracts of the following two papers explain a close relationship between "a logic capturing PTIME," "Choiceless Polynomial Time with counting (CPT)," and "canonisation," which is absent for the weaker "give a string/hash identifier":<br /><br />Choiceless Computation and Symmetry: Limitations of Definability (https://arxiv.org/abs/2401.07147) by Benedikt Pago<br />The search for a logic capturing PTIME is a long standing open problem in finite model theory. One of the most promising candidate logics for this is Choiceless Polynomial Time with counting (CPT). Abstractly speaking, CPT is an isomorphism-invariant computation model working with hereditarily finite sets as data structures. While it is easy to check that the evaluation of CPT-sentences is possible in polynomial time, the converse has been open for more than 20 years: Can every PTIME-decidable property of finite structures be expressed in CPT? We attempt to make progress towards a negative answer and show that Choiceless Polynomial Time cannot compute a preorder with colour classes of logarithmic size in every hypercube. The reason is that such preorders have super-polynomially many automorphic images, which makes it impossible for CPT to define them.<br /><br /><br />Deep Weisfeiler Leman (https://arxiv.org/abs/2003.10935) by Martin Grohe, Pascal Schweitzer, Daniel Wiebking<br />We introduce the framework of Deep Weisfeiler Leman algorithms (DeepWL), which allows the design of purely combinatorial graph isomorphism tests that are more powerful than the well-known Weisfeiler-Leman algorithm.<br />We prove that, as an abstract computational model, polynomial time DeepWL-algorithms have exactly the same expressiveness as the logic Choiceless Polynomial Time (with counting) introduced by Blass, Gurevich, and Shelah (Ann. Pure Appl. Logic., 1999)<br />It is a well-known open question whether the existence of a polynomial time graph isomorphism test implies the existence of a polynomial time canonisation algorithm. Our main technical result states that for each class of graphs (satisfying some mild closure condition), if there is a polynomial time DeepWL isomorphism test then there is a polynomial canonisation algorithm for this class. This implies that there is also a logic capturing polynomial time on this class.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3722233.post-86979075729907050652024-02-19T14:32:07.108-06:002024-02-19T14:32:07.108-06:00On subgraph isomorphism things are clearer. There ...On subgraph isomorphism things are clearer. There is huge literature on related problems. <br />In network resource orchestration one of the topics of intense research is the embedding of Service Function Chains (SFCs) in large networks, like datacenters. <br />In brief, an SFC is modelled as a small weighted graph and a datacenter as a large weighted graph. Embedding is the mapping of the small graph on a subgraph of the large network so that the resulting embedding will have certain properties, like optimal resource usage. It is a generalization of subgraph isomorphism as we search for the best approximation of an isomorphic subgraph. Of course there are variations to this problem, like letting more than one nodes from the SFC to be mapped in one node of the large network.<br />The problem is NP-hard and we try to apply AI methods to achieve efficient solutions for real time usage. Although even using advanced methods like genetic algorithms it is still hard to get exact solutions; see section 4 in<br />https://doi.org/10.1007/s10922-023-09771-yPantelis Rodishttps://www.blogger.com/profile/11999985308984556045noreply@blogger.comtag:blogger.com,1999:blog-3722233.post-71066661597564525642024-02-19T14:21:42.371-06:002024-02-19T14:21:42.371-06:00"One of my interviewees gave this talk that c..."One of my interviewees gave this talk that covers the history of cheminformatics through 1990." Which talk? Looks like your copy from an email was lossy.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3722233.post-83271733939805301812024-02-19T12:23:27.424-06:002024-02-19T12:23:27.424-06:00(Copying from an email I sent to Bill)
So chemist...(Copying from an email I sent to Bill)<br /><br />So chemists use two related problems, but not vanilla GI as far as I can tell. <br /><br />Induced subgraph isomorphism: used to find the maximal common induced subgraph of a collection of molecular graphs, and in some cases used to do database queries like "which chemicals are available for sale by our suppliers that contain a specific chemical core?" Many use cases here seem to be in ad-hoc visualization and exploration. They want to plot a bunch of chemical graphs in a way that highlights a particular part (the common core) and lets the chemist use their "chemical intuition" to figure out how that relates to a physical reaction. The leading algorithm here is called VF2++, after a line of work from VF->VF2->VF2+->VF2++. It's a branch and bound algorithm.<br /><br />Graph canonicalization: used to give a string/hash identifier for fast database lookups/search. Notably, chemists gave up on using GI for database search because in the time it took to get fast GI solvers like nauty/bliss, the industry came up with a few different kinds of string identifiers for chemicals that are "good enough" for large swaths of drug discovery work. The "SMILES" string is one I read about in detail. Maybe someone with more theory background can explain the relationship between GI and canonicalization. Clearly canonicalization is at least as hard as GI, and chemists probably relax the "canonical" part slightly.<br /><br />I did find two chemistry projects that use GI directly. Both of these projects seem relatively unpopular in the chemistry world compared to the big open source projects whose maintainers I interviewed.<br />https://github.com/StructureGenerator/surge which " generates all non-isomorphic constitutional isomers of a given molecular formula." It was written by Brendan McKay (the author of nauty), so I feel there is a potential conflict of interest and I shouldn't conclude this is a valid use case unless I find someone else using this package, which I have yet to. I have to talk to my chemistry contacts to see if this task is a useful one for most chemists.<br /><br />https://github.com/RuleWorld/bionetgen which is a chemistry modeling framework that uses GI for some sort of data structure optimizations I didn't get to the bottom of.<br /><br />A few other things to mention. In re "chemicals are more complicated than graphs," yes absolutely, but for the most part they annotate the edges and vertices with enough extra information to make it work, and the parts where it cannot work (e.g., knotted molecules, weird quantum effects) they resort to other, more detailed representations, and it's more academic rather than what people use at drug design companies. The general field of trying to infer physical/biological activity from the chemical/graph structure of a molecule is called Quantitative Structure-Activity Relationship, and I found there is a lot of successful work in trying to formulate chemically-relevant graph invariants.<br /><br />One of my interviewees gave this talk that covers the history of cheminformatics through 1990. You might find it enjoyable at least as a love letter to how intertwined the history of cheminformatics and computer science are.<br /><br />Summarizing ChatGPT: it's wrong in the sense that chemists don't use the atomic formula to determine if two compounds are identical, they use the SMILES string or similar, which has enough information to distinguish simple isomers like the one listed.j2kunhttps://www.blogger.com/profile/08041921049916424212noreply@blogger.comtag:blogger.com,1999:blog-3722233.post-53984403783571729522024-02-19T12:17:44.253-06:002024-02-19T12:17:44.253-06:00The only Google hits for "Development and App...The only Google hits for "Development and Application of a Novel Graph-Theoretical Approach for Molecular Similarity Analysis" are tied to this post.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3722233.post-58986112630289940652024-02-19T03:37:36.944-06:002024-02-19T03:37:36.944-06:00At least we ought to be able to agree that the DOI...At least we ought to be able to agree that the DOI that the ChatBot provided is to a completely-unrelated article, entitled "Keeping up with Japanese chemical technology at Chemical Abstracts Service". At least, I assume that it's completely unrelated. It is, indeed, behind a paywall, and the Rutgers library system does not provide me with access to it.<br />Eric Allenderhttps://www.blogger.com/profile/07417405562053586552noreply@blogger.comtag:blogger.com,1999:blog-3722233.post-38351693432577701412024-02-18T19:31:25.928-06:002024-02-18T19:31:25.928-06:00(Bill) Before posting I went to the website of JCI...(Bill) Before posting I went to the website of JCICS and searched for the paper and could not find it. The website I used was<br /><br />https://dl.acm.org/journal/jchm<br /><br />So what is going on? Possibilities<br />1) It is there and either I am not good at seraching or the serach engine is bad. <br />2) The slide packet you pointed to that had the paper as a reference is incorrect.<br /><br />Both are quite plausible. If you find a pointer to the paper (or just the abstract- its prob behind a paywall) then I will correct my post. Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-3722233.post-72812831585985072192024-02-18T17:02:24.222-06:002024-02-18T17:02:24.222-06:00That paper appears to exist: R.E. Carhart, D.H. Sm...That paper appears to exist: R.E. Carhart, D.H. Smith, R. Venkataraghavan JCICS 25:64-73 (1985), and it is referenced here: https://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf in the context of atomic structure similarity.<br />Anonymousnoreply@blogger.com