## Wednesday, December 06, 2023

### Do We Need to Formalize?

Back in the 1990s I explored the system Nuprl for formalizing mathematical proofs. We had a theoretical paper on quickly checking a proof and I wanted to see if we could code it up and make it work on some simple theorems. It never went anywhere.

Automated proof systems have come a long way. Yesterday Terry Tao announced that he has fully formalized his proof with Tim Gowers, Ben Green and Freddie Manners of the Polynomial Freiman-Ruzsa (PFR) conjecture over $${\mathbb F}_2$$ in the Lean proof system. The process from paper to formalization took about three weeks.

It's an impressive feat--major new theorem to full formalization in weeks. There's a nice Quanta article from 2021 about the Lean system formalizing a different result.

Is this just a novelty or do we gain something from the formalization?

Mathematics has done quite well without full formalization of theorems--we trust papers that give enough details of a proof that we know a full formalization is possible but fully formalizing a proof just didn't seem an efficient use of a mathematician's time. Should we not trust the proof of the Green-Tao theorem that there are arbitrarily long arithmetic progressions in the primes if we don't have a full formalization?

It would be interesting if formalizing in Lean led to new theorems or found serious bugs in "proofs" of old theorems.

Maybe AI combined with strong proof systems in the future will be able to take a well-written math paper and find a fully formalized proof or output a logical gap. That would be great--a reviewer can focus on the value of a result instead of its correctness. But we are a long way from unassisted formalization.

Concepts like Turing Machines are hard to fully formulate but somebody did it and there's a lengthy discussion of formalizing computational complexity. So perhaps we can require that anyone who claims a solution to P vs NP needs to formalize their proof before they get to post it to arXiv. For me, that would be the killer app for Lean.

1. I think some killer apps are more like things that we nowadays take for granted when coding. A good IDE can find for you all invocations of a method and makes refactoring somewhat easier. Imagine if you could refactor your proofs just as easily.

As a simple example: suppose you prove some theorem about fields. I need the same result but for commutative rings. Right now, the process is: I have to go through your proof line by line and see if everything still holds for commutative rings. But in a proof assistant, this literally (should) just becomes an act of refactoring, and where the refactoring fails can tell you where you might need to prove a new lemma, or maybe it succeeds and then you're good to go.

And *search*. As more math becomes formalized, it becomes easier to search not based on keywords, but on actual structure/content. Maybe someone proved a combinatorial lemma about something they call widgets, but it's secretly the same as what you call gizmos. If you search for "gizmo" (on Google, MathSciNet, DBLP, etc.) you'd never find it, but if you can search based on the axioms and actual structure of the mathematics, you could.

2. I don't think that we are a long way from unassisted formalization. Probably the next step will be some kind of more human friendly, interactive interface, that can ask you questions while it is processing the paper, but does all the formalization in the background. I think that these systems are within reach. After this works well, and it has enough samples, it can become unassisted in formalizing. And even after that, it will be able to come up with the proofs as well.

3. While a different meaning of '"formalization", formalizing complexity theoretic results in bounded arithmetic (subtheories of Peano Arithmetic corresponding to efficient reasoning), is very important to computational/proof complexity! It is important to us that we formalize our results in bounded arithmetic theories, as these can correspond to algorithms (via witnessing theorems of Sam Buss and others) and proof complexity upper/lower bounds by propositional translations.
I imagine Lean and other proof assistants would be able to take a formally verified proof and then indicate a logical theory in which said proof could be formalized in. That would be incredibly helpful as a tool.

4. There is a lengthy discussion about formalisation of complexity for Lean, but first (impressive) steps have been made in other proof assistants. For instance in Coq already in 2021 (https://drops.dagstuhl.de/opus/volltexte/2021/13915/pdf/LIPIcs-ITP-2021-20.pdf) and in Isabelle just recently (no paper available yet). What makes formalising complexity much harder is that there is a much higher amount of invisible mathematics present than in other kinds of maths. Since in proof assistants one has to spell out both the visible and invisible mathematics, complexity proofs have a very high overhead factor when going from paper to proof assistants.

5. Your post conflates two things: Formalizing proofs to allow checking vs the problem of proof search. These are fundamentally different complexity tasks. We shouldn't expect new proofs from this process; just indications of where we might need to add to our proofs. At heart, proof assistants like Lean are just in the proof checking business. (Unlike complete SAT solvers which are in the proof finding business.)

Your confidence in our proofs seems a bit higher than warranted: What portion of the proofs in a typical theory conference have significant bugs/gaps that the authors did not consider? It is likely considerable. Would it be worthwhile to eliminate time spent by others building on buggy/incorrect arguments? Would it be worthwhile to resolve the controversy about the claimed proof ABC Conjecture that has gone on for a decade already?

One of the reasons that Voevodsky was so keen on formalizing mathematical proofs using Coq was that, in the abstract domain of algebraic geometry he was working on, he felt he no longer had the same level of confidence in the correctness of his own proofs.

I agree that NuPrl and Boyer-Moore were very far from fun to work with and the interaction of "tactics" was a bit of nightmare so your basic sense from your experience is understandable. Somehow computer scientists who want people to trust their software - or want to trust their own software - have sufficient motivation and I believe that things are vastly better these days, though that doesn't still avoid the need to specify tactics.

1. Evangelos Georgiadis7:17 AM, December 08, 2023

@Paul Beame, regarding Voevodsky's reasons on formalizing mathematical proofs using Coq -- fairly accurate, succinct and diplomatic statement. To add a little bit of color with an \epsilon amount of precision, highlighting why Voevodsky warmed up towards automated proof assistants.

Voevodsky was actually (fairly) confident of his proofs (e.g., intrinsically trusted them) -- until he gave a series of lectures in which Pierre Deligne was actively engaged as `proof-check' participant rather than merely spectator.

The unnoticed mistake -- which persisted and propagated in the works of other mathematicians for an extended period before detection -- namely, Proposition 4.23 of paper "Cohomological Theory of Presheaves with Transfers",
[Source: https://www.math.ias.edu/Voevodsky/files/files-original/Dropbox/Published_papers/Motives/Collection/s3.pdf]
later weakened to Lemma 22.10, as seen in lecture 22 entitled "Zariski sheaves with transfers" of "Lecture Notes on Motivic Cohomology" [Source: www.ams.org/bookpages/cmim-2] prompted Voevodsky to reevaluate his own stance.

This experience caused Voevodsky to become more critical and warm up towards the idea of using computer languages to formalize mathematical statements, so as to represent
mathematical structures, proofs in such a way that mathematical knowledge can be archived and logically validated presuming some kind of fixed format. He pioneered his work in Coq rather than Lean.
(A historical footnote for people not familiar with proof checkers: Vladimir's efforts stretched from Feb 2010 to Dec 2013, however, Lean, was first released for research purposes in 2013 by Microsoft Research.)

The incident underscores a common challenge:
confidence/trust in correctness until an unforeseen event exposes a mistake. This dilemma extends to contemporary proof checkers like Lean and its many predecessors. Additionally,
it raises a more pertinent question: What exactly makes us believe that these proof checkers are bug-free
and can be relied upon in an almost trustless manner?
(NB: Similar concerns of different nuance were already raised in the 1980s when the National Security Agency began to believe that European work in formal methods was ahead of the US ... NSA was more interested in understanding
how adversaries can maliciously tinker in these systems.)

This reassessment is crucial when entrusting formalized code to ostensibly trustless, open-source proof assistants. As we delve into the intricate dance between trust and verification, the essay "Reflections on Trusting 'Trustlessness' in the era of 'Crypto'/Blockchains" provides valuable insights into the multifaceted nature of trust within the realm of open-source solutions.

6. Many proof checkers can "extract" runnable code. (It looks like Coq and Isabelle can extract OCaml and Haskell. And Lean is at least partway to extracting programs via C.) I will be curious to see if this gets more widely used in software engineering (compared to e.g. unit tests and "fuzzing".)

Even if you don't have a full specification, being able to say that a slow (but easy to understand) program always returns the same thing as a fast (but tricky to understand) program, seems like a specification many people. Proving this can be tricky, which leads to the
full employment "theorem" for compiler writers.

I'm sure that if this does become popular, there will still be ways programs can be buggy. (But they'll be new and different ways! Progress!)

7. Aside from a field like algebraic geometry and homology and homotopy becoming too complicated for humans, there is also interest in verified software for critical parts of important systems.