Sunday, February 25, 2024

When is it worth the time and effort to verify a proof FORMALLY?

(This post was inspired by Lance's tweet and later post on part of IP=PSPACE being formally verified.) 

We now have the means to verify that a proof (prob just some proofs)  is correct (one could quibble about verifying the verifier but we will not address that here). 

The Coq proof assistant (see here). This was used to verify the proof of the four-color theorem, see here.

The Lean Programming Language and Proof Verifier (see here). It has been used to verify Marton's Conjecture which had been proven by Gowers-Green-Tao (see here for a quanta article, and here for Terry Tao's blog post about it.) 

The HOL (Higher Order Logic)  is a family of interactive theorem proving systems (see here). The Wikipedia entry for HOL (proof assistant) (see here) says: 

The CakeML project developed a formally proven compiler for ML [ML is a programming language, not to be confused with Machine Learning.]

The ISABELLE is an HOL system. See here. The Wikipedia page says that it is used to verify hardware and software. It has recently been used to verify the Sumcheck Protocol which was developed to help prove IP=PSPACE (See here.)One of the authors of the IP=PSPACE paper, Lance Fortnow, said of this

Now I can sleep soundly at night.

The Kepler Conjecture was proven by Thomas Hales. (Alfred Hales is the co-author of the Hales-Jewitt. I do not know if Thomas and Alfred Hales are related, and Wikipedia does not discuss mathematicians blood-relatives, only their academic ancestors.)  The proof was... long. Over a period of many years it was verified by HOL light and Isabelle. See here. (Note- the paper itself says it used HOL-light and Isabelle, but the Wikipedia site on Lean says that Hales used Lean.) 

I am sure there are other proof assistants and other theorems verified by them and the ones above. 

My question is 

Which proofs should we be spending time and effort verifying? 

Random Thoughts

1) The time and effort is now much less than it used to be so perhaps the question is moot.

2) Proofs that were done with the help of a program (e.g., the four-color theorem) should be verified.

3) The theorem has to have a certain level of importance that is hard to define, but item 1 might make that a less compelling criteria. 

4) A proof in an area where there have been mistakes made before should be verified. That is the justification given in the paper about verifying part of the IP=PSPACE proof.

5) A rather complicated and long proof should be verified. That was the case for Marton's Conjecture. 

6) A cautionary note: Usually when a long and hard proof comes out, people try to find an easier proof. Once a theorem is proven (a) we know its true (hmmm..) and (b) we have some idea how the proof goes. Hence an easier proof may be possible. In some cases just a better write up and presentation is done which is also a good idea. I wonder if after having a verifier say YES people will stop bothering getting  easier proofs.

7) Sometimes the people involved with the original proof also were involved in the verification (Kepler Conjecture, Marton's Conjecture) and sometimes not (IP=PSPACE, 4-color theorem). 

8) When I teach my Ramsey Theory class I try to find easier proofs, or at least better write ups, of what is in the literature. I sometimes end up with well-written but WRONG proofs. The students, who are very good, act as my VERIFIER. This does not always work:  there are still some students who, 10  years ago, believed an  incorrect proof of the a-ary Canonical Ramsey Theorem, though they went on to live normal lives nonetheless. Would it be worth it for me to use a formal verifier? I would guess no, but again, see item (1). 

9) Might it one day be ROUTINE that BEFORE submitting an article you use a verifier? Will using a verifier be as easy as using LaTeX? 

10) The use of these tools for verifying code--- does that put a dent in the argument by Demillo-Lipton-Perlis (their paper is here, and I had two blog posts on it here and here) that verifying software is impossible?'

11) HOL stands for High Order Logic. I could not find out what Isabelle, Lean, or Coq stand for. I notice that they all use a Cap letter then small letters so perhaps they don't stand for anything. 


  1. Actually Wikipedia biographies often do discuss mathematical blood relatives, when they can be properly sourced. For instance the articles on the three Kuperbergs all mention each other. If you can find a publication describing a relation between Alfred Hales and Thomas Hales, please let me know.

    1. Before posting I did many google searches to find out if they are related but could not find anything. They are both still alive so I could email or call them, though that seems nosy. Hopefully one of them will see this post!

  2. Isn't this really just the same judgement we make all the time as to whether a proof should add more detail? We balance the amount it would add to our convinction the proof was correct with the effort and other costs.

    But the price for formal verification is still absurdly high.

  3. Also Coq is being renamed to Rocq (short for Rocquencourt) because (according to its previous name led to women being harassed with dick jokes and leaving the project.

  4. > The Coq proof assistant (see here). This was used to verify the proof of
    > the four-color theorem, see here.


    one might wonder what exactly they proved. Looking at

    we find

    "Definition 4. A point is a corner of a map if and only if it belongs to the closures of at least three regions.

    "The definition of 'corner' allows for contiguous points, to allow for boundaries with accumulation points, such as the curve sin 1/x."

    I guess they can make any definitions they want to, but that isn't how I would define "corner".

  5. The end goal should be that all mathematical results, historical and current, and their proofs are formalized (or, the result shown not to have been proved.)

  6. Terry Tao gave a great talk on machine-assisted proofs at the Joint Math Meetings last month. See:

    He describes the whole process as taking roughly 3 weeks for the Marton-polynomial-Freiman-Rusza conjecture resolution involving many individuals, including entering a chunk of the core math background. That effort made it much easier to subsequently improve a constant (from 12 to 11), close some small gaps in how they had stated things, and also identified a simpler intermediate property involving a smaller hammer for part of the proof (since the large hammer they had used would have been a lot more work in Lean) that seems likely to be useful elsewhere.

    He suggests that while LLMs are wholly unreliable in coming up with sound arguments themselves right now, using them to help with converting things to Lean is already useful.

    LLMs may also be useful in converting Lean arguments into readable math proofs.

    Obviously, these big theorems are worth spending the time now - it just requires the theorem to be sufficiently attractive for the community to take part. (My current favorite to have this done would be last year's bounds for 3-term arithmetic progressions of Kelley-Meka with simplifications, for example.) The implication is that as the body of math already in Lean grows, the extra effort is going to get smaller quite soon.

  7. Coq got its name from the first 3 letters of one of the authors of the original paper.

  8. Somewhat relevant to #6 is:

    This is more about what it means to be a proof, i.e., that understanding a proof is a lot more than just formalizing each step, than the details of proof formalization.

    I'd guess that just because a verifier says that a particular proof is correct, it wouldn't mean that mathematicians stop looking for better proofs, i.e. proofs that say something deeper about the mathematics involved in the subject of the statement being proved.

    (Think Langlands Program here.)

    An obvious example is FLT, whose proof involves translating the statement of the problem into a previously thought unrelated area of mathematics, grinding the resulting problem in that area of mathematics, and then translating back into the original form.