In 1998 Hales submitted a computer-assisted proof of the Kepler conjecture, a theorem dating back to 1611. This describes the most efficient way to pack spheres in a box, wasting as little space as possible. It appears the best arrangement resembles the stacks of oranges seen in grocery stores.Should we trust such a computer-assisted proof? I have more faith in a computer properly executing its code more than a mathematician verifying a proof. But what should constitute a legitimate proof when part of that proof is verified by a computer?Hales' proof is over 300 pages long and involves 40,000 lines of custom computer code. When he and his colleagues sent it to the Annals of Mathematics for publication, 12 reviewers were assigned to check the proof. "After four years they came back to me and said they were 99% sure that the proof was correct and they said were they exhausted from checking the proof," Hale says.

As a result, the journal then took the unusual step of publishing the paper without complete certification from the referees.

In most mathematical papers, we don't give formal logical proofs of our theorems. Instead we give a detailed proof that gives all of the necessary ideas to convince the reader that a formal proof would be possible. But, at least with our current technology, that level of informality will not work with computer code. So any proof using computer verification should have a formal proof of the correctness of the code.

This would require significant work from the author, but we are talking about establishing mathematical truths. When the other alternative is publishing the solutions to major open questions without being fully refereed, what choice do we have?

ReplyDeleteI have more faith in a computer properly executing its code more than a mathematician verifying a proof.But I have more faith in a mathematician verifying a proof than I do in mathematician with no training in software engineering writing 40,000 lines of code without a bug. I don't actually know whether Hales is a good software engineer or not, but there are far more people out there who think they know how to program than there are good programmers.

So any proof using computer verification should have a formal proof of the correctness of the code.But one usually does not prove the correctness of code. Rather, one proves the correctness of an algorithm, and nobody implements an algorithm in 40,000 lines without a bug. Most mathematicians (and probably many computer scientists) are probably not aware of the field of software safety and formal methods for requirements specification and analysis.

"Proof by programming" is a perfectly valid technique, but the programming needs to be done by someone with training in software engineering, rather than a self-taught hacker using 40,000 lines of atrocities such as Duff's device.

ReplyDelete40,000 lines of atrocities such as Duff's deviceC'mon, Duff's device isn't an atrocity; it's a well known idiom.

(That said, I'd be somewhat surprised to see any use for it here, and if it's being used where a compiler would already know to unroll, then it actually is an atrocity.)

For those who haven't seen it before, there was recently a completely computer-checked proof of the four-color theorem, using the Coq proof assistant. The proof can be checked using a proof checking program that is not specialized to the theorem statement at all. The proof also has considerable "computational" content, which is supported nicely by features of Coq's logic that make it easy to mix "computational" and "mathematical" reasoning in proofs.

ReplyDeleteI bring this up in case there are nay-sayers in the audience who don't believe that it is feasible today to check formal proofs with large computational content with today's tools. :-)

Imagine the verifier has a very hard to find bug.

ReplyDeleteAs usual someone finds it.

1 - If he is honest he publishes his knowledge.

2 -If he is not honest he gains world fame by proving a famous conjecture from ...

how many proofs published in annals of mathematics have later been found to be wrong?

ReplyDeletei agree with all the comments about the odds of 40000 lines of computer code not containing a bug being much higher than 1%

I remember Robin Thomas giving a lecture about their simplified proof of the 4-color theorem. What I took from it was that authors need to publish the proof in a computer checkable format, and then multiple referees should write an independent theorem checker to verify the results.

ReplyDeleteAs long as computer assisted proofs are open sourced and indepedantly verifiable I have no problem with them. That is why I am always leery of confrence papers where authors give their latest and greatest timings without publishing the source code of their implementation.

if you twist up a chaine,it will take exactly that many turn to straitin it

ReplyDelete-- how many proofs published in annals of mathematics have later been found to be wrong?

ReplyDeleteWrong question. How many proofs in AofM have been found to contain an error? LOTS!

How many have turned out to be ultimately wrong? Few.

The probability of the code having a bug is certainly > 1%, but who knows what percentage of those are minor fixable things and which ones would actually make the proof fall.

Hales definitely tries to address the reservations cited above, see

ReplyDeletehttp://www.math.pitt.edu/~thales/flyspeck/

ReplyDeleteany proof using computer verification should have a formal proof of the correctness of the codeSo the section of a proof which has been gone over by a compiler, which has had tons of runtime checking done, and probably has some test code to boot, must on top of that have a formal proof, while the part which is simply hand-checked by extremely error-prone humans is given a free ride? Forgive me for finding this as absurd double standard, but I find the idea that human-checked mathematical proofs contain fewer errors than human-reviewed code contains bugs to be quite laughable.

============

ReplyDeleteA similar point, made pedantically also by Robertson, Sanders, Seymour and Thomas [as quoted in Calude, Cristian S., Calude, Elena, and Marcus, Solomon. 2001. Passages of Proof. http://arXiv.org/abs/math. HO/ 0305213], about the reliability of their 1996 computer-generated proof of 4CT, can be seen as besieged sophistry:

"However, an argument can be made that our 'proof' is not a proof in the traditional sense, because it contains steps that can never be verified by humans. In particular, we have not proved the correctness of the compiler we compiled our programs on, nor have we proved the infallibility of the hardware we ran our programs on. These have to be taken on faith, and are conceivably a source of error."

Although one may doubt whether their program is logically sound, there is no essential reason why such soundness cannot be established theoretically by competent computer scientists.

Presumably, the referees included some such scientists, and, so, were able to conclude that the salient features underlying Hales' proof could be expressed, and verified, in an appropriate axiomatic mathematical language, before being expressed in '40,000 lines of custom computer code'.

Since every step of a formal proof sequence in such languages is either an axiom, or an immediate consequence of any two preceding elements of the sequence, each step can be effectively verified mechanistically by identifying the concerned axiom, or two preceding elements of the sequence that are known to be theorems, of which it is a consequence.

So long as each step is verified as logically sound, such a procedure need not be time bound, nor limited to the conceptual ability of any one individual to grasp, or even verify, the correctness of the entire proof.

Thus, there is no sensible reason to doubt the output, even though the relation between a logically effective method and a mechanistic computation is, indeed, one of faith.

It is conceivable that an appropriately designed, and maintained, machine continuously calculating the digits of Pi may start outputting an unending series of zero's, perhaps for as long as a zillion years. Our belief that it will eventually output a digit other than zero comes about because of our faith that the machine is faithfully translating our theoretical calculations concerning the digits of Pi into a physical language of mechanical I/O devices; hence we believe, as an article of faith, and despite the physical evidence, that the series of zeros will not be unending.

So, finally, what we should place our faith in is our ability to intuitively "see" the soundness of our axiomatic assertions, and theses, at the lowest, and not, as some believe, at a sufficiently sophisticated, level of understanding. This, of course, is the distinguishing feature of, for instance, Dedekind's formulation of the Peano axioms, or - as any high-school student can testify - Euclid's axioms for geometry; except the parallel postulate, these axioms are obvious to all, even if their consequences are not.

At least Hayes' program did not use randomized Monte Carlo subroutines. Although it would have been fun to watch the discussion if that was the case...

ReplyDelete

ReplyDeleteSo the section of a proof which has been gone over by a compiler, which has had tons of runtime checking done, and probably has some test code to boot, must on top of that have a formal proof, while the part which is simply hand-checked by extremely error-prone humans is given a free ride?This makes perfect sense. While the probability of "errors" in the compilation and such may be less than of a human verifying a mathematical proof, the standards being applied to the code are typically drastically lower than that applied to a rigorous proof. By the same reasoning you apply to programs, a proof should be considered correct if 1. its English grammar and mathematical notation all parse correctly, and 2. its logic follows correctly on some number of "test" instances.

For a real proof, we need more than that, both in the less formal English description and in the explicit computer code. We need not just a general good idea for why it ought to work -- there are plenty of proofs that "ought to work" but that yield false conclusions.

- David