## Tuesday, September 04, 2007

### Social Process and Proofs of Theorems and Programs

How does a theorem get to be believed to be true? There was a paper on this in 1979, Social Processes and Proofs of Theorems and Programs by DeMillo, Lipton, and Perlis. The paper had several points to make:
1. When a theorem in math is proven that is just the start of the process. If it is important enough it will be passed around the community and checked and rechecked. At some point if it survives scrutiny it will be accepted. (Makes you wonder about proofs in the literature that nobody reads- could they be false?)
2. The people working in Program Verification want to give program-correctness the same confidence that we have in Math Theorems.
3. This is not a good idea since Programs cannot be passed around the same way Math Theorem proofs can. (Makes you wonder about the Classification of Finite Simple Groups, or the Four Color Theorem which also cannot be passed around that easily.)

The comments on Program Verification do not really apply anymore since those people seem to have scaled down their claims to building tools to find bugs, and to automatic verification of Protocols written in a SPEC language, which seems far more plausible. (I'm not in the Program Verification Field so if someone wants to tell me I'm wrong, leave an intelligent comment.)

When I first read this article as a young grad students I was very impressed with what it said about math. YES, the proof is just the beginning, but constant checks and rechecks are needed.

1. One could argue that it is math that has a correctness crisis -- not software.

Software can always be tested. Math proofs are not generally checked by a computer.

It might very well be the case that most math theorems are false or have incorrect proofs.

Most software at least works to some extent.

2. Just my two cents.
It's true that many researchers have limited themselves to finding bugs. But there are still people working in program verification. The idea is not to build a fully automated program verifier, but build a tool that can help programmers develop proof interactively.

3. It might very well be the case that most math theorems are false or have incorrect proofs.

It depends on the definition of "most". Most theorems anyone cares about are essentially correct (by which I mean correct except for typos and minor, immediately correctable misstatements). However, there's a lot of garbage out there. There are entire journals that as far as I know have never published a worthwhile paper. I don't want to name many examples, since I don't want to offend anyone or risk naming a journal that might be better than I think, but here's one example: the Southwest Journal of Pure and Applied Mathematics. SWJPAM was a free electronic journal that was published from 1995 to 2004; it ceased publication after publishing a ridiculous paper by a notorious internet crank. Maybe SWJPAM once published a paper someone cared about, but certainly not very often: for the 140 papers they published, there have been a total of 9 citations (according to Math Reviews), including self citations.

In any case, my guess is that at least half the papers in SWJPAM were never read by anyone but the authors and the referees, and they may never have been checked carefully by anyone at all. So who knows what the error rate is?

It's hard to know what to make of this. On the one hand, if we literally look at every mathematics paper out there, then the error rate may be quite a bit higher than anticipated. On the other hand, if nobody ever hears of these papers, who cares whether they are correct? (The garbage papers are generally segregated in journals nobody takes seriously.)

The real philosophical difficulty comes from subtle technicalities. In most traditional areas of mathematics, there is powerful intuition for where difficulties are likely to arise and how to circumvent them. (In analysis and geometry, mathematicians have spent centuries cultivating such intuition.) However, there are certain important areas in which we lack this intuition. For example, analysis of cryptographic protocols is extremely subtle, and so far thinking about it carefully requires painstaking effort. Certain parts of distributed computing are similar. Maybe someday we will have developed much better intuition for these things, but currently they are magnets for subtly defective proofs.

4. When I met Robin Milner at IFIP'94 in Hamburg, I asked him how long it would take to have automated verification for the kind of theorems one proves in an undergraduate intro theory course.

He replied, "Maybe in 50 years" or "Not in 50 years". So at least 37 more to wait... :-)

My class yesterday was about proofs: I proved the language identity A(B U C) = AB U AC on the left-hand side of the blackboard, and down the right-hand side I ran an optically-similar "Poof" of the "Thro'em" A(B n C) = AB n AC. The bug was not easy for them to find! That precision is ultimately a human responsibility is a theme I emphasize.

5. With software, attempts at improving correctness are technology based: higher level languages, better testing tools, and better verification tools.

With math, attempts at improving correctness are people based: identify the most reliable researchers and take their work seriously.

6. This attitude never ceases to astound me. Programmers out of necessity dot every i and cross every t, painstakingly debugging all their code, and don't whine about it, while theorists handwavingly skip all the details, then have the audacity to claim that not only is this reasonable, but that human checking is more reliable than computer checking! Seriously, what planet are you people living on?

Are you aware of the computer-checkable proof of the four color theorem which has already been created? That theorem has now been vetted more thoroughly than almost any theorem of equivalent complexity. I honestly don't see how anyone could think that we'll ever get real confidence in the simple finite group classification theorem without it becoming computer-checkable.

7. Programmers out of necessity dot every i and cross every t, painstakingly debugging all their code, and don't whine about it, while theorists handwavingly skip all the details, then have the audacity to claim that not only is this reasonable, but that human checking is more reliable than computer checking!

I don't think mathematicians would argue that human checking is more reliable than computer checking.

But maybe they would argue that it is more feasible given today's verification tools.

Moreover, I suspect that mathematicians like the status quo. They would hate their jobs if they had to verify their proofs with computers.

8. I honestly don't see how anyone could think that we'll ever get real confidence in the simple finite group classification theorem without it becoming computer-checkable.

Agreed.

OTH, it would be bad if we only did computer checking of proofs - not because of any reliability problem but because of the loss of what people learn by reading each other's proofs.

9. I honestly don't see how anyone could think that we'll ever get real confidence in the simple finite group classification theorem without it becoming computer-checkable.

We have extreme confidence in it because of its intrinsic structure. It's not just a random collection of logical implications (that would be humanly uncheckable); rather, it has a global structure that actually makes sense. There is admittedly a fair amount of case analysis where mistakes could conceivably creep in, but there aren't that many serious candidates for where new finite simple groups could be found. These candidates have been really thoroughly checked (during the 1970's many people really wanted to find their own finite simple group).

I do not believe any expert thinks the result might be wrong. (The closest I've heard is from John Conway. He's been known to say that he wishes there were more finite simple groups, because they are really interesting, but he acknowledges that there aren't.) There are undoubtedly some small gaps in the many papers, but the chances that they affect the overall structure of the proof are negligible.

It's the same with any sufficiently complicated computer program. Compilers frequently have bugs, but nobody siezes on that as evidence that there might not exist a bug-free compiler (even theoretically). If you understand what the compiler is doing, you recognize that some small variant of it must work in principle, even if the particular implementation is flawed.

10. Programmers out of necessity dot every i and cross every t, painstakingly debugging all their code, and don't whine about it, while theorists handwavingly skip all the details

I think you underestimate the extent to which many theorists painstakingly debug their proofs. (Some people are notoriously sloppy. For example, Lance Fortnow has had several serious bugs in his proofs. That's human nature - some programmers have lots of bugs, and this is only loosely correlated with how good their programs are by other measures. Lance has great ideas, but doesn't always implement them correctly.) Similarly, I think you overestimate how careful most programmers are. There are entire classes of bugs, like memory leaks or buffer overflows, that can exist only because of sloppiness.