Wednesday, June 23, 2021

I went to the "debate" about Program Verification and the DeMillo-Lipton-Perlis paper

On Thursday, June 17, I went to (on Zoom; does that even need to be said anymore?)

A Debate on Program Correctness

There was no subtitle but it could have been:

Have the points made in Social Processes and Proofs of Theorems and Programs by DeMillo, Lipton, and Perlis survived the test of time? (Spoiler alert: Yes.)

I found out about it from the Lipton-Regan blog here

The debaters were Richard DeMillo and Richard Lipton, and the moderator was Harry Lewis (Alan Perlis passed away in 1990). Calling it a debate is not quite correct since DeMillo and Lipton (and Lewis) all agree. (DeMillo and Lipton even have the same first name!) The DLP paper is in Harry Lewis's collection Ideas That Created the Future. The event should have been advertised as a discussion. However, it was a good discussion, so this is not a complaint.

Here are some things that came out of the discussion.

1) The main topic was the 1979 DeMillo-Lipton-Perlis paper (see here), which argued that proofs of program correctness could not work.

An all-too-brief summary of the DLP paper: Some researchers are trying to set up frameworks for proving that programs are correct, analogous to the certainty we get from a proof of a theorem in mathematics. But proofs in mathematics are, in reality, NOT that rigorous. Often details are left out or left to the reader. This is fine for mathematics (more on that later) but unacceptable for programs, which need rather precise and rigorous proofs.

How do theorems in mathematics really get verified? By having enough people look at them and make sure they match intuitions, which is what DLP call a Social Process. (NOTE FROM BILL: Papers that are not important do not get looked at, so they may well contain errors.)

2) The notion of proving-programs-correct was very seductive; however, the people who were trying to do this had a blind spot about how proving-programs-correct and proving-theorems-correct differ. In particular, a program is rather complicated, and even stating carefully what you want to prove is difficult. By contrast, for most math statements, what you want to prove is clear. Note also that a program has lots of code (far more now than when DLP was written), and so much can happen that you cannot account for.

3) The DLP paper had a large effect on the field of program verification.  Funding for it was reduced and students were discouraged from going into it.

4) When DLP appeared DeMillo and Lipton were pre-tenure. Hence it took lots of courage to publish it. Alan Perlis had tenure and had already won a Turing award.  This did give DeMillo and Lipton some cover; however, it still took courage.

5) How did the Program Verification community deal with the objections in DLP? DeMillo said that he looked at a large set of papers in the field, and very few even mentioned DLP. He recommends reading the book Mechanizing Proof: Computing, Risk, and Trust by Donald MacKenzie (see here).

6) So how can we be more certain that programs are correct?

a) Testing.
b) Modularize and test. Fix errors. Modularize and test. Fix errors...
c) Try to isolate side effects.
d) More testing.
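The modularize-and-test loop in (b) can be sketched in a few lines of Python. The `merge` function and its tests are illustrative (mine, not from the discussion): each small module gets its own tests, failures get fixed, and the tests run again.

```python
# A minimal sketch of "modularize and test": test one small module in
# isolation, fix what fails, re-run. All names here are illustrative.
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

# Unit tests for just this module; a larger program repeats this
# per module rather than testing the whole system at once.
assert merge([], []) == []
assert merge([1, 3], [2]) == [1, 2, 3]
assert merge([1, 1], [1]) == [1, 1, 1]
```

Of course, as DLP would point out, passing these tests raises confidence but proves nothing about inputs that were never tried.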

Some point to Model Checking, which could be considered very sophisticated testing, but that's used to verify circuits and perhaps low-level code, not programs. Model checking is a success story, and note that Edmund Clarke, E. Allen Emerson, and Joseph Sifakis shared a (well-deserved) Turing Award for this work. But see the next note.

6.5) An audience member pointed out that Program Verification people have won several Turing Awards:

Dijkstra 1972
Floyd 1978
Hoare 1980
Pnueli 1996

(Are there more?)

so the field is alive and healthy. DeMillo responded that prizes for academic research are a poor measure of success.

7) Can computers themselves help with proofs of correctness? That is the only hope; however, there are scaling problems.

8) When DLP was written, a program with 100,000 lines of code was considered large. Now we have programs with millions of lines of code. And now we have more concurrency. So the lessons of the DLP paper are probably more relevant now than they were then.

9) Since Program Verification does not seem to be used, how come we don't have a Software Crisis?

a) We do! The Q+A mechanism at the meeting was terrible. 
b) See the answer to question 6.

10) SPECS are a problem. Tony Hoare once gave a talk where he proved that a program sorted correctly and then pointed out that if the program just output 0,...,0, that would have also satisfied the SPEC, since all that was required was that the output be sorted, not the (overlooked!) requirement that it be a permutation of the input. So one needs to be careful!
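Hoare's example is easy to reproduce. In the sketch below (the names are mine, not Hoare's), a "sort" that outputs all zeros passes the incomplete spec ("the output is sorted") but fails the full spec, which also requires the output to be a permutation of the input:

```python
from collections import Counter

def cheating_sort(xs):
    # Outputs a sorted list -- of all zeros. This satisfies the
    # incomplete spec "the output is sorted."
    return [0] * len(xs)

def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

data = [3, 1, 2]
out = cheating_sort(data)

# The incomplete spec accepts the cheat:
assert is_sorted(out)

# The full spec also demands a permutation of the input (same
# multiset of elements), which the cheat fails:
assert not (is_sorted(out) and Counter(out) == Counter(data))
```

The overlooked clause is exactly the permutation requirement; writing it down is what makes the spec honest.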

11) Despite being a leader in the field, Tony Hoare has come to see the limitations of the proving-programs-correct approach to Software Verification. His paper An Axiomatic Basis for Computer Programming (1969) is also in Harry Lewis's collection Ideas That Created the Future.
Much later, commenting on the paper, Hoare said the following:

Ten years ago, researchers into formal methods (and I was the most mistaken among them) predicted that the programming world would embrace with gratitude every assistance promised by formalization to solve the problems of reliability that arise when programs get large and more safety-critical. Programs have now got very large and very critical--well beyond the scale which can be comfortably tackled by formal methods. There have been many problems and failures, but these have nearly always been attributable to inadequate analysis of requirements or inadequate management control. It has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve.

12) Richard Lipton told a story where he showed that the program in question satisfied the SPEC, but the SPEC was a tautology that any program would satisfy.  Again, one needs to be careful!
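A toy illustration of a tautological SPEC (my example, not the one from Lipton's story): if the postcondition is vacuously true, every program "meets the spec," including programs that do nothing useful.

```python
def spec_holds(program, x):
    """A (broken) spec: 'the output equals itself.'
    This is a tautology, so ANY program satisfies it."""
    y = program(x)
    return y == y

# Two very different programs both "meet the spec":
assert spec_holds(lambda xs: sorted(xs), [3, 1, 2])  # a real sort
assert spec_holds(lambda xs: [], [3, 1, 2])          # does nothing useful
```

So a verification effort must check not only the proof but also that the spec actually says something.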

13) The test of time: Verifying large-scale programs does not seem to be common in industry. Is industrial adoption a fair measure? 

14) Harry Lewis's book Ideas That Created the Future collects, edits, and comments on 46 important papers in Computer Science. (I reviewed it in the issue of SIGACT News that is in your mailbox; I will blog about it at a later time.) There are several papers in it about program verification, including DLP, Hoare's paper, and three papers by Dijkstra.

a) When Harry discussed including DLP, some people said `You're going to include that! It's a polemic, not a paper!'

b) When Harry teaches a course from this book (it must be an awesome class!) and asks the students at the end which papers they learned the most from, the top two are an excerpt from Fred Brooks's The Mythical Man-Month (see my post on Brooks's work here) and DLP.

c) I am hoping that this is just one of 46 talks with authors of the papers in his book. I look forward to his interviews with Aristotle, Leibniz, Boole, Turing, ...


  1. As for point 9), bad software isn't the same as incorrect software. A program can do what it is supposed to do, but still be considered bad software because it gives a horrible user experience, is hard to install, consumes an awful lot of CPU or memory, or for a dozen other reasons besides not doing what it is supposed to do.

    On the other hand, there is software which is loved by millions of users, but which isn't 100% bug-free.

  2. By analogy, DLP (1979) might have had a similar effect on program verification as Minsky & Papert (1969) had on neural networks. Both argued convincingly that it would not work. I am still waiting for verification to rise like a phoenix from the ashes, as neural network learning lately has. I would bet that computer assistance would make the difference for formal verification (unlike proofs in mathematics, these are very long and very tedious, but mostly trivial, so a computer should handle them). Let's wait another decade.

  3. > But proofs in Mathematics are, in reality, NOT that rigorous. Often
    > details are left out or left to the reader.

    There is some confusion here. Proofs in mathematics certainly are rigorous. That is part of the definition of the word "proof". However, it is important to realize that what is written in books and articles is merely a way for the author to communicate the proof to the reader. The reader must construct the rigorous proof for themselves. This is why the steps that you need to include in a written proof depend on the audience that you are writing for. (I was surprised to learn that when non-mathematicians read a math book, they don't check the statements for themselves. I have no idea what they think they are doing.)

    Formal proofs are different. They have all steps written out, and so should be checkable by a computer. However, they are probably impossible for humans to check because humans are not computers.
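    To make this comment's distinction concrete, here is what a fully formal, machine-checkable proof looks like in the Lean proof assistant (a small illustrative example of my own, not from the discussion). Every step is explicit enough for the computer to check, yet the artifact reads quite differently from a written mathematical proof:

    ```lean
    -- Lean 4: a machine-checked proof that addition on the
    -- natural numbers is commutative, by appeal to a library lemma.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b
    ```

    The checker verifies that the term has the stated type; no human needs to re-check the steps, but a human reading only this line learns little about *why* the theorem holds.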

    When I write software (and I've written a lot of software), after I've written it, I print it out, then proofread it the same way that I would proofread a mathematical proof. Of course, it helps a lot if the code is written in a way that facilitates this. Most code is poorly written. For the best advice on how to write code so that you can understand it and check that it is correct, see the book "Professional Pascal: Essays in the Practice of Programming" by Henry Ledgard. I read this book when it came out, and it completely changed the way I write code. "The Mythical Man-Month" is a great book, but it is about project management, not what the code should look like.

    I think very few programmers print out their code and proofread it. Various software methodologies try to encourage having more than one person check the code, e.g., via pair programming or code reviews. But, I hope that any mathematician feels confident they can check a proof themselves without asking a colleague to check it for them.

    1. Math proofs as written down in papers and textbooks are often not completely rigorous, and this is not a criticism. The missing parts can be supplied by the reader or are obvious. DLP argues that this approach will not work for proving programs correct.

    2. It works the same way for programs: If you want a human to check your proof, write it so a human can understand it. If you want a computer to check your proof, write it so a computer can check it.

  4. I did not attend the debate, but I find it interesting that it occurred less than two weeks after Peter Scholze, a Fields Medalist, blogged about the progress toward formally verifying one of his most important theorems.

    While I agree with most of what the DLP paper says, I think that some of what it says about mathematics is, at best, dated. There is no doubt that socialization will always play a crucial role in the acceptance of mathematical theorems; in the case of the "Liquid Tensor Experiment," this role is clearly explained in the comments to Scholze's blog post. That said, it does not follow that formal verification has no role to play when it comes to increasing our confidence in the correctness of mathematical proofs. The Flyspeck project is perhaps the clearest example of this.

    The intermediate lemmas in the proof are certainly messy, ugly, idiosyncratic, and require more than a roll of paper towels to sketch, but they are indispensable to the proof. And their formal verification has tangibly increased the community's confidence that the Kepler conjecture has been rigorously proved. I find it amusing and slightly ironic that perhaps the biggest change since the DLP paper was first written is that mathematics has become more like program verification. Here's another article that you may find interesting, written by the late Fields Medalist Vladimir Voevodsky:

    That said, I still agree with DLP that the role of formal verification in the case of software is likely to be limited. But it is not because formal verification is irrelevant to mathematics (it isn't), or because all nontrivial formal mathematical proofs are infeasibly long (they aren't), or because important mathematics is never so ridiculously computational that a computer is needed (it sometimes is). Rather, it is because in program verification, exactly specifying what the program is supposed to do is usually an infeasibly difficult process (with moving goalposts) that by its nature cannot be fully automated. By contrast, even in the case of Flyspeck, the end product was the Kepler conjecture itself, and checking that what the computer formally verified was indeed the Kepler conjecture was a humanly feasible task (with fixed goalposts).

    In the few cases where writing down the spec (and checking that it really says what we mean) is a humanly feasible task, I think that formal software verification could potentially play an important role.

  5. Perhaps it would be more useful to have a debate involving participants who actually have some notion of how advances in verification and program logics are actually used in industrial software development, see, e.g.,

    Why in the world would we expect two debaters and a moderator who are totally removed from the field to have anything useful to say on the matter?

    1. Agree that it would have been good to have alternative viewpoint at the discussion.

      I thought static analyzers FOUND BUGS, which is GREAT, but didn't (nor did they claim to) VERIFY programs? Am I making too fine a distinction between the two?

    2. I would say that these analyzers can verify certain properties of program units, but not entire programs or systems. On the other hand, it is not even clear what the latter would mean.

  6. "The program is correct; the only issue is that it doesn't do what you want"

    1. "In the world there are only 'Programmers' and 'Sh*t Programmers'; the only difference is that a 'Programmer' has not yet coded enough to realize that he is a 'Sh*t Programmer'" (this one is by me :-)

  7. Testing is fine, but the emphasis should be on writing good code. If the code is poorly written, then no amount of testing will fix that. I do comparatively little testing for my own code. I focus my testing on catching errors that are hard for me to catch by proofreading. Using a language and compiler that make it hard to have such errors is extremely useful.

    1. I see your point, and, as a programmer, I feel the same about my own code. Yet, for software I did not write, I've more faith in well tested code, than code whose programmer claimed it was well written.

    2. You can read the code and see if it is well written. How do you know code is "well tested"? Is it tested by people in the same organization that wrote it? I recommend the essay "It Worked Right the Third Time" in the book "Professional Pascal".

    3. I inadvertently just produced an example of why testing is not enough to produce good software.

      A user reported a bug in my table tennis tournament software. The bug had been there for many years, maybe from the first release. The bugged procedure was only a couple dozen lines, but when I looked at it, I thought it was not written well: the order of statements seemed odd.

      I rewrote the procedure to fix the bug, and the result was much more logical. I should have spent more time originally to write it better.

      The bug would have been very hard to discover by testing. To reproduce it, I had to add a player to a singles event, then add the player to two doubles events with different partners where the player's tournament ID was less than the tournament IDs of the player's doubles partners.

  8. Bill, great post and in line of thought with some
    previous ideas we both entertained!

    (An aside thought.)
    Question: how does the Turing-completeness property
    of programming languages impact (i.e., limit?) proof verification?
    (The assumption is that the programs under investigation,
    i.e., to be checked or verified, are written in a
    Turing-complete language.)

  9. Re: David Marcus comment:
    Yes. Exactly. Guy Steele once said "A program that you spend 9 hours thinking about and 1 hour writing will be better and more interesting than a program that you spend 1 hour writing and 9 hours debugging". (Or something to that effect.)

  10. I think machine learning sidesteps "specifying what to compute" by saying "compute something sort of like these examples" (with variable success). Possibly that's partly why it's popular?

  11. I wrote a whole 120 page survey paper on formal verification of large software systems that cites them, and apparently they can't even be bothered to read it. They're proud of themselves for writing a hit piece about a small subfield of programming languages that was in its infancy, causing many to lose their jobs, with a Turing Award winner on board to help them with their hit piece. And the whole paper comes down to "this is currently hard, so we shouldn't work on it," which is absolutely silly because hard problems inspire interesting and broadly applicable solutions; it was program verification that birthed the ML family of languages, for example, features of which have slowly leaked into almost all modern programming languages. Honestly appalling.

    Also hi Bill. I'm a CS professor now. At Illinois. I took your classes forever ago at Maryland.

  12. Hi Talia!
    I reread my post upon getting your comment and I will point out one thing that you prob agree with: DLP may have been a self-fulfilling prophecy:
    They say the problem is hard and won't make progress.
    Hence people stop working on it.
    Hence no progress is made.

    Perhaps `no progress' is too strong, but you get the idea.