Coding Style Is Important
Does coding style matter? We teach students how to write code and about algorithms. But, do we discuss coding style? Some people may say that style is just personal preference. But, there is undoubtedly good style and bad style, both for prose and for code. The following guest post by David Marcus reviews a book that focuses on coding style.
====
This is a review of the book "Professional Pascal: Essays on the Practice of Programming" by Henry Ledgard. This book changed my life: After reading it, I rewrote all my code.
"Professional Pascal" is not about algorithms. It is about writing code. I consider it to be the Strunk & White for programmers.
While programs must be understood by the computer, they must also be understood by people: We need to read the code to determine that the program is correct (testing can only show that it is incorrect, not that it is correct). We need to read the code to maintain/improve/extend/debug/fix it. This applies even if you are the only person who will ever read or use your code. You probably spend more time reading your code than writing it. If you need to do something with code that you wrote long ago, you will be grateful if the style makes it easy to understand.
If you don't use Pascal (or Delphi), don't be put off by the "Pascal" in the title. Almost all of the advice can be applied to any programming language.
I won't try to summarize what Professor Ledgard has written in his book: He explains it much better than I can. But, I will mention a few of the topics that he covers and some things that I found notable.
One of the essays in the book is "The Naming Thicket". Names are important. Variables should be nouns, procedures should be verbs, booleans should be adjectives.
I was once reading some code in a commercial product, and I saw several methods that had a variable named "Ary". I eventually figured out that this was an abbreviation for "Array". Of course, this was a terrible name: "Array" didn't tell me what the contents of the array were, plus the unnecessary abbreviation turned it into a puzzle.
Another time I was reading a method in a commercial product and the method had a two-letter name. It wasn't clear to me what the two letters meant. I eventually guessed that they were the initials of the developer. I confirmed this with another developer who worked on the same team.
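As a minimal sketch of the noun/verb/adjective rule from "The Naming Thicket", here is a small example in Python rather than the book's Pascal. Every name in it (Invoice, amount_due, is_overdue, apply_late_fee) is hypothetical and chosen only for illustration; none comes from the book.

```python
# Hypothetical example: all names below are invented for illustration.
from dataclasses import dataclass


@dataclass
class Invoice:
    """An invoice: amount_due is a noun, is_overdue is an adjective."""

    amount_due: float
    is_overdue: bool


def apply_late_fee(invoice: Invoice, fee_rate: float) -> float:
    """Verb-named function: return the amount due plus any late fee."""
    late_fee = invoice.amount_due * fee_rate if invoice.is_overdue else 0.0
    return invoice.amount_due + late_fee
```

A call such as apply_late_fee(invoice, 0.5) then reads almost like a sentence, which is not the case for a variable named "Ary" or a method named after someone's initials.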
Another essay in the book is "Comments: The Reader's Bridge".
Following Professor Ledgard's advice, in my code I put a comment at the beginning of every procedure, function, method, unit, group of constants/types, and every class, but I never put comments mixed in with the code. If I'm tempted to put a comment in the code, I just make it a method name and call the method at that point in the code.
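One way to picture turning an in-code comment into a method name is the sketch below. It is in Python, not Pascal, and the discount rule and all names are assumptions made up for the example. The first function carries the comment inside the code; the other two move the same information into a header comment and a method name.

```python
# Hypothetical example: the names and the discount rule are illustrative only.


def total_with_inline_comment(prices: list[float]) -> float:
    """Sum the prices, with the kind of in-code comment the post avoids."""
    total = sum(prices)
    # apply the bulk discount for orders of ten or more items
    if len(prices) >= 10:
        total = total * 0.5
    return total


def apply_bulk_discount(total: float, item_count: int) -> float:
    """Return the total after the bulk discount for ten or more items."""
    if item_count >= 10:
        discounted = total * 0.5
    else:
        discounted = total
    return discounted


def total_with_named_step(prices: list[float]) -> float:
    """Sum the prices; the former comment is now the name of the step."""
    return apply_bulk_discount(sum(prices), len(prices))
```

Both versions compute the same totals; the second just lets the call site say what the comment used to say.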
I own a commercial database library that comes with source code. There are some comments in the source, but most of them appear to be there so that they can be extracted into the Help. I know someone who works for the company. I asked them whether the source code that they ship is the same as their internal copy. I thought maybe they strip out some of the comments in the shipped version. Nope. It is the same as their internal version.
Another essay in the book is "Program Layout". To differ slightly from Professor Ledgard, I would say the layout should make the syntax clear (not the semantics). In case you haven't figured it out, the correct indentation for blocks is three spaces (I don't think Professor Ledgard says this, but he indents three spaces in his examples).
Another essay in the book is "A Purist's View of Structured Programming". Only use one-in, one-out control structures. Never exit a method or block in the middle. If you know the code is written to adhere to this rule, it is much easier to understand: You don't have to scan the whole method looking for quit/exit/return statements.
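To make the one-in, one-out rule concrete, here is a hedged sketch in Python (not the book's Pascal) of the same search written both ways. The function names are hypothetical. The first version leaves the loop and the function from the middle; the second has a single exit at the end, so a reader never has to hunt for a hidden return.

```python
# Hypothetical example: both function names are invented for illustration.


def find_first_negative_early_exit(values: list[int]) -> int:
    """Return the index of the first negative value, or -1; exits mid-loop."""
    for index, value in enumerate(values):
        if value < 0:
            return index
    return -1


def find_first_negative_single_exit(values: list[int]) -> int:
    """Return the index of the first negative value, or -1; one exit only."""
    found_index = -1
    index = 0
    while found_index == -1 and index < len(values):
        if values[index] < 0:
            found_index = index
        index += 1
    return found_index
```

The single-exit version states its stopping condition in the loop header instead of spreading it across the body.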
Another essay in the book is "The Persistence of Global Variables". I once had a bug in one of my programs. I spent a lot of time trying to figure out what was wrong. The problem was in a method that called several other methods of the same class, something like
DoThis( X, Y );
DoThat( X, Y );
After much confusion, a light bulb went off in my head when I remembered that DoThis was changing the value of the object's fields. The object is global to the method calls because the compiler passes it in, but it isn't explicitly listed in the parameters/arguments of the methods. After that experience, I always include the object when calling a method, e.g.,
self.DoThis( X, Y );
self.DoThat( X, Y );
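The hidden dependency can be sketched as follows. This is Python rather than Pascal, and Python always requires the receiver to be written, so the sketch only shows the effect that the explicit self is meant to flag. The Account class, its balance field, and the arithmetic are assumptions invented for the example; only the DoThis/DoThat call shape comes from the snippet above.

```python
# Hypothetical class: do_this and do_that mirror the DoThis/DoThat calls
# above; the balance field and the arithmetic are made up for illustration.


class Account:
    """Holds a balance that one method changes and another reads."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def do_this(self, x: int, y: int) -> None:
        """Add x + y to the balance; the arguments do not reveal the change."""
        self.balance += x + y

    def do_that(self, x: int, y: int) -> int:
        """Return x * y plus the balance, so the result depends on do_this."""
        return x * y + self.balance

    def run(self, x: int, y: int) -> int:
        """Call both methods; self marks the object as an input to each."""
        self.do_this(x, y)
        return self.do_that(x, y)
```

Calling do_that with the same X and Y gives a different answer depending on whether do_this ran first, which is exactly the kind of surprise the arguments alone do not reveal.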
Another essay in the book is "It Worked Right the Third Time". When I write code, I compile it to catch syntax errors (like a spell checker). I then proofread it to make sure that it is correct (just like reading a math proof to check it). (Ideally, I will print the code out, but if the changes are scattered in multiple files, then viewing a diff onscreen works better.) Only then will I try it. The emphasis should be on writing good code. No amount of testing can make poor code good.
This book was published in 1986, and that is when I first read it. So, why am I writing a book review now? I read a lot of code written by people who make their living writing code. And, all of these people would benefit from reading this book.
Professor Ledgard has also written the books "Software Engineering Concepts (Professional Software, Vol 1)" and "Programming Practice (Professional Software, Vol 2)" (both written with John Tauer). Much of the material in "Professional Pascal" is also in these books. "Professional Pascal" is shorter, so if you are only going to read one of these books, read that one. If a book is too long, here is an article: "Coding Guidelines: Finding the Art in the Science: What separates good code from great code?" by Robert Green and Henry Ledgard, ACM Queue, vol. 9, No. 11, 2011-11-02.
"Only use one-in, one-out control structures." It is funny, because this is not the mainstream practice anywhere. Exceptions require early exits, and generally so long as the control flow is not overly nested, early exits are preferred because they reduce nesting and serve as precondition documentation for the rest of the function.
Anyway, style is important, but one cannot be too tied to particular stylistic conventions because they depend so heavily on the language and application domain.
The advice should not be interpreted to mean don't use exceptions. Exceptions let you separate your error-handling logic from your normal logic. It is the latter that should use one-in one-out control structures. Good style is important in all languages and domains.
As a senior systems programmer working on fairly big codebases, I can tell you that some of his advice has aged very well & other pieces are downright infamous.
The good parts:
1) Always indent nested scopes:
The wars over the details are ancient & interminable- how many spaces constitute a "proper" or "standard" indent, tabs vs. regular white spaces, even the different standard defaults of Posix vs. Windows tools- but the overall rule is akin to 1 of the 10 commandments. You may disagree about interpretation, but not the core premise. Programmers who fail to honor this commandment don't last long; even most hackers' scripts respect it.
2) Remember the parts of speech:
The key word in the term "programming language" is 'language'. Human languages have nouns to denote specific & generic things, therefore specific special objects (e.g. global singletons) should be Proper Nouns that are named & capitalized accordingly. Functions are verbs. Things like traits or attributes that modify nouns are adjectives, those that modify verbs are adverbs. Punctuation and related symbols indicate context and control flow. Articles distinguish singular from many, and a specific thing from its type or collections of them.
Most importantly, a good programming language provides all of these same, necessary grammatical pieces that a viable human language does. C++ is infamous in programming circles for its size and complexity, in much the same way English is among normal languages. Both are also wildly successful, and that correlation is no accident. Conversely, there are hundreds of failed & crippled programming languages that were doomed by designers who decided that their users didn't need 1st class nouns (e.g. "Pure functional" languages like Haskell), or anything resembling a straightforward and intelligible syntax (e.g. Lisp).
3) Documentation comments:
Placing comments above each logical chunk of code instead of all over has become standard practice, partly because it keeps comment bloat down and partly because it encourages people to better structure & lay out their code in order to do this.
As for the laughable parts, a couple stick out:
1) "Avoid multiple returns from functions":
No, you *should* check for input errors at the beginning of functions & error out immediately. Not only is it more logically correct, it also produces faster, less buggy, and more maintainable code. This is one of the many idiotic things traceable back to Dijkstra, like his refusal to give the core functions of a Semaphore proper names or his clueless crusade against 'goto'. The man was clearly both a great theoretical computer scientist and a terrible programmer.
2) Never put comments within the code:
For normal or vanilla code, fine. But there's always that 1% of code that required obscure hackery, invisible (in the text) black magic, or some detailed external knowledge (e.g. atomic x64 instructions) to get right. Code that needs additional documentation for the same reason we put a skull & crossbones sticker on a bottle of chemicals or the radiation symbol on a hunk of uranium. Believe me, people are a lot more mad when you DON'T put the big comment there, especially if said comment could've saved them 2 days of debugging hell.
It'd be interesting to see if there's a newer version of this style guide, and if so what's changed in it. It has to be a fairly old work if it used Delphi as the reference language.
For an entertaining history of Pascal/Delphi, see the book "Pioneering Simplicity: The fascinating history of Delphi and Pascal" by Marco Geuze, 2024, ISBN-13: 978-9083440316
For all style advice there exist conditions under which adhering to it is stupid: http://www.lel.ed.ac.uk/~gpullum/LandOfTheFree.pdf
That's an unfair (albeit extremely fun) link.
For the non-linguists here, allow me to explain.
There are two kinds of linguists: descriptive linguists and prescriptive linguists. The former are scientists who investigate how human languages work by studying real humans scientifically. The latter are cranks.
The former take great glee in exposing the foibles of the latter. My favorite example is the prescriptive linguists' rule "avoid passives". At first blush, this appears to be good advice: passives can be longer and more complex than their active counterparts. But there are two problems. One is that the passive voice exists in languages because it's needed (and when it is needed, is the best way to express what's needed). The other is that prescriptive linguists simply don't know what a passive is, and many of their examples of this rule are simply not passives. They don't know whereof they speak. In spades.
Like I said, fun link. But the problem with the cranks is that they don't actually understand the underlying grammar. So it's not that their rules sometimes don't apply, it's that the rules are complete nonsense.
For both prose and code, style helps communicate meaning. For code, the knowledge that a rule is followed by the author can reduce the effort needed by the reader.
"While programs must be understood by the computer ...": Although I am not a native speaker I think this is highly ambiguous ;)
That is, the compiler must be able to compile the code, and the processor able to run it.
@Marcus, sorry that this post dealing with an exciting topic didn't receive as much attention as it deserves. Thank you for the ACM article. It appears that you helped the authors of that particular article substantially! Would I be correct?
I had hoped for a comparison with other authors whose works and articles are a bit more popular and thus perhaps more familiar to people. Examples include Kernighan's work on programming style and Don Knuth's work on coding.