Wednesday, June 11, 2008

Impressions of C++

I have just finished teaching "Programming for Engineers" at Northwestern, my first time teaching a course on the C++ programming language, which pushed me to learn the many fine points of the language. Yes, I know this is a theory blog, but I'm giving you my impressions of the language anyway.

C++ was a language written for its time, the 80's, when computers were slow, memory was expensive and machines, for the most part, didn't talk to each other that much. The language, an object oriented upgrade to C, allowed one to write code just above the assembly language code. It kept memory limited to the point of not storing the sizes of arrays and allowed users to have pointers directly to memory. It is a language that, with objects, lets you create whatever you want and overload operators, allowing you to redefine "+" or "=" to do whatever you want. It is also a powerful language, allowing you to build classes based on other classes and, with the Standard Template Library, gives you access to a set of highly optimized data structures.

C++ does create incredibly fast code especially when run on today's processors. On my laptop, looping through a trillion operations takes only a couple of seconds. Because of the popularity of C++, there are C++ compilers for every platform, many of which produce incredibly optimized code. And of course there is a wealth of information and code libraries written in C++.

That's the good. Now the bad: C++ is a complicated language—I spent most of the ten week course teaching syntax and still didn't cover close to the entire language. C++ allows obviously bad statements, for example if you replace the "==" in " if (a == b) d++; " with a single "=" then it still compiles and runs but doesn't do what you wanted. The expression "3^5" outputs 6 and not 243. To make a class abstract you don't say "abstract" but just make sure you have at least one pure virtual function. To make a virtual function pure you add "=0" at the end of its header definition. There is something very mystical about purifying something by setting it equal to nothing.

But those are minor complaints. We don't live in the 80's anymore but in the Internet age, which causes two major problems: compatibility and security. C++ has no standard way to access Internet objects like XML. In class I showed how Microsoft's XMLTextReader worked, but it is tricky to set up and doesn't port to other systems. Too bad, as a computer that doesn't access the Net hardly seems like a computer anymore.

Even worse, C++ is a very trusting language, not having many safety checks and allowing direct access to low-level operations and memory. The Internet is full of non-trusting people. One can write safe code in C++, say that doesn't allow buffer overruns, but it is tricky and a programmer even a little lazy can leave a gaping hole for hackers to climb through.

Legacy code will keep C++ programmers employed for many years, especially as we close in on 2038 and hit the time limit. But a programmer starting a new project today would do better in a safer language. Python anyone?

23 comments:

  1. The syntax and the capabilities of a/the language is one thing.
    Libraries with goodies (network programming, processing images, algorithms for graphs, thread and/or concurrent programming, algorithms for strings, etc, etc, etc...) is another.

    For the second thing, take a look at www.boost.org.

    Elias

  2. I moved to the 2 language model. Ruby for high level stuff, and C++/C when I need it fast.

    As a scientist you need a structure of Make type files in a high level language to run experiments and keep track of everything.

    As mentioned in the earlier post Boost is passable for some things, but I still hand-code most of my graph algorithm libraries in C.

    C++0X has some interesting features with concepts and finally bringing goodies like regular expressions into the standard library. They went off track on parallelism going for a shared memory model... PRAM==EVIL

  3. Lance, many of the engineers you were teaching -- maybe even the majority -- will deploy their programming skills in the context of large-scale enterprises (like the Boeing 787, the Genome Project, VLSI development, the Sloan Digital Sky Survey, etc.) in which the social aspects of programming are at least as important as the computational aspects.

    Because how can these global-scale enterprises be coordinated, other than by shared large-scale modeling?

    In recent years many excellent documents have appeared that describe coding practices that optimize the social aspects of enterprise-level engineering. For example, Bjarne Stroustrup's home page points to the JSF C++ coding standards. There's no moral relativism in this document ... "good" and "bad" coding practices are spelled-out in scrupulous detail.

    More broadly, our UW QSE Group admires JPL's Design, Verification/Validation and Operations Principles for Flight System -- just search for the keywords "model" and "simulation", which on-average occur more than once per page.

    It is increasingly apparent that within the rapidly expanding social context of modern engineering, what language an engineer learns is increasingly less important than what algorithms the engineer learns, and how effectively these algorithms are coordinated to serve broader enterprise objectives.

    As a specific example, in quantum system engineering we are discovering that teaching young quantum engineers to use the von Neumann-style projective theory of quantum measurement is like teaching software engineers to use "GOTO" statements ... a practice that is hallowed by decades of tradition and literature, but which is pedagogically inadvisable.

    So if you don't mind the question, what algorithms, models, and interfaces do these young engineers (and their employers) have in mind with which to launch their careers? What are they going to do with their C++ skill-set?

    I am very interested in any remarks that you (or your readers) may care to offer on this general topic.

  4. If you would like to teach students something more modern and sophisticated, check out Scala.

    As a start, take a look at these free chapters:

    http://www.artima.com/shop/forsale

    Also see:

    http://www.scala-lang.org/docu/index.html

    There's really no reason to use a dynamically typed language such as Python when you can use Scala.

  5. Python, Ruby or JavaScript for most applications and C, C++ or Assembly when I need performance.

  6. Oh, but can you do this in Python?

    In case it's not immediately obvious that code solves the following problem: You are given a rectangular B&W image (at most 50x50) and you must flip white pixels into black to make all black blocks 'smooth'. A block is 'smooth' if any two 'pixels' of the block are united by a black path whose length equals the Manhattan distance. (Each step of the path is N, E, S, or W. And, yes, flip as few pixels as possible.)

    I know this is a serious blog but I still had to post a 'just for fun' comment. ;)

  7. "C++ does create incredibly fast code especially when run on today's processors. On my laptop, looping through a trillion operations takes only a couple of seconds."

    Where can I get one of those 500-gigahertz laptops?

  8. This comment has been removed by the author.

  9. http://yosefk.com/c++fqa/

  10. If you're teaching people how to program, c++ is way too complicated and allows you to do too many stupid things, as you have pointed out.

    If you're looking for mathematical programming and you want to spend only a few weeks on syntax, straight up c, or even better, FORTRAN will do the job marvellously. These probably won't address your concerns about security, but to be fair, that's not really a language's job. Any sane operating system should prevent a regular user from doing something too dangerous.

    If you want something that people can make "modern" programs with easily, take a look at python which is complete with all the object oriented bells and whistles.

  11. 1. a = b and a == b are both perfectly valid and necessary statements in C++. I can't think why one should write a=b when meaning to write a==b. That's a silly mistake, but can be configured to generate a warning using a good compiler or lint like tool.

    2. What do Java and C#, created in 1995 and 2000 and not in the 1980s, return for 3^5?

    3. To me, if I want an abstract function, I create an abstract function and this automatically implies that the class is abstract; I don't have to duplicate the same intention using a keyword.

    4. To me, "=0" means "no definition" for this method. One implication of it is that objects of this type cannot be created (the type's implementation is incomplete). You can call it a "pure virtual function" or whatever.

    5. C++ and XML: http://xerces.apache.org/xerces-c/
    Internet is not just XML. I can consume the web services in C++, can send and receive http requests/responses; ftp/smtp/etc. all kind of libraries available for C++. see: http://www.ibm.com/developerworks/xml/library/x-ctlbx.html

    6. Gaping holes can be left in PHP code also. I would be hard pressed to prove that it is far more difficult to write secure code in C++ than in PHP.

  12. It is surprising to me that no one on this thread has commented upon the following startling empirical fact: most large programming projects are written in "bad" languages.

    Examples of "bad" are languages like C, C++, Ada, and Fortran ... these languages are so bad that large projects have to issue formal "THOU SHALT NOT" specifications of language capabilities and idioms that you are NOT allowed to access ... that these "THOU SHALT NOT" specifications typically are longer and more complicated than the language itself!

    And yet this badness seemingly serves some important functional purpose, because it happens over-and-over again.

    I have my own ideas, but would enjoy hearing other people's (game theoretic?) ideas relating to this observation. `Cuz gee, does the world really need another "my language is best" thread?

    There's something wonderfully good about languages that have truly bad features ... what is it?

  13. a = b and a == b are both perfectly valid and necessary statements in C++. I can't think why one should write a=b when meaning to write a==b. That's a silly mistake, but can be configured to generate a warning using a good compiler or lint like tool.

    Certainly lint will usually detect this problem, but it was still a big design mistake in C. Having two different operators for assignment and testing equality makes sense, but they should have been := and =, respectively. The problem is that in almost all of the non-C world (including all of the non-programming world), an equals sign is used to test for equality. Experienced C programmers very rarely confuse the two, but beginners do it all the time, and they are absolutely not well served by the fact that "if (x=2)" will compile and run just fine but not mean what they think it does.

    In any case, this isn't a huge deal (since most important programs aren't written by beginners), but using := and = would have made it psychologically easier for beginners without imposing any costs on experts.

  14. And yet this badness seemingly serves some important functional purpose, because it happens over-and-over again.

    Speaking from experience, in languages that apply strict rules it is often impossible to recover from a software architecting mistake after having written 1M lines of code. In C/C++ you can use the "badness" to save the day. At the end of the day, if your 1MLoC have just one goto, or a few global variables they are still good and well written code by almost anyone's standard. Yet strict languages such as Java/Pascal on the other hand would force you to either rearchitect the whole thing or come up with an uglier fix than that of C/C++.

  15. a = b and a == b are both perfectly valid and necessary statements in C++.

    It is surprising how strongly supporters of a language defend even the silliest design choices in said language.

    It's ok to nitpick at a language one loves. Brian Kernighan criticizes the cumbersome C operator precedence rules for one.

  16. Python R us!

    We wrote our entire prototype in under 1000 lines of python and it took us less than 2 weeks. The code is very readable and easy to re-understand once forgotten.

    I'm not sure what powerful libraries are out there for C++ web-programming, but I doubt we could've achieved that using C++.

    The main thing I like about python over C++ is that it seems to be a lot more expressive. There is a lot more thinking and less coding involved in the process.

  17. You might want to check out F#.

  18. C++ allows obviously bad statements, for example if you replace the "==" in " if (a == b) d++; " with a single "=" then it still compiles and runs but doesn't do what you wanted.

    If what you wanted was to assign the value of b to a, then increment d if that value is nonzero, then the new statement does what you wanted. Which of the modern languages does not allow programmers to type things they don't mean?

    Yes, it's easy for beginners to be tripped up by this. A justification for it, maybe, is that more assignment gets done than equality testing, so the resulting programs are more concise, and this can be a good thing for understandability.

    The sometimes baroque character of C++ is probably due to the design choice to make it roughly an extension of C. If this choice had not been made, the language would probably not have attained its current widespread usage.

  19. Anonymous makes a valid point in asserting that "In C/C++ you can use the "badness" to save the day."

    And (IMHO) it is highly significant that this fact has multiple levels of understanding (rather like the technical notion of higher-order knowledge that plays a key role on The Island of the Blue-Eyed People).

    I.e., surely it is good for a student programmer to learn that "GOTO" is bad practice. Therefore, isn't it even better when a student programmer understands that another programmer understands that "GOTO" is bad practice ... and so on?

    The implication of this reasoning is enjoyably paradoxical: bad languages are good ... because they require explicit embedding of higher-order knowledge in the software design strategy. And good languages are bad ... because they encourage both individuals and enterprises to overlook the practical necessity of cultivating common knowledge during the development effort.

    And like all good logical paradoxes, the above is true in both theory and practice. :)

  20. I'm pretty much convinced now that nobody really understands C++ until they've a) read all the relevant "modern" books (e.g. Meyers, Sutter, Alexandrescu) and then b) tried to implement something nontrivial in it.

    Before you've tried that, you will go through the "it's too complicated", "it's horrible", "it's archaic" objections. I certainly did.

    After, you may still not like it, but you'll definitely respect it, and acknowledge (if grudgingly) that it hits a "sweet spot" in programming languages.

    I say this as someone whose favourite language is Haskell; in many ways, it's a complete opposite to C++, though in many ways, modern C++ feeds off Haskell.

    I do agree with those who have noted that C++ is a poor choice of programming language to teach programming with.

    I disagree with anyone who uses the term "C/C++". There is no such language, and from the point of view of engineering and programming practice, there are almost exactly no similarities between the two languages.

  21. All the following comments make good arguments against teaching C++ as a first language, whether the commenters intended them that way or not:

    (a) It is a language that, with objects, lets you create whatever you want and overload operators, allowing you to redefine "+" or "=" to do whatever you want.

    (b) a = b and a == b are both perfectly valid and necessary statements in C++. I can't think why one should write a=b when meaning to write a==b. That's a silly mistake ...

    (c) To me, if I want an abstract function, I create an abstract function and this automatically implies that the class is abstract; I don't have to duplicate the same intention using a keyword.

    (d) In C/C++ you can use the "badness" to save the day.

    (e) A justification for [= and == operators in C/C++], maybe, is that more assignment gets done than equality testing, so the resulting programs are more concise, and this can be a good thing for understandability. [IMHO, this kind of conciseness does not aid understandability at all.]

    (f) I'm pretty much convinced now that nobody really understands C++ until they've a) read all the relevant "modern" books ... and then b) tried to implement something nontrivial in it.

    My personal experience teaching (or attempting to teach) C++ to beginners has been very frustrating. I remember spending nearly a whole lecture trying to clear up the students' misunderstanding of the "static" keyword, which means completely different things in different contexts.

    Yet strict languages such as Java/Pascal on the other hand would force you to either rearchitect the whole thing or come up with an uglier fix than that of C/C++.

    Pascal does allow goto's and global variables. In my experience, Pascal is one of the best languages to teach beginners.

  22. Steve Fenner makes a strong case that C++ is a bad first language ... and it may well be that C++ is a poor choice for solo projects ... but (historically speaking) C++ is surely among the very best languages for large-scale software projects.

    My colleague Jon Jacky points to the following (reasonably famous) quote by C++ author Bjarne Stroustrup: "Much more successful software has been written in languages proclaimed BAD, than has been written in languages acclaimed as saviors of suffering programmers; much more."

    The historical record shows pretty clearly that Stroustrup is right.

    That said, my own preferred programming environment is to write executable code in (full-of-badness) Mathematica and MATLAB, using for the API a custom Mathematica notebook style that exports itself as a Knuthian literate programming file (specifically NuWeb).

    Yes, I am an in-the-closet literate programmer. :)

    AFAICT, only about one CS article in 5000 makes any mention of literate programming ... which probably is clear evidence of something ... if only we knew what!

  23. but (historically speaking) C++ is surely among the very best languages for large-scale software projects.

    No argument there. It's clear that C++ scales well. It's the huge overhead in learning the language (especially if it's your first!) that has left many a beginning programmer in the ditch on the side of the road.

    "Much more successful software has been written in languages proclaimed BAD, than has been written in languages acclaimed as saviors of suffering programmers; much more."

    Great quote. I just hope no one takes it as inspiration to design a purposefully bad language ;)
