Wednesday, April 16, 2008

The Revenge of Parallelism

Now that I sit in an engineering school, I see more applied recruiting talks. Many of them have a variation of this picture.

This picture represents the future of Moore's law. The number of transistors in our computers continue to grow exponentially but the clock speed is levelling off. What do we use this new transistors for? To make multiple CPUs on a single integrated circuit, known as a multicore machine. New chips from Intel have 2 or 4 cores and the number of cores is expected to double every couple of years.

Multicores present interesting challenges for computer science, for example compiler researchers are trying to make the best use of multiple CPUs without having the user explicitly use parallelism in their code.

Our theory community hasn't really responded to this new computing model (nothing much in STOC and FOCS, though SPAA 2008 has a special track on the topic). Now the theory isn't that interesting if you have two or four cores, but what happens when we have millions on a chip? Do our old parallel models like the PRAM apply to multicore machines? There are hints of this in comments to my PRAM post three years ago. Or perhaps we need new models.

We study computational complexity in computer science instead of mathematics because, at least some level, our models reflect real-world computing paradigms. As those paradigms change, Complexity quickly adapts (random and quantum for instance). Should multicore machines be another one of these paradigm changes that drives our theory?


  1. Can you please post a somewhat recent survey (or list of surveys) that describe the most exciting results in parallel complexity?

    The drafs of the new books (Arora-Barak / Goldreich) do not contain the word parallel in their table of contents. :-(

  2. We have looked into ways to modify the PRAM model to reflect modern multicore architectures. We argue essentially that a multicore computer is a CREW PRAM with O(log n) processors. Surprisingly this choice of model has interesting theoretical implications in terms of the effectivness of the model. We call this model Low-degree-of-parallelism PRAM or LoPRAM for short.

    Here's the Tech report, which will appears in SPAA'08 in the brief announcements track.

  3. Sigact News has several articles on what multicore means for Theory.

  4. Uzi Viskin has been addressing the "PRAM on a Chip" model for sometime. See

  5. I'm so certain that multicores are the way of the future that (as a recently admitted CS grad student) I'm planning on devoting almost all my time on the subject.

  6. Coincidentally, I just got back from a panel discussion at IPDPS, Miami, Florida. The topic was: How to avoid making the same mistakes all over again or... how to make the experiences of the parallel processing communities useful for the multi/many-core generation. Please look up for my presentation.
    A summary of the main points follows:
    The huge investment in parallel computing over several decades produced a lot of knowledge. However, the world is yet to see a truly successful general-purpose parallel computer for single task completion time.
    Still, even the most conservative computer vendors decided to abandon the serial paradigm that worked for them for so long and bet the future of their companies on parallel computing. (They really did not have a choice. It is now full 5 years since increased power consumption of computer chips froze progress in clock frequency.)
    The actions of vendors so far are truly puzzling:
    1. They essentially bet their future on approaches that are proven failures.
    2. While they are committed to exponentially increasing the number of processors on a chip over the coming the decade, they are not giving enough details on their architecture, or on how they will effectively programmed. This deters application software vendors who considered making big investments. The reason is that their competitors will be able to provide a better product for a fraction of the cost, by just waiting a few years till the dust settles on these architectures. This can lead to a serious recession in at least this sector of the IT market.
    3. The only CS community that ever had a decisive winner (when it come to parallel computing) is the theory/algorithms community with the PRAM approach. Vendors would like to replace serial computing by parallel computing, but they do not consider algorithms, or how to teach them, as something that must be a part of their concern.
    What should we do as CS educators?
    There is no question anymore whether we should start teaching parallel algorithms and programming at all levels. The question is when. IMHO we will deserve malpractice suits if we defer this teaching by too much. Currently, we produce 22-year old dinosaurs trained for 50-year career dominated by parallelism by programming yesterday’s computers. In fact, we don’t only under-teach: we mis-teach bad serial habits that will make it much more difficult to switch to parallelism later.
    Luckily the impasse noted above regarding the application software industry does not affect education.
    1. PRAM algorithms are necessary knowledge for any CS graduate, as explained next. In the same way that serial algorithms require accounting for the total number of operations (time complexity) of an algorithms, the PRAM approach requires accounting for the total number of operations of a parallel algorithm ("work") and for its time under the assumptions that unlimited hardware is available ("depth"). This work-depth level of cognition falls in the common denominator of all other approaches to parallel computing, and therefore any CS graduate will have to study it.
    2. PRAM algorithms are also sufficient. In the past, the PRAM was criticized for being too simplistic for practice. However, the 64-processor PRAM-On-Chip hardware prototype built at UMD finally showed that a machine that can look to the programmer like a PRAM can be built. In fact, the basic XMT (explicit multi-threaded) architecture can be scaled to 1000 on-chip processors.
    Namely, we can immediately go ahead and teach PRAM algorithms without waiting for the vendors to converge. I think that the on-line tutorial provides a good start.
    The panel presentation also points out that 19 out of the 38 participants in an IBM-NSF workshop on future directions for parallel computing that took place in December 1988 were algorithms/theory people. Since IPDPS08 had 600 registrants, I suggested to the IPDPS08 audience to imagine that there are another group of 600 algorithm/theory people in the room, and consider the effects of nearly all of them say that we should go ahead and incorporate the PRAM into future multi-core architectures. Wouldn’t it be nice if Lance’s recent posting would lead to making this a reality in a couple of years?

  7. This post (and Uzi Vishkin's comments) come just in time for me. After a recent distinguished lecture at Penn State on multicore processors, I decided to include a chapter on parallel algorithms in my graduate algorithms course next semester.

    So: more suggestions on educational materials (with exercises!) would be wonderful. This is a chapter of my own education that is sadly lacking. The wonderful textbook by Kleinberg and Tardos, which I plan to use for the course, does not say much about parallelization. I was planning to use, instead, the relevant chapter of Cormen, Leiserson, Rivest and Stein. Other suggestions would be welcome.

    My course is basically required for incoming grad students, so I am not necessarily looking for the deepest or most complicated results but rather elegant examples of how seemingly serial computations can be parallelized.

  8. Joseph JaJa's book on introduction to
    parallel algorithms is a good place to start.

  9. Re: Adam Smith's question, I find the survey of Karp and Ramachandran from the Handbook of Theoretical CS to fit the bill very well -- it is from 1990, but many of the core PRAM ideas and references are there.

    aravind srinivasan

  10. Below, are some thoughts on how to teach parallel algorithms.

    The basic claim is that:
    - It does not make sense to have a new platform of general-purpose parallel computing succeed the established serial platform without having a one-to-one match of EVERYTHING, including algorithms and data structures.
    - In particular, it does not make sense to teach parallel programming without teaching parallel algorithms and data structures. The gap between programming and algorithms must be bridged, so that the continuum from algorithms and data-structures to programming will resemble as much as possible the continuum in serial computing.
    - Since the PRAM theory is the only serious candidate developed in nearly 3 decades of research, PRAM algorithms have got to be taught.

    I expect theorists to endorse this argument and use it to convince their colleagues that PRAM algorithms need to be taught. But, I have to be frank. I am concerned that some of us will do the following: teach a course on parallel algorithms as a purely theory course WITHOUT any connection to programming. This will miss the point as it ignores the need to relate algorithms to programming. The Q&A at the end of this text elaborate further on the programming issue.

    As others have implied, you can find several fine sources for PRAM algorithms. For this reason, my comments below mostly focus on a way to address the parallel programming issue:
    1. In class presentation.
    (i) Read Section 2.1 entitled XMTC in
    It reviews a modest extension to the C programming language called XMTC that allows PRAM-like programming. XMTC essentially adds only 2 basic commands to C: Spawn and PS (for prefix-sum).
    (ii) Devote a total of around 15-20 minutes similar to slides 37-39 in to present XMTC. Slide 40 can guide a discussion.
    2. Supporting documentation. The students should then be referred to:
    the XMTC manual
    and XMTC tutorial
    3. Programming assignments. Please look up under assignments on
    4. Running programming assignments. The UMD PRAM-On-Chip project is on track for public release by the end of June 2008 of: (i) a cycle accurate simulator of the PRAM-On-Chip machine and (ii) a compiler from XMTC to that machine. The will allow your students to run XMTC code on an emulated 64-processor PRAM-On-Chip machine. To remind you, a hardware prototype of such a machine (using FPGA technology) has been in use at UMD since January 2007. A compiler that translates XMTC to OpenMP will also be released, giving your students an alternative way to run their assignments.

    Finally, please note that this type of programming cannot be too difficult. I have given a 1-day parallel algorithms tutorial to a dozen high school students in Fall 2007 and subsequently some of them managed to do on their own 8 programming assignments. In fact, the above link to programming assignments gives these 8 programming assignments. The only help the high school student got was one office hour per week by an undergraduate teaching assistant. They did not get any school credit for their work. Their participation was in the context of a computer club after completing their regular school work (8 periods per day).

    If you are looking for code examples, you are welcome to write to me.

    Here are some Q&A:
    Q: I never learned parallel programming formally, but I picked up some ideas in my free time from Java/MPI/OpenMP/etc. How do any of these relate to XMTC parallel programming?
    A: XMTC parallel programming is simpler and different.

    Q: The problem of algorithms being taught independently of programming is present within the exclusively serial world. What would you say to the many theorists who are resistant to the idea of having a heavy programming component in their courses?
    A: IMHO the serial case is completely different. Most students have experienced/learned serial programming BEFORE taking the serial algorithms course. This is NOT the case for parallel programming. My experience is that students learn best if parallel programming is coupled with parallel algorithms. The main difference is that the parallel algorithms course is where parallel programming should be FIRST taught. The reason is that parallelism requires introduction of some first principles representing an "alien culture" to students. In contrast, serial computing is related to: (i) mathematical induction, (ii) the way our brain instructs our body (as a single processor), etc. There is nothing out there that prepares us for parallel computing.

    Q: What text do you use for teaching parallel algorithms?
    A: I have been using my class notes:

    Warm thanks to Adam Smith and Aravind Srinivasan for their helpful comments on an earlier draft of this text.