Sunday, April 03, 2005

What happened to the PRAM?

When computational complexity gets accused to having no connection to reality, I bring up the story of the PRAM (Parallel Random Access Machines), a complexity model killed by technology.

Today we think of the class NC in terms of circuits: NC contains the problems solvable in polynomial-size and polylogarithmic-depth circuits. But Nick Pippenger originally defined the class to capture parallel computation: problems solvable on a PRAM with a polynomial number of processors and polylogarithmic time. The PRAM model had several processors that shared a polynomial amount of random-access memory. There were three main variations:

  • EREW–Exclusive Read/Exclusive Write: Every memory cell can be read or written only by one processor at a time.
  • CREW–Concurrent Read/Exclusive Write: Multiple processors could read a memory cell but only one could write at a time.
  • CRCW–Concurrent Read/Concurrent Write: Multiple processors could read and write memory cells. This variation had several subvariations depending on how one handled conflicting writes.
PRAMS got criticized due to the unrealistic nature of immediately addressable shared parallel memory. Areas like VLSI and Parallel Models (like the butterly network) worked to address these concerns. However while algorithmicists worried about the various PRAM models, achieving the better networks only causes a logarithmic factor increase in time and doesn't affect the complexity class NC.

So why don't we think PRAM anymore when we look at NC? Moore's Law. Processors got faster. Much much faster. The ideas of having many many processors each doing a tiny bit of work seems wasteful these days when we can just as cheaply have each processor do a lot of work.

We still see active research in parallel computing and one can speed up many computations using a large number of machines sometimes far away from each other just connected via the internet. But the best one could hope for is perhaps a quadratic improvement, not the exponential improvement that comes from PRAMs.


  1. Uzi Vishkin things the time has come
    for PRAM on a chip. See the link

    He makes several interesting points.

  2. I don't understand why some CS theory people apologize for the PRAM. NC is robust and interesting as a complexity class, and an easy way to show that a problem is in NC is to give a PRAM algorithm. That's all the argument I need for the PRAM's existence. And yes, I've heard all of the arguments about why the PRAM is completely unrealistic.
    As a theory researcher who did some work in the PRAM area, I never thought the PRAM was realistic. I did, however, think it was interesting. I also thought it was (and still is) relevant to computer science theory.

    It would be a shame if beautiful results like the various RNC algorithms for graph matching or Mulmuley's NC algorithm for finding the rank of a matrix were overlooked by the next generation of CS theorists. If you throw out the PRAM because it's unrealistic, then there's a whole lot of other stuff in computer science theory that should go with it.

  3. The PRAM research you mention is very far from irrelevant theoretically. Did you forget that the recent O(log n loglog n) space algorithm for st-connectivity of Trifonov is based in significant part on an EREW PRAM algorithm of Chong and Lam? (And the difference between EREW and CRCW PRAMs was a key motivator for that work since very different efficient CRCW algorithms for connectivity were already known.)

    Theoreticians got a lot of grief at the idea of throwing n processors at a problem of size n but the PRAM model was hugely liberating because it gave a convenient way to think about radically new ways to solve problems. The best parallel algorithms developed for the PRAM have had long-lasting impact on both theory and practice. People are now using in practice the key new ideas from parallel algorithms that they had denigrated a decade earlier. (Of course nobody is actually going to throw n processors at problems of size n.)

    We've had a good 15 year run where the improvements in processor design have outstripped the need for separate processors but already the best chips are typically dual processor designs. Grid computing is our way of dealing with 'embarassingly' parallel applications but it doesn't seem long before some of the more subtle parallel algorithms will be used in large-scale application.

    The PRAM story also points out another lesson: In the face of criticism of the PRAM model some theoreticians added complicated twists to their parallel algorithms in order to make them 'work-efficient': the product of processors X parallel time = O(sequential time). (That way one could use them somewhat efficiently at any number of processors.) Some of these work-saving ideas have been useful but, from what I have seen, the original simple inefficient ideas for PRAM algorithms have been far more important in practice than their more optimized successors.

    When the PRAM came out there was a lot of low-hanging fruit picked but also quite a few gems. Within theory the area declined as the low-hanging fruit disappeared (interestingly the Chong & Lam algorithm mentioned above came as the general interest had largely ebbed) but some great problems about NC algorithms still remain (e.g. GCD).

    The work inspired by the PRAM model may someday be seen as an example of theory being ahead of its time. We have to be ready to say 'I told you so' and not get hung up in the details about how the PRAM is not close to a precise model of how parallel computing is implemented.

    Paul B

  4. "But the best one could hope for is perhaps a quadratic improvement ..."

    Why can we only hope for a quadratic improvement?

  5. In my opinion, the criticism of PRAM, similar to the criticism of n^17 time algorithms, comes from a misunderstanding of the role of theory.

    Theory is not everything:
    we are not here to write papers that can be taken as is and implemented by engineers, and what non-theoreticians do is not "boring" implementation details but non-trivial and meaningful work, requiring novel ideas.

    Theory is also not nothing:
    the fact that our algorithms can not in general be impmented as is does not mean they are meaningless.

    When you design an n^17 algorithm for a problem, you do not give a practical way to solve that problem. You prove something inherent about this problem, namely that it is in P, and so its hardness does not grow exponentially with the input size. Hopefully along the way you also produce some key insights that might later be used in practical algorithms for this problem, for other problems, or for other theory papers.

    Similarly, when you write a PRAM algorithm for a problem, you prove something inherent about the problem (namely that it is parallelizable) and again hopefully introduce some new ideas into this world, that if, beautiful enough, will eventually find their uses in theory or in practice.

    You never know when ideas like that will be used. A couple of years ago Shamir&Tromer gave a design for a special purpose parallel factoring machine (which I don't know if and to what extent used ideas from PRAM literature). Perhaps, as mentioned above, the PRAM will return.


  6. Moore's law is starting to hit the ceiling, which is indicated by CPU developers going multi-core, while not increasing clock rate much anymore. See for instance IBM/Sony/Toshiba's CELL chip => many cores with local memory. So, I'd place a bet that PRAM-related research will experience a renaissance.

    Greetings from Edmonton.

  7. Another link from Vishkin's
    home page

    title: "Is PRAM algorithmics implementable?"

  8. What was wrong with the PRAM (and the n^17) algorithm is that it was oversold, and that is why some CS theory people apologize for them. There is nothing wrong with studying a complexity model with lots of processors or an algorithm that takes time n^17. That is not what is being questioned.

    As someone who also did research on PRAMs, the talk at the time was that the PRAMs were going to be the next great thing. A similar overselling is when we equate all polynomial algorithms with practical.

  9. We have CRCW PRAM now in a commercial product, NVIDIA's CUDA general purpose programming model ( All threads within a single "cooperative thread array" can access a shared data cache. True, a CTA is limited to 512 threads (and only that many if the register footprint is fairly small). And it is not true PRAM in the strictest sense of the word, since the threads are time multiplexed in SIMD chunks. Nevertheless, it's real PRAM for all intents and purposes!