I'm a bit skeptical--parallel computing has many pitfalls, one has to be careful about data access and order of execution, that could trip many students if they try to learn parallel computing off the bat. But I'm happy that CMU is experimenting and we'll be watching.

Guy and I chatted a bit about theory's role in parallel computing. We both agree that the PRAM model doesn't really capture today's parallel computers because it assumes exponentially more processors than computation time. Guy would rather see a clean tradeoff: If one has a sequential algorithm that takes time T(n), he would like a parallel algorithm that takes time t(n) with p(n) processors for p=t(n)

^{ε}and p(n)t(n)=O(T(n) polylog T(n)). That way you can have a smooth trade-off so initially with fewer processors you can still get some advantage and it will scale nicely as the number of processors grows.Guy's primary question was s-t shortest path. One can solve shortest paths by a PRAM using polynomial number of processors and polylogarithmic time using matrix multiplication methods. But can one have an algorithm using p=n

^{ε}processors and time t(n) with p(n)t(n) quasilinear?
Parallelism could be introduced in a framework without the pitfalls you mention. For example, if there simply

ReplyDeleteisn'tany shared state, then you don't need to synchronize data access. Here's an explanation of parallelism!=concurrency.Cilk might look more familiar, but it still has some explicit synchronization.This topic links-up nicely with several of Dick Lipton's recent topics ("The Shadow Case Model of Complexity" ... "Natural, Natural, and Natural") on the general theme: What measures of complexity are

ReplyDeletenatural?My bibtex files have many quotations to the effect that ideas of naturality in mathematics have historically develops "by the treatment of many specific problems".

So let's look at some concrete problems that nowadays are receiving plenty of attention: (1) migrate LAPACK onto GPUs ... and while we're at it, (2) migrate our congugate gradient methods onto GPUs too ... and heck (3) migrate our sparse solvers and compressive sampling algorithms onto GPUs too.

All of the preceding are receiving huge investments of engineering effort nowadays, for the simple reason that *lots* of folks want to solve practical problems—in dynamics, thermodynamics, and optimization—by evolving trajectories on state-spaces of (say) 10^6 dimensions or more, that are pulled-back from state-spaces of much larger numbers of dimensions ... and they want to solve these practical trajectory-tracing problems quickly, cheaply, and generally.

The result is a dynamic confluence of our various notions of naturality in relation to our evolving notions of computational complexity: the technological naturality of multicore processing, the engineering naturality of simulating trajectories, the physical naturality of dynamic compression onto low-dimension state-spaces, the geometric naturality of pullback, the algebraic naturality of symmetries in the dynamics, the informatic naturality of preconditioners.

To be useful, new notions of computational complexity somehow have to represent the confluence of *ALL* of the preceding notions of naturality ... technological, physical, geometric, algebraic, and informatic (and more).

Dovetailing these various notions of naturality sure ain't easy ... but it has global practical consequences ... and that's why this post ... and Dick Lipton's columns too ... are making a very important contribution by posing these wonderful questions.

The point is that posing new definitions of computational complexity isn't solely an abstract problem; it is also a very practical problem, that is intimately linked to our most fundamental ideas of naturality, that not just mathematicians care about, but practicing engineers too.

1. What's the quantification over epsilon? For all 0<epsilon<1 there exists an algorithm?

ReplyDelete2. s-t reachability in a directed graph is the special-case of shortest paths where all weights are zero or infinity. The best currently known algorithm is in a paper by Ullman and Yannakakis in SPAA 1990 titled "High-probability parallel transitive closure algorithms". On sparse graphs it has runtime t=O~(m/sqrt(p)), where m is the input size and p is the number of processors. (This works for any 1<=p<=n^{4/3}.) This is within logarithms of optimal when p=1, but the work p*t grows substantially with the number of processors. So even this special case of shortest path does not have a known algorithm satisfying Guy's wish list.

I don't see how the goal is at all in conflict with the PRAM model. A lot of the later PRAM work was in "work-optimal" (or nearly so) parallel algorithms. If you have a work-optimal fast parallel algorithm then using a simple slow-down simulation you get an algorithm with the parameters you suggest.

ReplyDeleteIt isn't the PRAM that is the issue, it's the complexity class NC that's the red herring. It focuses on an optimization that is not so critical at this point. (For example, n^{1/3} is only roughly (log n)^2 for n around a billion.) It is not clear what the right complexity view is here. There are natural P-complete problems where one can get speed-ups down to sqrt{n} time in a fairly work-efficient way.

Paul Beame sums up the issues nicely.

ReplyDeleteWhat's the best survey/book/resource on parallel algorithms (from a TCS perspective)?

ReplyDelete"Clean tradeoff" is an oxymoron.

ReplyDeleteUsing multiple processors allows more problems to be solved in a "reasonable time" (and other ambiguities). However there will always be problems which don't fit into how the "tradeoff" was made.

To give somewhat of a concrete example, Chinook can play a "perfect" game of checkers in that it will never make a mistake and loose. However the game of checkers is not completely solved. There are just too many possible checker games to spin through them all.