Thursday, February 22, 2007

Henzinger on Algorithms

A reader writes about coming across a 2003 CIO article about Google Research Director Monika Henzinger and being quite surprised to read the following:
But it was while teaching courses on her beloved algorithms at Cornell University when she had a flash. "I realized that efficient algorithms were fun but not very useful to the world anymore," she says.
The reader asks if there any reasonable sense in which that could be true?

While most algorithms developed by theorists have little practical value and will never see code, efficient algorithms play a critical role in any large system, especially at Google. As Henzinger states herself later in the same article:

If you think Google is fast, it's because we have good algorithms.

20 comments:

  1. I think the implication is that there are hardly any use for algorithms that run in time n log log log n instead of n log log n. That is to say, most of the pioneering work has been done already, and what remains are only incremental improvements. I would wager that the "good algorithms" that Google uses are largely based on research done several decades ago...

    ReplyDelete
  2. The difference between practical efficiency and theoretical efficiency? Obviously there is overlap but in the real world who actually profiles system requirements based on limits to infinity?

    ReplyDelete
  3. I guess the implication is that good algorithms need not be efficient (i.e, provably efficient).

    ReplyDelete
  4. Or perhaps she meant to say that the fastest graph or sorting algorithm is not as relevant to today's computing needs as it was in the early 70s when we first looked at them?

    I would wager that the "good algorithms" that Google uses are largely based on research done several decades ago...

    ...and you would be wrong. Many search engines use suffix arrays, data stream algorithms, page ranking and other matrix eigenvalue computations all of which are less than two decades old.

    ReplyDelete
  5. I disagree with all the previous posters. The issue is that the models of computation required have some different properties than standard uses of the RAM model.

    This has meant that new algorithm development are needed to deal with these new models. Thus the need for algorithms is even greater than it would have been if they simply took algorithms from your favorite textbook.

    Lots of algorithms research since the mid 1990's has focused on these new models. This focus has helped to keep the field exciting.

    Oh, and by the way, with the sizes of these data sets, getting that O(nlog n) algorithm down to O(n) is even more important.

    ReplyDelete
  6. Whether improving O(lg n) to O(lglg n) is great work or theoretical nonsense depends very much on context. Tell somebody programming a router to use binary search and you are certain to be laughed at.

    We should also understand that reducing 2^n to n^3 is equally useless if the instances people care about have n=1M. Or designing O(lg^2 n) approximation algorithms for a lot of problems.


    As far as modern developments and practice are concerned, I think the main problem is that theoretical ideas are not promoted enough to the practical community. The old algorithms made it into intro classes, and everybody knows they exist. For new algorithms, many practitioners don't even realize they can hope for something better.

    For certain problems, where practical heuristics are not great at all, it is obvious to every practitioner that he should hope for something better. When there's a theory talk about such a problem, the room is packed with people hoping to learn something nonobvious.

    ReplyDelete
  7. It's not that algorithms don't matter. They do. It's just that applications matter more to the average human in the world, and systems that implement good algorithms matter more than algorithms that cannot be reasonably implemented.

    There is always some tension between theoretical branches of science and the more applied branches. To the extent that we can bridge those gaps, we'll improve science.

    I think the RAM is actually a reasonable model in which to analyze serial algorithms. On the other hand, most of what we do at Google involves parallelism, and there are great opportunities for analysis of parallel algorithms. Unfortunately most of the academic literature on parallel algorithms involves impractical models of computation that have little to do with real systems. Think of it as an opportunity for theorists to do some work that has high impact outside of theory...

    ReplyDelete
  8. In my opinion, it is a major theoretical challenge to devise a good theoretical notion implying practical (as opposed to asymptotic) hardness or efficiency. It is partially approached by self-reducibility, but this is certainly not enough.

    ReplyDelete
  9. Thank you Kevin for being the only voice of reason in these comments.

    ReplyDelete
  10. "In theory there is no difference between theory and practice. In practice there is."

    Yogi Berra

    ReplyDelete
  11. Just to remark, efficient algorithms are at the heart of modern system engineering, which largely centers upon efficient system simulation by model order reduction (MOR).

    The recent virtual rollout of the Boeing 787 is a good example of the central role that efficient simulation algorithms play in modern system engineering.

    In particular, the social and economic unity of the 787's (enormous) development enterprise are intimately entangled with the mathematical and algorithmic integrity of its engineering design. They are directly related, one to the other.

    From a mathematical point of view, the natural "complexification" of Boeing's classical MOR algorithms are the (many) emerging algorithms for efficient quantum system simulation.

    Just as the notion of analytic continuation greatly increases the power of functional analysis, the natural mathematical complexification of classical MOR yields algorithms that greatly increase the efficiency of quantum system simulations.

    We explain this point-of-view to nonspecialists (e.g., our biomedical colleagues) as follows.

    ----------

    Richard Feynman said in 1966: "We are very lucky to live in an age in which we are still making discoveries. It is like the discovery of America—you only discover it once. The age in which we live is the one in which we are discovering the fundamental laws of nature, and that day will never come~again."

    Our view is similar to Feynman’s, but updated for a new century: "We are very lucky to be living at the dawn of the age of quantum system engineering. Our age is the one in which engineers will learn to create near-perfect technologies, for example quantum-limited microscopes that observe for the first time all biological molecules in situ, and thereby help humanity discover our own inner workings."

    ----------

    From this engineering point of view (and from many other points of view too), we are truly entering "the century of algorithms." Fun!!!

    ReplyDelete
  12. There was an interesting talk at PODC 2006 by T. Chandra regarding the difference between theory and practice in implementing the Paxos protocol at Google. One take-home message was that the "page complexity" of an algorithm is more important in practice than in theory. (Particularly in distributed computing where debugging is harder)

    ReplyDelete
  13. Research in TCS algorithms is mostly
    about ideas. Many of those don't
    lead to practical things. Even when
    they can much extra work is needed
    either in terms of engineering or
    further development. An average TCS
    person does not gain anything by
    doing this follow up work. Whether
    this is good or bad is hard to tell.
    I think there is some responsibility
    on the applied people to seek out
    the best algorithms that exist for
    a problem and also to interest
    theoreticians to work on their
    problems. It is not possible for
    a theory person to solve every
    variant that might come up in
    practice in advance.

    ReplyDelete
  14. This thread is SOOO close to being useful, let's push it along....

    Hey Kevin---

    What applications do you care about and where do you think current models of concurrency and parallelism fall short for analyzing these applications? What factors could added to the current models to make them more useful to you?

    ReplyDelete
  15. Having talked to researchers at Google, Micorosoft, and Intel, I am confident that in some of the most important emerging areas (high-dimensional data analysis, object recognition/intelligent image and video processing, massive databases, machine learning) there is ample opportunity for theory to export _groundbreaking_ technology to industry. One problem is that the people working at those companies don't have the technical background to understand our approaches, and this is crucial because good implementations require high quality engineering to go along with high quality ideas.

    Just as some of the bright young TCS researchers are importing ideas from mathematics on a regular basis, I suspect that in the next five years, a mathematically minded systems person is going to blow away the world by importing techniques from TCS into actual systems.

    I'm not implying that this doesn't already happen, but the potential for more of it is overwhelming.

    ReplyDelete
  16. Anonymous says: This thread is SOOO close to being useful, let's push it along....

    LOL! A comment that aptly applies to the entire blogosphere. Please *do* consider helping to "push it along".

    ReplyDelete
  17. > Even when
    > they can much extra work isneeded
    > either in terms of engineering or
    > further development. An average TCS
    > person does not gain anything by
    > doing this follow up work

    There is *plenty* to be gained by a TCS person from putting the "extra work" to make the algorithms practical. There is (a) the satisfaction of your algorithms being used in practice, (b) better understanding of the structure underlying the problems, which can lead to further theoretical work. If this sounds too nebulous, there are also concrete carrots, such as (c) your citations will shoot up (if people find your algorithms useful, they will cite them) and (d) you will have much easier time convincing funding agencies to give you money for research.

    ReplyDelete
  18. I won't attempt to map out the big problems in parallel computing, and different applications matter to different communities.

    If you think abstractly about what matters to Google, you can at least identify index builds and index serving. These fit into a clear theme that has emerged over the history of high performance computing, namely that for technological reasons, our ability to perform computation seems to always outstrip our ability to move data around. That's why we have caches in microprocessors, and why the most expensive supercomputers like Blue Gene have a significant amount of their budget spent on the communication network.

    In building an index, we have to take the (document, term) relations and organize them by term. This ends up performing a personalized all-to-all communication of what is essentially the entire data input set.

    The PRAM model essentially ignores the cost of data movement, focusing instead on the cost of computation once the data has been accessed. There have been other models proposed, but there is still a lot of room for better models of computation and programming that are more representative models of the technology that we can build.

    ReplyDelete
  19. Kevin McCurley said...

    The PRAM model essentially ignores the cost of data movement, focusing instead on the cost of computation once the data has been accessed. There have been other models proposed, but there is still a lot of room for better models of computation and programming that are more representative models of the technology that we can build.


    What is the problem with other models,
    e.g. external-memory, cache-oblivious, etc?

    To build a more represntative model,
    one would first have to know, why the current models are not "representative enough".

    I think more information on this could be very helpful.

    ReplyDelete
  20. Our QSE Group's impression is that most of the literature on algorithms is, well, algorithmic.

    As with any branch of mathematics, other points of view are feasible -- algebraic, differential, geometric -- and one mark of maturity in a mathematical discipline is that these points of view begin to fuse.

    Engineering algorithms are often focused upon simulation. The simulation of linear systems has reached a reasonable degree of maturity, based largely on the analytic continuation of linear response functions -- there is an immense body of literature on this topic.

    The simulation of nonlinear systems (in our opinion) is poised for a similar leap forward, based upon the "complexification" of state space that is naturally associated with the simulation of quantum states. The mathematical invariances that are naturally associated with this complex state space provide (we think) a mathematical "handle" upon nonlinearity that has previously been lacking.

    The scope of application for nonlinear quantum dynamical simulation is Google-scale (we think). After all, every cell of the human body has within it 100X as many atoms as there are stars in the galaxy. In principle, it seems that we are allowed by quantum mechanics, and challenged as engineers and mathematicians, to observe these individual atoms by quantum spin microscopy, and then to "fly" them at the system level by quantum dynamical simulation.

    Conducted as a whole-biome survey, this structure-and-function survey would be a planetary-scale effort -- many orders of magnitude larger than the genome project.

    This is just to point out, that fundamental research in algorithms contributes not only to existing large-scale enterprises like Google, but to the genesis of wholly new large-scale projects.

    If you ever wondered why our QSE Group takes such interest in fundamental algorithms, well, that's why.

    This is a great topic. Thanks, Lance.

    ReplyDelete