## Sunday, February 19, 2006

### Why Computer Science Theory Matters?

At the AAAS Annual Meeting on Friday, the CRA organized a session, Computer Science Behind Your Science. Bernard Chazelle gave one of the talks, Why Computer Science Theory Matters, based on an essay he wrote for the undergraduate magazine Math Horizons. In a pre-talk interview, Chazelle argues that algorithms can help us explain scientific ideas in a fundamentally different way than simple mathematical formulas.
> Computer science is a new way of thinking, a new way of looking at things. For example, mathematics can't come near to describing the complexity of human endeavors in the way that computer science can. To make a literary analogy, mathematics produces the equivalent of one-liners -- equations that are pithy, insightful, brilliant. Computer science is more like a novel by Tolstoy: it is messy and infuriatingly complex. But that is exactly what makes it unique and appealing -- computer algorithms are infinitely more capable of capturing nuances of complex reality in a way that pure mathematics cannot.
When one asks scientists in other disciplines what role computer science plays for them, they usually see CS as a way to solve their large computational problems, like large matrix computations. The more enlightened realize the importance of algorithmic issues and even have a rough understanding of NP-completeness and what it means for the problems they would like to solve. But we haven't, on a large scale, made scientists in other fields realize that computation exists within the systems they study. Protein folding, economic markets, and the ways astronomical bodies interact are all computational processes, and once we make this case, the ideas and tools of computational complexity and theoretical computer science can help them understand the strengths and limitations of these processes.

Suresh and Jeff have more on Bernard's talk and Scott has an interesting and not-unrelated post.

1. One of the main "philosophical" contributions of TCS to other areas that I have seen is the notion of computational efficiency as a means to discriminate between theories. In economics, this takes the form of modelling humans as probabilistic polynomial time Turing Machines, and then rejecting mechanisms that require economic actors to perform superpolynomial computation. In physics, this takes the form of rejecting theories of how the universe works that require Nature to perform superpolynomial computations.

The first insight leads to computational game theory. The second insight I haven't seen as much, although Scott Aaronson discusses it in his thesis and the recent paper on NP-hardness of problems associated with string theory may get into it (I haven't had time to read it yet).

2. I think these are very good examples. But the interaction between sciences is a two-way street, and we might also use it to re-evaluate assumptions used in "our" science. For example, it is tempting to read the first example above as saying that we should reject mechanisms which require solving NP-hard problems. This is not necessarily so: we know of many problems that are hard in the worst case but easy, say, in the average case. So unless we can argue that the players' inputs are adversarial, the above conclusion is not so clear.

At the same time, however, a hardness proof does point out that there is more to the problem's structure than meets the eye, and it stimulates further research. If new structure is discovered, we can then investigate the complexity of the problem under the new assumptions.

3. TCS transcends computers, indeed. But sometimes I feel that the scientists most blind to this fact are *computer* scientists. Non-Theory, that is.

4. What, you mean there exist problems in which the players' inputs are not adversarial? (Sorry, that's the cryptographer in me writing.)

Yes, I should probably have written "inefficient" instead of "superpolynomial" above. That gives enough wiggle room to cover mechanisms that require humans to solve NP-hard problems but only easy instances thereof, for example. In the case of protocols involving humans (most of economics), it's not even clear that polynomial-time is the right notion of "efficient." It certainly isn't the right notion for a human without a computer assistant.

Actually, for a case of insights from other sciences applying to CS, let's look again at economics. The economic notion of an adversary that is simply selfish instead of fully adversarial is an important one -- it lets us analyze protocols that would fail under an arbitrary adversary and show that they actually should work well in situations with self-interested players.

5. TCS is just a subarea of mathematics. Any other way of looking at it will, or could, eventually lead to reinventing the wheel in many ways.

Surely, any (new) subarea of mathematics can be looked at as a "new paradigm", or even as a new discipline. (Think of Mathematical Logic, for instance, or any other distinct part of mathematics, within mathematics.)

The reason TCS is formally separate from mathematics per se is a matter of the historical evolution of science, industrial trends, and financial interests within the faculties themselves.

6. In the case of people, "polynomial" is not the right notion of complexity. (People cannot even compute a tip correctly.) Here is an example that interested me. Suppose you are polling on a stigmatizing subject ("have you shoplifted?", "do you understand the PCP theorem?"). You want a correct poll, but you are not interested in individual answers; moreover, respondents are not going to tell you the truth unless you can convince them that it is safe to say "I have shoplifted"/"I am dumb". The Randomized Response Technique from statistics offers a well-known answer: before answering, toss a biased coin, so that with probability $p>0.5$ you answer correctly, and with probability $1-p$ you lie. If the number of respondents is large, you get an unbiased estimator of the fraction of stigmatized people, while the privacy of every individual is well protected. Clear and neat?

Not so... RRT has not seen much use, since people do not understand that it is safe for them to follow the protocol!

Imagine now that you want to use anything more complex... for example, a cryptographic protocol to check that the respondents actually follow the protocol. I bet they will cringe.

And how do you model the computational power of Mr X, a truck driver, and Mr Y, a Fields Medal winner? Are they both polynomial, but Y can do $n^9$ instead of $n^2$?

I don't believe in asymptotics... but this also might be the little green cryptographer in me.
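For concreteness, the randomized-response estimator described in comment 6 can be simulated in a few lines of Python. This is a minimal sketch, not code from the discussion; the coin bias p = 0.75 and true fraction 0.3 are arbitrary illustrative choices:

```python
import random

def rrt_respond(truth: bool, p: float) -> bool:
    """Answer truthfully with probability p, lie with probability 1 - p."""
    return truth if random.random() < p else not truth

def rrt_estimate(answers, p: float) -> float:
    """Unbiased estimate of the true 'yes' fraction f.

    P(yes answer) = p*f + (1-p)*(1-f), so solving for f gives
    f = (observed_yes_rate - (1-p)) / (2p - 1).
    """
    y = sum(answers) / len(answers)
    return (y - (1 - p)) / (2 * p - 1)

random.seed(0)
true_fraction = 0.3   # hypothetical fraction of shoplifters
p = 0.75              # arbitrary coin bias; must exceed 0.5
population = [random.random() < true_fraction for _ in range(100_000)]
answers = [rrt_respond(t, p) for t in population]
print(rrt_estimate(answers, p))  # close to 0.3
```

No individual answer reveals the respondent's status (each answer is a lie with probability 0.25), yet the aggregate estimate converges to the true fraction.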

7. ---In the case of protocols involving humans (most of economics), it's not even clear that polynomial-time is the right notion of "efficient."

In fact, in practice polynomial is not efficient even when computers are involved. Efficient in practice means n polylog(n). Quadratic/cubic time is barely bearable; n^4 time means we can only solve Mickey Mouse-sized problems.

If we are trying to make theory matter, we would be well advised to heed this fact.

8. "Efficient in practice means n polylog(n)"

I agree that, in practice, "polynomial" and "efficient" often do not coincide. But n polylog n can be inefficient as well. It is hard to "codify" practical efficiency using asymptotic methods.

9. "Computer science is a new way of thinking, a new way of looking at things. For example, mathematics can't come near to describing the complexity of human endeavors in the way that computer science can. To make a literary analogy, mathematics produces the equivalent of one-liners -- equations that are pithy, insightful, brilliant. Computer science is more like a novel by Tolstoy: it is messy and infuriatingly complex. But that is exactly what makes it unique and appealing -- computer algorithms are infinitely more capable of capturing nuances of complex reality in a way that pure mathematics cannot."

Pardon my ignorance, but isn't this (at the crudest, highest level) what Wolfram said in his book? That the universe isn't governed by equations but by a process not unlike computation?

Except in his case, he went on to assert it, but Chazelle is merely arguing that CS is richer for this reason.

10. --- But, n polylog n can be inefficient as well.

No definition will satisfy every application. This definition covers the majority of the cases we are interested in. It makes no sense to stick to a definition of efficiency that is "wronger" simply because the much-preferred correction is not perfect.

I can only think of one application where n polylog n is not acceptable: data streaming, which requires linear time. Observe that queries which are usually thought to require log n time do not count, as an input of n queries takes time n log n, i.e., n polylog n.

11. More on the n polylog n issue.

1) n polylog n, obviously, covers not only n log n but n log^5 n as well. As a rule of thumb, each additional log factor increases the running time by an order of magnitude, so in the end you can get something quite slow.

There are quite a few algorithms out there with n polylog n running time which are not very efficient in practice for exactly this reason.

2) Restricting the notion of efficiency to n polylog n excludes running times such as n^1.5, which can be pretty manageable. E.g., standard matrix multiplication has this running time, if n denotes the size of the input.

Overall, I think that asymptotic definitions of practical efficiency are tricky. In the end, you need to plug in the actual input size and see what happens.
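Plugging in an actual input size is easy to do. The back-of-the-envelope calculation below compares the operation counts being debated above at n = 10^6; the 10^9 simple operations per second used to convert to time is a rough, hypothetical rate, not a claim from the thread:

```python
from math import log2

# Operation counts at a realistic input size: asymptotic labels vs. reality.
n = 1_000_000
counts = {
    "n":         n,
    "n log n":   n * log2(n),
    "n log^5 n": n * log2(n) ** 5,  # still "n polylog n", yet huge
    "n^1.5":     n ** 1.5,          # excluded by the n polylog n definition
    "n^2":       n ** 2,
    "n^4":       n ** 4,
}
for name, ops in counts.items():
    # Assume ~10^9 simple operations per second (a rough, hypothetical rate).
    print(f"{name:>9}: {ops:.1e} ops, ~{ops / 1e9:.1e} s")
```

Note that at this input size n log^5 n already exceeds n^2, while n^1.5 is far below both, which is exactly the point made in 1) and 2) above.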

12. We both agree that n^18 is not reasonable, so we should stop calling polynomial algorithms "efficient". Let's call polynomial algorithms tractable, which is a better term.

Then we can call O(n^3) algorithms manageable, o(n^2) algorithms efficient, and O(n) algorithms super-efficient; or, if you can think of better terms, I welcome any suggestions.

13. For what it's worth, I think Chazelle really overstated his case, by a long shot (and I am a theorist). Where to start? I'll pick a few things that caught my attention.

"If a math giant from the past -- someone like Gauss -- were to come back to Earth...he would find that math is done much the same way that it was done during his life."

Isn't this partly because math has been around thousands of years longer? If Euclid were to come back, I don't think he would find math done "much the same way it was done during his life." TCS research will eventually stabilize (it already has), and then you can make the same statement about our field.

"For example, mathematics can't come near to describing the complexity of human endeavors in the way that computer science can."

I'm sorry, that is just ridiculous.

"Computer science is more like a novel by Tolstoy: it is messy and infuriatingly complex."

Actually, as far as I'm concerned this is the problem with TCS: most results are obtained through random-looking "tricks" and "tweaks"; what we lack, for the most part, is the unification of large bodies of results (I'm thinking here specifically of algorithms) that mathematics has managed to provide.

"Biology, for example, is very quantitatively driven, so a computer science background is imperative."

Another ridiculous statement, as far as I could tell. Not to say that TCS can't be useful for certain types of biology research, but necessary?

"Third...are careers in the field of theoretical computer science."

Is he joking? I wonder what proportion of TCS grad students end up working in a TCS career.

I don't think the way to make the case for TCS is to contrast it to math, but rather (as a previous poster said) to compare it to math...

14. I also cringed at some of the things Chazelle wrote. Although it is important to point out the importance of TCS and its unique contributions, I don't think we should go to extremes. I found the following statement in his write-up to be an out-of-place and unreasonable characterization of mathematics and TCS.

"For example, mathematics can't come near to describing the complexity of human endeavors in the way that computer science can."

I don't see how such exaggerated statements are good for our standing within the larger scientific community.

15. As a CS grad student who once considered TCS as a possible career path, I can say that it's a daunting field to approach. This is at least partially because of its messiness; as described above, to probably all but a few students, it can seem like a large and mostly disconnected set of tricks, hacks and gimmicks. Its beauty and its connection with physics, i.e., its ability to cast physics in a computational framework, are what keep me from abandoning it completely, and these are aspects that I've only recently discovered on my own. In fact, saying that makes me wish that I'd had more encouragement, and had it earlier in my education, to pursue TCS, i.e., there's a lot more to it than NP-completeness proofs, approximation algorithms and de-randomization routines.

16. "For example, mathematics can't come near to describing the complexity of human endeavors in the way that computer science can."

Well: if one thinks of TCS as a subfield of mathematics, then this statement is trivially false. But if one thinks of TCS as a field somewhat different from classical mathematics (e.g., in its new directions), then the statement makes an important point: once a system is sufficiently complex, you are not going to have a formula that predicts its behavior, but you can still have an algorithm which simulates the system efficiently, and methods to reason about such algorithms.

And, "human endeavors", whatever they are, are indeed quite complex.
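As a toy illustration of "simulate rather than predict by formula", consider an elementary cellular automaton. The update rule below (Rule 110) is a one-line table lookup, yet the system's long-run behavior has no known closed-form description and is in fact Turing-complete (by Matthew Cook's proof); simulation is essentially the only general way to see what it does. A minimal sketch, with an arbitrary 31-cell world:

```python
RULE = 110  # the update table, encoded as the bits of the number 110

def step(cells):
    """One synchronous update of a cyclic row of 0/1 cells."""
    n = len(cells)
    out = []
    for i in range(n):
        # Read the neighborhood (left, self, right) as a 3-bit number
        # and look up the new state in the bits of RULE.
        pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((RULE >> pattern) & 1)
    return out

cells = [0] * 31
cells[15] = 1  # single live cell in the middle
for _ in range(15):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
```

Running it prints the familiar irregular triangle of Rule 110: no equation hands you row t directly, but the algorithm generates it mechanically.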

17. I think that we need to view Chazelle's remarks in a wider perspective.

Consider the following, standard, identities:

(i) A total number-theoretic function (or relation, treated as a Boolean function), is algorithmically (Markov) computable if, and only if, it is Turing-computable.

(ii) A total number-theoretic function (or relation, treated as a Boolean function), is Turing-computable if, and only if, it is recursive.

(iii) A total number-theoretic function (or relation, treated as a Boolean function), is recursive if, and only if, it is algorithmically (Markov) computable.

So, algorithmic computability of total number-theoretic functions (and relations, treated as Boolean functions) is the same as their recursive computability.

However, recursive arithmetic is only one of the foundation pillars of mathematical languages. Another, of course, is Peano Arithmetic.

Now, Goedel has shown that there is a total primitive recursive relation, say Q(x), such that:

(a) Q(x) is true for all natural-number values of x;

(b) Q(x) is instantiationally equivalent to an arithmetical relation R(x) for all natural-number values of x;

(c) The formula [(Ax)R(x)] is not provable in first order Peano Arithmetic, even though the formula [R(n)] is PA-provable for any given numeral [n].

Further, it is not unreasonable to adopt the thesis (unverifiable/irrefutable, in view of Turing's Halting argument) that an instantiationally true total arithmetical function is Turing-computable if, and only if, it is provable in first order PA.

It follows that, although the arithmetical relation R(x) is instantiationally equivalent to the Turing-computable recursive relation (treated as a Boolean function), Q(x), the relation R(x) may not, itself, be Turing-computable (recursive).

So, is it reasonable - or even healthy - to promote the view that algorithmic computability, by itself, is, in some sense, "richer" than classical mathematics - which encompasses not only Recursive Arithmetic and Peano Arithmetic, but also Set Theory - for expressing our mathematically expressible abstract concepts?

Bhup