Tuesday, December 30, 2003

End of Year Thoughts

We had another solid year for theoretical computer science and computational complexity with many exciting results such as the polynomial algorithm for perfect graphs developed in a series of papers by Chudnovsky, Cornuéjols, Liu, Seymour and Vuškovic. We also saw a number of strong papers in derandomization, extractor construction, dimension reduction and many other areas of complexity.

In 2003 we celebrated the 100th anniversary of the births of Alonzo Church, Andrey Kolmogorov, John von Neumann and Frank Ramsey, mathematicians who work played a big role in complexity. Next up, Kurt Gödel in 2006.

Trends to watch for in 2004:

  • How will the reorganization of CISE and the end of the ITR affect NSF funding for theoretical computer science?
  • Will a hopefully improving economy push universities to increase their resources in computer science?
  • What trends will we see in recruiting this year? More and more Ph.D.'s seem to prefer postdoc positions though the number of these positions continue to decrease, especially in industry.
  • How will the choice of the US president in 2004 affect the future of scientific research in America? A critical question, but not one that a journalist will ask in any debate.
Happy new year to all! May we prove P≠NP in 2004.

Sunday, December 28, 2003

John von Neumann

Today is the 100th anniversary of the birth of John von Neumann, one of the greatest mathematicians of the 20th century. Let me discuss two aspects of his work, one big, one small, that have greatly affected computational complexity.

The von Neumann min-max theorem showed that every finite zero-sum two-person game has optimal mixed strategies. More formally, let A be the payoff matrix of a game, then

maxx miny xTAy = miny maxx xTAy
where x and y are probability vectors.

Andrew Yao used the min-max theorem to prove what we now call the Yao Principle: The worst case expected runtime of a randomized algorithm for any input equals best case running time of a deterministic algorithm for a worst-case distribution of inputs. The Yao principle has proven invaluable for proving upper and lower bounds for deterministic and probabilistic algorithms.

How can you get a fair coin by flipping a coin of unknown bias? You use the von Neumann coin-flipping trick: Flip the biased coin twice. If you get heads then tails output HEADS. If you get tails then heads output TAILS. Otherwise repeat. This procedure will output HEADS or TAILS with equal probability and if the bias is not too close to zero or one the expected number of repetitions is relatively small.

The von Neumann coin flipping trick is the first in a long line of research in complexity extracting random bits from weak random sources.

John von Neumann passed away February 8, 1957 in Washington, DC.

Friday, December 26, 2003

That's My State

Nothing like this Chicago Tribune story to ruin your holidays. Apparently the Illinois Board of Higher Education has launched a study of faculty productivity looking at what kind of research projects faculty undertake to how much time they spend in the classroom. According to board chairman James Kaplan "there's got to be a tangible, measurable benefit for the people of the state of Illinois for a professor doing research." Nothing good can come from this.

The University of Chicago is a private school unaffected by this study, but I would hate to see my friends at the various U. Illinois campuses to have to take unpaid leave to go to conferences.

Update 12/29: Daniel Drezner dissects this article. Also, I talked with a University of Illinois-Chicago professor who worries less about this study than budget cuts in the University of Illinois system due to the continual fiscal crisis in the state.

Monday, December 22, 2003

Scientific Superstars

On Saturday I visited the Einstein Exhibit at Chicago's Field Museum. Some manuscripts and letters and a nice exhibit explaining why time must vary if the speed of light remains a constant made this an interesting but not a must-see exhibit. The biggest surprise for me came from seeing how Einstein's fame happened overnight instead of the more gradual fame I would have expected. In 1919 a solar eclipse showed that light from stars do bend from gravitational forces. Einstein's fame grew immediately and his name became synonymous with genius.

This superstardom for a scientist doesn't seem to happen today. When Andrew Wiles proved Fermat's last theorem he did get some deserved attention but he never became a true household name. When you realize Wiles has hit the upper limit of fame a mathematician can receive (ruling out people like Ted Kaczynski and John Nash) one can see the Einstein effect of science may never return.

On the other hand, University of Chicago paleontologist Paul Sereno headlines the social page of the Chicago Tribune at the "Party With Giants." Perhaps scientists can still achieve more than fifteen minutes of fame after all.

Thursday, December 18, 2003

Does NP=UP?

Time for another of my favorite open problems.

Does NP=UP imply the polynomial-time hierarchy collapses?

UP is the class of languages accepted by nondeterministic polynomial-time Turing machines that have at most one accepting computation for all inputs.

This problem has loose connections to Valiant-Vazirani but Hemaspaandra, Naik, Ogiwara and Selman have the most closely related result. Consider the following proposition.

(*) There is a set A in NP such that for all satisfiable formula φ there is a unique satisfying assignment a of φ such that (φ,a) is in A.

Hemaspaandra et. al. show that (*) implies the polynomial-time hierarchy collapses to the second level.

For all we know, (*) and NP=UP are incomparable. If (*) holds for some A in P then NP=UP by just guessing a. We don't know whether NP=UP implies (*) since the accepting computations of a UP machine accepting SAT need not reveal a satisfying assignment of a formula.

There exist an oracle relative to which UP=NP≠co-NP. A relativized world with UP=NP and Σ2p≠Π2p remains open.

Monday, December 15, 2003

Does PowerPoint Make You Dumb?

Yesterday the New York Times Magazine had its annual "The Year in Ideas" issue. Ideas Futures Markets in Everything and Proving You're Human I had discussed in earlier posts here and here. Instead let us discuss whether PowerPoint Makes You Dumb?

As an avid PowerPoint enthusiast, I found the Times discussion quite misleading. The NASA scientists followed the advice of Tufte, filling the slides full of information that made them impossible to follow. One should not try to give complex ideas in a presentation; rather give an overview of these ideas and leave the messy details to technical articles. PowerPoint makes it easy to give overview talks and hard to give talks full of ugly details, crammed slides and detailed formulas, almost always leading to better talks.

Friday, December 12, 2003

Ordering Authors

Every scientific field has their own rules for the order of authors in a paper. In theoretical computer science, tradition dictates that we list the authors alphabetically by last name. I don't agree with this tradition; rarely do all the co-authors of a paper play an equal role. The decision whether to add someone as a co-author, and thus an equal, often becomes difficult.

But breaking with tradition can have its own problems. I have three papers that break the alphabetical rule though two were in biology which has its own rules. In the other back in 1990, Carsten Lund, a graduate student at the time, made the key step in developing an interactive proof system for the permanent. For that we made him first author in the Lund-Fortnow-Karloff-Nisan paper. In retrospect I regret this decision. It only added confusion to those who cited the paper. Also did Lund not play as important a role in other papers where we kept alphabetical order? Breaking with tradition, even with the best of intentions, can often cause more harm than good.

Wednesday, December 10, 2003

Seven Naughty Words

Want an easy rule to greatly improve your writing? Just avoid the following words, particularly in the abstract and introduction of your papers.

am is are was were be been

Avoiding these seven forms of "to be" will force you to write in the active tense instead of the passive making your sentences less boring. For example, instead of "It is known that all functions can be computed securely in the information theoretic setting" use "We can compute all functions securely in the information theoretic setting."

Taking this rule to the extreme can lead to some very convoluted sentences but, I promise, forcing yourself to think actively about every statement you write will make a great difference in your prose. In almost all cases the right answer is "not to be."

Monday, December 08, 2003

Two Uses of Randomness

Over the last 15 years, two very active research areas seem at odds. Derandomization results have shown us that we can often remove randomness from computation but interactive proof systems and PCPs exhibit incredible power from randomness. There is no contradiction here, just two very different ways we use randomness in complexity: for searching and for hiding.

Typically we think of randomness for searching, for example finding disagreements with Fermat's little theorem to show a number is composite or taking random walks on graphs to show they are connected. Derandomization results have given us reasons to believe we can replace the randomness in these computations with pseudorandom number generators.

Randomness can also play the role of hiding, since no one can predict future coin tosses. In interactive proofs we make the jump from NP to PSPACE because of randomness. For PCPs with O(log n) queries the jump goes from P to NP and with poly queries from NP to NEXP, in the later case classes which are provable different. In all these cases the prover cannot cheat because it cannot predict coin tosses not yet made by the verifier. A verifier using a pseudorandom generator will fail here, since the prover could then predict the verifier's actions.

AM protocols have the verifier flip coins first so no hiding going on, rather searching for a statement Merlin can prove and we expect some derandomization for AM. The result that MA is in AM says that sometimes we can replace hiding randomness with searching randomness.

Thursday, December 04, 2003

A Regular Problem Solved

Paz Carmi, Yinnon Haviv and Enav Weinreb from Ben-Gurion University have solved the regular language problem I posted last month.

The problem came from Janos Simon based on a homework question in Kozen's book. Let L(A)={x|x^m is in A for some m}. The homework question asked to show that L(A) is regular if A is regular. The question Janos asks was how many states do we need for a DFA for L(A) if A has n states. Carmi, Haviv and Weinreb show that an exponential number of states are required.

Not only did they solve the problem but also sent me this nice write-up of the proof. I believe this is the first time someone has directly solved a problem I've given on this weblog. I owe each of them a beer (or non-alcoholic beverage of their choice).

Update 12/9: I received the following today from Markus Holzer.

It seems that I have missed your posting about the problem last month. The problem you have stated was already solved in June by Jeff Shallit and co-authors. They have given a lower bound on the DFA accepting root, by considering the (largest) syntactic monoid induced by two generators. The latter problem on syntactic monoid size is of its own interest and I was working on that for a while, therefore I know the result of Shallit et al on the root descriptional complexity. Maybe you also owe the beers to Shallit et al.

Wednesday, December 03, 2003

The Elsevier Dilemma

The Cornell University Library has announced it will drop a substantial number of their Elsevier subscriptions, part of a general problem Cornell and other libraries are facing with higher costs and different pricing models from commercial academic publishers.

I have wanted to write a post on this topic for a while but I find it difficult to truly understand the problems or the potential solutions. Elsevier does a nice job with their portal and their publishing, but because of consolidation and cheap distribution via the internet, they have changed their pricing model in ways that make it difficult for many libraries to afford all of the journals that they need.

This poses some moral questions: Should we avoid submitting our papers to Elsevier journals? Is it wrong for me to serve on the editorial board of the Elsevier-published Information and Computation? I just don't know.

Tuesday, December 02, 2003

The NSF and SIGACT News

First an update on NSF program solicitations: The Formal and Mathematical Foundations cluster has posted its solicitation which includes computational complexity. The deadline is March 4. The program announcement for the Emerging Models and Technologies for Computation Cluster, which includes quantum and biological computing, is still under development. Also the ITR solicitation has also been posted, with some major changes from previous years.

Donald Knuth's tribute to Robert Floyd highlights the December SIGACT News. Also reviews of a bunch of crypto books, a column on sublinear-time algorithms and the complexity theory column on "Post's Lattice with Applications to Complexity Theory."

In my mailbox yesterday was not one but five copies of SIGACT News shrink-wrapped together. Once I unwrapped them and looked at the labels, only the outer one belonged to me. There were two for other professors in my department, one for our library and one for the library of Loyola University Chicago, which is on the other side of the city. I'm not sure if it was a mistake or some attempt by the ACM to reduce mailing costs, but I hope this is a one-time occurrence.

Monday, December 01, 2003

Show Us Your Papers

I hope you all got your Complexity submissions in on time last week. Now let the rest of the world see your work. Make them into technical reports, put them on your home page, submit them to ECCC and/or CoRR (neither of which conflicts with submitting to Complexity). People won't judge you, okay maybe they will, but no one gets famous by keeping their work a secret.

And since I know you'll ask: I didn't have any Complexity submissions this year. It happens.

Wednesday, November 26, 2003

Critical Mass

In the left column of this weblog's homepage, I've added a new "weblogs" section to list the academic weblogs that I have been following. Not very long right now, so start up a relevant weblog and I'll add you to the list.

What is Critical Mass? This is a weblog written by U. Penn English Professor Erin O'Connor. Not much computer science but she covers a number of issues about academic freedom, an important issue to every academic. Take a look.

Don't forget that the 2004 Complexity Conference submissions are due today. Have a great Thanksgiving!

Monday, November 24, 2003

Theory and XML

XML (eXtensible Markup Language) has become quite a popular data format in recent years. XML roughly corresponds to a tree. For example,

<person><name>Harry</name><age>29</age></person>
<person><name>Jane</name><major>Computer Science</major></person>

represents a tree. The root having two children, each labeled "person". The first of these children has two children named "name" and "age". The first of those children has a leaf node labeled with the phrase "Harry". For a larger example, see the RSS feed for my weblog.

XML was designed as a flexible way to present documents for later displaying. Since the XML format can be easily produced and parsed, XML also serves as a standardized method for transferring data between databases, far better than the old CSV (Comma-Separated Values) format.

Recently there have been some work on directly manipulating and querying the XML data. As a theorist, this seems like a bad idea, particularly for larger databases. While XML completely represents the underlying tree, it is not a good implementation of that tree. Basic tree operations like parent and sibling are very expensive just looking at XML. About the only thing one can do quickly with XML is depth-first search. Far better to "shred" the data into a better tree implementation like a DOM (Document Object Model) or a full-fledged database and do the work there, rewriting a new XML if needed.

One issue though is when the XML file is on the order of 5-10 GB, a bit larger than what can be stored in the memory of a typical desktop machine. One can stream through the file rather quickly but cannot recreate the tree. This opens up some interesting theoretical questions:

  1. Given a stream of data in XML format, how can one do efficient analysis and manipulations of the underlying tree? I suspect one would want to sometimes shred subtrees, but you cannot determine the size of the subtree until after its been streamed. Perhaps some randomness or streaming the file multiple times might be helpful.
  2. XML might not be the right model of a tree for this purpose. What is the best way to stream a tree or other data structure to allow an efficient implementation of the basic operations of the data structure? Perhaps some redundancy might be useful.

Friday, November 21, 2003

Rational Functions and Decision-Tree Complexity

I thought I should mention some of my favorite and most frustrating open questions over the years. Here's one of them:

Let f:{0,1}n→{0,1}. Let h and g be n-variable degree d polynomials over the reals. Suppose for all x in {0,1}n, g(x)≠0 and f(x)=h(x)/g(x). Is there a constant k such that the decision-tree complexity of f is bounded by dk?

The decision-tree (or query) complexity is the number of bits of x that need to be viewed to determine f(x). The queries to the bits of x can be adaptive. I'm particularly interested in the case where d is poly-logarithmic in n.

Nisan and Szegedy answer the question in the affirmative if g(x)=1. Their result holds even if f(x) is only approximated by h(x). However if we allow arbitrary g(x), h(x)/g(x) can closely approximate the OR function which requires looking at all of the bits. The case where we require exact equality of f(x) and h(x)/g(x) is the open question at hand.

Update 1/14/26: Solved!

Wednesday, November 19, 2003

Midwest Theory Day

Over the years a number of localized theory workshops have developed which bring together researchers from around the area to mingle and learn about some of their research. For example BATS (Bay Area Theory Seminar-San Francisco area), CATS (Capital Area Theory Seminar-Washington DC), DIMACS Mixers (New Jersey), Netherlands Theoriedag and others including many I've never heard of.

In the middle of America we have the Midwest Theory Day. Generally since 1980 this is a day of talks held in the fall in the Chicago area and in the spring somewhere else in the Midwest such as the University of Wisconsin, Indiana, Illinois, Purdue, Notre Dame, etc. The next Midwest theory day will be held December 13 at DePaul University and in the spring on April 24 at the University of Iowa.

So come to Chicago or attend (or organize) you own local area theory seminar and meet your theory neighbors.

Monday, November 17, 2003

Editors Needed

Back in my science-fiction reading days, I particularly remember one editorial written in one of those anthology magazines about 1980: In the near future, you will be able to access, via your personal computer, any science fiction story right after it has been written. If you like a certain author, you can read other stories from that author, even if we didn't decide to put it in this magazine. In this future world, will you still need me, the editor? The answer is yes, for there will be way too much dreck out there for you to find the good stories within, and you will need people like me to point them out to you.

The future is now and though I haven't kept up with science fiction, the same issue applies to academic publications. Recent posts by Michael Nielsen and on Slashdot have asked: With nearly all new papers in physics and computer science easily accessible on the web, how do you find the ones worth reading?

Conferences have traditionally played this role in computer science. But, by definition, paper choices are decisions by committee and with the massive growth in the field, many good papers do not appear in the major conferences.

What we need are "editors"! You can help. Write a survey paper, or spend a page in your research papers discussing the important earlier results in a field. Maintain a web page pointing to papers you find interesting. Start a weblog saying what you find interesting--you don't have to post long or often, just to say, hey, this paper is worth looking at. This way people with similar interests to you can find out what at least you think is important. Only by many of us working together can we make the interesting papers stand out.

Friday, November 14, 2003

A Regular Problem

Here's a problem from Janos Simon. For a set A, let
L(A)={x | for some m≥0, xm is in A}
Simon asked, as a homework problem, to show that if A is regular then L(A) is also regular. The standard solutions require an exponential increase in the number of states. Simon asks whether this is needed in general. If A is accepted by an n-state DFA, can one find a poly(n)-state DFA for L(A) or is an exponential state DFA for L(A) required?

As an aside the submission deadline for the Complexity Conference is right around the corner. Get your papers ready.

Wednesday, November 12, 2003

Opportunities at Chicago (Blatant Plug)

[I will do this just once. Feel free to use the comments link to mention your own institution.]

We are seeking theory graduate students and faculty (all ranks) in the Computer Science Department of the University of Chicago. The Toyota Technological Institute, located on the University of Chicago campus, is looking for both faculty and postdoctoral candidates.

The Theory Group has expanded over the past few years and will continue to grow. We have a long tradition of outstanding doctoral students and notable research accomplishments.

Click here for further information.