Wednesday, November 26, 2003

Critical Mass

In the left column of this weblog's homepage, I've added a new "weblogs" section to list the academic weblogs that I have been following. Not very long right now, so start up a relevant weblog and I'll add you to the list.

What is Critical Mass? This is a weblog written by U. Penn English Professor Erin O'Connor. Not much computer science but she covers a number of issues about academic freedom, an important issue to every academic. Take a look.

Don't forget that the 2004 Complexity Conference submissions are due today. Have a great Thanksgiving!

Monday, November 24, 2003

Theory and XML

XML (eXtensible Markup Language) has become quite a popular data format in recent years. XML roughly corresponds to a tree. For example,

<person><name>Jane</name><major>Computer Science</major></person>

represents a tree. The root having two children, each labeled "person". The first of these children has two children named "name" and "age". The first of those children has a leaf node labeled with the phrase "Harry". For a larger example, see the RSS feed for my weblog.

XML was designed as a flexible way to present documents for later displaying. Since the XML format can be easily produced and parsed, XML also serves as a standardized method for transferring data between databases, far better than the old CSV (Comma-Separated Values) format.

Recently there have been some work on directly manipulating and querying the XML data. As a theorist, this seems like a bad idea, particularly for larger databases. While XML completely represents the underlying tree, it is not a good implementation of that tree. Basic tree operations like parent and sibling are very expensive just looking at XML. About the only thing one can do quickly with XML is depth-first search. Far better to "shred" the data into a better tree implementation like a DOM (Document Object Model) or a full-fledged database and do the work there, rewriting a new XML if needed.

One issue though is when the XML file is on the order of 5-10 GB, a bit larger than what can be stored in the memory of a typical desktop machine. One can stream through the file rather quickly but cannot recreate the tree. This opens up some interesting theoretical questions:

  1. Given a stream of data in XML format, how can one do efficient analysis and manipulations of the underlying tree? I suspect one would want to sometimes shred subtrees, but you cannot determine the size of the subtree until after its been streamed. Perhaps some randomness or streaming the file multiple times might be helpful.
  2. XML might not be the right model of a tree for this purpose. What is the best way to stream a tree or other data structure to allow an efficient implementation of the basic operations of the data structure? Perhaps some redundancy might be useful.

Friday, November 21, 2003

Rational Functions and Decision-Tree Complexity

I thought I should mention some of my favorite and most frustrating open questions over the years. Here's one of them:

Let f:{0,1}n→{0,1}. Let h and g be n-variable degree d polynomials over the reals. Suppose for all x in {0,1}n, g(x)≠0 and f(x)=h(x)/g(x). Is there a constant k such that the decision-tree complexity of f is bounded by dk?

The decision-tree (or query) complexity is the number of bits of x that need to be viewed to determine f(x). The queries to the bits of x can be adaptive. I'm particularly interested in the case where d is poly-logarithmic in n.

Nisan and Szegedy answer the question in the affirmative if g(x)=1. Their result holds even if f(x) is only approximated by h(x). However if we allow arbitrary g(x), h(x)/g(x) can closely approximate the OR function which requires looking at all of the bits. The case where we require exact equality of f(x) and h(x)/g(x) is the open question at hand.

Wednesday, November 19, 2003

Midwest Theory Day

Over the years a number of localized theory workshops have developed which bring together researchers from around the area to mingle and learn about some of their research. For example BATS (Bay Area Theory Seminar-San Francisco area), CATS (Capital Area Theory Seminar-Washington DC), DIMACS Mixers (New Jersey), Netherlands Theoriedag and others including many I've never heard of.

In the middle of America we have the Midwest Theory Day. Generally since 1980 this is a day of talks held in the fall in the Chicago area and in the spring somewhere else in the Midwest such as the University of Wisconsin, Indiana, Illinois, Purdue, Notre Dame, etc. The next Midwest theory day will be held December 13 at DePaul University and in the spring on April 24 at the University of Iowa.

So come to Chicago or attend (or organize) you own local area theory seminar and meet your theory neighbors.

Monday, November 17, 2003

Editors Needed

Back in my science-fiction reading days, I particularly remember one editorial written in one of those anthology magazines about 1980: In the near future, you will be able to access, via your personal computer, any science fiction story right after it has been written. If you like a certain author, you can read other stories from that author, even if we didn't decide to put it in this magazine. In this future world, will you still need me, the editor? The answer is yes, for there will be way too much dreck out there for you to find the good stories within, and you will need people like me to point them out to you.

The future is now and though I haven't kept up with science fiction, the same issue applies to academic publications. Recent posts by Michael Nielsen and on Slashdot have asked: With nearly all new papers in physics and computer science easily accessible on the web, how do you find the ones worth reading?

Conferences have traditionally played this role in computer science. But, by definition, paper choices are decisions by committee and with the massive growth in the field, many good papers do not appear in the major conferences.

What we need are "editors"! You can help. Write a survey paper, or spend a page in your research papers discussing the important earlier results in a field. Maintain a web page pointing to papers you find interesting. Start a weblog saying what you find interesting--you don't have to post long or often, just to say, hey, this paper is worth looking at. This way people with similar interests to you can find out what at least you think is important. Only by many of us working together can we make the interesting papers stand out.

Friday, November 14, 2003

A Regular Problem

Here's a problem from Janos Simon. For a set A, let
L(A)={x | for some m≥0, xm is in A}
Simon asked, as a homework problem, to show that if A is regular then L(A) is also regular. The standard solutions require an exponential increase in the number of states. Simon asks whether this is needed in general. If A is accepted by an n-state DFA, can one find a poly(n)-state DFA for L(A) or is an exponential state DFA for L(A) required?

As an aside the submission deadline for the Complexity Conference is right around the corner. Get your papers ready.

Wednesday, November 12, 2003

Opportunities at Chicago (Blatant Plug)

[I will do this just once. Feel free to use the comments link to mention your own institution.]

We are seeking theory graduate students and faculty (all ranks) in the Computer Science Department of the University of Chicago. The Toyota Technological Institute, located on the University of Chicago campus, is looking for both faculty and postdoctoral candidates.

The Theory Group has expanded over the past few years and will continue to grow. We have a long tradition of outstanding doctoral students and notable research accomplishments.

Click here for further information.

Tuesday, November 11, 2003

25 Years of Science Times

It happened right after I started high school in suburban New Jersey, the start of the Science Times section in Tuesday's New York Times. The Science Times not only helped get me excited about science but made me feel others could get excited over science as well. I've kept reading it off and on during these past 25 years. The Science Times has reported on a fair amount of research in complexity and theoretical computer science, for a time some joked that a result was not important until it appeared in the New York Times.

Today the New York Times celebrates the 25th Anniversary Issue of the Science Times. It features 25 questions such as Does Science Matter? and What Is the Most Important Problem in Math Today? (Hint: It's not P versus NP).

I'll end this post with a quote from the essay of Alan Lightman:
All of the scientists I've known have at least one quality in common: they do what they do because they love it, and because they cannot imagine doing anything else. In a sense, this is the real reason a scientist does science. Because the scientist must. Such a compulsion is both blessing and burden. A blessing because the creative life, in any endeavor, is a gift filled with beauty and not given to everyone, a burden because the call is unrelenting and can drown out the rest of life.

Sunday, November 09, 2003

CISE Reorganization

The Computer and Information Science and Engineering Directorate of the NSF has completed it reorganization. The CISE web site details the new structure.

CISE now has four divisions. Instead of each division have a large number of specific programs, each division contains a smaller number of clusters covering a broader research area. I'm happy to see "Computational Complexity" specifically mentioned in the Formal and Mathematical Foundations Cluster in the Division of Computing & Communication Foundations. However it shares that cluster with such diverse topics as "computational algorithms for high-end scientific and engineering applications" and "analysis of images, video, and multimedia information." Hopefully funding panels will meet in the more specific areas to avoid trying to compare proposals from vastly different areas of computer science.

Quantum and Biological Computing sit in a different CCF cluster, Emerging Models and Technologies for Computation. This shows NSF's hopes for these new technologies but may also give them a way to phase out these areas if the technologies don't show promise.

Program announcements for the CCF clusters are still under development. The ITR solicitation is still not expected until Thanksgiving. So if you plan a grant proposal this year, you'll still need to wait.

Friday, November 07, 2003

The Web and Research

I received the following email from David Molnar, a Berkeley graduate student. He asks some interesting questions about the use of the web and research. Instead of just answering them immediately, I though I would share his email (with his permission) and let you think about it. Feel free to comment. I'll also address some of his points later in future posts.

Molnar's Letter:

I wanted to follow up on your comment about theoretical computer science and e-mail/IM. Also, use of weblogs.

  1. There's a piece of folklore about how IP=PSPACE was proved pursuant to some race between many researchers carried out over e-mail. I've once or twice found myself in e-mail races with other people on mailing lists when we're responding to something technical. Do you think the day will come when some major theorem is proved in an IM chat room? :)

  2. Weblogs as place for ideas. A friend of mine asked me if I thought researchers in theory would be interested in putting up a web page (actually, we were discussing a wiki) about open problems. My response was pretty much like what you see in Oded Goldreich's FAQ -- "well that's nice, but good research problems are hard to find, and people seem to reserve them for themselves or their close colleagues."

    The point: I find that I'm starting to use my weblog as a place to deposit ideas for problems that I just don't have time to pursue, or aren't in what I think is my main line of research. In the old days they would have gone into a notebook somewhere and no one would have seen them. Now maybe someone will pick them up -- and I think that's great (especially since I usually don't know how to progress). At the same time, I wonder how peoples' concerns about ownership of ideas will affect the way they use weblogs. What do you think?

Wednesday, November 05, 2003

The PCP Theorem

This week I started teaching the PCP theorem: Every language in NP has a proof that can be probabilistically checked with only O(log n) random coins and a constant number of queries. One graduate student asked me which came first, the PCP theorem or the connection to hardness of approximation. That seemed like a strange question; after all why would so much effort be put into reducing parameters like number of random coins or queries unless there was some application behind it. But then again, so much emphasis is now in the literature on these parameters that he probably thought these were always the parameters people cared about the most.

To put the record straight: Feige, Goldwasser, Lovász, Safra and Szegedy made the connection between the interactive proof literature and approximation. Arora and Safra showed the importance of reducing random coins and queries and proved a weaker version of the PCP theorem. The theorem in the above form was proved by Arora, Lund, Motwani, Sudan and Szegedy.

I have been teaching off the notes of Madhu Sudan from a short course he taught on the PCP theorem. The notes don't go through every detail, but they give one of the better expositions of what is still quite a complex proof.

Monday, November 03, 2003

What happened at NEC?

The NEC Research Institute (NECI) died just over a year ago. I didn't feel comfortable talking about it then so let me say a few words now.

I joined NECI in 1999 just after its tenth anniversary. When I joined its mission and focus was basic research in computer science and physics. NECI gave me considerable time and resources to study fundamental questions in computational complexity. It was an exciting place to be.

Soon thereafter some changes were occurring. NEC modified the mission of NECI to focus on producing technologies with basic research secondary. Some researchers (though not us theorists) were encouraged to join "technology groups" to find practical applications of their research. But during this time, the administrators always supported basic research and I never felt uncomfortable doing theory.

But then on November 1, 2002, NECI merged with NEC CCR&L, a more applied research lab in the same building to form NEC Laboratories America. The new mission makes no mention of basic research. The scientists in charge were replaced by engineering/management types. Many of the research scientists, particularly physicists, were let go.

My job was never in immediate danger but NEC was no longer the place for me and so I went on the job market; no one was surprised when I decided to leave.

A corporation like NEC needs to make decisions for the health of the company. I do not fault NEC for the decisions that it made and they gave me a few great years. Still I mourn the NEC Research Institute, quite a special place during its thirteen year run.