Saturday, November 30, 2002

CAPTCHA


Can you read the word above? Could a computer?

Manuel Blum won the Turing Award in 1995 for his work in computational complexity and its applications to cryptography. Much of modern-day cryptography builds on the assumption that certain problems are not easily computable to create unbreakable codes and protocols.

These days Blum is working on another project that also uses the assumption that some problems are computationally hard. The idea is to use problems that humans can solve more easily than computers to prevent automated registration, voting, etc. Check out the CAPTCHA project web site.

Wednesday, November 27, 2002

Complexity Deadline

One last reminder that today is the submission deadline for the 2003 Conference on Computational Complexity. Good luck to all submitters.

Have a great Thanksgiving everyone!

Monday, November 25, 2002

Identity-Based Encryption

Dan Boneh gave a talk at Princeton today about some recent developments in cryptography based on algebraic geometry. One of these tools is identity-based encryption, which is public-key encryption where the public key is just a user's identity, such as an email address.

Dan's group has an implementation of the system for Outlook, Yahoo Mail and some other systems. If you want to be the first on your block using the latest and greatest encryption, or just want more information, check out the IBE web site.

Personally, I send all my email in cleartext. If anyone goes through the hassle of capturing it they will only discover what a boring person I am.

IAS

An article on the search for a new director of the Institute for Advanced Study discusses the changing nature of the institute.

The faculty member in computer science mentioned in the article is complexity theorist Avi Wigderson. It was a coup for complexity and all of computer science when he was appointed a faculty member in 1999. With a large collection of postdocs, visitors and students he has made the institute quite an exciting place for theoretical computer science and discrete math.

Saturday, November 23, 2002

FOCS and Visas

The FOCS Conference, the major fall theory conference held last week in Vancouver, sounded like a complete success. According to PC Chair Bernard Chazelle there were 320 registrants--quite a healthy number for this conference. Most encouraging was the large number of students attending, as well as a number of strong student papers, indicating a solid future for theoretical computer science.

The 320 does not count another 50 "registrants" from Nigeria. They registered with fake credit card numbers in order to obtain letters from the conference organizers to help them get visas, in this case to Canada. Whether they got the visas is unclear, and they, of course, never showed up at the conference.

The temptation to help those from Africa is strong, especially since that continent is woefully underrepresented in computer science. However, we must as a community be vigilant against those who use our conferences as a way to get around standard immigration laws. Whether or not we agree with those laws, if abuses of this nature continue it becomes harder to bring in legitimate scientists, a problem I discussed in my last post.

Friday, November 22, 2002

Coming to America

A recent Washington Post editorial brings up an important issue.

Andris Ambainis was supposed to spend the fall at MSRI in Berkeley but instead is enjoying his native Latvia. When Manindra Agrawal came to Boston last month to talk about his primality algorithm, he was supposed to bring along his student co-authors. Instead he came alone.

Worries about terrorism have made the US government more cautious about issuing visas, and this has slowed down the visa process tremendously. Visa problems have always been a thorn for academics, but this fall seems particularly bad.

I understand the need to be careful but when science is hindered by politics nobody is a winner.

Wednesday, November 20, 2002

Where was Cook's Talk?

In 1971, Steve Cook gave the conference presentation that showed that SAT is NP-complete. The talk did not immediately stir up much excitement, but it is, in retrospect, the single most important conference talk in the history of theoretical computer science. So when and where was this talk?

Steve Cook's paper The Complexity of Theorem Proving Procedures appeared at the Third Annual ACM Symposium on Theory of Computing (STOC) that was held May 3-5, 1971 in Shaker Heights, Ohio, a suburb of Cleveland.

Funda Ergun, a professor at Case Western Reserve, just purchased a house in Shaker Heights and wondered where exactly the conference took place. We got the answer from Bill Rounds, who was one of the local organizers of that conference.

It was (at I think Stouffer's hotel) at the intersection of Warrensville Center Road and Chagrin Boulevard, in the Van Aken center district. The hotel is now gone.

Here is a Mapquest map of that location.

Someday we will organize a P versus NP workshop in that area and make a pilgrimage to this site.

Tuesday, November 19, 2002

Foundations of Complexity
Lesson 8: Efficient Computation

Previous Lesson | Next Lesson

In past lessons, we have studied the computable functions. Computable functions can take an arbitrary amount of time: What good is a program that will eventually give the correct answer but might not finish before the universe collapses?

Somehow we want to limit the amount of time or memory that a computer can use. Just giving a fixed bound does not work well: as technology improves and computers get faster and better, we expect to solve larger and larger problems in a reasonable amount of time. Hartmanis and Stearns, in their seminal paper on computational complexity, turned this around and came up with the right idea: consider time and memory as functions of the size of the input.

The time a machine M takes on input x is just the number of computation steps it makes before halting when started on input x. I am being very informal about what a "step" is. In a later lesson we will get into formal definitions of computers and steps, but for now just think of a step as implementing one instruction.

The memory, or space as we theorists call it, is just the number of bits of storage used by M on a given input.

Edmonds gave an algorithm for the matching problem that ran in polynomial time: the number of steps used by the algorithm on a graph of size n is at most n^k for some constant k. He suggested that polynomial time captures efficient computation.

We now define our first complexity class P as the set of all languages L for which a machine exists that determines whether x is in L and halts in time polynomial in the length of x. The class P has many nice properties: a polynomial-time algorithm that uses a polynomial-time subroutine still runs in polynomial time. Also P is robust, that is, the class is the same no matter how you formally define your machines.
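
As a toy illustration (my own sketch, not part of the lesson), here is a decider for the language of palindromes, instrumented to count an informal notion of "step": one character comparison. On an input of length n it uses at most n/2 steps, a linear and hence polynomial bound, so under any reasonable formalization this language is in P.

    def palindrome_decider(x: str) -> tuple[bool, int]:
        """Decide whether x is a palindrome, counting one 'step' per comparison."""
        steps = 0
        i, j = 0, len(x) - 1
        while i < j:
            steps += 1                     # one character comparison = one step
            if x[i] != x[j]:
                return False, steps
            i += 1
            j -= 1
        return True, steps

    print(palindrome_decider("abacaba"))   # (True, 3)
    print(palindrome_decider("abcd"))      # (False, 1)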

In these lessons, we will treat P as the class consisting of efficiently computable problems. More classes will come.

Friday, November 15, 2002

It all started with a machine...

Steve Homer and I have finally finished our chapter A Short History of Computational Complexity. It will eventually appear in a book collection on the history of mathematical logic.

You can also see a talk I gave at the 2002 Complexity Conference based on the paper.

Thursday, November 14, 2002

Tic Tac Toe Variation

In my daughter's second grade math homework there was an interesting variation of Tic-Tac-Toe designed to teach addition and subtraction. Take a 3 x 3 grid and randomly give each square a different number between 2 and 18. We have two players X and O. Play goes as follows:
  1. Player X chooses a number from 1 to 9.
  2. Player O chooses a number from 1 to 9 that she had not picked before.
  3. Player O adds that number to the last number picked by X, and if the square with that sum is on the board and unmarked, that square is marked O.
  4. Player X chooses a number from 1 to 9 that he had not picked before.
  5. Player X adds that number to the last number picked by O, and if the square with that sum is on the board and unmarked, that square is marked X.
  6. Go to step 2.
Play ends when either X or O has three in a row and is declared the winner, or when all the numbers run out and the game is declared a draw.

Here is an example:

12 |  5 | 7
-----------
14 | 11 | 3
-----------
4  | 13 | 9
X: picks 1, O: picks 3 (to make 4), X: 8 (11), O: 4 (12), X: 3 (7), O: 6 (9). At this point the board looks like:
 O |  5 | X
-----------
14 |  X | 3
-----------
 O | 13 | O
Defensively X plays 2, O: 1, X: 1, O: 2, and whatever X plays next O has a forced win by making 13 or 14.

Despite its simplicity this is quite a challenging game. For every initial configuration, is there always a forced draw like in real Tic-Tac-Toe, or do some configurations have a forced win for X or O? How complicated is it to compute an optimal strategy?
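
For the curious, here is a brute-force minimax sketch (my own code, not from the homework) that computes the value of a grid under optimal play, assuming the rules exactly as stated above; I treat the game as a draw when the player to move has no unused numbers left. Even with memoization the search can take a while.

    from functools import lru_cache

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def winner(board):
        for a, b, c in LINES:
            if board[a] == board[b] == board[c] and board[a] in ('X', 'O'):
                return board[a]
        return None

    @lru_cache(maxsize=None)
    def value(board, x_used, o_used, last, to_move):
        """Value under optimal play: +1 if X wins, -1 if O wins, 0 for a draw."""
        w = winner(board)
        if w:
            return 1 if w == 'X' else -1
        used = x_used if to_move == 'X' else o_used
        moves = [n for n in range(1, 10) if n not in used]
        if not moves:
            return 0                            # numbers ran out: a draw
        best = -2 if to_move == 'X' else 2
        for n in moves:
            total = last + n                    # sum with the opponent's last pick
            # mark the square if that sum is on the board and still unmarked
            nb = tuple(to_move if cell == total else cell for cell in board)
            if to_move == 'X':
                best = max(best, value(nb, x_used | {n}, o_used, n, 'O'))
                if best == 1:
                    break                       # X can do no better
            else:
                best = min(best, value(nb, x_used, o_used | {n}, n, 'X'))
                if best == -1:
                    break                       # O can do no better
        return best

    def solve(grid):
        """X's opening pick marks nothing, so try all nine choices."""
        return max(value(tuple(grid), frozenset({n}), frozenset(), n, 'O')
                   for n in range(1, 10))

    print(solve([12, 5, 7, 14, 11, 3, 4, 13, 9]))   # the grid from the example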

My daughter was frustrated at how hard it is to win this game but she shouldn't be ashamed--I couldn't figure out the best strategy either. Amazing what complicated things can come out of a second-grade class.

Tuesday, November 12, 2002

Kolmogorov Complexity Web Site

Can't get enough Kolmogorov complexity? Check out Marcus Hutter's site on Kolmogorov Complexity and Solomonoff Induction. The site is a bit dated but contains many useful links and information about the Kolmogorov mailing list, which still seems quite active.

The Union of Complexity Classes

We often see the intersection of two classes as an interesting class in and of itself. For example, factoring is in NP∩co-NP. In some cases you get interesting equalities, like ZPP = RP∩co-RP. But we rarely see the union of two classes. Ever wonder why?

In fact, no complexity class can be the nontrivial union of two other classes. To formalize and prove this statement we need some definitions.

Let A and B be subsets of {0,1}*. We define the join, A⊕B, as the union of {0x | x is in A} and {1y | y is in B}. Given a set C we define the 0-projection of C as {x | 0x is in C} and the 1-projection of C as {y | 1y is in C}. Note that the 0-projection of A⊕B is just A and the 1-projection is just B.
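
As a quick concrete illustration (my own sketch, over finite sets of strings; the definitions of course apply to arbitrary languages):

    def join(A, B):
        """The join A⊕B: prefix members of A with 0 and members of B with 1."""
        return {'0' + x for x in A} | {'1' + y for y in B}

    def projection(C, bit):
        """The bit-projection of C: keep strings starting with bit, drop that bit."""
        return {w[1:] for w in C if w.startswith(bit)}

    A = {'0', '11'}
    B = {'1'}
    C = join(A, B)                    # {'00', '011', '11'}
    assert projection(C, '0') == A    # the 0-projection recovers A
    assert projection(C, '1') == B    # the 1-projection recovers B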

Essentially every complexity class is closed under joins and projections. For example if A and B are in NP then A⊕B is also in NP. The fact that no complexity class is the nontrivial union of other classes follows from the following Lemma.

Lemma: Let E, F and G be classes of languages that are closed under joins and projections with G = E∪F. Then either G = E or G = F.

Proof: Suppose the lemma is false. Let A be a set in G-E and B be a set in G-F. Let C = A⊕B. We have that C is in G since G is closed under joins, so C is in either E or F. Suppose C is in E. Since E is closed under projections, A is in E, a contradiction. If C is in F then B is in F, also a contradiction.

Monday, November 11, 2002

Foundations of Complexity
Lesson 7: The Recursion Theorem

Previous Lesson | Next Lesson

Here we are in Lesson 7 and have not yet talked about complexity per se. I felt it important to give some background on computability theory not only for the importance of the results but also to introduce the basic concepts of Turing machines, diagonalization and reducibility. We will start complexity in the next lesson.

Let me end the discussion of computability with one of my favorite theorems. Suppose you wanted to create the ultimate computer virus that attacks any program and changes its behavior. The recursion theorem states that no matter how powerful the virus, some program will remain unscathed. At first this seems impossible: just consider the virus that converts a program into one that simulates it and then adds one to the answer. But this transformation has no effect on a machine that never halts.

Theorem: Let f be any computable function. There is some Turing machine M such that

L(M) = L(N), where N is the machine described by f(<M>).

The recursion theorem, sometimes called the fixed-point theorem, has one of the most unintuitive proofs I know of: I cannot explain why it works, only that it does.

Proof: Fix a computable function f. For each machine N, construct a Turing machine R that on input x simulates N(<N>) to produce the description of a machine and then simulates that machine on x. Let g be the function that maps <N> to <R>. Note that if N(<N>) halts then the machines described by g(<N>) and N(<N>) accept the same language.

Note that g is computable even if N(<N>) does not halt. Let T(x) be the machine that computes f(g(x)). We will let M be the machine described by g(<T>). Then we have that
M accepts input x if and only if
the machine described by g(<T>) accepts input x if and only if
the machine described by T(<T>) accepts input x if and only if
the machine described by f(g(<T>)) accepts input x (since T(x)=f(g(x))) if and only if
the machine described by f(<M>) accepts input x. QED

As an application, consider the function f(x) that outputs the description of a machine that accepts {x}. By the recursion theorem there must be some M such that L(M) = {<M>}, i.e., M accepts exactly its own description. As an experiment, pick your favorite programming language and find a program that outputs its own code. By an argument based on the recursion theorem such a task is always possible, but it is trickier than it seems.
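
For example, here is such a self-printing program, a "quine," in Python (my illustration, not from the original lesson; no comments inside, so that the output matches the source exactly):

    s = 's = %r\nprint(s %% s)'
    print(s % s)

Running it prints exactly those two lines: the string s serves both as program text and as data, mirroring the double use of <T> in the proof above.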

This ends the section on computability theory, which is an exciting area of research in and of itself. For further reading, the book of Homer and Selman goes into these ideas in more detail and with more examples. For more advanced concepts I recommend the books of Soare, Odifreddi or Shoenfield.

Friday, November 08, 2002

STACS

The STACS Conference has just posted the list of accepted papers for its 20th meeting. STACS alternates between France and Germany (and there is only some truth to the rumor that it alternates between great food and great organization). The upcoming 2003 conference will be held in Berlin, February 27 to March 1.

I have always considered STACS, the Symposium on Theoretical Aspects of Computer Science, the best venue for computational complexity in Europe. I have attended the conference many times and they consistently have several strong papers in the area as well as good attendance by complexity theorists from both Europe and America. You can see the weight complexity gets on the web page, where "Computational and structural complexity" gets the same weight as "Algorithms and data structures, including: parallel and distributed algorithms, computational geometry, cryptography, algorithmic learning theory".

The ICALP conference has a longer history, a larger audience, more traditions and does a better job representing Europe as a whole. But the scope in ICALP is quite large and computational complexity often gets lost in the shuffle.

Wednesday, November 06, 2002

Complexity Class of the Week: SPP, Part II

Previous CCW

Last week we gave the history of the complexity class SPP and described GapP functions. This week we will give a definition of SPP and many of the class' amazing properties.

A language L is in SPP if there is a GapP function f such that

  1. If x is in L then f(x)=1.
  2. If x is not in L then f(x)=0.
That is, if x is in L there is one more accepting path than rejecting paths; if x is not in L there are the same number of each.
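
In symbols (my restatement, recalling that a GapP function is the number of accepting minus the number of rejecting computation paths of some nondeterministic polynomial-time machine M):

    \[
    \mathrm{gap}_M(x) = \#\mathrm{acc}_M(x) - \#\mathrm{rej}_M(x),
    \qquad
    L \in \mathrm{SPP} \iff \exists M\ \forall x:\
    \mathrm{gap}_M(x) = \begin{cases} 1 & x \in L \\ 0 & x \notin L \end{cases}
    \]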

If we used #P functions instead of GapP functions we would have the definition of UP. SPP contains UP since every #P function is a GapP function. In fact SPP contains FewP and even Few, where we don't believe such languages are in UP.

SPP is the smallest Gap-definable class, i.e., the smallest class that can be defined by GapP functions as above. There are a number of common Gap-definable classes, for example from the Zoo: ⊕P, AWPP, C=P, ModP, Mod_kP, MP, AmpMP, PP, WPP and of course SPP. SPP is contained in all of these classes. AWPP is the smallest classical class known to contain BQP, the class of problems with efficient quantum algorithms, though it is not known whether BQP is itself Gap-definable.

SPP is exactly equal to the low sets for GapP, i.e., SPP is exactly the set of oracles A such that for any NP machine M, the number of accepting minus the number of rejecting paths of M^A(x) is still an (unrelativized) GapP function. This means that SPP is low for all of the Gap-definable classes, for example ⊕P^SPP = ⊕P. It also means that SPP is self-low: SPP^SPP = SPP, which implies SPP is closed under union, complement and in fact any Turing reduction.

Köbler, Schöning and Torán showed that graph automorphism is in SPP, and very recently Arvind and Kurur have shown that graph isomorphism is in SPP. This means that graph isomorphism sits in, and is in fact low for, every Gap-definable class.

The decision tree version of SPP is also interesting. A function f on n bits is in this class if there is a polynomial g of polylogarithmic degree such that f(x)=g(x) for all x in {0,1}^n. All such functions have low deterministic decision tree complexity--the first complexity application of a combinatorial lemma of Nisan and Szegedy. Applications of this result include relativized worlds where SPP does not have complete sets or where P = SPP and the polynomial-time hierarchy is infinite.

Monday, November 04, 2002

Foundations of Complexity
Lesson 6: The Halting Problem

Previous Lesson | Next Lesson

Last lesson we learned about using reductions to show problems are hard. Now consider the most famous of undecidable problems, the halting problem:

LH = {<M> | M eventually halts when started with a blank tape as input}
We will now show that LH is not computable. We do this by reducing the universal language LU to LH, where LU is the set of pairs (<M>,x) such that M(x) accepts.

Given <M> and x, consider the following program:
Replace input with x.
Simulate M on x.
If M(x) accepts then halt.
If M(x) does not accept then go into an infinite loop.

Let us call this program N. Note that M(x) accepts if and only if N halts when started on a blank tape.

Now here is the important point. Consider the function f that, given <M> and x, produces the program N. Even though M(x) and N may not halt, the actual procedure that converts <M> and x into N is computable: it is just converting one program to another.

So we have that (<M>,x) is in LU if and only if M(x) accepts if and only if N=f(<M>,x) halts on blank tape if and only if N is in LH. Thus f reduces LU to LH and thus by the Lemma of Lesson 5, we have that LH is not computable.
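
To see concretely why f is computable, here is a toy Python sketch (my own illustration, representing machines by their source code and assuming, hypothetically, that each machine's source defines a function accepts(x)). Note that f only manipulates text; it never runs M.

    def f(M_source: str, x: str) -> str:
        """Produce the source of N from <M> and x by pure string manipulation.
        N ignores its own input, simulates M on x, halts if M(x) accepts,
        and loops forever otherwise. We never execute M here, so f is
        computable even when M(x) does not halt."""
        return (
            f"exec({M_source!r})   # assumed to define accepts(x)\n"
            f"if accepts({x!r}):\n"
            f"    pass             # halt\n"
            f"else:\n"
            f"    while True:\n"
            f"        pass         # loop forever\n"
        )

Deciding whether the produced N halts is exactly as hard as deciding whether M(x) accepts, which is the whole point of the reduction.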

I consider the noncomputability of the halting problem to be the single most important result in theoretical computer science. There are some programs, of course, for which it is easy to determine whether or not they will halt. But in general, no matter how smart you are or how fast your computer, it is simply impossible to analyze a piece of code and determine whether it will terminate.

Using similar techniques one can prove a general result known as Rice's Theorem: Every nontrivial property of the computably enumerable languages is undecidable. More formally
Rice's Theorem: Let P be any non-empty proper subset of the computably enumerable languages. Then the language

LP = {<M> | L(M) is in P}
is not computable.

For example the following languages are not computable:

  • {<M> | L(M) is empty}
  • {<M> | L(M) is computable}
  • {<M> | L(M) is finite}

Friday, November 01, 2002

Conferences

November is a month for conference deadlines. The STOC conference has a submission deadline of November 6. STOC and FOCS, which is being held November 16-19, are the two major theoretical computer science conferences.

STOC this year is part of the 2003 Federated Computing Research Conference in San Diego in June. Several other theory conferences are also part of FCRC and many of them have deadlines in November or soon thereafter.

My favorite conference, The IEEE Conference on Computational Complexity, will be held in Denmark in July. Its submission deadline is November 27.

In computer science in general, and theoretical computer science in particular, conferences are the primary outlet for the announcement and publication of results. Since computer science is a relatively young discipline, the field changes dramatically from year to year, and the long process of journal publication would often mean publishing outdated work. More mature fields like mathematics and physics use journals as the primary venue for publication.

The main disadvantage of the computer science system is that while computer scientists are encouraged to submit their work to refereed journals, many of the important papers in the area never make it that far.

There have been at least two recent major exceptions to this process. Alexander Razborov wrote a paper last spring on lower bounds for quantum communication complexity that would have been the best quantum paper at FOCS, if not the best paper overall. Instead he chose to submit it directly to a journal, Izvestiya of the Russian Academy of Science: Mathematics. The Agrawal-Kayal-Saxena primality paper, which would easily be the best paper at the upcoming STOC, is not being submitted to a conference either but directly to Annals of Mathematics. "Why should I send it to a conference," Manindra Agrawal asks, "when everyone already knows the result?"

Are these two papers a trend? Are conferences less important as papers are easily available online? Or is computer science finally becoming a mature field?