Computational Complexity: December 2010

Wednesday, December 29, 2010

Complexity Year in Review 2010

Complexity Theorem of the year goes to Ryan Williams for his exciting separation of NEXP from ACC⁰. The runner up is Arora, Barak and Steurer for their algorithm for unique games. Also some great progress on some of Bill's favorite questions including Arithmetic Progressions and the Erdos Distance Problem.

None of these papers got mentioned in the New York Times so the most notable paper of the year goes to Deolalikar's P ≠ NP. Many of you got upset that I didn't give this paper the respect it didn't deserve. I did appreciate the publicity the paper generated for our great open problem but the status of the P versus NP question remains: still open.

Last year we highlighted several new blogs. This year the trend is reversing as several theory bloggers have slowed down or stopped blogging. A few of our commentors got very ugly on our blog this year and finally we have given in to comment moderation, though we rarely block.

But social networking in the theory community continues on in other ways highlighted by the Theoretical Computer Science Q&A site. The SIGACT Facebook and Twitter pages have well over a hundred followers each.

The jury is still out on how a near complete change of NSF personnel and the fall elections will affect funding for theoretical computer science. We can always hope.

In this year I started a discussion on remaking STOC. The most popular thing I ever wrote is now this tweet. And don't forget my daughter Molly and her friend Danielle as NP and P.

Gone but not forgotten: Martin Gardner, Joseph Kruskal, Avner Magen, Benoît Mandelbrot, Robin Milner, Partha Niyogi, Sam Roweis and Numb3rs.

Thanks much to our guest posters: Daniel Apon, Paul Beame, Rance Cleveland, Ben Fulton, Josh Grochow, M.T. Hajiaghayi, Bernhard Haeupler, Nicole Immorlica, Subrahmanyam Kalyanasundaram, Clyde Kruskal, Michael Mitzenmacher, Rahul Santhanam, Aaron Sterling, Richard Taylor and Vijay Vazirani. We also thank guest photographer Evan Golub.

Looking forward to 2011 with the big FCRC meeting, learning the location for the Simons Institute for the Theory of Computing and just maybe I'll finish writing my P versus NP book (a bit more than half finished).

Wednesday, December 22, 2010

America's Most Important Algorithm

Yesterday the Census Bureau announced the new apportionment of the 435 representatives to states based on the 2010 census. Illinois lost one representative. Texas gains four. Not only do these affect the makeup of the House of Representatives but also the Electoral College that chooses the president.

Since 1940 the apportionment is not done by a specific formula but by an algorithm.

Input: Pop, a population array for the 50 states.
Output: Rep, a representatives array for the 50 states.
Let Rep[i] = 1 for each state i.
For j = 51 to 435

Let i = arg max Pop[i]/sqrt(Rep[i]*(Rep[i]+1))
Rep[i] = Rep[i]+1

This algorithm, called the Huntington-Hill method or the Method of Equal Proportions, minimizes the relative difference between sizes of congressional districts.

Check out the Census Bureau video The Amazing Apportionment Machine.

Implemented naively the running time is O(rn) for n the number of states and r the number of representatives. I'll leave it to you readers to find better implementations. Can I compute how many representatives New Jersey gets from Pop in o(n) space?

Of course with n=50 and r=435 the numbers don't get big enough to cause any problem at all for today's computers. I wonder how long it took in 1940?

Monday, December 20, 2010

BREAKTHROUGH in algorithms: Improved algorithm for Metric TSP!!!!!!!!

BREAKTHROUGH in Algorithms: Improved Algorithm for Metric TSP,

(Guest Blog by Mohammad Hajiaghayi)

We all recently heard about the breakthrough complexity result by Ryan Williams on non-uniform circuit lower bounds. Here is a breakthrough from algorithms side.

All of us may heard about the Christofides' algorithm from 1976 for metric Traveling Salesman Problem (TSP), in which, given an undirected graph G with nonnegative edge weights on the edges satisfying the triangle inequality, the goal is find a shortest possible tour that visits each city once. The algorithm and its analysis is very simple. Create the minimum spanning tree (MST) T of G, find a minimum weight perfect matching M (T-join) in the complete graph over the vertices with odd degree in T, combine the edges of M and T to form an Eulerian circuit and shortcut the circuit by skipping visited nodes. This simple algorithm is a 3/2-approximation algorithm for metric TSP since both the weight of T and twice the weight of M are lowerbounds for the optimum. You can write a natural LP known as Held-Karp relaxation for the problem, for which one can show an integrality gap of at most 3/2 essentially via Christofides' algorithm. For lowerbounds, we know that the integrality gap is at least 4/3 for this LP. (ADDED BY BILL: According to Wikipedia it is known that you can never get an approx better than 220/219=1.00456... times opt, unless P = NP. ADDED LATER: Here is the link to the Wikipedia post and here is the paper by Papadrimtriou and Vempala that contains the result. The post lead me to find it. )

Though Christofides' algorithm seems very simple and easy to improve, so far there has been no progress in this regard despite consistent efforts from 1976. Today I'm happy to announce a breakthrough for this problem by Shayan Oveis-Gharan, Amin Saberi and Mohit Singh.

They obtain a (3/2-c)-approximation algorithm, for some positive constant c, for the graphical TSP where the metric is the shortest path distance in an unweighted graph (a very natural case considered by many so far). Their approach is natural also, they find the tour by sampling a random spanning tree from a maximum entropy distribution defined by the linear programming relaxation of the problem (similar to the approach of Asadpour, Goemans, Madry, Oveis-Gharan, and Saberi from SODA'10 on improved algorithms for asymmetric TSP). Since again the cost of this tree is upper bounded by the optimum solution, it suffices to show that the cost of its Eulerian augmentation (or T-join) is strictly less than half of the optimum. Of course the analysis is not simple at all. According to the authors:

``The first ingredient of the proof is a deeper study of random spanning trees and their corresponding generating functions. We build on a recent and very interesting study of real-stable polynomials and their applications in discrete probability. In particular, we use these techniques to bound the probability of events defined on the parity of the number of edges chosen from a certain subset to be in the random spanning tree. This is crucial for bounding the cost of the T-join.

The other essential ingredient of our proof is a more detailed analysis of the near-minimum cuts of the LP solution. We study the cross graph of (1+d)-near minimum cuts for some d < 1/100 and observe several interesting properties. In particular, we show that the cross graph corresponding to these cuts has a structure very similar to cactuses. Our analysis generalizes to all graphs and it could be of independent interest.''

Amin told me that they should post the paper in arxiv by the end of this month.

Thursday, December 16, 2010

Low, Superlow, supersuperlow sets, and Paywalls

Recall the following:

If A is a set then A' (pronounced 'A jump') is the halting set relative to A. Formally it is:
{ e | M_e^A(e) converges }
A set A is low if A' &le_T HALT.
A set A is superlow if A' &le_tt HALT.

There is a well known construction of an undecidable low c.e. (What I call c.e., computably enumerable, used to be called r.e., recursively enumerable. See Soare's article for why the change makes sense.) This is one way to get an intermediary Turing Degree. However, it turns out that the set is not just low, its superlow. Consider the following definition of supersuperlow which I made up:

A set A is supersuperlow if A' &le_btt HALT.

I wondered if there exists an undecidable supersuperlow set. I asked four prominent computability theorists (spellcheck wanted me to write computable theorists). They all (1) thought it was a good question, (2) thought the answer was NO and KNOWN, and (3) didn't have a proof or reference. I then asked Carl Jockusch and he (1) thought it was a good question, (2) knew the answer was NO and KNOWN, and (3) had both a proof and a reference.

Since they all thought it was a good question, and because papers that contain the needed results are behind paywalls or unpublished, and hence lost to humanity forever, I did an exposition, which you can get free online here where I present the following:

The classic proof that there exists an undecidable c.e. low set.
Comments on this classic proof that show how it really resulted in an undecidable c.e. superlow set.
A complete unpublished proof that if A is supersuperlow then A is decidable.

Should I add more to it and make it a real survey for a real journal?

PRO: The referees comments may help improve it.
PRO: If the journal is free online then it will be better archived.
PRO: I will get to learn all of this material.
PRO: A paper on my resume.
CON: Referees can be a pain in the neck.
CON: Math arXiv is just as good, perhaps better, than a journal for archiving (I will certainly polish it a bit and submit it to math arXiv at some point)
CON: I will have to learn all this stuff. Do I care beyond what I have already found out? I stopped doing recursion theory back when it was still called recursion theory. Now I'd rather spend my time and energy on Ramsey Theory and other things. (I could, of course, do a survey of Computable Ramsey Theory. Sample theorems by Carl Jockusch: (1) If COL is a COMPUTABLE 2-coloring of the edges of the complete graph on N then there exists a &Pi₂ homogeneous set. (2) There exists a COMPUTABLE 2-coloring of the edges of the complete graph on N such that no homogeneous set is &Sigma₂. There is one problem with that: I already did! Its part of my survey of recursive combinatorics.
CAVEAT: Is having a survey on my resume that valuable? I ask this non-rhetorically which is why this is a CAVEAT rather than a CON.

Monday, December 13, 2010

Math- Old School

In the last month we have reported on NEW RESULTS by Williams, Katz and Guth, Sanders, and Pinkerton and Setra. For a change of pace lets look at some really OLD math from a really OLD book- the Bible. (NOTE- this has nothing to do with whether the Bible is true, just that its old.)

In Genesis 18 God wants to destroy Sodom and Gomorrah. Abraham wants to argue against this. Here is a paraphrase using modern terms.

GOD: If there are 50 righteous people in Sodom then I will spare the city.

ABRAHAM: What if there are 45? 45 is pretty close to 50.

GOD: Okay. If there are 45 righteous people then I will spare the city.

ABRAHAM: What if there are 40? 40 is pretty close to 45.

GOD: Okay. If there are 40 righteous people then I will spare the city.

ABRAHAM: You know, uh, 30 is pretty close to 40.

GOD: Okay. If there are 30 righteous people then I will spare the city.

ABRAHAM: 20 is only 10 less than 30 so how about....

GOD: Okay. If there are 20 righteous people then I will spare the city.

ABRAHAM: 10 is only 10 less than 20 so how about....

GOD: Okay. If there are 10 righteous people then I will spare the city.

ABRAHAM: Good. (He stops bargaining. Perhaps he shouldn't have--- They couldn't find 10 and the city was destroyed.)

I think Abraham should have used smaller increments as he went down since going from 20 to 10 is cutting it in half which sounds like a lot. But who am I to argue--- he got God down to 10 which is impressive even though it didn't work.

This reminds me of two paradoxes:

The Small Number Paradox. All numbers are small. Proof by induction. Clearly 0 is small. If n is small then adding 1 can't make it much bigger, so n+1 is small. Hence all numbers are small.
The Sorties Paradox also called The Heap Paradox. 1,000,000 grains of sand makes a heap. If n grains of sand make a heap then so do n-1 grains of sand. Hence 0 grains of sand make a heap.

Is the passage in Genesis 18 really math? In a very primitive form I would say yes. Is there any older source for anything that resembles math?

Thursday, December 09, 2010

46 free lunches!

(CONGRADS to all the new ACM fellows. Among them are theorists Jennifer Chayes, Anne Condon, Phil Klein, S. Muthu, and Dan Spielman.)

In my post about mentoring High School Students I noted that for every $1000 my HS mentees win in math and science research competitions I get a free lunch. Two of my students, James Pinkerton and Rafael Setra entered the Team Siemens Competition and won $23,000 each! (The article says $20,000 but James and Rafael tell me it's $23,000.) Looks like I will be eating well for a while. If you visit UMCP and they are free for lunch with us then you can have a free lunch from them.

Here is the announcement of the winners. They came in third place in the team category.
Their project was on DUPLICATOR SPOILER games. (Like most games defined in math papers this game is not fun.) Recall that these games provide a way to prove certain properties are not expressible in certain languages (e.g., well-foundness for linear orderings is not first-order expressible). These games go for a FINITE number of moves that is specified before the game begins. They looked at these games when the number of moves can be an ordinal. Here is how it works: (1) at the beginning of the game there is a counter that has in the ordinal &alpha, and (2) after SPOILER makes his move he decrments the counter to some ordinal &beta < &alpha of his choice. Here is one result: for any ordinal &alpha there exists two linear orderings L₁ and L₂ such that if the game is played with these two orderings and both DUPLICATOR and SPOILER play perfectly then SPOILER wins the game in exactly &alpha moves. This is not just an existence proof- they actually say what the orderings are. They are natural in that they were not constructed just for the purpose of having this property.
Here is their paper.
The money actually goes towards tuition and other college-related expenses.
What is the secret of mine or their success? There are two things that are needed here (this applies to ugrad projects, Masters Thesis, PhD thesis, and, if properly generalized, to life):
1. Student that are hard working, intelligent, and care about the material. (James actually thinks Dup-SPOILER games are fun!)
2. A project that is doable.
Note that if a project doesn't go very far it might be that the PROJECT wasn't good or that the STUDENTS weren't good. This happens quite a bit, though ALWAYS something can be salvaged for a high school project.
I usually get to re-use projects since some students don't do a good job or give up. This was the FIRST TIME I had done the DUP-SPOILER GAMES WITH ORDINAL NUMBER OF MOVES project. Now, alas, I can't use it again.

Monday, December 06, 2010

Do Uniform Lower Bounds Matter?

From Ryan Willams' paper:

Non-uniform lower bounds establish impossibility results for computation in the physical world: it could be that P ≠ NP, yet NP-complete problems can still be efﬁciently solved using a “bloated” program with sufficiently many lines of code. Non-uniform circuit size lower bounds for NP would rule out this possibility.

The class P contains a language L where have a single fixed program that efficiently solves L for all inputs at all lengths. L is in P/poly if for every length n there is a program of size polynomial in n that efficiently solves L for all inputs of length n. The programs for two different lengths may have no relation to each other.

Karp and Lipton showed that if NP is in P/poly then the polynomial-time hierarchy collapses so we generally don't believe that NP is in P/poly. Suppose though we lived in a world where NP is in P/poly but still P ≠ NP.

In one argument, even though we study complexity in asymptotics we generally live at one input length, the size of the problems we want to solve. Once we find the program that works at this length then P = NP for us, even though P ≠ NP in general.

But the only constant over time is time itself. An hour now is the same as an hour in 1971, but the amount we can compute in that hour has grown dramatically. We don't care about fixed problem sizes, rather we care about the largest problem we can solve in a given amount of time. As our technology gets better, those input sizes also grow. For problems in P, like linear programming, the algorithms we have for smaller inputs also work on larger inputs by just increasing the computation time. However, if NP is in P/poly the code for NP-complete problems that worked a few years ago may fail miserably now as we may have to find completely different code for the larger inputs we care about now. If P ≠ NP we can't have an efficient process to find that code even though we know it exists.

Thursday, December 02, 2010

A BREAKTHROUGH result on density and 3-AP's

We use the following terminology: [n] means the set {1,...,n}. k-AP means an arithmetic progression of length k. A 3-free set is one with no 3-AP.s in it. We use the following conventions: (1)all inequalities have a big-O or big-&Omega which we do not include, and (2) we have the pointers to the papers be at the authors name when possible.

The following result was posted Oct 30, 2010 by Sanders.

For large n, for all A &sube [n], |A| &ge n(log log n)⁵/log n then A has a 3-AP.

I give the history and some misc information about the result.

In 1927 van der Waerden published the following theorem which is now known as van der Waerden's theorem:
For all k, for all c, there exists W=W(k,c) such that for all c-colorings of [W] there exists a monochromatic k-AP.
Online sources: (1) Wikipedia, (2)blog by Gilish Varma, and (3) preprint of a book by Gasarch et al .The upper bounds on W(k,c) from the original proof and the links (which are essentially the same) are BIG. In particular they are not primitive recursive. (NOTE- if you know of an online source for van der warden's original article and/or a translation into English, let me know.)
Erdos and Turan(you'll need to scroll down some) wanted an alternative proof of this theorem with smaller bounds. To this end they made the following conjectures:
1. (ER1) For all k, for all &epsilon for large enough n, for all A &sube {1,...,n} such that |A| &ge &epsilon n, A has a k-AP.
2. (ER2) Let A be a set of natural numbers. If &Sigma_{x ∈ A} 1/x diverges then A has arbitrarily long arithmetic sequences.
Both of these imply VDW's theorem. The hope was that they would provide new proofs with better bounds.
Roth showed ER1 for k=3. In fact, he showed the following: for large enough n, for all A &sube {1,...,n} such that |A| &ge n/log log n, A has a 3-AP. The proof used non-combinatorial methods. For an online proofs see either Terry Tao's blog or Alex Iosevich's notes.
Szemeredi proved ER1 for k=4 and then for general k. The proof for general k used VDW's theorem and hence did not provide better bounds for W(k,c). Szmeredi's Regularity Lemma which he developed to prove it, has found many many applications. His proof for k=4 was purely combinatorial. A scaled down version of it for k=3 was presented in Graham-Rothchild-Spencer in about 2 pages. For an online exposition of this see pages 121-130 of Gasarch et al. For an online exposition of the proof for the general case see this exposition by Tao.
Furstenberg obtained a different proof of ER1 using ergodic theory. This proof was nonconstructive and hence yielded no bounds on the VDW numbers. His technique was later used by Bergelson and Leibman to prove the poly VDW theorem and Poly HJ Theorem. Later Walters found a combinatorial proofs for both. See also the exposition by Gasarch et al.
Shelah obtained primitive recursive bounds on the VDW numbers. His proof was purely combinatorial.
Gowers proved ER1 in a way that lead to much better bounds on the VDW numbers.
A purely combinatorial proof of the Density Hales-Jewitt theorem was done by the polymath group. This is likely the easiest proof of ER1.
Green and Tao showed that the set of primes have arb large arithmetic progressions. Aside from this, there seems to have been little progress on ER2. Some people think ER2 should replace Poincare's conjectures on the list of Millennium Prizes.

That would seem to be the end of the story for now. ER1 was proven in a way that lead to better bounds on VDW numbers. But notice Roth's theorem. The condition on the density of a set that lead to a 3-AP are better than ER1. There have been improvements on Roth's theorem.

Szemeredi and Heath-Brown obtained the following result independently: There exists a constant d such that, for large enough n, for all A &sube [n], |A| &ge n/(log n)^d, A has a 3-AP. There is a nice exposition by Green (NOTE- if you know of an online source for the original papers let me know.)
Bourgain obtained the following result: for large enough n, for all A &sube [n], |A| &ge n((log log n)/log n)^0.5 A has a 3-AP.
Bourgain improved his result to: |A| &ge n/(log n)^2/3-o(1)
Sanders improved Bourgain's result to |A| &ge n/(log n)^3/4-o(1).
Sanders NEW result is that we only need |A| &ge n(log log n)⁵/log n.

OKAY- why is this important? It breaks a barrier that seemed very stubborn and it also leads to better bounds on W(3,c):

Let f be a function such that for large enough n, for all A &sube [n], |A| &ge f(n), A has a 3-AP. Assume [n] is c-colored. Some color must appear n/c times. If n/c &ge f(n) then there is a 3-AP. Hence we need n/f(n) &ge c. So if n/f(n) &ge c then W(3,c) &le n.

Roth's result yields that W(3,c) &le 2^{2^O(c)}. ( Graham and Solymosi obtained this result purely combinatorially.)
Szemeredi's and Heath-Brown result yields W(3,c) &le 2^{c^O(1)}.
Bourgain's first result yields W(3,c) &le 2^{c²log c}.
Bourgain's second result improves the exponent of c from 2 to 3/2-o(1)
Sanders's result results improve the exponent of c from 3/2-o(1) to 4/3-o(1).
Sanders NEW RESULT yields W(3,c) &le 2^{c(log c)⁵}.

The theorems above are of the form: If A is big enough then A has a 3-AP. What about the other side of the ledger: How large can 3-free sets be? There has been an empirical study for small n by Gasarch et al., an unpolished (not yet submitted) survey by Gasarch et al.. (ADDED LATER- SOMEONE POSTED A WEBSITE WHERE YOU CAN FIND ACTUAL LARGE 3-FREE SETS. here it is.) Behrend had the largest 3-free sets for about 50 years before Elkin obtained an improvement. A shorter proof of Elkin's result was given by by Green and Wolf. The current state of affairs on this was summarized in a nice blog entry by Gil Kalai.

Do these result help find lower bounds on W(3,c)? YES! Chandra-Furst-Lipton in their paper on multiparty protocols (of all things!) showed that if A &sube [n] and A is k-free then there is a coloring of [n] with nlog n/|A| colors without any k-AP's. Its an easy prob argument.

SO- how far are we from showing ER2 in the case of k=3? If Sanders NEW result could be improved to |A| &ge n/(log n)^1+&delta or even |A| &ge n/log n (log log n)^1+&delta or any function such that if you divide by n² the series converges then ER2 for k=3 would be proven. How far away is this result? A few weeks ago I would have told you that getting NEXP not in ACC⁰ was not in likely for at least 20 years. Ryan Williams proved me WRONG! Two different knowledgeable sources told me personally that the Erdos Distance Problem had reached an impasse at n^0.864. Guth and Katz proved them WRONG and me WRONG for believing them. Hence, for this one, I do not venture a guess.

What about k-AP's and k-free sets?

Gowers has shown that there is a constant c such that if |A| &ge n/(log log n)^c then A has a 4-AP. Gowers also proved that a function c(k) such that if |A| &ge n/(log log n)^c(k) then A has a k-AP. In the paper cited he takes c(k) to be 2^{-2^k+9}; however, I have heard this has been improved.
Green and Tao. have shown that there is a constant c such that if |A| &ge n/(log n)^c then A has a 4-AP. Tao's website also indicates they have improved this and a paper is in preparation.
Laba and Lacey have a construction of k-free sets which it turns out was already known.

This post has 35 distinct links (UPDATE- now its 37) and mentioned four Field medalists: Roth (1958), Bourgain (1994), Gowers (1998), and Tao (2006). These are probably both complexityblog records.

For more information on this NEW result see also a nice blog posting by Gil Kalai.