Monday, February 08, 2016

The Moral Hazard of Avoiding Complexity Assumptions

Moshe Vardi's CACM editor letter The Moral Hazard of Complexity-Theoretic Assumptions practically begs a response from this blog. I also encourage you to read the discussion between Moshe and Piotr Indyk and a twitter discussion between Moshe and Ryan Williams.

I take issue mostly with the title of Vardi's letter. Unfortunately we don't have the tools to prove strong unconditional lower bounds on solving problems. In computational complexity we rely on hardness assumptions, like P ≠ NP, to show that various problems are difficult to solve. Some of these assumptions, like the strong exponential-time hypothesis, the Unique Games Conjecture, the circuits lower bounds needed for full derandomization, are quite strong and we can't be completely confident that these assumptions are true. Nevertheless computer scientists will not likely disprove these assumptions in the near future so they do point to the extreme hardness of solving problems like getting a better than quadratic lower bound for edit distance or a better than 2-ε approximation for vertex cover.

If you read Vardi's letter, he doesn't disagree with the above paragraph. His piece focuses instead on the press that oversells theory results, claiming the efficiency of Babai's new factoring algorithm or the impossibility of improving edit distance. A science writer friend once told me that scientists always want an article to be fully and technically correct, but he doesn't write for scientists, he writes for the readers who want to be excited by science. Scientists rarely mislead the press, and we shouldn't, but do we really want to force science writers to downplay the story? These stories might not be completely accurate but if we can get the public interested in theoretical computer science we all win. To paraphrase Oscar Wilde, as I tweeted, the only thing worse than the press talking about theoretical computer science results is the press not talking about theoretical computer science results.

Thursday, February 04, 2016

Go Google Go

In 2009 I posted about a surprising new approach that moved computer Go from programs that lose to beginners to where it could beat good amateurs. That approach, now called Monte Carlo Tree Search, involves evaluating a position using random game play and doing a bounded-depth tree search to maximize the evaluation.

Google last week announced AlphaGo, a program that uses ML techniques to optimize MCTS. This program beat the European Go champion five games to none, a huge advance over beating amateurs. In March AlphaGo will play the world Go champion, Lee Sedol, in a five game match. The world will follow the match closely (on YouTube naturally). For now we should keep our expectations in check, Deep Blue failed to beat Kasparov in its first attempt.

Google researchers describe AlphaGo in detail in a readable Nature article. To oversimplify they train deep neural nets to learn two functions, the probability of a win from a given position, and the probability distribution used to choose the next move in the random game play in MCTS. First they train with supervised learning based on historical game data between expert players and then reinforcement learning by basically having the program play itself. AlphaGo uses these functions to guide the Monte Carlo Tree Search.

AlphaGo differs quite a bit from chess algorithms.
  • AlphaGo uses no built-in strategy for Go. The same approach could be used for most other two player games. I guess this approach would fail miserably for Chess but I would love to see it tried.
  • Machine learning has the nice property that you can train offline slowly and then apply the resulting neural nets quickly during gameplay. While we can refine the chess algorithms offline but all the computation generally happens during the game.
  • If a computer chess program makes a surprising move, good or bad, one can work through the code and figure out why the program made that particular move. If AlphaGo makes a surprising move, we'll have no clue why.
  • I wonder if the same applies to human play. A chess player can explain the reasoning behind a particular move. Can Go players do the same or do great Go players rely more on intuition?
Machine learning applications like AlphaGo seemingly tackle difficult computational problems with virtually no built-in domain knowledge. Except for generating the game data for the supervised learning, humans play little role in how AlphaGo decides what to do. ML uses the same techniques to translate languages, to weed out spam, to set the temperature in my house, and in the near future to drive my car. Will they prove our theorems? Time will tell. 

Monday, February 01, 2016

Math questions that come out of the Iowa Caucus

Link for info I refer to here

Jim Gilmore, republican, has 0% of the vote. Does that mean that literally NOBODY voted for him?

Hillary beat Bernie , but BOTH get 21 delegates. I hardly call that a win. I call that a tie.

Why do we refer to Hillary and Bernie by their first names, but most of the republicans by their last name. The only exception is Jeb! who we call Jeb to dist from his brother. Also, I think he wants to play down his relation to his brother. Not that it will matter.

Cruz/Trump/Rubio will get 8,7,6 delegates.(This might change but not by a lot). I'd call that a tie. Later states will be winner-take-all which make no sense in a race with this many people (though it may go down soon-- Huckabee has already suspended his campaign which seems like an odd way of saying I quit). But in winner-take-all states there will be real winners.  Why are there winner-take-all states? It was a deal made so that those states wouldn't move their primaries up.

This is a terrible way to pick a president. I don't mean democracy which is fine, I mean the confusing combination of Caucus's and Primaries, with some states winner-take-all, some by proportion, and Iowa and NH having... more power than they should.  This was NOT a planned system it just evolved that way. But its hard to change.

If you are a registered Republican  but want Hillary to win then do you (1) vote for the republican you like the best, or (2) vote for the Republican that Hillary can most easily beat. The problem with (2) is that you could end up with President Trump.

Stat analysis of polls and looking at past trends have their limits for two reasons:

1) The amount of data is small. The modern primary system has only been in place since 1972.  Some nominations are incumbents which are very different from a free-for-all. The only times both parties had free-for-alls were 1988, 2000, 2008, and 2016.

2) Whatever trends you do find, even if they are long term (e.g.,  the tallest candidate wins) might just change. The old Machine Learning warning: Trends hold until they don't.

Most sciences get BETTER over time. Polling is a mixed bag. On the one hand, using modern technology you can poll more people. On the other hand, people have so many diff ways to contact them that its hard to know what to do. For example, its no longer the case that everyone has a landline.

Is this headline a satire?:here

AI project- write a program that tells political satire from political fact. Might be hard.

My wife pointed out that

Hillary WINS since she didn't lose!

Bernie WINS since an insurgent who TIES the favorite is a win

Cruz WINS since... well, he actually DID win

Trump WINS since his support is mostly real and he didn't collapse. And he was surprisingly gracious in his concession speech. (This is the weakest `he WINS' argument on this list)

Rubio might be the BIG WINNER since he did way better than expected. Thats a rather odd criteria and it makes people want to set their expectations low.

SO--- like a little league game where they all tried hard, THE'RE ALL WINNERS!

 The American Public--- not so much.

Thursday, January 28, 2016

We Still Can't Beat Relativization

As we celebrate our successes in computational complexity here's a sobering fact: We have had no new non-relativizing techniques in the last 25 years.

A little background: In 1975, a few years after Cook established the P v NP problem, Baker, Gill and Solovay created two sets A and B such that every NP machine that could ask questions about A could be simulated by a P machine that could ask questions to A and there is some language accepted by an NP machine that could ask questions to B that cannot be solved by a P machine that could ask questions to B. In other words PA = NPA but PB ≠ NPB.

All the known proof techniques at the time relativized, i.e., if you proved two classes were the same or different, they would be the same of different relative to any set. BGS implied that these techniques could not settle the P v NP problem.

In the 80's and 90's we got very good at creating these relativized worlds and, with a couple of exceptions, we created relativized worlds that make nearly all the open questions of complexity classes between P and PSPACE both true and false.

There were some results in the that were non-relativizing for technical uninteresting reasons. In the late 80s/early 90s we had some progress, results like NP having zero-knowledge proofs, co-NP (and later PSPACE) having interactive proofs and NEXP having (exponential-sized) probabilistically checkable proofs, despite relativized worlds making those statements false. But then it stopped. We had a few more nonrelativizing results but those just used the results above, not any new techniques.

In 1994 I wrote on survey on what relativization meant post-interactive proofs, still mostly up to date. We have seen new barriers put up such as natural proofs and algebrization, but until we can at least get past traditional relativization we just cannot make much more progress with complexity classes.

Sunday, January 24, 2016

When do we stop giving the original reference?

I was preparing  a talk which included my result that there is a sane reduction 4-COL \le 3-COL (the paper is here) when I  realized that if I am going to claim the reduction is by Gasarch then I should find out and credit the person who proved 3-COL NP-complete (I assume that the result 3-COL \le 4-COL is too trivial to have an author).  I could not find the ref on the web (I am sure that one can find it on the web, but a cursory glance did not yield it).  The reference was not in Goldreich's complexity book, nor the CLR Algorithms book. I suspect its not in most recent books.

The reference is Stockmeyer, SIGACT NEWS 1973,

Few modern paper references Cook or Levin's original papers for the Cook-Levin Theorem. I've even heard thats a sign its a crank paper.

Few modern papers reference Ramsey's original paper for Ramsey's Theorem.

But the questions arises--- when is a result such common knowledge that a ref is no longer needed?

A result can also go through a phase where the reference is to a book that contains it, rather than the original paper.

On the one hand, one SHOULD ref the original so that people know who did it and what year it was from. On the other hand there has to be a limit to this or else we would all be refering to Euclid and Pythagoras.

Where does Stockmeyer's result fall? I do not know; however, I've noticed that I didn't reference him, and I will correct that.

Thursday, January 21, 2016

The Growing Academic Divide

The decreases of the past three years bring the number of advertised jobs to a new low, below the level reached after the severe drop between 2007–08 and 2009–10. 
So reads the Report on the Modern Language Association Job Information List. The MLA is the main scholarly organization for the humanities in the US. The bright spot--we aren't Japan.

Meanwhile in computer science we continue to see large enrollment increases in major, minors and students just wanting to take computer science courses. In the upcoming CRA Snowbird Conference "a major focus of the conference will be booming enrollments, with a short plenary followed by parallel sessions devoted to the topic, its various ramifications, and ideas to help you deal with it, including best practices for managing growth."

The 2015 November CRA News had 83 pages of faculty job ads, up from 75 in 2014 and 34 in 2012. This doesn't even count that fact that many departments are looking to hire two, three, four or more positions in CS. It will be an interesting job market this spring.

All of this is driven by jobs. We can't produce enough strong computer scientists to fill industry demand. And it's becoming increasingly hard for a humanities major to get a good first job.

It's nice to be on the side of growth but it's a shame that faculty hiring seems to be a zero-sum game. We need poets as well as nerds. We've help create a world where we have made ourselves indispensable but is this a world we really want to live in?

Monday, January 18, 2016

Is it okay to praise an article or book in an article or book?

I recently had a paper accepted (YEAH!). The referees had some good corrections and one that puzzled me.

you wrote ``our proof is similar to the one in the wonderful book by Wilf on generationg functions [ref]''. You should not call a book wonderful as that is subjective. You can say it's well known.

I asked a pretension of professors about this. Is it okay to praise an article or book? Is it okay to state an opinon? Would the following be acceptable:

1) In Cook's groundbreaking paper SAT was shown to be NP-complete. It IS grounbreaking, so maybe thats okay.

2) Ramsey's paper, while brilliant, is hard for the modern reader to comprehend. Hence we give an exposition. If I am writing an exposition then I might need to say why the original is not good to read so this is informative.

3) Ryan Williams proved an important lower bound in [ref]. Is this okay to write? For most people yes, but NOT if  you are  Ryan Williams. (He never wrote such.)

4) William Gasarch proved an unimportant lower bound in [ref]. Is this okay to write ? Only if you ARE William Gasarch (He never wrote such).

The version that will be in a journal will indeed NOT call Wilf's book wonderful. The version on arXiv which will be far more read (not behind a paywall) will call Wilf's book wonderful.

Thursday, January 14, 2016

A limit on the DFA-CFG divide for finite sets

It is easy to see that for Ln =  {a,b}*a{a,b}2n

There IS a CFG of size O(n)

ANY DFA is of size double-exp-in-n

I was wondering if we can increase this gap. That is, a statment like: For all but a finite number of n there exists a lang Ln such that

There IS a CFG of size O(n)

ANY DFA is of size triple-exp-in-n.

I also looked at finite sets. For COMPLIMENT of { ww : |w|=2n }

There IS a CFG of size O(n)

ANY DFA (in fact any DPDA) is of size double-exp-in-n.

This is stated in my paper here though the real interesting math needed to get it are in here where they get lower bounds on the size of CFG's, generalizing techniques from here.

SO my question still stands- can we get a triple exp separation? I have shown that this CANNOT be achieved with finite sets:

THEOREM: If L is a finite lang then there exists n such that (1) Any CFG in Chomsky Normal Form for L has to be of size \ge log n, AND (2) there IS a DFA of size 2n.

Proof:  Let n be such that the longest string in L if of length n.  In order for a Chomsky Normal Form grammar to generate this string it needs \ge log n nonterminals. Since the lang has at most 2n
strings in it, there is a DFA of size 2n for it.
End of proof.

So I have shown that one approach won't work. I am hoping that YOU know an approach to get
a trip-exp-sep and leave a comment about it.