Thursday, April 17, 2014

Should you reveal a P = NP algorithm?

A reader asks
What would you do if you could prove that P=NP with a time-complexity of n2 or better... moreover, would you publish it?  
There are all these statements of the good that could come of it. But how would the government react in its present state? Would it ever see the light of day? How would a person be treated if they just gave it away on the internet? Could a person be labeled a threat to national security for giving it away?
 I consider this a completely hypothetical and unlikely scenario. If you think this applies to you, make sure you truly have a working algorithm. Code it up and mint yourself some bitcoins, but not enough to notice. If you can't use your algorithm to mint a bitcoin, you don't have a working algorithm.

The next step is up to you. I believe that the positives of P v NP, like their use in curing diseases for example, greatly outweigh the negatives. I would first warn the Internet companies (like was done for heartbleed) so they can modify their systems. Then I would just publish the algorithm. Once the genie is out of the bottle everyone can use it and the government wouldn't be able to hide it.

If you can find an algorithm so can others so you should just take the credit or someone else will discover it. I don't see how one can get into trouble for revealing an algorithm you created. But you shouldn't take legal advice from this blog.

Once again though no one will take you seriously unless you really have a working algorithm. If you just tell Google you have an algorithm for NP-complete problem they will just ignore you. If you hand them their private keys then they will listen.

Sunday, April 13, 2014

Factorization in coNP- in other domains?

I had on an exam in my grad complexity course to show that the following set is in coNP

FACT = { (n,m) : there is a factor y of n with 2 \le y \le m }

The answer I was looking for was to write FACTbar (the complement) as

FACTbar = { (n,m) | (\exists p_1,...,p_L) where L \le log n
for all i \le L we have m < p_i \le n and p_i is prime (the p_i are not necc distinct)
n =p_1 p_2 ... p_L
INTUITION: Find the unique factorization and note that the none of the primes are < m
To prove this work you seem to need to use the Unique Factorization theorem and you need
that PRIMES is in NP (the fact that its in P does not help).

A student who I will call Jesse (since that is his name) didn't think to complement the set  so instead he wrote the following CORRECT answer

FACT = { (n,m) | n is NOT PRIME and forall p_1,p_2,...,p_L  where 2\le L\le log n
for all i \le L,  m< p_i \le n-1 , (p_i prime but not necc distinct).
n \ne p_1 p_2 ... p_L
(I doubt this proof that FACT is in coNP is new.)
INTUITION: show that all possible ways to multiply together numbers larger than m do not yield n,
hence n must have a factor \le m.

Here is what strikes me- Jesse's proof does not seem to use Unique Factorization.  Hence it can be used in other domains(?). Even those that do not have Unique Factorization (e.g. Z[\sqrt{-5}]. Let D= Z[\alpha_1,...,\alpha_k] where the alpha_i are algebraic. If n\in D then let N(n) be the absolute value of the sum of the coefficients (we might want to use the product of n with all of its conjugates instead, but lets not for now).

FACT = { (n,m) : n\in D, m\in NATURALS, there is a factor y in D of n with 2 \le N(y) \le m}

Is this in NP? Not obvious (to me) --- how many such y's are there.

Is this the set we care about? That is, if we knew this set is in P would factoring be in P? Not obv (to me).

I suspect FACT is in NP, though perhaps with a diff definition of N( ). What about FACTbar?
I think Jesse's approach works there, though might need  diff bound then log L.

I am (CLEARLY) not an expert here and I suspect a lot of this is known, so my real point is
that a students diff answer then you had in mind can be inspiring. And in fact I am inspired to
read Factorization: Unique and Otherwise by Weintraub which is one of many books I've been
meaning to read for a while.

Wednesday, April 09, 2014

Favorite Theorems: Extended Formulations

The first couple of favorite theorems took place at the beginning of the last decade, but recent years have brought exciting results as well, such as the limitations of using linear programs to solve hard problems.
It all started with a paper by E.R. "Ted" Swart (the Deolalikar of the 80's) that claimed to show P = NP by giving a polynomial-time algorithm for Traveling Salesman based on linear programming. Yannakakis put the nail in that technique by showing that no symmetric linear program formulation can solve the traveling salesman problem. The TSP problem expressed as a linear program has many facets (multi-dimensional faces) but one could possibly have a higher dimensional polytope with a smaller number of facets that could project down to the TSP polytope. Yannakakis showed that these higher dimensional symmetric polytopes must still have many facets by showing an equivalence between the smallest number of facets and the monotone dimension of a slack matrix of the polytope describing the TSP.

Not much happened since the 80's until Fiorini et al. generalized the Yannakakis result to give exponential lower bounds on the number of facets of general non-symmetric extensions of the polytope for traveling salesman. They combine Yannakakis' techniques with some communication lower bounds due to Alexander Razborov. Their approach was inspired by quantum computing ideas but in the end they had a more conventional proof.

Last fall Rothvoß showed even stronger exponential lower bounds for the matching polytope through a "substantial modification" of Razborov's work. Since matching is in P, Rothvoß showed that there are polynomial-time computable problems that can be formulated, but not succinctly formulated, as linear programs.

There has also been recent work showing lower bounds approximating clique via linear programs.

Ankur Moitra gave an excellent talk at the recent Dagstuhl giving an overview of these results and the techniques involved. The next step: Show lower bounds for semidefinite programs.

Monday, April 07, 2014

How free is Randomness?

Matthew Green had a great post on the topic how do you know a random number generator is working.  Gee, I just look at the sequence and see if it LOOKS random. Probably not the best idea. Actually people often found pattersn where there aren't any.

Reminds me of a the following apocryphal story which would have to have taken place before everyone had a computer: A teacher assigns the class a HW assignment to flip a coin 600 times and record H's and T's. He warns them that if they just try to come up with a random sequence without flipping he will know. Sure enough, a few of them had sequences with NO runs of 6 or more H's or T's. He knows they are faked. One of the students fessed up that yes, he just wrote down H's and T's in a way that looked random. But another student claims that he DID flip the coins, but when he saw a sequence of 6 H's in a row he changed it since he THOUGHT the teacher would spot it and THINK it wasn't random.

Is randomness a free resource? Papers in Complexity Theory strive to find good RNG's while papers in Crypto claim they already exist. Who is right?

Faulty RNG's are surely a problem for crypto protocols. But are they a problem for randomized algorithms? People really do use randomized algorithms to test primes. What other algorithms do people in the real world routinely use randomness for? (Not counting crypto protocols).Is it ever a problem?

I believe there are stories where a faulty RNG led to a crypto system being broken.

Are there stories where a faulty RNG led to a randomized algorithm being wrong?

Friday, April 04, 2014

How I learn a result

There is a theorem I want to learn. How do I go about it? (How do YOU go about it?) I give an example here which also leads to  pointers to some mathematics of interest.

A division of a cake between n people is PROPORTIONAL (henceforth prop.)
if everyone thinks they got (IN THEIR OWN MEASURE- AND PEOPLE CAN DIFFER WILDLY) at least 1/n. A division of a cake is ENVY FREE if everyone thinks they got the biggest (or tied) piece.

There are two kinds of protocols- Discrete and Moving Knife. I discuss Discrete here.

I had read several books on fair division (see here for a review) and had some interest in the topic. In particular (1) I knew the (fairly easy) discrete 3-person Envy Free protocol, but did not know (2) the rather complicated discrete 4-person Envy Free Protocol. And I wanted to learn it! How to do I do that (this isn't quite advice on how YOU should do it, but I am curious what you think of what I do AND what you do.)

  1. Find a good source for it. I used the original article from the American Math Monthly. It is NOT always the case that the original source is the best (I discussed this in another blog post.)
  2. Create a REASON to learn it with a deadline. I both agreed to teach it in seminar AND I volunteered to do an HONORS COURSE (interdisciplinary) on Fair Division (nickname: Cake Cutting). The first time I taught the course I didn't get to the theorem (getting my feet wet in both HONORS courses and the material was a limiting factor) but I have in subsequent years.
  3. While reading make HANDWRITTEN notes for the lecture I am going to give. I try  to understand everything that I  write down before  going on, but sometimes, I have to go forward and then backward. I find myself redoing parts of the notes many times. The notes  have many examples, pictures (which is why I don't type them initially), and counterexamples to check that the premises are needed. I check EVERY detail. What I present will  be less detailed.
  4. Type in the notes. (mine for 4-envyfree are here ) For me, somehow, being forced to type it  in I find corrections or better ways of saying or doing things. Also this produces notes for students or others. This MIGHT over time lead to survey articles or even textbooks but I DO NOT think in those terms. The notes that are produced are VERY GOOD FOR THE STUDENTS IN THE CLASS, or THE PEOPLE IN SEMINAR but NOT all that good for anyone else. (This may be why some textbooks are crap--- it is hard to turn class-specific notes into something anyone can use.)
  5. TOSS OUT the handwritten notes. Knowing you will do this makes the typewritten version better. Also its hard to keep track of where one puts handwritten notes. I used to keep a math notebook but its hard to find anything in it.
  6. In subsequent years when I teach the material I take my typewritten notes and make handwritten notes out of them. This force me to refresh on the material and I teach better if I can glance at notes that use large letters and symbols. If I do the course often enough I no longer need to. When I taught the Fair Division course in Fall 2013 I could do this proof off the top of my head.
There are some variants, Pros, and Cons of the above system:

  1.  If the material is easy I may go straight to type and skip handwritten. This was the case for protocols for proportional  cake cutting.
  2. I sometimes translate my notes, NOT to typed notes but to slides. There are some theorems whose ONLY writeup are my slides. I may or may not ever get them into notes  (that's a tautology).  I'll mention two but note that, even more than notes, slides are not really a good read unless you are taking the class. They are (1)  An adaption of Mileti's proof of the infinite Canonical Ramsey Theory to the finite case- it gives pretty good bounds, though not the best known. Very good for teaching, so I may submit to Journal of  Pedagogical Ramsey Theory.  I am sure these slides contain a correct proof since I ran them by Mileti himself., who was surprised and delighted  that someone got around to finitizing his proofs. The slides are here and here.(2) Using linear programming on some cake cutting problems. I doubt this is new, though I could not find it in the literature. I just did it for my class. Slides are here.
  3. If my GOAL is stuff for my class I may end up giving up. For example, I sort-of want to read the LOWER BOUND of Omega(nlogn) on number-of-cuts for n people, on prop. cake cutting due to Jeff Edmonds and Kirk Pruhs (this is Jeff Edmonds paper page). But the proof looks just a bit too hard for my students (recall- they are NOT math people, they are BRIGHT students from across majors). Knowing that I won't get to teach it to them I didn't quite have the motivation to look at it. I probably will someday--- for seminar.
  4. If its not just one paper but an entire area of knowledge I try to find all the papers in the area and make a website out of them. This is easier in a limited area. My Erdos Distance Problem Website has only 10 papers on it (its prob missing a few), where as my Applications of Ramsey Theory to TCS websiteApps  has 63 papers on it (I am sure its missing many). 
  5. One can get carried away with GATHERING papers as opposed to READING them.
What if you want to read a result, you begin, and its rather hard and/or you need a lot more background? There are several competing schools of thought:

  1. If at first you don't succeed, Try Try Again (credited to  William  Edward   Hickson).
  2. If at first you don't succeed, give up, Then Try Try again. Then quit. There's no point in being a damn fool about it  (credited to W.C. Fields)
  3. If at first you don't succeed then destroy all evidence that you tried.
  4. If at first you don't succeed, skydiving is not for you.
The question of when to hold 'em, when to fold 'em, when to walk away, and when to run permeates life and mathematics as well as gambling.

      Tuesday, April 01, 2014

      I am Bill Gasarch

      I have a confession to make. Remember back in 2007 when I retired from the blog for about a year. I didn't actually retire at all but instead took the blog in a new direction by writing under the pseudonym William Ian “Bill” Gasarch, to be more playful and irreverent. All I had to do was be a little off the wall, add a FEW CAPITAL LETTERS and some mispellings and voila, same blog, new blogger. I’m especially proud of the fake conversations “Bill” had with his “relatives”. 

      Now that I have come out, I’d like to thank Samir Khuller to help me set up a website off  Maryland’s CS page. The pictures of Bill are a friend of mine, a web programming consultant. Most of the rest of the links were plagiarized from the far corners of the Internet except for the God Play--wrote that one myself. 

      In 2008, I decided to return to the blog as myself and also continue as Bill. Great fun switching context between the two personalities. I particularly enjoyed doing the typecasts with the two personalities talking to each other.

      I have to admit Dick has done me one better, with his teenage-chess-genius-turned-complexity-theorist alter ego Ken Regan.

      I’m surprised how few people had caught on, after all how many of you have actually met Bill in person? But now that I have a respectable job, I realized it just wasn't right for me to keep fooling the community.

      Thursday, March 27, 2014

      Should We Have a TCS Ethics Board?

      We sometimes hear of the (rare) scientist who fakes data or results of experiments. In theoretical computer science one cannot fake a theorem, particularly an important result that will attract close scrutiny before publication in a conference or journal. But that doesn't mean we don't have academic improprieties from outright plagiarism to heated arguments over who should receive credit for a result.

      If these issues involve submissions to a particular conference, journal or grant they generally get resolved through the program committee chair, editor-in-chief or program officer. But often these problems go beyond a particular venue.

      What if we had a TCS Ethics Board composed of a few of the most trusted members of our community? For example, if two people argue whether or not they proved the same result independently, the board could first try to come to a mutually acceptable resolution and if that fails, make an independent assessment that could be used by PC chairs and EiCs.

      For more egregious cases of intentional plagiarism and/or theft of ideas, the board could write a stern letter that would go the perpetrator's supervisor and possibly recommend banning that person from publishing in various TCS journal and conferences for a specified period of time.

      The vast majority of TCS researchers are quite honest and to a fault share credit for their ideas, but every now and then some researchers, probably not even realizing they are acting unethically, create an atmosphere of distrust with their actions. An ethics board would show that we care about proper academic behavior and giving a confidential forum where people can address their grievances and hopefully resolve issues before, as had happened, driving people out of our field.  

      Sunday, March 23, 2014

      The answer is either 0,1, or on the board

      I have heard (and later told people) that the in a math course if you don't know the answer you should guess either 0 or 1 or something on the board. This works quite often.

      I have heard that in a course on history of theater you should guess either
      the theater burned down  or prostitution.  For example, the first musical
      was The Black Crook and it happened because of a fire (see the pointer).

      In upper level cell biology the guess is If only we could solve the membrane problem.

      In a math talk you can always ask is the converse true? or Didn't Gauss prove that?

      In computer science when someone asks me about a problem I say  Its probably NP-complete.

      In Christian Bible Study a good answer is either Salvation or Jesus. These are referred to as Sunday school answers.

      If you know what the usual things to say in other fields is, please comment.