Sunday, January 05, 2020

The Wikipedia Entry on NP-Intermediary Problems lists one of mine! I'm not bragging about it.

I recently needed to look at what NP problems were possibly intermediary (neither in P nor NP-complete). So I went to Wikipedia and found this.

They had many problems, though some I had never heard of. Those that I had never heard of

should they be on the list?

That is, are they natural? That is hard to define rigorously, but I will take you through my train of thought as I read the first few:

Factoring Integers. Yes, quite possibly intermediary: If  its NPC then PH collapses, and, at least so far, does not seem to be in P.  (the NPC--> PH collapse result: We take

FACT = { (n,x) : n has a nontrivial factor ≤ x }

FACT is clearly in NP:
a complete factorization of n provides evidence that some nontrivial factor is \le x.

FACT is clearly in coNP:
a complete factorization of n provides evidence that no nontrivial factor is \le x

so if FACT is NP-complete then SAT is in coNP.

Factoring is clearly an important and well studied problem. It even has its own Wikipedia entry!

Discrete Log. Similar to Factoring. And it is also an important and well studied problem. It even has its own Wikipedia Entry!

Isomorphism Problems They list Group and Ring isomorphism. They don't list Graph, which is odd. (ADDED LATER- my bad, they do mention Graph Isom in the section on Graph Algorithms) Anyway, if Graph Isom is NPC then PH collapses, and, at least so far, there is no algorithm for Graph Isom in P. (I do not think it is know if Group Isom NPC means PH collapses, or if Ring Isom NPC means PH collapses---if you know of such a proof leave a comment and a pointer to it.)

Graph Isomorphism is a well studied problem and seems important and natural (I don't know if Graph Isomorphism has any real applications they way that factoring and DL do).  It even has its own Wikipedia entry! Group and Ring Isomorphism also seem important and natural. And they have their own Wikipedia entry!

Numbers in Boxes Problem My first reaction-Gee, whats that? For the Factoring, DL, and Isomorphism they did not define the problem-- they gave pointers to the Wikipedia entries on them. For this one there was no Wikipedia entry. There was one reference. I went to it. It was a blog entry of mine! Here it is: here, and to save you time I'll say what it is:

{ (1n,1k) : you can partition 1,...,n into k boxes so that no box has x,y,z with x + y = z }

Is this problem important? Does it exist anywhere outside of my blog entry? Yes--- a special case of it was in Dr. Ecco's Cyperpuzzles by Dennis Shasha (note- Dennis was a classmate of mine in graduate school at Harvard). I think the case was to try to partition {1,...,100} as best you can. Actually I first saw the case of the problem in his book and then generalized it.

The problem is sparse so if it was NP-complete then P = NP, very good evidence that its not NPC. And its been studied for thousands of years, with people looking for poly time algorithms (I think Pythagoras studied it) without success, so its almost surely not in P. OR that last sentence was complete nonsense. Indeed, I don't think anyone has studied the problem computationally, or, for that matter, at all. So the evidence that its not in P is... sparse.

But its worse than that. One could devise MANY sparse problems that are, since spares, likely NOT NPC, and hardly studied, so as-of-now, not in P. Should those count? Only if (a) more people study them so there is an attempt to point to to get it into P, and (b) the problem is natural (which is hard to define).

Note that I can vary the problem: x+2y=z (this relates to lower bounds on VDW numbers)
or any other combination of x,y,z or more that I like.

This raises a question:

When is a problem worthy of being put on lists of problems?

Here are some possibly criteria. One can take ANDS and ORS of them.

1) The problem has a Wikipedia entry. This might fall victim to Goodhearts law: when a measure becomes a target, it ceases to be a measure.  That is, I could make a Wikipedia entry on the Number-in-boxes problem and then say LOOK, its on Wikipedia!

2) More than X people have worked on the problem for some value of X. But here is a reason this might not be a good criteria: look at the problem

{ α : α is a reg expression that allows numbers (so a1000 is fine, makes reg expressions  VERY succint) such that L(α)=Σ* }

This problem looks natural, and was proven by Meyer and Stockmeyer to be EXPSPACE complete.
That is the only paper on this problem, yet the problem really does look natural, and the result is rightly celebrated as a natural problem that is provably not in P.

3) When people in the field look at the problem they say YEAH, thats a good problem.

4) The problem relates to other problems or other fields.

I doubt the Number-in-boxes problem satisfies any of these criteria. The variant with x+2y=z relates to Ramsey Theory. Great.

NOW, back to the list-- I won't go through any more on the list, but I note that for some of them the only reference seems to be a conversation on stack-exchange.  Some of those end up referring to real papers so are more likely natural, but some do not.

Having said that, is there any harm in the list having on it some problems that are not ... worthy? Is that even the right word to use?

Note that I don't have strong opinions on any of these matters, I am just wondering what criteria Wikipedia, and other sources, uses, when they have lists of problems.



  2. I think Wikipedia's criteria mainly consist of whether an editor ever bothered to add a problem to the list. I've taken an interest in related problems that look like good candidates for the list:

    Nash equilibrium computation, which is likely to be harder than winner determination in parity games (which is on the list despite being at severe risk of having a poly-time algorithm, since it's got a quasi-poly-time algorithm). NASH has no known subexponential algorithm and is still has the guarantee of not being NP-hard unless NP equals co-NP.

    Indeed, any seemingly-hard "total search" problem should qualify for the same reason. The wikipedia page on TFNP has a nice explanation (I did not write it!) FACTORING is arguably the most important such problem.

    The necklace-splitting problem is a strong candidate, since it's a total search problem with a wikipedia page and is at least as hard as FACTORING (unlike NASH, which has not been successfully compared with FACTORING).

    Ladner's theorem still wins w.r.t. its criterion for "NP-intermediate" - it just requires P not equals NP, but the above total search problems need NP not equal to co-NP. In turn, the above problems come out ahead of graph isomorphism - as you've noted its NP-hardness merely collapses PH to the second level.

  3. Weighted graph isomorphism is equivalent to matrix permutation problem. Mathematicians have not been able to solve this problem for more than 100 years. (When were matrices invented?)

    It is a fallacy to think that mathematicians were not interested in fast algorithm before the dawn of computers. In fact, fast algorithms were more important before computers since people had to calculate everything on paper. I recall reading somewhere that before Reimann formulated his now famous conjecture, he invented a fast approach for calculating the zeros of the zeta function. Can somebody elaborate on this point? I am not an expert.

  4. Graph isomorphism problem is in the Graph algorithm section