Sunday, February 16, 2020

Pre-(Publish and Perish)

Guest post by Evangelos Georgiadis

Quite a few posts have recently focused on papers,publications and venues; "optimal" venues for papers under different objective functions,e.g. minimizing carbon footprint while maximizing community building, networking as well as information sharing, see Moshe Vardi.

Here we would like to take a closer look at one of the key assumptions -- the paper. In order to generate a paper, one needs to come up with a result, something novel, fresh or interesting to say. The question that has baffled this author is what represents a conducive or perhaps even optimal setting for generating papers. Since papers come in different flavors ranging from "solid technical papers to risky innovative ones" the settings may vary; but ultimately, what would be interesting to investigate (or for that matter crowdsource) is whether there is a common denominator in terms of setting or environment, a necessary but not sufficient condition (so to speak).

Here are some accounts of others which may be helpful as reference points.

Knuth's papers entitled "Semantics of context free grammar" along with "The analysis of algorithms" represent two instances that suggest research institutes might not provide an optimal environment for idea generation.

As Knuth points out in "Selected Papers on Computer Languages" (Chapter 18, p. 431):
Perhaps new ideas emerge most often from hectic, disorganized activity, when a great many sources of stimulation are present at once -- when numerous deadlines need to be met, and when other miscellaneous activities like child-rearing are also mixed into the agenda.
Knuth goes on to say, that it was challenging to do creative work in office and that finding a few hideaways provided some form of solution -- aka sitting under 'that' oak tree near Lake Lagunita. That said, the inspirational setting for getting into the zone for the aforementioned two papers were provided by (Californian) beaches. Hold that observation. Is this not something we have come across somewhere else ? Fields medalist Stephen Smale in "Chaos: Finding a Horseshoe on the Beaches of Rio" suggests that some of his best work happened at his "beach office". Whether beaches do provide for a good setting remains to be shown; perhaps for very innovative ideas, oceanic freedom is necessary. That said, the author recalls (hopefully accurately enough) an account by the young James H Simons, who attended a conference in Japan in the early days. Instead of choosing a spacious accommodation (which he was able to afford), he restricted himself to the typically confined room type -- not only confined by space, but also pressured by time, young Simons was able to generate an interesting result for that conference. (This probably demonstrates that technical results don't necessarily require 'oceanic freedom'.)

Some meaningful probabilistic advice comes from the fat-tails department, in "The Black Swan" by Nassim Taleb (on page 209) : "Go to parties! If you're a scientist, you will chance upon a remark that might spark a new research. "

Murray Gell-Mann provides an interesting collective account in his Google Tech Talk entitled "On Getting Creative Ideas." He recollects a workshop he attended in 1969 in Aspen that focused on the experience of getting creative ideas, not just among mathematicians and theoretical physicists but also poets and artists. This account seems to neglect the actual setting that might nurture creative thought process, but provides interesting references to people such as Hermann von Helmholtz, who happened to have thought about this topic and partitioned the process in terms of "saturation, incubation and illumination".

For those interested in an account that focuses on the Eureka moments of exclusively mathematicians/theoretical physicists see Jacques Hadamard's book "The Mathematician's Mind". Hadamard iterated on Helmholtz's 3 stage process and it's worth taking a look at what he came up.

At last, what are good venues or workshops for generating papers ? Or let's rephrase that a bit, what type of atmosphere at venues fosters creativity -- what food for thought to provide participants and how to distribute that food for thought over a given day ? Ryan R Williams proposed (as practiced by 34th Bellairs Winter Workshop on Computational Geometry) "... easy problems, informal atmosphere focusing exclusively on thinking about problems in a cycle of down-time where one meets in two intense sessions and have free time otherwise." (This type of setting seems to resonate with the 3 stages of "saturation, incubation and illumination".)

That said, most workshops including the Simons workshops don't seem to follow such a recipe. They are more geared towards the follow-up step, namely, communicating what people have found, rather than collaborating with them to tackle open problems. Perhaps some re-evaluation might be required in how workshops are run.

Thursday, January 23, 2020

The World of Publishing

Bill is out for blogging for a couple of weeks on injured-reserve (he’ll be fine). I put together a quick blog post on what’s happening in the world of publications.

The Trump administration has suggested requiring publishers to make all papers based on US federally-funded publicly available immediately instead of after one year. The Association of American Publishers sent an open letter--how do we maintain the organizations and the people who work there if we give up a major revenue source. The ACM joined the letter which caused quite a backlash forcing the ACM to explain itself, write another letter, and run some webinars about open access the last of which is tomorrow. In the end, this is leading to some good discussions about open access and the financial models of academic societies.

The ACM also has a new policy, three options for what happens when an author changes their name: Maintain separate identities, have the two identities link to each other, or retroactively change the name on all previous papers. I can see good reasons for all three options.

Finally Moshe Vardi writes in his CACM column about the ecological cost of conferences and suggests that conferences allow authors to (video)phone it in. Emmanuel Viola offers his own thoughts. Most Conferences will continue to require authors to show up, with only occasional exceptions as needed, believing these policies will keep their conference healthy.

Personally I believe conferences should exist because researchers want to attend, not because they have to. We still need conferences so our community can get together and I don’t believe we can do that via the Internet no matter how good the VR experience gets. But we can have more videos and less conferences and reduce the costs: time, financial and environmental.

Tuesday, January 14, 2020

Quantum Provers to Infinity and Beyond

The Internets are buzzing about the new paper MIP* = RE by Zhengfeng Ji, Anand Natarajan, Thomas Vidick, John Wright and Henry Yuen. See posts by Scott, Boaz, not to mention a wonderful backstory by Vidick himself and a tweet stream by Yeun. I'm not an expert enough to verify or even try to explain the proof so I'll just give a brief overview of the result.

For those not familiar with the classes, RE (recursively enumerable) is the simplest of all complexity classes, a language is in RE if there is some Turing machine M such that x is in L if and only if M on input x accepts. For x not in L, M on x can reject or run forever. The classic halting problem, the set of descriptions of Turing machines that halt on empty input, is RE-complete. To nitpick the notation, it should have been r.e. and even c.e. (computably enumerable), a more standard notation these days. But given the importance of the result, we can give the authors a pass.

MIP* is the set of things provable to a classically random polynomial-time verifier by two separated provers with an unlimited number of quantumly entangled qubits. Without the quantum entanglement, MIP = NEXP, nondeterministic exponential time, and last year Natarajan and Wright showed that MIP* could do at least exponentially better in their paper, NEEXP in MIP*. NEEXP seems large but still only consists of computable sets. RE gets outside of the computable realm.

I found the first paper more surprising, as it showed that quantum entanglement actually gets more, much more, than classical provers. The second paper does get a much stronger and tight result, and still highly surprising in its own right, as it requires disproving the Connes' embedding conjecture. In the end we may just consider this one result, as the second paper subsumes the first both in theorem and authors.

We didn't award the 2019 theorem of the year to Natarajan and Wright, instead opting for a paper that had more, how should I say this, sensitivity. This new paper is certainly the front runner for the 2020 honors, albeit it is only mid-January.

Monday, January 13, 2020

What would you do if you showed P=NP? I would reread Factor Man by Matt Ginsberg

Lance has often said (and also in this) that if P=NP that would be great for the world: much more efficient ways to build things, science could be done better, etc, and that is much more important than that modern crypto would no longer work. We now have the technology to do private key really well--- like a thumb drive that has a billion bits for 1-time pads.

I agree that the world would be better off in some ways, I wonder how much damage would be done in the transition period from public to private key. Would the world recover enough to reap the benefits of P=NP?

First think of what YOU would do if you showed P=NP (and lets assume your algorithm is either reasonable or could be made reasonable with some time and effort).

The novel Factor Man  is about what someone who has solved P=NP does. I won't tell you how it goes, but they deal with the issue intelligently. So if I solved P=NP then I would first re-read it, and think through if I would do that, or modify what is done, or what.  Its a good start.

I reviewed the book in SIGACT News or you can read my review here

On a slightly diff note, here is the latest argument I've heard for why P=NP:

Planar 2-coloring is in P

Planar 4-coloring is in P

So

Planar 3-coloring should be in P.

This was said by a very good math/cs ugrad at UMCP. I do not know if he was kidding.


Wednesday, January 08, 2020

Silicon Valley Ethics

Spoiler Alert: This post has details from the final episodes of the HBO television series Silicon Valley

A few times I've gotten emails from people claiming they have shown P = NP and asking whether they should keep their algorithm a secret to protect the cryptography out there. My typical response is that they should use their algorithm to mine a few bitcoins and then get back to me.

The fictional characters of Pied Piper faced this dilemma when they AI they created "developed a general solution to discrete log in polynomial time" with some nice complexity class diagrams in the background.


Pied Piper was about to roll out its new internet, a distributed network that communicated between cell phones based on a compression algorithm developed by Pied Piper's CEO. Rolling out the network would reveal even more advanced compression based on breaking discrete log. "If we cancel it or shut it down, then others will try to copy or reverse engineer everything that we've built ... Our launch has to fail, publicly and spectacularly."

But here comes the P v NP dilemma: "And what about all the other stuff we're gonna do? I mean, give internet to underserved communities, students in the homework gap, refugees, genomic research. Pied Piper can help scientists cure cancer."

I'd take broken encryption over cancer any day. You can still do encryption even if P = NP, one-time pads distributed via USB drives or quantum. And cancer sucks.

They should have mined a few bitcoins.

Sunday, January 05, 2020

The Wikipedia Entry on NP-Intermediary Problems lists one of mine! I'm not bragging about it.

I recently needed to look at what NP problems were possibly intermediary (neither in P nor NP-complete). So I went to Wikipedia and found this.

They had many problems, though some I had never heard of. Those that I had never heard of

should they be on the list?

That is, are they natural? That is hard to define rigorously, but I will take you through my train of thought as I read the first few:

Factoring Integers. Yes, quite possibly intermediary: If  its NPC then PH collapses, and, at least so far, does not seem to be in P.  (the NPC--> PH collapse result: We take

FACT = { (n,x) : n has a nontrivial factor ≤ x }

FACT is clearly in NP:
a complete factorization of n provides evidence that some nontrivial factor is \le x.

FACT is clearly in coNP:
a complete factorization of n provides evidence that no nontrivial factor is \le x

so if FACT is NP-complete then SAT is in coNP.

Factoring is clearly an important and well studied problem. It even has its own Wikipedia entry!

Discrete Log. Similar to Factoring. And it is also an important and well studied problem. It even has its own Wikipedia Entry!

Isomorphism Problems They list Group and Ring isomorphism. They don't list Graph, which is odd. (ADDED LATER- my bad, they do mention Graph Isom in the section on Graph Algorithms) Anyway, if Graph Isom is NPC then PH collapses, and, at least so far, there is no algorithm for Graph Isom in P. (I do not think it is know if Group Isom NPC means PH collapses, or if Ring Isom NPC means PH collapses---if you know of such a proof leave a comment and a pointer to it.)

Graph Isomorphism is a well studied problem and seems important and natural (I don't know if Graph Isomorphism has any real applications they way that factoring and DL do).  It even has its own Wikipedia entry! Group and Ring Isomorphism also seem important and natural. And they have their own Wikipedia entry!

Numbers in Boxes Problem My first reaction-Gee, whats that? For the Factoring, DL, and Isomorphism they did not define the problem-- they gave pointers to the Wikipedia entries on them. For this one there was no Wikipedia entry. There was one reference. I went to it. It was a blog entry of mine! Here it is: here, and to save you time I'll say what it is:

{ (1n,1k) : you can partition 1,...,n into k boxes so that no box has x,y,z with x + y = z }

Is this problem important? Does it exist anywhere outside of my blog entry? Yes--- a special case of it was in Dr. Ecco's Cyperpuzzles by Dennis Shasha (note- Dennis was a classmate of mine in graduate school at Harvard). I think the case was to try to partition {1,...,100} as best you can. Actually I first saw the case of the problem in his book and then generalized it.

The problem is sparse so if it was NP-complete then P = NP, very good evidence that its not NPC. And its been studied for thousands of years, with people looking for poly time algorithms (I think Pythagoras studied it) without success, so its almost surely not in P. OR that last sentence was complete nonsense. Indeed, I don't think anyone has studied the problem computationally, or, for that matter, at all. So the evidence that its not in P is... sparse.

But its worse than that. One could devise MANY sparse problems that are, since spares, likely NOT NPC, and hardly studied, so as-of-now, not in P. Should those count? Only if (a) more people study them so there is an attempt to point to to get it into P, and (b) the problem is natural (which is hard to define).

Note that I can vary the problem: x+2y=z (this relates to lower bounds on VDW numbers)
or any other combination of x,y,z or more that I like.



This raises a question:

When is a problem worthy of being put on lists of problems?

Here are some possibly criteria. One can take ANDS and ORS of them.

1) The problem has a Wikipedia entry. This might fall victim to Goodhearts law: when a measure becomes a target, it ceases to be a measure.  That is, I could make a Wikipedia entry on the Number-in-boxes problem and then say LOOK, its on Wikipedia!

2) More than X people have worked on the problem for some value of X. But here is a reason this might not be a good criteria: look at the problem

{ α : α is a reg expression that allows numbers (so a1000 is fine, makes reg expressions  VERY succint) such that L(α)=Σ* }

This problem looks natural, and was proven by Meyer and Stockmeyer to be EXPSPACE complete.
That is the only paper on this problem, yet the problem really does look natural, and the result is rightly celebrated as a natural problem that is provably not in P.

3) When people in the field look at the problem they say YEAH, thats a good problem.

4) The problem relates to other problems or other fields.

I doubt the Number-in-boxes problem satisfies any of these criteria. The variant with x+2y=z relates to Ramsey Theory. Great.

NOW, back to the list-- I won't go through any more on the list, but I note that for some of them the only reference seems to be a conversation on stack-exchange.  Some of those end up referring to real papers so are more likely natural, but some do not.

Having said that, is there any harm in the list having on it some problems that are not ... worthy? Is that even the right word to use?

Note that I don't have strong opinions on any of these matters, I am just wondering what criteria Wikipedia, and other sources, uses, when they have lists of problems.