Sunday, February 16, 2020

Pre-(Publish and Perish)

Guest post by Evangelos Georgiadis

Quite a few posts have recently focused on papers,publications and venues; "optimal" venues for papers under different objective functions,e.g. minimizing carbon footprint while maximizing community building, networking as well as information sharing, see Moshe Vardi.

Here we would like to take a closer look at one of the key assumptions -- the paper. In order to generate a paper, one needs to come up with a result, something novel, fresh or interesting to say. The question that has baffled this author is what represents a conducive or perhaps even optimal setting for generating papers. Since papers come in different flavors ranging from "solid technical papers to risky innovative ones" the settings may vary; but ultimately, what would be interesting to investigate (or for that matter crowdsource) is whether there is a common denominator in terms of setting or environment, a necessary but not sufficient condition (so to speak).

Here are some accounts of others which may be helpful as reference points.

Knuth's papers entitled "Semantics of context free grammar" along with "The analysis of algorithms" represent two instances that suggest research institutes might not provide an optimal environment for idea generation.

As Knuth points out in "Selected Papers on Computer Languages" (Chapter 18, p. 431):
Perhaps new ideas emerge most often from hectic, disorganized activity, when a great many sources of stimulation are present at once -- when numerous deadlines need to be met, and when other miscellaneous activities like child-rearing are also mixed into the agenda.
Knuth goes on to say, that it was challenging to do creative work in office and that finding a few hideaways provided some form of solution -- aka sitting under 'that' oak tree near Lake Lagunita. That said, the inspirational setting for getting into the zone for the aforementioned two papers were provided by (Californian) beaches. Hold that observation. Is this not something we have come across somewhere else ? Fields medalist Stephen Smale in "Chaos: Finding a Horseshoe on the Beaches of Rio" suggests that some of his best work happened at his "beach office". Whether beaches do provide for a good setting remains to be shown; perhaps for very innovative ideas, oceanic freedom is necessary. That said, the author recalls (hopefully accurately enough) an account by the young James H Simons, who attended a conference in Japan in the early days. Instead of choosing a spacious accommodation (which he was able to afford), he restricted himself to the typically confined room type -- not only confined by space, but also pressured by time, young Simons was able to generate an interesting result for that conference. (This probably demonstrates that technical results don't necessarily require 'oceanic freedom'.)

Some meaningful probabilistic advice comes from the fat-tails department, in "The Black Swan" by Nassim Taleb (on page 209) : "Go to parties! If you're a scientist, you will chance upon a remark that might spark a new research. "

Murray Gell-Mann provides an interesting collective account in his Google Tech Talk entitled "On Getting Creative Ideas." He recollects a workshop he attended in 1969 in Aspen that focused on the experience of getting creative ideas, not just among mathematicians and theoretical physicists but also poets and artists. This account seems to neglect the actual setting that might nurture creative thought process, but provides interesting references to people such as Hermann von Helmholtz, who happened to have thought about this topic and partitioned the process in terms of "saturation, incubation and illumination".

For those interested in an account that focuses on the Eureka moments of exclusively mathematicians/theoretical physicists see Jacques Hadamard's book "The Mathematician's Mind". Hadamard iterated on Helmholtz's 3 stage process and it's worth taking a look at what he came up.

At last, what are good venues or workshops for generating papers ? Or let's rephrase that a bit, what type of atmosphere at venues fosters creativity -- what food for thought to provide participants and how to distribute that food for thought over a given day ? Ryan R Williams proposed (as practiced by 34th Bellairs Winter Workshop on Computational Geometry) "... easy problems, informal atmosphere focusing exclusively on thinking about problems in a cycle of down-time where one meets in two intense sessions and have free time otherwise." (This type of setting seems to resonate with the 3 stages of "saturation, incubation and illumination".)

That said, most workshops including the Simons workshops don't seem to follow such a recipe. They are more geared towards the follow-up step, namely, communicating what people have found, rather than collaborating with them to tackle open problems. Perhaps some re-evaluation might be required in how workshops are run.

Thursday, January 23, 2020

The World of Publishing

Bill is out for blogging for a couple of weeks on injured-reserve (he’ll be fine). I put together a quick blog post on what’s happening in the world of publications.

The Trump administration has suggested requiring publishers to make all papers based on US federally-funded publicly available immediately instead of after one year. The Association of American Publishers sent an open letter--how do we maintain the organizations and the people who work there if we give up a major revenue source. The ACM joined the letter which caused quite a backlash forcing the ACM to explain itself, write another letter, and run some webinars about open access the last of which is tomorrow. In the end, this is leading to some good discussions about open access and the financial models of academic societies.

The ACM also has a new policy, three options for what happens when an author changes their name: Maintain separate identities, have the two identities link to each other, or retroactively change the name on all previous papers. I can see good reasons for all three options.

Finally Moshe Vardi writes in his CACM column about the ecological cost of conferences and suggests that conferences allow authors to (video)phone it in. Emmanuel Viola offers his own thoughts. Most Conferences will continue to require authors to show up, with only occasional exceptions as needed, believing these policies will keep their conference healthy.

Personally I believe conferences should exist because researchers want to attend, not because they have to. We still need conferences so our community can get together and I don’t believe we can do that via the Internet no matter how good the VR experience gets. But we can have more videos and less conferences and reduce the costs: time, financial and environmental.

Tuesday, January 14, 2020

Quantum Provers to Infinity and Beyond

The Internets are buzzing about the new paper MIP* = RE by Zhengfeng Ji, Anand Natarajan, Thomas Vidick, John Wright and Henry Yuen. See posts by Scott, Boaz, not to mention a wonderful backstory by Vidick himself and a tweet stream by Yeun. I'm not an expert enough to verify or even try to explain the proof so I'll just give a brief overview of the result.

For those not familiar with the classes, RE (recursively enumerable) is the simplest of all complexity classes, a language is in RE if there is some Turing machine M such that x is in L if and only if M on input x accepts. For x not in L, M on x can reject or run forever. The classic halting problem, the set of descriptions of Turing machines that halt on empty input, is RE-complete. To nitpick the notation, it should have been r.e. and even c.e. (computably enumerable), a more standard notation these days. But given the importance of the result, we can give the authors a pass.

MIP* is the set of things provable to a classically random polynomial-time verifier by two separated provers with an unlimited number of quantumly entangled qubits. Without the quantum entanglement, MIP = NEXP, nondeterministic exponential time, and last year Natarajan and Wright showed that MIP* could do at least exponentially better in their paper, NEEXP in MIP*. NEEXP seems large but still only consists of computable sets. RE gets outside of the computable realm.

I found the first paper more surprising, as it showed that quantum entanglement actually gets more, much more, than classical provers. The second paper does get a much stronger and tight result, and still highly surprising in its own right, as it requires disproving the Connes' embedding conjecture. In the end we may just consider this one result, as the second paper subsumes the first both in theorem and authors.

We didn't award the 2019 theorem of the year to Natarajan and Wright, instead opting for a paper that had more, how should I say this, sensitivity. This new paper is certainly the front runner for the 2020 honors, albeit it is only mid-January.

Monday, January 13, 2020

What would you do if you showed P=NP? I would reread Factor Man by Matt Ginsberg

Lance has often said (and also in this) that if P=NP that would be great for the world: much more efficient ways to build things, science could be done better, etc, and that is much more important than that modern crypto would no longer work. We now have the technology to do private key really well--- like a thumb drive that has a billion bits for 1-time pads.

I agree that the world would be better off in some ways, I wonder how much damage would be done in the transition period from public to private key. Would the world recover enough to reap the benefits of P=NP?

First think of what YOU would do if you showed P=NP (and lets assume your algorithm is either reasonable or could be made reasonable with some time and effort).

The novel Factor Man  is about what someone who has solved P=NP does. I won't tell you how it goes, but they deal with the issue intelligently. So if I solved P=NP then I would first re-read it, and think through if I would do that, or modify what is done, or what.  Its a good start.

I reviewed the book in SIGACT News or you can read my review here

On a slightly diff note, here is the latest argument I've heard for why P=NP:

Planar 2-coloring is in P

Planar 4-coloring is in P

So

Planar 3-coloring should be in P.

This was said by a very good math/cs ugrad at UMCP. I do not know if he was kidding.


Wednesday, January 08, 2020

Silicon Valley Ethics

Spoiler Alert: This post has details from the final episodes of the HBO television series Silicon Valley

A few times I've gotten emails from people claiming they have shown P = NP and asking whether they should keep their algorithm a secret to protect the cryptography out there. My typical response is that they should use their algorithm to mine a few bitcoins and then get back to me.

The fictional characters of Pied Piper faced this dilemma when they AI they created "developed a general solution to discrete log in polynomial time" with some nice complexity class diagrams in the background.


Pied Piper was about to roll out its new internet, a distributed network that communicated between cell phones based on a compression algorithm developed by Pied Piper's CEO. Rolling out the network would reveal even more advanced compression based on breaking discrete log. "If we cancel it or shut it down, then others will try to copy or reverse engineer everything that we've built ... Our launch has to fail, publicly and spectacularly."

But here comes the P v NP dilemma: "And what about all the other stuff we're gonna do? I mean, give internet to underserved communities, students in the homework gap, refugees, genomic research. Pied Piper can help scientists cure cancer."

I'd take broken encryption over cancer any day. You can still do encryption even if P = NP, one-time pads distributed via USB drives or quantum. And cancer sucks.

They should have mined a few bitcoins.

Sunday, January 05, 2020

The Wikipedia Entry on NP-Intermediary Problems lists one of mine! I'm not bragging about it.

I recently needed to look at what NP problems were possibly intermediary (neither in P nor NP-complete). So I went to Wikipedia and found this.

They had many problems, though some I had never heard of. Those that I had never heard of

should they be on the list?

That is, are they natural? That is hard to define rigorously, but I will take you through my train of thought as I read the first few:

Factoring Integers. Yes, quite possibly intermediary: If  its NPC then PH collapses, and, at least so far, does not seem to be in P.  (the NPC--> PH collapse result: We take

FACT = { (n,x) : n has a nontrivial factor ≤ x }

FACT is clearly in NP:
a complete factorization of n provides evidence that some nontrivial factor is \le x.

FACT is clearly in coNP:
a complete factorization of n provides evidence that no nontrivial factor is \le x

so if FACT is NP-complete then SAT is in coNP.

Factoring is clearly an important and well studied problem. It even has its own Wikipedia entry!

Discrete Log. Similar to Factoring. And it is also an important and well studied problem. It even has its own Wikipedia Entry!

Isomorphism Problems They list Group and Ring isomorphism. They don't list Graph, which is odd. (ADDED LATER- my bad, they do mention Graph Isom in the section on Graph Algorithms) Anyway, if Graph Isom is NPC then PH collapses, and, at least so far, there is no algorithm for Graph Isom in P. (I do not think it is know if Group Isom NPC means PH collapses, or if Ring Isom NPC means PH collapses---if you know of such a proof leave a comment and a pointer to it.)

Graph Isomorphism is a well studied problem and seems important and natural (I don't know if Graph Isomorphism has any real applications they way that factoring and DL do).  It even has its own Wikipedia entry! Group and Ring Isomorphism also seem important and natural. And they have their own Wikipedia entry!

Numbers in Boxes Problem My first reaction-Gee, whats that? For the Factoring, DL, and Isomorphism they did not define the problem-- they gave pointers to the Wikipedia entries on them. For this one there was no Wikipedia entry. There was one reference. I went to it. It was a blog entry of mine! Here it is: here, and to save you time I'll say what it is:

{ (1n,1k) : you can partition 1,...,n into k boxes so that no box has x,y,z with x + y = z }

Is this problem important? Does it exist anywhere outside of my blog entry? Yes--- a special case of it was in Dr. Ecco's Cyperpuzzles by Dennis Shasha (note- Dennis was a classmate of mine in graduate school at Harvard). I think the case was to try to partition {1,...,100} as best you can. Actually I first saw the case of the problem in his book and then generalized it.

The problem is sparse so if it was NP-complete then P = NP, very good evidence that its not NPC. And its been studied for thousands of years, with people looking for poly time algorithms (I think Pythagoras studied it) without success, so its almost surely not in P. OR that last sentence was complete nonsense. Indeed, I don't think anyone has studied the problem computationally, or, for that matter, at all. So the evidence that its not in P is... sparse.

But its worse than that. One could devise MANY sparse problems that are, since spares, likely NOT NPC, and hardly studied, so as-of-now, not in P. Should those count? Only if (a) more people study them so there is an attempt to point to to get it into P, and (b) the problem is natural (which is hard to define).

Note that I can vary the problem: x+2y=z (this relates to lower bounds on VDW numbers)
or any other combination of x,y,z or more that I like.



This raises a question:

When is a problem worthy of being put on lists of problems?

Here are some possibly criteria. One can take ANDS and ORS of them.

1) The problem has a Wikipedia entry. This might fall victim to Goodhearts law: when a measure becomes a target, it ceases to be a measure.  That is, I could make a Wikipedia entry on the Number-in-boxes problem and then say LOOK, its on Wikipedia!

2) More than X people have worked on the problem for some value of X. But here is a reason this might not be a good criteria: look at the problem

{ α : α is a reg expression that allows numbers (so a1000 is fine, makes reg expressions  VERY succint) such that L(α)=Σ* }

This problem looks natural, and was proven by Meyer and Stockmeyer to be EXPSPACE complete.
That is the only paper on this problem, yet the problem really does look natural, and the result is rightly celebrated as a natural problem that is provably not in P.

3) When people in the field look at the problem they say YEAH, thats a good problem.

4) The problem relates to other problems or other fields.

I doubt the Number-in-boxes problem satisfies any of these criteria. The variant with x+2y=z relates to Ramsey Theory. Great.

NOW, back to the list-- I won't go through any more on the list, but I note that for some of them the only reference seems to be a conversation on stack-exchange.  Some of those end up referring to real papers so are more likely natural, but some do not.

Having said that, is there any harm in the list having on it some problems that are not ... worthy? Is that even the right word to use?

Note that I don't have strong opinions on any of these matters, I am just wondering what criteria Wikipedia, and other sources, uses, when they have lists of problems.




Tuesday, December 31, 2019

Complexity Year in Review 2019

Some great theorems this year including non-deterministic double exponential time by quantumly entangled provers and integer multiplication in O(n log n) time. But the result of the year has to go to a paper that gave a shockingly simple proof of a major longstanding conjecture.


Of course 2019 will be remembered in some circles for giving us Google's claims of quantum supremacy and all the quantum hype, deserved and otherwise, that goes with it.

Personally Bill came out with his new book Problems with a Point; Exploring Math and Computer Science co-authored with Clyde Kruskal (Amazon, blog posts). Lance became a dean



As we move into the 2020s, we tend to look back and look forward. The 2010s will go down as the decade computing and data transformed society, for better and worse. Google turned 21 this year as its OG leadership stepped down. I turned 21 in 1984, but 1984 seems closer than ever.

Last year we ended the year in review by 
We end the year with craziness, the stock market is going through wild gyrations, we have a partial government shutdown including all of NSF and an uncertain political landscape with different parties leading the two houses of congress. We're still in the midst of a technological revolution and governments around the world try to figure how to regulate it. I find it hard to predict 2019 but it will not be quiet.
2019 was not quiet and we're about to head into an impeachment trial, Brexit and a critical US presidential election. The real challenges of the twenties will come from massive transformation from automation, climate change and deepening divisions in our society. How will academia cope with changing demographics, financial challenges and educating to manage the technological revolution?

Let's all take a deep breath, roll up our sleeves and get the decade going.

Thursday, December 12, 2019

Why is there no all-encompassing term for a course on Models of Computation?

In my last blog post I asked my readers to leave comments saying what the name of the course that has some of Regular Languages, Context Free Languages  Decideability, P, NP (any maybe other stuff) in it.  I suspected there would be many different names and their were. I was able to put all but 6 into 4 equivalence classes. So that's 10 names. Thats a lot  especially compared to

(Introduction to) Algorithms

and

(Introduction to) Cryptography

which I suspect have far fewer names. One commenter pointed out that the reason for the many different names is that there are many versions of the course. That's not quite an explanation since there are also many different versions of Cryptography---at UMCP  crypto is cross listed in THREE departments (CS, Math, EE) and its taught by 6 or so different people who don't talk to each other (I am one of them). I think Algorithms is more uniform across colleges.

I think that terms Algorithms and Cryptography are both rather broad and can accommodate many versions of the courses, whereas no term seems to be agreed upon to encompass the DFA etc course.
Even saying DFA etc is not quite right since some of the courses spend little or even no time on DFA's.

Below is a list of all the names I got and some comments. Note that some schools appear twice since they have two courses along these lines.

-----------------------------------------
TITLE: (Introduction to) Theory of Computation:

Swarthmore:                                            Theory of Computation

UCSD:                                                     Theory of Computation

Saint Michaels:                                        Theory of Computation

Univ of Washington: Introduction to the Theory of Computation

Waterloo:                  Introduction to the Theory of Computing

COMMENT: Theory of Computation could have been the term that encompasses all of these courses. I speculate that it didn't catch on since it sounds too much like computability theory which is only one part of the course.

------------------------
TITLE: Formal Languages and XXX

CMU:                                                Formal Languages, Automata, and Computability

Florida Tech:                                    Formal Languages and Automata Theory

UC-Irvine:                                        Formal Languages and Automata Theory

Univ of Chicago:    Introduction to Formal Languages

University of Bucharest:                 Formal Language and Automata

TU Darmstadt:                                Formal Foundations of CS I: Automata, Formal Languages, and Decidability

TUK Germany:                               Formal Languages and Computability

COMMENT: The title makes it sound like they don't cover P and NP. I do not know if thats true; however, I speculate that, it could never be the encompassing term.

Spell Check things Automata and Computability are not words, but I've googled them and they seem to be words.

--------------------------
TITLE: Computability/Decidability and Complexity/Intractability

Reed College: Computability and Complexity

Caltech:          Decidability and Intractability

COMMENT: The title makes it sound like they don't cover regular or context free languages. I do not know if that's true; however, I speculate that, since the terms sound that way, they never caught on as the general term.

Spellecheck thinks that neither Decidability nor Decideability is a word. Google seems to say that I should leave out the e, so I will.

------------------------------
TITLE:  Blah MODELS Blah

Tel-Aviv (a long time ago) Computational Models

UIUC:                               Algorithms and Models of Computation (also has some algorithms in it)

Waterloo:                                                   Models of Computation (enriched version)

COMMENT: Models of Computation sounds like a good name for the course! Too bad it didn't catch on.  It would also be able to withstand changes in the content like more on parallelism or more on communication complexity.

------------------------------
TITLE: MISC

CMU:                                          Great Ideas in Theoretical Computer Science

UCLouvain (Belgium)                Calculabilite (Computability)

Moscow Inst. of  Phy. and Tech.: Mathematical logic and Theory of Algorithms

Portland State University:            Computational Structures

Germany:                                     Informatik III (Not all of Germany)

Univ of Chicago:                         Introduction to Complexity

COMMENT: All of these terms are to narrow to have served as a general term.