Wednesday, December 13, 2017

Our AI future: The Good and the Ugly

I don’t directly work in machine learning, but one cannot deny the progress it has made and the effect it has on society. Who would have thought even a few years ago that ML would basically solve face and voice recognition and translate nearly as well as humans?

The Neural Information Processing Systems conference held last week in Long Beach, California, sold out its 7500 registration slots in 12 days. NIPS, not long ago just another academic conference, has become a major machine learning job market where newly minted Ph.D.s earn north of $300,000 and top-ranked senior academics command multimillion-dollar, multiyear contracts.

AlphaZero, an offshoot of Google’s Go programs, learned chess given only the rules in just four hours (on 5000 tensor processing units) and easily beats the best human-designed chess programs. Check out this match against Stockfish.

This is just one example of a trend: machine learning often works better when humans get out of the way.

The advances in machine learning and automation have a dark side. Earlier this week I attended the CRA Summit on Technology and Jobs, one of a series of meetings organized by Moshe Vardi on how AI and other computing technology will affect the future job market. When we talk about ethics in computer science we usually talk about freedom of information, privacy and fairness, but this may be the biggest challenge of them all.

The starkest statistic: contrary to what certain politicians may tell you, manufacturing output in the United States has never been higher, but manufacturing jobs have declined dramatically due to automation.

The changes have hit less-educated white middle-class men hardest. While this group usually doesn’t get much attention from academics, many of them have ended up taking less rewarding jobs or dropping out of the job market entirely. We're seeing many young people living with their parents and spending their days playing video games, along with a spike in suicides and drug use. Drug overdose is now the leading cause of death of men under 50.

There are no easy solutions. Universal basic income won’t address the psychological role a job plays in making one part of something bigger than oneself. In the end we'll need to rethink the educate-work-retire cycle toward more lifelong learning and find rewarding jobs that work around automation. This all starts with having a government that recognizes these real challenges.

Tuesday, December 12, 2017

Interesting Probability on a VERY OLD TV show

I have posted about things I see on TV or in movies that are math or CS related:

Do TV shows overestimate how much a genius can help solve crimes, or make really good crystal meth (which seems to be blue)? YES, see here

Do TV shows get math wrong? YES, see here and about 90% of the episodes of Numb3rs

Closer to home: do TV shows say stupid things about P vs NP? Elementary (one of the two modern-day Sherlock Holmes shows) does, see here

Did Kirk and Spock really defeat a computer with a trick that wouldn't work now? Yes, see Lance's post on this here

Do TV shows use the word Quantum incorrectly? They do, but they are not alone in that, see here

Do people writing Futurama get their math right? Yes, see here

Do people writing 24 get their math wrong? Yes, see here

Does The Big Bang Theory mostly get things right? Yes! See here

There are more (Seinfeld thinks comedians should learn proofs! Really, see here), but I can make my point with just the ones above.

ALL of the TV shows except Star Trek are from after 2000 (or so). So I had thought that, with the exception of science fiction, math references and science references in TV shows were relatively recent.

Which is why I was surprised and delighted to see a CORRECT and INTERESTING math reference in an episode of the old western (anti-western? satire of a western?) Maverick, from 1958 (before I was born!), called Rope of Cards. Maverick bets that a random 25 cards from a deck can be arranged into five 5-card pat hands (I had to look that up: pat hands are hands where you don't want to discard any cards, so a flush, a straight, or a full house would qualify; four of a kind would be pat if there were no wild cards). The sucker takes the bet and loses. Maverick later says the odds are high and calls the game Maverick Solitaire. And that is now the name of the puzzle, see here. The probability is around 0.98.
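To make "arranged into five pat hands" concrete, here is a minimal Python sketch; it is not from the episode or from any reference solution of the puzzle. It assumes no wild cards, counts straights, flushes, full houses, and four of a kind as pat (a straight flush is caught by the flush test), and lets the ace play high or low in a straight. The names `is_pat`, `can_partition`, and `EXAMPLE_DEAL` are hypothetical. The search is plain backtracking: the first remaining card must belong to some hand, so it tries every way to complete that card into a pat hand and recurses on what is left.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# Assumed card model: (rank, suit), ranks 2..14 with ace = 14, suits "C", "D", "H", "S".
Card = Tuple[int, str]


def is_pat(hand: List[Card]) -> bool:
    """True if a 5-card hand is a straight, flush, full house, or four of a kind
    (no wild cards; a straight flush counts via the flush test)."""
    ranks = sorted(rank for rank, _ in hand)
    counts = sorted(Counter(ranks).values())
    flush = len({suit for _, suit in hand}) == 1
    distinct = len(set(ranks)) == 5
    # Assumption: the ace may also play low, as in A-2-3-4-5.
    straight = distinct and (ranks[4] - ranks[0] == 4 or ranks == [2, 3, 4, 5, 14])
    full_house = counts == [2, 3]
    four_of_a_kind = counts == [1, 4]
    return flush or straight or full_house or four_of_a_kind


def can_partition(cards: List[Card]) -> bool:
    """Backtracking search: can `cards` be split into disjoint 5-card pat hands?"""
    if not cards:
        return True
    first, rest = cards[0], cards[1:]
    # `first` lies in exactly one hand, so try every 4-card completion of it.
    for picked in combinations(range(len(rest)), 4):
        hand = [first] + [rest[j] for j in picked]
        if is_pat(hand):
            remaining = [card for j, card in enumerate(rest) if j not in picked]
            if can_partition(remaining):
                return True
    return False


# A hand-built 25-card deal: flush, straight, full house, four of a kind, ace-low straight.
EXAMPLE_DEAL: List[Card] = [
    (2, "H"), (5, "H"), (9, "H"), (11, "H"), (13, "H"),
    (3, "C"), (4, "D"), (5, "S"), (6, "C"), (7, "D"),
    (8, "C"), (8, "D"), (8, "S"), (10, "C"), (10, "D"),
    (12, "C"), (12, "D"), (12, "H"), (12, "S"), (2, "C"),
    (14, "C"), (2, "D"), (3, "H"), (4, "S"), (5, "D"),
]

if __name__ == "__main__":
    print(can_partition(EXAMPLE_DEAL))  # True
```

Averaging `can_partition` over random deals drawn with `random.sample` would give a Monte Carlo estimate of the probability, though this naive search can be slow in pure Python on deals with many candidate hands.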

I call this a mention of math since it has to do with probability, which may be a stretch. And I doubt the scene would encourage people to go into math. But it might encourage one to learn probability, either to sucker others or to avoid being suckered.

So the question now is: are there other non-science-fiction references to math in older TV shows?
I suspect yes, similar to the one above, which involves gambling and probability. What is the earliest mention of math on a TV show? The earliest that did not involve science fiction or gambling?

Thursday, December 07, 2017

Razor's Edge

Informally the sensitivity conjecture asks whether every hard Boolean function has a razor's edge input, where flipping a random bit has a reasonable chance of flipping the output.

Let's be more precise. We consider functions $f:\{0,1\}^n \to \{0,1\}$. For every input x, the decision tree complexity at x is the least number of bits of x you would need to query to decide whether the function outputs 0 or 1. The decision tree complexity of a function is the maximum decision tree complexity over all possible x. Most interesting functions have high decision tree complexity; even the lowly OR function requires querying every bit on the all-zeroes input. The decision tree complexity is polynomially related to randomized complexity, quantum complexity, certificate complexity, and the degree of a polynomial that computes the function exactly or approximately. The recent paper by Aaronson, Ben-David and Kothari gives a nice chart showing the relationships between these measures, with references to the various papers giving the bounds.
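For small n the definition can be checked by brute force. The Python sketch below is only an illustration, not code from the post or from the Aaronson, Ben-David and Kothari paper. It computes the standard worst-case depth of an optimal adaptive decision tree by trying every next query on every subcube, which takes exponential time and is practical only for a handful of variables. Modeling Boolean functions as Python callables on tuples of 0/1 bits is an assumption, and `decision_tree_complexity` is a hypothetical name.

```python
from functools import lru_cache
from itertools import product
from typing import Callable, Optional, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def decision_tree_complexity(f: BoolFn, n: int) -> int:
    """Depth of an optimal deterministic decision tree for f on {0,1}^n (brute force)."""

    @lru_cache(maxsize=None)
    def depth(partial: Tuple[Optional[int], ...]) -> int:
        free = [i for i, b in enumerate(partial) if b is None]
        # Evaluate f on every completion of the partial assignment.
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(partial)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))
        if len(values) == 1:
            return 0  # f is constant on this subcube: no more queries needed
        best = n
        for i in free:
            # Querying bit i costs 1, then the adversary picks the worse answer.
            worst = 0
            for b in (0, 1):
                child = list(partial)
                child[i] = b
                worst = max(worst, depth(tuple(child)))
            best = min(best, 1 + worst)
        return best

    return depth((None,) * n)


if __name__ == "__main__":
    OR = lambda x: int(any(x))
    print(decision_tree_complexity(OR, 3))  # 3
```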

The sensitivity of f on an input x is the number of bit locations i such that f(x)≠f(x⊕i), where x⊕i is x with the ith bit flipped. The sensitivity of f is the maximum sensitivity over all inputs. The sensitivity conjecture states that there is some ε>0 such that the sensitivity of f is at least $m^{\varepsilon}$ if the decision tree complexity is at least m. If the conjecture were true, then for any function with maximal decision tree complexity n (querying every input bit) there must be some razor's edge input x with at least $n^{\varepsilon}$ sensitive bits, so flipping a random bit of x has probability at least $n^{\varepsilon}/n = n^{\varepsilon-1}$ of flipping the output.
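Here is a brute-force Python sketch of sensitivity and of the razor's-edge probability, again practical only for small n; the helper names are hypothetical. The probability that flipping one uniformly random bit of x flips the output is just the sensitivity of f at x divided by n, so an input with at least $n^{\varepsilon}$ sensitive bits gives a flip probability of at least $n^{\varepsilon}/n$.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]
BoolFn = Callable[[Bits], int]


def flip(x: Bits, i: int) -> Bits:
    """Return x with the i-th bit flipped."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def sensitivity_at(f: BoolFn, x: Bits) -> int:
    """Number of positions i with f(x) != f(x with bit i flipped)."""
    fx = f(x)
    return sum(1 for i in range(len(x)) if f(flip(x, i)) != fx)


def sensitivity(f: BoolFn, n: int) -> int:
    """Maximum sensitivity over all inputs in {0,1}^n."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))


def razor_edge(f: BoolFn, n: int) -> Tuple[Bits, float]:
    """An input of maximum sensitivity, and the chance that flipping one
    uniformly random bit of it flips f (sensitivity at x divided by n)."""
    x = max(product((0, 1), repeat=n), key=lambda y: sensitivity_at(f, y))
    return x, sensitivity_at(f, x) / n


if __name__ == "__main__":
    OR = lambda x: int(any(x))
    print(razor_edge(OR, 4))  # ((0, 0, 0, 0), 1.0)
```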

I find it surprising that we have no proof or counterexample to this purely combinatorial question. There is a generalization of sensitivity known as block sensitivity: the largest number of disjoint blocks of bits such that flipping the bits in any one block flips the output bit. Block sensitivity is known to be polynomially related to decision tree complexity.
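Block sensitivity can be computed the same brute-force way for small n. In this Python sketch (hypothetical names, exponential time), blocks are bitmasks. Only minimal sensitive blocks are kept, since any disjoint family of sensitive blocks can be shrunk to minimal ones without losing disjointness, and a simple recursive search finds the largest disjoint family. Every sensitive single bit is itself a block, so the result is never less than the sensitivity.

```python
from itertools import product
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]
BoolFn = Callable[[Bits], int]


def flip_block(x: Bits, mask: int) -> Bits:
    """Flip every bit of x whose position is set in the bitmask `mask`."""
    return tuple(b ^ ((mask >> i) & 1) for i, b in enumerate(x))


def block_sensitivity_at(f: BoolFn, x: Bits) -> int:
    """Largest number of disjoint blocks whose flipping changes f(x)."""
    fx = f(x)
    sensitive = [m for m in range(1, 1 << len(x)) if f(flip_block(x, m)) != fx]
    # Keep only minimal sensitive blocks (no proper sub-block is sensitive).
    minimal: List[int] = [
        m for m in sensitive if not any(s != m and (s & m) == s for s in sensitive)
    ]

    def pack(used: int, start: int) -> int:
        # Largest disjoint family from minimal[start:] avoiding the bits in `used`.
        best = 0
        for j in range(start, len(minimal)):
            if (minimal[j] & used) == 0:
                best = max(best, 1 + pack(used | minimal[j], j + 1))
        return best

    return pack(0, 0)


def block_sensitivity(f: BoolFn, n: int) -> int:
    """Maximum block sensitivity over all inputs in {0,1}^n."""
    return max(block_sensitivity_at(f, x) for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    MAJ3 = lambda x: int(sum(x) >= 2)
    print(block_sensitivity(MAJ3, 3))  # 2
```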

In a future post I'll talk about some approaches towards resolving this conjecture.

Monday, December 04, 2017

Fireside chat with Simons Inst Director Dick Karp

The link above is to Samir Khuller interviewing Dick Karp, though it's labelled as a fireside chat with Dick Karp.

Very interesting to hear how TCS has evolved. More generally, it's good to know where you've come from to have a better idea of where you're going.

bill g.

Thursday, November 30, 2017

Kolmogorov Complexity and the Primes

Bill's post on how to derive the infinitude of the primes from Van der Waerden's theorem reminds me of a nice proof using Kolmogorov complexity.

A quick primer: Fix some universal programming language. Let C(x), the Kolmogorov complexity of x, be the length of the shortest program that outputs x. A simple counting argument shows that for every n there is an x of length n such that C(x) ≥ n: there are $2^n$ strings of length n but only $2^n-1$ programs of length less than n. We call such x "random".

Suppose we had a finite list of primes $p_1,\ldots,p_k$. Then any positive integer m can be expressed as $m = p_1^{e_1}\cdots p_k^{e_k}$. Pick n large, a random x of length n, and let m be the number x expresses in binary. We can compute m from $e_1,\ldots,e_k$ and a constant amount of other information, remembering that k is a constant. Each $e_i$ is at most $\log m$, so we can describe all of them in $O(\log\log m)$ bits and thus $C(m) = O(\log\log m)$. But roughly $C(m) = C(x) \geq n = \log m$, a contradiction.
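The compression step can be seen concretely. The Python sketch below (hypothetical helper names) writes each exponent $e_i$ in a fixed width of about $\log\log m$ bits, which suffices because $e_i \le \lfloor\log_2 m\rfloor$, so with a fixed list of k primes the whole description takes about $k\log\log m$ bits. The decoder also needs the width; here it is simply returned alongside the code. A plain binary encoding stands in for a shortest program, so this only illustrates the upper bound on C(m).

```python
from typing import List, Tuple


def encode_exponents(m: int, primes: List[int]) -> Tuple[str, int]:
    """Encode m as fixed-width binary exponents over `primes`; returns (bits, width)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # Every e_i <= floor(log2 m) = m.bit_length() - 1, so this width is enough.
    width = max(1, (m.bit_length() - 1).bit_length())
    chunks = []
    rest = m
    for p in primes:
        e = 0
        while rest % p == 0:
            rest //= p
            e += 1
        chunks.append(format(e, f"0{width}b"))
    if rest != 1:
        raise ValueError("m has a prime factor outside the given list")
    return "".join(chunks), width


def decode_exponents(code: str, width: int, primes: List[int]) -> int:
    """Rebuild m = p_1^e_1 * ... * p_k^e_k from the fixed-width exponents."""
    m = 1
    for j, p in enumerate(primes):
        e = int(code[j * width:(j + 1) * width], 2)
        m *= p ** e
    return m


if __name__ == "__main__":
    m = 2**100 * 3**50
    code, width = encode_exponents(m, [2, 3])
    print(m.bit_length(), len(code))  # 180 16
```

So $m = 2^{100}3^{50}$ has a 180-bit binary expansion, but over the prime list [2, 3] its exponents fit in 16 bits.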

But we can do better. Again pick n large, a random x of length n, and let m be the number x expresses in binary. Let $p_i$ be the largest prime that divides m, where $p_i$ is the ith prime. We can describe m by $p_i$ and $m/p_i$, or by i and $m/p_i$. So we have $C(m) \leq C(i, m/p_i) \leq C(i) + C(m/p_i) + 2\log C(i) \leq \log i + \log(m/p_i) + 2\log\log i + c$. The $2\log C(i)$ term is needed to specify the separation between the program for i and the program for $m/p_i$.
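The separation cost can also be made concrete. The post does not fix an encoding, so as an assumption here is one standard way to make the pair (i, m/p_i) decodable: write the length of the first part with every bit doubled, end that header with the marker 01, then concatenate the two parts. The header costs about $2\log$ of the first part's length, matching the $2\log$ term above. Plain binary strings stand in for shortest programs, and all names are hypothetical.

```python
from typing import Tuple


def encode_pair(a: str, b: str) -> str:
    """Self-delimiting encoding of the pair of bit strings (a, b)."""
    length = format(len(a), "b")                          # |a| in binary, ~log|a| bits
    header = "".join(bit * 2 for bit in length) + "01"    # doubled bits, then end marker
    return header + a + b


def decode_pair(code: str) -> Tuple[str, str]:
    """Invert encode_pair: read the doubled-bit length header, then split."""
    i, length_bits = 0, []
    while code[i:i + 2] != "01":          # "00"/"11" are data, "01" ends the header
        length_bits.append(code[i])
        i += 2
    n = int("".join(length_bits), 2)
    i += 2
    return code[i:i + n], code[i + n:]


def largest_prime_factor(m: int) -> int:
    """Largest prime dividing m (trial division; fine for small m)."""
    p, largest = 2, 1
    while p * p <= m:
        while m % p == 0:
            largest, m = p, m // p
        p += 1
    return max(largest, m) if m > 1 else largest


def prime_index(p: int) -> int:
    """i such that p is the i-th prime (naive count, assumes p is prime)."""
    return sum(1 for q in range(2, p + 1)
               if all(q % d for d in range(2, int(q ** 0.5) + 1)))


if __name__ == "__main__":
    m = 582                                  # 2 * 3 * 97, and 97 is the 25th prime
    p = largest_prime_factor(m)
    a, b = format(prime_index(p), "b"), format(m // p, "b")
    code = encode_pair(a, b)
    print(code, decode_pair(code) == (a, b))
```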

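Since $C(m) \geq \log m$, we have
$$\begin{aligned}
\log m &\leq \log i + \log(m/p_i) + 2\log\log i + c\\
\log m &\leq \log i + \log m - \log p_i + 2\log\log i + c\\
\log p_i &\leq \log i + 2\log\log i + c\\
p_i &\leq 2^c\, i(\log i)^2 = O\big(i(\log i)^2\big)
\end{aligned}$$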

The prime number theorem has $p_i$ approximately $i\log i$, so we get just a log factor off from optimal with simple Kolmogorov complexity.
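A quick numerical look at how far apart the two bounds are, as a Python sketch with hypothetical helper names. It uses natural logarithms; changing the base only affects constant factors, which the O-notation hides.

```python
import math
from typing import List


def first_primes(count: int) -> List[int]:
    """First `count` primes via a sieve of Eratosthenes, doubling the range as needed."""
    limit = 16
    while True:
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, math.isqrt(limit) + 1):
            if sieve[p]:
                # Cross out multiples of p starting at p*p.
                sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
        primes = [i for i, flag in enumerate(sieve) if flag]
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


if __name__ == "__main__":
    primes = first_primes(1000)
    for i in (10, 100, 1000):
        p = primes[i - 1]
        # p_i / (i ln i) tends (slowly) to 1 by the prime number theorem;
        # p_i / (i (ln i)^2) decreases toward 0, the extra log factor.
        print(i, p, p / (i * math.log(i)), p / (i * math.log(i) ** 2))
```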

I wrote a short introduction to Kolmogorov complexity with this proof. I originally got the proof from the great text on Kolmogorov complexity by Li and Vitányi, who give credit to Piotr Berman and John Tromp.

Monday, November 27, 2017

Van der Waerden's theorem implies the infinitude of the primes

(Sam Buss and Denis Hirschfeldt helped me with this post.)

I was reading the table of contents of the American Math Monthly and saw an article by Levent Alpoge entitled

Van der Waerden and the primes

in which he showed from VDW's theorem that the set of primes is infinite. The article is  here and here. My writeup of it is here.  Prof K saw me reading the paper.

 K: I see you are interested in proving the set of primes is infinite from VDW's theorem.

BILL: Yes, who wouldn't be!!!!

 K: Well, lots of people. Including me. Can't you just state VDW's theorem and then give the normal proof? Would that count? Besides, we already have an easy proof that the set of primes is  infinite without using VDW's theorem.

I turn K's comments  into a valid question:  What does it mean to prove A from B if A is already known?

 There are two issues here, informal and formal.

Informally: If you look at the proof of VDW-->primes infinite, the steps in that proof look easier than the usual proof that the set of primes is infinite. And the proof is certainly different. If you read the paper you will see that I am certainly not smuggling in the usual proof. Also, the proof truly does use VDW's theorem.

Formally, one could (and people working in Reverse Mathematics do similar things; see the books Subsystems of Second Order Arithmetic by Simpson and Slicing the Truth, reviewed here) devise a weak axiom system that itself cannot prove the set of primes is infinite, but can prove the implication VDW-->primes infinite. Note that Reverse Mathematics does this sort of thing, but for proofs involving infinite objects, nothing like what I am proposing here.

Open Problem 1: Find a proof system where the implication VDW-->primes infinite can be proven, but primes infinite cannot. Sam Buss pointed out to me that it is not known whether the weak system IΔ0 can prove that the primes are infinite.

Open Problem 2: Find a proof system where you can do both proofs, but the proof of the implication is much shorter. Perhaps look at (VDW--> there are at least n primes) and (there are at least n primes) and look at the length of proof as a function of n.

Open Problem 3: The statement "there are no primes with n bits" (that is, n bits with leading bit 1) can be expressed as a propositional statement. Get lower bounds on the length of its refutation in (say) resolution. (A commenter pointed out an error in a prior version of this one, so be wary; there may be an error here as well.)

I am suggesting work on the reverse mathematics of systems much weaker than RCA0. I do not know if this is a paper, a PhD thesis, a career, a dead end, or already pretty much done, though I am not aware of such work.

Monday, November 20, 2017

The Grad Student Tax

By now you've read from Luca or Scott or PhD Comics or a variety of other sources about the dangerous changes to the tax code that passed the US House of Representatives last week. Among a number of university-unfriendly policies, the bill would eliminate the tax exemption for graduate student tuition for students supported with teaching or research duties, which covers nearly every PhD student in STEM fields. The CRA, ACM, IEEE, AAAI, SIAM and Usenix put out a joint statement opposing this tax increase on graduate students. This is real.

Without other changes, a tax on tuition will make grad school unaffordable to most doctoral students. In computer science, where potential PhD students can typically get lucrative jobs in industry, we'll certainly see a precipitous drop in those who choose to continue their studies. Universities will have to adjust by lowering tuition, if finances and state law allow, and raising stipends. US government science funding will at best remain flat, so in almost any scenario we'll see far fewer students pursue PhD degrees, particularly in CS and other STEM fields. Keep in mind we already don't come close to producing enough CS PhDs entering academia to meet the dramatically growing demand, and these moves could frustrate faculty, who might also head off to industry.

The current Senate proposal leaves the exemption in place, though no one can predict what will happen when the two bills get reconciled. In the best-case scenario this bill goes the same way as the failed health care reform, but Republicans seem desperate to pass something major this fall. So reach out to your representatives, especially your senators, and express the need to leave the exemption in.

Thursday, November 16, 2017

A Tale of Three Rankings

In the spring of 2018, US News and World Report should release its latest rankings of US graduate science programs, including computer science. These are the most cited of the deluge of computer science rankings out there. The US News rankings have a long history, and since they are reputation-based they roughly correspond to how we see CS departments, though some argue that reputation responds only slowly to changes in the quality of a department.

US News and World Report also has a new global ranking of CS departments. The US doesn't fare that well on the list, and the rankings of US programs on the global list are wildly inconsistent with the US list. What's going on?

75% of the global ranking is based on statistics from Web of Science. Web of Science mainly captures journal articles, whereas in computer science conferences typically have a higher reputation and are more selective. At many European and Asian universities, hiring and promotion often depend heavily on publications and citations in Web of Science, encouraging their professors to publish in journals and thus leading to higher rankings for international departments.

The CRA rightly put out a statement urging the CS community to ignore the global rankings, though I wish they had distinguished between the two different US News rankings.

I've never been a fan of using metrics to rank CS departments, but there is a relatively new site, Emery Berger's Computer Science Rankings, based on the number of publications in major venues. CS Rankings passes the smell test for both its US and global lists and is relatively consistent with the US News reputation-based CS graduate rankings.

Nevertheless, I hope CS Rankings will not become the main ranking system for CS departments. Departments that wish to raise their ranking would hire faculty based mainly on their ability to publish large numbers of papers in major conferences. Professors and students would then focus on quantity of papers, and in the long run this would discourage risky, long-range research, as well as innovations in improving diversity or educating graduate students.

As Goodhart's Law states, "when a measure becomes a target, it ceases to be a good measure." Paradoxically, CS Rankings can give good rankings of CS departments only as long as we don't treat it as a target.