Thursday, November 20, 2014

A November to Remember

The Imitation Game starring Benedict Cumberbatch as Alan Turing opens in the US on November 28. If you read this blog you should see that movie. If one challenged British scientist biograph is not enough for you, The Theory of Everything with Eddie Redmayne as Stephen Hawking opened earlier this month. Also this month has the physics-laden Interstellar and the nerd-hero robot adventure Big Hero 6. Science rules the box office, though likely to be clobbered by a Mockingjay.

Speaking of Turing, The ACM Turing Award will now come with a $1 million dollar prize, up from $250K, now on par with the Nobel Prize. Thanks Google.

Madhu Sudan will receive the 2014 Infosys Prize in Mathematics. Nimrod Megiddo wins the 2014 John von Neumann Theory Prize.

Harvard Computer Science gets 12 endowed faculty lines from former Microsoft CEO Steve Ballmer. That's about the number of former Microsoft Research theorists now on the job market. Just saying,

Sanjeev Arora asks for comments on the potential changes to STOC/FOCS discussed at the recent FOCS. Boaz Barak has set up a new CS Theory jobs site. On that note, the November CRA News has 75 pages of faculty job ads, up from 50 a year ago.

Terry Tao talked twin, cousin and sexy primes on Colbert last week. The new result he quoted is that the Generalized Elliott-Halberstam conjecture implies that there are infinitely many pairs of primes at most six apart.

Not all happy news as we lost the great mathematician Alexander Grothendieck. Tributes by Luca and Ken.

Monday, November 17, 2014

A Fly on the wall for a Harvard Faculty meeting: Not interesting for Gossip but interesting for a more important reason

I was recently in Boston for Mikefest (which Lance talked about  here)  and found time to talk to  my adviser Harry Lewis at Harvard (adviser? Gee, I already finished my PhD. Former-Advisor? that doesn't quite sound right. I'll stick with Adviser, kind of like when they refer to Romney as Governor Romney, or Palin as half-governor Palin). He also invited me to goto a Harvard Faculty meeting.

NO, I didn't see anything worth gossiping about. NO I am not going to quote Kissinger ``Academic battles are so fierce because the stakes were so low'' NO I am not going to say that under the veneer or cordiality
you could tell there were deep seated tensions. It was all very civilized. Plus there was a free lunch.

The topic was (roughly) which courses count in which categories incomputer science for which requirements.  Why is this interesting? (hmmm- IS it interesting? You'd prob rather hear that Harry Lewis stabbed Les Valiant with a fork in a dispute about whether there should be an ugrad learning theory course). Because ALL Comp sci depts face these problems.  At Mikefest and other conferences I've heard the following issues brought up:

Should CS become a service department?  Math did that with Calculus many years ago. PRO: They get to tell the dean `we need to hire more tenure track faculty to teach calculus', CON: They have to have their tenure track faculty teach calculus. (I know its more complicated than that.)

What should a non-majors course have in it?

What should CS1,CS2,CS3 have in it (often misconstrued by the question ``what is a good first language'' which misses the point of what you are trying to accomplish). For that matter, should it be a 3-long intro sequence (it is at UMCP).

Can our majors take the non-majors courses?  (At UMCP our non majors course on web design has material in it that is NOT in any majors course.)

When new courses come about (comp-bio, programming hand-held devices, Computational flavor-of-the-month) what categories to they fit into? (For an argument in favor of Machine Learning see Daume's post. ) What should the categories be anyway? And what about the functors?

Which courses were at one point important but aren't any more? UMCP no longer requires a Hardware course-- is that bad?  (Yes- when I tell my students that PARITY can't be solved by a constant depth poly sized circuit, they don't know what a circuit is!)

I don't have strong opinions to any of these questions (except that, despite my best efforts, we do not require all students to learn Ramsey Theory), but I note that all depts face these questions (or need to- I wonder if some depts are still teaching FORTRAN and COBOL- and even that's not quite a bad thing since there is so much legacy code out there.)

I have this notion (perhaps `grass is always greener on the other side') that MATH (and most other majors) don't have these problems.  AT UMCP there have only been TWO new math courses introduced on the ugrad level since 1985 :Crypto (which is cross-listed with CS), and Chaos Theory. CS has new courses, new emphasis, new requirements every few years. Oddly enough when I tell this to Math Profs they ENVY that we CAN change our courses so much. What is better chaos or stability?

When I saw Back to the future 2   in 1989 I noticed that their depiction of academic computer science in 2015 was that  Comp Sci Depts across the country agreed on what was important and be similar (as I imagine math is). Instead the opposite has happened- these things are still in flux. (If you can't trust a Science Fiction movie staring Michael J Fox what can you trust?) As a sign of that, the advanced GRE in CS never really worked and has now been discontinued.

So- will CS settle down by 2015? We still have a year to go, but I doubt it.  2025? Before P vs NP is solved?

and is it OKAY if it doesn't?


Thursday, November 13, 2014

From Homework Solution to Research Paper

Inspired by the Dantzig Story  I occasionally put an open problem on a class assignment. Never worked, though I did have a student get a research paper from solving a homework question the hard way.

Teaching in the early 90's, I showed Valiant's proof that computing the permanent of a 0-1 matrix was #P-complete, including showing that the 0-1 permanent was in #P, the class of functions representable as the number of accepting paths of a nondeterministic polynomial-time Turing machine.

I gave a homework assignment to show that the permanent of a matrix with non-negative integer entries was in #P. The answer I expected was to construct an appropriate NP machine whose number of accepting paths equalled the permanent and some students came up with such a proof.

One of the students Viktória Zankó took a different approach, creating a reduction that mapped an integer matrix A to a 0-1 matrix B such that permanent(A) = permanent(B). A fine solution reducing the problem to a previously solved case.

So what's the rub? Such a reduction was an open problem and simplified Valiant's paper. Valiant only had the reduction for integer matrices A with small entries and needed a mod trick to show the 0-1 permanent is #P-complete. Zankó's construction eliminated the need for the mod trick.

And that's how Viktória Zankó got a research paper from solving a homework problem.

Tuesday, November 11, 2014

Non controversial thoughts on rankings

US News has a ranking of CS depts and various subcategories. Recently MohammadTaghi Hajiaghay and Luca Trevisan have suggested alternative rankings here (Moh) and here (Luca). These rankings inspire me to record some thoughts about rankings.

  1. When making a ranking one must ask: What is it for? For Academic depts it may be to help guide potential ugrads or grads in their choice of where to go. Rankings of the most influential people of all time (Michael Harts book here), or in a given year (Time magazine does this) are made to (I think) organize our thoughts and start debates. Michael Hart also did a book about the most influential people as soon from the year 3000 (so half are fictional) as a way to speculate (see here). My own ranking of Bob Dylan satires here was done for my own amusement.
  2. Transparency sounds like a plus. But if a ranking is too transparent, and is considered important, than organizations might game the system. Recall Goodhart's law: When a measure becomes a target is ceases to be a measure. On the other hand, if the measure really is good then it may be okay if it becomes a target. Some measures are hard to game- like surveys of what people think.
  3. There have been a variety of rankings of presidents (see here).  These ranking say something about the times they were  done. Studying how they change over time could itself be a historical project of interest.  Another thought:  the book Hail to the chiefs it notes that James Buchanan and Andrew Johnson usually rank as the worst presidents, while Lincoln ranks as one of the best--- but this is unfair!--- Buchanan could not stop the civil war (but nobody really could) and Johnson had to clean up the mess after it (but nobody really could). The Lincoln presidency was almost entirely the civil war which Ameican won, so he gets the credit. More to the point--- ranking presidents is odd since it may depend very much on the times they govern.
  4. Bill James (KEY Baseball statistician who I think should go into the Hall of Fame for changing the way we think about Baseball) has tried to have lists of GREAT TEAMS. But there is a problem (which he fully notes)- some teams are GREAT in terms of having great players, but didn't win the world series, or have only one  pennant. Less than the sum of its parts.
  5. Numerical ratings may be odd in that they lump different items together. GPA is a bit odd--- do you prefer a student who got an A in Theory and a C in Operations Systems, or a student who got a B in both? I don't know the answer, but GPA wipes out the distinction.
  6. Rankings that compare very unlike objects are useless. Here is a ranking of CS blogs--- the criteria seems to be just one guys opinion. I  disagree with his ranking, but  I have no idea what he cares about. Also, he includes Harry Lewis's fine blog BITS AND PIECES,  which is often  about academic stuff, and also Terry Tao's fine blog WHATS NEW which is really a math blog. Very hard to compare those two to each other or to others.
  7.  The tigher the focus the more useful a ranking can be. Ranking the best novelty songs of all time would be impossible (Number one is PDQ Bach's Classical Rap) but if you restrict to, say, best science fiction  Satires I(Luke Ski's Grease Wars part 1, part 2, Part 3)- then its easier (Trivia note- Science fiction satire songs are often called FILK SONGS--- the urban legend is that at an early Science Fiction Convention  Science fiction Folk Songs was mispelled as Science fiction Filk Songs and hence the term was born.)
  8. SO, what really would help potential CS Grad Students in theory? Perhaps a grid where for every department is listed the theory faculty, and for each one the number of pubs in top tier confs, second tier confs, and journals in the last 5 years, and their area, and a pointer to their website. Then RESIST the urge to boil it down to one number.
  9. I"m reminded of being on the Grad Admissions committee. I get to look at the transcript (much more informative than the GPA), letters, possibly papers. Fortunately I don't have to boil it down to one number--- there are very few categories (accept, wait list of some sort, reject, but there can be a few others involving scholarships, but VERY few categories really).
  10. Finding what you want: I think that  Raiders of the lost ark has tone of the best ending-of-a-movie ever.  So I Googled best movie ending and  variants of it, and alas, Raiders did not do well. One of the rankings didn't have it in the top 50. So I then Googled best movie endings Raiders of the lost ark and I found a ranking that had Raider in the top 10. Yeah! But this is all silly- I found some person who agrees with me.
  11. Steve Skienna and Charlie Ward have written a book Who's bigger: Where historical figures really rank   which has a transparent and reasonable  way to measure... not clear. Probably fame.  For a review see here

Saturday, November 08, 2014

George Dantzig >= 100

We celebrate the 100th anniversary of the birth of George Dantzig today. In his obituary post we talked about his work on optimization, particularly the simplex method for solving linear programs.

For this centenary let's recall the urban legend of the famous mathematician (I heard it as John von Neumann) who as a student wasn't paying close attention in class. The professor wrote down four of the major open problems in the field. von Neumann wrote them down thinking they were homework problems. The next day he went back to the professor, ashamed that he could only solve two of them.

What does this have to do with Dantzig? Turns out he is the true source of the legend. From Dantzig's Washington Post obituary:
In 1939, Dantzig resumed his studies at the University of California at Berkeley, studying statistics under mathematician Jerzy Neyman. An incident during his first year at Berkeley became a math-world legend.
As Dr. Dantzig recalled years later, he arrived late for class one day and saw two problems on the blackboard that he assumed were homework assignments. He copied them down, took them home and solved them after a few days. "The problems seemed to be a little harder to do than usual," he said.
On a Sunday morning six weeks later, an excited Neyman banged on his student's front door, eager to tell him that the homework problems he had solved were two of the most famous unsolved problems in statistics. 
"That was the first inkling I had that there was anything special about them," Dr. Dantzig recalled.

Wednesday, November 05, 2014

Favorite Theorems: Circuit Lower Bounds

My long time blog readers should have no surprise on my final favorite theorem of 2005-2014.
Nonuniform ACC Circuit Lower Bounds by Ryan Williams (PDF)
We saw several exciting circuit lower bound results in the 80's, highlighted heavily in my first list of favorite theorems (1985-1994 sections 2.2 and 2.3). Progress happened so fast that many expected a proof that an NP-complete problem didn't have polynomial-size circuits, and thus P ≠ NP, was just around the corner. But after 1987 that progress came to a sudden stop. We saw some lower bounds for algebraic circuits or circuits of very small depth but no general lower bounds until the work of Ryan Williams.

For years I would point out that our limited knowledge of lower bounds allowed that possibility that NEXP could be computed by constant depth circuits with Mod6 gates. Williams eliminated that possibility for constant depth circuits with Modk gates for any k.

One could take the techniques and results that Williams uses in his paper and build a whole graduate computational complexity course on them. Dick Lipton did exactly that at Georgia Tech a couple years back.

Ryan Williams' greatest insight though was to find non-trivial satisfiability algorithms for ACC0 circuits and use them to give lower bounds for NEXP. Recently Ryan has turned that process around, for example getting a faster algorithm for all-pairs shortest path using the techniques from the Razborov-Smolensky circuit lower bounds. Ryan's publication page has several new results giving algorithms from lower bounds.

Monday, November 03, 2014

A few more notes about Sipser and Sipser-60th


While Lance was AT Mikefest (Sipser's 60th Bday conference), helping to organize it, emceeing the personal statements, I was... also there.
A few notes

  1. Aside from his graying hair, Mike still looks like a teenager.
  2. In 1985 Mike was one of the few people who thought P=BPP. He wrote a paper about how a certain kind of expander implies what we would now call a hard vs randomness result and presented it at the 1986 CCC (called STRUCTURES then). After the Nisan-Wigderson Hard vs Rand results most everyone thought P=BPP. But Mike was there first.
  3. I took his grad complexity class in the early 1980's. I remember him proving results that either he or someone else had JUST proven the last week. He did a good job too!  What struck me then and now is how vibrant CS is as a field that material taught LAST WEEK can be in an INTRO grad course (that's not as true anymore).
  4. After PARITY not in AC_0, and the monotone circuit results, Sipser and others were OPTIMISTIC that P vs NP would be solved ``soon''.  Oh, to be in a field in the early days when people were optimistic. But see next point.
  5. Mike claims he STILL thinks P vs NP will be solved in 20 years. I don't quite know if he REALLY thinks this or wants to make the point that we should be optmistic. Similar to Lipton thinking P=NP--- does he really think that or does he want to make the point that we shouldn't all be so sure of ourselves?
And two non-sipers notes (sort of) from Mikefest

  1.  Steve Rudich told me that I misquoted him in a blog post and people often say `Steve, do you really think we are 6 months from an independence result'. I am not surprised that I made a MISTAKE in a blog post. I am surprised that people read it, remembered it, and asked him about it. In any case I have edited that post to SAY it was a mistake and I re-iterate it now: STEVE RUDICH DOES NOT THINK WE ARE SIX MONTHS AWAY FROM AN IND PROOF FOR P VS NP.
  2. I spoke to Mauricio Karchmer who, with Avi W and others, had an approach to P vs NC^1 via comm. comp which at the time I thought was very promising--- since we really can prove things in comm. comp. Alas it still has not panned out. However, Mauricio  now thinks that (1) We can't prove lower bounds because they are false, and (2) we can't prove upper bounds because we are stupid.

Thursday, October 30, 2014

Metrics in Academics

Congratulations to the San Francisco Giants, winning the World Series last night. In honor of their victory let's talk metrics. Baseball has truly embraced metrics as evidenced in the book and movie Moneyball about focusing on statistics to choose which players to trade for. This year we saw a dramatic increase in the infield shift, the process of moving the infielders to different locations for each batter based on where they hit the ball, all based on statistics.

Metrics work in baseball because we do have lots of statistics, but also an objective goal of winning games and ultimately the World Series. You can use machine learning techniques to predict the effects of certain players and positions and the metrics can drive your decisions.

In the academic world we certainly have our own statistics, publications counts and citations, grant income, teaching evaluation scores, sizes of classes and majors, number of faculty and much more. We certainly draw useful information from these values and they feed into the decisions of hiring and promotion and evaluation of departments and disciplines. But I don't like making decisions solely based on metrics, because we don't have an objective outcome.

What does it mean to be a great computer scientist? It's not just a number, not necessarily the person with a large number of citations or a high h-index, or the one who brings in huge grants, or the one with high teaching scores, or whose students gets high paying jobs. It's a much more subjective measure, the person who has a great impact. in the many various ways one can have an impact. It's why faculty applications require recommendation letters. It's why we have faculty recruiting and P&T committees, instead of just punching in a formula. It's why we have outside review committees that review departments and degrees, and peer review of grant proposals.

As you might have guessed this post is motivated by attempts to rank departments based on metrics, such as described in the controversial guest post last week or by Mitzenmacher. There are so many rankings based on metrics, you just need to find one that makes you look good. But metric-based rankings have many problems, most importantly they can't capture the subjective measure of greatness and people will disagree on which metric to use. If a ranking takes hold, you may optimize to the metric instead of to the real goals, a bad allocation of resources.

I prefer the US News & World report approach to ranking CS Departments, which are based heavily on surveys filled out by department and graduate committee chairs. For the subareas, it would be better to have, for example, theory people rank the theory groups but I still prefer the subjective approach.

In the end, the value of a program is its reputation, for a strong reputation is what attracts faculty and students. Reputation-based rankings can best capture the relative strengths of academic departments in what really matters.