Thursday, October 30, 2014

Metrics in Academics

Congratulations to the San Francisco Giants on winning the World Series last night. In honor of their victory let's talk metrics. Baseball has truly embraced metrics, as evidenced by the book and movie Moneyball, about using statistics to decide which players to trade for. This year we saw a dramatic increase in the infield shift, the practice of moving the infielders to different locations for each batter based on where they tend to hit the ball, all driven by statistics.

Metrics work in baseball because we have not only lots of statistics but also an objective goal: winning games and ultimately the World Series. You can use machine learning techniques to predict the effects of particular players and positions, and the metrics can drive your decisions.

In the academic world we certainly have our own statistics: publication counts and citations, grant income, teaching evaluation scores, sizes of classes and majors, number of faculty and much more. We certainly draw useful information from these values, and they feed into decisions on hiring, promotion and the evaluation of departments and disciplines. But I don't like making decisions solely based on metrics, because we don't have an objective outcome.

What does it mean to be a great computer scientist? It's not just a number, not necessarily the person with a large number of citations or a high h-index, or the one who brings in huge grants, or the one with high teaching scores, or whose students get high-paying jobs. It's a much more subjective measure: the person who has a great impact, in the many various ways one can have an impact. It's why faculty applications require recommendation letters. It's why we have faculty recruiting and P&T committees, instead of just punching in a formula. It's why we have outside review committees that review departments and degrees, and peer review of grant proposals.

As you might have guessed, this post is motivated by attempts to rank departments based on metrics, such as those described in the controversial guest post last week or by Mitzenmacher. There are so many rankings based on metrics that you just need to find one that makes you look good. But metric-based rankings have many problems, most importantly that they can't capture the subjective measure of greatness, and people will disagree on which metric to use. If a ranking takes hold, you may optimize to the metric instead of to the real goals, a bad allocation of resources.

I prefer the US News & World Report approach to ranking CS departments, which is based heavily on surveys filled out by department and graduate committee chairs. For the subareas it would be better to have, for example, theory people rank the theory groups, but I still prefer the subjective approach.

In the end, the value of a program is its reputation, for a strong reputation is what attracts faculty and students. Reputation-based rankings can best capture the relative strengths of academic departments in what really matters.

Tuesday, October 28, 2014

Sipser Symposium



On Sunday we had the Symposium on Theoretical Computer Science on the Occasion of Michael Sipser's 60th birthday to celebrate what Mike has brought to research (seminal work on the complexity of randomness and circuits), service (ten years leading the MIT math department before recently becoming Dean of Science) and education (his great textbook and the corresponding popular course he still teaches).

We had an incredible turnout for the symposium and banquet that followed. I counted five Turing Award Winners (Mike's advisor Manuel Blum, Leslie Valiant, Shafi Goldwasser, Silvio Micali and Ron Rivest), five of the nine Nevanlinna Prize winners (Mike's student Daniel Spielman, Avi Wigderson, Valiant, Peter Shor and Madhu Sudan) and nine Gödel Prize winners.

Manuel Blum talked about his work with Jeremiah Blocki, Anupam Datta, and Santosh Vempala about a human computable hash function to create different passwords for every website. I've seen Manuel and Santosh produce passwords from arbitrary words and I'm impressed.

Johan Håstad recounted the early days of circuit complexity. Avi also talked about lower bounds. Shafi talked on her great paper with Mike showing public and private coin interactive proofs have the same power and a recent application of that result by Moni Naor et al showing how a defendant can convince the police his DNA doesn't match without revealing his DNA.

A number of Mike's PhD students also gave talks. Dan Spielman gave a framework for designing quantum algorithms without knowing much quantum.

The banquet included several stories, thanks and toasts. Nearly all of Mike's students participated in some way, a great collection of men and women I'm proud to be part of: David Barrington, Ravi Boppana, Jonathan Buss, Andrew Chou, Aditi Dhagat, Lance Fortnow, David Gillman, Michelangelo Grigni, Christos Kapoutsis, Marcos Kiwi, Mary (Bruzzese) O'Connor, Sofya Raskhodnikova, Zack Remscrim (current), Alexander Russell, Leonard Schulman, Daniel Spielman, Ravi Sundaram, Andrew Sutherland and Yiqun Yin.

Thursday, October 23, 2014

Guest Post by Dr. Hajiaghayi: A new way to rank departments

(This is a guest post by MohammadTaghi Hajiaghayi. His name is not a typo- the first name really is MohammadTaghi.)

Because we believe that the methods used by U.S. News to rank CS departments in theoretical computer science (and in general) lack transparency and well-defined measures, my Ph.D. student Saeed Seddighin and I have worked for several months to provide a ranking based on a real and measurable method: the number of papers in TCS, for the top 50 US universities. To make this possible, we gathered the information about universities from various resources. You may see the ranking and our exact methodology here.

Indeed, we have some initial rankings based on similar measures for computer science in general as well, which we plan to release soon (we are still in the process of double-checking or even triple-checking our data and our analysis due to several factors). The CS theory ranking is our initial release, meant to get feedback at this point.

Please feel free to give us feedback (hajiagha@cs.umd.edu).

Wednesday, October 22, 2014

MSR SVC Letters

The Committee for the Advancement of Theoretical Computer Science put together an open letter to several research leaders at Microsoft.
We feel that there should have been a better way to close down this lab, one that would have allowed them to have continuous employment until academic jobs are available again in September 2015. Given that this lab was continuing to produce exceptional — indeed revolutionary — research, we fail to understand why closing it had to be done so suddenly.
I recommend reading the whole letter. Many in the theory community and beyond, including myself, signed the original letter or added their support in the comments.

Yesterday Harry Shum, Microsoft Executive VP for Technology and Research, sent a reply.
No one at Microsoft feels good about the fact that a significant number of our friends and colleagues were laid off.  These people contributed to the success of Microsoft over many years.  As one can readily imagine, the decisions made about how the cuts were implemented within MSR were extremely complicated and personally painful.  We feel with you the sense of loss that has been evoked by the closing of our Silicon Valley lab. 
Theory Matters has a cover email. More on the CRA Policy Blog.

I'd like to thank the CATCS leadership for putting together the letter, which expressed well the thoughts and concerns of the community. The tone of the letter hit the right notes and really made Microsoft research leadership think about the important role the company plays in the larger computer science academic world.

Tuesday, October 21, 2014

Martin Gardner Centennial

Martin Gardner was born on October 21, 1914, so today is his Centennial (he died on May 22, 2010, at the age of 95). We've mentioned him in the blog before:

  1.  The Life of Martin Gardner
  2.  Contribute to the Gardner Centennial
  3.  Another Post on Martin Gardner
  4. I used the anagram Tim Andrer Gran in both my review of the Lipton-Regan book (see here) and my Applications of Ramsey Theory to History paper (see here)

So what can I add on his centennial?

  1. He was not the first person to write on recreational mathematics, but he was certainly early and did it for a long time.
  2. I suspect he influenced everyone reading this who is over 50. For every y who is under 50 and reading this column, there exists x such that MG influenced x and x influenced y.
  3. The line between ``recreational'' and ``serious'' math is sometimes blurry or hard to see. An obvious case of this was Euler and the Bridges problem leading to graph theory. At one time solving equations was done for competition, which seems recreational. Galois theory is not recreational. 
  4. Donald Knuth's book Selected Papers on Discrete Mathematics (reviewed by me here) states ``I've never been able to see the boundary between scientific research and game playing.''
  5. I am reading a book, Martin Gardner in the 21st Century, which collects papers by people who were inspired by him. The papers really do blur the distinction between recreational and serious. Some are rather difficult, but all start out with a fun problem.
  6. Aside from recreational math he did other things: magic, and debunking bad science. (Fads and Fallacies in the Name of Science was excellent.) He was a well-rounded person, which is rare now.
  7. Brian Hayes and Ian Stewart and others do what he did, but given the times we live in now, it's hard to capture the attention of a large segment of the public (analogous to how, when I was a kid, there were only a handful of TV stations; now there are... too many?).
  8. When I was in high school I went to the library looking for math books I could read (naive?). I found one of his books (a collection of his columns) and began reading it. I learned about casting out nines, and I learned what was to be the first theorem I ever learned a proof of outside of class (given that I was probably 12 it may be the first proof I learned ever). It was that (in today's language) a connected graph is Eulerian iff every vertex has even degree.

Thursday, October 16, 2014

The Curious Case of NP and NEXP

NP (nondeterministic polynomial time) and NEXP (nondeterministic exponential time) are provably different classes by the nondeterministic time hierarchy. No surprise, given exponentially more time we expect to solve more problems. But the proof requires collapses at many input lengths and odd things happen when we look at the infinitely-often question.

We say a language L is in i.o.-C for a complexity class C if there is an A in C such that for infinitely many n, A and L agree on strings of length n (for all x of length n, x is in A if and only if x is in L). Straightforward diagonalization shows that EXP is not in i.o.-P.
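
In symbols, this is just the definition above restated in quantifier form:

   L \in i.o.-C  iff  (\exists A \in C)(\exists^\infty n)(\forall x \in \{0,1\}^n)[ x \in A \leftrightarrow x \in L ],

where \exists^\infty n means ``there exist infinitely many n.''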

However, in a recent paper we showed a relativized world where NEXP is in i.o.-NP (Theorem 18).
The construction is not particularly difficult but the result is rather surprising: relative to some oracle, one can get an exponential improvement for nondeterministic computation on infinitely many input lengths.

Also consider the following facts: NEXP is not in i.o.-co-NP by straight diagonalization. If NEXP is in i.o.-NP then
  • NEXP is in i.o.-EXP
  • EXP is in i.o.-NP and thus EXP is in i.o.-co-NP (since EXP is closed under complement).
This is not an immediate contradiction since i.o. inclusion is not transitive, even though those i.o.'s happen at about the same length. You can't combine these relativizable statements to prove NEXP is not in i.o.-NP.

NEXP in i.o.-NP is not one of my most complicated relativization results, but definitely one of the strangest.

Monday, October 13, 2014

Luddite or not?

My first ever guest post for Lance was on Are you a luddite? I certainly am to some extent a luddite, but there are some things where it's not clear if they are luddite-ish or not.

  1. I prefer reading books to blogs. This came up when I reviewed both the Lipton and the Lipton-Regan blog-books, and I am now reading some of Terry Tao's blog book. I look forward to reading Scott's blog book. At first I thought that preferring books was luddite-ish. But some high-tech people and some young people who I've asked AGREE with me. Why is this?
  •  When reading a blog (or doing anything online) it's so easy to get distracted, e.g. OH, I WONDER IF WHITEY FORD IS STILL ALIVE SO I'LL PUT HIM ON MY LIST OF LIVING FAMOUS PEOPLE OVER 80 (he is, he's 85, and has the same birthday (though not year) as Martin Gardner), OH, I wonder if the word Buypartisan (that is NOT misspelled) is on my list-of-cool-new-words that I keep, OH, I wonder how many people have registered for Theory Day. OH, Lipton just posted about definitions not being needed and used that quote from The Treasure of the Sierra Madre (see here) that was satirized in the movie UHF, I wonder if that clip is on YouTube (it is, here). OH, I can write a blog post about math in Weird Al songs, for example Polka Patterns.
  • If I read a blog with a proof in it I tend to say `I'll read that later.'
  •  I work better with pen and paper on hand. This may change if the way to mark up pdf and other documents gets better.
  1. (I do not know why it restarted at number 1. I don't care to fix it- is that Luddite or just not wanting to waste time on something unimportant?)
  2. Of course, the blog reading issue  is MY fault for being distracted.
  3. I don't pay my bills online. There have been many data breaches and that makes darling and me nervous. Is this Luddite? Not sure--- is banking offline any safer? I ask non-rhetorically.
  4. In a small class I use the blackboard. Some of my systems faculty have gone from board to slides and then back to board.  For a big class I have to use slides, though that may be an argument for small classes.

BILL: When I go to that conference I am going to bring some math to read during the talks I don't understand.

DARLING: Isn't that rude?

BILL: Many people in the audience will have their laptops out, reading email, managing their facebook page, etc.

DARLING: But the speaker can at least imagine they are taking notes.

BILL: Unlikely. In the next talk the speaker will become the laptop person.

The fact that I don't look at a laptop during a talk is probably a plus- and not a Luddite thing.
  1. We still don't have Netflix. We watch less TV this way? Worse TV this way? Not clear how this one goes.
  2. I used to write things out before typing them in, now I type them in directly. I wonder if that's good or bad.
  3. I used to have notebooks of random math stuff in them. Now I can't get myself to write things out by hand. That's probably bad.
  4. If someone asks me a question I am too quick to go to the web rather than try to answer it myself. This is mixed--- I don't waste time on problems I can't solve, but I also don't have the joy of solving them. I think of myself as not being a good problem-solver, but this could be a self-fulfilling prophecy that the web makes easier to indulge in.
  5. This is a DUH-comment- I hate technology that does not work. One of the worst episodes of Star Trek was The Ultimate Computer which showed that a good human is better than a MALFUNCTIONING computer. Well DUH. I had a rant about electronic refereeing - and not a single comment accused me of being a Luddite. In short- I hate technology that doesn't work. Duh.
So- your thoughts? Some Luddite things may not be Luddite, but just better. And technology will change yet again, making us all Luddites.

Thursday, October 09, 2014

2014 Fall Jobs Post

Tis the season for the fall jobs post. Please list any jobs, academic or industrial, in theoretical computer science broadly construed in the comments to this post. If you are a job seeker check this page often as new jobs get added over time.

As always the best places to look for academic CS positions are the job sites at the CRA and the ACM. Also check out postdoc and other opportunities on Theory Announcements. It never hurts to check out the webpages of departments or to contact people to see if positions are available.

I expect the computer science job market to be quite robust again. CS enrollments continue to explode, putting pressure on departments to hire more faculty. Several good places had open positions go unfilled last year.

We should also see growth in instructor positions, as deans and provosts need to meet enrollment demands but aren't ready to commit faculty positions until they are sure CS is not in another bubble. Don't ignore these positions; think of an instructor position as a teaching postdoc.

The market in theoretical computer science is a harder call. What will be the effect of Microsoft adding fifteen theorists to the market? That also suggests fewer industrial research and postdoc opportunities in theory.

Try to make yourself more versatile. Perhaps you could teach machine learning or computer security, areas of strong need closely related to theory.

Good luck to everyone on the market. I look forward to seeing your names in the 2015 spring jobs post.

Monday, October 06, 2014

The Complexity of NIM. Open?

Recall 1-pile NIM:

Let A be a finite set of naturals. NIM(A) is the following game: there are n stones on the board. Players I and II alternate removing a\in A stones. The first player who can't move loses. Note that if 1\in A then `can't move' means that the other player took the last stone. If (say) 2 is the min element of A then it's possible there is 1 stone on the board and a player can't move.


The following are known and easy to prove:
  1. If A={1,L} and L is even then II wins iff n\equiv 0,2,4,...,L-2 mod L+1
  2. If A={1,L,L+1} and L is odd then II wins iff n\equiv 0,2,4,...,L-1 mod 2L+1
  3. If A={1,L,L+1} and L is even then II wins iff n\equiv 0,2,4,...,L-2 mod 2L
  4. If A={L,L+1,...,M} then II wins iff n\equiv 0,1,2,...,L-1 mod L+M
  5. For ANY set A there will be a mod pattern, after a certain point.
(I think that if 1\in A then the mod pattern holds from the beginning, but if 1\notin A then it's possible that it does not start for a while.)

This raises the following computational problem: how hard is it, given a finite set A, to find the mod pattern (the eventual period and which residues are II-wins)? I would want to know the complexity as a function of the size of the representation of A, or possibly just |A|log_2(max element of A). Has this been looked at? Some Google searches and asking around did not yield anything. I'm hoping that asking my readers may yield something.
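
To make the question concrete, here is a rough brute-force sketch in Python (the function names and cutoffs are just illustrative choices, not a serious algorithm): compute by dynamic programming which n are II-wins, then search for an eventual period. Facts 1-4 above can be sanity-checked this way.

def ii_wins(A, limit):
    # ii[n] is True iff the player about to move with n stones loses, i.e., II wins.
    A = sorted(A)
    ii = [False] * (limit + 1)
    for n in range(limit + 1):
        # n is a II-win iff every legal move leads to a mover-wins position,
        # i.e., there is NO a in A with a <= n and ii[n - a] True.
        ii[n] = not any(a <= n and ii[n - a] for a in A)
    return ii

def find_pattern(seq, max_start, max_period):
    # Brute-force search for (start, period) such that seq is periodic from start on.
    # Purely illustrative; assumes seq was computed out far enough.
    for period in range(1, max_period + 1):
        for start in range(max_start + 1):
            if all(seq[i] == seq[i + period] for i in range(start, len(seq) - period)):
                return start, period
    return None

A = {1, 4, 5}   # L = 4 is even, so fact 3 predicts II wins iff n is 0 or 2 mod 8
wins = ii_wins(A, 200)
print([n for n in range(20) if wins[n]])                 # [0, 2, 8, 10, 16, 18]
print(find_pattern(wins, max_start=50, max_period=50))   # (0, 8)

Of course this brute force may need time exponential in the bit-length of the numbers in A before the pattern emerges; the question is whether the pattern can be found much faster than that.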

Thursday, October 02, 2014

Favorite Theorems: Multilinear Circuits

In the past decade we have seen a strong program in algebraic circuit complexity. If you define circuits using only multiplication and addition gates, you can sometimes use the algebraic structure to get lower bounds that remain much harder to prove in the Boolean case. For October's favorite theorem we highlight a couple of Ran Raz's results.

The main result of the first paper is basically in the title. Raz creates a polynomial f that can be computed by a polynomial-size multilinear circuit of depth O(log^2 n) but cannot be computed by any polynomial-size multilinear formula. (A formula, unlike a circuit, cannot use the output of a gate more than once.) His proof works by examining the rank of the partial derivative matrix of a function chosen in a specified random way.
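
For readers who haven't seen it, here is a rough sketch of the partial derivative matrix (the standard high-level picture, not the exact setup of the paper): partition the variables into two halves y_1,...,y_{n/2} and z_1,...,z_{n/2}, index the rows by the multilinear monomials p in the y-variables and the columns by the multilinear monomials q in the z-variables, and set

   M_f[p,q] = the coefficient of the monomial p \cdot q in f.

Roughly speaking, polynomials computed by small multilinear formulas have low-rank M_f under a random partition, while the hard polynomial is constructed so that M_f has (nearly) full rank for almost every partition.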

Ran Raz followed up with Amir Shpilka and Amir Yehudayoff to show lower bounds for syntactically multilinear circuits and exponential bounds for constant-depth multilinear circuits.

The second paper showed that the most well-known algebraic functions, the permanent and the determinant, do not have polynomial-size multilinear formulas. Raz again starts with the partial derivative matrix, this time combined with random restrictions.

More recent work in algebraic circuit complexity focuses on the VP versus VNP problem, the algebraic version of P v NP. VP = VNP if we can compute the permanent with polynomial-size algebraic circuits. Proving VP ≠ VNP is a goal of the Geometric Complexity Theory program as well as a consequence of strong enough lower bounds on constant-depth algebraic circuits.