Thursday, February 19, 2015

And the Winners Are...

[Shortly after this post went up, STOC announced their accepted papers as well]

I always like that the announcement of accepted papers for the Computational Complexity Conference happens around the time of the Academy Awards. These acceptances are the Oscars for our community that shares its name with this conference and the blog.

The title that interests me is Identifying an honest EXP^NP oracle among many by Shuichi Hirahara since it seems closely related to some of my own research. Not only can I not find the paper online, I can't even find the author's email. Shuichi, if you are reading this, please send me a copy of your paper.

Luckily not all papers are so elusive. Murray and Williams show that proving the NP-hardness of computing circuit complexity would require proving real complexity class separation results. Oliveira and Santhanam give tight lower bounds on how much you can compress majority so that you can compute it with constant-depth circuits. A different Oliveira has two papers in the conference: a solo paper showing that polynomials of low individual degree with small low-depth arithmetic circuits have factors that are similarly computable, and a paper with Shpilka and Volk on hitting sets for bounded-depth multilinear formulas. A hitting set is a small, easily and deterministically computable set that contains, for every such arithmetic circuit, an input on which the circuit is non-zero.

There are many more interesting papers, and you can see them all at the conference in Portland, this year part of the Federated Computing Research Conference, which includes STOC, SPAA and EC (which now stands for Economics and Computation). My tip: book your hotels now, they fill up fast.

Tuesday, February 17, 2015

Stephen Colbert, Jon Stewart, Andrew Sullivan, William Gasarch

Stephen Colbert is leaving the Colbert Report

Jon Stewart is leaving the Daily Show

Andrew Sullivan is no longer blogging

Bill Gasarch has resigned as SIGACT News Book Review Editor

Where will Gasarch get his news from?

Where will Colbert-Stewart-Sullivan get their reviews-of-theory-books from?


Why am I stepping down? I've been SIGACT News Book Review Editor for 17 years (just as long as Jon Stewart has been doing The Daily Show). That's enough (more than enough?) time. I want to spend more time with my books and my family. I will be on sabbatical next year so I am generally cutting down on my obligations.

I have enjoyed it, gotten to know some publishers, and gotten more free books than I know what to do with. I haven't paid for a math or CS book in... probably 17 years.

While writing reviews is great, figuring out who reviews other books, getting them the books, getting the review from them, editing it all into a column four times a year, can get to be routine. Though I DO like reading the reviews.

Who will take over? I asked Lance who would be good and he said `someone old who still reads books'- so I asked Fred Green who agreed to take the job. I then had to get my files (of reviews, of who-owes-me-reviews, of which-books-do-I-want-reviewed, etc) in order to email to him. The usual- I wish I had cleaned up my files years ago so I could benefit from it.

The main PLUS of the job was that I got to read lots of books and learn about some fields.  As someone who would rather read a good book rather than produce a bad paper, the job suited me.

The main NEGATIVE of the job was seeing so many books that I WANTED to review leave my office for someone else to review, either because I didn't have time (and I usually KNEW that and had someone else review them) or because I found myself unable to (gee, that book is harder than I thought!).

The biggest change that Fred will encounter is e-books. Will publishers want to send out free e-books instead of hardcopy? Will reviewers want hardcopy? This is of course a very tiny part of a more profound conversation of what will happen to the book market once e-books are more common.


Wednesday, February 11, 2015

Richard Hamming Centenary

Richard Hamming was born a hundred years ago today in Chicago. He worked on the Manhattan Project during World War II, moved to Bell Labs after the war and started working with Claude Shannon and John Tukey. It was there that he wrote his seminal 1950 paper Error detecting and error correcting codes.

Suppose you send a string of bits where a bit might have been flipped during the transmission. You can add an extra parity bit at the end that can be used to detect errors. What if you wanted to correct the error? Richard Hamming developed an error-correcting code (now called the Hamming code) that encodes 4 bits into a 7-bit codeword, giving the 16 codewords below, every two of which differing in at least three bits (a measure we now call the Hamming distance).

0000000 1101001 0101010 1000011

1001100 0100101 1100110 0001111

1110000 0011001 1011010 0110011

0111100 1010101 0010110 1111111

If there is a single error then there is a unique codeword within one bit of the damaged string. By having an error-correcting code you can continue a process instead of just halting when you detect a bad code.
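Here is a minimal sketch of encoding and single-error correction in Python, using one standard convention for the [7,4] code (parity bits in positions 1, 2 and 4; the table above may order its bits differently):

    def encode(d):
        """Encode 4 data bits as a 7-bit Hamming codeword.
        Positions (1-indexed) 1, 2, 4 hold parity bits; 3, 5, 6, 7 hold data."""
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
        p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
        p4 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
        return [p1, p2, d1, p4, d2, d3, d4]

    def correct(received):
        """Find and flip a single flipped bit using the syndrome."""
        c = list(received)
        syndrome = 0
        for pos in range(1, 8):    # XOR together the positions holding a 1
            if c[pos - 1]:
                syndrome ^= pos
        if syndrome:               # nonzero syndrome = position of the error
            c[syndrome - 1] ^= 1
        return c

    word = encode([1, 0, 1, 1])
    word[4] ^= 1                                  # flip one bit in transit
    print(correct(word) == encode([1, 0, 1, 1]))  # True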

The Hamming code is a linear code, the bitwise sum mod 2 of any two codewords is another codeword. This linear idea led to many more sophisticated codes which have had many applications in computer science, practical and theoretical.

Hamming received several awards, notably the 1968 ACM Turing Award and the inaugural IEEE Richard W. Hamming Medal in 1988, given for exceptional contributions to information sciences, systems, and technology. Hamming passed away in 1998.

Sunday, February 08, 2015

Pros and Cons of students being wikiheads

A wikihead is someone who learns things from the web (not necessarily Wikipedia) but either learns things that are not correct or misinterprets them. I've also heard the term webhead but that's ambiguous since it also refers to fans of Spiderman.

I like to end the first lecture of Discrete Math by talking about SAT and asking the students if they think it can be solved much faster than the obvious 2^n algorithm. This  semester in  honors DM I got the usual heuristics (look for a contradiction!) that may well help but certainly won't get down to much better than 2^n in all cases. This leads to nice discussions of worst-case vs avg-case and formal vs what-works-in-practice.
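(For concreteness, the obvious algorithm in question, as a minimal sketch; the encoding of a CNF formula as a list of clauses of signed integers is my own toy convention:)

    from itertools import product

    def brute_force_sat(clauses, n):
        """Try all 2^n assignments. A literal k means variable |k|,
        negated if k < 0."""
        for assignment in product([False, True], repeat=n):
            if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in clauses):
                return assignment      # satisfying assignment found
        return None                    # unsatisfiable

    # (x1 OR x2) AND (NOT x1 OR x2) is satisfied by x2 = True
    print(brute_force_sat([[1, 2], [-1, 2]], 2))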

I also got the following answers:

SAT cannot be done better than 2^n since P  ≠ NP.

and

SAT can be done in O(n) time with a Quantum Computer.

They both made their assertions boldly! I gently corrected them. They had both read it on the web.

I suspect that the P ≠ NP person read something that was correct (perhaps a survey that said 80% of all theorists THINK P ≠ NP) and misconstrued it, while the second person read something that was just wrong (perhaps one of those "by the many-worlds quantum theory a quantum computer can look at all possibilities at the same time" people).

 SO- they went and looked up stuff on their own (YEAH) but didn't quite understand it (BOO)
or read incorrect things (BOO). But I will correct them (YEAH). But there are other people who will never get corrected (BOO). But there are others who will get interested in these things because of the false things they read (YEAH?) The quantum person might either NOT go into quantum computing since he thinks its all bogus now, or GO INTO it since he is now curious about what is really going on.

SO the real question is: if people get excited about math or science for the wrong reasons, is that good? Bad? Do you know of examples where incorrect but exciting science writing led to someone doing real science?





Thursday, February 05, 2015

Computers Turn Good and Evil

Entertainment Weekly proclaimed 2015 the year that artificial intelligence will rule the (movie) world with talk of the Terminator reboot, the new Avengers movie Age of Ultron, where Ultron is an attempt at a good AI robot turned evil, and Chappie who saves the world. And then there is Ex Machina, where Domhnall Gleeson "must conduct a Turing test, the complex analysis measuring a machine’s ability to demonstrate intelligent human behavior, while wrestling with his own conflicted romantic longing for the humanoid."

Let's not forget the return of the Star Wars droids and the hacker movie Blackhat that has already come and gone. On TV we have new computer-based procedurals, one for adults (CSI:Cyber) and one for kids (Buddy: Tech Detective).

With Wired proclaiming AI Has Arrived, and That Really Worries the World’s Brightest Minds and Uber investing in technology to eliminate their drivers, what is the average person to think? Let me quote the famous philosopher Aaron Rodgers and say "Relax". We still control the technology, don't we?

Monday, February 02, 2015

Less elegant proofs that (2n)!/n!n! is an integer

(A conversation between Bill (the writer of this post) and Olivia (the 14-year-old daughter of a friend). All of the math involved is here.)

Bill: Do you know what 10! is?

Olivia: Yes, when I turned 10 I yelled out ``I AM 10!''

Bill: I mean it in a different way. In math 10! is 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2.

Olivia: Okay. So what?

Bill: Do you think that 10!/5!5! is an integer?

Olivia: No but I'll try it. (She does) Well pierce my ears and call me drafty, it is!

Bill: Do you think that 100!/50!50! is an integer?

Olivia: Fool me once shame on you, Fool me twice... uh, uh, We don't get fooled again was a great song by the Who!

Bill: Who? Never mind, I'll save a who's-on-first-type blog for another day.

Olivia: What?

Bill: He's on second, but never mind that. It turns out that (1) 100!/50!50! is an integer, and (2) I can prove it without actually calculating it. (Bill then goes through combinatorics and shows that n!/k!(n-k)! solves a counting problem that must have an integer solution.)

Olivia: You call that a proof?! That's INSANE! You can't just solve a problem that must have an integer solution and turn that into a proof that the answer is an integer. It's unnatural. It is counter to the laws of God and Man!
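(Spelled out, the counting argument Bill waves at is the standard one:

    n!/k!(n-k)! = C(n,k) = the number of k-element subsets of an n-element set,
    (2n)!/n!n!  = C(2n,n) = the number of n-element subsets of a 2n-element set,

and a count of ways is always a whole number, so the fraction is an integer with no division required.)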

Inspired by Olivia I came up with a LESS elegant proof that (2n)!/n!n! is always an integer.  Inspired by that I also came up with a LESS elegant proof that the Catalan numbers are integers. See the link above for all proofs.

But realize- WE  think it is OBVIOUS that (2n)!/n!n! is an integer (and for that matter n!/k!(n-k)! and the Catalan numbers) and we are right about that--- but there was a time when we would have reacted like Olivia.

I've seen 5th graders who were sure there must be a fraction whose square is 2 since (1) I wouldn't have asked them if there wasn't, and (2) the concept of a problem not having a solution was alien to them. In an earlier grade the concept of having a problem whose answer was negative or a fraction was alien to them.

When teaching students and daughters of friends we should be aware that what we call obvious comes from years of exposure that they have not had.

When Olivia turns 15 will she say `I AM 15!' More accurate would be `I am 15!.'

Thursday, January 29, 2015

Reusing Data from Privacy

Vitaly Feldman gave a talk at Georgia Tech earlier this week on his recent paper Preserving Statistical Validity in Adaptive Data Analysis with Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold and Aaron Roth. This work looks at the problem of reuse of the cross-validation data in statistical inference/machine learning using tools from differential privacy.

Many machine learning algorithms have a parameter that specifies the generality of the model, for example the number of clusters in a clustering algorithm. If the model is too simple it cannot capture the full complexity of what it is learning. If the model is too general it may overfit, fitting the vagaries of this particular data too closely.

One way to tune the parameters is by cross-validation, running the algorithm on fresh data to see how well it performs. However if you always cross-validate with the same data you may end up overfitting the cross-validation data.

Feldman's paper shows how to reuse the cross-validation data safely. They show how to get an exponential (in the dimension of the data) number of adaptive uses of the same data without significant degradation. Unfortunately their algorithm takes exponential time but sometimes time is much cheaper than data. They also have an efficient algorithm that allows a quadratic amount of reuse.

The intuition and proof ideas come from differential privacy where one wants to make it hard to infer individual information from multiple database queries. A standard approach is to add some noise in the responses and the same idea is used by the authors in this paper.
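As a toy illustration of that primitive (not the algorithm from the paper; the function and its parameters are my own), here is a statistical query answered with Laplace noise scaled to its sensitivity:

    import numpy as np

    def noisy_mean(data, lo, hi, epsilon):
        """Epsilon-differentially private mean of values clipped to [lo, hi].
        Changing one person's value moves the mean by at most (hi - lo)/n,
        so Laplace noise with scale sensitivity/epsilon hides that change."""
        data = np.clip(np.asarray(data, dtype=float), lo, hi)
        sensitivity = (hi - lo) / len(data)
        return data.mean() + np.random.laplace(0.0, sensitivity / epsilon)

    print(noisy_mean([0.2, 0.4, 0.9, 0.5], lo=0.0, hi=1.0, epsilon=0.5))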

All of the above is pretty simplified and you should read the paper for details. This is one of my favorite kinds of paper where ideas developed for one domain (differential privacy) have surprising applications in a seemingly different one (cross-validation).


Monday, January 26, 2015

A nice problem from a Romanian Math Problem Book


(All of the math for this problem is here)
My Discrete Math Honors TA Ioana (she is Romanian) showed me a Romanian math problem book and told me the following problem:


(All a_i in this post are assumed to be natural numbers.)

Show that for all n ≥ 6 there exists (a_1,...,a_n) such that 1/a_1^2 + ... + 1/a_n^2 = 1.

(sum of reciprocals squared)

Normally my curiosity exceeds my ego and I would look up the answer. But it was in Romanian! Normally I would ask her to read the answer to me. But I was going out of town! Normally I would look up the answer on the web. But this is not the kind of thing the web is good at!

So I did the obvious thing: worked on it while watching the first four episodes of Homeland Season 2. And I solved it! Either try to solve it yourself OR go to the link.
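(For those who want a hint rather than the full writeup: one standard construction, not necessarily the one in the Romanian book or at the link, handles three base cases and then steps by 3 using the identity 1/a^2 = 4/(2a)^2:

    n=6: 1/2^2 + 1/2^2 + 1/2^2 + 1/3^2 + 1/3^2 + 1/6^2 = (27 + 8 + 1)/36 = 1
    n=7: 3*(1/2^2) + 4*(1/4^2) = 3/4 + 1/4 = 1
    n=8: 2*(1/2^2) + 4*(1/3^2) + 2*(1/6^2) = 1/2 + 4/9 + 1/18 = 1

Replacing any term 1/a^2 in a solution with n terms by four copies of 1/(2a)^2 gives a solution with n+3 terms, so every n ≥ 6 is covered.)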

Some possibly open questions come out of this

1) I also prove that for all k there is an n_0 = n_0(k) such that for all n ≥ n_0 there exists (a_1,...,a_n) such that 1/a_1^k + ... + 1/a_n^k = 1.


(sum of reciprocal kth powers)

We showed above that n_0(2) ≤ 6; it's easy to show n_0(2) ≥ 6, so n_0(2) = 6.

Obtain upper and lower bounds on n_0(k).

2) What is the complexity of the following problem:

Given k, n, find out if there exists (a_1,...,a_n) such that 1/a_1^k + ... + 1/a_n^k = 1.

If so then find the values (a1,...,an).

(We know of an example where the Greedy method does not work.)

3) What is the complexity of the following problem: Just as above but now we want to know HOW MANY solutions.

4) Meta question: How hard are these questions? The original one was on the level of a high school or college math competition. The rest might be easy or hard. I suspect that getting an exact formula for n_0(k) is hard. I also suspect that proving that this is hard will be hard.


Thursday, January 22, 2015

There Should be an Algorithm

My high school daughter Molly was reading her Kindle and said "You know how you can choose a word and the Kindle will give you a definition. There should be an algorithm that chooses the right definition to display depending on the context". She was reading a book that took place in the 60's that referred to a meter man. This was not, as the first definition of "meter" would indicate, a 39 inch tall male. A meter man is the person who records the electric or gas meter at your house. Today we would use a gender neutral term like "meter reader" if technology hadn't made them obsolete.

Molly hit upon a very difficult natural language processing challenge known as word-sense disambiguation with the most successful approaches using supervised machine learning techniques. If anyone from Amazon is reading this post, the Kindle dictionary would make an excellent application of word-sense disambiguation. You don't need perfection, anything better than choosing the first definition would be welcome. Small tweaks to the user interface where the reader can indicate the appropriate definition would give more labelled data to produce better algorithms.
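As a hedged illustration of the simplest idea (Lesk-style overlap, far from the supervised state of the art, with a made-up toy dictionary rather than Kindle's): pick the sense whose definition shares the most words with the surrounding text.

    def pick_sense(context, senses):
        """senses maps a sense label to its definition text; return the
        sense whose definition overlaps the context the most."""
        context_words = set(context.lower().split())
        def overlap(definition):
            return len(context_words & set(definition.lower().split()))
        return max(senses, key=lambda s: overlap(senses[s]))

    senses = {
        "unit of length":   "a metric unit of length equal to about 39 inches",
        "measuring device": "a device that records the electric or gas usage at a house",
    }
    print(pick_sense("the meter man came to the house to read the electric meter", senses))
    # -> "measuring device"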

And to Molly: Keep saying "There should be an algorithm". Someday there might be. Someday you might be the one to discover it.

Monday, January 19, 2015

The two definitions of Chomsky Normal Form

I have eight  textbooks on Formal Lang Theory on my desk.  Six of them define a CFG to be in Chomsky Normal Form if every production is of the form either A-->BC or A--> σ (σ a single letter). With that definition one can show that every e-free grammar can be put in Chomsky Normal Form, and using that, show that CFL ⊆ P. There is a very minor issue of what to do if the CFL has e in it.

Two of the books (Sipser and Floyd-Beigel) define a CFG to be in Chomsky Normal Form if every rule is A-->BC or A--> σ OR S-->e, and also that S cannot appear as one of the two nonterminals on the right-hand side of a production of the form A-->BC. With this definition you can get that every CFL (even those with e in them) has a grammar in Chomsky Normal Form.

The definitions are NOT equivalent mathematically, but they are equivalent in spirit and both aim towards the same thing: getting all CFL's in P. (That's what I use them for. What did Chomsky use them for?)
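For concreteness, here is a minimal sketch of the cubic-time CYK-style dynamic program that CNF buys you (first definition, so no e-productions); the grammar encoding, mapping each right-hand side to the set of nonterminals that produce it, is my own toy convention:

    def cyk(word, grammar, start="S"):
        """grammar maps a terminal like "a" or a pair of nonterminals like
        ("A", "B") to the set of left-hand sides producing it (CNF assumed)."""
        n = len(word)
        if n == 0:
            return False  # the e case needs separate handling under this definition
        # table[i][j] = nonterminals deriving the substring of length j+1 starting at i
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, ch in enumerate(word):
            table[i][0] = set(grammar.get(ch, set()))
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                for split in range(1, length):
                    for A in table[i][split - 1]:
                        for B in table[i + split][length - split - 1]:
                            table[i][length - 1] |= grammar.get((A, B), set())
        return start in table[0][n - 1]

    # Toy grammar for {a^n b^n : n >= 1}: S -> AB | AT, T -> SB, A -> a, B -> b
    grammar = {"a": {"A"}, "b": {"B"},
               ("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
    print(cyk("aabb", grammar), cyk("aab", grammar))  # True False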

The first definition is correct historically- it's what Chomsky used (I am assuming this since it's in the earlier books). The second one could be argued to be better since when you are done you don't have to deal with the e. I still like the first one, but it's six of one, half a dozen of the other. One person's floor function is another person's ceiling function.

I don't have a strong opinion about any of this, but I will say that if you use the second definition then you should at least note that there is another definition that is used for the same purpose. Perhaps make a homework  out of this.

There are many other cases where definitions change and the new one leads to more elegant theorems and a better viewpoint than the old one. There are even cases where the new definition IS equivalent to the old one but is better. IMHO the (∃) definition of NP is better than the definition that uses nondeterministic Poly Time TM's since this leads to the definition of the poly hierarchy. One drawback- if you want to define NL then I think you DO need nondeterministic TM's.

The only problem I see with changing definitions is if you are reading an old paper and don't quite know which definition they are using.

What examples of a definition changing (or an equivalent one being more used) do you approve of? Disapprove of?


Thursday, January 15, 2015

The Impact Factor Disease

The Institute for Scientific Information (ISI) was founded in 1960 to help index the ever-growing collection of scientific journals. The founder of ISI, Eugene Garfield, developed a simple impact factor to give a rough estimate of quality and help highlight the more important journals. Roughly, the impact factor of a journal in year x is the average number of citations that the articles it published in years x-1 and x-2 receive in year x.
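In symbols (my paraphrase of the standard definition):

    IF_x = (citations received in year x by the journal's articles from years x-1 and x-2)
           / (number of articles the journal published in years x-1 and x-2)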

Thomson Scientific bought out ISI in 1992 and turned the data collection into a business. Impact factors are not only being used to measure the quality of journals but of authors (i.e. researchers) and institutions as well. In many parts of the world a person's research quality is being measured strongly or sometimes solely by their ability to publish in high impact factor journals.

This is bad news for computer science since conference proceedings in our field have historically more prestige than journals. We mitigate the ISI factors pretty well in the US but in many other countries this puts computer science at a disadvantage. The need for impact factor publications is one of the reasons conferences are experimenting with a hybrid model.

In a new trend I get many announcements of journals highlighting their ISI impact factor, mostly very general and previously unknown to me. Our old friends WSEAS say "The ISI Journals (with Impact Factor from Thomson Reuters) that publish the accepted papers from our Conferences are now 32" in the subject line of their email.

It's the vanity journal with a twist: publish with us and you'll raise your official numerical prestige. So we get a set of journals whose purpose is to raise the value of researchers who should have their value lowered by publishing in these journals.

Raise your research standing the old fashioned way: Do great research that gets published in the best venues. The numbers will take care of themselves.

Tuesday, January 13, 2015

Barriers for Matrix Mult Lower Bounds


Matrix Mult:

The usual algorithm is O(n^3).

Strassen surprised people by showing an O(n^{2.81}) algorithm. (Now it's so well known that it's not surprising. I try to teach it to people who don't already know it's true so they'll be surprised.)
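For the curious, the whole trick in one display: split A and B into 2x2 blocks, compute seven block products instead of eight, and recurse on the blocks, which gives O(n^{log_2 7}) = O(n^{2.81}):

    M1 = (A11 + A22)(B11 + B22)     M2 = (A21 + A22) B11
    M3 = A11 (B12 - B22)            M4 = A22 (B21 - B11)
    M5 = (A11 + A12) B22            M6 = (A21 - A11)(B11 + B12)
    M7 = (A12 - A22)(B21 + B22)

    C11 = M1 + M4 - M5 + M7         C12 = M3 + M5
    C21 = M2 + M4                   C22 = M1 - M2 + M3 + M6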

Over the years the exponent came down, though there was a long time between Coppersmith and Winograd's exponent of 2.3755 and the recent improvements.

Are these improvements going to lead to the exp 2+epsilon?

Alas the answer is prob no :-(

In Fast Matrix Mult: Limitations of the Laser Method Ambainis, Filmus, and Le Gall show that the methods that lead to the algorithms above will NOT lead to an exp of 2+epsilon.

How to react to news like this? Barriers should not make us give up; however, they should make us look for new techniques, perhaps guided by the barrier. I would like to think that if you know where NOT to look then it helps you know where TO look.

There have been barriers that were broken (IP=PSPACE didn't relativize, the recent 2-prover PIR used techniques NOT covered by the barrier result, and the Erdos distance problem was proven after it was known that the current techniques `wouldn't work'). I am sure there are other examples, feel free to leave some in the comments.

Will this result help guide researchers in the right direction? Lets hope so!

Of course, its possible that 2+epsilon is false and the barrier result is a first step in that direction.

Which one am I rooting for? Neither- I am rooting for finding out!

bill g.

Thursday, January 08, 2015

The History of the History of the History of Computer Science

In 2007, the science historian Martin Campbell-Kelly wrote an article The History of the History of Software, where he writes about how he initially wrote histories of the technical aspects of computer software back in the 70's but now he's evolved into writing more about the applications and implications of software technologies. He argues that the whole field of the history of software has moved in the same directions.

Donald Knuth made an emotional argument against this trend last May in his Stanford Kailath lecture Let's Not Dumb Down the History of Computer Science. If you can find an hour, this is a video well worth watching.

In the January CACM Thomas Haigh gave his thoughts in The Tears of Donald Knuth. Haigh argues that Knuth conflates the History of Computer Science with the History of Computing. Haigh says that historians focus on the latter and the History of Computer Science doesn't get enough emphasis.

Let me mention two recent examples in that History of Computing category. The Imitation Game gives a great, though slightly fictionalized, portrait of the computing and computer science pioneer Alan Turing, focusing on his time at Bletchley Park breaking the Enigma code. Walter Isaacson, whose histories of Franklin, Einstein and Jobs I thoroughly enjoyed, writes The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution, which tells the stories of computers from Ada Lovelace to Google (oddly stopping before social networks).

But what can we do about the History of Computer Science, particularly for theoretical computer science? We live in a relatively young field where most of the great early researchers still roam among us. We should take this opportunity to learn and record how our field developed. I've dabbled a bit myself, talking to several of the pioneers, writing (with Steve Homer) a short history of computational complexity in the 20th Century and a history chapter in The Golden Ticket.

But I'm not a historian. How do we collect the stories and memories of the founders of the field and tell their tales while we still have a chance?

Monday, January 05, 2015

Why do students do this?

Before my midterm in ugrad  Theory of Computation  I gave the students a sheet of practice problems to do that I would go over before the midterm.

One of them was: Let L be in DTIME(T(n)). Give an algorithm for L*. Try to make it efficient. What is the time complexity of your algorithm? (Earlier in the term I had shown in class that if L is in P then L* is in P.)

My intention was that they do the Dynamic Programming solution. Since it wasn't being collected I didn't have to worry about what would happen if they did it by brute force.  When  I went over it in class I did the Dynamic Programming Solution, which is roughly T(n)^3 time.
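Here is a minimal sketch of that dynamic program, written against a hypothetical membership test in_L(w) assumed to run in time T(|w|):

    def in_L_star(x, in_L):
        """reachable[i] is True iff the prefix x[:i] splits into pieces
        that are each in L. There are O(n^2) membership tests, each costing
        at most T(n), which is at most T(n)^3 when T(n) >= n."""
        n = len(x)
        reachable = [False] * (n + 1)
        reachable[0] = True                    # the empty string is in L*
        for i in range(1, n + 1):
            reachable[i] = any(reachable[j] and in_L(x[j:i]) for j in range(i))
        return reachable[n]

    # Toy example: L = {"ab", "c"}
    print(in_L_star("abcab", lambda w: w in {"ab", "c"}))  # True
    print(in_L_star("abb",   lambda w: w in {"ab", "c"}))  # False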

I allow my students to bring in a sheet of notes that they make up themselves.

On the exam was the problem: Let L_1 ∈ DTIME(T_1(n)) and L_2 ∈ DTIME(T_2(n)). Give an algorithm for L_1L_2. What is the time complexity of your algorithm?

Of my 20 students 5 of them gave me, word for word, the dynamic programming solution to the L, L* problem.

Why would they do this? Speculations:
  1. They just copied it off of their cheat sheet with no understanding.
  2. They wanted pity points (they didn't get any and I told the class that if a similar thing happens on the final I will give them LESS THAN zero on the problem).
  3. They so hoped that the L, L* problem would be on the exam (possibly because it was on their cheat sheet) that they misread the problem.
  4. They thought `Dr. Gasarch wouldn't have put it on the practice exam unless it was on the real exam' (or something like that), so they misread it. 
The five students were not very good  (they did poorly on other problems as well, and on the HW), so it was not a matter of good students being confused or getting nervous.

But I ask-- (1) is this kind of thing common? For my Sophomore Discrete Math yes, but I was very slightly surprised to see it in my senior course. (2) Do you have examples? I am sure that you do, but my point is NOT to do student-bashing, it's to ask WHY they do this.

Monday, December 29, 2014

2014 Complexity Year in Review

Theorem of the year goes to 2-Server PIR with Sub-polynomial Communication by Zeev Dvir and Sivakanth Gopi. In Private Information Retrieval you want to access information copied in multiple databases in a way so that no database knows what question you asked. In 1995 Chor, Kushilevitz, Goldreich and Sudan showed how to use n^{1/3} bits of communication with two databases. Razborov and Yekhanin proved that using current techniques the bound could not be broken. Dvir and Gopi developed new techniques to break that barrier using n^{O(√(log log n/log n))} bits with two databases, less than n^δ for any δ > 0. Bill posted more on this result back in August.

And of course lots of other great work on extended formulations, circuits, algorithms, communication complexity and many other topics. We also had another round of favorite theorems for the past decade.

2014 will go down as the year computer science exploded. With a big convergence of machine learning/big data, the connectedness of everything, the sharing economy, automation and the move to mobile, we have a great demand for computer scientists and a great demand from students to become computer scientists or have enough computing education to succeed in whatever job they get. Enrollments are booming, CS departments are hiring and demand far outstrips the supply. A great time for computer science and a challenging one as well.

We say goodbye to G.M. Adelson-Velsky, Alberto Bertoni, Ed Blum, Ashok Chandra, Alexey Chervonenkis, Eugene Dynkin, Clarence "Skip" Ellis, Alexander Grothendieck, Ferran Hurtado, Mike Stilman, Ivan Stojmenovic, Berthold Vöcking, Ann Yasuhara, Microsoft Research-Silicon Valley and The New York Times Chess Column.

Thanks to our contributors Andrew Childs, Brittany Terese Fasy and MohammadTaghi Hajiaghayi.

Looking ahead 2015 brings the centenary of the man we know for balls and distance and the fiftieth anniversary of the paper that brought us the title of this blog. Have a great New Years and remember, in a complex world best to keep it simple.

Monday, December 22, 2014

Undergraduate Research

I just received the Cornell Math Matters, dedicated to the memory of  Eugene Dynkin who passed away on November 14 at the age of 90. In my freshman year at Cornell, Dynkin recruited me into his undergraduate research seminar building on the success he had with a similar seminar he ran when in Russia. I didn't last long, making the conscious choice not to work on research as an undergrad but to focus on enjoying the college experience. I missed out on a great opportunity but I don't regret that decision.

Reluctantly I wouldn't give that advice to today's undergrads. Getting into a good graduate program has become much more competitive and even a small amount of research experience may make a large difference in your application. I encourage any undergrad who may consider a PhD in their future to talk to some professors and get started in a research program. But don't let it run your life, make sure you enjoy your time at college. You'll have plenty of time to spend every waking moment on research once you start graduate school.

Thursday, December 18, 2014

The NIPS Experiment

The NIPS (machine learning) conference ran an interesting experiment this year. They had two separate and disjoint program committees with the submissions split between them. 10% (166) of the submissions were given to both committees. If either committee accepted one of those papers it was accepted to NIPS.

According to an analysis by Eric Price, of those 166, about 16 (about 10%) were accepted by both committees, 43 (26%) by exactly one of the committees and 107 (64%) rejected by both committees. Price notes that of the accepted papers, over half (57%) of them would not have been accepted with a different PC. On the flip side 83% of the rejected papers would still be rejected. More details of the experiment here.

No one who has ever served on a program committee should be surprised by these results. Nor is there anything really wrong or bad going on here. A PC will almost always accept the great papers and almost always reject the mediocre ones, but the middle ground are at a similar quality level and personal tastes come into play. There is no objective perfect ordering of the papers and that's why we task a program committee to make those tough choices. The only completely fair committees would either accept all the papers or reject all the papers.

These results can lead to a false sense of self worth. If your paper is accepted you might think you had a great submission; more likely you had a good submission and got lucky. If your paper was rejected, you might think you had a good submission and were unlucky; more likely you had a mediocre paper that would never get in.

In the few days since NIPS announced these results, I've already seen people try to use them not only to trash program committees but for many other subjective decision making. In the end we have to make choices on who to hire, who to promote and who to give grants. We need to make subjective decisions and those done by our peers aren't always consistent but they work much better than the alternatives. Even the machine learning conference doesn't use machine learning to choose which papers to accept.

Monday, December 15, 2014

Joint Center for Quantum Information and Computer Science

(Guest post by Andrew Childs who is now at the Univ of MD at College Park)



We have recently launched a new Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland. This center is a partnership with the National Institute of Standards and Technology, with the support and participation of the Research Directorate of the National Security Agency/Central Security Service. QuICS will foster research on quantum information and computation.

We are pleased to announce opportunities for Hartree Postdoctoral Fellowships (deadline: December 30, 2014) and Lanczos Graduate Fellowships. Outstanding postdoctoral and graduate researchers with an interest in quantum information processing are encouraged to apply.

QuICS complements a strong program in quantum physics research at the Joint Quantum Institute. Maryland is also home to a new Quantum Engineering Center. It's an exciting time for quantum information here.

Thursday, December 11, 2014

The Ethics of Saving Languages

The linguist John McWhorter wrote an NYT opinion piece entitled Why Save a Language? where he argues why we should care about saving dying languages, basically that language gives us a window into culture. As a computer scientist I appreciate the scientific value of studying languages, but perhaps the question is not whether we should care but whether it is ethical to save languages.

Languages developed on geographical and geopolitical boundaries. Even as new methods of communication came along such as postal mail, the printing press, the telephone and television there never was a strong reason to learn multiple languages save for some small European nations and some professions such as academics and politics.

Then came the Internet and location mattered less but language barriers still persist. I've certainly noticed a marked increase in the number of young people around the world who know basic conversational English, much from the content they consume online. There's also a sizable amount of content in all the major languages.

But if you speak a small language where all the other native speakers are geographically very close to you, you lose this networked connection to the rest of humanity. Your only hope is to learn a second language and that second language might become a first language and so many of these small languages start to disappear.

I understand the desire of linguists and social scientists to want to keep these languages active, but to do so may make it harder for them to take advantage of our networked society. Linguists should study languages but they shouldn't interfere with the natural progression. Every time a language dies, the world gets more connected and that's not a bad thing.

Tuesday, December 09, 2014

Godel and Knuth Prize Nominations due soon. Which would you rather win? Or ...

(Alg Decision Theory conference in Kentucky: here.)

Knuth Prize nominations are due Jan 20, 2015. For info on the prize see here; if you want to nominate someone, go here.

Godel Prize nominations are due Jan 31, 2015. For info on the prize see here; if you want to nominate someone, go here.

Would you rather:

  1. Win a Godel Prize
  2. Win a Knuth Prize
  3. Have a prize named after you when you are dead
  4. Have a prize named after you when you are alive
I pick 4; however, I doubt I'll have any of 1, 2, 3, 4 happen to me.

How about you?

Monday, December 01, 2014

Cliques are nasty but Cliques are nastier

BILL: Today we will show that finding large cliques is likely a nasty problem

STUDENT: Yes! In my High School the Cliques were usually at most six people and they gossiped about everyone else. They were very nasty.

BILL: Um, yes, picture in your school that everyone is a node in a graph and that two nodes are connected if they are friends. In your school a clique of size 6 would be 6 people who all liked each other.

STUDENT: Actually, the people in a clique secretly hated each other and sometimes belonged to other cliques that would gossip about people in the first clique.

BILL: We might need  the Hypergraph version to model your school.


Computer Scientists and Graph Theorists call a set of nodes that are all connected to each other a CLIQUE - pronounced CLEEK

High School Kids call a group of people who hang out together a CLIQUE- pronounced CLICK.

Which term came first? Why are they pronounced differently when they are quite related to each other? Do the members of a high school clique really hate each other?

Sunday, November 23, 2014

Guest Post about Barbie `I can be an engineer' -- Sounds good but it's not.


There is now an I can be an engineer Barbie. That sounds good! It's not. Imagine how this could be turned around and made sexist. What you are imagining might not be as bad as the reality. Depends on your imagination.

Guest Blogger Brittany Terese Fasy explains:

Remember the controversy over the Barbie doll that said
"Math class is tough!"?  Well, Barbie strikes again.

If you haven't heard about I can be a computer engineer, it is a story about how Barbie, as a "computer engineer," designs a game but cannot code it herself. She enlists the help of her two friends, Steven and Brian, to do it for her. Then, she gets a computer virus and naively shares it with her sister. Again, Steven and Brian must come to the rescue. Somehow, in the end, she takes credit for all of their work and says that she can be a computer engineer. Gender issues aside, she does not embody a computer engineer in this book. For more details, please see here.

Children need role models. Naturally, parents are their first role models. And, not everyone's parent is a computer engineer / computer scientist. So, books exploring different career choices provide children the much-needed opportunity to learn about something new, to have a role model (even if that role model is fictional). In principle, this book is fantastic; however, it fails to convey the right message. That is why I started a petition to Random House to pull this book off the market. The petition is here.

Progress was made as Barbie issued an apology: here. And Amazon and Barnes and Noble removed the book from their catalogs. However, neither Random House nor the author of the book has issued a statement, and it is still available at Walmart.

Until the book is completely off the market we should not stop! And maybe one day, we'll see Barbie: I can be a Computational Geometer on the shelves.

Thursday, November 20, 2014

A November to Remember

The Imitation Game starring Benedict Cumberbatch as Alan Turing opens in the US on November 28. If you read this blog you should see that movie. If one challenged British scientist biopic is not enough for you, The Theory of Everything with Eddie Redmayne as Stephen Hawking opened earlier this month. Also this month has the physics-laden Interstellar and the nerd-hero robot adventure Big Hero 6. Science rules the box office, though likely to be clobbered by a Mockingjay.

Speaking of Turing, the ACM Turing Award will now come with a $1 million prize, up from $250K, now on par with the Nobel Prize. Thanks Google.

Madhu Sudan will receive the 2014 Infosys Prize in Mathematics. Nimrod Megiddo wins the 2014 John von Neumann Theory Prize.

Harvard Computer Science gets 12 endowed faculty lines from former Microsoft CEO Steve Ballmer. That's about the number of former Microsoft Research theorists now on the job market. Just saying.

Sanjeev Arora asks for comments on the potential changes to STOC/FOCS discussed at the recent FOCS. Boaz Barak has set up a new CS Theory jobs site. On that note, the November CRA News has 75 pages of faculty job ads, up from 50 a year ago.

Terry Tao talked twin, cousin and sexy primes on Colbert last week. The new result he quoted is that the Generalized Elliott-Halberstam conjecture implies that there are infinitely many pairs of primes at most six apart.

Not all happy news as we lost the great mathematician Alexander Grothendieck. Tributes by Luca and Ken.

Monday, November 17, 2014

A Fly on the wall for a Harvard Faculty meeting: Not interesting for Gossip but interesting for a more important reason

I was recently in Boston for Mikefest (which Lance talked about here) and found time to talk to my adviser Harry Lewis at Harvard (adviser? Gee, I already finished my PhD. Former-adviser? That doesn't quite sound right. I'll stick with adviser, kind of like when they refer to Romney as Governor Romney, or Palin as half-governor Palin). He also invited me to go to a Harvard Faculty meeting.

NO, I didn't see anything worth gossiping about. NO, I am not going to quote Kissinger (``Academic battles are so fierce because the stakes are so low''). NO, I am not going to say that under the veneer of cordiality you could tell there were deep-seated tensions. It was all very civilized. Plus there was a free lunch.

The topic was (roughly) which courses count in which categories in computer science for which requirements. Why is this interesting? (hmmm- IS it interesting? You'd prob rather hear that Harry Lewis stabbed Les Valiant with a fork in a dispute about whether there should be an ugrad learning theory course). Because ALL comp sci depts face these problems. At Mikefest and other conferences I've heard the following issues brought up:

Should CS become a service department? Math did that with Calculus many years ago. PRO: They get to tell the dean `we need to hire more tenure track faculty to teach calculus'. CON: They have to have their tenure track faculty teach calculus. (I know it's more complicated than that.)

What should a non-majors course have in it?

What should CS1, CS2, CS3 have in them (often misconstrued as the question ``what is a good first language'', which misses the point of what you are trying to accomplish)? For that matter, should it be a 3-course intro sequence (it is at UMCP)?

Can our majors take the non-majors courses?  (At UMCP our non majors course on web design has material in it that is NOT in any majors course.)

When new courses come about (comp-bio, programming hand-held devices, Computational flavor-of-the-month) what categories do they fit into? (For an argument in favor of Machine Learning see Daume's post.) What should the categories be anyway? And what about the functors?

Which courses were at one point important but aren't any more? UMCP no longer requires a Hardware course-- is that bad?  (Yes- when I tell my students that PARITY can't be solved by a constant depth poly sized circuit, they don't know what a circuit is!)

I don't have strong opinions to any of these questions (except that, despite my best efforts, we do not require all students to learn Ramsey Theory), but I note that all depts face these questions (or need to- I wonder if some depts are still teaching FORTRAN and COBOL- and even that's not quite a bad thing since there is so much legacy code out there.)

I have this notion (perhaps `the grass is always greener on the other side') that MATH (and most other majors) don't have these problems. At UMCP there have only been TWO new math courses introduced on the ugrad level since 1985: Crypto (which is cross-listed with CS), and Chaos Theory. CS has new courses, new emphases, new requirements every few years. Oddly enough when I tell this to math profs they ENVY that we CAN change our courses so much. Which is better, chaos or stability?

When I saw Back to the Future 2 in 1989 I noticed that its depiction of academic computer science in 2015 was that comp sci depts across the country would agree on what was important and be similar (as I imagine math is). Instead the opposite has happened- these things are still in flux. (If you can't trust a science fiction movie starring Michael J Fox what can you trust?) As a sign of that, the advanced GRE in CS never really worked and has now been discontinued.

So- will CS settle down by 2015? We still have a year to go, but I doubt it.  2025? Before P vs NP is solved?

and is it OKAY if it doesn't?


Thursday, November 13, 2014

From Homework Solution to Research Paper

Inspired by the Dantzig Story  I occasionally put an open problem on a class assignment. Never worked, though I did have a student get a research paper from solving a homework question the hard way.

Teaching in the early 90's, I showed Valiant's proof that computing the permanent of a 0-1 matrix was #P-complete, including showing that the 0-1 permanent was in #P, the class of functions representable as the number of accepting paths of a nondeterministic polynomial-time Turing machine.
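As a reminder (my notation), the permanent is the determinant without the signs:

    perm(A) = sum over all permutations σ of {1,...,n} of A[1,σ(1)] x A[2,σ(2)] x ... x A[n,σ(n)],

and for a 0-1 matrix A it counts the perfect matchings of the bipartite graph whose adjacency matrix is A.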

I gave a homework assignment to show that the permanent of a matrix with non-negative integer entries was in #P. The answer I expected was to construct an appropriate NP machine whose number of accepting paths equalled the permanent and some students came up with such a proof.

One of the students, Viktória Zankó, took a different approach, creating a reduction that mapped an integer matrix A to a 0-1 matrix B such that permanent(A) = permanent(B). A fine solution, reducing the problem to a previously solved case.

So what's the rub? Such a reduction was an open problem and simplified Valiant's paper. Valiant only had the reduction for integer matrices A with small entries and needed a mod trick to show the 0-1 permanent is #P-complete. Zankó's construction eliminated the need for the mod trick.

And that's how Viktória Zankó got a research paper from solving a homework problem.

Tuesday, November 11, 2014

Non controversial thoughts on rankings

US News has a ranking of CS depts and various subcategories. Recently MohammadTaghi Hajiaghayi and Luca Trevisan have suggested alternative rankings here (Moh) and here (Luca). These rankings inspire me to record some thoughts about rankings.

  1. When making a ranking one must ask: What is it for? For academic depts it may be to help guide potential ugrads or grads in their choice of where to go. Rankings of the most influential people of all time (Michael Hart's book here), or in a given year (Time magazine does this) are made to (I think) organize our thoughts and start debates. Michael Hart also did a book about the most influential people as seen from the year 3000 (so half are fictional) as a way to speculate (see here). My own ranking of Bob Dylan satires here was done for my own amusement.
  2. Transparency sounds like a plus. But if a ranking is too transparent, and is considered important, then organizations might game the system. Recall Goodhart's law: When a measure becomes a target it ceases to be a measure. On the other hand, if the measure really is good then it may be okay if it becomes a target. Some measures are hard to game- like surveys of what people think.
  3. There have been a variety of rankings of presidents (see here). These rankings say something about the times they were done. Studying how they change over time could itself be a historical project of interest. Another thought: the book Hail to the Chiefs notes that James Buchanan and Andrew Johnson usually rank as the worst presidents, while Lincoln ranks as one of the best--- but this is unfair!--- Buchanan could not stop the civil war (but nobody really could) and Johnson had to clean up the mess after it (but nobody really could). The Lincoln presidency was almost entirely the civil war, which America won, so he gets the credit. More to the point--- ranking presidents is odd since it may depend very much on the times they govern.
  4. Bill James (KEY baseball statistician who I think should go into the Hall of Fame for changing the way we think about baseball) has tried to have lists of GREAT TEAMS. But there is a problem (which he fully notes)- some teams are GREAT in terms of having great players, but didn't win the World Series, or have only one pennant. Less than the sum of their parts.
  5. Numerical ratings may be odd in that they lump different items together. GPA is a bit odd--- do you prefer a student who got an A in Theory and a C in Operating Systems, or a student who got a B in both? I don't know the answer, but GPA wipes out the distinction.
  6. Rankings that compare very unlike objects are useless. Here is a ranking of CS blogs--- the criteria seem to be just one guy's opinion. I disagree with his ranking, but I have no idea what he cares about. Also, he includes Harry Lewis's fine blog BITS AND PIECES, which is often about academic stuff, and also Terry Tao's fine blog WHAT'S NEW, which is really a math blog. Very hard to compare those two to each other or to others.
  7. The tighter the focus the more useful a ranking can be. Ranking the best novelty songs of all time would be impossible (number one is PDQ Bach's Classical Rap) but if you restrict to, say, best science fiction satires (Luke Ski's Grease Wars Part 1, Part 2, Part 3)- then it's easier. (Trivia note- science fiction satire songs are often called FILK SONGS--- the urban legend is that at an early science fiction convention Science Fiction Folk Songs was misspelled as Science Fiction Filk Songs and hence the term was born.)
  8. SO, what really would help potential CS Grad Students in theory? Perhaps a grid where for every department is listed the theory faculty, and for each one the number of pubs in top tier confs, second tier confs, and journals in the last 5 years, and their area, and a pointer to their website. Then RESIST the urge to boil it down to one number.
  9. I"m reminded of being on the Grad Admissions committee. I get to look at the transcript (much more informative than the GPA), letters, possibly papers. Fortunately I don't have to boil it down to one number--- there are very few categories (accept, wait list of some sort, reject, but there can be a few others involving scholarships, but VERY few categories really).
  10. Finding what you want: I think that Raiders of the Lost Ark has one of the best endings of any movie ever. So I Googled best movie ending and variants of it, and alas, Raiders did not do well. One of the rankings didn't have it in the top 50. So I then Googled best movie endings Raiders of the lost ark and I found a ranking that had Raiders in the top 10. Yeah! But this is all silly- I found some person who agrees with me.
  11. Steve Skiena and Charlie Ward have written a book Who's Bigger: Where Historical Figures Really Rank, which has a transparent and reasonable way to measure... not clear what exactly. Probably fame. For a review see here.

Saturday, November 08, 2014

George Dantzig >= 100

We celebrate the 100th anniversary of the birth of George Dantzig today. In his obituary post we talked about his work on optimization, particularly the simplex method for solving linear programs.

For this centenary let's recall the urban legend of the famous mathematician (I heard it as John von Neumann) who as a student wasn't paying close attention in class. The professor wrote down four of the major open problems in the field. von Neumann wrote them down thinking they were homework problems. The next day he went back to the professor, ashamed that he could only solve two of them.

What does this have to do with Dantzig? Turns out he is the true source of the legend. From Dantzig's Washington Post obituary:
In 1939, Dantzig resumed his studies at the University of California at Berkeley, studying statistics under mathematician Jerzy Neyman. An incident during his first year at Berkeley became a math-world legend.
As Dr. Dantzig recalled years later, he arrived late for class one day and saw two problems on the blackboard that he assumed were homework assignments. He copied them down, took them home and solved them after a few days. "The problems seemed to be a little harder to do than usual," he said.
On a Sunday morning six weeks later, an excited Neyman banged on his student's front door, eager to tell him that the homework problems he had solved were two of the most famous unsolved problems in statistics. 
"That was the first inkling I had that there was anything special about them," Dr. Dantzig recalled.

Wednesday, November 05, 2014

Favorite Theorems: Circuit Lower Bounds

My long time blog readers should have no surprise on my final favorite theorem of 2005-2014.
Nonuniform ACC Circuit Lower Bounds by Ryan Williams (PDF)
We saw several exciting circuit lower bound results in the 80's, highlighted heavily in my first list of favorite theorems (1985-1994 sections 2.2 and 2.3). Progress happened so fast that many expected a proof that an NP-complete problem didn't have polynomial-size circuits, and thus P ≠ NP, was just around the corner. But after 1987 that progress came to a sudden stop. We saw some lower bounds for algebraic circuits or circuits of very small depth but no general lower bounds until the work of Ryan Williams.

For years I would point out that our limited knowledge of lower bounds allowed the possibility that NEXP could be computed by constant depth circuits with Mod_6 gates. Williams eliminated that possibility for constant depth circuits with Mod_k gates for any k.

One could take the techniques and results that Williams uses in his paper and build a whole graduate computational complexity course on them. Dick Lipton did exactly that at Georgia Tech a couple years back.

Ryan Williams' greatest insight though was to find non-trivial satisfiability algorithms for ACC0 circuits and use them to give lower bounds for NEXP. Recently Ryan has turned that process around, for example getting a faster algorithm for all-pairs shortest path using the techniques from the Razborov-Smolensky circuit lower bounds. Ryan's publication page has several new results giving algorithms from lower bounds.

Monday, November 03, 2014

A few more notes about Sipser and Sipser-60th


While Lance was AT Mikefest (Sipser's 60th Bday conference), helping to organize it, emceeing the personal statements, I was... also there.
A few notes

  1. Aside from his graying hair, Mike still looks like a teenager.
  2. In 1985 Mike was one of the few people who thought P=BPP. He wrote a paper about how a certain kind of expander implies what we would now call a hard vs randomness result and presented it at the 1986 CCC (called STRUCTURES then). After the Nisan-Wigderson Hard vs Rand results most everyone thought P=BPP. But Mike was there first.
  3. I took his grad complexity class in the early 1980's. I remember him proving results that either he or someone else had JUST proven the last week. He did a good job too!  What struck me then and now is how vibrant CS is as a field that material taught LAST WEEK can be in an INTRO grad course (that's not as true anymore).
  4. After PARITY not in AC_0, and the monotone circuit results, Sipser and others were OPTIMISTIC that P vs NP would be solved ``soon''.  Oh, to be in a field in the early days when people were optimistic. But see next point.
  5. Mike claims he STILL thinks P vs NP will be solved in 20 years. I don't quite know if he REALLY thinks this or wants to make the point that we should be optimistic. Similar to Lipton thinking P=NP--- does he really think that or does he want to make the point that we shouldn't all be so sure of ourselves?
And two non-Sipser notes (sort of) from Mikefest

  1.  Steve Rudich told me that I misquoted him in a blog post and people often say `Steve, do you really think we are 6 months from an independence result'. I am not surprised that I made a MISTAKE in a blog post. I am surprised that people read it, remembered it, and asked him about it. In any case I have edited that post to SAY it was a mistake and I re-iterate it now: STEVE RUDICH DOES NOT THINK WE ARE SIX MONTHS AWAY FROM AN IND PROOF FOR P VS NP.
  2. I spoke to Mauricio Karchmer who, with Avi W and others, had an approach to P vs NC^1 via comm. comp which at the time I thought was very promising--- since we really can prove things in comm. comp. Alas it still has not panned out. However, Mauricio  now thinks that (1) We can't prove lower bounds because they are false, and (2) we can't prove upper bounds because we are stupid.

Thursday, October 30, 2014

Metrics in Academics

Congratulations to the San Francisco Giants, winning the World Series last night. In honor of their victory let's talk metrics. Baseball has truly embraced metrics as evidenced in the book and movie Moneyball about focusing on statistics to choose which players to trade for. This year we saw a dramatic increase in the infield shift, the process of moving the infielders to different locations for each batter based on where they hit the ball, all based on statistics.

Metrics work in baseball because we do have lots of statistics, but also an objective goal of winning games and ultimately the World Series. You can use machine learning techniques to predict the effects of certain players and positions and the metrics can drive your decisions.

In the academic world we certainly have our own statistics, publications counts and citations, grant income, teaching evaluation scores, sizes of classes and majors, number of faculty and much more. We certainly draw useful information from these values and they feed into the decisions of hiring and promotion and evaluation of departments and disciplines. But I don't like making decisions solely based on metrics, because we don't have an objective outcome.

What does it mean to be a great computer scientist? It's not just a number, not necessarily the person with a large number of citations or a high h-index, or the one who brings in huge grants, or the one with high teaching scores, or whose students get high-paying jobs. It's a much more subjective measure, the person who has a great impact, in the many various ways one can have an impact. It's why faculty applications require recommendation letters. It's why we have faculty recruiting and P&T committees, instead of just punching in a formula. It's why we have outside review committees that review departments and degrees, and peer review of grant proposals.

As you might have guessed this post is motivated by attempts to rank departments based on metrics, such as described in the controversial guest post last week or by Mitzenmacher. There are so many rankings based on metrics, you just need to find one that makes you look good. But metric-based rankings have many problems, most importantly they can't capture the subjective measure of greatness and people will disagree on which metric to use. If a ranking takes hold, you may optimize to the metric instead of to the real goals, a bad allocation of resources.

I prefer the US News & World Report approach to ranking CS departments, which is based heavily on surveys filled out by department and graduate committee chairs. For the subareas, it would be better to have, for example, theory people rank the theory groups, but I still prefer the subjective approach.

In the end, the value of a program is its reputation, for a strong reputation is what attracts faculty and students. Reputation-based rankings can best capture the relative strengths of academic departments in what really matters.

Tuesday, October 28, 2014

Sipser Symposium



On Sunday we had the Symposium on Theoretical Computer Science on the Occasion of Michael Sipser's 60th birthday to celebrate what Mike has brought to research (seminal work on the complexity of randomness and circuits), service (ten years leading the MIT math department before recently becoming Dean of Science) and education (his great textbook and the corresponding popular course he still teaches).

We had an incredible turnout for the symposium and banquet that followed. I counted five Turing Award Winners (Mike's advisor Manuel Blum, Leslie Valiant, Shafi Goldwasser, Silvio Micali and Ron Rivest), five of the nine Nevanlinna Prize winners (Mike's student Daniel Spielman, Avi Wigderson, Valiant, Peter Shor and Madhu Sudan) and nine Gödel Prize winners.

Manuel Blum talked about his work with Jeremiah Blocki, Anupam Datta, and Santosh Vempala about a human computable hash function to create different passwords for every website. I've seen Manuel and Santosh produce passwords from arbitrary words and I'm impressed.

Johan Håstad recounted the early days of circuit complexity. Avi also talked about lower bounds. Shafi talked on her great paper with Mike showing public and private coin interactive proofs have the same power and a recent application of that result by Moni Naor et al showing how a defendant can convince the police his DNA doesn't match without revealing his DNA.

A number of Mike's PhD students also gave talks. Dan Spielman gave a framework for designing quantum algorithms without knowing much quantum.

The banquet included several stories, thanks and toasts. Nearly all of Mike's students participated in some way, a great collection of men and women I'm proud to be part of: David Barrington, Ravi Boppana, Jonathan Buss, Andrew Chou, Aditi Dhagat, Lance Fortnow, David Gillman, Michelangelo Grigni, Christos Kapoutsis, Marcos Kiwi, Mary (Bruzzese) O'Connor, Sofya Raskhodnikova, Zack Remscrim (current), Alexander Russell, Leonard Schulman, Daniel Spielman, Ravi Sundaram, Andrew Sutherland and Yiqun Yin.

Thursday, October 23, 2014

Guest Post by Dr. Hajiaghayi: A new way to rank departments

(This is a guest post by MohammadTaghi Hajiaghayi. His name is not a typo- the first name really is MohammadTaghi.)

Because we believe the methods used by U.S. News to rank CS departments in theoretical computer science (and in general) lack transparency and well-defined measures, my Ph.D. student Saeed Seddighin and I have worked for several months to provide a ranking based on a real and measurable quantity: the number of papers in TCS for the top 50 US universities. To make this possible, we gathered the information about universities from various resources. You may see the ranking and our exact methodology here.

Indeed we have some initial rankings based on similar measures for computer science in general as well, which we plan to release soon (we are still in the process of double-checking or even triple-checking our data and our analysis). The CS theory ranking is our initial release, put out now to get feedback.

Please feel free to give us feedback (hajiagha@cs.umd.edu).

Wednesday, October 22, 2014

MSR SVC Letters

The Committee for the Advancement of Theoretical Computer Science put together an open letter to several research leaders at Microsoft.
We feel that there should have been a better way to close down this lab, one that would have allowed them to have continuous employment until academic jobs are available again in September 2015. Given that this lab was continuing to produce exceptional — indeed revolutionary — research, we fail to understand why closing it had to be done so suddenly.
I recommend reading the whole letter. Many in the theory community and beyond, including myself, signed the original letter or added their support in the comments.

Yesterday Harry Shum, Microsoft Executive VP for Technology and Research sent a reply.
No one at Microsoft feels good about the fact that a significant number of our friends and colleagues were laid off.  These people contributed to the success of Microsoft over many years.  As one can readily imagine, the decisions made about how the cuts were implemented within MSR were extremely complicated and personally painful.  We feel with you the sense of loss that has been evoked by the closing of our Silicon Valley lab. 
Theory Matters has a cover email. More on the CRA Policy Blog.

I'd like to thank the CATCS leadership for putting together the letter, which expressed well the thoughts and concerns of the community. The tone of the letter hit the right notes and really made Microsoft research leadership think about the important role the company plays in the larger computer science academic world.

Tuesday, October 21, 2014

Martin Gardner Centennial

Martin Gardner was born on October 21, 1914, so today is his Centennial (he died on May 22, 2010, at the age of 95). We've mentioned him in the blog before:

  1.  The Life of Martin Gardner
  2.  Contribute to the Gardner Centennial
  3.  Another Post on Martin Gardner
  4. I used the anagram Tim Andrer Gran in both my review of the Lipton-Regan book (see here) and my Applications of Ramsey Theory to History paper (see here)

So what can I add on his centennial?

  1. He was not the first person to write on recreational mathematics, but he was certainly early and did it for a long time.
  2. I suspect he influenced everyone reading this who is over 50. For every y who is under 50 and reading this column, there exists an x such that MG influenced x and x influenced y.
  3. The line between ``recreational'' and ``serious'' math is sometimes blurry or hard to see. An obvious case of this was Euler and the Bridges problem leading to graph theory. At one time solving equations was done for competition, which seems recreational. Galois theory is not recreational. 
  4. Donald Knuth's book Selected Papers in Discrete Math (reviewed by me here) states ``I've never been able to see the boundary between scientific research and game playing.''
  5. I am reading a book, Martin Gardner in the 21st Century, which is a collection of papers by people who were inspired by him. The papers really do blur the distinction between recreational and serious. Some are rather difficult but all start out with a fun problem.
  6. Aside from recreational math he did other things: magic, and debunking bad science.  (Fads and Fallacies in the Name of Science was excellent.) He was a well-rounded person, which is rare now.
  7. Brian Hayes and Ian Stewart and others do what he did, but given the times we live in now, it's hard to capture the attention of a large segment of the public (analogous to how, when I was a kid, there were only a handful of TV stations, while now there are... too many?).
  8. When I was in high school I went to the library looking for math books I could read (naive?). I found one of his books (a collection of his columns) and began reading it. I learned about casting out nines and I learned what was to be the first theorem I ever learned a proof of outside of class (given that I was probably 12 it may be the first proof I learned ever). It was that (in today's language) a graph is Eulerian iff every vertex has even degree.
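Since I just stated that theorem, here is a minimal code sketch of it (mine, not from Gardner's column; the function name and the Koenigsberg example are just for illustration). It checks the even-degree condition plus the connectivity caveat the full statement of the theorem needs.

from collections import defaultdict

def is_eulerian(edges):
    """edges: list of (u, v) pairs of an undirected (multi)graph.
    Returns True iff the graph has an Eulerian circuit: every vertex has
    even degree and all edges lie in one connected component."""
    adj = defaultdict(set)
    degree = defaultdict(int)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        degree[u] += 1
        degree[v] += 1

    # The condition from the theorem: every vertex has even degree.
    if any(d % 2 != 0 for d in degree.values()):
        return False

    # The fine print: all edges must lie in one connected component.
    if not degree:
        return True
    start = next(iter(degree))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return all(v in seen for v in degree)

# The Koenigsberg bridges fail the test: every land mass has odd degree.
koenigsberg = [("A","B"), ("A","B"), ("A","C"), ("A","C"), ("A","D"), ("B","D"), ("C","D")]
print(is_eulerian(koenigsberg))  # False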

Thursday, October 16, 2014

The Curious Case of NP and NEXP

NP (nondeterministic polynomial time) and NEXP (nondeterministic exponential time) are provably different classes by the nondeterministic time hierarchy. No surprise, given exponentially more time we expect to solve more problems. But the proof requires collapses at many input lengths and odd things happen when we look at the infinitely-often question.

We say a language L is in i.o.-C for a complexity class C if there is an A in C such that for infinitely many n, A and L agree on strings of length n (for all x of length n, x is in A if and only if x is in L). Straightforward diagonalization shows that EXP is not in i.o.-P.
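In symbols (just my restatement of the definition above):

L \in i.o.-C \iff (\exists A \in C)(\exists^\infty n)(\forall x \in \{0,1\}^n)[x \in A \Leftrightarrow x \in L].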

However we showed a relativized world where NEXP is in i.o.-NP in a recent paper (Theorem 18).
The construction is not particularly difficult but rather surprising: There is a possibility that one can get exponential improvement for nondeterministic computation for infinitely many input lengths.

Also consider the following facts: NEXP is not in i.o.-co-NP by straight diagonalization. If NEXP is in i.o.-NP then
  • NEXP is in i.o.-EXP
  • EXP is in i.o.-NP and thus EXP is in i.o.-co-NP (since EXP is closed under complement).
This is not an immediate contradiction since i.o. inclusion is not transitive, even though those i.o.'s happen at about the same length. You can't combine these relativizable statements to prove NEXP is not in i.o.-NP.

NEXP in i.o.-NP is not one of my most complicated relativization results, but definitely one of the strangest.

Monday, October 13, 2014

Luddite or not?

My first ever guest post for Lance was on Are you a luddite. I certainly am to some extent a luddite, but there are some things where it's not clear if they are luddite-ish or not.

  1. I prefer reading books to blogs. This came up when I reviewed both Lipton and Lipton-Regan blog-books, and I am now reading some of Terry Tao's Blog book.  l look forward to reading Scott's Blog book. At first I thought that preferring books was luddite-ish. But some high tech people and some young people who I've asked AGREE with me. Why is this?
  •  When reading a blog (or doing anything online) it's so easy to get distracted, e.g. OH, I WONDER IF WHITEY FORD IS STILL ALIVE SO I'LL PUT HIM ON MY LIST OF LIVING FAMOUS PEOPLE OVER 80 (he is, he's 85, and has the same birthday (though not year) as Martin Gardner), OH, I wonder if the word Buypartisan (that is NOT misspelled) is on my list-of-cool-new-words that I keep, OH I wonder how many people have registered for Theory Day. OH, Lipton just posted about Definitions not being needed and used that quote from The Treasure of Sierra Madre (see here) that was satirized in the movie UHF, I wonder if that clip is on You-Tube (It is here). OH, I can write a blog about Math in Weird-Al songs, for example Polka Patterns.
  • If I read a blog with a proof in it I tend to say  I'll read that later.
  •  I work better with pen and paper on hand. This may change if the way to mark up pdf and other documents gets better.
  1. (I do not know why the list restarted at number 1. I don't care to fix it- is that Luddite or not wanting to waste time on something unimportant?)
  2. Of course, the blog reading issue  is MY fault for being distracted.
  3. I don't pay my bills on line. There have been many data breaches and that makes darling and me nervous. Is this Luddite? Not sure--- is banking off-line any safer? I ask non-rhetorically.
  4. In a small class I use the blackboard. Some of my systems faculty have gone from board to slides and then back to board.  For a big class I have to use slides, though that may be an argument for small classes.

BILL: When I go to that conference I am going to bring some math to read during the talks I don't understand.

DARLING: Isn't that rude?

BILL: Many people in the audience will have their laptops out, reading email, managing their facebook page, etc.

DARLING: But the speaker can at least imagine they are taking notes.

BILL: Unlikely. In the next talk the speaker will become the laptop person.

The fact that I don't look at a laptop during a talk is probably a plus- and not a Luddite thing.
  1. We still don't have Netflix. We watch less TV this way? Worse TV this way? Not clear how this one goes.
  2. I used to write things out before typing them in, now I type them in directly. I wonder if that's good or bad.
  3. I used to have notebooks of random math stuff in them. Now I can't get myself to write things out by hand. That's probably bad.
  4. If someone asks me a question I am too quick to go to the web rather than try to answer it myself. This is mixed--- I don't waste time on problems I can't solve, but I also don't have the joy of solving them. I think of myself as not being a good problem-solver, but this could be a self-fulfilling prophecy that the web makes easier to indulge in.
  5. This is a DUH-comment- I hate technology that does not work. One of the worst episodes of Star Trek was The Ultimate Computer which showed that a good human is better than a MALFUNCTIONING computer. Well DUH. I had a rant about electronic refereeing - and not a single comment accused me of being a Luddite. In short- I hate technology that doesn't work. Duh.
So- your thoughts? Some Luddite things may not be Luddite, but just better. And technology will change yet again, making us all Luddites.

Thursday, October 09, 2014

2014 Fall Jobs Post

Tis the season for the fall jobs post. Please list any jobs, academic or industrial, in theoretical computer science broadly construed in the comments to this post. If you are a job seeker check this page often as new jobs get added over time.

As always the best places to look for academic CS positions are the job sites at the CRA and the ACM. Also check out postdoc and other opportunities on Theory Announcements. It never hurts to check out the webpages of departments or to contact people to see if positions are available.

I expect the computer science market to be quite robust again. CS enrollments continue to explode, putting pressure on departments to hire more faculty. Several good places last year had open positions that went unfilled.

We should also see a growth in instructor positions, as deans and provosts need to meet enrollment demands but aren't ready to commit faculty positions until they are sure CS is not in another bubble. Don't ignore these positions, think of an instructor as a teaching postdoc.

The market in theoretical computer science is a harder call. What will be the effect of Microsoft adding fifteen theorists to the market? That also suggests fewer industrial research and postdoc opportunities in theory.

Try to make yourself more versatile. Perhaps you could teach machine learning or computer security, areas of strong need closely related to theory.

Good luck to everyone on the market. I look forward to seeing your names in the 2015 spring jobs post.

Monday, October 06, 2014

The Complexity of NIM. Open?

Recall 1-pile NIM:

Let A be a finite set of naturals. NIM(A) is the following game: There are n stones on the board. Players I and II alternate removing a\in A stones. The first player who can't move loses. Note that if 1\in A then `can't move' means that the other player took the last stone. If (say) 2 is the min elt of A then it's possible there is 1 stone on the board and a player can't move.


The following  are known and easy to prove:
  1. If A={1,L} and L is even then II wins iff n\equiv 0,2,4,...,L-2 mod L+1
  2. If A={1,L,L+1} and L is odd then II wins iff n\equiv 0,2,4,...,L-1 mod 2L+1
  3. If A={1,L,L+1} and L is even then II wins iff n\equiv 0,2,4,...,L-2 mod 2L
  4. If A={L,...,M} then II wins iff n\equiv 0,1,...,L-1 mod L+M
  5. For ANY set A there will be a mod pattern, after a certain point.
(I think that if 1\in A then the mod pattern goes from the beginning, but if 1\notin A then it's possible that it does not start for a while.)

This raises the following computational problem: given a finite set A, how hard is it to find the mod pattern? I would want to know the complexity as a function of the size of the representation of A, or possibly just |A|log_2(max elt of A).  Has this been looked at? Some Google searches and asking around did not yield anything. I'm hoping that asking my readers may yield something.
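For concreteness, here is a naive brute-force sketch (mine; the function names and the period-guessing heuristic are just for illustration) that computes the lose/win pattern and guesses the eventual mod pattern. Its running time is exponential in the bit-length of the elements of A, which is exactly the kind of thing the question above asks whether we can beat.

def loses(A, N):
    """lose[n] = True iff the player about to move from n stones loses NIM(A)."""
    lose = [False] * (N + 1)
    for n in range(N + 1):
        # n is a losing position iff no move leads to a losing position
        # (in particular, having no legal move at all is a loss).
        lose[n] = not any(lose[n - a] for a in A if a <= n)
    return lose

def guess_period(A, N=500):
    """Naively look for a start s and period p so that the pattern is periodic mod p from s on."""
    lose = loses(A, N)
    for s in range(N // 2):
        for p in range(1, N // 4):
            if all(lose[n] == lose[n + p] for n in range(s, N - p)):
                return s, p
    return None

print([n for n, l in enumerate(loses({1, 4, 5}, 30)) if l])  # [0, 2, 8, 10, 16, 18, 24, 26]
print(guess_period({1, 4, 5}))  # (0, 8): matches fact 3 above with L = 4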

Thursday, October 02, 2014

Favorite Theorems: Multilinear Circuits

In the past decade we have seen a strong program in algebraic circuit complexity. If you just define circuits using multiplication and addition gates, sometimes you can use the algebraic structure to get lower bounds that would be much more difficult to prove in the Boolean case. For October's favorite theorem we highlight a couple of Ran Raz's results.

The main result of the first paper is basically in the title. Raz creates a polynomial function f that can be computed by a polynomial-size multilinear circuit of depth O(log^2 n) but cannot be computed by any polynomial-size multilinear formula. (A formula, unlike a circuit, cannot use the output of a gate more than once.) His proof works by examining the rank of the partial derivative matrix of a function chosen in a specified random way.

Ran Raz followed up with Amir Shpilka and Amir Yehudayoff to show lower bounds for syntactically multilinear circuits and exponential bounds for constant-depth multilinear circuits.

The second paper showed the most well-known of the algebraic functions (permanent and determinant) do not have poly-size multilinear formulas. Raz again starts with the partial derivative matrix, this time combined with random restrictions.

More recent work in algebraic circuit complexity focuses on the VP versus VNP problem, the algebraic version of P vs NP. VP = VNP if the permanent can be computed by poly-size algebraic circuits. Proving VP ≠ VNP is a goal of the Geometric Complexity Theory program as well as a consequence of tighter bounds on constant-depth algebraic circuits.

Tuesday, September 30, 2014

Dagstuhl on Algebra in Computational Complexity

(Reminder- Theory day at UMCP: here is the link. )

There was a Dagstuhl on Algebra in Computational Complexity Sept 22-26.
I learned stuff in the talks, over meals, and even in my room alone at night.

1) Recall that a while back Ryan Williams (the theorist, not the American-Football player) showed that NEXP is not in ACC. His proof involved MANY things but one of the core things was an ALGORITHM for a version of SAT (satisfiability of ACC circuits) that was ever-so-slightly better than brute force. So one lesson is that people in complexity theory should know some algorithms. At Dagstuhl Ryan presented work that shows that people in algorithms should know complexity. He used some old results about circuits to obtain an algorithm for all-pairs shortest paths that runs in time n^3/X where X = 2^{\Omega((log n)^{1/2})}. The ultimate goal is to either prove or disprove that all-pairs shortest paths has an n^{3-\epsilon} algorithm. On the NOT side he has (with other people, including Virginia Williams) a large class of problems, including APSP, such that either all of them have n^{3-\epsilon} algorithms or none of them do.
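For context (this is my own sketch, not anything from Ryan's talk): the baseline all of this is measured against is the classic O(n^3) Floyd-Warshall dynamic program for APSP. Ryan's algorithm shaves a 2^{\Omega((log n)^{1/2})} factor off it, and the hardness question is whether any n^{3-\epsilon} algorithm exists at all.

INF = float("inf")

def floyd_warshall(dist):
    """dist: n x n matrix of edge weights (INF for no edge, 0 on the diagonal).
    Returns the matrix of all-pairs shortest-path distances in O(n^3) time."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):                      # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

example = [[0, 3, INF],
           [INF, 0, 1],
           [2, INF, 0]]
print(floyd_warshall(example))  # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]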

2)  There were two (possibly three) talks on VP and VNP. Both are circuit classes defined by Valiant. Meena Mahajan has some natural problems that are complete for VNP (if you consider homomorphism polynomials natural) and Eric Allender had a dual notion to VP and VNP. A sequence of polynomials (f_n) is in VP if there is a family of arithmetic circuits (C_n) of polynomially bounded size and degree that computes the sequence (circuit C_n computes the polynomial f_n). It is in VNP if there is such a family (C_n) and a polynomial p with f_n(x) = \sum_{y \in \{0,1\}^{p(n)}} C_n(x,y).

This is usually discussed over a finite field. Eric's result depended on which field it was.
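Written out (my restatement of the definitions sketched above, nothing new):

(f_n) \in VP \iff there are circuits (C_n) with size(C_n), deg(C_n) \le poly(n) and C_n computing f_n;

(f_n) \in VNP \iff there are such (C_n) and a polynomial p with f_n(x) = \sum_{y \in \{0,1\}^{p(n)}} C_n(x,y).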

3)  Stephen Fenner talked about some combinatorial games that were PSPACE complete. The black-white-poset-game is as follows: there is a poset where every node is colored white or black. One player is called black, the other is called white. Players alternate removing nodes of their color, and if they remove a node they remove all nodes above it. Either player can go first, so you may have a game where if B goes first he wins, but if he goes second he does not. Fenner and his co-authors have shown that the general problem of, given a black-white poset and who goes first, determining who wins, is PSPACE complete. They showed that other versions are in P.
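To make the rules concrete, here is a minimal brute-force solver (mine, not from Fenner et al.; I am assuming the standard convention that a player with no node of their color to remove loses). It tries every move recursively, so it takes exponential time, which is at least consistent with the PSPACE-completeness.

def wins(state, player, color, above, memo):
    """True iff `player` ('B' or 'W') to move wins the black-white poset game.
    state: frozenset of remaining nodes; above[x]: frozenset of x and everything above it."""
    key = (state, player)
    if key in memo:
        return memo[key]
    other = "W" if player == "B" else "B"
    result = False
    for x in state:
        if color[x] != player:
            continue
        nxt = state - above[x]          # removing x removes everything above it
        if not wins(nxt, other, color, above, memo):
            result = True               # some move leaves the opponent in a losing position
            break
    memo[key] = result
    return result

# Tiny example poset: a < c and b < c, with a,c black and b white.
above = {"a": frozenset({"a", "c"}), "b": frozenset({"b", "c"}), "c": frozenset({"c"})}
color = {"a": "B", "b": "W", "c": "B"}
start = frozenset({"a", "b", "c"})
print(wins(start, "B", color, above, {}))  # True: Black wins moving first
print(wins(start, "W", color, above, {}))  # False: White loses even moving first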

4)  In the 1990's Karchmer and Wigderson had an approach to NC^1 vs NC^2 and P vs NC^1 that looked promising--- they defined problems in communication complexity (where we actually DO have results!) whose solutions would imply some separations. This led to monotone circuit results, but not to real circuit results. Or Meir spoke on some problems in that realm which can now be approached with information complexity. Are we closer to a true separation? Next Dagstuhl!

5)  David Zuckerman gave a nice talk on non-malleable codes. I was intrigued by some of the math that he did. The Sum-Product theorems are along the lines of: If A is a large set of reals (or another domain) then either A+A or AA is large. Or one could say that if A,B,C are all large then AB+C is large. David used an entropy version of this--- if A,B,C have large min-entropy, then so does AB+C.
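For reference, the classical sum-product statement (this is my paraphrase of the Erdős–Szemerédi-style result, not a formula from David's talk): for every finite set A of reals,

max(|A+A|, |A\cdot A|) \ge c|A|^{1+\epsilon},

where A+A = {a+b : a,b \in A} and A\cdot A = {ab : a,b \in A}.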

6)  Kopparty showed that Polynomial Identity Testing and Polynomial Factoring are related and may be equivalent.

7)  Over Lunch Jacobo Toran told me the following rather odd result.

a) Imagine the following GI algorithm: given two graphs look at the degree sequence, then the degrees of the degrees, then... (this goes on n times). Two graphs are declared isom if they are never found to be non-isom. DOESN'T WORK- Cai-Fürer-Immerman showed that. Even so, we'll call that FOCS-isom (a code sketch of this refinement appears after item c). Note that FOCS-isom is in P.

b) Recall that G (as a matrix) and H (as a matrix) are isom if there exists a permutation matrix P such that GP = PH. We can expand what P can be- say to doubly-stochastic (every row and every column adds to 1). We call two graphs G,H STOC-isom if there exists a doubly-stochastic matrix P such that GP=PH. This is in Poly Time by Linear Programming.

c) FOCS-isom and STOC-isom are the same! Too bad, I thought that STOC-isom might be a way to get GI in P.
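Here is a small sketch of the refinement in item (a), usually called color refinement or 1-dimensional Weisfeiler-Leman (the code and the 6-cycle-versus-two-triangles example are mine, just to illustrate how FOCS-isom can equate non-isomorphic graphs).

def focs_isom(adj1, adj2):
    """Iterated-degree refinement: returns True if the two graphs end up with the
    same stable color multiset, i.e., the refinement never tells them apart."""
    # Run the refinement jointly on the disjoint union so colors are comparable.
    g = {("a", v): {("a", u) for u in adj1[v]} for v in adj1}
    g.update({("b", v): {("b", u) for u in adj2[v]} for v in adj2})
    color = {v: 0 for v in g}
    for _ in range(len(g)):
        # New color = old color together with the multiset of neighbors' old colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in g[v]))) for v in g}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: relabel[sig[v]] for v in g}
    left = sorted(color[v] for v in g if v[0] == "a")
    right = sorted(color[v] for v in g if v[0] == "b")
    return left == right

# A 6-cycle and two disjoint triangles are both 2-regular on 6 vertices,
# so the refinement never separates them even though they are not isomorphic.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(focs_isom(cycle6, triangles))  # True, but the graphs are not isomorphic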

8) Sometimes at a conference I find a book in the library and read it at night and learn something out of nowhere. This happened with the Mitzenmacher-Upfal book on Probability and Computing (it could be called ``The Alice book'' as there is a picture of Alice from Alice in Wonderland on the cover; for the longest time a famous compiler book was called The Dragon Book). I read parts of it and even made up notes that I may use in an ugrad course: the coupon collector problem: A cereal company puts coupons labeled 1,2,...,n at random in boxes. If you mail in one of each you get a free box of cereal. You are DETERMINED to get that box of cereal. What is the expected number of boxes of cereal you must buy? It turns out to be about n ln(n), and the distribution is tightly concentrated around that.
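The standard calculation behind that n ln(n) (my sketch, not a quote from the book): once you hold i-1 distinct coupons, each new box gives a new coupon with probability (n-i+1)/n, so the waiting times are geometric and

E[boxes] = \sum_{i=1}^{n} \frac{n}{n-i+1} = n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = nH_n \approx n\ln n.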

9) I learned more from the talks, more from the meals, and more from my own alone time, but what is above is a good sampling.

10) More chalk-talks than I would have thought. Either chalk or slides can be done well or poorly.

11) Looking forward to the next Dagstuhl!