Tuesday, October 31, 2006

Algorithms and Networks in The Times

Steve Lohr wrote a New York Times essay on the CSTB 2016 Symposium held earlier this month. Lohr highlights the presentations of Richard Karp on the Algorithmic Nature of Scientific Theories and Jon Kleinberg on Social Networks, both of whom gave similar presentations at FOCS.

Monday, October 30, 2006

Network Neutrality

Some members of our community have worked on network protocols that use economic mechanisms to distribute bandwidth throughout the network. Would this research violate the concept of network neutrality? That depends on what you mean by network neutrality.

The Internet pioneer Tim Berners-Lee writes

It is of the utmost importance that, if I connect to the Internet, and you connect to the Internet, that we can then run any Internet application we want, without discrimination as to who we are or what we are doing. We pay for connection to the Net as though it were a cloud which magically delivers our packets. We may pay for a higher or a lower quality of service. We may pay for a service which has the characteristics of being good for video, or quality audio. But we each pay to connect to the Net, but no one can pay for exclusive access to me.
So an ISP like Comcast could accept money from, say, Disney to distribute their bits faster as long as they make the same deal available to all other content providers.

Some people take a more stringent view like the (failed) Markey Amendment to the COPE Act.

Each network provider has the duty to operate its broadband network in a non-discriminatory manner so that any person can offer or provide content, applications, and services through, or over, such broadband network with equivalent or better capability than the provider extends to itself or affiliated parties, and without the imposition of a charge for such nondiscriminatory network operation.
The amendment would have required ISPs to provide the same service to all content providers without additional charge. This might prevent any market mechanism for bandwidth distribution.

Our role as theorists is not to debate the ethical issues of network neutrality but rather to help frame the debate by understanding the technical issues raised by the various definitions of network neutrality and how these rules can affect the overall efficiency of routing bits through the net.

Sunday, October 29, 2006

The Generalist

As you readers know, my colleague Janos Simon went to FOCS last week. Janos wasn't an author, he wasn't on the program or organizing committees, he wasn't an invited speaker, he didn't sing in the band and he doesn't live within driving distance of Berkeley. Janos Simon went to FOCS not because he had an official reason, but rather to keep up with the latest results in theory. FOCS had low attendance this year (about 220) because it doesn't attract many Janos Simons, people who travel to FOCS because they want to, not because they feel they have to.

Janos Simon is part of a dying breed, a generalist in theoretical computer science. Janos goes to most STOC and FOCS conferences and a few specialized conferences like SODA and Complexity. In three decades of research, Janos has studied complexity, algorithms (distributed, parallel, randomized), VLSI and networks, as well as having interests in nearly every area of theory. It's impossible to arrange parallel sessions at FOCS in a way that won't cause a conflict for Janos.

Back in the 70's a graduate student could keep up with the latest research in all of TCS and produce significant results in his or her career in many different areas. Today we seem to push out very specialized researchers focused not just on some subarea like algorithms or complexity but even in more specialized areas like streaming algorithms or PCPs.

Such specialization has its risks, for example getting so caught up in an area that you lose track of the motivation for studying the topic in the first place. Also we lose some sense of community when theorists in different areas lose interest in following each other's work. But can a young Janos Simon survive today as a generalist or are we doomed to further specialization as students search for successful research agendas?

Thursday, October 26, 2006

Whose Thesis is it Anyway?

Ten years ago Bob Soare realized that the name "recursion theory" did not do justice to his field so he single-handedly changed the name of the field to computability theory. Now he wants to correct the early history of computability and give credit where credit is due, namely to Turing.
The canonical wisdom in most computability textbooks is that there were several researchers in the early 1930's working on various formal definitions to capture the intuitive notion of a computable function, and that Turing was just one of them and shares the credit with them. This is incorrect. It was Turing and Turing alone not Church, not Kleene, not Post, and not even Gödel himself. It was Turing who:
  1. gave the first convincing model for an intuitively computable function;
  2. gave a precise demonstration that this formal model captured the intuitive notion; and
  3. defined the universal Turing machine, which is of immense importance.
You can see Soare's historical view here.

Yesterday Soare gave a preview of a talk at the Chicago logic seminar. He focused mostly on the work of the 30's and how Kleene later established the terminology Recursion Theory and Church's Thesis. Soare argues that Turing deserves most of the credit for the Thesis because Turing gave solid arguments for the thesis and Gödel always gave Turing credit. Church may have first formulated the thesis but Turing's work made it credible.

We computer scientists have been using "Church-Turing Thesis" for years and with Soare's historical background, Church-Turing still seems like the way to go.

Wednesday, October 25, 2006

Introductory CS Sequences

At most universities the first year CS courses tend to cover programming. These courses differ from first-year sequences in other departments in several important ways.

In most disciplines, the topics of the introductory sequence have not significantly changed since I went to college and even since my father went to school. In CS our introductory courses seem to change every few years. I don't think anyone currently teaches PL/C (or any other PL/1-variant) that I learned in CS 101 at Cornell. Computer science didn't even exist as a field when my father went to college in the 50's.

In most disciplines any professor in the department deeply knows the material taught in the introductory sequence. Any professor could teach the intro sequence, or if they can't it's not because they don't know the material. This is certainly not true in computer science.

A professor at a state university noted that their CS majors had internships after their first year and commented "CS is the only discipline where we have to make the students employable after the first year."

In non-CS scientific disciplines, different universities generally teach the same material to their first-year students. Different physics departments teach with different books at different levels and maybe present material in different orders, but there is general agreement on what basic concepts of physics first-year students should know.

Go to any computer science conference and you'll hear discussions about what programming language gets taught at the attendees' schools. Nearly every department has people disagreeing about which programming language to teach to the first years. I tend to stay out of these fights, for it is a lose-lose proposition. Win the argument and you'll end up teaching the class.

Tuesday, October 24, 2006

FOCS Funding, Music and Talks

Another Janos Simon report from the FOCS conference in Berkeley.

News from yesterday's Business Meeting (continued). Most of the news from NSF is a repeat of last year's sorry state of affairs: no money, not enough influence.

Influence: We (TCS) are a lowly leaf, instead of a child of the root in the CS organizational tree. This is both bad for money, and very bad for influence and image. Bill Steiger is working hard for a change.

Money: bad news. More precisely, when Bill Steiger got to NSF, all his budget was already spent. (Budget = $4.1 million, plus $2.8 million as a share of the money for three subprograms.) Of the ~500 CS proposals to NSF, about 90-100 were for Theory. Bill managed to fund 18, by hard work, persuading others to let Theory have a bit of their budget, etc. The SING (networking) program was also seriously underfunded (~100 proposals, 7 funded, one of them theory-flavored).

Perspectives for 2007 are not good. Again, the money is pretty much spent, but Steiger will do the best he can. The NSF budget should be within 10% of 2006. About 4-5 CAREER grants might be funded; no large Theory projects will be. SING will again be very competitive.

If you are interested in submitting a proposal do talk to Bill Steiger. He will try to help.

An appeal by Bill Steiger: please send "nuggets" to him—descriptions of neat Theory results. They are VERY useful.

Dick Karp spoke, and thanked Bill in the name of the Theory community. After a round of applause, Karp explained that other science communities lobby for NSF projects: astronomers get observatories, satellites; physicists get accelerators, or focused projects. They have an internal discussion and then lobby lawmakers and NSF. CS, in particular Theory, should do likewise.

He appealed for mobilizing the community. NSF wants CS to suggest visionary activities, and CRA got a few million dollars to organize us to suggest ideas. Theory should make sure we are well represented.

Avi briefly noted that out of the 0.5 billion dollars to CS, Theory gets 6-10 million, yet we are about 20% of the community. We need politics. We should also think of a stint at NSF as part of our service.

At the end of the meeting, it was announced that Hal Gabow will be the next TC chair.

The final part of the proceedings was moving to the adjacent ballroom where we were treated to a rock concert. Christos Papadimitriou, suitably dressed in all black, was the keyboard player; Anna Karlin had one of the electric guitars. The rest of the band (Lady X and The Positive Eigenvalues) was unknown to me. Lady X was Andrea Frome (a Berkeley grad student), and the performances were very good. Eventually Christos also sang. (Luca reviews the concert with pictures.)

MONDAY talks.

There was an invited talk by Terry Sejnowski, a distinguished neuroscientist. He talked about how vision is influenced by knowledge and expectations, how our brains seem to have learning feedback mechanisms, and how simple neural networks can learn surprisingly complicated stuff. I can report more fully if there is interest; he did not make direct connections to Theory.

Again today there were a number of very interesting papers. A cute question from the point location talk: in 2D is nearest neighbor as hard as point location? As usual, clever crypto papers, and quantum results, including one by Xiaoming Sun and Andy Yao with real heavy duty math.

A most interesting/controversial talk was by Leslie Valiant. He explored paths to try to prove that NC^2 = P^#P, which he thinks would be cleaner than the current complexity zoo. The paper continues the research direction started with his matchgate computations. He considers "holographic reductions" that conserve the number of solutions, not by direct translation as the usual parsimonious reductions do, but in complicated ways: they are many-to-many maps that still conserve counts. Using these, he is able to prove interesting results, like counting the number of satisfying assignments mod 7 is in P, though 7 may be the only such modulus.

The paper that won the Machtey award is very nice, and Nick Harvey did a superb job of presenting it. He has a probabilistic algorithm to find matchings in non-bipartite graphs that achieves time O(n^ω), where ω is the exponent of matrix multiplication. There was another algorithm with the same bound (Mucha and Sankowski, FOCS 2004), but Nick Harvey managed to do what no other general matching algorithm does: it is easy to understand. Read it!
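For flavor, here is the classical randomized building block behind such algebraic matching algorithms, due to Lovász: a graph has a perfect matching if and only if its Tutte matrix is nonsingular, so plugging random values into the indeterminates and testing the determinant gives a one-sided randomized test. This minimal sketch only decides whether a perfect matching exists; it is not Harvey's algorithm, and the graph at the bottom is a made-up example.

    # Lovasz's randomized test for a perfect matching via the Tutte matrix.
    import random

    P = 2_147_483_647  # a large prime; work mod P to keep numbers small

    def det_mod_p(a):
        """Determinant mod P by Gaussian elimination."""
        a = [row[:] for row in a]
        n, det = len(a), 1
        for c in range(n):
            piv = next((r for r in range(c, n) if a[r][c]), None)
            if piv is None:
                return 0
            if piv != c:
                a[c], a[piv] = a[piv], a[c]
                det = -det
            det = det * a[c][c] % P
            inv = pow(a[c][c], P - 2, P)  # modular inverse, since P is prime
            for r in range(c + 1, n):
                f = a[r][c] * inv % P
                for k in range(c, n):
                    a[r][k] = (a[r][k] - f * a[c][k]) % P
        return det % P

    def has_perfect_matching(n, edges, trials=5):
        for _ in range(trials):
            # Tutte matrix: T[i][j] = x, T[j][i] = -x for a random x per edge
            T = [[0] * n for _ in range(n)]
            for i, j in edges:
                x = random.randrange(1, P)
                T[i][j], T[j][i] = x, P - x
            if det_mod_p(T) != 0:
                return True   # nonzero determinant certifies a matching
        return False          # probably none (error prob <= (n/P)^trials)

    # Triangle 0-1-2 plus pendant edge 2-3: perfect matching {0-1, 2-3}.
    print(has_perfect_matching(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))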

Finally a paper I could not hear, but which is interesting if only to show how taste changes. Back in the remote prehistory of CS—around the very early sixties—people were very interested in small universal machines. They wanted to see how simple objects could be and yet have undecidability "built in." More precisely, for what pairs (s,a) is there a universal Turing machine with s states and a tape symbols? (The product sa is the "size" of the machine.) The smallest machines were constructed by Minsky, using as a subroutine "tag systems," which are a kind of "Post production system," which in turn is a kind of type 0 grammar. The simulations introduced an exponential overhead, so these smallest machines were very slow. It has been an open problem for over 40 years whether this exponential slowdown was really necessary. The paper by Woods and Neary shows that it is NOT: all known minimal-size universal Turing machines can simulate in polynomial time.
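To make "tag systems" concrete: a 2-tag system repeatedly reads the first symbol of the current word, deletes the first two symbols, and appends that symbol's production. Below is a minimal Python sketch, with example productions (due to De Mol) that encode the Collatz map in unary; this is just an illustration, not Minsky's actual universal construction.

    # Minimal 2-tag system simulator (toy illustration).
    def run_2tag(word, productions, max_steps=50):
        trace = [word]
        while len(word) >= 2 and len(trace) <= max_steps:
            # append the production of the first symbol, delete the first two
            word = word[2:] + productions[word[0]]
            trace.append(word)
        return trace

    # De Mol's example: starting from 'a'*n, the lengths of the all-'a' words
    # that appear trace the Collatz trajectory of n (odd steps show up in the
    # compressed form n -> (3n+1)/2).
    collatz = {'a': 'bc', 'b': 'a', 'c': 'aaa'}
    for w in run_2tag('aaa', collatz):   # n = 3: expect a^3, ..., a^5, ..., a^8, ...
        print(w)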

Monday, October 23, 2006

FOCS Day 1 and Business Meeting

Janos Simon continues his reports from Berkeley.

Dick Karp gave a 1-hour invited talk in two parts. First he gave a short version of "Why TCS is important to other sciences." This is a sermon that ought to be delivered to deans, and to non-theory folks in CS.

Highlights:

  • TCS points out the computational nature of natural and social processes, and provides a language for their descriptions analogous to Math providing equations that describe phenomena and methods to manipulate equations.
  • TCS alters world views in other areas, providing paradigms to understand and to model.
Examples:
  • Biology: cell process regulation.
  • Learning: (in humans, animals) paradigms, models and algorithms from Computational Learning Theory
  • Molecular self-assembly
  • Strategic Behavior of companies
  • Physics: Quantum Mechanics and quantum computing.
  • Math: too numerous to mention, including algebra and Fourier techniques.
  • Social sciences: the web, data for social interactions and behavior, etc.
Karp also gave examples of where convergence between CS and other areas benefited both disciplines including Belief propagation, error correcting codes and constraint satisfaction.

Janos' Comment: I think it is important to emphasize that CS contributes more than a passive lens: we are not a tool, but a partner.

The second part of Karp's talk was a set of examples from Computational Molecular Biology, illustrating some of the contributions/research accomplishments. He gave 5 examples of "Algorithmic Challenges in Computational Molecular Biology." [Many are associated with Berkeley because Karp is quite familiar with these]

  1. Sequencing Genomes. He talked mostly about Myers' contribution to shotgun sequencing. His algorithmic ideas were crucial to the success of the technique, which, against biologists' expectations, became the standard. The technique: extract many random fragments of the genome of known fixed lengths (2, 10, 50, 150 kb pairs), read 500-1000 nucleotides from opposite ends, and computationally assemble the genome. Myers came up with several tricks that do NOT work for arbitrary sequences but take the domain into account. He had to use biological knowledge.
  2. Parametric Sequence Alignment. The Needleman-Wunsch algorithm (aka dynamic programming) gives the best alignment of two sequences (DNA fragments) given scores for matching letters, mismatches, or matching a letter with a space (modeling insertions/deletions). What if these costs are not known? A clever algorithm by Pachter and Sturmfels uses polytopes for this parametric problem (a basic alignment sketch follows this list).
  3. Inferring Evolutionary History (Daskalakis, Mossel and Roch, STOC 06)
  4. Genetic Variation. Zhihong Ding, Vladimir Filkov, Dan Gusfield: A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem.
  5. Regulation of Gene Expression (ran out of time)
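For example 2, here is a minimal sketch of the Needleman-Wunsch dynamic program with one fixed, assumed scoring (+1 match, -1 mismatch, -1 gap); the parametric question Karp described is how the optimal alignment changes as these three scores vary.

    # Needleman-Wunsch global alignment score by dynamic programming.
    def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
        n, m = len(x), len(y)
        # dp[i][j] = best score aligning x[:i] with y[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if x[i - 1] == y[j - 1] else mismatch
                dp[i][j] = max(dp[i - 1][j - 1] + s,   # (mis)match
                               dp[i - 1][j] + gap,     # gap in y
                               dp[i][j - 1] + gap)     # gap in x
        return dp[n][m]

    print(needleman_wunsch("GATTACA", "GCATGCU"))  # best global alignment score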
Some other interesting talks I heard:

Nguyen, Ong, Vadhan: Statistical zero-knowledge from any 1-way function. Omer Reingold liked this. I did too. They prove that 1-way functions (in some sense the weakest crypto assumption) are sufficient for giving a zero-knowledge proof—under new definitions. The usual definition is that soundness is statistical (absolute) and zero-knowledge is computational (guaranteed only against polynomially bounded adversaries). This paper inverts the roles, answering a question proposed in 1992. The proof technique is new.

Assaf Naor presented two difficult and impressive results, showing that problems of embedding one metric space in another have a deep mathematical background, and interesting CS applications. The papers have difficult proofs that may well be worth studying.

Santosh Vempala presented a great result (with Laci Lovasz) showing how sampling log-concave functions is related to optimization problems.

I found the results on derandomization and hardness amplification quite interesting (session 4B) and was convinced of the virtues of smoothed analysis (concurrent session 4A). Ryan O'Donnell gave a beautiful presentation of new nonapproximability results in session 5A, and I learned that honest majority is easier in a quantum setting in session 5B.

Highlights of the business meeting:

Machtey Award for Best Student Paper: Nicholas J. A. Harvey for Algebraic Structures and Algorithms for Matching and Matroid Problems.

Stats: 243 submissions, 71 accepted (~30%), ~220 attendees. By comparison:
  • SODA: 387 submissions, 139 accepted (36%)
  • ESA: 215 submissions, 55 accepted (26%)
  • STOC: 280 submissions, 78 accepted (27%)
  • CCC: 85 submissions, 28 accepted (32%)

Deadlines: CCC Dec 3 (program chair Peter Bro Miltersen); STOC Nov 20 (no joint submissions to CCC).

STOC 07 will be at FCRC in San Diego. The hotel will cost $129/$139 single/double. Dates are June 11-13 for STOC, June 8-16 for FCRC.

FOCS 07 will be in Providence, RI. Program chair is Alistair Sinclair. Location is the new Renaissance Hotel, currently under construction.

FOCS 08 will be in Philadelphia.

Our community received a number of prizes, previously reported here: the Nevanlinna Prize to Jon Kleinberg; the Dantzig Prize to Eva Tardos; and the Fulkerson Prize to three sets of co-winners: Manindra Agrawal, Neeraj Kayal and Nitin Saxena, "PRIMES is in P", Annals of Mathematics 160(2), 2004, pages 781-793; Mark Jerrum, Alistair Sinclair and Eric Vigoda, "A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries", J. ACM 51(4), 2004, pages 671-697; and Neil Robertson and Paul D. Seymour, "Graph Minors. XX. Wagner's conjecture", Journal of Combinatorial Theory, Series B 92(2), 2004, pages 325-357 (not really TCS, but we'll claim it!).

There was a new proposal by Umesh Vazirani: let us have no paper proceedings. Instead, let the papers be submitted (perhaps as little as one week before the conference) to a website. This would allow postponing submissions by perhaps 2 months, and would bring extra benefits (cheaper registration, text searchability, availability everywhere).

A lively discussion ensued. The economics of the proposal are unclear: would it decrease attendance? Should we worry about the economics? Some people like having the paper version at the conference. Making the "electronic proceedings" have page numbers, formatting, etc. would still imply text processing and editing. On the other hand, an electronic version would have substantial advantages. Would IEEE let us do it? What about libraries who paid for the proceedings?

It was pointed out that the IEEE electronic library is unreasonably expensive, and many institutions do not have it. A quick vote was taken, and the results were inconclusive. Many wanted a CD with the proceedings. About as many people wanted printed proceedings with a CD as wanted electronic versions only.

Our leaders promised to study the matter.

Finally the NSF report. Bill Steiger is going back to academia, in August. He was warmly thanked for his efforts for the community. I will give a detailed report tomorrow.

Sunday, October 22, 2006

FOCS Begins

Weblog correspondent Janos Simon reports from Berkeley.

This is a pre-FOCS FOCS report, filling in for Lance.

FOCS 2006 is at Berkeley Marina, a lovely place on the Bay, with expensive sailboats next to the hotel gardens, against the beautiful backdrop of the San Francisco Bay, with the Berkeley hills on the other side. All of this even comes at a relatively cheap price.

Unfortunately the hills of Berkeley are not so near: downtown Berkeley is about 3 miles from the hotel, on the other side of train tracks and the interstate. This not only prevents one from easily strolling around the Berkeley campus and stopping at one of the nice coffeehouses or bookstores, but also makes getting dinner more of an adventure, involving a cab or a hotel shuttle. I am not complaining about the organizers: putting a conference together is a difficult balancing act, with too many constraints.

The conference itself should be excellent, with very strong papers, including the solution of the 2-player Nash equilibrium problem that won the best paper award. There will be three invited talks, all dealing in one way or another with Theory interacting with other areas of knowledge: tomorrow (Sunday) Karp will talk about "Theory of Computation as a Lens on the Sciences", Monday Terry Sejnowski will teach us about vision, and Tuesday Jon Kleinberg will discuss networks, social and technological. The biggest innovation relative to previous FOCS will be the presentation of Lady X and the Positive Eigenvalues, a rock concert. [For those with a historical bug, this is not an absolute first. Dexter Kozen, with Anna Karlin of the Severe Tire Damage rock band—the first to broadcast a rock concert on the internet—has in the past helped expand our horizons in these directions, I believe at the Palo Alto conference, but I cannot offer precise pointers.]

The conference has parallel sessions (pros and cons have been previously and exhaustively discussed), so I will only be able to provide a very incomplete report of the papers presented. I should also add that I expect to miss many talks, not only due to my non-quantum nature and consequent lack of ability to be in two places at the same time, but also due to random phenomena: sleeping late, meeting friends, discussing ideas, wandering into the wrong lecture room, and forgetting about the time of talks I wanted to hear. So the papers I'll mention are not necessarily the ones I chose to listen to, just the ones I happened to not miss, and inferences about my taste or the quality of papers ignored would likely be mistaken.

Given that caveat, I can talk about two very nice results to be presented today. I heard them at Chicago before the conference.

The first is Tom Hayes' paper that gives a sufficient condition for certain important probabilistic models to converge rapidly. For definitions, technical details, etc read the paper. What I liked is that the paper is not only a significant technical improvement, but this seems to be the "correct" reason that lurked behind previous proofs, and this (I think) will be the proof we will teach.

The second is the paper by Belkin, Narayanan and Niyogi on estimating the surface area of a convex body. Both the area and the volume of high-dimensional convex bodies are very hard to approximate by the "natural" method of approximating the boundary by a simple smooth object: the number of parameters to do so grows too fast. For the volume, one can use random walks: an appropriate random walk on the space is stopped, and we record whether we stopped inside or outside the object. The ratio of these events can be used to estimate the volume. Unfortunately this does not work for the surface area. The very cute idea of the Belkin et al paper is to put a heat source inside the object and observe the amount of heat that leaks outside: this will be proportional to the area.

Of course there are lots of technical difficulties, but they can be overcome. The details are not easy, but I thought the idea was new and nice.
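As a back-of-the-envelope illustration of the heat idea (my toy sketch, not the authors' algorithm): a uniform random point in the body, perturbed by a Gaussian step of standard deviation sigma, lands outside with probability roughly (Area/Vol)*sigma/sqrt(2*pi) for small sigma, so counting escapes estimates the surface-to-volume ratio. For the unit ball in R^3 that ratio is 3, which makes it a convenient sanity check.

    # Toy Monte Carlo version of the heat-leak idea, for the unit ball in R^3
    # (where Area/Vol = 3). Illustration only; the paper handles general
    # convex bodies and does the real error analysis.
    import math
    import random

    def uniform_in_ball(d):
        # uniform point in the unit ball: Gaussian direction, radius ~ U^(1/d)
        g = [random.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        r = random.random() ** (1.0 / d)
        return [r * x / norm for x in g]

    def area_over_volume(d=3, sigma=0.01, samples=200_000):
        escapes = 0
        for _ in range(samples):
            p = uniform_in_ball(d)
            q = [x + random.gauss(0, sigma) for x in p]   # one "heat" step
            if sum(x * x for x in q) > 1.0:               # heat leaked out
                escapes += 1
        # escape probability ~ (Area/Vol) * sigma / sqrt(2*pi)
        return (escapes / samples) * math.sqrt(2 * math.pi) / sigma

    print(area_over_volume())   # should print something near 3.0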

Thursday, October 19, 2006

Quickies

David Pennock gives a CS view of prediction markets and gambling on his new weblog Oddhead. For those interested in prediction markets also take a look at the group blog Midas Oracle.

Chris Leonard, former editor of the Elsevier theory journals, returns to academic publishing for the open-access publisher BioMed Central to help them expand into physics, math and CS. He writes about changes in scientific publishing (and other topics) in his weblog.

Should teachers try to make math interesting and relevant? No, according to Tom Loveless.

And I'd be remiss not to mention the lengthy New York Times profile on Shing-Tung Yau.

Wednesday, October 18, 2006

For-Profit Universities

On Monday the Chicago Bears played the Arizona Cardinals at University of Phoenix Stadium. Is this the stadium where the University of Phoenix normally plays its football games? No, the University of Phoenix doesn't even have a football team, or a traditional college campus at all. Rather the University of Phoenix is a for-profit university with about 300,000 students, taught at several small campuses in the US and beyond and online, focusing on career-oriented majors. In an AP interview, their new president Bill Pepicello explains their mission.
Our philosophy for serving students is the same as Harvard or Ohio State, and that is we're mission-driven. The mission of, say, Harvard is to serve a certain sector of the population and their mission is not to grow. And that's true of higher education in general. The reason the University of Phoenix exists at all is that all of those various (universities) and their missions did not provide access to a large number of students who are capable and wanted access to higher education. And that's our mission.
This university fills a need for training Americans for new careers as we continue to ship manufacturing and service jobs overseas. But the University of Phoenix doesn't fulfill other traditional university functions like basic or applied research or being the intellectual center of their communities.

For-profit universities haven't posed much of a threat to the traditional university model in the US. But in the long run, as people get more comfortable with the virtual world of the internet, universities with a large on-line presence might radically change higher education as we know it.

Meanwhile India has a shortage of trained professionals as the country explores how to best address its higher educational needs. Germany on the other hand is focusing on encouraging high-level research by establishing elite universities, graduate schools and research clusters.

So how did the University of Phoenix get their name on the football stadium? The same reason the White Sox play in U.S. Cellular Field—corporate sponsorship. At least we don't put corporate logos on the fronts of our jerseys like the European footballers.

Monday, October 16, 2006

Favorite Theorems: The Yao Principle

September Edition

What is the best a probabilistic algorithm can do for the worst-case input? Perhaps it is easier to show the limitations of a deterministic algorithm on average over an adversarially chosen distribution of inputs. Andrew Yao observed that these values are one and the same.

Andrew Yao, "Probabilistic Computations: Toward a Unified Measure of Complexity." FOCS 1977

Best to view this result as a zero-sum game. Alice chooses a deterministic algorithm A and Ian chooses an input I. Ian receives t dollars from Alice, where t is the "cost" of algorithm A on input I. By von Neumann's minimax theorem, Ian and Alice have probabilistic equilibrium strategies. That is, Ian has a distribution over inputs that achieves an expected cost t no matter what algorithm Alice chooses, and Alice has a probabilistic algorithm (a randomized choice of deterministic algorithms) that achieves the same expected cost t no matter what input Ian chooses.
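In symbols (a sketch of the statement, with cost(A,x) the cost of deterministic algorithm A on input x):

    \[
      \max_{\mu}\, \min_{A}\; \mathbb{E}_{x \sim \mu}[\mathrm{cost}(A,x)]
      \;=\;
      \min_{R}\, \max_{x}\; \mathbb{E}_{A \sim R}[\mathrm{cost}(A,x)]
    \]

Here μ ranges over distributions on inputs and R over distributions on deterministic algorithms, i.e., randomized algorithms. The left side is the best lower bound an adversary can force on deterministic algorithms via a hard distribution; the right side is the worst-case expected cost of the best randomized algorithm.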

The Yao Principle applies when we don't consider the algorithmic complexity of the players. For example, in communication complexity we have two players who each hold a separate half of an input string and who want to compute some function of the input with the minimum amount of communication between them. The Yao Principle states that the best probabilistic strategies for the players will achieve exactly the same communication bounds as the best deterministic strategy over a worst-case distribution of inputs.

The Yao Principle plays a smaller role where we measure the running time of an algorithm, since applying the Principle would require solving an extremely large linear program. But since so many of our bounds are in information-based models like communication and decision-tree complexity, the Yao Principle, though not particularly complicated, plays an important role in a large number of lower bounds in our field.

Sunday, October 15, 2006

Numb3rs of Collaborators

Last week's episode of Numb3rs, The Mole, had a mildly interesting academic side story. [Mild Spoiler Warning] Charlie, the mathematician, discovered that his friend Larry, the physicist, published a paper without asking Charlie to collaborate on the math, which according to Charlie would have taken the paper from "very good" to "great." Larry later confessed to Charlie's dad that he worried about relying too much on a single collaborator, especially one so busy helping his brother at the FBI. Eventually Larry and Charlie talked out their issues and agreed to work together again.

In the real academic world you rarely see two people collaborate almost exclusively over a long period of time. We all have certain people with whom we collaborate often because we have similar interests, complement each other's skills, or simply work well together. But having a single collaborator can lead to narrow research, using the other as a crutch, and worrying that outsiders won't know which one is the stronger researcher. Most importantly, we thrive on variety, and having different collaborators keeps research exciting.

Friday, October 13, 2006

What Happened to Departmental Tech Reports?

Imagine back to the early 90's before we had a world-wide web. You had a new result, a nice result but not so important that you would do a mass email. You wanted to mark the result with a time-stamp and make it available to the public so you created a departmental technical report, basically handing a copy of the paper to a secretary. You would get a report number and every now and then a list of reports was sent out to other universities who could request a copy of any or all of the reports. Eventually the paper would go to some conference and journal but the technical report made the paper "official" right away.

As the web developed, CS departments started putting their tech reports online. But why should you have to go to individual department web sites to track down each report? So people developed aggregators like NCSTRL that collected pointers to the individual papers and let you search among all of them. CiteSeer went a step further, automatically searching and parsing technical reports and matching citations.

But why have technical reports divided by departments? We each live in two communities—our university and our research field. It's the latter that cares about the content of our papers. So now we see tech report systems organized by research area, either area-specific systems like ECCC or very broad systems like arXiv that maintain lists in individual subareas, bypassing the department completely.

What's next? Maybe I won't submit a tech report at all, letting search engines like Google Scholar or tagging systems like CiteULike organize the papers. Departmental tech reports still exist but don't play the role they once did, and who can predict how we will publish our new results even five or ten years down the road?

Wednesday, October 11, 2006

STOC Undergrad Research Competition

The ACM Symposium on Theory of Computing (STOC) will host its first Undergraduate Student Research Competition in 2007 sponsored by Microsoft Research. From the entrants a number of students will be invited to attend the conference at FCRC in San Diego in June where they will present their work at a poster session. Four to five finalists will give oral presentations at the conference. The top three finalists will receive cash prizes and advance to the grand finals of the broader ACM Student Research Competition.

As an added bonus participating in a project can only help your grad school applications as many Ph.D. programs like to see some research experience. Submission deadline is not until February 23 but you'll have to start very soon to get a project ready in time. So go find some ideas from a friendly CS theory professor at your university and get cracking.

See you in San Diego.

Tuesday, October 10, 2006

Wholly Natural

My daughter's math text lists natural numbers as {1,2,…} and whole numbers as {0,1,2,…}. I remembered them the other way around; after all, zero seems natural but is a whole lot of nothing. But for her homework she has to use the textbook terminology lest she get marked wrong.

Afterwards in my class I used the phrase "for every natural number n," then stopped myself and asked the class what natural and whole mean to them. The class had diverse opinions and also mentioned something called counting numbers. Various web sources seem to agree for the most part that whole numbers start with 0 but are less clear on the natural and counting numbers.

For that class, and in most of what I do, the distinction is irrelevant since we usually only care about "sufficiently large" n. But if you are working in a context where zero matters, please use the terms "nonnegative integers" and "positive integers" so as not to confuse those of us who can't keep our wholes and naturals straight.

Monday, October 09, 2006

Theory Confidential

I hate to keep secrets but in our field much of what we discuss should be kept confidential as much as possible. What do we need to keep quiet about?
  • Recommendation Letters. Should only be read during the recruiting process and never by the candidate except as required by law. If someone other than the candidate asks you for a recommendation you shouldn't even mention to the candidate that you were asked.
  • Referee Reports. While the reports themselves are furnished to the author, the identity of the referees must be kept secret. Any discussion between the referees and authors should go through the editor.
  • Committees. Any discussion in a program committee (or any other kind of committee) should remain closed except as agreed upon in the committee. This allows the committee members to speak freely. In a PC you shouldn't even mention whether a paper was submitted.
  • NSF Panels. You should not disclose any discussion during an NSF panel, or even the fact that you were a panelist.
  • Salaries. You can announce your own salary but you shouldn't mention other people's salaries. Exceptions for surveys and states that require that the public have access to the salaries of all of their employees including state university professors. Update: I've just been told I am not allowed to publicly announce my salary as a University of Chicago employee. Apparently the employer can decide the appropriate policy for salary disclosure in the US.
  • Personal Information. Disabilities, Illnesses physical and mental, Gender, Sexual Orientation, Marriage, Children, Religion, Race and other related issues except as necessary or as already publicly known.
  • Email and Personal Discussions. You shouldn't reveal research or other discussions with someone else without their permission.
So what can you talk about? Anything made public, on a web page, in writing or in "public" is fair game. After all we bloggers do need stuff to write about.

If you truly want things you say to remain private then don't say them. With many theorists having loose lips and the minimal security of email, you cannot count on what shouldn't be spread remaining unspread.

Saturday, October 07, 2006

Brochure Season

'Tis the season that my fellow professors and I start receiving collections of brochures, newsletters and posters from various CS departments around the country. I never see this propaganda from the top four departments (MIT, CMU, Berkeley and Stanford); it's usually that next level or below trying to proclaim how great they are.

The brochures look the most impressive, for example Wisconsin-Madison with 36 glossy pages including a two-page spread on Jin-Yi Cai and the P vs. NP problem.

The thrill is in the chase for Cai and others in Theory of Computing. He describes his research with the language of an artist, driven by "elegance, internal logic and beauty." The usefulness of the findings in this field can often be transformational, but they may not be evident until decades later.
Newsletters focus more on recent hires, awards and research activities. Rutgers, for instance, has a full page on new hire (and former NEC colleague) Joe Kilian. The profile even mentions Joe's work on the Franklin eBookman. "As a theorist, it was rather strange to realize that I could go to Staples and buy a device that contains actual code I've written."

The posters have a more specific function like advertising the graduate program or announcing distinguished lecturers. They can't really expect us to travel a thousand miles to see a single talk. I suspect the distinguished lecturer posters really say "We are a real department because famous computer scientists visit us."

Brochures, newsletters and posters all have the same true purpose of branding, so we think of those departments when we recommend graduate schools for our undergrads, faculty jobs for our Ph.D. students and when we fill out those questionnaires that lead to departmental rankings.

The departmental web page has become the true portal that potential students and faculty use to explore a department. Most CS department home pages are high on content but often low on visual appeal. Departments should put as much effort into the style of their web pages as they do into those brochures, newsletters and posters.

Thursday, October 05, 2006

Embargoed Science

NPR's On the Media interviewed an old college friend of mine, science writer Vincent Kiernan, about his new book Embargoed Science.
"Embargoed Science" refers to articles that some journals, such as Science or Nature, send to science writers in advance of publication under strict rules not to write about the article until publication date. The embargo supposedly gives writers a chance to do interviews and background research before they have to write the stories. Kiernan argues that embargoes misdirect science journalists.
It sets up a system by which journalists are given powerful incentives to cover a certain kind of story about science – the latest "eureka moment," and those eureka moments come several times each week. And so the journalists are directed toward those stories and away from critical, skeptical coverage of science as an enterprise, with all its fraud, with all its successes, with all the ethical issues…Lots of very, very good science gets published in many, many journals. There are literally thousands of journals, and journalists monitor 30, 40, if they're lucky – the ones with embargoes. The ones that are not embargoed, that also publish very, very good science, like a bunch of geophysical journals, they are largely ignored by journalists.
We can learn a lesson here, though perhaps not the lesson Kiernan wants us to learn. If we want more publicity for our best results, we should embargo those results, sending the articles and supplementary explanatory material to reporters in advance. Though this would mean not publicly announcing those results before they appear in a journal, something that might not fly very well in the computer science community.

Wednesday, October 04, 2006

Recommendation Systems

The big CS news of the week: Netflix, the online DVD subscription rental site, has announced a million-dollar prize for substantially improving their movie recommendation system. The New York Times has the story, NPR has an interview with James Bennett, Netflix vice-president of recommendation systems, and John Langford gives his take.

Recommendation systems (or collaborative filtering) try to match one person's interests to the interests of a large collection of other users. At first this seems like an easy task, just matching vectors, but the simple ideas don't work very well. The AI community has developed much more sophisticated techniques, which have been implemented at companies that take recommendation systems seriously, like Amazon and Netflix. Apparently these techniques have reached the point of diminishing returns, thus the contest.
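For the curious, here is the naive "just matching vectors" baseline in a few lines of Python: cosine similarity between users' rating vectors, with predictions as similarity-weighted averages. The names and ratings are made up; the point above is that this simple approach is exactly what doesn't work well enough in practice.

    # Naive user-based collaborative filtering (the baseline, not Netflix's).
    import math

    ratings = {
        "alice": {"Matrix": 5, "Titanic": 1, "Memento": 4},
        "bob":   {"Matrix": 4, "Titanic": 2, "Alien": 5},
        "carol": {"Titanic": 5, "Alien": 1, "Memento": 2},
    }

    def cosine(u, v):
        """Cosine similarity of two sparse rating vectors (dicts)."""
        common = set(u) & set(v)
        if not common:
            return 0.0
        dot = sum(u[m] * v[m] for m in common)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv)

    def predict(user, movie):
        # similarity-weighted average of other users' ratings for the movie
        num = den = 0.0
        for other, r in ratings.items():
            if other != user and movie in r:
                w = cosine(ratings[user], r)
                num += w * r[movie]
                den += abs(w)
        return num / den if den else None

    print(predict("alice", "Alien"))   # naive guess from similar users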

I understand that Amazon wants to sell more stuff, but why does Netflix take the problem so seriously, to the point of having a VP of recommendation systems as well as running this contest? They only recommend to already-paying subscribers; the amount of extra business they gain or keep through a strong recommendation system seems minimal.

But I shouldn't complain. Too often the public thinks of computer science as simply writing programs and making them run quickly. The Netflix contest sheds light on a different view of CS that shows the depth in a seemingly simple problem.

The million-dollar prize puts recommendation systems in the same class as the P versus NP Millennium Prize. Though if you could show P = NP by giving a quick algorithm for NP-complete problems, you could use that algorithm to develop a great recommendation system and collect a cool two mill.

Tuesday, October 03, 2006

Commitment

Quick quiz
  1. Can you submit a paper to a conference that you are not sure you can attend?
  2. You have agreed to give an invited talk at a conference. But you find yourself traveling too much. Can you cancel your talk?
  3. You have accepted a tenure-track job at a good school but then you get an offer from a more desirable place. Can you take back your acceptance at the first place?
  4. You promised to referee a paper by a specific date. But life gets busy. Can you let the deadline slide?
The correct answer to all of the above is "No!" When you submit a paper to a conference, agree to give a talk, accept a job or promise to referee, you make a commitment and you should, as a responsible member of the scientific community, honor your commitments.

Many members of our community treat such commitments quite seriously but unfortunately too many of us don't. For the latter group put yourself on the other side. Think of the editor dealing with referees who aren't refereeing and the author not getting his paper reviewed in a reasonable amount of time. Think of the department that has stopped their job search believing they have filled their opening. Think of the conference organizers having to reshuffle their program for talks not given.

Sometimes you have extenuating circumstances, like an illness, that do give you a reasonable excuse. And you could always ask; people will often modify or release you from your commitments if you make a polite request. But you must make every effort to honor your commitments. If you don't think you can then you shouldn't commit in the first place.

The ultimate commitment you make is the commitment to research. Once you start graduate school you make a promise to focus on science and your research as your main objectives. Only by keeping that commitment can you truly succeed as a scientist.

How much commitment do you need? In a ham and cheese omelet the chicken and the cow are involved but the pig is committed.

Sunday, October 01, 2006

The New CCC

Not the Conference on Computational Complexity but the Computing Community Consortium, a new organization funded by the NSF and organized by the Computing Research Association. The CCC will develop major research opportunities and "grand challenges" enlisting community involvement in creating new research visions and activities.

The original NSF solicitation focused on large-scale infrastructure but a CCC white paper, while purposely vague, seems to take a broader view.

Compelling visions take many forms. History has amply demonstrated the importance of entrepreneurial, grassroots efforts as creative engines in computing research. History has also demonstrated the value of large teams, large facilities, and large amounts of funding. Many see an increasing need for shared research facilities and teams in our field to allow us to tackle certain "grand challenge" problems. Planning for large-scale research should not, and need not, harm smaller-scale efforts or place them at a disadvantage.

The challenge for the Computing Community Consortium is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to articulate those research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with funding organizations to move the challenges and visions toward funding initiatives. The CCC will do this without harming the research environment that has created the computing world of today.

The NSF has awarded six million dollars over three years to the project, a rather large sum for an organization that won't actually do any research. Given this commitment, the CCC will play a major role in shaping CS research for several years. We should all work to help the CCC truly reach its potential of developing new funded programs across the full spectrum of computer science research.