Wednesday, May 31, 2006

Dispersing Ramsey Graphs

Perhaps the very first example one sees of the probabilistic method: Show there exists a graph on n vertices with no clique or independent set of size k = 2 log n. Simply pick the graph at random. For any set S of k vertices, the probability that the graph restricted to S is a clique or an independent set is at most p = 2·2^(-(k choose 2)). The probability that some k-vertex subset S is a clique or independent set is at most p times (n choose k), which is less than one for k = 2 log n. So there must be some graph with no clique or independent set of size k.
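
To see the union bound in action, here is a quick numerical sanity check in Python (the choices of n and k below are just illustrative):

    from math import comb, log2

    def ramsey_union_bound(n, k):
        # Expected number of k-cliques plus k-independent sets in a random
        # graph on n vertices: (n choose k) * 2 * 2^(-(k choose 2)).
        p = 2 * 2.0 ** (-comb(k, 2))
        return comb(n, k) * p

    n = 1024
    k = 2 * int(log2(n))              # k = 2 log n = 20
    print(ramsey_union_bound(n, k))   # about 7e-16, comfortably below 1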

Actually constructing such a Ramsey graph is another story. You can create the graph in quasipolynomial (2^(poly log n)) time using standard derandomization techniques. In 1981, Frankl and Wilson gave a polynomial-time construction of a graph with no clique or independent set of size 2^((log n log log n)^(1/2)). That bound stood until the recent STOC paper by Barak, Rao, Shaltiel and Wigderson creating a graph with no clique or independent set of size 2^((log n)^(o(1))).

Barak et al. were not trying to create Ramsey graphs but rather randomness dispersers for two independent weakly random sources. As a bonus they improved the Frankl-Wilson bound, giving yet another example where proving one kind of theorem in complexity yields an exciting result in an apparently unrelated area.

Tuesday, May 30, 2006

The South Has Risen

Who would have thought the next theoretical computer science powerhouse would come from Atlanta? Georgia Tech this year has hired Santosh Vempala and Adam and Yael Tauman Kalai into an already amazing theory group. In the past few years Georgia Tech has gone from a good theory program to one of the largest and strongest in the country. How did they accomplish that feat? Seeing potential where others didn't, solving two-body problems both inside and outside the department and, most importantly, having the resources to go after opportunities when they occur. Alas, two of their recent hires came at the expense of Chicago/TTI, but hey, all's fair in love, war and recruiting.

It's not like other departments have stopped hiring in theory. In the past two hiring years alone, several schools have hired junior theorists, including Carnegie Mellon, Cornell, Michigan, MIT Math, Penn State, Rochester, Stanford, Washington and Wisconsin. Many universities realize that in order to have a strong CS department one needs a strong theory group, and in order to improve or even maintain strength in theory one needs strong young talent.

Thursday, May 25, 2006

Talking to Publishers

A few publishers showed their wares at STOC. This is a good opportunity to talk to them about book ideas or about publishers' policies and how they operate. You can also get some good books at a discounted price; I picked up J. Michael Steele's The Cauchy-Schwarz Master Class from Cambridge University Press based on a recommendation.

I had a lengthy conversation with Sweitze Roffel, who took over Chris Leonard's position as publishing editor of the Elsevier theory journals. Sweitze is quite aware of the negative perception of Elsevier in the theory community and wants to talk to the community about their concerns. So talk to him at conferences or send him email and tell him your concerns about Elsevier's policies. A couple of nuggets:

  • Sweitze will move the offices for his journals from Amsterdam to New York to emphasize the global nature of Elsevier and be closer to editors.
  • Elsevier has an arrangement with Microsoft's Academic Search but negotiations with Google Scholar are going slowly because of Google's "secretive" policies.
  • Elsevier plans to give contributors (editors, referees, authors) access to their Scopus system. Elsevier also has their own free academic search site Scirus.
  • The free access experiment of Information and Computation continues.
  • Elsevier is exploring starting their own version of Springer's Lecture Notes in Computer Science series and also a Review journal.
I also talked to Lauren Cowles, an editor for Cambridge University Press, who wrote a good comment on a publisher's view of putting books on the web. She pointed out that Victor Shoup's A Computational Introduction to Number Theory and Algebra is available for purchase and also freely available online.

Wednesday, May 24, 2006

Flying American

A commenter asked about the Fly America Act, which requires, with few exceptions, that when traveling abroad on US government money, such as an NSF grant, one must use a US-based carrier. This silly protectionist law just supports American airlines with taxpayer dollars and in the end wastes both money and time. Not only should I be able to fly a foreign carrier to other countries, I should even be able to fly a foreign carrier between US cities.

As a practical matter the Act is more of a nuisance than a serious problem. US carriers have an extensive collection of foreign routes, you can fly most foreign flights via a US-airline codeshare, and in a jam one could launder some grant money to a non-grant account and then use non-grant money to fly the foreign carrier. Not that I ever have, ever would or ever condone taking the last action.

Tuesday, May 23, 2006

Thoughts from STOC

I don't go to many talks at STOC; I prefer hanging out in the hallways and talking with other attendees. But the best talk I've seen so far was the first: Atri Rudra gave an easy-to-follow overview of his paper Explicit Capacity-Achieving List-Decodable Codes with Guruswami. Both student paper award winners also gave nice overviews of their technical results. Much better than trying to wow us with complicated formulas.

What were the folks in the hallways talking about? Worry about funding in the short term but cautious optimism a few years down the line. Some optimism on employment; many good places hired in theory this year and most students found postdoc, faculty or industrial positions. I didn't see many students scrambling for jobs. Last time STOC was in Seattle (1989) I was one of those scramblers.

Prabhakar Raghavan, a theorist who now heads Yahoo! Research, gave the first invited talk about some mathematical questions related to Yahoo. Prabhakar spent a considerable part of his talk on sponsored search, the bidding and ranking mechanisms for those who pay to be listed to the right of the search results. Yahoo currently uses a variant of second-price auctions that is not truth-telling and has some other flaws but is simple enough for people to understand how much they will pay. Google, on the other hand, uses a more complicated scheme with nicer properties, but most if not all of its users don't really understand the bidding mechanism.
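
For readers who haven't seen these auctions, here is a minimal sketch of a rank-by-bid generalized second-price rule; the bidders and numbers are made up, and this is not a description of either company's actual mechanism:

    def generalized_second_price(bids, slots):
        # Rank bidders by bid; each winner pays the bid of the bidder ranked
        # just below (a bare-bones rule ignoring click-through rates, reserve
        # prices and everything else real systems use).
        ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
        winners = []
        for i in range(min(slots, len(ranked))):
            bidder, _ = ranked[i]
            price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
            winners.append((bidder, price))
        return winners

    # hypothetical bids in dollars per click
    print(generalized_second_price({"alice": 3.0, "bob": 2.0, "carol": 1.0}, slots=2))
    # [('alice', 2.0), ('bob', 1.0)]

Even this simple rule is not truthful: a bidder can sometimes lower her bid, drop a slot and pay much less.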

Russell Impagliazzo gave the other invited talk on pseudorandomness, his title slide containing a joke only my generation would get.

Let's secretly replace Al's coffee cup of random bits with pseudorandom bits and see if he notices.
Russell's take-home message: Randomness does not help in algorithms but we can't prove it doesn't help until the circuit complexity people (like Russell) get on the ball and prove some good lower bounds.

On a personal note, today I have lived as long as my father. Puts a real perspective on life.

Monday, May 22, 2006

STOC Business Meeting

Howdy from Seattle. Here are some highlights from the STOC business meeting.

78 papers accepted out of a record 288 submissions.

There were 275 registrants including 109 students. Rooms at the conference hotel ran out slightly before the deadline and the organizers scrambled to find rooms at a nearby hotel. Be sure to reserve your hotel room early in the future.

As of January 2, Bill Steiger has become the new program director for theoretical computer science at the NSF. He expects to fund 15-20+ awards from an estimated 100 theoretical computer science submissions to the Theoretical Foundations solicitation due this Thursday.

Richard Karp gave a long talk about the activities of the SIGACT Funding Committee. More on this in a future post.

Tom Leighton won the SIGACT Distinguished Service Award. The best student paper award was split between Anup Rao for Extractors for a Constant Number of Polynomial Min-Entropy Independent Sources and Jakob Nordström for Narrow Proofs May Be Spacious: Separating Space and Width in Resolution. The best paper award went to Irit Dinur for The PCP Theorem by Gap Amplification.

FOCS 2006 will be held October 22-24 in Berkeley. STOC 2007 will be June 11-13 in San Diego as part of FCRC. STOC 2008 will be May 18-20 in Victoria, British Columbia.

Friday, May 19, 2006

Favorite Theorems: Primality Algorithms

April Edition

For many years the standard example of a probabilistic algorithm checked whether a number was prime.

Riemann's Hypothesis and Tests for Primality, Gary Miller, STOC 1975, JCSS 1976.

A Fast Monte-Carlo Test for Primality, Robert Solovay and Volker Strassen, SICOMP 1977.

Probabilistic Algorithm for Testing Primality, Michael Rabin, Journal of Number Theory, 1980.

Let n be an odd number with n ≠ r^q for r, q > 1. Fermat's little theorem shows that for any a, 1 < a < n, if a^n ≠ a (mod n) then n is composite. But for a special set of composites called the Carmichael numbers, this test fails for all a. Miller adds the condition that a^((n-1)/2^k) - 1 and n are relatively prime for all 2^k dividing n-1, and shows that, assuming the Extended Riemann Hypothesis, for any composite n there is an a = O(log^2 n) witnessing that n is composite. This gives a polynomial-time (in the number of bits to express n) algorithm for primality assuming ERH. Rabin showed that one could instead choose random a's, giving a probabilistic algorithm with no assumption. The resulting test is now called the Miller-Rabin primality test. Solovay and Strassen gave a different test based on the Jacobi symbol.
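
Here is a short Python sketch of the randomized Miller-Rabin test described above (an illustration only, not optimized; the deterministic ERH-based version would instead try every base a up to O(log^2 n)):

    import random

    def miller_rabin(n, rounds=40):
        # Returns False only if n is certainly composite; returns True if n
        # passes every round, so a composite slips through with probability
        # at most 4^(-rounds).
        if n < 2:
            return False
        if n in (2, 3):
            return True
        if n % 2 == 0:
            return False
        # write n - 1 = d * 2^s with d odd
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)      # random base, as in Rabin's version
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False                    # a is a witness that n is composite
        return True

    print(miller_rabin(2**89 - 1))   # True: a Mersenne prime
    print(miller_rabin(561))         # False (almost surely): 561 = 3*11*17, a Carmichael number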

Of course now we know that primality has a polynomial-time algorithm with no unproven assumptions. So why did I choose these obsolete algorithms for favorite theorems? They are not so obsolete: they remain considerably more efficient than Agrawal-Kayal-Saxena. But more importantly they served as the motivation for the study of probabilistic computation, much like Shor's quantum factoring algorithm motivates quantum computing today. Without studying probabilistic computation we would have had no modern cryptography, no derandomization results, no Toda's theorem, no interactive proofs and no probabilistically checkable proofs with their hardness-of-approximation consequences.

Wednesday, May 17, 2006

Taulbee Survey

Via the CRA Bulletin, the 2004-2005 Taulbee Survey has been posted. It's a wealth of statistics about computer science in the US and Canada. The charts fall into several major categories.
  1. Ph.D. Production and Employment. The most CS Ph.D.s ever (1,189) were produced last year. Nearly all of them found jobs.
  2. Bachelor and Master's Students. Enrollment continues to decline.
  3. Faculty Demographics. Faculty sizes continue to grow, though slowly, and are getting slightly more diverse.
  4. Research and Graduate Student Support. A drop in research expenditures likely due to fewer grants.
  5. Faculty Salaries. Truly useful when negotiating your pay. An interesting inversion: the schools ranked 13-24 in CS pay higher salaries than the schools ranked 1-12. And just who is being paid $400K?
The end of the report sums up the statistics nicely.
As predicted last year, our field is producing Ph.D.s at a record rate, and the short-term forecast is for continued record production. While there is no evidence in our employment statistics that the increased production is resulting in an inability of Ph.D. graduates to find work, an increasing fraction of new Ph.D.s appear to be taking positions outside of North America. In the wake of accelerating globalization of the marketplace, this is not surprising.

Three consecutive years of decreasing numbers of new Ph.D. students, and a sharply reduced pipeline at the Bachelor's level, will make it difficult to sustain this production rate in the longer term. Moreover, it is not yet clear when the decline in our undergraduate program enrollments will end. The double-digit percent decrease in bachelor's production observed this year is likely to continue for the next several years. Coupled with the declining representation of women in our undergraduate programs, our ability to produce a workforce that is sufficiently educated technically to meet the needs of the job market in computing is being severely challenged. The declining enrollments at the Bachelor's level also will increasingly challenge the ability of CS/CE departments to grow their faculty as they desire.

Tuesday, May 16, 2006

Graph Theory and the NSA

How could I not blog about Jonathan Farley's op-ed piece The NSA's Math Problem? Farley looks at the NSA's use of phone data and argues that using graph theory to analyze the calls won't help find terrorists. But Farley doesn't do a particularly good job making his case.
First, the "central player" the person with the most spokes might not be as important as the hub metaphor suggests. For example, Jafar Adibi, an information scientist at the University of Southern California, analyzed e-mail traffic among Enron employees before the company collapsed. He found that if you naively analyzed the resulting graph, you could conclude that one of the "central" players was Ken Lay's…secretary.
Somehow if we find the "secretary" of a central US terrorist, that should be considered a major success. But more importantly, if we know just a little about some terrorist activities, the graph can give us a great advantage in finding important sites. Google's PageRank, which primarily uses graph theory, ranks search results very successfully, and there is no reason similar ideas won't work on phone data as well.
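
As a concrete illustration of the kind of link analysis I have in mind, here is a tiny PageRank sketch on a made-up call graph (the names, edges and parameters are purely hypothetical, and the NSA's actual methods are of course unknown):

    def pagerank(graph, damping=0.85, iters=100):
        # Minimal power-iteration PageRank on a directed graph given as
        # {node: [nodes it calls]}.
        nodes = list(graph)
        rank = {v: 1.0 / len(nodes) for v in nodes}
        for _ in range(iters):
            new = {v: (1.0 - damping) / len(nodes) for v in nodes}
            for v, out in graph.items():
                if not out:                              # dangling node: spread evenly
                    for u in nodes:
                        new[u] += damping * rank[v] / len(nodes)
                else:
                    for u in out:
                        new[u] += damping * rank[v] / len(out)
            rank = new
        return rank

    # hypothetical phone-call graph: an edge x -> y means x called y
    calls = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["c"], "e": ["c"]}
    print(sorted(pagerank(calls).items(), key=lambda kv: -kv[1]))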

By no means should we accuse someone of being a terrorist solely because of their calling pattern. Will all terrorists be found from phone data alone? Of course not. But graph information can tell us where to look and help narrow the search.

The main objection to the NSA's work is not that the phone data has little value but that it is too valuable. You can use the data to find out much more about Americans than who is a terrorist. We lose our freedom against government intrusion when the NSA has this data and that's what we need to argue against, not some fake argument that the NSA's algorithms won't work.

Steven Levy has a more reasonable discussion in Newsweek.

Monday, May 15, 2006

Computer Reliability

Suppose we could create a system where all automobile traffic in the US would be controlled by a central computer system. Traffic would flow more smoothly and fuel consumption could be better controlled, but with a catch: failures in the system could cause 10,000 deaths per year.

Keep in mind that we now have over 38,000 automobile fatalities per year. Even if we leave out alcohol-related deaths, that number only drops to about 24,000. Still, the public would find 10,000 deaths unacceptable and we would junk a system that would actually save lives as well as time and fuel.

I find that when we frame the debate on the unreliability of computers, we usually measure it against perfection rather than against the status quo. Whenever I read the Inside Risks column at the end of each CACM, I feel they fail to point out how small the risks are compared to the advantages of computing, and instead point out how bad the risks are compared to unachievable perfection.

Consider electronic voting. I noticed that Diebold, the company in the middle of the electronic voting controversy, also makes the ATMs I use to withdraw money from my bank. ATMs are not foolproof; thieves have managed to fake ATM cards and discover passcodes to steal money from these machines. But banks know that the labor cost savings they get from ATMs greatly outweigh the losses.

Will electronic voting ever completely prevent any kind of fraud? Of course not. But will it beat out the systems we currently have in place? That's not that high a bar to pass.

I can vote proxies on my stocks and mutual funds over the Internet. There are some real money issues involved in the proxies. If Internet voting works well enough when serious money is involved, why can't we use it for general elections as well?

Saturday, May 13, 2006

FCRC

This Monday May 15th is the early registration deadline for this year's Conference on Computational Complexity in Prague. The early registration date for the Electronic Commerce (EC) conference is Tuesday the 16th.

Next year both Complexity and EC will be part of the Federated Computing Research Conference (FCRC) in San Diego, along with a plethora of other conferences including STOC, Computational Learning Theory (COLT), and Parallelism in Algorithms and Architectures (SPAA). June 13th is the day of death: STOC (2 tracks), EC, Complexity and COLT all have sessions that day. A theorist could find himself or herself wanting to see five talks all given at the same time.

Thursday, May 11, 2006

Game Science

Ehud Kalai proposes renaming the Game Theory Society to the Game Science Society and welcomes your comments. The goal of changing the name of the society is to change the name of the field to better describe what the field does and broaden its image.
An important example where the expanded name may get additional support is within a university. For example, it is hard to imagine the creation of a department devoted to the study of a theory. On the other hand a department devoted to study a science seems more plausible. To put things in perspective think of an analogy within another young field. Devoting major resources to a subject called "computing theory" is less likely than devoting major resources to a subject called "computer science."
I've mentioned changing the name of Game Theory before, but then again I never felt Computer Science was a great name. Following Kalai's reasoning, what if we renamed "Theory of Computing" to "The Science of Computing"? Would that make our field sound more noble and generate more funding?

Wednesday, May 10, 2006

The Importance of Natural Proofs

Razborov and Rudich's paper Natural Proofs gives a combinatorial framework for proofs that show circuits in a given class cannot achieve a given computational task. The paper explains the difficulty of extending these proofs to show, for example, that NP does not have polynomial-size circuits (and thus P≠NP). But do natural proofs really present a barrier to proving important circuit lower bounds?

One approach to showing NP does not have polynomial-size circuits: find some property C of functions such that SAT is in C. We then show, using some sort of inductive argument, that no function computable by polynomial-size circuits can have property C. This would imply that SAT cannot have polynomial-size circuits.

Briefly, a natural proof is such a proof in which C has two properties:

  1. Largeness: C contains many functions.
  2. Constructivity: One can efficiently verify that a function f is in C.
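
For concreteness, here is one standard way to quantify the two conditions (a paraphrase; the paper allows more general parameters), writing C_n for the property restricted to functions on n input bits:

    % Largeness: C_n contains a non-negligible fraction of all n-bit functions
    \text{Largeness:}\quad \Pr_{f:\{0,1\}^n \to \{0,1\}}\,[\, f \in C_n \,] \;\ge\; 2^{-O(n)}
    % Constructivity: membership is decidable in time polynomial in the
    % 2^n-bit truth table of f
    \text{Constructivity:}\quad f \in C_n \ \text{ is decidable in time } 2^{O(n)} \text{ given the truth table of } f
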
Razborov and Rudich show that such a proof against polynomial-size circuits would break pseudorandom generators and in particular imply that the discrete logarithm is not hard. So under reasonable hardness assumptions, natural proofs cannot be used to prove lower bounds against polynomial-size circuits. See the paper for more details.

Sounds bad for proofs against circuits. But let's consider the two properties. The authors give a good argument for why the largeness condition should hold; however:

We do not have any similar formal evidence for constructivity, but from experience it is plausible to say that we do not yet understand the mathematics of Cn outside exponential time (as a function of n) well enough to use them effectively in a combinatorial style proof. We make this point in Section 3, where we argue that all known lower bound proofs against nonmonotone circuits are natural by our definition.
Indeed they do show all known proofs are natural, but in some cases they go through considerable effort to "naturalize" these proofs (as opposed to relativization, where the fact that a theorem relativizes follows immediately from the proof).

Consider what I call quasinatural proofs, where we only require the largeness condition. One might say that if the discrete logarithm is hard then a quasinatural proof must prove the nonconstructivity of C. But really you get a conditional: if there are quasinatural proofs against polynomial-size circuits then

If C is constructive then the discrete logarithm is easy
which is just a "pigs can fly" theorem that we see often in complexity.

Avi Wigderson points out that unconditionally you cannot have a natural proof showing that the discrete logarithm problem is hard. If we unravel this statement then we get that giving a quasinatural proof showing discrete logarithm is hard would require proving that discrete logarithm is hard, hardly a surprise.

I don't have an example of a quasinatural proof not known to be natural as we have very limited techniques for proving circuit lower bounds. Natural proofs do give us some insight into what kind of proof techniques we need for stronger lower bounds, but they do not, in and of themselves, present a major barrier to finding such proofs.

Tuesday, May 09, 2006

Forward-Looking Links

A computer science paper goes through many phases: manuscript, technical report, conference submission and proceedings version, journal submission and published version. As a paper goes through these stages it usually improves, adding more background and intuition, better and more detailed proofs and so on. If someone wants to read your paper, you'd like them to look at the latest version. How do you make sure they even know about the latest version?

You can't go into everyone's paper proceedings and add a yellow sticky to your paper saying to check out the new and improved journal version. But in this electronic age we can, in principle, add these notes.

First of all, keep the papers on your webpage up to date. Many people just go to an author's page to download a paper and often find some ancient version.

But after that then what? ECCC allows one to submit a revised version or add a comment which could point to a revised version. arXiv allows one to add journal information to an existing paper. Both of these require actions by authors that rarely happen. The digital libraries of proceedings publishers ACM and IEEE-CS don't have any mechanism to add pointers to later papers.

The field should have some standard mechanism for updating pointers to future papers. Until then we have to rely on the readers to find the latest papers on their own and perhaps hope that paper search tools like Citeseer, Google Scholar and Microsoft Academic Search will point to the latest and greatest version of a paper.

Sunday, May 07, 2006

Rejection and Rejecting

Luca and Oded talk about rejection. Here are some thoughts.

Rejection hurts. Academics thrive on earning the respect of their peers and it's tough to think that someone doesn't want you. So go ahead and be depressed for a day or two and then move on.

Stanford is my ultimate rejecter, having turned me down for undergrad, grad, junior and senior faculty positions without the least bit of interest. But I got my revenge: I once got a parking ticket at Stanford and never paid it. Ha!

A few people have complained to me about how rejection letters are written. A rejection letter contains exactly one bit of useful information. The rest is irrelevant and you should not let it get to you.

Suppose Alice sends email to her friend Bob asking if Bob's department would be interested in her. In academics, Bob usually won't give his real thoughts ("We are looking for strong candidates and you are not one of them"), instead he'll find some property P such that Alice has P but they won't hire in P, for example "Unfortunately we are not looking for any cryptographers this year." A couple of warnings for Bob:

  1. Be sure P is not illegal, i.e., based on religion, race, gender, etc. Even if you don't discriminate, saying that you do is not a smart thing.
  2. Bob's department might end up hiring a cryptographer. Then Alice will realize that Bob didn't turn her down because she was a cryptographer; rather, he just didn't want Alice.

Thursday, May 04, 2006

The Cost of Big Science

A panel at the National Academy of Sciences suggests that the US should support Fermilab's bid to land the International Linear Collider.

International Linear Collider. Ballpark cost: $4 billion to $10 billion (via Chicago Sun-Times).

As a scientist I should support such a project, especially one located in the suburbs of Chicago. But at what cost? For perspective, next year's proposed budget for the entire National Science Foundation is just over $6 billion.

The ILC reminds me of the Superconducting Super Collider, a project that spent $2 billion digging a hole in Texas before being killed by Congress in 1993 once the projected costs topped $12 billion.

Putting that many dollars into a single basket will take away the incentive to increase basic research funding in other scientific endeavors. The main argument for the ILC at Fermilab is not that the research won't get done otherwise, just that it won't get done in the US. So let the Europeans or the Japanese have the flashy expensive collider and let the US do what it does best: basic research advancing science across a wide range of disciplines.

Wednesday, May 03, 2006

Four Coloring the United States

For a talk I wanted to show a map of the United States using four colors with the usual constraint that every pair of states that share a common border have different colors. The four-color theorem says that such a coloring must exist.
So I tried Googling to find such a picture of a four-colored United States but I couldn't find one. NASA states it as a challenge but doesn't give the solution. Most maps I found, like this one, use five or more colors, probably because they use a simple greedy algorithm.

Four coloring the US is not difficult. I found a Map Maker utility that lets you color the states any way you want. Here is my four coloring.

Four-Colored United States
My independent sets:
  1. AK, AL, AR, CT, DE, HI, IL, ME, MI, MN, MT, NE, NM, NV, SC, VA, WA
  2. AZ, DC, FL, KS, KY, MS, NC, ND, OR, PA, RI, TX, VT, WI, WY
  3. CA, CO, GA, ID, IN, LA, MA, MO, NJ, SD, WV
  4. IA, MD, NH, NY, OH, OK, TN, UT
The United States cannot be three-colored: just consider Nevada and its neighbors. When writing this post I Googled "four-color theorem" and the first link was this page, which features a four-colored United States. Well, at least I got a weblog post out of all this work.
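
Here is a quick brute-force check of the Nevada argument above (a Python sketch looking only at Nevada and its five neighbors): the neighbors form an odd cycle around Nevada, so three colors cannot suffice.

    from itertools import product

    # Nevada and its five neighbors form a "wheel": the neighbors make a cycle
    # around NV, so three colors are not enough.
    states = ["NV", "CA", "OR", "ID", "UT", "AZ"]
    borders = [("NV", "CA"), ("NV", "OR"), ("NV", "ID"), ("NV", "UT"), ("NV", "AZ"),
               ("CA", "OR"), ("OR", "ID"), ("ID", "UT"), ("UT", "AZ"), ("AZ", "CA")]

    def colorable(num_colors):
        # Try every assignment of num_colors colors to the six states.
        for assignment in product(range(num_colors), repeat=len(states)):
            color = dict(zip(states, assignment))
            if all(color[a] != color[b] for a, b in borders):
                return True
        return False

    print(colorable(3))   # False: Nevada's wheel forces a fourth color
    print(colorable(4))   # True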

Tuesday, May 02, 2006

New Priorities for Computability Theory

Bob Soare, who wrote one of the great textbooks on recursion theory and then almost single-handedly changed the name of the field to computability theory, teaches an intense two-quarter class on the topic every other year in Chicago. To my surprise, just now, halfway through the second quarter (15 weeks into computability theory), he is only now proving the solution of "Post's Problem", the existence of incomplete degrees, which excited Gödel in his letter.

When I sat in on Soare's class in the early 90's, by this point he had covered much more complicated finite injury arguments and was starting the infinite injury constructions like the Sacks Density Theorem: given r.e. sets A and B such that A is reducible to B and B is not reducible to A, there is an r.e. set C that lies strictly in between.

I asked Soare about this last week. He isn't dumbing down his class; rather he's acknowledging a change in direction in the field, more along the lines of looking at the complexity of reals (infinite sequences of bits), often by examining those defined by infinite branches of computable trees.

For example, many computability theorists today are studying notions of "random reals", infinite sequences that share some properties of randomly chosen numbers. They have shown neat connections to Kolmogorov complexity and to Chaitin's Ω. For any reasonable ordering, Chaitin's Ω is a computably enumerable random real, and Kučera and Slaman show the surprising result that the converse is true as well.

One used to measure a recursion theorist by the difficulty of their constructions; now we see more of a focus on the beauty of the theorems and their proofs.

Soare is working on a new version of his textbook that will differ in a couple of ways. He is changing terminology (recursive and r.e. become computable and c.e.), but more importantly he changes the focus to more strongly develop the theory that drives computability today. A good lesson: fields change over time and it is better to acknowledge and embrace those changes than to fight them.