Thursday, September 30, 2004

The Specialization of Computer Science

I heard the following from a senior economist recently.
A researcher at the beginning of his career has to please others. In order to receive a Ph.D., get a job and eventually tenure, he has to produce not only good research but research of value to others. The researcher might think that once he gets tenure he can do the research he wants, but by that time he has become so specialized it is impossible to change direction.
Looking at economics can give us a glimpse into the near future of computer science as a discipline. Though economics as a field has been around for a very long time, only since the 1950's has economics developed as a rigorous mathematical discipline. This gives economics about a 15-year head start on computer science.

When I started in grad school in the mid-80's, I could follow nearly every talk at the major theory conferences, STOC and FOCS, though not every computer science talk, something that might have been possible 10-15 years earlier. These days I can understand the importance of the main results of maybe half of the talks and have actual interest in only a small fraction of those. The growth of specialty conferences such as Complexity, SODA (algorithms), SoCG (Computational Geometry), Crypto, RANDOM/APPROX, COLT (Learning Theory), LICS (Logic in CS), QIP (Quantum) and so on has only increased this divide. We get less and less crossbreeding between these subfields of theory.

On the other hand, some researchers still can and do (with some effort) change research areas in theory and general computer science even after tenure. But in 15 years will we look like economics, where a late change in field is difficult to impossible, and soon after that like mathematics, where it often takes years of graduate school just to understand the major open questions in an area?

Tuesday, September 28, 2004

Computer Science's Newest Genius

Congratulations to Daphne Koller, this year's MacArthur Fellow in computer science. Koller uses a strong mathematical approach to decision making and learning.

Monday, September 27, 2004

Republicans and Democrats on Science Research

From a comment on my last post.
I think most computer scientists, even conservatives, vote Democrat for one reason. Democrats fund the NSF, and the NSF gives us fat paychecks.
From discussions I have had with other computer scientists, I don't find science funding a major factor in their voting decisions. On top of that, the premise doesn't hold water. I went and computed the average yearly increase in the NSF budget during the tenures of the last several presidents (a sketch of the calculation appears after the list).
  • Carter, 7.9%
  • Reagan, 11.0%
  • Bush Sr., 10.6%
  • Clinton, 7.6%
  • Bush Jr., 9.1%
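Here is a minimal sketch of one way to compute such an average yearly increase, as a compound (geometric) growth rate over a term. The budget figures below are hypothetical placeholders, not actual NSF numbers; averaging the individual year-over-year percentages instead would give nearly the same answer at these growth rates.

```python
# Minimal sketch: average yearly increase as compound annual growth.
# The figures here are hypothetical placeholders, not real NSF budgets.
def average_yearly_increase(budgets):
    """Geometric mean of year-over-year growth, as a percentage."""
    years = len(budgets) - 1
    growth = (budgets[-1] / budgets[0]) ** (1 / years)
    return (growth - 1) * 100

# Hypothetical term: a budget growing from $1.0B to $1.5B over four years.
print(f"{average_yearly_increase([1.0, 1.1, 1.25, 1.4, 1.5]):.1f}%")  # ~10.7%
```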
The Democratic and Republican platforms have similar goals for scientific research, though the Republican platform goes into more detail. From the Democratic Platform:
We will invest in the technologies of the future, from renewable energy to nanotechnology to biomedicine, and will work to make permanent the research and development tax credit. We will achieve universal access to broadband services, which could add $500 billion to our economy, generate 1.2 million jobs, and transform the way we learn and work. And we will put science ahead of ideology in research and policymaking.
The Republican Platform takes two pages to give the same ideas (except for that last sentence). Here is the section on Research and Development.
America's economy is undergoing a fundamental transition from one based primarily on manufacturing to one based on innovation, services, and ideas. Two-thirds of America's economic growth in the 1990s resulted from the introduction of new technology and 60 percent of the new jobs of the 21st century require post-secondary education, yet only one-third of America's workforce has achieved that level.

In order to maintain America's global leadership, Republicans have provided unprecedented support for federal research and development to help spur innovation. Federal R&D funding is up 44 percent from 2001 to $132 billion in 2005, which includes a 26 percent increase in support for basic research. The President has doubled the budget for the National Institutes of Health and increased the National Science Foundation budget by 30 percent. President Bush and the Republican Party also support making the R&D tax credit permanent.

The rapid pace of technological development demands that we remain on the leading edge of innovation and science. Republicans are committed to providing the investment and incentives needed to foster next generation technologies. The 21st Century Nanotechnology Research and Development Act, passed by a Republican Congress and signed by President Bush, increased funding for nanotechnology research. In addition, the President has dedicated $1.7 billion over five years to develop hydrogen fuel cells and related next-generation energy technologies. The President's support for NASA and vision for space exploration will also enhance scientific development and technological breakthroughs.

In short, the parties do not differ much on future research investment. Both platforms also push science education. The Republicans have had a better historical record on science funding, but Bush has come under fire for ignoring science in policymaking. Better not to worry about science and to use other factors in your choice of president.

Sunday, September 26, 2004

The Blue State of Science

I usually avoid politics in this weblog but I cannot totally ignore the US presidential election happening slightly more than a month from now. But nothing I say would make much difference; nearly all computer scientists, and I expect most scientists, will vote for Kerry (or would vote for Kerry if they were US citizens). It's not based on a single issue. If we had never gone to war in Iraq, the economy were booming, Osama Bin Laden were behind bars and Bush supported a woman's right to choose, you would all still vote for Kerry.

What makes scientists so liberal? Why doesn't a field like computer science draw from a wide political spectrum? Does this liberal attitude stifle real political debate in scientific departments or, even worse, discourage those with more conservative views from entering our field?

Thursday, September 23, 2004

NP-Completeness is Illuminated

Another literary reference to a hard combinatorial problem. Jonathan Safran Foer describes plans for a wedding reception in his disjointed novel Everything Is Illuminated.
The hardwood floors were covered in white canvas, and tables were set in a line stretching from the master bedroom to the kitchen, each feathered with precisely positioned name cards, whose placement had been agonized over for weeks. (Avra cannot sit next to Zosha, but should be near Yoske and Libby, but not if it means seating Libby near Anshel, or Anshel near Avra, or Avra anywhere near the centerpieces, because he's terribly allergic and will die. And by all means keep the Uprighthers and Slouchers on opposite sides of the table.)
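The passage reads like an instance of a constraint satisfaction problem. As a toy sketch of my own (reading "near" as "adjacent" and seating everyone in a single row, neither of which the novel specifies), a brute-force search makes the combinatorial flavor explicit:

```python
# Toy sketch of the wedding seating as constraint satisfaction;
# "near" is read as "adjacent" and the seats form a single row.
from itertools import permutations

guests = ["Avra", "Zosha", "Yoske", "Libby", "Anshel"]

def adjacent(seating, a, b):
    return abs(seating.index(a) - seating.index(b)) == 1

def valid(seating):
    return (not adjacent(seating, "Avra", "Zosha")        # not next to Zosha
            and adjacent(seating, "Avra", "Yoske")        # near Yoske
            and adjacent(seating, "Avra", "Libby")        # near Libby
            and not adjacent(seating, "Libby", "Anshel")  # Libby away from Anshel
            and not adjacent(seating, "Anshel", "Avra"))  # Anshel away from Avra

# Brute force over all n! seatings: fine for five guests, hopeless at scale.
print([s for s in permutations(guests) if valid(s)])
```

Five guests means only 120 seatings to check; a full guest list means exponential blowup, which is exactly what makes such problems hard.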

Tuesday, September 21, 2004

The Beauty of the Magic Number

Baseball has a large number of mathematical nuggets, but since my childhood I have always liked the simplicity of the magic number. In a division race, the magic number is the minimum combined number of wins by the first-place team and losses by the second-place team that guarantees the first-place team wins the division.

Let's do an example. As I write this the New York Yankees have 94 wins and the Boston Red Sox have 60 losses. The easiest way to compute the magic number comes from working backwards from the definition. There are 162 games in a season so the Yankees' magic number is 162+1-(94+60) = 9. Any combination of nine Yankees wins and Red Sox losses guarantees the Yankees win the American League East. The "+1" comes from the fact that in a tie the Yankees would still need to win a one-game playoff to win the division.

What can the magic number teach us about complexity? Consider the RIOT Baseball Project at Berkeley. Not satisfied with the magic number, the project computes the First Place Clinch Number, the "Number of additional games, if won, guarantees a first-place finish." To compute this number one has to look not only at the current standings but also at the schedule of remaining games between the teams.

My main issue with the clinch number relates to complexity. Not only is it more complicated to compute; updating the clinch number after a game sometimes requires recomputing it from scratch. The magic number, by contrast, has a simple update function, counting down like a rocket launch. The Yankees win and the magic number drops by one. The Red Sox lose and the magic number drops by one. If the Yankees beat the Red Sox, both events happen, so the magic number drops by two. And once the magic number hits zero you pop the champagne. That's the beauty of the magic number.
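A minimal sketch of both the formula and the countdown update, using the standings quoted above:

```python
# The magic number and its rocket-launch countdown.
GAMES_IN_SEASON = 162

def magic_number(leader_wins, second_place_losses):
    """Combined further wins/losses needed to clinch; the +1 covers
    winning a one-game playoff in case of a tie."""
    return GAMES_IN_SEASON + 1 - (leader_wins + second_place_losses)

m = magic_number(94, 60)  # 9, as computed above
m -= 1                    # Yankees win: drops by one
m -= 1                    # Red Sox lose: drops by one
m -= 2                    # Yankees beat the Red Sox: both at once, drops by two
print(m)                  # 5; pop the champagne at zero
```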

Sunday, September 19, 2004

Quantum in the North; Random in the South

A couple of interesting workshops are happening this week, both studying information: not the usual classical notion but quantum and random information.

At Banff in the Canadian Rockies we have the BIRS Workshop on Quantum Computation and Information Theory. This workshop will examine the properties of quantum information and connect that work with people working on quantum algorithms and complexity.

Meanwhile in Córdoba, Argentina, on the edge of the Sierra Chica mountains, is the Conference on Logic, Computability and Randomness 2004. Much of this conference focuses on random sets: not sets drawn from a probability distribution, but individual sets that "look" random. Rather surprisingly, whether we define random sets by passing statistical tests, foiling betting strategies or using Kolmogorov complexity, these notions often lead to equivalent definitions of randomness.
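Kolmogorov complexity itself is uncomputable, but an off-the-shelf compressor gives a crude upper bound and some feel for the idea: strings that "look" random barely compress. A toy sketch of my own:

```python
# Toy illustration: compression as a crude stand-in for Kolmogorov
# complexity. Patterned strings compress well; random-looking ones do not.
import os
import zlib

structured = b"01" * 500        # 1000 bytes with an obvious pattern
random_ish = os.urandom(1000)   # 1000 bytes, almost surely incompressible

for name, s in [("structured", structured), ("random-ish", random_ish)]:
    print(f"{name}: {len(s)} bytes -> {len(zlib.compress(s, 9))} compressed")
```

The patterned string shrinks dramatically while the random bytes stay near 1000, hinting at the incompressibility characterization of randomness.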

While circumstances keep me in Chicago, I say to the participants of both meetings: Enjoy the mountains and save some theorems for me.

Friday, September 17, 2004

NSF News

Arden Bement, who has been serving as the acting director of the National Science Foundation since Rita Colwell stepped down in February, has been nominated for the permanent director position.

The US Senate continues to work on the appropriation bills that set funding levels for the next US fiscal year, which starts October 1. The CRA has a summary of the various bills affecting research funding in computer science (see also this note from AIP). Most worrisome (as I've mentioned before) is the funding in the VA-HUD bill that covers the NSF: the House committee would cut the budget by 2%. The CRA has an Advocacy Alert; there is still time to contact your senators.

Wednesday, September 15, 2004

Email's Curse of Success

Computer scientists have communicated via email since the mid-1980's. Back then email worked quite well: You would send a message and usually get a quick response. We avoided telephone tag, those endless attempts to reach each other by phone whenever we needed to talk quickly. Research ideas spread quickly; the world became smaller. Even within a department communication became paperless, as one can get a message out to everyone far more quickly electronically.

So what went wrong? Email remains the primary means of communication among computer scientists. But send a message today and I've learned to expect, on average, a couple of days before a response, if I get one at all.

Spam is the obvious culprit. Spam clogs our inboxes and, even worse, many of us don't carefully go through our spam folders, so some legitimate mail goes unread. Spam has also made some computer scientists reluctant to share their email addresses online. But spam is not the only issue.

Email has become the communication medium of choice in the rest of the world as well. Besides messages from other scientists, I get email about my daughter's soccer team, announcements of upcoming concerts, warnings from the local police departments, a morning summary of the New York Times, financial information, utility bills and much more. All legitimate and usually useful email, but it takes longer to work through and slows down my responses to other scientists. Not to mention the many other web distractions such as news and weblogs. (So stop reading this blog and respond to my emails. You know who you are.)

I can't rely on older technologies; since computer scientists expect email, they check their phone messages and postal mail even less often. I can't rely on newer technology either; computer scientists are surprisingly slow in adopting new tools (like mail attachments) and it will be years before instant messaging becomes common in the scientific community.

Oddly enough, in our highly connected society it becomes harder and harder to get someone's attention. So what am I doing? Slowly collecting the cell phone numbers of other computer scientists. Want mine? Send me an email.

Tuesday, September 14, 2004

Is the AP Test to Blame for Shifting CS Enrollments?

It is no secret that undergraduate enrollments in computer science have been dropping over the past five years. Students follow the money and, with the perception of a weaker job market in computer-oriented careers, fewer students are willing to study computer science.

Why does computer science follow the job market so closely? We don't see such swings in physics or history, but they are common in engineering disciplines. Are undergraduates viewing computer science more as engineering than science? And why?

One theory I recently heard puts the blame on the Advanced Placement (AP) Computer Science Exam given to high school students. The reasoning goes as follows: The AP exam has a strong emphasis on the Java programming language and so high school teachers, teaching to the exam, focus most of their course on syntax and coding of Java. This gives the impression to students that computer science = programming.

I don't agree with this assessment. I looked at some sample CS AP tests. The tests, particularly the AB exam, require some knowledge of data structures and simple algorithms. Nothing deep, but enough that students should realize that computer science is more than just programming.

There was a surge of interest in computer science when I started college in the early 1980's (with the advent of personal computers), before an AP test in Computer Science even existed. Also, I've heard of declines in enrollments outside the US, where they don't use the AP tests.

But in the end we shouldn't be that worried about shifting enrollments. Advances in computer technology have helped drive computer science from a virtually non-existent discipline forty years ago to one that many universities now consider one of their most important departments. Better to have enrollments that swing up and down with the state of the computer industry than enrollments that stagnate at the low end.

Sunday, September 12, 2004

Favorite Theorems: List Decoding

August Edition

In coding theory, one typically maps a string to a codeword such that even with some small amount of error in the codeword one can still recover the original string. What if the amount of error is too much to give a unique decoding? In the 1950s Peter Elias suggested the idea of list decoding: coming up with a short list of possibilities, one of which is correct.
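To make the idea concrete, here is a toy brute-force list decoder over a tiny code of my own construction; real list-decoding algorithms, like Sudan's below, are vastly more clever than enumeration.

```python
# Toy list decoder: enumerate all codewords within a given radius.
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

# A tiny code: each 2-bit message repeated three times (minimum distance 3).
codewords = {msg * 3: msg for msg in ("00", "01", "10", "11")}

def list_decode(received, radius):
    """All messages whose codewords lie within `radius` of `received`."""
    return [m for cw, m in codewords.items() if hamming(cw, received) <= radius]

print(list_decode("010100", 1))  # ['01']        unique decoding: one error
print(list_decode("010100", 2))  # ['00', '01']  too many errors: a short list
```

With minimum distance 3, unique decoding tolerates one error; at radius two the decoder can no longer answer uniquely but still returns a short list containing the true message, which is exactly Elias's idea.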

Madhu Sudan showed that list decoding can be achieved in scenarios where one cannot do unique decoding.

Decoding of Reed-Solomon Codes Beyond the Error-Correction Bound by Madhu Sudan

In this paper Sudan gives a polynomial-time list-decoding algorithm for Reed-Solomon codes that handles errors beyond what unique decoding can handle. Later Guruswami and Sudan gave a polynomial-time algorithm that handles what is believed to be the best possible amount of error.

List-decodable codes have had applications to many areas including pseudo-random generators, extractors, hard-core predicates, probabilistically-checkable proofs and a neat result by Sivakumar on the implications of SAT being membership-comparable.

We've seen many other important papers in coding theory from computer scientists over the last decade. Besides the work on list decoding I should also mention Spielman's breakthrough result showing linear-time encodable and decodable codes, building on the initial work of using expander graphs for codes developed by Sipser and Spielman.

For much more on coding see the surveys by Sudan on List Decoding and Guruswami on Error-correcting codes and Expander Graphs.

Thursday, September 09, 2004

Which Affiliation?

Dieter van Melkebeek and I had a mild disagreement over an issue in writing a paper and he suggested I discuss it on my weblog. So here goes.

We're working on a journal version of a paper whose research we did back when we were both in New Jersey (Dieter is now on the Wisconsin faculty). Dieter listed our current institutions on the paper. I think we should list the institutional affiliations we held when we performed the research, or more precisely the places paying our salaries at the time. I don't feel that strongly about the issue so I am letting Dieter have his way, especially since he did the vast majority of the writing. I did ask that we have footnotes explaining our affiliations at the time of the research. (As a side note, footnotes should also contain other supporting information such as grants, where one was physically located when the research was done if different, and contact information, though these days one need only know how to spell my name correctly when using Google.)

So I'll open this up to my readers. Which affiliation should one use? Or are we just arguing over an issue that nobody really cares about?

Wednesday, September 08, 2004

Just Because I'm a Scientist Doesn't Mean I Know Anything

My six-year-old daughter, playing with her friend on the computer, ran into some problems. So she found me in the house and said, "Daddy, I know you are a computer scientist so you know everything about computers and we need you to fix our game."

I've learned to expect this attitude from adults. Often when I talk to a non-academic about being a computer scientist I get responses like, "Should I get Windows or a Mac?", "I was thinking of setting up a wireless network." or "What do you think of that Google IPO?". Of course this is on top of them thinking I have a cushy academic job teaching three hours a week with summers off.

For my daughter I just went ahead and fixed her problem. I know when she becomes a teenager she'll run circles around me on the computer and will say "When I was a kid I thought you knew everything about computers. What do you computer scientists do anyway?" Then we can talk.

Monday, September 06, 2004

The Electoral College

As everyone knows from the 2000 election, the United States does not use majority rule to choose the president; rather it uses a more complicated system known as the Electoral College. With some calls for the abolishment of the Electoral College, let's take a look at the College from a computer science point of view and see the rather clever device our founding fathers created.

In short, the Electoral College works as follows: 538 electors are allocated to the states, each state receiving the sum of its senators (2 per state) and representatives (proportional to population). In most states, each voter picks a single candidate and the candidate who wins the most votes receives all of the electoral votes for that state. The candidate winning a majority of the electoral votes becomes president. More details here.
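Concretely, the two-candidate mechanism fits in a few lines; the states and vote counts below are hypothetical, and ties are ignored in this toy model.

```python
# Minimal sketch: plurality within each state, weighted majority across states.
def electoral_winner(states):
    """states: list of (electoral_votes, votes_for_A, votes_for_B)."""
    tally = {"A": 0, "B": 0}
    for electors, va, vb in states:
        winner = "A" if va > vb else "B"   # plurality within the state
        tally[winner] += electors          # winner-take-all per state
    return max(tally, key=tally.get)

# Hypothetical three states: A wins the popular vote but loses the election.
states = [(20, 900, 100),   # A wins big
          (15, 450, 550),   # B wins narrowly
          (15, 480, 520)]   # B wins narrowly
print(electoral_winner(states))  # B, 30 electoral votes to 20
```

In this hypothetical, A wins the popular vote 1830 to 1170 yet loses the electoral vote, the divergence discussed below.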

In computer science terms (assuming two candidates), we have a weighted majority of majorities, or a depth-2 neural net. It has some properties that you would want:
  • Monotonicity: If a candidate wins the election and more people vote for him, he will still win.
  • Fairness: Barring a tie, if all the votes were switched the other candidate would win.
The College does lack symmetry: a permutation of the voters could lead to a different result. Only the simple majority function has symmetry, monotonicity and fairness. But symmetry is not necessary for an election scheme.

The United States is just that, a collection of fifty states, each with its own laws, cultures and economies, united under some common principles. A simple majority would have the large population centers overwhelm the rest of the country in choosing our leader. A majority of majorities would give some states far more power than their size would dictate. So a compromise was formed, a weighted majority of majorities, to give small states some but not too much influence. The fact that this process does not always agree with the majority is not a bug but a feature that preserves the balance between small and big states, rural and urban America. It also keeps balance between states of the same size: a lopsided vote in California would not overwhelm a closer vote in New York.

Some things I would change in the Electoral College: Electors should be required to vote for the candidate they represent; each state should use a ranked voting method instead of plurality takes all; and the tie-breaking rules should be changed, as they now give too much power to the small states.

The winner of the World Series in baseball is not the team that scores the most runs but the team that wins the most games, a majority of majorities, and most people feel it gives a better indication of the better team. Why shouldn't elections deserve a system at least as sophisticated?

Thursday, September 02, 2004

Is Encryption Doomed?

Considerable Slashdot discussion about a Technology Review article, Is Encryption Doomed?, subtitled "Our entire information society rests on a fragile foundation that mathematicians are racing to dismantle."

That fragile foundation is the P versus NP question. Much of current cryptography is based on the NP problem Factoring, which would have an efficient algorithm if P were equal to NP. If P = NP then no other method of public-key cryptography would be possible either. However, as I have argued before, the loss of public-key crypto would be greatly offset by the gain in efficiency in nearly every other aspect of our lives. Encryption's doom would be society's gain.
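As a toy illustration (with tiny textbook parameters, nothing like real key sizes), here is why an efficient factoring algorithm kills RSA-style encryption: factoring the public modulus immediately yields the private key.

```python
# Toy RSA with tiny textbook parameters (n = 61 * 53); real keys are
# hundreds of digits long, but the structure is the same.
n, e = 3233, 17               # public key

def factor(n):                # stand-in for an efficient P = NP algorithm
    return next((p, n // p) for p in range(2, n) if n % p == 0)

p, q = factor(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)           # private exponent recovered (Python 3.8+)
ciphertext = pow(65, e, n)    # someone encrypts the message 65
print(pow(ciphertext, d, n))  # attacker decrypts: prints 65
```

Trial division takes time exponential in the bit length of n; P = NP would put a polynomial-time factor() in its place, and with it every such key falls.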

Factoring is not believed to be NP-complete, and we could have the worst of all possible worlds with no cryptography and no new algorithms for the problems we care about, one of five possibilities explored by Russell Impagliazzo (the world he calls Pessiland).

"Racing to dismantle" is a bit of an overstatement. First of all, we have nothing to worry about. As Juris Hartmanis once said, "We all know that P ≠ NP, we just don't know how to prove it." Remember too that all mathematics is based on unprovable assumptions about the consistency of logical theories. That doesn't seem to seriously bother anyone.

The article quotes Adleman as saying "From my perspective, we are no nearer to solving the problem now than we were when bell-bottom pants were cool." Adleman is an optimist; we do not even currently have a good approach to showing P ≠ NP. We are much further away from solving the problem than ever before.