Tuesday, August 31, 2004

Theory in the Coca-Cola Capital of Iowa

  

Today I gave the talk at the Atlantic Theory Seminar, not Atlantic as in the ocean but Atlantic, Iowa (population 7,257). Located halfway between the theory groups of Iowa State and Nebraska, the Cass County branch of Iowa Western Community College hosts a theory seminar every couple of months. Pretty impressive that these theory groups, faculty, postdocs and students alike, each drive two hours just to attend a seminar and talk theory with each other. Quite an enjoyable day in rural America.

Sorry about the quality of the picture above. We didn't have a digital camera so I took the picture with my cell phone.

Sunday, August 29, 2004

Computationally Simple One-Way Functions

An upcoming FOCS paper is drawing a lot of interest at TTI and Chicago, Cryptography in NC0 by Benny Applebaum, Yuval Ishai and Eyal Kushilevitz.

Don't let "NC0" scare you; it simply means that every output bit depends on only a constant number of input bits. In this paper the authors show that under reasonable assumptions one can have one-way functions and pseudo-random generators where each output bit depends on only four input bits. Quite surprising that such simple functions can be so hard to invert.
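To make the locality condition concrete, here is a minimal sketch of a function in which every output bit depends on exactly four fixed input bits. The wiring and the 4-bit predicate below are made up purely for illustration; this is not the construction from the paper, and locality by itself of course says nothing about hardness of inversion.

import random

def local_function(x, wiring):
    # Each output bit is a fixed 4-bit predicate applied to four fixed
    # positions of the input x -- the defining property of NC0.
    out = []
    for (i, j, k, l) in wiring:
        out.append((x[i] & x[j]) ^ x[k] ^ x[l])   # arbitrary illustrative predicate
    return out

random.seed(0)
n, m = 16, 24
# Fix once and for all which four input bits each output bit reads.
wiring = [tuple(random.sample(range(n), 4)) for _ in range(m)]
x = [random.randint(0, 1) for _ in range(n)]
print(local_function(x, wiring))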

Their proof uses a clever but not overly complicated reduction, based on randomizing polynomials, from a larger class (⊕L/poly) down to NC0: if one-way functions or PRGs exist in the larger class then slightly weaker versions exist in NC0. The larger class contains many functions conjectured to be one-way or pseudorandom, including those based on factoring.

I remember Mike Sipser mentioning in a complexity class I took back in the mid-80's that "We don't even know how to show there are no one-way functions in NC0." Now we know why.

Friday, August 27, 2004

What happened to the future?

Disney World has two areas originally designed to give a glimpse of "the future": Tomorrowland in the Magic Kingdom and Future World at EPCOT, which itself once stood for Experimental Prototype Community of Tomorrow. These areas now give a somewhat nostalgic view of the future as seen from my childhood rather than the future as seen from today.

Not only has the world changed but our view of the future has changed as well. We live in an era of massive changes in technology, particularly in access to communication and information. In the "Spaceship Earth" ride that shows the history and future of communications, the future shows two kids seeing and playing games with each other through video monitors, a task easily accomplished today with webcam-equipped networked PCs. But the ride missed many aspects of computers and the web. Who would have thought when I was a kid that I would one day write a post on a commuter train that would soon be read around the world?

On the other hand, we also dreamed of flying cars and space planes, but the technology of transportation has not changed significantly since I was born, and I don't expect major changes in the next forty years. So the future has come, but not quite as we expected.

Technology has made the world a more homogeneous place. When at the Norway pavilion at EPCOT, an American visitor asked a Disney worker from Norway about what the country was like to visit. "The towns are like mid-size American cities" was the reply. I guess it is a small world after all.

Wednesday, August 25, 2004

ε and the Olympics

Back from Disney World and ready to tackle the new academic year. Thanks to Adam Klivans for his guest posts. Interesting to see that COLT glorifies the open problems. In complexity we prefer to see them closed.

Varsha Dani emailed a note about an article describing an Amazon tribe with no concept of numbers. I guess they don't send anyone to the Olympics, where numbers are king.

Watching gymnastics I am always amazed at how winning seems to depend on sticking the landing. After a rather complicated and lengthy routine, why should whether your feet move at the end be the difference between Gold and Bronze? The answer lies in the ε; the best gymnasts can uniformly hit the major elements of their routines, but consistently sticking the landing is much more difficult, and that is what makes or breaks the result.

We see the ε issue in many scenarios, most notably the 2000 US presidential election where confusing ballots in one county created a huge controversy. The budget of the NEC Research Institute is minuscule compared to the revenue of NEC corporate. A couple of years ago NEC eked out a profit about the same size as the cost of the Institute, and all of a sudden the NEC Research Institute looked very expensive to NEC. In another example, consider the relatively large power a small party can have in a coalition in a parliamentary system.

The best way to avoid the ε problem is by decisive victories. Solidly beat your opponent in an election or hold enough parliamentary seats so you don't need small partners. Make enough of a profit so that a cheap research lab stays that way. One only gets the ε problem in a statistical tie, and then the small things take on great importance.

In gymnastics one would want a system where a greatly superior gymnast can score high enough that a small jump on the landing won't make a difference. But the current system doesn't allow that; with a top score of 10 the world's best gymnasts all score well over nine, making it impossible to put enough of a gap between the best and the rest.

Friday, August 20, 2004

"Conclusions and Open Problems" (by Adam Klivans)

Unless Lance has been involved in some unfortunate roller coaster accident, this is most likely my last post. It remains to be seen whether I will be allowed to keep my posting privileges here on the board (place your bets on "No"). What I've taken away from this experience, more than anything else, is that posting every day is a tremendous time sink. Lance, kudos to you for the almost daily missive.

At the end of most technical talks is the obligatory "Conclusions and Open Problems" slide, usually the least thought-out moment of the presentation, consisting of a brief summary of the talk and a list of the most obvious (difficult) open problems. For the last few years, COLT has glorified the open problems section, allocating about an hour of the program to a presentation of open problems. The open problems themselves must be submitted months beforehand and are refereed (how rigorously is anyone's guess); accepted problems appear in the proceedings. A list of this year's open problems can be found on the COLT 2004 program schedule -- the session was held on Friday evening.

Are there any other computer science conferences where open problems are refereed and given their own slot in the program? It always seems to work out well at COLT, and while I am aware of open problems being presented at rump sessions at various conferences, I don't know of other venues which require advance submission of problems.

With that, gentle reader, I return you to Lance's wise embrace. If I ever get to guest blog again, I will reveal what's hidden in Lance's "Private" directory here on fortnow.com. Some "lost" theorems apparently-- here's an aborted post entitled "Soon I will be famous: a simple proof that NP = coNP" -- and here's another unfinished note with some not so nice things to say about Algorithms. And what's this? A love letter to Jessica Simpson? Oh if only we had more time.

Thursday, August 19, 2004

Constant Depth Circuits, Fourier Transform, and Learnability (by Adam Klivans)

First, thanks to Jeff Erickson for the lone comment on my previous post. For a second I was worried that I would have to post about Quantum Learning to bait Scott Aaronson into commenting. Fortunately, we're past that. I also realize now that, being in Chicago, I'm just a two-hour drive away from you, Jeff, at UIUC. Maybe you, Ernie, and I should get some 3-D hotcakes at the local Waffle House-- on me.

Many top-notch researchers come to visit the Toyota Technological Institute here in Chicago. This fall alone Lenore Blum, Bruno Codenotti, Prahladh Harsha, and Jaikumar Radhakrishnan will be in residence along with TTI's regular faculty.

Yishay Mansour visited in August for about two weeks, and I had the chance to ask him about his work along with N. Linial and N. Nisan which pioneered the use of discrete Fourier analysis in computational learning theory. The paper Constant Depth Circuits, Fourier Transform, and Learnability gives a quasi-polynomial time learning algorithm for constant depth circuits with respect to the uniform distribution. More importantly, it pointed out a connection between the Fourier concentration of a Boolean function and its learnability.

If f is a Boolean function mapping {-1,1}^n to {-1,1}, we can write f as a multilinear polynomial in n variables of total degree at most n. Each coefficient of this polynomial measures the correlation of f with the corresponding monomial. These are the Fourier coefficients of f. Linial, Mansour, and Nisan showed that if the sum of the squares of the coefficients of monomials of degree d and larger is small, then there is an n^{O(d)} time algorithm for learning f with respect to the uniform distribution-- roughly speaking, it suffices to estimate the coefficients of only the low-degree monomials and output this polynomial.
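Here is a minimal sketch of the resulting "low-degree algorithm": estimate every Fourier coefficient of degree at most d from uniform random examples and predict with the sign of the truncated polynomial. The target function, degree cutoff and sample size below are toy choices for illustration, not parameters from the paper.

import itertools, random

def chi(S, x):
    # Parity (character) chi_S(x) = product of x_i for i in S, with x in {-1,1}^n.
    p = 1
    for i in S:
        p *= x[i]
    return p

def low_degree_learn(f, n, d, m):
    # Estimate all Fourier coefficients of degree <= d from m uniform examples.
    sample = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(m)]
    coeffs = {}
    for k in range(d + 1):
        for S in itertools.combinations(range(n), k):
            coeffs[S] = sum(f(x) * chi(S, x) for x in sample) / m
    # Hypothesis: the sign of the estimated low-degree part of the polynomial.
    def h(x):
        v = sum(c * chi(S, x) for S, c in coeffs.items())
        return 1 if v >= 0 else -1
    return h

# Toy target: majority of the first three variables (its Fourier expansion has degree at most 3).
random.seed(1)
n = 8
f = lambda x: 1 if x[0] + x[1] + x[2] > 0 else -1
h = low_degree_learn(f, n, d=3, m=2000)
test = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(500)]
print("agreement on uniform test points:", sum(h(x) == f(x) for x in test) / 500)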

What does any of this have to do with constant depth circuits? They also proved that for any polynomial-size circuit of depth d, the sum of the squares of the coefficients on monomials of degree (log n)^d and higher is tiny. Combined with the above paragraph this gives a quasi-polynomial time algorithm. For a rough intuition as to why this is true, consider one implication of Hastad's Switching Lemma, namely that parity cannot even be approximated by constant depth circuits. Thus every coefficient of a sufficiently large monomial (each monomial is a parity function) must be small.

As it turns out, according to Yishay, the work stemmed from an effort to prove circuit lower bounds, rather than a plan to develop new learning algorithms. The authors were inspired by Kahn, Kalai, and Linial's work on the influence of variables on Boolean functions and thought that discrete Fourier analysis might be the right tool for studying circuits. They happened to be correct-- in an unexpected way.

Wednesday, August 18, 2004

Banff Revisited (by Adam Klivans)

Hey, you're back! And I'm really starting to feel comfortable at this posting gig. Thanks for all of the comments I got about my previous post-- all zero of them.

A few weeks ago Lance reported on the events from a complexity workshop he was attending at the Banff International Research Station, an institute similar to the International Space Station except that it's in Canada rather than outer space.

What Lance didn't mention was that at the exact same time, some important computer science conferences were taking place just down the street at the Banff Park Lodge. I was participating in the 17th annual Conference on Learning Theory (COLT) which was co-located with the International Conference on Machine Learning (ICML) and the Conference on Uncertainty in Artificial Intelligence (UAI).

Back when COLT stood for Computational Learning Theory (circa 1988-2002), the conference was known for focusing on the computational complexity of machine learning. Nowadays the conference still operates from a theoretical perspective, but, as the Conference on Learning Theory, the program covers everything from game theory and economics to kernel methods in addition to traditional PAC style results.

A PAC-style result from COLT 2004 that may be of interest to readers of this web log is Polynomial-Time Prediction Strategy with Almost Optimal Mistake Probability by Nader Bshouty. His paper solves a problem in the online learning setting: an unknown function f is chosen from some fixed concept class, and at time t a learner is presented with an input x chosen according to some distribution D.

The goal of the learner is to run in time polynomial in log t (and all of the other relevant parameters) and predict the value of f(x) with mistake probability O(1/t) (Haussler, Littlestone, and Warmuth proved that this mistake probability is optimal). Bshouty shows that if the concept class is PAC learnable, then there exists an online learning algorithm which runs in time polynomial in log t and achieves mistake probability O(log t / t), an exponential improvement in running time over previous solutions, which ran in time polynomial in t. The algorithm works by iteratively creating a branching program based on hypotheses considered at previous time steps. A boosting-type procedure dictates which branch to take for any new input.
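To get a feel for the setting (a toy illustration only, not Bshouty's algorithm), consider learning a single threshold function on [0,1] under the uniform distribution: keep the interval of thresholds consistent with everything seen so far and predict with its midpoint. The per-step mistake probability then shrinks roughly like 1/t, so the total number of mistakes grows only logarithmically.

import random

random.seed(2)
theta = 0.37          # hidden target threshold (made up for the demo)
f = lambda x: 1 if x >= theta else 0

lo, hi = 0.0, 1.0     # interval of thresholds consistent with the data so far
mistakes, T = 0, 100000
for t in range(1, T + 1):
    x = random.random()                          # example drawn from the uniform distribution
    prediction = 1 if x >= (lo + hi) / 2 else 0  # predict with the midpoint threshold
    label = f(x)
    mistakes += (prediction != label)
    # Update the version space of consistent thresholds.
    if label == 1:
        hi = min(hi, x)
    else:
        lo = max(lo, x)

# With an O(1/t) per-step mistake probability the total number of
# mistakes should grow only logarithmically in T.
print("mistakes after", T, "rounds:", mistakes)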

In the spirit of being controversial (Lance asked me to be controversial), I could discuss the pros and cons of the change from Computational Learning Theory to Conference on Learning Theory (as a concrete example of differences in the community, about half the participants at the COLT business meeting wanted to see COLT co-located with STOC in 2006-- University of Washington folks make your move-- and the others wanted to co-locate with ICML). I'll leave that, however, for my faithful readers to debate. I will point out that it's hard to argue with the increase in attendance at COLT over the last few years.

Tuesday, August 17, 2004

Favorite Theorems: The Harmonic Sieve (by Adam Klivans)

What does Lance Fortnow do after getting a paper accepted to FOCS? He goes to Disney World for a week, of course. While Lance blasts past celestial satellites on Space Mountain, I will humbly try to replace the irreplaceable.

Considering the topic of yesterday's post, if there were a list of favorite theorems in computational learning theory, a field which makes only brief appearances here on the weblog despite its many connections to computational complexity, Jeff Jackson's algorithm for learning DNF formulas would certainly be on it.

A DNF formula is a Boolean formula written as an OR of ANDs, e.g. (x_1 AND x_2) OR (x_3 AND x_5). The size of a DNF formula is the number of terms (ANDs). The problem of PAC learning an unknown polynomial-size (in n, the number of variables) DNF formula with respect to an arbitrary distribution remains one of the most notorious open problems in the field (for background on PAC learning see this post or Kearns and Vazirani's excellent book An Introduction to Computational Learning Theory).

In fact, even if we restrict the underlying distribution on examples to be the uniform distribution, the fastest known algorithm for learning DNF formulas runs in quasi-polynomial time (a result due to K. Verbeurgt-- the main idea being that only terms of logarithmic length have a reasonable chance of being satisfied, so longer terms can be ignored).

If, however, we allow the learner to make queries to the unknown DNF formula, i.e. if the learner can choose any input x and ask for the value of the DNF formula evaluated on x, then the learner can succeed in polynomial-time.

The solution, due to Jeff Jackson in 1994, shows how to learn polynomial-size DNF formulas with respect to the uniform distribution in polynomial time (again assuming the learner has query access to the unknown DNF). His algorithm, which he called the Harmonic Sieve due to its use of Fourier analysis, builds on work of Blum, Furst, Jackson, Kearns, Mansour, and Rudich ("Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis") showing that for any DNF formula with s terms, there exists a parity function which agrees with the DNF formula on roughly a 1/2 + 1/s fraction of inputs.
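A quick brute-force check of that correlation fact, on made-up toy parameters (this is not part of Jackson's algorithm): draw a small random DNF, enumerate every parity, and verify that the best one agrees with the formula on noticeably more than half the inputs.

import itertools, random

random.seed(3)
n, s = 10, 4
# A random DNF with s terms, each an AND of 3 literals (toy parameters).
terms = []
for _ in range(s):
    vars_ = random.sample(range(n), 3)
    terms.append([(i, random.choice([0, 1])) for i in vars_])   # 1 means the literal is negated

def dnf(x):
    # Evaluate the DNF at x in {0,1}^n, returning +1/-1.
    for term in terms:
        if all(x[i] != neg for i, neg in term):
            return 1
    return -1

inputs = list(itertools.product([0, 1], repeat=n))
fvals = [dnf(x) for x in inputs]

best = 0.0
for k in range(n + 1):
    for S in itertools.combinations(range(n), k):
        # Correlation of the DNF with the parity chi_S under the uniform distribution.
        corr = sum(f * (-1) ** sum(x[i] for i in S) for f, x in zip(fvals, inputs)) / len(inputs)
        best = max(best, abs(corr))

# Best agreement is 1/2 + (best correlation)/2; the guarantee is roughly 1/2 + Omega(1/s).
print("best parity agreement:", 0.5 + best / 2)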

The next step of the algorithm involves a novel application of Boosting algorithms (see this post for more on Boosting) for combining these parity functions to obtain an accurate hypothesis. The output of the Harmonic Sieve is not a DNF formula but a threshold of parity functions.

The Harmonic Sieve is one of the rare examples in computational learning theory of a polynomial-time algorithm for an expressive concept class. It is natural to ask whether the queries are essential for the algorithm. Unfortunately it seems like the answer is yes-- we do not know how to learn decision trees or even juntas (both strictly weaker concept classes than DNF formulas) in polynomial-time with respect to the uniform distribution unless the learner has query access to the unknown function. Removing the dependence on queries would be a real breakthrough.

By the way, the interface on blogger.com is worse than I ever could have imagined. Apologies in advance for formatting errors.

Monday, August 16, 2004

Favorite Theorems: Parallel Repetition

July Edition

Consider a simple Arthur-Merlin game: Arthur probabilistically chooses a string r, sends it to Merlin, who responds with y, and then Arthur runs some algorithm A(r,y) to decide whether to accept. Merlin's goal is to achieve the highest possible acceptance probability p. Suppose we run the game twice in parallel: Arthur sends r1 and r2, Merlin sends y1 and y2, and Arthur accepts if A(r1,y1) AND A(r2,y2). The highest possible acceptance probability will be p^2.

Now consider the MIP model with two Merlins M1 and M2 who cannot communicate with each other. Arthur sends u and v to M1 and M2 respectively, who respond with y and z, and Arthur accepts based on some function A(u,v,y,z). Once again M1 and M2 try to achieve the highest possible acceptance probability p. Now we run the game twice in parallel: Arthur sends u1 and u2 to M1 and v1 and v2 to M2, receives y1 and y2 from M1 and z1 and z2 from M2, and accepts if A(u1,v1,y1,z1) AND A(u2,v2,y2,z2).

One might assume that the best the provers can achieve is p^2 (an assumption in fact made in an early paper co-authored by a certain weblog author) but in some circumstances the provers can do better; a concrete example appears in the sketch below. However, Ran Raz shows that if p is less than 1, one can still get an exponential decrease in the acceptance probability with a polynomial number of parallel repetitions in

A Parallel Repetition Theorem by Ran Raz

This paper settles one of the more perplexing aspects of multiple prover proof systems with a highly complicated proof. The result also plays a critical role in reducing the number of queries in probabilistically checkable proof systems which led to some optimal approximation bounds.
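To see why the naive p^2 intuition fails, here is a small sketch of a well-known counterexample game (often attributed to Feige; the encoding below is mine). Arthur sends independent random bits x to M1 and y to M2; each prover answers with a pair (i,b), and they win if the two answers match and b really is the bit sent to prover i. Exhaustive search over strategies shows the single game has value 1/2, yet a simple correlated strategy wins two parallel copies with probability 1/2 rather than 1/4.

from itertools import product

ANSWERS = [(i, b) for i in (1, 2) for b in (0, 1)]

def win(x, y, a1, a2):
    # One round: both provers must give the same (i, b) with b the bit sent to prover i.
    if a1 != a2:
        return False
    i, b = a1
    return b == (x if i == 1 else y)

# Value of the single game: maximize over all deterministic strategies.
best = 0.0
for s1 in product(ANSWERS, repeat=2):        # s1[x] is M1's answer on receiving bit x
    for s2 in product(ANSWERS, repeat=2):    # s2[y] is M2's answer on receiving bit y
        v = sum(win(x, y, s1[x], s2[y]) for x in (0, 1) for y in (0, 1)) / 4
        best = max(best, v)
print("single-game value:", best)            # prints 0.5

# Two parallel copies with a correlated strategy:
# M1 sees (x1, x2) and answers ((1, x1), (2, x1)); M2 sees (y1, y2) and answers ((1, y2), (2, y2)).
wins = 0
for x1, x2, y1, y2 in product((0, 1), repeat=4):
    a1 = ((1, x1), (2, x1))
    a2 = ((1, y2), (2, y2))
    ok1 = win(x1, y1, a1[0], a2[0])          # repetition 1 uses bits (x1, y1)
    ok2 = win(x2, y2, a1[1], a2[1])          # repetition 2 uses bits (x2, y2)
    wins += ok1 and ok2
print("two parallel copies:", wins / 16)     # prints 0.5, not 0.25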

As a side note I am off on vacation tomorrow and Adam Klivans will guest blog in my absence. Enjoy.

Thursday, August 12, 2004

Wisdom of Crowds

Keeping with this week's theme of prediction, I just finished reading The Wisdom of Crowds by New Yorker writer James Surowiecki. The book makes the case that large groups can make great decisions, often better than any individual in the group, if three conditions hold:
  1. diversity of the members of the group,
  2. independent opinions of the group members, and
  3. a method for aggregation of the opinions.
Surowiecki's very readable book gives many examples where group decisions do quite well (sports betting, Google's search techniques based on others' web pages, Linux) and where group decisions fail (stock market bubbles, committee meetings, strong CEOs).

Chapter 8 is devoted to science and how many widely dispersed scientists, developing and criticizing various theories, lead to explosive growth in our understanding. He also notes that this ideal world has its flaws, as unknown researchers have a harder time selling their work than more established scientists.

I don't agree with all the conclusions drawn by Surowiecki, but he does lay out what we need to do and not do to benefit from the pooled knowledge of a group. We can also draw lessons for computer science: as computation and information become more distributed, we need to aggregate them to find the best solutions we can.

Tuesday, August 10, 2004

Fun with Information Markets

Just over a year ago the Department of Defense cancelled their program on using markets to predict future world events, an overreaction that stopped funding a potentially powerful prediction tool. Robin Hanson has a comprehensive web page giving a history and plenty of links.

Information markets live on in limited academic markets like the Iowa Electronic Market and offshore sites like Tradesports. For example, the current price on Tradesports for Bush winning the election is 51.5, which translates to a 0.515 probability that Bush will win, indicating a very close contest.

For each state, Tradesports has a security on whether Bush will win that state. They also have some bundles of states. The price for Florida is 50.1, Ohio 55.4, and Bush winning both Florida and Ohio is 47.1. This gives a surprisingly tight correlation between Florida and Ohio. If you take the prices at face value as probabilities, there is a very high 0.94 probability that Bush wins Ohio given that he wins Florida, and a 0.89 probability that these two very different swing states go the same way.
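The arithmetic behind those numbers, treating prices as probabilities (a back-of-the-envelope sketch using the prices quoted above): P(Ohio | Florida) = P(both)/P(Florida), and P(same way) = P(both) + P(neither).

p_fl   = 0.501   # price of the Bush-wins-Florida contract
p_oh   = 0.554   # price of the Bush-wins-Ohio contract
p_both = 0.471   # price of the Bush-wins-both contract

p_oh_given_fl = p_both / p_fl                 # conditional probability
p_neither = 1 - p_fl - p_oh + p_both          # inclusion-exclusion
p_same_way = p_both + p_neither

print(round(p_oh_given_fl, 2), round(p_same_way, 2))   # 0.94 0.89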

Tradesports gives David Vitter a 59 percent chance of becoming a senator from Louisiana. David Vitter is the brother of CS theorist and former SIGACT chair Jeff Vitter.

Monday, August 09, 2004

Micromorts

Currently on airplanes children under two can ride free by sitting on a parent's lap. The FAA is considering whether to require such children to have their own seat, strapped into a child safety seat similar to the ones most states require for cars. Sounds reasonable? One argument against goes as follows: if we require parents to pay for a seat for a child, some of them will drive instead, greatly increasing their risk.

How can we evaluate risk? Decision scientists have developed a measure called micromorts (μmorts). A μmort is a one-in-a-million chance of death. Sounds gruesome, but by counting micromorts we can analyze the right choices to keep the most people alive.

Since 1987, a total of three lap children have died in airplane crashes where their parents survived. The average driver runs a risk of about 0.02 μmorts per mile. If the average car trip is, say, 500 miles, that translates to about 10 μmorts for each child in the car. Three deaths is therefore the equivalent of about 300,000 car trips, or roughly 15,000 per year. About 6 million children ride on laps on airplanes each year, so if more than about 0.25% of them were to ride in a car instead because of the higher prices, requiring safety seats on planes would on balance cost lives. My numbers, drawn from various internet sources, don't tell the whole story, but nevertheless we can and should do a full analysis before setting policy.
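A rough reconstruction of that arithmetic (the inputs are the back-of-the-envelope figures above, not authoritative data; the 1987-2004 window is rounded to twenty years to match the 15,000-per-year figure):

deaths_since_1987   = 3           # lap-child deaths in crashes the parents survived
years               = 20          # the 1987-2004 window, rounded
risk_per_mile       = 0.02        # micromorts per mile of driving
trip_miles          = 500         # assumed length of the substitute car trip
lap_children_per_yr = 6_000_000

micromorts_per_trip = risk_per_mile * trip_miles              # about 10 per child per trip
deaths_per_trip     = micromorts_per_trip / 1_000_000         # 1e-5 expected deaths per trip
equivalent_trips    = deaths_since_1987 / deaths_per_trip     # about 300,000 trips
trips_per_year      = equivalent_trips / years                # about 15,000
break_even_fraction = trips_per_year / lap_children_per_yr    # about 0.25%

print("micromorts per car trip:", round(micromorts_per_trip))
print("equivalent car trips:", round(equivalent_trips))
print("trips per year:", round(trips_per_year))
print("break-even switch rate:", round(break_even_fraction * 100, 2), "percent")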

It would be nice to have a list of various activities and how many μmorts they use; if you feel like parachuting, you could get an idea of how dangerous it is compared to, say, riding a bicycle. But we don't have such lists, so people have to use their own judgment and often make the wrong decisions. We can also attach a cost to a μmort: how much is it worth to save lives?

By finding statistics online you can calculate the risks of your various activities. You can afford only about 3 μmorts a day on average if you want to keep your lifetime chance of accidental death to about 10%. Spend them wisely.

Friday, August 06, 2004

When to Announce?

Suppose you have some partial results on a popular problem. At what point do you announce them? If you announce your partial results you run the risk that someone else takes your ideas and solves the full problem, and you won't get as much credit as you deserve. If you wait and try to extend the work yourself, someone else might get the same results you already have, and you'll lose out or at best have to share the credit.

If you are completely altruistic you should announce your progress, as this will advance science most quickly. But since in the end you need to worry about your own publication record, particularly as a young researcher, the answer isn't so clear. Of course it depends on many factors, including how likely you think it is that you or others could extend the work, as well as when the next conference deadline occurs.

Oddly enough, before the internet (in the eighties) such decisions were easier. You could write up a technical report to establish your result and it would be months before your work spread throughout the community, giving you plenty of time to try to extend it. The quick spread of information improves collaboration, but it also forces us to make decisions we could avoid in the past.

Wednesday, August 04, 2004

Small Circuits

Fix a constant k. In 1982 Ravi Kannan showed that some language in Σ_2^p ∩ Π_2^p must not have n^k-size (nonuniform) circuits. Here is a proof sketch: A simple counting argument shows there is a function that depends only on the first 5k log n inputs that is not equivalent to any n^k-size circuit. Just by writing out the quantifiers you can compute the lexicographically first such function in Σ_4^p. Now we have two cases:
  1. If SAT does not have polynomial-size circuits then Σ_2^p ∩ Π_2^p, which contains SAT, does not have n^k-size circuits.
  2. If SAT has polynomial-size circuits then Σ_4^p = Σ_2^p ∩ Π_2^p (Karp-Lipton) and thus Σ_2^p ∩ Π_2^p does not have n^k-size circuits.
This is a wonderful example of a non-constructive proof; exhibiting an explicit Σ_2^p ∩ Π_2^p language without quadratic-size circuits remains open. With better known collapses we can improve the result from Σ_2^p ∩ Π_2^p to S_2^p.

Vinod Variyam recently observed that the class PP, which is not known to contain S_2^p, also cannot have n^k-size circuits. Here is his proof: If PP has n^k-size circuits then PP is in P/poly, which implies that the polynomial-time hierarchy, and in particular Σ_2^p, is in MA. But MA is contained in PP, which has n^k-size circuits, contradicting Kannan's theorem.

Read Variyam's paper for details and references.

Monday, August 02, 2004

Larry Stockmeyer

We lost a great complexity theorist over the weekend. From Phokion Kolaitis:
It is with great sadness that I write to inform you that Larry Stockmeyer passed away. He died at his home as he had wished when he fell terminally ill a few weeks ago.

Larry was one of the pioneers of computational complexity who made fundamental and lasting contributions to the field. His death creates a void in our community that cannot be filled.

Indeed Stockmeyer developed many of the important early concepts in complexity, such as alternation and the polynomial-time hierarchy, concepts that laid the foundation for much of the later work in computational complexity. He had a number of great results throughout his career in complexity and in nearly all areas of theoretical computer science. Our community has lost one of its giants.

Strangers in the Same Place

Professor X and Professor Y from the same university attend the same conference. At the end of the conference, Professor X says in a surprised tone "That's the most time I have talked with Professor Y all year." He shouldn't be surprised; this is a story I've heard over and over again (and have even told myself).

A professor's life has many responsibilities. Teaching and research of course but also paper writing, grant proposals, meeting with students, and administrative tasks including seemingly endless committee meetings. When I visit another university I leave most of these responsibilities behind so I can focus on research. I also expect the people who invited me to make time in their schedules so we can work together. That way even a short visit can be quite productive.

As the length of the visit increases it becomes harder to avoid these other responsibilities and the amount of research time per day decreases. In the extreme, two people who work at the same university for years end up spending very little time talking research together.

This explains why teleconferencing will never replace traveling, no matter how technologically advanced it becomes. The social conventions of a short visit force people to make time for each other in ways a teleconference cannot. What teleconferencing will do is "allow" me to attend those endless committee meetings wherever I am.