Monday, August 18, 2014

Complexity versus Algorithms: The FOCS Challenge

In recent years, I've heard complaints from my complexity colleagues that FOCS and STOC are mostly algorithms and from my algorithms buddies that STOC and FOCS are mostly complexity. What exactly counts as a complexity or an algorithms paper has become quite blurred. So let's try an experiment. Below is a poll I've created using titles from the upcoming FOCS conference. Which of these papers do you consider complexity? Does complexity in the title make them a complexity paper?

If you are interested, you can find the manuscripts for most of these papers on the FOCS accepted papers list.

[Embedded poll]

Disclaimer: This is a completely non-scientific poll solely for the interest of the readers of this blog. The results will have no effect on future conference papers.

Thursday, August 14, 2014

Favorite Theorems: Limited Independence

When can limited randomness act as well as true random bits?
Polylogarithmic independence fools AC0 circuits by Mark Braverman (JACM 2010)
To explain this result consider choosing uniformly from among the following four strings:
000 110 101 011
If we look at any two of the bit positions, say the first and third, all four possibilities 00 10 11 01 occur. The distribution is thus 2-wise independent, and we can sample it using only two random bits to choose one of the four strings. In general one can get a k-wise independent distribution on n-bit strings using O(k log n) random bits.
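
Here is a quick Python check of the example above (just an illustration of the definition): for every pair of positions, the four strings take each of the four possible values exactly once.

from itertools import combinations

strings = ["000", "110", "101", "011"]
for i, j in combinations(range(3), 2):
    # Restricting to positions i and j, all of 00, 01, 10, 11 appear once each.
    assert sorted(s[i] + s[j] for s in strings) == ["00", "01", "10", "11"]
print("2-wise independent")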

Braverman shows that any polylogarithmically independent distribution fools polynomial-size constant-depth circuits (AC0). More precisely, for a size-s, depth-d circuit C and k = (log(s/ε))^{O(d^2)}, the probability that C outputs 1 on uniformly random inputs differs by at most ε from the probability that C outputs 1 on inputs drawn from a k-wise independent distribution. Braverman's result followed after Bazzi and Razborov proved a similar result for depth-2 circuits (CNF formulas).

Another nice result along these lines: Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco Servedio and Emanuele Viola show that Bounded Independence Fools Halfspaces. A half-space is just a weighted threshold function (is the sum of w_i x_i at most some given threshold θ?). Diakonikolas et al. show that one can fool halfspaces with k = O(ε^{-2} log^2(1/ε)); in particular, for constant ε, k is a constant independent of the number of variables.

Tuesday, August 12, 2014

Subhash Khot wins Nevanlinna

At the opening ceremonies of the International Congress of Mathematicians in 2014, Subhash Khot was awarded the Rolf Nevanlinna Prize, given every four years to an under-40 researcher for outstanding contributions in Mathematical Aspects of Information Sciences. Subhash's citation reads
Subhash Khot is awarded the Nevanlinna Prize for his prescient definition of the “Unique Games” problem, and leading the effort to understand its complexity and its pivotal role in the study of efficient approximation of optimization problems; his work has led to breakthroughs in algorithmic design and approximation hardness, and to new exciting interactions between computational complexity, analysis and geometry.
Khot's work has indeed generated a large research agenda over the last decade. I highlighted his work in March's favorite theorems post.

In other big news, we have our first female Fields Medalist: Maryam Mirzakhani, for her contributions to the dynamics and geometry of Riemann surfaces and their moduli spaces. Still no female winners among the nine Nevanlinna winners. Artur Avila, Manjul Bhargava and Martin Hairer also received Fields Medals. Stanley Osher won the Gauss Prize, Phillip Griffiths the Chern Medal and Adrián Paenza the Leelavati Prize.

Pictures, press releases, and citations of all the prize winners.

Monday, August 11, 2014

Questions that arose teaching High School students crypto

I taught a 3-week, 3-hours-a-day course to High School students titled

Computer Science: A Hands Off Approach.

Given the time constraints and the fact that some students already knew (say) Java and some didn't know any programming language, this seemed like a good choice.

I decided to teach mostly pre-RSA crypto with the following theme: Alice and Bob want to pass secret messages. How do they do it? I covered shift, affine, general substitution, Vigenere, and matrix ciphers, the 1-time pad, and Diffie-Hellman (a highpoint of the course, since Alice and Bob don't have to meet in a dark alley). I also did secret sharing with polynomials, error correcting codes (elementary), Huffman codes, and some applications of mod arithmetic.
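
For readers who haven't seen it, here is a minimal Python sketch of Diffie-Hellman; the prime p (a Mersenne prime) and base g below are toy illustrative choices, not parameters anyone should deploy.

import secrets

p = 2**127 - 1        # a Mersenne prime; toy parameter, not a vetted secure group
g = 3                 # small base, also just for illustration

a = secrets.randbelow(p - 2) + 1     # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1     # Bob's secret exponent

A = pow(g, a, p)      # Alice sends A = g^a mod p over the open channel
B = pow(g, b, p)      # Bob sends B = g^b mod p over the open channel

key_alice = pow(B, a, p)      # Alice computes (g^b)^a
key_bob = pow(A, b, p)        # Bob computes (g^a)^b
assert key_alice == key_bob   # both now hold g^(ab) mod p, without ever meeting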

While teaching this course some points of interest came up. I suspect most are known, and I would appreciate polite comments telling me so.

  1. A student suggested this cipher: code a,b,c,...,z into a 100-symbol alphabet and map each letter to a set of symbols whose size matches the letter's frequency. For example, if e occurs 9% of the time then map e to 9 symbols, and use those symbols at random. This would seem to foil frequency analysis. Does it? Has it been used? What are the downsides?
  2. Many students suggested using Vigenere but, instead of having every xth letter be done by a different shift, having it be done by a different affine or general substitution. Of course this can be cracked the same way Vigenere is cracked. But it does raise an interesting point: which ciphers were used and not used can be based on when things were discovered. Martians may very well have used some kind of Vigenere where every xth letter gets a different general substitution cipher.
  3. Wikipedia and other sources say that the Vigenere cipher was unbroken for 300 years. A student pointed out that it might have been broken by someone who didn't say so. Jon Katz (crypto prof at UMCP) can't believe it wasn't broken immediately, but of course hindsight is 20-20.
  4. (I have commented on this before.) A matrix cipher with a 10x10 matrix seems uncrackable using ciphertext only. I have made this question rigorous here.
  5. I made the comment that 1-time pads are not used much (is this even true? It might depend on the definition of ``much'') because getting a perfect source of randomness is hard. During WW II they would also have been hard to use because it's hard to carry around a billion bits, but now that would not be a problem. Again--if history had been different we might use 1-time pads, or quasi-random ones, today!
  6. I told the students about arithmetic mod n. One of the students really, really didn't like that (say) mod 7 we have 1/3 = 5. He REALLY wants 1/3 to be between 0 and 1. I suspect he didn't care much for discrete logs either. This was a GOOD student- so his objection was not that it was hard.
  7. For some of the students their first exposure to matrices was matrix ciphers over mod 26. I hope they can recover from that.
  8. Most of the students knew what logs were, but weren't that comfortable with them. And here I go and teach them discrete logs! I hope they can recover from that.
  9. I showed the students that there were quadratic equations mod 12 with more than 2 roots and challenged them to see how many roots they could come up with for other mods. One of my students ran massive computer tests, found stuff, and in the end had a result that didn't need all of his computations: x^2 \equiv 0 mod n^2 has n roots. I later put x^a \equiv 0 mod n^a on a HW. I am sure none of this is new, but it was new to him when he discovered it and of course new to the class when I taught it.
  10. I taught the class the Baby-Step/Giant-Step discrete log algorithm, which has sqrt(p) preprocessing and sqrt(p) running time. It's not used because it also takes sqrt(p) space; however, it was good to show them that discrete log can be done in sqrt(p) time, much better than p time--- hence Alice and Bob need to pick their parameters larger than they may have thought when doing Diffie-Hellman. That night I easily worked out that it can be modified to do p^{2/3} preprocessing (and space) but just p^{1/3} time. The HW was p^a preprocessing, p^{1-a} time (a sketch of this tradeoff appears right after this list). One of the students asked if this algorithm has a name. I looked over the web but couldn't find it anywhere, so I told them to call it The Gasarch Variant of Baby-Step, Giant-Step. I also quickly told them NOT to be impressed--- and this helped me make a point I had made often: CS is a NEW field, so NEW that one can present new or newish results to HS students. I also made the point that I am sure this variant is KNOWN to anyone who would care, but (1) they may not care since it takes even more space if a is larger than 1/2 and more time if a is less than 1/2, and (2) not everything is on the web. That last point freaked them out more than the quadratic equation mod 12 with more than two roots.
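
Here is a Python sketch of the tradeoff in item 10. This is my reconstruction for illustration, not code from the course; the parameters (p = 101, g = 2) are toy values. The table holds about N^a entries and the search loop runs about N^{1-a} times, where N = p-1.

def discrete_log(g, h, p, a=0.5):
    """Find x with g^x = h (mod p); table size about N^a, loop length about N^(1-a)."""
    N = p - 1                        # group order (g assumed to be a generator mod p)
    m = max(1, round(N ** a))        # number of baby steps = table/space size
    table = {}
    e = 1
    for j in range(m):               # baby steps: store g^j -> j
        table[e] = j
        e = (e * g) % p
    factor = pow(g, -m, p)           # g^{-m} mod p (Python 3.8+ modular inverse)
    gamma = h % p
    for i in range(N // m + 1):      # giant steps: gamma = h * g^{-i*m}
        if gamma in table:
            return i * m + table[gamma]
        gamma = (gamma * factor) % p
    return None

# Toy example: 2 is a generator mod 101; recover the exponent 47.
print(discrete_log(2, pow(2, 47, 101), 101, a=2/3))   # prints 47
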
UPSHOT- teaching HS students CS can bring up some issues that you had not thought of. Or at least that I had not thought of.



Thursday, August 07, 2014

The n^{1/3} barrier for 2-server PIRs broken! A lesson for us all!

Recall: PIR stands for Private Information Retrieval. Here is the model: a database is an n-bit string (my wife tells me this is not true). The user wants to find the i-th bit without the database learning which bit the user wants. The user COULD just request ALL n bits. Can the user use less communication? See here for a website of many papers on PIR.
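
To make the model concrete, here is a Python sketch of the simplest information-theoretic 2-server scheme (the classic XOR trick): each server sees a uniformly random set of indices, so neither learns anything about i, yet the XOR of the two one-bit answers is exactly bit i. It uses about 2n bits of communication, so it does not beat the trivial download, but, roughly speaking, it is the idea that the n^{1/3} schemes below improve on.

import random

def make_queries(n, i):
    S1 = {j for j in range(n) if random.random() < 0.5}   # uniformly random subset
    S2 = S1 ^ {i}                                         # flip membership of i only
    return S1, S2                                         # each set alone is uniform

def server_answer(db, S):
    bit = 0
    for j in S:                                           # XOR of the requested bits
        bit ^= db[j]
    return bit

db = [random.randint(0, 1) for _ in range(32)]            # the n-bit database
i = 13
S1, S2 = make_queries(len(db), i)
assert server_answer(db, S1) ^ server_answer(db, S2) == db[i]   # user recovers bit i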

  1. If there is just one copy of the DB and there are no constraints on the computational power of the DB, then ANY protocol requires n bits of communication.
  2. If there is one copy of the DB then, with some computational constraints, you can do better. I state one result: if quadratic residuosity is hard then there is a protocol using n^epsilon bits of communication (Kushilevitz and Ostrovsky). The rest of this post is about the info-theoretic case, where the DB has no computational constraints.
  3. If there are two copies of the DB then there is a protocol that uses n^{1/3} bits. (Chor, Kushilevitz, Goldreich, Sudan)
  4. If there are three copies of the DB then there is a protocol that uses n^{1/32582658} bits. Really! (Yekhanin)
  5. If a 2-server protocol is a bilinear-group protocol (which all prior constructions were) then it must use n^{1/3} bits. (Razborov and Yekhanin)
  6. Most known constructions were put into a geometric framework (Woodruff and Yekhanin).
Recently Dvir and Gopi posted a paper here which gives a 2-server PIR protocol that uses n^P bits where P = sqrt((log log n)/log n). That breaks the barrier proven by Razborov and Yekhanin! How is that possible?

The barrier result was only for a certain type of PIR. It DID cover all the known PIR schemes at the time. But it does not cover this one.

This is how barriers SHOULD work--- they should not discourage, but they should point to difficulties so that someone can overcome them.

Also note, the new result used some of the framework of Woodruff and Yekhanin. So it is good to unify and obtain new proofs of old results.

Tuesday, August 05, 2014

How I know Moneyball is a true story. Do we clean up theorems and life too much?

A few years ago I saw the movie Moneyball about how the Oakland A's used intelligent statistics to... win?
No, but to do better-than-expected.  Even if I didn't know it was a true story I would have assumed it was because the following are very rare or odd in a fictional story:

  1. At the end the team doesn't win- it just does better than expected. In the typical sports movie the underdog pulls it all together and wins. In some the underdog loses but they are now better people or something. In an episode of Cheers where the Cheers gang were the underdogs to Gary's Tavern in a bloody mary making contest, they cheat and win. But in Moneyball, and in NO other sports (or contest) movie that I know of, does the underdog do better-than-expected in an undramatic manner. This is NOT a complaint- just a note that it's real life.
  2. In Moneyball the General Manager wants to try out mathematical methods and the Manager resists. In most movies it's the suits that are wrong and the people on the ground that are right. This is even a theme of many articles about business that I read in magazines on airplanes. So this inversion is odd- but again, you can't argue that a true story is unrealistic or undramatic.
  3. Billy Beane, the one who wants to use math techniques, thinks that what baseball scouts look for is the wrong thing. In fact, they misjudged him when he was a young player. But in what direction?--- they thought he was BETTER than he was. If this were a fictional story surely the scouts would have thought he was WORSE than he was.
I am glad that (as far as I know) neither the book nor the movie tried to make the story more satisfying or dramatic.

In academia we do clean things up for a better story line. If the true motivation for working on a problem doesn't really make sense when you see the final paper, we change our motivation. Our original proof is intuitive but ugly, so we change it to be polished but less clear about where it came from. Historians often simplify to make sense of things. I am NOT complaining- merely asking, do we do it too much?

When I was in ninth grade and was told that you could solve a quadratic equation (I rederived the quadratic formula once a month to make sure I could), a cubic, and a quartic, but not a quintic, I immediately said "I want to go to college to learn why you can't solve a quintic." That sparked my interest in math.

Is the above story true? I am sure that in ninth grade I did learn that the quintic was unsolvable, that this was of great interest to me, and that I really did rederive the quadratic formula once a month. But I really doubt the story is as clean as presented above. Even so, the story is true in spirit. However, I would not want to push the point.

How about you? Do you tell stories about yourself or about others that are just a little too polished? Not so much false, and not even to put yourself in a better light, but just a little too clean to have really happened.

Thursday, July 31, 2014

How NOT to bet

(True Story, but names are changed for no good reason.)

Alice, Bob, and Carol are betting on who will be the Republican Nominee in 2016.

  1. Alice bets on Jeb Bush. Alice's reasoning is that in the past the Republicans always end up with a moderate who they think can win. NOTE: From Alice's prediction you can tell NOTHING about her politics.
  2. Bob bets on Paul Ryan. Bob reasons that in the past the Republicans always end up with someone who they are familiar with, quite often someone who has run before. Normally this might be someone who ran in the primaries four years earlier; however, none of Herman Cain, Michele Bachmann, Rick Santorum, or Newt Gingrich seem likely. Rick Perry is possible but seems unlikely. That leaves Paul Ryan. (Curiosity- people who lose, even if it's close, do not run again. Not sure why. So it won't be Romney.) NOTE: From Bob's prediction you can tell NOTHING about his politics.
  3. Carol bets on Rand Paul. Carol's reasoning is that the American Public is tired of Big Government and is ready for the Rand Paul Revolution! NOTE: From Carol's prediction you can tell EVERYTHING about her politics.
Carol is letting her opinion drive her betting. Can someone take advantage of this? YES- if someone is betting from sentiment or emotion or conviction rather than from facts, that gives you an advantage in a bet. This does not always work, but it's a good bet. Now watch, I'll lose my bet to Carol (NOTE- I am Bob).
One full dollar!

I think I read that you are better off betting AGAINST a triple crown if a horse won the first two legs, for the same reason- there will be emotion driving the bets FOR, and that increases the payoff for those betting AGAINST. But of course nothing is guaranteed. And there are some cases where it's just ridiculous to bet against (Secretariat in 1973); however, to use my system you can't just skip one because you ``just know'' there will be a triple crown. It's a long term strategy. And one could be wrong. That's why it's called gambling!

Are there other cases where long term betting with facts and against sentiment can give you an edge?

Sunday, July 27, 2014

The Change Problem and the Gap between Recreational and Serious Mathematics

In a prior post (here) I discussed the change problem: given a dollar, how many ways can you make change using pennies, nickels, dimes, and quarters (and generalize to n cents)? Scouring the web I found programs for it, the answer (242), and answers like ``it is the coefficient of x^100 in ...''. What I didn't find is a formula for n cents. So I derived one using recurrences, wrote it up with some other things about the change problem, and posted it on arXiv. (I have updated the paper many times after comments I got. It is still where it was, but updated, here.)
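
As a sanity check on the 242, here is the standard dynamic-programming count in Python (this is just the counting recurrence, not the closed-form formula from the paper):

def ways_to_make_change(n, coins=(1, 5, 10, 25)):
    ways = [1] + [0] * n              # ways[v] = number of ways to make v cents
    for c in coins:                   # add one denomination at a time
        for v in range(c, n + 1):
            ways[v] += ways[v - c]    # use at least one more coin of value c
    return ways[n]

print(ways_to_make_change(100))       # prints 242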

I then got some very helpful emails pointing me to the vast math literature on this problem. I was not surprised there was such a literature, though I was surprised I hadn't found any of it searching the web.
The literature didn't have my formula, though it had several ways to derive it (some faster than mine).

Why isn't the formula out there? Why couldn't I find the literature?

  1. The formula falls into a space right between recreational and serious math. I use a recurrence  but not a generating function. Recreational math just wants to know the answer for 100, so a program suffices (or some clever by-hand recurrences, which are also in my paper). Serious math people are happy to show that a formula CAN be derived and to refine that in terms of how quickly the coefficients can be found.
  2. There is a gap between recreational and serious math. In particular- the serious people aren't talking to (or posting on websites to) the rec people.
  3. Different terminologies. I was looking for ``change problems'' and ``coin problems'' and things like that when I should have been looking for ``Sylvester Denumerant'' or ``Frobenius problem''.
For some areas Google (and other tools) is still not as good as finding someone who knows about your area. My posting on the blog got the attention of some people who know stuff, and my posting on arXiv got one more person (who knew A LOT!). I'm glad they emailed me comments and references so I could improve the paper and cite the literature properly. But this seems so haphazard! Google didn't work, math stackexchange and other similar sites didn't work (that is where I saw the rec people post but not get a formula). Is it the case that no matter how good Google gets we'll still need to find people who know stuff? That's fine if you can, but sometimes you can't.