Sunday, October 30, 2005

List Decoding Beyond Expectations

The recent FOCS conference had two best paper award winners, the Khot-Vishnoi paper I had mentioned in my post on unique games and Correcting Errors Beyond the Guruswami-Sudan Radius in Polynomial Time by Parvaresh and Vardy.

List decoding considers codes where we have too many errors in the code to uniquely decode a string. These algorithms creates a short list of all of the possible decodings. Last year I had discussed list decoding in my Favorite Theorems series and said

Guruswami and Sudan give a polynomial-time algorithm that handles what is believed to be the best possible amount of error.
Guruswami and Sudan's algorithm works for the standard Reed-Solomon codes and is likely the best possible for that code. Parvaresh and Vardy develop a new but related encoding to get an improvement on the rate of the code, the ratio of the original message length to the code length. Guruswami and Sudan show how to list decode a 1-ε fraction of errors using Reed Solomon codes with a rate of O(ε2) while Parvaresh and Vardy achieve the same errors with their code with rate O(ε/log(1/ε)).


  1. Isn't log(1/e) a negative number? It doesn't appear that e is 'epsilon', because the other function is e*e.

  2. It's \epsilon on my screen, bram. Font problem, probably (unless something really strange is happening her).

  3. Perhaps I just don't understand the terminology - what does 1-e errors mean? What is meant by 'rate'? 1-e obviously doesn't refer to an absolute number, does it meant that an e proportion of all values are altered or that a 1-e proportion of all values are altered? Is 'rate' a time-based measure or an amount of error correcting code measure?

  4. I should have said a "1-ε" fraction of errors, i.e., for a code of length m, all but εm entries can be corrupted, where entries are elements of some finite field. The rate refers to the throughput: If a code of length m encodes an input of length n (with the same alphabet) then the rate is n/m.

  5. This was asked last time, but never answered: what is the evidence that Guruswami and Sudan's algorithm is likely the best possible for the Reed-Solomon code?

  6. Wow, that is a significant result!

    Any idea what the patent status of this new algorithm will be?

    Is that rate the information-theoretic limit? (I suspect that it is, but don't know the appropriate approximations for factorials offhand.)

    Any idea if it will be possible to apply techniques which use the fact that errors tend to be next to each other in the real world to this approach?

    Does this technique require a lot of random access, or can it do a decent job of avoiding cache misses and disk seeks as well?

    It's good to see some theory making practical contributions, a little surprising that the most exciting stuff is information theoretic right now though.

  7. This was asked last time, but never answered: what is the evidence that Guruswami and Sudan's algorithm is likely the best possible for the Reed-Solomon code?

    It is not clear if there is any "compelling" evidence. Beyond the radius decoded by their algorithm, we do not know if there will always only be a polynomial number of codewords to output (which is a guarantee we need to be able to decode in polynomial time). It is possible this number shoots to super-polynomial just beyond this radius --- some works have explored this possibility, though the question remains wide open. It could also be that the choice of evaluation points used for the Reed-Solomon encoding matters, and that for a clever choice one could decode from many more errors.