## Wednesday, May 20, 2009

### Is FKS and other Data Structure Algorithms actually being used?

MEM PROBLEM: Let n,U be such that n < < U. We want to do the following: For all sets A &sube {1,...,U} such that |A|=n we want to store A in s(n,U) cells and be able to tell MEMBERSHIP in q(n,U) probes where a probe is just LOOKING at the contents of a cell. A cell may have O(log U) bits in it.

Fredman, Komlos, and Szemeredi showed (in this paper) that MEM PROBLEM can be done with s=O(n) and q=O(1).
1. They actually showed something stronger, that you can get s=n+o(n) and q=O(1). But this does not concern us.
2. The result with s=O(n) and q=O(1) is a simple algorithm with reasonable constants.
3. Part of the algorithm involves showing that a randomly picked hash function works with nonzero probability; however, with a slight change in some parameters you can get that most hash functions work (of the type they use) work, so this is not an obstacle to actually using their algorithm.
4. SO, here is my question/answer/new question:
1. Is anyone actually using the algorithm?
2. I would suspect NO because it does not allow for INSERT and DELETE. However, if you just INSERT or DELETE a few things I think it should be fine. So maybe YES.
3. This paper does MEM in O(1), space O(n), and INSERT/DELETE in amortized expected time O(1).
4. There has been alot of work on data structures with this model. The model seems reasonable, the problems seem like ones people in the real real world want to do quickly. SO--- are the data structures being used? If not, then are variants being used? This seems like an area where the gap between theory and practice is small and could be bridged. Has it been? If not why not?

1. On a related note, what's the latest gossip on who's going where? :)

2. Make them a part of STL and other standard libraries. They will soon be used

3. FKS obviously competes with Bloom filters. The Wikipedia claims Bloom filters are used to speed up miss determination and digests in things like BigTable and SquidCache, so these might be some of the places to look.

4. As John points out, there are (more modern) alternatives to FKS for most applications. For example, multiple choice hashing/cuckoo hashing have similar properties to FKS, and are more flexible (in terms of handling inserts and deletes).

As a starting point for some modern usage of these and related hashing tools, can I point you to for example:

A. Kirsch, M. Mitzenmacher, and G. Varghese
Hash-Based Techniques for High-Speed Packet Processing

It's on my web page (and the book it's in is apparently finally about to come out).

5. It seems Godel Prize for 2009 has been announced.

6. There exists few open source libraries for practical implementations of static perfect hashing. For example:
http://cmph.sourceforge.net/index.html (in c)
http://sux4j.dsi.unimi.it/ (in java)