Monday, November 24, 2008

Should tables be sorted? Yes but--- a question

In Yao's paper Should tables be sorted he did the following (I am paraphrasing alot).

A (U,n;q)-WPDS (Word Probe Data Structure) for MEM is the following;
  1. A function that does the following: Given A\substeq U, |A|=n, return the elements of A in some order. Think of them as being in CELL[1],...,CELL[n].
  2. (Assume Step 1 has been done so there are elements in CELL[1],...,CELL[n].) Given u\in U we want to know if u\in A. We have a query algorithm that asks q adpative queries of the form What is in CELL[i]?, and then outputs YES or NO. If YES then u\in A, if NO then u\notin A.
Thoughts, results, and a question I really want someone to answer.
  1. I call it a WBDS to distinguish from the bit probe model.
  2. The obvious appraoch would be to SORT the elements of A and use binary search. Hence q=log(n+1).
  3. For some cases where U is small compared to n, there are constant-probe algorithms.
  4. KEY: Yao showed that if U is large compared to n then sorting IS the best you can do. How big? Let R(a,b,c) be the RAMSEY number associated to a-uniform hypergraphs, b-sized monochromatic sets, and c colors. Yao showed that if U\ge R(n,2n-1,n!) then sorting is optimal.
  5. QUESTION: Is a lower value known? I have looked in various places but couldn't find a lower value. However, this may be a case where I'm better off asking an expert. I hope one is reading this blog!.
  6. This is one of the first applications of Ramsey Theory to computer science. It may be the first, depending on how you defined application, Ramsey Theory, and computer science.


  1. Is WBDS the same thing as cell-probe model?

    I think it is kind of similiar to the deterministic dictionary problem, which is still open.

  2. YES this is the CELL PROBE MODEL. I didn't call it that
    since there are two kinds:
    where the cells holds BITS
    and where the cells hold
    WORDS. But YES, it is
    the Cell Probe Model
    and I should have said so.

  3. Well, it's not really the cell-probe model. In the cell-probe model, cells can store anything you want, not just the elements in the data structure.

    People have called this model "implicit data structures" (see work by Munro, Francheschini, Grossi etc). Supposedly, "implicit" means that you're just storing a permutation, and the data structuring is implicit in the order of this permutation.

    More recent work on dictionaries (that the anonymous commenter alludes to) has focused on the true model, where you can store functions of your data elements in any cell. See work by [Fredman, Komlos, Szemeredi] for randomized dictionaries, and [Hagerup, Miltersen, Pagh], [Pagh], [Ruzic] for deterministic.

    By now, there is nothing interesting in Yao's paper for computer science, just for people passionate about Ramsey theory.

  4. I believe what you are asking for is in the paper "Implicit O(1) probe search" by Fiat and Naor, SICOMP 1993. They show how to search in O(1) time for quite large U, but still much smaller than the Ramsey numbers for which there is a lower bound.

  5. Implicit probe search presents a trade-off between two parameters:
    1. The size of the domain from which we are given n elements to store
    2. The number of queries we need.

    Fiat and Naor showed that if you are given a set of size n and the domain size is poly(n), you can perform implicit probe search in constant time. This was improved by Gabzion Shaltiel to a domain size which is quasi polynomial in n.

    Gabizon and Hassidim showed that you can get o(log n) even when the domain size is exponential in n.