Tuesday, November 07, 2006

PIR Update

A guest post from Bill Gasarch.

There have been two important NEW results in the field of Private Information Retrieval so its worth reviewing the field and the new results (Some background here).

This post is about Information-Theoretic Private Information Retrieval. We state the problem and five results in order of discovery, with the last two being the recent ones.

PROBLEM: A database is an n-bit string (my wife tells me that databases are more complicated than that). The USER wants to know the ith bit but does not want the DB to have ANY CLUE of which bit he wanted.

EASY ANSWER: Request the ENTIRE DB. That is n bits.
QUESTION: Can we do this with substantially fewer than n bits?
ANSWER: NO if there is only one copy of the database.
NEW QUESTION: What if there are k copies of the DB?

1. 2 DBs, O(n1/3) bits; for k≥ 4, k DBs O(n1/k)
Private Information Retrieval, Chor, Kushilevitz, Goldreich, Sudan, JACM-1998, (Earlier version FOCS 95)

NOTE: Introduced the field. Journal version has easy constructions. Conference version had slightly harder poly-interpolation constructions material that are a prelude to item (3) below.
NOTE: FOCS version has O(n1/k) result, Journal omits it since item 2 below had already appeared.
NOTE3: k=3 gives O(n1/3)—can't use that its 3 DB's instead of 2.

2. k DB's, O(n1/(2k-1))
Upper Bound on the Communication Complexity of Private Information Retrieval Ambainis, ICALP97,

NOTE: Used Recursion. Constant is exponential. Later papers that reduced the constant to polynomial lead the way to the next paper.
NOTE3: k=3 gives O(n1/5).

3. k DB's, nO(loglog k/(k log k))
Breaking the O(n1/(2k-1)) barrier for information-theoretic Private Information Retrieval Beimel, Ishai, Kushilevitz, Raymond. FOCS 02.
NOTE: Brilliant use of polynomials.
NOTE3: Techniques yield k=3, O(n1/5.25)

4. 2 DB Ω(n1/3) LOWER BOUND
An Ω(n1/3) lower bound for bilinear group based Private Information Retrieval Razborov, Yekhanin, FOCS 2006.

NOTE: Hard paper! Modeled 2-DB-Private Information Retrieval via Latin Squares and then used representation theory of finite groups to push through the lower bound. ALL current 2-DB protocols fit the model, so lower bound is either legitimate or will point way to other techniques. My money is on legit. Even so, there aren't that many 2-DB protocols so who knows…

5. 3 DB O(n1/32,586,658) PIR! Really!
New Locally Decodable Codes and Private Information Retrieval Schemes, Sergey Yekhanin

NOTE: Wins award for largest distance between GREATNESS of content and AWFULNESS of title. I would have called it

A 3 DB O(n1/32,586,658) PIR! Really!

NOTE: Uses Mersenne primes. The number used is based on the largest Mersenne prime known, which is 232,586,658-1. If there are infinitely many Mersenne primes then, for infinite number of n, get k=3, n1/loglog n-Private Information Retrieval
NOTE: Vast improvement on k=3, n1/5.25

In the constructions in items (2) and (3) above the k-DB PIR was constructed recursively out of k'-DB PIR's for k'<k. Hence this massive improvement for 3-DB PIRs leads to improvements for these schemes.
PIR scheme (2): the new 3-DB PIR leads to a k-DB, O(n1/(bk-1)) where b is enormous.
PIR scheme (3): the new 3-DB PIR leads to the same asymptotic k-DB nO(loglog k/(k log k) but with a much smaller constant in the O-of term.

1. You don't make clear what m of n security each of these is. Isn't the new mersenne prime related technique only for 2 of n? Does its construction fit the model of the 2-DB lower bound? Trivially, if 3 of 3 can be done with some efficiency then 2 of 2 can be done with the same efficiency simply by giving two of the three requests to the same server.

In actual practice, of course, you would never use a mersenne prime greater than 31, and possibly not even greater than 7.

2. Bram, I think you are confused: if one database gets two of the three requests, then the user's anonymity may no longer be guaranteed.

PS: In practice, you would never model a database as an n-bit string...

3. Anonymous: That's the case for PIR which only guarantees anonymity if only one of the 3 databases collude, however there are schemes which are secure if two of the 3 databases collude. In practice, you'd really like to use a scheme where only a small number of a large number of databases have to be trusted.

4. The main interest in having additional servers in the PIR definition was reduced complexity -- not greater reliability. Even with the new Mersenne prime results, all of the solutions with constant numbers of databases are very expensive compared with the non-private solution of querying a single bit.

An 'any 2 of 3' requirement makes 3 servers at least as expensive as 2 servers, which seems to be a step in the wrong direction for complexity.