Comments on Computational Complexity: Finding Duplicates

Anonymous, I think the binary search method you p...

2005-03-07T07:51:00.000-06:00

Anonymous,

I think the binary search method you posted will fail on the input:

1, 2, 3, 4, 4

After the first pass,
mid = 3
n1 = 2
n2 = 1
n3 = 2

Since n1 >= n3 is true in this case, the algorithm will close in on the first part of the list, which does not contain the duplicate.

See an earlier post for a binary search which I think does work.

The following is a partial answer to the problem a...

2005-03-07T07:50:00.000-06:00

The following is a partial answer to the problem above.
[ I was attending the New Horizons workshop in Kyoto, Japan last week; after Muthu's talk I was
discussing the following with Muthu. Jun Tarui (tarui+at-mark+ice.uec.ac.jp)]

Finding duplicates in the stream setting above requires Omega(log n/ loglogn) passes.

Proof Sketch:

Assume n is odd.
Consider a restricted class of inputs such that A(1), A(2), ..., A((n+1)/2) are distinct and
A((n+1)/2 + 1), ..., A(n+1) are distinct.
A solution for this class of inputs with s bits of memory and r stream-passes
implies a two-party protocol for the following communication game
with r rounds and s communication bits in each round.

Alice and Bob get size-(n+1)/2 subsets A and B of {1,...,n} respectively
and the task is to find and agree
on some w that is in the intersection of A and B.

This task is equivalent to the monotone Karchmer-Wigderson game for the Majority
funtion of n variables (Alice and Bob get a max term and a min term).
A protocol with r rounds and s bits per round for this game correnponds to
a monotone (i.e. AND/OR) circuit computing the Majority function
with depth at most r and fan-in at most 2^s.
Hastad's size lower bound for such circuits imply the claimed bound on the number of passes
above for s=O(log n).

When n is even, consider the complexity of any function that outputs 1 if the number of 1's
is more than n/2 and outputs 0 if it is less than n/2. (end-of-proof-sketch)

I think the binary search would work as follows: ...

2005-03-07T01:44:00.000-06:00

I think the binary search would work as follows:

binsearch(a , b)
begin
-> choose mid = (a + b)/2
-> count no. of elements <, = and > mid, say n1, n2 and n3
-> if n2 >= 2 end
-> else if n1 >= n3 binsearch(a , mid -1)
-> else binsearch(mid + 1, b)
end

Now, just call binsearch(1 , n).

We need to maintain the values of n1, n2 and n3 in each pass and the values of a and b all the time. All of these require O(log n) space.

Regarding the binary search algorithm, I think the...

2005-03-05T23:30:00.000-06:00

Regarding the binary search algorithm, I think the statement "Uses constant amount of space" is wrong...

The variable "upper" needs to be able to represent N, which requires log N bits. So even the binary search method requires log N space.

> "binary search takes more than O(n) queries." Y...

2005-03-05T17:36:00.000-06:00

> "binary search takes more than O(n) queries."

Yes, but Lance also asked: "Now suppose A is streamed, that is we get in order A(1),A(2),�,A(n+1) and then get another pass etc. How many passes do you need to find a w?"

The binary search method solved this variant of the problem, unless I'm mistaken.

As an aside, note that if exactly one element is d...

2005-03-05T16:00:00.000-06:00

As an aside, note that if exactly one element is duplicated it can be done in one pass using O(log n) space: simply add all the numbers, substract n(n+1)/2 and that is the duplicated number.

Alex

binary search takes more than O(n) queries.

2005-03-05T13:25:00.000-06:00

binary search takes more than O(n) queries.

Just for fun, here's a C++ function that seems to ...

2005-03-05T11:18:00.000-06:00

Just for fun, here's a C++ function that seems to work, using a binary search. Uses constant amount of space. Sorry for the formatting...

unsigned findDup(const std::vector<unsigned> &vec, bool diagnostics)
{
unsigned lower = 1, upper = vec.size()-1;

while(true)
{
unsigned count = 0, middle = (upper+lower)/2;

for(unsigned i=0; i<vec.size(); i++)
if(lower <= vec[i] && vec[i] <= middle) count++;

if(count > middle-lower+1) upper = middle;
else lower = middle+1;

if(lower >= upper)
return upper;
}
}

http://domino.research.ibm.com/Comm/wwwr_ponder.ns...

2005-03-04T21:36:00.000-06:00

http://domino.research.ibm.com/Comm/wwwr_ponder.nsf/challenges/January2004.html

"Create a linked list kind of structuure where the...

2005-03-04T19:16:00.000-06:00

"Create a linked list kind of structuure where there are n+1 nodes" sounds like O(n) space, not O(log n).

"First a warm-up puzzle: Find w using only O(n) qu...

2005-03-04T12:59:00.000-06:00

"First a warm-up puzzle: Find w using only O(n) queries to entries of A (remember you only get O(log n) space). Hint: Use pointer chasing."
Ans: create a linked list kind of strucutre where there are n+1 nodes,
a) value at node i, v[i]=A[i]
b) pointer from node i to node
v[i].
check for loops in the linked list by using pointer chasing, starting from node n+1.