## Friday, October 17, 2008

### Finding the ith largest of n numbers for i,n SMALL

Exactly how many comparisons does it take to find the ith largest of n numbers? For i=1 and i=2 these numbers are known exactly. Beyond that I'm not sure, though I do know that we don't know (say) how many comparisons to find the 15th largest element out of 30. (There is a table in KNUTH, but I could not find a website of these things. If someone knows one, please comment.)

The following conjecture, if true, would help speed up computer searches for such algorithms and may lead to interesting math in and of itself. Let V(i,n) be the min number of comparisons (worst case) to find the ith largest of n numbers.
Conjecture: There is an algorithm for finding the ith of n numbers that uses V(i,n) comparisons that begins by comparing x1 to x2, x3 to x4, x5 to x6, ..., xn-1 to xn (if n is even, else end at xn-2 to xn-1).
A weaker conjecture may be more tractable, where we allow V(i,n) + 1 or + some constant.

1. For a related problem, how many comparisons per element does it take to find the max/min over windows of size c given an array of numbers? This is an important problem for people doing signal processing/computer vision.

If the windows have size c=n where n is the total number of elements, then this problem is just finding the max/min of a set and the number of comparisons needed is known (see any textbook). For the more general problem, the answer is *not* known, however see the following paper for a recent result:

http://arxiv.org/abs/cs.DS/0610046

2. Why on earth would anyone be doing computer searches for such algorithms?

3. In the same trend here is a paper about the minimal number of comparisons :
http://www.mat.unb.br/~ayala/4FordJohnson.ps
Usefulness of computing the exact minimal number of comparisons seems quite limited.
Daniel Lemire's algorithm seems to stress other motivations like working space and latency.

4. "In the same trend here is a paper about the minimal number of comparisons :
http://www.mat.unb.br/~ayala/4FordJohnson.ps"
Sorry i meant minimal number of comparisons for sorting.

5. Here a simple recursive algorithm that, in the best case, requires about 2n comparisons to find the ith largest, independent of i. It relies on recursive partitioning. It assumes that partition() rearranges the list, placing a pivot element at location j, elements less than the pivot before j, and the bigger elements after j. Here is some pseudocode:

iThLargest(list, i) {
j = partition(list)
if (j is i)
return list[j]
else if (j < i)
iThLargest(list[1 to j-1], i)
else
iThLargest(list[j+1 to end], i-j)
}

The algorithm also uses about 2n comparisons in the average case. However, the worst case is quadratic in n.

There is also a well-known median-finding algorithm that runs with time linear in n, in the worst case. The median is the case i=floor(n/2).

Alejo

6. Alejo, I don't think your analysis of quickselect is correct. In the best case, it takes ~ N compares (you get lucky and partition on i in the first pass). I believe the average number of compares (assuming the array is shuffled) is

~ 2 N + i ln ( N / i) + (N - i) ln (N / (N - i))

which is ~ (2 + 2 ln 2) N for i = N/2.

7. Bill: as someone not familiar with the research of this problem, is there any reason to expect that this conjecture might be true? I see an algorithm for i=1 that begins with that, but is there a known, optimal algorithm of that form for i=2?

At this point the conjecture sounds a bit arbitrary, though no doubt helpful for the problem if we knew it to be true.

8. Andy, the canonical algorithm for i = 2 involves a tournament to find the maximum and then a tournament of the elements that lost to the maximum.

9. There is a paper "On the lower bound for minimum comparison selection", by Ruzicka, Wiedermann from 1978, cf. http://dml.cz/bitstream/handle/10338.dmlcz/103726/AplMat_23-1978-1_1.pdf showing the best known (in late seventies) lower bounds for the selection problem. I do not know whether these bounds have been improved since then.

10. Dorit Dor, Uri Zwick,
Median selection requires (2+eps)n comparisons. SIAM Journal on Discrete Mathematics 14, 312-325 (2001) (also see Zwick's homepage for draft)

refers to a 1974 TechReport by Kirkpatrick. As reported in Dor/Zwick, an exact formula for V(3,n), n>=50 is known due to Kirkpatrick. They also reproduce the exact formula. I haven't looked into Kirkpatrick's paper yet, might be difficult to obtain it.

If there is a uniform optimal algorithm in case i=3, what does it do for n<50? Please post about it if you find smth interesting.