## Thursday, December 17, 2009

### A hard problem inspired by an easy problem

The following problem was problem 1 (the easy one) on the Maryland Math Competition 2009 (I will later report on how the students did on it).
1. Show that for every set of three integers we can find two of them whose average is an integer.
2. Show that for every set of five integers we can find three of them whose average is an integer.
I am sure that most of my readers can do this problem. Here is a generalization. I do not know if it is true.
(Conjecture) For all k, every set of 2k-1 integers, there exists k of them whose average is an integer.
1. The UMCP competition asked for the k=2 and k=3 cases of the conjecture. They are true and easy.
2. I have done the k=4 case. It was tedious but not hard.
3. I think I have done the k=5 case but it was alot of cases so I may have missed one.
SO, is the conjecture true or not? I have asked around and people who should know don't seem to, so I throw it open to the blogosphere.

What I hope happens: Someone posts a nice proof.

What would also be okay: Someone posts a counterexample.

1. You can probably solve the first few k systematically by writing an easy program that looks through all the cases. (Only the remainder mod k of the input numbers matter.)

2. Oooh even I could quickly see k=2 (I usually suck at these "competition" questions):

POINT: To get a non-integral average you have to add an odd number and an even number.

I assume you just generalize this idea with more cases for k>2. (too lazy to try it now).

For general k, it seems like one of those Ramsey type things, probably not trivial.

3. Relevant citation from "Zero-sum sets of prescribed size" N Alon, M Dubiner, Combinatorics, Paul Erdos is Eighty (1993) vol. 1 pp. 33-50: "any sequence of 2n−1 integers contains a subsequence of cardinality n the sum of whose elements is divisible by n" (P. Erdo ̋s, A. Ginzburg and A. Ziv, Theorem in the additive number theory, Bull. Research Council Israel 10F (1961), 41-43).

4. Here's a link to the paper John Mount points to: http://www.tau.ac.il/~nogaa/PDFS/egz1.pdf. That seems to settle Bill's conjecture.

5. Skimming the paper in question. It has multiple proofs of the theorem- but none of them are trivial they all need at least one more big idea to get the Ramsey structure to come out.

6. Bah! I was about to post my Cauchy-Davenport proof, and along comes Lance and posts a reference to a paper with one. :)

Seriously, I suspect there's not a substantially easier proof than via Cauchy-Davenport. p = 2 is direct Pigeonhole. The easiness of the p = 3 case is a fluke due to the fact that obviously in any counterexample (over Z/pZ, and the proof that it reduces to those cases really is easy) the sequence can't hit every congruence class. so there has to be a congruence class with at least (2p-1)/(p-1) members of the sequence by Pigeonhole; for p = 3 this is enough, for p > 5 it's not, and in fact Pigeonhole estimates should tend to get worse as p goes to infinity.

So you need to use something along the lines of: If the sequence isn't supported on at least 3 elements of Z/pZ, we're done by Pigeonhole. (Another way of seeing why the p = 3 case is easy.) So we try to build up a subsequence summing to 0 recursively; at each step we have at least 2 different choices of the next element of the sequence to add, so the partial sums of different sequences "branch out" very fast. And even though some of these branches intersect, the intersection isn't enough to prevent every congruence class from being hit.

What I just outlined is morally a very rough version of the Cauchy-Davenport argument. I'd be interested in seeing a proof even for p = 5 or p = 11 that didn't either involve lots of casework, or proceed along roughly the above lines.

7. For 4 just use 2.
So at least its easy for powers of 2.

8. This comment has been removed by the author.

9. This comment has been removed by the author.

10. Mudetasigma, (1 + 3 + 5)/3 = 3. Maybe you were dividing by 2.

11. @mudeltasigma - 5 + 7 + 9 = 21

12. I'm amused that it seems everyone is trying to make things too difficult or complex.

The problem with three integers is that you know there will be combinations of even and/or odd integers. If you have at least two even integers then the average of them will be an odd or even integer (E.G., 8 & 10, ave = 9; 8 & 12, ave = 10), and if you have at least two odd integers the average will be an integer, since the sum of two odd integers is an even number, which when divided by 2 is an integer, odd or even.

The problem with five integers will have a similar solution for any given three. In one of the above posts that examined all combinations of 1 through 9, there were three combinations that had an integer as average: {1,3,5. Ave=3}, {3,5,7. Ave=5}, and {5,7,9. Ave=7}. Puzzled why it was not seen.

13. Is there an easy algorithm to find the set?

14. Here's the original proof by Erdos-Ginzburg-Ziv:

http://www.math-inst.hu/~p_erdos/1961-25.pdf

(it's the simplest one, as I can see)

15. How to find such a set is an interesting question. I have a poly-time algorithm in mind, but there may be much more efficient techniques that I haven't heard about (interested to hear if this has a linear-time solution or some such). There is a n^3 dynamic programing solution. Make a table indexed by three integers: a from 0 to 2n-1 the length of an initial sequence, b from 0 to n how many items have been selected from the interval [1,a], c the residue class mod n. Then table(a,b,c)=1 says that there is a set of exactly b distinct values from the first a values that sum to exactly c modulo n (=0 says none such, and=? means not filled in yet). This table, as is typical, is obvious for a=0 and can be filled in incrementally. At the end table(2n-1,n,0) has a 1 (since we know there is a solution) and back-chaining gives us the solution.

16. An O(n^2) solution seems to come via divide and conquer. The recurrence relation being
T(n) = 2T(n/2) + O(n^2).
Let me check if I am doing this correctly.

17. do you homeworkbefore posting1:31 AM, December 18, 2009

Bill, learn to do your own research before burdening your readers. It's not a good post if you don't even *try* to find relevant papers to your question; unless of course you expect your graduate students to come up with the same sloppiness please refrain from posting bad posts. If this aint possible please refrain from posting at all.

if you had found the paper yourself and included this in your post, you could have asked for interesting questions pertaining to an improvement of "this and that".

18. I don't understand the harshness of the previous comment. Does every post have to be a well-researched, polished piece of artwork? This is a blog, not a publishable article. Bill did say that he had asked several people if the conjecture was true. If he had given this problem to a graduate student as a possible thesis topic, the student would have to do the necessary research of course. I, for one, am glad to find out quickly that the papers mentioned here exist. There are some knowledgeable people out there reading this blog, and sometimes it's more efficient to ask than to research everything yourself, especially in areas outside your own.

19. Anonymous 1:31 AM, it's called "comparative advantage." Lance could probably tell you about it. Bill asked a few people whether they had pointers to a proof, after which the opportunity cost became too high for him to search the literature himself. And unless you knew about the Erdos-Ginsburg-Ziv paper or had the idea to use Cauchy-Davenport it would probably be nontrivial to find.

20. Nice problem, i really like such posts!

Venkat, such a divide-and-conquer approach seems to work for any k that is a power of 2. I suspect that this is mentioned in one of the papers. However, i though it would be nice to mention this here. Specifically, we can easily prove via induction that, for any k that is a power of 2 and any set of integers A of size 2k-1, there is a subset B of size k such that sum_{i in B} mod k = 0.

The induction start k = 2 is trivial (as explained above).

For the induction step, assume as induction hypothesis that the claim holds for some k that is a power of 2, and consider a set of integers A of size 4k-1. Moreover, consider an arbitrary subset A' of A of size 2k-1, by the induction hypothesis, there must be a subset B_1 of A' of size k such that sum_{i in B_1} i mod k = 0. Remove B_1 from A. By iterating this as often as possible, namely three time, we obtain three sets B_1,B_2,B_3 of size k with sum_{i in B_j} i mod k = 0, for 1 <= j <= 3. Hence, there must be two sets among these sets whose union is a set B of size 2k with sum_{i in B} i mod 2k = 0 (this is basically the case k=2 with three integers). qed.

Did i miss something? Can these arguments be extended to arbitrary k?

cheers,
tim

21. Picking up on Venkat's idea, is there a randomized divide-and-conquer algorithm that terminates in O*(n^{3/2}) time with high probability? (O* hides polylog factors.) Suppose we split the list randomly into two parts: then w.h.p., the two parts will be of size n +/- O*(sqrt{n}), and the hidden solution we want to find, will split into parts of size n/2 +/- O*(sqrt{n}), whp. So, suppose the recursive guarantee we want is: given a sequence S and an interval [a,b], find for each i in [a,b] and each c in Z_n, some subsequence (if one exists) of length i summing to c (mod n).

This heuristically a gives the recurrence T(n) = 2T(n/2) + O*(n^{3/2}), which solves to O*(n^{3/2}). One potential issue is that even deeper down in the reduction, we perhaps need to enumerate all c in Z_N, where N is the parameter of the original problem, rather than the current, smaller "n".

I don't have time right now to think much more about this, and thought I will suggest this heuristic idea for one of the young people to fix :)

22. I stumbled across another interesting theorem when thinking about the proof of 2k-1 theorem:

Any set of N integers contains a subset whose sum is a multiple of N.

This has a simple proof so I'll let the readers figure it out.

23. An unusual aspect of this theorem is that the bound is tight. For odd k, having k-1 copies of n and k-1 copies of -n is a simple construction which has no subset of exactly k elements whose sum is 0. Maybe this observation could help with finding a simpler proof.

24. @AK: The empty subset has a sum of zero, which is always a multiple of N, QED.

25. I think I have a proof for all complex n's (based on induction and assuming that there is a proof for all primes). A proof for primes however still eludes me.