Thursday, June 02, 2011

Partioning Students

One last Molly post for the end of the school year. Tomorrow is her birthday and I enter that moment I have been dreading for the past thirteen years: Two teenage daughters.

Molly's 7th Grade Science teacher wanted to break her class into groups of three to work on a particular project. Each person in the class was asked to list three people they wanted to work with and one person they didn't. The teacher would partition the class so everyone wanted to work with someone else in their group and the person they didn't want to work with was not in their group.

The next day the teacher she thought the partitioning should be easy but took her many hours. One student said she should just ask a computer to find the partitioning. Molly said she needed the right algorithm. When Molly told me about it that night and asked if I could help her figure out the algorithm. I said the problem was likely "NP-complete" (since it is a variation of Exact Cover) and would have to search all the 190 million different partitions of her class of 18 into groups of 3. OK, I probably said this to get out of thinking about an algorithm but maybe Molly got just a little more understanding of P versus NP.

Bill and I are at FCRC next week. Hope to see you there.

16 comments:

1. Lance --

Admittedly without thinking much about the specific problem, I'll complain about your response.

Sure, let's assume the problem is NP-complete. That doesn't mean that the particular instance is hard to solve. One might suspect, if there are an abundance of available solutions for this problem instance, that a good heuristic would find one very quickly.

I say this because I teach an algorithms class, and in that class I specifically try to teach that "It's not do-able" when you get an NP-complete problem is not a suitable answer. (In the real world, I'd like to think that's the sort of attitude that would lead to a stagnant career.) Indeed, this is why I spend some time in the class on heuristic methods (even though that's somewhat unusual, I understand, for algorithms classes).

A better answer would be that the problem seems NP-complete, so what can we do about that? (Use a heuristic, settle for an approximation, change the underlying problem somehow, see if it's small enough for brute force to work,...)

2. ok, but 190 million combinations is nothing for a PC and should have been doable.

3. Andreas Bjorklund10:07 AM, June 02, 2011

To complement Michael's and Scott's answers (and to take the opportunity to advertise my own work) I just want to point out that there are much faster exact algorithms than "trying all partitions" for problems like exact cover by 3-sets. See

http://arxiv.org/abs/0910.0460

4. Yeah, this seems fairly tractable with the size given (and somewhat interesting.)

I don't suppose you could ask your daughter ('s teacher) for (anonymized) data here so people can play with algorithms?

5. This sort of problem, with working sub-solutions, might lend itself well to genetic algorithms. If there is no solution, it won't be able to tell you, but on the chance there is it'd be worth running a simulation for 40 minutes to see what it finds.

6. five posts and nobody pointing out the typo in the title? for shame!

7. It's interesting to think about the thresholds at which this becomes feasible. Obviously, the class is going to have some sort of social structure...but let's ignore that and assume all likes and dislikes are chosen u.a.r. In that case, the chance that two randomly chosen partners are OK with me are ((16 choose 2) - (13 choose 2))/(17 choose 2) ~ 0.31. Making lots of (false) approximations about uniformity, the probability a group of 3 is OK with all members is about 0.31^3 ~ 0.03, and the expected number of valid groups of 3 is about 24. I'd happily search for a size-six cover out of 24 subsets. (and I somewhat doubt one exists (in expectation.))

8. Each person list 3 people they want to work with and 3 they don't. As their group must contain one of each, there are only 9 possible groups this person could be in. So if you build a solution iteratively by trying all 9 groups for an ungrouped person, you should only have to try at most 9^6, or about half a million partitions. With pruning, this should be pretty fast.

9. Does this count as a further argument for small class sizes?

10. Is it clear that a solution always exists?

11. this looks like the stable allocation problem which is somehow a generalisation of the stable marriage one.

12. Is it clear that a solution always exists?

Actually, it is clear that there are scenarios where there is no solution. One simple example is where person A lists persons B,C,D as ones they want to work with and all three of those listed person A as the person they did not want to work with.

13. I am a PhD student in computer science and operation research and my day-to-day tasks generally consist of looking for practical solutions to NP-Hard problems. The discussion here is very good and I think a glimpse into my working process may continue to shed light on how these problems may be solved in practice and connect several points already mentioned.

The first major issue is what problem instances to test on. Ideally you have some real-world data which can be used for testing. Without real-world data randomly generated instances can be reasonable but the design of the instance generator must be taken with care to minimize bias. Once the testing data has been selected, the next step is to start with a greedy or approximation algorithm and a lower bound (if one is easy to find). In the event that the greedy or approximation algorithm is "close" to the lower bound we may stop here. This is very context dependent. For some problems being within 10% of optimum is reasonable while in other problems every 0.1% closer to optimum is worth considering.

In the event that there is a "large" gap between the greedy solution and the lower bound then the next step is to try complete search methods to find the true optimal solution. The most popular approaches are, enumeration, constraint programming (AI tree search), and integer programing. Each one of these can be very effective in practice for particular types of problems up to some fixed instance size.

However, the unfortunate reality is that all complete methods cannot escape exponential growth. No matter how effective a specialized complete search is, there is always some input size which it too large to solve completely in practice. In the event you need to solve a problem of that size, local search or heuristic approaches, such as, simulated annealing, tabu search, genetic algorithms, and MCMC, become a necessity.

To get back to the problem mentioned here. I would anticipate it can be solved using a complete search technique (as others have suggested enumeration seems feasible). However, consider solving the same problem for the entire school instead of just one class, with this change of context the solution approach maybe significantly different.

14. Just as a simple illustration: Assume student A would like to work with students B, C, or D. A then must be paired with at least one of B, C, or D. Let’s assume A and B are put into the same group, and that B would like to work with E, F, or G. Now at least one of E, F, or G must have named A or B -- or else A and B are not a valid pairing. Very quickly you can list out all of the valid groups of which A could be a member. In this particular problem there are at most 45 such groups (if everyone wanted to work with A), but for a random dataset it’s usually a pretty small number.

The easiest approach would be to find a student that’s only a member of one valid group, remove those three students from the set, and repeat the process for the remaining fifteen, then twelve, etc., until you’ve either found a solution or, much more typically for random datasets, shown that no solution exists.

In random datasets there’ll usually be a student with only one valid group, but in real datasets with more reciprocal preferences (A wants to work with B, and B wants to work with A) it may well be that all students are members of two or more valid groups. In that case just start with the student with the smallest number of valid groups.

I tried several randomly generated datasets and could show in each case that no solution existed in about ten minutes using pencil and paper. When I rigged one preference for each student to ensure a solution existed, I easily found the intended solution and two others, and could show they were the only possible solutions.

I then tried to generate a realistic dataset that might have a lot of solutions. I made some of the students more popular and some less popular and, more importantly, introduced more reciprocal preferences than were typically present in the random datasets. I was able to find the 120 possible solutions in less than an hour with pencil and paper.

I then tried to come up with the worst case for this algorithm. Carefully choosing all the preferences (and here ignoring the one student each excludes from their lab group) I came up with a dataset where each student was a member of ten valid groups. Even here, though, when each remaining set of fifteen, twelve, nine, and six students are evaluated the number of valid groups decreases and the total number of partitions that needs to be analyzed seems to be on the order of thousands, not hundreds of millions.

This general approach seems to work quickly even for larger class sizes. (So far I’ve only tried a 36-member class, which would have over 3E+23 possible partitions.)

15. I think Lance stumbled on to the solution to the two-teenage-daughter problem... whatever the issue that his daughters need help with; boyfriends, girlfriends, gossip at school, sports, what clothes to wear or not wear, etc. just say you can't help since its NP-complete, go ask your mother!

16. This partitioning is not at all an NP-complete problem, but is in fact an extension of the classic matchmaking algorithm and one that has a polynomial time solution.

In this particular example, the students do not rank their potential collaborators in an absolute order of preference, but in fact, under-specify their preferred order (by listing their top three collaborators and their least favorite potential collaborator). Therefore the teacher is free to create an arbitrary preference order of potential collaborators for each student as long as it conforms to each student's specified constraints.

Although it is true that the teacher cannot guarantee that each student will end up with their most preferred collaborators - it is nonetheless possible to define an optimal grouping of students: that is, for every student, there is no other possible grouping which that student would prefer over their assigned group, and in which the two other co-collaborators would also prefer to be in over their assigned groups.