## Tuesday, December 19, 2006

### Secret Santa is NP-Complete

Every year my group of friends undertakes a Secret Santa gift exchange.  When we started we each drew names from a hat and bought a gift for the names we drew.  Being budding programmers, we soon dispensed with the hat and wrote a program to do the work for us.  In 1999 a friend and I wrote a C++ app to do the work.  Though we've been running it every year, the code hasn't much changed in the intervening seven years.  It is getting dated and needs reworking.

Toward that end, I've been contemplating the match routine lately.  The current one does something naive like:

Pick a person

Choose a match at random

If there is a conflict with that person, slide to the next one on the list.

Once a match is found, remove those people from the relative lists and pick a new person.

If a match cannot be made, start the process over.

With a small number of people and not a lot of blacklisting (husband shouldn't draw wife's name, etc.), this algorithm will work.  However, for a complicated list of people, this algorithm is nondeterministic and could theoretically run forever.  I was thus searching for a better algorithm.  One which was faster than a complete enumeration of all options and deterministic in nature.

This weekend I took the final for my Formal Models of Computation course.  (Yes, this ties in with the above--be patient) The last thing we covered was complexity classes and the concepts of P and NP.  What follows is a brief description of this concept.  For a more formal handling of the subject, check out the Wikipedia entry

In theoretical computer science, they don't use real computers.  Rather, they use formal models called Turing Machines.  These have the same power to solve problems that modern computers do, just a bit less efficiently.  They are a good proxy for real computers.  The speed of these machines is measured roughly in terms of the input.  So given an input of length n, a linear algorithm would take O(cn) time, where c is a constant.  We usually ignore these constants and just call it O(n) time.

There is a class of problems called P or Polynomial-time which represent those problems that can be solved by a Turing machine in a time which is a polynomial of the input length.  That is, O(n^2), O(n^3), ... , O(n^k).  These are generally thought of as those problems that computers can efficiently solve.

There is another class of problems called NP or Nondeterministic-Polynomial-time.  These represent those problems that can be solved by a nondeterministic Turing machine in polynomial time.  A nondeterministic Turing machine is bascially one that can do more than one thing at once.  When it comes to a fork in the algorithm, it can take both forks simultaneously.

It is assumed that NP describes a bigger universe of problems than P.  That is, P !=NP.  What takes nondeterministic Turing machines polynomial time takes regular Turing machines exponential time.  That is, they take something like O(2^n) time.

Back to my Secret Santa problem.  It was just after my final that I turned my thoughts back to solving this problem.  It then hit me that what I was trying to do was impossible.  There is a well-known NP-class problem which involves finding a Hamiltonian circuit.  A Hamiltonian circuit is a path that traverses an entire graph by visting each node exactly one time.  It turns out that this is exactly the problem I was trying to solve.  Imagine my Secret Santa problem as a graph where each person is a node and there are edges between all nodes that are not blacklisted.  In this view of the problem, I'm trying to find a path around the graph, visting each node once.  In theoretical computer science this is known as a reduction.

This analysis pretty much dashes my chances of finding an elegant solution to the problem.  There is no true solution other than brute force trying each combination.  With the small number of nodes in my usual matching, this works but I still want something better.  All is not lost, however.  There are some techniques I can use to get close to the solution without necessarily trying all of the combinations which I intend to investigate.  I'll write about them after I understand more.

Wow.  I guess I did learn something practical in that class after all.

1. The Hamiltonian circuit problem is only NP-complete for general graphs.  If you restrict the kinds of graphs that you consider, the problem may be become tractable.
For example, if you restrict the graphs to ones where every vertex has degree 2 (circles), the Hamiltonian circuit problem is trivial.
I wonder if there is a way to add reasonable constraints to the Secret Santa problem that allows for an elegant solution?
Of course, the brute-force solution also has its charms...
Given a set P of people P1, P2, P3... Pn;
And a set E of disallowable edges E1 = P1a -> P1b, E2 = P2a -> P2b, ... Em = Pma -> Pmb;
Find a Secret Santa cycle Pi1 -> Pi2 -> ... Pin -> Pi1 so that none of the edges of the set are disallowed;
Algorithm: construct the set of all permutations of P that start with P1 -- this is a set of size (n-1)!
Construct an enumerator to iterate through this set, starting with P1 -> P2 -> P3 ... -> Pn, and ending with P1 -> Pn -> Pn-1 -> ... P1
For each permutation, determine by inspection whether it contains any forbidden edges.  If not, use this as the solution.
It may be objected that there is no randomization in this algorithm.  This is true as far as it goes.  I considered various methods of introducing randomness, and decided the simplest method would be to randomize the initial assignment of the Pi subscripts to the individual names.

2. As Maurits said, if you simplify your search space you can very easily get to a quicker deterministic algorithm.
The easiest way is to constrain to a "Secret Santa circle" in that each 0 -> 1 -> 2 -> 0.  In this case it's simply a matter of shuffle the list of people and iterate through giving each the next person in the list.  Pick a shuffle algorithm of your choice (Knuth's is O(n)) and the iteration is O(n).
Blacklists obviously complicate things, but under the above constraints shouldn't be too complex.

3. A blacklist makes creating a graph with all vertexes degree 2 very difficult.  Likewise, it would make the "create a list an shuffle it" hard too.  At least, it would unless I'm misunderstanding it.  If you have a bunch of names with no constraints then you can just assign each randomly to a position in the chain and come up with a solution but once you have blacklists, things become much harder.
From some initial reading it appears that the are heuristics that can be applied which sometimes work and some backtracking methods which will help to constrain the searches.  It's still a difficult problem, but not quite as intractable.

4. Indeed, blacklisting is the heart and soul of the problem.
If there is no blacklist then the mathematical problem is "Find a Hamiltonian circuit through the *complete* graph on n vertices" - as you note, this is trivially solved by listing the vertices in whatever order you see fit.
If the blacklist is so intense that each person is only willing to deal with two other people, then the mathematical problem is "Find a Hamiltonian circuit through a graph G, all of whose vertices have degree 2" - this is also trivial.  I was mistaken in my initial statement that all such graphs are circles; you could also have disconnected graphs with several circles.
If the blacklist is symmetric* and without any other internal structure, then you have a restatement of the general Hamiltonian circuit problem on an undirected graph, which is NP-complete and the brute-force solution is as good as you can get.  A good implementation of the brute-force algorithm should be able to step through the various combinations in polynomial memory, and only take exponential time for very severe blacklists.  I would not be surprised if there was a polynomial algorithm for finding a Hamiltonian circuit given a high enough edge density.
If there is hope in finding a wholly polynomial algorithm, it lies in finding an internal structure to the blacklist.
Your example blacklist entry is encouraging: "husband shouldn't draw wife's name".
Other sample blacklist entries: "person shouldn't draw the same name they drew last year."
If the blacklist entries can be standardized, the possibility of a polynomial algorithm that leverages the structure materializes.  For example, the nodes could be structured into families, and the rule "members of a family cannot draw each other" formalized; this allows analysis impossible in the general case.  In particular, if one family is bigger than all the others put together, no perfect Secret Santa chain exists.
If the set of people is the same for two years running, and the blacklist is symmetric*, a trivial (if rather lame) solution is simply to reverse the chain from the previous year; give to the person who gave to you.  This violates the "Secret" condition, though, so I'll make it explicit: "person shouldn't know, or be able to deduce, who their Secret Santa is; nor who anyone else's Secret Santa is, with the exception of the person whose name they draw."  This is the source of the need for randomization.
* By which I mean there are no one-way entries... for example, if Alice can give to Bob, but Bob is restricted from giving to Alice, then the analogy to the general Hamiltonian problem is broken.

5. > I would not be surprised if there was a polynomial algorithm for finding a Hamiltonian circuit given a high enough edge density.
Bingo!
http://web.cs.wpi.edu/~gsarkozy/Cikkek/32.pdf
... which is a generalization of...
http://citeseer.ist.psu.edu/dahlhaus93parallel.html
So if you can guarantee that
1) the blacklist is symmetric
AND
2) any individual person is eligible to give/receive from at least half of the remaining people
... then there's a polynomial algorithm.
See the Find_Hamiltonian_Circuit problem in the .pdf above.
If your blacklist is more intense than this... well... the brute force algorithm works, it's just a little slow.
It's also much easier to implement, I would think...
If you're a zombiemaster with unlimited computing power, this is the kind of NP-complete problem that lends itself well to parallelizability, since it reduces into independent components.  You could find (n-1) slaves and give each of them one of the following tasks:
1) Find the solution given P1 -> P2
2) Find the solution given P1 -> P3
...
n-1) Find the solution given P1 -> Pn-1
These slaves could each, in turn, recruit (n-2) subslaves and parcel the task out one chain deeper.  Eventually, one of the slaves' slaves' slaves' slaves' ... slaves finds the solution and reports back up the chain; each superior can then instruct its other subordinates to cease processing.
In the ultimate example of cheap parallel power, I was fortunate enough to be a student of Leonard Adleman when he solved a seven-node Hamiltonian path problem using DNA molecules:

6. Oh, and here's a C++ implementation of the solution that works for Dirac graphs (each vertex has degree at least n/2:)
(section 6 - hamilton.cpp)

7. I'm not sure that the reduction of the Secret Santa problem to the Hamiltonian Circuit problem is valid.  While it is true that a solution to the Hamiltonian Circuit problem would solve the Secret Santa problem, I think there are valid solutions to the Secret Santa problem that would not be encoded by a Hamiltonian circuit.  Take the simple case of four people (A,B,C and D) with no constraints.  The solution A->B,B->C,C->D,D->A would solve the problem and be a Hamiltonian Circuit.  But A->B,B->A,C->D,D->C would be a valid Secret Santa solution but not a Hamiltonian Circuit.  Even if we want to restrict things so that people don't have each other as Secret Santas, for six people we would have the solution A->B,B->C,C->A,D->E,E->F,F->D.  Or am I missing something?

8. Interesting observation Eugen.  It is certainly possible for a secret santa problem to be solved by creating subgraphs, each of which is a hamiltonian circuit.  While this makes the n in O(2^n) a smaller n, each subgraph is still trying to create a hamiltonian circuit.  I'm not sure how much this make it a non-NP problem.  It also seems to introduce new choke points into the algorithm.  If I arbitrarily break the graph into some smaller number of sets, how do I know that these sets will match and if they don't?  If they don't, I'm back to my initial problem of a Hamiltonian circuit.
It's also possible that the graph isn't fully connected.  I think this would have to be taken into account by an algorithm before we tried to find a hamiltonian circuit for it.

9. I think Eugen is referring to the implementation most used by non-programmers -- to wit, pulling names out of a hat.  This results, not in a Hamiltonian circuit, but in a series of cycles.  You might get lucky and get one big cycle, but the probability is very large that you will have at least two.
It is fairly trivial to write a computer program that follows this implementation.  It's certainly not NP-complete.
There are several problems with this, though.
1) Someone might draw their own name from the hat.  This is typically worked-around by having them simply draw another name.
2) The very last person might draw their own name from the hat.  This is typically worked-around by having someone in authority look at the last two names in the hat, and hand them out manually to the last two people.
3) The worst problem, though, is this makes the payoff inelegant.  It's traditional for the unwrapping of Secret Santa presents to be in a particular order, along the "found" paths in the graph.  Something like:
Alice opens the present with "To Alice" on it.  "Wow!" she says, "a left-handed corkscrew!"  After some gushing, John reveals himself to be Alice's Secret Santa.
Now it's *John*'s turn.  John opens the present with "To John" on it.  "Wow!" he says, "a dehumidifier!"  After some gushing, Eric reveals himself to be ...
... and so on up the chain.
If you have a single Hamiltonian cycle, this works out great.  Alice is the last person's Secret Santa.
If you have multiple cycles in the graph, you get to a sticking point.  Fred opens the "To Fred" gift.  "Wow!" he exclaims, "Season tickets to Husky volleyball!"  After some gushing, *Alice* reveals herself to be Fred's Secret Santa.
At this point, there's an impasse.  There are people left that haven't opened gifts... and there are unopened gifts... but *who gets to go?*
The tie is broken somehow, and the next cycle is traversed.

10. Eugen's "no two-cycles" constraint is interesting.  That breaks the analogy with drawing names out of a hat.  I can't immediately decide whether adding this constraint makes the problem NP-complete.

11. Exanding audience...
http://channel9.msdn.com/ShowPost.aspx?PostID=273941#273941

12. There is another difficulty with "pass the hat" model.  That is, it doesn't allow for blacklists.  If you are doing a secret santa exchange with a bunch of unrelated people (a work group, a class of kids), this isn't a problem.  However, if you are doing it in a group where there are families (brother/sister, husband/wife, etc.) then you really don't want a husband getting his wife for the exchange.  The chances are pretty strong that he's getting her a gift elsewhere anyway.  At this point, you need to pull a name from the hat and make sure it isn't on someone's list.  If it is, throw it back and draw another.  In a well connnected graph this still mostly works, but at the constraints increase, there is greater and greater likelihood that toward the end you'll get into a situation where the person drawing the name is blacklisted off of everyone left in the hat.  Oops.  Time to start over or do some serious backtracking.