Disjoint Sets
April 2, 2010
Disjoint sets are a data structure that supports grouping, or partitioning, n distinct elements into a collection of sets, each containing a distinct subset of the elements. The operations supported by the data structure are find, which determines to which subset a particular element belongs, and union, when merges two subsets into one.
It is straight forward to implement disjoint sets using linked lists. The problem is that such a representation can have quadratic performance in the worst case, and the worst case, or something close to it, occurs reasonably often.
Thus, most algorithms textbooks describe a data structure called a disjoint-set forest. Each subset is represented by a tree, with each node containing a pointer to its parent; these are generalized trees, not binary trees, with potentially many children at each node. The make-set function returns a singleton tree, find chases parent pointers to the root of the tree, and union makes the root of one of the trees point to the root of the other in its parent. Two heuristics are common; union joins the tree with smaller rank (the upper bound on the depth of the tree) as a sub-tree to the tree with greater rank, and path compression replaces the parent pointer of each node with a pointer to the root of the tree instead of a pointer to the parent.
Your task is to implement a library that provides the make-set, find and union operations for disjoint sets, using the disjoint-set forest data structure described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.