Union-Find(disjoint-set) Data Structure

One of the main challenges in implementation of Kruskal’s algorithm is deciding whether to add edge (u, v) to a component or not. What we need to do there is, we need to find the components that u and v lie in, and see if they are the same or not. If they’re the same, don’t add, if they’re different, do add.

We need a data structure that support operations of union and find.

So the union-find implementation works as follows, we maintain all the vertices, all the connected components, and the members that the components are gonna live in a set of trees.

These trees are called up-trees because here, children points to the parents but are drawn upside down so the root is at the bottom with leaves at top. In addition we also have an array of pointers that point from all the vertices of the graph to their location in the up-trees.

All the vertices lie only once in this structure. Additionally all the vertices that lie in a up-tree is said to lie in one component.

For instance, let’s suppose we have three components:

  1. g -> c -> d -> a (a is root)
  2. h -> i -> j -> e (e is root and h the leaf)
  3. f -> b

Name of the component is the vertex at the root. So name of component of g is a and for j is e. We will also have to see that when we do union and find operations none of these would take more than ‘log n’ steps so that everything be efficient.

Union-Find Implementation

find(g) will return a; find(b) will return b.

To perform union operation, the root of the smaller tree will point to the root of the larger tree.

Now the access time for smaller tree will increases by one(smaller tree nodes require traversal to their roots than to the added larger tree root). While the access time of larger component remains same.

Hence for each union operation, the path length increases by one but the length of the tree would be at most log n.

how?

What happens to the size of the tree after union operation?

After union the path length of one of the trees increases by one and the size of the tree becomes 2k.

It is obvious that for an increase in path length by one, size of the tree atleast doubles.

Hence if the tree length becomes n, the path length will be at most log n.

Analysis- O(m logm)

Moreover for a connected graph with number of edges = m,

n-1(for a tree) ≤ m < n²-n (for a complete graph)

log(n-1)log(m) < log(n²-n)

O(logn)O(logm) < O(logn²)

O(logm) ~ O(logn) almost same.

So for a total m number of edges total number of find operations take mlogm steps.

And unioning two components basically take 1 step.

Hence net complexity is O(mlogm) time.