Centroid decomposition

DISCLAIMER: I will not be going over the centroid tree itself, just the decomposition technique, as it's far more useful than the actual centroid tree.

Usefulness

Before I describe the technique, it's better if I go over its usefulness first. Suppose we have been given a problem where we have to compute the sum of some commutative function F over all possible paths in our tree. Oh and yes.. THIS TECHNIQUE IS ONLY FOR TREES.

For now, let F be the sum of edge weights along the path. Note that this could be done using other techniques too, even faster than centroid decomposition, but it will serve as a nice example for the basics. We will approach harder problems later.

Main Idea

So the idea here is: it would be very nice if we could root the tree at some point and then calculate this function for all paths passing through the root. That part can be done easily with a DFS. The problem is, we will then need to root the tree at various points, and calculate the function for all the paths passing through each selected root. Moreover, how will we even keep track of the paths for which we have already calculated F, so as to prevent overcounting? Turns out, we can do all this by selecting what we call the centroid of the tree at each step.

First, suppose we choose a node V as our root. Then, by our assumption, we will calculate the function F for all paths that pass through this root. Notice that every path passing through the root goes through two different subtrees of the root, unless one of its endpoints is the root itself. So first we can calculate the total sum over paths with one endpoint at the root, using a simple DFS.
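For F = sum of edge weights, that first step is just one DFS from the root accumulating distances. A minimal sketch in Python (function and variable names are mine, not from any library):

```python
# Sketch: compute F(root, u) for every node u, where F is the sum of
# edge weights along the root-to-u path. adj[u] = list of (neighbor, weight).
def dists_from_root(adj, root):
    dist = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:          # tree, so "not seen" == "not parent"
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

# Path 1 --2-- 2 --3-- 3: F(1,2) = 2, F(1,3) = 5, total = 7
adj = {1: [(2, 2)], 2: [(1, 2), (3, 3)], 3: [(2, 3)]}
print(sum(dists_from_root(adj, 1).values()))  # 7
```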

To calculate cross-subtree paths, notice that we can keep a running sum of F(root, v) over all nodes v in the subtrees of the root that we have already visited during this DFS (so as to prevent overcounting the same path). So we are basically doing the following:

F(u, v) = F(u, root) + F(root, v)

Here, u is our current node, and v is a node in some other subtree. So when we are at a node u, we want to consider all such nodes v and take this sum. Obviously we would count everything twice: once for u and once for v, for every pair (u, v). This can be avoided if we only consider those v's that we have already visited. The way I think of it: during the DFS we visit the subtrees from left to right, and we only consider those v's that lie in subtrees to the left of the current one. So while doing the DFS, we keep a sum of all these F(root, v), so that we can directly combine the result as F(u, root) * (number of nodes in previous subtrees) + sum of F(root, v).
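The left-to-right combination above can be sketched like this (again a sketch with hypothetical names, for F = sum of edge weights; `removed` is for later, when centroids get deleted):

```python
# Sum of F over ALL paths passing through `root` (including those ending
# at root). adj[u] = list of (neighbor, weight) pairs.
def paths_through_root(adj, root, removed=frozenset()):
    total = 0
    prev_cnt = prev_sum = 0            # node count / distance sum of earlier subtrees
    for child, w0 in adj[root]:        # visit the root's subtrees "left to right"
        if child in removed:
            continue
        # collect F(root, u) for every u in this child's subtree
        dists, stack = [], [(child, root, w0)]
        while stack:
            u, p, d = stack.pop()
            dists.append(d)
            for v, w in adj[u]:
                if v != p and v not in removed:
                    stack.append((v, u, d + w))
        s, cnt = sum(dists), len(dists)
        total += s                         # paths (root, u)
        total += s * prev_cnt + cnt * prev_sum  # cross-subtree pairs (u, v)
        prev_cnt += cnt
        prev_sum += s
    return total

# Path 1 --2-- 2 --3-- 3 rooted at 2: (1,2)=2, (2,3)=3, (1,3)=5, total 10
adj = {1: [(2, 2)], 2: [(1, 2), (3, 3)], 3: [(2, 3)]}
print(paths_through_root(adj, 2))  # 10
```

Note that each pair of subtrees is combined exactly once, because a pair (u, v) is only counted while we stand in the later of the two subtrees.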

So far we have established how to calculate our function F for all paths passing through our selected node V. After this, if we simply delete V, our tree gets disconnected into a forest, such that none of the paths we already considered survives in any of the resulting trees. Also, every path that did NOT pass through V still REMAINS intact in exactly one tree of the forest.

What next? Well, just recurse on the individual trees as if they were independent! In this way, we will be able to compute the function for all paths. So the key is: each time we select a node V and compute F for all paths passing through it, utilizing the fact that V breaks such a path into two parts, each of which we can handle with a DFS. Most of the time we will maintain some sort of lookup table while going down a subtree, so that we can compute F over paths with one endpoint at our current node and the other in a previously visited subtree. Then we delete V and recurse on each of the resulting trees individually.

Centroid

This is all nice and fine, but how do we find a reasonably good node V each time, so that both finding it and the overall algorithm are fast? If we just selected a random node, the whole thing could very well take quadratic time overall. Thankfully, this is where the centroid saves us.

Firstly, let's just assume that once we have a node V, we can do those calculations in O(size_of_tree) time, since after all we are basically doing a DFS. (Note that this mostly depends on the problem, but is generally never more than O(size_of_tree * log n) time.)

So we define a centroid of the tree as a node such that, if we root the tree at it, all of its subtrees have size at most size_of_tree/2. Now suppose we can find the centroid in O(size_of_tree) time too.
So how do we compute the overall time complexity? Well, think of each deletion event as a branching point. At each deletion, or "branching", the resulting trees all have size at most size_of_tree/2, so we can branch consecutively at most log(n) times. Now, at each level of this branching, the TOTAL sum of sizes of all trees formed after K branchings is always bounded by O(n). Since we assumed that the computation for a given centroid takes time linear in its tree's size, the total time spent at each level (which consists of all the trees formed after K branchings) is also O(n). As the number of levels is log n, the total time complexity is O(n log n).

This is quite a common method of computing the time complexity of divide and conquer techniques. It is analogous to the analysis of merge sort, for example, but instead of dividing into exactly 2 parts at each stage we break into multiple trees; the sum of all node counts at a level is still bounded by the same number, which is all that matters.

How to find a centroid

Finding the centroid is explained pretty well here. I won't be too repetitive :P
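Still, for completeness, here is one common way to do it (a sketch with my own names): one DFS to compute subtree sizes, then walk from the start node towards any subtree holding more than half the component.

```python
def find_centroid(adj, start, removed=frozenset()):
    # adj[u] = list of (neighbor, weight); `removed` marks deleted centroids.
    # Pass 1: subtree sizes rooted at `start` (iterative post-order).
    size, order, stack = {}, [], [(start, None)]
    while stack:
        u, p = stack.pop()
        order.append((u, p))
        for v, _ in adj[u]:
            if v != p and v not in removed:
                stack.append((v, u))
    for u, p in reversed(order):       # children are filled before parents
        size[u] = 1 + sum(size[v] for v, _ in adj[u]
                          if v != p and v not in removed)
    # Pass 2: descend while some subtree has more than half the nodes.
    n, u, p = size[start], start, None
    while True:
        big = next((v for v, _ in adj[u]
                    if v != p and v not in removed and size[v] * 2 > n), None)
        if big is None:
            return u
        u, p = big, u

# Unweighted path 1-2-3-4: node 2 (or 3) is a centroid; this walk finds 2.
line = {1: [(2, 1)], 2: [(1, 1), (3, 1)], 3: [(2, 1), (4, 1)], 4: [(3, 1)]}
print(find_centroid(line, 1))  # 2
```

The descent terminates because every step moves into a strictly heavy subtree, and the part left behind then has fewer than half the nodes, so it can never pull us back.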
