Spanning Trees

Spanning Trees in a Graph

We are given a graph with say N vertices and M edges. We call a subset of these edges a spanning tree of the graph so that if we keep only these edges, the graph still remains connected and so that the residual graph is a tree. This can also be thought of as specifying discreetly that we have to select exactly N-1 edges out of the total M that we are given.

Some points that should be specified here is that there can be a lot of different number of spanning trees of a graph. Also, ALL the nodes need to be connected by the set of edges that we had selected in our spanning tree.

Moreover, it doesn't really depend on whether we have weighted edges or unweighted edges, though we will shortly be looking at different types of spanning trees, some of which do take advantage of edge weights.

Finding Spanning Trees

To find any random spanning tree of a graph a simple DFS will obviously suffice. Assuming the graph is connected, the edges that we traversed during the DFS will form the spanning tree edge set. Same can be done using a BFS too. In this case, each time we visit a new node for the first time, we add the parent edge to the spanning tree set. In each case we can verify quite easily that the resultant set is such that it has exactly N-1 edges and that everything is connected if we consider ONLY those edges.

But finding random spanning trees is never quite what a problem usually looks for. So I will describe some special types of spanning trees that crop up from time to time.

Minimum Spanning Tree

We are initially given a weighted graph. We define the cost of a spanning tree as the sum of the edgeweights in the tree. Our task is to find a spanning tree whose cost is the minimum out of all the possible spanning trees possible. This can have varied applications as using a similar method we can also find a Maximum spanning tree --- Just negate all the edge weights!. Or a minimum product spanning tree! Here we take the log of the edgeweights. Still problems regarding minimum spanning trees are quite few.

To find a suitable minimum spanning tree, we primarily have 2 algorithms: Prim's algorithm which is basically a modifed dijkstra of sorts and then there is the Kruskal's algorithm which is far more useful. This is quite a comprehensive analysis of the Kruskal's algorithm and it goes against the convention of wikipedia articles being hard to understand :P

(The idea behind Kruskal's algorithm can be used in more areas as well. The technique of sorting the edges and going over them in ascending/descending order and merging nodes as required can be used in problems that are not related to minimum spanning tree problems too. )

(I wont talk much about MSTs or minimum spanning trees as this a topic which has a lot of material on the net.)

DFS-Spanning Tree

Remember earlier how we described a method to find a random spanning tree of the graph by just doing a DFS on the tree? Well, we call this tree a DFS-Spanning tree. The algorithm? Well, just start at a node and do a DFS! To visualize this tree, imagine that whenever you are visiting a new node, it's as if you are adding a child to your current node in the tree.

Here, we have a graph and a possible DFS-Spanning tree. Here we would have gone from 1 to 3 to 2 to 5, then backtracked till 3, then went to 6 and then to 4. However, what could we possibly gather from such a random selection of edges, as we generally do not include any special conditions when doing the DFS for creating this tree. Turns out, a lot. And all of it hinges on one key property :

BACK EDGES CAN ONLY GO TO AN ANCESTOR.

Firstly what is a backedge? Well, suppose we are doing a DFS and suppose we are at a node v. When we go over its adjacency list, there can be only two types of edges

Edges whose other endpoint is unvisited
Edges whose other endpoint is already visited

In the first case, we simply recurse to this new node and continue our DFS.

In the second case however, we do not recurse to this node. This edge is thus called a backedge: it connects our node to a node that has been already visited by our algorithm. This is also analogous to cycle detection -- Note that this back edge basically forms a cycle -- Since we had already visited the other endpoint, then there must be some path that we had taken to come from that node to this one(an advantage of DFS, we never "teleport" from one node to the other, there has to be connecting edges in between, even if we are backtracking) and since this edge is a brand new edge, that means that we have discovered an entirely new path to that node. Hence a cycle.

So why is our claim that a backedge only exists to one of my current node's ancestor? Well, suppose that it was in some other subtree which had branched out from one of my ancestors. Let this ancestor be lca = LCA(u,v) where v is my current node and u is the other visited endpoint.

Since it was in a different subtree of lca, that means it must have had backtracked from u to the lca before our current node v was reached. But if this edge were present, it would have visited v before it would have considered backtracking, because remember that backtracking is done ONLY when none of the neighbours are in a non-visited state. But if node v was visited from u, then it would have been a descendent of u itself and so our initial claim that u and v were in different subtrees would have been false.

Bridges and Articulation points

But is this property that useful? Why, yes!
For example, this can be used to find the articulation points and bridges in a tree. In short, articulation points are nodes such that, if we delete that node then the graph becomes disconnected. Similarly, bridges are edges upon whose deletion the graph becomes disconnected. How to know if a node is an articulation point? Just see if there is a back edge that from this node's subtree that goes to an ancestor of my current node. Correctness? well, a path from another subtree to my current node's subtree cannot exist by the same argument as with establishing that backedges can go only to my ancestor. Also suppose that there was no backedge that started from my subtree and went to one of my ancestors. Then the only path from, let's say, the root, to my subtree would be through my current node and if this node were to be deleted then the graph would be disconnected.
What we use here is a technique called a DFS lowlink. It basically assigns a "dfs timestamp" to each node, that is, after visiting how many nodes am i visiting my current one. A DFS-lowlink is basically the node with the least timestamp that i am connected to via a path that dosnt use parent edges through which we visited it(duhh. then we could just make the smallest timestamp node our dfs-lowlink). Basic idea is that since i can only go to an ancestor, if the dfs lowlink of any of my children is MORE than my DFS number, then it means that it could not have gotten "before" me per-se and so it would get disconnected when my current node is deleted. Bridges can be found using a similar idea.
Dizzy? read this tutorial for more details. I just gave a VERY short version :P

Other Usefullness

DFS-spanning tree can be used in a lot of other ways utilising the property that we can have backedges only to my ancestor.

For example, just to prove how important this property, the greatest advantage that we get is that we could maintain a segment tree while doing the DFS if we want. The segtree will basically contain the nodes that we visit, with the ith index containing the node with the ith depth in my current ancestor chain. Since backedges only go to my ancestors, each edge will be like a range on the segtree. When we move down, we can just update the depth[node]th index of the segtree. When backtracking, we dont even need to delete anything. Just overwrite it!

Another property is that we can detect cycles using this DFS spanning tree. Moreover, we can quite easily determine the length in case only disjoint cycles are present. If no such restriction is there, we can check for the existance of an odd cycle easily too. And many more fun stuff!

Basically what I mean to say is that, even in complex graphs, we can compute things as if doing them over a linear chain or an array. Which is significantly easier.

BFS-Spanning tree

The idea is similar to that of a DFS-spanning tree -- we keep the vertices visited as the childs of the current node and hence a tree like structure forms. However, this time no such nice properties exist, except perhaps that back edges could go only to other nodes of the current node's level or to a level above. Thus this is not too useful in solving problems.

A couple of problems

Problem 1:

We are given a hidden directed graph of N nodes and M edges. We are also given the outdegree and indegree of each node. Moreover, for each pair of node(u,v) there is atleast one edge either from u to v or v to u, or maybe both. As queries we can basically determine atmost N edges in the graph. We have to tell if the graph forms a SCC.

Solution:

Sort the nodes by their outdegree. Now select the node with lowest outdegree as the root. Now do a BFS. The resultant BFS-spanning tree will have the property that each node with level 2 and onwards(root is level 0) will have a directed edge to the root. This is because no edge exists from root to that node, otherwise it would have been in the first level, so it is NECESSARY that an edge from the node to the root exist. So these nodes will be part of the SCC. Again, for the nodes in the first level, we can prove that they have a edge to root as well. Suppose they did not. If they have a child then obviously they are a part of the SCC as the child will be a part as proved above. If they dont have a child, then either then, suppose they dont have a edge to the parent either. So they can have edges only to other nodes in the first level. Moreover these nodes too must be childless otherwise again we have an SCC. In the worst case, suppose all the nodes at the first level are childless. Let there be K such nodes. thus outdegree of root is K. Now, suppose each node has nodes to all other nodes in the first level. So it will have a degree of K-1. Clearly a contradiction since the root has the minimum outdegree. So it must have an edge to the root as well to have a outdegree atleast equal to K. Hence we can see that each node that can be visited from the root will be a part of the SCC. Thus it only suffices to check that all nodes are visible from the root.

Problem 2:

A graph is given with N nodes and M edges. We have to either print a 4-coloring of the graph vertices, OR we have to find a odd cycle so that, if we delete the edges of this odd cycle only, the graph still remains connected.

Solution:

First, find any spanning tree of the graph. Then delete the edges of the spanning tree. Now consider the graph that is left. If the graph is not bipartite, then there must exist an odd cycle. So we can just print this odd cycle.( using DFS to find the odd cycle). Note that the graph will remain connected even after deletion of the odd cycle because we still have out hidden spanning tree edges which keeps the graph connected all the time.

If the graph is bipartite, then we can easily get a 4-coloring since both this bipartite "subgraph" and the spanning tree itself, being a tree, is bipartite and hence 2-colorable. Hence we can combine all the edges and make a 4-coloring.

Comments

Shashwat ChandraJanuary 30, 2019 at 7:59 PM
mashallah blog , mashallah , insallah u win icpc 2022
AnshulSeptember 26, 2019 at 12:14 PM
I lost my mind when I read "How BIpartite is used with spanning tree". Who has planted these seeds into your brain?
Any book or something
My god I would have never been able to see that coming.
Taneka GreyMay 5, 2025 at 12:33 AM
Spanning trees are crucial for network design and optimization. If you're into tech, the yolov8 documentation also provides great insights into enhancing model performance for real-time tasks.

Search This Blog

Coding Start Up!