Graphs For Beginners - Part - 2

Some More Notations

I will start with some notation first. Though this will be very short.

First, when talking about nodes, we just refer to them using numbers. Or using alphabets. When using examples, it's rare that we will ever consider more than 26 nodes ( normally less than 10).

Second, when talking about edges, we refer to them as e = (u,v) which means that edge e connects node u to node v. Somehow we generally use characters like u,v,w to refer to nodes. Just a preference. 

When referring to undirected graphs, we use lines to represent them. Directed versions generally have arrows to point form where the edge is starting and where it is ending.

When referring to a graph, we use the notation G = (V,E) which means that Graph G has vertices belonging to the set V and edges belonging to the set E. Note that the edges in E has its endpoints as vertices in V. (think about it... if it were not so.. then that edge would not have much meaning).

Another thing that is often referred to is the degree of a node. For an undirected graph, degree is just he number of edges whose one endpoint is that node. Visually, it is the number of edges "incident" on a node. 

For a directed graph, degree is again split into two : outdegree and indegree. The names are quite transparent in this regard : outdegree means the number of edge going out of this node and indegree means number of edge ending at this node.

Lastly, as it should be obvious, the neighbourhood of a node basically means all those nodes connected to it by an edge. Two nodes that are connected by an edge are called neighbours.

Graph representation

There are more or less three ways to represent a graph.


First, is the edgelist.


This is as simple as it sounds: we just maintain a list of all edges in the form of pairs (u,v) [Edge connecting U to V]

When the edges have weights, we store them as 3-tuples - (u,v,w) where w is the edge weight. 



Advantages : We are able to sort the edges directly by their weights. Also, suppose some problem requires that we compute some kind of answer for each edge. Since edgelist is often stored in an array, we can easily set information to it using array indexing. Moreover, we can also maintain the order in which the edges were given in the input too, which is sometimes useful in some problems when we have to print information about the edges in the order that they were given in the input.


Disadvantages: Well, the biggest disadvantage is that given a vertex u, we cannot get the edges incident on it very quickly. Rather, we have to scan through the entire list. Each time. Needless to say, we won't go much far using this method.


Second, Adjacency Matrix Representation

Here, if we have N nodes in our graph, we create a NxN matrix to store the necessary information. The cell (i,j) stores whether there is an edge from i to j. In case of an unweighted graph, we can just store a boolean value which says that whether there is a edge going from i to j. In case of a weighted graph we can keep -1 if it does not exist or the edge weight if it DOES exist.

If you observe carefully, it can be seen that the for an undirected graph, the matrix is actually symmetric along the diagonal, because there, cells (i,j) and (j,i) will store the EXACT same values. Always.





Advantages: Mainly, and more or less the only, advantage is the constant time checking of existance of an edge between two nodes. If we want to see if there is an edge from node i to j, we just check the value of cell (i,j) in the matrix. As simple as that. If it contains 0 or -1 (whatever special value you are using to denote the absence of an edge) then we know that no edge is there. Otherwise we get the weight of the edge.


Disadvantage: Memory. See that each edge basically corresponds to 1 cell in case of a directed graph, while 2 in case of an undirected graph. Now suppose we have 1e5 nodes and 2e5 directed edges. Then the size of the matrix becomes 1e5 * 1e5 = 1e10. Normally we can access atmost 5e7 "units" of memory and this clearly exceeds that. By a lot. Moreover note that this matrix will have only 2e5 of its cells filled with some useful value. Obviously thus this is a very poor representation in terms of memory as it wastes a LOT of precious memory.


Third, Adjacency list

This is by far the most commonly used representation of a graph. Here, for each node, we keep a list of the nodes that go out from it. This way, just by iterating over this list, we can get which nodes it is connected to it.


Advantages : We can iterate over the list of a particular node to get which all nodes it is connected to. Also note that this uses O(edge_count) memory as each edge is present in the adjacency list of atmost two vertices. So total amount of this data is atmost 2*edge_count when summed over all vertices. O(edge_count).

Disadvantages: Constant time checking as for existance of edge as we had in a matrix representation is no longer possible. Similarly, the original ordering of the edges is not preserved anymore.


Adjacency Matrix and Lists are the most popular represantations. Mostly adjacency list though. So get acquainted with it. Still, edgelists are sometimes useful to, according to the problem though.






Comments

Popular posts from this blog

Small-to-Large

Segment Tree Beats - An introduction

Getting prepared for RMO