# ICS 311 #14B: Graphs

## Outline

1. Topological Sort
2. Strongly Connected Components
3. Related Concepts

• Required: CLRS 3rd Ed. Sections 22.4-22.5.
• Screencasts 14F (also in Laulima)

## Topological Sort

A directed acyclic graph (DAG) is a good model for processes and structures that have partial orders: You may know that a > c and b > c but may not have information on how a and b compare to each other.

One can always make a total order out of a partial order. This is what topological sort does. A topological sort of a DAG is a linear ordering of vertices such that if (u, v) ∈ E then u appears somewhere before v in the ordering.
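To make the definition concrete, here is a small checker. It is a sketch that assumes vertices are hashable labels and edges are given as (u, v) pairs; the function name `is_topological_order` is our own. Each relation such as a > c from the partial-order example is modeled as an edge (a, c).

```python
def is_topological_order(order, edges):
    """Return True if no vertex repeats in `order` and every edge (u, v)
    has u placed somewhere before v."""
    position = {v: i for i, v in enumerate(order)}
    if len(position) != len(order):
        return False                      # some vertex is listed twice
    return all(u in position and v in position and position[u] < position[v]
               for u, v in edges)

# a > c and b > c, modeled as edges (a, c) and (b, c)
edges = [("a", "c"), ("b", "c")]
print(is_topological_order(["a", "b", "c"], edges))  # True
print(is_topological_order(["b", "a", "c"], edges))  # True: a, b unordered
print(is_topological_order(["c", "a", "b"], edges))  # False
```

Both of the first two orders pass, which reflects the point above: the partial order says nothing about a versus b, so either total order is acceptable.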

### Outline of Algorithm:

Topological-Sort(G) is actually a modification of DFS(G) in which each vertex v is inserted onto the front of a linked list as soon as finishing time v.f is known.
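The following is a minimal Python sketch of this idea. It assumes the DAG is stored as a dict that maps every vertex to a list of its successors; the representation and the name `topological_sort` are our own choices. A `collections.deque` stands in for the linked list, because `appendleft` inserts at the front in O(1). The sketch does not check for cycles: it assumes the input is already a DAG.

```python
from collections import deque

def topological_sort(adj):
    """Topological sort of a DAG given as {vertex: [successors]}.
    Each vertex is inserted at the front of `order` as soon as it finishes."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}
    order = deque()                 # plays the role of the linked list

    def visit(u):
        color[u] = GRAY             # u discovered (u.d)
        for v in adj[u]:
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK            # u finished (u.f is known) ...
        order.appendleft(u)         # ... so put it at the front

    for u in adj:                   # outer DFS loop over remaining white vertices
        if color[u] == WHITE:
            visit(u)
    return list(order)

print(topological_sort({"a": ["c"], "b": ["c"], "c": []}))  # ['b', 'a', 'c']
```

In the example, the search from a finishes c and then a. The outer loop then starts a new tree at b, which finishes last and so lands at the front. The recursion depth grows with the longest path, so very large graphs would call for an iterative version in Python.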

### Examples

Some real world examples include

• Scheduling 100,000 independent tasks on a high performance computing system (research by Dr. Henri Casanova)
• Producing 5,000,000 documents that reference each other such that each document is produced before the ones that reference it.

Here is the book's example: a hypothetical professor (not me!) getting dressed. What node did they start the search at? Could it have been done differently?

We can make it a bit more complex with a catcher's outfit. The answer given starts with the batting glove and works left across the unvisited nodes. What if we had started the search with socks and worked right across the top nodes? If you put your clothes on differently, how could you get the desired result? Hint: add an edge.
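To experiment with the question of where the search starts, the sketch below runs a DFS-based topological sort on the getting-dressed graph with two different outer-loop orders. The edge list is our own transcription of the book's example and the helper names are hypothetical, so check the edges against the figure. The two results differ, yet both are valid topological sorts.

```python
def topological_sort(adj, vertex_order):
    """DFS-based topological sort; `vertex_order` decides which unvisited
    vertex the outer loop starts the next depth-first tree from."""
    visited, finished = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)          # u.f is known: record u

    for u in vertex_order:
        if u not in visited:
            visit(u)
    return finished[::-1]           # decreasing finishing time

def respects_edges(order, adj):
    """True if every edge (u, v) has u placed before v."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for u in adj for v in adj[u])

# Getting-dressed DAG (our transcription): edge (x, y) means "put on x before y".
dressing = {
    "undershorts": ["pants", "shoes"],
    "pants": ["belt", "shoes"],
    "belt": ["jacket"],
    "shirt": ["belt", "tie"],
    "tie": ["jacket"],
    "jacket": [],
    "socks": ["shoes"],
    "shoes": [],
    "watch": [],
}

order1 = topological_sort(dressing, list(dressing))
order2 = topological_sort(dressing, ["socks", "shirt", "watch", "undershorts",
                                     "pants", "belt", "tie", "jacket", "shoes"])
print(order1)
print(order2)
```

Starting at undershorts gives watch, socks, shirt, tie, undershorts, pants, shoes, belt, jacket. Starting at socks gives undershorts, pants, watch, shirt, tie, belt, jacket, socks, shoes.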

As noted previously, one could start with any vertex, and once the first tree was constructed, continue with any arbitrary remaining vertex. It is not necessary to start at the vertices at the top of the diagram. Do you see why?

### Time Analysis

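The time analysis follows directly from DFS. The DFS takes Θ(V + E), and inserting each of the |V| vertices at the front of the linked list takes O(1) per vertex, so Topological-Sort runs in Θ(V + E).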

### Correctness

Lemma: A directed graph G is acyclic if and only if a DFS of G yields no back edges.

See text for proof, but it's quite intuitive:

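⇒ (contrapositive) A back edge (u, v) goes from u to an ancestor v in the depth-first tree. Together with the tree path from v down to u, it completes a cycle.
⇐ (contrapositive) Suppose G has a cycle, and let v be the first vertex of the cycle that the DFS discovers. By the white-path theorem, every other vertex on the cycle becomes a descendant of v. So when the search explores the cycle edge (u, v) that returns to v, v is an ancestor of u, and the edge is classified as a back edge.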
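The lemma translates directly into a cycle test. In this sketch, GRAY marks the vertices on the current DFS path, so reaching a GRAY vertex means the edge is a back edge. The sketch uses the same assumed adjacency-dict representation, and `has_cycle` is our own name.

```python
def has_cycle(adj):
    """Return True iff a DFS of the directed graph {vertex: [successors]}
    finds a back edge, i.e. an edge (u, v) with v still GRAY."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:              # back edge: v is an ancestor of u
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK                      # edges to BLACK vertices are harmless
        return False

    return any(color[u] == WHITE and visit(u) for u in adj)

print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))       # True
print(has_cycle({"a": ["b", "c"], "b": ["c"], "c": []}))     # False
```

A self-loop (u, u) is also detected, because u is GRAY when its own edge is examined.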

Theorem: If G is a DAG then Topological-Sort(G) correctly produces a topological sort of G.

It suffices to show that

if (u, v) ∈ E then v.f < u.f

because then the linked-list ordering by decreasing f will respect the graph topology.

When we explore (u, v), what are the colors of u and v?

• u is gray, because it is being explored when (u, v) is found.
• Can v be gray too? No, because then v would be an ancestor of u, meaning (u, v) is a back edge, contradicting the DAG property by the lemma above.
• Is v white? Then it becomes a descendant of u, and by the parentheses theorem, u.d < v.d < v.f < u.f.
• Is v black? Then v is finished. Since we are exploring (u, v) we have not finished u. Therefore v.f < u.f.
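The theorem's key inequality can be checked on examples. The sketch below records u.d and u.f with a single clock, in the style of CLRS's DFS, and then tests v.f < u.f on every edge. It assumes adjacency-dict input, and the names `dfs_times` and `finishing_order_ok` are hypothetical. On a graph with a cycle, the check fails at the back edge, which is exactly the case the proof rules out.

```python
def dfs_times(adj):
    """Return (d, f): discovery and finishing times from a full DFS,
    using one clock that ticks at every discovery and every finish."""
    time = 0
    d, f = {}, {}

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # u turns gray
        for v in adj[u]:
            if v not in d:          # v is white
                visit(v)
        time += 1
        f[u] = time                 # u turns black

    for u in adj:
        if u not in d:
            visit(u)
    return d, f

def finishing_order_ok(adj):
    """True if v.f < u.f holds for every edge (u, v)."""
    _, f = dfs_times(adj)
    return all(f[v] < f[u] for u in adj for v in adj[u])

dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["a"]}
print(finishing_order_ok(dag))      # True
print(finishing_order_ok(cyclic))   # False: (b, a) is a back edge
```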

## Strongly Connected Components

Given a directed graph G = (V, E), a strongly connected component (SCC) of G is a maximal set of vertices $C \subseteq V$ such that for all $u, v \in C$, there is a path both from u to v and from v to u.
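As a reference point, the definition can be applied literally: compute what each vertex can reach, then group the vertices that reach each other. This is not the algorithm presented in this section. The sketch assumes an adjacency dict in which every vertex is a key, the function names are hypothetical, and the cost is O(V(V + E)).

```python
def reachable_from(adj, s):
    """Set of vertices reachable from s (including s), via iterative DFS."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def sccs_by_definition(adj):
    """Group vertices by mutual reachability: u and v share an SCC iff
    each is reachable from the other."""
    reach = {u: reachable_from(adj, u) for u in adj}
    components, assigned = [], set()
    for u in adj:
        if u in assigned:
            continue
        comp = {v for v in reach[u] if u in reach[v]}   # maximal by construction
        components.append(comp)
        assigned |= comp
    return components

print(sccs_by_definition({"a": ["b"], "b": ["a", "c"], "c": []}))
```

Here a and b reach each other, while c reaches only itself, so the components are {a, b} and {c}.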

#### Example:

What are the Strongly Connected Components?

### Algorithm

The algorithm uses $G^T = (V, E^T)$, the transpose of $G = (V, E)$: $G^T$ is $G$ with all the edges reversed.
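Computing the transpose takes one pass over the adjacency lists. The minimal sketch below builds a new copy rather than reversing edges in place. The name `transpose` and the dict-of-lists representation are our assumptions.

```python
def transpose(adj):
    """Return G^T as a new adjacency dict: each edge (u, v) of G
    becomes (v, u). One pass over all vertices and edges: Theta(V + E)."""
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)
    return adj_t

print(transpose({"a": ["b", "c"], "b": ["c"], "c": []}))
# {'a': [], 'b': ['a'], 'c': ['a', 'b']}
```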

```
Strongly-Connected-Components(G)
1.  Call DFS(G) to compute finishing times u.f for each vertex u ∈ V.
2.  Compute G^T.
3.  Call a modified DFS(G^T) that considers vertices
    in order of decreasing u.f (as computed in line 1).
4.  Output the vertices of each tree in the depth-first forest
    formed in line 3 as a separate strongly connected component.
```
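Below is a Python sketch of the four steps. It assumes a dict-of-lists graph in which every vertex is a key, and the function name is hypothetical. Instead of sorting by u.f, the first pass appends each vertex to a list as it finishes, so reading that list backwards gives decreasing finishing times. The example edge list is our own assumption, built to have the components abe, cd, fg and h that the discussion below refers to.

```python
def strongly_connected_components(adj):
    """Two-pass SCC sketch; returns a list of components (lists of vertices)."""
    # Line 1: DFS(G), recording vertices in order of increasing u.f.
    visited, finish_order = set(), []

    def dfs1(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        finish_order.append(u)          # u.f is known now

    for u in adj:
        if u not in visited:
            dfs1(u)

    # Line 2: compute G^T.
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Lines 3-4: DFS(G^T), trying roots in order of decreasing u.f;
    # each depth-first tree found is one SCC.
    visited.clear()
    components = []

    def dfs2(u, comp):
        visited.add(u)
        comp.append(u)
        for v in adj_t[u]:
            if v not in visited:
                dfs2(v, comp)

    for u in reversed(finish_order):
        if u not in visited:
            comp = []
            dfs2(u, comp)
            components.append(comp)
    return components

# Assumed example graph with components abe, cd, fg, h.
g = {"a": ["b"], "b": ["c", "e", "f"], "c": ["d", "g"], "d": ["c", "h"],
     "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"]}
print(strongly_connected_components(g))
```

With this edge list, the first pass (started at a) finishes the vertices in the order h, d, f, g, c, e, b, a. The second pass then outputs the components abe, cd, fg and h, in that order.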

### Example 1

(Figures: first and second passes of DFS.)

### Why it Works

#### Informal Explanation

(This is from my own attempt to understand the algorithm. It differs from the book's formal proof.)

G and $G^T$ have the same SCCs. Proof:

• If u and v are in the same SCC in G, then there is a path $p_1$ from u to v and a path $p_2$ from v to u.
• When the edges are reversed, $p_1$ becomes a path from v to u and $p_2$ becomes a path from u to v.
• The converse follows by applying the same argument to $G^T$, since $(G^T)^T = G$.

A DFS from any vertex v in an SCC C will reach all vertices in C (by definition of SCC).

• Then why can't we call DFS on unvisited vertices to find the SCCs in the first pass, line 1?
• Because this first unconstrained DFS could also get vertices not in C, as there may be a path from v in C to v' where there is no path from v' to v (so v' is not in C)!

So how does the second search on $G^T$ help avoid inadvertent inclusion of v' in C?

• v' will have an earlier finishing time than some of the other vertices in C, because at least some of those vertices (in particular, v from which v' was reached) are still active (gray) when v' is finished (Parentheses Theorem).
• In the second search, the component C to which v belongs is processed before v' and its component, because v has a later finishing time, so the entire component will be explored before other components (in particular, that containing v').
• Since $G^T$ has the same SCCs as G, the component found from v in the second search is the same component as in the first search.
• But in this second search, v' will not be reached. Why? Because we are using the reversed edges of $G^T$. If v' could be reached from C in $G^T$, then v would be reachable from v' in G, and so v' would be a member of C, a contradiction.
• So, due to the topological ordering, the trees constructed in the second search cannot contain vertices v' that do not belong to the SCC of the other vertices in the tree. Any DFS from a vertex v in an SCC C also finds all vertices in C. Together, these two facts mean that each tree constructed by the second DFS contains exactly the vertices of one SCC.

In the example above, notice how node c corresponds to v and g to v' in the argument. But we also need to say why nodes like b will never be reached from c in the second search. Because b finished later in the first search, it was processed earlier in the second search and already "consumed" by its correct SCC before the search from c could reach it. The following fact is useful in understanding why this is the case.

#### Component Graph

• Define $G^{SCC}$ to be a graph of the SCCs of G, obtained by collapsing all the vertices in each SCC into one vertex for the component but retaining the edges between SCCs.
• Then this component graph $G^{SCC}$ is a directed acyclic graph. (If it had a cycle, every vertex in the components on that cycle would be reachable from every other, so those components would form a single SCC, contradicting maximality.)
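One way to build the component graph is sketched below. It assumes the SCCs have already been computed and are passed in as a list; the function name `component_graph` is hypothetical. Parallel edges between the same pair of components collapse into one. The example reuses an assumed edge list whose components are abe, cd, fg and h.

```python
def component_graph(adj, components):
    """Build G^SCC: one node per component (its index in `components`),
    with an edge (i, j) whenever some edge of G goes from component i
    to a different component j."""
    comp_of = {v: i for i, comp in enumerate(components) for v in comp}
    scc_adj = {i: set() for i in range(len(components))}
    for u in adj:
        for v in adj[u]:
            if comp_of[u] != comp_of[v]:          # skip edges inside a component
                scc_adj[comp_of[u]].add(comp_of[v])
    return scc_adj

g = {"a": ["b"], "b": ["c", "e", "f"], "c": ["d", "g"], "d": ["c", "h"],
     "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"]}
components = [["a", "b", "e"], ["c", "d"], ["f", "g"], ["h"]]
print(component_graph(g, components))
# {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: set()}
```

Every edge here goes from a lower index to a higher one, so listing the components in this order is already a topological sort of $G^{SCC}$. In particular, this component graph has no cycle.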

(Figure: $G^{SCC}$ for the above example.)

The first pass of the SCC algorithm essentially does a topological sort of the graph $G^{SCC}$ (by ordering its constituent vertices by finishing time). The second pass visits the components of $(G^T)^{SCC}$ in the topologically sorted order of $G^{SCC}$, so that each component is searched before any component that can reach it in $G^T$.

Thus, for example, the component abe is processed first in the second search. Since this second search is of $G^T$ (reverse the arrows above), one can't get to cd from abe. When cd is subsequently searched, one can get to abe, but its vertices have already been visited, so they can't be incorrectly included in cd.

### Example 2

Start at the node indicated by the arrow and conduct a DFS.

### Analysis

We have provided an informal justification of correctness; please see the CLRS book for a formal proof of correctness for the SCC algorithm. The CLRS text says we can create $G^T$ in Θ(V + E) time using adjacency lists.

• The easy approach is to simply copy the graph, but given the size of some graphs we work with, it would be much better to reverse the edges in place (and reverse them back when done).
• Problem for class: How can this be done? (A naive implementation could end up undoing some of its own work, as it confuses already-reversed edges with those to be reversed.)

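The SCC algorithm also has two calls to DFS, and these are Θ(V + E) each.

The remaining work is at most linear. For example, line 3 needs no separate sort, because the first DFS can insert each vertex at the front of a list as it finishes (as in Topological-Sort), at O(1) per vertex. So the overall time complexity is Θ(V + E).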

## Related Graph Concepts

An articulation point or cut vertex is a vertex that, when removed, causes a (strongly) connected component to break into two or more components.

A bridge is an edge that when removed causes a (strongly) connected component to break into two components.
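For an undirected graph, these definitions can be tested directly: delete each vertex (or edge) in turn and count the connected components. The sketch below does exactly that. It is a brute-force illustration with hypothetical function names, and it takes O(V(V + E)) time for articulation points and O(E(V + E)) for bridges. A linear-time DFS-based method exists; the end-of-chapter problems of CLRS Chapter 22 develop one.

```python
def count_components(vertices, edges):
    """Number of connected components of an undirected graph."""
    nbrs = {v: [] for v in vertices}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    seen, count = set(), 0
    for s in vertices:
        if s in seen:
            continue
        count += 1                       # s starts a new component
        seen.add(s)
        stack = [s]
        while stack:
            u = stack.pop()
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

def articulation_points(vertices, edges):
    """Vertices whose removal increases the number of components."""
    base = count_components(vertices, edges)
    result = []
    for x in vertices:
        rest = [v for v in vertices if v != x]
        kept = [(u, v) for u, v in edges if x not in (u, v)]
        if count_components(rest, kept) > base:
            result.append(x)
    return result

def bridges(vertices, edges):
    """Edges whose removal increases the number of components."""
    base = count_components(vertices, edges)
    return [e for i, e in enumerate(edges)
            if count_components(vertices, edges[:i] + edges[i + 1:]) > base]

# A triangle a-b-c with a tail c-d.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(articulation_points(V, E))  # ['c']
print(bridges(V, E))              # [('c', 'd')]
```

The triangle edges lie on a cycle, so none of them is a bridge. Removing c, however, separates d from the rest of the graph.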

A biconnected component is a maximal set of edges such that any two edges in the set lie on a common simple cycle. This means that there is no bridge (at least two edges must be removed to disconnect it). This concept is of interest for network robustness.

Nodari Sitchinava (based on material by Dan Suthers)
Some images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition.