- Topological Sort
- Strongly Connected Components
- Related Concepts

- Required: CLRS 3rd Ed. Sections 22.4-22.5.
- Screencasts 14F (also in Laulima)

A **directed acyclic graph** (DAG) is a good model for processes and structures that have
partial orders: You may know that *a* > *c* and *b* > *c*
but may not have information on how *a* and *b* compare to each other.

One can always make a **total order** out of a partial order. This is what topological sort
does. A **topological sort** of a DAG is a linear ordering of vertices such that if (*u*,
*v*) ∈ *E* then *u* appears somewhere before *v* in the ordering.

Some real world examples include

- Scheduling 100,000 independent tasks on a high performance computing system (research by Dr. Henri Casanova)

Here is the book's example ... a hypothetical professor (not me!) getting dressed *(what node
did they start the search at? Could it have been done differently?)*:

We can make it a bit more complex, with a catcher's outfit:

*The answer given starts with the batting glove and works left across the unvisited nodes. What
if we had started the search with socks and worked right across the top nodes? If you put your
clothes on differently, how could you get the desired result? Hint: add an edge.*

As noted previously, one could start with any vertex, and once the first tree was constructed continue with
any arbitrary remaining vertex. It is not necessary to start at the vertices at the top of the
diagram. *Do you see why?*

Time analysis is based on simple use of DFS: Θ(*V* + *E*).
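As a rough illustration of the DFS-based approach, here is a minimal Python sketch. The function name `topological_sort`, the edge-list input format, and the five-item clothing example are assumptions made for this sketch. The example is a small subset of dressing constraints, not the full figure.

```python
from collections import defaultdict

def topological_sort(vertices, edges):
    """Return the vertices in a topological order (assumes the input is a DAG)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    visited = set()
    order = []  # vertices are appended as they finish

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.append(u)  # u finishes only after everything reachable from it

    for u in vertices:  # any start order works
        if u not in visited:
            visit(u)

    order.reverse()  # decreasing finishing time
    return order

# Hypothetical subset of the getting-dressed constraints: (before, after).
clothes = ["socks", "undershorts", "pants", "shoes", "belt"]
needs = [("socks", "shoes"), ("undershorts", "pants"),
         ("pants", "shoes"), ("pants", "belt")]
result = topological_sort(clothes, needs)
```

Each vertex is visited once and each edge is examined once, which is where the Θ(*V* + *E*) bound comes from. Changing the order of `clothes` may produce a different, equally valid ordering.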

**Lemma**: A directed graph *G* is acyclic if and only if a DFS of *G* yields no back edges.

See text for proof, but it's quite intuitive:

⇒ A back edge by definition is returning to where one started, which means it completes a cycle.

⇐ When exploring a cycle the last edge explored will be a return to the vertex by which the cycle was entered, and hence classified a back edge.
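The connection between back edges and cycles can be checked directly with the white/gray/black coloring. The sketch below is one possible way to do it. The name `has_back_edge` and the dictionary-of-lists graph format (every vertex must appear as a key) are assumptions of this example.

```python
WHITE, GRAY, BLACK = 0, 1, 2

def has_back_edge(adj):
    """Return True if a DFS of the directed graph finds a back edge (a cycle)."""
    color = {u: WHITE for u in adj}

    def visit(u):
        color[u] = GRAY  # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GRAY:
                return True  # edge to an ancestor still being explored
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK  # u is finished
        return False

    return any(color[u] == WHITE and visit(u) for u in adj)

dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

Here `has_back_edge(dag)` is `False` and `has_back_edge(cyclic)` is `True`: in the second graph the edge from `c` back to `a` is found while `a` is still gray.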

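**Theorem:** If *G* is a DAG, then ordering its vertices by decreasing DFS finishing time produces a topological sort of *G*.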

It suffices to show that

if (*u*, *v*) ∈ *E* then *v.f* < *u.f*

because then the linked list ordering by *f* will respect the graph topology.

When we explore (*u*, *v*), what are the colors of *u* and *v*?

- *u* is gray, because it is being explored when (*u*, *v*) is found.
- Can *v* be gray too? No, because then *v* would be an ancestor of *u*, meaning (*u*, *v*) is a back edge, contradicting the DAG property by the lemma above.
- Is *v* white? Then it becomes a descendant of *u*. By the parentheses theorem, *u.d* < *v.d* < *v.f* < *u.f*.
- Is *v* black? Then *v* is finished. Since we are exploring (*u*, *v*) we have not finished *u*. Therefore *v.f* < *u.f*.
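The case analysis above can be spot-checked on a concrete graph. The following sketch records discovery and finishing times and tests the inequality *v.f* < *u.f* on every edge. The helper name `dfs_times` and the five-vertex DAG are hypothetical, chosen only for illustration.

```python
def dfs_times(adj):
    """Run DFS over all vertices; return discovery times d and finishing times f."""
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time  # u turns gray
        for v in adj[u]:
            if v not in d:  # v is white
                visit(v)
        time += 1
        f[u] = time  # u turns black

    for u in adj:
        if u not in d:
            visit(u)
    return d, f

# Hypothetical DAG; every vertex appears as a key.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
d, f = dfs_times(dag)

# The property from the proof: for every edge (u, v), v finishes before u.
edges_ok = all(f[v] < f[u] for u in dag for v in dag[u])

# Listing vertices by decreasing finishing time gives a topological order.
by_finish = sorted(dag, key=lambda u: f[u], reverse=True)
```

A single example does not prove the theorem, but it shows how the finishing times line up with the edge directions.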

Given a directed graph *G* = (*V*, *E*), a **strongly connected component
(SCC)** of *G* is a maximal set of vertices *C* ⊆ *V* such that for all
*u*, *v* ∈ *C*, there is a path both from *u* to *v* and from *v*
to *u*.

What are the Strongly Connected Components?

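The algorithm uses *G ^{T}* = (*V*, *E ^{T}*), the transpose of *G*, where *E ^{T}* = {(*u*, *v*) : (*v*, *u*) ∈ *E*}. That is, *G ^{T}* is *G* with all edges reversed.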

Strongly-Connected-Components (*G*)

1. Call DFS(*G*) to compute finishing times *u.f* for each vertex *u* ∈ *V*.
2. Compute *G ^{T}*.
3. Call modified DFS(*G ^{T}*) that considers vertices in order of decreasing *u.f* from line 1.
4. Output the vertices of each tree in the depth-first forest formed in line 3 as a separate strongly connected component.
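The four steps can be sketched in Python as follows. This is an illustrative rendering, not the book's code. The function name, the dictionary-of-lists representation, and the eight-vertex edge list are assumptions made for this example; the edge list is not copied from the figure.

```python
def strongly_connected_components(adj):
    """Two-pass DFS for SCCs. adj maps every vertex to a list of its successors."""
    # Line 1: DFS on G, recording vertices in order of increasing finishing time.
    visited, finish_order = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Line 2: compute the transpose by reversing every edge.
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Line 3: DFS on the transpose, starting from vertices in decreasing u.f.
    assigned, components = set(), []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in adj_t[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finish_order):
        if u not in assigned:
            comp = []
            collect(u, comp)
            components.append(comp)  # Line 4: each tree is one SCC

    return components

# Hypothetical directed graph on vertices a..h.
graph = {
    "a": ["b"], "b": ["c", "e", "f"], "c": ["d", "g"], "d": ["c", "h"],
    "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"],
}
sccs = strongly_connected_components(graph)
```

On this input the components come out as the vertex sets {a, b, e}, {c, d}, {f, g}, and {h}. The recursion is fine for small graphs; very deep graphs would need an explicit stack.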

*(This is from my own attempt to understand the algorithm. It differs from the book's formal
proof.)*

*G* and *G ^{T}* have the same SCC.

- If *u* and *v* are in the same SCC in *G*, then there is a path *p*_{1} from *u* to *v* and a path *p*_{2} from *v* to *u*.
- Reversing the edges, path *p*_{1} becomes a path from *v* to *u* and *p*_{2} becomes a path from *u* to *v*.

A DFS from any vertex *v* in a SCC *C* will reach *all* vertices in *C* (by
definition of SCC).

- Then why can't we call DFS on unvisited vertices to find the SCCs in the first pass, line 1?
- Because this first unconstrained DFS could also get vertices *not* in *C*, as there may be a path from *v* in *C* to *v'* where there is no path from *v'* to *v* (so *v'* is not in *C*)!

So how does the second search on *G ^{T}* help avoid inadvertent inclusion of *v'*?

- *v'* will have an earlier finishing time than some of the other vertices in *C*, because at least some of those vertices (in particular, *v* from which *v'* was reached) are still active (gray) when *v'* is finished (Parentheses Theorem).
- In the second search, the component *C* to which *v* belongs is processed before *v'* and its component, because *v* has a later finishing time, so the entire component will be explored before other components (in particular, that containing *v'*).
- Since *G ^{T}* has the same SCC as *G*, the component found from *v* in the second search is the same component as in the previous search.
- But in this second search, *v'* will *not* be reached. Why? Because we are using reversed edges in *G ^{T}*. If *v'* could be reached from *C* in *G ^{T}*, then *v* would be reachable from *v'* in *G*, and so *v'* would be a member of *C*, a contradiction.
- So, due to the topological sort, the trees constructed in the second search cannot contain vertices *v'* that do not belong to the SCC of the other vertices in the tree. This along with the fact that any DFS from a vertex *v* in a SCC *C* will find *all* vertices in *C* means that the trees constructed by the second DFSs find exactly the vertices in the SCCs.

In the example above, notice how node *c* corresponds to *v* and *g* to *v'*
in the argument above. But we need to also say why nodes like *b* will never be reached from
*c* in the second search. It is because *b* finished later in the first search, so was
processed earlier and already "consumed" by the correct SCC in the second search, before the search
from *c* could reach it. The following fact is useful in understanding why this would be the
case.

- Define *G ^{SCC}* to be a graph of the SCCs of *G* obtained by collapsing all the vertices in each SCC into one vertex for the component but retaining the edges between SCCs.
- Then this **component graph** *G ^{SCC}* is a Directed Acyclic Graph. (If there were any cycles, vertices in each component would be reachable from all others, so they would be one component.)
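Collapsing each SCC into a single vertex is simple once the components are known. The sketch below assumes the components have already been computed and are passed in as a list of vertex lists. The function name `component_graph`, the sample graph, and its component list are hypothetical.

```python
def component_graph(adj, components):
    """Return the edge set of the component graph, using component indices as vertices."""
    comp_of = {u: i for i, comp in enumerate(components) for u in comp}
    edges = set()
    for u in adj:
        for v in adj[u]:
            if comp_of[u] != comp_of[v]:  # keep only edges between different SCCs
                edges.add((comp_of[u], comp_of[v]))
    return edges

# Hypothetical graph and its SCCs (assumed to be known already).
graph = {
    "a": ["b"], "b": ["c", "e", "f"], "c": ["d", "g"], "d": ["c", "h"],
    "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"],
}
components = [["a", "b", "e"], ["c", "d"], ["f", "g"], ["h"]]
scc_edges = component_graph(graph, components)
```

In this example every edge of the component graph goes from a lower-numbered component to a higher-numbered one, so the listed order of `components` is itself a topological order of the component graph, and there can be no cycle.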

Here is *G ^{SCC}* for the above example:

The first pass of the SCC algorithm essentially does a topological sort of the graph
*G ^{SCC}* (by doing a topological sort of constituent vertices). The second pass visits
the components of

Thus, for example, the component *abe* is processed first in the second search, and since
this second search is of *G ^{T}* (reverse the arrows above) one can't get to

Start at the node indicated by the arrow and conduct a DFS:

We have provided an informal justification of correctness: please see the CLRS book for a formal proof of correctness for the SCC algorithm.

The CLRS text says we can create *G ^{T}* in Θ(*V* + *E*) time using adjacency lists.

- The easy approach is to simply copy the graph, but given the size of some graphs we work with, it would be much better to reverse the edges in place (and reverse them back when done).
*Problem for class: How can this be done? (A naive implementation could end up undoing some of its own work, as it confuses already-reversed edges with those to be reversed.)*
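For reference, the easy copying approach might look like the sketch below. The in-place version is left as the class problem. The function name `transpose` and the dictionary-of-lists representation are assumptions of this example.

```python
def transpose(adj):
    """Return a new adjacency-list graph with every edge of adj reversed."""
    adj_t = {u: [] for u in adj}  # one empty list per vertex
    for u, neighbors in adj.items():  # each edge is handled exactly once
        for v in neighbors:
            adj_t[v].append(u)  # edge (u, v) becomes (v, u)
    return adj_t

g = {"a": ["b", "c"], "b": ["c"], "c": []}
g_t = transpose(g)
```

Building the empty lists touches each vertex once and the nested loops touch each edge once, so the copy takes time linear in the size of the adjacency lists. The cost is the extra memory for a second graph.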

The SCC algorithm also has two calls to DFS, and these are Θ(V + E) each.

All other work is constant, so the overall time complexity is Θ(V + E).

An **articulation point** or **cut vertex** is a vertex that when removed causes a
(strongly) connected component to break into two components.

A **bridge** is an edge that when removed causes a (strongly) connected component to break
into two components.
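Both definitions can be tested by brute force: remove the candidate and count components again. The sketch below does this for an undirected graph, which is a simplifying assumption, and is meant only to illustrate the definitions. It is not an efficient algorithm, and all function names and the sample graph are hypothetical.

```python
def count_components(vertices, edges):
    """Count connected components of an undirected graph with an iterative DFS."""
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for s in vertices:
        if s in seen:
            continue
        count += 1
        seen.add(s)
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def articulation_points(vertices, edges):
    """Vertices whose removal increases the number of components."""
    base = count_components(vertices, edges)
    found = []
    for x in vertices:
        rest = [u for u in vertices if u != x]
        kept = [(u, v) for u, v in edges if x not in (u, v)]
        if count_components(rest, kept) > base:
            found.append(x)
    return found

def bridges(vertices, edges):
    """Edges whose removal increases the number of components."""
    base = count_components(vertices, edges)
    return [e for e in edges
            if count_components(vertices, [k for k in edges if k != e]) > base]

# Hypothetical example: a triangle a-b-c with a pendant vertex d attached to c.
vs = ["a", "b", "c", "d"]
es = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
cut_vertices = articulation_points(vs, es)
cut_edges = bridges(vs, es)
```

Here `c` is the only articulation point and the edge between `c` and `d` is the only bridge, while the triangle edges all lie on a common cycle.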

A **biconnected component** is a maximal set of edges such that any two edges in the set lie
on a common simple cycle. This means that there is no bridge (at least two edges must be removed to
disconnect it). This concept is of interest for network robustness.

Nodari Sitchinava (based on material by Dan Suthers) Some images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition.