ICS 311 #14A: Graphs

Representations, BFS and DFS Search


Prerequisites review

Outline

  1. Graph Definitions & examples
  2. Graph ADT
  3. Representations (Implementations) of Graph ADT
  4. Breadth-first Search
  5. Depth-first Search

Readings and Screencasts

NOTE: This might seem like a lot, but you should know most of the material from ICS 241, and this should be a review for you.

The video lectures and notes below provide material not found in the textbook: defining graphs, an ADT, and implementations.


Graphs

Definitions

A graph G is a pair

G = (V, E)

where

V = {v1, v2, ... vn}, a set of vertices
E = {e1, e2, ... em} ⊆ V × V, a set of edges.

Undirected Graphs

In an undirected graph the edge set E consists of unordered pairs of vertices. That is, they are sets e = {u, v}. Edges can be written with this notation when clarity is desired, but we will often use parentheses (u, v).

No self loops are allowed in undirected graphs. That is, we cannot have (v, v), which would not make as much sense in the set notation {v, v}.

We say that e = {u, v} is incident on u and v, and that the latter vertices are adjacent. The degree of a vertex is the number of edges incident on it.

The handshaking lemma is often useful in proofs:

Σ_{v ∈ V} degree(v) = 2|E|

(Each edge contributes two to the sum of degrees.)
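As a quick sanity check of the lemma, here is a minimal Python sketch that counts degrees from an edge list and compares the sum to 2|E|. The function name `degrees` and the example graph are made up for illustration.

```python
from collections import defaultdict

def degrees(edges):
    """Return a dict mapping each vertex to its degree in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        # Each undirected edge {u, v} is incident on both endpoints.
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical example: the 4-cycle a-b-c-d plus the chord a-c.
example_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
deg = degrees(example_edges)

# Handshaking lemma: the degrees sum to twice the number of edges.
print(sum(deg.values()), 2 * len(example_edges))  # prints "10 10"
```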

Directed Graphs

In a directed graph or digraph the edges are ordered pairs (u, v).

We say that e = (u, v) is incident from or leaves u and is incident to or enters v. The in-degree of a vertex is the number of edges incident to it, and the out-degree of a vertex is the number of edges incident from it.

Self loops (v, v) are allowed in directed graphs.

Paths

A path of length k is a sequence of vertices ⟨v0, v1, v2, ... vk⟩ where (vi-1, vi) ∈ E, for i = 1, 2, ... k. (Some authors call this a "walk".) The path is said to contain the vertices and edges just defined.

A simple path is a path in which all vertices are distinct. (The "walk" authors call this a "path").

If a path exists from u to v we say that v is reachable from u.
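The definitions of path and simple path can be checked mechanically. The sketch below assumes the graph is given as a list of (u, v) pairs; `is_path` and `is_simple_path` are illustrative names, not part of any ADT in these notes.

```python
def is_path(seq, edges, directed=True):
    """True if every consecutive pair (v[i-1], v[i]) in seq is an edge."""
    edge_set = set(edges)
    if not directed:
        # An undirected edge {u, v} can be traversed in either direction.
        edge_set |= {(v, u) for u, v in edges}
    return all((seq[i - 1], seq[i]) in edge_set for i in range(1, len(seq)))

def is_simple_path(seq, edges, directed=True):
    """True if seq is a path in which all vertices are distinct."""
    return is_path(seq, edges, directed) and len(set(seq)) == len(seq)

# Hypothetical directed triangle a -> b -> c -> a.
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(is_path(["a", "b", "c", "a"], triangle))         # True: a path of length 3
print(is_simple_path(["a", "b", "c", "a"], triangle))  # False: a is repeated
```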

In an undirected graph, a path ⟨v0, v1, v2, ... vk⟩ forms a cycle if v0 = vk and k ≥ 3 (as no self-loops are allowed).

In a directed graph, a path forms a cycle if v0 = vk and the path contains at least one edge. (This is clearer than saying that the path contains at least two vertices, as self-loops are possible in directed graphs.) The cycle is simple if v1, v2, ... vk are distinct (i.e., all but the designated start and end v0 = vk are distinct). A directed graph with no self-loops is also simple.

A graph of either type with no cycles is acyclic. A directed acyclic graph is often called a dag.

Connectivity

A graph G' = (V', E') is a subgraph of G = (V, E) if V' ⊆ V and E' ⊆ E.

An undirected graph is connected if every vertex is reachable from all other vertices. In any connected undirected graph, |E| ≥ |V| - 1 (see also discussion of tree properties). The connected components of G are the maximal subgraphs G1 ... Gk where every vertex in a given subgraph is reachable from every other vertex in that subgraph, but not reachable from any vertex in a different subgraph.

A directed graph is strongly connected if every two vertices are reachable from each other. The strongly connected components are the subgraphs defined as above. A directed graph is thus strongly connected if it has only one strongly connected component. A directed graph is weakly connected if the underlying undirected graph (converting all tuples (u, v) ∈ E into sets {u, v} and removing self-loops) is connected.
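For undirected graphs, the connected components can be collected by repeatedly growing a set of mutually reachable vertices from each vertex not yet seen. This is a sketch only: the function name and input format (a vertex list plus an edge list) are assumptions, and the traversal order within a component is arbitrary.

```python
def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Grow one component: everything reachable from start.
        component = set()
        frontier = [start]
        seen.add(start)
        while frontier:
            u = frontier.pop()
            component.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        components.append(component)
    return components

# Hypothetical graph with two components.
print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# prints [{1, 2, 3}, {4, 5}]
```

The graph is connected exactly when this returns a single component.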

Variations

A bipartite graph is one in which V can be partitioned into two sets V1 and V2 such that every edge connects a vertex in V1 to one in V2. Equivalently, there are no odd-length cycles.

A complete graph is an undirected graph in which every pair of vertices is adjacent.

A weighted graph has numerical weights associated with the edges. (The allowable values depend on the application. Weights are often used to represent distance, cost or capacity in networks.)

Graph Size in Analysis

Asymptotic analysis is often in terms of both |V| and |E|. Within asymptotic notation we leave out the "|" for simplicity, for example, writing O(V + E), O(V² lg E), etc.

Many Applications ...


Graph ADT

These are detailed slightly more in the Goodrich & Tamassia excerpt uploaded to Laulima.

Graph Accessors

numVertices()
    Returns the number of vertices |V|

numEdges()
    Returns the number of edges |E|

vertices()
    Returns an iterator over the vertices V

edges()
    Returns an iterator over the edges E

Accessing Undirected Graphs

degree(v)
    Returns the number of edges (directed and undirected) incident on v.

adjacentVertices(v)
    Returns an iterator of the vertices adjacent to v.

incidentEdges(v)
    Returns an iterator of the edges incident on v.

endVertices(e)
    Returns an array of the two end vertices of e.

opposite(v,e)
    Given v is an endpoint of e.
    Returns the end vertex of e different from v.
    Throws InvalidEdgeException when v is not an endpoint of e.

areAdjacent(v1,v2)
    Returns true iff v1 and v2 are adjacent by a single edge.

Accessing Directed Graphs

directedEdges()
    Returns an iterator over the directed edges of G.

undirectedEdges()
    Returns an iterator over the undirected edges of G.

inDegree(v)
    Returns the number of directed edges (arcs) incoming to v.

outDegree(v)
    Returns the number of directed edges (arcs) outgoing from v.

inAdjacentVertices(v)
    Returns an iterator over the vertices adjacent to v by incoming edges.

outAdjacentVertices(v)
    Returns an iterator over the vertices adjacent to v by outgoing edges.

inIncidentEdges(v)
    Returns an iterator over the incoming edges of v.

outIncidentEdges(v)
    Returns an iterator over the outgoing edges of v.

destination(e)
    Returns the destination vertex of e, if e is directed.
    Throws InvalidEdgeException when e is undirected.

origin(e)
    Returns the origin vertex of e, if e is directed.
    Throws InvalidEdgeException when e is undirected.

isDirected(e)
    Returns true if e is directed, false otherwise

Mutators (Undirected and Directed)

insertEdge(u,v)
insertEdge(u,v,o)
    Inserts a new undirected edge between two existing vertices, optionally containing object o.
    Returns the new edge.

insertVertex()
insertVertex(o)
    Inserts a new isolated vertex optionally containing an object o (e.g., the label associated with the vertex).
    Returns the new vertex.

insertDirectedEdge(u,v)
insertDirectedEdge(u,v,o)
    Inserts a new directed edge from an existing vertex to another.
    Returns the new edge.

removeVertex(v)
    Deletes a vertex and all its incident edges.
    Returns object formerly stored at v.

removeEdge(e)
    Removes an edge.
    Returns the object formerly stored at e.
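To make the ADT concrete, here is a minimal Python sketch of the undirected part of the interface. It is not the Goodrich & Tamassia implementation: the classes `Vertex`, `Edge` and `UndirectedGraph` and their internal fields are assumptions chosen for brevity, only a subset of the methods is shown, and directed edges and annotations are omitted.

```python
class InvalidEdgeException(Exception):
    pass

class Vertex:
    def __init__(self, element=None):
        self.element = element          # optional object stored at the vertex

class Edge:
    def __init__(self, u, v, element=None):
        self.ends = (u, v)              # the two end vertices
        self.element = element          # optional object stored at the edge

class UndirectedGraph:
    def __init__(self):
        self._incident = {}             # Vertex -> list of incident Edge objects
        self._edges = []

    def numVertices(self):
        return len(self._incident)

    def numEdges(self):
        return len(self._edges)

    def vertices(self):
        return iter(self._incident)

    def edges(self):
        return iter(self._edges)

    def insertVertex(self, o=None):
        v = Vertex(o)
        self._incident[v] = []
        return v

    def insertEdge(self, u, v, o=None):
        if u is v:
            raise ValueError("self loops are not allowed in undirected graphs")
        e = Edge(u, v, o)
        self._incident[u].append(e)
        self._incident[v].append(e)
        self._edges.append(e)
        return e

    def removeEdge(self, e):
        u, v = e.ends
        self._incident[u].remove(e)
        self._incident[v].remove(e)
        self._edges.remove(e)
        return e.element

    def degree(self, v):
        return len(self._incident[v])

    def incidentEdges(self, v):
        return iter(self._incident[v])

    def endVertices(self, e):
        return list(e.ends)

    def opposite(self, v, e):
        a, b = e.ends
        if v is a:
            return b
        if v is b:
            return a
        raise InvalidEdgeException("v is not an endpoint of e")

    def adjacentVertices(self, v):
        return (self.opposite(v, e) for e in self._incident[v])

    def areAdjacent(self, v1, v2):
        return any(self.opposite(v1, e) is v2 for e in self._incident[v1])
```

A short usage example: after `g = UndirectedGraph()`, `a = g.insertVertex("a")`, `b = g.insertVertex("b")` and `e = g.insertEdge(a, b)`, the call `g.opposite(a, e)` returns `b` and `g.areAdjacent(a, b)` is true.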

Annotators (for vertices and all types of edges)

Methods for annotating vertices and edges with arbitrary data.

setAnnotation(Object k, o)
    Annotates a vertex or edge with object o indexed by key k.

getAnnotation(Object k)
    Returns the object indexed by k annotating a vertex or edge.

removeAnnotation(Object k)
    Removes the annotation on a vertex or edge indexed by k and returns it.

Changing Directions

There are various methods for changing the direction of edges. I think the only one we will need is:

reverseDirection(e)
    Reverse the direction of an edge.
    Throws InvalidEdgeException if the edge is undirected


Graph Representations

There are two classic representations: the adjacency list and the adjacency matrix.

In the adjacency list, vertices adjacent to vertex v are listed explicitly on linked list G.Adj[v] (assuming an array representation of list headers).

In the adjacency matrix, vertices adjacent to vertex v are indicated by nonzero entries in the row of the matrix indexed by v, in the columns for the adjacent vertices.

Adjacency List and Matrix representations of an undirected graph:

Adjacency List and Matrix representations of a directed graph:

Consider this before reading on: What are the asymptotic complexities of these methods in each representation?

Are edges first class objects in the above representations? Where do you store edge information in the undirected graph representations?

Complexity Analysis

Adjacency List

Space required: Θ(V + E).

Time to list all vertices adjacent to u: Θ(degree(u)).

Time to determine whether (u, v) ∈ E: O(degree(u)).

Adjacency Matrix

Space required: Θ(V²).

Time to list all vertices adjacent to u: Θ(V).

Time to determine whether (u, v) ∈ E: Θ(1).

So the matrix takes more space and more time to list adjacent vertices, but is faster to test adjacency of a pair of vertices.
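The tradeoff is easy to see in code. The sketch below builds both representations for a small graph whose vertices are assumed to be numbered 0 to n-1; the function names and the example graph are illustrative, and Python lists stand in for the linked lists.

```python
def build_adjacency_list(n, edges, directed=False):
    """adj[u] lists the vertices adjacent to u. Space: Theta(V + E)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def build_adjacency_matrix(n, edges, directed=False):
    """mat[u][v] == 1 iff (u, v) is an edge. Space: Theta(V^2)."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = 1
        if not directed:
            mat[v][u] = 1
    return mat

# Hypothetical undirected graph on 4 vertices.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency_list(4, edges)
mat = build_adjacency_matrix(4, edges)

# Listing the neighbors of vertex 2:
print(adj[2])                                # list: Theta(degree(2)) -> [0, 1, 3]
print([v for v in range(4) if mat[2][v]])    # matrix: scans a whole row, Theta(V) -> [0, 1, 3]

# Testing whether (2, 3) is an edge:
print(3 in adj[2])      # list: O(degree(2)) scan -> True
print(mat[2][3] == 1)   # matrix: Theta(1) lookup -> True
```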

"Modern" Adjacency Representation

Goodrich & Tamassia (reading in Laulima) propose a representation that combines an edge list, a vertex list, and an adjacency list for each vertex:

The sets V and E can be represented using a dictionary ADT. In many applications, it is especially important for V to enable fast access by key, and may be important to access in order. Each vertex object has an adjacency list I (I for incident), and the edges reference both the vertices they connect and the entries in this adjacency list. There's a lot of pointers to maintain, but this enables fast access in any direction you need, and for large sparse graphs the memory allocation is still less than for a matrix representation.

See also Newman (2010) chapter 9, posted in Laulima, for discussion of graph representations.


BFS and DFS Overview

Before starting with Cormen et al.'s more complex presentation, let's discuss how BFS and DFS can be implemented with nearly the same algorithm, but using a queue for BFS and a stack for DFS. You should be comfortable with this relationship between BFS/queues and DFS/stacks.

Sketch of both algorithms:

  1. Pick a starting vertex and put it on the queue (BFS) or stack (DFS)
  2. Repeat until the queue/stack is empty:
    1. Dequeue (BFS) or pop (DFS) the next vertex v from the appropriate data structure
    2. If v is unvisited,
      • Mark v as visited (and process it as needed for the specific application).
      • Find the unvisited neighbors of v and queue (BFS) or push (DFS) them on the appropriate data structure.
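The sketch above translates almost line for line into Python. In this minimal version a single `collections.deque` serves as either the queue or the stack; the function name `search`, the `use_queue` flag, and the example graph are assumptions made for illustration.

```python
from collections import deque

def search(adj, start, use_queue=True):
    """Return the vertices reachable from start in the order they are visited.

    use_queue=True gives BFS (FIFO); use_queue=False gives DFS (LIFO).
    """
    frontier = deque([start])
    visited = set()
    order = []
    while frontier:
        # Dequeue from the front (BFS) or pop from the back (DFS).
        v = frontier.popleft() if use_queue else frontier.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                frontier.append(w)
    return order

# Hypothetical directed graph given as adjacency lists.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(search(adj, "a", use_queue=True))   # BFS order: ['a', 'b', 'c', 'd']
print(search(adj, "a", use_queue=False))  # DFS order: ['a', 'c', 'd', 'b']
```

Note that a vertex can be placed on the queue or stack more than once before it is visited, which is why the visited check happens when a vertex is removed.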

Try starting with vertex q and run this using both a stack and a queue:

       

BFS's FIFO queue explores nodes at each distance before going to the next distance. DFS's LIFO stack explores the more distant neighbors of a node before continuing with nodes at the same distance ("goes deep").

Search in a directed graph that is weakly but not strongly connected may not reach all vertices.


Breadth-first Search

Given a graph G = (V, E) and a source vertex s ∈ V, output v.d, the shortest distance (number of edges) from s to v, for all v ∈ V. Also record v.π = u such that (u, v) is the last edge on a shortest path from s to v. (We can then trace the path back.)

Analogy: Send a "tsunami" out from s that first reaches all vertices 1 edge from s, then from them all vertices 2 edges from s, etc. Like a tsunami, equidistant destinations are reached at the "same time".

Use a FIFO queue Q to maintain the wavefront, such that v ∈ Q iff the tsunami has hit v but has not come out of it yet.

At any given time Q has vertices with d values i, i, ... i, i+1, i+1, ... i+1. That is, there are at most two distances on the queue, and values increase monotonically.
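Here is a minimal Python sketch of BFS that records v.d and v.π as described above. It is not a transcription of the textbook's pseudocode: dictionaries `d` and `pi` stand in for the vertex attributes, an infinite distance plays the role of "undiscovered", and the example graph is hypothetical.

```python
from collections import deque
import math

def bfs(adj, s):
    """Return (d, pi): distances from s in edges, and predecessors on shortest paths."""
    d = {v: math.inf for v in adj}   # infinity means "not yet discovered"
    pi = {v: None for v in adj}
    d[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if d[v] == math.inf:     # first time the wavefront hits v
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
    return d, pi

# Hypothetical undirected graph (each edge listed in both directions);
# vertex x is not connected to s.
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "e"],
    "e": ["c"],
    "x": [],
}
d, pi = bfs(adj, "s")
print(d["c"], pi["c"])  # prints "2 a"
print(d["x"])           # prints "inf": x is unreachable from s
```

Because a vertex is marked (given a finite distance) when it is enqueued, each vertex enters the queue at most once.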

Examples

Book's Example: Undirected Graph

A directed example:

Let's do another (number the nodes by their depth):

Time Analysis

(This is an aggregate analysis.) Every vertex is enqueued at most once. We examine edge (u, v) only when u is dequeued, so every edge is examined at most once if directed and twice if undirected. Therefore, O(V + E).

Shortest Paths

Shortest distance δ(s, v) from s to v is the minimum number of edges across all paths from s to v, or ∞ if no such path exists.

A shortest path from s to v is a path of length δ(s, v).

It can be shown that BFS is guaranteed to find the shortest paths to all vertices from a start vertex s: v.d = δ(s, v), ∀ v at the conclusion of the algorithm. See book for a formal proof.

Informally, all vertices at distance 1 from s are enqueued first, then via them all vertices at distance 2 are reached and enqueued, and so on. Inductively, BFS cannot reach a vertex v by a path longer than a shortest one: the last vertex u before v on a shortest path would have been enqueued earlier, and v would have been reached when u was dequeued.

Breadth-First Trees

The predecessor subgraph of G is

Gπ = (Vπ, Eπ) where
Vπ= {v ∈ V : v.π ≠ NIL} ∪ {s} and
Eπ = {(v.π, v) : v ∈ Vπ - {s}}

A predecessor subgraph Gπ is a breadth-first tree if Vπ consists of exactly all vertices reachable from s and for all v in Vπ the subgraph Gπ contains unique simple and shortest paths from s to v.

BFS constructs π such that Gπ is a breadth-first tree.
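Since each vertex in the breadth-first tree stores its predecessor, a shortest path can be recovered by following π back from v to s. The sketch below assumes the predecessors are given as a dictionary `pi` (here written out by hand for a hypothetical tree); `path_to` is an illustrative name.

```python
def path_to(pi, s, v):
    """Return the tree path from s to v as a list, or None if v is unreachable."""
    path = [v]
    while path[-1] != s:
        parent = pi.get(path[-1])
        if parent is None:
            return None              # ran out of predecessors before reaching s
        path.append(parent)
    path.reverse()                   # we collected the path from v back to s
    return path

# Hypothetical predecessor map for a BFS from s; x was never reached.
pi = {"s": None, "a": "s", "b": "s", "c": "a", "e": "c", "x": None}
print(path_to(pi, "s", "e"))  # prints ['s', 'a', 'c', 'e']
print(path_to(pi, "s", "x"))  # prints None
```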


Depth-first Search

Given G = (V, E), directed or undirected, DFS explores the graph from every vertex (no source vertex is given), constructing a forest of trees and recording two time stamps on each vertex: v.d, the time at which v is discovered, and v.f, the time at which v is finished.

Time starts at 0 before the first vertex is visited, and is incremented by 1 for every discovery and finishing event (as explained below). These attributes will be used in other algorithms later on.

Since each vertex is discovered once and finished once, discovery and finishing times are unique integers from 1 to 2|V|, and for all v, v.d < v.f.

(Some presentations of DFS pose it as a way to visit nodes, enabling a given method to be applied to the nodes with no output specified. Others present it as a way to construct a tree. The CLRS presentation is more complex but supports a variety of applications.)

DFS explores every edge and starts over from different vertices if necessary to reach them (unlike BFS, which may fail to reach subgraphs not connected to s).

As it progresses, every vertex has a color:

WHITE = undiscovered

GRAY = discovered, but not finished (still exploring vertices reachable from it)
v.d records the moment at which v is discovered and colored gray.

BLACK = finished (have found everything reachable from it)
v.f records the moment at which v is finished and colored black.

Pseudocode

While BFS uses a queue, DFS operates in a stack-like manner (using the implicit recursion stack in the algorithm above).

Another major difference in the algorithms as presented here is that DFS will search from every vertex until all edges are explored, while BFS only searches from a designated start vertex.
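A minimal Python sketch of DFS with colors and time stamps, following the description above. It is not a transcription of the textbook's pseudocode, so line numbers cited in the time analysis do not refer to it. The dictionaries `color`, `d`, `f` and `pi` stand in for vertex attributes, and the vertex names in the example are assumptions.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def dfs(adj):
    """Run DFS from every undiscovered vertex; return (d, f, pi)."""
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                  # u is discovered
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                pi[v] = u
                visit(v)             # the recursion stack is the DFS stack
        color[u] = BLACK
        time += 1
        f[u] = time                  # u is finished

    for u in adj:
        if color[u] == WHITE:        # start a new tree in the forest
            visit(u)
    return d, f, pi

# Small directed example; z has a self loop.
adj = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
d, f, pi = dfs(adj)
print(d)  # {'u': 1, 'v': 2, 'y': 3, 'x': 4, 'w': 9, 'z': 10}
print(f)  # {'x': 5, 'y': 6, 'v': 7, 'u': 8, 'z': 11, 'w': 12}
```

The recursive form is fine for small graphs; on deep graphs Python's recursion limit would require an explicit stack instead.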

Example:

One could start DFS with any arbitrary vertex, and continue at any remaining vertex after the first tree is constructed. Regularities in the book's examples (e.g., processing vertices in alphabetical order, or always starting at the top of the diagram) do not reflect a requirement of the algorithm.

Let's do this example (start with the upper left node, and label the nodes with their d and f):

Time Analysis

The analysis uses aggregate analysis, and is similar to the BFS analysis, except that DFS is guaranteed to visit every vertex and edge, so it is Θ not O:

Θ(V) to visit all vertices in lines 1 and 5 of DFS;

Σ_{v ∈ V} |Adj(v)| = Θ(E) to process the adjacency lists in line 4 of DFS-Visit.

(Aggregate analysis: we are not attempting to count how many times the loop of line 4 executes each time it is encountered, as we don't know |Adj(v)|. Instead, we sum the number of passes through the loop in total: all edges will be processed.)

The rest is constant time.

Therefore, Θ(V + E).

Classification of Edges

This classification will be useful in forthcoming proofs and algorithms.

Here's a graph with edges classified, and redrawn to better see the structural roles of the different kinds of edges:

DFS Properties

These theorems show important properties of DFS that will be used later to show how DFS exposes properties of the graph.

Parentheses Theorem

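After any DFS of a graph G, for any two vertices u and v in G, exactly one of the following conditions holds:

  1. The intervals [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor v is a descendant of the other in the depth-first forest.
  2. The interval [u.d, u.f] is contained entirely within [v.d, v.f], and u is a descendant of v in a depth-first tree.
  3. The interval [v.d, v.f] is contained entirely within [u.d, u.f], and v is a descendant of u in a depth-first tree.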

Essentially states that the d and f visit times are well nested. See text for proof. For the above graph:

Corollary: Nesting of Descendant's Intervals

Vertex v is a proper descendant of vertex u in the DFS forest of a graph iff u.d < v.d < v.f < u.f. (Follows immediately from parentheses theorem.)

Also, (u, v) is a back edge iff v.d ≤ u.d < u.f ≤ v.f; and a cross edge iff v.d < v.f < u.d < u.f.
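These interval conditions are enough to classify every edge of a directed graph after DFS has run. The sketch below is one way to do it, under stated assumptions: the graph has no parallel edges, an edge to a proper descendant counts as a tree edge when u is v's predecessor and as a forward edge otherwise, and the names `classify_edges`, `d`, `f` and `pi` are illustrative.

```python
def classify_edges(adj):
    """Run DFS on a directed graph and label each edge tree/back/forward/cross."""
    d, f, pi = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        for v in adj[u]:
            if v not in d:           # v is still undiscovered (white)
                pi[v] = u
                visit(v)
        time += 1
        f[u] = time

    for u in adj:
        if u not in d:
            pi[u] = None
            visit(u)

    kinds = {}
    for u in adj:
        for v in adj[u]:
            if d[v] <= d[u] < f[u] <= f[v]:
                kinds[(u, v)] = "back"      # v is u itself or an ancestor of u
            elif f[v] < d[u]:
                kinds[(u, v)] = "cross"     # v finished before u was discovered
            elif pi[v] == u:
                kinds[(u, v)] = "tree"      # v was discovered via this edge
            else:
                kinds[(u, v)] = "forward"   # v is a proper descendant, not a child via (u, v)
    return kinds

# Small directed example; z has a self loop.
adj = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
kinds = classify_edges(adj)
print(kinds[("x", "v")], kinds[("u", "x")], kinds[("w", "y")])  # prints "back forward cross"
```

The classification depends on the order in which vertices and adjacency lists are processed, so a different ordering can label the same edge differently.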

White Path Theorem

Vertex v is a descendant of u iff at time u.d there is a path from u to v consisting of only white vertices (except for u, which was just colored gray).

(Proof in textbook uses v.d and v.f. Metaphorically and due to its depth-first nature, if a search encounters an unexplored location, all the unexplored territory reachable from this location will be reached before another search gets there.)

DFS Theorem

DFS of an undirected graph produces only tree and back edges: never forward or cross edges.

(Proof in textbook uses v.d and v.f. Informally, because the edges are bidirectional, we would have traversed the supposed forward or cross edge earlier as a tree or back edge.)


Up Next

Next, in Topic 14B, we will discuss applications of depth-first search.


Nodari Sitchinava (based on material by Dan Suthers)
Some images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition.