- Prelude: Greedy Algorithms and Dynamic Programming
- Example: Activity Selection Problem
- Greedy Strategy
- Huffman Codes

- Read CLRS Sections 16.1-16.3. We are not covering Matroids (Section 16.4).
- Screencasts 13A, 13B, 13C (also in Laulima)

This presentation follows the CLRS reading fairly closely, selecting out the most relevant parts and explaining a few things in more detail. (The associated videos change the ordering somewhat: 13A provides a conceptual introduction, leaving the activity selection example for 13B.)

Both Dynamic Programming and Greedy Algorithms are ways of solving *optimization
problems*: a solution is sought that optimizes (minimizes or maximizes) an *objective function*.

**Dynamic Programming:**

- Finds solutions bottom-up (solves subproblems before solving their super-problem)
- Exploits overlapping subproblems for efficiency (by reusing solutions)
- Can handle subproblem interdependence

**Greedy Algorithms** "greedily" take the choice with the most immediate gain.

- Find solutions top-down (commit to a choice, then solve sub-problems)
- Assume that if the objective function is optimized locally it will be optimized globally
- Cannot handle interdependent subproblems

For some problems, but not all, local optimization actually results in global optimization.

We'll use an example to simultaneously review dynamic programming and motivate greedy algorithms, as the two approaches are related (but distinct).

Suppose that *activities* require exclusive use of a common resource, and you want to
schedule as many as possible.

Let *S* = {*a*_{1}, ..., *a*_{n}} be a set of *n* activities.

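Each activity *a*_{i} needs the resource during a time period starting at *s*_{i} and finishing before *f*_{i}, i.e., during the half-open interval [*s*_{i}, *f*_{i}).

(*Why not* [*s*_{i}, *f*_{i}]?)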

The optimization problem is to select the largest set of non-overlapping (mutually compatible)
activities from *S*.

We assume that activities are sorted by finish time *f*_{1} ≤
*f*_{2} ≤ ... ≤ *f*_{n-1} ≤ *f*_{n} (this
can be done in Θ(*n* lg *n*) time).

Consider these activities:

Here is a graphic representation:

Suppose we chose one of the activities that *start first*, and then look for the next
activity that starts after it is done. This could result in {*a*_{4},
*a*_{7}, *a*_{8}}, but this solution is not optimal.

An optimal solution is {*a*_{1}, *a*_{3}, *a*_{6},
*a*_{8}}. (It maximizes the objective function of number of activities scheduled.)

Another one is {*a*_{2}, *a*_{5}, *a*_{7},
*a*_{9}}. (Optimal solutions are not necessarily unique.)

How do we find (one of) these optimal solutions? Let's consider it as a dynamic programming problem ...

A dynamic programming analysis begins by identifying the choices to be made, and assuming that you can make an optimal choice (without yet specifying what that choice is) that will be part of an optimal solution.

It then specifies the possible subproblems that result in the most general way (to ensure that possible components of optimal solutions are not excluded), and shows that an optimal solution must recursively include optimal solutions to the subproblems. (This is done by reasoning about the value of the solutions according to the objective function.)

We'll approach Activity Selection similarly. I'll try to clarify the reasoning in the text ...

For generality, we define the problem in a way that applies both to the original problem and subproblems.

Suppose that due to prior choices we are working on a time interval from *i* to
*j*. This could be after some already-scheduled activity *a*_{i} and before some
already-scheduled activity *a*_{j}.

Then the candidate activities to consider are those that start after *a*_{i} finishes and
end before *a*_{j} starts. Call this set *S*_{ij}.

Now let's define *A*_{ij} to be an optimal solution, i.e., a maximum-size set of mutually
compatible activities in *S*_{ij}.

At some point we will need to make a choice to include some activity *a*_{k} with
start time *s*_{k} and finish time *f*_{k} in this solution. This choice leaves two subproblems:

- *S*_{ik}: activities that start after *a*_{i} finishes, and finish before *a*_{k} starts
- *S*_{kj}: activities that start after *a*_{k} finishes, and finish before *a*_{j} starts

(Note that *S*_{ij} may be a proper superset of *S*_{ik} ∪ {*a*_{k}} ∪ *S*_{kj}, as activities incompatible with *a*_{k} are excluded.)

Using the same notation as above, define the optimal solutions to these subproblems to be:

- *A*_{ik} = *A*_{ij} ∩ *S*_{ik}: the optimal solution to *S*_{ik}
- *A*_{kj} = *A*_{ij} ∩ *S*_{kj}: the optimal solution to *S*_{kj}

So the structure of an optimal solution *A*_{ij} is:

*A*_{ij} = *A*_{ik} ∪ {*a*_{k}} ∪ *A*_{kj}

and the number of activities is:

|*A*_{ij}| = |*A*_{ik}| + 1 + |*A*_{kj}|

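By the "cut and paste argument", an optimal solution *A*_{ij} for *S*_{ij} must include optimal solutions for *S*_{ik} and *S*_{kj}: if a larger compatible set existed for either subproblem, we could paste it in place of *A*_{ik} or *A*_{kj} and obtain a solution larger than *A*_{ij}, a contradiction.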

Therefore the Activity Scheduling problem exhibits optimal substructure.

Since the optimal solution *A*_{ij} must include optimal solutions to the
subproblems for *S*_{ik} and *S*_{kj}, we could solve by
dynamic programming.

Let *c*[*i*, *j*] = size of optimal solution for *S*_{ij}
(*c*[*i*, *j*] has the same value as |*A _{ij}*|). Then

*c*[*i*, *j*] = *c*[*i*, *k*] + *c*[*k*, *j*] + 1 (the 1 is to count *a*_{k}).

We don't know which activity *a _{k}* to choose for an optimal solution, so we
could try them all:
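Written as a recurrence, "trying them all" means taking the maximum over every candidate *a*_{k} in *S*_{ij}. The form below restates the recurrence as CLRS gives it, using the *c*[*i*, *j*] notation defined above:

```latex
c[i,j] =
\begin{cases}
0 & \text{if } S_{ij} = \emptyset, \\
\max\limits_{a_k \in S_{ij}} \{\, c[i,k] + c[k,j] + 1 \,\} & \text{if } S_{ij} \neq \emptyset.
\end{cases}
```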

This suggests a recursive algorithm that can be memoized, or we could develop an equivalent bottom-up approach, filling in tables in either case.
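As a rough illustration of the memoized approach, here is a minimal Python sketch. The function name `max_activities_dp`, the sentinel activities, and the sample data are assumptions made for this example (the start and finish times are illustrative values chosen to agree with the optimal solutions quoted earlier, not necessarily the original table).

```python
from functools import lru_cache


def max_activities_dp(start, finish):
    """Return the size of a largest set of mutually compatible activities.

    Assumes activities are sorted by finish time and start times are >= 0.
    """
    n = len(start)
    # Sentinels (an assumption of this sketch): a_0 finishes at time 0 and
    # a_{n+1} starts at infinity, so that S_{0,n+1} is the whole input.
    s = [0] + list(start) + [float("inf")]
    f = [0] + list(finish) + [float("inf")]

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j) = size of an optimal solution for S_ij.
        best = 0
        for k in range(i + 1, j):
            # a_k belongs to S_ij if it starts after a_i finishes
            # and finishes before a_j starts.
            if s[k] >= f[i] and f[k] <= s[j]:
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)


# Illustrative data: nine activities, already sorted by finish time.
START = [1, 2, 4, 1, 5, 8, 9, 11, 13]
FINISH = [3, 5, 7, 8, 9, 10, 11, 14, 16]
print(max_activities_dp(START, FINISH))  # 4
```

There are O(*n*^2) subproblems and each tries O(*n*) choices of *k*, so this sketch does far more work than the greedy approach developed next.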

But it turns out we can solve this without considering multiple subproblems.

We are trying to optimize the number of activities. Let's be greedy!

- The more time that is left after running an activity, the more subsequent activities we can fit in.
- If we **choose the first activity to *finish*,** the most time will be left.
- Since activities are sorted by finish time, we will always start with *a*_{1}.
- Then we can solve the single subproblem of activity scheduling in this remaining time.

Since there is only a single subproblem, the *S*_{ij} notation, bounding the set at
both ends, is more complex than we need. We'll simplify the notation to indicate the activities that
start after *a*_{k} finishes:

*S*_{k} = {*a*_{i} ∈ *S* : *s*_{i} ≥ *f*_{k}}

So, after choosing *a*_{1} we just have *S*_{1} to solve (and so
on after choices in recursive subproblems).

By optimal substructure, *if* *a*_{1} is part of an optimal solution, then an
optimal solution to the original problem consists of *a*_{1} plus all activities in an
optimal solution to *S*_{1}.

But we need to prove that *a*_{1} is always part of some optimal solution (i.e., to prove
our original intuition).

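*Theorem:* If *S*_{k} is nonempty and *a*_{m} has the earliest finish time in *S*_{k}, then *a*_{m} is included in some optimal solution.

*Proof:* Let *A*_{k} be an optimal solution to *S*_{k}, and let *a*_{j} have the earliest finish time of any activity in *A*_{k}. If *a*_{j} = *a*_{m} we are done. Otherwise, construct *A'*_{k} = (*A*_{k} − {*a*_{j}}) ∪ {*a*_{m}} by replacing *a*_{j} with *a*_{m}.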

*Claim:* Activities in *A'*_{k} are disjoint.

*Proof of Claim:* Activities in *A*_{k} are disjoint because it was a solution.

Since *a*_{j} is the first activity in *A*_{k} to finish, and *f*_{m} ≤ *f*_{j} (*a*_{m} is the earliest in *S*_{k}), *a*_{m} cannot overlap with any other activities in *A'*_{k}.

No other changes were made to *A*_{k}, so *A'*_{k} must consist of disjoint activities.

Since |*A'*_{k}| = |*A*_{k}|, *A'*_{k} is also an optimal solution to *S*_{k}, and it includes *a*_{m}.

Therefore we don't need the full power of dynamic programming: we can just repeatedly choose the activity that finishes first, remove any activities that are incompatible with it, and repeat on the remaining activities until no activities remain.

Let the start and finish times be represented by arrays *s* and *f*, where *f* is
assumed to be sorted in monotonically increasing order.

Add a fictitious activity *a*_{0} with *f*_{0} = 0, so
*S*_{0} = *S* (i.e., the entire input sequence).

Our initial call will be RECURSIVE-ACTIVITY-SELECTOR(*s*, *f*, 0, *n*).
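A minimal Python sketch of the recursive procedure, following the CLRS pseudocode, might look like this. The snake_case name and the sample data are assumptions of this example (the times are illustrative values chosen to agree with the solutions quoted earlier); index 0 holds the fictitious activity *a*_{0}.

```python
def recursive_activity_selector(s, f, k, n):
    """Return the indices of a maximum-size set of compatible activities in S_k.

    s and f are 1-indexed via a fictitious activity at index 0 with f[0] = 0;
    activities 1..n are assumed to be sorted by finish time.
    """
    m = k + 1
    # Skip activities that start before a_k finishes.
    while m <= n and s[m] < f[k]:
        m += 1
    if m <= n:
        # a_m is the first activity in S_k to finish: the greedy choice.
        return [m] + recursive_activity_selector(s, f, m, n)
    return []


# Illustrative data; s[0] and f[0] belong to the fictitious activity a_0.
s = [0, 1, 2, 4, 1, 5, 8, 9, 11, 13]
f = [0, 3, 5, 7, 8, 9, 10, 11, 14, 16]
print(recursive_activity_selector(s, f, 0, 9))  # [1, 3, 6, 8]
```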

The algorithm is Θ(*n*) because each activity is examined exactly once across all
calls: each recursive call starts at *m*, where the previous call left off.

If the activities need to be sorted, the overall problem can be solved in Θ(*n* lg
*n*).

This algorithm is nearly tail recursive, and can easily be converted to an iterative version:
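One way to write the iterative version in Python is sketched below. The function name and sample data are assumptions of this example; it takes plain 0-indexed lists (no fictitious *a*_{0}) and reports 1-based activity numbers to match the notation in the text.

```python
def greedy_activity_selector(start, finish):
    """Return 1-based indices of a maximum-size set of compatible activities.

    Assumes the activities are sorted by finish time.
    """
    if not start:
        return []
    selected = [1]  # a_1 finishes first, so it is always the first choice
    k = 0           # 0-based index of the most recently selected activity
    for m in range(1, len(start)):
        # a_{m+1} is compatible if it starts no earlier than the last pick ends.
        if start[m] >= finish[k]:
            selected.append(m + 1)
            k = m
    return selected


# Illustrative data: nine activities sorted by finish time.
START = [1, 2, 4, 1, 5, 8, 9, 11, 13]
FINISH = [3, 5, 7, 8, 9, 10, 11, 14, 16]
print(greedy_activity_selector(START, FINISH))  # [1, 3, 6, 8]
```

Each activity is examined once in the single loop, which is where the Θ(*n*) bound comes from.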

Let's trace the algorithm on this:

Instead of starting with the more elaborate dynamic programming analysis, we could have gone directly to the greedy approach.

Typical steps for designing a solution with the greedy strategy (and two properties that are key to determining whether it might apply to a problem):

- Consider how we can make a greedy choice (local optimization of the objective function), leaving one subproblem to solve.
- **Greedy Choice Property:** Prove that the greedy choice is always part of some optimal solution.
- **Optimal Substructure:** Demonstrate that an optimal solution to the problem contains within it optimal solutions to the subproblems.

Then we can construct an algorithm that combines the greedy choice with an optimal solution to the remaining problem.

Both require optimal substructure, but ...

**Dynamic Programming**

- Each choice depends on knowing the optimal solutions to subproblems.
- Bottom-up: Solve subproblems first

**Greedy Strategy**

- Each choice depends only on local optimization
- Top-down: Make choice before solving subproblems

These two problems demonstrate that the two strategies do not solve the same problems. Suppose a thief has a knapsack of fixed carrying capacity, and wants to optimize the value of what he takes.

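There are *n* items. Item *i* is worth $*v*_{i} and weighs *w*_{i} pounds. In the *0-1 knapsack problem*, each item must be taken whole or left behind, and the total weight taken cannot exceed the capacity *W*.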

In the example, item 1 is worth $6/pound, item 2 $5/pound and item 3 $4/pound.

The greedy strategy of optimizing value per unit of weight would take item 1 first.

The *fractional knapsack problem* is the same as the 0-1 knapsack problem except that the thief *can take a fraction of each item*
(they are divisible substances, like gold powder).

Both versions of the problem have optimal substructure *(why?).*

However, only the fractional knapsack problem has the greedy choice property:

*Fractional:* One can fill up as much of the most valuable substance by weight as one can
hold, then as much of the next most valuable substance, etc., until *W* is reached:

*0-1:* A greedy strategy could result in empty space, reducing the overall dollar density
of the knapsack. After choosing item 1, the optimal solution (shown third) cannot be achieved:
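The contrast can be checked numerically with a short Python sketch. The item values and weights below are assumed from the standard CLRS example (10, 20 and 30 pounds worth $60, $100 and $120, with *W* = 50), which matches the $6, $5 and $4 per pound quoted above; the function names are hypothetical.

```python
def fractional_knapsack(items, capacity):
    """Greedy by value per pound; fractions of items are allowed."""
    total = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)  # take as much of this item as fits
        total += value * take / weight
        remaining -= take
    return total


def greedy_01_knapsack(items, capacity):
    """Same greedy order, but each item is taken whole or skipped."""
    total = 0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    return total


def optimal_01_knapsack(items, capacity):
    """Dynamic programming over integer capacities 0..capacity."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Go downward so each item is used at most once.
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]


ITEMS = [(60, 10), (100, 20), (120, 30)]  # (value in dollars, weight in pounds)
print(fractional_knapsack(ITEMS, 50))   # 240.0
print(greedy_01_knapsack(ITEMS, 50))    # 160
print(optimal_01_knapsack(ITEMS, 50))   # 220
```

With these numbers the greedy choice is optimal for the fractional problem, while for the 0-1 problem it leaves 20 pounds unused and falls short of the optimum, which takes items 2 and 3.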

We are going to see several greedy algorithms throughout the semester. The activity scheduler was good for illustration, but is not important in practice. We should look at one important greedy algorithm today ...

Huffman codes provide an efficient way to compress text, and are constructed using a greedy algorithm. We only have time to review how this important algorithm works; see the text for analysis.

**Fixed-length binary codes** (e.g., ASCII) represent each character with a fixed number of
bits (a binary string of fixed length called a **codeword**).

**Variable-length binary codes** can vary the number of bits allocated to each character. This
opens the possibility of space efficiency by using fewer bits for frequent characters.

Example: Suppose we want to encode documents with these characters:

With a 3 bit code it would take 300,000 bits to code a file of 100,000 characters, but the variable-length code shown requires only 224,000 bits.
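The arithmetic behind these two totals can be reproduced in a few lines of Python. The frequencies and codewords below are assumed from the character-coding example in CLRS Section 16.3, since the table itself is not reproduced here.

```python
# Character counts in a 100,000-character file (assumed from the CLRS example).
FREQ = {"a": 45_000, "b": 13_000, "c": 12_000, "d": 16_000, "e": 9_000, "f": 5_000}

# Variable-length codewords (assumed from the same example).
VARIABLE_CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encoded_bits(freq, code):
    """Total bits needed: each character's count times its codeword length."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


fixed_bits = 3 * sum(FREQ.values())            # 3 bits per character
variable_bits = encoded_bits(FREQ, VARIABLE_CODE)
print(fixed_bits, variable_bits)               # 300000 224000
```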

**Prefix codes** (better named **prefix-free codes**) are codes in which no codeword is a
prefix of another.

For any data, it is always possible to construct a prefix code that is optimal (though not all prefix codes are optimal, as we will see below).

Prefix codes also have the advantage that each character in an input file can be "consumed" unambiguously, as the prefix cannot be confused with another code.

We can think of the 0 and 1 in a prefix code as directions for traversing a binary tree: 0 for left and 1 for right. The leaves store the coded character. For example, here is the fixed-length prefix code from the table above represented as a binary tree:

Consuming bits from an input file, we traverse the tree until the character is identified, and then start over at the top of the tree for the next character.
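The traversal can be sketched in Python with nested dictionaries standing in for tree nodes. This representation and the function names are assumptions of this example, and the codewords are the variable-length code from the CLRS example; the same decoder works for any prefix code, including the fixed-length one.

```python
def build_tree(code):
    """Build a decoding tree: inner nodes are dicts keyed by '0'/'1', leaves are characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root


def decode(bits, root):
    """Walk the tree bit by bit, emitting a character at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root            # start over at the top of the tree
    return "".join(out)


VARIABLE_CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
tree = build_tree(VARIABLE_CODE)
print(decode("001011101", tree))  # aabe
```

The string 001011101 parses only as 0, 0, 101, 1101, which is why no separators between codewords are needed.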

*Exercise: Decode 101100100011*

But the above tree uses three bits per character: it is not optimal. It can be shown that an optimal code is always represented by a full binary tree (every non-leaf node has two children).

For example, an optimal prefix code (from the table reproduced again here) is represented by this tree:

*Exercise: Decode 10111010111*

Huffman's greedy algorithm constructs optimal prefix codes called **Huffman Codes**.

It is given a set *C* of *n* characters, where each character has frequency
*c.freq* in the "text" to be encoded.

The optimality of a code is relative to a "text", which can be what we normally think of as texts, or can be other data encoded as sequences of bits, such as images.

- We can construct a generic Huffman code for a universe of texts, such as all texts in English, by estimating the frequency of characters in this universe of texts.
- More commonly, we construct Huffman codes optimized for particular documents. Then the document-specific code needs to be passed on along with the compressed document.

The algorithm creates a binary tree leaf node for each character, annotated with its frequency, and the tree nodes are then put on a min-priority queue (this is only implied in line 2 below).

Then the first two subtrees on the queue (those with minimum frequency) are dequeued with
`Extract-Min`, merged into a single tree, annotated with the sum of their frequencies, and
this single node is re-queued.

This process is repeated until only one tree node remains on the queue (the root). Each merge
removes two nodes from the queue and adds one back, so after exactly *n*−1 merges a single node
is left; we can therefore just run the loop *n*−1 times.

Here is the algorithm:
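A minimal Python sketch of the procedure described above is given here, using the standard `heapq` module as the min-priority queue. The tuple-based tree representation, the tie-breaking counter, and the function names are assumptions of this example; the frequencies (in thousands) are taken from the CLRS example.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman tree; leaves are characters, inner nodes are (left, right) pairs.

    Assumes freq is a non-empty dict mapping single characters to frequencies.
    """
    order = count()  # unique tie-breaker so tree nodes are never compared
    heap = [(weight, next(order), ch) for ch, weight in freq.items()]
    heapq.heapify(heap)
    for _step in range(len(freq) - 1):
        w1, _, left = heapq.heappop(heap)    # lowest frequency
        w2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))
    return heap[0][2]


def codewords(tree, prefix=""):
    """Read codewords off the tree: 0 for left, 1 for right."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a single-character alphabet still needs one bit
    left, right = tree
    codes = codewords(left, prefix + "0")
    codes.update(codewords(right, prefix + "1"))
    return codes


FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = codewords(huffman(FREQ))
print(codes)
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```

With tied frequencies, a different tie-breaking rule can produce different codewords, but the total encoded length is the same.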

The "greedy" aspect is the choice to merge min-frequency nodes first, and assume that this local minimization will result in an optimal global solution.

Intuitively, this approach will result in an optimal solution because the lowest frequency items will be "pushed down" deeper in the tree, and hence have longer codes; while higher frequency items will end up nearer the root, and hence have the shortest codes.

Cormen et al. prove correctness with two Lemmas for the two properties:

- Greedy choice property: there exists an optimal prefix code where two characters having the lowest frequency in *C* are encoded with equal length strings that differ only in the last bit, as they are leaf nodes.
- Optimal-substructure property: if the tree constructed by merging two nodes is optimal it must have been constructed from an optimal tree for the subproblem.

The initial BUILD-MIN-HEAP implied by line 2 requires O(*n*) time.

The loop executes *n*−1 times, with O(lg *n*) required for each heap operation.

Thus, HUFFMAN is O(*n* lg *n*).

The characters are in a min priority queue by frequency:

Take out the two lowest frequency items and make a subtree that is put back on the queue as if it is a combined character:

Combine the next lowest frequency characters:

Continuing, tree fragments themselves become subtrees:

Two subtrees are merged next:

The highest frequency character gets added to the tree last, so it will have a code of length 1:

One might wonder why the second most frequent character does not have a code of length 2. This would force the other characters to be deeper in the tree, giving them excessively long codes.

We will encounter several examples of greedy algorithms later in the course, including classic algorithms for finding minimum spanning trees (Topic 17) and shortest paths in graphs (Topics 18 and 19).

Nodari Sitchinava (based on material by Dan Suthers) Images are from the instructor's material for Cormen et al. Introduction to Algorithms, Third Edition.