Outline: Compression, Image Data



Information Theory

If I'm sending a stream of 90% "0" bits and 10% "1" bits, how many bits of information does each bit carry?
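The answer follows from Shannon entropy. A minimal sketch of the arithmetic, assuming the bits are independent with the stated probabilities:

    import math

    def entropy(probabilities):
        """Shannon entropy in bits per symbol: H = -sum(p * log2 p)."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # A stream of 90% "0" bits and 10% "1" bits carries roughly 0.469 bits
    # of information per transmitted bit, not a full bit.
    print(entropy([0.9, 0.1]))   # ~0.469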

Notes:


Work with class to figure out how much they know.

Specifically, how many have had a course on information theory?

How many know how to correlate information content to entropy?



Lossless vs. Lossy Compression

Compression is lossless if decompression is guaranteed to always give us the same data as we had before compression.

Example: dictionary reference. 7 bits for word-on-page plus 9 bits for page number: 16 bits for each English word.

This compression is lossless if the algorithm can handle words that are not in the dictionary (plurals, third-person verbs, past tense, slang, acronyms, made-up words, etc.). Otherwise it is lossy ("I went and bought Nikes" becomes "I go and buy sneaker").
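A minimal sketch of the dictionary-reference idea (the 512-page / 128-words-per-page layout and the toy dictionary below are assumptions for illustration, not a real standard):

    # Hypothetical layout: 512 pages x 128 words per page, so any listed word
    # fits in 9 + 7 = 16 bits.
    WORDS_PER_PAGE = 128

    def encode_word(word, dictionary):
        """Return (page number, word-on-page) for a listed word, or None if absent."""
        index = dictionary.get(word)
        if index is None:
            return None                       # must fall back somehow, or lose data
        return divmod(index, WORDS_PER_PAGE)  # (page number, position on page)

    dictionary = {"buy": 0, "go": 1, "sneaker": 2}   # toy dictionary
    print(encode_word("sneaker", dictionary))        # (0, 2)
    print(encode_word("Nikes", dictionary))          # None: lossy unless handled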



Compression Criteria

Lossless or Lossy?

If lossy, what do you lose?

Compression ratio.



Run-Length Encoding (RLE)

Replace each run of identical symbols (bits, digits, or letters) by the symbol and a count of how many times it repeats.

Example: data (24 bits) is

0000 0011 1111 1111 1000 0000
The runs are 6 zeros, 11 ones, and 7 zeros. Encoding each run as its 1-bit value followed by a 3-bit count (the run of 11 ones is split into 7 + 4, since a 3-bit count tops out at 7), the encoding is
0-110 1-111 1-100 0-111
The compression ratio is (24 - 16) / 24 = 1/3.
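A minimal run-length encoder along these lines; the 1-bit value plus 3-bit count packing matches the example above, and runs longer than 7 are split explicitly:

    def rle_encode(bits, max_run=7):
        """Encode a string of '0'/'1' characters as (value, run length) pairs,
        splitting runs so each count fits in 3 bits."""
        runs, i = [], 0
        while i < len(bits):
            value, length = bits[i], 1
            while i + length < len(bits) and bits[i + length] == value:
                length += 1
            i += length
            while length > max_run:              # e.g. 11 ones -> 7 + 4
                runs.append((value, max_run))
                length -= max_run
            runs.append((value, length))
        return runs

    data = "000000111111111110000000"            # the 24-bit example above
    runs = rle_encode(data)
    print(runs)            # [('0', 6), ('1', 7), ('1', 4), ('0', 7)]
    print(4 * len(runs))   # 16 bits encoded (1 value bit + 3 count bits per run)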

RLE is lossless.

RLE is good for compressing images with large uniform areas (scanned text: 8-to-1 compression).



Differential Pulse Code Modulation
(DPCM)

Compressed data starts with a reference symbol, then only lists the differences between successive symbols.

Example: data (24 bits) is

0000 0001 0001 0010 1001 1000
compressed data is 0 (reference symbol), +1, +0, +1, 9 (new symbol), -1. We encode this using 2 bits for each difference (00 = +0, 01 = +1, 11 = -1), with the code 10 indicating that a new 4-bit symbol follows.
10.0000 01 00 01 10.1001 11
Compression ratio is (24 - 20) / 24 = 1/6.
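A minimal DPCM encoder in the same spirit; the 2-bit difference codes and the 10 escape for a new 4-bit symbol follow the example above:

    def dpcm_encode(symbols):
        """Encode small integers as 2-bit differences from the previous symbol,
        escaping to a full 4-bit symbol when the difference does not fit."""
        codes = {0: "00", +1: "01", -1: "11"}
        out, previous = [], None
        for s in symbols:
            if previous is not None and (s - previous) in codes:
                out.append(codes[s - previous])        # 2-bit difference
            else:
                out.append("10" + format(s, "04b"))    # escape + new symbol
            previous = s
        return out

    data = [0x0, 0x1, 0x1, 0x2, 0x9, 0x8]        # the six nibbles from the example
    encoded = dpcm_encode(data)
    print(encoded)          # ['100000', '01', '00', '01', '101001', '11']
    print(sum(len(c) for c in encoded))          # 20 bits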



DPCM -- Continued

DPCM is lossless.

DPCM is like RLE, but it tolerates small variations in the data (e.g., real-world images).

DPCM is good for reducing the dynamic range of pixel values (adjacent pixels are usually similar).



Dictionary Encoding

Encode variable-length strings by their index in a table.

The table must be shared between compression and decompression: either

  1. encrypt-compress: use a standard (unchanging) table, or
  2. included-table: include the table in the compressed data.

("encrypt-compress") has been used for encryption, but is inefficient since the shared secret (the dictionary) is large. Also, this is subject to frequency analysis.

("included-table") has the advantage that the dictionary can be customized to the data.

Example: Lempel-Ziv (LZ) compression.



Computing a Dictionary

Sample the data to determine which bit strings occur frequently. Put those into the dictionary, and assign them short code words.

Encode less-frequently-occurring bit strings with longer code words, or with themselves.

(One of many possible algorithms).
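A minimal sketch of that algorithm, assuming fixed 16-bit words as in the example on the next slide; the code assignment is a simple frequency ranking, not a full Huffman construction:

    from collections import Counter

    # Codes in the style of the example: "0" for the most frequent word, then
    # "100", "101", "110"; "111" is reserved as the escape prefix for all others.
    SHORT_CODES = ["0", "100", "101", "110"]
    ESCAPE = "111"

    def build_dictionary(words):
        """Give the most frequent words the shortest codes."""
        counts = Counter(words)
        ranked = sorted(counts, key=counts.get, reverse=True)
        return dict(zip(ranked, SHORT_CODES))

    data = ("7f45 4c46 0102 0100 0000 0000 0000 0000 "
            "0002 0002 0000 0001 0001 3644 0000 0034 "
            "0014 3f38 0000 0000 0034 0020 0005 0028").split()
    print(build_dictionary(data))
    # {'0000': '0', ...} -- ties among the count-2 words may be ordered
    # differently than on the next slide.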



Computing a Dictionary -- Example

Data (384 bits):
7f45 4c46 0102 0100 0000 0000 0000 0000
0002 0002 0000 0001 0001 3644 0000 0034
0014 3f38 0000 0000 0034 0020 0005 0028
Dictionary:
word     frequency   code
0000     8           0
0001     2           100
0002     2           101
0034     2           110
others   1 (each)    111+word

This example assumes we have decided to encode 16-bit "words". Other possibilities include shorter words, longer words, or variable-length words.


Notes:


Data is the first 48 bytes from a Unix binary file.



Using the Dictionary -- Example

Data (384 bits):
7f45 4c46 0102 0100 0000 0000 0000 0000
0002 0002 0000 0001 0001 3644 0000 0034
0014 3f38 0000 0000 0034 0020 0005 0028
Dictionary (89 bits) -- note mixed notation:
01.0.0000 11.100.0001 11.101.0002
11.110.0034 00.11.111
Encoding (216 bits) -- note mixed notation:
111.7f45 111.4c46 111.0102 111.0100 0 0 0 0
101 101 0 100 100 111.3644 0 110
111.0014 111.3f38 0 0 110 111.0020
 111.0005 111.0028
Compression ratio is (384 - 216 - 89) / 384 = 21%.
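Continuing the sketch from "Computing a Dictionary", a minimal encoder using the table from the slide (the 89-bit encoding of the dictionary itself is not reproduced here):

    # Table from the slide: short codes for frequent 16-bit words; "111" escapes.
    table = {"0000": "0", "0001": "100", "0002": "101", "0034": "110"}
    ESCAPE = "111"

    data = ("7f45 4c46 0102 0100 0000 0000 0000 0000 "
            "0002 0002 0000 0001 0001 3644 0000 0034 "
            "0014 3f38 0000 0000 0034 0020 0005 0028").split()

    def dict_encode(words):
        """Replace each word by its short code, or by the escape prefix
        followed by the word itself (16 bits)."""
        return [table.get(w, ESCAPE + format(int(w, 16), "016b")) for w in words]

    encoded = dict_encode(data)
    payload = sum(len(code) for code in encoded)
    print(payload)                           # 216 bits
    print((384 - payload - 89) / 384)        # ~0.21, counting the 89-bit dictionary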



GIF image compression

Graphics Interchange Format: lossy compression for 24-bit color images.

  1. Identify the colors used in the picture.
  2. Build a table of 256 colors to match the colors in the picture; replace 24-bit pixels with 8-bit indices into the table (lossy; see the sketch below).
  3. Run LZ over result (sequences of pixels are the "strings").

Up to 10-to-1 compression ratio (90%).
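A minimal sketch of step 2 above (nearest-color lookup into a small palette); real GIF encoders choose the 256 palette entries more carefully (e.g. median cut) and then run LZ over the indices:

    def nearest_index(pixel, palette):
        """Index of the palette color closest to a 24-bit (r, g, b) pixel."""
        r, g, b = pixel
        return min(range(len(palette)),
                   key=lambda i: (palette[i][0] - r) ** 2
                               + (palette[i][1] - g) ** 2
                               + (palette[i][2] - b) ** 2)

    # Toy 4-entry palette standing in for the 256-entry GIF color table.
    palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
    pixels = [(250, 10, 5), (3, 2, 1), (200, 210, 190)]
    print([nearest_index(p, palette) for p in pixels])
    # Each 24-bit pixel becomes a small index (8 bits with a full 256-color
    # table), which is where the loss happens.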



JPEG image compression

Joint Photographic Experts Group: lossy compression for color images.

  1. Color (RGB or YUV) separation (the following steps apply to each color component).
  2. Divide into 8 * 8 blocks.
  3. Discrete Cosine Transform (DCT) on each block.
  4. Quantization (lossy).
  5. RLE/DPCM Encoding

The JPEG standard lets the compression ratio be traded off against image quality.


Notes:


How many people have heard of FFT?



JPEG DCT

Discrete Cosine Transform is similar to Fast Fourier Transform (FFT).

DCT computes spatial frequencies for a data set (for JPEG, for an 8 * 8 block).

Except for quantization errors, the spatial frequencies can be converted (using inverse DCT) back into the data set, so DCT is not lossy.

Intuition: spatial frequency (0, 0) is the "DC", "constant", or "offset" for the entire block (how light or dark the entire block is). Higher frequencies correspond to variations in the image: highest frequencies are the sharpest edges.
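A minimal, direct DCT-II for one 8 * 8 block (unoptimized; real codecs use fast factorizations, but the result is the same up to rounding):

    import math

    N = 8

    def dct_2d(block):
        """2-D type-II DCT; out[u][v] is the spatial frequency (u, v), and
        out[0][0] is the "DC" term (overall brightness of the block)."""
        out = [[0.0] * N for _ in range(N)]
        for u in range(N):
            for v in range(N):
                s = 0.0
                for x in range(N):
                    for y in range(N):
                        s += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
                cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
                cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
                out[u][v] = cu * cv * s
        return out

    # A perfectly uniform block has all of its energy in the (0, 0) component.
    flat = [[100] * N for _ in range(N)]
    print(round(dct_2d(flat)[0][0]))   # 800; every other coefficient is ~0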



JPEG Quantization

As judged by human viewers, the lowest spatial frequencies are most important for picture quality.

JPEG uses more bits for the low spatial frequency components. Only the more significant bits of higher spatial frequency components are retained.

The exact algorithm is to divide each spatial frequency (the result of the DCT phase) by a corresponding coefficient stored in a table. The table has small coefficients for low frequencies and large coefficients for high frequencies.

For example, the coefficient for (0,0) might be 4, so only the 2 least significant bits are lost. The coefficient for (7,7) might be 32, so the 5 least significant bits are lost.

JPEG specifies a set of quantization tables.
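A minimal sketch of quantization and dequantization; the table below is invented to mirror the example numbers above (4 at (0,0), 32 at (7,7)), not one of the standard JPEG tables:

    # Hypothetical table: small divisors for low frequencies (upper left),
    # large divisors for high frequencies (lower right).
    quant_table = [[4 * 2 ** ((u + v) // 4) for v in range(8)] for u in range(8)]

    def quantize(dct_block):
        """Divide each coefficient by its table entry and round; the bits below
        the divisor are discarded, which is where JPEG loses information."""
        return [[round(dct_block[u][v] / quant_table[u][v]) for v in range(8)]
                for u in range(8)]

    def dequantize(q_block):
        """Approximate reconstruction: multiply back by the same table."""
        return [[q_block[u][v] * quant_table[u][v] for v in range(8)]
                for u in range(8)]

    print(quant_table[0][0], quant_table[7][7])   # 4 and 32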



JPEG Encoding

JPEG uses a variant of run-length encoding in a diagonal pattern over the data produced by quantization.

The variant is that only the lengths of runs of zero values are encoded -- all other values are encoded as themselves. Since many of the higher-frequency components have been divided by large coefficients, they are likely to be zero.

Also, the DC coefficient is likely to be the same as in the previous block, and is encoded using DPCM.
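A minimal sketch of the diagonal (zig-zag) scan and the zero-run encoding; the actual JPEG bit syntax, size categories, Huffman tables, and end-of-block marker are omitted:

    def zigzag(block):
        """Read an 8 x 8 block along anti-diagonals, lowest frequencies first."""
        order = sorted(((u, v) for u in range(8) for v in range(8)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[u][v] for u, v in order]

    def zero_run_encode(coeffs):
        """Encode runs of zeros as ('Z', run length); other values as themselves."""
        out, run = [], 0
        for c in coeffs:
            if c == 0:
                run += 1
            else:
                if run:
                    out.append(("Z", run))
                    run = 0
                out.append(c)
        if run:
            out.append(("Z", run))   # JPEG would emit an end-of-block code here
        return out

    # After quantization, most high-frequency coefficients are zero.
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[1][0] = 50, -3, 2
    print(zero_run_encode(zigzag(block)))   # [50, -3, 2, ('Z', 61)]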


Notes:


Draw picture of diagonal pattern, also book figure 7.12.



Summary