Rate Distortion Theory Tutorial: Shannon Lower Bound & Independent Sources

Understanding Rate Distortion Theory

Rate distortion theory addresses the fundamental limits of lossy compression. For a continuous random variable X with mean zero and variance σ², the rate-distortion function R(D) for mean-squared distortion quantifies the minimum number of bits per symbol needed to achieve distortion D. This tutorial walks through key concepts from Ee276 homework #8 p0, including the Shannon lower bound, rate distortion for uniform sources with Hamming distortion, and compression of independent sources. These ideas are not just theoretical—they underpin modern codecs like those used in streaming services (e.g., Netflix adapting video quality to bandwidth) and even in AI model compression for deploying large language models on edge devices.

Shannon Lower Bound

Part (a) asks to show the lower bound for R(D). For a continuous source with variance σ² and MSE distortion, the Shannon lower bound states:

R(D) ≥ (1/2) log(σ²/D)   for 0 ≤ D ≤ σ²

This bound is derived by considering that the mutual information I(X; X̂) is at least the entropy of X minus the entropy of the error. For Gaussian sources, this bound is tight, meaning Gaussian sources are the hardest to compress (require the most bits) at a given distortion among all sources with the same variance. In the context of today's AI trends, this is analogous to how compressing a high-entropy image (like a detailed landscape) requires more bits than a low-entropy one (like a solid color) for the same quality.

Upper Bound via Joint Distribution

Part (b) uses the joint distribution in Figure 1 to derive an upper bound. The distribution involves a test channel where X̂ = (σ² - D)/σ² X + Z, with Z independent Gaussian with variance D(σ² - D)/σ². This yields:

R(D) ≤ (1/2) log(σ²/D)

Combining with the lower bound, we get R(D) = (1/2) log(σ²/D) for Gaussian sources. For non-Gaussian sources, the lower bound is not necessarily achievable, so Gaussian sources are indeed harder to describe: they require higher rate for the same distortion. This is why many compression algorithms assume Gaussian models for worst-case performance.

Rate Distortion for Uniform Source with Hamming Distortion

Now consider a source X uniformly distributed on {1,2,…,m} with Hamming distortion (d(x,x̂)=0 if x=x̂, else 1). This models binary symmetric channels or discrete alphabet compression.

Zero Distortion Case

When D=0, we need lossless compression. The rate must be at least H(X)=log m. Thus R(0)=log m. For D≥1-1/m, we can always guess a constant, so R(D)=0.

Lower Bound via Fano's Inequality

For 0

I(X;X̂) = H(X) - H(X|X̂) ≥ log m - [H_b(D) + D log(m-1)]

Thus R(D) ≥ log m - H_b(D) - D log(m-1).

Achievability

To achieve this bound, use a test channel where with probability 1-D we output X, and with probability D we output a uniform random symbol from the other m-1 values. This yields I(X;X̂)=log m - H_b(D) - D log(m-1), so the lower bound is tight.

Final Rate-Distortion Function

Thus:

R(D) = { log m - H_b(D) - D log(m-1)   for 0 ≤ D ≤ 1-1/m; 0 for D ≥ 1-1/m }

This result is used in designing quantization for discrete sources, such as in JPEG compression where DCT coefficients are quantized.

Compression of Two Independent Sources

Can we compress two independent sources better together than separately? Let {X_i} iid ~ p(x) with rate distortion function R_X(D), and {Y_i} iid ~ p(y) independent. We want to describe the pair (X_i,Y_i) with distortions D1 and D2.

Lower Bound

By the data processing inequality and independence, R_{X,Y}(D1,D2) ≥ R_X(D1)+R_Y(D2). This is because the sum of rates for each source individually is a lower bound for the joint rate. For example, if you compress a video stream (X) and an audio stream (Y) separately, the total rate is at least the sum of their individual rate-distortion functions.

Equality Condition

Equality holds if the sources are independent and the distortion measures are separable. In that case, we can compress each source independently without loss of optimality. However, if there is any dependency or if the distortion measure couples the sources, joint compression may yield gains. This is relevant in modern applications like compressing multi-view video or stereo audio where channels are correlated.

Distortion-Rate Function

The distortion-rate function D(R) is the inverse of R(D). It is decreasing and convex in R. Decreasing: as rate increases, distortion decreases. Convex: diminishing returns—each additional bit reduces distortion less. This property is used in rate-distortion optimization for video encoding (e.g., H.265) where encoder chooses quantization parameters to minimize distortion for a given bitrate.

Proof Steps

Given i.i.d. sources and a code with rate R, the expected distortion is at least D(R). The steps involve using the joint distribution of the source and reconstruction, applying the data processing inequality, and the convexity of the rate-distortion function. These steps are typical in proving the converse of the rate-distortion theorem.

Conclusion

Rate distortion theory provides the theoretical foundation for lossy compression. From the Shannon lower bound to the rate-distortion function for uniform sources, these concepts are essential for understanding modern compression algorithms. The independence result shows that for independent sources with separable distortion, separate compression is optimal—a key insight for system design. As we move into an era of AI-generated content and virtual reality, rate distortion theory continues to guide efficient representation of data.