Channel Capacity and Probability of Error Tutorial

Introduction to Channel Coding and Probability of Error

In information theory, the goal of a communication system is to reliably transmit a message from a source to a destination over a noisy channel. The probability of error (Pe) is a key performance metric. For a fixed codebook, the optimal decoder that minimizes Pe is the maximum a posteriori (MAP) decoder, which selects the message with the highest posterior probability given the received sequence. This principle is foundational for understanding channel capacity and coding theorems.

MAP Decoder Minimizes Error Probability

Given a codebook with M codewords, the decoder observes Y^n and outputs an estimate Ĵ. The probability of error is P(Ĵ ≠ J). To minimize Pe, we choose Ĵ that maximizes P(J = j | Y^n = y^n). This is derived from the law of total probability and the fact that the MAP rule minimizes the probability of misclassification. In practice, for a memoryless channel, this reduces to maximizing the likelihood function when the prior is uniform.

The Two-Look Gaussian Channel: Capacity with Correlated Noise

Consider a Gaussian channel with two correlated observations Y1 = X + Z1 and Y2 = X + Z2, where (Z1, Z2) are jointly Gaussian with zero mean and covariance matrix K = σ^2 [[1, ρ], [ρ, 1]]. The input X has power constraint P. The capacity C depends on the correlation coefficient ρ.

Case (a): ρ = 1

When ρ = 1, the noises are perfectly correlated: Z1 = Z2. Then Y1 = X + Z, Y2 = X + Z, so both outputs are identical. The channel effectively provides one observation. The capacity is the same as a standard AWGN channel: C = (1/2) log(1 + P/σ^2) bits per channel use.

Case (b): ρ = 0

When ρ = 0, the noises are independent. The channel is a vector Gaussian channel with two independent observations. The capacity is C = (1/2) log(1 + 2P/σ^2). This is higher than the ρ=1 case because the two looks provide independent noise realizations, effectively doubling the SNR.

Case (c): ρ = -1

When ρ = -1, the noises are perfectly negatively correlated: Z1 = -Z2. Then Y1 = X + Z, Y2 = X - Z. By averaging, we can cancel the noise: (Y1 + Y2)/2 = X. The channel becomes noiseless, and capacity is infinite (or limited only by power constraint). In practice, with power constraint, capacity is unbounded because we can transmit an infinite number of bits per use by using the noiseless combination.

Output Power Constraint in AWGN Channels

Consider an AWGN channel Y = X + Z, Z ~ N(0, σ^2), with an expected output power constraint E[Y^2] ≤ P. This is different from the usual input power constraint. Since E[Y^2] = E[X^2] + σ^2, the output constraint implies E[X^2] ≤ P - σ^2, assuming X and Z independent. The capacity is then C = (1/2) log(1 + (P - σ^2)/σ^2) = (1/2) log(P/σ^2) bits per channel use, provided P > σ^2. This scenario arises in systems where the receiver has a power limitation, such as in energy harvesting or sensor networks.

Gaussian Mutual Information for Markov Chains

Suppose (X, Y, Z) are jointly Gaussian and form a Markov chain X → Y → Z. Let ρ1 be the correlation between X and Y, and ρ2 between Y and Z. Then the mutual information I(X; Z) can be computed using the formula for bivariate normal: I(X; Z) = -(1/2) log(1 - ρ_XZ^2), where ρ_XZ = ρ1 * ρ2 because the chain implies conditional independence. Thus I(X; Z) = -(1/2) log(1 - ρ1^2 ρ2^2). This result is useful in understanding information flow in cascaded Gaussian channels.

Bottleneck Channel: Capacity Limited by Alphabet Size

The bottleneck channel consists of a transition X → V → Y, where X and Y have alphabet size m, and V has alphabet size k. The overall channel is p(y|x) = Σ_v p(v|x) p(y|v). The capacity C ≤ log k, because the mutual information I(X;Y) ≤ I(X;V) ≤ H(V) ≤ log k. This bound is tight when V can be chosen to carry all information. This principle is used in data compression and feature extraction in machine learning.

Joint Typicality and Its Bounds

Joint typicality extends the concept of typical sequences to multiple random variables. For i.i.d. triples (Xi, Yi, Zi) ~ p(x,y,z), a triple (x^n, y^n, z^n) is jointly typical if its empirical distribution is close to the true distribution. The probability that independent sequences (drawn from product of marginals) appear jointly typical is bounded by approximately 2^{-n I(X;Y;Z)} where I(X;Y;Z) is the multivariate mutual information. Specifically, P( (X^n, Y^n, Z^n) jointly typical ) ≈ 2^{-n (H(X)+H(Y)+H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z)) }.

This concept is central to the proof of the channel coding theorem and rate-distortion theory. It ensures that with high probability, the transmitted codeword and received sequence are jointly typical, while any other codeword is not.

Real-World Applications and Trends

Understanding channel capacity and error probability is crucial for designing 5G/6G wireless systems, satellite communications, and deep space networks. For example, in autonomous vehicles, reliable communication between sensors and the central processor requires minimizing error probability under power constraints. The two-look Gaussian channel models diversity combining in MIMO systems, where multiple antennas provide correlated or uncorrelated observations.

In machine learning, the bottleneck channel relates to the information bottleneck method, which compresses data while preserving relevant information. Joint typicality is used in variational autoencoders and contrastive learning to align representations.

Key Takeaways

The MAP decoder minimizes probability of error for a fixed codebook.
Correlation between noise components in multi-look channels significantly affects capacity.
Output power constraints can limit capacity differently than input constraints.
Mutual information in Gaussian Markov chains simplifies to a function of correlation coefficients.
The bottleneck channel capacity is bounded by the log of the intermediate alphabet size.
Joint typicality provides exponential bounds on the probability of false acceptance in coding.

Master these concepts to excel in information theory and its applications to modern communication and AI systems.