Programming lesson
Understanding Time-Varying Channels and Joint Typicality in Coding Theory
Explore time-varying discrete memoryless channels, jointly typical sequences, and Fano's inequality with practical examples from modern communications and AI.
Introduction to Time-Varying Channels
In modern communication systems, channels often change over time due to fading, interference, or mobility. Understanding time-varying discrete memoryless channels (DMCs) is crucial for designing robust codes. This tutorial builds on concepts from Ee276 homework #6 p0, focusing on the binary symmetric channel (BSC) with time-varying crossover probabilities. We'll connect these ideas to real-world trends like 5G networks, AI-driven adaptive coding, and even gaming latency optimization.
Channel Capacity and Mutual Information
For a time-varying DMC where each use has a BSC with crossover probability δ_i, the mutual information I(X;Y) is maximized by an input distribution that may depend on the channel state. The problem asks to show that max I(X;Y) = (1/n) Σ C(δ_i) where C(δ)=1-H(δ) is the capacity of a BSC. Using a chain of inequalities similar to the converse proof, we bound I(X;Y) by the sum of per-use capacities. This result is foundational for adaptive modulation and coding in Wi-Fi and 5G.
Proof Sketch
By the data processing inequality and chain rule, I(X;Y) ≤ Σ I(X_i;Y_i) ≤ Σ max_{P_X} I(X_i;Y_i) = Σ C(δ_i). Equality is achieved by independent uniform inputs. This shows that the capacity of the time-varying channel is the average of per-time capacities, a key insight for resource allocation in dynamic environments.
Jointly Typical Sequences
Joint typicality is the cornerstone of random coding proofs. For a BSC with crossover probability 0.1 and uniform input, the joint distribution P(x,y) is given. We compute H(X)=1 bit, H(Y)=1 bit, H(X,Y)=1+ H(0.1) ≈ 1.469 bits, and I(X;Y)=1 - H(0.1) ≈ 0.531 bits. The jointly typical set A_ε^{(n)} includes sequence pairs (x^n, y^n) whose empirical statistics are close to the true distribution. For n=25 and ε=0.2, we calculate the size of the typical set using the provided table. The number of sequences with Hamming weight k is given; only those with k between about 1 and 11 are typical (since H(0.1) ≈ 0.469, typical sequences have k/n ≈ 0.1). Summing the counts for k=1 to 11 yields the size ≈ 24,000,000.
Additive Channel Interpretation
The BSC can be modeled as Y = X ⊕ Z, where Z ~ Bernoulli(0.1). Then (x^n, y^n) is jointly typical iff x^n is typical and z^n = y^n ⊕ x^n is typical. This simplifies analysis and connects to error-correcting codes used in SSDs and space communications.
Probability of Error in Joint Typical Decoding
Consider a random code with M=512 codewords of length n=25. For a fixed received sequence y^n (e.g., all zeros), the probability that a randomly chosen codeword (uniform over {0,1}^n) is jointly typical with y^n equals the probability that the error pattern z^n is typical, which is about 2^{-n I(X;Y)} ≈ 2^{-13.275}. Using the union bound, the probability that any of the other 511 codewords is jointly typical is at most 511 * 2^{-13.275} ≈ 0.025. The total error probability includes the event that the transmitted codeword is not typical with the received sequence (small) plus the event of a false match. This yields Pe ≈ 0.025, demonstrating that random coding achieves reliable communication at rates below capacity.
BSC with Feedback
Feedback can increase capacity? For a BSC with p, using X1 ~ Bernoulli(1/2), X2 = Y1, ..., Xn = Y_{n-1}, the mutual information per transmission approaches 1 - H(p) as n→∞, which is the capacity without feedback. So feedback does not increase capacity for memoryless channels, a classic result. However, feedback simplifies coding schemes and reduces delay, relevant in real-time gaming and video streaming.
Fano's Inequality
Fano's inequality relates the probability of error Pe to the conditional entropy H(X|Y). In the absence of conditioning, we maximize H(X) given a fixed Pe = 1 - p_1. The maximizing distribution is uniform over the remaining m-1 symbols with total probability Pe, yielding H(X) ≤ H(Pe) + Pe log(m-1). This gives a lower bound on Pe in terms of H(X), fundamental for understanding the trade-off between compression and error.
Conclusion
These concepts—time-varying channels, joint typicality, feedback, and Fano's inequality—are pillars of information theory. They underpin modern communication standards like LTE, Wi-Fi 6, and satellite links. As AI and machine learning increasingly optimize channel coding, understanding these fundamentals becomes even more vital. Whether you're designing a 5G modem or a neural network for channel decoding, these principles guide performance limits and practical algorithms.