Polar Codes Tutorial: Encoding, Decoding, and Polarization for BEC

Introduction to Polar Codes

Polar codes, introduced by Erdal Arıkan, are a class of error-correcting codes that achieve the symmetric capacity of memoryless channels. They are based on the concept of channel polarization, where multiple uses of a channel are transformed into synthetic channels that are either very reliable or very noisy. This tutorial walks through the encoding and decoding process using a binary erasure channel (BEC) example, similar to problems in an advanced coding theory assignment. We'll also explore the polarization theorem and its proof for the BEC, which is a classic topic in information theory.

Polar Code Encoding for N=4

Consider the polar code encoding circuit for N=4. The input bits U1, U2, U3, U4 are transformed into codeword bits X1, X2, X3, X4 via the encoding matrix G = [[1,0,0,0],[1,0,1,0],[1,1,0,0],[1,1,1,1]] (mod 2). In our example, U1 and U2 are frozen to 0, while U3 and U4 carry the message. The rate of the code is the number of message bits divided by the block length: R = 2/4 = 0.5.

Encoding Example

For message (U3, U4) = (1,1), we compute the codeword: X = U * G = [U1, U2, U3, U4] * G. With U1=0, U2=0, U3=1, U4=1, we get X = [0,0,1,1] * G = [1,1,1,1]. So the codeword is (1,1,1,1).

Successive Cancellation Decoding

Successive cancellation (SC) decoding processes bits one by one, using previously decoded bits and the channel outputs. For received vector (Y1,Y2,Y3,Y4) = (1,?,?,1) where '?' denotes an erasure, we attempt to decode U1 and U2 (frozen to 0) and then U3, U4. The decoding succeeds only if the channel outputs are consistent with the frozen bits. In this case, we can compute likelihoods and find that the decoded message is (U3,U4) = (1,1).

When Decoding Fails

If we change the frozen set: U2 and U3 frozen to 0, U1 and U4 as message bits, and receive (1,?,?,0), SC decoding fails at U1. This is because the erasures prevent reliable estimation of U1. This demonstrates the suboptimality of SC decoding compared to maximum likelihood (ML) decoding.

Maximum Likelihood Decoding

ML decoding considers all possible messages. For the same received vector (1,?,?,0), we compute all four possible codewords: (U1,U4) = (0,0) gives (0,0,0,0); (0,1) gives (1,1,1,1); (1,0) gives (1,0,1,0); (1,1) gives (0,1,0,1). Only (1,0) and (1,1) are consistent with the received vector? Actually, we need to check which codewords could produce the received pattern. The erasures allow multiple possibilities. In this case, both (1,0) and (1,1) are possible, so ML decoding declares failure. Thus, SC also fails, but ML can sometimes succeed when SC fails.

Proving Polarization for the BEC

The polarization theorem states that as the number of channel uses goes to infinity, the fraction of noiseless channels approaches the channel capacity C, and the fraction of useless channels approaches 1-C. For the BEC with erasure probability p, the capacity is C = 1-p. We define the mediocrity M(W) = p * (1-p) for a BEC with erasure probability p. This function measures how far the channel is from being perfect (M=0) or completely noisy (M=0 also? Actually M=0 when p=0 or p=1).

Polarized Channels

After one polarization step, we get two synthetic channels: W+ (better) and W- (worse). For a BEC, both are also BECs with erasure probabilities p^2 and 2p - p^2, respectively. Their mediocrities are M(W+) = p^2(1-p^2) and M(W-) = (2p-p^2)(1-2p+p^2).

Distribution of Erasure Probabilities

Consider a binary tree where each node represents a channel. Starting from the original BEC with erasure p, each step chooses W+ or W- with equal probability. After m steps, we have 2^m channels. The distribution F_m of erasure probabilities is the distribution of the random variable obtained by this process. For m=1, we have two channels with probabilities p^2 and 2p-p^2, each with probability 1/2. For m=2, we have four channels: p^4, 2p^2-p^4, 2p^2-p^4? Actually careful: The recursion is: if a channel has erasure q, then its children have erasures q^2 and 2q-q^2. So starting from p, we get p^2 and 2p-p^2. Then from p^2 we get p^4 and 2p^2-p^4; from 2p-p^2 we get (2p-p^2)^2 and 2(2p-p^2)-(2p-p^2)^2. So F2 has four values each with probability 1/4.

Average Mediocrity

The average mediocrity M_m = E[M(W)] under F_m. For m=0, M_0 = p(1-p). For m=1, M_1 = (1/2)[p^2(1-p^2) + (2p-p^2)(1-2p+p^2)]. It can be shown that M_1 = p(1-p) * (1 - 2p(1-p)). So M_1 < M_0 for p in (0,1). In fact, M_m decreases to 0 as m increases.

Convergence to Zero

Define ρ = 2p(1-p). Then M_1 = M_0 * (1-ρ). By induction, M_m ≤ M_0 * (1-ρ)^m = p(1-p) * (1-2p(1-p))^m. Since ρ > 0 for p in (0,1), M_m → 0 exponentially. This implies that the fraction of channels with mediocrity larger than any positive threshold goes to zero.

Fraction of Good and Bad Channels

Let g(m,ϵ) be the fraction of channels with erasure probability < ϵ (good), and b(m,ϵ) be the fraction with erasure > 1-ϵ (bad). The expected erasure probability under F_m is always p. Therefore, p = E[erasure] ≥ (1-ϵ) * b(m,ϵ) (since bad channels have erasure > 1-ϵ). So b(m,ϵ) ≤ p/(1-ϵ). As m→∞, the fraction of mediocre channels (erasure between ϵ and 1-ϵ) tends to 0, so g(m,ϵ) + b(m,ϵ) → 1. Since b(m,ϵ) ≤ p/(1-ϵ), we get g(m,ϵ) ≥ 1 - p/(1-ϵ). For any ϵ, the limit g(ϵ) = liminf g(m,ϵ) satisfies g(ϵ) ≥ 1-p/(1-ϵ). Taking ϵ→0, we get g(ϵ) ≥ 1-p = C. Thus, the fraction of good channels is at least the capacity.

Convexity of Rate Distortion Function

Another important concept in information theory is the rate distortion function, which is convex in the distortion parameter. This follows from the convexity of mutual information in the conditional distribution p(y|x) for fixed p(x). Using the log-sum inequality, one can show that the Kullback-Leibler divergence D(p||q) is convex in (p,q). Then, writing I(X;Y) = D(p(x,y)||p(x)p(y)), and noting that p(x,y) = p(y|x)p(x), we can prove convexity in p(y|x).

Conclusion

Polar codes are a fascinating area of coding theory with practical applications in 5G wireless communications. Understanding their encoding and decoding, as well as the polarization phenomenon, is essential for students of information theory. This tutorial provided hands-on examples from a typical homework assignment, illustrating both the mechanics and the theory behind polar codes.