Strong Law of Large Numbers & Birkhoff Ergodic Theorem Tutorial

Introduction: Why Ergodic Theory Matters in 2026

In today's data-driven world, from AI recommendation systems to financial market predictions, the laws of large numbers and ergodic theorems are more relevant than ever. For instance, when you train a neural network on a massive dataset, you rely on the idea that sample averages converge to true expectations: that is the strong law of large numbers in action. Similarly, if you're analyzing a time series of stock returns, Birkhoff's ergodic theorem guarantees that time averages converge almost surely for stationary processes, and that the limit equals the ensemble average when the process is also ergodic. This tutorial will guide you through the core problems of Math154 Homework 7, focusing on the strong law, the maximal ergodic theorem, and Birkhoff's theorem, using clear explanations and connections to modern applications.

Problem 7.1: A Sequence of Independent Random Variables with Zero Mean

Part (a): Checking Independence and Zero Mean

We consider the probability space Ω = [0,1] with Lebesgue measure. For n ≥ 2 (so that log n > 0), define a_n = 1/(n log n) and X_n(x) = n * 1_{[0, a_n/2]}(x) - n * 1_{[1 - a_n/2, 1]}(x). Each X_n takes the values n, -n, and 0. The first task is to check that (X_n) is a sequence of independent random variables with zero mean and variance n/log n.

Independence needs care. Read literally, the formula puts every X_n on the same space [0,1], and the intervals [0, a_n/2] are nested, so for m < n the event {X_n = n} is contained in {X_m = m}. Then P[X_m = m, X_n = n] = a_n/2, which differs from the product (a_m/2)(a_n/2), so the variables as written are not independent. The assignment nevertheless says "Xn is a sequence of independent random variables", so we read the formula as prescribing only the distribution of each X_n, namely P[X_n = n] = P[X_n = -n] = a_n/2 and P[X_n = 0] = 1 - a_n, and we assume the X_n are realized independently, for example as coordinates on a product space.

The mean is zero: E[X_n] = n * (a_n/2) - n * (a_n/2) = 0. Because the mean is zero, the variance equals the second moment: Var(X_n) = E[X_n^2] = n^2 * (a_n/2) + n^2 * (a_n/2) = n^2 a_n = n^2/(n log n) = n/log n.

Part (b): Weak Law of Large Numbers Holds

We need to check that P[|S_n/n| ≥ ε] → 0 for every ε > 0, where S_n = X_2 + ... + X_n (the index starts at 2 because log 1 = 0 leaves a_1 undefined). Since the variables are independent with zero mean, the variances add and Chebyshev's inequality gives P[|S_n/n| ≥ ε] ≤ Var(S_n)/(n^2 ε^2) = (∑ Var(X_i))/(n^2 ε^2). It remains to estimate ∑_{i=2}^n i/log i. Comparing with the integral of x/log x shows that this sum is asymptotic to n^2/(2 log n). So Var(S_n)/n^2 = (1/n^2) * O(n^2/log n) = O(1/log n) → 0. Hence the weak law holds, although the bound decays only logarithmically.
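The Chebyshev estimate above can be checked numerically. The following is a minimal sketch, assuming Var(X_i) = i/log i for i ≥ 2 as computed in Part (a); the function names and the tolerance eps = 0.5 are illustrative choices, not part of the assignment.

```python
import math


def variance_of_sum(n):
    """Var(S_n) = sum_{i=2}^n i / log(i), using independence and Var(X_i) = i / log(i)."""
    return sum(i / math.log(i) for i in range(2, n + 1))


def chebyshev_bound(n, eps):
    """Chebyshev upper bound Var(S_n) / (n^2 eps^2) on P[|S_n / n| >= eps]."""
    return variance_of_sum(n) / (n * n * eps * eps)


if __name__ == "__main__":
    eps = 0.5  # illustrative tolerance (assumption)
    for n in (10**2, 10**3, 10**4, 10**5):
        # Ratio of the exact variance sum to the asymptotic value n^2 / (2 log n).
        ratio = variance_of_sum(n) / (n * n / (2 * math.log(n)))
        print(n, chebyshev_bound(n, eps), ratio)
```

The printed bound should decrease slowly with n, which matches the O(1/log n) rate: the weak law holds here, but only barely.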

Part (c): Strong Law Fails

The strong law would require P[lim S_n/n = 0] = 1. Instead we show that P[|S_n/n| ≥ 1/2 infinitely often] = 1. We have P[X_n = n] = a_n/2 = 1/(2n log n), and ∑ 1/(n log n) diverges (compare with the integral of 1/(x log x), whose antiderivative is log log x). The events {X_n = n} are independent, so the second Borel-Cantelli lemma gives P[X_n = n infinitely often] = 1. Now X_n = S_n - S_{n-1}, so whenever X_n = n we have n ≤ |S_n| + |S_{n-1}|, and therefore |S_n| ≥ n/2 or |S_{n-1}| ≥ n/2 > (n-1)/2. Either way |S_k/k| ≥ 1/2 for k = n or k = n-1. Almost surely this happens for infinitely many k, so S_n/n does not converge to 0 and the strong law fails.
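A small simulation can make the failure mechanism visible. This is only a sketch under stated assumptions: the X_n are drawn independently with P[X_n = n] = P[X_n = -n] = 1/(2n log n), indices start at n = 2 where log n > 0, and the helper names `sample_x` and `simulate` are hypothetical. Jumps are rare (their expected number grows like log log n), so a given run may show only a few of them.

```python
import math
import random


def sample_x(n, rng):
    """Draw X_n: +n or -n with probability 1/(2 n log n) each, otherwise 0 (n >= 2)."""
    p = 1.0 / (2.0 * n * math.log(n))
    u = rng.random()
    if u < p:
        return n
    if u < 2.0 * p:
        return -n
    return 0


def simulate(n_max, seed=0):
    """Return (n, S_{n-1}, S_n) for every index n at which |X_n| = n."""
    rng = random.Random(seed)
    s_prev = 0  # S_1 = 0 because the sequence starts at n = 2
    jumps = []
    for n in range(2, n_max + 1):
        s = s_prev + sample_x(n, rng)
        if s != s_prev:  # X_n is nonzero exactly when |X_n| = n
            jumps.append((n, s_prev, s))
        s_prev = s
    return jumps


if __name__ == "__main__":
    for n, s_prev, s in simulate(10**5):
        # At each jump, at least one of these two ratios has absolute value >= 1/2.
        print(n, s_prev / (n - 1), s / n)
```

Each jump of size n forces |S_n|/n or |S_{n-1}|/(n-1) to be at least 1/2, which is the inequality used in the argument above.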

Problem 7.2: Maximal Ergodic Theorem of Hopf

The maximal ergodic theorem is a key tool for proving Birkhoff's theorem. Let T be a measure-preserving transformation on a probability space (X, μ). For f ∈ L^1, write S_n(x) = f(x) + f(Tx) + ... + f(T^{n-1} x) and let E = {x : sup_{n≥1} S_n(x) > 0}. Hopf's theorem states that ∫_E f dμ ≥ 0. Applying it to f - λ gives the maximal inequality: with f^*(x) = sup_{n≥1} S_n(x)/n, for any λ > 0 we have λ μ({x : f^*(x) > λ}) ≤ ∫_{f^* > λ} f dμ ≤ ∫ |f| dμ. A short proof, due to Garsia, sets M_n = max(0, S_1, ..., S_n), checks that f ≥ M_n - M_n∘T on the set {M_n > 0}, and integrates over that set using the invariance of μ. The notes likely provide a step-by-step proof. Make sure you understand how the theorem is applied to shifted functions such as f - λ, since that is the step that yields almost everywhere convergence in Birkhoff's theorem.
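One common form of Hopf's maximal ergodic theorem says that ∫_{E_n} f dμ ≥ 0, where E_n = {x : max_{1≤k≤n} S_k(x) > 0} and S_k(x) = f(x) + f(Tx) + ... + f(T^{k-1} x). The sketch below checks this on a toy system chosen purely for illustration: a finite set with uniform measure, T a permutation (which preserves that measure), and integer-valued f so that the sums are exact. The name `hopf_integral` is hypothetical.

```python
import random


def hopf_integral(f, perm, n):
    """Sum of f over E_n = {x : max_{1<=k<=n} S_k(x) > 0}.

    f is a list of values on {0, ..., N-1}, perm encodes T as x -> perm[x],
    and S_k(x) = f(x) + f(Tx) + ... + f(T^{k-1} x). Requires n >= 1.
    """
    total = 0
    for x in range(len(f)):
        s, y, best = 0, x, None
        for _ in range(n):
            s += f[y]
            y = perm[y]
            best = s if best is None else max(best, s)
        if best > 0:  # x belongs to E_n
            total += f[x]
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    size = 50
    perm = list(range(size))
    rng.shuffle(perm)  # a random permutation preserves the uniform measure
    f = [rng.randint(-5, 5) for _ in range(size)]
    for n in (1, 2, 5, 20):
        print(n, hopf_integral(f, perm, n))  # each value should be nonnegative
```

Dividing the returned sum by the size of the set gives the integral with respect to the uniform probability measure; only its sign matters for the theorem.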

Problem 7.3: Birkhoff's Ergodic Theorem

Birkhoff's ergodic theorem is a cornerstone of ergodic theory. It says: If T is a measure-preserving transformation on a probability space and f ∈ L^1, then the time averages (1/n) ∑_{k=0}^{n-1} f(T^k x) converge almost everywhere to the conditional expectation E[f | ℐ], where ℐ is the sigma-algebra of invariant sets. When T is ergodic, ℐ contains only sets of measure 0 or 1, and the limit is the constant ∫ f dμ. The proof uses the maximal ergodic theorem to show that, for any rationals a < b, the invariant set where the liminf of the averages is below a and the limsup is above b has measure zero, so the limsup and liminf agree almost everywhere. Understanding every step is crucial, from the definition of the invariant sigma-algebra to the application of the maximal inequality.
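To see the theorem at work, one can compute time averages for a concrete ergodic system. The sketch below uses the circle rotation T(x) = x + α mod 1 with the illustrative choices α = √2 - 1 and f the indicator of [0, 1/2); neither choice comes from the assignment. Since this rotation is ergodic, the averages should approach the space average 1/2.

```python
import math


def birkhoff_average(f, alpha, x, n):
    """Time average (1/n) * sum_{k=0}^{n-1} f(T^k x) for T(x) = x + alpha mod 1."""
    total = 0.0
    for k in range(n):
        total += f((x + k * alpha) % 1.0)
    return total / n


def indicator_half(x):
    """Indicator function of [0, 1/2); its integral over [0, 1] is 1/2."""
    return 1.0 if x < 0.5 else 0.0


if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # an irrational rotation number (illustrative)
    for n in (10, 100, 1000, 10000):
        print(n, birkhoff_average(indicator_half, alpha, 0.1, n))
```

Floating-point rounding in `x + k * alpha` is harmless at these sizes, but for very long orbits an exact or higher-precision representation would be safer.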

Problem 7.4: History of Birkhoff's Ergodic Theorem and Harvard Connection

George David Birkhoff, a prominent American mathematician, proved the ergodic theorem in 1931 while at Harvard University. The theorem emerged from the ergodic hypothesis in statistical mechanics, which states that time averages equal space averages for Hamiltonian systems. Birkhoff's work built on ideas of Poincaré and on von Neumann's mean ergodic theorem, which von Neumann had proved shortly before but which appeared in print only in 1932. The connection to Harvard is significant: Birkhoff spent most of his career there, and the theorem solidified Harvard's reputation as a center for dynamical systems research. Today, the theorem is used in fields from economics to ecology.

Problem 7.5: Circle Rotations and Diophantine Approximation

Part (a): Coboundaries

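Let α be irrational and T(x) = x + α mod 1, and write S_n(x) = ∑_{k=0}^{n-1} f(T^k x). If f is a coboundary, meaning f(x) = g(x+α) - g(x) for some measurable g, then the sum telescopes: S_n(x) = g(x+nα) - g(x). If g is bounded, then |S_n(x)| ≤ 2 sup|g| for every x, so S_n/n → 0 pointwise at rate O(1/n). If g is only assumed integrable, then f has zero mean and Birkhoff's theorem still gives S_n/n → 0 almost everywhere, but the sums S_n themselves may be unbounded.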
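The telescoping identity is easy to verify numerically. In this sketch the transfer function g(x) = sin(2πx) and the rotation number α = (√5 - 1)/2 are hypothetical choices made only for illustration.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # illustrative irrational rotation number


def g(x):
    """Bounded, 1-periodic transfer function (hypothetical choice)."""
    return math.sin(2 * math.pi * x)


def f(x):
    """Coboundary f(x) = g(x + alpha) - g(x)."""
    return g(x + ALPHA) - g(x)


def birkhoff_sum(x, n):
    """S_n(x) = sum_{k=0}^{n-1} f(x + k * alpha); g is 1-periodic, so no mod 1 is needed."""
    return sum(f(x + k * ALPHA) for k in range(n))


if __name__ == "__main__":
    x = 0.3
    for n in (1, 10, 100, 1000):
        s = birkhoff_sum(x, n)
        # Columns: n, S_n, telescoped value g(x + n*alpha) - g(x), and S_n / n.
        print(n, s, g(x + n * ALPHA) - g(x), s / n)
```

The second and third printed columns should agree up to rounding, and S_n stays within [-2, 2] because |g| ≤ 1.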

Part (b): Weyl Sums

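The sum S_n is known as a Weyl sum. For integrable f with zero mean, Birkhoff's theorem says S_n/n → 0 almost everywhere, since rotation by an irrational α is ergodic. If α = p/q is rational (in lowest terms), the system is not ergodic: every orbit is periodic with period q, and the averages S_n/n converge to (1/q) ∑_{j=0}^{q-1} f(x + j/q), a function that depends on the orbit of x.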
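The contrast between rational and irrational rotation numbers can be seen in a short experiment. The test function f(x) = cos(8πx) and the values α = 1/4 and α = √2 - 1 are illustrative assumptions. This f has zero mean and is invariant under rotation by 1/4, so for α = 1/4 the time average equals f(x) and depends on the starting point, while for the irrational α it should tend to 0.

```python
import math


def time_average(f, alpha, x, n):
    """(1/n) * sum_{k=0}^{n-1} f(x + k * alpha), for a 1-periodic function f."""
    return sum(f(x + k * alpha) for k in range(n)) / n


def f(x):
    """Zero-mean test function cos(8 pi x); it satisfies f(x + 1/4) = f(x)."""
    return math.cos(8 * math.pi * x)


if __name__ == "__main__":
    n = 4000
    for x in (0.0, 0.05, 0.1):
        rational = time_average(f, 0.25, x, n)  # non-ergodic case
        irrational = time_average(f, math.sqrt(2) - 1, x, n)  # ergodic case
        # Columns: x, average for alpha = 1/4, f(x), average for irrational alpha.
        print(x, rational, f(x), irrational)
```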

Part (c): Independence? No

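The random variables X_n = f(T^n x) are not independent: each one is a deterministic function of the same initial point x, so they are tied together by the dynamics. However, the sequence is stationary, and it is ergodic when α is irrational. Thus Birkhoff's theorem gives the strong law for these variables, and since almost everywhere convergence implies convergence in probability, the weak law holds as well.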
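A quick correlation computation makes the dependence concrete. As an illustrative choice that is not taken from the assignment, let f(x) = cos(2πx) with x uniform on [0,1], so that X_n = cos(2π(x + nα)). Using the product-to-sum formula for cosines:

```latex
\mathbb{E}[X_0 X_n]
  = \int_0^1 \cos(2\pi x)\,\cos\bigl(2\pi(x + n\alpha)\bigr)\,dx
  = \frac{1}{2}\int_0^1 \Bigl[\cos(2\pi n\alpha) + \cos(4\pi x + 2\pi n\alpha)\Bigr]\,dx
  = \frac{1}{2}\cos(2\pi n\alpha),
\qquad
\mathbb{E}[X_0]\,\mathbb{E}[X_n] = 0 .
```

For irrational α and n ≥ 1, cos(2πnα) cannot vanish, because that would force nα to lie in 1/4 + (1/2)ℤ and hence α to be rational. So X_0 and X_n are correlated and therefore not independent.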

Part (d): Diophantine Approximation

If α is badly approximable (i.e., there exists c>0 such that |α - p/q| > c/q^2 for all rationals p/q), then the discrepancy of the points {nα}, n = 1, ..., N, is O(log N / N). For f of bounded variation (for example, Lipschitz f), Koksma's inequality bounds |S_N/N - ∫ f| by the total variation of f times this discrepancy, so S_N - N ∫ f = O(log N). The connection: the rate of convergence in Birkhoff's theorem can depend on Diophantine properties of α.
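The size of the discrepancy can be explored numerically. The sketch below computes the star discrepancy of the points {kα}, k = 1, ..., N, using the standard formula for a sorted sample, D*_N = max_i max(i/N - x_(i), x_(i) - (i-1)/N). The rotation number α = (√5 - 1)/2, which is badly approximable, and the function names are illustrative assumptions.

```python
import math


def star_discrepancy(points):
    """Star discrepancy D*_N of points in [0, 1), via the sorted-sample formula."""
    xs = sorted(points)
    n = len(xs)
    # With 0-based index i, the i-th sorted point is compared with i/n and (i+1)/n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def rotation_orbit(alpha, n):
    """The points {k * alpha} (fractional parts) for k = 1, ..., n."""
    return [(k * alpha) % 1.0 for k in range(1, n + 1)]


if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2  # all continued-fraction partial quotients equal 1
    for n in (10, 100, 1000, 10000):
        d = star_discrepancy(rotation_orbit(golden, n))
        # If D*_N = O(log N / N), the last column should stay bounded.
        print(n, d, n * d / math.log(n))
```

Replacing `golden` with a number that has very large partial quotients (a well-approximable α) should make the last column noticeably larger along suitable values of N.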

Conclusion: From Theory to Practice

Whether you're studying the long-term behavior of a Markov chain, analyzing time series data for an AI model, or understanding the dynamics of planetary orbits, the strong law and ergodic theorems provide the mathematical backbone. In 2026, as we rely more on algorithms that learn from data, these results tell us when sample averages can be trusted, and Problem 7.1 shows that the answer is not automatic. Master them now, and you'll have a solid foundation for advanced probability and statistics.