Programming lesson
Understanding Frisch-Waugh-Lovell Theorem Through Two-Stage Regressions: A Step-by-Step Guide
Learn the Frisch-Waugh-Lovell theorem with two regressors, including matrix inversion, partialling out, and measurement error bias, using practical examples from econometrics homework.
Introduction to the Frisch-Waugh-Lovell Theorem
The Frisch-Waugh-Lovell (FWL) theorem is a cornerstone of econometric theory, especially in multiple regression analysis. It states that the coefficient on a regressor in a multiple regression can be obtained in two stages: first regress that regressor on all the other regressors and keep the residuals, then regress the dependent variable on those residuals. Residualizing the dependent variable on the other regressors as well gives the same coefficient. This tutorial walks through the FWL theorem with two regressors, as outlined in the assignment. We'll cover matrix inversion, partialling out, and the impact of measurement error, using examples related to AI model training and fantasy sports analytics.
Setting Up the Two-Regressor Model
Consider the regression model: y_i = β₁x₁ᵢ + β₂x₂ᵢ + εᵢ. In vector notation: Y = β₁X₁ + β₂X₂ + ε. The design matrix X has columns X₁ and X₂. The OLS estimator is β̂ = (X'X)⁻¹X'Y. For two regressors, the matrix X'X is 2x2:
X'X = [[X₁'X₁, X₁'X₂],
       [X₁'X₂, X₂'X₂]]
Each entry is a scalar inner product. To find β̂, we invert this matrix. For simplicity, assume units are chosen so that X₁'X₁ = 1 and X₂'X₂ = 1. This normalization is without loss of generality: rescaling a regressor by a constant rescales its coefficient by the inverse of that constant but leaves the fitted values and residuals unchanged. The inverse of X'X is:
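(X'X)⁻¹ = (1/(1 - r²)) [[1, -r], [-r, 1]]
where r = X₁'X₂. Under the unit-length normalization, r is the uncentered correlation between the regressors; it equals the usual sample correlation when both regressors have mean zero. The inverse exists only if r² < 1, that is, if the regressors are not perfectly collinear. Then: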
β̂₁ = (X₁'Y - r X₂'Y) / (1 - r²)
β̂₂ = (X₂'Y - r X₁'Y) / (1 - r²)
This is the direct OLS solution.
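The closed-form expressions for β̂₁ and β̂₂ can be checked numerically. The sketch below is illustrative only: the data are simulated, the variable names and true coefficients (2.0 and -1.0) are assumptions made for the example, and the regressors are rescaled to unit length to match the normalization in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated regressors (illustrative); x2 is built to be correlated with x1.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)

# Rescale each regressor to unit length so that X1'X1 = X2'X2 = 1.
x1 = x1 / np.linalg.norm(x1)
x2 = x2 / np.linalg.norm(x2)

# Hypothetical true coefficients, chosen only for this example.
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=n)

r = x1 @ x2  # X1'X2, the off-diagonal entry of X'X

# Closed-form coefficients from the 2x2 inverse.
b1_formula = (x1 @ y - r * (x2 @ y)) / (1 - r**2)
b2_formula = (x2 @ y - r * (x1 @ y)) / (1 - r**2)

# General-purpose least squares solution for comparison.
X = np.column_stack([x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

formula_matches_ols = np.allclose([b1_formula, b2_formula], beta_hat)
print("closed form matches lstsq:", formula_matches_ols)
```

The comparison uses `np.linalg.lstsq` rather than an explicit inverse so that the check does not simply repeat the formula being tested.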
The FWL Theorem: Partialling Out X₁
Now apply the FWL theorem to recover β̂₂. First, regress X₂ on X₁: X₂ = X₁ξ + error. The OLS estimate is ξ̂ = (X₁'X₁)⁻¹X₁'X₂ = r (since X₁'X₁ = 1). Writing P₁ = X₁(X₁'X₁)⁻¹X₁' for the projection onto X₁ and M₁ = I - P₁ for the matrix that produces residuals, the fitted values are P₁X₂ = rX₁ and the residuals are M₁X₂ = X₂ - P₁X₂ = X₂ - rX₁. Next, regress Y on these residuals: Y = (M₁X₂)β₂ + error. The coefficient is:
β̂₂* = ((M₁X₂)'(M₁X₂))⁻¹ (M₁X₂)'Y
Since M₁X₂ is orthogonal to X₁, we have (M₁X₂)'(M₁X₂) = (M₁X₂)'(X₂ - rX₁) = (M₁X₂)'X₂ = X₂'X₂ - rX₁'X₂ = 1 - r², and (M₁X₂)'Y = X₂'Y - rX₁'Y. Thus:
β̂₂* = (X₂'Y - rX₁'Y) / (1 - r²)
This is exactly the same as β̂₂ from the original regression, which verifies the FWL theorem for the two-regressor case.
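The same equivalence holds without the unit-length normalization. As a minimal sketch on simulated data, the helper `fwl_coefficient` below (a hypothetical name introduced for this example) partials one regressor out of the other and regresses Y on the residuals; the result is compared with the coefficients from the full two-regressor OLS fit.

```python
import numpy as np

def fwl_coefficient(y, x_keep, x_out):
    """Coefficient on x_keep after partialling x_out out of x_keep (no intercept)."""
    xi_hat = (x_out @ x_keep) / (x_out @ x_out)  # first stage: x_keep on x_out
    resid = x_keep - xi_hat * x_out              # residuals, orthogonal to x_out
    return (resid @ y) / (resid @ resid)         # second stage: y on the residuals

rng = np.random.default_rng(1)
n = 500

# Simulated data; the coefficients 1.5 and 0.8 are arbitrary illustrative values.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

# Full regression of y on both regressors via the normal equations.
X = np.column_stack([x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Two-stage (FWL) estimates of each coefficient.
b1_fwl = fwl_coefficient(y, x_keep=x1, x_out=x2)
b2_fwl = fwl_coefficient(y, x_keep=x2, x_out=x1)

fwl_matches = np.allclose([b1_fwl, b2_fwl], beta_full)
print("FWL matches full OLS:", fwl_matches)
```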
Application to Real Data: Consumption and Income Growth
In the computer exercise, you regress consumption growth on income growth and the interest rate. Let's denote Y = consumption growth, X₁ = income growth, X₂ = interest rate. The steps are:
- Step a: Regress income growth on the interest rate: X₁ = X₂δ + error. Obtain the residuals MrY = X₁ - X₂δ̂ (income growth with the interest rate partialled out).
- Step b: Regress consumption growth on MrY. The coefficient on MrY equals the coefficient on income growth from the original multiple regression.
- Step c: Regress consumption growth on the interest rate: Y = X₂γ + error. Obtain the residuals MrC = Y - X₂γ̂.
- Step d: Regress MrC on MrY. The coefficient here also equals the original income coefficient.
This demonstrates that partialling out the interest rate from X₁ alone, or from both Y and X₁, yields the same coefficient. If the original regression includes a constant, include it in each auxiliary regression as well.
This is analogous to how fantasy sports analysts isolate a player's performance by controlling for team strength or opponent quality. For example, to find a quarterback's true contribution, you might regress team points on opponent defense strength first, then use the residuals.
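Steps a through d can be sketched on simulated data as follows. This is not the assignment's dataset: the series, their coefficients, and the helper names `residualize` and `slope` are assumptions made for illustration. The sketch also assumes the regression includes a constant, so the constant is partialled out together with the interest rate.

```python
import numpy as np

def residualize(v, Z):
    """Residuals from an OLS regression of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def slope(v, w):
    """No-intercept OLS coefficient from regressing v on a single regressor w."""
    return (w @ v) / (w @ w)

rng = np.random.default_rng(2)
n = 300

# Simulated stand-ins for the exercise series (illustrative values only).
interest = rng.normal(2.0, 1.0, size=n)
income = 0.5 - 0.3 * interest + rng.normal(size=n)
consumption = 0.2 + 0.7 * income - 0.1 * interest + 0.5 * rng.normal(size=n)

const = np.ones(n)
Z = np.column_stack([const, interest])  # controls to partial out

# Original multiple regression: consumption on constant, income, interest.
X_full = np.column_stack([const, income, interest])
beta_full, *_ = np.linalg.lstsq(X_full, consumption, rcond=None)
b_income_full = beta_full[1]

MrY = residualize(income, Z)         # Step a: income residuals
b_step_b = slope(consumption, MrY)   # Step b: consumption on MrY
MrC = residualize(consumption, Z)    # Step c: consumption residuals
b_step_d = slope(MrC, MrY)           # Step d: MrC on MrY

print(b_income_full, b_step_b, b_step_d)
```

Because MrY is orthogonal to the constant and the interest rate by construction, steps b and d need no intercept of their own in this sketch.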
Measurement Error and Attenuation Bias
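Measurement error in a regressor causes attenuation bias, pushing its coefficient toward zero. In the exercise, you add iid mean-zero noise to income growth: Y* = X₁ + v, where v is the measurement error and Y* denotes mismeasured income growth (not the dependent variable Y defined above). Then regress consumption growth on Y* and the interest rate. The coefficient on the mismeasured income growth shrinks toward zero. The interest rate coefficient generally changes as well: if income growth and the interest rate are correlated, the interest rate picks up part of the income effect that the noisy measure no longer captures, so its estimate is biased too, even when the measurement error is uncorrelated with the interest rate.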
With larger measurement error variance, the attenuation becomes more severe. This is similar to using noisy sensor data in AI training: if input features are corrupted, the model's learned coefficients are less reliable. In sports analytics, if a player's stats are poorly recorded, their estimated impact on wins diminishes.
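A small simulation makes the attenuation visible. The sketch below uses simulated data with assumed true coefficients (0.7 on income growth, -0.1 on the interest rate) and adds normal noise of increasing standard deviation to income growth. The helper `ols` and all parameter values are hypothetical choices for this example.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 20_000  # large sample so estimates sit close to their probability limits

# Simulated data: income growth is correlated with the interest rate.
interest = rng.normal(size=n)
income = -0.5 * interest + rng.normal(size=n)
consumption = 0.7 * income - 0.1 * interest + 0.5 * rng.normal(size=n)

noise_sds = [0.0, 0.5, 1.0, 2.0]
results = []
for sd in noise_sds:
    v = sd * rng.normal(size=n)   # iid mean-zero measurement error
    income_noisy = income + v     # mismeasured income growth
    _, b_income, b_interest = ols(consumption, income_noisy, interest)
    results.append((sd, b_income, b_interest))
    print(f"noise sd={sd:.1f}  income coef={b_income:.3f}  interest coef={b_interest:.3f}")
```

In this particular design, income growth has unit variance after the interest rate is partialled out, so the income coefficient should land near 0.7 / (1 + sd²). The interest rate coefficient should drift away from -0.1 as the noise grows, because the two regressors are correlated.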
Practical Implementation Tips
When implementing in MATLAB or Python, use matrix operations to verify the FWL theorem. For the measurement error part, generate random normal errors with increasing variance and observe the changes in coefficients. This reinforces the concept of consistency and bias in OLS.
Conclusion
The Frisch-Waugh-Lovell theorem is not just a theoretical curiosity; it underpins many econometric techniques like fixed effects and instrumental variables. By mastering this two-regressor case, you build intuition for more complex models. Whether you're analyzing economic data, training machine learning models, or evaluating player performance, the ability to partial out control variables is indispensable.