Monte Carlo Simulation of OLS Estimator Performance with Varying Sample Sizes and Error Distributions

Understanding the OLS Estimator Through Monte Carlo Simulation

Econometrics often relies on asymptotic approximations, but how well do these approximations work in small samples? In this tutorial, we explore the finite-sample behavior of the Ordinary Least Squares (OLS) estimator using Monte Carlo simulations in MATLAB. You will examine how sample size and the distribution of the errors affect the size of t-tests when the regressors are correlated, comparing results under normal and non-normal errors. This exercise is directly relevant to Assignment Chef's econometrics homework, where you investigate the size of t-tests for the null hypothesis that a coefficient equals its true value.

Setting Up the MATLAB Environment

We begin by generating regressors with a specific correlation structure. Draw a 500-by-1 vector w of standard normal variates and construct X1 = 2 + w. Then draw a second, independent 500-by-1 vector v of standard normal variates and construct X2 = 1 + w + 4*log(v.^2). Because both regressors share the component w, they are correlated, and both have non-zero means, mimicking real-world data where regressors are not orthogonal. The regressors are drawn once and held fixed across replications; each simulation run uses only the first N observations, where N is the sample size under study.
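
As a quick check on this design, the sketch below draws the regressors and reports their sample means and correlation. The seed and the names R and rho are illustrative choices, not part of the assignment.

```matlab
% Sketch: inspect the simulated regressors (seed chosen arbitrarily)
rng(42);
w = randn(500,1);
v = randn(500,1);
X1 = 2 + w;
X2 = 1 + w + 4*log(v.^2);

R = corrcoef(X1, X2);   % 2-by-2 sample correlation matrix
rho = R(1,2);           % sample correlation between X1 and X2
fprintf('mean(X1) = %.3f, mean(X2) = %.3f, corr = %.3f\n', ...
        mean(X1), mean(X2), rho);
```

Because the 4*log(v.^2) term has a much larger variance than w, it dominates X2, so the correlation induced by the shared w is expected to be fairly weak.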

Monte Carlo Loop for N=20 with Normal Errors

Set N=20. Draw a vector e of mean-zero normal errors with variance 3. Construct the dependent variable as Y = 1 + 2*X1 + 4*X2 + e. Estimate the OLS coefficients and compute the t-statistic for testing H0: beta1 = 2, where beta1 is the coefficient on X1. Repeat this 100 times and count how often the absolute t-statistic exceeds the two-sided 5% critical value from a t-distribution with N-3 degrees of freedom. The resulting fraction is the empirical rejection rate, which should be close to the nominal 0.05 if the t-test is well-behaved in small samples.

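% MATLAB code snippet for N=20
% tinv requires the Statistics and Machine Learning Toolbox
rng(42); % for reproducibility
N = 20;
reps = 100;
w = randn(500,1);
v = randn(500,1);
X1 = 2 + w;
X2 = 1 + w + 4*log(v.^2);
X = [ones(500,1), X1, X2];
X_sub = X(1:N,:);              % regressors are fixed, so subset them once
XtX_inv = inv(X_sub'*X_sub);   % store the inverse so its (2,2) element can be indexed
crit = tinv(0.975, N-3);       % two-sided 5% critical value, N-3 d.o.f.
reject = 0;
for i = 1:reps
    e = sqrt(3)*randn(500,1);  % normal errors with variance 3
    Y = 1 + 2*X1 + 4*X2 + e;
    Y_sub = Y(1:N);            % use only the first N observations
    b = (X_sub'*X_sub) \ (X_sub'*Y_sub);   % OLS coefficients
    res = Y_sub - X_sub*b;
    MSE = res'*res / (N-3);    % unbiased estimate of the error variance
    se_b1 = sqrt(MSE * XtX_inv(2,2));
    t_stat = (b(2)-2)/se_b1;   % b(2) is the coefficient on X1
    if abs(t_stat) > crit
        reject = reject + 1;
    end
end
rejection_rate = reject/reps;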
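
An empirical rejection rate is itself an estimate, so it helps to know how noisy it is. The sketch below computes the Monte Carlo standard error of a rejection rate, sqrt(p*(1-p)/reps), for a correctly sized 5% test. The replication counts in reps_grid are hypothetical values chosen for illustration.

```matlab
% Sketch: Monte Carlo standard error of an empirical rejection rate
p = 0.05;                       % nominal size of the test
reps_grid = [100 1000 10000];   % hypothetical replication counts
mc_se = sqrt(p*(1-p) ./ reps_grid);
for k = 1:numel(reps_grid)
    fprintf('reps = %5d: MC s.e. = %.4f, approx. 95%% band = [%.3f, %.3f]\n', ...
            reps_grid(k), mc_se(k), p - 1.96*mc_se(k), p + 1.96*mc_se(k));
end
```

With 100 replications the standard error is about 0.022, so rejection rates anywhere from roughly 0.01 to 0.09 are compatible with a correctly sized test. Increasing reps narrows the band.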

Extending to N=200 with Non-Normal Errors

Now set N=200. Draw a vector e of standard normal variates and construct z = e.^2 - 1, which has mean zero and variance 2 but is not normally distributed: it is a chi-squared variable with one degree of freedom, shifted to have mean zero, and is strongly right-skewed. Construct Y = 1 + 2*X1 + 4*X2 + z. Estimate the OLS coefficients and compute the asymptotic t-test for H0: beta1 = 2 using the standard normal critical value (1.96 for a 5% two-sided test). Repeat 100 times and record the rejection rate. Compare this to the rejection rate for N=20 with normal errors, keeping in mind that the two designs differ in both sample size and error distribution. Does the asymptotic test come close to its nominal 5% size at N=200? With non-normal errors the t-test can show size distortions in small samples, but as N increases the central limit theorem makes the normal approximation to the t-statistic increasingly accurate.

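% MATLAB code snippet for N=200 with non-normal errors
% Reuses X1, X2, and X from the N=20 snippet
N = 200;
reps = 100;
X_sub = X(1:N,:);              % regressors are fixed, so subset them once
XtX_inv = inv(X_sub'*X_sub);   % store the inverse so its (2,2) element can be indexed
reject_asym = 0;
for i = 1:reps
    e = randn(500,1);
    z = e.^2 - 1; % mean zero, non-normal
    Y = 1 + 2*X1 + 4*X2 + z;
    Y_sub = Y(1:N);            % use only the first N observations
    b = (X_sub'*X_sub) \ (X_sub'*Y_sub);   % OLS coefficients
    res = Y_sub - X_sub*b;
    MSE = res'*res / (N-3);    % estimate of the error variance
    se_b1 = sqrt(MSE * XtX_inv(2,2));
    t_stat = (b(2)-2)/se_b1;   % b(2) is the coefficient on X1
    if abs(t_stat) > 1.96      % asymptotic (normal) critical value
        reject_asym = reject_asym + 1;
    end
end
rejection_rate_asym = reject_asym/reps;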
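
The two snippets use different critical values, so it is worth seeing how far apart they are. The sketch below tabulates the t critical value with N-3 degrees of freedom against the normal one for a few sample sizes. N_grid is an illustrative choice, and both tinv and norminv come from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: exact t versus normal critical values (needs tinv and norminv)
N_grid = [20 50 200 500];             % illustrative sample sizes
crit_t = tinv(0.975, N_grid - 3);     % t critical values with N-3 d.o.f.
crit_z = norminv(0.975);              % normal critical value, about 1.96
for k = 1:numel(N_grid)
    fprintf('N = %3d: t crit = %.4f, normal crit = %.4f\n', ...
            N_grid(k), crit_t(k), crit_z);
end
```

At N=20 the t critical value is about 2.11, noticeably above 1.96, so using the normal value there would reject too often even with normal errors. By N=200 the two are nearly identical.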

Interpreting Results and Practical Insights

The empirical rejection rate from the Monte Carlo simulation estimates how often the t-test incorrectly rejects a true null hypothesis, that is, the size of the test. For N=20 with normal errors, the t-statistic follows the t-distribution with N-3 degrees of freedom exactly, so a rejection rate slightly above or below 5% reflects only simulation noise, which is considerable with just 100 replications. For N=200 with non-normal errors, the test based on the normal critical value is only approximately valid: its rejection rate approaches 5% as N grows, but it may still show some distortion at moderate N if the error distribution is highly skewed or heavy-tailed. This exercise highlights the importance of sample size and error distribution in econometric inference.

Connecting to Current Trends

Monte Carlo simulations are widely used in finance for risk assessment and in AI for model validation. For instance, just as a gamer might simulate thousands of battles to estimate the win rate of a strategy, econometricians simulate many samples to understand the reliability of statistical tests. This approach is also used in app development to test algorithms under different conditions, ensuring robust performance even when assumptions are violated.

Conclusion

By systematically varying sample size and error distribution, you gain intuition about when asymptotic approximations are reliable. This Monte Carlo framework is a fundamental tool in econometrics, enabling you to evaluate the finite-sample properties of estimators and tests. Use the provided MATLAB code as a starting point for your homework assignment, and experiment with other N values to see how quickly the asymptotic distribution becomes a good approximation.
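
One way to experiment with other N values is sketched below. It departs from the snippets above in several ways, all of which are assumptions made for illustration: it vectorizes over replications, draws only N errors per replication instead of 500, uses 2000 replications to reduce simulation noise, and applies the 1.96 critical value in every design. The names N_grid, rates, and beta are hypothetical.

```matlab
% Sketch: rejection rates of the asymptotic test across sample sizes
rng(42);                               % arbitrary seed
N_grid = [20 50 100 200 500];          % illustrative sample sizes
reps = 2000;                           % more replications than the text uses
w = randn(500,1);
v = randn(500,1);
X = [ones(500,1), 2 + w, 1 + w + 4*log(v.^2)];
beta = [1; 2; 4];                      % true coefficients
rates = zeros(numel(N_grid), 2);       % columns: normal, shifted chi-squared

for k = 1:numel(N_grid)
    N = N_grid(k);
    Xs = X(1:N,:);                     % first N observations
    XtX_inv = inv(Xs'*Xs);
    for j = 1:2
        if j == 1
            E = sqrt(3)*randn(N, reps);    % normal errors, variance 3
        else
            E = randn(N, reps).^2 - 1;     % mean-zero, right-skewed errors
        end
        Y = Xs*beta + E;                   % one column per replication
        B = XtX_inv * (Xs'*Y);             % 3-by-reps OLS estimates
        res = Y - Xs*B;
        s2 = sum(res.^2, 1) / (N-3);       % error variance estimates
        se_b1 = sqrt(s2 * XtX_inv(2,2));
        t_stat = (B(2,:) - 2) ./ se_b1;    % test H0: beta1 = 2
        rates(k,j) = mean(abs(t_stat) > 1.96);
    end
end

for k = 1:numel(N_grid)
    fprintf('N = %3d: normal = %.3f, shifted chi-squared = %.3f\n', ...
            N_grid(k), rates(k,1), rates(k,2));
end
```

With normal errors at N=20, expect a rate somewhat above 0.05, because 1.96 is smaller than the t critical value with 17 degrees of freedom. The rates should move toward 0.05 as N increases, up to simulation noise.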