Mastering Econometrics with Stata: A Step-by-Step Tutorial for the LI Econometrics Assignment
This tutorial guides you through key Stata commands and econometric concepts needed for the LI Econometrics assignment, using the Quarterly Labour Force Survey data to explore earnings determinants.
Introduction: Why Econometrics Matters in 2026
Econometrics is the toolkit for turning real-world data into evidence about economic questions, and the growth of large datasets and AI-powered analytics has made the ability to process and interpret such data a critical skill. This tutorial focuses on the LI Econometrics (08 29172) Stata Assignment, which uses the Quarterly Labour Force Survey (April-June 2018) from the UK Data Service. You will learn to load data, create variables, run regressions, and interpret results, all while developing critical thinking about economic phenomena.
Getting Started: Loading Data and Defining the Sample
First, download the dataset lfsp_aj18_eul.dta from the UK Data Service. Open Stata and load the data:
use "lfsp_aj18_eul.dta", clear
Next, keep only individuals with positive gross weekly earnings and not currently working towards a qualification:
keep if GRSSWK > 0 & QULNOW == 2
Check the number of observations: it should be 9141. If not, review your steps.
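The sample-size check can be scripted rather than read off the screen. The lines below are a minimal sketch, assuming the two commands above have just been run. `count` stores the number of observations in `r(N)`, and `assert` stops a do-file if the condition fails. Stata treats missing values as larger than any number, so `GRSSWK > 0` would also keep system-missing earnings if the file contained any; the last line checks for that.

```stata
* Count observations and halt if the sample is not the expected size
count
assert r(N) == 9141

* Stata treats "." as larger than any number, so confirm that no
* system-missing earnings slipped through the GRSSWK > 0 condition
count if missing(GRSSWK)
```

If the final count is not zero, adding `& !missing(GRSSWK)` to the `keep` condition would be one way to drop those rows.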
Section A: Region and Earnings
1a. Histogram of Weekly Earnings
Plot the distribution of GRSSWK:
histogram GRSSWK, frequency
You'll likely see a right-skewed distribution, typical of income data: most people earn moderate amounts, with a long tail of high earners.
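The visual impression of skewness can be backed up with numbers. The following sketch, assuming the sample defined above is in memory, uses `summarize` with the `detail` option, which stores the skewness, mean, and median in `r(skewness)`, `r(mean)`, and `r(p50)`.

```stata
* Detailed summary statistics for weekly earnings
summarize GRSSWK, detail

* Positive skewness and a mean above the median both point to a right tail
display "Skewness: " r(skewness)
display "Mean: " r(mean) "   Median: " r(p50)
```

A right-skewed earnings distribution is also the usual motivation for working with the log of earnings in the regressions that follow.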
1b. Why Regional Differences?
Economic theory suggests regional earnings differences due to variations in cost of living, industry composition, labour demand, and agglomeration effects. For example, London often commands higher wages due to a concentration of high-paying finance and tech jobs.
1c. Creating Variables and Running Regression
Generate the log of earnings:
gen logearn = ln(GRSSWK)
Create age squared:
gen age2 = AGE^2
Create country dummies for England, Scotland, and Northern Ireland (Wales as baseline). Note that Scotland includes both "Scotland" and "Scotland North of Caledonian Canal":
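* Codes assume the standard LFS coding of COUNTRY: 1 England, 2 Wales, 3 Scotland,
* 4 Scotland North of Caledonian Canal, 5 Northern Ireland. Confirm with: tab COUNTRY, nolabel
gen England = (COUNTRY == 1)
gen Scotland = (COUNTRY == 3 | COUNTRY == 4)
gen NIreland = (COUNTRY == 5)
Estimate the regression: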
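reg logearn AGE age2 England Scotland NIreland
Present your results in a table (not raw Stata output). Discuss the coefficients: expect a positive coefficient on AGE and a negative one on age2 (a concave age-earnings profile). Because the dependent variable is in logs, each country coefficient is approximately the proportional difference in earnings relative to Wales, holding age constant. England and Scotland likely have higher earnings than Wales, while Northern Ireland may be similar or lower.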
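Two quantities often help when discussing these coefficients: the age at which predicted earnings peak, and the exact percentage gap implied by a dummy in a log model. The sketch below assumes the variables created above exist; `peak_age` is only a label chosen here for illustration.

```stata
* Re-estimate quietly so that _b[] refers to this model
quietly reg logearn AGE age2 England Scotland NIreland

* Age at which predicted log earnings peak: -b_AGE / (2 * b_age2),
* reported with a delta-method standard error
nlcom (peak_age: -_b[AGE] / (2 * _b[age2]))

* Exact percentage earnings difference between England and Wales
display 100 * (exp(_b[England]) - 1)
```

For small coefficients the exact figure is close to 100 times the coefficient itself, so the approximation is usually fine in the write-up.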
1d. Hypothesis Tests
Use Stata's test command to compare coefficients. For example, test if England and Scotland have the same coefficient:
test England = Scotland
To do it by hand, estimate the restricted model in which England and Scotland share a single coefficient (i.e., replace the two dummies with one dummy for living in either England or Scotland) and calculate the F-statistic using the formula: F = ((RSS_r - RSS_ur) / q) / (RSS_ur / (n - k - 1)), where q is the number of restrictions (1), RSS_r and RSS_ur are the residual sums of squares from the restricted and unrestricted models, n is the number of observations, and k is the number of regressors in the unrestricted model (5 here).
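The by-hand calculation can be sketched in Stata as follows, assuming the dummies from part 1c exist. `EngScot` and the scalar names are hypothetical names introduced here. After `regress`, `e(rss)` holds the residual sum of squares and `e(df_r)` the residual degrees of freedom (n - k - 1).

```stata
* Restricted model: England and Scotland share one coefficient
gen EngScot = England + Scotland          // hypothetical variable name
quietly reg logearn AGE age2 EngScot NIreland
scalar rss_r = e(rss)

* Unrestricted model: separate coefficients for England and Scotland
quietly reg logearn AGE age2 England Scotland NIreland
scalar rss_ur = e(rss)
scalar df_ur  = e(df_r)                   // n - k - 1

* F-statistic with q = 1 restriction, and its p-value
scalar F_hand = ((rss_r - rss_ur) / 1) / (rss_ur / df_ur)
display "F = " F_hand "   p-value = " Ftail(1, df_ur, F_hand)

* The built-in test should report the same F-statistic
test England = Scotland
```

Running the unrestricted model last means the final `test` command applies to the right set of estimates.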
1e. England Only: Regional Dummies
Keep only England residents:
keep if COUNTRY == 1
Check observations: 7616. Create dummies for English regions using URESMC. Use Merseyside as base. For example:
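* Codes assume the standard LFS coding of URESMC (8 Inner London, 9 Outer London).
* Confirm with: tab URESMC, nolabel
gen London = (URESMC == 8 | URESMC == 9)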
... (repeat for other regions)
Estimate the regression with AGE, age2, and the region dummies. You'll likely find a significant London effect, with higher earnings relative to Merseyside.
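Before coding the remaining dummies, it helps to look up which numeric value belongs to which region rather than relying on memory. A short sketch, assuming the England-only sample and the London dummy above:

```stata
* Region categories shown with their value labels
tabulate URESMC

* The same table showing the numeric codes behind those labels
tabulate URESMC, nolabel

* After creating a dummy, confirm that its count of ones looks sensible
tabulate London
```

Comparing the two `URESMC` tables side by side gives the code for each region, including the one for Merseyside that must be left out as the base.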
Section A: Education
2a. Degree Dummy
Tabulate HIQUL15D and keep only valid responses:
tab HIQUL15D
keep if HIQUL15D >= 1 & HIQUL15D <= 4
Check observations: 7521. Create degree dummy:
gen degree = (HIQUL15D == 1)
Add this to your previous regression. The degree coefficient will likely be positive and significant. Compare with earlier results: region coefficients may shrink if education correlates with region (e.g., London has more graduates).
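One way to make the comparison concrete is to store both sets of estimates and print them side by side. This is a sketch only: the local macro `regions` and the names `no_degree` and `with_degree` are hypothetical, and the macro must be filled with your own full list of region dummies. Because it uses a local macro, run the lines together from a do-file.

```stata
* List your region dummies here (every region except the Merseyside base);
* only London is defined in this tutorial, so add the others yourself
local regions London

* Model without the degree dummy, on the current (restricted) sample
quietly reg logearn AGE age2 `regions'
estimates store no_degree

* Model with the degree dummy
quietly reg logearn AGE age2 `regions' degree
estimates store with_degree

* Coefficients and standard errors side by side, plus N and R-squared
estimates table no_degree with_degree, b(%9.3f) se(%9.3f) stats(N r2)
```

Estimating both models on the same 7521 observations ensures that any change in the region coefficients comes from adding degree, not from a change in the sample.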
2b. Testing Regional Equality
Test if all region coefficients are equal:
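test London Outer_South_East ...
Merseyside is the omitted base category, so it has no coefficient of its own. Equal earnings across all regions is therefore equivalent to every included region dummy being jointly zero, which is what test checks when given a list of variables. Then exclude London and the South East and test again. Expect that regional differences become smaller after removing the highest-earning areas.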
Section B: Other Factors
3a. Choosing an Additional Dimension
Consult the codebook. A good choice is industry sector (variable IND07M) or gender (SEX). Economic theory suggests gender pay gaps and sectoral wage differentials. These dimensions likely vary across regions and by education.
Create dummy variables for your chosen dimension. For example, for gender:
gen female = (SEX == 2)
3b. Running Additional Regressions
Add your new variable(s) to the regression from part 2a. Present results in a second table. Discuss whether your theoretical predictions hold: e.g., a negative coefficient for female indicates a gender pay gap, and the gap may differ across regions.
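Whether the gap differs across regions can be examined with an interaction term. The sketch below assumes the `female`, `London`, and `degree` variables created earlier; `female_London` is a hypothetical name, and for brevity only the London dummy is included, so add your full set of region dummies in practice.

```stata
* Interaction: lets the gender gap differ between London and elsewhere
gen female_London = female * London       // hypothetical variable name

reg logearn AGE age2 London degree female female_London

* Gender gap outside London: the coefficient on female
display _b[female]

* Gender gap inside London: main effect plus interaction, with a standard error
lincom female + female_London
```

A significant coefficient on the interaction term would indicate that the gap in London differs from the gap in the rest of England.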
Section B: Taking a Step Back
4a. Sample Selection and Regional Conclusions
By keeping only those with positive earnings, we exclude the unemployed and non-participants. This may bias our view of regional labour markets: if the people out of work in a high-unemployment region would have been low earners, average earnings among those observed in work are pushed up, so the region can appear better paid than its labour market as a whole really is.
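A rough way to gauge how much selection differs by region is to compare employment rates in the full file. This sketch rests on an assumption that should be checked against the codebook: that the dataset contains the ILO economic-activity variable `ILODEFR`, with 1 = in employment, 2 = unemployed, and 3 = inactive. The variable `employed` is a hypothetical name. `preserve` and `restore` leave the working sample untouched.

```stata
preserve

* Reload the full file so that people without earnings are included
use "lfsp_aj18_eul.dta", clear
keep if COUNTRY == 1

* ILODEFR and its coding are assumptions; check the codebook
gen employed = (ILODEFR == 1) if inlist(ILODEFR, 1, 2, 3)

* Share in employment by region of residence
tabulate URESMC, summarize(employed)

restore
```

Large differences in the employment share across regions would suggest that the earnings comparisons are based on differently selected groups of workers.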
4b. Region of Residence vs. Region of Birth
Using region of residence mixes non-movers and movers. Movers may self-select into high-wage regions, biasing coefficients. Consider using region of birth if available, or acknowledge this limitation.
Conclusion
This tutorial has equipped you with the Stata skills and econometric intuition needed for the LI Econometrics assignment. By following these steps, you'll be able to load data, create variables, estimate regressions, and interpret results with confidence. Remember to present your findings in clear tables and include your Stata code in an appendix. Good luck!