LI Econometrics Stata Assignment Tutorial: Step-by-Step Guide

Introduction: Why Econometrics Matters in 2026

In today's data-driven world, econometrics is more relevant than ever. With the rise of AI-powered analytics and big data, understanding how to process and interpret real-world data is a critical skill. This tutorial focuses on the LI Econometrics (08 29172) Stata Assignment, which uses the Quarterly Labour Force Survey (April-June 2018) from the UK Data Service. You will learn to load data, create variables, run regressions, and interpret results, all while developing critical thinking about economic phenomena.

Getting Started: Loading Data and Defining the Sample

First, download the dataset lfsp_aj18_eul.dta from the UK Data Service. Open Stata and load the data:

use "lfsp_aj18_eul.dta", clear

Next, keep only individuals who have positive gross weekly earnings and are not currently working towards a qualification:

keep if GRSSWK > 0 & QULNOW == 2

Check the number of observations: it should be 9141. If not, review your steps.
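One way to run this check is sketched below. It also counts missing earnings values, because Stata treats a missing value as larger than any number, so GRSSWK > 0 on its own would not drop them. Whether the file actually contains system-missing earnings is an assumption to verify, not something the assignment states.

```stata
* Missing earnings would pass GRSSWK > 0, so count them explicitly
count if missing(GRSSWK)

* Total observations left after the keep command (expected: 9141)
count
display "Sample size: " r(N)
```

If the first count is not zero, those observations will drop out of any regression on log earnings anyway, but the headline sample size will not match.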

Section A: Region and Earnings

1a. Histogram of Weekly Earnings

Plot the distribution of GRSSWK:

histogram GRSSWK, frequency

You'll likely see a right-skewed distribution, typical of income data: most people earn moderate amounts, with a long tail of high earners.
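To back up the visual impression with numbers, here is a short sketch using summarize with the detail option, which reports skewness alongside the mean and the percentiles:

```stata
* Detailed summary statistics, including skewness and percentiles
summarize GRSSWK, detail

* Positive skewness and a mean above the median both point to a right tail
display "Skewness: " r(skewness)
display "Mean: " r(mean) "   Median: " r(p50)
```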

1b. Why Regional Differences?

Economic theory suggests regional earnings differences due to variations in cost of living, industry composition, labour demand, and agglomeration effects. For example, London often commands higher wages due to a concentration of high-paying finance and tech jobs.

1c. Creating Variables and Running Regression

Generate the log of earnings:

gen logearn = ln(GRSSWK)

Create age squared:

gen age2 = AGE^2

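Create country dummies for England, Scotland, and Northern Ireland (Wales as baseline). Scotland combines the two categories "Scotland" and "Scotland North of Caledonian Canal". The codes below assume the usual LFS coding of COUNTRY (1 England, 2 Wales, 3 Scotland, 4 Scotland North of Caledonian Canal, 5 Northern Ireland); confirm them with tab COUNTRY, nolabel before relying on them:

gen England = (COUNTRY == 1)
gen Scotland = (COUNTRY == 3 | COUNTRY == 4)
gen NIreland = (COUNTRY == 5)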
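Since the dummies depend on the numeric codes stored in COUNTRY, a quick cross-tabulation is a cheap safeguard. This sketch assumes the three dummies above have just been created:

```stata
* Show the value labels, then the underlying numeric codes
tab COUNTRY
tab COUNTRY, nolabel

* Each dummy should switch on only for the intended categories,
* and the Wales row should be zero in all three tables
tab COUNTRY England
tab COUNTRY Scotland
tab COUNTRY NIreland
```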

Estimate the regression:

reg logearn AGE age2 England Scotland NIreland

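Present your results in a table (not raw Stata output). Discuss the coefficients: expect a positive coefficient on AGE and a negative one on age2, which together give a concave age-earnings profile. Because the dependent variable is in logs, each country coefficient is the approximate proportional earnings difference relative to Wales, holding age fixed. England and Scotland likely have higher earnings than Wales, while Northern Ireland may be similar or lower.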
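Two quantities often help the discussion: the age at which predicted earnings peak, and the exact percentage gap implied by a dummy coefficient in a log model, 100 * (exp(b) - 1). The following is a minimal sketch, assuming the regression above is the model of interest:

```stata
* Re-run the model so that _b[] refers to these estimates
quietly reg logearn AGE age2 England Scotland NIreland

* Age at which predicted earnings peak, with a delta-method standard error
nlcom -_b[AGE] / (2 * _b[age2])

* Exact percentage gap relative to Wales implied by the England coefficient
display "England vs Wales (%): " 100 * (exp(_b[England]) - 1)
```

The peak-age formula only describes a maximum if the age2 coefficient turns out negative, so check the sign before quoting it.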

1d. Hypothesis Tests

Use Stata's test command to compare coefficients. For example, test if England and Scotland have the same coefficient:

test England = Scotland

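To do it by hand, estimate the restricted model, which imposes the null hypothesis by replacing the separate England and Scotland dummies with a single dummy equal to one for residents of either country. Then calculate the F-statistic using the formula: F = ((RSS_r - RSS_ur) / q) / (RSS_ur / (n - k - 1)), where RSS_r and RSS_ur are the residual sums of squares from the restricted and unrestricted models, q is the number of restrictions (here 1), n is the number of observations, and k is the number of regressors in the unrestricted model (here 5).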
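The hand calculation can be sketched with the results Stata stores after regress: e(rss) is the residual sum of squares and e(df_r) is the residual degrees of freedom, n - k - 1. The variable name EngScot and the scalar names are illustrative choices, not part of the assignment.

```stata
* Unrestricted model: separate England and Scotland dummies
quietly reg logearn AGE age2 England Scotland NIreland
scalar rss_ur = e(rss)
scalar df_ur = e(df_r)

* Restricted model: one combined dummy forces the two coefficients to be equal
gen EngScot = England + Scotland
quietly reg logearn AGE age2 EngScot NIreland
scalar rss_r = e(rss)

* F statistic with q = 1 restriction, and its p-value from the F(1, df_ur) distribution
scalar F_manual = ((rss_r - rss_ur) / 1) / (rss_ur / df_ur)
display "F = " F_manual "   p = " Ftail(1, df_ur, F_manual)
```

The result should match the output of test England = Scotland after the unrestricted regression, provided that regression uses the default (non-robust) standard errors.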

1e. England Only: Regional Dummies

Keep only England residents:

keep if COUNTRY == 1

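Check observations: 7616. Create dummies for English regions using URESMC, with Merseyside as the base category (so it gets no dummy of its own). Look up the numeric codes with tab URESMC, nolabel before writing the conditions. For example, assuming Inner London and Outer London are coded 13 and 14, as in the standard LFS coding:

gen London = (URESMC == 13 | URESMC == 14)
... (repeat for other regions)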

Estimate the regression of logearn on AGE, age2, and the region dummies. You'll likely find a significant London effect, with higher earnings relative to Merseyside.
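As a cross-check on hand-built dummies, Stata's factor-variable notation can generate one indicator per URESMC category with a chosen base. The sketch below assumes Merseyside has code 4 (confirm this from the listing) and that URESMC has no negative codes left in the sample, since factor variables reject negative values. Unlike a combined London dummy, it treats each URESMC category separately.

```stata
* List region labels and their numeric codes (do not assume the codes)
tab URESMC
tab URESMC, nolabel

* Hypothetical: replace 4 with the code the listing shows for Merseyside
local base = 4

* ib#. sets the base category; every other region gets its own coefficient
reg logearn AGE age2 ib`base'.URESMC
```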

Section A: Education

2a. Degree Dummy

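Tabulate HIQUL15D and keep only valid responses, dropping "Don't know" and any missing-value codes. The range below assumes the standard LFS coding, in which codes 1 to 6 run from "Degree or equivalent" to "No qualification" and 7 is "Don't know"; confirm it with tab HIQUL15D, nolabel:

tab HIQUL15D
keep if HIQUL15D >= 1 & HIQUL15D <= 6

Check observations: 7521. Create degree dummy: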

gen degree = (HIQUL15D == 1)

Add this to your previous regression. The degree coefficient will likely be positive and significant. Compare with earlier results, re-estimating the earlier model on the reduced sample so that both use the same observations: region coefficients may shrink if education correlates with region (e.g., London has more graduates).
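One way to line the two models up is with estimates store and estimates table. In this sketch the local macro regions is a hypothetical placeholder: only London is defined in the text above, so append your own remaining region dummies to it.

```stata
* Hypothetical macro: list ALL of your region dummies here
local regions London

* Estimate both models on the same sample so coefficients are comparable
quietly reg logearn AGE age2 `regions'
estimates store no_degree
quietly reg logearn AGE age2 `regions' degree
estimates store with_degree

* Side-by-side coefficients, standard errors, sample size and R-squared
estimates table no_degree with_degree, b(%9.3f) se(%9.3f) stats(N r2)
```

The resulting table is a convenient starting point for the formatted table the assignment asks for, though it still needs tidying before submission.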

2b. Testing Regional Equality

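Test whether earnings are equal across all regions. Merseyside is the omitted base category and has no coefficient of its own, so equality across regions means that every region dummy is jointly zero:

test London Outer_South_East ...

Then repeat the test leaving the London and South East dummies out of the list (they stay in the regression). Expect weaker evidence of regional differences once the highest-earning areas are set aside.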

Section B: Other Factors

3a. Choosing an Additional Dimension

Consult the codebook. A good choice is industry sector (variable IND07M) or gender (SEX). Economic theory suggests gender pay gaps and sectoral wage differentials. These dimensions likely vary across regions and by education.

Create dummy variables for your chosen dimension. For example, for gender:

gen female = (SEX == 2)

3b. Running Additional Regressions

Add your new variable(s) to the regression from part 2a. Present results in a second table. Discuss whether your theoretical predictions hold: e.g., a negative coefficient for female indicates a gender pay gap, and the gap may differ across regions.
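To examine whether the gap differs across regions, one option is an interaction term. The sketch below uses London as an illustrative case; the variable name female_London is hypothetical, and the regression should also include your remaining region dummies.

```stata
* Hypothetical interaction: equals 1 only for women living in London
gen female_London = female * London

* Add your other region dummies to this list; only London is shown here
reg logearn AGE age2 London degree female female_London

* The gender gap outside London is the female coefficient;
* inside London it is the sum of female and the interaction
lincom female + female_London
```

A significant coefficient on the interaction would suggest that the gap in London differs from the gap elsewhere in England.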

Section B: Taking a Step Back

4a. Sample Selection and Regional Conclusions

By keeping only those with positive earnings, we exclude the unemployed and non-participants. This may bias our view of regional labour markets: if the people without work in a high-unemployment region would have been low earners, then average earnings among those who remain employed overstate how well that region's labour market is doing.

4b. Region of Residence vs. Region of Birth

Using region of residence mixes non-movers and movers. Movers may self-select into high-wage regions, biasing coefficients. Consider using region of birth if available, or acknowledge this limitation.

Conclusion

This tutorial has equipped you with the Stata skills and econometric intuition needed for the LI Econometrics assignment. By following these steps, you'll be able to load data, create variables, estimate regressions, and interpret results with confidence. Remember to present your findings in clear tables and include your Stata code in an appendix. Good luck!