Problem set for Lecture 4
Estimate returns to education
Data
This exercise uses public data from Oreopoulos (2006) that can be downloaded from https://www.openicpsr.org/openicpsr/project/116082/version/V1/view. You can download the replication package, along with the data, after registering with your university email address and accepting the terms and conditions. The ZIP file contains country-specific DTA files used to run the analysis. You will use uk/combined-general-household-survey.dta in this exercise.
The objective is to replicate and analyse the results presented in Tables 1 and 2 of Oreopoulos (2006).[1]
Prepare the variables needed for the estimation:
- age in 1947 (note that a year of birth coded as 30 means 1930)
- an indicator for being affected by the raising of the school-leaving age, ROSLA (aged 14 or younger in 1947)
- log earnings (log of the earn variable)
- drop observations from Northern Ireland, observations with missing earnings, and individuals born before 1921 or after 1951 or aged 65 or older
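A minimal pandas sketch of the preparation steps listed above. Only earn comes from the problem set; the other column names (yobirth, age, nireland) are hypothetical placeholders, so check them against the actual file first. The sketch assumes year of birth is stored as a two-digit number and Northern Ireland is flagged by a 0/1 variable. Because the log of zero is undefined, it also drops non-positive earnings, which is an extra assumption beyond the restrictions listed.

```python
import numpy as np
import pandas as pd

# Hypothetical column names (only "earn" comes from the problem set);
# check them against the actual .dta file before use.
YOB, AGE, EARN, NI = "yobirth", "age", "earn", "nireland"


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Build the estimation variables and apply the sample restrictions."""
    out = df.copy()
    out["birth_year"] = 1900 + out[YOB]              # year of birth coded 30 means 1930
    out["age47"] = 1947 - out["birth_year"]          # age in 1947
    out["rosla"] = (out["age47"] <= 14).astype(int)  # aged 14 or younger in 1947
    keep = (
        (out[NI] != 1)                               # drop Northern Ireland (assumed 0/1 flag)
        & out[EARN].notna()                          # drop missing earnings
        & (out[EARN] > 0)                            # assumption: log needs positive earnings
        & out["birth_year"].between(1921, 1951)      # keep cohorts born 1921-1951
        & (out[AGE] < 65)                            # drop those aged 65+
    )
    out = out.loc[keep].copy()
    out["logearn"] = np.log(out[EARN])
    return out


# Usage; convert_categoricals=False keeps labelled Stata variables numeric:
# df = prepare(pd.read_stata("uk/combined-general-household-survey.dta",
#                            convert_categoricals=False))
```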
Estimate the following system of equations using IV regression:
$$\ln y_i = \beta S_i + X_i'\gamma + \varepsilon_i$$
$$S_i = \pi D_i + X_i'\delta + u_i$$
where $\ln y_i$ is log earnings, $S_i$ is the age at which individual $i$ left education, $D_i$ is an indicator variable equal to 1 if the individual is affected by ROSLA, and $X_i$ contains a quartic polynomial in year of birth and age. Report the first-stage, reduced-form and IV estimates. What do the results imply about the return to a year of schooling? How do the results change with different specifications of $X_i$ (for example, cubic or quadratic polynomials, dummy variables for 5-year age or birth-cohort groups, inclusion of a gender indicator, etc.)? Discuss possible violations of the identification assumptions necessary for IV.
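One way to compute the three sets of estimates is sketched below in NumPy, assuming the prepared data are passed in as arrays. The function names are illustrative. The sketch treats the controls as separate polynomials of a chosen degree in year of birth and in age with no interaction terms (an assumption about how the quartic is specified) and reports heteroskedasticity-robust (HC0) standard errors. Because treatment varies only by birth cohort, clustering by cohort may be more appropriate for inference. With one instrument and the same controls in both equations, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient.

```python
import numpy as np


def fit(y, X, Z):
    """Just-identified IV b = (Z'X)^{-1} Z'y with HC0 robust SEs; plain OLS when Z is X."""
    ZX_inv = np.linalg.inv(Z.T @ X)
    b = ZX_inv @ (Z.T @ y)
    e = y - X @ b                      # residuals use the actual regressors, not fitted values
    Ze = Z * e[:, None]
    V = ZX_inv @ (Ze.T @ Ze) @ ZX_inv.T
    return b, np.sqrt(np.diag(V))


def controls(birth_year, age, degree=4):
    """Constant plus separate polynomials of the given degree in year of birth and age."""
    b = (birth_year - birth_year.mean()) / 10.0   # centre and scale for numerical stability
    a = (age - age.mean()) / 10.0
    cols = [np.ones(len(b))]
    cols += [b ** k for k in range(1, degree + 1)]
    cols += [a ** k for k in range(1, degree + 1)]
    return np.column_stack(cols)


def rosla_estimates(logearn, school, rosla, birth_year, age, degree=4, extra=None):
    """First-stage, reduced-form and IV coefficients (with SEs) on ROSLA / schooling."""
    W = controls(np.asarray(birth_year, float), np.asarray(age, float), degree)
    if extra is not None:              # e.g. a gender dummy or cohort dummies (omit one category)
        W = np.hstack([W, np.asarray(extra, float).reshape(len(W), -1)])
    y = np.asarray(logearn, float)
    S = np.asarray(school, float)
    Zmat = np.column_stack([np.asarray(rosla, float), W])  # instrument + exogenous controls
    Xmat = np.column_stack([S, W])                         # endogenous regressor + controls
    fs, fs_se = fit(S, Zmat, Zmat)     # first stage: schooling on ROSLA and controls
    rf, rf_se = fit(y, Zmat, Zmat)     # reduced form: log earnings on ROSLA and controls
    iv, iv_se = fit(y, Xmat, Zmat)     # IV: ROSLA instruments schooling
    return {
        "first_stage": (fs[0], fs_se[0]),
        "reduced_form": (rf[0], rf_se[0]),
        "iv": (iv[0], iv_se[0]),
    }
```

Passing degree=2 or degree=3 gives the quadratic and cubic specifications, and a gender indicator or cohort dummies can be supplied through extra.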
Suggest alternative ways to estimate returns to education that mitigate the issues above. Explain how they would help improve the estimates.
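References
Oreopoulos, Philip. 2006. "Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter." American Economic Review 96 (1): 152-175.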
Footnotes
[1] The analysis reported in the paper differs slightly from the one you are asked to do in this problem set, so your estimates will not be identical to the published ones but should be similar.