*** Illustration of Heckman selection model
*** y = final honours school grade, x1 = entrance exam score, x2 = iq,
*** u_1 & u_2 are two correlated random variables
clear
set seed 156
matrix m=(0,0)
matrix c=(1, .5 \.5, 1)
drawnorm u_1 u_2, n(19000) means(m) corr(c)
gen x1=uniform()
gen x2=uniform()

*** generate a latent propensity to be selected variable
gen zstar = -5+2*x1+2*x2 +3*u_2

*** generate a selection criterion variable
gen zobs =(zstar >0)

*** generate FHS score according to rule
gen y =2.5*x1 + 3*u_1

*** set y1 values = missing for those not selected
gen y1=y
replace y1=. if zobs==0

*** make an id number for the dataset
egen id=fill(1,2)
outfile id y y1 x1 x2 zobs using heckex.raw, replace
clear
infile id y y1 x1 x2 zobs using heckex.raw

*** estimate FHS ent_exam score regression for parent population
reg y x1

*** estimate FHS ent_exam score regression for selected cases
reg y1 x1

*** This bit estimates the Heckman 2 stage model manually
probit zobs x1 x2
predict p_hat, xb
replace p_hat = -p_hat
generate phi = (1/sqrt(2*_pi))*exp(-(p_hat^2/2))
generate capphi = normal(p_hat)
generate invmills = phi/(1-capphi)
reg y1 x1 invmills

*** This bit estimates the Heckman 2 stage model using Stata's command
heckman y1 x1, select(x1 x2) twostep

*** This bit estimates the Heckman model by maximum likelihood
heckman y1 x1, select(x1 x2)
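
*** Optional cross-check (an addition, not part of the original illustration).
*** A minimal sketch, assuming p_hat and invmills from the manual two-stage
*** step are still in memory; invmills_alt and invmills_stata are
*** hypothetical variable names introduced only for this check.
*** Because p_hat holds minus the probit index, the manual ratio
*** phi/(1-capphi) equals normalden(xb)/normal(xb), which can be written
*** with Stata's built-in density and distribution functions:
generate invmills_alt = normalden(p_hat)/normal(-p_hat)

*** heckman's mills() option saves the nonselection hazard it computes itself
heckman y1 x1, select(x1 x2) twostep mills(invmills_stata)

*** for the selected cases the three versions should agree up to rounding error
summarize invmills invmills_alt invmills_stata if zobs==1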