* first read the data set - as in Powers & Xie Table 2.2 pp 26 * you will have to change this depending on where you store the data use "E:\px_ch2_gls.dta", clear *First make the empirical probabilities - reasonable if n is large & p is small gen p=y/n * Make the weights for the FGLS and poisson models * FGLS weight gen wt_fgls = (n*p)/(1-p) * poisson weight (actually I don't use this) gen wt_poiss = y * OLS model yvar is ln(p) gen log_p = ln(p) reg log_p a2 a3 p2 * FGLS model - remember se's must be rescaled by 1/sqrt(mse) * first make column of 1s for the constant gen c = 1 * transform the constant gen c_t = c*sqrt(wt_fgls) * then transform other variables gen log_p_t = log_p*sqrt(wt_fgls) gen a2_t = a2*sqrt(wt_fgls) gen a3_t = a3*sqrt(wt_fgls) gen p2_t = p2*sqrt(wt_fgls) reg log_p_t c_t a2_t a3_t p2_t, noconstant * rescale the se's matrix V = vecdiag(e(V)) matrix list V matrix V_0 = diag(V) matrix list V_0 matrix SE = cholesky(V_0) matrix list SE matrix SE_fgls = (1/e(rmse))*SE * and here are the "correct" standard errors matrix list SE_fgls * and now let's do all this with MATA * first the OLS estimation mata // define y as a column vector and get data from STATA y = st_data(., "log_p") // define X as a 6 x 4 matrix (not forgetting the constant in the first column) x = st_data(., ("c", "a2", "a3", "p2")) // generate tx as x' - transpose of x tx=x' // generate x'y crossproducts matrix txy= tx*y // generate crossproducts for x - the predictor variables txx=tx*x // generate the inverse of txx itxx=invsym(txx) itxx // generate ols estimated constant and slope coefficients coefficients b_hat=itxx*txy // generate estimated residuals e_hat = y - x * b_hat // calculate the estimated standard errors - first the variances and then take the square root of the diagonal entries s2 = (1 / (rows(x) - cols(x))) * (e_hat' * e_hat) V = s2 * itxx se_hat = sqrt(diagonal(V)) // print out the coefficients and standard errors b_hat se_hat /**leave mata**/ end clear mata * now do the FGLS estimation mata // define y as a column vector and get data from STATA y = st_data(., "log_p_t") // define X as a 6 x 4 matrix (not forgetting the constant in the first column) x = st_data(., ("c_t", "a2_t", "a3_t", "p2_t")) // generate tx as x' - transpose of x tx=x' // generate x'y crossproducts matrix txy= tx*y // generate crossproducts for x - the predictor variables txx=tx*x // generate the inverse of txx itxx=invsym(txx) itxx // generate ols estimated constant and slope coefficients coefficients b_hat=itxx*txy // generate estimated residuals e_hat = y - x * b_hat // calculate the estimated standard errors - first the variances and then take the square root of the diagonal entries s2 = (1 / (rows(x) - cols(x))) * (e_hat' * e_hat) V = s2 * itxx se = sqrt(diagonal(V)) // calculate the square root of the residual mean square error RSS=e_hat'*e_hat RSS RMSE=sqrt(RSS/(6-4)) // divide the standard errors by the reciprocal of the root mean square errot se_hat=(1/RMSE)*se // print out the coefficients and standard errors b_hat se_hat /**leave mata**/ end clear mata * finally here is how to generate the estimates in the final column of Powers & Xie's table 2.3 * remember to generate the exposure variable & use it as an offset in the GLM - basically you are conditioning on exposure gen ln_n = ln(n) glm y a2 a3 p2, family(poisson) offset(ln_n)