* Here is some STATA code to obtain the OLS estimates for a simple regression problem with 3 predictors.
* You can run it as a do-file (NB the file is called normalreg.txt) from the command window, or cut and paste it into
* the do-file window.
*
* The data are read from the STATA file simdata.dta, which must be in memory.
*
* In this example we do the same thing (more or less) in three different ways.
*
* First we hand-crank it in STATA's matrix language MATA, where the objective boils down to
* finding the elements of the 4x1 column vector b_hat, b_hat = inv(X'X)X'y,
* and the associated standard errors - contained in se_hat.
*
* For didactic reasons, if anyone is interested, the MATA programme is broken down into an unnecessarily large
* number of steps. It is possible, and in practice desirable, to programme this more succinctly.
*
* It is absolutely not necessary to follow this for the purposes of this course. However, if you already know some linear
* algebra, or want to learn some, you might find it interesting/illuminating.
* Two useful references:
* Jacques Tacq (1997) Multivariate Analysis Techniques in Social Science Research, Sage, pp. 388-400.
* Daniel A. Powers and Yu Xie (2008) Statistical Methods for Categorical Data Analysis (2nd ed), Emerald, pp. 269-275.
* Being able to read matrix notation opens up the exposition of the more 'advanced' techniques to you.
* Without it you will find that you can get so far - i.e. to where this course ends - and not much further.
*
* Then we use STATA's standard 'black box' regression routine to do the same thing - this is what you would normally use,
* and is what you need to know for the primary purposes of this course.
*
* Finally we estimate the normal regression model by maximum likelihood - the coefficients are identical to those
* estimated by OLS, but the root MSE and the estimated standard errors differ (very slightly). This is not so relevant for
* week 1 and may even appear a little mysterious.
* Hopefully its relevance will become apparent in week 2.
*

use "I:\simdata.dta", clear   // obviously you should change this line to reflect where you are reading the data from

// enter mata
mata

// define y as a column vector and get the data from STATA
y = st_data(., "score1")

// define X as a 1000 x 4 matrix (not forgetting the constant in the first column)
x = st_data(., ("constant", "ability", "hours", "female"))

// generate tx as x' - the transpose of x
tx = x'

// generate the x'y cross-products matrix
txy = tx*y

// generate the cross-products matrix for x - the predictor variables
txx = tx*x

// generate the inverse of txx
itxx = invsym(txx)
itxx

// generate the OLS estimated constant and slope coefficients
b_hat = itxx*txy

// generate the estimated residuals
e_hat = y - x*b_hat

// calculate the estimated standard errors - first the variances, and then take the square root of the diagonal entries
s2 = (1 / (rows(x) - cols(x))) * (e_hat' * e_hat)
V = s2 * itxx
se_hat = sqrt(diagonal(V))

// print out the coefficients and standard errors
b_hat
se_hat

// leave mata
end

* Here is the normal way you would do the same thing in STATA

reg score1 ability hours female

* Now do the estimation by maximum likelihood
*
* NB the standard errors will be (very) slightly different - they converge as n gets bigger
* (the ML estimate of the error variance divides by n rather than n - k);
* hence exp(lnsigma) is not exactly equal to the root MSE from STATA's standard routine.

program normalreg
        version 11.1
        args lnf xb lnsigma
        local y "$ML_y1"
        quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
end

ml model lf normalreg (xb: score1 = ability hours female) (lnsigma:)
ml maximize
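
* As an aside (not part of the original exercise): the header above notes that the hand-cranked
* MATA calculation can be programmed more succinctly. Here is one sketch of how, assuming
* simdata.dta is still in memory. The variable names X, y, b, e and se are chosen here for
* illustration; the arithmetic is the same b = inv(X'X)X'y as before, just in fewer steps.

mata
X  = st_data(., ("constant", "ability", "hours", "female"))
y  = st_data(., "score1")
b  = invsym(X'*X) * (X'*y)                                   // the OLS estimator in one line
e  = y - X*b                                                 // residuals
se = sqrt(diagonal((e'*e / (rows(X) - cols(X))) * invsym(X'*X)))   // standard errors
b, se                                                        // print estimates and s.e.'s side by side
end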