* Here is some STATA code to obtain the OLS estimates for a simple regression problem with 3 predictors
* You can run it as a do-file (NB the file is called normalreg.txt) from the command window or cut and paste it into
* the do-file editor window.
*
* The data are read from the STATA file simdata.dta, which is loaded into memory by the -use- command below
*
* In this example we do the same thing (more or less) in three different ways
*
*
* First we handcrank it in STATA's matrix language MATA, where the objective boils down to
* finding the elements of the 4x1 column vector b_hat, where b_hat = inv(X'X)X'y,
* and the associated standard errors - contained in se_hat
*
* For didactic reasons, if anyone is interested, the MATA programme is broken down into an unnecessarily large
* number of steps. It is possible, and in practice desirable, to programme this more succinctly.
*
* It is absolutely not necessary to follow this for the purposes of this course. However if you already know some linear
* algebra or want to learn some you might find it interesting/illuminating.
* Two useful references:
* Jacques Tacq (1997) Multivariate Analysis Techniques in Social Science Research, Sage, pp. 388-400
* Daniel A. Powers and Yu Xie (2008) Statistical Methods for Categorical Data Analysis (2nd ed), Emerald, pp. 269-275
* Being able to read matrix notation opens up the exposition of the more 'advanced' techniques to you.
* Without it you will find that you can get so far - i.e. to where this course ends - and not much further.
*
*
* Then we use STATA's standard 'black box' regression routine to do the same thing - this is what you would normally use
* and is what you need to know for the primary purposes of this course.
*
* Finally we estimate the normal regression model by maximum likelihood - the coefficients are identical to those
* estimated by OLS but the root MSE and the estimated standard errors differ (very slightly). This is not so relevant for
* week 1 and may even appear a little mysterious. Hopefully its relevance will become apparent in week 2.
*
*
*
use "I:\simdata.dta", clear // obviously you should change this line to reflect where you are reading the data from
// enter mata
mata
// define y as a column vector and get data from STATA
y = st_data(., "score1")
// define X as a 1000 x 4 matrix (not forgetting the constant in the first column)
x = st_data(., ("constant", "ability", "hours", "female"))
// generate tx as x' - the transpose of x
tx = x'
// generate x'y - the crossproducts matrix
txy = tx*y
// generate the crossproducts matrix for x - the predictor variables
txx = tx*x
// generate the inverse of txx
itxx = invsym(txx)
itxx
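// optional sanity check (not one of the original steps): multiplying itxx by txx
// should give (up to rounding error) the 4x4 identity matrix
itxx*txx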
// generate the OLS estimates of the constant and slope coefficients
b_hat=itxx*txy
// generate estimated residuals
e_hat = y - x * b_hat
// calculate the estimated standard errors - first the variances and then take the square root of the diagonal entries
s2 = (1 / (rows(x) - cols(x))) * (e_hat' * e_hat)
V = s2 * itxx
se_hat = sqrt(diagonal(V))
// print out the coefficients and standard errors
b_hat
se_hat
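// as an optional extra step (not part of the original recipe) you can also form the
// t-ratios by elementwise division of the coefficients by their standard errors;
// these should match the t column of STATA's regression output below
t_hat = b_hat :/ se_hat
t_hat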
// leave mata
end
* Here is the normal way you would do the same thing in STATA
reg score1 ability hours female
* Now do the estimation by maximum likelihood
*
* NB the standard errors will be (very) slightly different - they converge as n gets bigger - because ML
* divides the residual sum of squares by n rather than by n - k; hence exp(lnsigma) is not exactly equal
* to the root MSE from STATA's standard routine
program normalreg
version 11.1
args lnf xb lnsigma
local y "$ML_y1"
quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
end
ml model lf normalreg (xb: score1 = ability hours female) (lnsigma:)
ml maximize
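*
* As an optional extra (not part of the original exercise) the line below displays the ML
* estimate of sigma so you can compare it with the root MSE reported by -reg- above;
* the two are related by sigma_ML = sqrt((n-k)/n) * root MSE, where n is the number of
* observations and k the number of estimated coefficients (here k = 4)
display exp([lnsigma]_b[_cons])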