// This is a plain vanilla program to do the various calculations that go into
// a simple correspondence analysis of a two-way table. It is nothing more than
// singular value decomposition of the matrix of standarized Pearson residuals
// plus a couple of calculations to get the coordinates for a bi-plot.
// In practice you'd use Stata's ca routine, but there is some value in seeing
// what is going on below the hood.
//
// Data are taken from Ch 6. of Michael Greenacre's Correspondence Analysis in
// Practice (CAIP)
// VG G R B VB
// 16-24 243 789 167 18 6
// 25-34 220 809 164 35 6
// 35-44 147 658 181 41 8
// 45-54 90 469 236 50 16
// 55-64 53 414 306 106 30
// 65-74 44 267 284 98 20
// 75+ 20 136 157 66 17
//
// Rows = agegroup, Columns = self-rating of health Very Good, Good, Reasonable,
// Bad, Very Bad
//
// I could make some remarks here about the unbelievably silly things some sociologists
// have said about correspondence analysis and its alleged special relationship
// with certain forms of sociology speak, but I won't.
// I don't claim any originality for any of this.
// The Mata program is a more or less straight port from Michael Greenacres R
// script on pp 219-20 of the 2nd ed. of CAIP. If you are proficient in R you'll
// have no need of it. But some of us still like Stata....
// enter mata
mata
// The program assumes that you have the data in memory
//
// read the data of raw frequences into matrix X simply adjust the next statement
// for a different size of table
X = st_data(., ("var1", "var2", "var3", "var4", "var5"))
// get the total n
n = sum(X)
// calculate matix P - contains cell probabilities
P = (1/n)*X
// get the column masses
c = colsum(P)
// get the row masses
r = rowsum(P)
// diagonalize reciprocal of the square root of c
D_cmh = diag(1:/sqrt(c))
D_cmh
// diagonalize reciprocal of square root of r
D_rmh = diag(1:/sqrt(r))
D_rmh
// calculate matrix of standardized residuals
S =D_rmh*(P-(r*c))*D_cmh
// define matrices for SVD U, V = eigenvectors D = singular values
// S = UDV'
U = D = V = J(0,0, .)
// singular value decomposition of S (standardized Pearson residuals)
svd(S, U, D, V)
Vt= V'
// print out eigenvectors & SV
U
D
Vt
//calculate standard coordinates of columns and rows
C_sc = D_cmh*Vt
R_sc = D_rmh*U
C_sc
R_sc
//calculate principal coordinates of columns and rows
C_pc = C_sc * diag(D)
R_pc = R_sc * diag(D)
C_pc
R_pc
end