* generate predictions and confidence intervals and plots
* HSB data from Bryk&Raudenbush
*Thanks to Daniel Stegmueller for providing the code using the margins command!
*Sorry, but we have to use some matrix notation for this.
*A more detailed description can be found here: http://www.ats.ucla.edu/stat/stata/faq/mar_graph/margins_graph.htm
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
* fit multilevel model
xtmelogit honors female read || cid:, var mle
* predict values at holding covariates at: read from 30 to 70, female=1
* predict based on fixed effects only
*The command tells Stata to calculate the marginal effect for the reading score in the range from 30 to 70
*in steps of 5. That is for reading scores 30, 35, 40... 70. They are calculated for females only.
margins, at(read=(30(5)70) female=1) predict(mu fixedonly)
* save b (predicted vals)
*Stata saves the coefficients (betas) of these predictions in the matrix r(b).
*Remember that the effect depends on the level of reading scores. That's why we have several betas.
*The svmat command will put the betas into a 'normal' variable, which is called bfem.
mat B1 = r(b)'
svmat B1, names(bfem)
* get standard error (sqrt of diagonal of var.cov. mat)
*This command looks complicated, but effectively will do the following:
*Take the diagonal values of the variance-covariance matrix (vecdiag),
*write them into a diagonal matrix (diag)
*take the squareroot (cholesky)
*write them into a column vector (vecdiag)
*the svmat command will again create a normal variable from the matrix.
mat SE1 = vecdiag(cholesky(diag(vecdiag(r(V)))))'
svmat SE1, names(sefem)
* calc. 95% CI
*The confidence interval is created by subtracting or adding 1.96*SE to/from the beta
gen lofem = bfem - 1.96*sefem
gen hifem = bfem + 1.96*sefem
*Install the ado 'seq' first
net install seq, from(http://fmwww.bc.edu/RePEc/bocode/s)
* gen sequence from 1 to 9
*We wanted the marginal effects in the range from 30 to 70, in steps of 5. That is 9 data points.
*Here we tell Stata to create a variable that contains the values from 1 to 9.
seq xvals, f(1) t(9)
* plot (you have to add nice axis labels etc....)
sort xvals
twoway (line bfem1 xvals, lcolor(red) lpattern(solid)) ///
(line lofem xvals, lcolor(red) lpattern(dot)) ///
(line hifem xvals, lcolor(red) lpattern(dot)) in 1/9
*You can plot the CI also in a different way
*The rarea command will produce a confidence envelope. It has to be typed first, because the line will be overlaid.
sort xvals
twoway (rarea lofem hifem xvals) ///
(line bfem1 xvals), scheme(sj)
**Plotting an interaction effect**
**Here: Does the probability of gaining an honors degree differ by gender in public/private schools?**
*This works similar to the steps above.
*Center reading and math scores
center read math
*Running the model with a random slope for female and the interaction femaleXschooltype (schtyp = 1, public: schtyp = 2, private)
xtmelogit honors c_read c_math i.female##i.schtyp || cid: female, var mle cov(unstr)
*We use margins to get the predictions
*By specifying female#schtyp, we tell Stata to give us the predictions for all gender/schooltype combinations.
margins female#schtyp, predict(mu fixedonly)
/*
Predictive margins Number of obs = 200
Expression : Predicted mean, fixed portion only, predict(mu fixedonly)
-------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
female#schtyp |
0 1 | .2005989 .0564048 3.56 0.000 .0900475 .3111504
0 2 | .0346888 .0385771 0.90 0.369 -.0409209 .1102985
1 1 | .3096128 .0427405 7.24 0.000 .2258431 .3933826
1 2 | .3648489 .0880668 4.14 0.000 .1922412 .5374567
-------------------------------------------------------------------------------
The first two lines (0 1, 0 2) are the predicted probabilities for males (female=0) in schooltype 1 (public)
and schooltype 2 (private). The last two lines are the predictions for females.
Thus we find that 20% of males in public schools will have a honors degree, but only 3% in private schools.
Among females, 31% will have an honors in public schools, and 36% in private schools.
*/
* save b (predicted vals)
*We save the predictions (betas) in the matrix B2 and the variances in matrix V2
mat B2 = r(b)'
mat V2 = r(V)
*This is the content of the column vector B2
matrix list B2
* get standard error (sqrt of diagonal of var.cov. mat)
*Now we get the standard errors for each gender/schooltype combination.
mat SE2 = vecdiag(cholesky(diag(vecdiag(V2))))'
*In order to make a nice graph later, we should assign gender and schooltype to the predictions and SEs.
*We create a column vector F that is 0 for the first two combinations (female = 0) and 1 for the last two.
*And also a vector PR (Private). The second and the 4th predictions are for Private schools.
mat F=(0\0\1\1)
mat PR=(0\1\0\1)
*We combine the predictions, SEs, gender and schooltypes
mat BALL=B, SE2, F, PR
*And the new matrix BALL looks like this:
matrix list BALL
/*
BALL[4,4]
r1 r1 c1 c1
0.female#
1.schtyp .20059894 .05640485 0 0
0.female#
2.schtyp .03468879 .03857707 0 1
1.female#
1.schtyp .30961279 .04274045 1 0
1.female#
2.schtyp .36484894 .08806685 1 1
*/
*Finally we create the variables pred1 (predictions), pred2 (SE), pred3 (female) and pred4 (schooltype)
svmat BALL, names(pred)
list pred1 pred2 pred3 pred4 in 1/4
label var pred1 "Predictions"
label var pred2 "Standard Errors"
label var pred3 "female"
label var pred4 "schooltype"
label define pred3 0"Male" 1"Female", modify
label val pred3 pred3
label define pred4 0"Public" 1"Private", modify
label val pred4 pred4
tab pred1 pred3
tab pred1 pred4
*The following trick, to create bar charts with error bars, is described here:
*http://www.ats.ucla.edu/stat/stata/faq/barcap.htm
preserve
collapse pred1 pred2, by(pred3 pred4)
list pred1 pred2 pred3 pred4
* calc. 95% CI
gen lopred = pred1 - 1.96*pred2
gen hipred = pred1 + 1.96*pred2
*This will make a graph collapsed by schooltype and then gender
generate fem_priv = pred4 if pred3 == 0
replace fem_priv = pred4+3 if pred3 == 1
twoway (bar pred1 fem_priv if pred4==0) ///
(bar pred1 fem_priv if pred4==1) ///
(rcap hipred lopred fem_priv), ///
legend(row(1) order(1 "Public" 2 "Private")) ///
xlabel( 0.5 "Male" 3.5 "Female", noticks) ///
xtitle("Gender") ytitle("Percent honor degree")
*This will make a graph collapsed by gender and then schooltype
generate priv_fem = pred3 if pred4 == 0
replace priv_fem = pred3+3 if pred4 == 1
twoway (bar pred1 priv_fem if pred3==0) ///
(bar pred1 priv_fem if pred3==1) ///
(rcap hipred lopred priv_fem), ///
legend(row(1) order(1 "Male" 2 "Female")) ///
xlabel( 0.5 "Public" 3.5 "Private", noticks) ///
xtitle("Schooltype") ytitle("Percent honor degree")
restore
**Plotting a dummy/continous interaction**
*Does the school program (general/academic/vocation) matter for being awarded a honors degree for students with
*high/low social science achievement (continous)?
*Please read also here: http://www.ats.ucla.edu/stat/stata/faq/mar_graph/margins_graph.htm
center socst
xtmelogit honors c_read c_math i.prog##c.c_socst female || cid: c_socst, cov(unstr) var mle
margins prog, at(c_socst=(-25(5)20)) predict(mu fixedonly)
*We get 30 individual data points and save them in matrix b
matrix b=r(b)'
matrix list b
*We also need the standard errors. We create them from the variances in r(V) and save them in the vector se.
mat se = vecdiag(cholesky(diag(vecdiag(r(V)))))'
*The social science score has 10 levels. From the minimum score of -25 in steps of 5 to +20
matrix at=r(at)
matrix list at
*We need a matrix that has a datapoint for the 3 program types and 10 levels of social science score
matrix at=at[1...,"c_socst"]#(1\1\1) /* # is the kronecker operator */
matrix list at
*We will put the at, b and se matrices together into matrix p and save the values to data.
matrix p=at,b, se
matrix list p
*The svmat command creates the variables p1, p2 and p3
*p1 contains the levels of social science score (-25 - +20), variable p2 the predictions, p3 the SEs
svmat p, names(p)
*Because we have 3 program types, we need a column to identify them
*We use the seq command (you might have to download the ado seq for this) in order to create
*a sequence of 1, 2, 3 in our matrix
seq p4 in 1/30, f(1) t(3)
clist p1-p4 in 1/30
*Create confidence intervals
capture drop pred_hi pred_lo
gen pred_hi=p2+p3*1.96
gen pred_lo=p2-p3*1.96
*Finally we can graph the probabilities over social science score for the different program types.
*First without confidence intervals:
twoway (line p2 p1 if p4==1, legend(label(1 "General"))) ///
(line p2 p1 if p4==2, legend(label(2 "Academic"))) ///
(line p2 p1 if p4==3, legend(label(3 "Vocation"))), ///
xtitle("Social Science Score (centered)") ytitle("Percent Honors") scheme(sj)
*And here are the probabilities for the general and the vocational program, including c.i.
*Sorry, you have to experiment a bit with the colors, etc.
*In any case, the confidence intervals overlap.
twoway (line pred_hi p1 if p4==1, lpattern(dash) lwidth(vvthin) lcolor(orange_red)) ///
(line pred_lo p1 if p4==1, lpattern(dash) lwidth(vvthin) lcolor(orange_red)) ///
(line p2 p1 if p4==1, lpattern(solid) lcolor(red)) ///
(line pred_hi p1 if p4==3, lpattern(dot) lcolor(bluishgrey)) ///
(line pred_lo p1 if p4==3, lpattern(dot) lcolor(bluishgrey)) ///
(line p2 p1 if p4==3, lpattern(solid) lcolor(blue)), ///
legend(rows(1) order(3 "General" 6 "Vocational")) ///
xtitle("Social Science Score (centered)") ytitle("Percent Honors") scheme(sj)