**********************************************************************
**********************************************************************
****                                                              ****
****  Chapter 14 -- Survey weights                                ****
****  Stata do file to reproduce examples                         ****
****  in Chapter 14 of                                            ****
****  Snijders, Tom A.B., and Bosker, Roel J.                     ****
****  Multilevel Analysis:                                        ****
****  An Introduction to Basic and Advanced Multilevel Modeling,  ****
****  second edition                                              ****
****  London etc.: Sage Publishers, 2012                          ****
****                                                              ****
****  Contributed by Jon Fahlander                                ****
****  jon.fahlander (at) nuffield.ox.ac.uk                        ****
****                                                              ****
**********************************************************************
**********************************************************************

* Documentation of the dataset used:
* http://www.oecd.org/dataoecd/0/47/42025182.pdf

clear
set more off, permanently

infile SCHOOLID StIDStd STRATUM W_FSTUWT grade sex AGE IMMIG ESCS STUDREL ///
    JOYREAD METASUM UNDREM private urban SCHSIZE SELSCH STRATIO W_FSCHWT ///
    using "/Users/Jon/Documents/Snijders/SecondEditionExamples/DataSnijdersSecondEdition/pisacc/pisacc.dat" ///
    in 2/5234, clear

sum _all
inspect
* Observe that the NA's in the text file are transformed into .'s.

* Attach value labels
label define stratumLab 1 "USA: Private Midwest" 2 "USA: Private Northeast" ///
    3 "USA: Private South" 4 "USA: Private West" ///
    5 "USA: Public Midwest" 6 "USA: Public Northeast" ///
    7 "USA: Public South" 8 "USA: Public West"
label values STRATUM stratumLab

* Table 14.1 Students
table STRATUM

* Table 14.1 Schools
preserve
collapse STRATUM, by(SCHOOLID)
label values STRATUM stratumLab
table STRATUM
restore

* Calculate the school average weight
bysort SCHOOLID: egen gmeanW_FSTUWT = mean(W_FSTUWT)

* Equation 14.18
gen w1ij = W_FSTUWT/gmeanW_FSTUWT
* This has within-school mean 1.
hist w1ij
sum w1ij

* Equation 14.19
gen ww = w1ij*W_FSCHWT
egen avww = mean(ww)
gen w2j = W_FSCHWT/avww
* This has mean 1 over the whole data set.
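
* Optional check (a sketch, not part of the original do file): the scaled
* student weights w1ij from Equation 14.18 should have mean 1 within every
* school. The variable name w1mean_chk is hypothetical and used only here.
bysort SCHOOLID: egen double w1mean_chk = mean(w1ij)
sum w1mean_chk      // min and max should both be close to 1
drop w1mean_chk
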
hist w2j
mean w2j
* More information about its distribution (minimum, deciles, maximum):
centile w2j, centile(0(10)100)

* Number of groups
egen id2 = group(SCHOOLID)
sum id2
local nschools = r(max)
display `nschools'

* Effective sample size per school, Neffj = (sum of w1ij)^2 / (sum of w1ij^2),
* and the level-1 design effect, Neffj / Nschool
gen w1ijXw1ij = w1ij*w1ij
gen sumwXsumw = .
gen sum_wXw = .
gen Nschool = .
forvalues i = 1/`nschools' {
    sum w1ij if id2==`i'
    replace sumwXsumw = r(sum)^2 if id2==`i'
    sum w1ijXw1ij if id2==`i'
    replace sum_wXw = r(sum) if id2==`i'
    replace Nschool = r(N) if id2==`i'
}
gen Neffj = sumwXsumw/sum_wXw
gen designEff = Neffj/Nschool
* p. 232: All level-1 design effects are between .95 and 1.
sum designEff
hist designEff

* How do level-1 design effects covary with sample sizes per school?
preserve
collapse designEff Nschool, by(SCHOOLID)
scatter designEff Nschool
restore

* The design effect of the school-level weights w2j in the strata 5 to 8
matrix DE = J(1,4,.)
forvalues i = 5/8 {
    preserve
    collapse w2j STRATUM, by(SCHOOLID)
    gen w2jXw2j = w2j*w2j if STRATUM==`i'
    sum w2j if STRATUM==`i'
    local rsum2 = r(sum)^2
    local nsch = r(N)
    display `rsum2'
    sum w2jXw2j if STRATUM==`i'
    local r2sum = r(sum)
    display `r2sum'
    display "Neff is " `rsum2'/`r2sum'
    local DF = (`rsum2'/`r2sum')/`nsch'
    display "Design effect is " `DF'
    matrix DE[1,`i'-4] = `DF'
    restore
}
matrix list DE

* Table 14.3
* Keep schools in the stratum "USA: Public South"
keep if STRATUM==7
table urban
recode urban (1/2=1 "City") (3/4=2 "Town") (5=3 "Village"), gen(newUrb)
tabulate newUrb, gen(newUrbD)

* Move to school level, unweighted estimate
bysort id2: keep if _n==1   // This keeps the first observation in each school and discards the rest
sum newUrbD*

* Mean and sd of the estimate
* The mean
sum newUrbD1
* and the sd...
* (55 is hard-coded here; presumably the number of schools in this stratum)
display "SD of the city proportion estimate is " ((r(mean)*(1 - r(mean)))/(55-1))^(1/2)
sum newUrbD2
display "SD of the town proportion estimate is " ((r(mean)*(1 - r(mean)))/(55-1))^(1/2)
sum newUrbD3
display "SD of the village proportion estimate is " ((r(mean)*(1 - r(mean)))/(55-1))^(1/2)

* The weighted estimates of proportions (equation 14.2)
* newUrbD1 = City, newUrbD2 = Town, newUrbD3 = Village
gen Village_wXy = w2j*newUrbD3
sum Village_wXy
local sum_Village_wXy = r(sum)
gen Town_wXy = w2j*newUrbD2
sum Town_wXy
local sum_Town_wXy = r(sum)
gen City_wXy = w2j*newUrbD1
sum City_wXy
local sum_City_wXy = r(sum)
sum w2j
local dP = r(sum)
display "weighted estimate of village proportion is " `sum_Village_wXy'/`dP'
scalar wvp = `sum_Village_wXy'/`dP'
display "weighted estimate of town proportion is " `sum_Town_wXy'/`dP'
scalar wtp = `sum_Town_wXy'/`dP'
display "weighted estimate of city proportion is " `sum_City_wXy'/`dP'
scalar wcp = `sum_City_wXy'/`dP'

* The sd of the weighted proportions (eq 14.3)
* Each indicator is paired with its own weighted proportion.
* Village
gen sd_numeratorV = w2j^2*(newUrbD3 - wvp)^2
sum sd_numeratorV
local sum_sd_numV = r(sum)
sum w2j
local sum_sd_denomV = r(sum)
display "sd of the weighted estimate of village prop is " sqrt(`sum_sd_numV'/(`sum_sd_denomV')^2)
* Town
gen sd_numeratorT = w2j^2*(newUrbD2 - wtp)^2
sum sd_numeratorT
local sum_sd_numT = r(sum)
sum w2j
local sum_sd_denomT = r(sum)
display "sd of the weighted estimate of town prop is " sqrt(`sum_sd_numT'/(`sum_sd_denomT')^2)
* City
gen sd_numeratorC = w2j^2*(newUrbD1 - wcp)^2
sum sd_numeratorC
local sum_sd_numC = r(sum)
sum w2j
local sum_sd_denomC = r(sum)
display "sd of the weighted estimate of city prop is " sqrt(`sum_sd_numC'/(`sum_sd_denomC')^2)

* How strongly are inverse school weights related to school sizes?
gen InvSchW = 1/w2j
scatter InvSchW SCHSIZE

**********************************************************************
****  Model-based analysis of five parts: Section 14.4.2          ****
**********************************************************************

* Load and recalculate the weights as above
clear
set more off, permanently
infile SCHOOLID StIDStd STRATUM W_FSTUWT grade sex AGE IMMIG ESCS STUDREL ///
    JOYREAD METASUM UNDREM private urban SCHSIZE SELSCH STRATIO W_FSCHWT ///
    using "/Users/Jon/Documents/Snijders/SecondEditionExamples/DataSnijdersSecondEdition/pisacc/pisacc.dat" ///
    in 2/5234, clear

bysort SCHOOLID: egen gmeanW_FSTUWT = mean(W_FSTUWT)
gen w1ij = W_FSTUWT/gmeanW_FSTUWT
gen ww = w1ij*W_FSCHWT
egen avww = mean(ww)
gen w2j = W_FSCHWT/avww

* Create five parts of the dataset: observations with private==1 (code 0)
* and four groups of the remaining observations, split on w2j
gen w2j_5 = .
centile w2j, centile(0, 25, 50, 75)
* The cut-offs below are hard-coded, presumably from the centile output above
replace w2j_5 = 1 if w2j < .3291332
replace w2j_5 = 2 if w2j > .3291332 & w2j < .5279652
replace w2j_5 = 3 if w2j > .5279652 & w2j < .9632865
replace w2j_5 = 5 if w2j > .9632865 & !missing(w2j)
replace w2j_5 = 0 if private==1
* tabulate creates groupD1-groupD5 for the codes 0, 1, 2, 3 and 5, in that order
tabulate w2j_5, gen(groupD)

bysort SCHOOLID: egen gmeanIMMIG = mean(IMMIG)
bysort SCHOOLID: egen gmeanESCS = mean(ESCS)

* Table 14.4
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS if groupD1==1 || SCHOOLID:
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS if groupD2==1 || SCHOOLID:
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS if groupD3==1 || SCHOOLID:
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS if groupD4==1 || SCHOOLID:
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS if groupD5==1 || SCHOOLID:

* Table 14.5
table urban
tabulate urban, gen(urbD)
recode urban (1/2=1 "City") (3/4=2 "Town") (5=3 "Village"), gen(newUrb)
tabulate newUrb, gen(newUrbD)

xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 ///
    if groupD1==1 || SCHOOLID: , var
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 ///
    if groupD2==1 || SCHOOLID: , var
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 ///
    if groupD3==1 || SCHOOLID: , var
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 urbD4 ///
    if groupD4==1 || SCHOOLID: , var
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 urbD4 ///
    if groupD5==1 || SCHOOLID: , var

************************
****  Example 14.1  ****
************************

* The inclusion of probability weights in Stata
* (Table 14.6)

* With weights on both levels
set more off
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 urbD4 ///
    [pw=w1ij] || SCHOOLID: , pweight(w2j) var
est store mod1

* Without weights
xtmixed METASUM grade sex AGE IMMIG ESCS gmeanIMMIG gmeanESCS urbD1 urbD2 urbD3 urbD4 ///
    || SCHOOLID: , var
est store mod2

est table mod1 mod2, star

capture log close
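
* Optional alternative (a sketch, not part of the original do file): the
* per-school effective sample sizes, (sum of w1ij)^2 / (sum of w1ij^2), and
* the level-1 design effects were computed earlier in this file with a loop
* over schools. Assuming w1ij is in memory, the same quantities can probably
* be obtained with egen totals by school. All names ending in _alt are
* hypothetical and introduced only for this illustration.
bysort SCHOOLID: egen double sumw_alt = total(w1ij)
bysort SCHOOLID: egen double sumw2_alt = total(w1ij^2)
bysort SCHOOLID: egen n_alt = count(w1ij)
gen double Neff_alt = sumw_alt^2/sumw2_alt     // effective sample size per school
gen double deff_alt = Neff_alt/n_alt           // level-1 design effect per school
sum deff_alt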