Information about MLwiN 2.0 macros for checking model assumptions

Last change in this description and these macros: March 19, 2007.

Several macros for checking the assumptions of multilevel models using MLwiN 2.0 are available:

HET1.OBE for testing level-one heteroscedasticity,
RES1.OBE for calculating level-one OLS residuals,
DINFL.OBE for calculating level-two influence diagnostics similar to Cook's distance and standardized multivariate level-two residuals.

In addition, the set of macros includes
SMOOTH.OBE for smoothing Y as a function of X;
TABLE.OBE for calculating means and standard deviations of Y grouped for values of X.
These can be used for postprocessing calculated residuals (using the residuals as Y).

You can download a zipped file (which can be unzipped using PKUNZIP or WINZIP) containing these macros by clicking checks2.zip.

The theory behind these macros and the interpretation of their results are treated in Chapter 9 of Snijders and Bosker (1999) and in Snijders and Berkhof (2007). The procedures follow Snijders and Berkhof (2007) and differ slightly from Snijders and Bosker (1999) because the parameter estimates used are based on the deletion principle: when assessing the effects of group j, the data for this group are removed from the data used to estimate the parameters.
Examples of their use are in the introduction to MLwiN 2.0 given in file wexample20.pdf which can be obtained from the website connected to Snijders & Bosker (1999).

It is easy to use the macros in MLwiN 2.0. See to it that you have a log file attached, and that the command window and its output window are open if you are working in MLwiN. Take care that you have a model of the kind required for the specific macro you are using and that the columns used by the macro (as mentioned below) are not in use already. For macro HET1.OBE, specify boxes B20 and B21, e.g., by giving the commands
set B20 10
set B21 0
in the command window;
for the auxiliary macros SMOOTH.OBE and TABLE.OBE, specify boxes B1 and B2, as well as B21 (for SMOOTH.OBE) and B3 (for TABLE.OBE).
The macro then is executed by giving, e.g., the command
obey het1.obe
in the command window.

All these macros contain a header with a description of what the macro does and what it requires in terms of available variables, defined constants, and defined model. These descriptions are also reproduced here.

HET1.OBE is a macro for testing level-one heteroscedasticity: do all groups (level-two units) have the same level-one variance?
This macro uses test (9.2) of Snijders & Bosker (1999), which is the same as [9.6] in Bryk & Raudenbush (1992). The test statistic is called H.
The requirement for using this test is that a two-level model has been specified without level-two explanatory variables and with a level-one random part that consists only of the constant.
The box B20 must be set to a positive value: this is the lowest value for the within-group residual degrees of freedom ($df_j$ in Snijders & Bosker) that is required to include a level-two unit (j) in the calculation. The advised value is B20 = 10 (unless the Monte Carlo option is used, see below). If B20 is undefined, the default value used is B20 = 10.
The chi-squared distribution for H is an approximation which is reasonable when all residual degrees of freedom are 10 or larger. When all or many groups have small degrees of freedom, the p-value can be approximated by Monte Carlo simulation (this may be time-consuming). The Monte Carlo approximation is valid also for small degrees of freedom, so for this approximation you may set B20 low, e.g., 5 or even 1.
If the box B21 is set to a value greater than 1, then B21 simulation runs are executed for this purpose. The advised value for B21 is 100 for getting a first impression, and 1000 for a more accurate approximation of the p-value. If B21 is undefined, no Monte Carlo simulations are carried out.
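
The statistic H described above can be illustrated outside MLwiN. The following Python sketch computes H from per-group residual variances and degrees of freedom, taking $d_j = \sqrt{df_j / 2}\,(\ln s_j^2 - ls_{tot})$ with $ls_{tot}$ the df-weighted mean of $\ln s_j^2$, and H as the sum of the squared $d_j$, referred to a chi-squared distribution with (number of included groups minus 1) degrees of freedom. It is a sketch under stated assumptions, not the code of HET1.OBE: the function and parameter names are hypothetical, and the Monte Carlo step assumes normally distributed level-one residuals, so that each simulated variance is a scaled chi-squared draw.

```python
import math
import random


def dispersion_test(variances, dfs, min_df=10):
    """Return (H, degrees of freedom of H, list of d_j).

    Only groups with at least min_df residual degrees of freedom are used
    (min_df plays the role of box B20).
    """
    kept = [(s2, df) for s2, df in zip(variances, dfs) if df >= min_df]
    total_df = sum(df for _, df in kept)
    # df-weighted mean of the log residual variances
    ls_tot = sum(df * math.log(s2) for s2, df in kept) / total_df
    d = [math.sqrt(df / 2.0) * (math.log(s2) - ls_tot) for s2, df in kept]
    h = sum(value * value for value in d)
    return h, len(kept) - 1, d


def monte_carlo_p(h_obs, dfs, runs=1000, min_df=1, seed=0):
    """Approximate P(H >= h_obs) by simulation (runs plays the role of B21).

    Assumption: under homoscedasticity with normal residuals,
    df_j * s_j^2 / sigma^2 follows a chi-squared distribution with df_j
    degrees of freedom; H does not depend on sigma^2, so sigma^2 = 1 is used.
    """
    rng = random.Random(seed)
    kept = [df for df in dfs if df >= min_df]
    exceed = 0
    for _ in range(runs):
        # chi-squared(df) is a gamma distribution with shape df/2 and scale 2
        simulated = [rng.gammavariate(df / 2.0, 2.0) / df for df in kept]
        h_sim, _, _ = dispersion_test(simulated, kept, min_df=min_df)
        if h_sim >= h_obs:
            exceed += 1
    return exceed / runs
```

For example, `dispersion_test(variances, dfs, min_df=10)` followed by `monte_carlo_p(h, dfs, runs=100, min_df=5)` mirrors the advice above to start with a small number of simulation runs and a lower threshold when the Monte Carlo approximation is used.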

It is assumed that the variable with constant values 1 is available with the name "cons".
The macro uses (i.e., changes and/or destroys) some of the columns C201 and higher in the worksheet and the constants B1 to B25.
The macro produces (in the output window and the log file) the chi-squared test statistic H with degrees of freedom and its p-value. (If B21 = 0, only the large-sample approximate p-value; if B21 > 0, also the p-value approximated by Monte Carlo simulation).
For the groups with enough (i.e., B20 or more) degrees of freedom, the group numbers, degrees of freedom, residual variances, and standardized dispersion values $d_j$ (see formula (9.3) of Snijders & Bosker or [9.5] of Bryk & Raudenbush) are made available in columns C201 - C204.
If boxes B20 and/or B21 are undefined, or if B20 = 0, then the default values B20 = 10 and B21 = 0 are used.

RES1.OBE is a macro that calculates level-one OLS residuals, as discussed in section 9.5 of Snijders & Bosker (1999). The requirement for using this macro is that a two-level model has been specified without level-two explanatory variables and with a level-one random part that consists only of the constant.
It is assumed that the variable with constant values 1 is available with the name "cons".
The macro uses (i.e., changes and/or destroys) some of the columns C201 and higher in the worksheet and the constants B1 to B18.
The macro produces (in the output window and the log file) the following variables:
For groups with residual d.f. $\geq$ 10 (and therefore only for a subset of the data if any groups have fewer than 10 within-group residual degrees of freedom):
- C221: level-1 identifier,
- C222: level-2 identifier,
- C223: dependent variable,
- C225: within-group residual degrees of freedom,
- C226: level-one residuals,
- C228: standardised residuals $\times$ sigma,
- group G4: explanatory variables.
  In MLn, you can use the command print G4 to see the columns in G4; in MLwiN, you can use the Groups window to see which columns are in G4.
To smooth the residuals as a function of some (explanatory) variable, you can use the macro SMOOTH.OBE (for a continuous explanatory variable with very many distinct values) or TABLE.OBE (for an explanatory variable with a limited number of categories). These macros are described below.
It is advised to plot smoothed residuals (C226) as a function of relevant level-one explanatory variables to investigate the shape of the effect of these variables;
and to plot smoothed squared standardised residuals (C228) as a function of relevant level-one explanatory variables to investigate possible level-one heteroscedasticity as a function of these variables.

SMOOTH.OBE smooths Y as a function of X.
Column numbers of X and Y must be given as boxes B1 and B2. The smoother calculates a simple moving average of the 2×B21 + 1 consecutive values Y(i)...Y(i + 2×B21), where the values Y(i) are ordered according to X. Thus, the box B21 must be set to a positive integer, e.g., 20 or 50.
Columns C398 and C399 are used as temporary variables.
The output is:
for all data
- C221: ordered X values,
- C222: corresponding original Y values,
- C223: corresponding smoothed Y values (but because of the smoothing the first B21 and last B21 values in C223 are meaningless),
and for the data from which the meaningless rows (i.e., the first B21 and last B21 values) are deleted:
- C231: ordered X values,
- C232: corresponding original Y values,
- C233: corresponding smoothed Y values.
Variables C231 - C233 are the ones that are to be used for plotting, C221 - C223 are for reference purposes only.
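
To make the smoothing rule concrete, here is a minimal Python sketch of a moving average over 2×B21 + 1 consecutive values of Y after ordering by X. It is not the code of SMOOTH.OBE: the function name `smooth` and the parameter `half_width` (standing in for B21) are hypothetical, and it assumes that each smoothed value is attached to the middle observation of its window, which is consistent with the first and last B21 rows being dropped.

```python
def smooth(x, y, half_width):
    """Moving average of y, ordered by x, over 2*half_width + 1 values.

    Returns three lists (ordered x, original y, smoothed y) restricted to
    the rows for which a full window is available, comparable to the
    trimmed output columns C231 - C233.
    """
    if half_width < 1:
        raise ValueError("half_width must be a positive integer")
    pairs = sorted(zip(x, y), key=lambda pair: pair[0])
    xs = [pair[0] for pair in pairs]
    ys = [pair[1] for pair in pairs]
    window = 2 * half_width + 1
    out_x, out_y, out_smooth = [], [], []
    # Only centres with half_width neighbours on both sides are kept.
    for centre in range(half_width, len(ys) - half_width):
        chunk = ys[centre - half_width:centre + half_width + 1]
        out_x.append(xs[centre])
        out_y.append(ys[centre])
        out_smooth.append(sum(chunk) / window)
    return out_x, out_y, out_smooth
```

A larger `half_width` gives a smoother curve but discards more rows at both ends, which is why values such as 20 or 50 only make sense when X has many distinct values.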

TABLE.OBE is a macro that calculates means and standard deviations of variable Y grouped by the values of variable X.
Column numbers of X and Y must be given as boxes B1 and B2.
Box B3 is the minimum number of cases per category. This must be 1 or more.
It is assumed that a variable "cons" is available of at least the same column length as X and Y.
X-values are rounded to the nearest integer.
Columns C391-C397 are used as temporary variables.

The output is:
- C211: rounded X-value,
- C212: number of cases,
- C213: average of Y for these cases,
- C214: standard deviation of Y for these cases,
- C215: standard error of the mean of Y for these cases.
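
The table produced in C211 - C215 can be mimicked with the Python sketch below. It is an illustration under assumptions, not the code of TABLE.OBE: the function name `table` and the parameter `min_cases` (standing in for B3) are hypothetical, halves are rounded upward, and the standard deviation uses the n - 1 denominator; the macro itself may differ on these two points.

```python
import math
from collections import defaultdict


def table(x, y, min_cases=1):
    """Group y by rounded x and summarise each group.

    Returns a list of rows (rounded x, n, mean, sd, se), sorted by rounded x,
    for the groups with at least min_cases observations.
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        # Round to the nearest integer, halves upward (an assumption).
        groups[math.floor(xi + 0.5)].append(yi)
    rows = []
    for key in sorted(groups):
        values = groups[key]
        n = len(values)
        if n < min_cases:
            continue
        mean = sum(values) / n
        if n > 1:
            # Sample standard deviation with denominator n - 1 (an assumption).
            sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
            se = sd / math.sqrt(n)
        else:
            # A single case gives no information about spread.
            sd = se = float("nan")
        rows.append((key, n, mean, sd, se))
    return rows
```

Applied to residuals as Y, the mean column shows whether the residuals drift away from zero across categories of X, and the standard error column indicates how much of that drift could be noise.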

DINFL.OBE calculates level-two influence diagnostics similar to Cook's distance, and the standardized multivariate residual, as defined in Snijders and Berkhof (2007). The predicted values and covariance matrices are determined after taking one iteration step of the estimation algorithm for the data from which group j was deleted. Deleting this group provides a better measure of influence. Further, level-one standardized residuals are calculated based on level-two unit deletion. If the model is correctly specified and the number of groups is reasonably large, then the standardized multivariate residual has approximately a chi-squared distribution.
This macro presupposes a two-level model; the random part may be arbitrary.
It is assumed that the usual unit column "cons" exists.
The columns C177-C209 and also C301 and higher will be used and overwritten.

The macro produces the following columns (the names of which correspond to similar names used in Snijders & Bosker, 1999):
- C204: group identifier (unit2_j),
- C205: group size (n_j),
- C201: influence on random part parameters (CR_j),
- C202: influence on fixed parameters (CF_j),
- C200: combined influence diagnostic (C_j),
- C203: standardized multivariate residual (S2_j)
  (C203 is approximately chi squared, d.f. in C205),
- C206: p-values for C203,
- C207: observed normal deviates for C206,
- C208: expected normal deviates for C206,
- C209: standardized level-1 group-deletion residuals (res_1).
The macro produces in the log file and the output window:
- an omnibus fit test based on the sum of the standardized multivariate residuals,
- an index plot of the influence statistics (C200 versus C204),
- a plot of the level-1 residuals as function of group identifier,
- a plot of influence as a function of group sizes (C200 versus C205),
- a plot of influence as a function of significance of group residuals (C200 versus C206),
- and a probability plot for the level-two residuals (C207 versus C208).
It is advised to inspect the lowest values of C206 together with the corresponding C204 and C200 values (possibly also C201 and C202 separately) to investigate which groups fit most poorly, how poor their fit is, and how large their influence on the parameter estimates is.
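
The idea behind a probability plot of p-values such as those in C206 can be sketched in Python as follows. This shows one common construction and is not taken from DINFL.OBE: the function name is hypothetical, the conversion of a p-value p to the deviate with upper-tail probability p, and the plotting positions (i - 0.5)/N for the expected deviates, are both assumptions. All p-values must lie strictly between 0 and 1.

```python
from statistics import NormalDist


def probability_plot_points(p_values):
    """Return (observed, expected) normal deviates for a probability plot.

    observed: sorted deviates z with upper-tail probability p, so that a
              small p-value (poor fit) gives a large positive deviate.
    expected: standard normal quantiles at the positions (i - 0.5) / N.
    """
    standard_normal = NormalDist()
    n = len(p_values)
    observed = sorted(standard_normal.inv_cdf(1.0 - p) for p in p_values)
    expected = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return observed, expected
```

If the group residuals behave as the model implies, the points (expected, observed) should lie close to a straight line through the origin with slope 1; groups far above the line at the upper end are the poorly fitting ones.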
