AIC example

Example from Faraway (2015).

options(digits = 3)
library(faraway)
statedata <- data.frame(state.x77, row.names = state.abb)
head(statedata)
##    Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
## AL       3615   3624        2.1     69.0   15.1    41.3    20  50708
## AK        365   6315        1.5     69.3   11.3    66.7   152 566432
## AZ       2212   4530        1.8     70.5    7.8    58.1    15 113417
## AR       2110   3378        1.9     70.7   10.1    39.9    65  51945
## CA      21198   5114        1.1     71.7   10.3    62.6    20 156361
## CO       2541   4884        0.7     72.1    6.8    63.9   166 103766

We take life expectancy (Life.Exp) as the response and the remaining variables as predictors.

lmod <- lm(Life.Exp ~ ., statedata)
summary(lmod)
## 
## Call:
## lm(formula = Life.Exp ~ ., data = statedata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4890 -0.5123 -0.0275  0.5700  1.4945 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.09e+01   1.75e+00   40.59  < 2e-16 ***
## Population   5.18e-05   2.92e-05    1.77    0.083 .  
## Income      -2.18e-05   2.44e-04   -0.09    0.929    
## Illiteracy   3.38e-02   3.66e-01    0.09    0.927    
## Murder      -3.01e-01   4.66e-02   -6.46  8.7e-08 ***
## HS.Grad      4.89e-02   2.33e-02    2.10    0.042 *  
## Frost       -5.74e-03   3.14e-03   -1.82    0.075 .  
## Area        -7.38e-08   1.67e-06   -0.04    0.965    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.745 on 42 degrees of freedom
## Multiple R-squared:  0.736,  Adjusted R-squared:  0.692 
## F-statistic: 16.7 on 7 and 42 DF,  p-value: 2.53e-10

The notation ~ . above means that the predictors are “all columns not otherwise in the formula”, i.e. in this case all columns except Life.Exp (see ?formula).

We seek the model which minimises AIC.

library(leaps)
b <- regsubsets(Life.Exp ~ ., data = statedata)
rs <- summary(b)
rs$which
##   (Intercept) Population Income Illiteracy Murder HS.Grad Frost  Area
## 1        TRUE      FALSE  FALSE      FALSE   TRUE   FALSE FALSE FALSE
## 2        TRUE      FALSE  FALSE      FALSE   TRUE    TRUE FALSE FALSE
## 3        TRUE      FALSE  FALSE      FALSE   TRUE    TRUE  TRUE FALSE
## 4        TRUE       TRUE  FALSE      FALSE   TRUE    TRUE  TRUE FALSE
## 5        TRUE       TRUE   TRUE      FALSE   TRUE    TRUE  TRUE FALSE
## 6        TRUE       TRUE   TRUE       TRUE   TRUE    TRUE  TRUE FALSE
## 7        TRUE       TRUE   TRUE       TRUE   TRUE    TRUE  TRUE  TRUE
AIC <- 50*log(rs$rss/50) + (2:8)*2
plot(AIC ~ I(1:7), ylab = "AIC", xlab = "Number of Predictors")

The minimum AIC is achieved by a model with 4 predictors, the model with the predictors Population, Murder, HS.Grad and Frost, plus an intercept.

What about BIC?

BIC <- 50*log(rs$rss/50) + (2:8)*log(50)
plot(BIC ~ I(1:7), ylab = "BIC", xlab = "Number of Predictors")

The minimum BIC is again achieved with 4 predictors plus an intercept (just) the same as for AIC.