Suppose we model the product sales as a function of the TV, radio and newspaper advertising budgets:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon\]

where \(y=\) `sales`, \(x_1=\) `TV`, \(x_2=\) `radio` and \(x_3=\) `newspaper`.

That is,

\[{\tt sales} = \beta_0 + \beta_1{\tt TV} + \beta_2{\tt radio} + \beta_3{\tt newspaper} + \epsilon.\]

```
adv <- read.csv("http://www.stats.ox.ac.uk/~laws/LMs/data/advert.csv")
adv.lm <- lm(sales ~ TV + radio + newspaper, data = adv)
```

Note: R automatically includes an intercept.

To explicitly include an intercept \(\beta_0\), use: `lm(sales ~ 1 + TV + radio + newspaper, data = adv)`

To explicitly exclude an intercept, use: `lm(sales ~ -1 + TV + radio + newspaper, data = adv)`

Normally we will want to include an intercept.
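To see the effect of these formulas, here is a quick sketch on the built-in `mtcars` data (used instead of `advert.csv` so it runs without a download): the default and explicit-intercept formulas give identical fits, while `-1` drops the intercept term.

```
# Default, explicit-intercept, and no-intercept fits on built-in data
fit.default  <- lm(mpg ~ wt + hp, data = mtcars)       # intercept included automatically
fit.explicit <- lm(mpg ~ 1 + wt + hp, data = mtcars)   # same model, intercept written out
fit.noint    <- lm(mpg ~ -1 + wt + hp, data = mtcars)  # intercept removed

names(coef(fit.default))  # "(Intercept)" "wt" "hp"
names(coef(fit.noint))    # "wt" "hp" -- no intercept term
all.equal(coef(fit.default), coef(fit.explicit))  # TRUE
```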

```
options(digits = 3)
summary(adv.lm)
```

```
##
## Call:
## lm(formula = sales ~ TV + radio + newspaper, data = adv)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -8.828 -0.891  0.242  1.189  2.829
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.93889    0.31191    9.42   <2e-16 ***
## TV           0.04576    0.00139   32.81   <2e-16 ***
## radio        0.18853    0.00861   21.89   <2e-16 ***
## newspaper   -0.00104    0.00587   -0.18     0.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.69 on 196 degrees of freedom
## Multiple R-squared: 0.897, Adjusted R-squared: 0.896
## F-statistic: 570 on 3 and 196 DF, p-value: <2e-16
```

Interpreting the coefficients in this model:

- coefficient of TV: if the radio and newspaper budgets are held fixed and the TV budget is increased by 1, we would expect to see an increase of 0.046 in sales;
- coefficient of radio: if the TV and newspaper budgets are held fixed and the radio budget is increased by 1, we would expect to see an increase of 0.189 in sales;
- coefficient of newspaper: if the TV and radio budgets are held fixed and the newspaper budget is increased by 1, we would expect to see a decrease of 0.001 in sales.

Above, an “increase of 1” in a budget means an increase of one thousand dollars, and an “increase of 0.046” in sales means an increase of 0.046 thousand units of sales (about 46 units). That is, units of measurement matter. Equally, we could say the above amounts (0.046, 0.189, -0.001) give the predicted change in sales (in thousands of units) in the three cases.
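This interpretation is exact for a fitted linear model: increasing one explanatory variable by 1 while holding the others fixed changes the prediction by exactly that variable's estimated coefficient. A sketch with simulated data (so it runs without the `advert.csv` download; the true coefficients below are made up):

```
# Simulate data roughly resembling the advertising example
set.seed(1)
d <- data.frame(TV = runif(50, 0, 300), radio = runif(50, 0, 50),
                newspaper = runif(50, 0, 100))
d$sales <- 3 + 0.05 * d$TV + 0.2 * d$radio + rnorm(50)
fit <- lm(sales ~ TV + radio + newspaper, data = d)

x0 <- data.frame(TV = 100, radio = 20, newspaper = 30)
x1 <- transform(x0, TV = TV + 1)   # TV budget up by 1, others held fixed
predict(fit, x1) - predict(fit, x0)  # equals coef(fit)["TV"]
```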

These interpretations correspond to changing one explanatory variable *while holding the others constant*. However it is not always possible to change one explanatory variable while holding the others constant. E.g. suppose we have a model that includes both \(x\) and \(x^2\) as explanatory variables – it is not possible to change one of these two without changing the other. Some data will have similar (but not quite so extreme) features – some explanatory variables may be highly correlated, so as one variable changes another tends to change too.
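The \(x\) and \(x^2\) case is easy to check numerically: over a narrow positive range the two are almost perfectly correlated, so “change \(x^2\) while holding \(x\) fixed” is not meaningful. A small sketch with simulated values:

```
# Over [50, 100], x and x^2 move together almost perfectly
set.seed(2)
x <- runif(100, 50, 100)
cor(x, x^2)  # very close to 1
```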

We should not make statements of causality such as “increasing \(x_1\) by 1 causes an increase of xxx in \(y\)”. Rather we prefer to say “increasing \(x_1\) by 1 is associated with an increase of xxx in \(y\)”. For example, some other variable(s) could be the actual cause of the increases in both \(x_1\) and \(y\).

The intercept of 2.939 would be the predicted sales when the TV, radio and newspaper budgets are all zero. Is it reasonable to use our model when these budgets are all zero? We need to look at our data for this: here we do have several values near zero for all of TV, radio and newspaper budgets. Hence interpreting the 2.939 value like this seems reasonable, though we should maybe be cautious as budgets of zero are at the lower extreme of possible budgets.
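The claim that the intercept is the prediction at zero budgets holds exactly for any fitted linear model: the fitted value when all explanatory variables are zero equals the estimated intercept. A sketch with the same kind of simulated data as above:

```
# Prediction with all explanatory variables at zero equals the intercept
set.seed(3)
d <- data.frame(TV = runif(50, 0, 300), radio = runif(50, 0, 50),
                newspaper = runif(50, 0, 100))
d$sales <- 3 + 0.05 * d$TV + 0.2 * d$radio + rnorm(50)
fit <- lm(sales ~ TV + radio + newspaper, data = d)

predict(fit, data.frame(TV = 0, radio = 0, newspaper = 0))  # equals coef(fit)[1]
```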

In some examples an explanatory variable being zero can be meaningless, e.g. an IQ score of zero. In other examples an explanatory variable being zero may be possible, but we may have only observed values of \(x_1\) in \([50, 100]\), say. A value of \(x_1 = 0\) would then be a large extrapolation outside the range of the observed data, and it may not be appropriate to extrapolate this far. E.g. the dependence on \(x_1\) may be approximately linear for \(x_1\) in \([50, 100]\), but without data we may have nothing to support extrapolating this approximately linear behaviour to \(x_1\) in \([0, 50]\). Extrapolation can be dangerous.
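The danger is easy to demonstrate with simulated data: below, the true relationship is quadratic, but it looks roughly linear over the observed range \([50, 100]\), so a linear fit predicts well there yet fails badly when extrapolated to \(x = 0\).

```
# True curve is 0.01 * x^2; we only observe x in [50, 100]
set.seed(4)
x <- runif(200, 50, 100)
y <- 0.01 * x^2 + rnorm(200, sd = 1)
fit <- lm(y ~ x)

predict(fit, data.frame(x = 75))  # close to the truth, 0.01 * 75^2 = 56.25
predict(fit, data.frame(x = 0))   # far from the truth, which is 0
```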