11 Linear models
A linear model is a statistical model of a linear relationship between variables: \[Y=\beta_0+\beta_1X+e,\] where:
- \(Y\) is the dependent variable;
- \(X\) is the independent variable;
- \(e\) is the error term.
 
The errors are assumed to be independent and identically normally distributed, with mean \(0\) and variance \(\sigma^2>0\).
The model parameters to be estimated are the intercept \(\beta_0\) and the slope \(\beta_1\).
Examples: \(\hat y=1+0.5x\), \(\hat y=0-2x\).
In R, consider 1000 pairs \((x_i,y_i)\) simulated as follows:
> set.seed(98765421)
> x<-rchisq(1000,df=1)
> y<-1+0.5*x+0.7*rnorm(1000)
> 
> plot(x,y,xlab='x',ylab='y')
- The main idea is to compute the best linear model, that is, the straight line (drawn in blue on the scatterplot) that best fits the data.
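As a rough illustration of what "best" can mean, the sketch below takes it to be the least-squares line. This is an assumption, since the text does not define the criterion at this point. The names `b0` and `b1` are illustrative only. For a single predictor, the least-squares slope is the sample covariance of x and y divided by the sample variance of x, and the intercept follows from the sample means.

```r
# Regenerate the simulated data used in the text
set.seed(98765421)
x <- rchisq(1000, df = 1)
y <- 1 + 0.5 * x + 0.7 * rnorm(1000)

# Closed-form least-squares estimates for one predictor
# (assumption: "best" means least squares)
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept
print(c(intercept = b0, slope = b1))

# Scatterplot with the least-squares line overlaid in blue
plot(x, y, xlab = "x", ylab = "y")
abline(a = b0, b = b1, col = "blue", lwd = 2)
```

The estimates should land close to the values 1 and 0.5 used to simulate the data.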
 

Suppose the estimated model for the above data is \(\hat y=1+0.5x\).
The model residuals are then:
> residuals.y<-y-(1+0.5*x)
> 
> # Check assumptions
> f<-function(x){dnorm(x,sd=sd(residuals.y))}
> hist(residuals.y,probability=TRUE,ylim=c(0,0.7),nclass = 20)
> curve(f,col="blue", lwd=3, add=TRUE)
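The histogram can be complemented with a couple of quick numerical and graphical checks. The following is a minimal sketch, not a complete diagnostic procedure: it compares the sample mean and standard deviation of the residuals with the values 0 and 0.7 used in the simulation, and draws a normal Q-Q plot. The names `res.mean` and `res.sd` are illustrative.

```r
# Regenerate the data and the residuals from the text
set.seed(98765421)
x <- rchisq(1000, df = 1)
y <- 1 + 0.5 * x + 0.7 * rnorm(1000)
residuals.y <- y - (1 + 0.5 * x)

# The sample mean should be near 0 and the sample sd near 0.7
res.mean <- mean(residuals.y)
res.sd <- sd(residuals.y)
print(c(mean = res.mean, sd = res.sd))

# Normal Q-Q plot: points close to the reference line support normality
qqnorm(residuals.y)
qqline(residuals.y, col = "blue", lwd = 2)
```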
11.1 Fitting models: the lm() function
\[y=\beta_0+\beta_1x+e\]
- In R, linear models can be fitted to data with the lm() function:
> analysis<-lm(y~x)
> analysis
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##       0.991        0.496
So, the parameter estimates are: \(\hat{\beta}_0=0.991\) and \(\hat{\beta}_1=0.496\).
Model formulas: the argument to lm() is a formula object. A linear model is specified by a formula object, which may look like this:
> y~x+z+w
The corresponding linear model is: \[y=\beta_0+\beta_1x+\beta_2z+\beta_3w+e.\]
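To make the link between a formula and its model concrete, here is a small sketch that fits a model with three predictors through the formula `y ~ x + z + w`. Everything in it is hypothetical and chosen only for illustration: the simulated variables `z` and `w`, the coefficient values, the sample size, the seed, and the object name `fit3`.

```r
# Hypothetical data: three predictors and known coefficients
set.seed(1)                      # arbitrary seed, not from the text
n <- 200
x <- rnorm(n)
z <- rnorm(n)
w <- rnorm(n)
y <- 1 + 0.5 * x - 2 * z + 3 * w + 0.7 * rnorm(n)

# Each term on the right of ~ gets its own coefficient;
# the intercept is included automatically
fit3 <- lm(y ~ x + z + w)
print(coef(fit3))
```

The four estimated coefficients correspond, in order, to \(\beta_0\), \(\beta_1\), \(\beta_2\) and \(\beta_3\).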
Contents of the object returned by the lm() function:
> analysis<-lm(y~x)
> names(analysis)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
- Accessing the contents:
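One way to access the components of the fitted object is sketched below: by name with `$`, or through the standard extractor functions `coef()`, `residuals()` and `fitted()`. This is a minimal sketch and not necessarily the example the original notes used. The data are regenerated so that the block runs on its own.

```r
# Regenerate the data and refit the model from the text
set.seed(98765421)
x <- rchisq(1000, df = 1)
y <- 1 + 0.5 * x + 0.7 * rnorm(1000)
analysis <- lm(y ~ x)

# Components can be extracted by name with $
print(analysis$coefficients)
print(analysis$df.residual)

# Extractor functions are the usual alternative
print(coef(analysis))
print(head(residuals(analysis)))
print(head(fitted(analysis)))
```

Fitted values and residuals add up to the observed responses, which gives a quick consistency check.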
 
- Summaries:
 
> summary(analysis)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2027 -0.4663 -0.0532  0.5073  1.9379 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.9909     0.0267    37.1   <2e-16 ***
## x             0.4964     0.0142    35.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.698 on 998 degrees of freedom
## Multiple R-squared:  0.551,  Adjusted R-squared:  0.551 
## F-statistic: 1.23e+03 on 1 and 998 DF,  p-value: <2e-16
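The quantities printed by summary() can also be extracted and checked by hand. The sketch below assumes the same simulated data as above. The names `s`, `ctab`, `t.manual` and `sigma.manual` are illustrative. It shows that each t value is the estimate divided by its standard error (for the intercept above, 0.9909/0.0267 is about 37.1), and that the residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom.

```r
# Regenerate the data and refit the model from the text
set.seed(98765421)
x <- rchisq(1000, df = 1)
y <- 1 + 0.5 * x + 0.7 * rnorm(1000)
analysis <- lm(y ~ x)

s <- summary(analysis)
ctab <- coef(s)   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
print(ctab)

# t value = estimate / standard error
t.manual <- ctab[, "Estimate"] / ctab[, "Std. Error"]
print(t.manual)

# Residual standard error = sqrt(RSS / residual degrees of freedom)
sigma.manual <- sqrt(sum(residuals(analysis)^2) / analysis$df.residual)
print(c(reported = s$sigma, manual = sigma.manual))

# R-squared as stored in the summary object
print(s$r.squared)
```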