After estimating an Ordinary Least Squares (OLS) model in RStudio, we must check whether the model is a good fit. The model must satisfy the assumptions of OLS, including homoscedasticity (no heteroscedasticity), no multicollinearity, normality of residuals and no autocorrelation. We can apply several tests and generate plots to check the goodness of fit of a model.
Here, we will estimate the same OLS model as in the earlier post on OLS estimation. After this, we will illustrate various tests and procedures essential for determining the fit of any OLS model. Our analysis will focus on the following:
- Checking the normality assumption of residuals using plots.
- Reporting the Standard errors, P-values, R-square, Adjusted R-square, Confidence intervals and Anova results.
- Testing for heteroscedasticity using several tests and plots.
- Checking for the problem of multicollinearity.
- Performing tests of autocorrelation and generating Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots.
Goodness of fit: code and results
We will use the “mtcars” data available in the “datasets” package to estimate an OLS model. The first step involves loading the libraries and the dataset, followed by the analysis.
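For reference, a minimal setup along the lines of the earlier post would look like the sketch below. The object names IV and ols are assumptions taken from the code that follows: IV is the matrix of regressors and ols is simply the mtcars data frame.
library(stargazer)  # regression and summary tables
library(lmtest)     # Breusch-Pagan, Durbin-Watson and Breusch-Godfrey tests
library(skedastic)  # additional heteroscedasticity tests and plots
library(car)        # Variance Inflation Factors
data(mtcars)        # load the dataset from the "datasets" package
attach(mtcars)      # make mpg, hp, cyl and wt directly accessible
IV <- cbind(hp, cyl, wt)  # assumed regressor matrix used in lm(mpg ~ IV)
ols <- mtcars             # assumed data frame name used later in vif()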
Here we will illustrate the tests for heteroscedasticity, multicollinearity and autocorrelation that can be conducted easily using R or RStudio.
OLS Estimation, Confidence Intervals, Anova and Normality of Residuals
The “lm” function is used to estimate the OLS model. For goodness of fit, it automatically estimates standard errors, P-values, R-square and Adjusted R-square.
To supplement this, we can use confint to estimate the confidence intervals of the coefficients and anova to carry out an F-test.
Fitted values and residuals/error terms can be obtained using the fitted and resid commands. To check the distribution of the residuals graphically, we can plot a histogram using hist.
stargazer(ols1 <- lm(mpg ~ IV), type = "text", style = "all", title = "OLS results")
##
## OLS results
## ======================================================
## Dependent variable:
## ----------------------------------
## mpg
## ------------------------------------------------------
## IVhp -0.018
## (0.012)
## t = -1.519
## p = 0.141
## IVcyl -0.942*
## (0.551)
## t = -1.709
## p = 0.099
## IVwt -3.167***
## (0.741)
## t = -4.276
## p = 0.0002
## Constant 38.752***
## (1.787)
## t = 21.687
## p = 0.000
## ------------------------------------------------------
## Observations 32
## R2 0.843
## Adjusted R2 0.826
## Residual Std. Error 2.512 (df = 28)
## F Statistic 50.171*** (df = 3; 28) (p = 0.000)
## ======================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
conf <- confint(ols1, level = 0.95)
stargazer(conf, type = "text", title = "Confidence Intervals")
##
## Confidence Intervals
## =========================
## 2.5 % 97.5 %
## -------------------------
## (Intercept) 35.092 42.412
## IVhp -0.042 0.006
## IVcyl -2.070 0.187
## IVwt -4.684 -1.650
## -------------------------
anova <- anova(ols1)
stargazer(anova, type = "text", title = "Anova")
##
## Anova
## ============================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------
## Df 2 15.500 17.678 3 28
## Sum Sq 2 563.024 546.456 176.621 949.427
## Mean Sq 2 161.392 219.322 6.308 316.476
## F value 1 50.171 50.171 50.171
## Pr(> F) 1 0.000 0 0
## --------------------------------------------
ycap <- fitted(ols1)
error <- resid(ols1)
hist(error, breaks = 5)
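As a supplement to the histogram (an addition here, not part of the original code), a normal Q-Q plot and the Shapiro-Wilk test from base R give a more formal look at the normality of the residuals.
qqnorm(error)       # quantile-quantile plot of the residuals
qqline(error)       # reference line under normality
shapiro.test(error) # Shapiro-Wilk test; H0: residuals are normally distributed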
Heteroscedasticity
We will use the skedastic package to apply various tests of heteroscedasticity.
We can also use the plot command to get some insight into the behaviour of the residuals by plotting them against the fitted values.
An interesting application of the plot command is to use it on the model object ols1 that we estimated: plot(ols1). This automatically generates four different plots that give insights into normality as well as heteroscedasticity.
plot(ycap,error) # Residual vs fitted plot
plot(ols1) # 4 different types of plot
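Note that plot(ols1) draws its four diagnostic plots one after another. A common convenience, added here and not part of the original code, is to arrange them in a single 2-by-2 grid:
par(mfrow = c(2, 2))  # 2 x 2 plotting grid
plot(ols1)            # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))  # reset the plotting layout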
lmtest::bptest(ols1)
##
## studentized Breusch-Pagan test
##
## data: ols1
## BP = 2.9351, df = 3, p-value = 0.4017
skedastic::white_lm(ols1) # White's Heteroscedasticity Test
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 5.83 0.442 6 White's Test greater
skedastic::breusch_pagan(ols1) # Breusch-Pagan Test
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 2.94 0.402 3 Koenker (studentised) greater
skedastic::cook_weisberg(ols1) # Cook-Weisberg Test
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <int> <chr> <chr>
## 1 3.48 0.324 3 mult greater
skedastic::glejser(ols1) # Glejser Test
## # A tibble: 1 × 4
## statistic p.value parameter alternative
## <dbl> <dbl> <dbl> <chr>
## 1 2.49 0.478 3 greater
skedastic::hetplot(ols1, horzvar = "fitted.values")
Multicollinearity
To check for multicollinearity in the model, we can look at the correlations between the regressors using the cor command.
Using the car package, we will estimate the Variance Inflation Factors (VIF). As a rule of thumb, multicollinearity is a serious problem if VIF > 10.
stargazer(cor(IV), type = "text", title = "Correlations")
##
## Correlations
## =====================
## hp cyl wt
## ---------------------
## hp 1 0.832 0.659
## cyl 0.832 1 0.782
## wt 0.659 0.782 1
## ---------------------
vif1 <- vif(lm(mpg ~ hp + cyl + wt, data = ols))
stargazer(vif1, type = "text", title = "VIF")
##
## VIF
## =================
## hp cyl wt
## -----------------
## 3.258 4.757 2.580
## -----------------
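To see where these numbers come from, the VIF of a regressor is 1/(1 - R-square), where the R-square comes from regressing that variable on the other regressors. The short sketch below, added here as an illustration, reproduces the VIF of wt by hand:
aux <- lm(wt ~ hp + cyl, data = mtcars)  # auxiliary regression of wt on the other regressors
1 / (1 - summary(aux)$r.squared)         # should equal the VIF of wt reported above (about 2.58)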
Autocorrelation
Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots are widely used in econometrics. Here, the acf and pacf commands are applied to the residuals to determine their autocorrelation structure.
In addition, we can also apply the Durbin-Watson and Breusch-Godfrey tests of serial correlation.
acf(error, lag.max = 6, type = "correlation")
pacf(error, lag.max = 6)
lmtest::dwtest(ols1) # Durbin-Watson test for AR(1) only
##
## Durbin-Watson test
##
## data: ols1
## DW = 1.6441, p-value = 0.1002
## alternative hypothesis: true autocorrelation is greater than 0
lmtest::bgtest(ols1, order = 6)
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: ols1
## LM test = 5.0855, df = 6, p-value = 0.5329
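As a further check on serial correlation, an addition here rather than part of the original post, the Ljung-Box test from base R examines the joint significance of the first few residual autocorrelations:
Box.test(error, lag = 6, type = "Ljung-Box")  # H0: no autocorrelation up to lag 6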