Goodness of fit in RStudio

After estimating an Ordinary Least Squares (OLS) model in RStudio, we must check that the model is a good fit. The model must satisfy the assumptions of OLS, including homoscedasticity (that is, no heteroscedasticity), no multicollinearity, normality of residuals and no autocorrelation. We can apply several tests and generate plots to check the goodness of fit of a model.

Here, we will estimate the same OLS model as we did in the earlier post on OLS estimation. After this, we will illustrate various tests and procedures essential for determining the fit of any OLS model. Our analysis will focus on the following:

  • Checking the normality assumption of residuals using plots.
  • Reporting the standard errors, p-values, R-squared, adjusted R-squared, confidence intervals and ANOVA results.
  • Testing for heteroscedasticity using several tests and plots.
  • Checking for the problem of multicollinearity.
  • Performing tests of autocorrelation and generating autocorrelation (ACF) and partial autocorrelation (PACF) plots.


Goodness of fit: code and results

We will use the “mtcars” data available in the “datasets” package to estimate an OLS model. The first step involves loading the libraries and the dataset, followed by the analysis.
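
The earlier post covers this setup in detail. As a minimal sketch (the object names IV, ols and mpg here are assumptions inferred from the output further below), it might look like this:

library(stargazer) # regression and summary tables
library(lmtest)    # Breusch-Pagan, Durbin-Watson and Breusch-Godfrey tests
library(skedastic) # heteroscedasticity tests
library(car)       # Variance Inflation Factors

data("mtcars", package = "datasets")
ols <- mtcars                                 # assumed name; used later in vif()
IV  <- as.matrix(ols[, c("hp", "cyl", "wt")]) # assumed regressor matrix
mpg <- ols$mpg                                # dependent variable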

Here we will illustrate the tests for heteroscedasticity, multicollinearity and autocorrelation that can be conducted easily using R or RStudio.

OLS Estimation, Confidence Intervals, ANOVA and Normality of Residuals

The “lm” function is used to estimate the OLS model. For goodness of fit, the function automatically estimates standard errors, p-values, R-squared and adjusted R-squared.

To supplement this, we can use “confint” to estimate the confidence intervals of the coefficients and “anova” to perform an F-test.

Fitted values and residuals/error terms can be obtained using the “fitted” and “resid” commands. To check the distribution of residuals graphically, we can plot a histogram using “hist”.

stargazer(ols1 <- lm(mpg ~ IV), type = "text", style = "all", title = "OLS results")
## 
## OLS results
## ======================================================
##                            Dependent variable:        
##                     ----------------------------------
##                                    mpg                
## ------------------------------------------------------
## IVhp                              -0.018              
##                                  (0.012)              
##                                 t = -1.519            
##                                 p = 0.141             
## IVcyl                            -0.942*              
##                                  (0.551)              
##                                 t = -1.709            
##                                 p = 0.099             
## IVwt                            -3.167***             
##                                  (0.741)              
##                                 t = -4.276            
##                                 p = 0.0002            
## Constant                        38.752***             
##                                  (1.787)              
##                                 t = 21.687            
##                                 p = 0.000             
## ------------------------------------------------------
## Observations                        32                
## R2                                0.843               
## Adjusted R2                       0.826               
## Residual Std. Error          2.512 (df = 28)          
## F Statistic         50.171*** (df = 3; 28) (p = 0.000)
## ======================================================
## Note:                      *p<0.1; **p<0.05; ***p<0.01
conf <- confint(ols1, level = 0.95)
stargazer(conf, type = "text", title = "Confidence Intervals")
## 
## Confidence Intervals
## =========================
##             2.5 %  97.5 %
## -------------------------
## (Intercept) 35.092 42.412
## IVhp        -0.042 0.006 
## IVcyl       -2.070 0.187 
## IVwt        -4.684 -1.650
## -------------------------
anova <- anova(ols1)
stargazer(anova, type = "text", title = "Anova")
## 
## Anova
## ============================================
## Statistic N  Mean   St. Dev.   Min     Max  
## --------------------------------------------
## Df        2 15.500   17.678     3      28   
## Sum Sq    2 563.024 546.456  176.621 949.427
## Mean Sq   2 161.392 219.322   6.308  316.476
## F value   1 50.171           50.171  50.171 
## Pr(> F)   1  0.000              0       0   
## --------------------------------------------
ycap <- fitted(ols1)
error <- resid(ols1)
hist(error, breaks = 5)
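
The histogram gives a quick visual check of normality. As an optional supplement (a sketch using base R graphics, not part of the original output), a Q-Q plot of the residuals serves the same purpose:

qqnorm(error) # plot residual quantiles against normal quantiles
qqline(error) # reference line; large deviations suggest non-normal residuals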

Heteroscedasticity

We will use the “skedastic” package to apply various tests of heteroscedasticity.

We can also use the “plot” command to get some insights into the behaviour of the residuals by plotting them against the fitted values.

An interesting application of the “plot” command is to use it on the model object ols1 that we estimated: plot(ols1). This automatically generates 4 different types of plots that give insights into normality as well as heteroscedasticity.

plot(ycap, error) # Residuals vs fitted values plot

plot(ols1) # 4 different types of plot
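
By default, plot(ols1) draws the four diagnostic plots one at a time. An optional tweak (a sketch using base R graphics) arranges them in a single 2-by-2 grid:

par(mfrow = c(2, 2)) # split the plotting window into a 2x2 grid
plot(ols1)           # all four diagnostic plots appear together
par(mfrow = c(1, 1)) # restore the default single-plot layout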

lmtest::bptest(ols1)
## 
##  studentized Breusch-Pagan test
## 
## data:  ols1
## BP = 2.9351, df = 3, p-value = 0.4017
skedastic::white_lm(ols1) # White's Heteroscedasticity Test
## # A tibble: 1 × 5
##   statistic p.value parameter method       alternative
##       <dbl>   <dbl>     <dbl> <chr>        <chr>      
## 1      5.83   0.442         6 White's Test greater
skedastic::breusch_pagan(ols1) # Breusch-Pagan Test
## # A tibble: 1 × 5
##   statistic p.value parameter method                alternative
##       <dbl>   <dbl>     <dbl> <chr>                 <chr>      
## 1      2.94   0.402         3 Koenker (studentised) greater
skedastic::cook_weisberg(ols1) # Cook-Weisberg Test
## # A tibble: 1 × 5
##   statistic p.value parameter method alternative
##       <dbl>   <dbl>     <int> <chr>  <chr>      
## 1      3.48   0.324         3 mult   greater
skedastic::glejser(ols1) # Glejser Test
## # A tibble: 1 × 4
##   statistic p.value parameter alternative
##       <dbl>   <dbl>     <dbl> <chr>      
## 1      2.49   0.478         3 greater
skedastic::hetplot(ols1, horzvar = "fitted.values")

Multicollinearity

To check for multicollinearity in the model, we can look at the correlations using the “cor” command.

Using the “car” package, we will estimate the Variance Inflation Factors (VIF). As a rule of thumb, multicollinearity is a serious problem if VIF > 10.

stargazer(cor(IV), type = "text", title = "Correlations")
## 
## Correlations
## =====================
##      hp    cyl   wt  
## ---------------------
## hp    1   0.832 0.659
## cyl 0.832   1   0.782
## wt  0.659 0.782   1  
## ---------------------
vif1 <- vif(lm(mpg ~ hp + cyl + wt, data = ols))
stargazer(vif1, type = "text", title = "VIF")
## 
## VIF
## =================
## hp     cyl   wt  
## -----------------
## 3.258 4.757 2.580
## -----------------

Autocorrelation

Autocorrelation (ACF) and partial autocorrelation (PACF) plots are widely used in econometrics. Here, the “acf” and “pacf” commands are used on the residuals to determine their autocorrelation structure.

In addition to this, we can also apply the Durbin-Watson and Breusch-Godfrey tests of serial correlation.

acf(error, lag.max = 6, type = "correlation")

pacf(error, lag.max = 6)

lmtest::dwtest(ols1) # Durbin-Watson test for AR(1) only
## 
##  Durbin-Watson test
## 
## data:  ols1
## DW = 1.6441, p-value = 0.1002
## alternative hypothesis: true autocorrelation is greater than 0
lmtest::bgtest(ols1, order = 6)
## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  ols1
## LM test = 5.0855, df = 6, p-value = 0.5329
