After estimating Ordinary Least Squares or OLS in Rstudio, we must ensure that the model is a good fit. The model must satisfy the assumptions of OLS including heteroscedasticity, no multicollinearity, normality of residuals and no autocorrelation. We can apply several tests and generate plots to check for the goodness of fit of a model.

Here, we will estimate the same OLS model as we did in the earlier post on OLS estimation. After this, we will illustrate the estimation of various tests and procedures essential for determining the fit of any OLS model. Our analysis will focus on the following:

- Checking the normality assumption of residuals using plots.
- Reporting the Standard errors, P-values, R-square, Adjusted R-square, Confidence intervals and Anova results.
- Testing for heteroscedasticity using several tests and plots.
- Check for the problem of multicollinearity.
- Perform tests of autocorrelation and generate Autocorrelation (ACF) and Partial autocorrelation (PACF) plots.

*This website contains affiliate links. When you make a purchase through these links, we may earn a commission at no additional cost to you.*

## Goodness of fit: code and results

We will use the **“mtcars”** data available in

**“datasets” package** to estimate an OLS model. First step

involves loading the libraries and the dataset, followed by the

analysis.

Here we will illustrate the tests for **Heterocedasticity,Multicollinearity and Autocorrelation** that can be conducted

easily using R or Rstudio.

**OLS Estimation, Confidence Intervals, Anova and Normality ofResiduals**

The **“lm”** is used to estimate the OLS model. For

Goodness of fit, the function automatically estimates **StandardErrors, P-values, R-square and Adjusted R-square**.

To supplement this, we can use **confint** to estimate

the confidence intervals of the coefficients and **anova**

to do an F-test.

Fitted values and residuals/error terms can be obtained using**fitted** and **resid** commands. To check

the distribution of residuals graphically, we can plot a histogram using**hist**.

`stargazer(ols1 <- lm(mpg ~ IV), type = "text", style = "all", title = "OLS results")`

```
##
## OLS results
## ======================================================
## Dependent variable:
## ----------------------------------
## mpg
## ------------------------------------------------------
## IVhp -0.018
## (0.012)
## t = -1.519
## p = 0.141
## IVcyl -0.942*
## (0.551)
## t = -1.709
## p = 0.099
## IVwt -3.167***
## (0.741)
## t = -4.276
## p = 0.0002
## Constant 38.752***
## (1.787)
## t = 21.687
## p = 0.000
## ------------------------------------------------------
## Observations 32
## R2 0.843
## Adjusted R2 0.826
## Residual Std. Error 2.512 (df = 28)
## F Statistic 50.171*** (df = 3; 28) (p = 0.000)
## ======================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
```

```
conf <- confint(ols1, level = 0.95)
stargazer(conf, type = "text", title = "Confidence Intervals")
```

```
##
## Confidence Intervals
## =========================
## 2.5 % 97.5 %
## -------------------------
## (Intercept) 35.092 42.412
## IVhp -0.042 0.006
## IVcyl -2.070 0.187
## IVwt -4.684 -1.650
## -------------------------
```

```
anova <- anova(ols1)
stargazer(anova, type = "text", title = "Anova")
```

```
##
## Anova
## ============================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------
## Df 2 15.500 17.678 3 28
## Sum Sq 2 563.024 546.456 176.621 949.427
## Mean Sq 2 161.392 219.322 6.308 316.476
## F value 1 50.171 50.171 50.171
## Pr(> F) 1 0.000 0 0
## --------------------------------------------
```

```
ycap <- fitted(ols1)
error <- resid(ols1)
hist(error, breaks = 5)
```

**Heteroscedasticity**

We will use the **skedastic** package to apply various

tests of heteroscedasticity.

We can also use the **plot** command to get some

insights into the behaviour of residuals by plotting them with fitted

values.

An interesting application of the **plot** command is to

use it on the model object **ols1** that we estimated:**plot(ols1)**. This automatically generated 4 different

types of plots that give insights into normality as well as

heteroscedasticity.

`plot(ycap,error) # Residual vs fitted plot`

`plot(ols1) # 4 different types of plot`

`lmtest::bptest(ols1)`

```
##
## studentized Breusch-Pagan test
##
## data: ols1
## BP = 2.9351, df = 3, p-value = 0.4017
```

`skedastic::white_lm(ols1) # White's Heteroscedasticity Test`

```
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 5.83 0.442 6 White's Test greater
```

`skedastic::breusch_pagan(ols1) # Breusch-Pagan Test`

```
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 2.94 0.402 3 Koenker (studentised) greater
```

`skedastic::cook_weisberg(ols1) # Cook-Weisberg Test`

```
## # A tibble: 1 × 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <int> <chr> <chr>
## 1 3.48 0.324 3 mult greater
```

`skedastic::glejser(ols1) # Glejser Test`

```
## # A tibble: 1 × 4
## statistic p.value parameter alternative
## <dbl> <dbl> <dbl> <chr>
## 1 2.49 0.478 3 greater
```

`skedastic::hetplot(ols1, horzvar = "fitted.values")`

**Multicollinearity**

To check for multicollinearity in the model, we can look at the

correlations using the **cor** command.

Using the **car** package, we will estimate the**Variance Inflation Factors (VIF)**. As a reule of thumb,

multicollinearity is a serious problem if VIF > 10.

`stargazer(cor(IV), type = "text", title = "Correlations")`

```
##
## Correlations
## =====================
## hp cyl wt
## ---------------------
## hp 1 0.832 0.659
## cyl 0.832 1 0.782
## wt 0.659 0.782 1
## ---------------------
```

```
vif1 <- vif(lm(mpg ~ hp + cyl + wt, data = ols))
stargazer(vif1, type = "text", title = "VIF")
```

```
##
## VIF
## =================
## hp cyl wt
## -----------------
## 3.258 4.757 2.580
## -----------------
```

**Autocorrelation**

**Autocorrelation** and **PartialAutocorrelation** plots are widely used in Econometrics. Here,

**acf**and

**pacf**commands are used on the

residuals to determine their autocorrelation structure.

In addition to this, we can also apply the**Durbin-Watson** and **Breusch-Godfrey** test

of serial correlation.

`acf(error, lag.max = 6, type = "correlation")`

`pacf(error, lag.max = 6)`

`lmtest::dwtest(ols1) # Durbin-Watson test for AR(1) only`

```
##
## Durbin-Watson test
##
## data: ols1
## DW = 1.6441, p-value = 0.1002
## alternative hypothesis: true autocorrelation is greater than 0
```

`lmtest::bgtest(ols1, order = 6)`

```
##
## Breusch-Godfrey test for serial correlation of order up to 6
##
## data: ols1
## LM test = 5.0855, df = 6, p-value = 0.5329
```