The method of Ordinary Least Squares (OLS) is one of the basic types of analysis in Econometrics. It is widely used in other fields of study as well. R programming language can be used to apply OLS. We are going to implement OLS in Rstudio because it provides an easy-to-use environment for implementing R programming code.

In this post, we will discuss how OLS can be easily implemented in Rstudio. We are going to report the results of the model along with the code. This will enable us to understand the working of R code and the role of packages in R or Rstudio. To learn more about the theory of OLS and its interpretation, go to Ordinary Least Squares Estimation or Interpretation of Coefficients: OLS.

## OLS in Rstudio: packages

The “** install.packages**” command can be used to install any packages that are needed. There are thousands of open source packages available to install in R. However, there are certain packages that are useful for Econometric analysis. Some of these are:

: tidyverse is a collection of packages that are very useful for visualisation, cleaning and manipulating data.*tidyverse*: this package can be used to present the R output in several table formats widely used in research journals or simply to make the output more organised.*stargazer*: a package to implement heteroscedasticity tests.*skedastic*: this package also provides various functions for Econometric analysis.*car*: this package can be used to load datasets of various formats such as CSV, Excel, Stata files etc.*haven*: as the name suggests, it contains a number of datasets that are readily available to load and be used for analysis.*datasets*: it allows us to use a pipe “%>%” which can be used to pass values forward into functions.*magrittr*: we will use this package to generate the HTML report below.*rmarkdown*

In this post, we are going to use some of these functions to estimate an OLS model in Rstudio. Let us estimate our OLS model.

## Rstudio: Code and results

First, we have to load all the **libraries** and the
dataset that we will be using for illustration. Here, we will use the
**“mtcars”** data available in the
**“datasets”** package.

Next, we will extract the vectors and matrices that we are going to use for OLS estimation. We can name the vectors or columns of the matrices as needed:

```
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(magrittr)
library(stargazer)
library(haven)
library(skedastic)
library(car)
library(datasets)
ols <- datasets::mtcars
mpg <- cbind(ols$mpg)
hp <- cbind(ols$hp)
IV <- cbind(ols$hp, ols$cyl, ols$wt)
colnames(IV, do.NULL = FALSE)
```

`## [1] "col1" "col2" "col3"`

`colnames(IV) <- c("hp", "cyl", "wt")`

**Basic Plots**

Let us look at some basic plots of the data:

- A scatterplot of
**mileage per gallon (mpg)**and**horsepower (hp)**showing the a negative relationship between the two. - A histogram to see the distribution of
**mileage per gallon (mpg)**.

`plot(mpg ~ hp, data = ols)`

`hist(mpg)`

**Correlations**

We can easily estimate the correlation coefficients using the
**“cor”** command. Using the Stargazer package, we can make
the table look neat and easy to read as seen in the last table.

`cor(mpg, IV)`

```
## hp cyl wt
## [1,] -0.7761684 -0.852162 -0.8676594
```

`cor(IV)`

```
## hp cyl wt
## hp 1.0000000 0.8324475 0.6587479
## cyl 0.8324475 1.0000000 0.7824958
## wt 0.6587479 0.7824958 1.0000000
```

`stargazer(cor(IV), type = "text", title = "Correlations of IV (Independent variables)")`

```
##
## Correlations of IV (Independent variables)
## =====================
## hp cyl wt
## ---------------------
## hp 1 0.832 0.659
## cyl 0.832 1 0.782
## wt 0.659 0.782 1
## ---------------------
```

**OLS Estimation**

The **“lm”** function can be used to specify the OLS
model. We specify the dependent variable **“mpg”** followed
by the independent variables. Here, we are using a matrix of independent
variable specified as **“~ IV”**. We can also specifiy each
independent variable separately **“~ hp + cyl + wt”**.

We are again using Stargazer to customize the output. In the first
output, we are using the table format accepted in the American Economic
Review using the command **“style =”aer”**. There are many
other format available as well.

Without Stargazer, the output is reported as a table shown by the
command **“summary(ols1)”**.

Moreover, it’s easy to estimate confidence intervals and Anova with
simple commands: **confint** and
**anova**.

`stargazer(ols1 <- lm(mpg ~ IV), type = "text", style = "aer", title = "OLS results AER")`

```
##
## OLS results AER
## ==========================================================
## mpg
## ----------------------------------------------------------
## IVhp -0.018
## (0.012)
##
## IVcyl -0.942*
## (0.551)
##
## IVwt -3.167***
## (0.741)
##
## Constant 38.752***
## (1.787)
##
## Observations 32
## R2 0.843
## Adjusted R2 0.826
## Residual Std. Error 2.512 (df = 28)
## F Statistic 50.171*** (df = 3; 28)
## ----------------------------------------------------------
## Notes: ***Significant at the 1 percent level.
## **Significant at the 5 percent level.
## *Significant at the 10 percent level.
```

`stargazer(ols1 <- lm(mpg ~ IV), type = "text", style = "all", title = "OLS results")`

```
##
## OLS results
## ======================================================
## Dependent variable:
## ----------------------------------
## mpg
## ------------------------------------------------------
## IVhp -0.018
## (0.012)
## t = -1.519
## p = 0.141
## IVcyl -0.942*
## (0.551)
## t = -1.709
## p = 0.099
## IVwt -3.167***
## (0.741)
## t = -4.276
## p = 0.0002
## Constant 38.752***
## (1.787)
## t = 21.687
## p = 0.000
## ------------------------------------------------------
## Observations 32
## R2 0.843
## Adjusted R2 0.826
## Residual Std. Error 2.512 (df = 28)
## F Statistic 50.171*** (df = 3; 28) (p = 0.000)
## ======================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
```

`summary(ols1)`

```
##
## Call:
## lm(formula = mpg ~ IV)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9290 -1.5598 -0.5311 1.1850 5.8986
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.75179 1.78686 21.687 < 2e-16 ***
## IVhp -0.01804 0.01188 -1.519 0.140015
## IVcyl -0.94162 0.55092 -1.709 0.098480 .
## IVwt -3.16697 0.74058 -4.276 0.000199 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
## F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
```

```
conf <- confint(ols1, level = 0.95)
stargazer(conf, type = "text", title = "Confidence Intervals")
```

```
##
## Confidence Intervals
## =========================
## 2.5 % 97.5 %
## -------------------------
## (Intercept) 35.092 42.412
## IVhp -0.042 0.006
## IVcyl -2.070 0.187
## IVwt -4.684 -1.650
## -------------------------
```

```
anova <- anova(ols1)
stargazer(anova, type = "text", title = "Anova")
```

```
##
## Anova
## ============================================
## Statistic N Mean St. Dev. Min Max
## --------------------------------------------
## Df 2 15.500 17.678 3 28
## Sum Sq 2 563.024 546.456 176.621 949.427
## Mean Sq 2 161.392 219.322 6.308 316.476
## F value 1 50.171 50.171 50.171
## Pr(> F) 1 0.000 0 0
## --------------------------------------------
```

**Residuals and fitted values**

To estimate the fitted values and the residuals, we can use the
**“fitted”** and **“resid”** commands.

Here, we will also plot a histogram and boxplot to see how the error term or residuals are distributed. From the graphs, we can see that the residuals look approximately normal but we have a few outliers to deal with.

```
ycap <- fitted(ols1)
summary(ycap)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.14 15.97 20.12 20.09 24.52 28.93
```

```
error <- resid(ols1)
summary(error)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.9290 -1.5598 -0.5311 0.0000 1.1850 5.8986
```

`hist(error, breaks = 5)`

`boxplot(error)`