Ordinary Least Squares (OLS) is one of the most widely used estimation methods in econometrics. It is a linear regression technique that estimates the coefficients by minimizing the sum of squared residuals, that is, the squared differences between the actual and predicted values of the dependent variable. In this post, we walk through the practical process of estimating the coefficients using Ordinary Least Squares.
- Interpretation of Coefficients: OLS
- OLS in R
- Goodness of fit in R
- OLS and Robust Standard Errors
- Information Criteria and Model Selection
The Ordinary Least Squares Equation
Suppose we need to estimate the quantity demanded of a commodity ‘y’. According to economic theory, the demand for a commodity depends on several factors, such as its own price, the prices of other commodities and income. For illustration, we will use three variables: the price of y (denoted by ‘p’), the price of a substitute good (‘x’) and income (‘i’). Here, the quantity demanded of y is the dependent variable, while p, x and i are the independent (explanatory) variables. Each observation of quantity demanded (y) can be written in equation form as follows:
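A sketch of that equation, assuming the conventional notation in which $\beta_0$ is the constant, $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficients on p, x and i, $\varepsilon_t$ is the error term, and $t$ indexes the $n$ observations (all of these symbols are assumed here for illustration):

$$
y_t = \beta_0 + \beta_1 p_t + \beta_2 x_t + \beta_3 i_t + \varepsilon_t, \qquad t = 1, \dots, n
$$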
Each observation in the data pairs one value of ‘y’ with one combination of values of the independent variables. For instance, with 50 observations we have 50 values of ‘y’, each corresponding to one of 50 combinations of values of the independent variables.
OLS: The Matrix Form and Normal Equation Estimation
Generally, OLS estimation is carried out using matrices. The above equation can be expressed in the form of matrices as follows:
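One way to write the matrix form, assuming $n$ observations, coefficients $\beta_0$ (constant), $\beta_1$, $\beta_2$, $\beta_3$ (on p, x and i) and error terms $\varepsilon_t$; the symbol names are an assumption made for illustration:

$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{y}
=
\underbrace{\begin{bmatrix}
1 & p_1 & x_1 & i_1 \\
1 & p_2 & x_2 & i_2 \\
\vdots & \vdots & \vdots & \vdots \\
1 & p_n & x_n & i_n
\end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta}
+
\underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{\varepsilon}
$$

Here $y$ and $\varepsilon$ are $n \times 1$, $X$ is $n \times 4$ and $\beta$ is $4 \times 1$.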
The first column of X consists entirely of 1s. It corresponds to the constant (intercept), which has no independent variable attached to it, and it is required in order to estimate the constant.
Hence, the coefficients can be estimated from X and y using matrix multiplication, transposition and inversion. Any econometric software package can carry out these calculations.
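The calculation described above is the normal-equation solution, $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$. Below is a minimal sketch in Python using only the standard library. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the eight data rows are made up for illustration and are not the sample used in this post. The y values are generated without noise from known coefficients so that the estimate can be checked against them, and the sketch solves the system $X^{\top}X\,\hat{\beta} = X^{\top}y$ by elimination instead of forming the inverse explicitly, which yields the same $\hat{\beta}$.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear regressors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def ols(y, *regressors):
    """Estimate [constant, slopes...] from the normal equations X'X b = X'y."""
    X = [[1.0, *row] for row in zip(*regressors)]  # first column of 1s for the constant
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(u * v for u, v in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Made-up illustration data (NOT the post's sample).
p = [40, 45, 50, 40, 45, 50, 42, 48]            # price of y
x = [55, 55, 60, 60, 65, 50, 58, 52]            # price of the substitute good
i = [700, 800, 700, 900, 750, 850, 780, 820]    # income
# Noise-free y built from assumed coefficients 100, -2, 1.5 and 0.25.
y = [100 - 2 * pt + 1.5 * xt + 0.25 * it for pt, xt, it in zip(p, x, i)]

beta_hat = ols(y, p, x, i)
print(beta_hat)  # approximately recovers the assumed coefficients
```

In applied work a library routine such as NumPy's `numpy.linalg.lstsq` is usually a safer choice than forming $X^{\top}X$ by hand, because it is less sensitive to rounding when the regressors are on very different scales.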
Estimation with real data
To demonstrate, we will consider a small hypothetical data sample of 10 observations for the variables used above.
|Observations||Quantity demanded of commodity ‘y’||Price of ‘y’ (p)||Price of substitute good ‘x’||Income (i)|
The quantity demanded ‘y’ can be predicted from any given values of independent variables and their coefficients. This is accomplished by substituting values in the following equation:
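A sketch of the prediction equation, assuming $\hat{\beta}_0$ denotes the estimated constant and $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ the estimated coefficients on p, x and i (the hat notation is an assumption made for illustration):

$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1\, p + \hat{\beta}_2\, x + \hat{\beta}_3\, i
$$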
From the given data, we can construct the matrices to estimate the coefficients with the normal equation method. The matrices for the above equation can be constructed as follows:
On solving, the values of coefficients based on the given data were:
Making predictions after ordinary least squares
Suppose we have to predict the quantity demanded of commodity ‘y’. Let us assume that the values of the independent variables are p = 42, x = 57 and i = 780. We can forecast ‘y’ using the OLS equation:
This means that the predicted quantity demanded of ‘y’ is 241 when its price is 42, the price of the substitute ‘x’ is 57 and income is 780.
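The substitution step can be sketched in Python as follows. The function name `predict` and the coefficient values are placeholders chosen for illustration; they are not the estimates obtained from the data in this post, so the result does not reproduce the figure of 241.

```python
def predict(coefs, p, x, i):
    """Plug values of p, x and i into the fitted linear equation."""
    b0, b1, b2, b3 = coefs  # constant, then coefficients on p, x and i
    return b0 + b1 * p + b2 * x + b3 * i

# Placeholder coefficients (assumed for illustration only).
example_coefs = (100.0, -2.0, 1.5, 0.25)

y_hat = predict(example_coefs, p=42, x=57, i=780)
print(y_hat)
```

Replacing `example_coefs` with the coefficients estimated from a real sample gives the forecast for that sample.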
Estimating the residuals or error terms
The residuals, which serve as estimates of the error terms, are obtained by subtracting the predicted value of y from the actual value of y for every observation and can be expressed as:
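In symbols, assuming $y_t$ is the actual value, $\hat{y}_t$ the predicted value and $\hat{e}_t$ the residual for observation $t$ (notation assumed for illustration):

$$
\hat{e}_t = y_t - \hat{y}_t, \qquad t = 1, \dots, n
$$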
For each observation, the predicted value is obtained by substituting that observation's values of the independent variables into the estimated equation.
|Observations||Quantity demanded of commodity ‘y’||Price of ‘y’ (p)||Price of substitute good ‘x’||Income (i)||Predicted ‘y’ (using coefficients)||Residuals or error (y – predicted y)|
Note: predicted ‘y’ has been rounded to the nearest integer here. The quantity demanded of a commodity is generally a whole number because, in most cases, consumers cannot buy a fraction of a unit (for example, 0.3 of a book).
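The residual calculation, including the rounding of predicted values, can be sketched in Python as below. The three observations and the coefficients are made up for illustration and are not the data or estimates from this post; the names `predict_rounded` and `rows` are likewise hypothetical.

```python
def predict_rounded(coefs, p, x, i):
    """Predicted y from the fitted equation, rounded to the nearest integer."""
    b0, b1, b2, b3 = coefs
    return round(b0 + b1 * p + b2 * x + b3 * i)

# Placeholder coefficients and observations (assumed for illustration only).
coefs = (100.0, -2.0, 1.5, 0.25)
rows = [
    # (actual y, p, x, i)
    (280, 40, 56, 705),
    (300, 45, 60, 795),
    (262, 50, 52, 761),
]

predicted = [predict_rounded(coefs, p, x, i) for _, p, x, i in rows]
# Residual = actual y minus predicted y, observation by observation.
resid = [actual - fitted for (actual, _, _, _), fitted in zip(rows, predicted)]

for (actual, p, x, i), fitted, e in zip(rows, predicted, resid):
    print(actual, p, x, i, fitted, e)
```

Each printed row mirrors the columns of the table above: the actual y, the three independent variables, the rounded prediction and the residual.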