Hand Coding Instumental Variables
In a previous post we discussed the linear model and how to write a function that performs a linear regression. In this post we will use that linear model function to perform a [Two-Stage Least Squares estimation]. This estimation allows us to […]
Recall that we built the follow linear model function.
ols <- function (y, X, intercept=TRUE) {
if (intercept) X <- cbind(1, X)
solve(t(X)%*%X) %*% t(X)%*%y # solve for beta
}
We used the following data.
data(iris)
x1 <- iris$Petal.Width
x2 <- iris$Sepal.Length
y <- iris$Petal.Length
This allowed us to estimate a linear model.
X <- cbind(x1, x2)
ols(y = y, X = X)
## [,1]
## -1.5071384
## x1 1.7481029
## x2 0.5422556
Which includes the intercept, since the default value it TRUE
(see function definition above),
we could estimate it without an intercept using.
ols(y = y, X = X, intercept = FALSE)
## [,1]
## x1 1.9891655
## x2 0.2373159
Having revisited the above, we can continue with instumental variables.
We will replicate an example from the AER
(Applied Econometric Regressions) package.
library(AER)
data("CigarettesSW")
rprice <- with(CigarettesSW, price/cpi)
tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
packs <- CigarettesSW$packs
We first need to obtain our first stage estimate (putting the whole function between parentheses allows us to both write it to the object s1
and print it).
( s1 <- ols(y = rprice, X = tdiff) )
## [,1]
## 91.103739
## X 4.163002
We can now obtain the predicted (fitted) values
Xhat <- s1[1]*tdiff + s1[2]
Using these fitted values, we can finally estimate our second stage.
ols(y = packs, X = Xhat)
## [,1]
## 126.89145456
## X -0.04658554
Now compare this to the results using ivreg()
fucntion from the AER
package.
ivreg(packs ~ rprice | tdiff)
##
## Call:
## ivreg(formula = packs ~ rprice | tdiff)
##
## Coefficients:
## (Intercept) rprice
## 219.576 -1.019
When we compare this estimate to the estimate from the linear model, we find that the effect of the price is significantly overestimated there.
lm(packs ~ rprice)
##
## Call:
## lm(formula = packs ~ rprice)
##
## Coefficients:
## (Intercept) rprice
## 222.209 -1.044
We can also do this using R
’s built in lm
function.
# first stage
s1 <- lm(rprice ~ tdiff)
# estimate second stage using fitted values (Xhat)
lm(packs ~ s1$fitted.values)
##
## Call:
## lm(formula = packs ~ s1$fitted.values)
##
## Coefficients:
## (Intercept) s1$fitted.values
## 219.576 -1.019
As an intermediate form, we can manually calculate Xhat
(fitted.values
) and estimate using that.
# manually obtain fitted values
Xhat <- s1$coefficients[2]*tdiff + s1$coefficients[1]
# estimate second stage using Xhat
lm(packs ~ Xhat)
##
## Call:
## lm(formula = packs ~ Xhat)
##
## Coefficients:
## (Intercept) Xhat
## 219.576 -1.019
Note that if you estimate a TSLS using the lm
function,
that you can only use the coefficients,
the error terms will be wrong.