The Linear Model from Scratch in R
When it comes to econometrics, the main takeaways from workshops tend to be about the syntax of yet another computer program.
The Linear Model
Using the Ordinary Least Squares (OLS) approach, we start with the following equation for a univariate regression.
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]
This can be solved for the following (hat denotes the estimator, bar denotes the mean):
\[\hat{\beta}_1 = \frac{ \sum^n_{i=1} (x_i - \bar{x})(y_i - \bar{y}) }{ \sum^n_{i=1} (x_i - \bar{x})^2 }\]
We start by loading a basic data set.
Inspect the data set.
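The data set used here is not named, so as a stand-in we assume the built-in mtcars data set that ships with R. A minimal sketch of loading and inspecting it could look like this:

```r
# Assumption: mtcars (from the datasets package) stands in for the
# data set used in this post, which is not named.
data(mtcars)

str(mtcars)    # column types and the first few values
head(mtcars)   # first six rows
summary(mtcars[, c("mpg", "wt", "hp")])  # the columns used in later sketches
```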
Assign our variables to objects (in the global environment).
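As an illustration, and assuming the mtcars stand-in data set, we could take fuel economy (mpg) as the dependent variable and weight (wt) as the independent variable. The object names y and x are our own choice:

```r
# Assumption: mtcars stands in for the post's data set;
# mpg is treated as y and wt as x.
y <- mtcars$mpg
x <- mtcars$wt

length(y)  # number of observations
length(x)  # must match length(y)
```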
We can now estimate the slope parameter:
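The slope formula translates almost literally into vectorised R. The sketch below assumes the mtcars stand-in data (mpg as y, wt as x); beta_1 is a name we introduce for illustration:

```r
# Assumption: mtcars stands in for the post's data set.
y <- mtcars$mpg
x <- mtcars$wt

# Sum of cross-deviations divided by the sum of squared deviations of x.
beta_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta_1
```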
Using the slope parameter we can now compute the intercept.
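The intercept follows from the usual OLS identity \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\), which says that the fitted line passes through the point of means. A sketch under the same assumptions (mtcars as stand-in data, our own object names):

```r
# Assumption: mtcars stands in for the post's data set.
y <- mtcars$mpg
x <- mtcars$wt

# Slope, as in the formula above.
beta_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y)).
beta_0 <- mean(y) - beta_1 * mean(x)
c(intercept = beta_0, slope = beta_1)
```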
Let's check this using the built-in command.
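The built-in command for this is presumably lm(). A sketch of the comparison, again assuming mtcars as stand-in data, recomputes the hand-coded estimates so that it runs on its own:

```r
# Assumption: mtcars stands in for the post's data set.
y <- mtcars$mpg
x <- mtcars$wt

# Hand-coded estimates.
beta_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta_0 <- mean(y) - beta_1 * mean(x)

# Built-in estimates; lm() adds an intercept automatically.
fit <- lm(y ~ x)
coef(fit)

# TRUE if both approaches agree up to numerical tolerance.
all.equal(unname(coef(fit)), c(beta_0, beta_1))
```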
The matrix model
In matrix form we can specify our general equation as:
\[y = X \beta + \epsilon\]
From which we can derive our estimator:
\[\hat{\beta} = (X^T X)^{-1} X^T y\]
The matrix estimation
Use the built-in command.
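Since the intercept is only introduced later, the built-in fit at this point is presumably one without an intercept. In an R formula, the intercept can be removed with - 1 (or + 0). A sketch assuming the mtcars stand-in data:

```r
# Assumption: mtcars stands in for the post's data set.
y <- mtcars$mpg
x <- mtcars$wt

# "- 1" removes the intercept that lm() would otherwise add.
fit_no_intercept <- lm(y ~ x - 1)
coef(fit_no_intercept)
```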
Now we estimate our beta ourselves; the function used to invert a matrix is called solve().
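A minimal sketch of the hand-coded estimator without an intercept, assuming the mtcars stand-in data. Here t() transposes and %*% is matrix multiplication; X and beta are names chosen for illustration:

```r
# Assumption: mtcars stands in for the post's data set.
y <- mtcars$mpg
X <- as.matrix(mtcars$wt)   # n x 1 matrix, no intercept column yet

# beta = (X'X)^{-1} X'y
beta <- solve(t(X) %*% X) %*% t(X) %*% y
beta

# Should match the built-in fit without an intercept.
coef(lm(y ~ X - 1))
```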
Now let's estimate with an intercept
To hand-code this, we need to add a vector of ones (1s). Note that the single 1 that we are binding to the vector X will be repeated until it is the same length.
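The binding step can be sketched with cbind(), which recycles the scalar 1 to the length of the other column. This assumes the mtcars stand-in data; XI is the name used in this post for the matrix with an intercept column:

```r
# Assumption: mtcars stands in for the post's data set.
x <- mtcars$wt

# cbind() recycles the single 1 into a full column of ones.
XI <- cbind(1, x)

head(XI)  # first column is all ones, second column is x
dim(XI)   # n rows, 2 columns
```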
We also have to discuss the above solve() function. ^-1 is not correct syntax for a matrix inversion. In the above case it would still work correctly because our X matrix is in fact a vector. If we pre-multiply this vector with the transpose of itself, we obtain a scalar. However, for matrices wider than one column this is not the case. This ^-1 will invert every individual number in the matrix, rather than the matrix as a whole.
We want to obtain the inverse of the matrix, because this will allow us to pre-multiply on both sides, eliminating XI on the right-hand side (RHS). We therefore use a different tool, the solve() function from the base package. For numeric matrices, solve() relies on the LAPACK routine DGESV, which uses an LU decomposition with partial pivoting, to compute the inverse.
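The difference between the element-wise ^-1 and a true matrix inverse is easy to see on a small example. The sketch below assumes the mtcars stand-in data; A is a name we introduce for the 2 x 2 cross-product matrix:

```r
# Assumption: mtcars stands in for the post's data set.
x  <- mtcars$wt
XI <- cbind(1, x)
A  <- t(XI) %*% XI       # 2 x 2 matrix

A^-1                     # reciprocal of every element, NOT the inverse
solve(A)                 # the actual matrix inverse

A %*% solve(A)           # identity matrix, up to rounding error
A %*% (A^-1)             # not the identity matrix
```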
Now we can use this matrix to estimate a model with an intercept.
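With the intercept column in place, the estimator is the same one-liner as before. A sketch assuming the mtcars stand-in data; beta_I is a name chosen for illustration:

```r
# Assumption: mtcars stands in for the post's data set.
y  <- mtcars$mpg
x  <- mtcars$wt
XI <- cbind(1, x)        # column of ones plus x

# beta = (XI'XI)^{-1} XI'y; first row is the intercept, second the slope.
beta_I <- solve(t(XI) %*% XI) %*% t(XI) %*% y
beta_I
```

As a side note, solve(crossprod(XI), crossprod(XI, y)) should give the same numbers without forming the inverse explicitly.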
Note that this gives the same estimates as the lm() function, although lm() does not invert the matrix explicitly: it solves the least-squares problem through a QR decomposition.
We can suppress the automatic intercept and include our XI variable, and we will obtain the same results.
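A sketch of that check, assuming the mtcars stand-in data. Passing the matrix XI to lm() with the automatic intercept removed should reproduce the coefficients of the ordinary fit; only the coefficient labels differ:

```r
# Assumption: mtcars stands in for the post's data set.
y  <- mtcars$mpg
x  <- mtcars$wt
XI <- cbind(1, x)

# "- 1" drops lm()'s own intercept; the ones column in XI takes its place.
fit_XI <- lm(y ~ XI - 1)
coef(fit_XI)

# Same values as the default fit with an automatic intercept.
coef(lm(y ~ x))
```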
We have now constructed a univariate model. From a programmatic point of view, however, the hurdles of multivariate modelling have already been overcome by estimating a model with an intercept (making X a matrix).
It is therefore very easy to use the same method in a case with two independent variables.
We start by binding the two independent variables together (with a vector of 1s, since we want an intercept).
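As a sketch, and assuming the mtcars stand-in data with weight (wt) and horsepower (hp) as the two independent variables, the binding step could look as follows. The names x1, x2 and X2 are our own:

```r
# Assumption: mtcars stands in for the post's data set;
# wt and hp are the two independent variables.
x1 <- mtcars$wt
x2 <- mtcars$hp

# Column of ones (recycled from the scalar 1) plus both regressors.
X2 <- cbind(1, x1, x2)
head(X2)
dim(X2)   # n rows, 3 columns
```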
Now we estimate our model.
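The estimator itself does not change at all. A sketch under the same assumptions (mtcars as stand-in data, mpg as y, wt and hp as regressors), with a comparison against lm():

```r
# Assumption: mtcars stands in for the post's data set.
y  <- mtcars$mpg
x1 <- mtcars$wt
x2 <- mtcars$hp
X2 <- cbind(1, x1, x2)

# Same formula as before: beta = (X'X)^{-1} X'y
beta_2 <- solve(t(X2) %*% X2) %*% t(X2) %*% y
beta_2

# Built-in equivalent for comparison.
coef(lm(y ~ x1 + x2))
```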
And that’s all! The leap from univariate to multivariate modelling was truly very small.
EDIT: in tomorrow’s post we use the method we developed here to create an easy to use function.