Wednesday, 28 August 2013

Basic Linear Regression - Part 2


Have you had your coffee yet? Okay, let's continue.

2. Goodness-of-fit

To measure how well our linear regression model performs, we compute the R-squared statistic, \( R^2 \):

\[ R^2 = \frac{\hat{V}\{\hat{y}\}} {\hat{V}\{y\}} \]

where \( \hat{V}\{\cdot\} \) denotes the sample variance:

\[ R^2=\frac{1/(N-1) \sum_{i=1}^{N}(\hat{y_i}-\bar{y})^2}{1/(N-1) \sum_{i=1}^{N}(y_i-\bar{y})^2} \]

Define:
\( TSS \) = Total Sum of Squares = \( \sum_{i=1}^{N}(y_i - \bar{y})^2 \)
\( SSR \) = Regression Sum of Squares = \( \sum_{i=1}^{N}(\hat{y_i} - \bar{y})^2 \)
\( SSE \) = Error Sum of Squares = \( \sum_{i=1}^{N}(y_i - \hat{y_i})^2 \)

For a regression with an intercept, \( TSS = SSR + SSE \), so:

\[ R^2 = \frac{SSR}{TSS} \]
\[ R^2 = 1 - \frac{SSE}{TSS} \]
\[ R^2 = 1 - \frac{\hat{V}\{e\}}{\hat{V}\{y\}} = 1 - \frac{\frac{1}{N-1} \sum_{i=1}^{N} e_{i}^{2}}{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2} \]
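As a quick numerical sanity check, the decomposition \( TSS = SSR + SSE \) can be verified with R's built-in lm; a minimal sketch on made-up toy data (the names toy_x and toy_y are arbitrary):

set.seed(1)
toy_x = 1:20
toy_y = 3 + 2 * toy_x + rnorm(20)  # a noisy linear relationship
fit = lm(toy_y ~ toy_x)
TSS = sum((toy_y - mean(toy_y))^2)
SSR = sum((fitted(fit) - mean(toy_y))^2)
SSE = sum((toy_y - fitted(fit))^2)
all.equal(TSS, SSR + SSE)  # TRUE for a regression with an intercept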

The \( R^2 \) measure has a drawback: it never decreases when the number of regressors is increased, even if an added regressor has no explanatory power in the regression. To compensate, the adjusted \( R^2 \), also written \( \bar{R}^2 \), is used:

\[ \bar{R}^2 = 1 - \frac{\frac{1}{N-K} \sum_{i=1}^{N} e_{i}^{2}}{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2} \]
where \( K \) here counts the estimated coefficients, including the intercept.

Now, let's modify the regression and its summary functions by incorporating the \( R^2 \) and \( \bar{R}^2 \) measures:


lr <- function(X, y, method = "OLS", intercept = T) {
    X = as.matrix(X)
    K = ncol(X)
    N = nrow(X)
    beta = matrix()
    result = list()


    if (intercept == T) {
        # reset the length of vector beta
        K = K + 1
        X = cbind(1, X)
    }

    beta = solve(t(X) %*% X) %*% t(X) %*% y

    # calculate the residuals
    resid = y - X %*% beta

    # calculate the variance of residuals
    residvar = (t(resid) %*% resid)/(N - K)
    # convert it to value
    residvar = residvar[1, 1]

    # calculate R-squared and adjusted R-squared: first the variance of the
    # residuals (not adjusted for the number of regressors)
    SSE = (t(resid) %*% resid)/(N - 1)
    SSE = SSE[1, 1]
    adjSSE = residvar
    TSS = t(y - mean(y)) %*% (y - mean(y))/(N - 1)
    TSS = TSS[1, 1]
    Rsquared = 1 - (SSE/TSS)
    adjRsquared = 1 - (adjSSE/TSS)

    betacovar = solve(t(X) %*% X) * residvar

    # assign value to the list result and return it
    result$beta$val = beta
    result$resid$val = resid
    result$resid$var = residvar
    result$beta$covar = betacovar
    result$beta$var = diag(betacovar)
    result$beta$se = sqrt(diag(betacovar))
    result$hasIntercept = intercept
    result$R = Rsquared
    result$adjR = adjRsquared
    result
}
lr.summary <- function(result) {
    displayAdjust = 0
    K = length(result$beta$val)

    cat("Regression Results:\n\n")
    cat("Residuals Summary:\n")
    residsummary = summary(result$resid$val)
    print(residsummary)
    cat("\n")
    cat("Coefficients:\n")

    text = sprintf("%30s%15s\n", "Values", "Std. Error")
    cat(text)

    for (i in 1:K) {
        if (result$hasIntercept && (i == 1)) {
            text = sprintf("%-15s%15.6f%15.6f\n", "(Intercept)", result$beta$val[1], 
                result$beta$se[1])
            cat(text)
            displayAdjust = 1
        } else {
            tcoef = sprintf("beta%d", i - displayAdjust)
            text = sprintf("%-15s%15.6f%15.6f\n", tcoef, result$beta$val[i], 
                result$beta$se[i])
            cat(text)
        }

    }

    text = sprintf("\nR-squared: %19.4f\n", result$R)
    cat(text)
    text = sprintf("Adjusted R-squared: %10.4f\n", result$adjR)
    cat(text)
}

Run again:

results = lr(X, y)
lr.summary(results)
## Regression Results:
## 
## Residuals Summary:
##        V1           
##  Min.   :-8.53e-14  
##  1st Qu.: 2.49e-14  
##  Median : 5.68e-14  
##  Mean   : 5.90e-14  
##  3rd Qu.: 9.95e-14  
##  Max.   : 2.02e-13  
## 
## Coefficients:
##                         Values     Std. Error
## (Intercept)          12.000000       0.000000
## beta1                 6.000000       0.000000
## beta2                 4.000000       0.000000
## beta3                 9.000000       0.000000
## 
## R-squared:              1.0000
## Adjusted R-squared:     1.0000
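As a cross-check (a sketch assuming the X and y objects defined in Part 1 are still in the workspace), the hand-rolled measures can be compared against R's built-in lm:

fit = lm(y ~ X)
summary(fit)$r.squared       # should match results$R
summary(fit)$adj.r.squared   # should match results$adjR

Both equal 1 here, since the data were generated without noise.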

Testing the coefficients

Remember that we estimated the variance of the residuals \( \sigma^2 \) by \( s^2 \):

\[ s^2 = \frac{1}{N-K-1} \sum_{i=1}^{N}e_i^2 \]

From the Gauss-Markov assumptions, together with normally distributed errors, the scaled sum of squared residuals \( \sum_{i=1}^{N}e_i^2 / \sigma^2 \) follows a Chi-squared distribution with \( N-K-1 \) degrees of freedom. Since \( (N-K-1)s^2 = \sum_{i=1}^{N}e_i^2 \),

\[ \frac{(N-K-1) s^2}{\sigma^2} \sim \chi_{N-K-1}^2 \]

From the properties of the Chi-squared distribution: if \( e_1, \cdots, e_J \) are independent normal variables with mean \( \mu \) and variance \( \sigma^2 \), then \( \sum_{j=1}^J \frac{(e_j-\mu)^2}{\sigma^2} \sim \chi_J^2 \), with \( E\left(\sum_{j=1}^J \frac{(e_j-\mu)^2}{\sigma^2}\right) = J \) and \( V\left(\sum_{j=1}^J \frac{(e_j-\mu)^2}{\sigma^2}\right) = 2J \).

Thus, since the residuals have zero mean (i.e. \( \mu = 0 \)), it follows that
\[ \xi = \frac{(N-K-1) s^2}{\sigma^2} \sim \chi_{N-K-1}^2 \]
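A quick simulation (a sketch; the variable names are arbitrary) illustrates the Chi-squared property used above, namely that a sum of \( J \) squared standard normals has mean \( J \) and variance \( 2J \):

set.seed(42)
J = 10
# 10000 replications of the sum of J squared standard normals
draws = replicate(10000, sum(rnorm(J)^2))
mean(draws)  # close to J = 10
var(draws)   # close to 2J = 20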

Now recall the definition of the t-distribution: if a variable \( X \) has a standard normal distribution, \( X \sim N(0,1) \), and \( \xi \sim \chi_J^2 \) independently of \( X \), then the following ratio has a t-distribution:

\[ t = \frac{X}{\sqrt{\xi/J}} \]

In our case, to test whether the coefficient \( \beta_k \) is significant or not, we let \( X = \frac{\hat{\beta_k} - \beta_k}{\sqrt{V(\hat{\beta_k})}} \). As \( V\{\hat{\beta}\} = \sigma^2 (X'X)^{-1} \), let \( c_{kk} \) be the \( (k,k) \) element of the matrix \( (X'X)^{-1} \). The variance of \( \hat{\beta_k} \) is then \( \sigma^2 c_{kk} \), and its standard deviation is \( \sigma \sqrt{c_{kk}} \).

So, \( X = \frac{\hat{\beta_k} - \beta_k}{\sigma \sqrt{c_{kk}}} \)

\[ t = \frac{X}{\sqrt{\xi/(N-K-1)}} \]
\[ t = \frac{\hat{\beta_k} - \beta_k}{\sigma \sqrt{c_{kk}}} \Big/ \sqrt{\frac{s^2}{\sigma^2}} \]
\[ t = \frac{\hat{\beta_k} - \beta_k}{s \sqrt{c_{kk}}} \]
\[ t = \frac{\hat{\beta_k} - \beta_k}{se(\hat{\beta_k})} \]

The value \( t \) has a t-distribution with \( N-K-1 \) degrees of freedom.

To test whether the coefficient \( \beta_k \) is significantly different from zero, we will test:

\[ t_k = \frac{\hat{\beta_k}-0}{se(\hat{\beta_k})} \]
with \( N-K-1 \) degrees of freedom.


lr <- function(X, y, method = "OLS", intercept = T) {
    X = as.matrix(X)
    K = ncol(X)
    N = nrow(X)
    beta = matrix()
    result = list()


    if (intercept == T) {
        # reset the length of vector beta
        K = K + 1
        X = cbind(1, X)
    }

    beta = solve(t(X) %*% X) %*% t(X) %*% y

    # calculate the residuals
    resid = y - X %*% beta

    # calculate the variance of residuals
    residvar = (t(resid) %*% resid)/(N - K)
    # convert it to value
    residvar = residvar[1, 1]

    # calculate R-squared and adjusted R-squared: first the variance of the
    # residuals (not adjusted for the number of regressors)
    SSE = (t(resid) %*% resid)/(N - 1)
    SSE = SSE[1, 1]
    adjSSE = residvar
    TSS = t(y - mean(y)) %*% (y - mean(y))/(N - 1)
    TSS = TSS[1, 1]
    Rsquared = 1 - (SSE/TSS)
    adjRsquared = 1 - (adjSSE/TSS)

    betacovar = solve(t(X) %*% X) * residvar

    # assign value to the list result and return it
    result$beta$val = beta
    result$resid$val = resid
    result$resid$var = residvar
    result$beta$covar = betacovar
    result$beta$var = diag(betacovar)
    result$beta$se = sqrt(diag(betacovar))
    # calculate t value for all the coefficients
    result$beta$t = beta/sqrt(diag(betacovar))
    result$hasIntercept = intercept
    result$R = Rsquared
    result$adjR = adjRsquared
    result
}
lr.summary <- function(result) {
    displayAdjust = 0
    K = length(result$beta$val)

    cat("Regression Results:\n\n")
    cat("Residuals Summary:\n")
    residsummary = summary(result$resid$val)
    print(residsummary)
    cat("\n")
    cat("Coefficients:\n")

    text = sprintf("%30s%15s%15s\n", "Values", "Std. Error", "t")
    cat(text)

    for (i in 1:K) {
        if (result$hasIntercept && (i == 1)) {
            text = sprintf("%-15s%15.6f%15.6f%15.6f\n", "(Intercept)", result$beta$val[1], 
                result$beta$se[1], result$beta$t[1])
            cat(text)
            displayAdjust = 1
        } else {
            tcoef = sprintf("beta%d", i - displayAdjust)
            text = sprintf("%-15s%15.6f%15.6f%15.6f\n", tcoef, result$beta$val[i], 
                result$beta$se[i], result$beta$t[i])
            cat(text)
        }

    }

    text = sprintf("\nR-squared: %19.4f\n", result$R)
    cat(text)
    text = sprintf("Adjusted R-squared: %10.4f\n", result$adjR)
    cat(text)
}

Run again:

results = lr(X, y)
lr.summary(results)
## Regression Results:
## 
## Residuals Summary:
##        V1           
##  Min.   :-8.53e-14  
##  1st Qu.: 2.49e-14  
##  Median : 5.68e-14  
##  Mean   : 5.90e-14  
##  3rd Qu.: 9.95e-14  
##  Max.   : 2.02e-13  
## 
## Coefficients:
##                         Values     Std. Error              t
## (Intercept)          12.000000       0.000000415445722959532.375000
## beta1                 6.000000       0.0000002012432297973165.250000
## beta2                 4.000000       0.0000001328501284142088.000000
## beta3                 9.000000       0.0000003244399974637064.500000
## 
## R-squared:              1.0000
## Adjusted R-squared:     1.0000
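(Note that the Std. Error and t columns have run together in the printout above: the data contain no noise, so the standard errors are numerically zero and the t statistics are astronomically large, overflowing the 15-character column width.)

We stopped at the t statistics, but converting them to two-sided p-values is a one-liner with R's pt(); a sketch, assuming the results object from the run above:

dof = nrow(X) - length(results$beta$val)  # N minus the number of estimated coefficients
pvals = 2 * pt(-abs(results$beta$t), df = dof)
pvals  # essentially zero here, as the fit is exact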

Tuesday, 27 August 2013

Basic Linear Regression - Part 1


This post serves as an introduction to econometric theory. Unlike other resources available on the web, I would like to go through all the basic concepts with the R software. All the linear/multiple regression code will be written from scratch, rather than relying on the built-in function lm. This approach will familiarize you with R, and help you understand the theory in a more concrete way.

What is Linear Regression

The following linear equation defines a linear relationship between the variables \( x \) and \( y \):
\[ y = 12 + 6x_1 + 4x_2 + 9x_3 \]
From the above equation, we can find the value of \( y \) for any given set \( x=\{x_1, x_2, x_3\} \).
For example, if \( x_1 = 10, x_2=20, x_3 = 30 \), then
\( y = 12 + 6\times10 + 4\times20 + 9\times30 = 422 \)
Let's define the linear equation in R code:
equation01 <- function(x) {
    b = c(12, 6, 4, 9)
    x = c(1, x)
    result = t(x) %*% b
    as.numeric(result)  #convert the datatype from matrix to numeric
}
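A quick check that the function reproduces the value computed by hand:
equation01(c(10, 20, 30))
## [1] 422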
Next, let's generate 100 observations of \( y \) by setting different values of the vector \( x=\{x_1, x_2, x_3\} \). For simplicity, let's limit the values of the vector \( x \) to between 1 and 10.
y = rep(0, 100)
x = matrix(sample(1:10, 300, replace = T), nrow = 100, ncol = 3, byrow = T)
for (i in 1:100) {
    y[i] = equation01(x[i, ])
}
Now, the population data for the linear equation are:
df = data.frame(y = y, x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])
df
##       y x1 x2 x3
## 1   161  8  5  9
## 2   118  2 10  6
## 3   120  3  9  6
## 4   107  2  5  7
## 5    98  2  5  6
## 6   118  5  1  8
## 7    81  5  3  3
## 8    73  3  4  3
## 9   102 10  3  2
## 10  131  7  8  5
## 11  172  5 10 10
## 12  101  7  5  3
## 13   58  1  1  4
## 14  102 10  3  2
## 15  120  9  9  2
## 16  106  8  7  2
## 17  155  7  5  9
## 18  186  8  9 10
## 19  127 10  7  3
## 20   87  5  9  1
## 21   63  1  9  1
## 22  109  6  4  5
## 23  152  7  2 10
## 24  185 10  8  9
## 25  113  5  2  7
## 26   77  6  5  1
## 27  172  7  7 10
## 28  123  8  9  3
## 29  116  3  8  6
## 30  153  8  3  9
## 31   71  7  2  1
## 32  139  4 10  7
## 33  157  7 10  7
## 34  173  8  8  9
## 35  133  6 10  5
## 36  102  4  3  6
## 37  121  3  7  7
## 38  116  8  5  4
## 39  119 10  5  3
## 40  141  9  3  7
## 41  156  9  9  6
## 42  131  9  5  5
## 43   57  1  3  3
## 44  139  5  4  9
## 45   61  2  7  1
## 46  113  4  8  5
## 47  186  8  9 10
## 48  120  2  6  8
## 49  174  6  9 10
## 50  126  2  3 10
## 51  141  8  9  5
## 52  137  9  2  7
## 53  134  3  8  8
## 54  120  9  9  2
## 55  124  3 10  6
## 56  135  8  3  7
## 57   52  1  4  2
## 58   77  1  8  3
## 59  114  7  6  4
## 60  166 10 10  6
## 61  108  3  6  6
## 62  107  2  5  7
## 63  129  8  6  5
## 64  150  2  9 10
## 65  122  4  8  6
## 66  137  2  8  9
## 67  176  9  5 10
## 68  124  3  1 10
## 69  163 10  7  7
## 70   64  2  1  4
## 71  104  8  2  4
## 72  141  9  3  7
## 73  100  5  1  6
## 74   79  9  1  1
## 75   67  3  7  1
## 76  139  1 10  9
## 77   65  1  5  3
## 78  152  3  8 10
## 79  180  9  6 10
## 80  116  1  2 10
## 81   72  3  6  2
## 82   61  3  1  3
## 83   55  1  7  1
## 84  184 10 10  8
## 85  125  6  8  5
## 86   99  2  3  7
## 87  120  2  6  8
## 88   92  4  5  4
## 89  118  9  4  4
## 90  150  2  9 10
## 91  142 10  4  6
## 92  150  9  3  8
## 93  165  9  9  7
## 94  135  3  6  9
## 95  178  8  7 10
## 96  123  1  6  9
## 97   60  1  6  2
## 98  126  6  6  6
## 99  105  7  6  3
## 100 115  9 10  1

Estimating the coefficients

Now, suppose that we do NOT know the parameters in the linear equation; we just have the 100 observations of \( y \) and the \( x \)s. Is there any method for us to estimate the coefficients?
Yes, the simplest and easiest method is Ordinary Least Squares.

Ordinary Least Squares (OLS)

If \( y \) and \( x_{input} \) have a linear relationship, and the true linear coefficients form a vector \( \beta \), the relationship can be defined as follows:
\( x_{input} \) is a \( k \)-dimensional column vector \( x_{input}=(x_1, x_2, \cdots, x_k)' \). The linear combination of \( x \) for arbitrary coefficients \( \beta = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k )' \) is:
\[ \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \]
where \( \beta_0 \) represents a constant, and the dimension of vector \( \beta \) is \( k+1 \). The relationship can be represented in the following matrix form:
\[ x = (1, x_{input}') \]
\[ y = x' \beta \]
To estimate the true coefficients \( \beta \), we define a vector \( \hat{\beta} = (\hat{\beta_0}, \hat{\beta_1}, \cdots, \hat{\beta_k})' \)
For the \( i \)-th observation, the estimation error \( e_i \) (residual) is defined as:
\[ e_i = y_i - x_i' \hat{\beta} \]
A good estimate \( \hat{\beta} \) should produce a minimum value of the sum of squared estimation errors (the residual sum of squares) over all observations. OLS estimates the linear coefficients by minimizing the following objective function:
\[ S(\hat{\beta}) = \sum_{i=1}^{N}{e_{i}^2} \]
\[ S(\hat{\beta}) = \sum_{i=1}^N{(y_i - x_i ' \hat{\beta})^2} \]
where \( N \) is the total number of observations.
The minimization can be done by differentiating the objective function \( S(\hat{\beta}) \) with respect to \( \hat{\beta} \) and setting the result to zero:
\[ -2 \sum_{i=1}^N x_i(y_i - x_i ' \hat{\beta}) = 0 \]
or
\[ (\sum_{i=1}^N x_i x_i ') \hat{\beta} = \sum_{i=1}^N x_i y_i \]
\[ \hat{\beta} = (\sum_{i=1}^N x_i x_i ')^{-1} \sum_{i=1}^N x_i y_i \]
The above equation has a unique solution when the \( (K+1) \times (K+1) \) matrix \( \sum_{i=1}^N x_i x_i ' \) is invertible, i.e. when no multicollinearity exists in the data \( X \) and the matrix has full rank.

A complete Example using Matrix Notation

For a linear system, the \( N \times (K+1) \) matrix \( X \) represents the input data, with \( N \) observations and \( K \) regressors (independent variables).
\[ X = \left( \begin{array}{cccc} 1 & x_{12} & \cdots & x_{1K} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{N2} & \cdots & x_{NK} \end{array} \right) = \left( \begin{array}{c} x_1 ' \\ \vdots \\ x_N ' \end{array} \right) , \quad y = \left( \begin{array}{c} y_1 \\ \vdots \\ y_N \end{array} \right) \]
The \( N \)-dimensional column vector \( e \) holds the residuals of the linear regression.
\[ e = y - X\hat{\beta} \]
The objective function in matrix notation is:
\[ S(\hat{\beta}) = e'e \] \[ S(\hat{\beta})= (y - X\hat{\beta})' (y - X\hat{\beta}) = y'y - 2y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \]
Differentiating the objective function with respect to \( \hat{\beta} \) yields:
\[ \frac{\partial S(\hat{\beta})}{\partial \hat{\beta}} = -2(X'y - X'X\hat{\beta}) = 0 \]
\[ \hat{\beta} = (X'X)^{-1}X'y \]
Let's try to build a function that carries out linear regression in R.
lr <- function(X, y, method = "OLS", intercept = T) {
    beta = rep(0, ncol(X))

    if (intercept == T) {
        # reset the length of vector beta
        X = cbind(1, X)
        beta = rep(0, ncol(X))
    }


    beta = solve(t(X) %*% X) %*% t(X) %*% y
    beta
}
Now we run a test of the lr() function. The data “df” we have were generated by the linear equation \( y = 12 + 6x_1 + 4x_2 + 9x_3 \) with randomly generated values of \( x_1, x_2, x_3 \):
head(df)
##     y x1 x2 x3
## 1 161  8  5  9
## 2 118  2 10  6
## 3 120  3  9  6
## 4 107  2  5  7
## 5  98  2  5  6
## 6 118  5  1  8
The true values of \( \beta \) are \( \beta = (12, 6, 4, 9) \)
Suppose that we do not know the underlying relationship between \( y \) and \( x \), and we just have the data “df”. We believe that the relationship between the variables \( y \) and \( x \) is linear:
\[ y=\beta_0 + \beta_1 x_1 + \beta_2 x_2 +\beta_3 x_3 \]
The coefficients \( \hat{\beta} \) estimated by OLS are:
X = as.matrix(df[, 2:4], dimnames = NULL)
y = df[, 1]

dimnames(X) <- list(NULL, NULL)


lr(X = X, y = y)
##      [,1]
## [1,]   12
## [2,]    6
## [3,]    4
## [4,]    9
The true \( \beta \) and the estimated \( \hat{\beta} \) are the same.

Linear Regression with Stochastic Component

In real life, it is hard to find a perfect linear system whose output follows a linear relationship exactly, without errors. For example, you may think that individual wages have a linear relationship with education level and age. However, even for two persons with the same education level and age, wages are hardly ever the same. The differences may arise from different job natures, different years of service, etc.; this is referred to as the omitted variables problem. Further, even when all the observed factors are the same, there may be a random error in the model that cannot be explained, caused for instance by sampling errors or measurement errors. To capture this, a stochastic component \( \epsilon \) is added to the linear equation.
\[ y = X \beta + \epsilon \]
Sometimes the stochastic component \( \epsilon \) is referred to as the unobservable error term. The unobservable error term should be independent of every explanatory variable \( x \), i.e. \( E\{\epsilon | X\} = 0 \). The variables \( x \) are considered exogenous in this case.
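To see the stochastic component in action, we can add a noise term to the data-generating equation and re-run our lr() function; a minimal sketch, assuming the X matrix from above and an arbitrary noise standard deviation of 5:

set.seed(123)
epsilon = rnorm(nrow(X), mean = 0, sd = 5)  # the stochastic component
ynoisy = 12 + 6 * X[, 1] + 4 * X[, 2] + 9 * X[, 3] + epsilon
lr(X = X, y = ynoisy)  # estimates close to, but no longer exactly, (12, 6, 4, 9)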

Gauss-Markov Assumptions

OLS is a good estimator if and only if the following assumptions hold:
  1. \( E\{\epsilon_i\} = 0, i=1,\cdots,N \)
  2. \( x_i \) and \( \epsilon_i \) are independent
  3. \( V\{\epsilon_i\} = \sigma^2 \) (homoskedasticity)
  4. \( cov\{\epsilon_i, \epsilon_j\} = 0, i \neq j \)
The above four assumptions are called the Gauss-Markov conditions.

How Well Does OLS Do Its Job?

A good estimator should be unbiased and consistent (my old post). Let us examine whether OLS provides a good estimation or not.
1. Unbiased
Unbiased means the expected value of the estimator \( \hat{\beta} \) should be equal to the true value \( \beta \) in repeated sampling.
In OLS, \( \hat{\beta} \) is estimated as:
\[ \hat{\beta} = (X'X)^{-1}X'y \]
The expected value of \( \hat{\beta} \): \[ E\{\hat{\beta}\} = E\{(X'X)^{-1}X'y \} \] \[ E\{\hat{\beta}\} = E\{(X'X)^{-1}X' (X \beta + \epsilon) \} \] \[ E\{\hat{\beta}\} = E\{(X'X)^{-1}X'X \beta + (X'X)^{-1}X'\epsilon \} \] \[ E\{\hat{\beta}\} = E\{ \beta + (X'X)^{-1}X'\epsilon \} \] (from assumption 2) \[ E\{\hat{\beta}\} = \beta + E\{ (X'X)^{-1}X'\} E\{\epsilon \} \] (from assumption 1, \( E\{\epsilon\}=0 \)) \[ E\{\hat{\beta}\} = \beta \]
The variance of \( \hat{\beta} \):
\[ V\{\hat{\beta}\} = E\{(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\} \] \[ V\{\hat{\beta}\} = E\{(X'X)^{-1}X'\epsilon \epsilon'X(X'X)^{-1}\} \] (from assumptions 3 and 4) \[ V\{\hat{\beta}\} = E\{(X'X)^{-1}X'\sigma^2 I_N X(X'X)^{-1}\} \] \[ V\{\hat{\beta}\} = \sigma^2 (X'X)^{-1} \]
So, next, what is the value of \( \sigma^2 \)? An unbiased estimator of it is based on the sample variance of the residuals \( e_i = y_i - x_i ' \hat{\beta} \):
\[ s^2 = \frac{1}{N-(K+1)} \sum_{i=1}^{N} e_{i}^2 \]
If the regression contains no intercept, the degrees of freedom adjustment should be \( N-K \) instead of \( N-(K+1) \).
The estimated variance of an element \( \hat{\beta_k} \) is given by \( s^2 c_{kk} \), where \( c_{kk} \) is the \( (k,k) \) element in the matrix \( (X'X)^{-1} \).
The square root of the estimated variance is called the standard error of \( \hat{\beta_k} \), which denoted as \( se(\hat{\beta_k}) \)
Now, let's modify our lr() function in R to incorporate the variance and standard error measure:

lr <- function(X, y, method = "OLS", intercept = T) {
    X = as.matrix(X)
    K = ncol(X)
    N = nrow(X)
    beta = matrix()
    result = list()


    if (intercept == T) {
        # reset the length of vector beta
        K = K + 1
        X = cbind(1, X)
    }

    beta = solve(t(X) %*% X) %*% t(X) %*% y

    # calculate the residuals
    resid = y - X %*% beta

    # calculate the variance of residuals
    residvar = (t(resid) %*% resid)/(N - K)
    # convert it to value
    residvar = residvar[1, 1]

    betacovar = solve(t(X) %*% X) * residvar

    # assign value to the list result and return it
    result$beta$val = beta
    result$resid$val = resid
    result$resid$var = residvar
    result$beta$covar = betacovar
    result$beta$var = diag(betacovar)
    result$beta$se = sqrt(diag(betacovar))
    result$hasIntercept = intercept
    result
}
lr.summary <- function(result) {
    displayAdjust = 0
    K = length(result$beta$val)

    cat("Regression Results:\n\n")
    cat("Residuals:\n")
    cat(result$resid$val, "\n\n")
    cat("Coefficients:\n")

    cat("                 \t Values \t \t Std. Error \t \n")

    for (i in 1:K) {
        if (result$hasIntercept && (i == 1)) {
            text = sprintf("(Intercept)\t \t  %-.6f \t \t %-.6f \t \n", result$beta$val[1], 
                result$beta$se[1])
            cat(text)
            displayAdjust = 1
        } else {
            text = sprintf("beta%-5d   \t \t  %-.6f \t \t %-.6f \t \n", i - 
                displayAdjust, result$beta$val[i], result$beta$se[i])
            cat(text)
        }

    }

}

Let's try out our modified version of the lr() function:

results = lr(X, y)
lr.summary(results)
## Regression Results:
## 
## Residuals Summary:
##        V1           
##  Min.   :-8.53e-14  
##  1st Qu.: 2.49e-14  
##  Median : 5.68e-14  
##  Mean   : 5.90e-14  
##  3rd Qu.: 9.95e-14  
##  Max.   : 2.02e-13  
## 
## Coefficients:
##                         Values     Std. Error
## (Intercept)          12.000000       0.000000
## beta1                 6.000000       0.000000
## beta2                 4.000000       0.000000
## beta3                 9.000000       0.000000
I think that's enough for today; let's grab a coffee and take a break!

Monday, 26 August 2013

Properties of Good Estimators

A good estimator should possess three properties: it should be Unbiased, Consistent, and Efficient.
  • Unbiased: What you get is what you expected.
  • Consistent: When you get more and more samples, the value of the estimator will converge to the true value.
  • Efficient: The variance of your estimator should be small.
Abstract enough? Let's take a look at the following example:

Is the formula \( \bar{X} = \frac{1}{n}\sum X_i \) a good estimator of the mean of a set of random variables \( X_i \)?

Yes, it is! Let's prove it:

As there is no other estimator to compare against, we can just focus on the properties of unbiasedness and consistency.

1. Unbiased:
\[ E[\bar{X}] = E\left[\frac{1}{n}\sum X_i\right] \]
\[ E[\bar{X}] = \frac{1}{n}\sum E[X_i] \]
\[ E[\bar{X}] = \frac{1}{n}\sum \mu \]
\[ E[\bar{X}] = \frac{1}{n}n\mu \]
\[ E[\bar{X}] = \mu \]
What you get is what you expected, so it is unbiased!

2. Consistent:
\[ Var[\bar{X}] = Var\left[\frac{1}{n}\sum X_i\right] \]
\[ Var[\bar{X}] = \frac{1}{n^2} Var\left[\sum X_i\right] \]
Since the observations \( X_i \) are independent of each other, the covariance between \( X_i \) and \( X_j \) (for \( i \neq j \)) is zero, and the equation can be further written as:
\[ Var[\bar{X}] = \frac{1}{n^2} \sum Var[X_i] \]
\[ Var[\bar{X}] = \frac{1}{n^2} n\sigma_X^2 \]
\[ Var[\bar{X}] = \frac{\sigma_X^2}{n} \]
As the sample size \( n \) grows large, the variance of the estimator goes to zero, so the estimator converges to the true mean.
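To make this concrete, here is a small simulation (a sketch using standard normal draws, so \( \sigma_X^2 = 1 \)) showing the variance of the sample mean shrinking at the rate \( 1/n \):

set.seed(7)
for (n in c(10, 100, 1000)) {
    # 2000 replications of the sample mean of n draws
    xbar = replicate(2000, mean(rnorm(n)))
    cat("n =", n, " variance of sample mean =", var(xbar), "\n")
}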

Done!