Minimising Sums of Squared Residuals and Variances of Residuals

Author

Enrico Schumann

Keywords

portfolio optimisation, R, variance-covariance matrices, regression, style analysis, index tracking

Review Status

draft

Introduction

This tutorial describes how to minimise sums of squares or variances for linear models in R. The scripts can be used to compute style regressions (with inequality constraints) or tracking portfolios.

A helpful identity

Let $X$ be a matrix of size $T \times p$ (a sample of $T$ observations of a random vector $x$ of size $p$), and let $m$ be the vector of column means of $X$. Then

(1)
\begin{align} \frac{1}{T}X'X = \operatorname{cov}(X) + mm' \end{align}

where $\operatorname{cov}(X)$ stands for the variance-covariance matrix of the columns of $X$, computed with denominator $T$ (not $T-1$). If the mean vector is zero, the crossproduct of $X$ equals the variance-covariance matrix up to a scalar factor, since then $X'X = T \operatorname{cov}(X)$.
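A short derivation shows why Equation (1) holds. It is only a sketch: the symbol $\iota$ (a column vector of $T$ ones) does not appear elsewhere in this tutorial and is introduced here for illustration, so that $m = X'\iota/T$.

```latex
\begin{split}
\operatorname{cov}(X) &= \frac{1}{T}(X - \iota m')'(X - \iota m')\\
&= \frac{1}{T}X'X - \frac{1}{T}X'\iota m' - \frac{1}{T}m\iota'X + \frac{1}{T}m\iota'\iota m'\\
&= \frac{1}{T}X'X - mm' - mm' + mm'\\
&= \frac{1}{T}X'X - mm'
\end{split}
```

The third line uses $X'\iota = Tm$ and $\iota'\iota = T$.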

The following script tests Equation (1) in R.

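T <- 100
p <- 10
X <- array(rnorm(T*p), dim = c(T,p) )

## left-hand side of Equation (1)
X1 <- crossprod(X) / T
## right-hand side; cov() divides by T - 1, so rescale to denominator T
X2 <- ( (T - 1) / T) * cov(X) + outer(colMeans(X),colMeans(X))
identical(X1,X2)   # exact comparison
all.equal(X1,X2)   # comparison up to a numerical tolerance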

Numerically, X1 and X2 will generally not be exactly equal, so identical will typically return FALSE, whereas all.equal, which compares up to a small tolerance, returns TRUE.

The following script tests Equation (1) in Matlab.

T = 100;
p = 10;
X = rand(T,p);

X1 = X' * X / T;
X2 = ((T - 1) / T) * cov(X) + mean(X)' * mean(X);

% numerical differences
max(max(abs(X1 - X2)))

Minimising sums of squares

In the linear regression model

(2)
\begin{align} y = X \beta + \epsilon \end{align}

the method of Least Squares minimises the sum of the squared residuals. Finding the minimising $\beta$ means solving the normal equations

(3)
\begin{align} (X'X)\beta = X'y \end{align}

for $\beta$. (If $X$ contains a constant column, the residuals will have zero mean, and hence minimising the sum of squares is equivalent to minimising the variance of the residuals.)
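The following is a minimal sketch of the two statements above, using simulated data: it solves the normal equations directly and checks that, with a constant column in $X$, the residuals have zero mean. The variable names (beta, res and so on) are chosen for illustration only.

```r
## simulated data; the first column of X is the constant
set.seed(42)
T <- 100
p <- 3
X <- cbind(1, array(rnorm(T * (p - 1)), dim = c(T, p - 1)))
y <- rnorm(T)

## solve the normal equations (X'X) beta = X'y
beta <- solve(crossprod(X), crossprod(X, y))

## compare with lm (no extra intercept, X already contains one)
beta_lm <- coef(lm(y ~ 0 + X))
same_beta <- all.equal(as.vector(beta), as.vector(beta_lm))

## residuals have (numerically) zero mean, so the sum of squares
## divided by T equals the variance with denominator T
res <- as.vector(y - X %*% beta)
mean_res <- mean(res)
same_var <- all.equal(sum(res^2) / T, (T - 1) / T * var(res))
```

Solving the normal equations with solve is fine for a small, well-conditioned example; lm uses a QR decomposition, which is usually the more stable choice.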

The sum of squared residuals for a sample $X$ can be written as:

(4)
\begin{split} (y-X\beta)'(y-X\beta) &= y'y - y'X\beta - (X\beta)'y + (X\beta)'X\beta\\ &= y'y - 2 y'X\beta + \beta'X'X\beta \end{split}

(The product $y'X\beta$ is a scalar, and hence equals $\beta'X'y$.) We drop $y'y$ since it does not depend on $\beta$ and divide by 2 to obtain

(5)
\begin{align} \frac{1}{2}\beta'X'X\beta - y'X\beta\,, \end{align}

to be minimised.
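As a check, the first-order condition of (5) leads back to the normal equations (3). The sketch below assumes that $X'X$ is positive definite, so that the stationary point is the unique minimum. The symbols $D$, $d$ and $b$ are introduced here only to show how (5) matches the objective $\frac{1}{2}b'Db - d'b$ that solve.QP from the quadprog package minimises.

```latex
\begin{align}
\frac{\partial}{\partial \beta}\left(\frac{1}{2}\beta'X'X\beta - y'X\beta\right) &= X'X\beta - X'y = 0 \\
\frac{1}{2}\beta'X'X\beta - y'X\beta &= \frac{1}{2}b'Db - d'b \quad \text{with } D = X'X,\; d = X'y,\; b = \beta
\end{align}
```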

Minimising variance

Let $x$ be a random vector of size $p$ and $y$ a scalar random variable. We want to minimise

(6)
\begin{split} \operatorname{var}(y-x'\beta) &= \operatorname{var}(y) + \phantom{\beta'} \operatorname{var}(x'\beta) - 2\operatorname{cov}(y,x'\beta)\\ &= \operatorname{var}(y) + \beta'\operatorname{var}(x')\beta - 2\operatorname{cov}(y,x')\beta \end{split}


We drop $\operatorname{var}(y)$ since it does not depend on $\beta$, divide by 2, and obtain

(7)
\begin{align} \frac{1}{2}\beta'\operatorname{var}(x')\beta - \operatorname{cov}(y,x')\beta\,. \end{align}


For a sample $X$, this becomes

(8)
\begin{align} \frac{1}{2}\beta'\operatorname{var}(X)\beta - \operatorname{cov}(y,X)\beta\,. \end{align}
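A small numerical check of (8), as a sketch with simulated data: setting the gradient of (8) to zero gives $\operatorname{var}(X)\beta = \operatorname{cov}(X,y)$, and the solution should match the slope coefficients of a regression with an intercept. The helper obj and the other names are illustrative only.

```r
set.seed(1)
T <- 100
p <- 4
X <- array(rnorm(T * p), dim = c(T, p))
y <- rnorm(T)

## first-order condition of (8): var(X) beta = cov(X, y)
## (cov() uses denominator T - 1 on both sides, so the scaling cancels)
beta_var <- as.vector(solve(cov(X), cov(X, y)))

## slopes of a regression with intercept
beta_lm <- unname(coef(lm(y ~ X))[-1])
same_slopes <- all.equal(beta_var, beta_lm)

## objective (8); its value increases when we move away from beta_var
obj <- function(b)
    as.numeric(0.5 * t(b) %*% cov(X) %*% b - cov(y, X) %*% b)
is_lower <- obj(beta_var) < obj(beta_var + 0.01)
```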

R Implementation

library(quadprog)

## create data
p  <- 10
T  <- 100
X  <- array(rnorm(T*p), dim = c(T,p))
y  <- rnorm(T)

## minimise squares

# variant 1 -- linear regression
coef(lm(y ~ 0 + X))

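# variant 2 -- quadprog
# solve.QP minimises -d'b + 1/2 b'Db subject to A'b >= b0 (b0 defaults to 0)
Dmat <- crossprod(X)
dvec <- as.vector(crossprod(X, y))
Amat <- as.matrix(rep(0,p))   # dummy constraint 0'b >= 0, always satisfied
solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat)$solution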

## minimise variance

# variant 1 -- linear regression
coef(lm(y ~ X))[-1]

# variant 2 -- quadprog
Dmat <- cov(X)
dvec <- as.vector(cov(X,y))
Amat <- as.matrix(rep(0,p))   # dummy constraint 0'b >= 0, always satisfied
solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat)$solution
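
The scripts above use only a dummy constraint. As a sketch of how a style regression with constraints might look, the following example minimises the variance of the residuals subject to weights that sum to one and are non-negative. The data are simulated, and w_true and the other names are hypothetical; the particular constraint set is an assumption for illustration, not something prescribed by this tutorial.

```r
library(quadprog)

## simulated data: X holds index returns, y the returns of a fund
set.seed(7)
T <- 100
p <- 5
X <- array(rnorm(T * p), dim = c(T, p))
w_true <- c(0.5, 0.3, 0.2, 0, 0)             # hypothetical true weights
y <- as.vector(X %*% w_true) + rnorm(T, sd = 0.1)

Dmat <- cov(X)
dvec <- as.vector(cov(X, y))

## constraints in quadprog's form t(Amat) %*% b >= bvec;
## the first meq constraints are treated as equalities
Amat <- cbind(rep(1, p), diag(p))            # sum(b) = 1 and b >= 0
bvec <- c(1, rep(0, p))
sol <- solve.QP(Dmat = Dmat, dvec = dvec,
                Amat = Amat, bvec = bvec, meq = 1)
w <- sol$solution
round(w, 3)
```

Each column of Amat is one constraint, which is why the budget constraint and the identity matrix are bound together column-wise.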

Internal Links

Tutorials
How to compute the tangency portfolio
How to compute the global minimum-variance portfolio

External links

References
1. Gilli, M., D. Maringer and E. Schumann. (2011). Numerical Methods and Optimization in Finance. Elsevier.
2. R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. http://www.R-project.org.
3. Turlach, B.A. [S original] and A. Weingessel [R port] (2007). quadprog: Functions to solve Quadratic Programming Problems. R package version 1.4-11. Available from CRAN.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License