## Author

## Keywords

portfolio optimisation, R, variance-covariance matrices, regression, style analysis, index tracking

## Review Status

*draft*

# Introduction

This tutorial describes how to minimise sums of squares or variances for linear models in R. The scripts can be used to compute style regressions (with inequality constraints) or tracking portfolios.

# A helpful identity

We have a matrix $X$ of size $T \times p$ (a sample of $T$ observations of a random vector $x$ of size $p$). Define the vector $m$ as the vector of column means of $X$; then

$$\frac{1}{T} X'X = \operatorname{cov}(X) + m m' \qquad (1)$$

where $\operatorname{cov}(X)$ stands for the variance-covariance matrix of the columns of $X$, computed with denominator $T$ (not $T-1$). If the mean vector is zero, the crossproduct of $X$ equals the variance-covariance matrix up to a scalar.

The following script tests Equation (1) in R.

```
T <- 100
p <- 10
X <- array(rnorm(T*p), dim = c(T, p))
X1 <- crossprod(X) / T
## R's cov uses denominator T-1, hence the factor (T-1)/T
X2 <- ((T - 1) / T) * cov(X) + outer(colMeans(X), colMeans(X))
identical(X1, X2)
all.equal(X1, X2)
```

Numerically, `X1` and `X2` will not be exactly equal, hence `identical` returns `FALSE`; `all.equal`, which tests for near-equality within a tolerance, returns `TRUE`.

The following script tests Equation (1) in Matlab.

```
T = 100;
p = 10;
X = rand(T,p);
X1 = X' * X / T;
X2 = ((T - 1) / T) * cov(X) + mean(X)' * mean(X);
% numerical differences
max(max(abs(X1 - X2)))
```

# Minimising sums of squares

In the linear regression model

$$y = X\beta + \epsilon \qquad (2)$$

the method of Least Squares minimises the sum of the squared residuals. Finding the $\beta$ means solving the normal equations

$$X'X\beta = X'y \qquad (3)$$

for $\beta$. (If $X$ contains a constant column, the residuals will have zero mean, and hence minimising the sum of squares is equivalent to minimising the variance of the residuals.)
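The normal equations can be solved directly in R; a minimal sketch (for ill-conditioned $X$, `lm` or a QR-based solve is numerically preferable):

```r
set.seed(42)
T <- 100
p <- 5
X <- array(rnorm(T * p), dim = c(T, p))
y <- rnorm(T)

## solve the normal equations X'X beta = X'y
beta <- solve(crossprod(X), crossprod(X, y))

## same coefficients as a regression without intercept
all.equal(as.vector(beta), unname(coef(lm(y ~ 0 + X))))
```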

The sum of squares for a sample $X$ can be written as

$$(y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta \,. \qquad (4)$$

(The product $y'X\beta$ is a scalar, and hence equals $\beta'X'y$.) We drop $y'y$ since it does not depend on $\beta$ and divide by 2 to obtain

$$\frac{1}{2}\beta'X'X\beta - \beta'X'y \qquad (5)$$

to be minimised.

# Minimising variance

Let $x$ be a random vector of size $p$; then we want to minimise

$$\operatorname{var}(y - x'\beta) = \operatorname{var}(y) - 2\beta'\operatorname{cov}(x, y) + \beta'\operatorname{cov}(x)\beta \,. \qquad (6)$$

We drop $\operatorname{var}(y)$ since it does not depend on $\beta$, divide by 2, and obtain

$$\frac{1}{2}\beta'\operatorname{cov}(x)\beta - \beta'\operatorname{cov}(x, y) \,.$$

For a sample $X$, this becomes

$$\frac{1}{2}\beta'\operatorname{cov}(X)\beta - \beta'\operatorname{cov}(X, y) \,.$$
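Analogously to the normal equations, the minimiser of this quadratic form solves $\operatorname{cov}(X)\beta = \operatorname{cov}(X, y)$. A quick check in R: since a regression with an intercept gives zero-mean residuals, its slope coefficients coincide with the variance-minimising $\beta$.

```r
set.seed(7)
T <- 100
p <- 5
X <- array(rnorm(T * p), dim = c(T, p))
y <- rnorm(T)

## minimiser of the residual variance: solve cov(X) beta = cov(X, y)
beta <- solve(cov(X), cov(X, y))

## same coefficients as a regression with intercept (intercept dropped)
all.equal(as.vector(beta), unname(coef(lm(y ~ X))[-1]))
```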

# R Implementation

```
require(quadprog)

## create data
p <- 10
T <- 100
X <- array(rnorm(T*p), dim = c(T, p))
y <- rnorm(T)

## minimise sum of squares
## variant 1 -- linear regression (without intercept)
coef(lm(y ~ 0 + X))
## variant 2 -- quadprog; Amat is a single zero column,
## i.e. the trivial constraint 0'beta >= 0
Dmat <- crossprod(X)
dvec <- as.vector(t(y) %*% X)
Amat <- as.matrix(rep(0, p))
solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat)$solution

## minimise variance
## variant 1 -- linear regression (with intercept, which is dropped)
coef(lm(y ~ X))[-1]
## variant 2 -- quadprog
Dmat <- cov(X)
dvec <- as.vector(cov(X, y))
Amat <- as.matrix(rep(0, p))
solve.QP(Dmat = Dmat, dvec = dvec, Amat = Amat)$solution
```
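The calls above use no binding constraints, so they only reproduce the regression coefficients. They become a style regression once inequality constraints are added. A minimal sketch (the constraint set chosen here, weights that sum to one and are non-negative, is one common choice, not the only possible one):

```r
require(quadprog)

set.seed(1)
p <- 10
T <- 100
X <- array(rnorm(T * p), dim = c(T, p))  ## style/index returns
y <- rnorm(T)                            ## fund returns

## minimise the variance of y - X w
Dmat <- cov(X)
dvec <- as.vector(cov(X, y))

## constraints: columns of Amat; the first (an equality,
## meq = 1) forces sum(w) == 1, the rest force w >= 0
Amat <- cbind(rep(1, p), diag(p))
bvec <- c(1, rep(0, p))

w <- solve.QP(Dmat = Dmat, dvec = dvec,
              Amat = Amat, bvec = bvec, meq = 1)$solution
sum(w)  ## 1
min(w)  ## non-negative, up to numerical tolerance
```

In `solve.QP`, each *column* of `Amat` is one constraint $a'w \ge b$; the first `meq` of them are treated as equalities.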

# Internal Links

Concepts: …

Tutorials: How to compute the tangency portfolio | How to compute the global minimum-variance portfolio

Tips: …

Related Articles: …

# External links

References:

1. Gilli, M., D. Maringer and E. Schumann (2011). *Numerical Methods and Optimization in Finance.* Elsevier.
2. R Development Core Team (2008). *R: A Language and Environment for Statistical Computing.* R Foundation for Statistical Computing. http://www.R-project.org.
3. Turlach, B. A. [S original] and A. Weingessel [R port] (2007). *quadprog: Functions to solve Quadratic Programming Problems.* R package version 1.4-11. Available from CRAN.

Weblinks: …