Generating correlated normal variates

Enrico Schumann

## Keywords

covariance matrices, Matlab, R, random numbers, random variables

Unreviewed

# Overview

Many applications in econometrics and finance require random variates that are correlated. Unfortunately, statistical software packages do not always provide generators for such variates; many offer only univariate random number generators. However, vectors of uncorrelated random numbers can be transformed so that they have the required correlation. This article describes the case of normally distributed variates.

# The Problem

Assume one wants to create a vector of random variates $Z$ which is distributed according to

(1)
\begin{align} Z\! \sim \! \mathcal{N}(\mu,\Sigma) \end{align}

where $\mu$ is the vector of means, and $\Sigma$ is the variance-covariance matrix.

# Solution

Any variance-covariance matrix is symmetric and real-valued; assume further that the matrix is positive definite.
Such a matrix can always be decomposed into

(2)
\begin{align} \Sigma =\, LDL' \end{align}

where $L$ is a unit lower triangular matrix and $D$ is a diagonal matrix with strictly positive elements. This allows the decomposition to be rewritten as

(3)
\begin{align} \Sigma =\, L\sqrt{D}\sqrt{D}L'=CC' \end{align}

where $C = L\sqrt{D}$ is a lower triangular matrix called the Cholesky factor of $\Sigma$. (The expression $\sqrt{D}$ here means that one takes the square root of each diagonal element of $D$, which is always possible since all elements on the main diagonal of $D$ are strictly positive.)
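
As a small numerical illustration, the following R sketch recovers $L$, $D$ and $C$ for a 2 × 2 matrix. The matrix `Sigma` and its values are hypothetical and chosen only for this example. The sketch relies on R's `chol`, which returns the upper triangular factor, hence the transpose.

```r
## hypothetical positive definite covariance matrix
Sigma <- matrix(c(4, 2,
                  2, 3), nrow = 2, byrow = TRUE)

## lower triangular Cholesky factor C (chol returns the upper factor)
C <- t(chol(Sigma))

## split C into a unit lower triangular L and a diagonal D
d <- diag(C)               # diagonal of C, i.e. the square roots of D
L <- C %*% diag(1 / d)     # rescale each column so the diagonal becomes 1
D <- diag(d^2)

## both products should reproduce Sigma up to rounding
all.equal(L %*% D %*% t(L), Sigma)
all.equal(C %*% t(C), Sigma)
```

For this matrix, $L$ has the entry 0.5 below the diagonal and $D$ has the diagonal elements 4 and 2.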

To generate correlated random variables, first generate a vector $X$ which is distributed as

(4)
\begin{align} X\! \sim \! \mathcal{N}(0,I) \end{align}

where $I$ is an appropriately sized identity matrix. In general, premultiplying such a vector by a matrix $M$ and adding a vector $A$ yields

(5)
\begin{align} MX+A\! \sim \! \mathcal{N}(A,MIM') \end{align}

(This is the matrix analogue of the scalar result: for any random variable $x$ with mean $m$ and variance $v$, $ax+b$ has mean $am + b$ and variance $a^2v$.)
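
The mean and covariance of the transformed vector can be checked in two lines. The derivation below is a sketch that uses only $E[X]=0$ and $E[XX']=I$, both of which follow from (4).

\begin{align} E[MX+A] &= M\,E[X]+A = A \\ \mathrm{Var}(MX+A) &= E[(MX)(MX)'] = M\,E[XX']\,M' = MIM' \end{align}

The result is again normal because an affine transformation of a normally distributed vector is normally distributed.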
Thus,

(6)
\begin{align} CX+\mu\! \sim \! \mathcal{N}(\mu,CIC') \end{align}

Since

(7)
\begin{align} CIC' = CC' = \Sigma \end{align}

one obtains the desired result by simply premultiplying the (column) vector of uncorrelated random variates by the Cholesky factor and adding $\mu$.
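
The column-vector version can be sketched in R as follows. The values of `mu` and `Sigma`, the number of draws `n` and the seed are assumptions made only for this illustration. Each column of `X` holds one draw of the uncorrelated vector.

```r
set.seed(42)                               # arbitrary seed, for reproducibility

mu    <- c(1, -2)                          # hypothetical vector of means
Sigma <- matrix(c(4, 2,
                  2, 3), nrow = 2, byrow = TRUE)
C     <- t(chol(Sigma))                    # lower triangular Cholesky factor

n <- 1e5                                   # number of draws
X <- matrix(rnorm(2 * n), nrow = 2)        # each column is one draw from N(0, I)
Z <- C %*% X + mu                          # mu is recycled down each column

## sample moments should be close to mu and Sigma
rowMeans(Z)
cov(t(Z))
```

With this many draws the sample mean and sample covariance should agree with `mu` and `Sigma` to roughly two decimal places, although the exact figures depend on the seed.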

Assume one wants to create $T$ observations of $n$ time series which are correlated according to $\Sigma$. Then one creates a matrix $R$ of uncorrelated standard normal variates, where by the usual convention the observations are in the rows, that is, $R$ is of dimension $T \times n$. Each row of $R$ is a transposed draw $X'$, and $(CX)' = X'C'$, so to induce the required correlations one postmultiplies the whole matrix by $C'$ (i.e., the upper triangular matrix), that is

(8)
\begin{equation} R^c = RC' \end{equation}

where the columns in $R^c$ are correlated as desired.
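
The same idea carries over to a full covariance matrix with nonzero means. The R sketch below builds `Sigma` from hypothetical volatilities `vol` and a constant correlation `rho`, postmultiplies by the upper triangular factor and then adds a mean to each column. All names and values here are assumptions for the example.

```r
set.seed(1)                                # arbitrary seed, for reproducibility

nT  <- 50000                               # number of observations (rows)
mu  <- c(0.01, 0.02, 0.03)                 # hypothetical column means
vol <- c(0.10, 0.20, 0.30)                 # hypothetical standard deviations
rho <- 0.5                                 # hypothetical constant correlation

## covariance matrix from correlations and standard deviations
M        <- matrix(rho, 3, 3)
diag(M)  <- 1
Sigma    <- diag(vol) %*% M %*% diag(vol)

## uncorrelated standard normals, observations in rows
R  <- matrix(rnorm(nT * 3), nrow = nT)

## postmultiply by C' (the upper triangular factor), then shift each column
Rc <- R %*% chol(Sigma)
Rc <- sweep(Rc, 2, mu, "+")

colMeans(Rc)
cov(Rc)
```

Using `sweep` keeps the addition of the means explicit per column, which avoids relying on R's recycling rules for a matrix with observations in rows.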

# Source Code / Implementation

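In both Matlab and R, the Cholesky factor can be computed with the command chol. Please note that both programmes return the upper triangular factor, that is, $C'$ rather than $C$, so the result can be used directly for the postmultiplication in (8).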
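
A quick way to confirm the orientation in R is to inspect the factor directly. The sketch below uses a small hypothetical correlation matrix `M` and checks that the entries below the diagonal are zero and that `t(cF) %*% cF` reproduces `M`.

```r
## hypothetical 3 x 3 correlation matrix with constant correlation 0.8
M       <- matrix(0.8, 3, 3)
diag(M) <- 1

cF <- chol(M)

## entries below the main diagonal are zero: cF is upper triangular
is_upper <- all(cF[lower.tri(cF)] == 0)
is_upper

## M equals t(cF) %*% cF, so t(cF) plays the role of C and cF that of C'
all.equal(t(cF) %*% cF, M)
```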

## An R implementation

nA         <- 5      # number of assets
nT         <- 100    # number of obs
rho        <- 0.8    # correlation between two assets

## create uncorrelated observations
X         <- rnorm(nA * nT) * 0.05
dim(X)    <- c(nT, nA)

## check
pairs(X, col = grey(0.4)); cor(X)

## set correlation matrix
M         <- array(rho, dim = c(nA, nA))
diag(M)    <- 1

## compute cholesky factor
cF        <- chol(M)

## induce correlation, check
Y        <- X %*% cF
pairs(Y, col = grey(0.4)); cor(Y)

## A Matlab implementation

nA = 5;     % number of assets
nT = 100;   % number of obs
rho = 0.8;  % correlation between two assets

% create uncorrelated observations
X = randn(nT, nA) * 0.05;

% check
plotmatrix(X); corrcoef(X)

% set correlation matrix
M = ones(nA, nA) * rho;
M(1:(nA + 1):(nA * nA)) = 1;

% compute cholesky factor
cF = chol(M);

% induce correlation, check
Y = X * cF;
plotmatrix(Y); corrcoef(Y)