Calibrating the Nelson-Siegel Model

## Keywords

Nelson-Siegel, Nelson-Siegel-Svensson, Differential Evolution


# Abstract

The Nelson-Siegel-Svensson model is widely used for modelling the yield curve, yet many authors have reported ‘numerical difficulties’ when calibrating the model. We argue that the problem is twofold: firstly, the optimisation problem is not convex and has multiple local optima, hence standard methods that are readily available in statistical packages are not appropriate. We implement and test an optimisation heuristic, Differential Evolution, and show that it is capable of reliably solving the model. Secondly, we stress that in certain ranges of the parameters the model is badly conditioned, so estimated parameters are unstable under small perturbations of the data. We discuss to what extent these difficulties affect applications of the model.

# 1 Models and estimation

We look into two variants of the model, namely the original formulation of Nelson and Siegel [4], and the extension of Svensson [5]. De Pooter [2] gives an overview of other variants.

Nelson and Siegel [4] suggested modelling the yield curve at a point in time as follows: let $y(\tau)$ be the zero rate for maturity $\tau$, then

(1)
\begin{eqnarray} y(\tau) &=& \beta_{1} + \beta_{2} \left[\dfrac{1-\exp(-\frac{\tau}{\lambda})}{\frac{\tau}{\lambda}}\right] + \beta_{3} \left[\dfrac{1-\exp(-\frac{\tau}{\lambda})}{\frac{\tau}{\lambda}} - \exp(-\frac{\tau}{\lambda})\right]\,. \end{eqnarray}

Thus, for a given cross-section of yields, we need to estimate four parameters: $\beta_1$, $\beta_2$, $\beta_3$, and $\lambda$. For $m$ observed yields with different maturities $\tau_{1}$, …, $\tau_{m}$, we have $m$ equations. We do not assume that the model's parameters are constant over time; to simplify notation, however, we do not add subscripts for the time period.
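Equation (1) translates directly into code. The following Python/NumPy sketch (the paper's own experiments use R; the function name here is our own choice) evaluates the NS zero rate for a scalar or a vector of maturities:

```python
import numpy as np

def ns_yield(tau, beta1, beta2, beta3, lam):
    """Nelson-Siegel zero rate of Equation (1); tau may be a scalar or array."""
    tau = np.asarray(tau, dtype=float)
    x = tau / lam
    h = (1.0 - np.exp(-x)) / x                    # loading of beta2
    return beta1 + beta2 * h + beta3 * (h - np.exp(-x))
```

Note the limiting behaviour, which explains the factor interpretation below: as $\tau \to 0$ the loading $h \to 1$, so $y(0) \to \beta_1 + \beta_2$; as $\tau \to \infty$, $y \to \beta_1$, the long-run level.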

The Nelson-Siegel-Svensson (NSS) model adds a second hump term to the NS model. Let again $y(\tau)$ be the zero rate for maturity $\tau$, then

(2)
\begin{eqnarray} y(\tau) &=& \beta_{1} + \beta_{2} \left[\dfrac{1-\exp(-\frac{\tau}{\lambda_{1}})}{\frac{\tau}{\lambda_{1}}}\right] + \\ && \beta_{3} \left[\dfrac{1-\exp(-\frac{\tau}{\lambda_{1}})}{\frac{\tau}{\lambda_{1}}} - \exp(-\frac{\tau}{\lambda_{1}})\right] + \beta_{4}\left[\dfrac{1-\exp(-\frac{\tau}{\lambda_{2}})}{\frac{\tau}{\lambda_{2}}} - \exp(-\frac{\tau}{\lambda_{2}})\right]\,.\nonumber \end{eqnarray}

Here we need to estimate six parameters: $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\lambda_1$ and $\lambda_2$.
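Equation (2) can be coded in the same way; again a Python/NumPy sketch with a function name of our own choosing:

```python
import numpy as np

def nss_yield(tau, beta1, beta2, beta3, beta4, lam1, lam2):
    """Nelson-Siegel-Svensson zero rate of Equation (2)."""
    tau = np.asarray(tau, dtype=float)
    x1, x2 = tau / lam1, tau / lam2
    h1 = (1.0 - np.exp(-x1)) / x1                 # loading of beta2
    h2 = (1.0 - np.exp(-x2)) / x2
    return (beta1 + beta2 * h1 + beta3 * (h1 - np.exp(-x1))
            + beta4 * (h2 - np.exp(-x2)))         # second hump term
```

With $\beta_4 = 0$ the model reduces to NS, which is a convenient sanity check for an implementation.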

The parameters of the models can be estimated by minimising the difference between the model rates $y$ and the observed rates $y^M$, where the superscript stands for ‘market’. An optimisation problem can be stated as

(3)
\begin{align} \min_{\beta, \lambda}\sum \big(y-y^M\big)^2\,, \end{align}

possibly subject to constraints. There are many variants of this objective function: we could use absolute values instead of squares, or a more robust measure of scale. Likewise, we could use bond prices instead of rates, and so on. We are interested in the numerical aspects of the calibration; the decision on which specification to use should rather follow from empirical tests, so here we work with specification (3). Below we will apply an optimisation method that is capable of estimating all parameters in one step, for different variants of the objective function and under different constraints.
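Specification (3) for the NSS model can be written as a function of the full parameter vector, which is the form an optimiser needs. A minimal Python sketch (function name and parameter ordering are our own):

```python
import numpy as np

def nss_objective(p, tau, y_market):
    """Squared-error objective of Equation (3) for the NSS model.
    p = (beta1, beta2, beta3, beta4, lambda1, lambda2)."""
    b1, b2, b3, b4, l1, l2 = p
    x1, x2 = tau / l1, tau / l2
    h1 = (1.0 - np.exp(-x1)) / x1
    h2 = (1.0 - np.exp(-x2)) / x2
    y = b1 + b2 * h1 + b3 * (h1 - np.exp(-x1)) + b4 * (h2 - np.exp(-x2))
    return np.sum((y - y_market) ** 2)            # sum of squared residuals
```

Swapping in absolute values or bond prices only changes the last line; the optimisation method discussed below is indifferent to such changes.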

# 3 The collinearity problem

Both NS and NSS can be interpreted as factor models (Diebold and Li [3]): the $\beta$-coefficients are the factor realisations; the factor loadings are the weight functions of these parameters. For the NS-model the loadings for a maturity $\tau$ are thus given by

(4)
\begin{align} \Bigg[\; 1 \quad \dfrac{1-\exp(-\frac{\tau}{\lambda})}{\frac{\tau}{\lambda}} \quad \dfrac{1-\exp(-\frac{\tau}{\lambda})}{\frac{\tau}{\lambda}} - \exp(-\frac{\tau}{\lambda}) \;\Bigg]\,. \end{align}

So by setting $\lambda$, we impose fixed factor loadings for a given maturity. The three factors are then interpreted as level ($\beta_1$), steepness ($\beta_2$), and curvature ($\beta_3$). With $m$ different maturities $\tau_{1}$, …, $\tau_m$ and a fixed $\lambda$-value, we have $m$ linear equations from which to estimate three parameters. So we need to solve

(5)
\begin{eqnarray} \left( \begin{array}{cccc} 1 & \dfrac{1-\exp(-\frac{\tau_1}{\lambda})}{\frac{\tau_1}{\lambda}} & \dfrac{1-\exp(-\frac{\tau_1}{\lambda})}{\frac{\tau_1}{\lambda}} - \exp(-\frac{\tau_1}{\lambda}) \\ 1 & \dfrac{1-\exp(-\frac{\tau_2}{\lambda})}{\frac{\tau_2}{\lambda}} & \dfrac{1-\exp(-\frac{\tau_2}{\lambda})}{\frac{\tau_2}{\lambda}} - \exp(-\frac{\tau_2}{\lambda}) \\ 1 & \dfrac{1-\exp(-\frac{\tau_3}{\lambda})}{\frac{\tau_3}{\lambda}} & \dfrac{1-\exp(-\frac{\tau_3}{\lambda})}{\frac{\tau_3}{\lambda}} - \exp(-\frac{\tau_3}{\lambda}) \\ \vdots & \vdots & \vdots \\ 1 & \dfrac{1-\exp(-\frac{\tau_m}{\lambda})}{\frac{\tau_m}{\lambda}} & \dfrac{1-\exp(-\frac{\tau_m}{\lambda})}{\frac{\tau_m}{\lambda}} - \exp(-\frac{\tau_m}{\lambda}) \\ \end{array} \right) \left( \begin{array}{c} \beta_1\\ \beta_2\\ \beta_3\\ \end{array} \right) = \left( \begin{array}{c} y^M(\tau_1)\\ y^M(\tau_2)\\ y^M(\tau_3)\\ \vdots\\ \vdots\\ y^M(\tau_m)\\ \end{array} \right) \end{eqnarray}

for $\beta$. We can interpret the NSS model analogously, except that now we have to fix two parameters, $\lambda_1$ and $\lambda_2$. Then we have a fourth loading

(6)
\begin{align} \Bigg[\ldots \dfrac{1-\exp(-\frac{\tau_i}{\lambda_2})}{\frac{\tau_i}{\lambda_2}} - \exp(-\frac{\tau_i}{\lambda_2}) \Bigg]'\,, \end{align}

ie, a fourth regressor, and we can proceed as before. This system of equations is overdetermined for the practical case $m>3$ (or $m>4$ for NSS), so we need to minimise a norm of the residuals.
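With $\lambda$ fixed, solving (5) in the least-squares sense is a standard linear regression. A Python/NumPy sketch (function name ours):

```python
import numpy as np

def ns_betas(tau, y_market, lam):
    """Solve the overdetermined system (5): with lambda fixed,
    the betas follow from a linear least-squares fit."""
    x = tau / lam
    h = (1.0 - np.exp(-x)) / x
    # columns: loading of beta1 (constant), beta2, beta3 as in (5)
    C = np.column_stack([np.ones_like(tau), h, h - np.exp(-x)])
    betas, *_ = np.linalg.lstsq(C, y_market, rcond=None)
    return betas
```

For $\lambda$-values outside the well-conditioned range discussed below, the columns of this matrix become nearly collinear, and `lstsq` will return one of many almost-equally-good solutions.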

Even for badly-conditioned problems, we may obtain small residuals (ie, a good fit), but we cannot accurately compute the parameters any more. In other words, many different parameter values give similarly-good fits. This is the problem here. For many values of $\lambda$, the factor loadings are highly correlated, and hence Equations (5) are badly conditioned; we have an identification problem.

Figure 1. NS: Correlations between factor loadings of $\beta_2$ and $\beta_3$ for different values of $\lambda$.

Figure 2. NSS: Correlations between factor loadings for different $\lambda$.

In the NS model, for many values of the $\lambda$-parameter, the correlation between the second and the third loading is high, so attributing a particular yield curve shape to a specific factor becomes difficult. Figure 1 shows the correlation between the factor loadings for different values of $\lambda$. The correlation is 1 at a $\lambda$ of zero, and rapidly decays to -1 as $\lambda$ grows. Outside a range of roughly 0.1 to 4, the loadings are so highly correlated that an optimisation procedure can easily trade off changes in one variable against changes in the other. This is troublesome for numerical procedures, since they then lack a clear indication in which direction to move. We obtain the same results for the NSS model. Figure 2 shows the correlation between the second and the third, the second and the fourth, and the third and the fourth factor loading, respectively. Again, we see that for $\lambda$-values greater than about 5, the correlation rapidly approaches either 1 or -1. So we have to expect large estimation errors; estimation results will be sensitive to small changes in the data.

If we only want to obtain a tractable approximation to the current yield curve, for instance for pricing purposes, we need not care much. True, many different parameter values give similar fits; but we are not interested in the parameters, only in interpolation. How about forecasting? Correlated regressors are not necessarily a problem in forecasting. We are often not interested in disentangling the effects of two single factors as long as we can assess their combined effect. The problem changes if we want to predict the regression coefficients themselves. Diebold and Li [3] for instance use the NS-model to forecast interest rates. They first fix $\lambda$, and then estimate the $\beta$-values by Least Squares. That is, for each cross-section of yields, they run a regression, and so obtain a time series of $\beta$-values. They then model the $\beta$-values as AR(1)-processes, and use these to predict future $\beta$-values and hence yield curves.
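The second step of the Diebold-Li procedure, modelling each $\beta$ series as an AR(1) and forecasting one step ahead, can be sketched as follows (a simple least-squares AR(1) fit; the function name is ours, and the test data below are simulated, not the Diebold-Li series):

```python
import numpy as np

def ar1_forecast(series):
    """Fit beta_t = c + phi * beta_{t-1} by least squares, as in the
    AR(1) step of Diebold and Li, and return the one-step forecast."""
    y, ylag = series[1:], series[:-1]
    A = np.column_stack([np.ones_like(ylag), ylag])   # regressors: 1, lag
    (c, phi), *_ = np.linalg.lstsq(A, y, rcond=None)
    return c + phi * series[-1]
```

Applied to each of the three (or four) estimated $\beta$ series in turn, the forecasts are then mapped back into a yield curve via Equation (1). This is precisely where noisy, unstable $\beta$ estimates hurt: the AR(1) models are fitted to series whose variation partly reflects estimation error rather than movements of the curve.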

To demonstrate the effects of high correlation, we replicate some of the results of Diebold and Li [3]. Their data set comprises monthly zero rates for maturities 1/12, 3/12, 6/12, 9/12, 1, 2, …, 10 years. Altogether, there are 372 yield curves, from January 1970 to December 2000. These data are available from http://www.ssc.upenn.edu/~fdiebold/YieldCurve.html. We first set the value of $\lambda$ to 1.4 as do Diebold and Li [3]. Then we run regressions to obtain $\beta$ time series; these are shown in Figure 3. We have slightly rescaled the series to make them correspond to Figure 7 in Diebold and Li [3, p.350]: for $\beta_2$ we switch the sign, $\beta_3$ is multiplied by 0.3.

Next, we run regressions with $\lambda$ set to 10. Figure 1 suggests that the weight functions of $\beta_2$ and $\beta_3$ will then be strongly negatively correlated, ie, that we cannot accurately estimate $\beta_2$ and $\beta_3$ any more. Figure 4 shows the obtained time series in black lines. Note that the $y$-scales have changed; the grey lines are the time series that were depicted in Figure 3. We see that the series are much more volatile, and thus very likely more difficult to model.

Figure 3. Time series of $\beta_1$, $-\beta_2$, and $0.3\beta_3$. The rescaling corresponds to Diebold and Li [3, p. 350, Figure 7]. $\lambda$ is set to 1.4.

Figure 4. Time series of $\beta_1$, $-\beta_2$, and $0.3\beta_3$. The rescaling corresponds to Diebold and Li [3, p. 350, Figure 7]. $\lambda$ is set to 10. The grey lines are the time series from Figure 3.

So in sum, if we aim to meaningfully estimate parameters for the NS or the NSS model, we need to restrict the $\lambda$-values to ranges where practical identification is still possible. For both the NS and the NSS-case, this means a $\lambda$ up to about 4 or 5; for the NSS-model we should make sure that the $\lambda$-values do not become too similar. The exact correlations for a particular case can readily be calculated: we fix the maturities with which to work, insert them into Equations (5), and compute the correlations between the regressors.
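Such a calculation takes only a few lines. The following Python sketch computes the correlation between the loadings of $\beta_2$ and $\beta_3$ (columns two and three of the matrix in (5)) on the Diebold-Li maturity grid; the function name is our own:

```python
import numpy as np

def loading_correlation(tau, lam):
    """Correlation between the beta2 and beta3 loadings of (5)
    over the maturities tau, for a given lambda."""
    x = tau / lam
    h = (1.0 - np.exp(-x)) / x
    return np.corrcoef(h, h - np.exp(-x))[0, 1]

# Diebold-Li maturities: 1, 3, 6, 9 months and 1, ..., 10 years
maturities = np.concatenate([np.array([1.0, 3.0, 6.0, 9.0]) / 12.0,
                             np.arange(1.0, 11.0)])
```

On this grid, a $\lambda$ of 10 yields a correlation close to -1, while a $\lambda$ of 1.4 (the Diebold-Li choice) keeps the loadings only moderately correlated, consistent with Figure 1.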

# 4 The optimisation problem

Now we turn to the optimisation problem: our aim is to solve model (3). The problem is not convex, so we use an appropriate procedure, namely an optimisation heuristic; more specifically, we apply Differential Evolution (DE). We will not discuss the algorithm in detail here. The computing time for a run of DE in R 2.8.1 on an Intel P8700 (single core) at 2.53GHz with 2GB RAM is less than 10 seconds.
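To make the method concrete, here is a sketch of classic DE ('DE/rand/1/bin') in Python. This is an illustration only: the function name, parameter defaults, and random-number handling are our own choices, not the settings used in the experiments (which ran in R). Box constraints are enforced by clipping, which is one simple way to keep $\lambda$ in the identifiable range.

```python
import numpy as np

def de_minimise(fun, bounds, n_pop=50, n_gen=200, F=0.8, CR=0.9, seed=42):
    """Minimise fun over the box `bounds` with classic Differential
    Evolution ('rand/1/bin'); settings here are illustrative."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    pop = lo + rng.random((n_pop, d)) * (hi - lo)    # random initial population
    fit = np.array([fun(x) for x in pop])
    for _ in range(n_gen):
        for i in range(n_pop):
            # pick three distinct members, none equal to i
            r1, r2, r3 = rng.choice([j for j in range(n_pop) if j != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])    # differential mutation
            cross = rng.random(d) < CR               # binomial crossover mask
            cross[rng.integers(d)] = True            # at least one component
            u = np.clip(np.where(cross, v, pop[i]), lo, hi)
            fu = fun(u)
            if fu <= fit[i]:                         # greedy selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

For the NSS calibration, `fun` would be the objective (3) with the yields of one cross-section fixed, and `bounds` would contain the allowed ranges of the six parameters.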

## The Experiment

We again use the data set from Diebold and Li [3]: monthly zero rates for maturities from 1 month to 10 years, giving 372 cross-sections of yields. For each cross-section, we fit an NSS model by running a gradient-based search (nlminb from R's stats package) ten times, and DE ten times. The starting values for nlminb are drawn randomly from the allowed ranges of the parameters; the initial population of DE is set up in the same way.

For each cross-section of yields (ie, each month) we have ten solutions obtained from nlminb, and ten from DE. For each solution, we compute the root-mean-squared (rms) error in percentage points, ie,

(7)
\begin{align} \sqrt{\frac{1}{m}\sum_{i=1}^{m}\bigg(\,y^M(\tau_i)-y(\tau_i)\,\bigg)^2}\,. \end{align}

Since the problem is not convex, we should not expect to obtain the same error for different restarts, not even — or rather, in particular not — for nlminb, given that we use different starting values for each restart. Thus, for each month for which we calibrate the model, we have ten different solutions for a given method. We compute the worst of these solutions in terms of fit as given by Equation (7), the best solution, and the median.
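Since the results below are reported in basis points, it helps to fix the conversion: assuming rates are quoted in percentage points, the rms error (7) is multiplied by 100 to give basis points. A small helper (function name ours):

```python
import numpy as np

def rms_bp(y_model, y_market):
    """Root-mean-squared error of Equation (7), converted to basis
    points; rates are assumed to be quoted in percentage points."""
    d = np.asarray(y_market, dtype=float) - np.asarray(y_model, dtype=float)
    return 100.0 * np.sqrt(np.mean(d ** 2))
```

The worst, best, and median of the ten restart solutions per month are then simply the max, min, and median of ten such numbers.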

If we just compare the average errors of gradient search and DE, we find them not too different. The median-median rms error is 5.4 bp for DE, compared with 8.1 bp for gradient search. (‘Median-median’ means: for each month, we have ten results for each method, of which we compute the median; then we take the median over all these median values, ie, over all months.) So we can find acceptable fits even with a gradient-based method, even though it is not an appropriate method for this problem. We stress, however, that several restarts are required.

While repeated runs of both DE and gradient search result in different solutions, the solutions are much more stable for DE than for nlminb. The median range (highest rms error minus lowest rms error) over all months is exactly zero for DE, and the mean range is 0.2 bp; in 97% of all cases, the range for DE was smaller than one basis point. In contrast, gradient search has a median range of 6.1 bp (mean range 8.1 bp); a range smaller than one basis point was achieved in only 8% of cases.

Given that DE found solutions at least as good and mostly better than gradient search, and more reliably so, and that computation time is not prohibitive, we conclude that for this particular problem, DE is more appropriate than a traditional optimisation technique based on the gradient.

# 5 Conclusion

In this paper we have analysed the calibration of the Nelson-Siegel and Nelson-Siegel-Svensson models. Both models are widely used, yet it is rarely discussed that fitting them to market rates often causes problems. We have shown that these difficulties can be reduced by using alternative optimisation techniques: Differential Evolution, which we tested, gave results that were reliably better than those obtained by a traditional method based on the derivatives of the objective function. But these improvements concern the fit, that is, the discrepancy between market rates and model rates. We also showed that parameter identification is only possible when specific parameters are restricted to certain ranges; unconstrained optimisation runs the risk of moving into parameter ranges where single parameters cannot be accurately computed any more.