Portfolio Optimisation in a Distributed Computing Environment

## Keywords

Portfolio Optimisation, Threshold Accepting, Distributed Computing


# Abstract

We investigate portfolio selection with alternative objective functions in a distributed computing environment. In particular, we optimise a portfolio's ‘Omega’, the ratio of two partial moments of the return distribution. Since finding optimal portfolios under such performance measures and realistic constraints leads to non-convex problems, we suggest solving the model with a heuristic method called Threshold Accepting (TA). TA is a very flexible technique, as it requires no simplifications of the problem and allows for a straightforward implementation of all kinds of constraints. Applying this algorithm to actual data, we find that TA is well-adapted to optimisation problems of this type. Furthermore, we show that the computations can easily be distributed, which leads to considerable speedups.

# 1 Introduction

Classical portfolio optimisation as introduced by Markowitz [6] models the investor’s choice of assets as a trade-off between the ‘reward’ and the risk of a portfolio, which Markowitz took to be the mean and variance of returns, respectively. This approach has been criticised for several reasons. One of these is that even if one does not want to consider a complete utility function, the decision to only look at the first two moments of the returns distribution appears debatable. There exists substantial evidence of non-normality of financial returns. (This is the case in particular for certain asset classes like derivatives or hedge funds.)

The task of the financial analyst who takes this criticism seriously is thus, first, to incorporate objective functions into the portfolio optimisation process which capture the ‘stylised facts’ of actual return distributions and allow upside and downside risk to be treated differently. Second, to make the application of these risk measures meaningful, the data cannot be reduced to a few estimated parameters; rather, the whole empirical distribution of returns (or at least larger parts of it) has to be taken into account.

Unfortunately, incorporating these requirements (in particular in conjunction with realistic constraints) into portfolio selection usually renders the optimisation problem infeasible for classical methods. Thus we propose to solve the problem with a heuristic method, Threshold Accepting. More specifically, we investigate the possibility of exploiting one characteristic of heuristic methods, namely the stochastic nature of their solutions, to speed up the computation in a distributed computing environment.

# 2 The investor's problem

The investor's problem is given as follows:

(1)
\begin{array}{l}
\displaystyle \min_{x \, \in \, \mathbb{R}^{n_{\mathrm{A}}}} \; \Phi(\ell) \\
E(\ell) \leq - v_0\, r_d \\
x'p_{0} = v_0 \\
x_j^{\inf} \leq x_j \leq x_j^{\sup} \\
K_{\inf} \leq \#\{\mathcal{J}\} \leq K_{\sup}.
\end{array}

Here, $\ell$ are the portfolio losses in different states of nature (scenarios); the objective function $\Phi(\ell)$ is a risk measure, or a combination of multiple objectives, which is to be minimised. Besides a minimum desired return $r_d$ and a budget constraint, further constraints are given by $x_j^{\inf}$ and $x_j^{\sup}$, which are minimum and maximum holding sizes, respectively, for those assets included in the portfolio (the set $\mathcal{J}$). $K_{\inf}$ and $K_{\sup}$ are cardinality constraints which set a minimum and maximum number of assets to be held. For $\Phi$, we chose

(2)
\begin{align}
\Phi = \frac{\int_{-\infty}^{r_d} (r_d - r) \, f(r)\,\mathrm{d}r}{\int_{r_d}^{\infty} (r - r_d) \, f(r)\,\mathrm{d}r}\,,
\end{align}

which is known as the Omega function [5]. The distribution of portfolio losses was estimated via a scenario approach.

# 3 Experiments and Distributed Optimisation

## Stochastics of the Solution

A major difference between classical techniques and heuristic optimisation methods is the stochastic nature of the solutions found by the latter. Restarting the same algorithm $n_{\mathtt{Restarts}}$ times with different seeds will produce different solutions, all of which can be regarded as realisations of a random variable with some (unknown) distribution $\mathcal{D}$, see [4]. The distribution of the objective function values to which the obtained solutions map is not symmetric, but bounded to the left (recall that we minimise), and degenerates for increasing computational resources to a single point, namely the global minimum.
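As an illustration of how such a distribution of solutions arises, the following sketch runs Threshold Accepting with several restarts on a scenario-based Omega objective in the spirit of equation (2). The scenario matrix, neighbourhood definition, threshold sequence and all parameter values are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical scenario matrix: 2600 return scenarios for 10 assets.
R = rng.normal(0.0005, 0.01, size=(2600, 10))
r_d = 0.0  # minimum desired (threshold) return

def omega_ratio(x, R, r_d):
    """Lower over upper partial moment of portfolio returns (to be minimised)."""
    r = R @ x                              # portfolio return in each scenario
    lower = np.maximum(r_d - r, 0.0).mean()
    upper = np.maximum(r - r_d, 0.0).mean()
    return lower / upper

def neighbour(x, rng, step=0.02):
    """Shift a small weight from one random asset to another (budget preserved)."""
    x = x.copy()
    i, j = rng.choice(len(x), size=2, replace=False)
    d = min(step, x[i])                    # keep weights non-negative
    x[i] -= d
    x[j] += d
    return x

def threshold_accepting(R, r_d, n_rounds=10, n_steps=500, rng=rng):
    x = np.full(R.shape[1], 1.0 / R.shape[1])      # equal-weight start
    f = omega_ratio(x, R, r_d)
    thresholds = np.linspace(0.02, 0.0, n_rounds)  # decreasing thresholds
    for tau in thresholds:
        for _ in range(n_steps):
            xn = neighbour(x, rng)
            fn = omega_ratio(xn, R, r_d)
            if fn - f < tau:               # accept also slightly worse solutions
                x, f = xn, fn
    return x, f

# Several restarts give a sample from the distribution D of solutions.
solutions = [threshold_accepting(R, r_d)[1] for _ in range(5)]
```

Collecting the objective values over the restarts gives an empirical sample from $\mathcal{D}$; with more rounds and steps per restart, this sample concentrates near the best attainable value.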
Let the total amount of computational resources available be denoted by $\mathcal{I}$; in a serial computing environment we then have

(3)
\begin{align}
\mathcal{I} = n_{\mathtt{Restarts}} \times n_{\mathtt{Rounds}} \times n_{\mathtt{Steps}}\,,
\end{align}

where $n_{\mathtt{Rounds}}$ is the number of thresholds and $n_{\mathtt{Steps}}$ is the number of steps per threshold. $\mathcal{D}$ will depend on the total number of iterations per restart (and possibly also on how this product is subdivided into rounds and steps; see the paper for details). The time an algorithm needs (or is given) to produce a solution is proportional to $\mathcal{I}$ in a serial environment, but this does not hold in parallel. From this formalisation one can derive two issues to be investigated. First, increasing $\mathcal{I}$ should lead to better solutions: the mean solution should decrease, while the standard deviation of solutions should also decrease and eventually go to zero as $\mathcal{I}$ goes to infinity. Thus, the speed of convergence for some increasing but finite sequence $\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_p$ can be estimated. Second, assume an analyst has a fixed amount of computational resources at his disposal; then it is important to know how to allocate these resources among restarts, rounds and steps. Intuitively, one may use fewer iterations per restart, and thus have a ‘worse’ distribution of solutions, but still, by using several restarts, produce better results than a single optimisation with many steps. This reasoning becomes more important in a distributed computing environment: since the restarts are independent, it is straightforward to allocate them to different nodes of a grid or cluster.

## Experiments and Results

Our data set consisted of several hundred daily stock price series for European companies over the period January 2004 to January 2005. (As a robustness check, we also repeated our experiments with data from other time periods.)
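The scenario generation used in the experiments, parametric bootstrapping, can be sketched as follows; the data dimensions and the multivariate-normal model are illustrative assumptions, not necessarily the paper's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical data: 260 daily returns for 10 assets.
hist = rng.normal(0.0003, 0.012, size=(260, 10))

def bootstrap_scenarios(hist, n_scenarios=2600, rng=rng):
    """Fit a parametric model to historical returns and simulate from it.

    Here the model is multivariate normal with the sample mean and
    covariance; the paper's actual specification may differ.
    """
    mu = hist.mean(axis=0)
    cov = np.cov(hist, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=n_scenarios)

scenarios = bootstrap_scenarios(hist)   # shape (2600, 10)
```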
From these we generated scenarios via parametric bootstrapping (from about one year of historical data we thus created 2600 price scenarios). Over these scenarios we optimised our portfolio and investigated the shape of $\mathcal{D}$ for an increasing number of iterations (the number of rounds was set to 10 for all experiments). The following figure shows the empirical distribution functions of the solutions (obtained values for $\Phi$) for 100000, 35000, 20000, 15000 and 10000 iterations.

Next, given a desired quality of the solution, we investigated how the execution time could efficiently be reduced in a distributed computing environment. To obtain a value for the ‘desired quality’, we computed the 99th quantile of the distribution of the solutions obtained with the maximum number of iterations (100000). For all alternative settings, this objective function value, denoted $f^*$, was considered as the quality to be achieved with a probability of at least 99%. In our setting we had $f^* = 0.4538$, as can be seen in the figure above. For each distribution corresponding to optimisation runs with fewer iterations, we computed the number of restarts necessary to obtain at least one result below $f^*$. This can be formalised as a draw from a Bernoulli variable where ‘success’ is a realisation below $f^*$. What we want to compute is how many draws are necessary to have at least one ‘success’ in $n_{\mathtt{Restarts}}$ trials. Denoting by $p$ the success probability (found by inserting $f^*$ into the respective CDF), the number of restarts can be derived from the expression

(4)
\begin{align}
0.99 = 1 - P(x=0) = 1 - (1-p)^{n_{\mathtt{Restarts}}}\,,
\end{align}

where $P(x=0)$ is the probability of a binomial random variable $x \sim \mathcal{B}(n_{\mathtt{Restarts}}, p)$ taking the value zero.
This implies that

(5)
\begin{align}
n_{\mathtt{Restarts}} = \frac{\log 0.01}{\log{(1-p)}}\,,
\end{align}

which allowed us to compute the potential speedup $S_{n_{\mathtt{Restarts}}}$ on different processors, defined as

(6)
\begin{align}
S_{n_{\mathtt{Restarts}}} = \frac{C_0}{C_{n_{\mathtt{Restarts}}}}\,,
\end{align}

as well as the efficiency $E_{n_{\mathtt{Restarts}}}$,

(7)
\begin{align}
E_{n_{\mathtt{Restarts}}} = \frac{S_{n_{\mathtt{Restarts}}}}{n_{\mathtt{Restarts}}}\,.
\end{align}

Here, $C_0$ is the computational resources employed for the most expensive run, that is, 100000 iterations, and $C_{n_{\mathtt{Restarts}}}$ is the number of iterations necessary to obtain a qualitatively equivalent solution when distributing the restarts to different nodes.
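Equations (5)–(7) translate directly into code; the success probability and iteration counts below are illustrative numbers, not results from the paper.

```python
import math

def restarts_needed(p, confidence=0.99):
    """Smallest n_Restarts with 1 - (1 - p)**n >= confidence (equation (5))."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def speedup(c_0, c_n):
    """S = C_0 / C_n (equation (6))."""
    return c_0 / c_n

def efficiency(s, n_restarts):
    """E = S / n_Restarts (equation (7))."""
    return s / n_restarts

# Illustrative numbers: if a single cheap run succeeds with p = 0.25,
# 17 restarts give at least one success with 99% probability.
n = restarts_needed(0.25)          # -> 17
s = speedup(100_000, 20_000)       # -> 5.0
e = efficiency(s, 16)              # -> 0.3125
```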

The results are shown in the following figure.

From the speedup we estimated the number of iterations required to obtain the desired solution. This is shown in the lower panel of the preceding figure. We see, for example, that with 16 nodes the number of iterations is decreased to below 20000, which corresponds to a speedup of more than 5. We also found that the efficiency decreased quite rapidly and that small clusters of, say, up to 64 nodes seemed well adapted for this type of optimisation.

We repeated these calculations for different time periods and also for alternative objective functions like Value-at-Risk (VaR). The results were qualitatively very similar as considerable speedups could already be gained with a small number of processors; efficiency, however, dropped rapidly.

# 4 Conclusions

There are several conclusions we can draw from this analysis.

• TA seems well-adapted to the problem of portfolio optimisation (as has been shown in previous research; see, for example, Dueck and Winker [1] and Gilli et al. [2]).
• The analyst using a heuristic method should explore his optimisation problem in more detail. This provides a clearer understanding of the computational resources required to obtain a desired solution and allows for more deliberate decisions on how to allocate these resources between the number of restarts and the number of iterations per run; it also indicates how one might sensibly distribute the optimisation problem across parallel machines.