Robust regression with Particle Swarm Optimisation in R

Author

Enrico Schumann1

Keywords

Least Median of Squares, Least Trimmed Squares, Particle Swarm Optimisation, R, robust regression

Review Status

Unreviewed

1 Introduction

This tutorial explains how to implement Particle Swarm (PS) optimisation for robust regression, as used in Gilli and Schumann [2]. This is still ongoing research, so comments are highly welcome.

The chosen examples are LTS- and LMS-regression.

2 The algorithm

The PS algorithm is implemented as a function PSO that is called with three arguments:

    PSO(pso,dataList,OF)

pso is a list that holds the settings of the PS algorithm. dataList is a list that holds the pieces of data necessary to evaluate the objective function. For a regression model, dataList will typically contain the matrix $X$ and the vector $y$. OF is the objective function. In principle, we need not pass these data, since the scoping rules in R also allow a function to access objects defined outside its own environment. However, passing all required objects explicitly to the function makes for a clearer structure of the algorithm.

The function will return a list of two elements: beta which is the estimated parameter vector of the model, and OFvalue, the objective function value associated with this solution.
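For orientation, a call might look like the following sketch; the objects pso, dataList and OF are constructed in the sections below.

    sol <- PSO(pso, dataList, OF)
    sol$beta       # estimated coefficient vector
    sol$OFvalue    # objective function value of the best solution found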

# particle swarm optimisation -- the main algorithm
PSO <- function(pso,dataList,OF)
{
# set up initial population and velocity
mP <- pso$min + diag(pso$max - pso$min) %*% mRU(pso$d,pso$nP)
mV <- pso$initV * mRU(pso$d,pso$nP)

# evaluate initial population
vF <- OF(mP,dataList)

# set up best solutions
mPbest <- mP                 # matrix of 'personally best' solutions
vFbest <- vF                 # vector of OF of best solutions
sGbest <- min(vFbest)            # scalar: best OF-value
sgbest <- which.min(vFbest)[1]    # scalar: best solution (counter)

# start generations
for (g in 1:pso$nG){

    # update population
    mDV <- pso$c1 * mRU(pso$d,pso$nP) * (mPbest - mP) +
           pso$c2 * mRU(pso$d,pso$nP) * (mPbest[,sgbest] - mP)
    mV  <- pso$iner * mV + mDV
    mP  <- mP + mV

    # evaluate updated population
    vF <- OF(mP,dataList)
    vF[!(is.finite(vF))] <- 10000    # a large value for NAs etc.

    # find improvements
    logik <- vF < vFbest        # improved solutions
    mPbest[,logik] <- mP[,logik]
    vFbest[logik] <- vF[logik]

    # find best solution
    if (min(vF) < sGbest){
        sGbest <- min(vF)
        sgbest <- which.min(vF)[1]
    }
} # end generations

# return best solution
return(list(beta = mPbest[,sgbest], OFvalue = sGbest))
}

For convenience, we have defined two auxiliary functions

# auxiliary functions
mRU <- function(m,n)
# returns a matrix of size mxn of uniform random variates
{
return(array(runif(m*n), dim = c(m,n)))
}

mRN <- function(m,n)
# returns a matrix of size mxn of normal random variates
{
return(array(rnorm(m*n), dim = c(m,n)))
}

which act like rand(m,n) and randn(m,n) in Matlab.
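A quick check of the dimensions (illustration only):

    dim(mRU(3,5))    # 3 5 -- like rand(3,5)
    dim(mRN(3,5))    # 3 5 -- like randn(3,5)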

Settings

PS requires us to set several parameters. These are collected in the list pso and then passed to the function. For a discussion of these parameters, see Gilli and Schumann [2]. The initial velocity initV serves as a scaling factor for the randomly generated initial velocities (computed as $\mathtt{initV} \times \mathtt{mRU(d,nP)}$). Note that the nP used in rep(-5,nP), rep(5,nP) and d = nP is the number of parameters of the regression model (it is set before the list is created; see the testing section below), whereas the list element nP is the population size.

# pso parameters
pso <- list(
min = rep(-5,nP),   # lower bound for initial parameters
max = rep( 5,nP),   # upper bound for initial parameters
c1 = 1.0,
c2 = 2.0,
iner = 0.75,        # inertia
initV = 1.00,       # initial velocity
d = nP,             # number of parameters
nP = 200,           # population size
nG = 100            # number of generations
)

We will have $\mathtt{nP} \times \mathtt{nG}$ objective function evaluations within the generations (with the settings here: 20,000), plus $\mathtt{nP}$ evaluations for the initial population.
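This can be checked directly from the list:

    pso$nP * pso$nG    # 200 * 100 = 20000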

Data

We use the setup of Salibian-Barrera and Yohai [4]. A function to generate the data could look like this (not vectorised, but speed is not relevant here).

# create data
genData <- function(nP,nO,ol,dy)
{
# create data as in Salibian-Barrera & Yohai [4]
# nP = number of regressors (36 in S-B & Y)
# nO = number of observations (400)
# ol = number of outliers (40)
# dy = outlier size ('M' in S-B & Y 2006): 90 to 200

y   <- mRN(nO,1);
X   <- cbind(as.matrix(numeric(nO)+1),mRN(nO,nP-1))

zz <- sample(nO);
z  <- cbind(1,100, array(0, dim = c(1,nP-2)));
for (i in 1:ol){
    X[zz[i],] <- z;
    y[zz[i]]  <- dy;
}
return(list(X = X,y = y))
}

The function returns a list containing $X$ and $y$.
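A quick check that the function behaves as expected (illustration only; the same parameter values are used in the testing section below):

    aux <- genData(nP = 3, nO = 400, ol = 40, dy = 150)
    dim(aux$X)       # 400 3
    length(aux$y)    # 400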

The objective function

Our solutions are stored in a matrix mP, which we pass to the objective function. Alternatively, we could have called a function of the apply family from the main algorithm. However, for objective functions that can be vectorised, the setup used here makes the code easier to reuse. Here, we use the Least Trimmed Squares (LTS) estimator.

# objective function
OF <- function(mParam,data)
{
# LTS
# retrieve data from list
X <- data$X
y <- data$y
nO <- dim(X)[1]
nP <- dim(X)[2]
h <- (nO+nP+1) %/% 2

# compute residuals [mParam is a matrix!]
aux <- as.vector(y) - X %*% mParam    # as.vector(y) to allow recycling
aux <- aux * aux
aux <-  apply(aux,2,sort, partial = h)
return(colSums(aux[1:h,]))
}

We use only a partial sort here: for the sum of the $h$ smallest squared residuals, the order among these values does not matter.
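Since the objective function is vectorised over the columns of mParam, it can be evaluated on a whole matrix of candidate solutions at once; a small sketch (the object names are only for illustration):

testData <- genData(nP = 3, nO = 400, ol = 40, dy = 150)
testList <- list(y = testData$y, X = testData$X)
mTheta   <- mRN(3,10)          # 10 random candidate solutions
OF(mTheta, testList)           # vector of 10 objective function values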

Testing the algorithm

We compare some results with the ltsReg function from the robustbase package [5]. This is not meant as a contest, but rather aims to show that PS is capable of solving the model.

We start with creating data

nP <- 3; nO <- 400; ol <- 40; dy <- 150; h <- (nO+nP+1) %/% 2
aux <- genData(nP,nO,ol,dy); X <- aux$X; y <- aux$y
dataList <- list(y = y,X = X)

and setting up the pso-list.

# pso parameters
pso <- list(
min = rep(-5,nP),   # lower bound for initial parameters
max = rep( 5,nP),   # upper bound for initial parameters
c1 = 1.0,
c2 = 2.0,
iner = 0.75,        # inertia
initV = 1.00,       # initial velocity
d = nP,             # number of parameters
nP = 200,           # population size
nG = 100            # number of generations
)

The following figure shows the data ($y$ and $x_2$) with the clearly visible outliers (here set to 90) and a Least-Squares regression line.

[Figure dataWiki.eps: $y$ plotted against $x_2$, with the outliers and a Least-Squares regression line]
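A plot of this kind can be produced with base graphics; a minimal sketch, assuming the data created by genData above (the figure's Least-Squares line may have been computed differently):

plot(X[,2], y, xlab = "x2", ylab = "y")    # the outliers sit at x2 = 100
abline(lm(y ~ X[,2]), col = "grey")        # Least-Squares line of y on x2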

Now we are ready to run our algorithm.

    system.time(sol <- PSO(pso = pso, dataList = dataList, OF = OF))

As a comparison, we ran

    system.time(ltsReg(dataList$y ~ dataList$X[,-1], adjust = TRUE, nsamp = 5000))

(both functions should take about 3-5 sec on a standard PC).

In order not to depend on a single data set, we ran the data-generating model of Salibian-Barrera and Yohai [4] (see the function genData given above) 50 times, with 400 observations and 40 outliers (set to 150). For each data set, we estimated the coefficients with ltsReg and with PSO and computed the quantity

(1)
\begin{align} \dfrac{Q_{\text{PS}}}{Q_{\text{fastLTS}}} \end{align}

where

(2)
\begin{align} Q_{\text{fastLTS}} = \sum_{i=1}^h{r^2_{[i]}} \end{align}

for the residuals from the estimates obtained from ltsReg, and

(3)
\begin{align} Q_{\text{PS}} = \sum_{i=1}^h{r^2_{[i]}} \end{align}

for the residuals from PS. In both expressions, $r^2_{[i]}$ denotes the $i$th order statistic (the $i$th smallest value) of the squared residuals, and $h$ is computed as

    h <- (nO+nP+1) %/% 2
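For a single data set, the ratio in Equation (1) could be computed as in the following sketch; it reuses dataList, h and the pso list from above and evaluates the LTS criterion at the coefficients returned by ltsReg (the object names Qps and Qfast are only for illustration):

library("robustbase")

sol <- PSO(pso = pso, dataList = dataList, OF = OF)   # OF = LTS objective
Qps <- sol$OFvalue

fit   <- ltsReg(dataList$y ~ dataList$X[,-1], adjust = TRUE, nsamp = 5000)
res   <- dataList$y - dataList$X %*% coef(fit)        # residuals at ltsReg coefficients
Qfast <- sum(sort(res^2, partial = h)[1:h])           # sum of the h smallest squared residuals

Qps / Qfast    # below 1: PS found a lower LTS objective value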

The following figure gives the distribution of the ratio computed in Equation (1) for a model with three regressors. A value of 1 implies that both methods gave the same answer in terms of their objective functions; a value below unity indicates that the PS algorithm performed better.

[Figure LTSwiki.eps: distribution of the ratio in Equation (1) for a model with three regressors]

Another criterion

Let us demonstrate the flexibility of PS by replacing the objective function: we now use the Least Median of Squares (LMS) estimator. The only thing we need to do is to rewrite the objective function.

# objective function
OF <- function(mParam,data)
{
# LMS
# retrieve data from list
X <- data$X
y <- data$y
nO <- dim(X)[1]
nP <- dim(X)[2]
h <- (nO+nP+1) %/% 2

# compute residuals [mParam is a matrix!]
aux <- as.vector(y) - X %*% mParam    # as.vector(y) to allow recycling
aux <- aux * aux
aux <-  apply(aux,2,sort, partial = h)
return(aux[h,])
}
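With the new objective function in place, the optimisation is run exactly as before:

    sol <- PSO(pso = pso, dataList = dataList, OF = OF)
    sol$OFvalue    # now the h-th order statistic of the squared residuals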

Here we compare the results with those from the function lqs from the MASS package [6]. The function is called with

    lqs(dataList$y ~ dataList$X[,-1], adjust = TRUE, nsamp = 50000, method = "lqs", quantile=h)
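As with LTS, the two criterion values can be compared directly for a single data set; a sketch that evaluates the criterion at the coefficients returned by lqs (object names are only for illustration):

library("MASS")

fitLQS <- lqs(dataList$y ~ dataList$X[,-1], adjust = TRUE,
              nsamp = 50000, method = "lqs", quantile = h)
res    <- dataList$y - dataList$X %*% coef(fitLQS)
Qlqs   <- sort(res^2, partial = h)[h]    # h-th smallest squared residual

sol$OFvalue / Qlqs    # ratio as in Equation (1), now for the LMS criterion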

The following figure gives results analogous to the picture for LTS (note the changed $x$-scale).

[Figure LMSwiki.eps: distribution of the criterion ratio for LMS]

Internal Links

Concepts: Particle Swarm
Related Articles: Least Median of Squares Estimation by Optimization Heuristics with an Application to CAPM


References
1. Gilli, M., D. Maringer and E. Schumann. (2011). Numerical Methods and Optimization in Finance. Elsevier.
2. Gilli, M. and E. Schumann (2009). Robust regression with optimisation heuristics. COMISEF Working Paper Series 11.
3. R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. http://www.R-project.org.
4. Salibian-Barrera, M. and V. Yohai (2006). A Fast Algorithm for S-Regression Estimates. Journal of Computational and Graphical Statistics 15, 414–427.
5. Todorov, V., A. Ruckstuhl, M. Salibian-Barrera, T. Verbeke and M. Maechler (with original code by many authors, notably P. Rousseeuw and C. Croux; see file 'Copyrights'). robustbase: Basic Robust Statistics. R package version 0.4-5. Available from CRAN.
6. Venables, W.N. and B.D. Ripley. (2002). Modern Applied Statistics with S. 4th ed. Springer.
Weblinks
Particle Swarm Optimisation is available in the R package NMOF, currently available from R-Forge https://r-forge.r-project.org/projects/nmof/

COMISEF Working Paper Series
CRAN Task View: Robust Statistical Methods
External Links for 'Robust Statistics'
