Robust regression with Particle Swarm Optimisation in R

## Keywords

Least Median of Squares, Least Trimmed Squares, Particle Swarm Optimisation, R, robust regression

Unreviewed

# 1 Introduction

This tutorial explains how to implement a Particle Swarm (PS) optimisation for robust regression as used in Gilli and Schumann [2]. It is still ongoing research, so comments are highly welcome.

The chosen examples are LTS- and LMS-regression.

# 2 The algorithm

The PS algorithm is implemented as a function PSO that is called with three arguments:

    PSO(pso,dataList,OF)


pso is a list that holds the settings of the PS algorithm. dataList is a list that holds the pieces of data necessary to evaluate the objective function. For a regression model, dataList will typically contain the matrix $X$ and the vector $y$. OF is the objective function. In principle, we need not pass these data, since the scoping rules in R also allow to access objects outside a function's environment. However, it makes for a clearer structure of the algorithm to pass all required objects explicitly to the function.

The function will return a list of two elements: beta which is the estimated parameter vector of the model, and OFvalue, the objective function value associated with this solution.

# particle swarm optimisation -- the main algorithm
PSO <- function(pso,dataList,OF)
{
# set up initial population and velocity
mP <- pso$min + diag(pso$max - pso$min) %*% mRU(pso$d,pso$nP) mV <- pso$initV * mRU(pso$d,pso$nP)

# evaluate initial population
vF <- OF(mP,dataList)

# set up best solutions
mPbest <- mP                 # matrix of personally best' solutions
vFbest <- vF                 # vector of OF of best solutions
sGbest <- min(vFbest)            # scalar: best OF-value
sgbest <- which.min(vFbest)[1]    # scalar: best solution (counter)

# start generations
for (g in 1:pso$nG){ # update population mDV <- pso$c1 * mRU(pso$d,pso$nP) * (mPbest - mP) + pso$c2 * mRU(pso$d,pso$nP) * (mPbest[,sgbest] - mP) mV <- pso$iner * mV + mDV
mP  <- mP + mV

# evaluate updated population
vF <- OF(mP,dataList)
vF[!(is.finite(vF))] <- 10000    # a large value for NAs etc.

# find improvements
logik <- vF < vFbest        # improved solutions
mPbest[,logik] <- mP[,logik]
vFbest[logik] <- vF[logik]

# find best solution
if (min(vF) < sGbest){
sGbest <- min(vF)
sgbest <- which.min(vF)[1]
}
} # end generations

# return best solution
return(list(beta = mPbest[,sgbest], OFvalue = sGbest))
}


For convenience, we have defined two auxiliary functions

# auxiliary functions
mRU <- function(m,n)
# returns a matrix of size mxn of uniform random variates
{
return(array(runif(m*n), dim = c(m,n)))
}

mRN <- function(m,n)
# returns a matrix of size mxn of normal random variates
{
return(array(rnorm(m*n), dim = c(m,n)))
}


which act like rand(m,n) and randn(m,n) in Matlab.

## Settings

PS requires us to set several parameters. These are collected in the list pso and then passed to the function. For a discussion of these parameters, see Gilli and Schumann [2]. Initial velocity initV serves as a factor to the randomly set the velocities (computed as $\mathtt{initV} \times \mathtt{mRU(d,nP)}$).

# pso parameters
pso <- list(
min = rep(-5,nP),   # lower bound for initial parameters
max = rep( 5,nP),   # upper bound for initial parameters
c1 = 1.0,
c2 = 2.0,
iner = 0.75,        # inertia
initV = 1.00,       # initial velocity
d = nP,             # number of parameters
nP = 200,           # population size
nG = 100            # number of generations
)


We will have $\mathtt{nP} \times \mathtt{nG}$ objective function evaluations (with the settings here: 20,000).

## Data

We use the setup of Salibian-Barrera and Yohai [4]. A function to generate the data could look like this (not vectorised, but speed is not relevant here).

# create data
genData <- function(nP,nO,ol,dy)
{
# create data file [Salibian-Barrera & Yohai 2006]
# nP  = 36;       % regressors
# nO  = 400;      % nObs
# ol  = 40;       % number of outliers
# dy  = 110;      % outlier size ('M' in S-B&Y 2006): 90 to 200

y   <- mRN(nO,1);
X   <- cbind(as.matrix(numeric(nO)+1),mRN(nO,nP-1))

zz <- sample(nO);
z  <- cbind(1,100, array(0, dim = c(1,nP-2)));
for (i in 1:ol){
X[zz[i],] <- z;
y[zz[i]]  <- dy;
}
return(list(X = X,y = y))
}


The function returns a list containing $X$ and $y$.

## The objective function

Our solutions are stored in a matrix mP which we pass to the objective function. Alternatively, we could have called a function of the apply-family from the main algorithm. However, for objective functions that could be vectorised, this setup here allows for easier reuse of the code. Here, we use the Least Trimmed Squares (LTS) estimator.

# objective function
OF <- function(mParam,data)
{
# LTS
# retrieve data from list
X <- data$X y <- data$y
nO <- dim(X)[1]
nP <- dim(X)[2]
h <- (nO+nP+1) %/% 2

# compute residuals [mParam is a matrix!]
aux <- as.vector(y) - X %*% mParam    # as.vector(y) to allow recycling
aux <- aux * aux
aux <-  apply(aux,2,sort, partial = h)
return(colSums(aux[1:h,]))
}


We use only a partial sort here.

# Testing the algorithm

We compare some results with the ltsReg function implemented in the robustbase package. This is not meant as a contest, but rather aims to show that PS is capable of solving the model.

nP <- 3;nO <- 400;ol <- 40;dy <- 150;h <- (nO+nP+1) %/% 2
aux <- genData(nP,nO,ol,dy);X <- aux$X;y <- aux$y
dataList <- list(y = y,X = X)


and setting up the pso-list.

# pso parameters
pso <- list(
min = rep(-5,nP),   # lower bound for initial parameters
max = rep( 5,nP),   # upper bound for initial parameters
c1 = 1.0,
c2 = 2.0,
iner = 0.75,        # inertia
initV = 1.00,       # initial velocity
d = nP,             # number of parameters
nP = 200,           # population size
nG = 100            # number of generations
)


The following figure shows the data ($y$ and $x_2$) with the clearly visible outliers (here set to 90) and a Least-Squares regression line.

Now we are ready to run our algorithm.

    system.time(sol <- PSO(pso = pso, dataList = dataList, OF = OF))


As a comparison, we ran
    system.time(ltsReg(dataList$y ~ dataList$X[,-1], adjust = TRUE, nsamp = 5000))


(both functions should take about 3-5 sec on a standard PC).

In order not to depend on a single data set, we ran 50 times the data-generating model of Salibian-Barrera and Yohai [4] (see the function genData given above) with 400 observations and 40 outliers (set to 150). For each data set, we estimated coefficients with ltsReg and PSO and compute the quantity

(1)
\begin{align} \dfrac{Q_{\text{PS}}}{Q_{\text{fastLTS}}} \end{align}

where

(2)
\begin{align} Q_{\text{fastLTS}} = \sum_{i=1}^h{r^2_{[i]}} \end{align}

for the residuals from the estimates obtained from ltsReg, and

(3)
\begin{align} Q_{\text{PS}} = \sum_{i=1}^h{r^2_{[i]}} \end{align}

for the residuals from ps. The $h$th order statistic of the squared residuals is defined by

    h <- (nO+nP+1) %/% 2


The following figure gives the distribution of the ratio computed in Equation X for a model with three regressors. A value of 1 implies both models gave the same answer in terms of their objective functions, a value below unity indicates that the PS algorithm performed better.

## Another criterion

Let us demonstrate the flexibility of PS by replacing the objective function: we now use the Least Median of Squares (LMS) estimator. The only thing we need to do is to rewrite the objective function.

# objective function
OF <- function(mParam,data)
{
# LMS
# retrieve data from list
X <- data$X y <- data$y
nO <- dim(X)[1]
nP <- dim(X)[2]
h <- (nO+nP+1) %/% 2

# compute residuals [mParam is a matrix!]
aux <- as.vector(y) - X %*% mParam    # as.vector(y) to allow recycling
aux <- aux * aux
aux <-  apply(aux,2,sort, partial = h)
return(aux[h,])
}


Here we compare the results with results from the function lqs from the MASS package. The function is called with

    lqs(dataList$y ~ dataList$X[,-1], adjust = TRUE, nsamp = 50000, method = "lqs", quantile=h)
`

The following figure gives results analogous to the picture for LTS (note the changed $x$-scale).