Optimised U-Type Designs on Flexible Regions

Author

Chris Sharpe1

Keywords

U-type Designs, Central Composite Discrepancy, Flexible Regions, Threshold Accepting

Review Status

Unreviewed


Abstract

The concept of a ‘flexible region’ describes an infinite variety of symmetrical shapes to enclose a particular ‘region of interest’ within a space. In experimental design, the properties of a function on the ‘region of interest’ is analysed based on a set of design points. The choice of design points can be made based on some discrepancy criterion. In this article is discussed the generation of design points on a 'flexible region'. The Central Composite Discrepancy ($CCD$), a recently proposed discrepancy measure, is applied for measuring the 'evenness' in the distribution of design points over the 'flexible region'. Threshold Accepting (TA) is used to generate low discrepancy $U$-type designs. The results for the two dimensional case indicate that using an optimisation heuristic in combination with an appropriate discrepancy measure, it is possible to produce high quality experimental designs on ‘flexible regions’.

1. Introduction

The development by Draper and Guttman [2] of a ‘flexible region’ in experimental design built on their earlier work into other region shape types. The central question consists in understanding the effect of selecting particular points in the experimental domain (input space) for response surface analysis, where the functional relationship between input and output is not known. It is assumed that a desirable feature of the input space - that is, inputs that produce an output that meets some set objective - is localised. The sub-space selected for experimental examination is coined the ´region of interest', represented by a set of design points which are summarised in a design matrix.

The process of generating a good, low discrepancy design (using the $CCD$, as devised by Chuang and Hung [1]) is framed as a heuristic optimisation problem, with the $CCD$ the objective function to be minimised. Threshold Accepting is the heuristic technique implemented to obtain low discrepancy designs.

It has been applied to many experimental design problems (Fang et al [4], Fang et al [5] , and Winker, Ch. 11 [7]) and has a wide range of supporting research material. (See alsoFang [3] for a general overview on Experimental Design problems.)

2 Description of a Flexible Region

For any positive value of $m$, Draper and Guttman define a flexible region $R$ in dimension $s$ by the following constraint on the potential design points.

(1)
\begin{align} |x_{1}|^m + |x_{2}|^m + ,\cdots, + |x_{s}|^m \leq 1. \end{align}

By an adjustment of a single parameter, $m$, it becomes possible to obtain an infinite variety of intermediate symmetrical shapes. The approach is particularly useful as it offers the possibility for intuitive linguistic mapping to shape types. When applying a flexible region to a real problem Draper and Guttman point out that we do have some a priori knowledge as to what the region of interest should ‘look like’, which is first expressed in terms of natural language. For example, a specificiation in natural language maybe for a shape that, "covers the space around the corners but not right up to corners'' . This is mapped to a specific value $m$, for the shape of the ‘flexible region’. This process offers some intriguing research possibilities2. The graphs below show some examples for different values of $m$ in the two dimensional case.

flexibleRegions.pdf

This definition is for the hypercube $[-1,1]^{s}$, while in recent work on experimental design the hypercube $C^{s} = [0,1)^{s}$ is considered. Therefore, the following modification is made to the original definition: for a given shape parameter $m > 0$, the flexible $R_{m}$ is defined by,

(2)
\begin{align} R_{m} = {x{_2} [0, 1]{^s} | (| 2 (x{_1} {-} 0.5)|^m + \dots + | 2 (x{_s} {-} 0.5) | {^m}) \leq 1}. \end{align}

In reference to the concept of $U$-type designs, only design points lying on a grid over the unit cube are considered.

3 U-type Design

We define a $U$-type design $\mathcal{P}$ as a set of $n$ points, $\mathcal{P} = {x_{1}, \dots ,x_{n}}$, sampled from the $s$-dimensional unit cube $C{^s}$ on a grid with $q^{s}$ points. This set of points can also be described by a n × s design matrix $U$, where each row corresponds to one run and each column to one factor. Thereby, the factors can take on $l = 1, \dots , q$ different levels. The correspondance between the set of design point sets and the set of design matrices $U(n, q^{s})$ is given by
the transformation $\frac{2l−1}{2q} , l = 1, \dots , q$.

4 Central Composite Discrepancy

There are several common measures for uniformity, amongst which the Centered Discrepancy proposed by Hickernell [6] would appear at first glance to be a reasonable choice given that a flexible region is a symmetrical shape with its origin at the centre of the region of interest. However, in common with other discrepancy measures, the discrepancy value is for a cuboid shaped region and cannot be adapted for other shape types.

The principle idea of the $CCD$ - and the reason why it can be applied to flexible regions - is that measurement is not taken from one fixed point: every point in $R$ is considered a center point. This removes the constraint on the shape type and, hence, makes it possible to implement the idea of a flexible region.

The $CCD$ for a set of points $\mathcal{P}$ in a region of interest $R$ is defined by

(3)
\begin{align} CCD_{p}(n,\mathcal{P}) &=& \biggl\{ \frac{1}{v(D)} \int_D \frac{1}{2^{K}} \sum_{k=1}^{2^{K}}\biggl|{\frac{N(D_{k}(x),\mathcal{P})}{n} - \frac{v(D_{k}(x))}{v(D)}}\biggr|^{p} dx \biggr\}^{1/p}\,. \end{align}

An optimal $U$-type design on $R$ is obtained if the $n$ design points from $\mathcal{P}$ are distributed in $R$ in a way to minimise $CCD_{p}(n,P)$, or more formally,

(4)
\begin{align} \mathcal{P^{*}}&=& arg \min_{P \in Z(n)} CCD_{p}(n,\mathcal{P})\,. \end{align}

One drawback is that, at present, there is no general analytical formula for the $CCD$, and this imposes a limit on the number of dimensions the design can take.

5. Experiment Description and Results

Experiment Setup

For all designs $U(n, q^{s})$ in the experiment the dimensions are set to $s = 2$, and the number of levels $q = 31$.

A smaller number of levels would make the optimisation process a trivial task, especially for flexible regions with $m < 1$. The level needs to be of fine enough resolution to well cover smaller regions (for small values of $m$) and allow the optimisation process the opportunity to be effective in lowering the discrepancy once points are moved into the flexible region. Basically, for higher values of $q$, there are more points to choose from.

The number of runs (design points), $n$, is one of the two varying factors in the experiment. $n = \{5,7,9,11\}$.

The parameter characterising the shape of the flexible region, $m$, is the other varying factor. $m = \{9999,2,1,0.5,0.3\}$.

This represents a general sample of flexible region shape types. Note that due to the constraint on $U$-type designs on the $q^{s }$ grid, the first value results in a covering of all grid points for the given design parameters.

Computational Resource Allocation and Convergence3

The algorithm is run with a total of$Its$ iterations ($Its = n_{R} \times n_{S}$, threshold reductions × design change iterations), and $Rep$ replications, the number of times we restart an individual experiment with different random design to start with and different random numbers for the selection of neighbours. The total computational resources used for one problem instance is $C = Its \times Rep$$.4

The number of replications is set at $Rep = 10$.

The computational load for a single evaluation of the objective function $CCD$ directly influences the upper limit of $Its$. Therefore, $Its = \{10^{5}, 1 0^{6}\}$.

This enables an assesment to be made if the empirical results conform to convergence theory. That is, if the quality of the results improves with increasing $Its$.

Results

The results shown in the table are for $10^{6}$ . Given that $10$ replications were run for each problem instance, the results obtained can be interpreted as an empirical distribution.

Unsupported math environment "tabular"

To corroborate the quality of these designs, the graphs below represent the distribution of points in the flexible region for the designs corresponding to the ‘Best’ discrepancy values reported above in the first column of the table.

optimisedDesigns.pdf

It can be seen that the points are evenly distributed over the design space, exploiting the space available as best possible, most noticeably when the flexible region is contracted. Therefore, it appears that using the $CCD$ as a measure of uniformity is suitable for design optimisation on flexible regions.

The statistical distribution of the discrepancy for optimised designs is assumed to have a left truncated distribution. As $I$, increases, $\hat{\mu}$ should converge to this lower truncation point, while $\hat{\sigma}$ decreases to zero. The tenfold increase in $I$ produces lower discrepancy values for $m > 0.5$, where it is less likely to consistently find the global minimum. The mean, $\hat{\mu}$, moves closer to the lowest value found, and the variance $\hat{\sigma}^{2}$ is squeezed. This is as predicted. In the case $m = 1$, $U(5, 31^{2})$ (row 9 in the table), the best value is found in all $10$ restarts.

This suggests (although does not prove) that this is the global minimum with an optimal design. It can be seen that as $m < 1$, the frequency of finding the best value increases, and at $m = 0.3$, we have all but the last result always finding the best value. For larger flexible regions , when $m > 1$, we cannot state how close our values of CCD are to the global minimum as no lower bounds are known.

Evaluation

Given that no lower bounds are available for the $CCD$, it cannot be proven that these designs are actually optimum designs. Therefore, we provide some additional evidence on the quality of the results based on a comparison with randomly generated $U$-type designs on the flexible region, for $I = 10^5$ random designs on the flexible region. The random designs had a much worse statistical profile than optimised designs in all cases: the CCD values are substantially higher in mean and exhibited a larger standard deviation. And not only the mean results for the optimised designs are always better than the $1\%$-quantile of the randomly generated designs, the same finding even holds for the worst outcomes of the optimisation procedure even with only $I = 10^{5}$ iterations. This serves to reinforce the point that optimisation is absolutely worth the additional time and effort required to obtain better designs.

5 Conlusion

Here is analysed the designs for flexible regions, as devised by Draper and Guttman[2] , that enclose some region of interest within a larger design space. Points on a grid are selected from the region of interest, resulting in $U$-type designs. A TA implementation is used to optimise these $U$-type designs.

The results indicate that:

  • As a measure of discrepancy, the $CCD$, is well suited to flexible regions.
  • Using the TA heuristic, it is possible to produce optimised, low discrepancy designs that distribute design points uniformly over the region of interest.
  • In some cases, in particular for small flexible regions, it might be speculated that the resulting designs are in fact already optimal designs.

There are two obvious paths that can be taken for the next course of research on flexible regions.

  • Run experiments and produce results for higher design dimensions.
  • Apply the method to a real meta modelling problem, such as the example laid out in the original paper by Draper and Guttman in order to demonstrate to what extent optimised $U$-type designs on flexible regions might improve the results.

Internal Links

Concepts
Threshold Accepting
Tutorials
Tips
Related Articles

External links

References
1. Chuang, S.C. and Y.C. Hung (2009). Uniform design over general input domains with applications to target region estimation in computer experiments. Manuscript ???, forthcoming.
2. Draper, N.R. and I. Guttman (1986). Response surface designs in flexible regions. Journal of the American Statistical Association 81, 1089–1094.
3. Fang, K.-T., R. Li and A. Sudjianto (2006). Design and Modeling for Computer Experiments. Chapman & Hall/CRC. Boca Raton, FL.
4. Fang, K.-T., D.K.J. Lin, P. Winker and Y. Zhang (2000). Uniform design: Theory and application. Technometrics 42, 237–248.
5. Fang, K.-T., D. Maringer, Y. Tang and P. Winker (2005). Lower bounds and stochastic optimization algorithms for uniform designs with three or four levels. Mathematics of Computation 75(254), 859–878.
6. Hickernell, F.J. (1996). A generalized discrepancy and quadrature error bound. Mathematics of Computation 67, 299–322.
7. Winker, P. (2001). Optimisation Heuristics in Econometrics: Applications of Threshold Accepting. Wiley. Chichester.
8. Winker, P. (2006). The stochastics of Threshold Accepting: Analysis of an application to the uniform design problem. In: COMPSTAT 2006, Proceedings in Computational Statistics (A. Rizzi and M. Vichi, Eds.).31 pp. 495–503. Physica. Heidelberg.
Weblinks
Prof. Kai-Tai Fang's homepage, for a vast array of resources on Experimental Design.
Here is an online version of the Working Paper, ID WP-013.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License