OPTCON2: An algorithm for the optimal control of nonlinear stochastic models

## Keywords

control theory, stochastic dynamic optimum problems.

# Abstract

This article presents OPTCON2, a new version of the algorithm OPTCON. The algorithm computes approximate solutions of optimum control problems in which the objective function is quadratic and the dynamic multivariable system is nonlinear. Both additive and multiplicative (parameter) uncertainty are present in the dynamic model. OPTCON2 differs from the basic algorithm OPTCON in how it deals with the stochastic parameters during the computation of the optimal control variables. OPTCON2 uses passive learning, i.e. the stochastic parameters are updated in each time period. This updating is expected to make the results of stochastic optimum control problems more accurate and more reliable than a computation without updates.

# 1 Introduction

Optimal control of dynamic systems is an active area of research because of its relevance to many applications in economics, engineering and other fields. Optimal control problems are well understood, especially linear deterministic ones, and a large literature is available for learning the area. Basic concepts for solving control problems can be found in [2], [3], [5], [6]. The aim of our research is to develop a reliable algorithm that solves optimum control problems with a quadratic objective function and a nonlinear dynamic multivariable system under additive and parameter uncertainties. In 1992, R. Neck and J. Matulka developed the algorithm OPTCON [1], which solves such optimum control problems and combines the concepts of nonlinearity and stochastics. OPTCON uses a simple open-loop strategy for dealing with the stochastic parameters during the computation of the optimum control variables, and it serves as the basic instrument for our research. The algorithm can be extended to obtain a more reliable method for solving such problems. While the basic algorithm OPTCON uses the same information about the stochastic parameters in every time period, the new version OPTCON2 updates the stochastic parameters in each period. Following Kendrick's approach in [2], the update is carried out using the idea of the Kalman filter, in the expectation of more reliable results for stochastic optimum control problems.

# 2 The problem

The algorithm OPTCON/OPTCON2 is designed to provide approximate solutions to optimum control problems with a quadratic objective function (a loss function to be minimized) and a nonlinear multivariate discrete-time dynamic system under additive and parameter uncertainties. The intertemporal objective function is formulated in quadratic tracking form, which is quite often used in applications of optimum control theory to econometric models. It can be written as

(1)
\begin{align} J=\sum^T_{t=S}L_t(x_t,u_t) \end{align}

with

(2)
\begin{align} L_t(x_t,u_t)=\frac{1}{2}\left( \begin{array}{c} x_t-\tilde{x}_t\\ u_t-\tilde{u}_t\\ \end{array} \right)'W_t\left( \begin{array}{c} x_t-\tilde{x}_t\\ u_t-\tilde{u}_t\\ \end{array} \right). \end{align}

$x_t$ is an n-dimensional vector of state variables that describes the state of the economic system at any point in time. $u_t$ is an m-dimensional vector of control variables, whose values can be set during the process. $\tilde{x}_t\in R^n$ and $\tilde{u}_t\in R^m$ are the given 'ideal' levels of the state and control variables, respectively. $S$ denotes the initial and $T$ the terminal time period of the finite planning horizon. $W_t$ is the weight matrix, which can be defined as

(3)
\begin{align} W_t=\left( \begin{array}{cc} W_t^{xx} & W_t^{xu} \\ W_t^{ux} & W_t^{uu} \\ \end{array} \right), \quad \text{for } t=S, ..., T \end{align}

where $W_t^{xx}$, $W_t^{xu}$, $W_t^{ux}$ and $W_t^{uu}$ are $(n\times n)$, $(n\times m)$, $(m\times n)$ and $(m\times m)$ matrices, respectively.
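As a minimal sketch of (1)-(3), the following Python functions evaluate the stage loss and the intertemporal objective on plain lists. The function names and the list-based data layout are illustrative assumptions and are not part of OPTCON2.

```python
def stage_loss(x, u, x_ideal, u_ideal, W):
    """Quadratic tracking loss L_t of equation (2).

    x, u             : state and control vectors (lists of floats)
    x_ideal, u_ideal : their 'ideal' levels
    W                : (n+m) x (n+m) weight matrix as a list of rows, eq. (3)
    """
    # Stack the deviations (x - x~, u - u~) into one vector d.
    d = [a - b for a, b in zip(x, x_ideal)] + [a - b for a, b in zip(u, u_ideal)]
    size = len(d)
    quad = sum(d[i] * W[i][j] * d[j] for i in range(size) for j in range(size))
    return 0.5 * quad


def total_loss(xs, us, x_ideals, u_ideals, Ws):
    """Intertemporal objective J of equation (1): sum of L_t over t = S, ..., T."""
    return sum(
        stage_loss(x, u, xi, ui, W)
        for x, u, xi, ui, W in zip(xs, us, x_ideals, u_ideals, Ws)
    )
```

For one state and one control with $W_t=\mathrm{diag}(2, 1)$, deviations of $2$ and $-1$ give $L_t = \frac{1}{2}(2\cdot 4 + 1\cdot 1) = 4.5$.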

Next the dynamic system of nonlinear difference equations has to be defined:

(4)
\begin{align} x_t=f(x_{t-1},x_t, u_t, \theta, z_t)+\varepsilon_t, \enspace t=S, ..., T \end{align}

where $\theta$ is a p-dimensional vector of unknown parameters that denotes parameter uncertainty, $z_t$ denotes an l-dimensional vector of non-controlled exogenous variables, and $\varepsilon_t$ is an n-dimensional vector of additive disturbances (additive uncertainty). $\theta$ and $\varepsilon_t$ are assumed to be independent random vectors with known expectations ($\hat{\theta}$ and $O_n$, respectively) and covariance matrices ($\Sigma^{\theta\theta}$ and $\Sigma^{\varepsilon\varepsilon}$, respectively). $f$ is a vector-valued function, and $f^i(...)$ denotes its i-th component, $i=1, ..., n$.
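Because $x_t$ appears on both sides of (4), the state of each period generally has to be computed numerically. The sketch below does this for a scalar state by fixed-point iteration. The text does not specify a solver, so this choice is an assumption, and it works only when $f$ is a contraction in $x_t$. The model `toy_f` and all function names are hypothetical.

```python
import math


def toy_f(x_prev, x_cur, u, theta, z):
    """Hypothetical scalar system function; x_cur enters through tanh."""
    return theta[0] * x_prev + theta[1] * u + 0.1 * math.tanh(x_cur) + z


def solve_period(f, x_prev, u, theta, z, eps=0.0, tol=1e-12, max_iter=200):
    """Solve x = f(x_prev, x, u, theta, z) + eps by fixed-point iteration."""
    x = x_prev  # starting guess: last period's state
    for _ in range(max_iter):
        x_new = f(x_prev, x, u, theta, z) + eps
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def simulate(f, x_init, controls, theta, exog, noise=None):
    """Roll equation (4) forward from x_{S-1} = x_init over t = S, ..., T."""
    if noise is None:
        noise = [0.0] * len(controls)  # deterministic path
    path, x = [], x_init
    for u, z, eps in zip(controls, exog, noise):
        x = solve_period(f, x, u, theta, z, eps)
        path.append(x)
    return path
```

Calling `simulate` with zero noise and tentative controls yields a tentative state path; passing a noise sequence yields one realized path.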

# 3 OPTCON2

Input: $f(...)$, the tentative values $x_{S-1}=\overset{\circ}{x}_{S-1}$ and $(u_t)_{t=S}^T=(\overset{\circ}{u}_t)_{t=S} ^T$, $\hat{\theta}_{S-1}=\hat{\theta}_{S-1/S-1}$, $\Sigma^{\theta\theta}_{S-1}=\Sigma^{\theta\theta}_{S-1/S-1}$, $E(\varepsilon)=0$ and $\Sigma^{\varepsilon\varepsilon}$, $(z_t)_{t=S}^T$

Output: $(x_t^*)_{t=S}^T$, $(u_t^*)_{t=S}^T$ and $J^*$

A simplified scheme of OPTCON2 is as follows:

Step I Solve the nonlinear system of equations and obtain the tentative path $(\overset{\circ}{x}_t)_{t=S}^T$. Thus the tentative path $(\overset{\circ}{x}_t, \overset{\circ}{u}_t)_{t=S}^T$ is known.

Step II Generate $MCruns$ sets of random system noises $(\varepsilon^m_t)_{t=S}^T$ and $\mu^m$ (for $\theta^m = \hat{\theta} + \mu^m$), where $m=1, ..., MCruns$.

Step III For each MC run m, i.e. for each $((\varepsilon^m_t)_{t=S}^T, \mu^m)$, do:
Step III-1 For each t from S to T do

• Find the open-loop solution for the subproblem $(t, ..., T)$: $u_t^*$ and $x_t^{*}=f(x_{t-1}^{a*},u^*_{t}, \theta^m)$
• Calculate $x_t^{a*}=f(x_{t-1}^{a*},u^*_{t}, \hat{\theta}, \varepsilon^m_t)$
• Update $\theta^m$ using $x_t^{*}$ and $x_t^{a*}$: get new $\theta^m$ and $\Sigma^{\theta\theta}$

Step III-2 Calculate the objective function $J^*$

Only the new 'update' step is described in more detail here; the other steps are similar to the corresponding steps in OPTCON [1].

To update the stochastic parameters we use the idea of the Kalman filter. The update procedure consists of two parts: prediction and correction. The prediction step computes predicted values of the variables from the corrected estimate of the previous time period. The correction step improves the predicted values using the actual measurement.

Update:
Prediction:
a) $\hat{x}_{t/t-1}=f(x^{a*}_{t-1},u_t^{*},\theta^m_{t-1/t-1})=x^{*}_{t}$, $\theta^m_{t/t-1}=\theta^m_{t-1/t-1}$

b) $\Sigma^{xx}_{t/t-1}=F^x_{\theta t-1}\Sigma^{\theta\theta}_{t-1/t-1}(F^x_{\theta t-1})'+\Sigma^{\varepsilon\varepsilon}_t$, $\Sigma^{x \theta}_{t/t-1}=(\Sigma^{\theta x}_{t/t-1})'=F^x_{\theta t-1}\Sigma^{\theta\theta}_{t-1/t-1}$ and $\Sigma^{\theta\theta}_{t/t-1}=\Sigma^{\theta\theta}_{t-1/t-1}$

where $F^x_{\theta t-1}$ is the derivative of the system function $f$ with respect to $\theta$.

Correction:
a) $\Sigma^{\theta\theta}_{t/t}=\Sigma^{\theta\theta}_{t/t-1}-\Sigma^{\theta x}_{t/t-1}(\Sigma^{xx}_{t/t-1})^{-1}\Sigma^{x\theta}_{t/t-1}$

b) $\theta^m_{t/t}=\theta^m_{t/t-1}+\Sigma^{\theta x}_{t/t-1}(\Sigma^{xx}_{t/t-1})^{-1}[x^{a*}_{t}-x^*_{t}]$ and $\hat{x}_{t/t}=x_t^{a*}$

Thus, using the idea of Kalman Filter we update $cov(\theta)=\Sigma^{\theta\theta}$ and $\theta^m$.
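The prediction and correction formulas above can be sketched in Python for the special case of a scalar state ($n=1$), so that $(\Sigma^{xx}_{t/t-1})^{-1}$ reduces to a division. This is an illustration under stated assumptions, not the C# implementation: the derivative $F^x_{\theta}$ is approximated by central finite differences, the system function is taken as `f(x_prev, u, theta)` without exogenous variables, and all names are hypothetical.

```python
def grad_theta(f, x_prev, u, theta, h=1e-6):
    """Central-difference approximation of F_theta = df/dtheta (a 1 x p row)."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(x_prev, u, up) - f(x_prev, u, down)) / (2.0 * h))
    return grad


def kalman_update(f, x_prev, u, theta, sigma_tt, sigma_ee, x_actual):
    """One prediction/correction step for theta and Sigma^{theta theta}.

    theta    : current estimate theta_{t-1/t-1} (list of p floats)
    sigma_tt : p x p covariance Sigma^{theta theta}_{t-1/t-1}
    sigma_ee : variance of the additive disturbance (scalar state)
    x_actual : realized state x_t^{a*}
    """
    p = len(theta)
    # Prediction a): x_{t/t-1} = x*_t, theta_{t/t-1} = theta_{t-1/t-1}
    x_pred = f(x_prev, u, theta)
    F = grad_theta(f, x_prev, u, theta)
    # Prediction b): Sigma^{x theta} = F Sigma^{theta theta},
    #                Sigma^{xx} = F Sigma^{theta theta} F' + Sigma^{eps eps}
    s_xt = [sum(F[j] * sigma_tt[j][i] for j in range(p)) for i in range(p)]
    s_xx = sum(s_xt[i] * F[i] for i in range(p)) + sigma_ee
    # Gain Sigma^{theta x} (Sigma^{xx})^{-1}; Sigma^{theta x} = (Sigma^{x theta})'
    gain = [s / s_xx for s in s_xt]
    # Correction b): shift theta by gain times the innovation x^{a*}_t - x*_t
    theta_new = [theta[i] + gain[i] * (x_actual - x_pred) for i in range(p)]
    # Correction a): reduce the parameter covariance
    sigma_new = [[sigma_tt[i][j] - gain[i] * s_xt[j] for j in range(p)]
                 for i in range(p)]
    return theta_new, sigma_new
```

With a positive disturbance variance, the diagonal entries of the corrected covariance are never larger than those of the predicted one, which is the sense in which each observation adds information about $\theta$.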

This update of the stochastic parameters is carried out for each time period in each Monte Carlo run, using different random noise realizations. Each realization leads to different changes in the stochastic parameters and thus to different results of the control problem. One can then build a distribution of the results and characterize the optimal solution of the problem by its mean and variance. In this way one hopes to obtain more reliable results.
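The Monte Carlo structure of Steps II and III, together with the summary by mean and variance, can be outlined as follows for a scalar state and a scalar parameter. The sketch only fixes the order of operations: the callables `open_loop`, `realize`, `update` and `loss` are hypothetical placeholders for the corresponding parts of the algorithm, and drawing Gaussian noise is an assumption, since the text fixes only expectations and covariances.

```python
import random
import statistics


def draw_runs(n_runs, n_periods, sd_eps, sd_theta, seed=0):
    """Step II: one noise path (eps_t) and one parameter shock mu per run."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, sd_eps) for _ in range(n_periods)],
         rng.gauss(0.0, sd_theta))
        for _ in range(n_runs)
    ]


def monte_carlo(runs, theta_hat, var_theta, x_init,
                open_loop, realize, update, loss):
    """Step III: passive-learning loop; returns one objective value per run."""
    values = []
    for eps_path, mu in runs:
        theta_m, var_m = theta_hat + mu, var_theta
        x_act, total = x_init, 0.0
        for t, eps in enumerate(eps_path):
            # Open-loop solution of the subproblem (t, ..., T)
            u_star, x_star = open_loop(t, x_act, theta_m)
            # Realized state x_t^{a*}, including the drawn disturbance
            x_new = realize(x_act, u_star, theta_hat, eps)
            # Update of theta^m and its variance from x*_t and x_t^{a*}
            theta_m, var_m = update(theta_m, var_m, x_act, u_star, x_star, x_new)
            total += loss(t, x_new, u_star)
            x_act = x_new
        values.append(total)  # Step III-2: objective value of this run
    return values


def summarize(values):
    """Mean and sample variance of the objective values across runs."""
    return statistics.mean(values), statistics.variance(values)
```

Fixing the seed in `draw_runs` makes the noise draws reproducible, so that OPTCON-style and OPTCON2-style runs could be compared on identical realizations.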

# 4 Conclusion/ Further work

Finally, I summarize the results and plans of my research regarding OPTCON with passive learning. The new version of the algorithm with passive learning, OPTCON2, has been developed theoretically as described above and has been implemented in C#. My plans for the near future include testing OPTCON2 on some existing macroeconomic models and comparing the results with those of OPTCON.

# 5 Remarks

The open-loop solution is found in the following way:
a) Initialization for backward recursion

b) Backward recursion: $T, ..., t$

• Linearize the system equations: $x_t=A_t(\theta)x_{t-1}+B_t(\theta)u_t+c_t(\theta)+\xi_t,$ for $T, ..., t$
• Minimize $J$ to obtain the feedback rules $u_t^*=G_tx_{t-1}^*+g_t$, i.e. the coefficients $(G_T, ..., G_t)$ and $(g_T, ..., g_t)$

c) Forward recursion: $t, ..., T$

• $u_t^*=G_tx_{t-1}^*+g_t$
• $x_t^{*}=f(x_{t-1}^{a*},u^*_{t},\theta^m)$ for the time period $t$
• $x_t^{*}=f(x_{t-1}^{*},u^*_{t}, \theta^m)$ for the time periods $t+1, ..., T$

For the time period $t=S$: $x_{S-1}^{a*}=\overset{\circ}{x}_{S-1}$
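To illustrate the backward and forward recursions, the sketch below solves a deliberately simplified case: a scalar, already linearized, deterministic system $x_t=a\,x_{t-1}+b\,u_t+c$ with constant coefficients and a diagonal weight matrix with entries $w_x$ and $w_u$. These simplifications and all names are assumptions made for the example; OPTCON works with matrices, time-varying linearizations and stochastic parameters.

```python
def backward_recursion(a, b, c, w_x, w_u, x_ideal, u_ideal):
    """Feedback coefficients (G_t, g_t) of u_t = G_t x_{t-1} + g_t.

    The cost-to-go after period t is kept as 0.5*H*x_t**2 + h*x_t + const,
    starting from H = h = 0 beyond the terminal period.
    """
    periods = len(x_ideal)
    G, g = [0.0] * periods, [0.0] * periods
    H, h = 0.0, 0.0
    for t in reversed(range(periods)):
        K = w_x + H                  # quadratic weight on x_t
        k = h - w_x * x_ideal[t]     # linear weight on x_t
        denom = K * b * b + w_u
        G[t] = -K * a * b / denom
        g[t] = (w_u * u_ideal[t] - b * (k + K * c)) / denom
        alpha, beta = a + b * G[t], b * g[t] + c   # closed-loop dynamics
        H, h = (K * alpha * alpha + w_u * G[t] * G[t],
                K * alpha * beta + k * alpha + w_u * G[t] * (g[t] - u_ideal[t]))
    return G, g


def forward_recursion(a, b, c, G, g, x_init):
    """Apply the feedback rules forward in time from x_init."""
    xs, us, x = [], [], x_init
    for G_t, g_t in zip(G, g):
        u = G_t * x + g_t
        x = a * x + b * u + c
        us.append(u)
        xs.append(x)
    return xs, us


def cost(xs, us, w_x, w_u, x_ideal, u_ideal):
    """Quadratic tracking objective for the scalar, diagonal-weight case."""
    return 0.5 * sum(w_x * (x - xi) ** 2 + w_u * (u - ui) ** 2
                     for x, u, xi, ui in zip(xs, us, x_ideal, u_ideal))
```

In the full algorithm the coefficients would come from linearizing $f$ around the current tentative path, and the forward pass would use the nonlinear $f$ instead of the linear approximation.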

Stop criteria:
- when the algorithm converges, i.e. when the optimal control and state variables do not change more than a prespecified small number from one iteration to the next, or
- when a prespecified number of iterations is reached
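The two stop criteria can be combined in a small driver loop. In this sketch `step` is a hypothetical callable that performs one iteration of the open-loop solution and returns the new path as a flat list of control and state values; the tolerance and the iteration limit are illustrative defaults.

```python
def iterate_until_stop(step, path_start, tol=1e-6, max_iter=100):
    """Repeat `step` until the path changes by at most `tol` or max_iter is hit.

    Returns (path, iterations_used, converged).
    """
    path = list(path_start)
    for iteration in range(1, max_iter + 1):
        new_path = list(step(path))
        # Largest absolute change of any control or state value
        change = max(abs(new - old) for new, old in zip(new_path, path))
        path = new_path
        if change <= tol:
            return path, iteration, True   # first criterion: convergence
    return path, max_iter, False           # second criterion: iteration limit
```

Returning the `converged` flag lets the caller distinguish a solution that settled from one that was cut off by the iteration limit.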