Similar Documents
20 similar documents found.
1.
This paper is a sequel to the author's analysis of finite-step approximations for solving controlled Markov set-chains with infinite-horizon discounted reward. For average-reward controlled Markov set-chains with finite state and action spaces, we develop a value-iteration-type algorithm and, under an ergodicity condition, analyze an error bound of the successive approximations relative to the optimal average reward, which satisfies an optimality equation. We further analyze an error bound for the rolling-horizon control policy defined from a finite-step approximate value obtained by applying the value-iteration-type algorithm.
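The abstract's algorithm operates on set-chains (sets of transition matrices); as a point of reference, here is a minimal relative-value-iteration sketch for an ordinary finite average-reward MDP, the single-matrix special case. All names, and the unichain/ergodicity assumption under which this iteration converges, are illustrative rather than taken from the paper.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Relative value iteration for a finite average-reward MDP.

    P: (A, S, S) transition matrices, r: (A, S) rewards.
    Returns the optimal gain (average reward) and a bias vector.
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    gain = 0.0
    for _ in range(max_iter):
        Q = r + P @ h          # one-step Bellman backup, shape (A, S)
        h_new = Q.max(axis=0)
        gain = h_new[0]        # normalize at a reference state
        h_new = h_new - gain
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return gain, h
```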

2.
The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods that build approximate solutions through simulation-based learning have been studied under various names, such as neurodynamic programming (NDP) and approximate dynamic programming (ADP). The learning aspect connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error learning. This study develops a model-based RL method that iteratively learns the solution to the HJB equation and its associated equations. We focus particularly on control-affine systems with a quadratic objective function and the finite-horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial states and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system demonstrate this and other key aspects of the method.
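For the linear special case of a control-affine system with quadratic cost, the finite-horizon HJB solution is available in closed form as a backward Riccati sweep with time-varying gains, a useful sanity check for any learned solution. A minimal sketch, with all symbols illustrative and not the paper's notation:

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, N):
    """Backward Riccati sweep: time-varying value matrices P_k and gains K_k
    for x_{k+1} = A x_k + B u_k with stage cost x'Qx + u'Ru and terminal
    cost x'Qf x."""
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    gains.reverse()   # gains[k] applies at step k: u_k = -K_k x_k
    return gains
```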

3.
Part production is considered over a finite horizon in a single-part, multiple-failure-mode manufacturing system. When the rate of demand for parts is constant, for Markovian machine-mode dynamics and for convex running cost functions associated with part inventories or backlogs, it is known that optimal part-production policies are of the so-called hedging type. For the infinite-horizon case, such policies are characterized by a set of constant, machine-mode-dependent critical inventory levels that must be aimed at and maintained whenever possible. For the finite-horizon (transient) case, the critical levels still exist, but they are now time-varying and in general very difficult to characterize. Thus, in an attempt to render the problem tractable, transient production optimization is sought within the (suboptimal) class of time-invariant hedging control policies, and a renewal equation is developed for the cost functional over a finite horizon under an arbitrary time-invariant hedging control policy.
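A time-invariant hedging policy is simple to state and simulate: produce at the maximum rate below the critical level z, at the demand rate on it, and not at all when the machine is down. A minimal single-part sketch with a two-state machine follows; all parameter values and the 1:3 holding-to-backlog cost ratio are hypothetical:

```python
import numpy as np

def simulate_hedging(z=5.0, d=1.0, u_max=2.0, T=100.0, dt=0.01,
                     p_fail=0.002, p_repair=0.01, seed=0):
    """Simulate inventory x(t) under a time-invariant hedging policy with
    critical level z; returns the accumulated convex holding/backlog cost."""
    rng = np.random.default_rng(seed)
    x, up, cost = 0.0, True, 0.0
    for _ in range(int(T / dt)):
        u = (u_max if x < z else d) if up else 0.0       # hedge toward z
        x = min(x + (u - d) * dt, z)                     # never overshoot z
        cost += (max(x, 0.0) + 3.0 * max(-x, 0.0)) * dt  # c+ = 1, c- = 3
        # two-state Markov machine: failure and repair events
        if up and rng.random() < p_fail:
            up = False
        elif not up and rng.random() < p_repair:
            up = True
    return cost
```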

4.
5.
In this paper we consider nonautonomous optimal control problems of infinite-horizon type, whose control actions are given by L¹ functions. We verify that the value function is locally Lipschitz. The equivalence between dynamic programming inequalities and Hamilton–Jacobi–Bellman (HJB) inequalities for proximal sub- and supergradients is proven. Using this result we show that the value function is a Dini solution of the HJB equation. We obtain a verification result for the class of Dini sub-solutions of the HJB equation and also prove a minimax property of the value function with respect to the sets of Dini semi-solutions of the HJB equation. We introduce the concept of viscosity solutions of the HJB equation on the infinite horizon and prove the equivalence between this and the concept of Dini solutions. In the Appendix we provide an existence theorem.

6.
An approach to solving finite-time-horizon suboptimal feedback control problems for partial differential equations is proposed, based on solving dynamic programming equations on adaptive sparse grids. A semi-discrete optimal control problem is introduced, and the feedback control is derived from the corresponding value function. The value function can be characterized as the solution of an evolutionary Hamilton–Jacobi–Bellman (HJB) equation defined over a state space whose dimension equals that of the underlying semi-discrete system. Besides a low-dimensional semi-discretization, it is important to solve the HJB equation efficiently to address the curse of dimensionality. We propose to apply a semi-Lagrangian scheme using spatially adaptive sparse grids. Sparse grids allow the discretization of the value functions in (higher) space dimensions, since the curse of dimensionality of full-grid methods arises to a much smaller extent. For additional efficiency, an adaptive grid-refinement procedure is explored. The approach is illustrated for the wave equation, and an extension to equations of Schrödinger type is indicated. We present several numerical examples studying the effect of the parameters characterizing the sparse grid on the accuracy of the value function and the optimal trajectory.
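The semi-Lagrangian backup is easiest to see on a full grid in one dimension, before sparse grids enter: follow the characteristic for one time step and interpolate the value there. A toy sketch for dx/dt = u with |u| ≤ 1 and discounted quadratic running cost; nothing here reflects the paper's wave-equation setting:

```python
import numpy as np

def semi_lagrangian_hjb(xs, running_cost, dt=0.05, gamma=0.1,
                        controls=(-1.0, 0.0, 1.0), n_iter=500):
    """Value iteration with semi-Lagrangian backups for dx/dt = u on a
    1D grid: V(x) = min_u [ cost(x) dt + (1 - gamma dt) V(x + u dt) ]."""
    V = np.zeros_like(xs)
    for _ in range(n_iter):
        candidates = []
        for u in controls:
            # follow the characteristic one step, interpolate V there
            Vu = np.interp(xs + u * dt, xs, V)
            candidates.append(running_cost(xs) * dt + (1 - gamma * dt) * Vu)
        V = np.min(candidates, axis=0)
    return V

xs = np.linspace(-2, 2, 201)
V = semi_lagrangian_hjb(xs, running_cost=lambda x: x**2)
```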

7.
This paper is concerned with the stability of a class of receding horizon control (RHC) laws for constrained linear discrete-time systems subject to bounded state disturbances and convex state and input constraints. The paper considers the class of finite-horizon feedback control policies parameterized as affine functions of the system state, the calculation of which can be shown to be tractable via a convex reparameterization. When minimizing the expected value of a finite-horizon quadratic cost, we show that the value function is convex. When solving this optimal control problem at each time step and implementing the result in a receding horizon fashion, we provide sufficient conditions under which the closed-loop system is input-to-state stable (ISS).
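Each receding horizon step in such schemes is a convex program. The sketch below solves the simpler open-loop variant (a fixed input sequence rather than the paper's affine feedback policies, with nominal dynamics and no disturbance) using cvxpy; the system matrices, horizon, and bounds are all hypothetical:

```python
import cvxpy as cp
import numpy as np

def rhc_step(A, B, x0, N=10, u_max=1.0, x_max=5.0):
    """Solve one finite-horizon constrained LQ problem and return the
    first input, to be applied in receding horizon fashion."""
    n, m = B.shape
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.sum_squares(x[:, k]) + cp.sum_squares(u[:, k])
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        cp.norm(u[:, k], 'inf') <= u_max,      # input bound
                        cp.norm(x[:, k + 1], 'inf') <= x_max]  # state bound
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value
```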

8.
This paper is dedicated to the study of the continuous-time mean–variance optimal portfolio selection problem with non-linear wealth equations under non-extensive statistical mechanics, for a time-varying stochastic differential equation model. Firstly, we allow the returns and variance of risky assets to be time-varying functions, which fit financial data better. Secondly, we consider an investor with a non-linear wealth equation; in many cases the wealth equation is indeed non-linear, for instance because the investor has to pay taxes, or because the return of the stock price may be affected by a large investor's portfolio selection. Thirdly, the non-linear wealth equation driven by the Tsallis distribution is constructed under non-extensive statistical mechanics, which can capture the fat tails and sharp peaks of the risky asset's return. The viscosity solution of the HJB equation for the portfolio problem is obtained by optimal stochastic control theory and the Lagrange multiplier method. Finally, the efficient portfolio strategy and efficient frontier under non-extensive statistical mechanics are obtained. Furthermore, a numerical analysis and a real-data study are presented to illustrate our results.
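As a baseline for comparison, the classical single-period mean–variance frontier with a linear wealth equation has a closed form via two Lagrange multipliers; the paper's time-varying, non-linear, Tsallis-driven setting generalizes this considerably. A minimal sketch with hypothetical inputs:

```python
import numpy as np

def efficient_frontier(mu, Sigma, targets):
    """Minimum-variance weights for each target mean return, subject to
    sum(w) = 1 and mu'w = target (short sales allowed here)."""
    n = len(mu)
    Si = np.linalg.inv(Sigma)
    ones = np.ones(n)
    a = ones @ Si @ ones
    b = ones @ Si @ mu
    c = mu @ Si @ mu
    variances = []
    for m in targets:
        # Lagrange multipliers for the two equality constraints
        lam, gam = np.linalg.solve([[a, b], [b, c]], [1.0, m])
        w = Si @ (lam * ones + gam * mu)
        variances.append(w @ Sigma @ w)
    return np.array(variances)
```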

9.
An investment problem is considered with a dynamic mean–variance (M-V) portfolio criterion under discontinuous prices that follow jump–diffusion processes, consistent with actual stock prices and with the normality and stability of the financial market. Short-selling of stocks is prohibited in this mathematical model. The corresponding stochastic Hamilton–Jacobi–Bellman (HJB) equation of the problem is presented, and its solution is obtained based on the theory of stochastic LQ control and viscosity solutions. The efficient frontier and optimal strategies of the original dynamic M-V portfolio selection problem are also provided, and the effects of a value-at-risk constraint on the efficient frontier are illustrated. Finally, an example illustrating M-V portfolio selection under discontinuous prices is presented.
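A discontinuous price path of the kind driving this model can be simulated directly. A minimal Merton-style jump–diffusion sketch (GBM between jumps, lognormal jump sizes arriving at a Poisson rate); all parameter values are hypothetical:

```python
import numpy as np

def jump_diffusion_path(S0=100.0, mu=0.08, sigma=0.2, lam=0.5,
                        jump_mu=-0.05, jump_sigma=0.1, T=1.0, n=252, seed=0):
    """Simulate S_t under the Merton jump-diffusion model: GBM between
    jumps, lognormal jump sizes, Poisson arrivals at rate lam."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.standard_normal(n) * np.sqrt(dt)
    n_jumps = rng.poisson(lam * dt, n)            # jumps in each interval
    jumps = (jump_mu * n_jumps
             + jump_sigma * np.sqrt(n_jumps) * rng.standard_normal(n))
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * dW + jumps
    return S0 * np.exp(np.cumsum(log_ret))
```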

10.
Optimal portfolios with regime switching and value-at-risk constraint
We consider the optimal portfolio selection problem subject to a maximum value-at-risk (MVaR) constraint when the price dynamics of the risky asset are governed by a Markov-modulated geometric Brownian motion (GBM). Here, the market parameters, including the market interest rate of a bank account and the appreciation rate and volatility of the risky asset, switch over time according to a continuous-time Markov chain whose states are interpreted as the states of an economy. The MVaR is defined as the maximum of the VaRs of the portfolio over a short time duration across the different states of the chain. We formulate the problem as a constrained utility maximization problem over a finite time horizon. Using the dynamic programming principle, we first derive a regime-switching Hamilton–Jacobi–Bellman (HJB) equation and then a system of coupled HJB equations. We employ an efficient numerical method to solve the system of coupled HJB equations for the optimal constrained portfolio. We provide numerical results for the sensitivity analysis of the optimal portfolio, the optimal consumption, and the VaR level with respect to model parameters; these results are also used to investigate the effect of the switching regimes.
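A Markov-modulated GBM path is easy to simulate with an Euler discretization of the modulating chain. A minimal two-state sketch; the drift/volatility pairs and the generator matrix are hypothetical:

```python
import numpy as np

def regime_switching_gbm(S0=100.0, mus=(0.1, 0.02), sigmas=(0.15, 0.35),
                         Q=((-1.0, 1.0), (2.0, -2.0)), T=1.0, n=252, seed=0):
    """Simulate Markov-modulated GBM: drift and volatility follow a
    two-state continuous-time Markov chain with generator Q."""
    rng = np.random.default_rng(seed)
    dt = T / n
    state, logS = 0, np.log(S0)
    path = [S0]
    for _ in range(n):
        logS += ((mus[state] - 0.5 * sigmas[state]**2) * dt
                 + sigmas[state] * np.sqrt(dt) * rng.standard_normal())
        path.append(np.exp(logS))
        # leave the current state w.p. -Q[i][i]*dt (two-state chain)
        if rng.random() < -Q[state][state] * dt:
            state = 1 - state
    return np.array(path)
```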

11.
We give conditions for the existence of average-optimal policies for continuous-time controlled Markov chains with a denumerable state space and Borel action sets. The transition rates are allowed to be unbounded, and the reward/cost rates may have neither upper nor lower bounds. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes, we propose a new set of conditions on the primitive data of the controlled process under which the existence of optimal (deterministic) stationary policies within the class of randomized Markov policies is proved using the extended-generator approach, instead of the Kolmogorov forward equation used in the previous literature, and under which the convergence of a policy iteration method is also shown. Moreover, we use a controlled queueing system to show that all of our conditions are satisfied, whereas those in the previous literature fail to hold.
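To make the policy-iteration loop concrete, here is a minimal discrete-time finite analogue (when transition rates are bounded, the continuous-time chain can be reduced to this form by uniformization; the paper's contribution is precisely to dispense with such boundedness). The unichain assumption and all names are illustrative:

```python
import numpy as np

def policy_iteration_avg(P, r, max_iter=100):
    """Policy iteration for a finite average-reward MDP.
    P: (A, S, S) transitions, r: (A, S) rewards (unichain assumed)."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)
    g, h = 0.0, np.zeros(S)
    for _ in range(max_iter):
        Ppi = P[pi, np.arange(S)]               # (S, S) under current policy
        rpi = r[pi, np.arange(S)]
        # evaluation: solve g*1 + (I - Ppi) h = rpi with h[0] = 0 pinned
        M = np.column_stack([np.ones(S), (np.eye(S) - Ppi)[:, 1:]])
        sol = np.linalg.solve(M, rpi)
        g, h = sol[0], np.concatenate([[0.0], sol[1:]])
        pi_new = np.argmax(r + P @ h, axis=0)   # improvement step
        if np.array_equal(pi_new, pi):
            break
        pi = pi_new
    return g, h, pi
```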

12.
In this paper we consider infinite-horizon risk-sensitive control of Markov processes with discrete time and a denumerable state space. This problem is solved by proving, under suitable conditions, that there exists a bounded solution to the dynamic programming equation. The dynamic programming equation is transformed into an Isaacs equation for a stochastic game, and the vanishing-discount method is used to study its solution. In addition, we prove that the existence conditions are also necessary.
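In the finite-state case, the risk-sensitive dynamic programming equation takes the multiplicative form e^{λ + h(s)} = min_a e^{c(a,s)} Σ_{s'} P(s'|s,a) e^{h(s')}, which can be iterated in the log domain. A minimal sketch, with the risk-sensitivity parameter absorbed into the cost c and convergence assumed rather than verified (the paper's conditions address exactly this point):

```python
import numpy as np

def risk_sensitive_rvi(P, c, n_iter=2000):
    """Relative value iteration in the log domain for the multiplicative
    Poisson equation e^{lam + h(s)} = min_a e^{c(a,s)} sum_s' P e^{h(s')}.
    P: (A, S, S) transitions, c: (A, S) costs. Returns (lam, h)."""
    A, S, _ = P.shape
    h = np.zeros(S)
    lam = 0.0
    for _ in range(n_iter):
        T = np.min(c + np.log(P @ np.exp(h)), axis=0)
        lam, h = T[0], T - T[0]      # normalize at a reference state
    return lam, h
```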

13.
This paper considers mobile-to-base-station power control for lognormal fading channels in wireless communication systems within a centralized-information stochastic optimal control framework. Under a bounded power rate-of-change constraint, the stochastic control problem and its associated Hamilton-Jacobi-Bellman (HJB) equation are analyzed by the viscosity solution method; the degenerate HJB equation is then perturbed to admit a classical solution, and a suboptimal control law is designed based on the perturbed HJB equation. When a quadratic-type cost is used without a bound constraint on the control, the value function is a classical solution to the degenerate HJB equation and the feedback control is affine in the system power. In addition, in this case we develop approximate, but highly scalable, solutions to the HJB equation in terms of a local polynomial expansion of the exact solution. When the channel parameters are not known a priori, one can obtain on-line estimates of the parameters and get adaptive versions of the control laws. In numerical experiments with both of the above cost functions, the following phenomenon is observed: whenever the users have different initial conditions, there is an initial convergence of the power levels to a common level, followed by approximately equal behavior that converges toward a stochastically varying optimum.

14.
To address the stability of wireless sensor network systems in industrial environments subject to large external disturbances, a method combining the Hamilton–Jacobi–Bellman (HJB) equation with minimax control is proposed. First, for wireless sensor networks with bounded network delays and bounded consecutive packet losses arising in complex operating environments, a system model incorporating delay and packet loss is formulated. Then, under a minimax performance index, a minimax optimal controller is designed via the HJB equation; an explicit form of the worst-case disturbance is further derived through a verification function, from which sufficient conditions for system stability are obtained. Finally, a numerical example and simulations verify the feasibility and effectiveness of the proposed method when the system experiences sudden large disturbances.
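A standard way to realize a minimax (worst-case disturbance) controller in the linear-quadratic case is the game-theoretic Riccati recursion, whose saddle point yields both the control gain and the worst-case disturbance gain. The sketch below is offered only as a discrete-time LQ analogue of the paper's setting; all symbols are hypothetical:

```python
import numpy as np

def minimax_riccati_step(P, A, B, D, Q, R, gamma):
    """One backward step of the LQ game Riccati recursion for
    min_u max_w sum [x'Qx + u'Ru - gamma^2 w'w], with dynamics
    x_{k+1} = A x + B u + D w. Returns P_new and the saddle-point
    gains (u = -Ku x, w = Kw x). Requires feasibility:
    gamma^2 I - D'PD > 0."""
    p = D.shape[1]
    M = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                  [D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(p)]])
    N = np.vstack([B.T @ P @ A, D.T @ P @ A])
    G = np.linalg.solve(M, N)                 # stacked gains from [u; w] = -G x
    Ku, Kw = G[:B.shape[1]], -G[B.shape[1]:]
    P_new = Q + A.T @ P @ A - np.hstack([A.T @ P @ B, A.T @ P @ D]) @ G
    return P_new, Ku, Kw
```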

15.
We use a distributed parallel genetic algorithm (DPGA) to find numerical solutions to a single-state deterministic optimal growth model for both the infinite- and finite-horizon cases. To evaluate the DPGA we consider a version of the Taylor-Uhlig problem for which the analytical solutions are known. The first-order conditions for the infinite-horizon case lead to a nonlinear integral equation whose solution we approximate using a Chebyshev polynomial series expansion. The DPGA is used to search the parameter space for the optimal fit for this function. For the finite-horizon case, the DPGA searches the state space for a sequence of states that maximizes the agent's discounted utility over the finite horizon. The DPGA runs on a cluster of up to fifty workstations linked via PVM. The topology of the function to be optimized is mapped onto each node of the cluster, and the nodes essentially compete with one another for the optimal solution. We demonstrate that the DPGA has several useful features. For instance, the DPGA solves the exact Euler equation over the full range of the state variable, and it does not require an accurate initial guess. The DPGA is easily generalized to multiple-state problems.
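The core loop of such a GA is short even without distribution across a cluster. A serial toy sketch that evolves Chebyshev coefficients to fit a given function on [-1, 1], a stand-in for minimizing an Euler-equation residual; population size, selection rule, and mutation scale are all hypothetical:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def ga_fit_chebyshev(target, deg=5, pop=200, gens=300, sigma=0.1, seed=0):
    """Evolve coefficients of a degree-`deg` Chebyshev expansion on [-1, 1]
    to minimize squared error against `target` (fitness = -error)."""
    rng = np.random.default_rng(seed)
    xs = np.linspace(-1, 1, 101)
    y = target(xs)
    mse = lambda c: np.mean((chebval(xs, c) - y)**2)
    P = rng.standard_normal((pop, deg + 1))
    for _ in range(gens):
        err = np.array([mse(c) for c in P])
        elite = P[np.argsort(err)[:pop // 4]]          # truncation selection
        parents = elite[rng.integers(0, len(elite), (pop, 2))]
        mask = rng.random((pop, deg + 1)) < 0.5        # uniform crossover
        P = np.where(mask, parents[:, 0], parents[:, 1])
        P += sigma * rng.standard_normal(P.shape)      # Gaussian mutation
    return P[np.argmin([mse(c) for c in P])]
```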

16.
The finite-horizon H∞ filtering and prediction problems in the discrete-time case admit solutions only if a suitable difference Riccati equation is solved by a matrix sequence satisfying “feasibility” conditions. In this work, sufficient conditions ensuring the existence of filters and predictors over an arbitrarily long time interval are derived. Moreover, it is shown that, under these conditions, finite-horizon estimators tend to stable stationary ones as the time horizon increases.

17.
This paper studies the problem of the existence of stationary optimal policies for finite state controlled Markov chains, with compact action space and imperfect observations, under the long-run average cost criterion. It presents sufficient conditions for existence of solutions to the associated dynamic programming equation, that strengthen past results. There is a detailed discussion comparing the different assumptions commonly found in the literature.

18.
We consider Markov decision processes with denumerable state space and finite control sets; the performance index of a control policy is a long-run expected average cost criterion and the cost function is bounded below. For these models, the existence of average optimal stationary policies was recently established in [11] under very general assumptions. Such a result was obtained via an optimality inequality. Here, we use a simple example to prove that the conditions in [11] do not imply the existence of a solution to the average cost optimality equation.

19.
In this paper, we approach supervisory control as an online decision problem. In particular, we introduce “calibrated forecasts” as a mechanism for controller selection in supervisory control. The forecasted quantity is a candidate controller's performance level, or reward, over a finite implementation horizon. Controller selection is based on using the controller with the maximum calibrated forecast of the reward. The proposed supervisor does not perform a pre-routed search of candidate controllers and does not require the presence of exogenous inputs for excitation or identification. Assuming the existence of a stabilizing controller within the set of candidate controllers, we show that under the proposed supervisory controller the output of the system remains bounded for any bounded disturbance, even if the disturbance is chosen in an adversarial manner. The use of calibrated forecasts enables one to establish overall performance guarantees for the supervisory scheme even though non-stabilizing controllers may be persistently selected by the supervisor because of the effects of initial conditions, exogenous disturbances, or random selection. The main results are obtained for a general class of system dynamics and specialized to linear systems.

20.
This paper investigates the linear quadratic regulation (LQR) problem for discrete-time systems with multiplicative noise. Multiplicative noise is usually assumed to be a scalar in the existing literature. Motivated by recent applications of networked control systems and MIMO communication technology, we consider multi-channel multiplicative noise represented by a diagonal matrix. We first show that the finite-horizon LQR problem can be solved using a generalized Riccati equation. We then prove the convergence of the generalized Riccati equation under the conditions of stabilization and exact observability, and obtain the solution to the infinite-horizon LQR problem. Finally, we provide a numerical example to demonstrate the proposed approach.
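For a single multiplicative-noise channel, the generalized Riccati map has a well-known shape; below is a minimal fixed-point-iteration sketch for x_{k+1} = (A + w_k Ā)x_k + (B + w_k B̄)u_k with E[w_k] = 0 and E[w_k²] = σ². The paper's diagonal multi-channel case adds one such σ²-weighted term per channel; this code and its convergence tolerance are illustrative only:

```python
import numpy as np

def generalized_riccati(A, Abar, B, Bbar, Q, R, sigma2,
                        n_iter=500, tol=1e-10):
    """Iterate the generalized Riccati map for LQR with one multiplicative
    noise channel; returns P and the stationary gain K (u = -K x)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    H = R  # placeholder before first iteration
    for _ in range(n_iter):
        H = R + B.T @ P @ B + sigma2 * Bbar.T @ P @ Bbar
        L = B.T @ P @ A + sigma2 * Bbar.T @ P @ Abar
        P_new = (Q + A.T @ P @ A + sigma2 * Abar.T @ P @ Abar
                 - L.T @ np.linalg.solve(H, L))
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    K = np.linalg.solve(H, B.T @ P @ A + sigma2 * Bbar.T @ P @ Abar)
    return P, K
```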
