Similar literature
20 similar documents found.
1.
This paper deals with an optimal control problem of deterministic two-machine flowshops. Since the sizes of both internal and external buffers are practically finite, the problem is one with state constraints. The Hamilton-Jacobi-Bellman (HJB) equations of the problem involve complicated boundary conditions due to the presence of the state constraints, and as a consequence the usual “verification theorem” may not work for the problem. To overcome this difficulty, it is shown that any function satisfying the HJB equations in the interior of the state constraint domain must be majorized by the value function. The main techniques employed are the “constraint domain approximation” approach and the “weak-Lipschitz” property of the value functions developed in preceding papers. Based on this, an explicit optimal feedback control for the problem is obtained.

2.
A method is presented for solving the infinite time Hamilton-Jacobi-Bellman (HJB) equation for certain state-constrained stochastic problems. The HJB equation is reformulated as an eigenvalue problem, such that the principal eigenvalue corresponds to the expected cost per unit time, and the corresponding eigenfunction gives the value function (up to an additive constant) for the optimal control policy. The eigenvalue problem is linear and hence there are fast numerical methods available for finding the solution.
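As a rough illustration of the eigenvalue reformulation described in this abstract, the sketch below computes the principal eigenvalue of a small positive matrix standing in for the discretized linear operator. The matrix, its size, and the power-iteration solver are illustrative assumptions, not details from the paper.

```python
import numpy as np

def principal_eigpair(L, iters=5000):
    """Power iteration: principal eigenvalue and eigenvector of L."""
    v = np.ones(L.shape[0])
    for _ in range(iters):
        v = L @ v
        v /= np.linalg.norm(v)
    lam = v @ (L @ v)  # Rayleigh quotient at the converged vector
    return lam, v

# Toy stand-in for the discretized linear operator: a nonnegative
# tridiagonal matrix, so Perron-Frobenius gives a simple principal
# eigenvalue (expected cost per unit time, in the paper's setting)
# with a positive eigenvector playing the role of the value function
# up to an additive constant.
n = 50
L = (np.diag(np.full(n, 2.0))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))
lam, w = principal_eigpair(L)
```

Because the problem is linear, any standard sparse eigensolver could replace the hand-rolled power iteration.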

3.
For the optimal production control problem of unreliable stochastic production systems with a diffusion term, a numerical method is adopted to solve the mode-coupled nonlinear partial differential HJB equations satisfied by the optimal control. First, a Markov chain is constructed to approximate the state evolution of the production system; based on the local consistency principle, the continuous-time stochastic control problem is converted into a discrete-time Markov decision process. Value iteration and policy iteration algorithms are then used to compute the optimal control numerically. Simulation results at the end of the paper verify the correctness and effectiveness of the method.
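The discretize-then-iterate strategy in this abstract can be sketched generically: replace the continuous-time system by a locally consistent controlled Markov chain and solve the resulting discrete-time MDP by value iteration. The 2-state, 2-control chain below is a toy stand-in; its transition probabilities, costs, and discount factor are illustrative assumptions, not the paper's model.

```python
import numpy as np

P = np.array([  # P[a, s, s']: transition probabilities under control a
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.6, 0.4]],
])
c = np.array([[1.0, 2.0], [1.5, 0.5]])  # c[a, s]: running cost
gamma = 0.95                             # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: V <- min_a [ c(a, .) + gamma * sum_s' P(a, ., s') V(s') ]
    V = np.min(c + gamma * (P @ V), axis=0)

# Greedy policy extracted from the converged value function.
policy = np.argmin(c + gamma * (P @ V), axis=0)
```

Policy iteration would replace the repeated backup with alternating exact policy evaluation and greedy improvement, usually converging in far fewer sweeps.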

4.
This paper considers mobile to base station power control for lognormal fading channels in wireless communication systems within a centralized information stochastic optimal control framework. Under a bounded power rate of change constraint, the stochastic control problem and its associated Hamilton-Jacobi-Bellman (HJB) equation are analyzed by the viscosity solution method; then the degenerate HJB equation is perturbed to admit a classical solution and a suboptimal control law is designed based on the perturbed HJB equation. When a quadratic type cost is used without a bound constraint on the control, the value function is a classical solution to the degenerate HJB equation and the feedback control is affine in the system power. In addition, in this case we develop approximate, but highly scalable, solutions to the HJB equation in terms of a local polynomial expansion of the exact solution. When the channel parameters are not known a priori, one can obtain on-line estimates of the parameters and get adaptive versions of the control laws. In numerical experiments with both of the above cost functions, the following phenomenon is observed: whenever the users have different initial conditions, there is an initial convergence of the power levels to a common level and then subsequent approximately equal behavior which converges toward a stochastically varying optimum.

5.
A sufficient condition to solve an optimal control problem is to solve the Hamilton–Jacobi–Bellman (HJB) equation. However, finding a value function that satisfies the HJB equation for a nonlinear system is challenging. For an optimal control problem when a cost function is provided a priori, previous efforts have utilized feedback linearization methods which assume exact model knowledge, or have developed neural network (NN) approximations of the HJB value function. The result in this paper uses the implicit learning capabilities of the RISE control structure to learn the dynamics asymptotically. Specifically, a Lyapunov stability analysis is performed to show that the RISE feedback term asymptotically identifies the unknown dynamics, yielding semi-global asymptotic tracking. In addition, it is shown that the system converges to a state space system that has a quadratic performance index which has been optimized by an additional control element. An extension is included to illustrate how a NN can be combined with the previous results. Experimental results are given to demonstrate the proposed controllers.

6.
The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods to build approximate solutions through simulation based learning have been studied in various names like neurodynamic programming (NDP) and approximate dynamic programming (ADP). The aspect of learning connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error based learning. This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations. We focus particularly on the control-affine system with a quadratic objective function and the finite horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method.

7.
The Hamilton-Jacobi-Bellman (HJB) equation corresponding to constrained control is formulated using a suitable nonquadratic functional. It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS). The value function of this HJB equation is obtained by solving a sequence of cost functions satisfying a sequence of Lyapunov equations (LE). A neural network is used to approximate the cost function associated with each LE using the method of least-squares on a well-defined region of attraction of an initial stabilizing controller. As the order of the neural network is increased, the least-squares solution of the HJB equation converges uniformly to the exact solution of the inherently nonlinear HJB equation associated with the saturating control inputs. The result is a nearly optimal constrained state feedback controller that has been tuned a priori off-line.
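The successive-approximation idea in this abstract can be seen in miniature on a scalar linear plant (a Kleinman-style policy iteration, under illustrative assumptions: dynamics dx = a·x + b·u, cost ∫ q·x² + r·u² dt, hand-picked coefficients, no input saturation). Each pass solves the scalar analogue of the Lyapunov equation for the current stabilizing gain and then improves the policy; in the paper, a least-squares neural network fit plays the role of the exact solve below.

```python
import math

a, b, q, r = -1.0, 1.0, 1.0, 1.0
K = 0.0                                  # initial stabilizing gain: a - b*K < 0
for _ in range(20):
    # Scalar Lyapunov equation for the closed loop a - b*K:
    #   2*(a - b*K)*p + q + r*K**2 = 0   (policy evaluation)
    p = (q + r * K**2) / (2.0 * (b * K - a))
    K = b * p / r                        # policy improvement
# Positive root of the scalar Riccati equation, for comparison:
#   2*a*p - b**2*p**2/r + q = 0
p_exact = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2
```

The iterates converge quadratically to the Riccati solution, which is the Lyapunov-equation analogue of the uniform convergence claimed for the neural-network case.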

8.
The purpose of this paper is to describe the application of the notion of viscosity solutions to solve the Hamilton-Jacobi-Bellman (HJB) equation associated with an important class of optimal control problems for quantum spin systems. The HJB equation that arises in the control problems of interest is a first-order nonlinear partial differential equation defined on a Lie group. Hence we employ recent extensions of the theory of viscosity solutions to Riemannian manifolds in order to interpret possibly non-differentiable solutions to this equation. Results from differential topology on the triangulation of manifolds are then used to develop a finite difference approximation method for numerically computing the solution to such problems. The convergence of these approximations is proven using viscosity solution methods. In order to illustrate the techniques developed, these methods are applied to an example problem.

9.
An optimal control model for the search problem of a randomly moving target
An optimal control model is proposed for the problem of searching for a randomly moving target performing Brownian motion in Rn. An analytical approach is used to study the optimal search problem: the original problem is transformed into an equivalent problem for a deterministic distributed-parameter system described by a second-order partial differential equation (the HJB equation). The HJB equation for the optimal search problem is derived, and it is proved that its solution is the desired optimal search strategy. An algorithm for computing the optimal search strategy and a worked example are then given.

10.
To address the stability of wireless sensor network systems subject to large external disturbances in industrial environments, a method combining the Hamilton-Jacobi-Bellman (HJB) equation with Minimax control is proposed. First, for the bounded network delay and consecutive packet loss that arise under complex operating conditions, a wireless sensor network system model with delay and packet loss is given; then, under the Minimax performance index...

11.
Combining the ENO scheme with radial basis function (RBF) interpolation, an RBF-interpolation ENO method is proposed for solving hyperbolic partial differential equations. Following the ENO idea, the method builds an adaptive stencil and approximates the solution on the selected stencil with radial basis functions, which handles problems with discontinuous solutions well and suppresses numerical oscillations at the discontinuities. The method is validated on one-dimensional hyperbolic partial differential equations, and comparison with the polynomial ENO scheme shows its advantages.

12.
Principle of optimality or dynamic programming leads to derivation of a partial differential equation (PDE) for solving optimal control problems, namely the Hamilton‐Jacobi‐Bellman (HJB) equation. In general, this equation cannot be solved analytically; thus many computing strategies have been developed for optimal control problems. Many problems in financial mathematics involve the solution of stochastic optimal control (SOC) problems. In this work, the variational iteration method (VIM) is applied for solving SOC problems. In fact, solutions for the value function and the corresponding optimal strategies are obtained numerically. We solve a stochastic linear regulator problem to investigate the applicability and simplicity of the presented method and prove its convergence. In particular, for Merton's portfolio selection model as a problem of portfolio optimization, the proposed numerical method is applied for the first time and its usefulness is demonstrated. For the nonlinear case, we investigate its convergence using Banach's fixed point theorem. The numerical results confirm the simplicity and efficiency of our method.

13.
In this paper, an event-triggered safe control method based on adaptive critic learning (ACL) is proposed for a class of nonlinear safety-critical systems. First, a safe cost function is constructed by adding a control barrier function (CBF) to the traditional quadratic cost function; this resolves the optimization problem with safety constraints that is difficult to handle with classical ACL methods. Subsequently, the event-triggered scheme is introduced to reduce the amount of computation. Further, combining the properties of CBF with the ACL-based event-triggering mechanism, the event-triggered safe Hamilton–Jacobi–Bellman (HJB) equation is derived, and a single critic neural network (NN) framework is constructed to approximate the solution of the event-triggered safe HJB equation. In addition, the concurrent learning method is applied to the NN learning process, so that the persistence of excitation (PE) condition is not required. The weight approximation error of the NN and the states of the system are proven to be uniformly ultimately bounded (UUB) in the safe set with the Lyapunov theory. Finally, the effectiveness of the presented method is validated through simulation.

14.
In this paper, a new formulation for the optimal tracking control problem (OTCP) of continuous-time nonlinear systems is presented. This formulation extends the integral reinforcement learning (IRL) technique, a method for solving optimal regulation problems, to learn the solution to the OTCP. Unlike existing solutions to the OTCP, the proposed method does not need to have or to identify knowledge of the system drift dynamics, and it also takes into account the input constraints a priori. An augmented system composed of the error system dynamics and the command generator dynamics is used to introduce a new nonquadratic discounted performance function for the OTCP. This encodes the input constraints into the optimization problem. A tracking Hamilton–Jacobi–Bellman (HJB) equation associated with this nonquadratic performance function is derived which gives the optimal control solution. An online IRL algorithm is presented to learn the solution to the tracking HJB equation without knowing the system drift dynamics. Convergence to a near-optimal control solution and stability of the whole system are shown under a persistence of excitation condition. Simulation examples are provided to show the effectiveness of the proposed method.

15.
In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and NN approximator preclude theoretical guarantees and raise concerns of numerical instabilities, we present two simple methods for promoting convergence, the effectiveness of which is presented in a series of experiments. The first method involves the gradual increase of the horizon time scale, with a corresponding gradual increase in value function complexity. The second method involves the assumption of stochastic dynamics which introduces a regularizing second derivative term to the HJB equation. A gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.
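The residual-minimization loop in this abstract can be illustrated on a one-dimensional linear-quadratic problem where the exact value function is known. Everything below (the plant dx = a·x + b·u, running cost q·x² + r·u², the single quadratic basis in place of the paper's neural network, the sampled states, the step size) is an illustrative assumption.

```python
import numpy as np

a, b, q, r = -1.0, 1.0, 1.0, 1.0
xs = np.linspace(-2.0, 2.0, 41)              # sampled training states

p = 1.0                                      # value model V(x) = p * x**2
lr = 1e-3
for _ in range(20000):
    # HJB residual with the greedy control u* = -b*p*x/r substituted in:
    #   R(x) = (q + 2*a*p - b**2 * p**2 / r) * x**2
    coef = q + 2.0 * a * p - b**2 * p**2 / r
    R = coef * xs**2
    dR_dp = (2.0 * a - 2.0 * b**2 * p / r) * xs**2
    # Gradient step on the least-squares objective mean(R**2).
    p -= lr * np.mean(2.0 * R * dR_dp)

# Exact value from the scalar Riccati equation, for comparison.
p_exact = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
```

With a multi-parameter approximator the same loop applies, with the scalar derivative replaced by backpropagation through the network, which is where the instabilities the paper addresses arise.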

16.
This paper considers the optimal consensus control problem for unknown nonlinear multiagent systems (MASs) subject to control constraints by utilizing the event‐triggered adaptive dynamic programming (ETADP) technique. To deal with the control constraints, we introduce nonquadratic energy consumption functions into the performance indices and formulate the Hamilton‐Jacobi‐Bellman (HJB) equations. Then, based on Bellman's optimality principle, constrained optimal consensus control policies are designed from the HJB equations. In order to implement the ETADP algorithm, critic networks and action networks are developed to approximate the value functions and consensus control policies respectively based on the measurable system data. Under the event‐triggered control framework, the weights of the critic networks and action networks are only updated at the triggering instants, which are decided by the designed adaptive triggering conditions. The Lyapunov method is used to prove that the local neighbor consensus errors and the weight estimation errors of the critic networks and action networks are ultimately bounded. Finally, a numerical example is provided to show the effectiveness of the proposed ETADP method.

17.
This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. In order to obtain the optimal strategy of each agent, it is necessary to solve a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which are very difficult to solve by traditional methods. The game problem becomes more complex still when the control input of each agent in the dynamic graphical game is constrained. In this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without the need for the drift dynamics of the agents. In effect, this algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process, using only the local information available to each agent. It can be proved that, under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation of the algorithm, two layers of neural networks are used for each agent to fit the value function and control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.

18.
Consideration is given to the problem of controlling the motion of a linear oscillator subject to external Gaussian and Poisson random actions, with the aim of minimizing the mean energy with the aid of a bounded external control force. A hybrid method is suggested for the solution of the stated problem. This method relies on the search, in a portion of the phase space, for the exact analytical solution of the appropriate Hamilton-Jacobi-Bellman (HJB) equation, and on the numerical solution of this equation in the remaining (bounded) portion of the space. It is proved that the found analytical solutions represent the asymptotics of solutions of the Hamilton-Jacobi-Bellman equation. With the aid of the decomposition method, the obtained results are applied to the problem of suppressing, by means of an actuator, the vibrations of an elastic rod (plate) under Gaussian random actions. Results of the numerical modeling are given.

19.
Optimal portfolios with regime switching and value-at-risk constraint
We consider the optimal portfolio selection problem subject to a maximum value-at-risk (MVaR) constraint when the price dynamics of the risky asset are governed by a Markov-modulated geometric Brownian motion (GBM). Here, the market parameters, including the market interest rate of a bank account, the appreciation rate and the volatility of the risky asset, switch over time according to a continuous-time Markov chain, whose states are interpreted as the states of an economy. The MVaR is defined as the maximum value of the VaRs of the portfolio in a short time duration over different states of the chain. We formulate the problem as a constrained utility maximization problem over a finite time horizon. By utilizing the dynamic programming principle, we shall first derive a regime-switching Hamilton-Jacobi-Bellman (HJB) equation and then a system of coupled HJB equations. We shall employ an efficient numerical method to solve the system of coupled HJB equations for the optimal constrained portfolio. We shall provide numerical results for the sensitivity analysis of the optimal portfolio, the optimal consumption and the VaR level with respect to model parameters. These results are also used to investigate the effect of the switching regimes.

20.

In this technical note, we revisit the risk-sensitive optimal control problem for Markov jump linear systems (MJLSs). We first demonstrate the inherent difficulty in solving the risk-sensitive optimal control problem even if the system is linear and the cost function is quadratic. This is due to the nonlinear nature of the coupled set of Hamilton-Jacobi-Bellman (HJB) equations, stemming from the presence of the jump process. It thus follows that the standard quadratic form of the value function with a set of coupled Riccati differential equations cannot be a candidate solution to the coupled HJB equations. We subsequently show that there is no equivalence relationship between the problems of risk-sensitive control and H∞ control of MJLSs, which are shown to be equivalent in the absence of any jumps. Finally, we show that there does not exist a large deviation limit as well as a risk-neutral limit of the risk-sensitive optimal control problem due to the presence of a nonlinear coupling term in the HJB equations.


