Similar Documents
20 similar documents found (search time: 373 ms)
1.
A sufficient condition to solve an optimal control problem is to solve the Hamilton–Jacobi–Bellman (HJB) equation. However, finding a value function that satisfies the HJB equation for a nonlinear system is challenging. For optimal control problems in which a cost function is provided a priori, previous efforts have utilized feedback linearization methods, which assume exact model knowledge, or have developed neural network (NN) approximations of the HJB value function. The result in this paper uses the implicit learning capabilities of the RISE control structure to learn the dynamics asymptotically. Specifically, a Lyapunov stability analysis is performed to show that the RISE feedback term asymptotically identifies the unknown dynamics, yielding semi-global asymptotic tracking. In addition, the system is shown to converge to a state-space system whose quadratic performance index is optimized by an additional control element. An extension is included to illustrate how an NN can be combined with the previous results. Experimental results are given to demonstrate the proposed controllers.

2.
The asymmetric input-constrained optimal synchronization problem of heterogeneous unknown nonlinear multiagent systems (MASs) is considered in this paper. Intuitively, a state-space transformation is performed such that satisfaction of symmetric input constraints for the transformed system guarantees satisfaction of asymmetric input constraints for the original system. Then, considering that the leader's information is not available to every follower, a novel distributed observer is designed to estimate …

3.
In this paper, a Hamilton–Jacobi–Bellman (HJB) equation-based optimal control algorithm for robust controller design is proposed for nonlinear systems. The HJB equation is formulated using a suitable nonquadratic term in the performance functional to tackle constraints on the control input. Utilizing the direct method of Lyapunov stability, the controller is shown to be optimal with respect to a cost functional that includes a penalty on the control effort and the maximum bound on system uncertainty. The bounded controller requires knowledge of the upper bound of the system uncertainty. In the proposed algorithm, a neural network is used to approximate the solution of the HJB equation via the least-squares method. The proposed algorithm has been applied to nonlinear systems with matched and unmatched uncertainties and with uncertainties in the input matrix. The necessary theoretical and simulation results are presented to validate the proposed algorithm.
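The nonquadratic penalty used for input constraints in this line of work is commonly the integral-of-inverse-tanh construction, which yields a smoothly saturating control law. A minimal scalar sketch under that assumption (function name and numbers below are illustrative, not taken from the paper):

```python
import math

def constrained_optimal_control(grad_V, g, r, lam):
    """Scalar sketch of the constrained-optimal control obtained from the
    nonquadratic cost W(u) = 2 * integral_0^u lam*r*atanh(v/lam) dv:
        u* = -lam * tanh(g * grad_V / (2 * lam * r)).
    The tanh keeps |u| < lam for any value-function gradient."""
    return -lam * math.tanh(g * grad_V / (2.0 * lam * r))

# Large gradient: the control saturates smoothly at the bound lam.
u_big = constrained_optimal_control(grad_V=100.0, g=1.0, r=1.0, lam=2.0)
# Small gradient: approximately the unconstrained law -g*grad_V/(2r).
u_small = constrained_optimal_control(grad_V=0.01, g=1.0, r=1.0, lam=2.0)
print(u_big, u_small)
```

The same smooth-saturation shape is what makes the resulting HJB equation solvable by value-function approximation rather than by handling a hard clip.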

4.
The Hamilton-Jacobi-Bellman (HJB) equation corresponding to constrained control is formulated using a suitable nonquadratic functional. It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS). The value function of this HJB equation is solved for by solving for a sequence of cost functions satisfying a sequence of Lyapunov equations (LE). A neural network is used to approximate the cost function associated with each LE using the method of least-squares on a well-defined region of attraction of an initial stabilizing controller. As the order of the neural network is increased, the least-squares solution of the HJB equation converges uniformly to the exact solution of the inherently nonlinear HJB equation associated with the saturating control inputs. The result is a nearly optimal constrained state feedback controller that has been tuned a priori off-line.
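The successive-approximation idea behind the sequence of Lyapunov equations can be reduced to its simplest setting: a scalar, unconstrained linear-quadratic problem, where each Lyapunov equation has a closed-form solution. This is a hedged illustration of the iteration (Kleinman's algorithm), not the paper's constrained neural-network implementation:

```python
def policy_iteration_lqr(a, b, q, r, k0, iters=20):
    """Successive approximation for the scalar LQR problem
    dx/dt = a*x + b*u, cost = integral(q*x**2 + r*u**2).
    Each step solves the scalar Lyapunov equation
        2*(a - b*k)*p + q + r*k**2 = 0
    for the current policy u = -k*x, then improves the policy
    with k = b*p/r."""
    k = k0  # k0 must be stabilizing: a - b*k0 < 0
    for _ in range(iters):
        p = (q + r * k * k) / (2.0 * (b * k - a))  # Lyapunov solve
        k = b * p / r                              # policy improvement
    return p, k

p, k = policy_iteration_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
print(p)  # converges to the ARE solution 1 + sqrt(2)
```

Each iterate is the cost of a stabilizing policy, and the sequence decreases monotonically to the HJB (here, algebraic Riccati) solution, which is the convergence structure the abstract's least-squares neural-network scheme mirrors.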

5.
This paper proposes an online adaptive approximate solution for the infinite-horizon optimal tracking control problem of continuous-time nonlinear systems with unknown dynamics. The requirement of the complete knowledge of system dynamics is avoided by employing an adaptive identifier in conjunction with a novel adaptive law, such that the estimated identifier weights converge to a small neighborhood of their ideal values. An adaptive steady-state controller is developed to maintain the desired tracking performance at the steady-state, and an adaptive optimal controller is designed to stabilize the tracking error dynamics in an optimal manner. For this purpose, a critic neural network (NN) is utilized to approximate the optimal value function of the Hamilton-Jacobi-Bellman (HJB) equation, which is used in the construction of the optimal controller. The learning of two NNs, i.e., the identifier NN and the critic NN, is continuous and simultaneous by means of a novel adaptive law design methodology based on the parameter estimation error. Stability of the whole system consisting of the identifier NN, the critic NN and the optimal tracking control is guaranteed using Lyapunov theory; convergence to a near-optimal control law is proved. Simulation results exemplify the effectiveness of the proposed method.

6.
In this paper the solution of a stochastic optimal control problem described by linear equations of motion and a nonquadratic performance index is presented. The theory is then applied to the dynamics of a single-foil and a hydrofoil boat flying on rough water. The random disturbances caused by sea waves are represented as the response of an auxiliary system to a white noise input. The control objective is formulated as an integral performance index containing a quadratic acceleration term and a nonquadratic term of the submergence deviation of the foil from calm water submergence. The stochastic version of the maximum principle is used in the formulation of a feedback control law. The Riccati equations and the feedback gains associated with a nonquadratic performance index are non-linear functions of the state and auxiliary state variables. These equations are integrated forward with the state equations for the steady-state solution of the problem. The controller for a nonquadratic performance index contains computing elements which perform the integration of the Riccati equations to generate the instantaneous values of the feedback gains. The effect of a nonquadratic penalty on the submergence deviation and the effect of a nonquadratic control penalty on the response of the system are investigated. A comparison between an optimal nonlinear control law and a suboptimal linear control law is presented.

7.
This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. In order to obtain the optimal strategy of each agent, a set of coupled Hamilton-Jacobi-Bellman (HJB) equations must be solved, which is very difficult to do by traditional methods. The game problem becomes even more complex when the control input of each agent in the dynamic graphical game is constrained. In this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without the need for the drift dynamics of the agents. In effect, this algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process, using only the local information available to each agent. It can be proved that, under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation of the algorithm, two layers of neural networks are used for each agent to fit the value function and the control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.

8.
Considering the overshoot and chatter caused by unknown interference, this article studies the adaptive robust optimal control of continuous-time (CT) multi-input systems with an approximate dynamic programming (ADP)-based Q-function scheme. An adaptive integral reinforcement learning (IRL) scheme is proposed to study the optimal solutions of the Q-functions. First, multi-input value functions are presented, and the Nash equilibrium is analyzed. A complex Hamilton–Jacobi–Isaacs (HJI) equation is constructed from the multi-input system and the zero-sum-game-based value function. Since solving the HJI equation for a nonlinear system is a challenging task, a transformation of the HJI equation is constructed as a Q-function. A neural network (NN) is applied to learn the solution of the transformed Q-functions based on the adaptive IRL scheme. Moreover, an error term is added to the Q-function to address insufficient initial excitation and relax the persistent excitation (PE) condition. Simultaneously, an IRL signal of the critic networks is introduced to study the intractable saddle-point solution, such that the system drift and NN derivatives in the HJI equation are relaxed. The convergence of the weight parameters is proved, and the closed-loop stability of the multi-input system with the proposed IRL Q-function scheme is analyzed. Finally, a two-engine-driven F-16 aircraft plant and a nonlinear system are presented to verify the effectiveness of the proposed adaptive IRL Q-function scheme.

9.
An Optimal Control Model for the Search Problem of a Randomly Moving Target
An optimal control model is proposed for the problem of searching for a randomly moving target undergoing Brownian motion in Rn space. An analytical approach is adopted to study the optimal search problem for the randomly moving target: the original problem is transformed into an equivalent deterministic distributed-parameter system described by a second-order partial differential equation (the HJB equation). The HJB equation for the optimal search problem of the randomly moving target is derived, and it is proved that the solution of this equation is the desired optimal search strategy. On this basis, an algorithm for computing the optimal search strategy and a worked example are given.

10.
Luo Yanhong, Zhang Huaguang, Cao Ning, Chen Bing. Acta Automatica Sinica, 2009, 35(11): 1436-1445
A greedy iterative DHP (dual heuristic programming) algorithm is proposed to solve the near-optimal stabilization problem for a class of nonlinear systems with control constraints. To handle the control constraints, a nonquadratic functional is first introduced to convert the constrained problem into an unconstrained one; then, a greedy iterative DHP algorithm based on the costate function is proposed to solve the HJB (Hamilton-Jacobi-Bellman) equation of the system. At each iteration step, a neural network is used to approximate the costate function of the system, and the optimal control policy is then computed directly from the costate function, which eliminates the action network required in conventional approximate dynamic programming methods. Finally, two simulation examples demonstrate the effectiveness and feasibility of the proposed optimal control scheme.

11.
The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods to build approximate solutions through simulation-based learning have been studied under various names, such as neurodynamic programming (NDP) and approximate dynamic programming (ADP). The aspect of learning connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error based learning. This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations. We focus particularly on the control-affine system with a quadratic objective function and the finite horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method.

12.
In this paper, an observer design is proposed for nonlinear systems. A Hamilton–Jacobi–Bellman (HJB) equation-based formulation is developed, in which the HJB equation is formulated using a suitable non-quadratic term in the performance functional to tackle magnitude constraints on the observer gain. Utilizing Lyapunov's direct method, the observer is proved to be optimal with respect to a meaningful cost. In the present algorithm, a neural network (NN) is used to approximate the value function and thus find an approximate solution of the HJB equation using the least-squares method. With the time-varying HJB solution, we propose a dynamic optimal observer for the nonlinear system. The proposed algorithm has been applied to nonlinear systems over both finite and infinite time horizons. The necessary theoretical and simulation results are presented to validate the proposed algorithm.

13.
Zhang Shaojie, Wu Xue, Liu Chunsheng. Acta Automatica Sinica, 2018, 44(12): 2188-2197
For a class of multi-input multi-output (MIMO) uncertain continuous-time affine nonlinear systems with actuator faults, this paper proposes an optimal adaptive output tracking control scheme. A neural-network weight-tuning algorithm for estimating the uncertain terms is designed that guarantees system stability; using only a critic network, the infinite-horizon cost function and the optimal control input satisfying the Hamilton-Jacobi-Bellman (HJB) equation are obtained simultaneously. Considering stuck-actuator and partial-loss-of-effectiveness faults, an optimal adaptive compensation control law is designed, which achieves uniformly ultimately bounded tracking of the reference output. Flight-control simulations and comparative studies demonstrate the effectiveness and superiority of the proposed method.

14.
In this paper, an integral reinforcement learning (IRL) algorithm on an actor–critic structure is developed to learn online the solution to the Hamilton–Jacobi–Bellman equation for partially-unknown constrained-input systems. The technique of experience replay is used to update the critic weights to solve an IRL Bellman equation. This means that, unlike in existing reinforcement learning algorithms, recorded past experiences are used concurrently with current data to adapt the critic weights. It is shown that using this technique, instead of the traditional persistence of excitation condition, which is often difficult or impossible to verify online, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law. Stability of the proposed feedback control law is shown and the effectiveness of the proposed method is illustrated with simulation examples.
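The role of recorded-data richness can be sketched with a batch least-squares critic fit: the update is solvable exactly when the Gram matrix of the recorded regressors has full rank, a condition checkable directly on stored data, unlike persistence of excitation. A toy sketch (the feature choice and targets below are illustrative, not the paper's IRL regressors):

```python
def critic_ls_update(features, targets):
    """Batch least-squares critic update from recorded data (sketch).
    Solves the 2x2 normal equations G*w = b with
        G = sum(phi_k * phi_k^T),  b = sum(phi_k * target_k).
    A full-rank G plays the role of the 'richness of recorded data'
    condition that replaces persistence of excitation."""
    n = len(features[0])
    G = [[0.0] * n for _ in range(n)]
    bvec = [0.0] * n
    for phi, y in zip(features, targets):
        for i in range(n):
            bvec[i] += phi[i] * y
            for j in range(n):
                G[i][j] += phi[i] * phi[j]
    # 2x2 solve by Cramer's rule (sketch only; a real critic would
    # use a numerically robust solver).
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    assert abs(det) > 1e-12, "recorded data not rich enough"
    w0 = (bvec[0] * G[1][1] - bvec[1] * G[0][1]) / det
    w1 = (bvec[1] * G[0][0] - bvec[0] * G[1][0]) / det
    return [w0, w1]

# Recorded samples phi(x) = [x, x**2] with targets from V(x) = 3*x**2.
data = [([x, x * x], 3.0 * x * x) for x in (1.0, 2.0, 3.0)]
w = critic_ls_update([d[0] for d in data], [d[1] for d in data])
print(w)  # ≈ [0.0, 3.0]
```

Three distinct recorded states already make G invertible here, which is the sense in which a finite batch of rich data suffices where online excitation would otherwise be required.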

15.
This paper considers the near-optimal tracking control problem for discrete-time systems with delayed input. Using a variable transformation, the system with delayed input is transformed into a non-delayed system, and the quadratic performance index of the optimal tracking control is transformed into a relevant format. The optimal tracking control law is constructed by the solution of a Riccati matrix equation and a Stein matrix equation. A reduced-order observer is constructed to solve the physically realizable problem of the feedforward compensator and a near-optimal tracking control is obtained. Simulation results demonstrate the effectiveness of the optimal tracking control law.
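The steady-state solution of the Riccati matrix equation used by such tracking laws can be illustrated, in the scalar case, by iterating the finite-horizon Riccati recursion backward until it converges (a sketch only; the Stein equation that yields the feedforward term is not shown):

```python
def dare_by_recursion(a, b, q, r, steps=200):
    """Scalar discrete-time Riccati recursion (sketch): iterate
        p <- q + a**2*p - (a*b*p)**2 / (r + b**2*p)
    backward in time from the terminal condition P_N = Q; for a
    stabilizable/detectable pair it converges to the steady-state
    solution used by the optimal feedback gain."""
    p = q  # terminal condition
    for _ in range(steps):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

p = dare_by_recursion(a=1.0, b=1.0, q=1.0, r=1.0)
print(p)  # fixed point satisfies p**2 = 1 + p, i.e. p = (1 + sqrt(5))/2
```

The fixed point of the recursion is exactly the algebraic Riccati solution, which is why a long-enough backward pass serves as a simple numerical solver for the steady-state gain.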

16.
In statistical control, the cost function is viewed as a random variable and one optimizes the distribution of the cost function through the cost cumulants. We consider a statistical control problem for a control-affine nonlinear system with a nonquadratic cost function. Using the Dynkin formula, the Hamilton-Jacobi-Bellman equation for the nth cost moment case is derived as a necessary condition for optimality, and corresponding sufficient conditions are also derived. Utilizing the nth moment results, the higher-order cost cumulant Hamilton-Jacobi-Bellman equations are derived. In particular, we derive HJB equations for the second, third, and fourth cost cumulants. Even though moments and cumulants are similar mathematically, in control engineering higher-order cumulant control shows greater promise than cost moment control. We present the solution for a control-affine nonlinear system using the derived Hamilton-Jacobi-Bellman equation, which we solve numerically using a neural network method.
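The mathematical kinship between moments and cumulants noted here comes from the standard moment-cumulant relations; a short sketch converting the first four raw moments of a cost into its cumulants:

```python
def cumulants_from_moments(m1, m2, m3, m4):
    """Convert raw moments of a random cost into its first four
    cumulants via the standard moment-cumulant relations:
        k1 = m1
        k2 = m2 - m1**2                                (variance)
        k3 = m3 - 3*m1*m2 + 2*m1**3
        k4 = m4 - 4*m1*m3 - 3*m2**2 + 12*m1**2*m2 - 6*m1**4
    """
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    k4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return k1, k2, k3, k4

# Sanity check with a Poisson(3)-distributed "cost", whose raw moments
# are (3, 12, 57, 309): every cumulant of a Poisson(lam) equals lam.
ks = cumulants_from_moments(3.0, 12.0, 57.0, 309.0)
print(ks)  # → (3.0, 3.0, 3.0, 3.0)
```

Because each cumulant is a fixed polynomial in the moments, an HJB equation for the nth moment can be re-expressed as one for the nth cumulant, which is the step the abstract builds on.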

17.
In this study, a finite-time online optimal controller was designed for a nonlinear wheeled mobile robotic system (WMRS) with inequality constraints, based on reinforcement learning (RL) neural networks. In addition, an extended cost function, obtained by introducing a penalty function to the original long-time cost function, was proposed to deal with the optimal control problem of the system with inequality constraints. A novel Hamilton-Jacobi-Bellman (HJB) equation containing the constraint conditions was defined to determine the optimal control input. Furthermore, two neural networks (NNs), a critic and an actor NN, were established to approximate the extended cost function and the optimal control input, respectively. The adaptation laws of the critic and actor NN were obtained with the gradient descent method. The semi-global practical finite-time stability (SGPFS) was proved using Lyapunov's stability theory. The tracking error converges to a small region near zero within the constraints in a finite period. Finally, the effectiveness of the proposed optimal controller was verified by a simulation based on a practical wheeled mobile robot model.

18.
This paper considers mobile to base station power control for lognormal fading channels in wireless communication systems within a centralized information stochastic optimal control framework. Under a bounded power rate of change constraint, the stochastic control problem and its associated Hamilton-Jacobi-Bellman (HJB) equation are analyzed by the viscosity solution method; then the degenerate HJB equation is perturbed to admit a classical solution and a suboptimal control law is designed based on the perturbed HJB equation. When a quadratic type cost is used without a bound constraint on the control, the value function is a classical solution to the degenerate HJB equation and the feedback control is affine in the system power. In addition, in this case we develop approximate, but highly scalable, solutions to the HJB equation in terms of a local polynomial expansion of the exact solution. When the channel parameters are not known a priori, one can obtain on-line estimates of the parameters and get adaptive versions of the control laws. In numerical experiments with both of the above cost functions, the following phenomenon is observed: whenever the users have different initial conditions, there is an initial convergence of the power levels to a common level and then subsequent approximately equal behavior which converges toward a stochastically varying optimum.

19.
In this paper, a novel theoretic formulation based on adaptive dynamic programming (ADP) is developed to solve online the optimal tracking problem of the continuous-time linear system with unknown dynamics. First, the original system dynamics and the reference trajectory dynamics are transformed into an augmented system. Then, under the same performance index with the original system dynamics, an augmented algebraic Riccati equation is derived. Furthermore, the solutions for the optimal control problem of the augmented system are proven to be equal to the standard solutions for the optimal tracking problem of the original system dynamics. Moreover, a new online algorithm based on the ADP technique is presented to solve the optimal tracking problem of the linear system with unknown system dynamics. Finally, simulation results are given to verify the effectiveness of the theoretic results.

20.
In this article, the event-triggered optimal tracking control problem for multiplayer unknown nonlinear systems is investigated using adaptive critic designs. By constructing a neural network (NN)-based observer with input–output data, the dynamics of the multiplayer unknown nonlinear systems are obtained. Subsequently, the optimal tracking control problem is converted into an optimal regulation problem by establishing a tracking error system. Then, the optimal tracking control policy for each player is derived by solving the coupled event-triggered Hamilton-Jacobi (HJ) equations via a critic NN. Meanwhile, a novel weight-updating rule is designed by adopting the concurrent learning method to relax the persistence of excitation (PE) condition. Moreover, an event-triggering condition is designed using Lyapunov's direct method to guarantee the uniform ultimate boundedness (UUB) of the closed-loop multiplayer systems. Finally, the effectiveness of the developed method is verified on two different multiplayer nonlinear systems.
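The effect of an event-triggering condition can be sketched with a scalar plant: the controller holds the last sampled state and only resamples (an "event") when the gap between current and sampled state crosses a threshold, keeping the closed loop ultimately bounded with far fewer updates than time steps. The plant, gains, and threshold below are illustrative, not taken from the article:

```python
def simulate_event_triggered(x0=1.0, steps=50, delta=0.05):
    """Event-triggered feedback sketch for the unstable scalar plant
    x+ = 1.1*x + u under the held-state law u = -0.6*x_hat.
    The state x_hat is resampled only when |x - x_hat| > delta,
    so the trajectory stays ultimately bounded (UUB-style behavior)
    while transmitting far fewer samples than time steps."""
    x, x_hat, events = x0, x0, 1  # the initial sample counts as an event
    for _ in range(steps):
        if abs(x - x_hat) > delta:   # event-triggering condition
            x_hat = x
            events += 1
        x = 1.1 * x - 0.6 * x_hat    # closed-loop state update
    return x, events

x_final, events = simulate_event_triggered()
print(abs(x_final), events)
```

Between events the loop runs open with a stale sample, so the state drifts until the triggering condition fires again; the threshold delta trades communication load against the size of the ultimate bound.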


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号