
1.

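Data-based adaptive critic optimal control for zero-sum games of discrete 2-D systems (Total citations: 1, self-citations: 1, citations by others: 0)

Qinglai Wei, Huaguang Zhang, Lili Cui 《Acta Automatica Sinica》2009,35(6):682-692

An iterative adaptive critic design (ACD) algorithm is proposed to solve the two-player zero-sum game problem for a class of discrete-time Roesser-type 2-D systems. The main idea is to use the adaptive critic technique to iteratively obtain the optimal control pair that drives the performance index function to the saddle point of the zero-sum game. The proposed ACD can be implemented from input-output data without a model of the system. To implement the iterative ACD algorithm, neural networks are used to approximate the performance index function and to compute the optimal control law, respectively. Finally, the optimal control strategy is applied to the control of an air drying process to demonstrate its effectiveness.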
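
The saddle-point condition referred to in this abstract can be written out as follows. The notation is an assumption and is not taken from the paper: $J(u, w)$ is the performance index, $u$ is the control that minimizes it, and $w$ is the control that maximizes it.

```latex
J(u^{*}, w) \;\le\; J(u^{*}, w^{*}) \;\le\; J(u, w^{*})
\quad \text{for all admissible } u,\, w,
\qquad
J(u^{*}, w^{*}) \;=\; \min_{u} \max_{w} J(u, w) \;=\; \max_{w} \min_{u} J(u, w).
```

When such a pair $(u^{*}, w^{*})$ exists, neither player can improve its outcome by deviating unilaterally, which is why the iteration targets it.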

2.

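Approximate optimal stabilization of a class of nonlinear systems with control constraints based on a single-network greedy iterative DHP algorithm

Yanhong Luo, Huaguang Zhang, Ning Cao, Bing Chen 《Acta Automatica Sinica》2009,35(11):1436-1445

A greedy iterative DHP (dual heuristic programming) algorithm is proposed to solve the approximate optimal stabilization problem for a class of nonlinear systems with control constraints. To handle the control constraints, a nonquadratic functional is first introduced to convert the constrained problem into an unconstrained one; a greedy iterative DHP algorithm based on the costate function is then proposed to solve the HJB (Hamilton-Jacobi-Bellman) equation of the system. At each iteration step, a neural network approximates the costate function, and the optimal control policy is then computed directly from the costate function, which eliminates the action network used in conventional approximate dynamic programming methods. Finally, two simulation examples demonstrate the effectiveness and feasibility of the proposed optimal control scheme.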
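
The abstract does not state which nonquadratic functional is used. A common choice in the constrained-input literature is W(u) = 2 r ∫₀ᵘ u_max · atanh(v / u_max) dv for a scalar input with |u| < u_max. The sketch below assumes that form; the function names and the parameters `u_max` and `r` are hypothetical. It evaluates the closed form of the integral and checks it against numerical integration.

```python
import math


def nonquadratic_cost(u: float, u_max: float, r: float = 1.0) -> float:
    """Closed form of W(u) = 2*r * integral_0^u u_max*atanh(v/u_max) dv.

    Assumed form (not taken from the paper); valid only for |u| < u_max.
    """
    if abs(u) >= u_max:
        raise ValueError("control must satisfy |u| < u_max")
    ratio = u / u_max
    return (2.0 * r * u_max * u * math.atanh(ratio)
            + r * u_max ** 2 * math.log(1.0 - ratio ** 2))


def numeric_cost(u: float, u_max: float, r: float = 1.0, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the same integral, used as a cross-check."""
    h = u / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += u_max * math.atanh(v / u_max) * h
    return 2.0 * r * total
```

For small inputs this cost behaves like the usual quadratic term r·u², and it grows without bound as |u| approaches `u_max`, which is what discourages the optimizer from reaching the constraint.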

3.

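Optimal control strategy based on adaptive dynamic programming for a class of discrete-time nonlinear systems with time delays (Total citations: 4, self-citations: 3, citations by others: 1)

Qinglai Wei, Huaguang Zhang, Derong Liu, Yan Zhao 《Acta Automatica Sinica》2010,36(1):121-129

For the optimal control problem with a quadratic performance index function for a class of nonlinear systems with time delays in both the state and control variables, this paper proposes an optimal control scheme based on a new iterative adaptive dynamic programming algorithm. By introducing a delay matrix function and applying dynamic programming theory, an explicit expression for the optimal control is obtained, and the optimal control is then computed by the adaptive critic technique. A convergence proof is given to guarantee that the performance index function converges to the optimum. To implement the proposed algorithm, neural networks are used to approximate the performance index function, compute the optimal control policy, solve for the delay matrix function, and model the nonlinear system. Finally, two simulation examples illustrate the effectiveness of the proposed optimal policy.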

4.

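Event-triggered control design for optimal tracking of unknown nonlinear zero-sum games

Ding Wang, Lingzhi Hu, Mingming Zhao, Mingming Ha, Junfei Qiao 《Acta Automatica Sinica》2023,49(1):91-101

An event-based iterative adaptive critic algorithm is designed to solve the zero-sum game optimal tracking control problem for a class of nonaffine systems. The steady control of the reference trajectory is obtained by a numerical method, and the zero-sum game optimal tracking control problem of the unknown nonlinear system is thereby transformed into an optimal regulation problem for the error system. To improve resource utilization while keeping good control performance of the closed-loop system, a suitable event-triggering condition is introduced to obtain a tracking policy pair that is updated only intermittently. Then, based on the designed triggering condition, the Lyapunov method is used to prove the asymptotic stability of the error system. Four neural networks are then constructed to implement the proposed algorithm. To improve the accuracy of the steady control corresponding to the target trajectory, the model network directly approximates the unknown system function rather than the error dynamics. A critic network, an action network, and a disturbance network are constructed to approximate the iterative cost function and the iterative tracking policy pair. Finally, two simulation examples verify the feasibility and effectiveness of the control method.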

5.

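Finite-time differential game guidance law based on adaptive optimal control (Total citations: 1, self-citations: 0, citations by others: 1)

Yanni Chen, Chunsheng Liu, Jingliang Sun 《Control Theory & Applications》2019,36(6):877-884

For guidance systems that intercept a maneuvering target at a fixed terminal time, this paper first constructs a nonlinear finite-time differential game framework, transforms the optimal problem of the nonlinear missile interception system into an optimal control problem for a general nonlinear system, and obtains the approximate optimal value function and optimal control policy through an adaptive dynamic programming (ADP) algorithm. To implement the algorithm effectively, a critic network with time-varying weights and activation functions is used to approximate the solution of the Hamilton-Jacobi-Isaacs (HJI) equation and is updated online. The Lyapunov method is used to prove that the proposed control policy guarantees the stability of the closed-loop differential game system and the boundedness of the critic network weight approximation error. Finally, a simulation example of a nonlinear missile-target interception system verifies the feasibility and effectiveness of the method.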

6.

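A survey of adaptive dynamic programming (Total citations: 24, self-citations: 14, citations by others: 10)

Huaguang Zhang, Xin Zhang, Yanhong Luo, Jun Yang 《Acta Automatica Sinica》2013,39(4):303-311

Adaptive dynamic programming (ADP) is a newly emerging approximate optimal method in the field of optimal control and a current research focus in the international optimization community. ADP uses function approximation structures to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation and obtains an approximate optimal control policy through offline iteration or online updating, and can therefore effectively solve optimal control problems for nonlinear systems. This paper reviews ADP from three aspects: variations in ADP structure, the development of algorithms, and applications. It summarizes current research results on ADP and discusses open problems and future directions in this field.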
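
The offline iteration described here can be illustrated with the simplest possible case. This sketch is an assumption for illustration only and is not taken from the survey: for a scalar discrete-time linear system x[k+1] = a·x[k] + b·u[k] with stage cost q·x² + r·u², the Bellman equation reduces to a scalar Riccati equation and the value function is V(x) = P·x². Value iteration then needs no function approximator. The function names are hypothetical.

```python
def riccati_value_iteration(a: float, b: float, q: float, r: float,
                            tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate P <- q + a^2 P - (a b P)^2 / (r + b^2 P), starting from P = 0.

    Each pass is one value-iteration step for V_i(x) = P_i * x^2.
    """
    p = 0.0
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("value iteration did not converge")


def optimal_gain(a: float, b: float, r: float, p: float) -> float:
    """Feedback gain K of the greedy policy u = -K x for the value function P x^2."""
    return a * b * p / (r + b * b * p)
```

In the nonlinear setting covered by the survey, the scalar P is replaced by a function approximator such as a neural network, but the alternation between value update and greedy policy is the same idea.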

7.

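Event-driven optimal control scheme for discrete-time nonlinear zero-sum games

Xin Zhang, Yingchun Bo, Lili Cui 《Control Theory & Applications》2018,35(5):619-626

To reduce network communication and the number of controller executions while maintaining good control performance when solving discrete-time nonlinear zero-sum games, this paper proposes an optimal control scheme based on an event-driven mechanism. First, an event-driven condition with a new event-driven threshold is designed, and the expression of the optimal control pair is obtained from Bellman's principle of optimality. To solve for the optimal value function in this expression, a single-network value iteration algorithm is proposed: one neural network is used to construct the critic network, and a new weight update rule is designed for it. By iterating among the critic network, the control policy, and the disturbance policy, the optimal value function and the optimal control pair of the zero-sum game are finally obtained. Then, the stability of the closed-loop system is proved using Lyapunov stability theory. Finally, the event-driven optimal control scheme is applied to two simulation examples, which verify the effectiveness of the proposed method.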
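
The event-driven mechanism can be sketched on a toy problem. This is a generic illustration under stated assumptions, not the paper's scheme: it uses a scalar linear system, a fixed feedback gain, and a simple relative-threshold trigger |x_held - x| > sigma·|x| in place of the paper's new threshold, with no disturbance player and no neural network. All names and parameter values are hypothetical.

```python
def simulate_event_triggered(a: float = 1.0, b: float = 1.0, gain: float = 0.05,
                             sigma: float = 0.5, x0: float = 1.0,
                             steps: int = 70) -> tuple[float, int]:
    """Simulate x[k+1] = a*x[k] + b*u[k] with u[k] = -gain * x_held.

    x_held is the last sampled state; it is refreshed only when the gap
    between it and the current state exceeds sigma * |x|.
    Returns the final state and the number of controller updates.
    """
    x = x0
    x_held = None
    updates = 0
    for _ in range(steps):
        # Event check: sample the state only when the trigger fires.
        if x_held is None or abs(x_held - x) > sigma * abs(x):
            x_held = x
            updates += 1
        u = -gain * x_held  # control is held constant between events
        x = a * x + b * u
    return x, updates
```

With the default values, the controller updates 10 times over 70 steps while the state still decays toward zero, which is the trade-off the abstract describes: fewer executions at the cost of a slower, but still stable, response.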

8.

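Energy-saving modeling and online optimization of multimedia server cluster systems

Han Hu, Jian Yang, Liyue Zhu, Hongsheng Xi 《Information and Control》2013,(1):125-131

An energy-optimal control method for multimedia server cluster systems is proposed based on a Markov switching state-space control model. By establishing a stochastic control model of the multimedia server cluster, energy-optimal control is formulated as a constrained optimization problem. Combining the Lagrange multiplier method with performance potential theory, an online policy iteration algorithm is proposed. The algorithm searches for the optimal control policy online along sample paths, and the search does not require exact system parameter information. Simulation experiments demonstrate the effectiveness of the algorithm.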

9.

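Robust approximate optimal tracking control of nonlinear affine systems for time-varying trajectories

Qiuxia Qu, Yanhong Luo, Huaguang Zhang 《Control Theory & Applications》2016,33(1):77-84

To address the difficulty of tracking time-varying trajectories with nonlinear continuous-time systems, this paper first introduces new state variables through a system transformation, converting the optimal tracking problem of the nonlinear system into an optimal control problem for a general nonlinear time-invariant system, and obtains the approximate optimal value function and optimal control policy using an approximate dynamic programming (ADP) algorithm. To implement the algorithm effectively, a critic network and an action network are used to estimate the value function and the corresponding control policy, and both are updated online. To eliminate the error arising from the neural network approximation, a robust term is added to the controller design. The Lyapunov stability theorem is used to prove that the proposed control policy guarantees asymptotic convergence of the tracking error to zero, and it is also shown that, within a small error bound, the control policy is close to the optimal control policy. Finally, two time-varying trajectory tracking examples demonstrate the feasibility and effectiveness of the method.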

10.

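Optimal control design by adaptive dynamic programming based on a generalized fuzzy hyperbolic model

Jilie Zhang, Huaguang Zhang, Yanhong Luo, Hongjing Liang 《Acta Automatica Sinica》2013,39(2):142-149

An effective optimal control design method is proposed for continuous-time nonlinear systems. The generalized fuzzy hyperbolic model (GFHM) is used for the first time as an approximator to estimate the solution of the HJB (Hamilton-Jacobi-Bellman) equation (the value function, that is, the mapping between the state and the cost function), and this approximate solution is then used to obtain the optimal control. The method requires only a single GFHM to estimate the value function. First, the design procedure of the optimal control for continuous-time nonlinear systems is described; then, the approximation error is proved to be uniformly ultimately bounded (UUB); finally, a numerical example verifies the effectiveness of the method. A second example demonstrates its advantages through a comparison with neural-network-based adaptive dynamic programming.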
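
The step from an approximate HJB solution to the control law can be made explicit under standard assumptions that the abstract itself does not state: input-affine dynamics $\dot{x} = f(x) + g(x)u$ and an infinite-horizon cost with state penalty $Q(x)$ and input weight $R \succ 0$. The symbols below are illustrative.

```latex
V^{*}(x_0) = \min_{u(\cdot)} \int_{0}^{\infty} \bigl( Q(x) + u^{\top} R\, u \bigr)\, dt,
\qquad
0 = \min_{u} \Bigl[ Q(x) + u^{\top} R\, u + \nabla V^{*}(x)^{\top} \bigl( f(x) + g(x)\, u \bigr) \Bigr],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
```

Because $u^{*}$ depends only on the gradient of the value function, a single approximator for $V^{*}$ is enough to recover the control, which is consistent with the abstract's remark that only one GFHM is needed.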

11.

Data‐driven optimal event‐triggered consensus control for unknown nonlinear multiagent systems with control constraints

Huaipin Zhang, Ju H. Park, Dong Yue, Chunxia Dou 《International Journal of Robust and Nonlinear Control》2019,29(14):4828-4844

This paper considers the optimal consensus control problem for unknown nonlinear multiagent systems (MASs) subject to control constraints by utilizing the event‐triggered adaptive dynamic programming (ETADP) technique. To deal with the control constraints, we introduce nonquadratic energy consumption functions into the performance indices and formulate the Hamilton‐Jacobi‐Bellman (HJB) equations. Then, based on Bellman's optimality principle, constrained optimal consensus control policies are designed from the HJB equations. In order to implement the ETADP algorithm, critic networks and action networks are developed to approximate the value functions and consensus control policies, respectively, based on the measurable system data. Under the event‐triggered control framework, the weights of the critic networks and action networks are updated only at the triggering instants, which are decided by the designed adaptive triggering conditions. The Lyapunov method is used to prove that the local neighbor consensus errors and the weight estimation errors of the critic networks and action networks are ultimately bounded. Finally, a numerical example is provided to show the effectiveness of the proposed ETADP method.

12.

Neural-based online finite-time optimal tracking control for wheeled mobile robotic system with inequality constraints

Liang Ding, Miao Zheng, Shu Li, Huaiguang Yang, Haibo Gao, Zongquan Deng 《Asian journal of control》2024,26(1):297-311

In this study, a finite-time online optimal controller was designed for a nonlinear wheeled mobile robotic system (WMRS) with inequality constraints, based on reinforcement learning (RL) neural networks. In addition, an extended cost function, obtained by introducing a penalty function to the original long-time cost function, was proposed to deal with the optimal control problem of the system with inequality constraints. A novel Hamilton-Jacobi-Bellman (HJB) equation containing the constraint conditions was defined to determine the optimal control input. Furthermore, two neural networks (NNs), a critic and an actor NN, were established to approximate the extended cost function and the optimal control input, respectively. The adaptation laws of the critic and actor NN were obtained with the gradient descent method. The semi-global practical finite-time stability (SGPFS) was proved using Lyapunov's stability theory. The tracking error converges to a small region near zero within the constraints in a finite period. Finally, the effectiveness of the proposed optimal controller was verified by a simulation based on a practical wheeled mobile robot model.

13.

Observer-based Adaptive Optimal Control for Unknown Singularly Perturbed Nonlinear Systems With Input Constraints

Zhijun Fu, Wenfang Xie, Subhash Rakheja, Jing Na 《IEEE/CAA Journal of Automatica Sinica》2017,4(1):48-57

This paper introduces an observer-based adaptive optimal control method for unknown singularly perturbed nonlinear systems with input constraints. First, a multi-time-scale dynamic neural network (MTSDNN) observer with a novel updating law derived from a properly designed Lyapunov function is proposed to estimate the system states. Then, an adaptive learning rule driven by the critic NN weight error is presented for the critic NN, which is used to approximate the optimal cost function. Finally, the optimal control action is calculated by solving the Hamilton-Jacobi-Bellman (HJB) equation online using the MTSDNN observer and the critic NN. The stability of the overall closed-loop system consisting of the MTSDNN observer, the critic NN and the optimal control action is proved. The essential advantage of the proposed observer-based optimal control approach is that the system dynamics are not needed for implementation; only the measured input/output data are needed. Moreover, the proposed optimal control design takes the input constraints into consideration and thus can overcome the restriction of actuator saturation. Simulation results are presented to confirm the validity of the investigated approach.

14.

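Event-triggered adaptive critic fault-tolerant control for discrete-time MIMO systems

Min Wang, Longwang Huang, Chenguang Yang 《Acta Automatica Sinica》2022,48(5):1234-1245

For a class of discrete-time nonlinear multi-input multi-output (MIMO) systems with actuator faults, this paper proposes an event-triggered adaptive critic fault-tolerant control scheme. The scheme comprises a critic network and an action network. In the critic network, to mitigate the jumps in the action network that the existing nonsmooth binary utility function may cause, a smooth utility function is constructed from a Gaussian function, and the critic network is used to approximate the optimal performance index function. In the action network, a variable substitution converts future information about the system state into a function of the current system state, and an optimal tracking controller is designed in combination with the event-triggering mechanism. The controller introduces a dynamic compensation term that not only suppresses the effect of actuator faults on system performance but also improves the control performance of the system. Stability analysis shows that all signals are uniformly ultimately bounded and that the tracking error converges to a small bounded neighborhood of the origin. Simulation results on a numerical system and a practical system verify the effectiveness of the scheme.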
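
The contrast between a binary and a Gaussian-based utility can be sketched as follows. The abstract does not give either formula, so both functions below are assumed forms chosen for illustration, and the names `threshold` and `width` are hypothetical parameters.

```python
import math


def binary_utility(error: float, threshold: float = 0.1) -> float:
    """Nonsmooth utility: 0 inside the tolerance band, 1 outside (assumed form)."""
    return 0.0 if abs(error) <= threshold else 1.0


def smooth_utility(error: float, width: float = 0.1) -> float:
    """Smooth utility built from a Gaussian: 0 at zero error, approaching 1
    as the error grows (assumed form, not taken from the paper)."""
    return 1.0 - math.exp(-error ** 2 / (2.0 * width ** 2))
```

The binary version jumps from 0 to 1 as the tracking error crosses the threshold, so a small change in error can cause a large change in the critic's training signal. The smooth version changes gradually, which is the property the abstract credits with reducing jumps in the action network.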

15.

Event-triggered-based integral reinforcement learning output feedback optimal control for partially unknown constrained-input nonlinear systems

Haoming Zou, Guoshan Zhang 《Asian journal of control》2023,25(5):3843-3858

In this paper, an adaptive output feedback event-triggered optimal control algorithm is proposed for partially unknown constrained-input continuous-time nonlinear systems. First, a neural network observer is constructed to estimate the unmeasurable state. Next, an event-triggered condition is established; the event is triggered and the state is sampled only when this condition is violated. Then, an event-triggered-based synchronous integral reinforcement learning (ET-SIRL) control algorithm with a critic-actor neural network (NN) architecture is proposed to solve the event-triggered Hamilton–Jacobi–Bellman equation under the established event-triggered condition. The critic and actor NNs are used to approximate the cost function and the optimal event-triggered control law, respectively. Meanwhile, the event-triggered closed-loop system state and all the neural network weight estimation errors are proved to be uniformly ultimately bounded by Lyapunov stability theory, and Zeno behavior is excluded. Finally, two numerical examples are presented to show the effectiveness of the proposed ET-SIRL control algorithm.

16.

Event-Triggered Optimal Adaptive Control Algorithm for Continuous-Time Nonlinear Systems

Kyriakos G. Vamvoudakis 《IEEE/CAA Journal of Automatica Sinica》2014,1(3):282-293

This paper proposes a novel optimal adaptive event-triggered control algorithm for nonlinear continuous-time systems. The goal is to reduce the controller updates by sampling the state only when an event is triggered, while maintaining stability and optimality. The online algorithm is implemented based on an actor/critic neural network structure. A critic neural network is used to approximate the cost and an actor neural network is used to approximate the optimal event-triggered controller. Since the proposed algorithm has dynamics that exhibit both continuous evolutions described by ordinary differential equations and instantaneous jumps or impulses, we use an impulsive system approach. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, we illustrate the effectiveness of the proposed solution compared to a time-triggered controller.

17.

Neural-network-observer-based optimal control for unknown nonlinear systems using adaptive dynamic programming

Derong Liu, Yuzhu Huang, Ding Wang, Qinglai Wei 《International journal of control》2013,86(9):1554-1566

In this paper, an observer-based optimal control scheme is developed for unknown nonlinear systems using an adaptive dynamic programming (ADP) algorithm. First, a neural-network (NN) observer is designed to estimate system states. Then, based on the observed states, a neuro-controller is constructed via the ADP method to obtain the optimal control. In this design, two NN structures are used: a three-layer NN is used to construct the observer, which can be applied to systems with higher degrees of nonlinearity and without a priori knowledge of system dynamics, and a critic NN is employed to approximate the value function. The optimal control law is computed using the critic NN and the observer NN. Uniform ultimate boundedness of the closed-loop system is guaranteed. The actor, critic, and observer structures are all implemented in real time, continuously and simultaneously. Finally, simulation results are presented to demonstrate the effectiveness of the proposed control scheme.

18.

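Application of an ESN-based multi-index DHP control strategy in wastewater treatment processes (Total citations: 1, self-citations: 0, citations by others: 1)

Junfei Qiao, Yingchun Bo, Guang Han 《Acta Automatica Sinica》2013,39(7):1146-1151

For the control of dissolved oxygen (DO) and nitrate nitrogen concentrations in the wastewater treatment process (WWTP), a DHP (dual heuristic dynamic programming) control strategy with multiple critic indices is proposed. The strategy reduces the complexity of the critic indices and improves the approximation accuracy of the critic network. Echo state networks (ESNs) are used to approximate the critic function and the control policy, and an online learning algorithm for the controller is studied. Experiments show that the strategy outperforms both the single-critic-index DHP strategy and a conventional PID control strategy in control performance.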