Similar Literature
19 similar articles retrieved.
1.
Because action-policy learning in reinforcement learning is time-consuming, a heuristic reinforcement learning method based on state backtracking is proposed. Repeated states arising during reinforcement learning are analyzed, and by comparing the selection policies of repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. Combining action reward with action cost, a new heuristic-function definition is proposed. While emphasizing action importance to accelerate learning, the heuristic function uses the cost function to compute the cost of each action selection and thus reduce unnecessary exploration, steadily improving learning efficiency. The cost-function-based action-selection policy is proven. Two simulation scenarios are built and the algorithm is applied to robot path-planning simulations. The experimental results show that the state-backtracking heuristic reinforcement learning method balances the reward obtained against the cost incurred and effectively improves the convergence speed of Q-learning.
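A minimal sketch of this scheme in tabular Q-learning form; the heuristic table H, cost table C, the weighting xi, and the update rules below are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
H = np.zeros((n_states, n_actions))   # action-importance heuristic (assumed form)
C = np.zeros((n_states, n_actions))   # cost of repeated/ineffective actions (assumed form)
alpha, gamma, xi = 0.1, 0.95, 0.5     # xi weights the heuristic shaping term

def select_action(s, eps=0.1):
    """Epsilon-greedy over Q plus the (heuristic - cost) shaping term,
    so important actions are preferred and costly repeats are avoided."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s] + xi * (H[s] - C[s])))

def update(s, a, r, s_next, revisited):
    """Standard Q-learning step; penalize actions that revisit known states."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    if revisited:
        C[s, a] += 1.0          # discourage unnecessary exploration
    else:
        H[s, a] += max(r, 0.0)  # reinforce actions that paid off
```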

2.
赵中原  陈刚 《控制与决策》2019,34(8):1635-1644
For quadratic convex optimization problems with equality constraints in multi-agent systems, a distributed optimization algorithm under an event-triggered mechanism is presented. The algorithm reduces the update frequency of each agent's control protocol and the communication burden between agents. Based on graph theory and the Lyapunov function method, two different event-triggering conditions are given; the second does not require knowledge of the largest eigenvalue of the Laplacian matrix, so the algorithm can be implemented in a fully distributed manner. Both triggering conditions guarantee asymptotic convergence to the optimal value while avoiding continuous updating of the agents' control protocols and continuous communication between agents, and they ensure that the interval between consecutive triggering instants of each agent is strictly positive, avoiding persistent event triggering. The proposed algorithm is verified in the Matlab simulation environment, and the simulation results confirm its effectiveness.

3.
For a class of nonlinear zero-sum differential game problems, this paper proposes an event-triggered adaptive dynamic programming (ET-ADP) algorithm to solve for the saddle point online. First, a new adaptive event-triggering condition is proposed. Then, a neural network whose input is the sampled data (the critic network) approximates the optimal value function, and a new weight-update law is designed so that the value function, control policy, and disturbance policy are updated synchronously only at the triggering instants. Furthermore, Lyapunov stability theory is used to prove that the proposed algorithm obtains the saddle point of the nonlinear zero-sum differential game online without exhibiting Zeno behavior. Because the ET-ADP algorithm updates the value function, control policy, and disturbance policy only when the triggering condition is satisfied, it effectively reduces computation and network load. Finally, two simulation examples verify the effectiveness of the proposed ET-ADP algorithm.
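As a rough illustration of the event-triggered update pattern described above (the threshold form and names here are assumptions, not the paper's adaptive triggering condition), the policies would only be recomputed when a state-dependent trigger fires:

```python
import numpy as np

def should_trigger(x, x_last_sampled, beta=0.1):
    """Fire an event when the gap between the current state and the last
    sampled state exceeds a state-dependent threshold; between events the
    last computed policies are held constant. The beta*||x|| threshold is
    illustrative only."""
    return np.linalg.norm(x - x_last_sampled) > beta * np.linalg.norm(x)
```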

4.
For optimal control problems of systems with input constraints and unmeasurable states, this paper combines the actor-critic near-optimal algorithm from reinforcement learning with backstepping and proposes an optimal tracking control strategy. First, a neural network is used to construct a nonlinear observer that estimates the system's unmeasurable states. Then, a non-quadratic utility function is designed to handle the input constraints. Compared with existing optimal methods, the proposed optimal tracking control method not only retains backstepping's advantages in handling...

5.
An event-based iterative adaptive critic algorithm is designed to solve the zero-sum-game optimal tracking control problem for a class of non-affine systems. The steady control of the reference trajectory is obtained numerically, transforming the zero-sum-game optimal tracking control problem of the unknown nonlinear system into an optimal regulation problem for the error system. To improve resource utilization while preserving good closed-loop control performance, a suitable event-triggering condition is introduced to obtain a tracking policy pair that is updated in stages. Then, based on the designed triggering condition, the Lyapunov method is used to prove asymptotic stability of the error system. Next, four neural networks are constructed to facilitate implementation of the proposed algorithm. To improve the accuracy of the steady control corresponding to the target trajectory, the model network directly approximates the unknown system function rather than the error dynamics. A critic network, an actor network, and a disturbance network are built to approximate the iterative cost function and the iterative tracking policy pair. Finally, two simulation examples verify the feasibility and effectiveness of the control method.

6.
Obstacle avoidance is a key component of motion planning for UAVs and other autonomous unmanned systems, and its core is the design of effective avoidance control methods. To further improve decision optimality and control performance, this paper proposes a reinforcement-learning-based autonomous obstacle-avoidance control method within the optimal control setting, generating safe trajectories online in an adaptive fashion. First, the barrier-function approach is used to design a smooth reward-penalty term in the cost function, converting the avoidance problem into an unconstrained optimal control problem. Then, adaptive reinforcement learning is realized with actor-critic neural networks and policy iteration, where the critic network approximates the cost function with state-following kernel functions and the actor network yields a near-optimal control policy; meanwhile, simulated experience is obtained via state extrapolation, allowing the critic network to carry out reliable local exploration through experience replay. Finally, simulations and comparisons on a simplified UAV system and a nonlinear numerical system show that the proposed avoidance control method generates near-optimal safe trajectories in real time.
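A hedged sketch of folding the obstacle constraint into the running cost with a smooth penalty, as the abstract describes; the inverse-barrier form, circular obstacles, and the assumption that the first two states are position coordinates are all illustrative, not the paper's reward-penalty function:

```python
import numpy as np

def running_cost(x, u, obstacles, Q, R, mu=1.0):
    """Quadratic state/control cost plus a smooth barrier that grows without
    bound as the position approaches an obstacle boundary, turning the
    constrained avoidance problem into an unconstrained optimal control one."""
    cost = x @ Q @ x + u @ R @ u
    for center, radius in obstacles:            # circular obstacles, assumed
        margin = np.sum((x[:2] - center) ** 2) - radius ** 2
        cost += mu / margin if margin > 1e-9 else 1e9   # barrier term
    return cost
```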

7.
Filtering for Markov Switching Nonlinear Systems under Jump Constraints
To address multi-modal uncertainty in state evolution and the diversity of state constraints, this paper proposes an interacting multiple-hypothesis estimation method for Markov switching nonlinear systems under jump constraints. A hypothesis set containing the possible values of the jump Markov parameter is defined, and the posterior-probability recursions of the state and hypotheses are derived from optimal Bayesian filtering. Nonlinear functions are linearized via statistical linear regression, and the pseudo-measurement method augments the linearized constraints into the true measurements, yielding an approximate analytical optimal solution for nonlinear filtering. Finally, a sparse-grid quadrature implementation of the proposed approximately optimal estimator is given. In a simulated scenario of maneuvering target tracking at a road intersection, the filtering accuracy of the proposed algorithm exceeds that of the Taylor-expansion-based interacting multiple model algorithm, the statistical-linear-regression-based interacting multiple model algorithm, and the Taylor-expansion-based constrained filtering algorithm for nonlinear systems.

8.
This paper studies event-triggered finite-time stabilization of magnetic levitation systems with output constraints. Using a finite-time triggering strategy with a time-varying threshold and a tangent-type barrier function, a new event-triggered controller is designed that not only keeps the levitation air gap within a specified range but also drives the states to the origin in finite time. The novelty lies in unifying the control design and theoretical analysis for the cases with and without output constraints. Simulations finally demonstrate the effectiveness of the control strategy.

9.
霍煜  王鼎  乔俊飞 《控制与决策》2023,38(11):3066-3074
For a class of uncertain continuous-time nonlinear systems, a robust tracking control method based on single-critic learning is proposed. First, an augmented system composed of the tracking error and the reference trajectory is constructed, converting the robust tracking control problem into a stabilization design problem. By adopting a cost function with a discount factor and a special utility term, the robust stabilization problem is converted into an optimal control problem. Then, a critic neural network is constructed to estimate the optimal cost function, yielding the optimal tracking control algorithm. To relax the algorithm's requirement of an initial admissible control, an additional term is added to the critic network's weight-update law. The Lyapunov method is used to prove closed-loop stability and robust tracking performance. Finally, simulation results verify the effectiveness and applicability of the method.

10.
李金娜  尹子轩 《控制与决策》2019,34(11):2343-2349
For the tracking control problem of networked control systems with packet dropouts, an off-policy Q-learning method is proposed that uses only measurable data, enabling the system to track the target in a near-optimal manner when the model parameters are unknown and the network communication suffers packet loss. First, networked control systems with packet dropouts are characterized, and the tracking control problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to compensate for the effect of packet loss on system performance, and the optimal tracking control problem with packet-loss compensation is constructed. Finally, combining dynamic programming and reinforcement learning, an off-policy Q-learning algorithm is proposed. Its advantages are that it does not require known model parameters, that it learns the optimal tracking control policy with predictor-state feedback from the measurable data of the networked control system, and that it guarantees unbiasedness of the solution of the Q-function-based iterative Bellman equation. Simulations verify the effectiveness of the proposed method.
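A minimal sketch of the packet-loss compensation step: when a measurement packet is dropped, a Smith-predictor-style model rolls the last state forward with the applied inputs so that feedback still acts on a sensible estimate. The linear model here is shown explicitly purely for illustration; the paper's point is that the gain is learned off-policy from data without model parameters:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant model
B = np.array([[0.005], [0.1]])
K = np.array([[0.8, 0.5]])               # tracking gain (learned off-policy)

x_hat = np.zeros((2, 1))                 # Smith-predictor state

def control_step(y_received, u_prev):
    """Resynchronize the predictor when the measurement packet arrives;
    propagate the model when the packet is lost."""
    global x_hat
    x_hat = y_received if y_received is not None else A @ x_hat + B @ u_prev
    return -K @ x_hat                    # control from the predicted state
```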

11.
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently than previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.
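For orientation, the finite-horizon Hamilton–Jacobi–Bellman equation that a time-varying value function must satisfy can be written, for an affine system with quadratic cost (generic notation, not necessarily the article's):

```latex
-\frac{\partial V^{*}(x,t)}{\partial t}
  = \min_{u}\Big[\, x^{\top}Qx + u^{\top}Ru
  + \big(\nabla_{x} V^{*}(x,t)\big)^{\top}\big(f(x)+g(x)u\big) \Big],
\qquad V^{*}\big(x,T\big)=\phi\big(x(T)\big),
```

with the minimizing control given in closed form by

```latex
u^{*}(x,t) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla_{x} V^{*}(x,t).
```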

12.
于镝 《控制理论与应用》2020,37(9):1963-1970
For disturbed multi-agent networks with input constraints, a new robust containment control scheme with a leader layer, an estimation layer, a control layer, and a follower layer is proposed. First, finite-time estimators are designed to obtain the followers' desired states; then a non-quadratic discounted cost function based on the containment error is introduced, converting the robust containment control problem into a constrained optimal control problem. Using the extended Lyapunov principle, it is proven that the obtained optimal control policy renders the network uniformly ultimately bounded. With completely unknown system dynamics, the proposed integral reinforcement learning algorithm with an actor-critic structure obtains the near-optimal control policy online. Simulation results verify the effectiveness and feasibility of the theoretical scheme.

13.
This paper investigates finite-time adaptive neural tracking control for a class of nonlinear time-delay systems subject to actuator delay and full-state constraints. The difficulty is to handle full-state time delays and full-state constraints within a finite-time control design. First, a finite-time control method is used to achieve fast transient performance, and new Lyapunov–Krasovskii functionals are appropriately constructed to compensate for the time delays, in which a predictor-like term transforms the input-delayed system into a delay-free one. Second, neural networks are utilized to deal with the unknown functions, the Gaussian error function is used to express the continuously differentiable asymmetric saturation nonlinearity, and barrier Lyapunov functions are employed to guarantee that the full-state signals are restricted within certain fixed bounds. Finally, based on finite-time stability theory and Lyapunov stability theory, the finite-time tracking control problem under full-state constraints is solved, and the designed control scheme reduces the number of learning parameters. It is shown that the presented neural controller ensures that all closed-loop signals are bounded and the tracking error converges to a small neighbourhood of the origin in finite time. Simulation studies further illustrate the effectiveness of the proposed approach.

14.
《Automatica》2014,50(12):3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method using real system data rather than a system model. First, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
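A hedged sketch of a least-squares critic fit under a linear-in-weights value approximation V(x) ≈ wᵀφ(x); this is an analogous discrete-time illustration with an assumed feature map and discounting, not the paper's continuous-time weighted-residuals formulation:

```python
import numpy as np

def ls_critic_update(phi, phi_next, stage_cost, gamma=0.99):
    """Batch least-squares fit of V(x) ~ w^T phi(x) to the Bellman equation
    V(x) = c(x,u) + gamma * V(x'); rearranged, (phi - gamma*phi_next) w = c.
    phi, phi_next: (N, k) feature matrices at x and x'; stage_cost: (N,)."""
    w, *_ = np.linalg.lstsq(phi - gamma * phi_next, stage_cost, rcond=None)
    return w
```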

15.
This paper addresses the fixed-time tracking control problem for constrained second-order systems, discussing controller designs under output constraints and under full-state constraints, respectively. First, to solve the fixed-time control problem under output constraints, a new terminal sliding-mode variable with output constraints is constructed and a fixed-time sliding-mode control law with disturbance-rejection capability is designed, guaranteeing that the system output always satisfies the constraints while the tracking error converges to a sufficiently small neighborhood of the origin within a fixed time. Further, to handle full-state constraints, a terminal sliding-mode variable with full-state constraints is constructed and a corresponding fixed-time sliding-mode control law is designed. Because the control law is discontinuous, nonsmooth analysis and Lyapunov stability theory are used to prove stability of the closed-loop system. Finally, numerical simulations compare the proposed method with conventional fixed-time sliding-mode methods and verify the effectiveness of the developed algorithms.

16.
季政  楼旭阳  吴炜 《控制与决策》2021,36(1):97-104
An approximate solution method is proposed for the optimal tracking control problem of a class of continuous-time nonlinear systems under input constraints. For a class of single-input single-output nonlinear systems with a finite-horizon tracking performance index, the proposed optimal tracking control method renders the corresponding performance index of the target system approximately optimal. First, the performance index is Taylor-expanded along time to obtain an approximate performance index; second, under the condition that the system state is observable, the problem is further transformed into a nonlinear programming problem with the control input as the decision variable; third, a neurodynamic optimization method is used to solve the approximate optimal control problem with inequality constraints, and a schematic of the corresponding recurrent-neural-network module is given; furthermore, theoretical analysis of the whole closed-loop system proves its stability under certain conditions; finally, two simulation examples verify the effectiveness of the proposed method.

17.
The path integral method, which originates from stochastic optimal control, is a numerical iterative method for solving optimal control problems of continuous nonlinear systems; it does not depend on a system model and converges quickly. This paper applies policy improvement based on path-integral reinforcement learning to goal-directed locomotion of a snake-like robot. Path-integral reinforcement learning is used to learn the parameters of the snake robot's gait equation, so that the robot not only avoids obstacles and reaches the target point in simulation but, using prior knowledge from the simulation environment, can also quickly complete the same task in the real environment. Experimental results verify the correctness of the method.
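A compact sketch of a path-integral-style (PI²-like) parameter update for the gait parameters, under assumed names and a standard softmin weighting; the paper's exact update rule is not reproduced here:

```python
import numpy as np

def path_integral_update(theta, rollout_costs, rollout_noise, lam=1.0):
    """theta: (d,) gait parameters; rollout_noise: (K, d) sampled parameter
    perturbations; rollout_costs: (K,) total cost of each perturbed rollout.
    Lower-cost perturbations receive exponentially larger weight."""
    S = np.asarray(rollout_costs, dtype=float)
    w = np.exp(-(S - S.min()) / lam)     # softmin weights, numerically safe
    w /= w.sum()
    return theta + w @ np.asarray(rollout_noise)
```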

18.
This paper provides an overview of the reinforcement learning and optimal adaptive control literature and its application to robotics. Reinforcement learning bridges the gap between traditional optimal control, adaptive control, and bio-inspired learning techniques borrowed from animals. This work highlights some of the key techniques presented by well-known researchers from the combined areas of reinforcement learning and optimal control theory. At the end, an example implementation of a novel model-free Q-learning based discrete optimal adaptive controller for a humanoid robot arm is presented. The controller uses a novel adaptive dynamic programming (ADP) reinforcement learning (RL) approach to develop an optimal policy online. The RL joint-space tracking controller was implemented for two links (shoulder flexion and elbow flexion joints) of the arm of the humanoid Bristol-Elumotion-Robotic-Torso II (BERT II) torso. The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint constraints.
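One way to picture such a model-free Q-learning controller, under the standard quadratic-Q assumption Q(x,u) = zᵀSz with z = [x; u] (generic ADP notation, not necessarily the paper's): once S has been estimated from data, the greedy feedback gain is read directly from its blocks, with no plant model needed.

```python
import numpy as np

def greedy_gain(S, n):
    """Given Q(x,u) = z^T S z with z = [x; u] and x of dimension n,
    minimizing over u yields u = -inv(S_uu) @ S_ux @ x."""
    S_uu = S[n:, n:]
    S_ux = S[n:, :n]
    return -np.linalg.solve(S_uu, S_ux)

# Hypothetical usage with a 2-state, 1-input estimate of S:
S_hat = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.5, 0.2],
                  [0.5, 0.2, 1.0]])
K = greedy_gain(S_hat, n=2)   # feedback gain, u = K @ x
```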

19.
In this article, an optimal command-filtered backstepping control approach is proposed for uncertain strict-feedback nonlinear multi-agent systems (MASs) with output constraints and unmodeled dynamics. A one-to-one nonlinear mapping (NM) is utilized to recast the constrained systems as corresponding unconstrained systems. A dynamical signal is applied to cope with the unmodeled dynamics. Based on dynamic surface control (DSC), the feedforward controller is designed by introducing error-compensating signals. The optimal feedback controller is produced by applying adaptive dynamic programming (ADP) and integral reinforcement learning (IRL) techniques, in which neural networks approximate the relevant cost functions online with established weight-updating laws. Therefore, the entire controller, comprising the feedforward and feedback controllers, not only ensures that all signals in the closed-loop systems are cooperatively semi-globally uniformly ultimately bounded (SGUUB) and that the outputs remain within the prescribed time-varying constraints, but also ensures that the cost functions are minimized. A simulation example is presented to illustrate the feasibility of the proposed control algorithm.

