
1.

Two-loop reinforcement learning algorithm for finite-horizon optimal control of continuous-time affine nonlinear systems

Zhe Chen Wenqian Xue Ning Li Frank L. Lewis 《International Journal of Robust and Nonlinear Control》2022,32(1):393-420

This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently than previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.
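For orientation, the time-varying Hamilton–Jacobi–Bellman equation that the abstract refers to can be written out under a common set of assumptions. The symbols below are not taken from the paper: the affine dynamics are assumed to be $\dot{x} = f(x) + g(x)u$, the running cost is assumed to be $Q(x) + u^\top R u$ with $R \succ 0$, and $\phi$ is an assumed terminal cost at the final time $T$. This is a sketch of the standard form, not necessarily the exact formulation used by the authors.

```latex
% Assumed cost functional (illustrative notation):
%   J = \phi(x(T)) + \int_{t}^{T} \big( Q(x) + u^\top R u \big) \, d\tau
\begin{align}
  -\frac{\partial V(x,t)}{\partial t}
    &= \min_{u} \Big\{ Q(x) + u^\top R u
       + \nabla_x V(x,t)^\top \big[ f(x) + g(x) u \big] \Big\},
    \qquad V(x,T) = \phi(x), \\
  u^*(x,t)
    &= -\tfrac{1}{2} R^{-1} g(x)^\top \nabla_x V(x,t).
\end{align}
```

Because the terminal condition is imposed at $t = T$, the value function $V$ depends explicitly on time, which is why a finite-horizon approximator needs time-varying weights rather than the constant weights that suffice in the infinite-horizon case.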

2.

Dual iterative adaptive dynamic programming for a class of discrete-time nonlinear systems with time-delays

Qinglai Wei Ding Wang Dehua Zhang 《Neural Computing & Applications》2013,23(7-8):1851-1863

In this paper, a new dual iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for a class of nonlinear systems with time-delays in state and control variables. The idea is to use dynamic programming theory to derive expressions for the optimal performance index function and the optimal control. Then, the dual iterative ADP algorithm is introduced to obtain the optimal solutions iteratively, where in each iteration both the performance index function and the system states are updated. A convergence analysis is presented to prove that the performance index function reaches the optimum under the proposed method. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, to facilitate the implementation of the dual iterative ADP algorithm. Simulation examples are given to demonstrate the validity of the proposed optimal control scheme.

3.

Optimal control of Takagi–Sugeno fuzzy-model-based systems representing dynamic ship positioning systems

Wen-Hsien Ho Shinn-Horng Chen Jyh-Horng Chou 《Applied Soft Computing》2013,13(7):3197-3210

The orthogonal function approach (OFA) and the hybrid Taguchi-genetic algorithm (HTGA) are used to solve quadratic finite-horizon optimal controller design problems for both a fuzzy parallel distributed compensation (PDC) controller and a non-PDC controller (linear state feedback controller) in Takagi–Sugeno (TS) fuzzy-model-based control systems for dynamic ship positioning systems (TS-DSPS). Based on the OFA, an algorithm requiring only algebraic computation is used to solve the dynamic equations for TS-fuzzy-model-based feedback and is then integrated with the HTGA to design quadratic finite-horizon optimal controllers for TS-DSPS under the criterion of minimizing a quadratic finite-horizon integral performance index, which is also converted to algebraic form by the OFA. Integrating the OFA and the HTGA in the proposed approach enables the use of simple algebraic computation and is well suited to computer implementation. Therefore, it facilitates the design of quadratic finite-horizon optimal controllers for the TS-DSPS. The applicability of the proposed approach is demonstrated in the example of a moored tanker designed using quadratic finite-horizon optimal controllers.

4.

Neural-network-based approach to finite-time optimal control for a class of unknown nonlinear systems

Ruizhuo Song Wendong Xiao Qinglai Wei Changyin Sun 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(8):1645-1653

This paper proposes a novel finite-time optimal control method based on input–output data for unknown nonlinear systems using an adaptive dynamic programming (ADP) algorithm. In this method, a single-hidden-layer feed-forward network (SLFN) with extreme learning machine (ELM) is used to construct the data-based identifier of the unknown system dynamics. Based on the data-based identifier, the finite-time optimal control method is established by the ADP algorithm. Two other SLFNs with ELM are used in the ADP method to facilitate the implementation of the iterative algorithm; they approximate the performance index function and the optimal control law at each iteration, respectively. A simulation example is provided to demonstrate the effectiveness of the proposed control scheme.

5.

Stable iterative adaptive dynamic programming algorithm with approximation errors for discrete-time nonlinear systems

Qinglai Wei Derong Liu 《Neural Computing & Applications》2014,24(6):1355-1367

In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. When the iterative control law and iterative performance index function in each iteration cannot be accurately obtained, it is shown that the iterative controls can make the performance index function converge to within a finite error bound of the optimal performance index function. Stability properties are presented to show that the system can be stabilized under the iterative control law, which makes the present iterative ADP algorithm feasible for implementation both on-line and off-line. Neural networks are used to approximate the iterative performance index function and compute the iterative control policy, respectively, to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

6.

Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach

Qinglai Wei Derong Liu Yancai Xu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(2):697-706

In this paper, a novel value iteration adaptive dynamic programming (ADP) algorithm, called the “generalized value iteration ADP” algorithm, is developed to solve infinite-horizon optimal tracking control problems for a class of discrete-time nonlinear systems. The developed generalized value iteration ADP algorithm can be initialized with an arbitrary positive semi-definite function, which overcomes a disadvantage of traditional value iteration algorithms. A convergence property is established to guarantee that the iterative performance index function converges to the optimum. Neural networks are used to approximate the iterative performance index function and compute the iterative control policy, respectively, to implement the iterative ADP algorithm. Finally, a simulation example is given to illustrate the performance of the developed algorithm.
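The idea of starting value iteration from an arbitrary positive semi-definite function can be illustrated on a deliberately simplified problem. The sketch below is not the paper's algorithm: it assumes a scalar linear regulation problem x_{k+1} = a·x_k + b·u_k with stage cost q·x² + r·u² (no tracking, no neural networks), so that each iterate has the closed form V_i(x) = p_i·x². All names (`value_iteration`, `greedy_gain`, `a`, `b`, `q`, `r`) are hypothetical.

```python
def value_iteration(p0, a=1.0, b=1.0, q=1.0, r=1.0, iters=100):
    """Iterate V_{i+1}(x) = min_u [q x^2 + r u^2 + V_i(a x + b u)]
    for V_i(x) = p_i x^2, starting from any p0 >= 0 (assumed toy problem)."""
    p = p0
    history = [p]
    for _ in range(iters):
        # Closed-form minimization over u gives this scalar Riccati-type update.
        p = q + (a * a * r * p) / (r + b * b * p)
        history.append(p)
    return p, history


def greedy_gain(p, a=1.0, b=1.0, r=1.0):
    """Feedback gain k of the greedy policy u = -k x for V(x) = p x^2."""
    return (a * b * p) / (r + b * b * p)


# Two different positive semi-definite initial functions: V_0 = 0 and V_0 = 10 x^2.
p_from_zero, _ = value_iteration(0.0)
p_from_large, _ = value_iteration(10.0)
print(p_from_zero, p_from_large, greedy_gain(p_from_zero))
```

With the default parameters, the fixed point satisfies p² − p − 1 = 0, so both initializations should approach the same limit; in the nonlinear tracking setting of the paper no such closed form exists, which is where function approximators come in.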

7.

PID control parameter optimization based on an improved particle swarm optimization algorithm

张继荣 张天 《计算机工程与设计》 (Computer Engineering and Design) 2020,41(4):1035-1040

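To address the tendency of the particle swarm optimization (PSO) algorithm to become trapped in local optima in later iterations, a PSO algorithm with cosine-adjusted inertia weight (IWCPSO) is proposed. A cosine variation is introduced into the inertia weight during iteration to remedy the shortcomings of the late iterations and improve the accuracy of the algorithm. In a MATLAB 2016 simulation environment, the algorithm is compared with the Ziegler-Nichols (ZN) formula method and with the PSO algorithm with sinusoidally adjusted inertia weight (SIPSO) for PID control parameter optimization. The comparison shows that the proposed algorithm yields better performance indices for the PID control system response and more accurate tuning results.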
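A minimal sketch of a PSO loop with a cosine-shaped inertia weight is given below. The abstract does not state the exact schedule, so the formula w(t) = w_min + (w_max − w_min)·(1 + cos(πt/T))/2 is an assumption, as are the parameter values, the function names (`cosine_inertia`, `pso`, `sphere`), and the use of a simple sphere function in place of a PID performance index.

```python
import math
import random


def cosine_inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Assumed schedule: decays smoothly from w_max at t=0 to w_min at t=t_max."""
    return w_min + (w_max - w_min) * (1.0 + math.cos(math.pi * t / t_max)) / 2.0


def sphere(x):
    """Stand-in cost function (a PID tuning cost would go here instead)."""
    return sum(v * v for v in x)


def pso(cost, dim=2, n_particles=20, t_max=200, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(t_max):
        w = cosine_inertia(t, t_max)  # inertia weight for this iteration
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = cost(pos[i])
            if val < pbest_val[i]:  # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_x, best_val = pso(sphere)
print(best_x, best_val)
```

To tune a PID controller with this loop, `cost` would map a candidate (Kp, Ki, Kd) vector to a closed-loop performance index such as an integral error measure obtained from simulation; that part is omitted here because the plant model is not given in the abstract.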

8.

Nonrepetitive trajectory tracking for nonlinear autonomous agents with asymmetric output constraints using parametric iterative learning control

Xu Jin 《International Journal of Robust and Nonlinear Control》2019,29(6):1941-1955

In this paper, we present a novel parametric iterative learning control (ILC) algorithm to deal with trajectory tracking problems for a class of nonlinear autonomous agents that are subject to actuator faults. Unlike most of the ILC literature, the desired trajectories in this work can be iteration dependent, and the initial position of the agent in each iteration can be random. Both parametric and nonparametric system unknowns and uncertainties, in particular control input gain functions that are not fully known, are considered. A new type of universal barrier function is proposed to guarantee the satisfaction of asymmetric constraint requirements, feasibility of the controller, and prescribed tracking performance. We show that under the proposed algorithm, the distance and angle tracking errors can uniformly converge to an arbitrarily small positive number and zero, respectively, over the iteration domain, beyond a small user-prescribed initial time interval in each iteration. A numerical simulation is presented at the end to demonstrate the efficacy of the proposed algorithm.

9.

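An optimal control scheme for a class of discrete-time nonlinear systems with time delays based on adaptive dynamic programming  Cited by: 4 (self-citations: 3, citations by others: 1)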

魏庆来 张化光 刘德荣 赵琰 《自动化学报》 (Acta Automatica Sinica) 2010,36(1):121-129

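For the optimal control problem with a quadratic performance index function for a class of nonlinear systems with time delays in both state and control variables, this paper proposes an optimal control scheme based on a new iterative adaptive dynamic programming algorithm. By introducing a delay matrix function and applying dynamic programming theory, an explicit expression of the optimal control is obtained, and the optimal control is then computed via adaptive critic techniques. A convergence proof is given to guarantee that the performance index function converges to the optimum. To implement the proposed algorithm, neural networks are used to approximate the performance index function, compute the optimal control policy, solve for the delay matrix function, and model the nonlinear system. Finally, two simulation examples illustrate the effectiveness of the proposed optimal control scheme.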

10.

Fuzzy adaptive dynamic programming-based optimal leader-following consensus for heterogeneous nonlinear multi-agent systems

Cai Yuliang Zhang Huaguang Zhang Kun Liu Chong 《Neural Computing & Applications》2020,32(13):8763-8781

In this paper, a novel online iterative scheme, based on fuzzy adaptive dynamic programming, is proposed for distributed optimal leader-following consensus of heterogeneous nonlinear multi-agent systems under a directed communication graph. This scheme combines game theory, adaptive dynamic programming, and the generalized fuzzy hyperbolic model (GFHM). Firstly, based on a precompensation technique, an appropriate model transformation is proposed to convert the error system into an augmented error system, and a performance index function is defined for this system. Secondly, on the basis of the Hamilton–Jacobi–Bellman (HJB) equation, the optimal consensus control is designed and a novel policy iteration (PI) algorithm is put forward to learn the solutions of the HJB equation online. Here, the proposed PI algorithm is implemented using GFHMs. Compared with the dual-network model, which includes a critic network and an action network, the proposed scheme requires only a critic network. Thirdly, the augmented consensus error of each agent and the weight estimation error of each GFHM are proved to be uniformly ultimately bounded, and the stability of the method is verified. Finally, numerical examples and application examples are presented to demonstrate the effectiveness of the theoretical results.
