Similar Documents (20 results)
1.
We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
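The local linear regression approximator at the heart of both algorithms can be sketched as follows: predict at a query point by fitting an affine model to the k nearest stored samples. This is a generic textbook sketch of LLR, not code from the paper; the function name and memory layout are illustrative.

```python
import numpy as np

def llr_predict(X_mem, y_mem, x_query, k=5):
    """Local linear regression: fit an affine model y ~ A [x; 1] to the
    k stored samples nearest to x_query and evaluate it there."""
    X_mem = np.atleast_2d(X_mem)
    dists = np.linalg.norm(X_mem - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    # Affine design matrix over the neighborhood
    A = np.hstack([X_mem[idx], np.ones((k, 1))])
    coef, *_ = np.linalg.lstsq(A, y_mem[idx], rcond=None)
    return np.append(x_query, 1.0) @ coef
```

In the paper's setting the same memory-based approximator serves for the critic, the actor, and the learned process model.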

2.
In this paper, we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous‐time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data‐based approach to the solution of the Hamilton–Jacobi–Bellman equation, and it does not require explicit knowledge on the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/critic structure having two adaptive approximator structures. Both actor and critic approximation networks are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel adaptive control tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed loop dynamical stability. The approximate convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. Copyright © 2013 John Wiley & Sons, Ltd.

3.
The two‐player zero‐sum (ZS) game problem provides the solution to the bounded L2‐gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton–Jacobi–Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous‐time two‐player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed‐loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous‐time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm ‘synchronous’ ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system. Copyright © 2011 John Wiley & Sons, Ltd.

4.
Reinforcement learning (RL) is an effective method for the design of robust controllers for unknown nonlinear systems. Standard RL methods for robust control, such as actor-critic (AC) algorithms, depend on the estimation accuracy. Worst-case uncertainty requires a large state-action space, which causes overestimation and computational problems. In this article, the RL method is modified with the k-nearest neighbor and the double Q-learning algorithms. The modified RL does not need a neural estimator, as AC does, and can stabilize the unknown nonlinear system under worst-case uncertainty. The convergence property of the proposed RL method is analyzed. The simulations and the experimental results show that our modified RLs are much more robust than classic controllers such as the proportional-integral-derivative, the sliding mode, and the optimal linear quadratic regulator controllers.
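The double Q-learning ingredient can be sketched with a plain tabular update, in which one table selects the greedy next action and the other evaluates it, which is what reduces the overestimation bias mentioned above. This is the textbook update, not the authors' kNN-modified algorithm; the dictionary-of-dictionaries Q-table layout is an assumption.

```python
import random

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular double Q-learning step: with probability 1/2 update
    QA using QB's evaluation of QA's greedy action, and vice versa."""
    if random.random() < 0.5:
        a_star = max(QA[s_next], key=QA[s_next].get)   # QA selects
        QA[s][a] += alpha * (r + gamma * QB[s_next][a_star] - QA[s][a])
    else:
        b_star = max(QB[s_next], key=QB[s_next].get)   # QB selects
        QB[s][a] += alpha * (r + gamma * QA[s_next][b_star] - QB[s][a])
```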

5.
We introduce the novel concept of knowledge states. The knowledge state approach can be used to construct competitive randomized online algorithms and study the trade-off between competitiveness and memory. Many well-known algorithms can be viewed as knowledge state algorithms. A knowledge state consists of a distribution of states for the algorithm, together with a work function which approximates the conditional obligations of the adversary. When a knowledge state algorithm receives a request, it then calculates one or more “subsequent” knowledge states, together with a probability of transition to each. The algorithm uses randomization to select one of those subsequents to be the new knowledge state. We apply this method to randomized k-paging. The optimal minimum competitiveness of any randomized online algorithm for the k-paging problem is the kth harmonic number, \(H_{k}=\sum^{k}_{i=1}\frac{1}{i}\). Existing algorithms which achieve that optimal competitiveness must keep bookmarks, i.e., memory of the names of pages not in the cache. An \(H_{k}\)-competitive randomized algorithm for that problem which uses O(k) bookmarks is presented, settling an open question by Borodin and El-Yaniv. In the special cases where k=2 and k=3, solutions are given using only one and two bookmarks, respectively.
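The optimal competitive ratio \(H_{k}\) cited above is easy to compute exactly; a small sketch using exact rational arithmetic:

```python
from fractions import Fraction

def harmonic(k):
    """H_k = sum_{i=1}^{k} 1/i, the optimal competitive ratio for
    randomized k-paging."""
    return sum(Fraction(1, i) for i in range(1, k + 1))
```

For the special cases in the abstract, \(H_{2}=3/2\) and \(H_{3}=11/6\).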

6.
In this paper, we propose a new adaptive actor-critic algorithm with multi-step simulated experiences, as a kind of temporal difference (TD) method. In our approach, the TD-error is composed of two value functions and m utility functions, where m denotes the number of steps over which the experience is simulated. The value function is produced by the critic, formulated as a radial basis function neural network (RBFNN) whose input is a simulated experience generated from a predictive model based on a kinematic model. Since our approach assumes that the model is available to simulate the m-step experiences and to design a controller, the kinematic model is also applied to construct the actor; the resulting model-based actor (MBA) can likewise be regarded as a network, i.e., a resolved velocity control network. We apply this approach to the control of a nonholonomic mobile robot, specifically a trajectory tracking control problem for the position coordinates and azimuth. Simulations show the effectiveness of the proposed method for controlling a mobile robot with two independent driving wheels.

7.
A nonlinear model predictive control method based on a radial basis function neural network (RBFNN) model is adopted. The controlled plant is the air–fuel ratio (AFR) of a spark-ignition (SI) engine, a highly nonlinear and complex system. Recursive least squares with a forgetting factor is used to build the RBFNN model of the SI engine AFR system and to adaptively update the model parameters online. To solve the optimization problem arising in nonlinear model predictive control, a sequential quadratic programming filter algorithm computes the optimal control sequence; the filter technique avoids the use of a penalty function. Under identical experimental conditions, simulation comparisons against a PI control algorithm and a Volterra model predictive control method show that the proposed algorithm clearly outperforms both.
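The online identification step, recursive least squares with a forgetting factor, can be sketched as follows. This is the standard scalar-output RLS update, not the paper's full RBFNN identification; variable names are illustrative.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.98):
    """One recursive-least-squares step with forgetting factor lam:
    update parameter estimate theta and covariance P from regressor
    phi and measurement y."""
    phi = np.asarray(phi, float)
    K = P @ phi / (lam + phi @ P @ phi)   # gain vector
    err = y - phi @ theta                 # prediction error
    theta = theta + K * err
    P = (P - np.outer(K, phi) @ P) / lam  # discount old information
    return theta, P
```

A forgetting factor below 1 discounts old data, which is what allows the model parameters to track slow plant changes online.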

8.
This paper finds the appropriate pi-coefficients for a parameter estimation adaptive system and uses them to analyze the stability of two estimation algorithms. The estimation error dynamics of the system are modeled by a linear time-invariant subsystem and a nonlinear time-varying update law in a feedback loop. The so-called max-p problems are then formulated and solved to obtain the pi-coefficients for the linear subsystem and the nonlinear update law. For the investigated system, the quantitative results show that the least-squares update algorithm has a larger stability range than the gradient algorithm, and that the σ-modification scheme gives larger stability ranges for both algorithms.

9.
李悄然  丁进良 《控制与决策》2022,37(8):1989-1996
To address the insufficient exploration capability of the deep deterministic policy gradient algorithm, a multi-actor parallel asynchronous deep deterministic policy gradient (MPADDPG) algorithm is proposed and applied to reinforcement-learning decision making for operational indices in mineral processing. The algorithm uses multiple actor networks with different initializations and training, which improves exploration to varying degrees, and extends the critic structure with deterministic policy gradients to capture the relation between exploration and exploitation. Using multiple DDPG agents instead of a single one mitigates the impact of any one DDPG performing poorly and improves learning stability; the parallel asynchronous structure improves data efficiency and speeds up network convergence; finally, each actor obtains a better policy gradient by influencing the critic's update. Experimental results on operational-index decision making in the mineral processing process verify the effectiveness of the proposed algorithm.

10.
A Greedy EM Algorithm for Gaussian Mixture Learning
Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get trapped in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture. This revised version was published online in August 2006 with corrections to the Cover Date.
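The greedy growth idea can be illustrated with a much-simplified 1-D stand-in: fit a single component, then repeatedly insert a new component at the worst-explained data point and rerun EM. The paper's actual method uses a more elaborate combination of global and local candidate search; this sketch only conveys the "grow one component at a time" structure.

```python
import numpy as np

def em_gmm_1d(x, means, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture, started from the given
    means with equal weights and unit variances."""
    k = len(means)
    w, mu, var = np.full(k, 1.0 / k), np.array(means, float), np.ones(k)
    for _ in range(n_iter):
        d = x[:, None] - mu
        p = w * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)      # E-step
        nk = r.sum(axis=0)                        # M-step
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def greedy_gmm_1d(x, k_max):
    """Greedy growth: insert a component at the lowest-likelihood
    point and rerun EM until k_max components are reached."""
    w, mu, var = em_gmm_1d(x, [x.mean()])
    while len(mu) < k_max:
        d = x[:, None] - mu
        lik = (w * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)).sum(1)
        w, mu, var = em_gmm_1d(x, list(mu) + [x[np.argmin(lik)]])
    return w, mu, var
```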

11.
《Automatica》2014,50(12):3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated to establish an accurate mathematical model. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method using real system data rather than a system model. Firstly, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.

12.
Parallel updates of minimum spanning trees (MSTs) have been studied in the past. These updates allowed a single change in the underlying graph, such as a change in the cost of an edge or an insertion of a new vertex. Multiple update problems for MSTs are concerned with handling more than one such change. In the sequential case multiple update problems may be solved using repeated applications of an efficient algorithm for a single update. However, for efficiency reasons, parallel algorithms for multiple update problems must consider all changes to the underlying graph simultaneously. In this paper we describe parallel algorithms for updating an MST when k new vertices are inserted or deleted in the underlying graph, when the costs of k edges are changed, or when k edge insertions and deletions are performed. For the multiple vertex insertion update, our algorithm achieves time and processor bounds of O(log n·log k) and nk/(log n·log k), respectively, on a CREW parallel random access machine. These bounds are optimal for dense graphs. A novel feature of this algorithm is a transformation of the previous MST and the k new vertices to a bipartite graph which enables us to obtain the above-mentioned bounds.
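For contrast with the parallel algorithms, the sequential single-vertex-insertion update mentioned above is simple: the new MST is contained in the old MST edges plus the edges incident to the new vertex, so one Kruskal pass over that small edge set suffices. This is a generic sketch of the sequential baseline, not the paper's parallel algorithm.

```python
class DSU:
    """Union-find with path halving, for the Kruskal pass below."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.p[ra] = rb
        return True

def mst_insert_vertex(n, mst_edges, new_edges):
    """Update an MST on vertices 0..n-1 after inserting vertex n:
    run Kruskal on the old MST edges plus the new vertex's edges.
    Edges are (weight, u, v) tuples."""
    new_mst, dsu = [], DSU(n + 1)
    for w, u, v in sorted(mst_edges + new_edges):
        if dsu.union(u, v):
            new_mst.append((w, u, v))
    return new_mst
```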

13.
This article handles the alignment initial condition for contraction-mapping-based iterative learning control, such that the system can operate continuously without any initial condition reset. This goal is achieved for a class of nonlinear systems through the proposed conditional learning control, which has several advantages over the alternative method, adaptive learning control. The conditional learning control guarantees that sufficient knowledge can be learned to update the input and achieve perfect output tracking, despite the non-identical initial conditions. Sufficient conditions for either monotonic or strictly monotonic convergence of the input sequence, and the choice of learning gains, are given. The performance of the proposed method is illustrated by simulated examples.
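For background, the simplest contraction-mapping ILC update is the P-type law u_{k+1} = u_k + gain·e_k, applied trial after trial. The sketch below uses a static-gain toy plant as a stand-in for real dynamics; it is a textbook scheme, not the paper's conditional learning control.

```python
import numpy as np

def ilc_p_type(plant, y_ref, n_trials=30, gain=0.5):
    """P-type iterative learning control: after each trial, correct
    the input trajectory with the tracking error. Converges when the
    error map u -> (I - gain * plant) is a contraction."""
    u = np.zeros_like(y_ref)
    for _ in range(n_trials):
        e = y_ref - plant(u)   # tracking error on this trial
        u = u + gain * e       # learning update for the next trial
    return u, e
```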

14.
Anomaly detection is an active research area in machine learning and data mining, with applications in fault diagnosis, intrusion detection, and fraud detection, among others. Much effective work exists, notably isolation-forest-based anomaly detection methods, but many difficulties remain when handling high-dimensional data. A new k-nearest-neighbor isolation forest anomaly detection algorithm, KNIF (k-nearest neighbor based isolation forest), is proposed. The method uses hyperspheres as the isolation primitive, builds the isolation forest with a k-th-nearest-neighbor method, and constructs a distance-based anomaly score. Extensive experiments show that KNIF performs anomaly detection effectively under complex distributions and adapts to application scenarios with different distribution forms.
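The distance-based scoring ingredient can be illustrated with a plain k-th-nearest-neighbor anomaly score: points far from their k-th neighbor get a high score. This is only one ingredient of KNIF; the hypersphere isolation forest itself is not reproduced here.

```python
import numpy as np

def knn_score(X, k=3):
    """Anomaly score = distance to the k-th nearest neighbor.
    Column 0 of the sorted distance matrix is the distance to the
    point itself (zero), so column k is the k-th neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(D, axis=1)[:, k]
```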

15.
In this paper a constrained nonlinear predictive control algorithm that uses the artificial bee colony (ABC) algorithm to solve the optimization problem is proposed. The main objective is to derive a simple and efficient control algorithm that can solve the nonlinear constrained optimization problem with minimal computational time. To this end, a modified version of the ABC algorithm with enhanced exploration and exploitation capabilities is proposed and used to design a nonlinear constrained predictive controller. This version addresses the premature and slow convergence drawbacks of the standard ABC algorithm through a modified search equation, a well-known organized distribution mechanism for the initial population, and a new equation for the limit parameter. A statistical convergence analysis of the proposed algorithm on well-known benchmark functions is presented and compared with several other variants of the ABC algorithm. To demonstrate the efficiency of the proposed algorithm in solving engineering problems, the constrained nonlinear predictive control of a model of a multi-input multi-output industrial boiler is considered. The control performance of the proposed ABC-based controller is also compared with that obtained using some variants of the ABC algorithm.
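The standard ABC algorithm that the paper modifies can be sketched in its basic form, with employed, onlooker, and scout phases. This is the textbook variant minimizing a box-constrained function, not the authors' enhanced version; all parameter values are illustrative defaults.

```python
import numpy as np

def abc_minimize(f, dim, bounds, n_food=10, limit=20, n_iter=200, seed=0):
    """Basic artificial bee colony minimization of f over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_food, dim))          # food sources
    fit = np.array([f(x) for x in X])
    trials = np.zeros(n_food, int)                  # stagnation counters

    def try_move(i):
        j = rng.integers(n_food - 1)
        j += j >= i                                 # random partner != i
        d = rng.integers(dim)
        v = X[i].copy()
        v[d] += rng.uniform(-1, 1) * (X[i, d] - X[j, d])
        v = np.clip(v, lo, hi)
        fv = f(v)
        if fv < fit[i]:                             # greedy selection
            X[i], fit[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_food):                     # employed bees
            try_move(i)
        p = fit.max() - fit + 1e-12                 # onlookers prefer
        p /= p.sum()                                # better sources
        for i in rng.choice(n_food, n_food, p=p):
            try_move(i)
        worn = int(np.argmax(trials))               # scout phase
        if trials[worn] > limit:
            X[worn] = rng.uniform(lo, hi, dim)
            fit[worn] = f(X[worn])
            trials[worn] = 0
    return X[np.argmin(fit)], fit.min()
```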

16.
In this paper, we propose an actor/critic algorithm combined with goal-directed reasoning. The actor/critic algorithm has been considered to be a model of the basal ganglia in the brain. However, the basal ganglia also appear to contribute to goal-directed reasoning. Thus, we study goal-directed reasoning in the framework of the actor/critic algorithm and discuss its neural substrates. First, since goal-directed reasoning is realized by repeatedly setting subgoals, we consider setting a subgoal to be an action and incorporate it into the actor/critic algorithm. Next, two additional mechanisms, the rejection of bad subgoals and double learning, are introduced to improve the performance of the new algorithm. As a consequence, goal-directed reasoning is successfully combined with the actor/critic algorithm, and the performance of the actor/critic algorithm is improved by this combination. It is also shown that a hierarchical control structure appears during the learning process and disappears after the learning has been repeated many times.

17.
In this paper, a new modular design technique for globally and practically adaptive output tracking of high-order lower-triangular nonlinear systems is proposed. This technique is not based on the certainty equivalence principle and relies entirely on the feedback domination method for these linearly parameterized systems. Contrary to methods based on the adding-a-power-integrator technique for adaptive control of high-order lower-triangular nonlinear systems, in which the choice of a parameter update law is limited to a Lyapunov-type algorithm, the present method does not have this restriction and uses the swapping identifier as its parameter update law. The modularity of designing the controller and the identifier in this method, which relies on control design using the feedback domination approach, is completely different from the modular design in Immersion and Invariance (I&I) based methods, which rely on identifier design and desired features of parameter identification. Finally, an example illustrates the feasibility and efficiency of the proposed method.

18.
V. Milenkovic 《Algorithmica》1997,19(1-2):183-218
We present exact algorithms for finding a solution to the two-dimensional translational containment problem: find translations for k polygons which place them inside a polygonal container without overlapping. The term kCN denotes the version in which the polygons are convex and the container is nonconvex, and the term kNN denotes the version in which the polygons and the container are nonconvex. The notation (r,k)CN, (r,k)NN, and so forth refers to the problem of finding all subsets of size k out of r objects that can be placed in a container. The polygons have up to m vertices, and the container has n vertices, where n is usually much larger than m. We present exact algorithms for 2CN, (r,2)CN, 3CN, kCN, and kNN, with running times expressed in terms of LP(a,b), the time to solve a linear program with a variables and b constraints. All these results are improvements on previously known running times except for the last. The algorithm for kNN is slower asymptotically than the naive algorithm, but is expected to be much faster in practice. The algorithm for 2CN is based on the use of separating line orientations as a means of characterizing the solution. The solution to 3CN also uses a separating line orientation characterization, leading to a simple and robust “carrousel” algorithm. The kCN algorithm uses the idea of disassembling the layout to the left. Finally, the kNN algorithm uses the concept of subdivision trees and linear programming. Received July 11, 1994; revised August 22, 1995, and February 26, 1996.

19.
Fault diagnosis of nonlinear time-delay systems based on a fault tracking estimator
A fault tracking estimator is proposed that can effectively detect and estimate faults in a class of nonlinear time-delay systems. Following the ideas of predictive control and iterative learning control, the adjustable parameters of the fault tracking estimator are tuned by an iterative algorithm over a chosen optimization horizon, so that the estimator approximates the fault actually occurring in the system. Unlike previous observer-based fault diagnosis methods, the fault tracking estimator can simultaneously detect and estimate faults in the system, and it also adapts well to different fault types. Simulation results show the feasibility and effectiveness of the proposed algorithm.

20.
唐诗淇  文益民  秦一休 《软件学报》2017,28(11):2940-2960
Transfer learning has received increasing attention in recent years. Existing online transfer learning algorithms generally transfer knowledge from a single source domain; however, when the source domain has low similarity to the target domain, effective transfer learning is difficult. To address this, a multi-source online transfer learning method based on local classification accuracy, LC-MSOTL, is proposed. LC-MSOTL stores classifiers from multiple source domains; for each newly arriving sample it computes the distances to the stored target-domain samples and the classification accuracy of each source classifier on the sample's nearest neighbors, then selects the source classifier with the highest local accuracy and combines it with the target-domain classifier by weighting, thereby transferring knowledge from multiple source domains to the target domain. Experimental results on synthetic and real datasets show that LC-MSOTL achieves effective selective transfer from multiple source domains and higher classification accuracy than the single-source online transfer learning algorithm OTL.
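The local-accuracy selection step can be sketched as follows: score each source classifier by its accuracy on the stored target samples nearest to the incoming sample, and pick the best. The weighted combination with the target-domain classifier is omitted here; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def pick_source_by_local_accuracy(classifiers, X_hist, y_hist, x_new, k=5):
    """Return the index of the source classifier with the highest
    accuracy on the k stored target-domain samples nearest to x_new."""
    idx = np.argsort(np.linalg.norm(X_hist - x_new, axis=1))[:k]
    accs = [np.mean(clf(X_hist[idx]) == y_hist[idx]) for clf in classifiers]
    return int(np.argmax(accs))
```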
