Similar Articles
1.
Reinforcement learning (RL) has now evolved as a major technique for adaptive optimal control of nonlinear systems. However, the majority of RL algorithms proposed so far impose a strong constraint on the structure of the environment dynamics by assuming that it operates as a Markov decision process (MDP). An MDP framework envisages a single agent operating in a stationary environment, thereby limiting the scope of application of RL to control problems. Recently, a new direction of research has focused on proposing Markov games as an alternative system model to enhance the generality and robustness of RL-based approaches. This paper presents this new direction, which seeks to synergize the broad areas of RL and game theory, as an interesting and challenging avenue for designing intelligent and reliable controllers. First, we briefly review some representative RL algorithms for the sake of completeness and then describe the recent direction that seeks to integrate RL and game theory. Finally, open issues are identified and future research directions outlined.

2.
Technical process control is a highly interesting area of application with high practical impact. Since classical controller design is, in general, a demanding job, this area constitutes a highly attractive domain for the application of learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch. This article focuses on the presentation of four typical benchmark problems while highlighting important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision. We propose performance measures for controller quality that apply both to classical control design and to learning controllers, measuring precision, speed, and stability of the controller. A second set of key figures describes the performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article. A close evaluation of our own RL learning scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks, provides performance figures on both control quality and learning behavior.
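As one way to make such key figures concrete, the minimal sketch below computes precision (steady-state error), speed (settling time) and a stability proxy (overshoot) from a recorded step response; the exact definitions and tolerances used by the benchmark suite are assumptions here, not the authors' measures.

```python
# A minimal sketch of controller-quality key figures (precision, speed,
# stability proxy) computed from a recorded step response; definitions and
# tolerances are illustrative assumptions.
import numpy as np

def step_response_figures(t, y, setpoint, tol=0.02):
    """Compute simple precision/speed measures from a recorded step response."""
    err = setpoint - y
    # Precision: mean absolute error over the final 20% of the run.
    tail = slice(int(0.8 * len(y)), None)
    steady_state_error = float(np.mean(np.abs(err[tail])))
    # Speed: last instant at which |error| leaves the +/- tol band around the setpoint.
    band = tol * abs(setpoint)
    outside = np.where(np.abs(err) > band)[0]
    settling_time = float(t[outside[-1]]) if len(outside) else float(t[0])
    # Stability proxy: peak overshoot relative to the setpoint.
    overshoot = float(max(0.0, (np.max(y) - setpoint) / abs(setpoint)))
    return {"sse": steady_state_error, "t_settle": settling_time, "overshoot": overshoot}

if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 1001)
    y = 1.0 - np.exp(-t) * np.cos(3.0 * t)   # toy closed-loop step response
    print(step_response_figures(t, y, setpoint=1.0))
```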

3.
《Applied Soft Computing》2007,7(3):818-827
This paper proposes a reinforcement learning (RL)-based game-theoretic formulation for designing robust controllers for nonlinear systems affected by bounded external disturbances and parametric uncertainties. Based on the theory of Markov games, we consider a differential game in which a ‘disturbing’ agent tries to produce the worst possible disturbance while a ‘control’ agent tries to apply the best possible control input. The problem is formulated as finding a min–max solution of a value function. We propose an online procedure for learning the optimal value function and for calculating a robust control policy. The proposed game-theoretic paradigm has been tested on the control task of a highly nonlinear two-link robot system. We compare the performance of the proposed Markov game controller with a standard RL-based robust controller and an H∞ theory-based robust game controller. For the robot control task, the proposed controller achieved superior robustness over the other control schemes to changes in payload mass and external disturbances. Results also validate the effectiveness of neural networks in extending the Markov game framework to problems with continuous state–action spaces.
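For intuition, the tabular sketch below applies the min–max idea to a tiny randomly generated Markov game: the control action maximizes the value while the disturbance action minimizes it. Pure-strategy maximin and the toy model are simplifying assumptions; the paper's neural-network, continuous-space implementation is not reproduced.

```python
# Min-max value iteration on a toy Markov game: control agent maximizes,
# disturbing agent minimizes. Pure-strategy maximin is used for brevity.
import numpy as np

n_states, n_u, n_w = 5, 3, 3
gamma = 0.9
rng = np.random.default_rng(0)
R = rng.normal(size=(n_states, n_u, n_w))                         # reward r(s, u, w), toy data
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_u, n_w))   # transition probabilities

V = np.zeros(n_states)
for _ in range(200):                                # min-max value iteration
    Q = R + gamma * P @ V                           # Q[s, u, w]
    V_new = np.max(np.min(Q, axis=2), axis=1)       # max over u of min over w
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = np.argmax(np.min(R + gamma * P @ V, axis=2), axis=1)
print("state values:", np.round(V, 3), "control policy:", policy)
```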

4.
Reinforcement learning (RL) is an effective method for the design of robust controllers for unknown nonlinear systems. Typical RL methods for robust control, such as actor-critic (AC) algorithms, depend on estimation accuracy. Worst-case uncertainty requires a large state-action space, which causes overestimation and computational problems. In this article, the RL method is modified with the k-nearest neighbor and double Q-learning algorithms. The modified RL does not need a neural estimator as AC does and can stabilize the unknown nonlinear system under worst-case uncertainty. The convergence property of the proposed RL method is analyzed. The simulations and the experimental results show that our modified RLs are much more robust compared with classic controllers, such as the proportional-integral-derivative, sliding mode, and optimal linear quadratic regulator controllers.
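A rough sketch of the two ingredients named above is given below: a k-nearest-neighbour mapping from the continuous state onto a set of representative states, combined with the double Q-learning update. The toy environment, features and hyper-parameters are placeholders, not the article's experimental setup.

```python
# k-nearest-neighbour state aggregation + double Q-learning update (sketch).
import numpy as np

rng = np.random.default_rng(1)
centers = rng.uniform(-1, 1, size=(50, 2))       # representative states
n_actions, alpha, gamma, k = 3, 0.1, 0.95, 3
QA = np.zeros((len(centers), n_actions))
QB = np.zeros((len(centers), n_actions))

def knn_index(x, k=k):
    """Indices of the k nearest representative states to continuous state x."""
    d = np.linalg.norm(centers - x, axis=1)
    return np.argsort(d)[:k]

def double_q_update(x, a, r, x_next):
    """Average the double Q-learning update over the k nearest neighbours."""
    idx, idx_next = knn_index(x), knn_index(x_next)
    if rng.random() < 0.5:                       # update QA using QB's evaluation
        a_star = np.argmax(QA[idx_next].mean(axis=0))
        target = r + gamma * QB[idx_next, a_star].mean()
        QA[idx, a] += alpha * (target - QA[idx, a])
    else:                                        # symmetric update of QB
        a_star = np.argmax(QB[idx_next].mean(axis=0))
        target = r + gamma * QA[idx_next, a_star].mean()
        QB[idx, a] += alpha * (target - QB[idx, a])

# toy transition just to show the call signature
double_q_update(np.array([0.1, -0.2]), a=1, r=0.5, x_next=np.array([0.12, -0.18]))
print(QA.max(), QB.max())
```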

5.
Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn to control and maximize future cumulative rewards in complex environments. To achieve high performance, RL controllers must consider the complex external dynamics for movements and task (reward function) and optimize control commands. For example, a robot playing tennis and squash needs to cope with the different dynamics of a tennis or squash racket and with such dynamic environmental factors as the wind. In addition, the robot has to tailor its tactics simultaneously under the rules of either game. This double complexity of external dynamics and reward function is compounded when both the multiple dynamics and the multiple reward functions switch implicitly, as in a real (multi-agent) game of tennis where one player cannot observe the intention of her opponents or her partner. The robot must consider its opponent's and its partner's unobservable behavioral goals (reward functions). In this article, we address how an RL agent should be designed to handle such double complexity of dynamics and reward. We have previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics, where appropriate controllers are selected and learned among many candidates based on the error of each controller's paired dynamics predictor: the forward model. Here we extend this framework for RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit and selects and learns an appropriate RL controller based on the RL controller's TD error, using the errors of the dynamics predictors (forward models) and the reward predictors. Furthermore, unlike other MOSAIC variants for RL, RL controllers are not a priori paired with fixed predictors of dynamics and rewards. The simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association ability among RL controllers, forward models, and reward predictors.
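The selection step can be pictured with the schematic sketch below: each candidate module's responsibility is derived from how well its forward model and reward predictor explain the latest transition, and the most responsible module supplies (and is updated as) the active RL controller. Shapes, the softmax temperature and the toy numbers are illustrative assumptions.

```python
# Responsibility-based module selection from forward-model and reward-predictor
# errors (schematic sketch of the selection idea, not the MOSAIC-MR code).
import numpy as np

def responsibilities(x, x_pred, r, r_pred, beta=5.0):
    """Soft responsibilities from prediction errors of each candidate module."""
    dyn_err = np.sum((x_pred - x) ** 2, axis=1)      # forward-model errors
    rew_err = (r_pred - r) ** 2                      # reward-predictor errors
    score = -beta * (dyn_err + rew_err)
    w = np.exp(score - score.max())
    return w / w.sum()

# three candidate modules predicting the next state and reward
x_true, r_true = np.array([0.2, -0.1]), 1.0
x_pred = np.array([[0.19, -0.11], [0.5, 0.4], [-0.3, 0.0]])
r_pred = np.array([0.9, 0.1, 1.2])
w = responsibilities(x_true, x_pred, r_true, r_pred)
print("responsibilities:", np.round(w, 3), "selected module:", int(np.argmax(w)))
```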

6.
刘旭光  杜昌平  郑耀 《计算机应用》2022,42(12):3950-3956
To further improve the trajectory tracking accuracy of a quadrotor UAV in unknown environments, a control method was proposed that adds an iterative learning feedforward controller on top of the traditional feedback control architecture. To address the difficulty of tuning the learning parameters in iterative learning control (ILC), a method was proposed that uses reinforcement learning (RL) to tune and optimize the learning parameters of the iterative learning controller. First, RL was used to optimize the learning parameters, selecting the parameters that are optimal for the current environment and task so as to guarantee the best control performance of the iterative learning controller. Second, the learning capability of the iterative learning controller was used to iteratively refine the feedforward input until perfect tracking was achieved. Finally, in a simulation environment with random noise, the proposed reinforcement iterative learning control (RL-ILC) algorithm was compared with ILC without parameter optimization, sliding mode control (SMC), and proportional-integral-derivative (PID) control. The experimental results show that, after two iterations, the total error of the proposed algorithm shrank to 0.2% of the initial error, achieving fast convergence; moreover, compared with SMC and PID control, the RL-ILC algorithm did not produce noise-induced trajectory fluctuations after convergence. The proposed algorithm can therefore effectively improve the accuracy and robustness of UAV trajectory tracking.
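As an illustration of the iterative-learning part, the toy sketch below refines a feedforward input from the previous iteration's tracking error with a PD-type ILC update; the first-order plant, the fixed learning gains, and the way RL would search over those gains are illustrative assumptions, not the paper's quadrotor setup.

```python
# PD-type iterative learning control on a toy plant:
# u_{k+1}(t) = u_k(t) + kp*e_k(t) + kd*de_k(t)/dt.  RL would tune (kp, kd).
import numpy as np

dt, T = 0.01, 2.0
t = np.arange(0.0, T, dt)
ref = np.sin(np.pi * t)                       # reference trajectory

def rollout(u_ff):
    """First-order toy plant with a simple proportional feedback loop."""
    y, out = 0.0, []
    for i in range(len(t)):
        u = u_ff[i] + 2.0 * (ref[i] - y)      # feedforward + feedback
        y = y + dt * (-y + u)
        out.append(y)
    return np.asarray(out)

def ilc_iterations(kp=0.8, kd=0.05, n_iter=5):
    """Run ILC with fixed learning gains; RL would search over (kp, kd)."""
    u_ff = np.zeros_like(t)
    for k in range(n_iter):
        e = ref - rollout(u_ff)
        u_ff = u_ff + kp * e + kd * np.gradient(e, dt)   # PD-type ILC update
        print(f"iteration {k}: total |error| = {np.abs(e).sum():.3f}")
    return u_ff

ilc_iterations()
```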

7.
We propose approximating a Poincaré map of biped walking dynamics using Gaussian processes. We locally optimize parameters of a given biped walking controller based on the approximated Poincaré map. By using Gaussian processes, we can estimate a probability distribution of a target nonlinear function with a given covariance. Thus, an optimization method can take the uncertainty of approximated maps into account throughout the learning process. We use a reinforcement learning (RL) method as the optimization method. Although RL is a useful nonlinear optimizer, it is usually difficult to apply RL to real robotic systems due to the large number of iterations required to acquire suitable policies. In this study, we first approximated the Poincaré map by using data from a real robot, and then applied RL using the estimated map in order to optimize stepping and walking policies. We show that we can improve stepping and walking policies both in simulated and real environments. Experimental validation of the approach on a humanoid robot is presented.
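A minimal sketch of the first step is shown below: fit a Gaussian process to observed return-map data at a Poincaré section so that later optimization can use both the predicted map and its uncertainty. The one-dimensional toy map and kernel choice are assumptions, not the robot data used in the paper.

```python
# Gaussian-process fit of a toy 1-D Poincare return map, with predictive
# uncertainty for later optimization (sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x_n = rng.uniform(0.0, 1.0, size=(40, 1))                              # state at section crossing n
x_next = 0.8 * np.sin(2.5 * x_n) + 0.05 * rng.normal(size=x_n.shape)   # crossing n+1 (noisy)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(x_n, x_next.ravel())

x_query = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
mean, std = gp.predict(x_query, return_std=True)          # map estimate + uncertainty
print(np.c_[x_query.ravel(), mean, std])
```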

8.
Complexity and uncertainty in modern robots and other autonomous systems make it difficult to design controllers for such systems that can achieve desired levels of precision and robustness. Therefore learning methods are being incorporated into controllers for such systems, thereby providing the adaptability necessary to meet the performance demands of the task. We argue that for learning tasks arising frequently in control applications, the most useful methods in practice are probably those we call direct associative reinforcement learning methods. We describe direct reinforcement learning methods and illustrate with an example the utility of these methods for learning skilled robot control under uncertainty.

9.
Safety-critical control is often trained in a simulated environment to mitigate risk. Subsequent migration of the biased controller requires further adjustments. In this paper, an experience-inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is inferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
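The inference step, forcing the real plant to behave like an already-optimized reference model, can be sketched in its simplest scalar form with a textbook model-reference adaptive law, as below; the first-order plant, the reference model and the adaptation gains are placeholders, not the paper's hippocampus/neocortex/striatum construction.

```python
# Scalar model-reference adaptive control sketch: the plant is driven to
# track an (assumed already optimized) reference model.
import numpy as np

dt, gamma_adapt = 0.001, 5.0
a, b = 1.0, 3.0            # unknown real-plant parameters (used only to simulate)
am, bm = 4.0, 4.0          # reference model: xm_dot = -am*xm + bm*r

x = xm = 0.0
th1 = th2 = 0.0            # adaptive feedforward / feedback gains
for k in range(20000):
    r = 1.0 if (k * dt) % 4.0 < 2.0 else -1.0    # square-wave command
    u = th1 * r + th2 * x                        # adaptive control policy
    e = x - xm                                   # tracking error w.r.t. reference model
    th1 += dt * (-gamma_adapt * e * r)           # Lyapunov-rule adaptation (sign(b) > 0)
    th2 += dt * (-gamma_adapt * e * x)
    x += dt * (a * x + b * u)
    xm += dt * (-am * xm + bm * r)

print(f"final tracking error {e:.4f}, learned gains th1={th1:.2f}, th2={th2:.2f}")
```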

10.
Reinforcement learning (RL) is an artificial intelligence approach with the advantages of clear calculation logic and easy model extension. By interacting with the environment and maximizing value functions while requiring little or no prior information, RL can optimize the performance of policies and effectively reduce the complexity caused by physical models. RL algorithms based on the policy gradient have been successfully applied in many fields such as intelligent image recognition, robot control, and path planning for autonomous driving. However, RL is highly sample-dependent: the training process needs a large number of samples to converge, and decision accuracy is easily degraded by slight disturbances that do not match the simulation environment. Especially when RL is applied to the control field, it is difficult to prove the stability of the algorithm because its convergence cannot be guaranteed. Since swarm intelligence algorithms can solve complex problems through group cooperation and are self-organizing and highly stable, they are an effective way to improve the stability of RL models. The pigeon-inspired optimization algorithm from swarm intelligence was therefore combined with policy-gradient RL, and an RL algorithm based on pigeon-inspired optimization was proposed to solve for the policy gradient in order to maximize long-term future rewards. The fitness function of the pigeon-inspired optimization algorithm and RL were combined to evaluate policies, avoid getting trapped in infinite loops, and improve the stability of the algorithm. A nonlinear two-wheel inverted pendulum robot control system was selected for simulation verification. The simulation results show that the RL algorithm based on pigeon-inspired optimization can improve the robustness of the system, reduce the computational cost, and reduce the algorithm's dependence on the sample database.
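For flavour, the compact sketch below uses a pigeon-inspired optimizer (map-and-compass followed by landmark operators) to search policy parameters directly for episode return; the return function, dimensions and PIO constants are illustrative assumptions, and the inverted-pendulum simulation from the paper is not included.

```python
# Pigeon-inspired optimization of policy parameters for episode return (sketch).
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    """Placeholder for a rollout of the policy parameterised by theta."""
    return -np.sum((theta - np.array([0.5, -1.0, 2.0])) ** 2)

def pio_search(dim=3, n_pigeons=20, n_map=40, n_land=20, R=0.2):
    x = rng.normal(size=(n_pigeons, dim))
    v = np.zeros_like(x)
    for t in range(1, n_map + 1):                 # map-and-compass operator
        fitness = np.array([episode_return(p) for p in x])
        g_best = x[np.argmax(fitness)]
        v = v * np.exp(-R * t) + rng.random((n_pigeons, dim)) * (g_best - x)
        x = x + v
    for _ in range(n_land):                       # landmark operator
        fitness = np.array([episode_return(p) for p in x])
        order = np.argsort(fitness)[::-1]
        x = x[order][: max(2, len(x) // 2)]       # keep the better half
        center = x.mean(axis=0)
        x = x + rng.random(x.shape) * (center - x)
    return x[np.argmax([episode_return(p) for p in x])]

print("best policy parameters:", np.round(pio_search(), 3))
```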

11.
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL is still an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is suggested to search for optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently both for linear function approximators and for kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy in a few iterations but also obtains comparable or even better performance than Sarsa-learning and previous approximate policy iteration methods such as LSPI and KLSPI.
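One simple way to realize a fast policy search over a continuous action space, sketched below, is a coarse grid scan of the learned Q-function followed by local refinement around the best candidate; the quadratic toy Q-function stands in for CAPI's temporal-difference-learned value function, and the grid sizes are assumptions.

```python
# Coarse-to-fine action search against a learned Q approximator (sketch).
import numpy as np

def q_value(s, a):
    """Toy Q(s, a); in CAPI this would be a linear or kernel approximator."""
    return -(a - np.tanh(s)) ** 2

def greedy_action(s, a_min=-1.0, a_max=1.0, n_coarse=11, n_refine=3):
    grid = np.linspace(a_min, a_max, n_coarse)
    for _ in range(n_refine):                    # zoom in around the best candidate
        values = q_value(s, grid)
        best = grid[np.argmax(values)]
        half = grid[1] - grid[0]
        grid = np.linspace(best - half, best + half, n_coarse)
    return float(best)

print(greedy_action(0.4))   # should approach tanh(0.4) ~= 0.38
```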

12.
《Automatica》2014,50(12):3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
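The least-squares critic step can be illustrated with the bare-bones sketch below: with a linear-in-features cost approximation, the weights are obtained from sampled transitions by solving a regression problem. The features, the toy data and the discrete-time Bellman form used here are simplifying assumptions; the paper works with the continuous-time HJB equation and also fits an actor network.

```python
# Least-squares critic fit for a linear-in-features cost V(x) ~= w.T phi(x),
# from sampled (state, cost, next-state) data (sketch).
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.98

def phi(x):
    """Polynomial features of the state, a common critic basis."""
    return np.stack([x[:, 0] ** 2, x[:, 0] * x[:, 1], x[:, 1] ** 2], axis=1)

# sampled data (x_k, running cost r_k, x_{k+1}) collected from the real system
x = rng.normal(size=(500, 2))
x_next = 0.9 * x + 0.05 * rng.normal(size=x.shape)
r = np.sum(x ** 2, axis=1)

A = phi(x) - gamma * phi(x_next)            # residual regressors
w, *_ = np.linalg.lstsq(A, r, rcond=None)   # least-squares critic weights
print("critic weights:", np.round(w, 3))
```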

13.
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both offline learning using simulations and online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a "disturbing" agent tries to make the worst possible disturbance while a "control" agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
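As a worked illustration of the kind of objective involved, one common zero-sum formulation lets the control maximize and the disturbance minimize a value that trades accumulated reward against the size of the disturbance; the notation below (reward r, disturbance w, weight γ) is generic and written for concreteness, not copied from the letter.

```latex
V(x_0) \;=\; \max_{u(\cdot)}\; \min_{w(\cdot)} \int_0^{\infty}
  \Big( r\big(x(t),u(t)\big) + \gamma^{2}\,\lVert w(t)\rVert^{2} \Big)\, dt
```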

14.
An Active Queue Management Algorithm Based on Combined Fuzzy Control
The complexity and dynamic nature of computer networks make it difficult to design and analyze active queue management (AQM) algorithms with traditional control theory. Based on fuzzy set and fuzzy system theory, this paper designs an AQM algorithm called CF (Combination Fuzzy control). Fuzzy controller I computes the control action from the instantaneous queue length and its rate of change; fuzzy controller II computes the control gain from the system load factor. By appropriately choosing the fuzzy controller parameters, the fuzzy control system has the same local stability as a system using a PI (Proportional Integral) controller. Finally, the performance of CF, PI, and a single fuzzy controller is compared through simulation.
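The two-controller structure can be sketched schematically as below: fuzzy controller I maps the instantaneous queue error and its change to a correction, and fuzzy controller II scales that correction with a gain derived from the load factor; the membership functions, rule table and constants are illustrative assumptions, not the CF parameters chosen in the paper.

```python
# Schematic two-stage fuzzy AQM controller: controller I (error, error change)
# -> correction; controller II (load factor) -> gain. Parameters are illustrative.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c] with peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)), 0.0)

def fuzzy_controller_I(err, derr):
    """Sugeno-style: rule firing strengths weight crisp consequents."""
    labels = [(-1.0, -0.5, 0.0), (-0.5, 0.0, 0.5), (0.0, 0.5, 1.0)]   # N, Z, P
    consequents = np.array([-1.0, 0.0, 1.0])
    mu_e = np.array([tri(err, *l) for l in labels])
    mu_d = np.array([tri(derr, *l) for l in labels])
    strength = np.outer(mu_e, mu_d)                       # rule firing strengths
    out = 0.5 * (consequents[:, None] + consequents[None, :])  # crude rule table
    return float((strength * out).sum() / (strength.sum() + 1e-12))

def fuzzy_controller_II(load_factor):
    """Gain grows with the load factor (heavier load -> stronger correction)."""
    return 0.02 + 0.08 * np.clip(load_factor - 0.8, 0.0, 1.0)

dp = fuzzy_controller_II(load_factor=1.1) * fuzzy_controller_I(err=0.3, derr=-0.1)
print("drop-probability adjustment:", round(dp, 4))
```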

15.
This paper develops a simplified optimized tracking control using a reinforcement learning (RL) strategy for a class of nonlinear systems. Since a nonlinear control gain function is considered in the system modeling, it is challenging to extend existing RL-based optimal methods to the tracking control. The main reasons are that these methods' algorithms are very complex and that they require some strict conditions to be met. Unlike existing RL-based optimal methods, which derive the actor and critic training laws from the square of the Bellman residual error, a complex function consisting of multiple nonlinear terms, the proposed optimized scheme derives the two RL training laws from the negative gradient of a simple positive function, so that the algorithm can be significantly simplified. Moreover, the actor and critic in RL are constructed by employing neural networks (NNs) to approximate the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Finally, the feasibility of the proposed method is demonstrated using both Lyapunov stability theory and a simulation example.

16.
刘洋  李凡长 《计算机科学》2022,49(3):225-231
Deep learning based on neural networks has achieved excellent results in many fields, but it struggles with similar or previously unseen tasks. Deep learning has difficulty learning and adapting to new tasks and requires large training sets, leading to poor generalization and scalability. Meta-learning is a new learning framework that aims to solve the problems of fast learning and adaptation to new tasks that traditional learning methods cannot handle well. For the meta-learning problem of image classification, this paper proposes a fiber bundle meta-learning algorithm based on Bayesian theory (Fiber Bundle Meta-learning Algorithm, FBBML). First, a convolutional neural network extracts the image information of the support set to obtain image representations. Then, the manifold structure of the data features and a fiber bundle from data features to labels are constructed. Finally, the query set is used to select the manifold section for the current new task, thereby obtaining the fibers suited to the new task and the correct image labels. Experimental results show that the model implemented with the proposed algorithm (FBBML) achieves the best accuracy on the public mini-ImageNet dataset compared with a standard four-layer convolutional neural network model. Introducing fiber bundle theory into meta-learning also gives the algorithm higher interpretability.

17.
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solving such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over nonadaptive ones in complex environments.
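Step (iii) can be sketched with the toy snippet below, where a tabular Q-learner chooses which hand-designed local controller to activate in each discretised situation; the toy world and the two stand-in controllers are assumptions, not the robot task from the paper.

```python
# Tabular Q-learning of a switching strategy over hand-designed local controllers (sketch).
import numpy as np

rng = np.random.default_rng(0)
n_situations, n_controllers = 10, 2
Q = np.zeros((n_situations, n_controllers))
alpha, gamma, eps = 0.2, 0.95, 0.1

def step(s, c):
    """Toy world: controller 0 helps in the first half, controller 1 in the second."""
    good = (c == 0) if s < n_situations // 2 else (c == 1)
    s_next = min(s + 1, n_situations - 1) if good else s
    reward = 1.0 if s_next == n_situations - 1 else 0.0
    return s_next, reward

for _ in range(2000):                      # learn the switching strategy
    s = 0
    for _ in range(50):
        c = rng.integers(n_controllers) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, c)
        Q[s, c] += alpha * (r + gamma * Q[s_next].max() - Q[s, c])
        s = s_next
        if r > 0:
            break

print("learned controller choice per situation:", np.argmax(Q, axis=1))
```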

18.
《Advanced Robotics》2013,27(10):1125-1142
This paper presents a novel approach for acquiring dynamic whole-body movements on humanoid robots, focused on learning a control policy for the center of mass (CoM). In our approach, we combine a model-based CoM controller and a model-free reinforcement learning (RL) method to acquire dynamic whole-body movements in humanoid robots. (i) To cope with high dimensionality, we use a model-based CoM controller as a basic controller that derives joint angular velocities from the desired CoM velocity. The balancing issue can also be considered in the controller. (ii) The RL method is used to acquire a controller that generates the desired CoM velocity based on the current state. To demonstrate the effectiveness of our approach, we apply it to a ball-punching task on a simulated humanoid robot model. The acquired whole-body punching movement was also demonstrated on Fujitsu's Hoap-2 humanoid robot.
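Part (i) can be sketched as below: joint angular velocities are obtained from a desired CoM velocity through the pseudo-inverse of a CoM Jacobian, while an RL policy (part (ii)) would supply that desired CoM velocity; the planar three-joint Jacobian here is a rough toy stand-in for the humanoid model.

```python
# Joint velocities from a desired CoM velocity via a CoM-Jacobian pseudo-inverse (sketch).
import numpy as np

def com_jacobian(q):
    """Toy 2x3 CoM Jacobian of a planar 3-link chain with unit links/masses."""
    l = np.array([1.0, 1.0, 1.0])
    c = np.cumsum(q)                               # absolute link angles
    # contribution of each joint to the CoM position (rough planar approximation)
    Jx = -np.array([l[i:] @ np.sin(c[i:]) for i in range(3)]) / 3.0
    Jy = np.array([l[i:] @ np.cos(c[i:]) for i in range(3)]) / 3.0
    return np.vstack([Jx, Jy])

q = np.array([0.3, -0.2, 0.1])
v_com_desired = np.array([0.05, 0.0])              # e.g. output of the RL policy
qdot = np.linalg.pinv(com_jacobian(q)) @ v_com_desired
print("joint velocity command:", np.round(qdot, 4))
```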

19.
The design and development of UAV flight controllers is a complex systems-engineering task, and the traditional code-centric development approach suffers from high development difficulty, long cycles, and high error rates. Meanwhile, although intelligent flight control algorithms based on reinforcement learning achieve good performance in simulation, a complete development system for applying them in practice is still lacking. This paper presents a model-based development system for intelligent flight control that uses modular programming and automatic code generation to apply reinforcement learning algorithms to the embedded development and deployment of flight controllers. The system supports training simulation, testing, and hardware deployment of reinforcement learning algorithms, aiming to speed up the deployment of intelligent control algorithms such as reinforcement learning while reducing the difficulty of developing intelligent flight control systems.

20.
The purpose of the paper is two-fold: (i) to introduce the sufficient statistic algebra, which is responsible for propagating the sufficient statistics, or information state, in the optimal control of stochastic systems, and (ii) to apply certain Lie algebraic methods and gauge transformations, widely used in nonlinear control theory and quantum physics, to derive new results concerning finite-dimensional controllers. This enhances our understanding of the role played by the sufficient statistics. The sufficient statistic algebra enables us to determine a priori whether there exist finite-dimensional controllers; it also enables us to classify all finite-dimensional controllers. Relations to minimax dynamic games are delineated.
