Similar Documents
20 similar documents found.
1.
A reinforcement learning system is one that interacts with an unconstrained, unknown environment. The goal of the learning system is to obtain, as far as possible, the maximum cumulative reward signal, which is received over a finite and unknown lifetime from the environment in which the system operates. One difficulty for a reinforcement learning system is that the reward signal is very sparse, especially for systems that receive only delayed reward signals. Existing reinforcement learning methods store the reward signal in the form of a value function, as in the well-known Q-learning. This paper proposes a method based on a state estimation model, an algorithm that makes use of the reward stored in the value function
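The abstract cites Q-learning as the standard example of storing (delayed) reward information in a value function. For reference, a minimal tabular Q-learning loop might look like the following sketch; the environment interface (reset/step/actions) and the hyperparameters are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning; env is assumed to expose reset()/step()/actions."""
    Q = defaultdict(float)  # maps (state, action) -> estimated return

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over the assumed env.actions list
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # one-step temporal-difference update toward the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state
    return Q
```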

2.
In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent’s reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time to improve the breadth of its planning and build higher-quality plans. Specifically, we extend potential-based reward shaping (PBRS) from RL to online POMDP planning. In our extension, information about belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards beyond its planning horizon, and thus achieve greater cumulative rewards. We develop novel potential functions measuring information useful to agent metareasoning in POMDPs (reflecting on agent knowledge and/or histories of experience with the environment), theoretically prove several important properties and benefits of using PBRS for online POMDP planning, and empirically demonstrate these results in a range of classic benchmark POMDP planning problems.
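Potential-based reward shaping adds the difference of a potential function to the immediate reward, and the extension described here applies the same idea to belief states during online POMDP planning. A minimal sketch of the standard shaping term is shown below; the potential function over beliefs is a placeholder assumption, not one of the paper's proposed potentials.

```python
def shaped_reward(reward, belief, next_belief, potential, gamma=0.99):
    """Potential-based shaping: F(b, b') = gamma * Phi(b') - Phi(b).

    Adding F to the original reward leaves the optimal policy unchanged in
    the MDP setting; `potential` is a user-supplied heuristic, e.g. an
    estimate of achievable future reward from a belief state.
    """
    return reward + gamma * potential(next_belief) - potential(belief)
```

In the MDP case this shaping term is policy-invariant, which is why it can safely bias the search toward promising regions without changing what counts as optimal behavior.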

3.
This paper develops an adaptive fuzzy controller for robot manipulators using a Markov game formulation. The Markov game framework offers a promising platform for robust control of robot manipulators in the presence of bounded external disturbances and unknown parameter variations. We propose fuzzy Markov games as an adaptation of fuzzy Q-learning (FQL) to a continuous-action variation of Markov games, wherein the reinforcement signal is used to tune online the conclusion part of a fuzzy Markov game controller. The proposed Markov game-adaptive fuzzy controller uses a simple fuzzy inference system (FIS), is computationally efficient, generates a swift control, and requires no exact dynamics of the robot system. To illustrate the superiority of Markov game-adaptive fuzzy control, we compare the performance of the controller against a) the Markov game-based robust neural controller, b) the reinforcement learning (RL)-adaptive fuzzy controller, c) the FQL controller, d) the H∞ theory-based robust neural game controller, and e) a standard RL-based robust neural controller, on two highly nonlinear robot arm control problems of i) a standard two-link rigid robot arm and ii) a 2-DOF SCARA robot manipulator. The proposed Markov game-adaptive fuzzy controller outperformed other controllers in terms of tracking errors and control torque requirements, over different desired trajectories. The results also demonstrate the viability of FISs for accelerating learning in Markov games and extending Markov game-based control to continuous state-action space problems.

4.
Humans can generate accurate and appropriate motor commands in various, and even uncertain, environments. MOSAIC (MOdular Selection And Identification for Control) was originally proposed to describe this human ability, but this model is hard to analyze mathematically because of its emphasis on biological plausibility. In this article, we present an alternative and probabilistic model of MOSAIC (p-MOSAIC) as a mixture of normal distributions and an online EM-based learning method for its predictors and controllers. A theoretical consideration shows that the learning rule of p-MOSAIC corresponds to that of MOSAIC except for some points which are mostly related to the learning of controllers. The results of experiments using synthetic datasets demonstrate some practical advantages of p-MOSAIC. One is that the learning rule of p-MOSAIC stabilizes the estimation of “responsibility.” Another is that p-MOSAIC realizes more accurate control and robust parameter learning in comparison to the original MOSAIC, especially in noisy environments, due to the direct incorporation of the noises into the model. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January 25–27, 2007.
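p-MOSAIC models the module predictions as a mixture of normal distributions, so each module's "responsibility" becomes a posterior weight proportional to its Gaussian prediction likelihood. The sketch below shows that normalization for a set of forward-model predictions; the isotropic noise model and uniform module priors are simplifying assumptions, not details taken from the paper.

```python
import numpy as np

def responsibilities(observation, predictions, sigma=0.1):
    """Posterior weight of each module under a Gaussian likelihood.

    observation: observed state, shape (d,)
    predictions: per-module forward-model predictions, shape (n_modules, d)
    Assumes isotropic noise with std `sigma` and uniform module priors.
    """
    errors = np.sum((predictions - observation) ** 2, axis=1)
    log_lik = -errors / (2.0 * sigma ** 2)
    log_lik -= log_lik.max()              # stabilize the exponentials
    lik = np.exp(log_lik)
    return lik / lik.sum()                # normalized responsibilities
```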

5.
Applied Soft Computing, 2007, 7(3): 818-827
This paper proposes a reinforcement learning (RL)-based game-theoretic formulation for designing robust controllers for nonlinear systems affected by bounded external disturbances and parametric uncertainties. Based on the theory of Markov games, we consider a differential game in which a ‘disturbing’ agent tries to make the worst possible disturbance while a ‘control’ agent tries to make the best control input. The problem is formulated as finding a min–max solution of a value function. We propose an online procedure for learning the optimal value function and for calculating a robust control policy. The proposed game-theoretic paradigm has been tested on the control task of a highly nonlinear two-link robot system. We compare the performance of the proposed Markov game controller with a standard RL-based robust controller and an H∞ theory-based robust game controller. For the robot control task, the proposed controller achieved superior robustness to changes in payload mass and external disturbances, over other control schemes. Results also validate the effectiveness of neural networks in extending the Markov game framework to problems with continuous state–action spaces.
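The min–max solution of the value function described here can be illustrated, for a small discrete approximation, by value iteration on a zero-sum Markov game in which the control agent maximizes and the disturbing agent minimizes. The sketch below uses a tabular model and pure-strategy min–max for brevity; the paper's setting is continuous and learned online with neural networks, and general Markov games require mixed strategies, so this is only an illustration of the backup.

```python
import numpy as np

def minimax_value_iteration(P, R, gamma=0.95, iters=500, tol=1e-8):
    """Zero-sum Markov game value iteration (pure-strategy min-max sketch).

    P: transitions, shape (S, U, W, S) -- control action u, disturbance w
    R: reward to the control agent, shape (S, U, W)
    """
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        # Q[s, u, w]: immediate reward plus discounted expected next value
        Q = R + gamma * np.einsum("suwt,t->suw", P, V)
        # controller picks the best response to the worst-case disturbance
        V_new = Q.min(axis=2).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```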

6.
Design of Heuristic Reward Functions in Reinforcement Learning Algorithms and Analysis of Their Convergence
(Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016)

7.
While Reinforcement Learning (RL) is not traditionally designed for interactive supervisory input from a human teacher, several works in both robot and software agents have adapted it for human input by letting a human trainer control the reward signal. In this work, we experimentally examine the assumption underlying these works, namely that the human-given reward is compatible with the traditional RL reward signal. We describe an experimental platform with a simulated RL robot and present an analysis of real-time human teaching behavior found in a study in which untrained subjects taught the robot to perform a new task. We report three main observations on how people administer feedback when teaching a Reinforcement Learning agent: (a) they use the reward channel not only for feedback, but also for future-directed guidance; (b) they have a positive bias to their feedback, possibly using the signal as a motivational channel; and (c) they change their behavior as they develop a mental model of the robotic learner. Given this, we made specific modifications to the simulated RL robot, and analyzed and evaluated its learning behavior in four follow-up experiments with human trainers. We report significant improvements on several learning measures. This work demonstrates the importance of understanding the human-teacher/robot-learner partnership in order to design algorithms that support how people want to teach and simultaneously improve the robot's learning behavior.
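The platform in this study lets the trainer's input enter through the same scalar channel as the environment reward. As a minimal illustration of that assumption (not the authors' actual system), the sketch below simply adds the human-given signal to the environment reward inside a standard tabular Q-learning update; the interface and hyperparameters are placeholders.

```python
def td_update(Q, s, a, s_next, env_reward, human_feedback, actions,
              alpha=0.1, gamma=0.9):
    """One Q-learning step in which the trainer's signal augments the reward.

    `human_feedback` is the scalar the trainer sent during this step
    (0.0 if none); treating it as ordinary reward is exactly the
    assumption the study examines.
    """
    reward = env_reward + human_feedback
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return Q
```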

8.
Research Progress on Reinforcement Learning for Autonomous Robots
陈卫东, 席裕庚, 顾冬雷. 《机器人》, 2001, 23(4): 379-384
Although behavior-based autonomous robots are highly robust, they lack the adaptability needed for dynamic environments. Reinforcement learning allows a robot to accomplish a task through learning, without the designer having to specify all of the robot's actions in advance. It is a novel learning method that grew out of combining dynamic programming and supervised learning: through trial-and-error interaction between the robot and its environment, it continually improves the robot's performance using reward and punishment signals derived from successful and failed experiences, thereby reaching the goal, and it accommodates delayed evaluation. Because of its outstanding ability to solve complex problems, reinforcement learning has become a very promising approach to robot learning. This paper systematically reviews the state of research on reinforcement learning for autonomous robots, points out the open problems, analyzes several ways of addressing them, and looks ahead to future development trends.

9.
Reinforcement learning (RL) for robot control is an important technology for future robots since it enables us to design a robot’s behavior using the reward function. However, RL for high degree-of-freedom robot control is still an open issue. This paper proposes a discrete action space DCOB which is generated from the basis functions (BFs) given to approximate a value function. The remarkable feature is that, by reducing the number of BFs to enable the robot to learn the value function quickly, the size of DCOB is also reduced, which improves the learning speed. In addition, a method WF-DCOB is proposed to enhance the performance, where wire-fitting is utilized to search for continuous actions around each discrete action of DCOB. We apply the proposed methods to motion learning tasks of a simulated humanoid robot and a real spider robot. The experimental results demonstrate outstanding performance.

10.
The robot soccer game has been proposed as a benchmark problem for artificial intelligence and robotics research. The decision-making system is the most important part of the robot soccer system. As the environment is dynamic and complex, a reinforcement learning (RL) method named FNN-RL is employed in learning the decision-making strategy. The FNN-RL system consists of the fuzzy neural network (FNN) and RL. RL is used for structure identification and parameter tuning of the FNN. On the other hand, the curse of dimensionality problem of RL can be solved by the function approximation characteristics of the FNN. Furthermore, the residual algorithm is used to calculate the gradient of the FNN-RL method in order to guarantee the convergence and rapidity of learning. The complex decision-making task is divided into multiple learning subtasks that include dynamic role assignment, action selection, and action implementation. They constitute a hierarchical learning system. We apply the proposed FNN-RL method to soccer agents who attempt to learn each subtask at the various layers. The effectiveness of the proposed method is demonstrated by simulation and real experiments.
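The residual algorithm mentioned here (in the spirit of Baird's residual gradients) mixes the direct TD gradient with the true gradient of the squared Bellman residual to keep function-approximation learning stable. A generic sketch for a linear value approximator is given below; the feature vectors, step size, and mixing coefficient are illustrative assumptions, not the FNN-RL details.

```python
import numpy as np

def residual_update(w, phi_s, phi_next, reward, alpha=0.05, gamma=0.9, beta=0.5):
    """Residual-algorithm update for a linear value approximator V(s) = w . phi(s).

    beta = 0 gives the direct (TD) gradient, beta = 1 the pure residual
    gradient; intermediate values trade convergence guarantees for speed.
    """
    delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi_s)
    direct_grad = -delta * phi_s                           # direct (TD) method
    residual_grad = delta * (gamma * phi_next - phi_s)     # true residual gradient
    # weighted combination of the two gradients, gradient-descent step on w
    return w - alpha * ((1 - beta) * direct_grad + beta * residual_grad)
```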

11.
Reinforcement learning (RL) has now evolved as a major technique for adaptive optimal control of nonlinear systems. However, the majority of RL algorithms proposed so far impose a strong constraint on the structure of the environment dynamics by assuming that it operates as a Markov decision process (MDP). An MDP framework envisages a single agent operating in a stationary environment, thereby limiting the scope of application of RL to control problems. Recently, a new direction of research has focused on proposing Markov games as an alternative system model to enhance the generality and robustness of RL-based approaches. This paper aims to present this new direction, which seeks to synergize the broad areas of RL and game theory, as an interesting and challenging avenue for designing intelligent and reliable controllers. First, we briefly review some representative RL algorithms for the sake of completeness and then describe the recent direction that seeks to integrate RL and game theory. Finally, open issues are identified and future research directions outlined.

12.
The integration of reinforcement learning (RL) and imitation learning (IL) is an important problem that has long been studied in the field of intelligent robotics. RL optimizes policies to maximize the cumulative reward, whereas IL attempts to extract general knowledge about the trajectories demonstrated by experts, i.e., demonstrators. Because each has its own drawbacks, many methods combining them and compensating for each set of drawbacks have been explored thus far. However, many of these methods are heuristic and do not have a solid theoretical basis. This paper presents a new theory for integrating RL and IL by extending the probabilistic graphical model (PGM) framework for RL, control as inference. We develop a new PGM for RL with multiple types of rewards, called probabilistic graphical model for Markov decision processes with multiple optimality emissions (pMDP-MO). Furthermore, we demonstrate that the integrated learning method of RL and IL can be formulated as a probabilistic inference of policies on pMDP-MO by considering the discriminator in generative adversarial imitation learning (GAIL) as an additional optimality emission. We adapt the GAIL and task-achievement reward to our proposed framework, achieving significantly better performance than policies trained with baseline methods.
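Treating the GAIL discriminator as an additional optimality emission means, in practice, that the policy is trained on a task reward and a discriminator-derived imitation reward at the same time. The sketch below shows one schematic combination; the weighting and the discriminator output convention are assumptions for illustration, not the paper's exact formulation.

```python
import math

def combined_reward(task_reward, d_prob, weight=1.0):
    """Task reward plus a GAIL-style imitation reward.

    d_prob is the discriminator's probability that the (state, action)
    pair came from the expert; -log(1 - D) is the usual GAIL reward and
    grows as the pair looks more expert-like.
    """
    d = min(max(d_prob, 1e-6), 1.0 - 1e-6)   # numerical safety
    return task_reward + weight * (-math.log(1.0 - d))
```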

13.
In recent robotics research, much attention has been focused on utilizing reinforcement learning (RL) for designing robot controllers, since the environments in which robots will be situated cannot be fully predicted by human designers in advance. However, there exist some difficulties, one of which is the well-known ‘curse of dimensionality’ problem. Thus, in order to adopt RL for complicated systems, not only ‘adaptability’ but also ‘computational efficiency’ should be taken into account. The paper proposes an adaptive state recruitment strategy for NGnet-based actor-critic RL. The strategy enables the learning system to rearrange/divide its state space gradually according to the task complexity and the progress of learning. Some simulation results and real robot implementations show the validity of the method.
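One common way to realize this kind of adaptive state recruitment with normalized Gaussian network (NGnet/RBF) representations is to allocate a new basis function whenever a visited state is poorly covered by the existing ones. The class below sketches only that recruitment test; the threshold rule, widths, and growth criterion are assumptions, not the paper's exact strategy.

```python
import numpy as np

class AdaptiveBasisSet:
    """Grow a set of Gaussian basis functions as new state regions are visited."""

    def __init__(self, width=0.5, activation_threshold=0.4):
        self.centers = []                    # list of state-space centers
        self.width = width
        self.threshold = activation_threshold

    def activations(self, state):
        if not self.centers:
            return np.zeros(0)
        d2 = np.sum((np.asarray(self.centers) - state) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def maybe_recruit(self, state):
        """Add a basis centered on `state` if no existing basis covers it well."""
        act = self.activations(state)
        if act.size == 0 or act.max() < self.threshold:
            self.centers.append(np.asarray(state, dtype=float))
            return True
        return False
```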

14.
Microsoft's Xbox and Sony's PlayStation overlay achievement and trophy systems onto their video games. Though these meta-game reward systems are growing in popularity, little research has examined whether players notice, use, or seek out these systems. In this study, game players participated in focus groups to discuss the advantages and disadvantages of meta-game reward systems. Participants described the value of meta-game reward systems in promoting different ways to play games, giving positive feedback about game play, and boosting self-esteem and online and offline social status. Participants discussed completionists, or gamers that want to earn all of the badges associated with the meta-game. Though self-determination theory and its subtheory cognitive evaluation theory suggest that extrinsic rewards might harm players' intrinsic motivation, our findings suggest players may see these systems as intrinsically motivating in this context. The implications of reward systems for motivation, video game habits, and internet gaming disorder are discussed.

15.
We present work on a six-legged walking machine that uses a hierarchical version of [C.J.C.H. Watkins, Learning from delayed rewards, Ph.D. Thesis, Psychology Department, Cambridge University, 1989] Q-learning (HQL) to learn both the elementary swing and stance movements of individual legs and the overall coordination scheme needed to perform forward movements. The architecture consists of a hierarchy of local controllers implemented in layers. The lowest layer consists of control modules performing elementary actions, like moving a leg up, down, left or right to achieve the elementary swing and stance motions for individual legs. The next level consists of controllers that learn to perform more complex tasks like forward movement by using the previously learned, lower level modules. The work is related to similar, although simulation based, work [L.J. Lin, Reinforcement learning for robots using neural networks, Ph.D. Thesis, Carnegie Mellon University, 1993] on hierarchical reinforcement learning and [S.P. Singh, Learning to solve Markovian decision problems, Ph.D. Thesis, Department of Computer Science at the University of Massachusetts, 1994] on compositional Q-learning. We report on the HQL architecture as well as on its implementation on the walking machine Sir Arthur. Results from experiments carried out on the real robot are reported to show the applicability of the HQL approach to real world robot problems.
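HQL layers Q-learning at two granularities: low-level modules learn elementary swing and stance actions, and a higher-level learner treats those modules as its actions. The sketch below is a compressed, generic rendering of that composition, assuming a hypothetical environment interface (high_level_state, module_done, execute, and so on); it is not the Sir Arthur implementation.

```python
import random

def select(Q, state, actions, epsilon=0.1):
    """Epsilon-greedy selection shared by both layers."""
    if random.random() < epsilon or not any((state, a) in Q for a in actions):
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def hierarchical_step(high_Q, low_Qs, env, alpha=0.1, gamma=0.95):
    """One high-level decision: pick a low-level module and run it to completion."""
    s = env.high_level_state()
    module = select(high_Q, s, list(low_Qs))        # e.g. a swing module for one leg
    total_reward, steps = 0.0, 0
    while not env.module_done(module):
        ls = env.low_level_state(module)
        a = select(low_Qs[module], ls, env.primitive_actions(module))
        r, ls_next = env.execute(a)
        # low-level update on the primitive reward
        best = max(low_Qs[module].get((ls_next, b), 0.0)
                   for b in env.primitive_actions(module))
        old = low_Qs[module].get((ls, a), 0.0)
        low_Qs[module][(ls, a)] = old + alpha * (r + gamma * best - old)
        total_reward += (gamma ** steps) * r
        steps += 1
    # high-level update on the discounted return accumulated by the module
    s_next = env.high_level_state()
    best_hi = max(high_Q.get((s_next, m), 0.0) for m in low_Qs)
    old_hi = high_Q.get((s, module), 0.0)
    high_Q[(s, module)] = old_hi + alpha * (
        total_reward + (gamma ** steps) * best_hi - old_hi)
    return high_Q, low_Qs
```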

16.
The article puts forward a simple scheme for multivariable control of robot manipulators to achieve trajectory tracking. The scheme is composed of an inner loop stabilizing controller and an outer loop tracking controller. The inner loop utilizes a multivariable PD controller to stabilize the robot by placing the poles of the linearized robot model at some desired locations. The outer loop employs a multivariable PID controller to achieve input-output decoupling and trajectory tracking. The gains of the PD and PID controllers are related directly to the linearized robot model by simple closed-form expressions. The controller gains are updated on-line to cope with variations in the robot model during gross motion and for payload change. Alternatively, the use of high gain controllers for gross motion and payload change is discussed. Computer simulation results are given for illustration.
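In this scheme an inner multivariable PD loop stabilizes the linearized robot while an outer multivariable PID loop handles decoupling and tracking. The sketch below shows a generic discrete-time multivariable PID block of the form u = Kp e + Ki ∫e dt + Kd ė; the inner PD loop is the same law with Ki set to zero. The closed-form gain expressions derived from the linearized robot model in the article are not reproduced here, so the gain matrices are placeholders.

```python
import numpy as np

class MultivariablePID:
    """Discrete-time multivariable PID law: u = Kp e + Ki * integral(e) + Kd * de/dt."""

    def __init__(self, Kp, Ki, Kd, dt):
        self.Kp, self.Ki, self.Kd, self.dt = Kp, Ki, Kd, dt
        self.integral = None
        self.prev_error = None

    def update(self, error):
        error = np.asarray(error, dtype=float)
        if self.integral is None:
            self.integral = np.zeros_like(error)
            self.prev_error = error
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.Kp @ error + self.Ki @ self.integral + self.Kd @ derivative
```

Using zero Ki matrices turns the same block into the inner PD stabilizer, so one class can serve both loops in a simulation of the scheme.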

17.
Model-based average reward reinforcement learning
Artificial Intelligence, 1998, 100(1-2): 177-224
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based average-reward reinforcement learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this “Auto-exploratory H-Learning” performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks.
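H-learning optimizes the average reward per step, so its value backup subtracts an estimated gain ρ rather than discounting future values. The sketch below performs such a backup over a learned tabular model; the schedule used to update ρ is a common average-reward (R-learning-style) rule stated here as an assumption, not the paper's exact update.

```python
def average_reward_sweep(h, rho, model, alpha=0.01):
    """One sweep of average-reward value backups over a learned tabular model.

    model[s][a] = (expected_reward, {next_state: prob}); h maps states to
    relative values and satisfies h(s) = max_a [ r(s,a) - rho + sum_s' p(s'|s,a) h(s') ].
    The gain estimate rho is nudged toward consistency with the greedy
    action's Bellman residual (an assumed, R-learning-like schedule).
    """
    for s, actions in model.items():
        values = {a: r - rho + sum(p * h[t] for t, p in trans.items())
                  for a, (r, trans) in actions.items()}
        greedy = max(values, key=values.get)
        rho += alpha * (values[greedy] - h[s])   # drift the gain estimate
        h[s] = values[greedy]
    return h, rho
```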

18.
This paper presents and analyzes Reinforcement Learning (RL) based approaches to solve spacecraft control problems. Different application fields are considered, e.g., guidance, navigation and control systems for spacecraft landing on celestial bodies, constellation orbital control, and maneuver planning in orbit transfers. It is discussed how RL solutions can address the emerging needs of designing spacecraft with highly autonomous on-board capabilities and implementing controllers (i.e., RL agents) robust to system uncertainties and adaptive to changing environments. For each application field, the RL framework core elements (e.g., the reward function, the RL algorithm and the environment model used for the RL agent training) are discussed with the aim of providing some guidelines in the formulation of spacecraft control problems via an RL framework. At the same time, the adoption of RL in real space projects is also analyzed. Different open points are identified and discussed, e.g., the availability of high-fidelity simulators for the RL agent training and the verification of RL-based solutions. This way, recommendations for future work are proposed with the aim of reducing the technological gap between the solutions proposed by the academic community and the needs/requirements of the space industry.

19.
The purpose of this paper is to propose a hybrid trigonometric compound function neural network (NN) to improve the NN-based tracking control performance of a nonholonomic mobile robot with nonlinear disturbances. In the mobile robot control system, two NN controllers embedded in the closed-loop control system have simple continuous learning and rapid convergence capability, without requiring the dynamics information of the mobile robot, to realize tracking control of the mobile robot. The neuron functions of the hidden layer in the three-layer feedforward network structure consist of a compound cosine function and a compound sine function, combining a cosine or a sine function with a unipolar sigmoid function. The main advantages of this NN-based control system are better real-time control capability and control accuracy obtained by using the proposed NN controllers for a nonholonomic mobile robot with nonlinear disturbances. In simulation experiments on a nonholonomic mobile robot subject to dynamics uncertainty and external disturbances, the proposed NN control system achieves better real-time control capability and control accuracy than a compound cosine function NN control system, verifying the effectiveness of the proposed hybrid trigonometric compound function NN controller for improving the tracking control performance of a nonholonomic mobile robot with nonlinear disturbances.

20.
To motivate visitors to engage with websites, e-tailers widely employ monetary rewards (e.g., vouchers, discounts) in their website designs. With advances in user interface technologies, many e-tailers have started to offer gamified monetary reward designs (MRDs), which require visitors to earn the monetary reward by playing a game, rather than simply claiming the reward. However, little is known about whether and why gamified MRDs engage visitors compared to their non-gamified counterparts. Even less is known about the effectiveness of gamified MRDs when providing certain or chance-based rewards, in that visitors do or do not know what reward they will gain for successfully performing in the game. Drawing on cognitive evaluation theory, we investigate gamified MRDs with certain or chance-based rewards and contrast them to non-gamified MRDs with certain rewards in user registration systems. Our results from a multi-method approach encompassing the complementary features of a randomised field experiment (N = 651) and a randomised online experiment (N = 330) demonstrate differential effects of the three investigated MRDs on user registration. Visitors encountering either type of gamified MRD are more likely to register than those encountering a non-gamified MRD. Moreover, gamified MRDs with chance-based rewards have the highest likelihood of user registrations. We also show that MRDs have distinct indirect effects on user registration via anticipated experiences of competence and sensation. Overall, the paper offers theoretical insights and practical guidance on how and why gamified MRDs are effective for e-tailers.
