Similar Articles
20 similar articles found.
1.
In this paper, a multi-agent reinforcement learning method based on predicting the actions of other agents is proposed. In a multi-agent system, the learning agent's action selection is unavoidably affected by the actions of the other agents, so the multi-agent reinforcement learning system involves joint states and joint actions. A novel agent action prediction method based on the probabilistic neural network (PNN) is proposed: the PNN is used to predict the actions of the other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies among the agents in order to speed up learning. Finally, the application of the proposed method to robot soccer is studied. Through learning, the robot players master the mapping from state information to the action space, and coordination and cooperation among multiple robots are realized.
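
The abstract does not spell out the PNN, so the following is a minimal sketch of how a Parzen-window probabilistic neural network could predict another agent's next action from observed state-action pairs. The class, method names, and the kernel width sigma are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PNNActionPredictor:
    """Parzen-window probabilistic neural network: one Gaussian kernel per
    stored pattern; a class score is the mean kernel activation."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma                  # kernel width (smoothing parameter)
        self.patterns = {}                  # action label -> list of state vectors

    def observe(self, state, action):
        self.patterns.setdefault(action, []).append(np.asarray(state, dtype=float))

    def predict(self, state):
        state = np.asarray(state, dtype=float)
        scores = {}
        for action, vecs in self.patterns.items():
            sq_dists = np.array([np.sum((state - v) ** 2) for v in vecs])
            scores[action] = np.mean(np.exp(-sq_dists / (2.0 * self.sigma ** 2)))
        return max(scores, key=scores.get)  # most probable next action
```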

2.
Multi-agent reinforcement learning and its application to role assignment in robot soccer
A robot soccer system is a typical multi-agent system: each robot player's action selection depends not only on its own state but also on the other players, so learning a decision policy for soccer robots through reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on agent action prediction, using a naive Bayes classifier to predict the actions of the other agents. A policy-sharing mechanism is introduced to exchange the policies learned by the agents and thereby speed up multi-agent reinforcement learning. Finally, the application of the proposed method to dynamic role assignment in robot soccer is studied, realizing division of labor and cooperation among multiple robots.
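
As a rough illustration of the naive Bayes action predictor described here, the sketch below counts discrete state features per observed action and picks the action with the highest posterior. The class and the Laplace-smoothing choice (binary-valued features assumed) are our own illustrative assumptions, not the paper's classifier.

```python
from collections import defaultdict
import math

class NaiveBayesActionPredictor:
    """Counts-based naive Bayes: P(a | features) is proportional to
    P(a) times the product of P(feature_i | a)."""
    def __init__(self):
        self.action_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)   # keyed by (index, value, action)

    def observe(self, features, action):
        self.action_counts[action] += 1
        for i, v in enumerate(features):
            self.feature_counts[(i, v, action)] += 1

    def predict(self, features):
        total = sum(self.action_counts.values())
        best_action, best_logp = None, -math.inf
        for a, n_a in self.action_counts.items():
            logp = math.log(n_a / total)          # log prior over actions
            for i, v in enumerate(features):
                # Laplace smoothing; denominator assumes binary features
                logp += math.log((self.feature_counts[(i, v, a)] + 1) / (n_a + 2))
            if logp > best_logp:
                best_action, best_logp = a, logp
        return best_action
```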

3.
Batch reinforcement learning methods provide a powerful framework for learning efficiently and effectively in autonomous robots. The paper reviews some recent work of the authors aiming at the successful application of reinforcement learning in a challenging and complex domain. It discusses several variants of the general batch learning framework, particularly tailored to the use of multilayer perceptrons to approximate value functions over continuous state spaces. The batch learning framework is successfully used to learn crucial skills in our soccer-playing robots participating in the RoboCup competitions. This is demonstrated on three different case studies.

4.
Application of reinforcement learning to autonomous navigation of mobile robots
This paper surveys the autonomous navigation algorithms commonly used for mobile robots and their strengths and weaknesses, and on this basis proposes a reinforcement learning approach. The principles of the reinforcement learning algorithm are described, and a neural network is used to address the generalization problem. A reinforcement learning method for autonomous robot navigation based on obstacle-detection sensor information is designed, and mathematical models are given for each element of the learning algorithm. Simulation shows that the algorithm is correct and effective, with good convergence and generalization ability.

5.

Robot learning, such as reinforcement learning, generally needs a well-defined state space in order to converge. However, building such a state space is one of the main issues of robot learning because of the interdependence between state and action spaces, which resembles the well-known "chicken and egg" problem. This article proposes a method of action-based state space construction for vision-based mobile robots. The basic ideas for coping with the interdependence are that we define a state as a cluster of input vectors from which the robot can reach the goal state, or a state already obtained, by a sequence of one kind of action primitive regardless of its length, and that this sequence is defined as one action. To realize these ideas, we need a large amount of data (experiences) from the robot, and we must cluster the input vectors as hyperellipsoids so that the whole state space is segmented into a state transition map in terms of actions, from which the optimal action sequence is obtained. To show the validity of the method, we apply it to a soccer robot that tries to shoot a ball into a goal. Simulation and real-robot experiments are presented.

6.
In humanoid robotic soccer, many factors, both at low level (e.g., vision and motion control) and at high level (e.g., behaviors and game strategies), determine the quality of the robot performance. In particular, the speed of individual robots, the precision of the trajectory, and the stability of the walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is their time consumption: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available. The results of our experimental work show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
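
For illustration, here is one finite-difference policy-gradient iteration over gait parameters, with an optional per-parameter relevance vector in the spirit of the extension the article describes. This is a simplified sketch, not the authors' exact algorithm; evaluate, epsilon, eta, and n_perturb are assumed names.

```python
import numpy as np

def policy_gradient_step(theta, evaluate, epsilon=0.05, eta=0.1,
                         n_perturb=8, relevance=None):
    """One finite-difference policy-gradient iteration over gait parameters.

    evaluate(theta) -> scalar score of one walking trial (assumed given).
    relevance optionally scales each parameter's perturbation, so that
    more relevant parameters are explored more aggressively.
    """
    rel = np.ones_like(theta) if relevance is None else np.asarray(relevance)
    base = evaluate(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_perturb):
        # perturb each parameter by -eps, 0, or +eps (relevance-scaled)
        delta = np.random.choice([-1.0, 0.0, 1.0], size=theta.shape) * epsilon * rel
        grad += delta * (evaluate(theta + delta) - base)
    grad /= n_perturb
    norm = np.linalg.norm(grad)
    return theta + eta * grad / norm if norm > 0 else theta
```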

7.
It was confirmed that a real mobile robot with a simple visual sensor could learn appropriate motions to reach a target object by direct-vision-based reinforcement learning (RL). In direct-vision-based RL, raw visual sensory signals are fed directly into a layered neural network, which is then trained by back-propagation with the training signal generated by reinforcement learning. Because of the time delay in transmitting the visual sensory signals, the actor outputs are trained by the critic output two time steps ahead. It was shown that a robot with a simple monochrome visual sensor can learn to reach a target object from scratch, without any advance knowledge of the task, by direct-vision-based RL. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.

8.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation faces serious challenges such as inconsistent and noisy data, the difficulty of gathering training data, and high error rates in the training data. RL capabilities such as training from a single scalar evaluation signal and a high degree of exploration have encouraged researchers to use RL for the robot navigation problem. However, RL algorithms are time consuming and suffer from a high failure rate during the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL) as a novel way to exploit the advantages of both supervised and reinforcement learning. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions per rule is the main module of the robot's controller, and the aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, the fuzzy Sarsa learning module, a critic-only fuzzy reinforcement learner, fine-tunes the parameters of the conclusion parts of the fuzzy controller online. The proposed algorithm is used to drive an E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures, and it improves the quality of the robot's motion in the testing environments.

9.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of Reinforcement Learning (RL) to the self-emergence of a common lexicon in robot teams. By modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e., the objects encountered by the robots, or the states of the environment itself) to symbols or signals, we check whether the robot team can converge in an autonomous, decentralized way to a common lexicon by means of RL, so that the communication efficiency of the entire team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean communication system. Our experiments are organized along two main lines: first, we investigate the effect of team size, centered on teams of moderate size on the order of 5 and 10 individuals, typical of multi-robot systems; second, and foremost, we investigate the effect of lexicon size on the convergence results. To analyze the convergence of the robot team we define the team's consensus as the situation in which all the robots (i.e., 100% of the population) share the same association matrix or lexicon. As a general conclusion we show that RL allows convergence to lexicon consensus in a population of autonomous agents.
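
To make the association-matrix idea concrete, here is a minimal sketch in which each robot keeps a meanings-by-symbols table updated by a simple RL rule after every communication episode. The epsilon-greedy choice and the +/-1 reward are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

N_MEANINGS, N_SYMBOLS = 5, 5
Q = np.zeros((N_MEANINGS, N_SYMBOLS))      # association strengths of one robot

def speak(meaning, eps=0.1):
    """Epsilon-greedy choice of a symbol for a meaning."""
    if np.random.rand() < eps:
        return np.random.randint(N_SYMBOLS)
    return int(np.argmax(Q[meaning]))

def hear(symbol):
    """Decode a symbol as the meaning with the strongest association."""
    return int(np.argmax(Q[:, symbol]))

def update(meaning, symbol, understood, alpha=0.1):
    """Reinforce the association after a communication episode."""
    reward = 1.0 if understood else -1.0
    Q[meaning, symbol] += alpha * (reward - Q[meaning, symbol])
```

Consensus in the paper's sense would be reached when every robot's greedy mapping from meanings to symbols coincides.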

10.
Reinforcement learning (RL) is a biologically supported learning paradigm that allows an agent to learn through experience acquired by interaction with its environment. Its potential to learn complex action sequences has been proven for a variety of problems, such as navigation tasks. However, the interactive, randomized exploration of the state space, common in reinforcement learning, makes it difficult to use in real-world scenarios. In this work we describe a novel real-world reinforcement learning method that combines a supervised reinforcement learning approach with Gaussian-distributed state activation. We successfully tested this method in two real scenarios of humanoid robot navigation: first, backward movements for docking at a charging station, and second, forward movements to prepare grasping. Our approach reduces the required learning steps by more than an order of magnitude, and it is robust and easy to integrate into conventional RL techniques.

11.
Application of the TMS320F2812 in soccer robots
This paper introduces the structure of RoboCup small-size league soccer robots and the performance requirements of their low-level control system, and elaborates on the role of the TMS320F2812 as the main control chip in the low-level control of a soccer robot, together with its specific applications in communication, motor control, dribbling, and kicking/chipping control.

12.
The optimal information fusion Kalman filtering algorithm yields the linear minimum-variance fused estimate in a real-time dynamic environment. This algorithm is applied to state estimation and prediction of the ball in a robot soccer system; the information fusion processing structure and the concrete implementation steps of the algorithm are given. Experimental results show that the algorithm overcomes limitations such as the heavy noise in data collected by a single vision sensor and achieves accurate state estimation and prediction of the ball, demonstrating its feasibility and superiority; moreover, when the vision sensor of one robot fails, the system retains good fault tolerance and robustness.
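
The paper uses matrix-weighted optimal fusion of local estimates; the sketch below illustrates the simpler sequential measurement-update form of multi-camera fusion with a constant-velocity ball model, just to show the general idea. The matrices, frame rate, and noise levels are assumptions for illustration.

```python
import numpy as np

dt = 1.0 / 30.0                                  # camera frame period
F = np.array([[1, 0, dt, 0],                     # constant-velocity model:
              [0, 1, 0, dt],                     # state = (x, y, vx, vy)
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
H = np.array([[1, 0, 0, 0],                      # cameras observe position only
              [0, 1, 0, 0]])
Qn = np.eye(4) * 1e-3                            # process noise covariance
x, P = np.zeros(4), np.eye(4)

def predict():
    global x, P
    x = F @ x
    P = F @ P @ F.T + Qn
    return x[:2]                                 # predicted ball position

def fuse(z, R):
    """Fold in one camera's position measurement z with covariance R;
    call once per healthy sensor, skipping any camera flagged as faulty."""
    global x, P
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
```

Skipping the update for a faulty camera is what gives this style of fusion its fault tolerance: the remaining sensors still correct the predicted state.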

13.
Transfer learning is a hierarchical approach to reinforcement learning of complex tasks modeled as Markov Decision Processes. The learning results on the source task are used as the starting point for learning on the target task. In this paper we deal with a hierarchy of constrained systems in which the source task is an under-constrained system, hence called the Partially Constrained Model (PCM). Constraints in the reinforcement learning framework are handled by state-action veto policies. We propose a theoretical background for the hierarchy of training refinements, showing that the effective action repertoires learned on the PCM are maximal and that the PCM-optimal policy gives maximal state value functions. We apply the approach to learn the control of Linked Multicomponent Robotic Systems using reinforcement learning. The paradigmatic example is the transportation of a hose. The system has strong physical constraints and a large state space. Learning experiments on the target task are realized over an accurate but computationally expensive simulation of the hose dynamics; the PCM is obtained by simplifying the hose model. The PCM transfer learning results show a spectacular improvement over conventional Q-learning on the target task.
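
A minimal sketch of the transfer step under the assumption of tabular Q-learning: the target task's Q-table is initialized from the values learned on the PCM via a state projection. The function and parameter names are illustrative, not the authors' API.

```python
def transfer_q(source_q, target_states, target_actions,
               project=lambda s: s, default=0.0):
    """Initialize the target task's Q-table from the source (PCM) Q-table.

    source_q: dict mapping (state, action) -> value learned on the PCM.
    project:  maps a target-task state onto its under-constrained PCM state.
    State-action pairs vetoed on the PCM can simply be absent from source_q,
    so they start at the default value on the target task.
    """
    return {(s, a): source_q.get((project(s), a), default)
            for s in target_states for a in target_actions}
```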

14.
Evolutionary reinforcement learning and its application to robot path tracking
This paper studies a mobile robot path-tracking control method based on adaptive heuristic critic (AHC) reinforcement learning. The adaptive critic element (ACE) of the AHC is implemented with a multilayer feedforward neural network, whose weights are updated by combining the TD(λ) algorithm with gradient descent. The associative search element (ASE) of the AHC consists of a fuzzy inference system (FIS) optimized by a genetic algorithm. The output of the ACE network forms a secondary reinforcement signal that guides the learning of the ASE. Finally, the proposed algorithm is applied to behavior learning of a mobile robot and solves the complex path-tracking problem well.
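
For illustration, here is a single TD(λ) critic update with eligibility traces for a linear approximation of the ACE; the paper's ACE is a multilayer network trained in the same spirit through backpropagated gradients. Names and hyperparameters are assumptions.

```python
import numpy as np

def td_lambda_step(w, e, phi, r, phi_next, alpha=0.05, gamma=0.95, lam=0.8):
    """One TD(lambda) update for a linear critic V(s) = w . phi(s).

    Returns the updated weights, eligibility traces, and the TD error,
    which doubles as the secondary reinforcement signal fed to the ASE.
    """
    delta = r + gamma * w @ phi_next - w @ phi    # TD error
    e = gamma * lam * e + phi                     # decay and accumulate traces
    w = w + alpha * delta * e                     # gradient-descent-style update
    return w, e, delta
```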

15.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this work is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. The performance of the proposed learning algorithms is evaluated through numerous experiments.

16.
In FIRA MiroSot robot soccer competitions, the vision system is the only way to obtain the positions of the robots and the ball on the field, and its recognition speed and accuracy directly affect the outcome of a match. To address the problem that traditional vision systems do not locate the entities on the field accurately enough, this paper proposes a design that applies the erosion/dilation operations of mathematical morphology to the vision system's real-time images to increase the recognition accuracy of the soccer robot vision system. Experimental results show that the scheme greatly improves recognition accuracy without reducing recognition speed during a match.

17.
In this study, a new value-function-based reinforcement learning (RL) algorithm, Local Update Dynamic Policy Programming (LUDPP), is proposed. It exploits the nature of the smooth policy update based on the Kullback–Leibler divergence to update its value function locally, considerably reducing the computational complexity. We first investigated the learning performance of LUDPP and of other algorithms without smooth policy update on pendulum swing-up and n-DOF manipulator reaching tasks in simulation. Only LUDPP could efficiently and stably learn good control policies in high-dimensional systems with a limited number of training samples. As a real-world application, we applied LUDPP to control robots driven by Pneumatic Artificial Muscles (PAMs) without knowledge of the model, which is challenging for traditional methods due to the strong nonlinearities of the PAMs' air-pressure dynamics and mechanical structure. LUDPP successfully achieved one-finger control of the Shadow Dexterous Hand, a PAM-driven humanoid robot hand, with far fewer computational resources than other conventional value-function-based RL algorithms.

18.
Cloud manufacturing is a service-oriented manufacturing model that offers manufacturing resources as cloud services. Robots are an important type of manufacturing resource. In cloud manufacturing, large-scale distributed robots are encapsulated into cloud services and provided to consumers in an on-demand manner. How to effectively and efficiently manage and schedule decentralized robot services in cloud manufacturing to achieve on-demand provisioning is a challenging issue. During the past few years, Deep Reinforcement Learning (DRL) has become very popular and has been successfully applied to many different areas such as games, robotics, and manufacturing. DRL also holds tremendous potential for solving scheduling issues in cloud manufacturing. To this end, this paper explores effective approaches for scheduling decentralized robot manufacturing services in cloud manufacturing with DRL. Specifically, both Deep Q-Network (DQN) and Dueling Deep Q-Network (DDQN)-based scheduling algorithms are proposed. A comparison of the different algorithms, including DQN, DDQN, and three other benchmark algorithms, indicates that DDQN performs best on every indicator. The effects of different combinations of weight coefficients and of the influencing degrees of different indicators on the overall scheduling objective are analyzed. The results indicate that the DDQN-based scheduling algorithm generates scheduling solutions efficiently.
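
As a sketch of the dueling architecture the paper builds on, the PyTorch module below splits a shared trunk into state-value and advantage streams and recombines them into Q-values. The state encoding of the scheduling problem and the layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: a shared trunk feeds separate state-value V(s) and
    advantage A(s, a) streams, recombined into Q(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # subtract the mean advantage so V and A are identifiable
        return v + a - a.mean(dim=-1, keepdim=True)

# e.g. q = DuelingQNet(state_dim=32, n_actions=10)(torch.randn(1, 32))
```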

19.
Line-grasping control of a deicing robot based on k-nearest-neighbor classification and reinforcement learning
The flexible structure of power transmission lines makes obstacle-crossing and line-grasping control of a deicing robot extremely difficult. This paper proposes a line-grasping control method that combines the k-nearest neighbor (KNN) classification algorithm with reinforcement learning. A KNN-based state perception mechanism selects the k states nearest to the robot's current state and weights them; the weighted result determines the current optimal action. The method yields a discrete representation of the robot's continuous state, effectively resolving the computational convergence and curse-of-dimensionality problems of traditional continuous-state generalization methods. Drawing on the ability of reinforcement learning to explore and adapt to the environment, the method overcomes the effects of robot model errors, posture errors, and environmental disturbances on line-grasping control. The concrete implementation steps of the algorithm are given, together with simulation experiments in which the method controls the line grasping of a deicing robot.
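
A minimal sketch of the KNN-weighted value estimate described above: the Q-values of a continuous state are taken as the distance-weighted average over its k nearest stored states. The array layout and the inverse-distance weighting are illustrative assumptions.

```python
import numpy as np

def knn_q(state, memory_states, memory_q, k=5):
    """Distance-weighted Q estimate for a continuous state.

    memory_states: (N, d) array of stored states.
    memory_q:      (N, n_actions) array of their learned Q-values.
    """
    d = np.linalg.norm(memory_states - state, axis=1)
    idx = np.argsort(d)[:k]                 # the k nearest stored states
    w = 1.0 / (d[idx] + 1e-6)               # closer neighbors weigh more
    w /= w.sum()
    return w @ memory_q[idx]                # per-action weighted Q-values

# greedy line-grasping action: int(np.argmax(knn_q(s, S, Q)))
```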

20.
At AROB5, we proposed a solution to the path planning of a mobile robot. In our approach, we formulated the problem as a discrete optimization problem at each time step, solved using an objective function consisting of a goal term, a smoothness term, and a collision term. While the results of our simulation showed the effectiveness of our approach, the values of the weights in the objective function were not given by any theoretical method. This article presents a theoretical method for adjusting the weight parameters using reinforcement learning. We applied Williams' learning algorithm, episodic REINFORCE, to derive a learning rule for the weight parameters, and verified the learning rule through experiments. This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January 15–17, 2001.
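
A minimal sketch of one episodic REINFORCE update of the objective-function weights, assuming a Gaussian search distribution over the weights and omitting a baseline; rollout and the learning rate are assumed names, not the authors' derivation.

```python
import numpy as np

def reinforce_episode(mu, sigma, rollout, alpha=0.01):
    """One episodic REINFORCE update of the objective-function weights.

    rollout(w) runs a full episode with weights w and returns its return R.
    The weights are sampled from N(mu, sigma^2); the characteristic
    eligibility of a Gaussian policy mean is (w - mu) / sigma**2.
    """
    w = mu + sigma * np.random.randn(*np.shape(mu))   # sample weights
    R = rollout(w)
    return mu + alpha * R * (w - mu) / sigma**2       # updated mean weights
```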
