Similar Literature
 20 similar documents retrieved (search time: 31 ms)
1.
Multi-Robot Cooperation Based on Reinforcement Learning   (Cited by: 3; self-citations: 0; other citations: 3)
A reinforcement learning method is proposed for multiple robots to acquire cooperative behavior in a dynamic environment. The method uses Q-learning driven by instantaneous rewards for each individual robot's learning, applies the idea of the artificial potential field method to determine the learning order of the different robots, and on this basis adopts alternating learning to complete the multi-robot learning process. Experimental results demonstrate the feasibility and effectiveness of the proposed method.
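For reference, the per-robot learner described above reduces to a one-step tabular Q-learning update driven by an instantaneous reward. The sketch below shows that update only; the state/action sizes and the constants ALPHA, GAMMA and EPSILON are illustrative assumptions, and the potential-field ordering and alternating-learning schedule from the abstract are not reproduced.

```python
import numpy as np

# Minimal tabular Q-learning with an instantaneous reward (illustrative sizes/constants).
N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def select_action(state: int) -> int:
    """Epsilon-greedy action selection over the current Q estimates."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """One-step Q-learning backup driven by the instantaneous reward."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```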

2.
Lately, robotics development for use in both industry and the home has been progressing rapidly. In this research, a group of robots is made to handle relatively complicated tasks. Cooperative action among robots is one of the research areas in robotics that is progressing remarkably well. Reinforcement learning is a common approach in robotics for acquiring actions in dynamic environments. However, until recently it had mainly been applied to single-agent problems. In a multi-agent environment where several robots coexist, it has been difficult to separate learning to achieve the task from learning to perform cooperative actions. This paper introduces a method of applying reinforcement learning to induce cooperation among a group of robots whose task is to transport luggage of various weights to a destination. Standard Q-learning is used as the learning algorithm, and switching of the learning mode is proposed to reduce learning time and the learning area. Finally, a grid-world simulation is carried out to evaluate the proposed methods.

3.
Multi-Agent Q-Learning Based on Roles and Context-Specific Coordination   (Cited by: 1; self-citations: 0; other citations: 1)
One of the main problems in cooperative multiagent learning is that the joint action space grows exponentially with the number of agents. In this paper, we investigate a sparse representation of the coordination dependencies between agents to employ roles and context-specific coordination graphs to reduce the joint action space. In our framework, the global joint Q-function is decomposed into a number of local Q-functions. Each local Q-function is shared among a small group of agents and is composed of a set of value rules. We propose a novel multiagent Q-learning algorithm which learns the weights in each value rule automatically. We give empirical evidence to show that our learning algorithm converges to the same optimal policy with a significantly faster speed than traditional multiagent learning techniques.
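The decomposition into local Q-functions can be pictured with a small sketch. It assumes two-agent coordination groups, a shared learning rate, and an even split of the temporal-difference error; the paper's value-rule representation and graph-based maximization are richer than this.

```python
from itertools import product

AGENTS = ["a1", "a2", "a3"]
GROUPS = [("a1", "a2"), ("a2", "a3")]          # sparse coordination dependencies (assumed)
ACTIONS = ["left", "right"]
ALPHA, GAMMA = 0.1, 0.9

# One local Q-table per group, indexed by that group's joint action only
# (the joint state is omitted for brevity).
local_q = {g: {ja: 0.0 for ja in product(ACTIONS, repeat=len(g))} for g in GROUPS}

def global_q(joint_action: dict) -> float:
    """The global Q-value is the sum of the local Q-functions."""
    return sum(local_q[g][tuple(joint_action[a] for a in g)] for g in GROUPS)

def best_joint_action() -> dict:
    """Brute-force maximization; the paper exploits the coordination graph instead."""
    candidates = [dict(zip(AGENTS, acts)) for acts in product(ACTIONS, repeat=len(AGENTS))]
    return max(candidates, key=global_q)

def update(joint_action: dict, reward: float) -> None:
    """Split one temporal-difference error evenly across the local components."""
    td_error = reward + GAMMA * global_q(best_joint_action()) - global_q(joint_action)
    for g in GROUPS:
        local_q[g][tuple(joint_action[a] for a in g)] += ALPHA * td_error / len(GROUPS)
```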

4.
This paper describes an adaptive task assignment method for a team of fully distributed mobile robots with initially identical functionalities in unknown task environments. A hierarchical assignment architecture is established for each individual robot. In the higher hierarchy, we employ a simple self-reinforcement learning model inspired by the behavior of social insects to differentiate the initially identical robots into “specialists” of different task types, resulting in stable and flexible division of labor; on the other hand, in dealing with the cooperation problem of the robots engaged in the same type of task, Ant System algorithm is adopted to organize low-level task assignment. To avoid using a centralized component, a “local blackboard” communication mechanism is utilized for knowledge sharing. The proposed method allows the robot team members to adapt themselves to the unknown dynamic environments, respond flexibly to the environmental perturbations and robustly to the modifications in the team arising from mechanical failure. The effectiveness of the presented method is validated in two different task domains: a cooperative concurrent foraging task and a cooperative collection task.
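The insect-inspired self-reinforcement model is not spelled out in the abstract; one common way to realize such differentiation is a response-threshold rule, sketched below. The threshold dynamics, stimulus form and constants (XI, PHI) are assumptions for illustration, not taken from the paper.

```python
import random
from typing import Optional

TASK_TYPES = ["forage", "collect"]
XI, PHI = 0.1, 0.05            # specialization (learning) and forgetting rates (assumed)

class Robot:
    def __init__(self) -> None:
        # Initially identical robots start with identical thresholds for every task type.
        self.threshold = {t: 0.5 for t in TASK_TYPES}

    def engage_probability(self, task: str, stimulus: float) -> float:
        """Higher task stimulus and a lower threshold make engagement more likely."""
        return stimulus ** 2 / (stimulus ** 2 + self.threshold[task] ** 2)

    def step(self, stimuli: dict) -> Optional[str]:
        """Take at most one task; performing it lowers its threshold (specialization)
        while thresholds for the other task types rise (forgetting), so identical
        robots gradually differentiate into 'specialists'."""
        for task, s in stimuli.items():
            if random.random() < self.engage_probability(task, s):
                # Floor of 0.05 keeps the engagement probability well-defined.
                self.threshold[task] = max(0.05, self.threshold[task] - XI)
                for other in TASK_TYPES:
                    if other != task:
                        self.threshold[other] = min(1.0, self.threshold[other] + PHI)
                return task
        return None
```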

5.
A CMAC-Based Q-Learning Algorithm for the Task Assignment Problem in Handling Systems   (Cited by: 1; self-citations: 1; other citations: 0)
The task assignment problem of a two-robot high-speed handling system is studied. In the system's Markov decision process (MDP) model, the state variables mix continuous and discrete values, the state space is complex, and the "curse of dimensionality" arises, so conventional numerical optimization is difficult to apply. Because the cerebellar model articulation controller (CMAC) converges quickly and adapts well, this structure is used as an approximator of the Q-value function and combined with Q-learning and the concept of performance potential, yielding a CMAC-Q learning optimization algorithm applicable to either average or discounted performance criteria. Simulation results show that this neuro-dynamic programming method outperforms conventional Q-learning in storage requirements, optimization accuracy, and optimization speed.
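As a rough illustration of using a CMAC as the Q-value approximator, the sketch below applies tile coding to a single continuous state variable. The tiling counts, value range and learning constants are illustrative assumptions, and the paper's performance-potential formulation is not reproduced.

```python
import numpy as np

N_TILINGS, TILES_PER_DIM = 8, 10
N_ACTIONS = 3
LOW, HIGH = 0.0, 1.0                         # assumed range of the continuous state variable
weights = np.zeros((N_ACTIONS, N_TILINGS, TILES_PER_DIM + 1))
ALPHA, GAMMA = 0.1 / N_TILINGS, 0.95

def active_tiles(x: float) -> list:
    """Each tiling is offset slightly; the input activates one tile per tiling."""
    scaled = (x - LOW) / (HIGH - LOW) * TILES_PER_DIM
    return [(t, int(scaled + t / N_TILINGS)) for t in range(N_TILINGS)]

def q_value(x: float, a: int) -> float:
    """Q(s, a) is the sum of the weights of the active tiles for action a."""
    return sum(weights[a, t, i] for t, i in active_tiles(x))

def q_update(x: float, a: int, reward: float, x_next: float) -> None:
    """Standard Q-learning backup applied to the CMAC weights."""
    target = reward + GAMMA * max(q_value(x_next, b) for b in range(N_ACTIONS))
    error = target - q_value(x, a)
    for t, i in active_tiles(x):
        weights[a, t, i] += ALPHA * error
```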

6.
To address the curse of dimensionality in the action and state spaces of cooperative multi-agent reinforcement learning, as well as the existence of multiple equilibria in action selection, where converging to the best equilibrium requires searching the policy space and coordinating policy selection, a novel multi-agent cooperative learning algorithm based on quantum theory and the ant colony algorithm is proposed. The new algorithm first draws on quantum computing theory: the agents' action and state spaces are represented by quantum superposition states, quantum entangled states are used to coordinate policy selection, and probability amplitudes drive action exploration, speeding up learning. Second, following the ant colony algorithm, a "footprint" mechanism is introduced to indirectly strengthen interaction among agents. Finally, both theoretical analysis and experimental results show that the improved Q-learning is feasible and can effectively raise learning efficiency.

7.
Time-Optimal Formation of Mobile Robots   (Cited by: 4; self-citations: 0; other citations: 4)
For the fastest-formation problem of mobile robots, a new solution to the assignment problem and a time-optimal formation strategy are proposed by combining path planning with task decomposition. The strategy fully accounts for obstacle constraints in the environment and the mutual influence among the moving robots. By decomposing the complex problem of planning the system's overall paths into independent path-planning problems and conflict-coordination problems that are solved separately, it reduces computational complexity and completes the formation quickly.

8.
《Advanced Robotics》2013,27(8):815-832
A group of cooperative and homogeneous Q-learning agents can cooperate to learn faster and gain more knowledge. In order to do so, each learner agent must be able to evaluate the expertness and the intelligence level of the other agents, and to assess the knowledge and the information it gets from them. In addition, the learner needs a suitable method to properly combine its own knowledge and what it gains from the other agents according to their relative expertness. In this paper, some expertness measuring criteria are introduced. Also, a new cooperative learning method called weighted strategy sharing (WSS) is introduced. In WSS, each agent assigns a weight to each teammate's knowledge based on that teammate's expertness and utilizes it accordingly. WSS and the expertness criteria are tested on two simulated hunter–prey and object-pushing systems.
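A minimal sketch of the sharing step might look as follows, using accumulated reward as one possible expertness criterion. The table shapes, the random example values, and the simplification that every agent receives the same combined table (the paper lets each learner weight teammates relative to its own expertness) are assumptions for illustration.

```python
import numpy as np

def expertness(accumulated_rewards: np.ndarray) -> np.ndarray:
    """Simple expertness criterion: non-negative accumulated reward per agent."""
    return np.maximum(accumulated_rewards, 0.0)

def wss_combine(q_tables: list, accumulated_rewards: np.ndarray) -> list:
    """Return a new Q-table for every agent, built from expertness-weighted teammates."""
    e = expertness(accumulated_rewards)
    weights = e / e.sum() if e.sum() > 0 else np.full(len(q_tables), 1.0 / len(q_tables))
    combined = sum(w * q for w, q in zip(weights, q_tables))
    return [combined.copy() for _ in q_tables]

# Example: three hunters, each with a 5-state x 2-action Q-table (hypothetical values).
qs = [np.random.rand(5, 2) for _ in range(3)]
new_qs = wss_combine(qs, np.array([10.0, 3.0, 0.5]))
```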

9.
We propose a self-generating algorithm for behavioral evaluation, an element of the learning function needed to develop appropriate, situation-dependent cooperative behavior among robots. The behavioral evaluation is composed of rewards and energy consumption. Rewards are provided by an operator when the robots share tasks appropriately, and energy consumption is measured during task execution. Each robot estimates behavior-selection rules from the generated evaluation and learns to select an appropriate behavior when it encounters the same situation again. As a result, the robots may be able to share tasks efficiently even if the operator changes the purpose of the task in the middle of execution, because the evaluation is modified according to the situation. We performed simulations to study the effectiveness of the proposed algorithm, applying it to three robots, each with three behaviors. We confirmed that each robot can generate an appropriate behavioral evaluation based on rewards from an operator, and that the robots consequently develop cooperative behaviors such as task sharing. This work was presented, in part, at the Second International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1997.

10.
Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning   (Cited by: 1; self-citations: 0; other citations: 1)
This paper presents a dynamic fuzzy Q-learning (DFQL) method that is capable of tuning fuzzy inference systems (FIS) online. A novel online self-organizing learning algorithm is developed so that structure and parameter identification are accomplished automatically and simultaneously based only on Q-learning. Self-organizing fuzzy inference is introduced to calculate actions and Q-functions so as to enable us to deal with continuous-valued states and actions. Fuzzy rules provide a natural means of incorporating the bias components for rapid reinforcement learning. Experimental results and comparative studies with fuzzy Q-learning (FQL) and continuous-action Q-learning on the wall-following task of mobile robots demonstrate that the proposed DFQL method is superior.
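For orientation, the fuzzy Q-learning machinery that DFQL extends can be sketched as rule-wise q-vectors blended by firing strengths; the continuous action and its Q-value are firing-strength-weighted combinations of each rule's chosen candidate. The membership functions, candidate action set and constants below are illustrative assumptions, and the self-organizing structure identification is not shown.

```python
import numpy as np

CANDIDATE_ACTIONS = np.array([-1.0, 0.0, 1.0])   # e.g. steering commands (assumed)
N_RULES = 5
q = np.zeros((N_RULES, len(CANDIDATE_ACTIONS)))  # one q-vector per fuzzy rule
ALPHA, GAMMA = 0.1, 0.9

def firing_strengths(x: float) -> np.ndarray:
    """Triangular memberships over [0, 1], normalized to sum to one (assumed FIS)."""
    centers = np.linspace(0.0, 1.0, N_RULES)
    phi = np.maximum(0.0, 1.0 - np.abs(x - centers) * (N_RULES - 1))
    return phi / phi.sum()

def act(x: float, epsilon: float = 0.1):
    """Each rule picks a candidate (epsilon-greedy); the global action blends them."""
    phi = firing_strengths(x)
    choices = np.where(np.random.rand(N_RULES) < epsilon,
                       np.random.randint(len(CANDIDATE_ACTIONS), size=N_RULES),
                       np.argmax(q, axis=1))
    action = float(phi @ CANDIDATE_ACTIONS[choices])
    q_value = float(phi @ q[np.arange(N_RULES), choices])
    return action, q_value, phi, choices

def update(phi, choices, q_value, reward: float, x_next: float) -> None:
    """The TD error uses the best blended Q at the next state, shared across firing rules."""
    q_next = float(firing_strengths(x_next) @ np.max(q, axis=1))
    delta = reward + GAMMA * q_next - q_value
    q[np.arange(N_RULES), choices] += ALPHA * delta * phi
```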

11.
The multi-agent cooperative pursuit problem is a classic problem in research on multi-agent coordination and cooperation. For the pursuit of a single evader with learning ability, a multi-agent cooperative pursuit algorithm based on game theory and Q-learning is proposed. First, a cooperative pursuit team is established and a game model of cooperative pursuit is constructed. Second, by learning the evader's policy choices, the evader's motion trajectory under a finite Step-T cumulative reward is built and incorporated into the pursuers' strategy sets. Finally, the cooperative pursuit game is solved to obtain a Nash equilibrium, and each agent executes its equilibrium strategy to complete the pursuit task. To handle the possible existence of multiple equilibria, a fictitious-play action selection algorithm is added to choose the best equilibrium strategy. C# simulation experiments show that the proposed algorithm effectively solves the pursuit of a single learning evader in an obstacle environment, and comparative analysis of the experimental data shows that, under equal conditions, its pursuit efficiency is better than that of purely game-theoretic or purely learning-based pursuit algorithms.

12.
Agent Behavior Strategies in Multi-Task Coalition Formation   (Cited by: 2; self-citations: 0; other citations: 2)
Agent coalitions are an important form of cooperation in multi-agent systems, and coalition formation is a key research problem. This paper proposes an agent behavior strategy for serial multi-task coalition formation. It first shows that the process of agents cooperatively solving multiple tasks is a Markov decision process, and then uses Q-learning to obtain the optimal behavior strategy of each individual agent. Examples show that in multi-task domains the strategy can quickly and effectively form multiple task-solving coalitions in series.

13.
Bus holding is a common and effective control strategy for reducing bus bunching and improving the reliability of transit service, and executing it requires dynamic decision making in a stochastic, interactive system environment. Considering the availability of real-time bus operation information, this paper studies reinforcement-learning control of bus holding in a fully cooperative multi-agent setting. A conceptual model of single-line bus control based on a multi-agent system is established; the main elements of the learning framework, including agent states, action sets, reward functions, and the coordination mechanism, are described; and the hysteretic Q-learning algorithm is used to solve the problem. Simulation results show that the method effectively prevents bus bunching and keeps headways balanced on a single-line bus service system.
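The hysteretic Q-learning update named above is compact enough to show directly: positive temporal-difference errors are learned with a larger rate than negative ones, keeping each agent optimistic about its teammates' exploration. The state/action sizes and the two rates in this sketch are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

N_STATES, N_ACTIONS = 50, 3
ALPHA, BETA, GAMMA = 0.1, 0.01, 0.95          # BETA < ALPHA gives the hysteresis
Q = np.zeros((N_STATES, N_ACTIONS))

def hysteretic_update(state: int, action: int, reward: float, next_state: int) -> None:
    """Apply ALPHA to positive TD errors and the smaller BETA to negative ones."""
    delta = reward + GAMMA * np.max(Q[next_state]) - Q[state, action]
    rate = ALPHA if delta >= 0 else BETA
    Q[state, action] += rate * delta
```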

14.
Reinforcement learning is one of the more prominent machine-learning technologies due to its unsupervised learning structure and ability to continually learn, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system so the speed of learning can be accelerated. In the proposed learning algorithm, an agent adapts to comply with its peers by learning carefully when it obtains a positive reinforcement feedback signal, but learns more aggressively if a negative reward follows the action just taken. These two properties are applied to develop the proposed cooperative learning method. This research presents a novel use of the Win or Lose Fast (WoLF) policy hill-climbing method with policy sharing. Results from the multi-agent cooperative domain illustrate that the proposed algorithms perform better than Q-learning alone in a piano-mover environment. It also demonstrates that agents can learn to accomplish a task together efficiently through repetitive trials.
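The "learn carefully when winning, aggressively when losing" idea is the WoLF principle. Below is a stateless sketch of WoLF policy hill-climbing; the constants and the single-state simplification are illustrative assumptions, and the policy-sharing step the paper adds is not shown.

```python
import numpy as np

N_ACTIONS = 3
ALPHA = 0.1
DELTA_WIN, DELTA_LOSE = 0.01, 0.04            # a losing agent moves its policy faster

Q = np.zeros(N_ACTIONS)
policy = np.full(N_ACTIONS, 1.0 / N_ACTIONS)  # current mixed policy
avg_policy = policy.copy()                    # running average of past policies
visits = 0

def wolf_phc_update(action: int, reward: float) -> None:
    """Q update, average-policy update, then hill-climb the policy toward greedy."""
    global visits
    Q[action] += ALPHA * (reward - Q[action])          # stateless value estimate
    visits += 1
    avg_policy[:] += (policy - avg_policy) / visits
    winning = float(policy @ Q) > float(avg_policy @ Q)
    delta = DELTA_WIN if winning else DELTA_LOSE       # the WoLF variable learning rate
    greedy = int(np.argmax(Q))
    for a in range(N_ACTIONS):
        step = delta if a == greedy else -delta / (N_ACTIONS - 1)
        policy[a] = min(1.0, max(0.0, policy[a] + step))
    policy[:] = policy / policy.sum()                  # keep a valid probability distribution
```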

15.
A Two-Layer Cooperation Model for Soccer Robots Based on Fuzzy Q-Learning   (Cited by: 1; self-citations: 0; other citations: 1)
To address the incoherent decision making of the traditional three-layer decision model for soccer robots, as well as its lack of adaptability and learning ability, a two-layer cooperation model based on fuzzy Q-learning is proposed. The model separates coordination decisions and robot motion into two functionally independent layers, making the transition from group intention to individual behavior a direct process, and the coordination layer uses Q-learning to learn the optimal policy for each state online, improving the adaptability and learning ability of the decision system. In the Q-learning, mapping the numerous system states onto a small number of fuzzy states greatly reduces the size of the state space, avoids the slow or even failed convergence of conventional Q-learning when the state and action spaces are large, and improves the convergence speed of the Q-learning algorithm. Finally, experiments on the SimuroSot soccer robot simulation platform verify the effectiveness of the two-layer cooperation model.

16.
Multi-agent reinforcement learning is mainly investigated from two perspectives: concurrent learning and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer that employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer that employs multi-agent reinforcement learning and dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability, and coordination ability. Experiments show that the performance of LMRL can be better than that of single-agent reinforcement learning and Nash-Q.

17.
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the “curse of dimensionality” issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that in comparison with the existing methods, our method can converge to an optimal action strategy with less time and can explore a path in an unknown environment with fewer steps and larger average reward.
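A minimal sketch of the experience-replay loop with a heuristic fallback for exploration is shown below. `heuristic_action`, `q_network_action` and `train_network` are hypothetical placeholders for the paper's components, and the buffer size, batch size and epsilon are illustrative assumptions.

```python
import random
from collections import deque

BUFFER_SIZE, BATCH_SIZE, EPSILON = 10_000, 32, 0.1
replay_buffer = deque(maxlen=BUFFER_SIZE)

def select_action(state, q_network_action, heuristic_action):
    """Prefer the learned policy, but fall back to domain heuristics while exploring,
    so the robot avoids blind random exploration in an unknown map."""
    if random.random() < EPSILON:
        return heuristic_action(state)      # e.g. head toward the goal, steer off obstacles
    return q_network_action(state)

def store_transition(state, action, reward, next_state, done):
    """Collected experience is kept and reused many times for training."""
    replay_buffer.append((state, action, reward, next_state, done))

def replay_step(train_network):
    """Sample a random minibatch and pass it to the (hypothetical) network trainer."""
    if len(replay_buffer) >= BATCH_SIZE:
        batch = random.sample(replay_buffer, BATCH_SIZE)
        train_network(batch)
```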

18.
任燚, 陈宗海, 《控制与决策》(Control and Decision), 2006, 21(4): 430-434
In multi-robot systems, conflicts grow exponentially as the number of robots increases, and deadlocks can even occur. This paper proposes a reinforcement learning algorithm based on process rewards and prioritized sweeping as a conflict resolution strategy for multi-robot systems. For a typical multi-robot recognizable group foraging task, a computer simulation study is carried out, using the number of collected targets as the system performance metric and the number of learning trials until the algorithm converges as the learning-speed metric, and the algorithm is compared with nine other algorithms, including ones based on global rewards and standard Q-learning. The results show that the proposed algorithm based on process rewards and prioritized sweeping significantly reduces conflicts, avoids deadlocks, and improves overall system performance.
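The prioritized-sweeping component named above follows the standard model-based scheme: transitions with large temporal-difference errors are backed up first, and the changes are propagated to predecessor states. The sketch below assumes a deterministic tabular model and illustrative constants; the process-reward shaping itself is not shown.

```python
import heapq
import numpy as np
from collections import defaultdict

N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA, THETA, PLANNING_STEPS = 0.5, 0.95, 1e-4, 10
Q = np.zeros((N_STATES, N_ACTIONS))
model = {}                                   # (s, a) -> (reward, next_state)
predecessors = defaultdict(set)              # s' -> set of (s, a) leading to s'
pqueue = []                                  # max-priority queue via negated priorities

def observe(s, a, r, s_next):
    """Record the transition, then queue it if its TD error is large enough."""
    model[(s, a)] = (r, s_next)
    predecessors[s_next].add((s, a))
    p = abs(r + GAMMA * np.max(Q[s_next]) - Q[s, a])
    if p > THETA:
        heapq.heappush(pqueue, (-p, (s, a)))

def plan():
    """Back up the highest-priority pairs first, propagating changes to predecessors."""
    for _ in range(PLANNING_STEPS):
        if not pqueue:
            break
        _, (s, a) = heapq.heappop(pqueue)
        r, s_next = model[(s, a)]
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        for (sp, ap) in predecessors[s]:
            rp, _ = model[(sp, ap)]
            p = abs(rp + GAMMA * np.max(Q[s]) - Q[sp, ap])
            if p > THETA:
                heapq.heappush(pqueue, (-p, (sp, ap)))
```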

19.
In recent years, traditional warehousing systems have been unable to keep up with ever-growing order demand and are gradually shifting toward intelligent warehousing. For the scheduling problem of mobile robots in intelligent warehouses, a scheduling algorithm that handles both task allocation and path planning is proposed, taking the number of turns, the path cost, and the maximum task waiting time of the mobile robots as optimization objectives. The algorithm uses a genetic algorithm for task allocation across multiple mobile robots simultaneously, ensuring that no task is assigned to more than one robot. Q-learning is then used to plan paths for the tasks assigned to each robot; the path is constrained by the number of turns and the path cost, with penalties applied to turning and to every feasible action at each step, ultimately producing a path with few turns and a short travel distance. Comparison with other algorithms confirms the effectiveness of the proposed algorithm.
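The turn- and distance-penalized reward used to shape the Q-learning path planner can be illustrated with a small function; the specific penalty values below are assumptions for illustration, not the paper's parameters.

```python
STEP_PENALTY, TURN_PENALTY, GOAL_REWARD, COLLISION_PENALTY = -1.0, -2.0, 100.0, -50.0

def shaped_reward(prev_heading: int, new_heading: int,
                  hit_obstacle: bool, reached_goal: bool) -> float:
    """Every step costs something, turning costs more, collisions and the goal dominate."""
    reward = STEP_PENALTY
    if new_heading != prev_heading:
        reward += TURN_PENALTY
    if hit_obstacle:
        reward += COLLISION_PENALTY
    if reached_goal:
        reward += GOAL_REWARD
    return reward
```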

20.
Patrolling indoor infrastructures with a team of cooperative mobile robots is a challenging task, which requires effective multi-agent coordination. Deterministic patrol circuits for multiple mobile robots have become popular due to their exceeding performance. However their predefined nature does not allow the system to react to changes in the system’s conditions or adapt to unexpected situations such as robot failures, thus requiring recovery behaviors in such cases. In this article, a probabilistic multi-robot patrolling strategy is proposed. A team of concurrent learning agents adapt their moves to the state of the system at the time, using Bayesian decision rules and distributed intelligence. When patrolling a given site, each agent evaluates the context and adopts a reward-based learning technique that influences future moves. Extensive results obtained in simulation and real world experiments in a large indoor environment show the potential of the approach, presenting superior results to several state of the art strategies.
