Similar Literature
20 matching records found.
1.
Multi-Agent Reinforcement Learning and Its Application to Role Assignment in Robot Soccer   Cited: 2 (self-citations: 0, other citations: 2)
A robot soccer system is a typical multi-agent system: each robot player's choice of action depends not only on its own state but is also influenced by the other players, so implementing the robots' decision policy with reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on predicting the actions of other agents, using a naive Bayes classifier for the prediction. A policy-sharing mechanism is introduced to exchange the policies learned by the agents and thus speed up multi-agent reinforcement learning. Finally, the application of the proposed method to dynamic role assignment in robot soccer is studied, realizing division of labor and cooperation among multiple robots.
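
To make the action-prediction idea concrete, here is a minimal sketch assuming small discrete state and action spaces: a frequency table stands in for the naive Bayes classifier over the other player's actions, and the Q-table is indexed by the joint (own action, other action) pair. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

n_states, n_actions = 10, 4            # assumed sizes of the discretized spaces
rng = np.random.default_rng(0)

# Frequency table standing in for the naive Bayes classifier:
# counts[s, a] = how often the other player chose action a in state s.
counts = np.ones((n_states, n_actions))            # ones = Laplace smoothing

def predict_other_action(s):
    """Most probable action of the other player in state s."""
    return int(np.argmax(counts[s] / counts[s].sum()))

# Q-table over (state, own action, action of the other player)
Q = np.zeros((n_states, n_actions, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def learn(s, a_own, a_other, r, s_next):
    """One combined update: refresh the action model, then the joint Q-value."""
    counts[s, a_other] += 1
    a_pred_next = predict_other_action(s_next)
    target = r + gamma * Q[s_next, :, a_pred_next].max()
    Q[s, a_own, a_other] += alpha * (target - Q[s, a_own, a_other])

def act(s):
    """Epsilon-greedy choice against the predicted action of the other player."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s, :, predict_other_action(s)]))
```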

2.
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer playing situation. The method successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are shown and a discussion is given.

3.
刘健  顾扬  程玉虎  王雪松 《自动化学报》2022,48(5):1246-1258
By analyzing the process of gene mutation, this paper proposes using reinforcement learning to infer a cancer patient's progression from the healthy state to the diseased state and to discover the key gene mutations that lead to patient death. First, genes are treated as agents and a multi-agent reinforcement learning environment is designed from breast cancer mutation data. Second, to ensure that the agents explore the same policy as the expert policy and that a larger number of agents can learn quickly, two multi-agent deep Q-networks are proposed based on learning-from-demonstration theory: a multi-agent deep Q-network with behavior cloning and a multi-agent deep Q-network with pre-trained memory. Finally, genes are ranked with the trained multi-agent deep Q-networks to predict pathogenic genes. Experimental results show that the proposed multi-agent reinforcement learning method can uncover pathogenic genes closely related to the occurrence and development of breast cancer.
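
A sketch of the "pre-trained memory" idea described above: the replay memory is seeded with expert transitions and the value function is fitted to them before online interaction begins. A tabular Q stands in for the deep Q-network; all names and hyper-parameters are assumptions, not details from the paper.

```python
import random
from collections import deque
import numpy as np

n_states, n_actions = 20, 5
Q = np.zeros((n_states, n_actions))   # tabular stand-in for the deep Q-network
memory = deque(maxlen=10_000)         # replay memory
alpha, gamma = 0.1, 0.99

def pretrain_with_demonstrations(expert_transitions, epochs=5):
    """Seed the replay memory with expert (s, a, r, s_next) tuples and fit the
    value function to them before any online interaction ('pre-trained memory')."""
    memory.extend(expert_transitions)
    for _ in range(epochs):
        for s, a, r, s_next in expert_transitions:
            target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])

def replay_step(batch_size=32):
    """Ordinary replayed Q-update once online transitions start arriving."""
    batch = random.sample(list(memory), min(batch_size, len(memory)))
    for s, a, r, s_next in batch:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
```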

4.
In this paper, a multi-agent reinforcement learning method based on the action prediction of other agents is proposed. In a multi-agent system, the action selection of the learning agent is unavoidably impacted by the other agents' actions. Therefore, joint states and joint actions are involved in the multi-agent reinforcement learning system. A novel agent action prediction method based on the probabilistic neural network (PNN) is proposed; the PNN is used to predict the actions of other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies of multiple agents, the aim of which is to speed up learning. Finally, the application of the presented method to robot soccer is studied. Through learning, robot players can master the mapping from state information to the action space. Moreover, coordination and cooperation among multiple robots are well realized.
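
A probabilistic neural network is essentially a Parzen-window classifier with one Gaussian kernel per stored training pattern; below is a minimal sketch of using one to predict another agent's action from observed state vectors. The kernel width `sigma` and the class/method names are assumptions.

```python
import numpy as np

class PNNActionPredictor:
    """Parzen-window (PNN-style) classifier: one Gaussian kernel per stored pattern."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        self.patterns = []                     # list of (state_vector, action) pairs

    def add_example(self, state, action):
        self.patterns.append((np.asarray(state, dtype=float), action))

    def predict(self, state):
        """Return the action whose summed kernel mass around `state` is largest."""
        state = np.asarray(state, dtype=float)
        scores = {}
        for x, a in self.patterns:
            d2 = float(np.sum((state - x) ** 2))
            scores[a] = scores.get(a, 0.0) + np.exp(-d2 / (2 * self.sigma ** 2))
        return max(scores, key=scores.get)
```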

5.
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationship in a network, are well suited for problems with topological constraints. In a TMAS, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy, under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility, compared with the general form of n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments based on a generic network routing problem, which is a typical TMAS domain, show that the TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single agent and n-ary tree formation, a Q-learning method based on the table lookup mechanism, as well as a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.

6.
Multi-agent reinforcement learning has made remarkable progress in simulation, game playing, recommender systems, and many other areas. However, the complexity of real-world problems leaves reinforcement learning with a great deal of ineffective exploration, slow training, and difficulty in continually improving its learning ability. This work studies rule-embedded multi-agent reinforcement learning and proposes a combined-training approach that couples rules with learning: a rule-fused multi-agent reinforcement learning model and a rule-selection model are designed separately and integrated through joint training, so that the agent can decide from the current situation whether to act on the reinforcement learning policy or on a rule, effectively solving the problems of which rules to use during learning and when to use them. The proposed method is analyzed and validated on the multi-agent adversarial platform released by China Electronics Technology Group Corporation. Playing against the built-in opponent, the rule-embedded method converges to a 60% win rate after about 14,000 games, whereas the algorithm without embedded rules needs about 17,000 games to converge to a 50% win rate, showing that embedding rules effectively improves both the convergence speed and the final performance of learning.
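
A schematic sketch of the rule/learning gate described above: a rule-selection value decides, per situation, whether to act on a hand-written rule or on the learned Q-values. The placeholder rule, the gating probability, and all names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rule_policy(state):
    """Hand-written domain rule; here a placeholder that retreats when health is low."""
    return 0 if state["health"] < 0.3 else None    # None = no rule fires

class RuleGatedAgent:
    def __init__(self, n_actions, use_rule_prob=0.5):
        self.n_actions = n_actions
        self.gate_value = use_rule_prob    # in the paper, learned by the rule-selection model

    def act(self, state, q_values):
        """Choose between the rule decision and the RL decision for this situation."""
        rule_action = rule_policy(state)
        if rule_action is not None and np.random.rand() < self.gate_value:
            return rule_action                 # follow the embedded rule
        return int(np.argmax(q_values))        # otherwise follow the learned policy
```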

7.
A Survey of Multi-Agent Reinforcement Learning Methods in the Stochastic Game Framework   Cited: 4 (self-citations: 0, other citations: 4)
宋梅萍  顾国昌  张国印 《控制与决策》2005,20(10):1081-1090
Multi-agent learning studies how multiple agents master interaction skills through self-learning within the framework of stochastic games. The success of single-agent reinforcement learning, the solid mathematical foundation of game theory, and the broad application prospects in complex task environments have made multi-agent reinforcement learning an important topic in machine learning research. This paper first gives formal definitions of the basic concepts of stochastic games for multi-agent systems, then reviews research on learning algorithms for stochastic games and repeated games together with other related work, and finally, in light of recent developments, surveys applications of multi-agent learning in e-commerce, robotics, and military domains and discusses open problems and future research directions.
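
For reference, the stochastic-game formalism this survey builds on can be stated as follows (a textbook definition, not a contribution of the survey):

```latex
% A finite n-player stochastic game:
\[
\Gamma = \bigl(n,\; S,\; A_1,\dots,A_n,\; T,\; R_1,\dots,R_n\bigr),
\]
\[
T : S \times A_1 \times \cdots \times A_n \times S \to [0,1],
\qquad
R_i : S \times A_1 \times \cdots \times A_n \to \mathbb{R},
\]
% where S is the finite state set, A_i the action set of agent i,
% T the joint-action transition probability, and R_i the reward of agent i.
```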

8.
Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrence and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer that employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer that employs multi-agent reinforcement learning and dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability, and coordination ability. Experiments show that LMRL can outperform both single-agent reinforcement learning and Nash-Q.

9.
Cooperative Multi-Agent Learning: The State of the Art   Cited: 1 (self-citations: 4, other citations: 1)
Cooperative multi-agent systems (MAS) are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to MAS problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning, RL or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.

10.
Automatica, 2014, 50(12): 3038-3053
This paper introduces a new class of multi-agent discrete-time dynamic games, known in the literature as dynamic graphical games. For that reason a local performance index is defined for each agent that depends only on the local information available to each agent. Nash equilibrium policies and best-response policies are given in terms of the solutions to the discrete-time coupled Hamilton–Jacobi equations. Since in these games the interactions between the agents are prescribed by a communication graph structure we have to introduce a new notion of Nash equilibrium. It is proved that this notion holds if all agents are in Nash equilibrium and the graph is strongly connected. A novel reinforcement learning value iteration algorithm is given to solve the dynamic graphical games in an online manner along with its proof of convergence. The policies of the agents form a Nash equilibrium when all the agents in the neighborhood update their policies, and a best response outcome when the agents in the neighborhood are kept constant. The paper brings together discrete Hamiltonian mechanics, distributed multi-agent control, optimal control theory, and game theory to formulate and solve these multi-agent dynamic graphical games. A simulation example shows the effectiveness of the proposed approach in a leader-synchronization case along with optimality guarantees.
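
To fix notation, a local performance index and the corresponding value-iteration step for one agent in a graphical game typically take the following generic form (the symbols are illustrative and not copied from the paper):

```latex
% Local performance index of agent i, depending only on its neighborhood:
\[
J_i = \sum_{k=0}^{\infty} \gamma^{k}\, U_i\bigl(\delta_i(k),\, u_i(k),\, u_{-i}(k)\bigr),
\]
% Value-iteration step (policy evaluation plus greedy improvement):
\[
V_i^{\,j+1}\bigl(\delta_i(k)\bigr)
  = \min_{u_i}\Bigl[\, U_i\bigl(\delta_i(k),\, u_i,\, u_{-i}\bigr)
  + \gamma\, V_i^{\,j}\bigl(\delta_i(k+1)\bigr) \Bigr],
\]
% delta_i : local neighborhood synchronization error of agent i
% u_{-i}  : policies of the agents in i's neighborhood
```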

11.
Multi-agent deep reinforcement learning can be applied to real-world scenarios that require cooperation among multiple parties and is a research hotspot in reinforcement learning. In multi-goal multi-agent cooperation scenarios, the agents exhibit a complex mixture of cooperation and competition; when multi-agent reinforcement learning is applied to such scenarios, its performance depends on whether the method can adequately model the relationships among agents and distinguish cooperative from competitive actions, while also coping with high-dimensional data and remaining computationally efficient. For multi-goal multi-agent cooperation scenarios, this paper proposes a goal-based value-decomposition deep reinforcement learning method built on the QMIX model: an attention mechanism measures the group influence among agents, and the agents' goal information is used to realize a two-stage value decomposition, improving the ability to model complex agent relationships and thereby the performance of reinforcement learning in such scenarios. Experimental results show that, compared with QMIX, the proposed method scores on par on the StarCraft II micromanagement platform, 4.9 points higher on average in a board game, and 25 and 280.4 points higher on average in the multi-particle environments merge and cross, respectively; it also achieves higher scores and better performance than mainstream deep reinforcement learning methods.
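
The method above builds on QMIX, whose core is a monotonic mixing of per-agent utilities into one joint value. A minimal numpy sketch of that mixing step follows; the attention mechanism and goal conditioning described in the abstract are omitted, and the weight shapes are assumptions.

```python
import numpy as np

# Illustrative shapes: n_agents = 3, state_dim = 5, hidden_dim = 8
# W1: (n_agents * hidden_dim, state_dim)   b1: (hidden_dim, state_dim)
# W2: (hidden_dim, state_dim)              b2: (state_dim,)

def qmix_mixing(agent_qs, state, W1, b1, W2, b2):
    """Monotonic mixing of per-agent utilities into one joint Q-value.

    In full QMIX the mixing weights are produced by hypernetworks conditioned on
    the global state; here they are plain matrices applied to the state and forced
    non-negative, which is what keeps the joint value monotone in every agent's
    own utility."""
    n_agents = len(agent_qs)
    w1 = np.abs(W1 @ state).reshape(n_agents, -1)     # non-negative layer-1 weights
    hidden = np.maximum(agent_qs @ w1 + b1 @ state, 0.0)
    w2 = np.abs(W2 @ state)                           # non-negative layer-2 weights
    return float(hidden @ w2 + b2 @ state)

# Toy usage with random weights, just to show the expected shapes.
rng = np.random.default_rng(0)
q_joint = qmix_mixing(rng.normal(size=3), rng.normal(size=5),
                      rng.normal(size=(24, 5)), rng.normal(size=(8, 5)),
                      rng.normal(size=(8, 5)), rng.normal(size=5))
```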

12.
Reinforcement learning is one of the more prominent machine-learning technologies due to its unsupervised learning structure and ability to continually learn, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system, so the speed of learning can be accelerated. In the proposed learning algorithm, an agent adapts to comply with its peers by learning carefully when it obtains a positive reinforcement feedback signal, but learns more aggressively if a negative reward follows the action just taken. These two properties are applied to develop the proposed cooperative learning method. This research presents the novel use of the Win or Learn Fast (WoLF) policy hill-climbing method with policy sharing. Results from the multi-agent cooperative domain illustrate that the proposed algorithms perform better than Q-learning alone in a piano-mover environment. It also demonstrates that agents can learn to accomplish a task together efficiently through repetitive trials.
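
The WoLF idea maps directly onto two policy step sizes: a small one when the current policy is "winning" against its own historical average and a large one when it is "losing". A compact sketch of WoLF policy hill-climbing follows; hyper-parameter values and names are illustrative, and the paper's policy-sharing extension is not shown.

```python
import numpy as np

class WoLFPHC:
    """Win-or-Learn-Fast policy hill-climbing: small policy steps when winning
    (current policy beats the average policy), large steps when losing."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 delta_win=0.01, delta_lose=0.04):
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)
        self.visits = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose

    def update(self, s, a, r, s_next):
        n_actions = self.Q.shape[1]
        # 1. ordinary Q-learning step
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s_next].max() - self.Q[s, a])
        # 2. update the average policy for state s
        self.visits[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.visits[s]
        # 3. choose the step size: small when winning, large when losing
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_win if winning else self.delta_lose
        # 4. move probability mass toward the current greedy action
        best = int(np.argmax(self.Q[s]))
        for act in range(n_actions):
            if act == best:
                continue
            step = min(self.pi[s, act], delta / (n_actions - 1))
            self.pi[s, act] -= step
            self.pi[s, best] += step

    def act(self, s, rng=np.random):
        return int(rng.choice(self.Q.shape[1], p=self.pi[s]))
```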

13.
Individual learning in an environment where more than one agent exists is a challenging task. In this paper, a single learning agent situated in an environment with multiple agents is modeled based on reinforcement learning. The environment is non-stationary and only partially observable from an agent's point of view. Therefore, the learning activities of an agent are influenced by the actions of other cooperative or competitive agents in the environment. A prey-hunter capture game with the above characteristics is defined and used in experiments to simulate the learning process of individual agents. Experimental results show that there are no strict rules for reinforcement learning. We suggest two new methods to improve the performance of agents. These methods decrease the number of states while keeping as much state information as necessary.

14.
A New Multi-Agent Q-Learning Algorithm   Cited: 2 (self-citations: 0, other citations: 2)
郭锐  吴敏  彭军  彭姣  曹卫华 《自动化学报》2007,33(4):367-372
For multi-agent systems in non-deterministic Markov environments, a new multi-agent Q-learning algorithm is proposed. The algorithm learns the other agents' behavior policies from statistics of their joint actions, and uses the full probability distribution over the agents' policy vectors to guarantee selection of the jointly optimal action. The convergence and learning performance of the algorithm are analyzed. Its application to the multi-agent RoboCup system further demonstrates the algorithm's effectiveness and generalization ability.
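
The described scheme resembles a joint-action learner: the other agent's policy is estimated from the empirical frequency of its past actions, and own actions are evaluated in expectation under that estimate. A minimal sketch under that reading; names and parameters are assumptions.

```python
import numpy as np

class JointActionLearner:
    """Q-learning over joint actions, with the other agent's policy modeled by
    the empirical frequency of its past actions."""
    def __init__(self, n_states, n_actions_self, n_actions_other,
                 alpha=0.1, gamma=0.95):
        self.Q = np.zeros((n_states, n_actions_self, n_actions_other))
        self.freq = np.ones((n_states, n_actions_other))   # counts of the other's actions
        self.alpha, self.gamma = alpha, gamma

    def other_policy(self, s):
        return self.freq[s] / self.freq[s].sum()

    def expected_q(self, s):
        """Expected value of each own action under the estimated policy of the other."""
        return self.Q[s] @ self.other_policy(s)

    def update(self, s, a_self, a_other, r, s_next):
        self.freq[s, a_other] += 1
        target = r + self.gamma * self.expected_q(s_next).max()
        self.Q[s, a_self, a_other] += self.alpha * (target - self.Q[s, a_self, a_other])

    def act(self, s, eps=0.1):
        if np.random.rand() < eps:
            return int(np.random.randint(self.Q.shape[1]))
        return int(np.argmax(self.expected_q(s)))
```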

15.
In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero-sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms for common interest and conflicting interest games, respectively, are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private action space, to give the team of agents the opportunity to look for better solutions in a reduced joint action space. In a later stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL only needs limited communication to learn a fair periodical policy, resulting in a good overall policy. Importantly, ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards, they are flexible in learning different solution concepts, and they can handle stochastic, possibly delayed rewards and asynchronous action selection. A real-life experiment, adaptive load-balancing of parallel applications, is also included.

16.
Advanced Robotics, 2013, 27(12): 1379-1395
Existing reinforcement learning methods suffer seriously from the curse of dimensionality, especially when they are applied to multi-agent dynamic environments. A typical example is RoboCup competition, since other agents and their behaviors easily cause variations in the state and action spaces. This paper presents a method of modular learning in a multi-agent environment by which the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas to resolve the issue are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces. The state space of the top layer consists of the state values from the lower level, and macro actions are used to reduce the size of the physical action space. Second, the state of the other agent, i.e., to what extent it is close to its own goal, is estimated by observation and used as a state variable in the top-layer state space to realize cooperative/competitive behavior. The method is applied to a four (defense team)-on-five (offense team) game task, and the learning agent (a passer of the offense team) successfully acquired the teamwork plays (pass and shoot) within a much shorter learning time.

17.
Deep reinforcement learning has achieved breakthrough progress on single-agent tasks. Owing to the complexity of multi-agent systems, ordinary algorithms cannot handle their main difficulties. Moreover, as the number of agents increases, taking the maximization of a single agent's expected cumulative return as the learning objective often fails to converge, and some special convergence points do not correspond to reasonable policies. For practical problems that have no optimal solution, reinforcement learning algorithms are even more powerless. Introducing game theory into reinforcement learning handles the relationships among agents well, explains why the policies at convergence points are reasonable, and, most importantly, allows equilibrium solutions to replace optimal solutions so that relatively effective policies can still be obtained. This paper therefore reviews recent reinforcement learning algorithms from a game-theoretic perspective, summarizes the key difficulties of current game-theoretic reinforcement learning algorithms, and outlines several promising directions for resolving them.

18.
In multi-agent reinforcement learning, transfer learning is one of the key techniques used to speed up learning performance through the exchange of knowledge among agents. However, there are three challenges associated with applying this technique to real-world problems. First, most real-world domains are partially rather than fully observable. Second, it is difficult to pre-collect knowledge in unknown domains. Third, negative transfer impedes the learning progress. We observe that differentially private mechanisms can overcome these challenges due to their randomization property. Therefore, we propose a novel differential transfer learning method for multi-agent reinforcement learning problems, characterized by the following three key features. First, our method allows agents to implement real-time knowledge transfers between each other in partially observable domains. Second, our method eliminates the constraints on the relevance of transferred knowledge, which expands the knowledge set to a large extent. Third, our method improves robustness to negative transfers by applying differentially exponential noise and relevance weights to transferred knowledge. The proposed method is the first to use the randomization property of differential privacy to stimulate the learning performance in multi-agent reinforcement learning system. We further implement extensive experiments to demonstrate the effectiveness of our proposed method.
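
A minimal sketch of differentially private knowledge transfer between agents: transferred values are perturbed with privacy noise and down-weighted by a relevance factor to limit negative transfer. The paper applies exponential-mechanism noise; this sketch substitutes Laplace noise as a stand-in, and all names and parameters are assumptions.

```python
import numpy as np

def private_transfer(own_q, peer_q, relevance, epsilon=1.0, sensitivity=1.0):
    """Blend a peer's transferred Q-values into the local table.

    relevance : assumed weight in [0, 1] for how related the peer's knowledge is
                (small weights limit negative transfer).
    epsilon   : differential-privacy budget; smaller epsilon means more noise.
    The peer's values are perturbed with Laplace noise of scale sensitivity/epsilon
    before being mixed in (stand-in for the paper's exponential-mechanism noise)."""
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=np.shape(peer_q))
    noisy_peer = np.asarray(peer_q, dtype=float) + noise
    return (1.0 - relevance) * np.asarray(own_q, dtype=float) + relevance * noisy_peer

# Example: transfer a peer's 3-action value estimates with moderate relevance.
blended = private_transfer(own_q=[0.2, 0.5, 0.1],
                           peer_q=[0.4, 0.9, 0.0],
                           relevance=0.3, epsilon=0.5)
```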

19.

The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning gains rapid traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, the main contents are divided into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. Second, we consider the emergent patterns of agent behavior in cooperative, competitive and mixed scenarios. Third, we systematically enumerate challenges that exclusively arise in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area.


20.
A Regret-Based Reinforcement Learning Model for Multi-Agent Conflict Games   Cited: 1 (self-citations: 0, other citations: 1)
肖正  张世永 《软件学报》2008,19(11):2957-2967
For conflict games, a rational and conservative action-selection method is studied: minimizing the agent's worst-case regret. Under this method, the agent's current behavior policy incurs the smallest possible future loss, and a Nash equilibrium mixed strategy can be obtained without any information about the other agents. Based on regret values, a reinforcement learning model and an algorithmic implementation are proposed for conflict games in complex multi-agent environments. The model introduces a cross-entropy distance to build the belief-update process, further optimizing action selection in conflict games. The convergence of the algorithm is verified on a Markov repeated-game model, and the relationship between beliefs and optimal policies is analyzed. Compared with the extended Q-learning algorithm under the MMDP (multi-agent Markov decision process) framework, the algorithm greatly reduces the number of conflicts, improves the coordination of agent behavior, raises system performance, and helps keep the system stable.
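
The worst-case-regret rule can be illustrated on a small matrix game: compute each action's regret against every opponent action and pick the action whose worst regret is smallest. A short sketch with a made-up payoff matrix (not from the paper):

```python
import numpy as np

def min_max_regret_action(payoff):
    """Pick the own action that minimizes the worst-case regret.

    payoff[a, o] : reward of own action a when the opponents play joint action o.
    The regret of (a, o) is the gap to the best own action against that same o."""
    regret = payoff.max(axis=0, keepdims=True) - payoff    # column-wise regret
    worst_case = regret.max(axis=1)                        # worst regret per own action
    return int(np.argmin(worst_case)), worst_case

# Illustrative 2x2 conflict game (values are made up).
payoff = np.array([[3.0, 0.0],
                   [2.0, 1.0]])
action, regrets = min_max_regret_action(payoff)
```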

