1.

吴士泓李德华潘莹《计算机工程与应用》2010,46(17):8-10

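Treating the multi-agent cooperative learning process as a sequence of stage games, and addressing the problem that such games may have multiple equilibrium solutions, this paper proposes a multi-agent cooperative reinforcement learning algorithm under a collective-rationality constraint. Under the algorithm, every agent in the system selects its actions according to the collective-rationality principle of maximizing the collective interest, which resolves the problem of agreeing on a single equilibrium, maximizes the collective long-term return, and speeds up learning. On the basis of collective rationality, the credit assignment problem is solved by evaluating each agent's contribution to solving the overall task. Simulation results on the pursuit problem verify the effectiveness of the algorithm.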
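As a rough illustration of the collective-rationality idea in the abstract above, the sketch below picks, in a single stage game, the joint action with the largest total payoff, so that all agents settle on the same equilibrium. The payoff table, the function name `collective_rational_action`, and the tie-breaking rule are assumptions made for this example; it is not the authors' algorithm and omits learning and credit assignment.

```python
def collective_rational_action(payoffs):
    """Return the joint action with the largest total payoff.

    `payoffs` maps a joint action (a tuple with one entry per agent) to a
    tuple of per-agent payoffs. Ties are broken by the smallest joint action
    (an assumed rule) so that every agent computes the same answer.
    """
    return min(payoffs, key=lambda a: (-sum(payoffs[a]), a))


# Hypothetical two-agent coordination game with two equilibria,
# (0, 0) and (1, 1); the second has the higher collective payoff.
stage_game = {
    (0, 0): (2, 2),
    (0, 1): (0, 0),
    (1, 0): (0, 0),
    (1, 1): (5, 5),
}

best = collective_rational_action(stage_game)
```

Because each agent applies the same deterministic rule to the same table, no communication is needed to agree on the equilibrium in this toy setting.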

2.

Multi-agent cooperative learning algorithm based on quantum computing (total citations: 1; self-citations: 0; citations by others: 1)

谭万禹王建忠孟祥萍《计算机工程与应用》2008,44(26):62-64

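Multi-agent cooperative reinforcement learning suffers from the curse of dimensionality in the action and state spaces, and action selection admits multiple equilibrium solutions, so converging to the best equilibrium requires searching the policy space and coordinating policy selection. To address these problems, a novel multi-agent cooperative learning algorithm based on quantum theory is proposed. Drawing on quantum computing theory, the new algorithm represents the agents' action and state spaces as quantum superposition states, uses quantum entanglement to coordinate policy selection, uses probability amplitudes to represent action-selection probabilities, and uses a quantum search algorithm to accelerate multi-agent learning. Simulation results demonstrate the effectiveness of the new algorithm.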

3.

Multi-agent learning based on joint games

黄付亮张荣国陈大川刘焜《计算机与数字工程》2011,39(6):21-24

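Building on a study of the Q-learning algorithm, team cooperation theory from game theory is introduced into reinforcement learning, and a multi-agent learning algorithm based on joint games is proposed. The algorithm constructs multiple stage games and evaluates the outcome of each stage game according to the payoff matrix, providing an effective decision strategy for agent behavior: each agent predicts the action it should execute from the optimal equilibrium solution, or by observing the action history of its cooperating agents together with its own current situation. Simulation experiments on a task scheduling problem verify the convergence of the algorithm.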

4.

Multi-agent Q-learning based on quantum theory and the ant colony algorithm

孟祥萍王圣镔《计算机工程与应用》2010,46(21):43-46

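Multi-agent cooperative reinforcement learning suffers from the curse of dimensionality in the action and state spaces, and action selection admits multiple equilibrium solutions, so converging to the best equilibrium requires searching the policy space and coordinating policy selection. To address these problems, a novel multi-agent cooperative learning algorithm based on quantum theory and the ant colony algorithm is proposed. First, drawing on quantum computing theory, the new algorithm represents the agents' action and state spaces as quantum superposition states, uses quantum entanglement to coordinate policy selection, and uses probability amplitudes for action exploration, which speeds up learning. Second, following the ant colony algorithm, a "footprint" idea is proposed to indirectly strengthen the interaction among agents. Finally, both theoretical analysis and experimental results show that the improved Q-learning is feasible and can effectively raise learning efficiency.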

5.

Agent cooperative reinforcement learning method based on stochastic games (total citations: 3; self-citations: 0; citations by others: 3)

王长缨尹晓虎鲍翊平姚莉《计算机工程与科学》2006,28(2):107-110

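This paper addresses the learning problem of a class of cooperative teams that seek to maximize the system's payoff and, based on the idea of stochastic games, proposes a new multi-agent cooperative reinforcement learning method. Each agent in the cooperative team observes the historical behavior of its cooperating acquaintances, predicts their behavior strategies according to the stochastic game model, and then derives the optimal joint behavior strategy.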

6.

A new multi-objective optimization strategy mechanism and its application

柴玉梅张靖《计算机应用》2007,27(9):2287-2289

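In game problems, many learning mechanisms can only make agents converge to a Nash equilibrium, which does not meet practical needs well. The game problem is converted into a multi-objective optimization problem, and a new multi-objective optimization strategy mechanism, the retained controlled strategy mechanism (保留受控策略机制), is proposed. Applied to the Prisoner's Dilemma, it yields a Pareto-optimal solution that is more meaningful than the Nash equilibrium and achieves a high degree of satisfaction in self-play experiments. Experimental results demonstrate the effectiveness of the mechanism in finding Pareto-optimal solutions.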
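To make the contrast between the Nash equilibrium and Pareto-optimal outcomes in the Prisoner's Dilemma concrete, the sketch below enumerates both for one commonly used payoff table. The payoff values and the helper names `pure_nash` and `pareto_optimal` are assumptions for illustration; the code does not implement the strategy mechanism proposed in the paper.

```python
from itertools import product

# Hypothetical payoffs (row player, column player); C = cooperate, D = defect.
ACTIONS = ("C", "D")
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pure_nash(payoff):
    """Joint actions from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        u1, u2 = payoff[(a, b)]
        row_ok = all(payoff[(x, b)][0] <= u1 for x in ACTIONS)
        col_ok = all(payoff[(a, y)][1] <= u2 for y in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def pareto_optimal(payoff):
    """Joint actions whose payoff vector no other outcome dominates."""
    def dominated(p):
        return any(
            all(q[i] >= p[i] for i in range(2))
            and any(q[i] > p[i] for i in range(2))
            for q in payoff.values()
        )
    return [a for a, p in payoff.items() if not dominated(p)]


nash = pure_nash(PAYOFF)
pareto = pareto_optimal(PAYOFF)
```

With these payoffs the only pure Nash equilibrium is mutual defection, which is also the only outcome that is not Pareto-optimal; mutual cooperation is Pareto-optimal but not an equilibrium.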

7.

Dynamic-policy reinforcement learning algorithms in mixed multi-agent environments

肖正何青松张世永《小型微型计算机系统》2009,30(7)

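Machine learning has received wide attention and in-depth study in cooperation and behavioral decision-making for multi-agent systems. After analyzing learning algorithms based on equilibrium solutions and on best response, two dynamic-policy reinforcement learning algorithms for mixed multi-agent environments are proposed. The algorithms not only adapt to the behavior strategies of other agents in the system and to changes in them, but also exploit the history of past behavior to form more accurate, time-dependent behavior strategies. Experiments on two well-known zero-sum games verify the convergence and rationality of the algorithms, which obtain higher payoffs in repeated games against best-response agents.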

8.

A method for computing Nash equilibria of games for multi-agent interaction

李劲岳昆刘惟一《计算机科学》2007,34(3):181-185

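Existing methods for computing Nash equilibria of graphical games mostly search a discretized profile space and can ultimately obtain only approximate Nash equilibria. To address this shortcoming, the computation of a Nash equilibrium of a graphical game is treated as a function optimization problem over the continuous strategy space: the sum of the agents' utility deviations in a strategy profile is defined as the objective, whose optimum is a Nash equilibrium of the game. Based on an analysis of examples, this paper shows that computing the descent gradient of the objective function reduces to a set of linear programs, and on that basis proposes a new gradient descent algorithm for computing Nash equilibria of graphical games. Algorithm analysis and experiments show that, for the related problems in multi-agent interaction models, the proposed method can compute Nash equilibria of graphical games with arbitrary graph structure and achieves good accuracy and efficiency on large-scale graphical games as well.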
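The objective described in the abstract above can be illustrated on a plain two-player matrix game, which is a simplification of the graphical-game setting. In the sketch, each player's "utility deviation" is taken to be the gain from switching to a best response against the other player's mixed strategy; this definition, the function names, and the example game (matching pennies) are assumptions for illustration. Under this definition the sum is non-negative and equals zero exactly at a Nash equilibrium.

```python
def expected(M, x, y):
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def deviation_sum(A, B, x, y):
    """Sum over both players of (best-response payoff - current payoff)."""
    v1 = expected(A, x, y)
    v2 = expected(B, x, y)
    # Best pure-strategy payoff for the row player against y.
    best1 = max(sum(A[i][j] * y[j] for j in range(len(y)))
                for i in range(len(x)))
    # Best pure-strategy payoff for the column player against x.
    best2 = max(sum(x[i] * B[i][j] for i in range(len(x)))
                for j in range(len(y)))
    return (best1 - v1) + (best2 - v2)


# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players.
A = [[1, -1], [-1, 1]]   # row player's payoffs
B = [[-1, 1], [1, -1]]   # column player's payoffs

at_equilibrium = deviation_sum(A, B, [0.5, 0.5], [0.5, 0.5])
off_equilibrium = deviation_sum(A, B, [1.0, 0.0], [1.0, 0.0])
```

A minimizer of `deviation_sum` over the product of probability simplices is therefore an equilibrium of this toy game; how the paper computes descent directions for graphical games is not reproduced here.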

9.

Incentive mechanism design for hierarchical federated learning based on a multi-leader Stackelberg game

耿方兴李卓陈昕《计算机应用》2023,(11):3551-3558

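Privacy and security issues and resource consumption in hierarchical federated learning reduce participants' enthusiasm. To encourage enough participants to take an active part in learning tasks, and to address the decision problem between multiple mobile devices and multiple edge servers, an incentive mechanism based on a multi-leader Stackelberg game is proposed. First, by quantifying the cost utility of mobile devices and the payments made by edge servers, utility functions are constructed and the optimization problem is defined. Second, the interaction among mobile devices is modeled as an evolutionary game, and the interaction among edge servers as a non-cooperative game. To solve for the optimal edge-server selection and pricing strategies, a multi-round iterative edge server selection algorithm (MIES) and a gradient iterative pricing algorithm (GIPA) are proposed; the former finds the equilibrium of the evolutionary game among mobile devices, and the latter solves the pricing competition among edge servers. Experimental results show that, compared with the optimal pricing prediction strategy (OPPS), the historical optimal pricing strategy (HOPS) and the random pricing strategy (RPS), GIPA raises the average utility of edge servers by 4.06%, 10.08% and 31.39%, respectively.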

10.

MAS dynamic cooperative task-solving model and algorithm

蒋伟进骆菲史德嘉《智能系统学报》2010,5(2):161-168

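Targeting the autonomy, dynamism, distribution and heterogeneity of grid environments, a dynamic resource allocation and task scheduling model based on game-theoretic cooperation in a multi-agent system (MAS) is proposed. A dynamic task-solving algorithm for grid resource scheduling that reflects supply and demand is established, the existence and uniqueness of the Nash equilibrium point in the resource allocation game are proved, and the Nash equilibrium solution is derived. The method exploits the learning and negotiation abilities of consumer agents and introduces consumers' psychological behavior, so that consumers' resource requests and task scheduling are more reasonable and effective. Experimental results show that the method outperforms traditional algorithms in smoothness of response time, throughput and task-solving efficiency, so that overall resource supply and demand are balanced and users' QoS requirements are met.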

11.

A two-layered multi-agent reinforcement learning model and algorithm

Ben-Nian Wang Yang Gao Zhao-Qian Chen Jun-Yuan Xie Shi-Fu Chen 《Journal of Network and Computer Applications》2007,30(4):1366-1376

Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrence and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, agents using them face problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning together with the dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability and coordination ability. Experiments show that the performance of LMRL can be better than that of single-agent reinforcement learning and Nash-Q.

12.

A survey of multi-agent game-theoretic reinforcement learning

王军曹雷陈希亮赖俊章乐贵《计算机工程与应用》2021,57(21):1-13

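Deep reinforcement learning has achieved breakthrough progress on single-agent tasks. Because of the complexity of multi-agent systems, ordinary algorithms cannot resolve their main difficulties. Moreover, as the number of agents grows, taking maximization of a single agent's expected cumulative return as the learning objective often fails to converge, and some particular convergence points do not satisfy the rationality of the policy. For practical problems that have no optimal solution, reinforcement learning algorithms are helpless. Introducing game theory into reinforcement learning handles the relationships among agents well, can explain the rationality of the policy at a convergence point, and, more importantly, allows an equilibrium solution to replace the optimal solution so as to obtain a relatively effective policy. This survey therefore reviews recent reinforcement learning algorithms from a game-theoretic perspective, summarizes the key points and difficulties of current game-theoretic reinforcement learning algorithms, and offers several possible directions for breakthroughs.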

13.

Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism

孙辉辉胡春鹤张军国《控制与决策》2023,38(5):1420-1429

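Deep reinforcement learning has become a research hotspot in the multi-robot field because of its efficient performance in multi-robot systems. However, in continuously time-varying, unstructured scenarios with unknown risks, traditional methods show poor risk defense and fragile system safety, and unknown risks intrude nonlinearly on the robots' state space in the form of adversarial attacks. To address this problem, a multi-robot reinforcement learning method based on an active risk defense mechanism (APMARL) is proposed. First, based on a partially observable Markov game model, a risk discrimination mechanism with a memory pool shared among robots is established; a risk state index is constructed to predict in advance the safety of the current action, and the matching risk-handling mode is executed adaptively according to the prediction. In particular, for unsafe states in which risk has intruded, an Actor-Critic active defense network architecture based on an enhanced attention mechanism is proposed, which achieves graded enhancement of key information and effective defense against dangerous information. Finally, extensive experiments on multi-robot cooperative confrontation tasks show that the reinforcement learning strategy with active risk defense effectively reduces the risk of intrusion by hostile information, improves the execution efficiency of multi-robot cooperative confrontation tasks, and strengthens the stability and safety of the strategy.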

14.

Concept learning games

Arman Didandeh Nima Mirbakhsh Mohsen Afsharchi 《Information Systems Frontiers》2013,15(4):653-676

In this paper, we present a game-theoretic study of the concept learning problem in a multi-agent system. Concept learning is an essential and well-studied domain of machine learning when it is studied under the characteristics of a multi-agent system. The most important reasons are the partiality of each agent's perception of the environment and the limitations on communication, which result in a strong need for a collaborative protocol for multi-agent transactions. Here we investigate multi-agent concept learning through its components from a game-theoretic viewpoint, especially the pre-learning processes. Based on two standard notations, we address the non-unanimity of concepts, the classification of objects, the voting and communication protocol, and the learning itself. In such a game of concept learning, we consider a group of agents that communicate and consult to upgrade their ontologies based on their conceptualizations of the environment. For this purpose, we investigate the problem under two separate, standard distinctions of game theory: cooperation and competition. Several solution concepts and innovative ideas from the multi-agent realm are used to produce an approach that contains the reasoning process of the agents in this system. Experiments at the end show the functionality of our approach; they are reported separately for the cooperative and competitive views.

15.

Analysis and integration of heterogeneous feedback signals in reinforcement learning

余雪丽李志周昌能崔倩胡坤《计算机科学与探索》2012,6(4):366-376

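This paper examines whether visual and auditory signals can work together to improve people's memory and reasoning in game-based professional rescue training systems for highly dangerous industries. Using a semi-Markov game (SMG) model, it proposes a cooperative multi-agent hierarchical reinforcement learning framework and algorithm, and builds a heterogeneous multi-agent system composed of a visual processing agent, an auditory processing agent and a human agent. It points out that analyzing and summarizing the properties and characteristics of coherent visual and auditory feedback signals is a very challenging task, one that determines how heterogeneous signals are integrated in reinforcement learning. On this basis, a partial-information learning algorithm that integrates heterogeneous feedback signals is proposed, which greatly reduces the state search space and alleviates the "curse of dimensionality" inherent in reinforcement learning. Following the "systematic desensitization" principle of psychotherapy, a mood-personality-stimulus-regulation (MPSR) model and a personalized rendering algorithm for terrorist scene (PRATS) are designed to raise rescue team members' psychological resilience, and experiments verify the effectiveness of the algorithm.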

16.

Cognitive full-duplex relay selection strategy based on potential games

李召义刘占军薛亚茹刘红霞《计算机工程与科学》2019,41(2):286-292

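For networks subject to limits on interference power to primary users, on self-interference and on total interference power, and to address the high complexity of cognitive relay selection algorithms, a relay selection strategy for cognitive full-duplex cooperative networks based on potential game theory is proposed. The cognitive relay selection problem is modeled as a potential game whose common utility function is the system rate of the cognitive cooperative network, and the analysis shows that, without information about the infeasible strategy set, the proposed game guarantees the existence of a pure-strategy Nash equilibrium (NE) and gives its feasibility conditions. On this basis, an iterative full-duplex relay selection algorithm is given and its complexity is discussed. Simulation analysis shows that the proposed algorithm achieves optimal or near-optimal rate performance at low complexity, and that performance also improves markedly over the traditional half-duplex relay mode.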
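The existence of a pure-strategy equilibrium in a game with a common utility can be illustrated with best-response dynamics: every unilateral switch that raises the shared utility also raises the potential, so the iteration must stop at a pure Nash equilibrium. The sketch below is a toy with a hypothetical rate table `RATE` and an assumed penalty when two users pick the same relay; it is not the iterative algorithm or the channel model from the paper.

```python
def best_response_dynamics(utility, n_players, n_choices, start):
    """Players take turns switching to a choice that strictly raises the
    shared utility; the loop ends when no unilateral switch helps."""
    profile = list(start)
    improved = True
    while improved:
        improved = False
        for p in range(n_players):
            current = utility(tuple(profile))
            for c in range(n_choices):
                trial = profile[:p] + [c] + profile[p + 1:]
                value = utility(tuple(trial))
                if value > current:
                    profile, current, improved = trial, value, True
    return tuple(profile)


# Hypothetical achievable rates: RATE[user][relay].
RATE = [[1.0, 3.0, 2.0],
        [2.5, 1.0, 2.0]]


def system_rate(profile):
    """Common utility: total rate, minus an assumed penalty for sharing."""
    total = sum(RATE[user][relay] for user, relay in enumerate(profile))
    if len(set(profile)) < len(profile):
        total -= 2.0  # assumed interference cost when a relay is shared
    return total


equilibrium = best_response_dynamics(system_rate, 2, 3, (0, 0))
```

The loop terminates because the shared utility strictly increases with every accepted switch and there are only finitely many profiles. In general such dynamics reach a pure equilibrium that need not be the global optimum.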

17.

Key Technologies of Confrontational Intelligent Decision Support for Multi-Agent Systems

Yun Zhang 《Automatic Control and Computer Sciences》2018,52(4):283-290

This paper first studies intelligent learning techniques based on reinforcement learning theory. It proposes an improved multi-agent cooperative learning method that can be shared through continuous learning and the strategies of individual agents to achieve the integration of multi-agent strategy and learning in order to improve the capabilities of intelligent multi-agent systems. Second, following an analysis of data mining and AHP theory, a new concept is proposed for building a data mining model based on intelligent learning, named 'ACMC' (AHP Construct Mining Component); ACMC strategy evaluation and decision assistance are designed on top of multi-agent systems to achieve a strategic assessment of the current situation and reach a final decision. Finally, after research on intelligent decision technology based on game theory, aspects of game theory are employed to deal with the real demands of confrontational environments.

18.

Multi-agent graphical games with input constraints: an online learning solution

Tianxiang WANG Bingchang WANG Yong LIANG 《控制理论与应用(英文版)》2020,18(2):148-159

This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. To obtain the optimal strategy of each agent, it is necessary to solve a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which is very difficult with traditional methods. The game problem becomes more complex still if the control input of each agent in the dynamic graphical game is constrained. In this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without the need for the drift dynamics of the agents. In effect, the algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process, using only the local information available to each agent. It can be proved that, under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation of the algorithm, two layers of neural networks are used for each agent to fit the value function and the control strategy, respectively. Finally, a simulation example is given to show the effectiveness of the method.