期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

汪志军赵艳彬程月华廖鹤姜斌《控制理论与应用》2019,36(12):2046-2053

以微小卫星集群实现小行星探测为背景,研究局部信息交互的空间目标观测任务构形调整.针对难以直接求取集群构形调整的全局最优解问题,利用通信协调图,将全局协调决策分解成多个局部求解问题,并引入强化学习机制实现求解.针对集群全局协调决策问题,通过设计基于Max-plus算法的全局协调决策算法来实现全局协作;针对单星局部优化问题,设计基于神经网络的局部Q学习算法来实现单星动作调整规划.仿真结果表明,本文所提的协作规划算法能自主有效地将集群调整至期望构形,实现协同观测任务. 相似文献

2.

Generalized multiagent learning with performance bound

Bikramjit Banerjee Jing Peng 《Autonomous Agents and Multi-Agent Systems》2007,15(3):281-312

We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and “other” (see Definition 4) agents. We start with ReDVaLeR that can satisfy policy convergence against the first two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification on ReDVaLeR yields a new algorithm, RV _σ(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents—a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR . RV _σ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for regret of RV _σ(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF though, contrastingly, our analysis is in continuous time. Moreover, experiments show that RV _σ(t) can converge to an equilibrium in some cases where GIGA, GIGA-WoLF would fail, and to better equilibria where GIGA, GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play instead of high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game. 相似文献

3.

A layered approach to learning coordination knowledge in multiagent environments

Guray Erus Faruk Polat 《Applied Intelligence》2007,27(3):249-267

Multiagent learning involves acquisition of cooperative behavior among intelligent agents in order to satisfy the joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired from the earlier studies in animal learning. In this paper, we propose a new RL technique called the Two Level Reinforcement Learning with Communication (2LRL) method to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place in two hierarchical levels; in the first level agents learn to select their target and then they select the action directed to their target in the second level. The agents communicate their perception to their neighbors and use the communication information in their decision-making. We applied 2LRL method in a hunter-prey environment and observed a satisfactory cooperative behavior. Guray Erus received the B.S. degree in computer engineering in 1999, and the M.S. degree in cognitive sciences, in 2002, from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant in Rene“ Descartes University, Paris, France, where he prepares a doctoral dissertation on object detection on satellite images, as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding. Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from the Middle East Technical University, Ankara, in 1987 and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at Computer Science Department of University of Minnesota, Minneapolis in 1992–93. His research interests include artificial intelligence, multi-agent systems and object oriented data models. 相似文献

4.

Multiagent learning is not the answer. It is the question

Peter Stone 《Artificial Intelligence》2007,171(7):402-405

The article by Shoham, Powers, and Grenager called “If multi-agent learning is the answer, what is the question?” does a great job of laying out the current state of the art and open issues at the intersection of game theory and artificial intelligence (AI). However, from the AI perspective, the term “multiagent learning” applies more broadly than can be usefully framed in game theoretic terms. In this larger context, how (and perhaps whether) multiagent learning can be usefully applied in complex domains is still a large open question. 相似文献

5.

基于虚拟关系知识图可自适应聚合的推荐算法

李源杨谋均《计算技术与自动化》2024,43(4):73-78

在信息爆炸的时代,推荐算法成为应对信息过载的有效手段.近年来,图神经网络(GNN)以其强大的建模能力和应对冷启动的优势被广泛应用于推荐算法.本文提出了一种基于深度强化学习与GNN-R的联合训练框架,解决GNN-R中固定层数和聚合策略的问题,通过间隔经验回放和延后奖励机制,优化了推荐模型的学习过程.在此基础上,提出了自适应优化GNN-R聚合层数和虚拟关系数量的两个优化算法,改进了 VRKG4Rec模型的性能.实验结果表明,两个优化算法对比VRKG4Rec模型都有较好的性能提升. 相似文献

6.

图着色问题的算法研究综述

宋家欢王晓峰胡思敏贾璟伟颜冬《计算机工程与应用》2024,60(18):66-77

图着色问题（graph coloring problem;GCP）是一个经典的组合优化问题;已广泛应用于数学、计算机科学和生物科学等多个领域。由于图着色问题的NP难特性;目前还没有多项式时间内的精确算法求解该问题;为了给出求解该问题的高效算法;需要对现有算法进行梳理。主要分为智能优化算法、启发式算法、强化学习算法等;从算法原理、改进思路、性能和精度等方面进行对比分析;归纳出算法的优缺点;并指出GCP的研究方向和算法设计路径;对于相关问题的研究有指导意义。相似文献

7.

多智能体系构架下的属性图分布式聚类算法

边宅安李慧嘉陈俊华马雨晗赵丹《计算机科学》2017,44(Z6):407-413

近年来属性图聚类受到了广泛关注,其目的是将属性图中的节点划分到若干簇中,使得每一个集群都有紧密的簇内结构和均匀的属性值。现有的理论主要是假设属性图中的节点或对象是为了协助优化某个给定的方程,而忽略了它们在现实生活中本身的属性。同时,一些开放性问题尚未得到有效解决,如异构信息集成、计算成本高等。为此,把属性图聚类问题理解为自身节点代理的集群形成博弈。为了有效地整合拓扑结构和属性信息,提出了基于紧密性和均匀性约束的节点代理策略选择。进一步证明了博弈过程将会收敛到弱帕累托纳什均衡。在实证方面,设计了一个分布式和异构的多智能体系统,给出了一个快速的分布式学习算法。该算法的主要特点是结果分区的重叠率可以由一个事先给定的阈值控制。最后,在现实社交网络上进行了模拟实验,并与目前先进方法进行比较,结果证实了所提算法的有效性。相似文献

8.

基于强化学习的知识图谱推理研究综述

刘世侠李卫军刘雪洋丁建平苏易礌李浩南《计算机应用研究》2024,41(9)

知识推理作为知识图谱补全中的一项重要任务,受到了学术界的广泛关注。为了提高模型的推理效果和可解释性,将强化学习与知识推理的结合是一种可行的解决方法。基于强化学习的知识推理方法将知识图谱研究的问题建模成路径或序列决策问题,能够更好地利用实体、关系等语义信息来提高推理效果和可解释性。首先,对知识图谱和知识推理的基本概念进行了叙述,阐述了近年来的研究进展。随后,从单层强化学习知识推理和双层强化学习知识推理两个角度,对基于强化学习的知识推理相关研究进行了分析与对比。最后,对知识推理如何应用于知识问答、智能推荐、医疗和交通等领域进行了探讨,并对基于强化学习的知识推理的未来研究方向进行了展望。相似文献

9.

An experts approach to strategy selection in multiagent meeting scheduling

Elisabeth Crawford Manuela Veloso 《Autonomous Agents and Multi-Agent Systems》2007,15(1):5-28

In the multiagent meeting scheduling problem, agents negotiate with each other on behalf of their users to schedule meetings. While a number of negotiation approaches have been proposed for scheduling meetings, it is not well understood how agents can negotiate strategically in order to maximize their users’ utility. To negotiate strategically, agents need to learn to pick good strategies for negotiating with other agents. In this paper, we show how agents can learn online to negotiate strategically in order to better satisfy their users’ preferences. We outline the applicability of experts algorithms to the problem of learning to select negotiation strategies. In particular, we show how two different experts approaches, plays [3] and Exploration–Exploitation Experts (EEE) [10] can be adapted to the task. We show experimentally the effectiveness of our approach for learning to negotiate strategically. 相似文献

10.

Multiagent learning using a variable learning rate

Michael Bowling 《Artificial Intelligence》2002,136(2):215-250

Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents. This creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, “Win or Learn Fast”, for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method. 相似文献

11.

基于强化学习的多移动Agent学习算法

刘菲曾广周《计算机工程与应用》2006,42(5):50-53

结合强化学习技术讨论了单移动Agent学习的过程,然后扩展到多移动Agent学习领域,提出一个多移动Agent学习算法MMAL(MultiMobileAgentLearning)。算法充分考虑了移动Agent学习的特点,使得移动Agent能够在不确定和有冲突目标的上下文中进行决策,解决在学习过程中Agent对移动时机的选择,并且能够大大降低计算代价。目的是使Agent能在随机动态的环境中进行自主、协作的学习。最后,通过仿真试验表明这种学习算法是一种高效、快速的学习方法。相似文献

12.

Opportunities for multiagent systems and multiagent reinforcement learning in traffic control

Ana L. C. Bazzan 《Autonomous Agents and Multi-Agent Systems》2009,18(3):342-375

The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As it is often the case, it is not possible to provide additional capacity, so that a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit very well the concept of autonomous agents: the driver, the pedestrian, the traffic expert; in some cases, also the intersection and the traffic signal controller can be regarded as an autonomous agent. However, the “agentification” of a transportation system is associated with some challenging issues: the number of agents is high, typically agents are highly adaptive, they react to changes in the environment at individual level but cause an unpredictable collective pattern, and act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them. 相似文献

13.

AUTOMATIC COMPLEXITY REDUCTION IN REINFORCEMENT LEARNING 总被引：1，自引：0，他引：1

Chung-Cheng Chiu Von-Wun Soo 《Computational Intelligence》2010,26(1):1-25

High dimensionality of state representation is a major limitation for scale-up in reinforcement learning (RL). This work derives the knowledge of complexity reduction from partial solutions and provides algorithms for automated dimension reduction in RL. We propose the cascading decomposition algorithm based on the spectral analysis on a normalized graph Laplacian to decompose a problem into several subproblems and then conduct parameter relevance analysis on each subproblem to perform dynamic state abstraction. The elimination of irrelevant parameters projects the original state space into the one with lower dimension in which some subtasks are projected onto the same shared subtasks. The framework could identify irrelevant parameters based on performed action sequences and thus relieve the problem of high dimensionality in learning process. We evaluate the framework with experiments and show that the dimension reduction approach could indeed make some infeasible problem to become learnable. 相似文献

14.

A hierarchical clustering algorithm using strong components

R. Endre Tarjan 《Information Processing Letters》1982,14(1):26-29

相似文献

15.

大规模图的分布式核分解算法

下载免费PDF全文

翁同峰周旭李肯立胡逸騉《软件学报》2024,35(12):5341-5362

随着互联网信息技术的发展,社交网络、计算机网络及生物信息网络等领域涌现海量大规模图数据.鉴于传统图数据管理技术在处理大规模图时存在存储及性能方面的局限,大规模图的分布式处理技术已成为图数据库领域的研究热点,并得到工业界和学术界的广泛关注.图的核分解用于计算图中所有顶点的核值,有助于挖掘重要图结构信息,在社区搜索、蛋白质结构分析和网络结构可视化等诸多应用中发挥着关键作用.当前以顶点为中心计算模式的分布式核分解算法中采用一种广播的消息传递机制,一方面,存在大量的冗余通信及计算开销;另一方面,处理大规模图核分解过程中易产生内存溢出问题.为此,分别提出基于全局激活和层次剥离计算框架,并提出分布式核分解新算法,通过引入基于顶点核值局部性特点的消息剪枝策略和以计算节点为中心的计算新模式,保证算法有效性的同时提升其性能.在国家超级计算长沙中心分布式集群上,分别针对大规模真实和合成数据集,算法总耗时性能提升比例为37%–98%,验证所提模型和算法的有效性和高效性. 相似文献

16.

Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Adrian K. Agogino Kagan Tumer 《Autonomous Agents and Multi-Agent Systems》2008,17(2):320-338

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal to noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards. 相似文献

17.

基于图嵌入编码形态信息的非均匀多任务强化学习方法

贺晓王文学《计算机应用研究》2024,41(4):1022-1028

传统强化学习方法存在效率低下、泛化性能差、策略模型不可迁移的问题。针对此问题,提出了一种非均匀多任务强化学习方法,通过学习多个强化任务提升效率和泛化性能,将智能体形态构建为图,利用图神经网络能处理任意连接和大小的图来解决状态和动作空间维度不同的非均匀任务,突破模型不可迁移的局限,充分发挥图神经网络天然地利用图结构归纳偏差的优点,实现了模型高效训练和泛化性能提升,并可快速迁移到新任务。多任务学习实验结果表明,与以往方法相比,该方法在多任务学习和迁移学习实验中均表现出更好的性能,在迁移学习实验中展现出更准确的知识迁移。通过引入图结构偏差,使该方法具备更高的效率和更好的迁移泛化性能。相似文献

18.

基于蚁群算法和轮盘算法的多Agent Q学习

孟祥萍王圣镔王欣欣《计算机工程与应用》2009,45(16):60-62

提出了一种新颖的基于Q-学习、蚁群算法和轮盘赌算法的多Agent强化学习。在强化学习算法中,当Agent数量增加到足够大时,就会出现动作空间灾难性问题,即：其学习速度骤然下降。另外,Agent是利用Q值来选择下一步动作的,因此,在学习早期,动作的选择严重束缚于高Q值。把蚁群算法、轮盘赌算法和强化学习三者结合起来,期望解决上述提出的问题。最后,对新算法的理论分析和实验结果都证明了改进的Q学习是可行的,并且可以有效地提高学习效率。相似文献

19.

基于end-to-end深度强化学习的多车场车辆路径优化

雷坤郭鹏王祺欣赵文超唐连生《计算机应用研究》2022,39(10):3013-3019

为提高多车场车辆路径问题(multi-depot vehicle routing problem, MDVRP)的求解效率,提出了端到端的深度强化学习框架。首先,将MDVRP建模为马尔可夫决策过程(Markov decision process, MDP),包括对其状态、动作、收益的定义;同时,提出了改进图注意力网络(graph attention network, GAT)作为编码器对MDVRP的图表示进行特征嵌入编码,设计了基于Transformer的解码器;采用改进REINFORCE算法来训练该模型,该模型不受图的大小约束,即其一旦完成训练,就可用于求解任意车场和客户数量的算例问题。最后,通过随机生成的算例和公开的标准算例验证了所提出框架的可行性和有效性,即使在求解客户节点数为100的MDVRP上,经训练的模型平均仅需2 ms即可得到与现有方法相比更具优势的解。相似文献

20.

面向主动配电网实时优化调度的图强化学习方法

下载免费PDF全文

陈俊斌余涛潘振宁《控制理论与应用》2024,41(6):999-1008

主动配电网的新能源、储能等能源形式可以有效提高运行的灵活性和可靠性,同时新能源和负荷也给配电网带来了双重不确定性,致使主动配电网的实时优化调度决策维度大、建模精度差.针对这一问题,本文提出结合图神经网络和强化学习的图强化学习方法,避免对复杂系统的精准建模.首先,将实时优化调度问题表述为马尔可夫决策过程,并将其表述为动态序贯决策问题.其次,提出了基于物理连接关系的图表示方法,用以表达状态量的隐含相关性.随后,提出图强化学习来学习将系统状态图映射到决策输出的最优策略.最后,将图强化学习推广到分布式图强化学习.算例结果表明,图强化学习在最优性和效率方面都取得了更好的效果. 相似文献