20 similar documents found (search time: 160 ms)
1.
AGV (automated guided vehicle) path planning has become a key technical problem in goods transportation, express parcel sorting and similar fields. Because such scenarios require many AGVs to cooperate, traditional planning models have difficulty coordinating the interactions among multiple AGVs, and a divide-and-conquer approach may yield the best system performance. On this basis, this paper proposes MRF (maximum reward frequency) Q-learning, a multi-agent independent reinforcement learning algorithm that optimizes task scheduling and path planning jointly. During learning an AGV does not need to know the actions of the other AGVs, which alleviates the curse of dimensionality caused by joint actions. A strategy combining Boltzmann exploration with ε-greedy is adopted to avoid converging to poor paths; furthermore, the frequency of achieving the global maximum cumulative reward is fed into the Q-value update formula to maximize the global cumulative reward of the multi-AGV system. Simulation experiments show that the algorithm converges to the optimal solution and completes the path-planning task in the fewest time steps.
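As a concrete illustration of the ingredients named in this abstract, the following Python sketch combines Boltzmann and ε-greedy action selection and weights the Q-value update by a reward-frequency term. The mixing rule and the way `freq` enters the update are assumptions made for illustration, not the paper's exact formulas.

```python
import numpy as np

def select_action(q_row, epsilon, tau, rng):
    """One plausible Boltzmann / epsilon-greedy combination (illustrative)."""
    if rng.random() < epsilon:
        # Boltzmann (softmax) exploration over this state's Q-values
        prefs = np.exp((q_row - q_row.max()) / tau)
        probs = prefs / prefs.sum()
        return int(rng.choice(len(q_row), p=probs))
    return int(np.argmax(q_row))        # greedy exploitation

def mrf_update(Q, s, a, r, s_next, freq, alpha=0.1, gamma=0.95):
    """Hypothetical frequency-weighted Q update: `freq` is the empirical
    frequency with which episodes achieved the global maximum cumulative
    reward (an assumption about how the MRF term enters the rule)."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * freq * (target - Q[s, a])
```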
2.
In complex real-world multi-agent environments, completing a task usually requires cooperation among multiple agents, which has driven the emergence of many multi-agent reinforcement learning methods. Estimation bias in the action-value function has received much attention in single-agent reinforcement learning but has rarely been studied in multi-agent settings. Addressing this problem, this work shows both theoretically and experimentally that the multi-agent deep deterministic policy gradient method overestimates the value function. A multi-agent deep deterministic policy gradient method based on double critics (MADDPG-DC) is proposed: taking the minimum over two critic networks avoids value overestimation and helps the agents learn optimal policies. In addition, the actor network update is delayed to keep the actor's policy updates efficient and stable and to improve the quality of policy learning. Experimental results in the multi-agent particle environment and a traffic signal control environment demonstrate the feasibility and superiority of the proposed method.
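The core of the double-critic idea can be sketched as the target computation below (PyTorch). The target networks and the network interfaces are assumptions carried over from the TD3/MADDPG family, not details taken from the paper.

```python
import torch

def double_critic_target(critic1_t, critic2_t, actor_t, reward, next_obs,
                         done, gamma=0.99):
    """Clipped double-critic target (a sketch of the MADDPG-DC idea,
    assuming TD3-style target networks; names are illustrative)."""
    with torch.no_grad():
        next_act = actor_t(next_obs)
        q1 = critic1_t(next_obs, next_act)
        q2 = critic2_t(next_obs, next_act)
        # Taking the element-wise minimum of the two critics curbs the
        # value overestimation analysed in the paper.
        q_min = torch.min(q1, q2)
        return reward + gamma * (1.0 - done) * q_min
```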
3.
4.
5.
The dialogue policy is a key component of a task-oriented dialogue system: given the current dialogue state, it outputs the next system action. In recent years, dialogue policy learning has widely been cast as a reinforcement learning problem. A common approach is to let the dialogue agent learn by interacting with a user simulator. However, building a reliable user simulator is not easy and is usually as hard as building a good dialogue agent. To avoid explicitly building a user simulator, a multi-agent dialogue policy learning method based on PPO reinforcement learning is proposed, in which both the system side and the user side are modeled as agents. Policy learning proceeds in two stages: 1) imitation learning, where behavior cloning is used to pre-train the system policy and the user policy; 2) multi-agent reinforcement learning, where proximal policy optimization (PPO), an algorithm with higher sample efficiency and better robustness, is used to learn the dialogue policies of the system side and the user side. Finally, experiments on MultiWOZ, a public multi-domain, multi-intent task-oriented dialogue corpus, verify the effectiveness of the method, and its scalability to complex tasks is analysed. In addition, the learned dialogue policies are integrated into the ConvLab-2 platform for an overall evaluation.
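Stage 2 relies on PPO's clipped surrogate objective. The sketch below is the generic clipped loss (PyTorch), not the authors' implementation; advantage estimation and the value/entropy terms are omitted.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (generic sketch). `logp_*` are
    log-probabilities of the taken actions under the new and behaviour
    policies; `advantages` are estimated advantages, e.g. from GAE."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximises the minimum of the two terms; return a loss to minimise.
    return -torch.min(unclipped, clipped).mean()
```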
6.
Multi-agent deep reinforcement learning is an emerging research focus and application direction in machine learning. It covers numerous algorithms, rules and frameworks and is widely applied in real-world areas such as autonomous driving, energy allocation, formation control, trajectory planning, route planning and social dilemmas, giving it high research value. This survey briefly introduces the basic theory and development history of multi-agent deep reinforcement learning; reviews the existing classical algorithms under four categories: independent (non-interacting), communication-rule-based, mutually cooperative, and modeling-based learning approaches; surveys practical applications of multi-agent deep reinforcement learning and lists the available test platforms; and summarizes the challenges in theory, algorithms and applications as well as future research directions.
7.
Multi-agent reinforcement learning (MARL) is widely used in swarm control, but because the Markov decision model of an individual agent is broken, existing MARL algorithms struggle to learn optimal policies, and the stochasticity of agents during training makes the learned policies unstable. Starting from the mapping from the state space to the action space, this work studies a coupling transformation for homogeneous multi-agent systems to improve policy quality and stability. First, we investigate reorganizing the action spaces of homogeneous agents, breaking the fixed one-to-one correspondence between agents and policies: by constructing an abstract agent, the coupling between agents is transformed into coupling within the same dimension of different agents' action spaces, which improves the training efficiency and stability of the policy network. Then, building on the reorganized policy mapping and taking a sequential decision-making perspective, self-attention modules are designed for the policy network and the evaluation network of the abstract agent to encode and sparsify the agents' state information. After self-attention encoding, the reorganized state information can explicitly explain the agents' decision behavior. The effectiveness of the proposed method is comprehensively verified and analysed on three common multi-agent tasks. Experimental results show that, under centralized rewards, the proposed method learns better policies than the baselines, improving the average return by 20% and the stability of both training and results by more than 50%. Corresponding ablation experiments also separately verify the abstract agent and the self-…
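To make the "self-attention over agent states" idea concrete, here is a minimal PyTorch encoder in which each agent's embedding attends over all agents' embeddings; the layer sizes and the use of `nn.MultiheadAttention` are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AgentStateEncoder(nn.Module):
    """Encodes per-agent state vectors with self-attention before the policy
    head (an illustrative sketch; dimensions are assumptions)."""
    def __init__(self, state_dim=16, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, states):              # states: (batch, n_agents, state_dim)
        x = self.embed(states)
        # Each agent attends over all agents' embeddings; the attention
        # weights hint at which agents influenced the decision.
        encoded, weights = self.attn(x, x, x)
        return encoded, weights
```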
8.
Multi-agent path finding (MAPF) is the problem of planning paths for multiple agents, with the key constraint that the agents can follow their planned paths simultaneously without colliding. MAPF has many applications in logistics, military and security domains. This survey systematically organizes and classifies the main domestic and international research results on MAPF. According to the planning scheme, MAPF algorithms are divided into centralized planning algorithms and distributed execution algorithms. Centralized planning algorithms are the most classical and most widely used MAPF algorithms and fall into four families: A*-based search, conflict-based search, increasing cost tree search, and reduction-based algorithms. Distributed execution algorithms are reinforcement-learning-based MAPF algorithms arising from artificial intelligence and, according to the improvement technique, are divided into expert-demonstration, improved-communication, and task-decomposition algorithms. Based on this classification, the characteristics and applicability of the various MAPF algorithms are compared, their strengths and weaknesses are analysed, the challenges facing existing algorithms are pointed out, and future work is discussed.
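A building block shared by many of the centralized planners listed above, conflict-based search in particular, is detecting the first vertex or edge conflict between planned paths; a minimal version is sketched below, with the path representation chosen for illustration.

```python
def find_first_conflict(paths):
    """Returns the first vertex or edge (swap) conflict between agents'
    paths, or None (a generic MAPF building block, e.g. inside CBS).
    `paths` maps agent id -> list of grid cells, one per time step."""
    horizon = max(len(p) for p in paths.values())
    agents = list(paths)

    def at(agent, t):                       # agents wait at their goal
        p = paths[agent]
        return p[min(t, len(p) - 1)]

    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                if at(a, t) == at(b, t):                      # vertex conflict
                    return ("vertex", a, b, at(a, t), t)
                if t > 0 and at(a, t) == at(b, t - 1) \
                        and at(b, t) == at(a, t - 1):         # edge conflict
                    return ("edge", a, b, (at(a, t - 1), at(a, t)), t)
    return None
```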
9.
Ship collision avoidance is the primary problem to be solved in intelligent navigation. In multi-ship encounter situations, the collision risk can only be reduced effectively if the ships cooperate and plan their avoidance strategies jointly. To make intelligent ship collision-avoidance strategies cooperative, safe and practical, a ship cooperative collision-avoidance decision method based on multi-agent deep reinforcement learning is proposed. First, a method for identifying ship encounter situations is studied and multi-ship avoidance strategies that comply with the International Regulations for Preventing Collisions at Sea (COLREGs) are designed. Second, the cooperation mode of multiple ship agents is studied and a cooperative collision-avoidance decision model is built: an attention-based reasoning method extracts the key data that support avoidance decisions; a memory-driven experience learning method accumulates interaction experience effectively; and noisy networks and multi-head attention are introduced to strengthen the ship agents' exploration ability. Finally, simulation experiments on multi-ship encounter scenarios are carried out on both an experimental map and a real nautical chart. The results show that, compared with several baseline methods, the proposed avoidance strategy achieves competitive cooperation and safety while meeting practicality requirements, providing a new solution for improving intelligent ship navigation and ensuring navigation safety.
10.
As more and more renewable-energy generators join the bidding in electricity spot markets, every generator faces the problem of adjusting its bidding strategy to maximize its own profit. To address this problem, the MARL-SCCP model based on multi-agent reinforcement learning is proposed to simulate the generators' bidding behavior and learn bidding strategies. The model first represents each generator as an agent. Stochastic chance-constrained programming is then used to handle the uncertainty of wind-power generators. Finally, a neural network is incorporated into the WoLF-PHC algorithm to better handle the agents' large state space and to greatly accelerate the solution. Experiments show that simulating the generators' bidding process with multi-agent reinforcement learning is feasible and that good strategies are learned after relatively few iterations. Under these strategies each generator maximizes its profit, and renewable-energy generators reduce the impact of external uncertainty.
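For reference, the tabular WoLF-PHC rule that the paper augments with a neural network can be sketched as follows; the learning rates and the projection step are the standard textbook choices, not the paper's specific settings.

```python
import numpy as np

def wolf_phc_step(Q, pi, pi_avg, counts, s, a, r, s_next,
                  alpha=0.1, gamma=0.95, delta_w=0.01, delta_l=0.04):
    """One tabular WoLF-PHC update (illustrative). Q, pi and pi_avg are
    (n_states, n_actions) arrays; counts is an (n_states,) visit counter."""
    n_actions = Q.shape[1]
    # 1. Standard Q-learning update.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # 2. Update the average-policy estimate for state s.
    counts[s] += 1
    pi_avg[s] += (pi[s] - pi_avg[s]) / counts[s]
    # 3. Win-or-learn-fast: learn slowly when "winning", fast otherwise.
    winning = pi[s] @ Q[s] > pi_avg[s] @ Q[s]
    delta = delta_w if winning else delta_l
    # 4. Hill-climb toward the greedy action, then re-project to a distribution.
    best = int(np.argmax(Q[s]))
    pi[s] -= delta / (n_actions - 1)
    pi[s, best] += delta + delta / (n_actions - 1)
    pi[s] = np.clip(pi[s], 0.0, None)
    pi[s] /= pi[s].sum()
```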
11.
This paper considers the problems of target tracking and obstacle avoidance for multi-agent systems. To solve the problem that multiple agents cannot effectively track the target while avoiding obstacles in a dynamic environment, a novel control algorithm based on potential functions and behavior rules is proposed. The interactions among agents are also considered. Depending on whether an agent is within its neighbors' area of influence, two kinds of potential functions are presented. The distributed control input of each agent is determined by the relative velocities as well as the relative positions among the agents, the target and the obstacle. The maximum linear speed of the agents is also discussed. Finally, simulation studies demonstrate the performance of the proposed algorithm.
12.
In this paper, the problems of target tracking and obstacle avoidance for multi-agent networks with input constraints are investigated. When there is a moving obstacle, the control objectives are to make the agents track a moving target while avoiding collisions among the agents. First, without considering the input constraints, a novel distributed controller is obtained based on a potential function. Second, the control algorithm is optimized at each sampling time. Furthermore, to solve the problem that agents cannot effectively avoid moving obstacles in a dynamic environment, a new velocity repulsive potential is designed. One advantage of the designed control algorithm is that each agent only requires local knowledge of its neighboring agents. Finally, simulation results verify the effectiveness of the proposed approach.
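Items 11 and 12 both rest on potential-function controllers driven by relative positions and velocities. The sketch below is a generic gradient-style controller of that kind, with illustrative gains and influence radius; it is not the exact control law of either paper.

```python
import numpy as np

def potential_control(pos, vel, target_pos, target_vel, obs_pos, obs_vel,
                      k_att=1.0, k_rep=2.0, k_vel=0.5, d0=3.0):
    """Control input from attractive and repulsive potentials (generic
    sketch; gains and influence radius d0 are illustrative)."""
    # Attractive term pulls the agent toward the moving target, with a
    # velocity-matching component.
    u = -k_att * (pos - target_pos) - k_vel * (vel - target_vel)
    # Repulsive term acts only inside the obstacle's influence radius and
    # also penalises closing velocity toward the obstacle.
    diff = pos - obs_pos
    dist = np.linalg.norm(diff)
    if dist < d0:
        closing = max(0.0, -np.dot(vel - obs_vel, diff) / dist)
        u += k_rep * (1.0 / dist - 1.0 / d0) * diff / dist**3
        u += k_vel * closing * diff / dist
    return u
```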
13.
To let a mobile robot avoid obstacles while tracking a target, a Kinect is used in place of the traditional ranging radar and camera. To deal with the Kinect's blind zones and noise, a statistics-based local map update method is proposed: a dynamically updated local map stores the obstacle information that may affect the robot's motion, and statistical filtering removes the effect of ranging noise, ensuring the validity of the obstacle information. At the same time, an artificial potential field method with an added safety zone removes obstacle information that does not interfere with the robot's motion, improving the ability of the traditional artificial potential field method to pass through narrow passages. Experiments on a differential-drive mobile robot confirm that the system completes the tracking and obstacle-avoidance tasks well, showing that the Kinect can replace traditional ranging sensors.
14.
Lijing Dong, Baihai Zhang, Xiangshun Li, Sing Kiong Nguang. International Journal of Systems Science, 2016, 47(15): 3509-3517
We propose an iterative learning control (ILC) tracking strategy to solve the tracking problem of multi-agent systems with nonlinear dynamics and time-varying communication delays. The distributed tracking strategy, in which each tracking agent only utilises its own and its neighbours' information, enables the tracking agents to successfully track a maneuvering target within a finite time interval despite the time delays. Compared with existing related work, the quantitative relationship between the bound on the tracking errors and the estimate of the time delays is derived. Furthermore, in many practical control problems the identical initialisation condition may not be satisfied, which is known as the initial-shift problem; hence, a forgetting factor is introduced to deal with it. The effectiveness of the presented results is demonstrated through numerical examples.
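For orientation, a standard P-type ILC law with a forgetting factor, the general form such schemes take, can be written as below; the paper's exact law, gains and delay handling may differ.

```latex
% Standard P-type ILC with forgetting factor (illustrative form):
u_{k+1}(t) = (1-\gamma)\,u_k(t) + \gamma\,u_0(t) + \Gamma\,e_k(t+1),
\qquad e_k(t) = x_d(t) - x_k(t)
```

Here $u_k$ is the control input on iteration $k$, $x_d$ the desired trajectory, $\Gamma$ the learning gain, and $\gamma \in [0,1)$ the forgetting factor that attenuates the influence of earlier iterations and of initial shifts.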
15.
This work studies the optimization of local dynamic obstacle-avoidance path planning for mobile robots. Based on a dynamic obstacle's recent position history, a motion-trend prediction algorithm for dynamic obstacles is proposed. During the robot's dynamic obstacle-avoidance path planning, the obstacle's current position is considered and its trajectory is estimated. An improved D* Lite path planning algorithm is then proposed, which greatly improves the efficiency and safety of the robot's dynamic obstacle avoidance. A simulation environment with typical single- and multi-obstacle dynamic scenarios is built, and comparisons verify the effectiveness of the obstacle-avoidance path planning algorithm.
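One simple way to realize the obstacle motion-trend prediction described above is a constant-velocity least-squares fit over the recent track, as sketched below; the paper's actual predictor may differ.

```python
import numpy as np

def predict_obstacle(history, horizon, dt=0.1):
    """Predicts future obstacle positions from its recent track by a
    constant-velocity least-squares fit (illustrative, not the paper's
    method). `history` is an (N, 2) array of past positions sampled
    every `dt` seconds."""
    history = np.asarray(history, dtype=float)
    t = np.arange(len(history)) * dt
    # Fit x(t) and y(t) with straight lines -> estimated velocity and offset.
    coeff_x = np.polyfit(t, history[:, 0], 1)
    coeff_y = np.polyfit(t, history[:, 1], 1)
    future_t = t[-1] + dt * np.arange(1, horizon + 1)
    return np.stack([np.polyval(coeff_x, future_t),
                     np.polyval(coeff_y, future_t)], axis=1)
```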
16.
To further improve the exploration ability of deep reinforcement learning algorithms in continuous-action environments and obtain higher returns, this paper proposes an exploration-enhancement algorithm based on self-generated expert samples. First, to support the self-generated expert-sample mechanism and learning in continuous-action environments, two experience replay buffers are set up on top of the twin delayed deep deterministic policy gradient algorithm, forming the overall framework of the deterministic-policy algorithm. A composite policy update method is also proposed: an on-policy-like learning procedure is added to the inner loop of each episode, during which the agent performs heuristic exploration of the parameter space. Then, a demonstration mechanism based on self-generated expert samples is proposed: the agent itself screens and produces expert samples and keeps adjusting the screening criterion as the parameters are updated, forming a dynamic criterion, after which the agent learns by imitating these expert samples. Simulation experiments on eight virtual environments from OpenAI Gym show that the proposed algorithm effectively improves the exploration ability of deep reinforcement learning.
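The two-buffer structure with a self-adjusting admission criterion can be sketched as follows; the admission rule, threshold update and expert/normal mixing ratio are illustrative assumptions rather than the paper's settings.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Ordinary buffer plus a self-generated 'expert' buffer whose admission
    threshold tracks recent returns (illustrative sketch of the idea)."""
    def __init__(self, capacity=100_000):
        self.normal = deque(maxlen=capacity)
        self.expert = deque(maxlen=capacity // 10)
        self.threshold = float("-inf")

    def add_episode(self, transitions, episode_return):
        self.normal.extend(transitions)
        if episode_return >= self.threshold:          # episode good enough?
            self.expert.extend(transitions)
        # Dynamic criterion: move the threshold toward the best returns seen.
        self.threshold = max(self.threshold, 0.9 * episode_return)

    def sample(self, batch_size, expert_ratio=0.25):
        n_exp = min(int(batch_size * expert_ratio), len(self.expert))
        n_norm = min(batch_size - n_exp, len(self.normal))
        batch = random.sample(list(self.expert), n_exp) if n_exp else []
        batch += random.sample(list(self.normal), n_norm)
        return batch
```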
17.
Naoki Mizuno, Kazunori Ohno, Ryunosuke Hamada, Hiroyoshi Kojima, Jun Fujita, Hisanori Amano. Advanced Robotics, 2013, 27(14): 687-698
The firefighting robot system (FFRS) comprises several autonomous robots that can be deployed to fire disasters in petrochemical complexes. For autonomous navigation, the path planner should consider the robot constraints and characteristics. Specifically, three requirements should be satisfied for a path to be suitable for the FFRS. First, the path must satisfy the maximum curvature constraint. Second, it must be smooth enough for the robots to easily execute the trajectory. Third, it must allow reaching the target location in a specific heading. We propose a path planner that provides smooth paths, satisfies the maximum curvature constraint, and allows a suitable robot heading. The path smoother is based on conjugate gradient descent, and three approaches are proposed for this path planner to meet all the FFRS requirements. The effectiveness of these approaches is qualitatively and quantitatively evaluated by examining the generated paths. Finally, the path planner is applied to an actual robot to verify the suitability of the generated paths for the FFRS, and planning is applied to another type of robot to demonstrate the wide applicability of the proposed planner.
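The core of such a smoother is iterative minimization of a data-fidelity plus smoothness cost over the waypoints. The sketch below uses plain gradient descent instead of conjugate gradient and omits the curvature and heading constraints, so it only illustrates the smoothing step, not the full FFRS planner.

```python
import numpy as np

def smooth_path(path, weight_data=0.5, weight_smooth=0.3,
                iters=200, step=0.1):
    """Smooths a waypoint path by gradient descent on a data-fidelity plus
    smoothness cost (a simplified stand-in for the conjugate-gradient
    smoother described above). `path` is an (N, 2) array of waypoints."""
    orig = np.asarray(path, dtype=float)
    smoothed = orig.copy()
    for _ in range(iters):
        grad = np.zeros_like(smoothed)
        # Pull interior points toward the original path ...
        grad[1:-1] += weight_data * (smoothed[1:-1] - orig[1:-1])
        # ... and toward the midpoint of their neighbours (smoothness term).
        grad[1:-1] += weight_smooth * (2 * smoothed[1:-1]
                                       - smoothed[:-2] - smoothed[2:])
        smoothed -= step * grad   # endpoints stay fixed (their grad is zero)
    return smoothed
```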
18.
An extension-strategy (Extenics-based) approach is applied to mobile robot path planning, and a new path planning algorithm is proposed. When detouring around obstacles, the method introduces temporary targets, imitating how a person chooses a path in an unknown environment; this compresses the environment information effectively and avoids modeling the complex environment during real-time computation. An evaluation function derived from a dependent function based on the safety distance yields smoother paths and lowers the requirements on the robot's own control and on sensor measurement accuracy. Owing to the robustness of this human-mimicking strategy, the oscillation and local-minimum phenomena of other traditional methods are greatly alleviated. Experiments and simulations both show that the method runs in real time and that the planned paths are better than those of existing methods.
19.
When there are obstacles around the target point, a mobile robot using the traditional artificial potential field (APF) cannot reach the target. In addition, the traditional APF is prone to local oscillation in complex terrain such as three collinear points or semi-enclosed obstacles. To address these defects, a novel improved APF algorithm named the back virtual obstacle setting strategy APF is proposed in this paper. The proposed method has two main advantages. First, by redefining the attractive (gravitational) function as a logarithmic function, it allows the mobile robot to reach the target point even when there are obstacles around the target. Second, it avoids falling into local oscillation for both three-collinear-point and semi-enclosed obstacle configurations. Compared with the traditional APF and other improved APF variants, the feasibility of the algorithm is demonstrated through software simulation and practical application.
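A logarithmic attractive potential of the kind mentioned above might look like the sketch below; the exact potential used in the paper may differ, so this only illustrates the shape of the resulting force.

```python
import numpy as np

def attractive_force_log(pos, goal, k_att=1.0, eps=1e-6):
    """Attractive force from a logarithmic attractive potential
    U_att = k_att * ln(1 + ||pos - goal||), one way to realise the
    'gravitational function redefined as a logarithmic function'
    (illustrative; the paper's form may differ). Returns the negative
    gradient, pointing from `pos` toward `goal`."""
    diff = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    dist = np.linalg.norm(diff)
    if dist < eps:
        return np.zeros_like(diff)
    # Force magnitude k_att / (1 + dist): bounded and slowly decaying.
    return k_att / (1.0 + dist) * diff / dist
```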
20.
In this work, we present an optimal cooperative control scheme for a multi-agent system in an unknown dynamic obstacle environment, based on an improved distributed cooperative reinforcement learning (RL) strategy with a three-layer collaborative mechanism. The three layers are the collaborative perception layer, the collaborative control layer, and the collaborative evaluation layer. Collaborative perception expands the perception range of a single agent and improves the agents' early-warning ability for obstacles. Neural networks (NNs) are employed to approximate the cost function and the optimal controller of each agent, and the NN weight matrices are collaboratively optimized to achieve globally optimal performance. The distinguishing feature of the proposed control strategy is that the agents' cooperation is embodied not only in the input of the NNs (the collaborative perception layer) but also in their weight-updating procedure (the collaborative evaluation and collaborative control layers). Comparative simulations demonstrate the effectiveness and performance of the proposed RL-based cooperative control scheme.