Similar Documents
20 similar documents retrieved (search time: 484 ms)
1.
In multi-robot cooperative transport, traditional reinforcement learning algorithms rely purely on numerical analysis and omit a reasoning stage. To address this, independent reinforcement learning for each robot is combined with the Belief-Desire-Intention (BDI) model, giving the multi-robot system logical reasoning capability. Using a nearest-distance rule, the robot closest to an obstacle is designated the leader and directs the motion of the follower robots. An evaluation function that varies with the positions of the multi-robot system and of the nearest obstacle is proposed and combined with reinforcement-learning-based behavior weights; as the robots interact continually with the environment, the behavior weights gradually converge toward the optimum. Simulation results show that the method is feasible and successfully accomplishes the cooperative transport task.
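To make the behavior-weight idea concrete, here is a minimal, hypothetical sketch: a robot keeps one weight per primitive behavior and reinforces whichever behavior earns reward, so the mixture drifts toward the best behavior over repeated interaction. The behavior names, reward values, and update rule are illustrative assumptions, not the paper's actual method.

```python
import random

random.seed(42)
behaviors = ["approach_goal", "avoid_obstacle", "follow_leader"]
weights = {b: 1.0 for b in behaviors}

def choose(weights):
    # Sample a behavior with probability proportional to its weight.
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for b, w in weights.items():
        acc += w
        if r <= acc:
            return b
    return b

def reinforce(weights, b, reward, lr=0.5):
    # Reward raises the chosen behavior's weight; a floor keeps exploration alive.
    weights[b] = max(0.1, weights[b] + lr * reward)

# Simulated feedback: "approach_goal" tends to earn positive reward.
for _ in range(100):
    b = choose(weights)
    reward = 1.0 if b == "approach_goal" else -0.2
    reinforce(weights, b, reward)

print(max(weights, key=weights.get))  # prints approach_goal
```

Because losing behaviors only decay toward the floor while the rewarded one only grows, the weight vector converges to favor the useful behavior, mirroring the paper's "weights gradually trend toward the optimum" claim.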

2.
Deep reinforcement learning has become a research focus in the multi-robot field owing to its strong performance in multi-robot systems. However, in continuously time-varying, unstructured scenarios with unknown risks, traditional methods show weak risk defense and fragile system safety: unknown risks intrude nonlinearly into the robots' state space in the form of adversarial attacks. To address this, a multi-robot reinforcement learning method with an active risk-defense mechanism (APMARL) is proposed. First, based on a partially observable Markov game model, a risk-discrimination mechanism with a shared multi-robot memory pool is established; a risk-state index predicts the safety of the current action in advance, and a matching risk-handling mode is executed adaptively according to the prediction. In particular, for unsafe states under risk intrusion, an Actor-Critic active-defense network architecture with an enhanced attention mechanism is proposed, providing graded amplification of key information and effective defense against dangerous information. Finally, extensive experiments on multi-robot cooperative-adversarial tasks show that the reinforcement learning policy with active risk defense effectively reduces the intrusion risk of adversarial information, improves the execution efficiency of cooperative adversarial tasks, and enhances policy stability and safety.

3.
Reinforcement learning has been widely applied to solve a diverse set of learning tasks, from board games to robot behaviours. In some of them results have been very successful, but other tasks have characteristics that make reinforcement learning harder to apply. One such area is multi-robot learning, which poses two important problems. The first is credit assignment: how to define the reinforcement signal for each robot in a cooperative team on the basis of the results achieved by the whole team. The second is working with large domains, where the amount of data can be large and can differ at each moment of a learning step. This paper studies both issues in a multi-robot environment, showing that domain knowledge and machine learning algorithms can be combined to achieve successful cooperative behaviours.

4.
In this paper, we propose a distributed dynamic correlation matrix based multi-Q (D-DCM-Multi-Q) learning method for multi-robot systems. First, a dynamic correlation matrix is proposed for multi-agent reinforcement learning, which not only considers each individual robot’s Q-value, but also the correlated Q-values of neighboring robots. Then, the theoretical analysis of the system convergence for this D-DCM-Multi-Q method is provided. Various simulations for multi-robot foraging as well as a proof-of-concept experiment with a physical multi-robot system have been conducted to evaluate the proposed D-DCM-Multi-Q method. The extensive simulation/experimental results show the effectiveness, robustness, and stability of the proposed method.
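The core idea of correlating a robot's Q-update with its neighbors can be sketched as follows; the blending rule, shapes, and names here are illustrative assumptions, not the paper's actual D-DCM-Multi-Q equations.

```python
import numpy as np

def correlated_q_update(Q, C, robot, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """TD update whose bootstrap target blends neighbors' value estimates.

    Q : array of shape (n_robots, n_states, n_actions)
    C : row-stochastic correlation matrix, shape (n_robots, n_robots)
    """
    # Next-state value is a correlation-weighted mix over all robots' Q-tables.
    blended_next = sum(C[robot, j] * Q[j, s_next].max() for j in range(Q.shape[0]))
    td_target = r + gamma * blended_next
    Q[robot, s, a] += alpha * (td_target - Q[robot, s, a])
    return Q[robot, s, a]

# Two robots, 3 states, 2 actions; robot 0 weights itself 0.7, its neighbor 0.3.
Q = np.zeros((2, 3, 2))
C = np.array([[0.7, 0.3], [0.3, 0.7]])
q = correlated_q_update(Q, C, robot=0, s=0, a=1, r=1.0, s_next=2)
print(round(q, 3))  # prints 0.1: alpha * (r + gamma * 0) with empty tables
```

When C is the identity matrix this reduces to independent Q-learning, which is one way to see why a correlation matrix generalizes the single-robot update.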

5.
Gu Guochang, Zhong Yu, Zhang Rubo. Robot, 2003, 25(4): 344-348
In a multi-robot system, evaluating the quality of one robot's behavior often depends on the behaviors of the other robots, so joint actions must be used to realize cooperation. However, reinforcement learning over joint actions converges extremely slowly because the learning space is enormous. The new method proposed in this paper reduces the dimensionality of the learning space by predicting the probabilities with which the other robots execute their actions, and is applied to multi-robot cooperation tasks. Experimental results show that the prediction-based accelerated reinforcement learning algorithm obtains cooperative multi-robot policies faster than the original algorithm.
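One simple way to realize the prediction idea, sketched below under illustrative assumptions (the class name, the frequency-count predictor, and the toy task are not from the paper): instead of learning over the joint action space, each robot keeps an empirical model of its teammate's action frequencies and learns over its own actions only.

```python
import numpy as np

class PredictiveLearner:
    """Learns over own actions while predicting a teammate's action distribution."""

    def __init__(self, n_actions, n_other_actions, alpha=0.2):
        self.alpha = alpha
        self.q = np.zeros(n_actions)          # values over OWN actions only
        # Laplace-smoothed counts of the teammate's observed actions.
        self.other_counts = np.ones(n_other_actions)

    def predict_other(self):
        # Empirical probability estimate of the teammate's next action.
        return self.other_counts / self.other_counts.sum()

    def update(self, my_action, other_action, reward):
        self.other_counts[other_action] += 1
        self.q[my_action] += self.alpha * (reward - self.q[my_action])

learner = PredictiveLearner(n_actions=2, n_other_actions=2)
for _ in range(50):
    learner.update(my_action=0, other_action=1, reward=1.0)
probs = learner.predict_other()
print(probs[1] > 0.9)  # prints True: teammate almost always chose action 1
```

The learner's table has size O(|A|) instead of O(|A|^n) for n robots, which is the dimensionality reduction the abstract refers to.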

6.
A multi-agent reinforcement learning algorithm with a fuzzy policy is addressed in this paper. This algorithm is used to deal with control problems in cooperative multi-robot systems. Specifically, a leader-follower robotic system and a flocking system are investigated. In the leader-follower robotic system, the leader robot tries to track a desired trajectory, while the follower robot tries to follow the leader to keep a formation. Two different fuzzy policies are developed for the leader and follower, respectively. In the flocking system, multiple robots adopt the same fuzzy policy to flock. Initial fuzzy policies are manually crafted for these cooperative behaviors. The proposed learning algorithm fine-tunes the parameters of the fuzzy policies through the policy gradient approach to improve control performance. Our simulation results demonstrate that the control performance can be improved after learning.
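A minimal finite-difference policy-gradient loop, in the spirit of tuning a parameterized (e.g. fuzzy) controller by perturbing its parameters and descending the estimated performance gradient. The quadratic "tracking cost" stand-in, the target values, and all names are illustrative assumptions rather than the paper's controller.

```python
def rollout_cost(params, target=(1.0, -0.5)):
    # Stand-in for running the controller and measuring tracking error.
    return sum((p - t) ** 2 for p, t in zip(params, target))

def finite_difference_step(params, lr=0.1, eps=1e-3):
    # Estimate each partial derivative by perturbing one parameter at a time.
    base = rollout_cost(params)
    grad = []
    for i in range(len(params)):
        perturbed = list(params)
        perturbed[i] += eps
        grad.append((rollout_cost(perturbed) - base) / eps)
    return [p - lr * g for p, g in zip(params, grad)]

params = [0.0, 0.0]
for _ in range(200):
    params = finite_difference_step(params)
print(rollout_cost(params) < 1e-3)  # prints True: cost driven near zero
```

Finite differences need only rollouts of the closed-loop system, not an analytic model, which is why this style of tuning suits hand-crafted fuzzy controllers.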

7.
This paper is concerned with the problem of odor source localization using a multi-robot system. A learning particle swarm optimization algorithm, which can coordinate a multi-robot system to locate the odor source, is proposed. First, in order to develop the proposed algorithm, a source probability map for a robot is built and updated using concentration magnitude information, wind information, and swarm information. Based on the source probability map, the new position of the robot can be generated. Second, a distributed coordination architecture, by which the proposed algorithm can run on the multi-robot system, is designed. Specifically, the proposed algorithm is used at the group level to generate a new position for each robot. A consensus algorithm is then adopted at the robot level to control the robot to move from its current position to the new position. Finally, the effectiveness of the proposed algorithm is illustrated on the odor source localization problem.
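A plain particle swarm optimization loop shows the group-level position-generation step in its simplest form: each "robot" is a particle whose next position blends its velocity, personal best, and swarm best. The odor field is replaced by a toy concentration peak at (3, 4); the paper's probability-map, wind, and consensus components are omitted, so this is a generic PSO sketch, not the proposed learning PSO.

```python
import random

random.seed(0)

def concentration(x, y, src=(3.0, 4.0)):
    # Toy stand-in for an odor sensor reading: highest at the source.
    return -((x - src[0]) ** 2 + (y - src[1]) ** 2)

n_robots, steps = 10, 200
pos = [[random.uniform(-10, 10), random.uniform(-10, 10)] for _ in range(n_robots)]
vel = [[0.0, 0.0] for _ in range(n_robots)]
pbest = [p[:] for p in pos]
gbest = max(pbest, key=lambda p: concentration(*p))[:]

w, c1, c2 = 0.5, 1.5, 1.5  # inertia, cognitive, and social coefficients
for _ in range(steps):
    for i in range(n_robots):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if concentration(*pos[i]) > concentration(*pbest[i]):
            pbest[i] = pos[i][:]
            if concentration(*pbest[i]) > concentration(*gbest):
                gbest = pbest[i][:]

print(gbest)  # swarm best converges near the source (3, 4)
```

In the paper's architecture this update would run at the group level, after which each robot's low-level consensus controller drives it to the generated position.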

8.
To address the command latency, low efficiency, and poor cooperation of Internet-based multi-robot systems, a neuronal group network control model for multiple robots is proposed. During learning, neurons of multiple types from different functional areas connect to form dynamic neuronal groups that describe the complex mapping between each robot's motion behaviors, external conditions, and internal states; optimal coordination of multi-robot motion is achieved by evaluating and selecting the internal connection weights. Taking an Internet soccer-robot system as the experimental platform, the learning algorithm is described. Simulation results show that the robots successfully accomplish a coordinated shooting task; the proposed model and method improve multi-robot cooperation while satisfying the system's stability and real-time requirements.

9.
This paper studies network-based cooperation schemes for multiple operators and multiple robots, proposing coordination strategies for both constrained and unconstrained cooperative conditions, and builds a simulation system on a local area network to simulate robot cooperation. For the constrained case, cooperative transport is taken as the research focus: a master-slave robot control method is adopted, and once the workpiece trajectory is determined, an inverse-position kinematics algorithm determines the robots' motions. For the unconstrained case, timestamp and rollback methods are used to coordinate operations. Throughout the simulation system, an AABB-based collision detection algorithm ensures operational safety. Simulation results verify the feasibility and correctness of the coordination strategies.

10.
Multi-Robot Coordinated Action Learning Using a Genetic Algorithm
This paper develops a system in which a genetic algorithm enables multiple robots to learn coordinated actions so as to maximize overall transport. The robots' environment is represented as a graph, the movement rules are optimized with a genetic algorithm, and the number of round trips between two given nodes is taken as the fitness. The environment was constructed and simulated on a computer; the results show that, when learning coordinated actions, the robots yield the right of way to one another as the situation requires.

11.
Shao Jie, Du Lijuan, Yang Jingyu. Computer Science, 2013, 40(8): 249-251, 292
XCS classifier systems have shown strong capability for robot reinforcement learning, but in the multi-robot domain they have been limited to MDP environments and can only handle problems with small state spaces. XCSG is proposed to solve multi-robot reinforcement learning problems. XCSG builds a low-dimensional approximation function, and gradient descent uses online knowledge to keep the approximation stable, so the Q-table remains in a stable, low-dimensional state. The approximate Q-function not only requires less storage but also allows robots to generalize acquired knowledge online. Simulation results show that XCSG effectively addresses the problems of large learning spaces, slow learning, and uncertain learning outcomes in multi-robot learning.
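The idea of replacing a large Q-table with a gradient-descent-trained approximator can be sketched with a linear function approximator. The joint one-hot features below are the tabular special case of a linear approximator; an XCSG-style system would use coarser, generalizing features. The toy task and all names are illustrative assumptions.

```python
import numpy as np

def phi(s, a, n_s=4, n_a=2):
    # Joint one-hot feature vector over (state, action) pairs.
    v = np.zeros(n_s * n_a)
    v[s * n_a + a] = 1.0
    return v

w = np.zeros(8)            # weight vector of the linear Q approximator
alpha, gamma = 0.3, 0.0    # gamma=0 turns TD(0) into a bandit regression

def q(s, a):
    return w @ phi(s, a)

rng = np.random.default_rng(1)
for _ in range(500):
    s = int(rng.integers(4))
    a = int(rng.integers(2))
    r = 1.0 if a == s % 2 else 0.0   # toy task: pick the action matching parity
    # Semi-gradient TD(0) update of the weights.
    w += alpha * (r - q(s, a)) * phi(s, a)

print(all(q(s, s % 2) > q(s, 1 - s % 2) for s in range(4)))  # prints True
```

Because Q is stored as a weight vector rather than a table, memory scales with the feature dimension, which is the storage saving the abstract describes.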

12.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. Through numerous experiments, the performance of the proposed learning algorithms is evaluated.

13.
Intelligent perception and automatic control in complex unknown environments is one of the current research focuses in robot control, and new-generation artificial intelligence makes such intelligent automation possible. In recent years, emerging methods that apply deep reinforcement learning to robot motion control in high-dimensional continuous state-action spaces have attracted the attention of researchers. This survey first reviews the rise and development of deep reinforcement learning, dividing the algorithms used for robot motion control into value-function-based and policy-gradient classes and detailing representative algorithms of each. Second, for the learning process preceding sim-to-real transfer, it briefly introduces five simulation platforms commonly used for deep-reinforcement-learning-based robot motion control. It then reviews research progress in five directions: autonomous navigation, object grasping, gait control, human-robot collaboration, and swarm coordination. Finally, it summarizes the field's future challenges and development trends.

14.
The area of competitive robotic systems usually leads to highly complicated strategies that must be achieved by complex learning architectures, since analytic solutions are impractical or completely infeasible. In this work we design an experiment in order to study and validate a model of the complex phenomena of adaptation. In particular, we study a reinforcement learning problem that comprises a complex predator–protector–prey system composed of three different robots: a pure bio-mimetic reactive (in Brooks’ sense, i.e. without reasoning and representation) predator-like robot, a protector-like robot with reinforcement learning capabilities, and a pure bio-mimetic reactive prey-like robot. From the high-level point of view, we are interested in studying whether the Law of Adaptation is sufficient to model and explain the whole learning process occurring in this multi-robot system. From a low-level point of view, our interest is in the design of a learning system capable of solving such a complex competitive predator–protector–prey system optimally. We show how this learning problem can be addressed and solved effectively by means of a reinforcement learning setup that uses abstract actions to select a goal or target towards which a pure bio-mimetic reactive robot must navigate. The experimental results clearly show that the Law of Adaptation fits this complex learning system and that the proposed reinforcement learning setup is able to find an optimal policy to control the defender robot in its role of protecting the prey against the predator robot.

15.
Multi-Robot Cooperative Localization and Fused Mapping from Orthogonal Air-Ground Views
To address the limited viewpoint of a single robot performing simultaneous localization and mapping (SLAM) in complex scenes, this paper proposes a cooperative localization and fused-mapping method for an aerial UAV and a ground robot with orthogonal air-ground views. Since the UAV's aerial view is orthogonal to the ground robot's view, the core idea is to solve the coordinate-frame transformation between these orthogonal views. First, a framework for UAV-ground cooperative localization and mapping is designed, in which the UAV's global top-down images and the ground robot's local front-view images together provide comprehensive, rich scene information. On this basis, inertial measurement unit data and image information are fused to correct drift and optimize trajectories, and a scale-bearing visual marker on the ground robot is used to obtain the coordinate transformation matrix for map fusion. Finally, experiments in several real scenes verify the effectiveness of the method, making it a useful reference in air-ground cooperative multi-robot SLAM.

16.
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer playing situation. The method successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are shown and a discussion is given.

17.
Knowledge, 2007, 20(3): 310-319
In order to build a heterogeneous multi-robot system that can be regarded as a primitive prototype of a future symbiotic autonomous human-robot system, this paper presents a knowledge model-based heterogeneous multi-robot system implemented on a software platform. Using frame-based knowledge representation, a knowledge model is constructed to describe the features of heterogeneous robots as well as their behaviors according to human requests. The knowledge required for constructing a heterogeneous multi-robot system can therefore be integrated in a single model and shared by robots and users. Based on the knowledge model, the heterogeneous multi-robot system is defined in the Software Platform for Agents and Knowledge Management (SPAK) using XML format. With SPAK, the cooperative operation of the heterogeneous multi-robot system can be carried out flexibly. The proposed system not only integrates heterogeneous robots and various robot techniques, but also automatically performs human-robot interaction and plans robot behaviors, taking into account the different intelligence of the robots corresponding to human requests. In this paper, an actual heterogeneous multi-robot system comprising humanoid robots (Robovie, PINO), a mobile robot (Scout), and an entertainment robot dog (AIBO) is built, and the effectiveness of the proposed system is verified by experiment.

18.
Individuals exchange information, experience, and strategy through communication. Communication is the basis on which individuals form swarms and the bridge by which swarms realize cooperative control. In this paper, multi-robot swarms and their cooperative control and communication methods are reviewed and summarized at the task, control, and perception levels. Based on this survey, the cooperative control and communication methods of intelligent swarms are divided into four categories: task-assignment-based methods (subdivided into market-based and alliance-based methods), bio-inspired methods (subdivided into biochemical-information-inspired, vision-based, and self-organization-based methods), distributed-sensor-fusion-based methods, and reinforcement-learning-based methods; we briefly define each method and introduce its basic ideas. Based on the WOS database, we divide the development of each method into several stages according to the time distribution of the literature and outline the main research content of each stage. Finally, we discuss the communication problems of intelligent swarms and the key issues, challenges, and future work for each method.

19.
Given a collection of parameterized multi-robot controllers associated with individual behaviors designed for particular tasks, this paper considers the problem of how to sequence and instantiate the behaviors for the purpose of completing a more complex, overarching mission. In addition, uncertainties about the environment or even the mission specifications may require the robots to learn, in a cooperative manner, how best to sequence the behaviors. In this paper, we approach this problem by using reinforcement learning to approximate the solution to the computationally intractable sequencing problem, combined with an online gradient descent approach to selecting the individual behavior parameters, while the transitions among behaviors are triggered automatically when the behaviors have reached a desired performance level relative to a task performance cost. To illustrate the effectiveness of the proposed method, it is implemented on a team of differential-drive robots for solving two different missions, namely, convoy protection and object manipulation.

20.
Multi-Robot Path Planning and Collision Avoidance Based on Two-Layer Fuzzy Logic
For the path-planning problem of a communication-free multi-robot system in an unknown dynamic environment, a multi-robot path-planning and dynamic collision-avoidance system based on two-layer fuzzy logic is designed. The heading fuzzy controller takes full account of obstacle distance information and target angle information, converts them into the likelihood of a robot-obstacle collision, and outputs a steering angle to achieve dynamic obstacle avoidance. The speed fuzzy controller takes obstacle distance as input and a speed factor as output, improving the efficiency and robustness of the multi-robot path-planning and collision-avoidance system. The feasibility of the system was verified on physical Pioneer3-DX robots.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号