Similar literature
 20 similar articles found
1.
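Research progress in reinforcement learning for autonomous robots   Total citations: 9 (self-citations: 1, citations by others: 8)
陈卫东, 席裕庚, 顾冬雷. 《机器人》 (Robot), 2001, 23(4): 379-384
Although autonomous robots based on behavior-based control are highly robust, they lack the adaptability that dynamic environments require. Reinforcement learning lets a robot accomplish a task by learning, without the designer having to specify all of the robot's actions in advance. It is a novel learning method developed by combining dynamic programming with supervised learning. Through trial-and-error interaction between the robot and its environment, it uses reward and punishment signals from successful and failed experience to keep improving the robot's performance until the goal is reached, and it tolerates delayed evaluation. Because of its outstanding ability to solve complex problems, reinforcement learning has become a very promising method for robot learning. This paper systematically reviews the state of research on reinforcement learning in autonomous robots, points out existing problems, analyzes several approaches to solving them, and looks ahead to future trends.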

2.

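Mobile crowdsensing (MCS) is a new paradigm that uses large numbers of mobile smart devices for data collection, data mining, and intelligent decision-making; an efficient task allocation method is key to high MCS performance. Traditional greedy or ant algorithms assume that workers and tasks are fixed, so they do not suit scenarios in which the location, number, and timing of workers and tasks change dynamically. Moreover, existing task allocation methods usually have a central server collect worker and task information to make decisions, which easily leaks workers' private information. A privacy-preserving deep reinforcement learning (DRL) model is therefore proposed to obtain an optimized task allocation policy. First, task allocation is modeled as a multi-objective dynamic programming problem that aims to maximize the mutual benefit of workers and the platform and to reach a Nash equilibrium. Second, a DRL-based proximal policy optimization (PPO) model is proposed for training and learning the model parameters. Finally, local differential privacy is used to add random noise to sensitive information such as worker locations, and the central server trains the whole model to obtain the optimal allocation policy. Convergence time, maximum benefit, and task coverage are evaluated experimentally. Results on simulated datasets show that, compared with traditional methods and other DRL-based methods, the proposed method improves markedly on the various evaluation metrics while protecting worker privacy.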
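The local differential privacy step can be illustrated with a minimal Python sketch. The abstract does not say which noise mechanism is used, so the Laplace mechanism here is an assumption, and the names `laplace_noise`, `perturb_location`, `epsilon`, and `sensitivity` are hypothetical. The idea is only that each worker perturbs its own coordinates before anything is sent to the central server.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_location(x, y, epsilon, sensitivity, rng=None):
    """Return a noisy copy of (x, y), computed on the worker's device.

    Assumption (not from the abstract): Laplace mechanism with
    scale = sensitivity / epsilon, where `sensitivity` is the assumed
    L1 range of a location and `epsilon` is the privacy budget.
    """
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return x + laplace_noise(scale, rng), y + laplace_noise(scale, rng)

# Usage: a smaller epsilon means more noise and stronger privacy.
noisy_x, noisy_y = perturb_location(10.0, 20.0, epsilon=1.0, sensitivity=1.0)
```

In this sketch the server only ever sees `noisy_x` and `noisy_y`; how the perturbed locations feed into PPO training is not described in the abstract and is left out.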


3.
Approach to the dynamically reconfigurable robotic system   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper proposes a robotic system, the dynamically reconfigurable robotic system (DRRS), that can be reconfigured for given tasks, so that its flexibility and adaptability to changes in the working environment are much higher than those of conventional robots, which have un-metamorphic shapes and structures. The system consists of many cells that have fundamental mechanical functions. Each cell can detach and combine autonomously, so the system can reorganize itself depending on the task or the working environment and can also repair itself. DRRS has applications in many fields, e.g. maintenance robots, more advanced working robots, free-flying service robots in space, and more evolved flexible automation. This paper presents the concept of the system, the mechanism of the cells, basic experimental results on rough approach control between cells, and a method for deciding the configuration of such cell-structured manipulators. The method is based on the reachability of the manipulators for working points and can therefore also be applied to the design of ordinary manipulators.

4.
Manufacturing companies are in constant need of improved agility. An adequate combination of speed, responsiveness, and business agility to cope with fluctuating raw material costs is essential for today's increasingly demanding markets. Agility in robots is key in operations requiring on-demand control of a robot's tool position and orientation, reducing or eliminating extra programming effort. Vision-based perception using full-state or partial-state observations, together with learning techniques, is useful for creating truly adaptive industrial robots. We propose a Deep Reinforcement Learning (DRL) approach to path-following tasks that uses a simplified virtual environment with domain randomisation to give the agent enough exploration and observation variability during training to generate useful policies that can be transferred to an industrial robot. We validated our approach using a KUKA KR16HW robot equipped with a Fronius GMAW welding machine. The path was manually drawn on two workpieces so that the robot could perceive, learn, and follow it during welding experiments. We also found that the small processing times due to motion prediction (3.5 ms) did not slow down the process, which resulted in smooth robot operation. The novel approach can be implemented on different industrial robots to carry out different tasks requiring material deposition.

5.
Current trends in robotics have led to the development of large-scale multiple robot systems, which are deployed for complex missions. The robots in such a system can communicate and interact with each other for resource sharing and task processing. Many such systems fail despite the availability of the necessary resources, mainly because of a poor coordination mechanism. Task planning, which involves task decomposition and task allocation, is paramount in the design of coordination and cooperation strategies for multiple robot systems. The task allocation mechanism allocates the tasks in a mission to the robots so as to maximize the overall expected performance and thereby reduce the total allocation cost for the team. In this paper, we formulate a heuristic search-based task allocation algorithm for task processing in a heterogeneous multiple robot system that maximizes efficiency in terms of both communication and processing cost. We assume a set of decomposed tasks of a mission that need to be allocated to the robots. Near-optimal allocation schemes are found using the proposed peer structure algorithm for the case where the number of tasks exceeds the number of robots in the system. The cost function is the sum of the static overhead cost of the robots, the assignment cost, and the communication cost between dependent tasks that are assigned to different robots. Experiments verify the effectiveness of the algorithm by comparing it with existing methods in terms of computational time and solution quality. The experimental results show that the proposed algorithm performs best across different problem scales, which indicates that it can scale to larger systems and can work for dynamic multiple robot systems.
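The three-part cost function described in this abstract can be sketched in Python as follows. This is an illustration only, not the authors' implementation: the function name `allocation_cost` and the dictionary layout are hypothetical, and charging the static overhead once per robot that receives at least one task is an assumption, since the abstract does not say how overhead is counted.

```python
from typing import Dict, Tuple

def allocation_cost(
    assignment: Dict[str, str],                 # task -> robot
    overhead: Dict[str, float],                 # robot -> static overhead cost
    assign_cost: Dict[Tuple[str, str], float],  # (task, robot) -> assignment cost
    comm_cost: Dict[Tuple[str, str], float],    # (task_a, task_b) dependent pair -> cost
) -> float:
    # Static overhead: assumed to be paid once for every robot that is used.
    used_robots = set(assignment.values())
    total = sum(overhead[robot] for robot in used_robots)
    # Assignment cost of running each task on its chosen robot.
    total += sum(assign_cost[(task, robot)] for task, robot in assignment.items())
    # Communication cost applies only when dependent tasks sit on different robots.
    total += sum(
        cost
        for (task_a, task_b), cost in comm_cost.items()
        if assignment[task_a] != assignment[task_b]
    )
    return total

# Usage: three tasks on two robots; t1 and t2 share a robot, so only
# the t2-t3 dependency incurs communication cost.
cost = allocation_cost(
    {"t1": "r1", "t2": "r1", "t3": "r2"},
    {"r1": 1.0, "r2": 2.0},
    {("t1", "r1"): 3.0, ("t2", "r1"): 4.0, ("t3", "r2"): 5.0},
    {("t1", "t2"): 10.0, ("t2", "t3"): 7.0},
)
```

A heuristic search such as the one the abstract mentions would call a function like this to score candidate allocations; the search itself is not reproduced here.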

6.
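To enable cooperative operation in swarm robotics, a learning strategy for navigation-type collective behavior in target search is proposed. After a dynamic division-of-labor method with closed-loop regulation allocates tasks and several subgroups form in a self-organized way, a robot behavior learning strategy based on the social learning particle swarm optimization algorithm is introduced within each subgroup. Within the subgroup, each robot independently ranks all members by the perceived signal strength of the commonly intended target and treats robots whose perception is better than its own as behavior demonstrators. It then randomly selects one demonstrator for each dimension of the search space and learns that demonstrator's position coordinate in the corresponding dimension; the resulting learning behavior vector in the search space determines the robot's own motion. Simulation results show that, without having to learn global social experience, robots can cooperate on the commonly intended target of their subgroup and improve search efficiency.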
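The per-dimension demonstrator selection can be sketched in Python. This is a simplified reading of the abstract, not the authors' algorithm: the function name `learning_vector`, the list-based data layout, and the rule that a robot with no better-sensing peer keeps its own position are all assumptions.

```python
import random

def learning_vector(my_index, positions, signals, rng=None):
    """Build one robot's learning behavior vector within its subgroup.

    positions: list of coordinate lists, one per subgroup member.
    signals:   perceived target signal strength of each member.
    """
    if rng is None:
        rng = random.Random()
    # Demonstrators are the members that sense the target better than this robot.
    demonstrators = [i for i, s in enumerate(signals) if s > signals[my_index]]
    if not demonstrators:
        # Assumed fallback: the best-sensing robot keeps its own position.
        return list(positions[my_index])
    dims = len(positions[my_index])
    # For each dimension, copy that coordinate from a randomly chosen demonstrator.
    return [positions[rng.choice(demonstrators)][d] for d in range(dims)]

# Usage: robot 0 senses the weakest signal, so robots 1 and 2 are its demonstrators.
target = learning_vector(
    0,
    positions=[[0.0, 0.0], [4.0, 1.0], [2.0, 5.0]],
    signals=[0.1, 0.6, 0.9],
)
```

How the robot converts the learning vector into a velocity command is not specified in the abstract and is omitted.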

7.
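As the operating environments of mobile robots become more complex, more random, and poorer in information, the motion planning capability of mobile robots faces severe challenges. Studying efficient, autonomous motion planning theory and methods that let mobile robots stay adaptable to complex environments throughout long-term tasks is important for ensuring safe operation and improving task efficiency. Starting from typical applications of mobile robot motion planning, this paper reviews deep reinforcement learning, a motion planning approach better suited to dynamic, complex robot environments. It examines three classes of reinforcement learning motion planning methods (value-based, policy-based, and actor-critic), analyzes in depth the characteristics and practical application scenarios of deep reinforcement learning planning methods, and compares their strengths and weaknesses. It then classifies and summarizes directions for improving and optimizing such algorithms, identifies the challenges and open problems currently facing deep reinforcement learning motion planning, and looks ahead to future directions, providing a reference for the development of intelligent robots.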

8.
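A survey of multi-agent deep reinforcement learning   Total citations: 10 (self-citations: 4, citations by others: 6)
In recent years, deep reinforcement learning (DRL) has achieved major breakthroughs in many complex sequential decision-making problems. Because it combines the powerful representation ability of deep learning with the effective policy search ability of reinforcement learning, DRL has become a promising learning paradigm for realizing artificial intelligence. However, many difficulties and challenges remain in the research and application of DRL in multi-agent systems; multi-agent learning in partially observable environments, typified by StarCraft II, still struggles to reach satisfactory results. This paper briefly introduces DRL algorithms and related techniques typified by deep Q-networks and deep policy gradient algorithms. It also categorizes existing multi-agent DRL algorithms from the perspective of the communication process into three mainstream forms: full communication with centralized decision-making, full communication with autonomous decision-making, and limited communication with autonomous decision-making. It discusses some key issues in multi-agent DRL in terms of training architecture, sample augmentation, robustness, and opponent modeling, and analyzes research hotspots and development prospects.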

9.
In mobile surveillance systems, complex task allocation addresses how to optimally assign a set of surveillance tasks to a set of mobile sensing agents to maximize overall expected performance, taking into account the priorities of the tasks and the skill ratings of the mobile sensors. This paper presents a market-based approach to complex task allocation. Complex tasks are tasks that can be decomposed into subtasks. Both centralized and hierarchical allocations are investigated as winner determination strategies for different levels of allocation and for static and dynamic search tree structures. The objective comparison results show that hierarchical dynamic tree task allocation outperforms all the other techniques, especially in complex surveillance operations where a large number of robots is used to scan a large number of areas.

10.
In this work, we study behavioral specialization in a swarm of autonomous robots. In the studied swarm, robots have to carry out tasks of different types that appear stochastically in time and space in a given environment. We consider a setting in which a robot working repeatedly on tasks of the same type improves its performance on them due to learning. Robots can exploit learning by adapting their task selection behavior, that is, by selecting with higher probability tasks of the type on which they have improved their performance. This adaptation of behavior is called behavioral specialization. We employ a simple task allocation strategy that allows a swarm of robots to behaviorally specialize. We study the influence of different environmental parameters on the performance of the swarm and show that the swarm can exploit learning successfully. However, there is a trade-off between the benefits and the costs of specialization. We study this trade-off in multiple experiments using different swarm sizes. Our experimental results indicate that spatiality has a major influence on the costs and benefits of specialization.

11.
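Parallel multi-task allocation is a highly challenging topic in multi-agent systems. Driven mainly by applications such as resource allocation and disaster emergency management, it studies how to assign a set of tasks to be solved to corresponding agent coalitions for execution. This paper proposes a distributed parallel multi-task allocation algorithm based on self-organizing, self-learning agents. The algorithm introduces P-learning to design a learning model by which a single agent searches for tasks, and it gives communication and negotiation strategies among agents. Comparative experiments show that the algorithm not only quickly finds a solving coalition for each task but also states explicitly the actual resource contribution of each agent in the coalition, thereby providing a valuable reference for practical control and decision-making tasks.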

12.
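With the development of edge computing, the computing scale of edge nodes keeps growing, existing edge devices struggle to host deep neural network models, and network communication and cloud servers are under heavy pressure. To address these problems, the Roofline model is improved, and the new model is used to evaluate dynamically the performance of edge devices and the network environment. Based on the evaluation metrics, the neural network model is split: part of the computation is assigned to edge nodes, and the cloud server completes the remaining tasks using the data returned by the nodes. The method allocates tasks dynamically according to each node's own performance and the network environment, and offers a degree of compatibility and robustness. Experimental results show that this edge-node-based task allocation method for deep neural networks can exploit idle device performance in different environments and greatly reduce the computational load on the central server.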

13.
Nowadays, robots generally have a variety of capabilities and often form coalitions that replace humans in dangerous work such as rescue and exploration. Under these operating conditions, the robots' energy supply usually cannot be guaranteed. If the energy resources of some robots are consumed too fast, the number of tasks the coalition can complete in the future will be affected. This paper develops a novel task allocation method based on the Gini coefficient that makes full use of the limited energy resources of a multi-robot system to maximize the number of completed tasks. To account for resource consumption as well, we incorporate a market-based allocation mechanism into the Gini coefficient-based method and propose a hybrid method that can flexibly optimize the number of completed tasks and the resource consumption according to the application context. Experiments show that a multi-robot system with limited energy resources can accomplish more tasks with the proposed Gini coefficient-based method, and that the hybrid method adapts dynamically to changes in the work environment and achieves both optimization goals.
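To make the Gini-based idea concrete, the Python sketch below computes the Gini coefficient of the robots' remaining energy and picks the robot whose selection leaves energy most evenly distributed. The `gini` function uses the standard sorted-values formula. The selection rule in `pick_robot` is an assumption made for illustration, since the abstract does not give the actual allocation rule, and both function names are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

def pick_robot(energy, task_cost):
    """Choose the robot whose assignment keeps remaining energy most equal.

    energy:    robot -> remaining energy
    task_cost: robot -> energy this robot would spend on the task
    Assumed rule (not from the abstract): minimize the Gini coefficient
    of the energy levels after the assignment.
    """
    best_robot, best_gini = None, None
    for robot, need in task_cost.items():
        if energy[robot] < need:
            continue  # this robot cannot afford the task
        after = dict(energy)
        after[robot] -= need
        g = gini(after.values())
        if best_gini is None or g < best_gini:
            best_robot, best_gini = robot, g
    return best_robot

# Usage: robot "a" has the most energy left, so giving it the task
# keeps the team's energy levels closest to each other.
chosen = pick_robot({"a": 10.0, "b": 4.0, "c": 4.0}, {"a": 3.0, "b": 3.0, "c": 3.0})
```

A low Gini coefficient here means no single robot is drained early, which matches the abstract's motivation of keeping the coalition able to take on future tasks.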

14.
A rearrangement problem involving multiple mobile robots is addressed in this paper. In this problem, it is important to identify task decomposition, task allocation, and path planning applicable to distinct environments while rearrangement tasks are executed. We define 'task apportionment' as an operation that sets up task decomposition and conducts task allocation based on that setup. We propose a method for task apportionment and path planning applicable to distinct environments. The method establishes the necessary intermediate configurations of objects as one way of task decomposition and determines task allocation and path planning as a semi-optimized solution by using simulated annealing. The proposed method is compared with a continuous transportation method and a territorial method through simulations and experiments. In the simulations, the proposed method is, on average, 17% and 20% faster than the continuous transportation method and the territorial method, respectively. In the experiments, the proposed method is, on average, 22% and 16% faster than the continuous transportation method and the territorial method, respectively. These results show that the proposed method can realize efficient rearrangement by mobile robots in various working environments within feasible computation time, especially in environments with a mixture of wide and narrow areas and an uneven distribution of objects.

15.
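王童, 李骜, 宋海荦, 刘伟, 王明会. 《控制与决策》 (Control and Decision), 2022, 37(11): 2799-2807
To address the poor navigation performance of existing hierarchical navigation methods based on deep reinforcement learning (DRL) in complex environments containing structures such as long corridors and dead ends, a mobile robot navigation method based on option-based hierarchical deep reinforcement learning (HDRL) is proposed. The model framework has a high level and a low level. At the low level, an obstacle avoidance control model and a goal-driven control model implement the two behavior policies of avoiding obstacles and approaching the goal. At the high level, a behavior selection model automatically learns a stable, reliable behavior selection policy, which avoids reliance on hand-designed regulation rules. In addition, the method optimizes the training of the obstacle avoidance control model so that the learned avoidance policy is better suited to navigation in complex environments. In comparative experiments against existing DRL methods, the proposed method achieves the highest navigation success rate in all simulated test environments and also holds an overall advantage on the other metrics, indicating that it effectively addresses poor navigation in complex environments and generalizes well. Tests in real environments further confirm the method's potential practical value.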

16.
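DFS-based multi-agent dynamic task allocation algorithm   Total citations: 1 (self-citations: 1, citations by others: 0)
陈凤, 先晓兵. 《计算机工程》 (Computer Engineering), 2009, 35(14): 230-232
To address the shortcomings of task allocation algorithms when applied to uncertain dynamic environments, this paper studies task environments with dynamic fuzzy characteristics and, drawing on dynamic fuzzy set theory, presents a multi-agent dynamic task allocation algorithm and tests it on examples. The test results show that the algorithm model can reasonably simulate the task allocation process in a multi-agent system (MAS) and obtains an optimal task allocation strategy with good task execution results.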

17.
Distributed Coordination in Heterogeneous Multi-Robot Systems   Total citations: 1 (self-citations: 0, citations by others: 1)
Coordination in multi-robot systems is a very active research field in Artificial Intelligence and Robotics, since coordination enables more effective execution of the robots' tasks. In this paper we present an approach to distributed coordination of a multi-robot system that is based on dynamic role assignment. The approach relies on the broadcast communication of utility functions that define the capability of every robot to perform a task and on the execution of a coordination protocol for dynamic role assignment. The presented method is robust to communication failures and suitable for application in dynamic environments. In addition to experimental results showing the effectiveness of our approach, the method has been successfully implemented within the team of heterogeneous robots Azzurra Robot Team in the very dynamic, hostile environment provided by the RoboCup robotic soccer competitions.

18.
Zweig, Alon; Chechik, Gal. Machine Learning, 2017, 106(9-10): 1747-1770

Sharing information among multiple learning agents can accelerate learning. It could be particularly useful if learners operate in continuously changing environments, because a learner could benefit from previous experience of another learner to adapt to their new environment. Such group-adaptive learning has numerous applications, from predicting financial time-series, through content recommendation systems, to visual understanding for adaptive autonomous agents. Here we address the problem in the context of online adaptive learning. We formally define the learning settings of Group Online Adaptive Learning and derive an algorithm named Shared Online Adaptive Learning (SOAL) to address it. SOAL avoids explicitly modeling changes or their dynamics, and instead shares information continuously. The key idea is that learners share a common small pool of experts, which they can use in a weighted adaptive way. We define group adaptive regret and prove that SOAL maintains known bounds on the adaptive regret obtained for single adaptive learners. Furthermore, it quickly adapts when learning tasks are related to each other. We demonstrate the benefits of the approach for two domains: vision and text. First, in the visual domain, we study a visual navigation task where a robot learns to navigate based on outdoor video scenes. We show how navigation can improve when knowledge from other robots in related scenes is available. Second, in the text domain, we create a new dataset for the task of assigning submitted papers to relevant editors. This is, inherently, an adaptive learning task due to the dynamic nature of research fields evolving in time. We show how learning to assign editors improves when knowledge from other editors is available. Together, these results demonstrate the benefits for sharing information across learners in concurrently changing environments.


19.
This paper describes an adaptive task assignment method for a team of fully distributed mobile robots with initially identical functionalities in unknown task environments. A hierarchical assignment architecture is established for each individual robot. At the higher level, we employ a simple self-reinforcement learning model inspired by the behavior of social insects to differentiate the initially identical robots into "specialists" of different task types, resulting in a stable and flexible division of labor. At the lower level, to handle cooperation among robots engaged in the same type of task, the Ant System algorithm is adopted to organize task assignment. To avoid a centralized component, a "local blackboard" communication mechanism is used for knowledge sharing. The proposed method allows the robot team members to adapt to unknown dynamic environments, to respond flexibly to environmental perturbations, and to respond robustly to changes in the team arising from mechanical failure. The effectiveness of the method is validated in two different task domains: a cooperative concurrent foraging task and a cooperative collection task.

20.
Robotics for both industrial and home use has recently made much progress. In this research, a group of robots is made to handle relatively complicated tasks. Cooperative action among robots is one of the research areas in robotics that is progressing remarkably well. Reinforcement learning is a common approach in robotics for acquiring actions in dynamic environments. However, until recently, reinforcement learning was applied only to single-agent problems. In a multi-agent environment where several robots exist, it was difficult to differentiate between learning to achieve the task and learning to perform cooperative action. This paper introduces a method of applying reinforcement learning to induce cooperation among a group of robots whose task is to transport luggage of various weights to a destination. The general Q-learning method is used as the learning algorithm. In addition, switching of the learning mode is proposed to reduce learning time and learning area. Finally, a grid world simulation is carried out to evaluate the proposed methods.
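For readers unfamiliar with the "general Q-learning method" mentioned here, the following Python sketch shows the standard tabular update rule on grid-world states. It is the textbook single-agent rule, not the paper's cooperative scheme or its learning-mode switching; the helper names `make_q_table` and `q_update`, the action names, and the parameter values are illustrative assumptions.

```python
from collections import defaultdict

def make_q_table(actions):
    # Every unseen state starts with a value of 0.0 for each action.
    return defaultdict(lambda: {a: 0.0 for a in actions})

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Usage: grid cells are (row, col) tuples; the robot is rewarded for
# moving from cell (0, 1) into the destination cell (0, 2).
q = make_q_table(["up", "down", "left", "right"])
q_update(q, state=(0, 1), action="right", reward=1.0, next_state=(0, 2), alpha=0.5)
q_update(q, state=(0, 0), action="right", reward=0.0, next_state=(0, 1), alpha=0.5)
```

After the second call, the value learned at cell (0, 1) has been propagated one step back to cell (0, 0), which is how delayed rewards reach earlier states.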
