首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
随着移动互联网的快速发展,许多利用手机App打车的网约车平台也应运而生.这些网约车平台大大减少了网约车的空驶时间和乘客等待时间,从而提高了交通效率.作为平台核心模块,网约车路径规划问题致力于调度空闲的网约车以服务潜在的乘客,从而提升平台的运营效率,近年来受到广泛关注.现有研究主要采用基于值函数的深度强化学习算法(如deep Q-network,DQN)来解决这一问题.然而,由于基于值函数的方法存在局限,无法应用到高维和连续的动作空间.提出了一种具有动作采样策略的执行者-评论者(actor-critic with action sampling policy,AS-AC)算法来学习最优的空驶网约车调度策略,该方法能够感知路网中的供需分布,并根据供需不匹配度来确定最终的调度位置.在纽约市和海口市的网约车订单数据集上的实验表明,该算法取得了比对比算法更低的请求拒绝率.  相似文献   

2.
近年来,网上约车成为人们日常出行不可或缺的一部分。网约车平台的核心任务是如何有效地把订单派送给合适的司机,使得用户总体等待时间尽可能短,而司机的收益尽可能高。在目前的研究中,主要采用贪心算法以及强化学习来构建模型。但当前方法大都只考虑乘客的即时满意度,未能有效地考虑车辆、订单之间相对位置关系,从长远的角度来降低全体乘客的等待时间。为此,将订单派送构建为一个马尔可夫过程,提出了一种基于局部位置感知的多智能体的车辆调度方法。该方法通过设计合适的输入状态和卷积神经网络来捕捉人与车的时空关系,从长远角度来降低乘客的总体等待时间。实验结果表明,在不同规格的地图、不同数量的车辆和订单的场景中,提出的方法均优于现有的研究方法,并且拥有更好的泛化能力。特别是在大规模人车环境的复杂场景中,该方法所取得的结果要明显优于现有方法。  相似文献   

3.
针对目前大多数多智能体强化学习算法在智能体数量增多以及环境动态不稳定的情况下导致的维度爆炸和奖励稀疏的问题,提出了一种基于加权值函数分解的多智能体分层强化学习技能发现算法。首先,该算法将集中训练分散执行的架构与分层强化学习相结合,在上层采用加权值函数分解的方法解决智能体在训练过程中容易忽略最优策略而选择次优策略的问题;其次,在下层采用独立Q学习算法使其能够在多智能体环境中分散式地处理高维复杂的任务;最后,在底层独立Q学习的基础上引入技能发现策略,使智能体之间相互学习互补的技能。分别在简易团队运动和星际争霸Ⅱ两个仿真实验平台上对该算法与多智能体强化学习算法和分层强化学习算法进行对比,实验表明,该算法在奖励回报以及双方对抗胜率等性能指标上都有所提高,提升了整个多智能体系统的决策能力和收敛速度,验证了算法的可行性。  相似文献   

4.
徐诚  殷楠  段世红  何昊  王然 《计算机学报》2022,(11):2306-2320
近年来,强化学习方法在游戏博弈、机器人导航等多种应用领域取得了令人瞩目的成果.随着越来越多的现实场景需要多个智能体完成复杂任务,强化学习的研究领域已逐渐从单一智能体转向多智能体.而在多智能体强化学习问题的研究中,让智能体学会协作成为当前的一大研究热点.在这一过程中,多智能体信用分配问题亟待解决.这是因为部分可观测环境会针对智能体产生的联合动作产生奖励强化信号,并将其用于强化学习网络参数的更新.也就是说,当所有智能体共享一个相同的全局奖励时,难以确定系统中的每一个智能体对整体所做出的贡献.除此之外,当某个智能体提前学习好策略并获得较高的回报时,其他智能体可能停止探索,使得整个系统陷入局部最优.针对这些问题,本文提出了一种简单有效的方法,即基于奖励滤波的信用分配算法.将其他智能体引起的非平稳环境影响建模为噪声,获取集中训练过程中的全局奖励信号,经过滤波后得到每个智能体的局部奖励,用于协调多智能体的行为,更好地实现奖励最大化.我们还提出了基于奖励滤波的多智能体深度强化学习(RF-MADRL)框架,并在Open AI提供的合作导航环境中成功地进行了验证.实验结果表明,和基线算法相比,使用基于奖...  相似文献   

5.
舒凌洲  吴佳  王晨 《计算机应用》2019,39(5):1495-1499
针对城市交通信号控制中如何有效利用相关信息优化交通控制并保证控制算法的适应性和鲁棒性的问题,提出一种基于深度强化学习的交通信号控制算法,利用深度学习网络构造一个智能体来控制整个区域交通。首先通过连续感知交通环境的状态来选择当前状态下可能的最优控制策略,环境的状态由位置矩阵和速度矩阵抽象表示,矩阵表示法有效地抽象出环境中的主要信息并减少了冗余信息;然后智能体以在有限时间内最大化车辆通行全局速度为目标,根据所选策略对交通环境的影响,利用强化学习算法不断修正其内部参数;最后,通过多次迭代,智能体学会如何有效地控制交通。在微观交通仿真软件Vissim中进行的实验表明,对比其他基于深度强化学习的算法,所提算法在全局平均速度、平均等待队长以及算法稳定性方面展现出更好的结果。其中,与基线相比,平均速度提高9%,平均等待队长降低约13.4%。实验结果证明该方法能够适应动态变化的复杂的交通环境。  相似文献   

6.
因网约车订单派送不合理,导致资源利用率和出行效率降低。基于联合Q值函数分解的框架,提出两种订单派送方法ODDRL和LF-ODDRL,高效地将用户订单请求派送给合适的网约车司机,尽可能缩短乘客等待时间。为捕获网约车订单派送场景中随机需求与供应动态变化关系,把城市定义为一张四边形网格的地图,将每辆车视为一个独立的智能体,构建多智能体马尔可夫决策过程模型,通过最大化熵与累计奖励训练智能体。将多智能体的联合Q值函数转化为易分解函数,使联合Q值函数与单个智能体值函数中的动作具有一致性,同时设计动作搜索函数,结合集中训练、分散执行策略的优点,让每辆车以分布式的方式解决订单匹配问题,而不需要与其他车辆进行协调,从而降低复杂性。实验结果表明,相比Random、Greedy、QMIX等方法,所提ODDRL和LF-ODDRL具有较优的扩展性,其中,在500×500网格上,当乘客数为10、车辆数为2时,相对于QMIX方法接送乘客所产生的总时间分别缩短5%和12%。  相似文献   

7.
针对多智能体系统(multi-agent systems,MAS)中环境具有不稳定性、智能体决策相互影响所导致的策略学习困难的问题,提出了一种名为观测空间关系提取(observation relation extraction,ORE)的方法,该方法使用一个完全图来建模MAS中智能体观测空间不同部分之间的关系,并使用注意力机制来计算智能体观测空间不同部分之间关系的重要程度。通过将该方法应用在基于值分解的多智能体强化学习算法上,提出了基于观测空间关系提取的多智能体强化学习算法。在星际争霸微观场景(StarCraft multi-agent challenge,SMAC)上的实验结果表明,与原始算法相比,带有ORE结构的值分解多智能体算法在收敛速度和最终性能方面都有更好的性能。  相似文献   

8.
基于多智能体的Option自动生成算法   总被引:2,自引:0,他引:2  
目前分层强化学习中的任务自动分层都是采用基于单智能体的串行学习算法,为解决串行算法学习速度较慢的问题,以Sutton的Option分层强化学习方法为基础框架,提出了一种基于多智能体的Option自动生成算法,该算法由多智能体合作对状态空间进行并行探测并集中应用aiNet实现免疫聚类产生状态子空间,然后并行学习生成各子空间上的内部策略,最终生成Option. 以二维有障碍栅格空间内2点间最短路径规划为任务背景给出了算法并进行了仿真实验和分析.结果表明,基于多智能体的Option自动生成算法速度明显快于基于单智能体的算法.  相似文献   

9.
针对协作多智能体强化学习中的全局信用分配机制很难捕捉智能体之间的复杂协作关系及无法有效地处理非马尔可夫奖励信号的问题,提出了一种增强的协作多智能体强化学习中的全局信用分配机制。首先,设计了一种新的基于奖励高速路连接的全局信用分配结构,使得智能体在决策时能够考虑其所分得的局部奖励信号与团队的全局奖励信号;其次,通过融合多步奖励信号提出了一种能够适应非马尔可夫奖励的值函数估计方法。在星际争霸微操作实验平台上的多个复杂场景下的实验结果表明:所提方法不仅能够取得先进的性能,同时还能大大提高样本的利用率。  相似文献   

10.
针对多智能体强化学习中因智能体之间的复杂关系所导致的学习效率低及收敛速度慢的问题, 提出基于两级注意力机制的方法MADDPG-Attention, 在MADDPG算法的Critic网络中增加了软硬两级注意力机制, 通过注意力机制学习智能体之间的可借鉴经验, 提升智能体之间的相互学习效率. 由于单层的软注意力机制会给完全不相关的智能体也赋予学习权重, 因此采用硬注意力判断两个智能体之间学习的必要性, 裁减无关信息的智能体, 再用软注意力判断两个智能体间学习的重要性, 按重要性分布来分配学习权重, 据此向有可用经验的智能体学习. 在多智能体粒子的合作导航环境上进行测试, 实验结果表明, MADDPG-Attention算法对复杂关系的理解更为清晰, 在3种环境的导航成功率都达到了90%以上, 有效提高了学习效率, 加快了收敛速度.  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
Abstract  This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号