Similar Articles
19 similar articles found (search time: 187 ms)
1.
The vehicle-city network comprehensively integrates static and dynamic urban information, providing the data and infrastructure for building vehicle-city integration applications. Based on the vehicle-city network, a design scheme for an intelligent connected vehicle management system is proposed, covering three main components: intelligent connected buses, vehicle-road cooperative information services, and parking information services. The system offers citizens innovative transportation services such as smart buses, autonomous driving, city-wide parking information, and integrated travel information, improving the travel experience and rider satisfaction.

2.
A Multi-Agent Coordination Method Based on Distributed Reinforcement Learning   (Total citations: 2; self-citations: 0; other citations: 2)
范波, 潘泉, 张洪才. 《计算机仿真》, 2005, 22(6): 115-118
Research on multi-agent systems focuses on enabling functionally independent agents to accomplish complex control tasks or solve complex problems through negotiation, coordination, and cooperation. Based on a study and analysis of distributed reinforcement learning algorithms, a multi-agent coordination method is proposed: at the coordination level, the complex system task is decomposed and a coordinating agent assigns subtasks using centralized reinforcement learning; at the behavior level, task agents receive their respective subtasks and use independent reinforcement learning to select effective actions, cooperating to complete the system task. Application and experiments in Robot Soccer simulation matches show that the coordination method based on distributed reinforcement learning outperforms conventional reinforcement learning.
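The two-level scheme in this abstract (a coordinating agent assigning subtasks via centralized reinforcement learning, task agents learning their own actions independently) can be sketched with one tabular Q-learner reused at both levels. This is a hypothetical minimal sketch, not the authors' code; class name, hyperparameters, and state encoding are all illustrative.

```python
import random

class QLearner:
    """Tabular Q-learning, usable both by a coordinating agent (whose
    'actions' are subtask assignments) and by each task agent (whose
    actions are primitive behaviors). Illustrative sketch only."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}                      # (state, action) -> value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def value(self, s, a):
        return self.q.get((s, a), 0.0)

    def choose(self, s):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(s, a))

    def update(self, s, a, r, s_next):
        # standard Q-learning temporal-difference update
        best_next = max(self.value(s_next, a2) for a2 in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(s, a)] = self.value(s, a) + self.alpha * (td_target - self.value(s, a))
```

The coordinator would instantiate `QLearner` over the set of subtask assignments, and each task agent over its own primitive actions, with each learner updated from its own reward signal.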

3.
Bus bunching is a concentrated manifestation of transit system failure. To prevent it, an approximate dynamic programming model of bus route operation is built, enabling dynamic, adaptive control of the route. Solving this model with a Q-learning algorithm and a value function approximated by an artificial neural network yields the optimal control policy for the bus route, i.e., a state-value function that determines holding times at stops from the system state. The new method can not only capture the actual operation of a bus route in detail through a simulation model, but also dynamically integrate online and offline data to adjust the control policy. Numerical analysis verifies the method's effectiveness: compared with the no-control scenario, it both prevents bus bunching and effectively reduces headway variability.
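The control idea above (learning a holding policy from the system state over a simulated bus line) can be illustrated with a toy headway model and tabular Q-learning standing in for the paper's neural-network value approximation. Everything here (the drift dynamics, discretization, rewards, hyperparameters) is an assumption for illustration, not the paper's model.

```python
import random

def simulate_headway(h, hold, target=10.0, noise=1.0):
    """Toy headway dynamics: without control, deviations from the target
    headway grow (bunching instability); holding the bus adds to its
    forward headway. Purely illustrative."""
    drift = 0.2 * (h - target)
    return h + drift + hold - random.uniform(0.0, noise)

def bucket(h, width=2.0):
    """Discretize headway into coarse state buckets."""
    return round(h / width)

def train(episodes=500, actions=(0.0, 1.0, 2.0), alpha=0.3, gamma=0.9):
    """Tabular Q-learning of a holding policy on the toy line."""
    q = {}
    for _ in range(episodes):
        h = random.uniform(5.0, 15.0)
        for _ in range(20):
            s = bucket(h)
            if random.random() < 0.2:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: q.get((s, x), 0.0))
            h2 = simulate_headway(h, a)
            r = -abs(h2 - 10.0)          # penalize headway deviation
            s2 = bucket(h2)
            best = max(q.get((s2, x), 0.0) for x in actions)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best - q.get((s, a), 0.0))
            h = h2
    return q
```

In the paper's setting, the tabular `q` would be replaced by a neural-network value function so that online and offline data can both feed the approximation.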

4.
Parallel Internet of Vehicles: ACP-Based Connected Management and Control of Intelligent Vehicles   (Total citations: 8; self-citations: 7; other citations: 1)
This paper introduces the parallel intelligence methodology into the connected management and control of intelligent vehicles, proposing the concept, framework, functions, and workflow of the Parallel Internet of Vehicles. Through virtual-real interaction, co-evolution, and closed-loop feedback between an artificial Internet of Vehicles and the physical one, the Parallel Internet of Vehicles adds computational experiments and parallel guidance to an intelligent transportation system that integrates humans, vehicles, roads, and the transportation information network. It thereby achieves IoV intelligence combining description, prediction, and guidance, and effectively addresses the management and control of intelligent vehicles in heterogeneous, mobile, converged traffic network environments.

5.
To address traffic congestion under changing environmental conditions, a trajectory reward light (TR-light) model is proposed that combines reinforcement learning, neural networks, multi-agent techniques, and traffic simulation to optimize traffic across multiple intersections. The method has several notable features: traffic organization schemes are formulated around the traffic signals; multi-agent reinforcement learning is used for signal control; signal coordination achieves region-level traffic optimization; and trajectory reconstruction is performed after each agent action, changing vehicle routes without altering OD pairs, with each agent's final reward computed from the scheme and the reconstructed trajectories. Traffic simulation experiments in SUMO and comparisons of traffic indicators verify that the model improves network throughput and traffic conditions across multiple intersections. The experiments show the model is feasible and can effectively alleviate congestion.

6.
臧嵘, 王莉, 史腾飞. 《计算机应用》, 2022, 42(11): 3346-3353
Communication is an important means for effective cooperation among agents in non-omniscient environments, but when the number of agents is large, communication produces redundant messages. To handle communication messages effectively, a multi-agent reinforcement learning algorithm based on attentional message sharing, AMSAC, is proposed. First, a message-sharing network is built among the agents; agents share information through message reading and writing, addressing the lack of communication in non-omniscient, complex-task scenarios. Second, within the message-sharing network, an attentional message-sharing mechanism adaptively processes communication messages, weighting messages from different agents, which addresses the inability of larger multi-agent systems to identify and exploit messages effectively. Then, in the centralized Critic network, a Native Critic updates the Actor network parameters with a temporal-difference (TD) advantage policy gradient, so that agents' action values are evaluated effectively. Finally, during execution, each agent's distributed Actor network makes decisions from its own observations and the information in the message-sharing network. Experiments in the StarCraft II Multi-Agent Challenge (SMAC) environment show that, compared with multi-agent reinforcement learning methods such as Native Actor-Critic (Native AC) and Game Abstraction Communication (GA-Comm), AMSAC improves the average win rate by 4 to 32 percentage points across four scenarios. AMSAC's attentional message-sharing mechanism offers a sound scheme for handling inter-agent communication in multi-agent systems, with broad application prospects in traffic hub control and UAV coordination.
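The attentional message-sharing step described above (weighting incoming messages by their relevance to the receiving agent) can be sketched as scaled dot-product attention. In AMSAC the queries and messages are learned network outputs; here they are plain vectors, purely for illustration.

```python
import math

def attention_read(query, messages):
    """Scaled dot-product attention over the messages written by other
    agents: score each message against the reader's query, softmax the
    scores, and return the weighted sum of messages as the aggregated
    communication input. Minimal hand-coded sketch, not learned."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, msg)) / math.sqrt(d)
              for msg in messages]
    mx = max(scores)                      # stabilize the softmax
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * msg[i] for w, msg in zip(weights, messages))
            for i in range(d)]
```

A message aligned with the reader's query dominates the aggregate, which is the "有侧重地处理" (selective weighting) behavior the abstract refers to.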

7.
To save energy, reduce emissions, and improve efficiency, this paper proposes an IoT-based intelligent bus stopping system. The system uses RFID technology to locate and monitor buses, collects information about passengers waiting for a given bus run through a platform subsystem, and exchanges information between waiting passengers and buses over a wireless sensor network; drivers then stop at a station only when the real-time passenger data for that run calls for it. The system can save energy, reduce vehicle wear, and improve operational efficiency, making bus operation greener, more efficient, and smarter.

8.
Multi-agent deep reinforcement learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems and is an important approach to building systems with swarm intelligence. Existing MADRL research mostly designs algorithms under the assumption of full observability or unlimited communication resources; in practice, however, partial observability is unavoidable: an agent's observation range is usually limited, the environment outside that range is not observed, and coordination among agents becomes difficult. Targeting partial observability in practical scenarios, and following the centralized-training, distributed-execution paradigm, this work extends the Actor-Critic deep reinforcement learning algorithm to multi-agent systems, adds inter-agent communication channels and a gating mechanism, and proposes the recurrent gated multi-agent Actor-Critic (RGMAAC) algorithm. Agents communicate efficiently on the basis of historical action-observation memory sequences, and ultimately make decisions using their local observations, their historical observation memories, and the observations explicitly shared by other agents through the communication channel. A task in which multiple agents must reach target points synchronously and quickly is designed in the multi-agent particle environment, with two reward functions and task scenarios. Experimental results show that when partial observability clearly arises in the task, agents trained with RGMAAC perform well; in terms of stability...

9.
For multi-agent coverage control in unknown environments, an online-learning coverage control algorithm based on sparse Gaussian process regression (SGPR-Lloyd) is proposed. Under a base-station/multi-agent communication framework, the agents collect environmental data and send it to the base station; using an online sparse Gaussian process regression method, the base station predicts the global environment density function from the accumulated data, trains the hyperparameters of the predictive model, and produces prediction variances. Based on those variances, the base station assigns tasks: agents whose Voronoi regions have large prediction variance are assigned learning tasks, while those with small variance are assigned coverage tasks; a suitably designed coverage control law then achieves the online-learning coverage objective. Simulations verify the effectiveness of the proposed algorithm: compared with a learning coverage controller based on conventional GPR, SGPR-Lloyd performs better in prediction accuracy, coverage quality, and runtime efficiency, making it an efficient new coverage control algorithm for unknown environments.
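The coverage half of this loop is a Lloyd-type update: each agent moves to the density-weighted centroid of its Voronoi cell, with the density supplied by the base station's sparse GP prediction. Below is a minimal 1-D sketch in which the density is passed in as a plain function, an assumption standing in for the GP posterior mean; the function and grid names are illustrative.

```python
def lloyd_step(positions, density, xs):
    """One Lloyd iteration in 1-D: each agent moves to the
    density-weighted centroid of its Voronoi cell, computed over the
    discrete grid `xs`. In SGPR-Lloyd the density would come from the
    base station's sparse-GP prediction; here it is any callable."""
    centroids = []
    for p in positions:
        # grid points at least as close to p as to every other agent
        cell = [x for x in xs if all(abs(x - p) <= abs(x - q) for q in positions)]
        mass = sum(density(x) for x in cell)
        centroids.append(sum(x * density(x) for x in cell) / mass if mass else p)
    return centroids
```

Iterating `lloyd_step` drives the agents toward a centroidal Voronoi configuration with respect to the predicted density, which is the coverage objective the abstract refers to.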

10.
Autonomous Navigation of Intelligent Vehicles Based on a Centralized-Decentralized Decision-Making Architecture   (Total citations: 2; self-citations: 0; other citations: 2)
The architecture of an intelligent vehicle is the foundation of the whole vehicle system and must be properly designed before the vehicle is built. To guarantee both real-time performance and intelligence, an architecture based on centralized-decentralized decision making is proposed. It consists of three basic modules: perception, planning and decision, and execution; planning and decision is divided into low-level decentralized decisions and a high-level centralized decision. The decentralized level processes the various kinds of environment information in parallel to obtain local decision results, and the centralized level synthesizes these results into a final decision. Following this design, the architecture was simulated in road environments, and the architecture of a lane recognition and tracking system was actually built on an intelligent vehicle, with system design and real-vehicle tests carried out. Simulation results show that the vehicle can make reasonable decisions from actual environment information and complete lane tracking, distance keeping, and lane changing. Test results show that, under this architecture, the vehicle system can accurately and reliably perform lane recognition, lane tracking, and speed keeping.

11.
Multi-Agent Reinforcement Learning and Its Application to Role Assignment for Soccer Robots   (Total citations: 2; self-citations: 0; other citations: 2)
A robot soccer system is a typical multi-agent system: each robot player's choice of action depends not only on its own state but also on the other players, so implementing robot soccer decision policies with reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on predicting other agents' actions, using a naive Bayes classifier for the prediction, and introduces a policy-sharing mechanism through which agents exchange learned policies to speed up multi-agent reinforcement learning. Finally, the method is applied to dynamic role assignment for soccer robots, achieving division of labor and cooperation among multiple robots.

12.
Bus bunching can seriously damage the stability of a transit system, and the resulting instability degrades its performance. Anti-bunching strategies aim to improve schedule and headway reliability and thereby the level of service. From the perspective of effectively using accurate and reliable information, a generalized framework is proposed to deal with bus bunching and to explain the potential efficiencies of various existing anti-bunching strategies. In particular, a strategy that adaptively determines the holding time and/or adjusts the bus cruising speed when a bus arrives at a stop is studied in depth. The information required by this strategy includes only the arrival time of the current bus at the current stop and the arrival times of the preceding bus at this stop and the next one. The nonlinearity of the boarding process is also taken into account. Numerical analysis and simulation experiments show that the new strategy not only alleviates bus bunching and keeps schedule and headway reliability high with less slack in the schedule, but also yields a relatively high commercial speed for the cruising buses.
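A hedged sketch of the information requirement described above: the controller sees only the current bus's arrival time at this stop and the preceding bus's arrival times at this stop and the next, and returns a holding time plus a paced link travel time. The specific rule, parameter names, and bounds are illustrative assumptions, not the paper's exact control law.

```python
def control(t_arrive, t_prev_here, t_prev_next, target_headway, max_hold=120.0):
    """Adaptive holding + cruising sketch using only the three arrival
    times the strategy requires (seconds). Illustrative rule only.

    t_arrive      -- current bus's arrival at this stop
    t_prev_here   -- preceding bus's arrival at this stop
    t_prev_next   -- preceding bus's arrival at the next stop
    """
    # hold just long enough to restore the target forward headway
    forward_headway = t_arrive - t_prev_here
    hold = min(max(target_headway - forward_headway, 0.0), max_hold)
    # pace the next link so the bus arrives one target headway
    # behind where the preceding bus arrived
    desired_link_time = max(t_prev_next + target_headway - (t_arrive + hold), 1.0)
    return hold, desired_link_time
```

Dividing the link length by `desired_link_time` would give the adjusted cruising speed; a real controller would also cap the speed within operational limits.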

13.
Starting from the need to rapidly build robust multi-agent systems on dynamic, heterogeneous networks, a multi-agent remote-procedure-call communication model is designed. Three basic agent types are defined; the KQML message specification is extended with control over message lifetime; a double-buffered message pusher is designed to push agent messages actively; and the communication framework is implemented on top of WCF. For multi-agent cooperation systems sharing a common goal, a cost-balancing interaction and negotiation strategy is proposed; a worked example shows that, compared with independently running agents and agents using a positive-interaction negotiation strategy, the proposed strategy effectively reduces total system cost and balances the runtime load.

14.
Communication and coordination are at the core of reaching a constructive agreement among multi-agent systems (MASs). Dividing the overall performance of a MAS among individual agents may lead to group learning as opposed to individual learning, which is one of the weak points of MASs. This paper proposes a recursive genetic framework for solving problems with high dynamism. In this framework, a combination of genetic algorithms and multi-agent capabilities is utilised to accelerate team learning and accurate credit assignment. The argumentation feature is used to accomplish agent learning, and the negotiation features of MASs are used to achieve credit assignment. The proposed framework is quite general, and its recursive hierarchical structure can be extended. One special controlling module is dedicated to improving convergence time. Due to the complexity of blackjack, it is applied as a test bed to evaluate the system's performance. The learning rate of the agents is measured, as well as their credit assignment. Analysis of the obtained results suggests that this robust framework, with the proposed negotiation operator, is a promising methodology for solving similar problems in other highly dynamic areas.

15.
鲁斌, 衣楠. 《软件》, 2013, (11): 80-82
This paper first introduces the multi-agent structure of a microgrid control system and the workflow of each agent, then proposes a cooperative learning algorithm for that multi-agent structure. The algorithm improves on Q-learning to make it suitable for hybrid environments. Finally, the IEEE 9-bus system is used as a simulated microgrid; simulation results show that the algorithm can quickly restore power to a stable state when the microgrid's power fluctuates.

16.
An agent that interacts with other agents in multi-agent systems can benefit significantly from adapting to the others. When performing active learning, every agent's action affects the interaction process in two ways: the effect on the expected reward according to the current knowledge held by the agent, and the effect on the acquired knowledge, and hence on future rewards expected to be received. The agent must therefore make a tradeoff between the wish to exploit its current knowledge and the wish to explore other alternatives, to improve its knowledge for better decisions in the future. The goal of this work is to develop exploration strategies for a model-based learning agent to handle its encounters with other agents in a common environment. We first show how to incorporate exploration methods usually used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration: an exploratory action taken by the agent can yield a better model of the other agent, but also carries the risk of putting the agent into a much worse position. We present the lookahead-based exploration strategy that evaluates actions according to their expected utility, their expected contribution to the acquired knowledge, and the risk they carry. Instead of holding one model, the agent maintains a mixed opponent model, a belief distribution over a set of models that reflects its uncertainty about the opponent's strategy. Every action is evaluated according to its long-run contribution to the expected utility and to the knowledge regarding the opponent's strategy. Risky actions are more likely to be detected by considering their expected outcome according to the alternative models of the opponent's behavior.
We present an efficient algorithm that returns an almost optimal exploration plan against the mixed model, and provide a proof of its correctness and an analysis of its complexity. We report experimental results in the Iterated Prisoner's Dilemma domain, comparing the capabilities of the different exploration strategies. The experiments demonstrate the superiority of lookahead-based exploration over other exploration methods.

17.
To address the problems of single-form, passive teaching in current distance education systems, this paper proposes a multi-agent learning system model based on learners' personality factors. Combining intelligent agent technology with an analysis of learner personality factors, the model gives a capability description language for individual agents, proposes new personalized grouping and learning-task allocation strategies, and adopts a compensation mechanism to encourage agent cooperation. Combined with state-space search theory, the MAS gains stronger problem-solving ability, satisfies learners' demand for active learning, and saves system communication to some extent.

18.
徐鹏, 谢广明, 文家燕, 高远. 《智能系统学报》, 2019, 14(1): 93-98
Classical reinforcement learning for multi-agent formation consumes large amounts of communication and computation. This paper introduces an event-triggered control mechanism: agents' action decisions need not be made at fixed periods, but are updated only when an event-triggered condition is met. The triggering condition considers not only an agent's cumulative reward but also the deviation between its reward and those of its neighbors, and agents interact to seek the optimal joint policy that achieves the formation. Numerical simulations show that the event-triggered multi-agent reinforcement learning formation control algorithm effectively reduces the agents' decision frequency and resource consumption while preserving system performance.
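The event-driven condition above can be sketched as a simple trigger test: an agent re-evaluates its action only when its cumulative reward has drifted since the last trigger, or deviates from its neighbors' average, beyond thresholds. Function name, threshold names, and values are illustrative assumptions, not the paper's condition.

```python
def should_update(cum_reward, last_trigger_reward, neighbor_rewards,
                  eps_self=1.0, eps_nbr=0.5):
    """Event-triggered decision condition (sketch): fire when the agent's
    cumulative reward has changed enough since the last trigger, or
    deviates enough from the neighborhood average. Between triggers the
    agent keeps its previous action, saving decisions and communication."""
    avg_nbr = sum(neighbor_rewards) / len(neighbor_rewards)
    return (abs(cum_reward - last_trigger_reward) > eps_self or
            abs(cum_reward - avg_nbr) > eps_nbr)
```

In the formation loop, each agent would call `should_update` each step and only query its policy (and broadcast to neighbors) when it returns `True`.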

19.
This paper proposes a model-free learning scheme for the developmental acquisition of robot kinematic control and dexterous manipulation skills. The approach is based on a nested-hierarchical multi-agent architecture that intuitively encapsulates the topology of robot kinematic chains, where the activity of each independent degree-of-freedom (DOF) is finally mapped onto a distinct agent. Each one of those agents progressively evolves a local kinematic control strategy in a game-theoretic sense, that is, based on a partial (local) view of the whole system topology, which is incrementally updated through a recursive communication process according to the nested-hierarchical topology. Learning is thus approached not through demonstration and training but through an autonomous self-exploration process. A fuzzy reinforcement learning scheme is employed within each agent to enable efficient exploration in a continuous state–action domain. This paper constitutes in fact a proof of concept, demonstrating that global dexterous manipulation skills can indeed evolve through such a distributed iterative learning of local agent sensorimotor mappings. The main motivation behind the development of such an incremental multi-agent topology is to enhance system modularity, to facilitate extensibility to more complex problem domains and to improve robustness with respect to structural variations including unpredictable internal failures. These attributes of the proposed system are assessed in this paper through numerical experiments in different robot manipulation task scenarios, involving both single and multi-robot kinematic chains. The generalisation capacity of the learning scheme is experimentally assessed and robustness properties of the multi-agent system are also evaluated with respect to unpredictable variations in the kinematic topology. 
Furthermore, these numerical experiments demonstrate the scalability properties of the proposed nested-hierarchical architecture, where new agents can be recursively added in the hierarchy to encapsulate individual active DOFs. The results presented in this paper demonstrate the feasibility of such a distributed multi-agent control framework, showing that the solutions which emerge are plausible and near-optimal. Numerical efficiency and computational cost issues are also discussed.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号