Similar Literature
 Found 18 similar documents (search time: 453 ms)
1.
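One of the key difficulties in reinforcement learning research is balancing exploration of unknown actions against exploitation of the best known action. Bayesian learning is a probabilistic approach that reasons from known probability distributions and observed data to make optimal decisions. Combining reinforcement learning with Bayesian learning therefore lets an agent use its existing experience and newly acquired knowledge to choose a strategy: explore unknown actions or exploit the best known action. This paper surveys single-agent and multi-agent Bayesian reinforcement learning methods. Single-agent Bayesian reinforcement learning includes Bayesian Q-learning, Bayesian model learning, and Bayesian dynamic programming; multi-agent Bayesian reinforcement learning includes Bayesian imitation models, Bayesian coordination methods, and Bayesian learning for coalition formation under uncertainty. Finally, the paper identifies open problems for Bayesian methods in reinforcement learning.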
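
The exploration/exploitation trade-off described in this abstract can be illustrated with a minimal sketch. The abstract does not give an algorithm, so the example below swaps in Beta-Bernoulli Thompson sampling on a multi-armed bandit, one standard Bayesian approach. The function name `thompson_sampling`, the arm probabilities, and the step count are all hypothetical choices made for illustration.

```python
import random

def thompson_sampling(true_probs, steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a hypothetical bandit.

    true_probs holds the (hidden) success probability of each arm;
    the agent only ever sees the 0/1 rewards it receives.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # Beta parameter: 1 + observed successes
    beta = [1.0] * n   # Beta parameter: 1 + observed failures
    pulls = [0] * n
    for _ in range(steps):
        # Draw one plausible success rate per arm from its posterior.
        # Uncertain arms produce widely spread samples, so they still
        # get tried (exploration); well-known good arms win most draws
        # (exploitation).
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Bayesian update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta
```

Calling `thompson_sampling([0.2, 0.5, 0.8])` returns the pull count per arm together with the posterior parameters. No explicit exploration rate is needed, because the width of each posterior decides how often an arm is tried.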

2.
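A Learning Agent Based on Reinforcement Learning   Total citations: 24, self-citations: 2, citations by others: 22
Reinforcement learning learns an optimal behavior policy for a dynamic system by perceiving the environment state and receiving uncertain reward values from the environment, and it is one of the core techniques for building intelligent agents. This paper extends the BDI model in the agent-oriented development environment AODE by introducing policy and capability as mental components, and uses reinforcement learning to implement the policy construction function, yielding a learning agent based on reinforcement learning. It studies the structure and operation of adaptive agents in AODE, giving intelligent agents the ability to learn online in dynamic environments and to effectively satisfy the agent's various mental requirements.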

3.
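This paper studies path planning for multiple agents under complex tasks and proposes a multi-agent path planning method based on reinforcement learning. The method uses a model-free online Q-learning algorithm: multiple agents repeat an "explore-learn-exploit" cycle, accumulating experience to evaluate action policies and refine decisions, and thereby complete multi-agent path planning in an unknown environment. Simulation results show that, compared with a single-agent reinforcement-learning path planning method, the proposed method reduces the total number of exploration steps by 17.4% and produces shortest paths to the target points, while the agents avoid colliding with one another and with obstacles.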
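
The "explore-learn-exploit" cycle can be sketched with tabular Q-learning. This is a simplified single-agent illustration on a small grid, not the multi-agent method of the paper. The grid layout, the reward values, and the names `step`, `train`, and `greedy_path` are assumptions made for the example.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action, size, obstacles, goal):
    """Hypothetical grid dynamics: returns (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < size and 0 <= c < size) or (r, c) in obstacles:
        return state, -5.0, False  # blocked move: stay put, pay a penalty
    if (r, c) == goal:
        return (r, c), 10.0, True
    return (r, c), -1.0, False     # ordinary move: small step cost

def train(size, obstacles, start, goal, episodes=2000,
          alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action index) -> estimated return, default 0.0
    for _ in range(episodes):
        state = start
        for _ in range(200):  # cap on episode length
            if rng.random() < eps:   # explore: random action
                a = rng.randrange(4)
            else:                    # exploit: best known action
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))
            nxt, reward, done = step(state, ACTIONS[a], size, obstacles, goal)
            best_next = 0.0 if done else max(
                q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            # learn: standard Q-learning temporal-difference update
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q

def greedy_path(q, size, obstacles, start, goal, limit=50):
    """Follow the learned greedy policy from start, up to `limit` moves."""
    path, state = [start], start
    while state != goal and len(path) <= limit:
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _, _ = step(state, ACTIONS[a], size, obstacles, goal)
        path.append(state)
    return path
```

A typical use is `q = train(4, {(1, 1), (2, 1)}, (0, 0), (3, 3))` followed by `greedy_path(q, 4, {(1, 1), (2, 1)}, (0, 0), (3, 3))`. Extending this to several agents would at least require treating other agents as moving obstacles, which the sketch does not attempt.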

4.
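Yang Yang, Chen Xiaoping 《计算机科学》 (Computer Science) 2005, 32(1): 151-154
This paper proposes a hierarchical decision-making architecture for agents, aiming to solve real-time decision problems for agents in dynamic, uncertain environments through layered decision making. The upper layer of the model uses a BDI structure to provide full support for planning and reasoning about longer-term tasks; the lower layer uses a reactive structure to guarantee timely responses to short-term real-time tasks. Experimental results demonstrate the effectiveness of this hierarchical model in some complex task domains.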
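
The two-layer idea can be sketched as follows: reactive rules take priority on every tick, and a deliberative layer commits to a longer-term intention when no reflex fires. The class `LayeredAgent`, its fields, and the rule format are hypothetical; the paper's actual architecture is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LayeredAgent:
    # Low layer: (condition, action) rules checked on every tick.
    reflexes: list = field(default_factory=list)
    # High layer: (goal, priority, is_achievable) options to commit to.
    desires: list = field(default_factory=list)
    beliefs: dict = field(default_factory=dict)
    intention: Optional[str] = None

    def deliberate(self):
        """Commit to the highest-priority desire achievable under current beliefs."""
        options = [(p, g) for g, p, ok in self.desires if ok(self.beliefs)]
        self.intention = max(options)[1] if options else None

    def act(self, percept):
        self.beliefs.update(percept)
        # The reactive layer answers first, so urgent events never
        # wait for the slower deliberation step.
        for condition, action in self.reflexes:
            if condition(self.beliefs):
                return action
        if self.intention is None:
            self.deliberate()
        return f"pursue:{self.intention}" if self.intention else "idle"
```

In this sketch an intention, once chosen, is kept until it is reset. That is one simple commitment strategy, and a fuller BDI layer would also reconsider intentions when beliefs change.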

5.
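Agent learning theory is currently an active research topic. Based on dynamic fuzzy sets (DFS) and focusing on the mental attributes of agents, this paper proposes an agent learning model, constructs a hybrid agent architecture under the model, and describes the model's working mechanism. Finally, it implements the model's policy construction function using DFS and reinforcement learning, giving the agent the ability to adapt to dynamic environments and to learn online.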

6.
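Exploiting the autonomy and real-time reactivity of multi-agent systems, this paper examines multi-agent negotiation and decision making in adversarial environments. It proposes a hybrid multi-agent negotiation model and presents a negotiation strategy that maximizes team utility, together with an algorithm for exchanging negotiation roles. Through negotiation, agents in an adversarial environment can select action strategies and move more effectively, improving both attack and defense. Simulation experiments verify the feasibility and effectiveness of the algorithm; the results show that it addresses, to a certain extent, multi-agent decision making and cooperation in adversarial environments with real-time dynamics and restricted communication.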

7.
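Research on a Multi-Agent Dynamic Cooperation Model Based on the BDI Framework and Its Application   Total citations: 8, self-citations: 0, citations by others: 8
In recent years, multi-agent learning has become one of the fastest-growing areas of artificial intelligence and machine learning research. This paper combines reinforcement learning with the BDI mental-state model to form a dynamic cooperation model for multiple agents. In this model the notion of individual optimality loses its meaning, because each agent's payoff depends not only on its own choices but also on the choices of the other agents. The model uses an AFS neural network to compress the input state space, which speeds up the convergence of reinforcement learning. In addition, simulated annealing heuristically guides the search direction in the action space, so that the search escapes local minima and the number of iterations does not grow without bound. Theoretical analysis and a successful application to robot soccer both demonstrate the effectiveness of the BDI-based multi-agent dynamic cooperation model.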

8.
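To address the weakness of the BDI model in uncertain domains, this paper proposes an improved BDI (Beliefs-Desires-Intentions) model. Building on the BDI architecture and combining it with Bayesian networks, it establishes the agent's cognitive representation of possible worlds; Bayesian network inference provides an expected-value estimate of reaching the goal state; and a planning factor introduced into the intention component completes behavioral decision making. A crop cultivation example was simulated in Java on the Eclipse platform, verifying that the model can perceive, reason, and plan rationally in an uncertain environment.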

9.
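Multi-agent systems have been a popular research area in recent years, and Q-learning is one of the best-known and most widely used reinforcement learning algorithms. Building on single-agent Q-learning, this paper proposes a new cooperative learning algorithm and, based on it, a new architecture model for multi-agent systems. The main features of the architecture are a knowledge-sharing mechanism, a team-structure concept, and the introduction of a service-provider concept. Simulation experiments illustrate the advantages of the architecture.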

10.
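Using reinforcement learning, this paper first discusses the learning process of a single mobile agent and then extends it to multiple mobile agents, proposing a multi-mobile-agent learning algorithm, MMAL (Multi Mobile Agent Learning). The algorithm takes the characteristics of mobile agent learning fully into account: it lets mobile agents make decisions in contexts with uncertainty and conflicting goals, resolves the agent's choice of when to move during learning, and greatly reduces computational cost. The aim is to enable agents to learn autonomously and cooperatively in stochastic, dynamic environments. Finally, simulation experiments show that the algorithm is an efficient and fast learning method.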

11.
J. M. Corchado, M. Glez-Bedia, Y. de Paz, J. Bajo, J. F. de Paz 《Computational Intelligence》 2008, 24(2): 77-107
This paper proposes a replanning mechanism for deliberative agents as a new approach to tackling the frame problem. We propose a beliefs, desires, and intentions (BDI) agent architecture using a case-based planning (CBP) mechanism for reasoning. We discuss the characteristics of the problems faced with planning where constraint satisfaction problems (CSP) resources are limited and formulate, through variation techniques, a reasoning model agent to resolve them. The proposed agent design, named MRP-Ag (most-replanable agent), is evaluated in different environments using a series of simulation experiments and compared with others such as E-Ag (Efficient Agent) and O-Ag (Optimum Agent). Last, the most important results are summarized, and the notion of an adaptable agent is introduced.

12.
We present a temporal reasoning mechanism for an individual agent situated in a dynamic environment such as the web and collaborating with other agents while interleaving planning and acting. Building a collaborative agent that can flexibly achieve its goals in changing environments requires a blending of real-time computing and AI technologies. Therefore, our mechanism consists of an Artificial Intelligence (AI) planning subsystem and a Real-Time (RT) scheduling subsystem. The AI planning subsystem is based on a model for collaborative planning and generates a partial-order plan dynamically. During planning it sends the RT scheduling subsystem basic actions and time constraints. The RT scheduling subsystem receives the dynamic set of basic actions with associated temporal constraints and inserts these actions into the agent's schedule of activities in such a way that the resulting schedule is feasible and satisfies the temporal constraints. Our mechanism allows the agent to construct its individual schedule independently. The mechanism handles various types of temporal constraints arising from the agent's individual activities and from its collaborators. In contrast to other works on scheduling in planning systems, which are either not appropriate for uncertain and dynamic environments or cannot be expanded for use in multi-agent systems, our mechanism enables the individual agent to determine the time of its activities in uncertain situations and to easily integrate its activities with the activities of other agents. We have proved that under certain conditions the temporal reasoning mechanism of the AI planning subsystem is sound and complete. We show the results of several experiments on the system. The results demonstrate that interleaving planning and acting in our environment is crucial.
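
Checking whether a set of temporal constraints admits a feasible schedule can be illustrated with a simple temporal network (STN). The abstract does not say which representation the authors use, so this is a swapped-in standard technique: each constraint bounds the time difference between two events, and all-pairs shortest paths (Floyd-Warshall) detect inconsistency as a negative cycle. The function name `stn_consistent` and the constraint tuple format are assumptions.

```python
def stn_consistent(num_events, constraints):
    """Check a simple temporal network for consistency.

    constraints: list of (i, j, lo, hi), meaning lo <= t_j - t_i <= hi.
    Returns (consistent, d), where d[i][j] is the tightest implied
    upper bound on t_j - t_i.
    """
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    # Floyd-Warshall: propagate bounds through intermediate events.
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry is a negative cycle: no assignment
    # of times can satisfy every constraint.
    consistent = all(d[i][i] >= 0 for i in range(num_events))
    return consistent, d
```

With event 0 as the reference time point, `d[0][j]` is the latest feasible time of event `j` and `-d[j][0]` the earliest, which is the kind of window a scheduler could use when inserting a new action.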

13.
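For path planning of mobile robots in two-dimensional dynamic scenes, this paper proposes a novel path planning method, continuous dynamic movement primitives (CDMPs). The method generalizes the traditional single dynamic movement primitive to continuous dynamic movement primitives: it learns from demonstrated trajectories to obtain the weight sequence of each movement primitive, and tracks an unknown moving target by updating the phase variable. The method removes the mobile robot's dependence on an environment model and solves the path planning problem of tracking a moving target while avoiding dynamic obstacles in dynamic scenes. A series of simulation experiments verifies the feasibility of the algorithm. The simulation results show that, for mobile robot path planning in dynamic scenes, CDMPs outperform traditional DMPs in continuity and planning efficiency.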

14.
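A Dynamic Reasoning Model for BDI Agents   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a BDI agent model with dynamic reasoning capability based on environment perception. The model introduces an environment perception function, an opinion function, a filter function, and an action function to study changes in the environment and the agent's own reasoning process, and describes the model's dynamic reasoning process in an extended ASL language. A robot block-stacking example is then used to verify the reasonableness of this reasoning process. The paper concludes with a summary and a discussion of further research.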

15.
In this paper, we propose an agent-based geo-simulation framework, EKEMAS, to assist human planners when planning under strong spatial constraints in a real large-scale space. The approach consists in drawing a parallel between the real environment (for example, a forest on fire) and the simulated environment based on GIS data. This virtual environment uses software agents which are aware of the space and equipped with advanced spatial reasoning capabilities. In addition, we suggest some enhancements for the Continual Planning approach. Our aim is to demonstrate how EKEMAS, when coupled with a continual planning approach and agents' spatial reasoning capabilities, can help human planners overcome obstacles related to real-world constraints: a dynamic, uncertain, and spatially constrained environment. We illustrate this idea on the forest firefighting problem, using MAGS as a simulation platform and Prometheus as a fire simulator. Finally, since plans in the studied case (wildfire fighting) are mainly paths, we also propose a new approach based on agent geo-simulation to solve particular pathfinding problems.

16.
In this article we discuss the problem of inferring threats in an urban environment, where the knowledge of the environment involves multiple types of intelligence and infrastructure data, and is by nature uncertain or approximate. We use a collection of situation-aware agents to infer potential threats in such environments, where agents are responsible for event correlation and situation assessment. We review the weaknesses of a current approach to threat assessment in Homeland Security and then describe our agent-based approach. The key innovations of our agent-based approach are: an ontological commitment to events and situations, fuzzy event correlation, fuzzy situation assessment, adaptability and learning during threat assessment operations, and an enhancement of traditional belief-desire-intention (BDI) agents with situation awareness. We describe the properties of situation-aware BDI agents and discuss the implementation of them on a variety of BDI agent platforms. Lastly, we discuss the interoperability of these platforms and address the issue of scalability through coupling to large-scale peer-to-peer overlays.

17.
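Traditional intelligent optimization algorithms are computationally expensive and slow when planning rendezvous routes for multiple UAVs in uncertain, complex environments. To address this, the paper proposes a deep deterministic policy gradient (DDPG) algorithm based on the reciprocal velocity obstacle (RVO) method. RVO is introduced to guide the UAVs in avoiding obstacles in the uncertain environment, which effectively speeds up convergence of the target actor network and improves learning efficiency. A reward function based on a composite cost is designed, turning the multi-objective optimization problem in multi-UAV route planning into a reward-function design problem for DDPG; this design effectively mitigates the tendency of traditional DDPG to fall into local optima. The algorithm's performance is verified by simulation on the PyCharm software platform and compared with several other algorithms. The simulation experiments show that RVO-DDPG makes decisions faster and is more practical.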
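
The geometric test behind the reciprocal velocity obstacle can be sketched in a few lines. This covers only the RVO membership check for two disc-shaped agents in 2-D. How the paper couples RVO with DDPG, and the form of its composite-cost reward, are not given in the abstract and are not attempted here. The function names and the `radius` parameter (taken to be the sum of both agents' radii) are assumptions.

```python
def in_velocity_obstacle(rel_pos, rel_vel, radius):
    """True if holding rel_vel forever brings the agent within `radius`
    of a point at rel_pos (both expressed relative to the agent)."""
    px, py = rel_pos
    vx, vy = rel_vel
    if px * px + py * py <= radius * radius:
        return True                      # already overlapping
    speed2 = vx * vx + vy * vy
    if speed2 == 0:
        return False                     # no relative motion
    t = (px * vx + py * vy) / speed2     # time of closest approach
    if t <= 0:
        return False                     # moving apart
    cx, cy = px - t * vx, py - t * vy    # offset at closest approach
    return cx * cx + cy * cy < radius * radius

def in_rvo(pos_a, vel_a, pos_b, vel_b, radius, candidate):
    """True if `candidate` velocity for agent A lies in the RVO induced by B.

    RVO definition: v' is forbidden when 2*v' - vel_a lies in the plain
    velocity obstacle of B, i.e. A is assumed to take only half of the
    avoidance effort and B the other half.
    """
    rel_pos = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    rel_vel = (2 * candidate[0] - vel_a[0] - vel_b[0],
               2 * candidate[1] - vel_a[1] - vel_b[1])
    return in_velocity_obstacle(rel_pos, rel_vel, radius)
```

One plausible use, again an assumption rather than the paper's design, is to filter or penalize actions proposed by the policy whenever `in_rvo` returns `True` for the corresponding velocity.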

18.
Embedding planning systems in real-world domains has led to the necessity of Distributed Continual Planning (DCP) systems where planning activities are distributed across multiple agents and plan generation may occur concurrently with plan execution. A key challenge in DCP systems is how to coordinate activities for a group of planning agents. This problem is compounded when these agents are situated in a real-world dynamic domain where the agents often encounter differing, incomplete, and possibly inconsistent views of their environment. To date, DCP systems have only focused on cases where agents' behavior is designed to optimize a global plan. In contrast, this paper presents a temporal reasoning mechanism for self-interested planning agents. To do so, we model agents' behavior based on the Belief-Desire-Intention (BDI) theoretical model of cooperation, while modeling dynamic joint plans with group time constraints through creating hierarchical abstraction plans integrated with a temporal constraints network. The contribution of this paper is threefold: (i) the BDI model specifies a behavior for self-interested agents working in a group, permitting an individual agent to schedule its activities in an autonomous fashion, while taking into consideration temporal constraints of its group members; (ii) abstract plans allow the group to plan a joint action without explicitly describing all possible states in advance, making it possible to reduce the number of states which need to be considered in a BDI-based approach; and (iii) a temporal constraints network enables each agent to reason by itself about the best time for scheduling activities, making it possible to reduce coordination messages among a group. The mechanism ensures temporal consistency of a cooperative plan and enables the interleaving of planning and execution at both individual and group levels.
We report on how the mechanism was implemented within a commercial training and simulation application, and present empirical evidence of its effectiveness in real-life scenarios and in reducing communication to coordinate group members' activities.
