Similar Literature
20 similar records found.
1.
Dealing with changing situations is a major issue in building agent systems. When the time is limited, knowledge is unreliable, and resources are scarce, the issue becomes more challenging. The BDI (Belief-Desire-Intention) agent architecture provides a model for building agents that addresses that issue. The model can be used to build intentional agents that are able to reason based on explicit mental attitudes, while behaving reactively in changing circumstances. However, despite the reactive and deliberative features, a classical BDI agent is not capable of learning. Plans as recipes that guide the activities of the agent are assumed to be static. In this paper, an architecture for an intentional learning agent is presented. The architecture is an extension of the BDI architecture in which the learning process is explicitly described as plans. Learning plans are meta-level plans which allow the agent to introspectively monitor its mental states and update other plans at run time. In order to acquire the intricate structure of a plan, a process pattern called manipulative abduction is encoded as a learning plan. This work advances the state of the art by combining the strengths of learning and BDI agent frameworks in a rich language for describing deliberation processes and reactive execution. It enables domain experts to specify learning processes and strategies explicitly, while allowing the agent to benefit from procedural domain knowledge expressed in plans.
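To make the idea of plan-encoded learning concrete, here is a minimal Python sketch (not the paper's implementation) of a BDI-style cycle in which a meta-level learning plan introspects on plan failures and revises the plan library at run time; all class and function names are illustrative assumptions.

```python
# Hypothetical sketch: learning expressed as meta-level plans over a BDI plan library.
class Plan:
    def __init__(self, goal, context, body):
        self.goal, self.context, self.body = goal, context, body

class LearningBDIAgent:
    def __init__(self, beliefs, plans, learning_plans):
        self.beliefs = beliefs                  # dict of belief atoms
        self.plans = plans                      # object-level plan library
        self.learning_plans = learning_plans    # meta-level plans over the library

    def step(self, goal):
        # Ordinary deliberation: pick the first applicable plan and execute it.
        for plan in self.plans:
            if plan.goal == goal and plan.context(self.beliefs):
                ok = plan.body(self.beliefs)
                if not ok:
                    self._on_failure(goal, plan)
                return ok
        return False

    def _on_failure(self, goal, failed_plan):
        # Meta-level step: learning plans introspect on the failure and may
        # update other plans (e.g., tighten a context condition) at run time.
        for learn in self.learning_plans:
            learn(self.beliefs, self.plans, goal, failed_plan)
```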

2.
A Learning Agent Based on Reinforcement Learning   Total citations: 24 (self: 2, others: 22)
Reinforcement learning learns the optimal behavior policy of a dynamic system by perceiving environment states and receiving uncertain rewards from the environment, and it is one of the core techniques for constructing intelligent agents. This paper extends the BDI model in the agent-oriented development environment AODE by introducing policy and capability as mental components and using reinforcement learning to implement the policy-construction function, thereby proposing a learning agent based on reinforcement learning. The structure and operation of adaptive agents in AODE are studied, so that intelligent agents gain online learning ability in dynamic environments and can effectively satisfy the agent's various mental requirements.
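As a rough illustration of the abstract above, the following Python sketch shows a "policy" mental component built by tabular Q-learning from uncertain rewards; the AODE environment and the capability component are not modeled, and all names are assumptions.

```python
import random
from collections import defaultdict

class PolicyComponent:
    """Policy mental component learned online by tabular Q-learning (sketch)."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)            # policy knowledge: Q(state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:     # online exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning update from an uncertain, sampled reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```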

3.
Abstract: A software agent is defined as an autonomous software entity that is able to interact with its environment. Such an agent is able to respond to other agents and/or its environment to some degree, and has some sort of control over its internal state and actions. In belief-desire-intention (BDI) theory, an agent's behavior is described in terms of a processing cycle. In this paper, based on BDI theory, the processing cycle is studied with a software feedback mechanism. A software feedback or loop-back control mechanism can perform functions without direct external intervention. A feedback mechanism can continuously monitor the output of the system under control (the target system), compare the result against preset values (goals of the feedback control) and feed the difference back to adjust the behavior of the target system in a processing cycle. We discuss the modeling and design aspects of an autonomous, adaptive monitoring agent with a layered control architecture. The architecture consists of three layers: a scheduling layer, an optimizing layer and a regulating layer. Experimental results show that the monitoring agent developed for an e-mail server is effective.
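The monitor-compare-adjust cycle described above can be sketched in a few lines; the proportional gain and the simulated e-mail queue below are assumptions for illustration, not the paper's three-layer design.

```python
def feedback_step(read_output, setpoint, apply_adjustment, gain=0.5):
    """One cycle of monitor -> compare -> adjust."""
    measured = read_output()              # monitor the target system's output
    error = setpoint - measured           # compare against the preset goal
    apply_adjustment(gain * error)        # feed the difference back
    return error

# Example: regulate a (simulated) queue length toward 100 pending messages.
state = {"queue": 140.0}
for _ in range(10):
    feedback_step(read_output=lambda: state["queue"],
                  setpoint=100.0,
                  apply_adjustment=lambda delta: state.__setitem__(
                      "queue", state["queue"] + delta))
```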

4.
避免逻辑全知的BDI语义   总被引:6,自引:0,他引:6  
程显毅  石纯一 《软件学报》2002,13(5):966-971
BDI(belief, desire, intention)是基于Agent计算的理论模型,BDI语义关系着Agent计算的发展.通过把相信划分为主观相信和客观相信,把可能世界理解为认知的不同阶段,给出具有进化特征的BDI语义.该语义既能描述Agent,又避免了"逻辑全知"问题.  相似文献   

5.
Grasping an object is a task that inherently needs to be treated in a hybrid fashion. The system must decide both where and how to grasp the object. While selecting where to grasp requires learning about the object as a whole, the execution only needs to reactively adapt to the context close to the grasp’s location. We propose a hierarchical controller that reflects the structure of these two sub-problems, and attempts to learn solutions that work for both. A hybrid architecture is employed by the controller to make use of various machine learning methods that can cope with the large amount of uncertainty inherent to the task. The controller’s upper level selects where to grasp the object using a reinforcement learner, while the lower level comprises an imitation learner and a vision-based reactive controller to determine appropriate grasping motions. The resulting system is able to quickly learn good grasps of a novel object in an unstructured environment, by executing smooth reaching motions and preshaping the hand depending on the object’s geometry. The system was evaluated both in simulation and on a real robot.
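A hypothetical sketch of the two-level decomposition (where to grasp vs. how to grasp): an upper-level value learner picks among candidate grasp sites while a lower-level stub stands in for the reactive reach-and-preshape controller; candidate names, rewards and parameters are assumptions.

```python
import random
from collections import defaultdict

class GraspSiteSelector:
    """Upper level: learn which grasp site on the object tends to succeed."""
    def __init__(self, alpha=0.2, epsilon=0.1):
        self.value = defaultdict(float)        # estimated success value per site
        self.alpha, self.epsilon = alpha, epsilon

    def select(self, candidate_sites):
        if random.random() < self.epsilon:
            return random.choice(candidate_sites)
        return max(candidate_sites, key=lambda s: self.value[s])

    def update(self, site, success):
        # Move the site's estimated value toward the observed outcome (0 or 1).
        self.value[site] += self.alpha * (float(success) - self.value[site])

def execute_grasp(site):
    """Lower-level stub: reach toward `site` and preshape the hand."""
    return random.random() < 0.7               # placeholder for the reactive outcome

selector = GraspSiteSelector()
for _ in range(20):
    site = selector.select(["handle", "rim", "body"])
    selector.update(site, execute_grasp(site))
```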

6.
As a foundation for action selection and task-sequencing intelligence, the reactive and deliberative subsystems of a hybrid agent can be unified by a single, shared representation of intention. In this paper, we summarize a framework for hybrid dynamical cognitive agents (HDCAs) that incorporates a representation of dynamical intention into both reactive and deliberative structures of a hybrid dynamical system model, and we present methods for learning in these intention-guided agents. The HDCA framework is based on ideas from spreading activation models and belief–desire–intention (BDI) models. Intentions and other cognitive elements are represented as interconnected, continuously varying quantities, employed by both reactive and deliberative processes. HDCA learning methods, such as Hebbian strengthening of links between co-active elements and belief–intention learning of task-specific relationships, modify interconnections among cognitive elements, extending the benefits of reactive intelligence by enhancing high-level task sequencing without additional reliance on or modification of deliberation. We also present demonstrations of simulated robots that learned geographic and domain-specific task relationships in an office environment.
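The Hebbian strengthening of links between co-active cognitive elements mentioned above could look roughly like the following; the activation threshold, learning rate and decay are illustrative assumptions rather than HDCA parameters.

```python
def hebbian_update(weights, activations, rate=0.05, decay=0.001, threshold=0.5):
    """weights: dict mapping (element_i, element_j) -> link strength."""
    active = {name for name, a in activations.items() if a > threshold}
    for (i, j), w in list(weights.items()):
        if i in active and j in active:
            # Strengthen links between simultaneously active elements.
            weights[(i, j)] = w + rate * activations[i] * activations[j]
        else:
            weights[(i, j)] = w * (1.0 - decay)   # slow forgetting otherwise
    return weights

# Example: co-activation of an intention and a task-specific belief.
w = {("intend_deliver_mail", "belief_at_office"): 0.1}
w = hebbian_update(w, {"intend_deliver_mail": 0.9, "belief_at_office": 0.8})
```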

7.
Land use planning is a potentially demanding search and optimization task that has been tackled by numerous researchers in the field of spatial planning. Agent and multi-agent systems are examples of modern concepts that have recently been gaining more attention for challenging spatial issues. Although the efficiency of the belief-desire-intention (BDI) agent architecture has been validated in a variety of sciences, its use in Geospatial Information Systems (GIS), and specifically among spatial planners, is still burgeoning. In this paper, we integrate the concepts of the BDI agent architecture into spatial problems; as a result, a novel spatial agent model is designed and implemented to analyze urban land use planning. The proposed approach was tested on urban land use planning problems using a case study in a municipal area. The implementation results showed the effects of spatial agents' behaviors, such as intention, commitment, and interaction, on their decisions.

8.
The BDI model handles agent reasoning and decision making well in specific environments, but it lacks decision-making and learning ability in dynamic and uncertain environments. Reinforcement learning solves the agent's decision-making problem in unknown environments, but lacks the rule descriptions and logical reasoning of the BDI model. To address BDI policy planning in unknown and dynamic environments, this paper proposes a method that uses the Q-learning reinforcement learning algorithm to realize learning and planning for BDI agents, and improves the decision mechanism of ASL, an implementation model of BDI. Finally, a maze simulation is built on Jason, the simulation platform for ASL. The experiments show that in the new ASL system with the added Q-learning mechanism, the agent can still complete its task in an uncertain environment.
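A hypothetical sketch of the modified decision step: when several plan options are applicable in an uncertain maze state, a learned Q-table breaks the tie; this is an illustration in Python, not the paper's Jason/ASL code.

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, option)], learned as in standard Q-learning

def select_option(state, applicable_options):
    """Replace 'first applicable plan' with a Q-guided choice among options."""
    if not applicable_options:
        return None
    if len(applicable_options) == 1:
        return applicable_options[0]
    return max(applicable_options, key=lambda opt: Q[(state, opt)])

def q_update(state, option, reward, next_state, next_options, alpha=0.1, gamma=0.9):
    # Temporal-difference update after executing the chosen option.
    best_next = max((Q[(next_state, o)] for o in next_options), default=0.0)
    Q[(state, option)] += alpha * (reward + gamma * best_next - Q[(state, option)])
```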

9.
Cooperative, hybrid agent architecture for real-time traffic signal control   Total citations: 1 (self: 0, others: 1)
This paper presents a new hybrid, synergistic approach in applying computational intelligence concepts to implement a cooperative, hierarchical, multiagent system for real-time traffic signal control of a complex traffic network. The large-scale traffic signal control problem is divided into various subproblems, and each subproblem is handled by an intelligent agent with a fuzzy neural decision-making module. The decisions made by lower-level agents are mediated by their respective higher-level agents. Through adopting a cooperative distributed problem solving approach, coordinated control by the agents is achieved. In order for the multiagent architecture to adapt itself continuously to the dynamically changing problem domain, a multistage online learning process for each agent is implemented involving reinforcement learning, learning rate and weight adjustment as well as dynamic update of fuzzy relations using an evolutionary algorithm. The test bed used for this research is a section of the Central Business District of Singapore. The performance of the proposed multiagent architecture is evaluated against the set of signal plans used by the current real-time adaptive traffic control system. The multiagent architecture produces significant improvements in the conditions of the traffic network, reducing the total mean delay by 40% and total vehicle stoppage time by 50%.

10.
Go! is a multi-paradigm programming language that is oriented to the needs of programming secure, production quality, agent based applications. It is multi-threaded, strongly typed and higher order (in the functional programming sense). It has relation, function and action procedure definitions. Threads execute action procedures, calling functions and querying relations as need be. Threads in different agents communicate and coordinate using asynchronous messages. Threads within the same agent can also use shared dynamic relations acting as Linda-style tuple stores. In this paper we introduce the essential features of Go!. We then illustrate them by programming a simple multi-agent application comprising hybrid reactive/deliberative agents interacting in a simulated ballroom. The dancer agents negotiate to enter into joint commitments to dance a particular dance (e.g., polka) they both desire. When the dance is announced, they dance together. The agents' reactive and deliberative components are concurrently executing threads which communicate and coordinate using belief, desire and intention memory stores. We believe such a multi-threaded agent architecture represents a powerful and natural style of agent implementation, for which Go! is well suited.

11.
A BDO Model for Social Agents   Total citations: 15 (self: 0, others: 15)
The current trend in research on mental states in multi-agent systems (MAS) is to add social mental attributes to individual agent models and to study the reasoning relations among social commitment, dependence, and joint intention. In the BDI model, the intention-centered view does not describe social agents. This paper analyzes the problems of intention-centered research on agent mental states, proposes a layered model of MAS, and proposes belief, desire, and obligation (BDO) as the basic mental attributes for describing an agent's mental state and social properties. A BDO logic and its semantic model are given; reward, punishment, commitment, and commitment release are considered; and phenomena such as teams, organizations, and organizational intentions are described. The paper aims to describe the mental states and group concepts of social agents more naturally, as an improvement on the BDI model proposed by Rao and Georgeff. Finally, an example illustrates the expressive power of BDO. Further work includes building a more complete semantic model, giving a dynamic model of BDO agents by combining the dynamic revision semantics of each mental attribute, and providing an agent/MAS implementation architecture based on the BDO logic framework.

12.
In this paper, a hybrid integrated dynamic control algorithm for a humanoid locomotion mechanism is presented. The proposed controller structure involves two feedback loops: a model-based dynamic controller including an impact-force controller, and a reinforcement learning feedback controller around the zero-moment point. The proposed new reinforcement learning algorithm is based on a modified version of the actor-critic architecture for dynamic reactive compensation. Simulation experiments were carried out to validate the proposed control approach. The obtained simulation results served as the basis for a critical evaluation of the controller performance.
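For orientation, the following is a simplified Gaussian-policy actor-critic sketch in the spirit of the reactive compensation loop described above; the linear features, gains and reward convention are assumptions, not the paper's controller.

```python
import random

class ZMPActorCritic:
    """Sketch: critic learns a value of the ZMP-error state; actor adjusts compensation."""
    def __init__(self, n_features, lr_actor=0.01, lr_critic=0.05, gamma=0.95, sigma=0.1):
        self.w_critic = [0.0] * n_features      # value-function weights
        self.w_actor = [0.0] * n_features       # mean-compensation weights
        self.lr_a, self.lr_c, self.gamma, self.sigma = lr_actor, lr_critic, gamma, sigma

    def _dot(self, w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def act(self, features):
        # Gaussian exploration around the actor's mean compensation command.
        return random.gauss(self._dot(self.w_actor, features), self.sigma)

    def learn(self, features, action, reward, next_features):
        td_error = (reward + self.gamma * self._dot(self.w_critic, next_features)
                    - self._dot(self.w_critic, features))
        # Critic: TD(0) update of the value estimate.
        for i, xi in enumerate(features):
            self.w_critic[i] += self.lr_c * td_error * xi
        # Actor: shift the mean toward actions that produced positive TD error.
        mean = self._dot(self.w_actor, features)
        for i, xi in enumerate(features):
            self.w_actor[i] += self.lr_a * td_error * (action - mean) * xi
```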

13.
《Advanced Robotics》2013,27(10):1177-1199
A novel integrative learning architecture based on a reinforcement learning schemata model (RLSM) with a spike timing-dependent plasticity (STDP) network is described. This architecture models operant conditioning with discriminative stimuli in an autonomous agent engaged in multiple reinforcement learning tasks. The architecture consists of two constitutional learning architectures: RLSM and STDP. RLSM is an incremental modular reinforcement learning architecture, and it makes an autonomous agent acquire several behavioral concepts incrementally through continuous interactions with its environment and/or caregivers. STDP is a learning rule of neuronal plasticity found in cerebral cortices and the hippocampus of the human brain. STDP is a temporally asymmetric learning rule that contrasts with the Hebbian learning rule. We found that STDP enabled an autonomous robot to associate auditory input with its acquired behaviors and to select reinforcement learning modules more effectively. Auditory signals interpreted based on the acquired behaviors were revealed to correspond to 'signs' of required behaviors and incoming situations. This integrative learning architecture was evaluated in the context of on-line modular learning.
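The temporally asymmetric STDP rule referred to above can be sketched as a pairwise weight update; the amplitudes and time constants below are illustrative assumptions.

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.05, a_minus=0.055,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # presynaptic spike before postsynaptic spike: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:    # postsynaptic spike before presynaptic spike: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

# Example: pre at 10 ms, post at 15 ms -> positive weight change.
print(stdp_delta_w(10.0, 15.0))
```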

14.
Research on a Behavior-Based Agent Architecture for Autonomous Miniature Mobile Robots   Total citations: 3 (self: 0, others: 3)
This paper proposes a robot agent architecture that emulates the human processes of learning and evolution. Using basic behaviors designed in advance by the designers, the miniature robot applies reinforcement learning and evolves group behaviors to autonomously create concrete actions that meet the task requirements and adapt to the actual environment. This overcomes the limitations that arise in traditional symbol-based artificial intelligence approaches from the designers' incomplete knowledge of the external environment and the task, making the robot's actions better suited to the environment and task requirements.

15.
To address the difficulty that existing deep reinforcement learning algorithms have in converging in environments with high-dimensional state spaces, a reinforcement learning algorithm based on a one-dimensional convolutional recurrent network is proposed to extract features along the time dimension. First, a deep reinforcement learning system is built on the basis of the deep Q-network (DQN). Then, a one-dimensional convolutional layer is added to the network structure of the deep recurrent Q-network (DRQN) to extract temporal features before the long short-term memory (LSTM) layer. Finally, the new reinforcement learning algorithm is trained and tested in time-dependent environments. Experimental results show that this modification improves the agent's decision making and gives deep reinforcement learning better performance in time-dependent environments with non-image inputs.
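A minimal PyTorch sketch of the described network shape, a one-dimensional convolution over the observation history followed by an LSTM and a Q-value head; layer sizes and hyperparameters are assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class Conv1dRecurrentQNet(nn.Module):
    """Q-network: Conv1d over time -> LSTM -> Q-values (illustrative sketch)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Convolve along the time axis: input shape (batch, obs_dim, seq_len).
        self.conv = nn.Conv1d(in_channels=obs_dim, out_channels=32,
                              kernel_size=3, padding=1)
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim), a short history of observations.
        x = obs_seq.transpose(1, 2)          # (batch, obs_dim, seq_len)
        x = torch.relu(self.conv(x))         # temporal features per time step
        x = x.transpose(1, 2)                # (batch, seq_len, 32)
        out, _ = self.lstm(x)                # recurrent summary of the sequence
        return self.head(out[:, -1, :])      # Q-values from the last time step

q_net = Conv1dRecurrentQNet(obs_dim=8, n_actions=4)
q_values = q_net(torch.randn(2, 10, 8))      # batch of 2, history length 10
```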

16.
Introspective and elaborative processes in rational agents   Total citations: 1 (self: 0, others: 1)
This paper explores the design of rational agent architectures from the perspective of the dynamics of information change. The procedural elements that guide an agent's behavior and that reflect the evolution of pro-attitudes (for example, from desire to intention to plan) are described in terms of McCarthy's notion of a reified mental action. The function of each module of an agent architecture is exactly specified by identifying processes with each module and then describing the effects of those processes or mental actions (such as updating beliefs, elaborating plans, deliberating, reconsidering, revising intentions, filtering intentions, and monitoring) in the same way as one would describe the effects of physical actions. A new semantics for intention is presented that is both dynamic and causal in the sense that it is given in terms of the relation of an intention to both previous and subsequent mental states as well as to the choice of physical action. Desires are given a syntactic analysis while the pro-attitude of intentions-that, which has been proposed in the SharedPlans framework of Grosz and Kraus, is axiomatized in terms of an evolving commitment to certain deliberative, mental actions that evolve as a function of knowledge of the state of the joint activity.

17.
陈浩, 李嘉祥, 黄健, 王菖, 刘权, 张中杰. 《控制与决策》 (Control and Decision), 2023, 38(11): 3209-3218
For complex tasks with high-dimensional continuous state spaces or sparse rewards, learning an optimal policy from scratch with deep reinforcement learning alone is very difficult; how to represent existing knowledge in a form mutually understandable to humans and learning agents, and to effectively accelerate policy convergence, remains an open problem. To address this, a deep reinforcement learning framework that incorporates a cognitive behavior model is proposed: domain prior knowledge is modeled as a belief-desire-intention (BDI) cognitive behavior model and used to guide the agent's policy learning. Based on this framework, a deep Q-learning algorithm and a proximal policy optimization algorithm that incorporate the cognitive behavior model are proposed, and the way the cognitive behavior model guides the agent's policy updates is designed quantitatively. Finally, experiments in standard gym environments and an air-combat maneuvering decision environment verify that the proposed algorithms can efficiently use the cognitive behavior model to accelerate policy learning, effectively mitigating the effects of huge state spaces and sparse environment rewards.
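One simple way a BDI-style prior could guide exploration is sketched below; the blending rule and rule format are assumptions and do not reproduce the paper's quantitative guidance scheme.

```python
import random

def bdi_suggested_action(state, rules):
    """Return the action suggested by hand-written belief->action rules, or None."""
    for condition, action in rules:
        if condition(state):
            return action
    return None

def guided_epsilon_greedy(q_table, state, actions, rules,
                          epsilon=0.2, trust_prior=0.7):
    """Exploration biased toward the cognitive-model suggestion."""
    if random.random() < epsilon:                      # explore
        prior = bdi_suggested_action(state, rules)
        if prior is not None and random.random() < trust_prior:
            return prior                               # follow the BDI prior
        return random.choice(actions)                  # otherwise uniform random
    # Exploit: greedy action under the learned Q-values.
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```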

18.
This research presents an optimization technique for route planning and exploration in unknown environments. It employs a hybrid architecture that implements detection, avoidance and planning using autonomous agents with coordination capabilities. When these agents work toward a common objective, they require a robust information interchange module for coordination; they cannot achieve the goal when working independently. The coordination module enhances their performance and efficiency. Multi-agent systems can be employed for searching for items in unknown environments. The search for unexploded ordnance such as land mines is an important application where multi-agent systems can be best employed. The hybrid architecture incorporates the Learning Real-Time A* (LRTA*) algorithm for route planning and compares it with the A* search algorithm. Learning Real-Time A* shows better results in the multi-agent environment and proves to be an efficient and robust algorithm. A simulated ant agent system is presented for route planning and optimization and proves to be efficient and robust for large and complex environments.
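For reference, a compact sketch of one Learning Real-Time A* trial on a toy grid; the graph, unit costs and Manhattan heuristic are assumptions, not the paper's environment.

```python
def lrta_star_trial(start, goal, neighbors, cost, h):
    """Run one trial of Learning Real-Time A*; h (a dict) is updated in place."""
    state = start
    path = [state]
    while state != goal:
        # Score each successor by one-step cost plus current heuristic estimate.
        scored = [(cost(state, s2) + h.get(s2, 0.0), s2) for s2 in neighbors(state)]
        best_f, best_next = min(scored)
        # Learning step: raise h(state) to the best lookahead value.
        h[state] = max(h.get(state, 0.0), best_f)
        state = best_next
        path.append(state)
    return path

# Example on a 5x5 obstacle-free grid with a Manhattan-distance heuristic.
def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

goal = (4, 4)
h = {(x, y): abs(x - goal[0]) + abs(y - goal[1]) for x in range(5) for y in range(5)}
path = lrta_star_trial((0, 0), goal, neighbors, cost=lambda a, b: 1, h=h)
```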

19.
To date, many researchers have proposed various methods to improve the learning ability in multiagent systems. However, most of these studies are not appropriate to more complex multiagent learning problems because the state space of each learning agent grows exponentially in terms of the number of partners present in the environment. Modeling other learning agents present in the domain as part of the state of the environment is not a realistic approach. In this paper, we combine advantages of the modular approach, fuzzy logic and the internal model in a single novel multiagent system architecture. The architecture is based on a fuzzy modular approach whose rule base is partitioned into several different modules. Each module deals with a particular agent in the environment and maps the input fuzzy sets to the action Q-values; these represent the state space of each learning module and the action space, respectively. Each module also uses an internal model table to estimate actions of the other agents. Finally, we investigate the integration of a parallel update method with the proposed architecture. Experimental results obtained on two different environments of a well-known pursuit domain show the effectiveness and robustness of the proposed multiagent architecture and learning approach.
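A simplified sketch of the modular idea, one Q-module per partner agent with a "greatest mass" combination of their Q-values; the fuzzy rule base and internal-model table of the paper are omitted and all names are assumptions.

```python
from collections import defaultdict

class ModularQLearner:
    """One Q-table per partner; modules vote by summing Q-values (sketch)."""
    def __init__(self, actions, partners, alpha=0.1, gamma=0.9):
        self.actions, self.partners = actions, partners
        self.alpha, self.gamma = alpha, gamma
        self.q = {p: defaultdict(float) for p in partners}   # one module per partner

    def combined_q(self, states, action):
        # "Greatest mass" combination: sum the modules' votes for this action.
        return sum(self.q[p][(states[p], action)] for p in self.partners)

    def act(self, states):
        return max(self.actions, key=lambda a: self.combined_q(states, a))

    def update(self, states, action, reward, next_states):
        # Each module runs its own Q-learning update on its own sub-state.
        for p in self.partners:
            best_next = max(self.q[p][(next_states[p], a)] for a in self.actions)
            td = reward + self.gamma * best_next - self.q[p][(states[p], action)]
            self.q[p][(states[p], action)] += self.alpha * td
```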

20.
A BDI Model for Rational Agents   Total citations: 12 (self: 2, others: 10)
康小强, 石纯一. 《软件学报》 (Journal of Software), 1999, 10(12): 1268-1274
By introducing hypothetical beliefs, this paper explains the cognitive meaning of desires and intentions within an agent's mental state, then defines desires and intentions, introduces plans, and builds a dynamic BDI model for rational agents. Compared with the work of Cohen and Levesque, Rao and Georgeff, and Konolige and Pollack, it overcomes the counter-intuitive interpretations of beliefs, desires, and intentions, solves the inaction and side-effect problems concerning desires and intentions, emphasizes the triggering and maintenance roles of desires, and expresses the dynamic constraint and triggering relations among beliefs, desires, and intentions.
