Similar Documents
20 similar documents found.
1.
Autonomous agents that learn about their environment can be divided into two broad classes. One class of existing learners, reinforcement learners, typically employs weak learning methods to directly modify an agent's execution knowledge. These systems are robust in dynamic and complex environments but generally do not support planning or the pursuit of multiple goals. In contrast, symbolic theory revision systems learn declarative planning knowledge that allows them to pursue multiple goals in large state spaces, but these approaches are generally only applicable to fully sensed, deterministic environments with no exogenous events. This research investigates the hypothesis that by limiting an agent to procedural access to symbolic planning knowledge, the agent can combine the powerful, knowledge-intensive learning performance of the theory revision systems with the robust performance in complex environments of the reinforcement learners. The system, IMPROV, uses an expressive knowledge representation so that it can learn complex actions that produce conditional or sequential effects over time. By developing learning methods that only require limited procedural access to the agent's knowledge, IMPROV's learning remains tractable as the agent's knowledge is scaled to large problems. IMPROV learns to correct operator precondition and effect knowledge in complex environments that include such properties as noise, multiple agents and time-critical tasks, and demonstrates a general learning method that can be easily strengthened through the addition of many different kinds of knowledge.

2.
ROGUE is an architecture built on a real robot which provides algorithms for the integration of high-level planning, low-level robotic execution, and learning. ROGUE successfully addresses several of the challenges of a dynamic office gopher environment. This article presents the techniques for the integration of planning and execution. ROGUE uses and extends a classical planning algorithm to create plans for multiple interacting goals introduced by asynchronous user requests. ROGUE translates the planner's actions to robot execution actions and monitors real-world execution. ROGUE is currently implemented using the PRODIGY4.0 planner and the Xavier robot. This article describes how plans are created for multiple asynchronous goals, and how task priority and compatibility information are used to achieve appropriate, efficient execution. We describe how ROGUE communicates with the planner and the robot to interleave planning with execution, so that the planner can replan for failed actions, identify the actual outcome of an action with multiple possible outcomes, and take advantage of opportunities created by changes in the environment. ROGUE represents a successful integration of a classical artificial intelligence planner with a real mobile robot.
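To make the interleaving concrete, the following is a minimal sketch, in Python, of a plan/execute/monitor/replan loop in the spirit of this abstract; the toy planner, failure model, and function names are illustrative assumptions, not ROGUE's actual PRODIGY4.0/Xavier interfaces.

```python
import random

# A minimal sketch of interleaved planning and execution with replanning.
# The planner, action names, and failure model are hypothetical placeholders.

def plan(state, goal):
    """Toy 'planner': walk an integer state toward the goal one step at a time."""
    step = 1 if goal > state else -1
    return ["move"] * abs(goal - state), step

def execute(state, step):
    """Toy 'robot': the action occasionally fails and leaves the state unchanged."""
    if random.random() < 0.2:          # simulated execution failure
        return state, False
    return state + step, True

def run(state, goal):
    while state != goal:
        actions, step = plan(state, goal)          # (re)plan from the observed state
        for _ in actions:
            new_state, ok = execute(state, step)   # monitor the actual outcome
            if not ok:                             # failure: break out and replan
                break
            state = new_state
        # the loop re-enters plan() with the latest observed state
    return state

print(run(state=0, goal=5))
```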

3.
This paper presents a hybrid agent architecture that integrates the behaviours of BDI agents, specifically desire and intention, with a neural-network-based reinforcement learner known as Temporal Difference-Fusion Architecture for Learning and COgNition (TD-FALCON). With the explicit maintenance of goals, the agent performs reinforcement learning with awareness of its objectives instead of relying on external reinforcement signals. More importantly, the intention module equips the hybrid architecture with deliberative planning capabilities, enabling the agent to purposefully maintain an agenda of actions to perform and reducing the need to constantly sense the environment. Through reinforcement learning, plans can also be learned and evaluated without the rigidity of user-defined plans as used in traditional BDI systems. For intention and reinforcement learning to work cooperatively, two strategies are presented for combining the intention module and the reactive learning module for decision making in a real-time environment. Our case study based on a minefield navigation domain investigates how the desire and intention modules may cooperatively enhance the capability of a pure reinforcement learner. The empirical results show that the hybrid architecture is able to learn plans efficiently and to tap both intentional and reactive action execution to yield robust performance.
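One way to picture the combination of the two modules is a selection function that follows the current intention when one is active and falls back to the reactive learner otherwise; the sketch below is a hypothetical illustration, not TD-FALCON's API.

```python
import random

# A minimal sketch of combining an intention (plan) stack with a reactive
# learner: execute from the current intention when one is active, and fall
# back to the reactive policy otherwise. All names here are illustrative.

ACTIONS = ["left", "right", "forward"]
q_table = {}  # (state, action) -> estimated value, filled in by learning

def reactive_action(state):
    """Greedy choice from the learned Q-values; random tie-break if unseen."""
    values = {a: q_table.get((state, a), 0.0) for a in ACTIONS}
    best = max(values.values())
    return random.choice([a for a, v in values.items() if v == best])

def select_action(state, intention):
    """Prefer the next step of the active intention; otherwise act reactively."""
    if intention:                      # deliberative path: follow the agenda
        return intention.pop(0)
    return reactive_action(state)      # reactive path: sense and respond

# Usage: an agent holding a two-step plan executes it before sensing again.
intention = ["forward", "left"]
for state in ["s0", "s1", "s2"]:
    print(select_action(state, intention))
```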

4.
In this paper, we show that through self-interaction and self-observation, an anthropomorphic robot equipped with a range camera can learn object affordances and use this knowledge for planning. In the first step of learning, the robot discovers commonalities in its action-effect experiences by discovering effect categories. Once the effect categories are discovered, in the second step, affordance predictors for each behavior are obtained by learning the mapping from object features to effect categories. After learning, the robot can make plans to achieve desired goals, emulate end states of demonstrated actions, monitor plan execution and take corrective actions using the perceptual structures employed or discovered during learning. We argue that the proposed learning system shares crucial elements with the development of infants at 7–10 months of age, who explore the environment and learn the dynamics of objects through goal-free exploration. In addition, we discuss goal emulation and planning in relation to older infants with no symbolic inference capability and to non-linguistic animals, which utilize object affordances to make action plans.
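The two-step learning scheme maps naturally onto off-the-shelf tools: cluster observed effects to discover categories, then fit a classifier from object features to those categories. The following sketch uses synthetic data and illustrative model choices (scikit-learn's KMeans and a decision tree) standing in for the paper's actual features and learners.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for range-camera features and observed effects.
rng = np.random.default_rng(0)
object_features = rng.normal(size=(200, 5))      # e.g. size/shape descriptors
effects = rng.normal(size=(200, 3))              # observed change after a push

# Step 1: discover effect categories by clustering action-effect experiences.
effect_categories = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(effects)

# Step 2: learn an affordance predictor from object features to effect category.
predictor = DecisionTreeClassifier().fit(object_features, effect_categories)

# After learning, the predicted category of a behavior on a new object can be
# used as a forward model when searching for a plan.
new_object = rng.normal(size=(1, 5))
print("predicted effect category:", predictor.predict(new_object)[0])
```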

5.
Algorithms for planning under uncertainty require accurate action models that explicitly capture the uncertainty of the environment. Unfortunately, obtaining these models is usually complex. In environments with uncertainty, actions may produce countless outcomes and hence, specifying them and their probabilities is a hard task. As a consequence, when implementing agents with planning capabilities, practitioners frequently opt for architectures that interleave classical planning and execution monitoring following a replan-on-failure paradigm. Though this approach is more practical, it may produce fragile plans that require continual replanning episodes or, even worse, that end in execution dead-ends. In this paper, we propose a new architecture to relieve these shortcomings. The architecture is based on the integration of a relational learning component with the traditional planning and execution monitoring components. The new component allows the architecture to learn probabilistic rules of the success of actions from the execution of plans and to automatically upgrade the planning model with these rules. The upgraded models can be used by any classical planner that handles metric functions or, alternatively, by any probabilistic planner. This architecture proposal is designed to integrate off-the-shelf interchangeable planning and learning components, so it can profit from the latest advances in both fields without modifying the architecture.
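A minimal sketch of the learning component's role follows, under illustrative assumptions about the trace format and a -log(p) cost encoding that a metric-handling classical planner could minimize.

```python
import math
from collections import defaultdict

# Estimate each action's empirical success probability from execution traces
# and turn it into a metric cost for a classical planner. The trace format
# and the -log(p) encoding are illustrative assumptions.

traces = [("pickup", True), ("pickup", False), ("move", True),
          ("pickup", True), ("move", True), ("stack", False)]

succ = defaultdict(int)
total = defaultdict(int)
for action, ok in traces:
    total[action] += 1
    succ[action] += ok

for action in total:
    # Laplace smoothing keeps probabilities away from 0 and 1.
    p = (succ[action] + 1) / (total[action] + 2)
    cost = -math.log(p)   # low-success actions become expensive to plan with
    print(f"{action}: P(success)={p:.2f}, metric cost={cost:.2f}")
```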

6.
Most state-of-the-art navigation systems for autonomous service robots decompose navigation into global navigation planning and local reactive navigation. While the methods for navigation planning and local navigation themselves are well understood, the plan execution problem, that is, the problem of how to generate and parameterize local navigation tasks from a given navigation plan, remains largely unsolved.

This paper describes how a robot can autonomously learn to execute navigation plans. We formalize the problem as a Markov Decision Process (MDP) and derive a decision-theoretic action selection function from it. The action selection function employs models of the robot's navigation actions, which are autonomously acquired from experience using neural networks or regression tree learning algorithms. We show, both in simulation and on an RWI B21 mobile robot, that the learned models together with the derived action selection function achieve competent navigation behavior.
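The derived action selection function can be pictured as scoring candidate parameterizations of the local navigation task with a learned outcome model and choosing the best; in the sketch below the model is a stub, where the paper would use a neural network or regression tree trained on the robot's own experience.

```python
# Decision-theoretic action selection: score each candidate parameterization
# of the local navigation task with a (stub) learned model of its expected
# outcome, and pick the best. Candidate parameters are illustrative.

CANDIDATES = [
    {"target_speed": 0.3, "clearance": 0.5},
    {"target_speed": 0.6, "clearance": 0.3},
    {"target_speed": 0.9, "clearance": 0.2},
]

def predicted_time(params):
    """Stub model: faster is better, but low clearance predicts delays."""
    return 10.0 / params["target_speed"] + 8.0 * (0.5 - params["clearance"])

def select_navigation_action(candidates):
    # Decision-theoretic choice: minimize the model's expected travel time.
    return min(candidates, key=predicted_time)

print(select_navigation_action(CANDIDATES))
```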


7.
8.
When we negotiate, the arguments uttered to persuade the opponent are not the result of an isolated analysis, but of an integral view of the problem we want to reach agreement about. Before the negotiation starts, we have in mind what arguments we can utter, which opponents we can persuade, and which negotiations can finish successfully and which cannot. Thus, we plan the negotiation and, in particular, the argumentation. This allows us to make decisions in advance and to start the negotiation more confidently. With this in mind, we claim that this planning can be exploited by an autonomous agent. Agents plan the actions that they should execute to achieve their goals. In these plans, some actions are under the agent's control, while some others are not. The latter must be negotiated with other agents. Negotiation is usually carried out during plan execution. In our opinion, however, negotiation can be considered during the planning stage, as in real life. In this paper, we present a novel approach to integrate argumentation-based negotiation planning into the general planning process of an autonomous agent. This integration allows the agent to make key decisions in advance. We evaluated this proposal in a multiagent scenario by comparing the performance of agents that plan the argumentation with that of agents that do not. These evaluations demonstrate that performance improves when the argumentation is planned, especially when the number of negotiation alternatives increases.

9.
This paper presents a method of solving planning problems that involve actions whose effects change according to the situations in which they are performed. The approach is an extension of the conventional planning methodology in which plans are constructed through an iterative process of scanning for goals that are not yet satisfied, inserting actions to achieve them, and introducing subgoals to these actions. This methodology was originally developed under the assumption that one would be dealing exclusively with actions that produce the same effects in every situation. The extension involves introducing additional subgoals to actions above and beyond the preconditions of execution normally introduced. These additional subgoals, called secondary preconditions, ensure that the actions are performed in contexts conducive to producing the effects we desire. This paper defines and analyzes secondary preconditions from a mathematically rigorous standpoint and demonstrates how they can be derived from regression operators.
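A small worked example shows how a secondary precondition falls out of goal regression: an action with a context-dependent effect only guarantees the goal when its context holds, so regression adds that context as an extra subgoal. The frozenset-of-literals representation and the paint action below are illustrative simplifications.

```python
# paint(x): precondition {holding_brush}; effect painted(x) only if have_paint.
action = {
    "preconds": frozenset({"holding_brush"}),
    "conditional_effects": [
        (frozenset({"have_paint"}), frozenset({"painted"})),  # (context, effect)
    ],
}

def regress(goal, action):
    """Return the subgoals that must hold before the action to achieve `goal`."""
    subgoals = set(action["preconds"])          # ordinary preconditions
    for context, effect in action["conditional_effects"]:
        if goal & effect:
            subgoals |= context                 # secondary precondition
    return subgoals

# Regressing painted(x) through paint(x) yields have_paint as a secondary
# precondition alongside the primary precondition holding_brush.
print(regress({"painted"}, action))
```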

10.
Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. In many CPPP algorithms, the individual agents reason about a projection of the multi-agent problem onto a single-agent classical planning problem. For example, an agent can plan as if it controls the public actions of other agents, ignoring any private preconditions and effects these actions may have, and use the cost of this plan as a heuristic estimate of the cost of the full, multi-agent plan. Using such a projection, however, ignores some dependencies between agents' public actions. In particular, it does not contain dependencies between public actions of other agents caused by their private facts. We propose a projection in which these private dependencies are maintained. The benefit of our dependency-preserving projection is demonstrated by using it to produce high-level plans in a new privacy-preserving planner, and as a heuristic for guiding forward-search privacy-preserving algorithms. Both are able to solve more benchmark problems than any other state-of-the-art privacy-preserving planner. This more informed projection does not explicitly expose any private fact, action, or precondition. In addition, we show that even if an adversary agent knows that an agent has some private objects of a given type (e.g., trucks), it cannot infer the number of such private objects that the agent controls. This introduces a novel form of strong privacy, which we call object-cardinality privacy, that is motivated by real-world requirements.

11.
Quad-Q-learning     
Develops the theory of quad-Q-learning, a learning algorithm that evolved from Q-learning. Quad-Q-learning is applicable to problems that can be solved by "divide and conquer" techniques. Quad-Q-learning concerns an autonomous agent that learns without supervision to act optimally to achieve specified goals. The learning agent acts in an environment that can be characterized by a state. In the Q-learning environment, when an action is taken, a reward is received and a single new state results. The objective of Q-learning is to learn a policy function that maps states to actions so as to maximize a function of the rewards, such as the sum of rewards. In quad-Q-learning, by contrast, when an action is taken from a state, either an immediate reward is received and no new state results, or no reward is received and four new states result. The environment in which quad-Q-learning operates can thus be viewed as a hierarchy of states where lower-level states are the children of higher-level states. The hierarchical aspect of quad-Q-learning leads to a bottom-up view of learning that improves the efficiency of learning at higher levels in the hierarchy. The objective of quad-Q-learning is to maximize the sum of rewards obtained from each of the environments that result as actions are taken. Two versions of quad-Q-learning are discussed: discrete state, and mixed discrete and continuous state quad-Q-learning. The discrete state version is only applicable to problems with small numbers of states; scaling up to problems with practical numbers of states requires a continuous state learning method, which can be accomplished using function approximation. Application of quad-Q-learning to image compression is briefly described.
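The structure of the backup can be sketched as follows: each state offers a terminal action with an immediate reward and a divide action whose value is the sum of the four children's best values. This toy computes those values bottom-up by recursion rather than learning them from samples, and the state encoding and rewards are illustrative assumptions.

```python
# A quadtree-flavored sketch of the quad-Q backup: from each state the agent
# either takes a terminal action (immediate reward, no successor) or a
# 'divide' action (no reward, four child states). The value of 'divide' is
# the sum of the children's best values, matching the objective of maximizing
# total reward over all resulting environments.

def terminal_reward(state):
    """Stub reward for handling `state` directly (e.g. coding an image block)."""
    depth = state.count(".")
    return 1.0 + depth          # pretend smaller blocks are cheaper to code

def children(state):
    """The four sub-states produced by the 'divide' action."""
    return [state + "." + q for q in ("nw", "ne", "sw", "se")]

def best_value(state, max_depth=2):
    """Bottom-up value: max over the terminal action and the 'divide' action."""
    v_terminal = terminal_reward(state)
    if state.count(".") >= max_depth:
        return v_terminal
    v_divide = sum(best_value(c, max_depth) for c in children(state))
    return max(v_terminal, v_divide)

print(best_value("root"))
```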

12.
Research on a Behavior-Based Agent Architecture for Autonomous Micro Mobile Robots
This paper proposes a robot agent architecture that models the human processes of learning and evolution. Starting from basic behaviors designed in advance by the developers, the micro robot uses reinforcement learning and the evolution of group behaviors to autonomously construct concrete actions that satisfy the task requirements and adapt to the actual environment. This overcomes the limitations of traditional symbol-based artificial intelligence approaches, in which the designers' insufficient knowledge of the external environment and the task constrains the robot, and makes the robot's actions better suited to the environment and the task requirements.

13.
Multiagent learning provides a promising paradigm to study how autonomous agents learn to achieve coordinated behavior in multiagent systems. In multiagent learning, the concurrency of multiple distributed learning processes makes the environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents' behavior in this dynamic environment is a difficult problem, especially when agents do not know the domain structure and at the same time have only local observability of the environment. In this paper, a coordinated learning approach is proposed to enable agents to learn where and how to coordinate their behavior in loosely coupled multiagent systems where the sparse interactions of agents constrain coordination to some specific parts of the environment. In the proposed approach, an agent first collects statistical information to detect those states where coordination is most necessary, by considering not only the potential contributions from all the domain states but also the direct causes of the miscoordination in a conflicting state. The agent then learns to coordinate its behavior with others through its local observability of the environment according to different scenarios of state transitions. To handle the uncertainties caused by agents' local observability, an optimistic estimation mechanism is introduced to guide the learning process of the agents. Empirical studies show that the proposed approach achieves better performance, improving the average agent reward compared with an uncoordinated learning approach and significantly reducing the computational complexity compared with a centralized learning approach.
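The detection step can be sketched as simple per-state reward bookkeeping: flag states whose observed rewards fall well short of the single-agent expectation as candidate coordination states. The threshold and sample data below are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Collect reward statistics per state and flag states whose observed rewards
# deviate sharply from the single-agent expectation, treating them as
# candidate coordination states where richer (joint) learning is worthwhile.

expected = {"corridor": 1.0, "doorway": 1.0, "open_area": 1.0}
observed = defaultdict(list)

# Samples gathered while acting: collisions in the doorway depress reward.
samples = [("corridor", 1.0), ("doorway", -5.0), ("open_area", 1.0),
           ("doorway", 1.0), ("doorway", -5.0), ("corridor", 0.9)]
for state, r in samples:
    observed[state].append(r)

coordination_states = {
    s for s, rs in observed.items()
    if expected[s] - mean(rs) > 1.0   # large shortfall hints at miscoordination
}
print(coordination_states)   # -> {'doorway'}
```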

14.
Džeroski, Sašo; De Raedt, Luc; Driessens, Kurt. Machine Learning, 2001, 43(1-2): 7-52.
This paper presents relational reinforcement learning, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can potentially be applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from the specific goals pursued, and to exploit the results of previous learning phases when addressing new (more complex) situations.
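The representational idea can be sketched by keying Q-values on a relational abstraction of the state, so that values learned with one set of blocks transfer to structurally identical situations with different blocks; the abstraction function below is an illustrative stand-in for the system's logical machinery, not its actual regression-tree representation.

```python
# Describe states by relations and roles rather than object identities, so
# value estimates learned for on(a, b) situations transfer to on(c, d)
# situations and to new goals.

def abstract(state, goal):
    """Rename blocks by their role w.r.t. the goal on(X, Y)."""
    x, y = goal
    rename = {x: "X", y: "Y"}
    return frozenset(
        (fact[0],) + tuple(rename.get(arg, "other") for arg in fact[1:])
        for fact in state
    )

q_table = {}

# Experience with blocks a, b ...
s1 = {("on", "a", "table"), ("on", "b", "table"), ("clear", "a")}
q_table[(abstract(s1, ("a", "b")), "stack(X,Y)")] = 0.9

# ... generalizes to the structurally identical situation with blocks c, d.
s2 = {("on", "c", "table"), ("on", "d", "table"), ("clear", "c")}
print(q_table.get((abstract(s2, ("c", "d")), "stack(X,Y)")))   # -> 0.9
```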

15.
Conformant planning refers to planning for unobservable problems whose solutions, as in classical planning, are linear sequences of operators called linear plans. The term 'conformant' is automatically associated with both the unobservable planning model and with linear plans, mainly because the only possible solutions for unobservable problems are linear plans. In this paper we show that linear plans are not only meaningful for unobservable problems but also for partially-observable problems. In such a case, the execution of a linear plan generates observations from the environment which must be collected by the agent during the execution of the plan and used at the end in order to determine whether the goal has been achieved or not; this is the typical case in problems of diagnosis in which all the actions are knowledge-gathering actions. Thus, there are substantial differences between linear plans for unobservable or fully-observable problems and those for partially-observable problems: while linear plans for the former models must conform with properties in state space, linear plans for partially-observable problems must conform with properties in belief space. These differences surface when the problems are allowed to express epistemic goals and conditions using modal logic, and place the plan-existence decision problem in different complexity classes. Linear plans are one extreme point in a discrete spectrum of solution forms for planning problems. The other extreme point is contingent plans, in which there is a branch point for every possible observation at each time step, and thus the number of branch points is not bounded a priori. In the middle of the spectrum there are plans with a bounded number of branch points. Thus, linear plans are plans with zero branch points and contingent plans are plans with an unbounded number of branch points. In this work, we lay down foundations and principles for the general treatment of linear plans and plans of bounded branching, and provide exact complexity results for novel decision problems. We also show that linear plans for partially-observable problems are not only of theoretical interest, since some challenging real-life problems can be dealt with using them.

16.
Automatic motion planning is one of the basic modules that are needed to increase robot intelligence and usability. Unfortunately, the inherent complexity of motion planning has rendered traditional search algorithms incapable of solving every problem in real time. To circumvent this difficulty, we explore the alternative of allowing human operators to participate in the problem solving process. By having the human operator teach during difficult motion planning episodes, the robot should be able to learn and improve its own motion planning capability and gradually reduce its reliance on the human operator. In this paper, we present such a learning framework in which both human and robot can cooperate to achieve real-time automatic motion planning. To enable a deeper understanding of the framework in terms of performance, we present it as a simple learning algorithm and provide theoretical analysis of its behavior. In particular, we characterize the situations in which learning is useful, and provide quantitative bounds to predict the necessary training time and the maximum achievable speedup in planning time.
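The cooperation loop can be sketched as: try to plan within a real-time budget, ask the human for a demonstration on failure, and store the teaching so similar future queries need no human help. Everything named below is a stub under those assumptions, not the paper's actual framework.

```python
library = {}   # query -> demonstrated path

def robot_planner(query, budget_s=0.05):
    """Stub planner: only 'easy' queries are solvable within the budget."""
    return [query, "goal"] if query.startswith("easy") else None

def solve(query, ask_human):
    if query in library:               # learned: answer from stored teaching
        return library[query]
    path = robot_planner(query)        # try to solve it autonomously
    if path is None:                   # hard episode: fall back to the human
        path = ask_human(query)
        library[query] = path          # learn, reducing future reliance
    return path

demo = lambda q: [q, "via_human", "goal"]
print(solve("hard-narrow-passage", demo))   # human teaches
print(solve("hard-narrow-passage", demo))   # now answered from the library
print(solve("easy-open-space", demo))       # solved autonomously
```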

17.
When a reinforcement learning agent executes actions that can cause frequent damage to itself, it can learn, by using Q-learning, that these actions must not be executed again. However, there are other actions that do not cause damage frequently but only once in a while, for example, risky actions such as parachuting. These actions may bring punishment to the agent and, depending on its personality, it may be better to avoid them. Nevertheless, using the standard Q-learning algorithm, the agent is not able to learn to avoid them, because the result of these actions can be positive on average. In this article, an additional mechanism for Q-learning, inspired by the emotion of fear, is introduced in order to deal with those risky actions by considering their worst results. Moreover, a daring factor adjusts how strongly the risk is taken into account. This mechanism is implemented on an autonomous agent living in a virtual environment. The results present the performance of the agent with different daring degrees.
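One plausible reading of the mechanism is a second, pessimistic value table updated toward the worst observed outcome and blended with the usual Q-value by the daring factor; the update rules and parameters below are illustrative assumptions, not the article's exact formulation.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)   # usual expected-return estimate
W = defaultdict(float)   # pessimistic estimate tracking the worst reward seen

def update(s, a, r, s_next, actions):
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])   # standard Q-learning update
    W[(s, a)] = min(W[(s, a)], r)               # fear: remember the worst outcome

def score(s, a, daring):
    # daring=1 ignores fear entirely; daring=0 acts on the worst case alone.
    return daring * Q[(s, a)] + (1.0 - daring) * W[(s, a)]

# 'parachute' is positive on average but occasionally disastrous.
for r in [5.0, 5.0, 5.0, -8.0, 5.0]:
    update("cliff", "parachute", r, "ground", ["walk"])
for r in [1.0, 1.0, 1.0, 1.0, 1.0]:
    update("cliff", "walk", r, "ground", ["walk"])

for daring in (1.0, 0.2):
    best = max(["parachute", "walk"], key=lambda a: score("cliff", a, daring))
    print(f"daring={daring}: choose {best}")   # bold agent jumps, timid one walks
```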

18.
The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays, and continuous state features. In this article, we present texplore, the first algorithm to address all of these challenges together. texplore is a model-based RL method that learns a random forest model of the domain which generalizes dynamics to unseen states. The agent explores states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, texplore can select actions continually in real-time whenever necessary. We empirically evaluate the importance of each component of texplore in isolation and then demonstrate the complete algorithm learning to control the velocity of an autonomous vehicle in real-time.
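The model-learning idea can be sketched with an off-the-shelf random forest trained to predict state changes from state-action pairs, which a sample-based planner could then query; the 1-D toy domain and model choices below are illustrative assumptions, not texplore's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Experience tuples from a toy velocity-control domain: action 0 brakes,
# action 1 accelerates; the model is trained on deltas, not absolute states,
# so the learned dynamics generalize to unseen states.
states = rng.uniform(0, 10, size=(300, 1))
actions = rng.integers(0, 2, size=(300, 1))
deltas = np.where(actions == 1, 0.5, -0.5) + rng.normal(0, 0.05, (300, 1))

model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(np.hstack([states, actions]), deltas.ravel())

def sample_next(state, action):
    """Sampled transition for use inside a rollout-based planner."""
    return state + model.predict([[state, action]])[0]

print(sample_next(5.0, 1))   # ~5.5: the forest generalizes the +0.5 dynamics
```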

19.
20.
This paper presents a system-theoretic approach to the general problem of autonomous planning under uncertainty. The autonomous planning problem involves an automaton (an autonomous machine) which interacts with an environment via a set of unreliable control and sensing operations. The task assigned to the automaton is to plan and execute a sequence of control and sensing operations which changes the state of the environment in a desirable way.

The paper introduces the concept of an Uncertainty Machine which models the propagation of the information from the environment to the automaton during the execution of the plans. Based on the concept of an uncertainty machine, mathematical expressions are presented for the active-sensing, the passive-sensing and the control (sensorless) entropies of an arbitrary execution instance. These entropies are shown to be useful measures of the automaton's ability to utilize its control and sensing resources in reducing its uncertainty.
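As a concrete illustration of an entropy measure of this kind, the sketch below computes the Shannon entropy of a belief over environment states before and after a Bayes update on one unreliable sensor reading; the two-state environment and sensor model are illustrative assumptions, not the paper's formalism.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief over environment states."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

belief = {"door_open": 0.5, "door_closed": 0.5}    # maximal uncertainty
sensor = {"door_open": 0.9, "door_closed": 0.2}    # P(reads 'open' | state)

# Bayes update after the (unreliable) sensor reads 'open'.
joint = {s: belief[s] * sensor[s] for s in belief}
z = sum(joint.values())
posterior = {s: p / z for s, p in joint.items()}

print(f"before sensing: {entropy(belief):.3f} bits")
print(f"after sensing:  {entropy(posterior):.3f} bits")
```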

In a companion paper [7], the concept of an uncertainty machine is utilized to synthesize strategies which enable the automaton to actively explore the environment.

