Related Articles (20 results)
1.
Elevator Group Control Using Multiple Reinforcement Learning Agents
Crites, Robert H.; Barto, Andrew G. Machine Learning, 1998, 33(2–3): 235–262
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility.
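The setup described above — a team of independent learners, each running its own RL update but trained on one team-level reward — can be sketched as follows. This is a minimal toy illustration, not the paper's elevator simulator: the `QAgent` class, the 4-state cyclic environment, and the cost function are all assumptions made for the example.

```python
import random

random.seed(0)

class QAgent:
    """One independent tabular Q-learner; each agent controls its own actions."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, s, a, global_reward, s_next):
        # Every agent trains on the same team-level reward, which looks
        # noisy to each one because it also reflects the others' actions.
        target = global_reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

# Three agents, a toy 4-state cyclic environment, and a shared global cost.
agents = [QAgent(n_states=4, n_actions=2) for _ in range(3)]
state = 0
for _ in range(100):
    actions = [ag.act(state) for ag in agents]
    reward = -sum(actions)            # global cost shared by the whole team
    next_state = (state + 1) % 4
    for ag, a in zip(agents, actions):
        ag.update(state, a, reward, next_state)
    state = next_state
```

Each agent sees only its own action yet is credited with the joint outcome, which is exactly the noisy-reward complication the abstract describes.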

2.
Reinforcement Learning in the Multi-Robot Domain
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.

3.
Multiagent Systems: A Survey from a Machine Learning Perspective
Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is concerned with systems that consist of multiple independent entities that interact in a domain. Traditionally, DAI has been divided into two sub-disciplines: Distributed Problem Solving (DPS) focuses on the information management aspects of systems with several components working together towards a common goal; Multiagent Systems (MAS) deals with behavior management in collections of several independent entities, or agents. This survey of MAS is intended to serve as an introduction to the field and as an organizational framework. A series of general multiagent scenarios are presented. For each scenario, the issues that arise are described along with a sampling of the techniques that exist to deal with them. The presented techniques are not exhaustive, but they highlight how multiagent systems can be and have been used to build complex systems. When options exist, the techniques presented are biased towards machine learning approaches. Additional opportunities for applying machine learning to MAS are highlighted and robotic soccer is presented as an appropriate test bed for MAS. This survey does not focus exclusively on robotic systems. However, we believe that much of the prior research in non-robotic MAS is relevant to robotic MAS, and we explicitly discuss several robotic MAS, including all of those presented in this issue.

4.
Mixed-initiative problem solving lies at the heart of knowledge-based learning environments. While learners are actively engaged in problem-solving activities, learning environments should monitor their progress and provide them with feedback in a manner that contributes to achieving the twin goals of learning effectiveness and learning efficiency. Mixed-initiative interactions are particularly critical for constructivist learning environments in which learners participate in active problem solving. We have recently begun to see the emergence of believable agents with lifelike qualities. Featured prominently in constructivist learning environments, lifelike pedagogical agents could couple key feedback functionalities with a strong visual presence by observing learners' progress and providing them with visually contextualized advice during mixed-initiative problem solving. For the past three years, we have been engaged in a large-scale research program on lifelike pedagogical agents and their role in constructivist learning environments. In the resulting computational framework, lifelike pedagogical agents are specified by (1) a behavior space containing animated and vocal behaviors, (2) a design-centered context model that maintains constructivist problem representations, multimodal advisory contexts, and evolving problem-solving tasks, and (3) a behavior sequencing engine that in real time dynamically selects and assembles agents' actions to create pedagogically effective, lifelike behaviors. To empirically investigate this framework, it has been instantiated in a full-scale implementation of a lifelike pedagogical agent for Design-A-Plant, a learning environment developed for the domain of botanical anatomy and physiology for middle school students. Experience with focus group studies conducted with middle school students interacting with the implemented agent suggests that lifelike pedagogical agents hold much promise for mixed-initiative learning.

5.
Agents in a competitive interaction can greatly benefit from adapting to a particular adversary, rather than using the same general strategy against all opponents. One method of such adaptation is Opponent Modeling, in which a model of an opponent is acquired and utilized as part of the agent's decision procedure in future interactions with this opponent. However, acquiring an accurate model of a complex opponent strategy may be computationally infeasible. In addition, if the learned model is not accurate, then using it to predict the opponent's actions may potentially harm the agent's strategy rather than improving it. We thus define the concept of opponent weakness, and present a method for learning a model of this simpler concept. We analyze examples of past behavior of an opponent in a particular domain, judging its actions using a trusted judge. We then infer a weakness model based on the opponent's actions relative to the domain state, and incorporate this model into our agent's decision procedure. We also make use of a similar self-weakness model, allowing the agent to prefer states in which the opponent is weak and our agent strong, where we have a relative advantage over the opponent. Experimental results spanning two different test domains demonstrate the agent's improved performance when making use of the weakness models.

6.
Conventional robot control schemes are basically model-based methods. However, exact modeling of robot dynamics poses considerable problems and faces various uncertainties in task execution. This paper proposes a reinforcement learning control approach for overcoming such drawbacks. An artificial neural network (ANN) serves as the learning structure, and a stochastic real-valued (SRV) unit is applied as the learning method. Initially, force tracking control of a two-link robot arm is simulated to verify the control design. The simulation results confirm that even without information related to the robot dynamic model and environment states, operation rules for simultaneously controlling force and velocity are achievable by repetitive exploration. Hitherto, however, an acceptable performance has demanded many learning iterations and the learning speed proved too slow for practical applications. The approach herein, therefore, improves the tracking performance by combining a conventional controller with a reinforcement learning strategy. Experimental results demonstrate improved trajectory tracking performance of a two-link direct-drive robot manipulator using the proposed method.

7.
A model is developed of the emergence of the knowledge level in a society of agents where agents model and manage other agents as resources, and manage the learning of other agents to develop such resources. It is argued that any persistent system that actively creates the conditions for its persistence is appropriately modeled in terms of the rational teleological models that Newell defines as characterizing the knowledge level. The need to distribute tasks in agent societies motivates such modeling, and it is shown that if there is a rich order relationship of difficulty on tasks that is reasonably independent of agents then it is efficient to model agents' competencies in terms of their possessing knowledge. It is shown that a simple training strategy of keeping an agent's performance constant by allocating tasks of increasing difficulty as an agent adapts optimizes the rate of learning and linearizes the otherwise sigmoidal learning curves. It is suggested that this provides a basis for assigning a granularity to knowledge that enables learning processes to be managed simply and efficiently.
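The training strategy in this entry — hold the agent's success rate constant by raising task difficulty as competence grows — can be illustrated with a toy simulation. The logistic competence model, the learning-gain rule, and every parameter value below are assumptions made for the example, not taken from the paper; the point is only that matching difficulty to competence yields a linear, rather than sigmoidal, growth curve.

```python
import math

def success_prob(competence, difficulty):
    # Logistic model: higher competence or lower difficulty
    # makes success more likely (illustrative assumption).
    return 1.0 / (1.0 + math.exp(difficulty - competence))

competence = 0.0
target_rate = 0.5
history = []
for step in range(50):
    # Choose the difficulty that pins the success rate at target_rate.
    difficulty = competence - math.log(target_rate / (1.0 - target_rate))
    p = success_prob(competence, difficulty)   # stays at target_rate
    competence += 0.2 * p                      # gain proportional to success
    history.append(competence)
```

Because difficulty tracks competence, each step adds the same increment, so `history` grows linearly — the linearized learning curve the abstract describes.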

8.
9.
Genetic Reinforcement Learning for Neurocontrol Problems
Empirical tests indicate that at least one class of genetic algorithms yields good performance for neural network weight optimization in terms of learning rates and scalability. The successful application of these genetic algorithms to supervised learning problems sets the stage for the use of genetic algorithms in reinforcement learning problems. On a simulated inverted-pendulum control problem, genetic reinforcement learning produces competitive results with AHC, another well-known reinforcement learning paradigm for neural networks that employs the temporal difference method. These algorithms are compared in terms of learning rates, performance-based generalization, and control behavior over time.
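Genetic search over a network's weight vector, as used in this entry, can be sketched in a few lines. This is a generic illustration under stated assumptions: the fitness function is a stand-in for an episode return (the paper uses pole-balancing performance), and the truncation selection, one-point crossover, and Gaussian mutation operators are common choices, not necessarily the paper's.

```python
import random

random.seed(1)

def fitness(weights):
    # Stand-in for an episode return; here, weights near 0.5 score best.
    return -sum((w - 0.5) ** 2 for w in weights)

def mutate(weights, sigma=0.1):
    # Gaussian perturbation of every weight.
    return [w + random.gauss(0.0, sigma) for w in weights]

def crossover(a, b):
    # One-point crossover of two weight vectors.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Evolve a population of 4-weight vectors for 50 generations.
population = [[random.random() for _ in range(4)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                   # truncation selection
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(10)]
    population = parents + children             # elitist replacement

best = max(population, key=fitness)
```

Because the top half of each generation survives unchanged, the best fitness never decreases, which is the elitism that makes this simple scheme workable for weight optimization.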

10.
Coordinating Multiple Agents via Reinforcement Learning
In this paper, we attempt to use reinforcement learning techniques to solve agent coordination problems in task-oriented environments. The Fuzzy Subjective Task Structure model (FSTS) is presented to model general agent coordination. We show that an agent coordination problem modeled in FSTS is a Decision-Theoretic Planning (DTP) problem, to which reinforcement learning can be applied. Two learning algorithms, coarse-grained and fine-grained, are proposed to address agents' coordination behavior at two different levels. The coarse-grained algorithm operates at one level and tackles hard system constraints; the fine-grained algorithm operates at another level and handles soft constraints. We argue that it is important to explicitly model and exploit coordination-specific information (particularly system constraints), which underpins the two algorithms and contributes to their effectiveness. The algorithms are formally proved to converge and experimentally shown to be effective.

11.
Since real-time search provides an attractive framework for resource-bounded problem solving, this paper extends the framework for autonomous agents and for a multiagent world. To adaptively control search processes, we propose ε-search, which allows suboptimal solutions within an error bound ε, and δ-search, which balances the tradeoff between exploration and exploitation. We then consider search in uncertain situations, where the goal may change during the course of the search, and propose a moving target search (MTS) algorithm. We also investigate real-time bidirectional search (RTBS) algorithms, where two problem solvers cooperatively achieve a shared goal. Finally, we introduce a new problem solving paradigm, called organizational problem solving, for multiagent systems.

12.
The decentralized navigation function methodology, established in our previous work for navigation of multiple holonomic agents with global sensing capabilities is extended to the case of local sensing capabilities. Each agent plans its actions without knowing the destinations of the others and the positions of those agents lying outside its sensing neighborhood. The stability properties of the closed loop system are checked via Lyapunov stability techniques for nonsmooth systems. The collision avoidance and global convergence properties are verified through simulations. This work was partially presented in [5].

13.
Design of an autonomous agricultural robot
This paper presents a state-of-the-art review of the development of autonomous agricultural robots, including guidance systems, greenhouse autonomous systems and fruit-harvesting robots. A general concept for a field-crops robotic machine to selectively harvest easily bruised fruit and vegetables is designed, and future trends that must be pursued in order to make robots a viable option for agricultural operations are identified. A prototype machine which includes part of this design has been implemented for melon harvesting. The machine consists of a Cartesian manipulator mounted on a mobile chassis pulled by a tractor. Two vision sensors are used to locate the fruit and guide the robotic arm toward it. A gripper grasps the melon and detaches it from the vine. The real-time control hardware architecture consists of a blackboard system, with autonomous modules for sensing, planning and control connected through a PC bus. Approximately 85% of the fruit are successfully located and harvested.

14.
Autonomous control systems are designed to perform well under significant uncertainties in the system and environment for extended periods of time, and they must be able to compensate for system failures without external intervention. Intelligent autonomous control systems use techniques from the field of artificial intelligence to achieve this autonomy. Such control systems evolve from conventional control systems by adding intelligent components, and their development requires interdisciplinary research. A hierarchical functional intelligent autonomous control architecture is introduced here and its functions are described in detail. The fundamental issues in autonomous control system modelling and analysis are discussed.

15.
We describe a framework and equations used to model and predict the behavior of multi-agent systems (MASs) with learning agents. A difference equation is used for calculating the progression of an agent's error in its decision function, thereby telling us how the agent is expected to fare in the MAS. The equation relies on parameters which capture the agent's learning abilities, such as its change rate, learning rate and retention rate, as well as relevant aspects of the MAS such as the impact that agents have on each other. We validate the framework with experimental results using reinforcement learning agents in a market system, as well as with other experimental results gathered from the AI literature. Finally, we use PAC theory to show how to calculate bounds on the values of the learning parameters.
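Iterating a difference equation of the kind this entry describes is straightforward. The concrete form below is an illustrative assumption, not the paper's equation: learning and retention shrink the current error, while a nonzero change rate (the environment shifting as other agents adapt) reinjects error each step, so the error settles at a positive fixed point instead of vanishing.

```python
def step_error(e, change_rate=0.05, learning_rate=0.4, retention=0.9):
    # One step of an assumed error-progression equation: learning and
    # retention reduce the error, then environmental change restores some.
    e_after_learning = (1.0 - learning_rate) * retention * e
    return e_after_learning + change_rate * (1.0 - e_after_learning)

e = 0.5                 # initial decision-function error
trace = []
for _ in range(30):
    e = step_error(e)
    trace.append(e)
```

With these parameters the map is a contraction, so the error decreases monotonically toward a fixed point near 0.10 rather than to zero, mirroring the qualitative prediction that a learner in a changing MAS retains residual error.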

16.
This paper examines the performance of simple reinforcement learning algorithms in a stationary environment and in a repeated game where the environment evolves endogenously based on the actions of other agents. Some types of reinforcement learning rules can be extremely sensitive to small changes in the initial conditions; consequently, events early in a simulation can affect the performance of the rule over a relatively long time horizon. However, when multiple adaptive agents interact, algorithms that performed poorly in a stationary environment often converge rapidly to stable aggregate behaviors despite the slow and erratic behavior of individual learners. Algorithms that are robust in stationary environments can exhibit slow convergence in an evolving environment.

17.
This paper presents a new approach to the intelligent navigation of a mobile robot. The hybrid control architecture described combines properties of purely reactive and behaviour-based systems, providing the ability both to learn automatically behaviours from inception, and to capture these in a distributed hierarchy of decision tree networks. The robot is first trained in the simplest world which has no obstacles, and is then trained in successively more complex worlds, using the knowledge acquired in the previous worlds. Each world representing the perceptual space is thus directly mapped on a unique rule layer which represents in turn the robot action space encoded in a distinct decision tree. A major advantage of the current implementation, compared with the previous work, is that the generated rules are easily understood by human users. The paper demonstrates that the proposed behavioural decomposition approach provides efficient management of complex knowledge, and that the learning mechanism is able to cope with noise and uncertainty in sensory data.

18.
Researchers of artificial intelligence in education have been developing adaptive learning material for complex domains such as programming languages, mathematics, medicine, physics, avionics trouble shooting, pulp and paper mill factories, and electronics. The actual learning material is itself, however, only part of the total Learning Environment (LE) within which learning takes place. This paper presents an extension to the brief overview of an LE first described by Sandberg and Barnard (1993) and Sandberg (1994), and later augmented by Schneider and Peraya (1995). The LE is presented as a conceptual glue which binds several areas of research in an effort to provide a complete and cohesive environment within which the learner is central.

19.
Bennett, Scott W.; DeJong, Gerald F. Machine Learning, 1996, 23(2–3): 121–161
In executing classical plans in the real world, small discrepancies between a planner's internal representations and the real world are unavoidable. These can conspire to cause real-world failures even though the planner is sound and, therefore, proves that a sequence of actions achieves the goal. Permissive planning, a machine learning extension to classical planning, is one response to this difficulty. This paper describes the permissive planning approach and presents GRASPER, a permissive planning robotic system that learns to robustly pick up novel objects.

20.
Research on the Stability of Coordinated Machine Learning
In traditional machine learning methods, the learning process does not affect the system being learned, and the system being learned is usually not modifiable. The coordinated machine learning system proposed in this paper studies the learner and the learned system as an integrated whole, further enriching and developing the fundamental content of machine learning.
