The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit the concept of autonomous agents very well: the driver, the pedestrian, the traffic expert; in some cases, the intersection and the traffic signal controller can also be regarded as autonomous agents. However, the “agentification” of a transportation system is associated with some challenging issues: the number of agents is high; agents are typically highly adaptive; they react to changes in the environment at the individual level but cause an unpredictable collective pattern; and they act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies of animal learning. In this paper, we propose a new RL technique called the Two Level Reinforcement Learning with Communication (2LRL) method to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: at the first level agents learn to select their target, and at the second level they select the action directed to their target. The agents communicate their perception to their neighbors and use the communicated information in their decision-making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior. Guray Erus received the B.S. degree in computer engineering in 1999 and the M.S. degree in cognitive sciences in 2002 from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University, Paris, France, where he is preparing a doctoral dissertation on object detection in satellite images as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding. Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from the Middle East Technical University, Ankara, in 1987 and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at the Computer Science Department of the University of Minnesota, Minneapolis, in 1992–93. His research interests include artificial intelligence, multi-agent systems and object-oriented data models.

We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and “other” (see Definition 4) agents. We start with ReDVaLeR, which can satisfy policy convergence against the first two types and no-regret against the third, but needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification of ReDVaLeR yields a new algorithm, RV σ(t), that simultaneously achieves no-regret payoffs in all games and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents, a corollary of no-regret), without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RV σ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for the regret of RV σ(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF, although, in contrast to those, our analysis is in continuous time. Moreover, experiments show that RV σ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF would fail, and to better equilibria where GIGA and GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play instead of high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.

In the multiagent meeting scheduling problem, agents negotiate with each other on behalf of their users to schedule meetings. While a number of negotiation approaches have been proposed for scheduling meetings, it is not well understood how agents can negotiate strategically in order to maximize their users’ utility. To negotiate strategically, agents need to learn to pick good strategies for negotiating with other agents. In this paper, we show how agents can learn online to negotiate strategically in order to better satisfy their users’ preferences. We outline the applicability of experts algorithms to the problem of learning to select negotiation strategies. In particular, we show how two different experts approaches, plays [3] and Exploration–Exploitation Experts (EEE) [10], can be adapted to the task. We show experimentally the effectiveness of our approach for learning to negotiate strategically.

Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents. This creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, “Win or Learn Fast”, for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
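To make the "Win or Learn Fast" idea concrete, here is a minimal Python sketch of a variable learning rate: a small step when the agent is "winning" and a larger one when it is "losing". The abstract does not say how winning is decided; this sketch assumes a comparison of the current policy's expected value against that of a running average policy. The function names and the two step sizes (`delta_win`, `delta_lose`) are hypothetical and chosen only for illustration.

```python
def expected_value(policy, q_values):
    """Expected payoff of a mixed policy under the given action-value estimates."""
    return sum(p * q for p, q in zip(policy, q_values))


def wolf_step_size(policy, avg_policy, q_values, delta_win=0.01, delta_lose=0.04):
    """Return a cautious step when winning and a larger step when losing.

    Assumption of this sketch: the agent is "winning" when its current
    policy scores better than its own long-run average policy.
    """
    winning = expected_value(policy, q_values) > expected_value(avg_policy, q_values)
    return delta_win if winning else delta_lose
```

The returned value would then scale whatever policy update the learner applies, so the policy changes slowly while it is doing well and quickly while it is doing badly.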

This paper discusses “If multi-agent learning is the answer, what is the question?” [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] from the perspective of evolutionary game theory. We briefly discuss the concepts of evolutionary game theory, and examine the main conclusions from [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] with respect to some of our previous work. Overall we find much to agree with, concluding, however, that the central concerns of multiagent learning are rather narrow compared with the broad variety of work identified in [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue].

Computer simulations are powerful tools frequently used today in many important applications, for example to build safer buildings, to crash-test an automobile before it is built, to stabilize the Pisa tower, to design artificial joints that are comfortable and durable, or to investigate what-if scenarios to avoid and best recover from natural or man-made disasters. The simulation codes have reached a very high level of sophistication and, by running on powerful computing machinery, can accurately track with infinitesimal time steps dozens of physical properties of millions of interacting elements under extreme conditions. In order to take full advantage of the bounty of information concealed in the data produced, visualization is a uniquely powerful tool since it caters to the sense that provides our highest bandwidth connection to the surrounding world. Unfortunately, simulation results are usually examined with graphics and visualization tools that are one or several steps behind the state of the art. We describe our efforts to produce high-fidelity visualizations of the results of large-scale simulations using the latest commercial rendering and animation systems. To this end we built a scalable and reusable link between the software worlds of animation and simulation. Our system also offers a set of tools that allow the results of the simulation to be integrated into the surrounding scene, which is of great importance when the intended audience extends beyond the researchers who designed the simulation. We built our system as part of the efforts of a larger, interdisciplinary team to produce a high-quality, physically accurate visualization of the September 11 attack on the Pentagon.

A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from both brain circuits for hierarchical representation of state and action spaces and learned policies as well as constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analysis results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, which reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and with no social knowledge.

This article presents an intelligent multiagent application system in AI. The research trend in multiagents is changing from a centralized computing environment to a distributed computing environment, and multiagent research can also be extended to a mobile environment. The study of multiagents originated in research into human modeling. Therefore, we first present a brief concept of a mobile multiagent, and then we present some application areas for mobile multiagents, especially in e-learning, bioinformatics, control, and information retrieval. Finally, we present the research theme of multiagents in AI. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, January 25–27, 2007

In general, the purpose of a reinforcement learning system is to learn an optimal policy. On the other hand, in two-player games such as Othello, it is important to acquire a penalty-avoiding policy that can avoid losing the game. The penalty avoiding rational policy making algorithm (PARP) is known to learn such a policy. If we apply PARP to large-scale problems, we are confronted with an explosion of the number of states. In this article, we focus on Othello, a game that has a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello. We show that our learning player beats the well-known Othello program, KITTY. This work was presented, in part, at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002

In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards, by using coordinated exploration. First, two ESRL algorithms, for common interest and conflicting interest games respectively, are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private action space, to give the team of agents the opportunity to look for better solutions in a reduced joint action space. At a later stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL only needs limited communication to learn a fair periodical policy, resulting in a good overall policy. Importantly, ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards, they are flexible in learning different solution concepts, and they can handle both stochastic, possibly delayed rewards and asynchronous action selection. A real-life experiment, adaptive load-balancing of parallel applications, is added.

In this work we investigate the use of a reinforcement learning (RL) framework for the autonomous navigation of a group of mini-robots in a multi-agent collaborative environment. Each mini-robot is driven by inertial forces provided by two vibration motors that are controlled by a simple and efficient low-level speed controller. The action of the RL agent is the direction of each mini-robot, and it is based on the position of each mini-robot, the distance between them and the sign of the distance gradient between each mini-robot and the nearest one. Each mini-robot is considered a moving obstacle that must be avoided by the others. We propose a suitable state space and reward function that result in an efficient collaborative RL framework. The classical and the double Q-learning algorithms are employed, where the latter is used to learn optimal policies for the mini-robots because it offers a more stable and reliable learning process. A simulation environment that includes a group of four mini-robots is created using the ROS framework. The dynamic model of each mini-robot and of the vibration motors is also included. Several application scenarios are simulated and the results are presented to demonstrate the performance of the proposed approach.
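For readers unfamiliar with the difference between the two algorithms named above, the following Python sketch shows a single tabular double Q-learning update. It is a generic textbook-style illustration, not the authors' implementation: the table layout (lists indexed by state, then action), the function name, and the default `alpha` and `gamma` values are assumptions of this sketch.

```python
def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.9):
    """Update table q_a in place, using q_b to evaluate q_a's greedy action.

    Classical Q-learning uses one table both to pick and to evaluate the
    best next action; double Q-learning splits these two roles.
    """
    n_actions = len(q_a[next_state])
    # Action selection uses the table being updated...
    best_next = max(range(n_actions), key=lambda a: q_a[next_state][a])
    # ...while the value of that action is read from the other table.
    target = reward + gamma * q_b[next_state][best_next]
    q_a[state][action] += alpha * (target - q_a[state][action])
```

In use, the caller would typically flip a fair coin at every step and call either `double_q_update(q_a, q_b, ...)` or `double_q_update(q_b, q_a, ...)`, so that both tables are trained.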

Reinforcement learning (RL) is one of the methods of solving problems defined in multiagent systems. In the real world, the state is continuous, and agents take continuous actions. Since conventional RL schemes are often defined to deal with discrete worlds, there are difficulties such as the representation of an RL evaluation function. In this article, we intend to extend an RL algorithm so that it is applicable to continuous world problems. This extension is done by a combination of an RL algorithm and a function approximator. We employ Q-learning as the RL algorithm, and a neural network model called the normalized Gaussian network as the function approximator. The extended RL method is applied to a chase problem in a continuous world. The experimental result shows that our RL scheme was successful. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000

In electronic marketplaces, trust is modeled, for instance, in order to allow buying agents to make effective selection of selling agents. Familiarity is often considered to be an important factor in determining the level of trust. In previous research, familiarity between two agents has been simply assumed to be the similarity between them. We propose an improved familiarity measurement based on the exploration of factors that affect a human’s feelings of familiarity. We also carry out experiments to show that the trust model with our improved familiarity measurement is more effective and more stable.

Consensus algorithms in multiagent cooperative control systems with bounded control input are studied in this paper. Consensus algorithms are considered for single-integrator and double-integrator dynamics under different communication topologies. For single-integrator dynamics, the paper shows that the proposed algorithm reaches consensus asymptotically if the undirected interaction graph is connected or if the directed interaction graph is strongly connected. In addition, the paper shows that the algorithm proposed for double-integrator dynamics reaches consensus asymptotically if the directed interaction graph is strongly connected. The effectiveness of these algorithms is demonstrated through simulations.
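The abstract does not give the control law, so the Python sketch below only illustrates the setting for the single-integrator case. It assumes one common way of bounding the input, passing each neighbor difference through `tanh`, and integrates the dynamics with a plain Euler step on an undirected, connected path graph. The function name, the saturation function, the step size and the example graph are all assumptions of this sketch, not details taken from the paper.

```python
import math


def simulate_consensus(x0, edges, dt=0.01, steps=5000):
    """Euler simulation of x_i' = u_i with u_i = sum_j tanh(x_j - x_i).

    `edges` lists undirected links (i, j). Because |tanh| < 1, each input
    is bounded in magnitude by the number of neighbors of the agent.
    Returns the final states and the largest input magnitude observed.
    """
    x = list(x0)
    neighbors = {i: [] for i in range(len(x))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    max_input = 0.0
    for _ in range(steps):
        u = [sum(math.tanh(x[j] - x[i]) for j in neighbors[i])
             for i in range(len(x))]
        max_input = max(max_input, max(abs(v) for v in u))
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x, max_input


# Four agents on a path graph 0 - 1 - 2 - 3.
final_states, peak_input = simulate_consensus(
    [0.0, 4.0, -2.0, 6.0], [(0, 1), (1, 2), (2, 3)])
```

With this particular odd saturation function on an undirected graph, the inputs cancel pairwise, so the average of the states is preserved and the agents settle near the initial mean while no input ever exceeds the agent's neighbor count.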

This paper considers the problems of formation and obstacle avoidance for multiagent systems. The objective is to design a team of agents that can reach a desired formation while avoiding collision with obstacles. To reduce the amount of information exchanged between the agents and the target, we adopt the leader-follower formation strategy. Using receding horizon control (RHC), an optimization problem is formulated in terms of cost minimization under constraints. Information on obstacles is incorporated online as it is sensed within a limited sensing range. The communication requirement between agents is that the followers obtain the previous optimal control trajectory of the leader at each update time. Stability is guaranteed by adding a terminal-state penalty to the cost function and a terminal-state region to the optimization problem. Finally, simulation studies are provided to verify the effectiveness of the proposed approach.

We study a linear stochastic approximation algorithm that arises in the context of reinforcement learning. The algorithm employs a decreasing step-size, and is driven by Markov noise with time-varying statistics. We show that under suitable conditions, the algorithm can track the changes in the statistics of the Markov noise, as long as these changes are slower than the rate at which the step-size of the algorithm goes to zero.
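As a rough illustration of the kind of recursion involved, the Python sketch below implements the simplest scalar instance of a linear stochastic approximation with a decreasing step-size. It is not the algorithm analyzed in the paper: the update form, the function name and the default step-size schedule 1/(n+1) are assumptions of this sketch, and the noise sequence is simply supplied by the caller (it could, for example, be drawn from a Markov chain).

```python
def stochastic_approximation(samples, step_size=lambda n: 1.0 / (n + 1),
                             theta0=0.0):
    """Iterate theta <- theta + a_n * (x_n - theta) over the given samples.

    The correction term is linear in theta, and a_n = step_size(n) is
    expected to decrease towards zero as n grows.
    """
    theta = theta0
    for n, x in enumerate(samples):
        theta += step_size(n) * (x - theta)
    return theta
```

With the default schedule the iterate is exactly the running average of the samples. A schedule that decays more slowly forgets old samples faster, which is the trade-off behind the tracking condition stated in the abstract.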

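Design of heuristic reward functions in reinforcement learning algorithms and analysis of their convergence   Total citations: 3, self-citations: 0, citations by others: 3
(Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016)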
