期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Tuomas Sandholm 《Artificial Intelligence》2007,171(7):382-391

相似文献

2.

Multiagent reinforcement learning applied to a chase problem in a continuous world

Hiroki Tamakoshi Shin Ishii 《Artificial Life and Robotics》2001,5(4):202-206

Reinforcement learning (RL) is one of the methods of solving problems defined in multiagent systems. In the real world, the state is continuous, and agents take continuous actions. Since conventional RL schemes are often defined to deal with discrete worlds, there are difficulties such as the representation of an RL evaluation function. In this article, we intend to extend an RL algorithm so that it is applicable to continuous world problems. This extension is done by a combination of an RL algorithm and a function approximator. We employ Q-learning as the RL algorithm, and a neural network model called the normalized Gaussian network as the function approximator. The extended RL method is applied to a chase problem in a continuous world. The experimental result shows that our RL scheme was successful. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000 相似文献

3.

Generalized multiagent learning with performance bound

Bikramjit Banerjee Jing Peng 《Autonomous Agents and Multi-Agent Systems》2007,15(3):281-312

We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and “other” (see Definition 4) agents. We start with ReDVaLeR that can satisfy policy convergence against the first two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification on ReDVaLeR yields a new algorithm, RV _σ(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents—a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR . RV _σ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for regret of RV _σ(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF though, contrastingly, our analysis is in continuous time. Moreover, experiments show that RV _σ(t) can converge to an equilibrium in some cases where GIGA, GIGA-WoLF would fail, and to better equilibria where GIGA, GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play instead of high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game. 相似文献

4.

Multiagent learning towards RoboCup

Minoru Asada Eiji Uchibe 《New Generation Computing》2001,19(2):103-120

This article describes the issues in multiagent learning towards RoboCup,^1≈3) especially for the real robot leagues. First, the review of the issue in the context of the related area is given, then related works from several viewpoints are reviewed. Next, our approach towards RoboCup Initiative is introduced and finally future issues are given. Minoru Asada, Ph.D.: He received B.E., M.Sc., and Ph.D., degrees in control engineering from Osaka University, in 1977, 1979, and 1982, respectively. From 1982 to 1988, he was a research associate of Control Engineering, Osaka University. In 1989, he became an associate professor of Mechanical Engineering for Computer-Controlled Machinery, Osaka University. In 1995 he became a professor of the department of Adaptive Machine Systems at the same university. From 1986 to 1987, he was a visiting researcher of Center for Automation Research, University of Maryland, College Park, MD. He received the 1992 best paper award of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS92), and the 1996 best paper award of RSJ (Robotics Society of Japan). Also, his paper was one of the finalists of IEEE Robotics and Automation Society 1995 Best Conference Paper Award. He was a general chair of IEEE/RSJ 1996 International Conference on Intelligent Robots and Systems (IROS96). Since early 1990, he has been involved in RoboCup activities and his team was the first champion team with USC team in the middle size league of the first RoboCup held in conjunction with IJCAI-97, Nagoya, Japan. Eiji Uchibe, Ph.D.: He received a Ph.D. degree in mechanical engineering from Osaka University in 1999. He is currently a research associate of the Japan Society for the Promotion of Science, in Research for the Future Program titled Cooperative Distributed Vision for Dynamic Three Dimensional Scene Understanding. His research interests are in reinforcement learning, evolutionary computation, and their applications. He is a member of IEEE, AAAI, RSJ, and JSAI. 相似文献

5.

Multiagent learning is not the answer. It is the question

Peter Stone 《Artificial Intelligence》2007,171(7):402-405

The article by Shoham, Powers, and Grenager called “If multi-agent learning is the answer, what is the question?” does a great job of laying out the current state of the art and open issues at the intersection of game theory and artificial intelligence (AI). However, from the AI perspective, the term “multiagent learning” applies more broadly than can be usefully framed in game theoretic terms. In this larger context, how (and perhaps whether) multiagent learning can be usefully applied in complex domains is still a large open question. 相似文献

6.

Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning

Dipyaman Banerjee Sandip Sen 《Autonomous Agents and Multi-Agent Systems》2007,15(1):91-108

We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however can sometimes lead to a Pareto-Dominated outcome for games like Prisoner’s Dilemma. So we prefer learning strategies that converge to a Pareto-Optimal outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enable us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL) which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of other two-player general sum games. We compare the performance of CJAL with that of existing algorithms such as WOLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs. 相似文献

7.

A layered approach to learning coordination knowledge in multiagent environments

Guray Erus Faruk Polat 《Applied Intelligence》2007,27(3):249-267

Multiagent learning involves acquisition of cooperative behavior among intelligent agents in order to satisfy the joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired from the earlier studies in animal learning. In this paper, we propose a new RL technique called the Two Level Reinforcement Learning with Communication (2LRL) method to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place in two hierarchical levels; in the first level agents learn to select their target and then they select the action directed to their target in the second level. The agents communicate their perception to their neighbors and use the communication information in their decision-making. We applied 2LRL method in a hunter-prey environment and observed a satisfactory cooperative behavior. Guray Erus received the B.S. degree in computer engineering in 1999, and the M.S. degree in cognitive sciences, in 2002, from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant in Rene“ Descartes University, Paris, France, where he prepares a doctoral dissertation on object detection on satellite images, as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding. Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from the Middle East Technical University, Ankara, in 1987 and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at Computer Science Department of University of Minnesota, Minneapolis in 1992–93. His research interests include artificial intelligence, multi-agent systems and object oriented data models. 相似文献

8.

Opportunities for multiagent systems and multiagent reinforcement learning in traffic control

Ana L. C. Bazzan 《Autonomous Agents and Multi-Agent Systems》2009,18(3):342-375

The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As it is often the case, it is not possible to provide additional capacity, so that a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit very well the concept of autonomous agents: the driver, the pedestrian, the traffic expert; in some cases, also the intersection and the traffic signal controller can be regarded as an autonomous agent. However, the “agentification” of a transportation system is associated with some challenging issues: the number of agents is high, typically agents are highly adaptive, they react to changes in the environment at individual level but cause an unpredictable collective pattern, and act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them. 相似文献

9.

What evolutionary game theory tells us about multiagent learning

Karl Tuyls Simon Parsons 《Artificial Intelligence》2007,171(7):406-416

This paper discusses If multi-agent learning is the answer, what is the question? [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] from the perspective of evolutionary game theory. We briefly discuss the concepts of evolutionary game theory, and examine the main conclusions from [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] with respect to some of our previous work. Overall we find much to agree with, concluding, however, that the central concerns of multiagent learning are rather narrow compared with the broad variety of work identified in [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Inteligence 171 (7) (2007) 365-377, this issue]. 相似文献

10.

Application of reinforcement learning to the game of Othello

Nees Jan van Eck Michiel van Wezel 《Computers & Operations Research》2008

Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space—learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies. 相似文献

11.

Reinforcement learning of multiple tasks using a hierarchical CMAC architecture

Chen K. Tham 《Robotics and Autonomous Systems》1995,15(4):247-274

A reinforcement learning approach based on modular function approximation is presented. Cerebellar Model Articulation Controller (CMAC) networks are incorporated in the Hierarchical Mixtures of Experts (HME) architecture and the resulting architecture is referred to as HME-CMAC. A computationally efficient on-line learning algorithm based on the Expectation Maximization (EM) algorithm is proposed in order to achieve fast function approximation with the HME-CMAC architecture.

The Compositional Q-Learning (CQ-L) framework establishes the relationship between the Q-values of composite tasks and those of elemental tasks in its decomposition. This framework is extended here to allow rewards in non-terminal states. An implementation of the extended CQ-L framework using the HME-CMAC architecture is used to perform task decomposition in a realistic simulation of a two-linked manipulator having non-linear dynamics. The context-dependent reinforcement learning achieved by adopting this approach has advantages over monolithic approaches in terms of speed of learning, storage requirements and the ability to cope with changing goals. 相似文献

12.

Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Adrian K. Agogino Kagan Tumer 《Autonomous Agents and Multi-Agent Systems》2008,17(2):320-338

The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal to noise ratios). This step is particularly helpful in continuous, dynamic, stochastic domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning where the effectiveness of the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method to determine an effective reward without performing extensive simulations. We then test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’ movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting good rewards, compared to running a full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational limitations of the domain, providing rewards that combine the best properties of traditional rewards. 相似文献

13.

Multiagent control of self-reconfigurable robots

Hristo Bojinov 《Artificial Intelligence》2002,142(2):99-120

We demonstrate how multiagent systems provide useful control techniques for modular self-reconfigurable (metamorphic) robots. Such robots consist of many modules that can move relative to each other, thereby changing the overall shape of the robot to suit different tasks. Multiagent control is particularly well-suited for tasks involving uncertain and changing environments. We illustrate this approach through simulation experiments of Proteo, a metamorphic robot system currently under development. 相似文献

14.

A teaching method using a self-organizing map for reinforcement learning

Takeshi?Tateyama Email author Seiichi?Kawata Toshiki?Oguchi 《Artificial Life and Robotics》2004,7(4):193-197

We described a new preteaching method for re-inforcement learning using a self-organizing map (SOM). The purpose is to increase the learning rate using a small amount of teaching data generated by a human expert. In our proposed method, the SOM is used to generate the initial teaching data for the reinforcement learning agent from a small amount of teaching data. The reinforcement learning function of the agent is initialized by using the teaching data generated by the SOM in order to increase the probability of selecting the optimal actions it estimates. Because the agent can get high rewards from the start of reinforcement learning, it is expected that the learning rate will increase. The results of a mobile robot simulation showed that the learning rate had increased even though the human expert had showed only a small amount of teaching data. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002 相似文献

15.

The medical diagnostic support system using extended Rough Neural Network and Multiagent

Daisuke Yamaguchi Fumiyo Katayama Muneo Takahashi Masataka Arai Kenneth J. Mackin 《Artificial Life and Robotics》2008,13(1):184-187

Multiagent technologies enable us to explore their sociological and psychological foundations. Amedical dignostic support system is built using this. Moreover, We think that the data inputted can acquire higher diagnostic accuracy by sorting out using a determination table. In this paper, the recurrence diagnostic system of cancer is built and the output error of Multiagents learning method into the usual Neural Network and a Rough Neural Network and Genetic Programming be compared. The data of the prostates cancer offered by the medical institution and a renal cancer was used for verification of a system. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008 相似文献

16.

Biped dynamic walking using reinforcement learning

Hamid Benbrahim Judy A. Franklin 《Robotics and Autonomous Systems》1997,22(3-4):283-302

This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge about its dynamic model. The self scaling reinforcement (SSR) learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed in order to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of new modules that represent new knowledge, or new requirements for the desired task. 相似文献

17.

Bio-insect and artificial robot interaction using cooperative reinforcement learning

《Applied Soft Computing》2014

In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. Through numerous experiments, the performance of the proposed learning algorithms is evaluated. 相似文献

18.

Xueqing SUN Tao MAO Laura RAY Dongqing SHI Jerald KRALIK 《控制理论与应用(英文版)》2011,9(3):440-450

A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space(action choices and world states) and the number of agents.Nonetheless,there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease,both individually and cooperatively.This ability to solve computationally intractable problems stems from both brain circuits for hierarchical representation of state and action spaces and learned policies as well as constraints imposed by social cognition.Using biologically derived mechanisms for state representation and mammalian social intelligence,we constrain state-action choices in reinforcement learning in order to improve learning efficiency.Analysis results bound the reduction in computational complexity due to stateion,hierarchical representation,and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes.Investigation of two task domains,single-robot herding and multirobot foraging,shows that theoretical bounds hold and that acceptable policies emerge,which reduce task completion time,computational cost,and/or memory resources compared to learning without hierarchical representations and with no social knowledge. 相似文献

19.

An intelligent negotiator agent design for bilateral contracts of electrical energy

《Expert systems with applications》2014,41(9):4073-4082

In this paper, an intelligent agent (using the Fuzzy SARSA learning approach) is proposed to negotiate for bilateral contracts (BC) of electrical energy in Block Forward Markets (BFM or similar market environments). In the BFM energy markets, the buyers (or loads) and the sellers (or generators) submit their bids and offers on a daily basis. The loads and generators could employ intelligent software agents to trade energy in BC markets on their behalves. Since each agent attempts to choose the best bid/offer in the market, conflict of interests might happen. In this work, the trading of energy in BC markets is modeled and solved using Game Theory and Reinforcement Learning (RL) approaches. The Stackelberg equation concept is used for the match making among load and generator agents. Then to overcome the negotiation limited time problems (it is assumed that a limited time is given to each generator–load pairs to negotiate and make an agreement), a Fuzzy SARSA Learning (FSL) method is used. The fuzzy feature of FSL helps the agent cope with continuous characteristics of the environment and also prevents it from the curse of dimensionality. The performance of the FSL (compared to other well-known traditional negotiation techniques, such as time-dependent and imitative techniques) is illustrated through simulation studies. The case study simulation results show that the FSL based agent could achieve more profits compared to the agents using other reviewed techniques in the BC energy market. 相似文献

20.

Multiple mini-robots navigation using a collaborative multiagent reinforcement learning framework

Piyabhum Chaysri Kostas Vlachos 《Advanced Robotics》2020,34(13):902-916

In this work we investigate the use of a reinforcement learning (RL) framework for the autonomous navigation of a group of mini-robots in a multi-agent collaborative environment. Each mini-robot is driven by inertial forces provided by two vibration motors that are controlled by a simple and efficient low-level speed controller. The action of the RL agent is the direction of each mini-robot, and it is based on the position of each mini-robot, the distance between them and the sign of the distance gradient between each mini-robot and the nearest one. Each mini-robot is considered a moving obstacle that must be avoided by the others. We propose suitable state space and reward function that result in an efficient collaborative RL framework. The classical and the double Q-learning algorithms are employed, where the latter is considered to learn optimal policies of mini-robots that offers more stable and reliable learning process. A simulation environment is created, using the ROS framework, that include a group of four mini-robots. The dynamic model of each mini-robot and of the vibration motors is also included. Several application scenarios are simulated and the results are presented to demonstrate the performance of the proposed approach. 相似文献