2.
Reinforcement learning (RL) is one of the methods of solving problems defined in multiagent systems. In the real world, the
state is continuous, and agents take continuous actions. Since conventional RL schemes are often defined to deal with discrete
worlds, there are difficulties such as the representation of an RL evaluation function. In this article, we intend to extend
an RL algorithm so that it is applicable to continuous world problems. This extension is done by a combination of an RL algorithm
and a function approximator. We employ Q-learning as the RL algorithm, and a neural network model called the normalized Gaussian
network as the function approximator. The extended RL method is applied to a chase problem in a continuous world. The experimental
result shows that our RL scheme was successful.
This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January
26–28, 2000.
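As an illustration of the scheme this abstract describes (Q-learning whose action-value function is represented by a normalized Gaussian network), here is a minimal sketch. The basis-centre grid, the learning constants, and the toy one-dimensional chase task are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class NGNetQ:
    """Q-function over a continuous state, approximated by a normalized
    Gaussian network (one weight vector per discrete action)."""

    def __init__(self, centers, sigma, n_actions, alpha=0.1, gamma=0.95):
        self.centers = np.asarray(centers, dtype=float)   # basis centres
        self.sigma = sigma
        self.w = np.zeros((n_actions, len(centers)))      # weights per action
        self.alpha, self.gamma = alpha, gamma

    def phi(self, s):
        # Normalized Gaussian activations of the basis functions.
        g = np.exp(-(self.centers - s) ** 2 / (2 * self.sigma ** 2))
        return g / g.sum()

    def q(self, s):
        return self.w @ self.phi(s)                       # Q(s, .)

    def update(self, s, a, r, s_next, done):
        # One Q-learning step on the network weights.
        target = r if done else r + self.gamma * self.q(s_next).max()
        td_error = target - self.q(s)[a]
        self.w[a] += self.alpha * td_error * self.phi(s)


# Toy 1-D "chase": the agent (position in [0, 1]) must reach a fixed target.
rng = np.random.default_rng(0)
agent = NGNetQ(centers=np.linspace(0, 1, 15), sigma=0.07, n_actions=2)
target = 0.8
for episode in range(300):
    s = rng.uniform(0, 1)
    for _ in range(50):
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(agent.q(s).argmax())
        s_next = float(np.clip(s + (0.05 if a == 1 else -0.05), 0, 1))
        done = abs(s_next - target) < 0.05
        agent.update(s, a, 1.0 if done else -0.01, s_next, done)
        s = s_next
        if done:
            break
```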
3.
We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes
of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary,
self-play and “other” (see Definition 4) agents. We start with ReDVaLeR that can satisfy policy convergence against the first
two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate
the difficulty of achieving these goals. We show that a simple modification on ReDVaLeR yields a new algorithm, RVσ(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents—a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RVσ(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for regret of RVσ(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF, though, contrastingly, our analysis is in continuous time. Moreover, experiments show that RVσ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF would fail, and to better equilibria where GIGA and GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play instead of high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
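For context on the gradient-based no-regret learners mentioned above, the sketch below shows a GIGA-style update (projected gradient ascent on the mixed strategy) in self-play on a small coordination game. The payoff matrix, initial strategies, and step-size schedule are illustrative assumptions; the RVσ(t) machinery itself (opponent-type handling and its convergence guarantees) is not reproduced here.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def giga_step(x, payoff_vector, t):
    """One GIGA update: gradient step on expected payoff, then project."""
    eta = 1.0 / np.sqrt(t)           # decaying step size
    return project_to_simplex(x + eta * payoff_vector)

# Self-play in a 2x2 coordination game (illustrative payoff matrix),
# starting from asymmetric initial mixed strategies.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x = np.array([0.60, 0.40])           # row player's mixed strategy
y = np.array([0.55, 0.45])           # column player's mixed strategy
for t in range(1, 2001):
    rx, ry = A @ y, A.T @ x          # expected payoff of each pure action
    x, y = giga_step(x, rx, t), giga_step(y, ry, t)
print(x, y)                          # both drift toward the (action 0, action 0) equilibrium
```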
4.
This article describes the issues in multiagent learning towards RoboCup,1–3) especially for the real robot leagues. First, a review of the issue in the context of the related areas is given, then related works from several viewpoints are reviewed. Next, our approach towards the RoboCup Initiative is introduced, and finally future issues are given.
Minoru Asada, Ph.D.: He received B.E., M.Sc., and Ph.D., degrees in control engineering from Osaka University, in 1977, 1979, and 1982, respectively.
From 1982 to 1988, he was a research associate of Control Engineering, Osaka University. In 1989, he became an associate professor
of Mechanical Engineering for Computer-Controlled Machinery, Osaka University. In 1995 he became a professor of the department
of Adaptive Machine Systems at the same university. From 1986 to 1987, he was a visiting researcher of Center for Automation
Research, University of Maryland, College Park, MD.
He received the 1992 best paper award of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS92), and
the 1996 best paper award of RSJ (Robotics Society of Japan). Also, his paper was one of the finalists of IEEE Robotics and
Automation Society 1995 Best Conference Paper Award. He was a general chair of IEEE/RSJ 1996 International Conference on Intelligent
Robots and Systems (IROS96). Since the early 1990s, he has been involved in RoboCup activities, and his team, jointly with the USC team, was the first champion of the middle size league at the first RoboCup, held in conjunction with IJCAI-97 in Nagoya, Japan.
Eiji Uchibe, Ph.D.: He received a Ph.D. degree in mechanical engineering from Osaka University in 1999. He is currently a research associate
of the Japan Society for the Promotion of Science, in Research for the Future Program titled Cooperative Distributed Vision
for Dynamic Three Dimensional Scene Understanding. His research interests are in reinforcement learning, evolutionary computation,
and their applications. He is a member of IEEE, AAAI, RSJ, and JSAI.
5.
The article by Shoham, Powers, and Grenager called “If multi-agent learning is the answer, what is the question?” does a great job of laying out the current state of the art and open issues at the intersection of game theory and artificial intelligence (AI). However, from the AI perspective, the term “multiagent learning” applies more broadly than can be usefully framed in game theoretic terms. In this larger context, how (and perhaps whether) multiagent learning can be usefully applied in complex domains is still a large open question.
6.
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume
that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash Equilibrium
in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash
Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-Dominated
outcome for games like Prisoner’s Dilemma. So we prefer learning strategies that converge to a Pareto-Optimal outcome that
also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to
identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL) which learns the conditional
probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically
show that under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL
learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge
to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of
other two-player general sum games. We compare the performance of CJAL with that of existing algorithms such as WOLF-PHC and
JAL on all structurally distinct two-player conflict games with ordinal payoffs.
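The core of the conditional joint action learner described above can be sketched compactly: the agent keeps joint-action counts, estimates the conditional probability of the opponent's action given its own, and picks the action with the highest estimated expected payoff after a random-exploration phase. The Prisoner's Dilemma payoffs and the exploration length below are illustrative assumptions.

```python
import numpy as np

class CJALAgent:
    """Conditional Joint Action Learner (sketch): estimates P(opponent action | own action)
    and the mean payoff of every joint action from experience, then acts greedily."""

    def __init__(self, n_actions, explore_steps=200, seed=None):
        self.n = n_actions
        self.explore_steps = explore_steps
        self.counts = np.zeros((n_actions, n_actions))   # N(own action, opponent action)
        self.payoff = np.zeros((n_actions, n_actions))   # running mean payoff per joint action
        self.t = 0
        self.rng = np.random.default_rng(seed)

    def act(self):
        self.t += 1
        if self.t <= self.explore_steps:                  # pure random exploration phase
            return int(self.rng.integers(self.n))
        rows = np.maximum(self.counts.sum(axis=1, keepdims=True), 1)
        p_opp = self.counts / rows                        # P(opponent action | own action)
        expected = (p_opp * self.payoff).sum(axis=1)      # expected payoff of each own action
        return int(expected.argmax())                     # completely greedy exploitation

    def observe(self, own, opp, reward):
        self.counts[own, opp] += 1
        self.payoff[own, opp] += (reward - self.payoff[own, opp]) / self.counts[own, opp]


# Self-play on a Prisoner's Dilemma (0 = cooperate, 1 = defect); payoffs are illustrative.
R = np.array([[3.0, 0.0],
              [4.0, 1.0]])
a1, a2 = CJALAgent(2, seed=1), CJALAgent(2, seed=2)
for _ in range(2000):
    x, y = a1.act(), a2.act()
    a1.observe(x, y, R[x, y])
    a2.observe(y, x, R[y, x])
```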
7.
Multiagent learning involves acquisition of cooperative behavior among intelligent agents in order to satisfy the joint goals.
Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies in animal
learning. In this paper, we propose a new RL technique called the Two Level Reinforcement Learning with Communication (2LRL)
method to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place in two hierarchical
levels; in the first level agents learn to select their target and then they select the action directed to their target in
the second level. The agents communicate their perception to their neighbors and use the communication information in their
decision-making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
Guray Erus received the B.S. degree in computer engineering in 1999, and the M.S. degree in cognitive sciences, in 2002, from Middle
East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University,
Paris, France, where he prepares a doctoral dissertation on object detection on satellite images, as a member of the intelligent
perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding.
Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received
his B.Sc. in computer engineering from the Middle East Technical University, Ankara, in 1987 and his M.S. and Ph.D. degrees
in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting
NATO science scholar at Computer Science Department of University of Minnesota, Minneapolis in 1992–93. His research interests
include artificial intelligence, multi-agent systems, and object-oriented data models.
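A compact sketch of the two-level idea described in this abstract is given below: one Q-table learns to choose a target, a second Q-table learns to choose the primitive action relative to that target. The state encodings, the shared reward signal, and the way credit is split between the two levels are simplifying assumptions for illustration; the paper's communication scheme is not reproduced.

```python
import numpy as np
from collections import defaultdict

class TwoLevelAgent:
    """Two-level RL sketch: level 1 picks a target (e.g. which prey to chase),
    level 2 picks a primitive action directed at that target."""

    def __init__(self, n_targets, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q_target = defaultdict(lambda: np.zeros(n_targets))   # level 1
        self.q_action = defaultdict(lambda: np.zeros(n_actions))   # level 2
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng()

    def _greedy(self, q):
        # epsilon-greedy choice over a value vector
        return int(self.rng.integers(len(q))) if self.rng.random() < self.eps else int(q.argmax())

    def choose_target(self, obs):          # obs: hashable perception (incl. communicated info)
        return self._greedy(self.q_target[obs])

    def choose_action(self, rel_state):    # rel_state: hashable state relative to the chosen target
        return self._greedy(self.q_action[rel_state])

    def update(self, obs, target, rel_state, action, reward, obs2, rel_state2, done):
        # Both levels are trained with one-step Q-learning on the same reward signal
        # (a simplification of how the two levels share credit).
        for q, s, a, s2 in ((self.q_target, obs, target, obs2),
                            (self.q_action, rel_state, action, rel_state2)):
            best_next = 0.0 if done else q[s2].max()
            q[s][a] += self.alpha * (reward + self.gamma * best_next - q[s][a])
```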
8.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general,
and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional
capacity, so that a more efficient use of the available transportation infrastructure is necessary. This relates closely to
multiagent systems as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation
system fit very well the concept of autonomous agents: the driver, the pedestrian, the traffic expert; in some cases, also
the intersection and the traffic signal controller can be regarded as an autonomous agent. However, the “agentification” of
a transportation system is associated with some challenging issues: the number of agents is high, typically agents are highly
adaptive, they react to changes in the environment at the individual level but cause an unpredictable collective pattern, and
act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent
systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches
and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and
challenges so that future research in multiagent systems can address them.
9.
This paper discusses “If multi-agent learning is the answer, what is the question?” [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] from the perspective of evolutionary game theory. We briefly discuss the concepts of evolutionary game theory, and examine the main conclusions from [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue] with respect to some of our previous work. Overall we find much to agree with, concluding, however, that the central concerns of multiagent learning are rather narrow compared with the broad variety of work identified in [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365-377, this issue].
10.
Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space—learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies.
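To make the combination of reinforcement learning and function approximation concrete, here is a minimal sketch of TD(0) value learning with a linear approximator. The Othello game logic, the feature set, and the self-play transition stream are omitted and stood in for by placeholders (random feature vectors and a dummy terminal reward), so only the learning update itself reflects the approach described above.

```python
import numpy as np

class LinearValue:
    """TD(0) value learning with a linear function approximator."""

    def __init__(self, n_features, alpha=0.01, gamma=1.0):
        self.w = np.zeros(n_features)
        self.alpha, self.gamma = alpha, gamma

    def value(self, x):
        return float(self.w @ x)

    def td_update(self, x, reward, x_next, terminal):
        target = reward if terminal else reward + self.gamma * self.value(x_next)
        self.w += self.alpha * (target - self.value(x)) * x   # gradient of a linear model is x


# Placeholder transition stream standing in for self-play games; in an Othello
# learner, x would be a feature vector of the board (disc difference, corners,
# mobility, ...) and the reward the game outcome at the terminal position.
rng = np.random.default_rng(0)
v = LinearValue(n_features=8)
x = rng.normal(size=8)
for step in range(10_000):
    x_next = rng.normal(size=8)
    terminal = (step % 60 == 59)                   # pretend a game ends every 60 plies
    reward = float(np.sign(x_next.sum())) if terminal else 0.0
    v.td_update(x, reward, x_next, terminal)
    x = rng.normal(size=8) if terminal else x_next
```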
11.
A reinforcement learning approach based on modular function approximation is presented. Cerebellar Model Articulation Controller (CMAC) networks are incorporated in the Hierarchical Mixtures of Experts (HME) architecture and the resulting architecture is referred to as HME-CMAC. A computationally efficient on-line learning algorithm based on the Expectation Maximization (EM) algorithm is proposed in order to achieve fast function approximation with the HME-CMAC architecture. The Compositional Q-Learning (CQ-L) framework establishes the relationship between the Q-values of composite tasks and those of elemental tasks in its decomposition. This framework is extended here to allow rewards in non-terminal states. An implementation of the extended CQ-L framework using the HME-CMAC architecture is used to perform task decomposition in a realistic simulation of a two-linked manipulator having non-linear dynamics. The context-dependent reinforcement learning achieved by adopting this approach has advantages over monolithic approaches in terms of speed of learning, storage requirements and the ability to cope with changing goals.
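Of the components named in this abstract, the CMAC building block is the easiest to show in isolation. Below is a minimal one-dimensional tile-coding sketch; the HME gating network, the EM-based training, and the CQ-L task decomposition are not reproduced, and the toy regression target is an illustrative assumption.

```python
import numpy as np

class CMAC:
    """Minimal CMAC (tile coding) for a 1-D input in [0, 1]: several offset
    tilings, one weight per tile, output = sum of the active tiles' weights."""

    def __init__(self, n_tilings=8, tiles=10):
        self.n_tilings, self.tiles = n_tilings, tiles
        self.w = np.zeros((n_tilings, tiles + 1))
        # each tiling is shifted by a fraction of one tile width
        self.offsets = np.arange(n_tilings) / (n_tilings * tiles)

    def _active(self, x):
        # index of the active tile in every tiling
        return np.minimum(((x + self.offsets) * self.tiles).astype(int), self.tiles)

    def predict(self, x):
        return self.w[np.arange(self.n_tilings), self._active(x)].sum()

    def train(self, x, target, lr=0.3):
        # spread the error equally over the active tiles (local, fast updates)
        err = target - self.predict(x)
        self.w[np.arange(self.n_tilings), self._active(x)] += lr * err / self.n_tilings


rng = np.random.default_rng(0)
net = CMAC()
for _ in range(20_000):
    x = rng.random()
    net.train(x, np.sin(2 * np.pi * x))
print(net.predict(0.25), net.predict(0.75))   # roughly +1 and -1
```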
12.
The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning
algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often
preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among
the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic,
stochastic domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning where the effectiveness of
the reward structure is difficult to distinguish from the effectiveness of the chosen learning algorithm. In this paper, we
present a new reward evaluation method that provides a visualization of the tradeoff between the level of coordination among
the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm
and is only a function of the problem domain and the agents’ reward structure. We use this reward property visualization method
to determine an effective reward without performing extensive simulations. We then test this method in both a static and a
dynamic multi-rover learning domain where the agents have continuous state spaces and take noisy actions (e.g., the agents’
movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward
efficiency visualization method provides a speedup of two orders of magnitude in selecting good rewards, compared to running a
full simulation. In addition, this method facilitates the design and analysis of new rewards tailored to the observational
limitations of the domain, providing rewards that combine the best properties of traditional rewards.
13.
We demonstrate how multiagent systems provide useful control techniques for modular self-reconfigurable (metamorphic) robots. Such robots consist of many modules that can move relative to each other, thereby changing the overall shape of the robot to suit different tasks. Multiagent control is particularly well-suited for tasks involving uncertain and changing environments. We illustrate this approach through simulation experiments of Proteo, a metamorphic robot system currently under development.
14.
We describe a new preteaching method for reinforcement learning using a self-organizing map (SOM). The purpose is to increase
the learning rate using a small amount of teaching data generated by a human expert. In our proposed method, the SOM is used
to generate the initial teaching data for the reinforcement learning agent from a small amount of teaching data. The reinforcement
learning function of the agent is initialized by using the teaching data generated by the SOM in order to increase the probability
of selecting the optimal actions it estimates. Because the agent can get high rewards from the start of reinforcement learning,
it is expected that the learning rate will increase. The results of a mobile robot simulation showed that the learning rate
had increased even though the human expert had provided only a small amount of teaching data.
This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18,
2002.
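The preteaching idea described above can be sketched as follows: a small SOM is fitted to a handful of expert demonstrations, every node is labelled with the action of its nearest demonstration, and the resulting labels bias the initial Q-table before ordinary reinforcement learning starts. The one-dimensional state, the demonstration list, and the initialization bonus are illustrative assumptions; the paper's mobile-robot setup and the way its SOM generates additional teaching data are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# A handful of expert demonstrations: (state in [0, 1], recommended action).
teaching = [(0.05, 1), (0.30, 1), (0.55, 1), (0.80, 0), (0.95, 0)]

# --- Train a tiny 1-D SOM on the demonstrated states. -----------------------
n_nodes = 10
som = np.linspace(0, 1, n_nodes)              # node prototypes (1-D states)
node_action = np.zeros(n_nodes, dtype=int)
for epoch in range(200):
    lr = 0.5 * (1 - epoch / 200)              # decaying learning rate
    radius = max(1, int(3 * (1 - epoch / 200)))
    for s, a in teaching:
        bmu = int(np.argmin(np.abs(som - s)))              # best matching unit
        for j in range(max(0, bmu - radius), min(n_nodes, bmu + radius + 1)):
            som[j] += lr * (s - som[j])                    # pull neighbourhood toward the sample

# Label every node with the action of its nearest teaching sample.
for i in range(n_nodes):
    nearest = min(teaching, key=lambda sa: abs(sa[0] - som[i]))
    node_action[i] = nearest[1]

# --- Use the SOM output as initial teaching data for the Q-table. -----------
n_states, n_actions, init_bonus = 20, 2, 0.5
Q = np.zeros((n_states, n_actions))
for state in range(n_states):
    s = (state + 0.5) / n_states
    bmu = int(np.argmin(np.abs(som - s)))
    Q[state, node_action[bmu]] = init_bonus               # bias toward the expert's action
# Ordinary Q-learning would now start from this biased table instead of zeros.
```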
15.
Multiagent technologies enable us to explore their sociological and psychological foundations, and a medical diagnostic support system is built on this basis. We also expect that higher diagnostic accuracy can be obtained by screening the input data with a decision table. In this paper, a diagnostic system for cancer recurrence is built, and the output errors of the multiagent learning method applied to a conventional neural network, a rough neural network, and genetic programming are compared. Prostate cancer and renal cancer data provided by a medical institution were used to verify the system.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January
31–February 2, 2008.
16.
This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, and a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge about its dynamic model. The self-scaling reinforcement (SSR) learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed in order to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of new modules that represent new knowledge or new requirements for the desired task.
17.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. Through numerous experiments, the performance of the proposed learning algorithms is evaluated.
18.
A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from both brain circuits for hierarchical representation of state and action spaces and learned policies as well as constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analysis results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that theoretical bounds hold and that acceptable policies emerge, which reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and with no social knowledge.
19.
In this paper, an intelligent agent (using the Fuzzy SARSA learning approach) is proposed to negotiate bilateral contracts (BC) of electrical energy in Block Forward Markets (BFM or similar market environments). In BFM energy markets, the buyers (or loads) and the sellers (or generators) submit their bids and offers on a daily basis. The loads and generators could employ intelligent software agents to trade energy in BC markets on their behalf. Since each agent attempts to choose the best bid/offer in the market, conflicts of interest might arise. In this work, the trading of energy in BC markets is modeled and solved using Game Theory and Reinforcement Learning (RL) approaches. The Stackelberg equilibrium concept is used for matchmaking between load and generator agents. Then, to overcome the limited negotiation time (it is assumed that a limited time is given to each generator–load pair to negotiate and reach an agreement), a Fuzzy SARSA Learning (FSL) method is used. The fuzzy feature of FSL helps the agent cope with the continuous characteristics of the environment and also protects it from the curse of dimensionality. The performance of FSL (compared to other well-known traditional negotiation techniques, such as time-dependent and imitative techniques) is illustrated through simulation studies. The case study simulation results show that the FSL-based agent achieves higher profits than agents using the other reviewed techniques in the BC energy market.
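The underlying learning rule can be sketched as follows: the state (for example, the normalized remaining negotiation time) is covered by overlapping fuzzy sets, each fuzzy rule holds a q-vector over discrete actions (for example, concession levels), and a SARSA update credits the rules in proportion to their firing strength. The state variable, membership shape, and action set are illustrative assumptions; the market model, the Stackelberg matchmaking, and the paper's exact rule base are not reproduced.

```python
import numpy as np

class FuzzySARSA:
    """Fuzzy SARSA sketch: a 1-D state is covered by triangular membership
    functions; each fuzzy rule keeps a q-vector over discrete actions."""

    def __init__(self, n_rules=5, n_actions=3, alpha=0.1, gamma=0.95, eps=0.1):
        self.centers = np.linspace(0, 1, n_rules)        # rule centres on [0, 1]
        self.width = 1.0 / (n_rules - 1)
        self.q = np.zeros((n_rules, n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng()

    def membership(self, s):
        mu = np.maximum(0.0, 1.0 - np.abs(s - self.centers) / self.width)
        return mu / mu.sum()                             # normalized firing strengths

    def q_values(self, s):
        return self.membership(s) @ self.q               # Q(s, .)

    def act(self, s):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.q.shape[1]))
        return int(self.q_values(s).argmax())

    def update(self, s, a, r, s_next, a_next, done):
        target = r if done else r + self.gamma * self.q_values(s_next)[a_next]
        delta = target - self.q_values(s)[a]
        self.q[:, a] += self.alpha * self.membership(s) * delta   # credit rules by firing strength


# Illustrative use inside one negotiation episode (the market environment is not shown):
agent = FuzzySARSA()
s = 1.0                      # e.g. full negotiation time remaining
a = agent.act(s)
# after observing (r, s_next) from the environment:
# a_next = agent.act(s_next); agent.update(s, a, r, s_next, a_next, done=False)
```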
20.
In this work we investigate the use of a reinforcement learning (RL) framework for the autonomous navigation of a group of mini-robots in a multi-agent collaborative environment. Each mini-robot is driven by inertial forces provided by two vibration motors that are controlled by a simple and efficient low-level speed controller. The action of the RL agent is the direction of each mini-robot, and it is based on the position of each mini-robot, the distance between them, and the sign of the distance gradient between each mini-robot and the nearest one. Each mini-robot is considered a moving obstacle that must be avoided by the others. We propose a suitable state space and reward function that result in an efficient collaborative RL framework. The classical and the double Q-learning algorithms are employed, where the latter is used to learn the optimal policies of the mini-robots and offers a more stable and reliable learning process. A simulation environment that includes a group of four mini-robots is created using the ROS framework. The dynamic model of each mini-robot and of the vibration motors is also included. Several application scenarios are simulated and the results are presented to demonstrate the performance of the proposed approach.
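The double Q-learning rule mentioned above, which the authors prefer for its more stable learning, can be sketched in tabular form: two estimators are kept, and each update evaluates one estimator's greedy action with the other, which reduces the overestimation bias of ordinary Q-learning. The state here is any hashable encoding (the paper's positions, inter-robot distances, and distance-gradient signs would be discretized into such a key); the ROS simulation and robot dynamics are not reproduced.

```python
import numpy as np
from collections import defaultdict

class DoubleQAgent:
    """Tabular double Q-learning: two estimators, each updated with the other's
    value of its own greedy action."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.qa = defaultdict(lambda: np.zeros(n_actions))
        self.qb = defaultdict(lambda: np.zeros(n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.n_actions = n_actions
        self.rng = np.random.default_rng()

    def act(self, s):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.n_actions))
        return int((self.qa[s] + self.qb[s]).argmax())    # act on the combined estimate

    def update(self, s, a, r, s_next, done):
        # Randomly pick which estimator to update; evaluate its greedy action with the other.
        q1, q2 = (self.qa, self.qb) if self.rng.random() < 0.5 else (self.qb, self.qa)
        if done:
            target = r
        else:
            a_star = int(q1[s_next].argmax())
            target = r + self.gamma * q2[s_next][a_star]
        q1[s][a] += self.alpha * (target - q1[s][a])
```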