20 similar documents found.
1.
Ana L. C. Bazzan 《Autonomous Agents and Multi-Agent Systems》2009,18(3):342-375
The increasing demand for mobility in our society poses various challenges to traffic engineering, to computer science in general, and to artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Moreover, many actors in a transportation system fit the concept of an autonomous agent very well: the driver, the pedestrian, the traffic expert; in some cases, the intersection and the traffic signal controller can also be regarded as autonomous agents. However, the "agentification" of a transportation system raises some challenging issues: the number of agents is high; agents are typically highly adaptive; they react to changes in the environment at the individual level but produce unpredictable collective patterns; and they act in a highly coupled environment. This domain therefore poses many challenges for standard multiagent-systems techniques such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches, and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.
2.
We propose a new type of artificial potential field, which we call a hybrid potential field, to navigate a robot in situations where the environment is known except for unknown and possibly moving obstacles. We show how to compute hybrid potential fields in real time and use them to control the motions of a real robot. Our method is tested on both a real robot and a simulated one. We present a feature matching approach for position error correction that we have validated experimentally with our mobile robot. We show extensive simulation results with up to 50 randomly moving obstacles.
3.
Tuomas Sandholm 《Artificial Intelligence》2007,171(7):382-391
4.
This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, and a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge of its dynamic model. The self-scaling reinforcement (SSR) learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed in order to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of new modules that represent new knowledge, or new requirements for the desired task.
5.
Neural reinforcement learning for behaviour synthesis
We present the results of research aimed at improving the Q-learning method through the use of artificial neural networks. Neural implementations are interesting due to their generalisation ability. Two implementations are proposed: one with a competitive multilayer perceptron and the other with a self-organising map. Results obtained on the task of learning an obstacle avoidance behaviour for the miniature mobile robot Khepera show that the latter implementation is very effective, learning more than 40 times faster than the basic Q-learning implementation. These neural implementations are also compared with several Q-learning enhancements, such as Q-learning with Hamming distance, Q-learning with statistical clustering, and Dyna-Q.
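The abstract compares neural implementations against a basic tabular Q-learning baseline. Below is a minimal sketch of that baseline, illustrative only and not the authors' neural variant; all names and hyperparameters are ours.

```python
# Minimal tabular Q-learning baseline (illustrative sketch, not the paper's method).
import random
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """One update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)  # state-action values default to 0
```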
6.
We describe a new preteaching method for reinforcement learning using a self-organizing map (SOM). The purpose is to increase the learning rate using a small amount of teaching data generated by a human expert. In the proposed method, the SOM is used to generate the initial teaching data for the reinforcement learning agent from a small amount of expert teaching data. The reinforcement learning function of the agent is initialized using the teaching data generated by the SOM, in order to increase the probability of selecting the optimal actions it estimates. Because the agent can receive high rewards from the start of reinforcement learning, the learning rate is expected to increase. The results of a mobile robot simulation showed that the learning rate increased even though the human expert had provided only a small amount of teaching data.
This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
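As a rough illustration of the preteaching idea, the hypothetical sketch below biases initial action values toward expert-taught actions so that the agent earns rewards early; the SOM step that expands a few expert examples into broader teaching data is abstracted away, and all identifiers are ours.

```python
# Hypothetical sketch: initialize a Q-table from (state, expert_action) teaching data.
from collections import defaultdict

def initialize_q_from_teaching(teaching_data, bonus=1.0):
    """teaching_data: iterable of (state, expert_action) pairs, e.g. produced by a SOM."""
    Q = defaultdict(float)
    for state, action in teaching_data:
        Q[(state, action)] += bonus  # raise the chance of selecting taught actions early
    return Q

# Example: two taught state-action pairs for a grid-world robot
Q = initialize_q_from_teaching([((2, 3), "forward"), ((2, 4), "turn_left")])
```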
7.
Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies of animal learning. In this paper, we propose a new RL technique, called the Two-Level Reinforcement Learning with Communication (2LRL) method, to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: at the first level agents learn to select their target, and at the second level they select the action directed at that target. The agents communicate their perceptions to their neighbors and use this communicated information in their decision-making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
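A schematic rendering of the two-level selection described above, greedy for brevity; the actual 2LRL learning updates and exploration are not reproduced, and all identifiers are hypothetical.

```python
# Illustrative two-level selection in the spirit of 2LRL:
# level 1 picks a target from the (communicated) perception,
# level 2 picks the action directed at that target.
def select_target(q_target, perception):
    """perception: non-empty list of visible or communicated target ids."""
    return max(perception, key=lambda t: q_target.get(t, 0.0))

def select_action(q_action, target, actions):
    """Choose the action with the highest value for the chosen target."""
    return max(actions, key=lambda a: q_action.get((target, a), 0.0))

def act(q_target, q_action, own_view, messages, actions):
    perception = own_view + [t for msg in messages for t in msg]  # merge neighbors' info
    target = select_target(q_target, perception)
    return target, select_action(q_action, target, actions)
```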
Guray Erus received the B.S. degree in computer engineering in 1999 and the M.S. degree in cognitive sciences in 2002 from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University, Paris, France, where he is preparing a doctoral dissertation on object detection in satellite images as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding.
Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from Middle East Technical University, Ankara, in 1987, and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at the Computer Science Department of the University of Minnesota, Minneapolis, in 1992–93. His research interests include artificial intelligence, multi-agent systems, and object-oriented data models.
8.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this work is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. The performance of the proposed learning algorithms is evaluated through numerous experiments.
9.
Michael Bowling 《Artificial Intelligence》2002,136(2):215-250
Learning to act in a multiagent environment is a difficult problem, since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents, which creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach: they either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms, showing how each fails on one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
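The variable-learning-rate idea can be sketched as follows, in the style of WoLF-PHC: compare the expected payoff of the current policy with that of the average policy, and adapt slowly when winning, quickly when losing. Variable names and the two constants are ours.

```python
# Sketch of the WoLF ("Win or Learn Fast") learning-rate rule.
def wolf_delta(policy, avg_policy, q_values, delta_win=0.01, delta_lose=0.04):
    """Return the policy learning rate for the current state.

    policy, avg_policy: dicts mapping action -> probability
    q_values: dict mapping action -> estimated value
    """
    expected_current = sum(policy[a] * q_values[a] for a in policy)
    expected_average = sum(avg_policy[a] * q_values[a] for a in avg_policy)
    winning = expected_current > expected_average
    return delta_win if winning else delta_lose  # small step if winning, large if losing
```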
10.
The fuzzy min-max neural network constitutes a neural architecture that is based on hyperbox fuzzy sets and can be incrementally trained by appropriately adjusting the number of hyperboxes and their corresponding volumes. Two versions have been proposed: for supervised and for unsupervised learning. In this paper a modified approach is presented that is appropriate for reinforcement learning problems with a discrete action space, and it is applied to the difficult task of autonomous vehicle navigation when no a priori knowledge of the environment is available. Experimental results indicate that the proposed reinforcement learning network exhibits superior learning behavior compared to conventional reinforcement schemes.
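For illustration, a simplified hyperbox membership function of the kind fuzzy min-max networks build on: membership falls off with the distance by which an input escapes the box. This is a sketch, not the exact membership function of the paper; the slope parameter gamma and the mean aggregation are assumptions.

```python
# Simplified hyperbox membership: a hyperbox is given by its min and max corner points.
import numpy as np

def hyperbox_membership(x, v_min, w_max, gamma=4.0):
    """x, v_min, w_max: 1-D arrays of equal length; returns a value in [0, 1]."""
    below = np.maximum(0.0, v_min - x)        # how far x falls under the box
    above = np.maximum(0.0, x - w_max)        # how far x falls over the box
    violation = np.maximum(below, above)
    per_dim = np.clip(1.0 - gamma * violation, 0.0, 1.0)
    return float(per_dim.mean())
```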
11.
Stock trading is an important decision-making problem that involves both stock selection and asset management. Though many promising results have been reported for predicting prices, selecting stocks, and managing assets using machine-learning techniques, considering all of them is challenging because of their complexity. In this paper, we present a new stock trading method that incorporates dynamic asset allocation in a reinforcement-learning framework. The proposed asset allocation strategy, called meta policy (MP), is designed to utilize the temporal information from both stock recommendations and the ratio of the stock fund over the asset. Local traders are constructed with pattern-based multiple predictors, and used to decide the purchase money per recommendation. Formulating the MP in the reinforcement learning framework is achieved by a compact design of the environment and the learning agent. Experimental results using the Korean stock market show that the proposed MP method outperforms other fixed asset-allocation strategies, and reduces the risks inherent in local traders.
12.
13.
Gabriel Gómez-Pérez, José D. Martín-Guerrero, Emilio Soria-Olivas, Emili Balaguer-Ballester, Alberto Palomares, Nicolás Casariego 《Expert systems with applications》2009,36(4):8022-8031
In this work, reinforcement learning (RL) is used to find an optimal policy for a marketing campaign. The data show a complex characterization of the state and action spaces. Two approaches are proposed to circumvent this problem. The first approach is based on the self-organizing map (SOM), which is used to aggregate states. The second approach uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, and the MLP approach yields robust results with generalization capabilities.
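A minimal sketch of the MLP idea: regress the action-value function from logged campaign transitions. scikit-learn is used here for brevity; the feature layout and the TD targets are placeholders, not the authors' setup.

```python
# Fit an MLP regressor for Q(s, a) from logged transitions (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_q_regressor(states, actions, targets):
    """states: (N, d) array, actions: (N, 1) array, targets: (N,) TD targets."""
    X = np.hstack([states, actions])           # concatenate state and action features
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000)
    model.fit(X, targets)
    return model

# model.predict(np.hstack([states, actions])) then returns estimated action values.
```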
14.
In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, on the other hand, it is important to acquire a penalty-avoiding policy that can avoid losing the game. The penalty-avoiding rational policy making algorithm (PARP) is known to learn such a policy. When PARP is applied to large-scale problems, however, we are confronted with an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello, and we show that our learning player beats the well-known Othello program KITTY.
This work was presented, in part, at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
15.
Reinforcement learning (RL) for solving large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been presented, each with its own advantages and disadvantages. This paper proposes a new method, similar to the strategies introduced in hierarchical abstract machines (HAMs), that creates a high-level controller layer of reinforcement learning that uses options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state-space clustering. This method can be viewed as a bridge between the option and HAM frameworks: it suggests a new framework that reduces the disadvantages of both by creating connecting structures between them while retaining their advantages. Experimental results on different test environments show the significant efficiency of the proposed method.
16.
In this article, we propose a new control method using reinforcement learning (RL) with the concept of sliding mode control (SMC). Remarkable characteristics of the SMC method are its good robustness and stability against deviations from control conditions. RL, on the other hand, is applicable to complex systems that are difficult to model; however, applying reinforcement learning to a real system has a serious problem, namely that many trials are required for learning. We intend to develop a new control method combining the good characteristics of both methods. To realize this, we unite the actor-critic method, a kind of RL, with SMC. We verify the effectiveness of the proposed control method through a computer simulation of inverted pendulum control without the use of the inverted pendulum dynamics. In particular, it is shown that the proposed method enables the RL component to learn in fewer trials than the standard reinforcement learning method.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
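A minimal actor-critic sketch showing the RL half of such a controller; the sliding-mode component and continuous-action handling are omitted, and all names are ours.

```python
# One actor-critic update: the critic's TD error drives both critic and actor.
def actor_critic_step(v, prefs, state, action, reward, next_state,
                      alpha_critic=0.1, alpha_actor=0.05, gamma=0.95):
    """v: dict state -> value estimate; prefs: dict (state, action) -> action preference."""
    td_error = reward + gamma * v.get(next_state, 0.0) - v.get(state, 0.0)
    v[state] = v.get(state, 0.0) + alpha_critic * td_error                    # critic update
    prefs[(state, action)] = prefs.get((state, action), 0.0) + alpha_actor * td_error  # actor update
    return td_error
```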
17.
The aim of this paper is to present jointly a series of robust, high-performance (award-winning) implementations of reinforcement learning algorithms based on temporal-difference learning and weighted k-nearest neighbors for linear function approximation. These algorithms, named kNN‐TD(λ) methods, were rigorously tested at the Second and Third Annual Reinforcement Learning Competitions (RLC2008 and RLC2009), held in Helsinki and Montreal respectively, where the kNN‐TD(λ) method (JAMH team) won the PolyAthlon 2008 domain, obtained second place in 2009, and also obtained second place in the Mountain-Car 2008 domain, showing that it is one of the state-of-the-art general-purpose reinforcement learning implementations. These algorithms are able to learn quickly, to generalize properly over continuous state spaces, and to be robust to a high degree of environmental noise. Furthermore, we describe a derivation of the kNN‐TD(λ) algorithm for problems where the use of continuous actions has clear advantages over the use of fine-grained discrete actions: the Ex〈a〉 reinforcement learning algorithm.
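The weighted k-nearest-neighbor value estimate that such methods build on can be sketched as follows; this is a simplification, and the actual method's weighting scheme, eligibility traces, and action selection are not reproduced.

```python
# Distance-weighted kNN value estimate over a set of prototype states.
import numpy as np

def knn_value(state, prototypes, values, k=4, eps=1e-6):
    """prototypes: (M, d) array of prototype states; values: (M,) learned values."""
    dists = np.linalg.norm(prototypes - state, axis=1)
    idx = np.argsort(dists)[:k]                 # indices of the k nearest prototypes
    weights = 1.0 / (dists[idx] ** 2 + eps)     # closer prototypes weigh more
    weights /= weights.sum()
    return float(np.dot(weights, values[idx])), idx, weights
```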
18.
P. L. Lanzi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2002,6(3-4):162-170
We analyze learning classifier systems in the light of tabular reinforcement learning. We note that although genetic algorithms are the most distinctive feature of learning classifier systems, it is not clear whether genetic algorithms are important to learning classifier systems. In fact, there are models which are strongly based on evolutionary computation (e.g., Wilson's XCS) and others which do not exploit evolutionary computation at all (e.g., Stolzmann's ACS). To find some clarification, we try to develop learning classifier systems "from scratch", i.e., starting from one of the best-known reinforcement learning techniques, Q-learning. We first consider the basics of reinforcement learning: a problem modeled as a Markov decision process and tabular Q-learning. We introduce a formal framework to define a general-purpose rule-based representation, which we use to implement tabular Q-learning. We formally define generalization within rules and discuss possible approaches to extend our rule-based Q-learning with generalization capabilities. We suggest that genetic algorithms are probably the most general approach for adding generalization, although they might not be the only solution.
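To make the rule-based representation concrete, here is a toy sketch of Q-learning over condition-action rules with a '#' don't-care symbol, giving XCS-like generalization over binary state strings; this is our illustration, not the paper's formal framework.

```python
# A rule is a (condition, action, q) triple; '#' in the condition matches either bit.
def matches(condition, state):
    """condition and state: equal-length strings over {'0','1'}; '#' matches anything."""
    return all(c in ('#', s) for c, s in zip(condition, state))

def best_q(rules, state):
    """Best value among rules matching the state (0.0 if none match)."""
    qs = [q for cond, _, q in rules if matches(cond, state)]
    return max(qs, default=0.0)

def q_update(rules, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular-style Q update applied to every matching rule advocating the action."""
    target = reward + gamma * best_q(rules, next_state)
    for i, (cond, act, q) in enumerate(rules):
        if act == action and matches(cond, state):
            rules[i] = (cond, act, q + alpha * (target - q))
```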
19.
One of the difficulties encountered in the application of reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. In order to solve the curse-of-dimensionality problem that results from making continuous state or action spaces discrete, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that is used to approximate both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and Critic networks share the input, rule, and normalized layers of the FRBF network, which reduces the learning system's demand for storage space and avoids repeated computation of the rule-unit outputs. Moreover, the FRBF network is able to adjust its structure and parameters adaptively with a novel self-organizing approach, according to the complexity of the task and the progress of learning, which ensures an economical size of the network. Experimental studies on a cart-pole balancing control task illustrate the performance and applicability of the proposed FACRLN.
20.
David Vengerov 《The Journal of supercomputing》2008,43(1):1-19
Multi-tier storage systems are becoming increasingly widespread in industry. They have more tunable parameters and built-in policies than traditional storage systems, and an adequate configuration of these parameters and policies is crucial for achieving high performance. A very important performance indicator for such systems is the response time of file I/O requests. The response time can be minimized if the most frequently accessed ("hot") files are located in the fastest storage tiers. Unfortunately, it is impossible to know a priori which files are going to be hot, especially because file access patterns change over time. This paper presents a policy-based framework for dynamically deciding which files need to be upgraded and which need to be downgraded, based on their recent access pattern and on the system's current state. The paper also presents a reinforcement learning (RL) algorithm for automatically tuning the file migration policies in order to minimize the average request response time. A multi-tier storage system simulator was used to evaluate the migration policies tuned by RL, and such policies were shown to achieve a significant performance improvement over the best hand-crafted policies found for this domain.
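As a hypothetical illustration of the kind of tunable migration policy an RL algorithm could adjust, the sketch below scores files by their recent access counts against two thresholds that a tuner would learn; the field names and the three-tier assumption are ours, not the paper's.

```python
# Decide which files to promote to a faster tier and which to demote.
def migration_decisions(files, upgrade_threshold, downgrade_threshold):
    """files: list of dicts with 'name', 'recent_accesses', 'tier' (0 = fastest).

    The two thresholds are the policy parameters an RL tuner would learn so as
    to minimize the average request response time.
    """
    upgrades, downgrades = [], []
    for f in files:
        if f["recent_accesses"] >= upgrade_threshold and f["tier"] > 0:
            upgrades.append(f["name"])          # promote hot file to a faster tier
        elif f["recent_accesses"] <= downgrade_threshold and f["tier"] < 2:
            downgrades.append(f["name"])        # demote cold file to a slower tier
    return upgrades, downgrades
```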