Similar Documents
20 similar documents found (search time: 15 ms).
1.
We comment on the Shoham, Powers, and Grenager survey of multi-agent learning and game theory, emphasizing that some of their categories are important for economics and others are not. We also try to correct some minor imprecisions in their discussion of the economics literature on learning in games.

2.
Reinforcement learning (RL) for large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been proposed, each with its own advantages and disadvantages. This paper proposes a new method, inspired by the strategies of hierarchical abstract machines (HAMs), that creates a high-level controller layer for reinforcement learning with options. The proposed framework uses a non-deterministic automaton as a controller in order to make more effective use of temporally extended actions and state-space clustering. The method can be viewed as a bridge between the option and HAM frameworks: by creating connecting structures between them it mitigates the disadvantages of each while retaining their advantages. Experimental results on different test environments show that the proposed method is significantly more efficient.

3.
Computer Supported Collaborative Learning is a pedagogical approach that can be used for deploying educational games in the classroom. However, there is no clear understanding of which technological platforms are better suited for deploying co-located collaborative games, nor of the general affordances required. In this work we explore two different technological platforms for developing collaborative games in the classroom: one based on augmented reality technology and the other based on multiple-mice technology. In both cases, the same game was introduced to teach electrostatics, and the results were compared experimentally in a real class.

4.
RAM-based neural networks are designed to be efficiently implemented in hardware. The desire to retain this property influences the training algorithms used, and has led to the use of reinforcement (reward-penalty) learning. An analysis of the reinforcement algorithm applied to RAM-based nodes has shown the ease with which unlearning can occur. An amended algorithm is proposed which demonstrates improved learning performance compared to previously published reinforcement regimes.

5.
Recently, many models of reinforcement learning with hierarchical or modular structures have been proposed. They decompose a task into simpler subtasks and solve them by using multiple agents. However, these models impose certain restrictions, for example on the topological relations among agents. By relaxing these restrictions, we propose networked reinforcement learning, where each agent in a network acts autonomously by regarding the other agents as part of its environment. Although convergence to an optimal policy is no longer assured, we show by means of numerical simulations that our model functions appropriately, at least in certain simple situations. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

6.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of Reinforcement Learning (RL) to the self-emergence of a common lexicon in robot teams. By modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e., the objects encountered by the robots, or the states of the environment itself) to symbols or signals, we check whether the robot team can converge in an autonomous, decentralized way to a common lexicon by means of RL, such that the communication efficiency of the entire robot team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean communication system. Our experiments are organized along two main lines: first, we investigate the effect of team size, focusing on teams of moderate size on the order of 5 to 10 individuals, typical of multi-robot systems; second, and foremost, we investigate the effect of lexicon size on the convergence results. To analyze the convergence of the robot team we define team consensus as the point at which all the robots (i.e., 100% of the population) share the same association matrix or lexicon. As a general conclusion, we show that RL allows convergence to lexicon consensus in a population of autonomous agents.
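As a rough illustration of the mechanism described in this abstract, the sketch below shows agents reinforcing meaning-symbol associations in a shared-lexicon game; the update rule, reward values, and all parameter names are illustrative assumptions rather than the paper's actual scheme.

```python
# Minimal sketch: a robot team converging toward a shared lexicon via reinforcement.
# N_AGENTS, ALPHA, EPSILON, the reward values and the update rule are assumptions.
import random

N_AGENTS, N_MEANINGS, N_SYMBOLS = 5, 4, 4
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 20000

# Each agent's lexicon: association strengths, indexed [meaning][symbol].
lexicons = [[[0.0] * N_SYMBOLS for _ in range(N_MEANINGS)] for _ in range(N_AGENTS)]

def choose(row):
    """Epsilon-greedy choice over one row of an association matrix."""
    if random.random() < EPSILON:
        return random.randrange(len(row))
    return max(range(len(row)), key=lambda j: row[j])

for _ in range(EPISODES):
    speaker, listener = random.sample(range(N_AGENTS), 2)
    meaning = random.randrange(N_MEANINGS)
    symbol = choose(lexicons[speaker][meaning])                      # speaker encodes
    decoded = choose([lexicons[listener][m][symbol] for m in range(N_MEANINGS)])
    reward = 1.0 if decoded == meaning else -0.1                     # communication success
    lexicons[speaker][meaning][symbol] += ALPHA * reward
    lexicons[listener][decoded][symbol] += ALPHA * reward

# Consensus check: every agent maps every meaning to the same symbol.
maps = [tuple(max(range(N_SYMBOLS), key=lambda j: lex[m][j]) for m in range(N_MEANINGS))
        for lex in lexicons]
print("consensus:", len(set(maps)) == 1)
```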

7.
Transfer in variable-reward hierarchical reinforcement learning
Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs, and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.
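The structural assumption above is that rewards are linear in a set of reward features, so value functions decompose along those features. The following toy sketch, under assumptions of its own (variable names and a simple "best stored policy" initialization rule), shows how stored per-feature value components could seed the value function of a new task.

```python
# Sketch: initialize the value function for new reward weights from per-feature
# value components learned on earlier tasks. Names and the "best stored policy"
# rule are illustrative assumptions, not the VRRL algorithm itself.
import numpy as np

n_states, n_features = 6, 3

# stored[k][i, s] = expected discounted accumulation of reward feature i in
# state s under the k-th previously learned policy (random data for the demo).
stored = [np.random.rand(n_features, n_states) for _ in range(4)]

def initialize_value(w, stored_components):
    """Score each stored policy by w . components and start from the best one
    instead of starting from zeros."""
    scored = [w @ comp for comp in stored_components]   # each has shape (n_states,)
    best = max(scored, key=lambda v: v.sum())
    return best.copy()

w_new = np.array([0.2, 0.7, 0.1])                       # reward weights of the new SMDP
V0 = initialize_value(w_new, stored)
print("initial value estimates:", np.round(V0, 2))
```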

8.
Hamid M.R., Automatica, 2008, 44(5): 1350–1357
Cellular learning automata are a combination of cellular automata and learning automata. The synchronous version, in which all learning automata in the different cells are activated synchronously, has found many applications. Some applications, however, require a version in which the learning automata in different cells are activated asynchronously (asynchronous cellular learning automata). In this paper, we introduce asynchronous cellular learning automata and study their steady-state behavior. An application of this new model to cellular networks is then presented.

9.
Shaping multi-agent systems with gradient reinforcement learning
An original reinforcement learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. To that end, we design simple reactive agents in a decentralized way as independent learners. But to cope with the difficulties inherent to RL used in that framework, we have developed an incremental learning algorithm where agents face a sequence of progressively more complex tasks. We illustrate this general framework by computer experiments where agents have to coordinate to reach a global goal. This work has been conducted in part in NICTA’s Canberra laboratory.

10.
To speed up the automatic generation of task hierarchies in hierarchical reinforcement learning, a parallel automatic hierarchy-generation method based on a multi-agent system is proposed. The method takes Sutton's Option framework for hierarchical reinforcement learning as its theoretical basis: multiple agents first cooperatively explore the state space in parallel, and the collected states are clustered centrally to produce state subspaces; the agents then learn the internal policies over each subspace in parallel, and finally the Options are generated. The algorithm is presented for shortest-path planning between two points in a two-dimensional grid world with obstacles, and simulation experiments and analysis are carried out. The results show that the parallel automatic hierarchy-generation method builds task hierarchies significantly faster than previous serial methods. The approach is applicable to problem domains such as spatial exploration, path planning, and pursuit-evasion.
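As a rough sketch of the pipeline just described (exploration, centralized clustering into state subspaces, subspaces as the basis of Options), the code below is illustrative only: the k-means clustering, grid encoding, and sequential stand-in for parallel exploration are assumptions, not the paper's implementation.

```python
# Sketch: cluster states gathered by several exploring agents into subspaces,
# then treat each subspace as the domain of one Option.
import random
from sklearn.cluster import KMeans

GRID, N_AGENTS, STEPS, N_SUBSPACES = 10, 4, 200, 4

def explore(steps):
    """One agent's random walk over the grid; returns the states it visited."""
    x, y = random.randrange(GRID), random.randrange(GRID)
    visited = []
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1)
        visited.append((x, y))
    return visited

# Parallel exploration (here run sequentially as a stand-in), then centralized clustering.
states = [s for _ in range(N_AGENTS) for s in explore(STEPS)]
labels = KMeans(n_clusters=N_SUBSPACES, n_init=10).fit_predict(states)

# Each cluster becomes the state subspace of one Option; the internal policy over
# each subspace would then be learned by the agents in parallel.
options = {k: {s for s, l in zip(states, labels) if l == k} for k in range(N_SUBSPACES)}
print({k: len(v) for k, v in options.items()})
```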

11.
Inductive inference can be considered one of the fundamental paradigms of algorithmic learning theory. We survey recently obtained results and show their impact on potential applications.

12.
In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing [5] and analyze how to share a reward among all profit-sharing agents. When an agent gets a direct reward R (R > 0), an indirect reward μR (μ ≥ 0) is given to the other agents. We have derived the necessary and sufficient condition for preserving rationality, where M and L are the maximum numbers of conflicting rules and of rational rules for the same sensory input, W and W_o are the maximum episode lengths of a direct-reward and an indirect-reward agent, and n is the number of agents. The theorem is derived by avoiding the least desirable situation, in which the expected reward per action is zero. Using the theorem, we can therefore exploit several efficient forms of reward sharing. Through numerical examples, we confirm its effectiveness.
Kazuteru Miyazaki, Dr. Eng.: He is an associate professor in the Faculty of Assessment and Research for Degrees at the National Institution for Academic Degrees. He obtained his B.Eng. from Meiji University in 1991, and his Dr. Eng. from Tokyo Institute of Technology in 1996. His research interests are in machine learning and robotics. He has published over 30 research papers and received several awards. He is a member of the Japan Society of Mechanical Engineers (JSME), the Japanese Society for Artificial Intelligence (JSAI), and the Society of Instrument and Control Engineers of Japan (SICE).
Shigenobu Kobayashi, Dr. Eng.: He received his Dr. Eng. from Tokyo Institute of Technology in 1974. He is a professor in the Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology. His research interests include artificial intelligence, emergent systems, evolutionary computation, and reinforcement learning.
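To make the direct/indirect reward setup concrete, here is a minimal sketch of profit-sharing style credit assignment; the episode representation, the geometric credit curve, and all constants are illustrative assumptions, and the paper's rationality condition on μ is not reproduced here.

```python
# Sketch: distribute a direct reward R to the rewarded agent's episode and an
# indirect reward mu*R to the other agents' episodes, profit-sharing style.
R, MU, DECAY = 1.0, 0.2, 0.5       # direct reward, indirect-reward ratio, credit decay

def reinforce(q_table, episode, reward, decay=DECAY):
    """Spread `reward` backward over the (state, action) pairs of an episode,
    geometrically discounting earlier steps (a common profit-sharing credit curve)."""
    credit = reward
    for state, action in reversed(episode):
        q_table[(state, action)] = q_table.get((state, action), 0.0) + credit
        credit *= decay

# One rule-strength table and one current episode trace per agent.
q_tables = [dict() for _ in range(3)]
episodes = [[("s0", "a1"), ("s1", "a0")],
            [("s0", "a0"), ("s2", "a1")],
            [("s1", "a1"), ("s2", "a0")]]

rewarded_agent = 0                  # this agent triggered the direct reward
for i, (q, ep) in enumerate(zip(q_tables, episodes)):
    reinforce(q, ep, R if i == rewarded_agent else MU * R)
print(q_tables)
```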

13.
Multi-agent reinforcement learning methods suffer from several deficiencies that are rooted in the large state space of multi-agent environments. This paper tackles two of these deficiencies: the slow learning rate, and low-quality decision-making in the early stages of learning. The proposed methods are applied to a grid-world soccer game. In the proposed approach, modular reinforcement learning is applied to reduce the state space of the learning agents from exponential to linear in the number of agents. The modular model proposed here includes two new modules, a partial-module and a single-module, which are effective for increasing the speed of learning in the soccer game. We also apply instance-based learning concepts to choose proper actions in states that were not experienced adequately during learning. The key idea is to use neighbouring states that have been explored sufficiently during the learning phase. The results of experiments in the grid-soccer game environment show that our proposed methods produce a higher average reward than the modular structure without them.
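A rough sketch of the neighbouring-state idea follows; the state encoding, visit-count threshold, and Manhattan distance are assumptions made for illustration, not the paper's exact mechanism.

```python
# Sketch: when the current state was visited too rarely during learning, borrow the
# greedy action of the nearest sufficiently-visited state instead.
MIN_VISITS = 10                               # illustrative threshold

def manhattan(s1, s2):
    return sum(abs(a - b) for a, b in zip(s1, s2))

def select_action(state, q_values, visit_counts):
    """q_values[state] -> {action: value}; visit_counts[state] -> int."""
    if visit_counts.get(state, 0) >= MIN_VISITS:
        source = state
    else:
        experienced = [s for s, c in visit_counts.items() if c >= MIN_VISITS]
        if not experienced:
            return None                        # nothing learned yet
        source = min(experienced, key=lambda s: manhattan(s, state))
    return max(q_values[source], key=q_values[source].get)

# Tiny usage example with hand-made tables.
q = {(0, 0): {"up": 0.2, "right": 0.9}, (3, 3): {"up": 0.5, "right": 0.1}}
visits = {(0, 0): 25, (3, 3): 12}
print(select_action((1, 0), q, visits))        # borrows the greedy action of (0, 0)
```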

14.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.

15.
Artificial intelligence for digital games encompasses the implementation of a set of algorithms and techniques from both traditional and modern artificial intelligence in order to provide solutions to a range of game-dependent problems. However, the majority of current approaches lead to predefined, static and predictable game-agent responses, with no ability to adjust during game-play to the behaviour or playing style of the player. Machine learning techniques provide a way to improve the behavioural dynamics of computer-controlled game agents by facilitating the automated generation and selection of behaviours, thus enhancing the capabilities of digital game artificial intelligence and providing the opportunity to create more engaging and entertaining game-play experiences. This paper provides a survey of the current state of academic machine learning research for digital game environments, with respect to the use of techniques from neural networks, evolutionary computation and reinforcement learning for game agent control.

16.
As suggested by the title of Shoham, Powers, and Grenager's position paper [Y. Shoham, R. Powers, T. Grenager, If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7) (2007) 365–377, this issue], the ultimate lens through which the multi-agent learning framework should be assessed is “what is the question?”. In this paper, we address this question by presenting challenges motivated by engineering applications and discussing the potential appeal of multi-agent learning for meeting these challenges. Moreover, we highlight various differences in the underlying assumptions and issues of concern that generally distinguish engineering applications from the models typically considered in the economic game theory literature.

17.
A cooperative game of a pair of learning automata
A cooperative game played sequentially by a pair of learning automata is investigated in this paper. The automata operate in an unknown random environment which gives a common pay-off to both automata. Necessary and sufficient conditions on the functions in the reinforcement scheme are given for absolute monotonicity, which ensures that the expected pay-off is monotonically increasing in any arbitrary environment. As each participating automaton operates with no information about its partner, the results of the paper are relevant to decentralized control.
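For intuition, a toy sketch of a pair of learning automata updating action probabilities from a common pay-off is shown below; the linear reward-inaction update and the pay-off matrix are illustrative assumptions, not necessarily the reinforcement scheme analyzed in the paper.

```python
# Sketch: two learning automata playing a common-pay-off game with a linear
# reward-inaction (L_R-I) update. Pay-off matrix and learning rate are assumptions.
import random

A = 0.05                                   # learning rate
payoff = [[0.9, 0.2], [0.3, 0.6]]          # P(common reward | action of automaton 1, 2)
p1, p2 = [0.5, 0.5], [0.5, 0.5]            # action probabilities of each automaton

def sample(p):
    return 0 if random.random() < p[0] else 1

def reward_inaction(p, action, rewarded):
    """Move probability mass toward the chosen action only when rewarded."""
    if rewarded:
        for i in range(len(p)):
            p[i] = p[i] + A * (1.0 - p[i]) if i == action else p[i] * (1.0 - A)

for _ in range(5000):
    a1, a2 = sample(p1), sample(p2)
    rewarded = random.random() < payoff[a1][a2]    # common pay-off from the environment
    reward_inaction(p1, a1, rewarded)
    reward_inaction(p2, a2, rewarded)

print([round(x, 2) for x in p1], [round(x, 2) for x in p2])
```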

18.
Concussion education and prevention for youth hockey players has been an issue of recent concern amongst sport medicine practitioners and hockey’s administrative bodies. This article details the assessment of a sports-action hockey video game that aims to reduce the aggressive and negligent behaviours that can lead to concussions. The game, termed Alert Hockey, was designed to modify game-playing behaviour by embedding an implicit teaching mechanism within the gameplay. In Alert Hockey, participants were expected to learn by simply playing to win, in contrast to playing to learn. We studied learning in an experimental simulated environment where the possibility of winning the game was increased as a consequence of desirable safety behaviours (positive learning group) and reduced as a consequence of undesirable behaviours (negative learning group). The positive learning group significantly improved their mean score on a composite behavioural indicator, compared with no significant change among control-group participants. The results demonstrate that implicit learning embedded in a sports-action game can lead to changes in game-play behaviour.

19.
This paper studies exploration schemes in reinforcement learning, describing the respective characteristics of indirect and direct exploration. Combining their advantages, a hybrid exploration scheme that integrates direct and indirect exploration is proposed. In the initial stage of learning, when little empirical knowledge of the environment is available, the scheme emphasizes direct exploration; after more experience has been gathered, it emphasizes indirect exploration, so that action selection gradually approaches the optimal policy. Experiments show that the scheme achieves higher learning efficiency than the purely indirect ε-greedy exploration scheme.
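A minimal sketch of such a hybrid exploration schedule is given below; the linear annealing from heavy exploration toward greedy selection, and all constants, are illustrative assumptions rather than the paper's exact scheme.

```python
# Sketch: blend heavy (direct) exploration early in training with greedier,
# experience-driven (indirect) selection later.
import random

EPS_START, EPS_END, ANNEAL_STEPS = 1.0, 0.05, 5000

def exploration_rate(step):
    """Linearly anneal the exploration probability over ANNEAL_STEPS steps."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_row, step):
    """q_row: {action: estimated value} for the current state."""
    if random.random() < exploration_rate(step):
        return random.choice(list(q_row))       # direct exploration early on
    return max(q_row, key=q_row.get)            # exploit learned estimates later

q_row = {"left": 0.1, "right": 0.4, "stay": 0.3}
print([select_action(q_row, step) for step in (0, 2500, 10000)])
```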

20.
We deal with a special class of games against nature that correspond to subsymbolic learning problems in which we know a local descent direction in the error landscape but not the amount gained at each step of the learning procedure. Namely, Alice and Bob play a game in which the probability of victory grows monotonically, by unknown amounts, with the resources each employs. For a fixed effort on Alice’s part, Bob increases his resources on the basis of the results of the individual contests (victory, tie or defeat). Quite unlike the usual aims in game theory, his goal is to stop as soon as the defeat probability falls below a given threshold with high confidence. We adopt this game policy as an archetypal remedy to the general overtraining threat faced by learning algorithms. Namely, we recast the original game in a computational learning framework analogous to the Probably Approximately Correct formulation. Therein, a careful use of a special inferential mechanism (known as the twisting argument) highlights relevant statistics for managing different trade-offs between observability and controllability of the defeat probability. With similar statistics we discuss an analogous trade-off underlying the stopping criterion of subsymbolic learning procedures. In conclusion, we propose a principled stopping rule based solely on the behavior of the training session, hence without diverting examples into a test set.

