Similar Literature
20 similar records found.
1.
This paper studies a multi-goal Q-learning algorithm for cooperative teams. Each member of a cooperative team is simulated by an agent, and in the virtual team the agents adapt their knowledge according to cooperative principles. The multi-goal Q-learning algorithm addresses multiple learning goals simultaneously: agents learn both what knowledge to adopt and how much to learn (by choosing a learning radius, interpreted in Section 3.1). Five basic experiments are conducted to validate the multi-goal Q-learning algorithm. The results show that the learning algorithm causes agents to converge to optimal actions, based on their continually updated cognitive maps of how actions influence the learning goals, and that the algorithm benefits all of the learning goals. Furthermore, the paper analyzes how sensitive the learning performance is to the parameter values of the algorithm.
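As a rough illustration of how several learning goals can be combined in a tabular Q-learner, the following minimal sketch keeps one Q-table per goal and scalarizes them with fixed weights when selecting actions. The toy environment, weights, and variable names are illustrative assumptions; the paper's own algorithm (including the learning-radius mechanism) is not reproduced here.

```python
# Hypothetical multi-goal tabular Q-learning sketch: one Q-table per goal,
# actions chosen against a weighted combination of the per-goal values.
# Environment, weights, and names are illustrative assumptions.
import random
from collections import defaultdict

N_STATES, N_ACTIONS = 5, 3
GOAL_WEIGHTS = [0.6, 0.4]            # relative importance of two learning goals
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = [defaultdict(float) for _ in GOAL_WEIGHTS]   # Q[g][(state, action)]

def combined_value(state, action):
    """Scalarize the per-goal values with fixed weights."""
    return sum(w * Q[g][(state, action)] for g, w in enumerate(GOAL_WEIGHTS))

def choose_action(state):
    if random.random() < EPSILON:                # explore
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: combined_value(state, a))

def update(state, action, rewards, next_state):
    """rewards holds one scalar reward per goal."""
    for g, r in enumerate(rewards):
        best_next = max(Q[g][(next_state, a)] for a in range(N_ACTIONS))
        Q[g][(state, action)] += ALPHA * (r + GAMMA * best_next - Q[g][(state, action)])

# toy random-walk environment, just to exercise the update rule
state = 0
for step in range(1000):
    action = choose_action(state)
    next_state = (state + action) % N_STATES
    rewards = [1.0 if next_state == N_STATES - 1 else 0.0,   # goal 1: reach the last state
               -0.1 * action]                                # goal 2: prefer cheap actions
    update(state, action, rewards, next_state)
    state = next_state
```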

2.
Temporal-Difference-Fusion Architecture for Learning, Cognition, and Navigation (TD-FALCON) is a generalization of adaptive resonance theory (a class of self-organizing neural networks) that incorporates TD methods for real-time reinforcement learning. In this paper, we investigate how a team of TD-FALCON networks may cooperate to learn and function in a dynamic multiagent environment, based on a minefield navigation task and a predator/prey pursuit task. Experiments on the navigation task demonstrate that TD-FALCON agent teams are able to adapt and function well in a multiagent environment without an explicit mechanism of collaboration. In comparison, traditional Q-learning agents using gradient-descent-based feedforward neural networks, trained with the standard backpropagation and the resilient-propagation (RPROP) algorithms, produce a significantly poorer level of performance. For the predator/prey pursuit task, we experiment with various cooperative strategies and find that a combination of a high-level compressed state representation and a hybrid reward function produces the best results. Using the same cooperative strategy, the TD-FALCON team also outperforms the RPROP-based reinforcement learners in terms of both task completion rate and learning efficiency.

3.
Team development and group processes of virtual learning teams
This study describes the community building process of virtual learning teams as they form, establish roles and group norms, and address conflict. Students enrolled in an HRD master's program taught entirely online were studied to determine (1) how virtual learning teams develop their group process, and (2) what processes and strategies they use as they work through the stages of group development. Both quantitative and qualitative methods of inquiry were used to capture the dynamic interaction within groups and the underlying factors that guided group process and decision-making. The results show that virtual learning groups can collaborate effectively from a distance to accomplish group tasks. The development of virtual learning teams is closely connected to the timeline for their class projects. Virtual teams are also similar in terms of their task process and the use of communication technologies. In contrast to face-to-face teams, the leadership role of virtual teams is shared among team members. Recommendations are discussed in order to facilitate peak integration of virtual learning teams into Internet-based training courses.

4.
Local strategy learning in networked multi-agent team formation
Networked multi-agent systems are composed of many autonomous yet interdependent agents situated in a virtual social network. Two examples of such systems are supply chain networks and sensor networks. A common challenge in many networked multi-agent systems is decentralized team formation among the spatially and logically extended agents. Even in cooperative multi-agent systems, efficient team formation is made difficult by the limited local information available to the individual agents. We present a model of distributed multi-agent team formation in networked multi-agent systems, describe a policy learning framework for joining teams based on local information, and give empirical results on improving team formation performance. In particular, we show that local policy learning from limited information leads to a significant increase in organizational team formation performance compared to a random policy.

5.
A new impetus for greater knowledge-sharing among team members needs to be emphasized due to the emergence of a significant new form of working known as 'global virtual teams'. As information and communication technologies permeate every aspect of organizational life and impact the way teams communicate, work and structure relationships, global virtual teams require innovative communication and learning capabilities for different team members to effectively work together across cultural, organizational and geographical boundaries. Whereas information technology-facilitated communication processes rely on technologically advanced systems to succeed, the ability to create a knowledge-sharing culture within a global virtual team rests on the existence (and maintenance) of intra-team respect, mutual trust, reciprocity and positive individual and group relationships. Thus, some of the inherent questions we address in our paper are: (1) what are the cross-cultural challenges faced by global virtual teams? (2) how do organizations develop a knowledge-sharing culture to promote effective organizational learning among culturally-diverse team members? and (3) what are some of the practices that can help maximize the performance of global virtual teams? We conclude by examining ways that global virtual teams can be more effectively managed in order to reach their potential in this new interconnected world and put forward suggestions for further research.

6.
Software process tailoring (SPT) is a team-based and learning-intensive activity that addresses the particular dynamic characteristics of a development project. Because SPT critically influences how projects are conducted, its performance should be investigated. However, the extant literature lacks empirical evidence on how the underlying effects of SPT performance and its team-supportive factors operate and influence software project performance. From the knowledge perspective, this study adopts dynamic capabilities theory and considers the learning ability and absorptive capacity of software project teams to develop a theoretical model to address this gap. The results of an empirical examination of the model with 135 software project teams advance our understanding of how team-level learning antecedents—experience, communication quality and trust—dynamically facilitate teams' absorptive capacity (AC) when they conduct SPT, which in turn reinforces project performance. The mediating effects of the proposed model are unveiled and discussed, and theoretical implications as well as practical guidance for how AC and these factors promote SPT and project performance are suggested.

7.
Learning Situation-Specific Coordination in Cooperative Multi-agent Systems
Achieving effective cooperation in a multi-agent system is a difficult problem for a number of reasons, such as limited and possibly outdated views of the activities of other agents and uncertainty about the outcomes of interacting non-local tasks. In this paper, we present a learning system called COLLAGE that endows agents with the capability to learn how to choose the most appropriate coordination strategy from a set of available coordination strategies. COLLAGE relies on meta-level information about the agents' problem-solving situations to guide them towards a suitable choice of coordination strategy. We present empirical results that strongly indicate the effectiveness of the learning algorithm.
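In the same spirit as situation-specific strategy selection, the following is a minimal sketch in which an agent learns a value for each (situation, coordination strategy) pair from meta-level features of its current situation. The features, strategies, and stand-in reward model are illustrative assumptions, not the COLLAGE system itself.

```python
# Toy sketch of learning which coordination strategy fits which situation.
# Feature names, strategies, and the reward model are illustrative assumptions.
import random
from collections import defaultdict

STRATEGIES = ["no_coordination", "simple_handshake", "detailed_scheduling"]
ALPHA, EPSILON = 0.2, 0.1

value = defaultdict(float)           # value[(situation, strategy)]

def observe_situation():
    """Meta-level features, e.g. task load and degree of task interdependence."""
    load = random.choice(["low", "high"])
    interdependence = random.choice(["loose", "tight"])
    return (load, interdependence)

def pick_strategy(situation):
    if random.random() < EPSILON:
        return random.choice(STRATEGIES)
    return max(STRATEGIES, key=lambda s: value[(situation, s)])

def simulate_episode(situation, strategy):
    """Stand-in reward: heavier coordination pays off only for tight coupling."""
    cost = STRATEGIES.index(strategy) * 0.2
    benefit = 1.0 if (situation[1] == "tight") == (strategy != "no_coordination") else 0.3
    return benefit - cost

for _ in range(2000):
    situation = observe_situation()
    strategy = pick_strategy(situation)
    reward = simulate_episode(situation, strategy)
    value[(situation, strategy)] += ALPHA * (reward - value[(situation, strategy)])
```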

8.
The environments in which self-adaptive systems operate are often uncertain and their changes are hard to predict in advance, and supporting the development of complex self-adaptive systems in such environments has become an important challenge for software engineering. Reinforcement learning is an important branch of machine learning: through continual trial and error, a reinforcement learning system learns an optimal policy mapping environment states to executable actions. To address environmental uncertainty in self-adaptive systems, this paper combines agent technology with reinforcement learning…

9.
Multi-agent learning (MAL) studies how agents learn to behave optimally and adaptively from their experience when interacting with other agents in dynamic environments. The outcome of a MAL process is jointly determined by all agents' decision-making. Hence, each agent needs to think strategically about others' sequential moves when planning future actions. These strategic interactions among agents make MAL go beyond a direct extension of single-agent learning to multiple agents. With strategic thinking, each agent aims to build a subjective model of others' decision-making using its observations. Such modeling is directly influenced by the agents' perception during the learning process, which is called the information structure of the agent's learning. Because they determine the input to MAL processes, information structures play a significant role in the learning mechanisms of the agents. This review creates a taxonomy of MAL and establishes a unified and systematic way to understand MAL from the perspective of information structures. We define three fundamental components of MAL: the information structure (i.e., what the agent can observe), the belief generation (i.e., how the agent forms a belief about others based on the observations), and the policy generation (i.e., how the agent generates its policy based on its belief). In addition, this taxonomy enables the classification of a wide range of state-of-the-art algorithms into four categories based on the belief-generation mechanisms for the opponents: stationary, conjectured, calibrated, and sophisticated opponents. We introduce Value of Information (VoI) as a metric to quantify the impact of different information structures on MAL. Finally, we discuss the strengths and limitations of algorithms from the different categories and point to promising avenues of future research.
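To make the three components concrete, here is a structural sketch under illustrative assumptions: an information structure that restricts what is observed, a belief generator that keeps empirical action frequencies (a "stationary opponent" style belief), and a policy generator that best-responds to the most likely opponent action. The class names and the toy decision rule are assumptions, not definitions from the review.

```python
# Structural sketch of the three MAL components named above. Classes, the
# frequency-based belief, and the toy best-response rule are illustrative.
import random
from dataclasses import dataclass, field

@dataclass
class InformationStructure:
    """Defines which other agents' actions this agent can observe."""
    observable_agents: list

    def observe(self, joint_actions):
        return {i: joint_actions[i] for i in self.observable_agents if i in joint_actions}

@dataclass
class BeliefGeneration:
    """Maintains empirical frequencies of observed opponent actions."""
    counts: dict = field(default_factory=dict)

    def update(self, observation):
        for agent, action in observation.items():
            self.counts.setdefault(agent, {}).setdefault(action, 0)
            self.counts[agent][action] += 1

    def belief(self, agent):
        c = self.counts.get(agent, {})
        total = sum(c.values()) or 1
        return {a: n / total for a, n in c.items()}

@dataclass
class PolicyGeneration:
    """Best-responds to the believed most likely opponent action (toy anti-coordination rule)."""
    def act(self, belief):
        if not belief:
            return random.choice(["A", "B"])
        likely = max(belief, key=belief.get)
        return "B" if likely == "A" else "A"

# one learning step for agent 0, which can observe agent 1
info = InformationStructure(observable_agents=[1])
beliefs = BeliefGeneration()
policy = PolicyGeneration()
joint = {1: random.choice(["A", "B"])}
beliefs.update(info.observe(joint))
print(policy.act(beliefs.belief(1)))
```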

10.
Adversarial decision making is aimed at determining optimal decision strategies to deal with an adaptive opponent. A clear example of such a situation is the repeated imitation game presented here. Two agents compete in an adversarial model where one agent wants to learn how to imitate the actions taken by the other agent by observing and memorizing its past actions. One defense against this adversary is to make decisions that are intended to confuse it. To achieve this, randomized strategies that change over time are proposed for one of the agents, and their performance is analysed from both a theoretical and an empirical point of view. We also study the ability of the imitator to avoid deception and adapt to a new behaviour by forgetting the oldest observations. The results confirm that wrong assumptions about the imitator's behaviour lead to dramatic losses due to a failure to cause deception.
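The following toy sketch mirrors the setting described above: a decision maker randomizes its action distribution and shifts it over time, while the imitator predicts the next action from a sliding window of past observations (forgetting older ones). The distributions, window size, and scoring are illustrative assumptions, not the paper's model.

```python
# Toy repeated imitation game: randomized, time-varying strategy vs. an
# imitator with a bounded memory. All parameters are illustrative assumptions.
import random
from collections import Counter, deque

ACTIONS = ["a", "b", "c"]
WINDOW = 20                      # how much history the imitator remembers

def decision_maker(t):
    """Randomized strategy that shifts every 200 rounds to confuse the imitator."""
    phase = (t // 200) % len(ACTIONS)
    weights = [3 if i == phase else 1 for i in range(len(ACTIONS))]
    return random.choices(ACTIONS, weights=weights)[0]

memory = deque(maxlen=WINDOW)    # forgetting: only the newest WINDOW observations
hits = 0
rounds = 1000
for t in range(rounds):
    guess = Counter(memory).most_common(1)[0][0] if memory else random.choice(ACTIONS)
    actual = decision_maker(t)
    hits += (guess == actual)
    memory.append(actual)

print(f"imitation accuracy over {rounds} rounds: {hits / rounds:.2f}")
```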

11.
Information & Management, 2002, 39(6): 445-456
Forming virtual organizations (VOs) is a new workplace strategy that is also needed to prepare information, technology, and knowledge workers for functioning well in inter-organizational teams. University information studies programs can simulate VOs in courses and teach certain skill sets that are needed in VO work: critical thinking, analytical methods, ethical problem solving, stakeholder analysis, and writing policy are among the needed skills and abilities. Simulated virtual teams allow participants to learn to trust team members and to understand how communication and product development can work effectively in a virtual workspace. It is hoped that some of these methods could be employed in corporate training programs also. In an innovative course, inter-university VOs were created to develop information products. Groups in four geographically dispersed universities cooperated in the project; at its conclusion, students answered a self-administered survey about their experience. Each team's success or difficulties were apparently closely related to issues of trust in the team process. Access to and ease of communication tools also played a role in the participants' perceptions of the learning experience and teamwork.

12.
Multiagent learning provides a promising paradigm to study how autonomous agents learn to achieve coordinated behavior in multiagent systems. In multiagent learning, the concurrency of multiple distributed learning processes makes the environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents' behavior in this dynamic environment is a difficult problem, especially when agents do not know the domain structure and at the same time have only local observability of the environment. In this paper, a coordinated learning approach is proposed to enable agents to learn where and how to coordinate their behavior in loosely coupled multiagent systems where the sparse interactions of agents constrain coordination to some specific parts of the environment. In the proposed approach, an agent first collects statistical information to detect those states where coordination is most necessary, by considering not only the potential contributions from all the domain states but also the direct causes of the miscoordination in a conflicting state. The agent then learns to coordinate its behavior with others through its local observability of the environment according to different scenarios of state transitions. To handle the uncertainties caused by agents' local observability, an optimistic estimation mechanism is introduced to guide the learning process of the agents. Empirical studies show that the proposed approach can achieve a better performance by improving the average agent reward compared with an uncoordinated learning approach and by reducing the computational complexity significantly compared with a centralized learning approach.
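As a rough sketch of the two ingredients described above, under illustrative assumptions: the agent (1) flags states where miscoordination (here, a collision penalty) occurs frequently, and (2) in those flagged states augments its learning state with what it can locally observe of the other agent before applying an ordinary Q-learning update. The thresholds, reward convention, and state encoding are assumptions, not the paper's algorithm.

```python
# Sketch: learn *where* to coordinate by counting conflicts, and learn *how*
# by expanding the state only in flagged states. Parameters are illustrative.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95
CONFLICT_THRESHOLD = 5

conflict_counts = defaultdict(int)     # state -> number of observed conflicts
Q = defaultdict(float)                 # Q[(learning_state, action)]

def learning_state(own_state, other_state_if_visible):
    """Use a joint state only where coordination was found to be necessary."""
    if conflict_counts[own_state] >= CONFLICT_THRESHOLD and other_state_if_visible is not None:
        return (own_state, other_state_if_visible)
    return own_state

def step_update(own_state, other_obs, action, reward, next_own_state, next_other_obs, actions):
    if reward < 0:                                    # treat penalties as miscoordination evidence
        conflict_counts[own_state] += 1
    s = learning_state(own_state, other_obs)
    s_next = learning_state(next_own_state, next_other_obs)
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s, action)] += ALPHA * (reward + GAMMA * best_next - Q[(s, action)])

# one illustrative update: a collision in state 3 while the other agent was seen at 4
step_update(own_state=3, other_obs=4, action=0, reward=-1.0,
            next_own_state=3, next_other_obs=4, actions=[0, 1])
```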

13.
As a typical data visualization technique, the self-organizing map (SOM) has been extensively applied to data clustering, image analysis, dimension reduction, and so forth. In a conventional adaptive SOM, an appropriate learning rate must be chosen whose value is monotonically reduced over time to ensure the convergence of the map, while being kept large enough that the map can gradually learn the data topology; otherwise, the SOM's performance may seriously deteriorate. In general, it is nontrivial to choose an appropriate monotonically decreasing function for such a learning rate. In this letter, we therefore propose a novel rival-model penalized self-organizing map (RPSOM) learning algorithm that, for each input, adaptively chooses several rivals of the best-matching unit (BMU) and penalizes their associated models, i.e., the parametric real vectors with the same dimension as the input vectors, pushing them slightly away from the input. Compared to the existing methods, RPSOM uses a constant learning rate to circumvent the awkward selection of a monotonically decreasing function for the learning rate, yet still reaches a robust result. Numerical experiments have shown the efficacy of our algorithm.
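A minimal numpy sketch of the rival-penalization idea described above: the best-matching unit is pulled toward the input with a constant learning rate, while a few runner-up ("rival") units are pushed slightly away. Grid size, rates, and the number of rivals are illustrative assumptions, not the published RPSOM parameterization, and the usual SOM neighborhood update is omitted for brevity.

```python
# Minimal rival-penalized SOM-style update. Parameters are illustrative; the
# topological neighborhood update of a full SOM is omitted here.
import numpy as np

rng = np.random.default_rng(0)
n_units, dim = 16, 2
weights = rng.random((n_units, dim))          # one model vector per map unit

LEARN_RATE = 0.05                              # constant, per the motivation above
PENALTY_RATE = 0.01
N_RIVALS = 2

def rpsom_step(x):
    dists = np.linalg.norm(weights - x, axis=1)
    order = np.argsort(dists)
    bmu, rivals = order[0], order[1:1 + N_RIVALS]
    weights[bmu] += LEARN_RATE * (x - weights[bmu])          # attract the winner
    weights[rivals] -= PENALTY_RATE * (x - weights[rivals])  # repel the rivals

# train on a toy two-cluster data set
data = np.vstack([rng.normal(0.2, 0.05, (200, dim)),
                  rng.normal(0.8, 0.05, (200, dim))])
for x in data:
    rpsom_step(x)
```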

14.
In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero-sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, for common interest and for conflicting interest games respectively. Both are based on the same idea: an agent explores by temporarily excluding some of the local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. In a later stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto-optimal solution in common interest games without communication. In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. Importantly, ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards; they are flexible in learning different solution concepts; and they can handle stochastic, possibly delayed rewards and asynchronous action selection. A real-life experiment, adaptive load-balancing of parallel applications, is included.
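A toy sketch of coordinated exploration in this spirit, for a common interest game: in each phase the two independent agents converge on some joint action, record its average payoff, then each excludes its converged action from its private action space and explores again in the reduced joint space. The payoff matrix, reinforcement rule, and phase logic are illustrative assumptions rather than the ESRL algorithm itself.

```python
# Toy "explore, record, exclude" loop for a 3x3 common interest game with
# Bernoulli rewards. All parameters and the update rule are illustrative.
import random

PAYOFF = [[0.2, 0.9, 0.1],        # PAYOFF[action_of_agent0][action_of_agent1]
          [0.3, 0.4, 0.6],        # stochastic reward: Bernoulli with this mean
          [1.0, 0.2, 0.5]]

def play(a0, a1):
    return 1.0 if random.random() < PAYOFF[a0][a1] else 0.0

def converge_phase(avail0, avail1, steps=2000, alpha=0.05):
    """Each agent independently reinforces action probabilities over its available actions."""
    p0 = {a: 1 / len(avail0) for a in avail0}
    p1 = {a: 1 / len(avail1) for a in avail1}
    for _ in range(steps):
        a0 = random.choices(list(p0), weights=p0.values())[0]
        a1 = random.choices(list(p1), weights=p1.values())[0]
        r = play(a0, a1)
        for p, chosen in ((p0, a0), (p1, a1)):       # linear reward-inaction update
            for a in p:
                p[a] += alpha * r * ((1 - p[a]) if a == chosen else -p[a])
    best0, best1 = max(p0, key=p0.get), max(p1, key=p1.get)
    value = sum(play(best0, best1) for _ in range(200)) / 200
    return best0, best1, value

avail0, avail1 = [0, 1, 2], [0, 1, 2]
best = (None, None, -1.0)
while avail0 and avail1:                              # explore, then shrink the joint space
    a0, a1, v = converge_phase(avail0, avail1)
    if v > best[2]:
        best = (a0, a1, v)
    avail0.remove(a0)
    avail1.remove(a1)
print("best joint action found:", best)
```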

15.
Simulator-based training is in constant pursuit of increasing levels of realism. The transition from doctrine-driven computer-generated forces (CGF) to adaptive CGF represents one such effort. The use of doctrine-driven CGF is fraught with challenges such as modeling complex expert knowledge and adapting to the trainees' progress in real time. This paper therefore reports on how the use of adaptive CGF can overcome these challenges. Using a self-organizing neural network to implement the adaptive CGF, air combat maneuvering strategies are learned incrementally and generalized in real time. The state space and action space are extracted from the same hierarchical doctrine used by the rule-based CGF. In addition, this hierarchical doctrine is used to bootstrap the self-organizing neural network to improve learning efficiency and reduce model complexity. Two case studies are conducted. The first shows how the adaptive CGF can converge to effective air combat maneuvers against a rule-based CGF. The second replaces the rule-based CGF with human pilots as the opponent of the adaptive CGF. The results from these two case studies show how outcomes from learning against a rule-based CGF can differ markedly from those of learning against human subjects on the same tasks. With a better understanding of the existing constraints, an adaptive CGF that performs well against both rule-based CGF and human subjects can be designed.

16.
Research on multi-Agent cooperation based on reinforcement learning
Reinforcement learning provides a robust learning method for cooperation among multiple agents. This paper first introduces the principles and components of reinforcement learning, then describes the multi-agent Markov decision process (MMDP) and presents a reinforcement learning model for agents. On this basis, it compares the two reinforcement learning approaches used in multi-agent cooperation: IL (independent learning) and JAL (joint-action learning). Finally, it analyzes several coordination mechanisms commonly used in cooperative multi-agent systems when multiple optimal policies exist.
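A brief sketch contrasting the two update rules named above for a repeated two-agent matrix game: an independent learner (IL) keeps Q-values over its own actions only, while a joint-action learner (JAL) keeps Q-values over joint actions and evaluates its own actions against the observed frequency of its partner's actions. The game and parameters are illustrative assumptions.

```python
# IL vs. JAL update rules in a repeated common-payoff matrix game.
# The payoff matrix and learning parameters are illustrative assumptions.
import random
from collections import defaultdict

REWARD = [[10, 0], [0, 5]]         # common payoff: both prefer coordinating on (0, 0)
ALPHA, EPSILON = 0.1, 0.1

q_il = defaultdict(float)          # IL:  q_il[a]            (own action only)
q_jal = defaultdict(float)         # JAL: q_jal[(b, a)]      (own action b, partner action a)
partner_counts = defaultdict(int)  # JAL's empirical model of the partner

def jal_value(b):
    total = sum(partner_counts.values()) or 1
    return sum(q_jal[(b, a)] * partner_counts[a] / total for a in (0, 1))

def pick(values):
    return random.choice([0, 1]) if random.random() < EPSILON else max((0, 1), key=values)

for _ in range(5000):
    a = pick(lambda x: q_il[x])            # this agent learns as an IL ...
    b = pick(jal_value)                    # ... its partner learns as a JAL
    r = REWARD[a][b]
    q_il[a] += ALPHA * (r - q_il[a])
    q_jal[(b, a)] += ALPHA * (r - q_jal[(b, a)])
    partner_counts[a] += 1
```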

17.
Innovative products, services, and processes are consequences of knowledge integration. Often the integration is of newly generated knowledge with other knowledge that the firm already has available. Especially in firms where long run performance depends on innovation, it behooves managers to think about how newly generated knowledge can be transferred quickly, effectively, and reliably, and thereby can be integrated with the firm's current knowledge for early exploitation or captured as organizational knowledge in the form of practices, procedures, or files. New knowledge frequently originates in the context and activity of project teams – e.g., R&D teams, design teams, and re-engineering teams. In order to carry out their tasks, such teams frequently need to learn things already known to other organizational units, i.e., they need to acquire and assimilate organizational knowledge. Theoretically then, project teams both draw on the firm's knowledge and contribute to the firm's knowledge. The more effectively they carry out these actions, the more effective they are and the more effective their parent firms will be. This article identifies project team and organizational design practices that facilitate project team learning and contributions to organizational knowledge.

18.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
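A condensed sketch of the kind of communication decision described above, under illustrative assumptions: before choosing at a cooperative subtask, the agent compares the learned value of acting with knowledge of its partner's subtask (minus a fixed communication cost) against acting on its local state alone. The Q-tables, cost, subtask names, and stand-in reward below are toy assumptions, not the COM-Cooperative HRL algorithm itself.

```python
# Toy "communicate or not" decision at a cooperation level, plus the
# corresponding value updates. All names and parameters are illustrative.
import random
from collections import defaultdict

COMM_COST = 0.3
ALPHA, EPSILON = 0.1, 0.1
SUBTASKS = ["load_part", "unload_part"]

q_local = defaultdict(float)     # q_local[(my_state, my_subtask)]
q_joint = defaultdict(float)     # q_joint[(my_state, partner_subtask, my_subtask)]

def choose(my_state, partner_subtask_if_known):
    if random.random() < EPSILON:
        return random.choice(SUBTASKS)
    if partner_subtask_if_known is None:
        return max(SUBTASKS, key=lambda s: q_local[(my_state, s)])
    return max(SUBTASKS, key=lambda s: q_joint[(my_state, partner_subtask_if_known, s)])

def should_communicate(my_state):
    """Communicate only if the best joint-informed value beats the best local value by more than the cost."""
    best_local = max(q_local[(my_state, s)] for s in SUBTASKS)
    best_joint = max(q_joint[(my_state, p, s)] for p in SUBTASKS for s in SUBTASKS)
    return best_joint - COMM_COST > best_local

# one illustrative decision and update at a cooperation level
my_state = "at_machine_1"
partner = random.choice(SUBTASKS) if should_communicate(my_state) else None
subtask = choose(my_state, partner)
reward = 1.0 if subtask == "load_part" else 0.2          # stand-in outcome
q_local[(my_state, subtask)] += ALPHA * (reward - q_local[(my_state, subtask)])
if partner is not None:
    key = (my_state, partner, subtask)
    q_joint[key] += ALPHA * (reward - COMM_COST - q_joint[key])
```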

19.
One problem in the design of multi-agent systems is the difficulty of predicting the situations an agent might face and of recognizing and predicting its optimal behavior in those situations. One of the most important characteristics of an agent is therefore its ability to learn and to correct its behavior as it adapts. Given the continuously changing environment, the back-and-forth learning of the agents, the inability to observe other agents' actions first hand, and the strategies they choose, learning in a multi-agent environment can be very complex. Existing learning models designed for deterministic, linearly behaving environments have weaknesses and are unproductive in complex environments where the agents' actions are stochastic, so learning models that are effective in stochastic environments are needed. The purpose of this research is to create such a learning model, using the Hopfield and Boltzmann learning algorithms. To evaluate these algorithms, an unlearned multi-agent model is first created, in which agents interact and try to increase their knowledge until it reaches a specific value; the performance index is the number of state changes needed to reach convergence. A learned multi-agent model is then created with the Hopfield learning algorithm, and finally another with the Boltzmann learning algorithm. Analysis of the results shows that imposing learning on the multi-agent environment decreases the average number of state changes needed to reach convergence, and that the Boltzmann learning algorithm decreases this number further than the Hopfield learning algorithm, owing to the larger number of choices available in each situation. The closer the stochastic behavior of the multi-agent system is to its true character, the faster it reaches the global solution.
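As a small, self-contained illustration related to this comparison, the numpy sketch below stores a pattern in a Hopfield network with a Hebbian rule and recalls it by asynchronous updates, counting how many state changes are needed to converge, i.e. the same kind of "number of changed states" metric the study uses. The pattern, network size, and the omission of a Boltzmann (stochastic) variant are illustrative simplifications, not the paper's multi-agent setup.

```python
# Hopfield recall with a count of state changes until convergence.
# The pattern and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
pattern = rng.choice([-1, 1], size=16)

# Hebbian weights storing the single pattern (no self-connections)
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)

state = pattern.copy()
state[:4] *= -1                      # corrupt part of the pattern
changes = 0
for _ in range(10):                  # a few asynchronous sweeps
    changed_this_sweep = False
    for i in rng.permutation(len(state)):
        new_value = 1 if W[i] @ state >= 0 else -1
        if new_value != state[i]:
            state[i] = new_value
            changes += 1
            changed_this_sweep = True
    if not changed_this_sweep:
        break

print("state changes until convergence:", changes)
print("pattern recovered:", bool((state == pattern).all()))
```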

20.
Colearning in Differential Games
Sheppard, John W. Machine Learning, 1998, 33(2-3): 201-233
Game playing has been a popular problem area for research in artificial intelligence and machine learning for many years. In almost every study of game playing and machine learning, the focus has been on games with a finite set of states and a finite set of actions. Further, most of this research has focused on a single player or team learning how to play against another player or team that is applying a fixed strategy for playing the game. In this paper, we explore multiagent learning in the context of game playing and develop algorithms for co-learning in which all players attempt to learn their optimal strategies simultaneously. Specifically, we address two approaches to colearning, demonstrating strong performance by a memory-based reinforcement learner and comparable but faster performance with a tree-based reinforcement learner.
