Similar Literature
 20 similar documents found
1.
Multiagent learning provides a promising paradigm to study how autonomous agents learn to achieve coordinated behavior in multiagent systems. In multiagent learning, the concurrency of multiple distributed learning processes makes the environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents' behavior in this dynamic environment is a difficult problem, especially when agents do not know the domain structure and at the same time have only local observability of the environment. In this paper, a coordinated learning approach is proposed to enable agents to learn where and how to coordinate their behavior in loosely coupled multiagent systems, where the sparse interactions of agents constrain coordination to some specific parts of the environment. In the proposed approach, an agent first collects statistical information to detect those states where coordination is most necessary, by considering not only the potential contributions from all the domain states but also the direct causes of the miscoordination in a conflicting state. The agent then learns to coordinate its behavior with others through its local observability of the environment, according to different scenarios of state transitions. To handle the uncertainties caused by agents' local observability, an optimistic estimation mechanism is introduced to guide the learning process of the agents. Empirical studies show that the proposed approach achieves better performance by improving the average agent reward compared with an uncoordinated learning approach, and by reducing the computational complexity significantly compared with a centralized learning approach. Copyright © 2012 John Wiley & Sons, Ltd.
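A minimal sketch of the optimistic-estimation idea, following the general optimistic/distributed Q-learning scheme rather than the paper's exact formulation (all names and parameters below are illustrative): a Q-value is only ever raised toward an observed return, so low rewards caused by an unseen partner's exploration do not drag the estimate down.

```python
# Minimal sketch of optimistic Q-value estimation under local observability.
# The "raise only" update is an assumption borrowed from the optimistic /
# distributed Q-learning literature, not the paper's exact rule.
from collections import defaultdict

GAMMA = 0.95
q = defaultdict(lambda: defaultdict(float))  # q[state][action]

def optimistic_update(state, action, reward, next_state):
    """Keep the most optimistic estimate seen so far: bad outcomes that are
    really caused by an unseen partner's exploration do not lower Q."""
    target = reward + GAMMA * max(q[next_state].values(), default=0.0)
    q[state][action] = max(q[state][action], target)

# usage: optimistic_update("s0", "left", 1.0, "s1")
```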

2.
Cooperative, hybrid agent architecture for real-time traffic signal control (Cited by: 1; self-citations: 0)
This paper presents a new hybrid, synergistic approach to applying computational intelligence concepts to implement a cooperative, hierarchical, multiagent system for real-time traffic signal control of a complex traffic network. The large-scale traffic signal control problem is divided into various subproblems, and each subproblem is handled by an intelligent agent with a fuzzy neural decision-making module. The decisions made by lower-level agents are mediated by their respective higher-level agents. By adopting a cooperative distributed problem-solving approach, coordinated control by the agents is achieved. In order for the multiagent architecture to adapt itself continuously to the dynamically changing problem domain, a multistage online learning process for each agent is implemented, involving reinforcement learning, learning-rate and weight adjustment, and dynamic updating of fuzzy relations using an evolutionary algorithm. The test bed used for this research is a section of the Central Business District of Singapore. The performance of the proposed multiagent architecture is evaluated against the set of signal plans used by the current real-time adaptive traffic control system. The multiagent architecture produces significant improvements in the conditions of the traffic network, reducing the total mean delay by 40% and total vehicle stoppage time by 50%.

3.
Advanced Robotics, 2013, 27(12): 1379–1395
Existing reinforcement learning methods suffer severely from the curse of dimensionality, especially when they are applied to multiagent dynamic environments. A typical example is RoboCup competitions, since other agents and their behavior easily cause variations in the state and action spaces. This paper presents a method of modular learning in a multiagent environment by which the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas to resolve the issue are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces. The state space of the top layer consists of the state values from the lower level, and macro actions are used to reduce the size of the physical action space. Second, the state of the other agent, namely to what extent it is close to its own goal, is estimated by observation and used as a state variable in the top-layer state space to realize cooperative/competitive behavior. The method is applied to a four (defense team)-on-five (offense team) game task, and the learning agent (a passer on the offense team) successfully acquired teamwork plays (pass and shoot) in a much shorter learning time.
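A compact sketch of the two-layer idea under illustrative assumptions (module outputs, macro-action names, and the discretization are not taken from the paper): lower-level modules compress raw observations into a few state values, including the estimated goal proximity of another agent, and the top layer runs plain Q-learning over macro actions only.

```python
# Two-layer sketch: lower modules abstract raw observations; the top layer
# learns over a reduced macro-action space. All names are illustrative.
import random
from collections import defaultdict

MACRO_ACTIONS = ["pass", "shoot", "dribble"]      # reduced action space
q = defaultdict(lambda: {a: 0.0 for a in MACRO_ACTIONS})
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def abstract_state(obs):
    """Lower layer: each module reports one coarse value, e.g. how close the
    observed opponent is to its own goal (used as a top-layer state variable)."""
    return (obs["ball_zone"], obs["opponent_goal_proximity"])

def select_macro(state):
    if random.random() < EPS:
        return random.choice(MACRO_ACTIONS)
    return max(q[state], key=q[state].get)

def update(state, action, reward, next_state):
    best_next = max(q[next_state].values())
    q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])
```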

4.
Societal norms or conventions help identify one of many appropriate behaviors during an interaction between agents. The offline study of norms is an active research area where one can reason about normative systems, and it includes research on designing and enforcing appropriate norms at specification time. In our work, we consider the problem of the emergence of conventions in a society through distributed adaptation by agents from their online experiences at run time. The agents are connected to each other within a fixed network topology and interact over time only with their neighbours in the network. Agents recognize a social situation involving two agents that must each choose one action from multiple available ones. No default behavior is specified. We study the emergence of system-wide conventions via the process of social learning, where an agent learns to choose one of several available behaviors by interacting repeatedly with randomly chosen neighbors, without considering the identity of the interacting agent in any particular interaction. While the multiagent learning literature has primarily focused on developing learning mechanisms that produce desired behavior when two agents repeatedly interact with each other, relatively little work exists on understanding and characterizing the dynamics and emergence of conventions through social learning. We experimentally show that social learning always produces conventions for random, fully connected, and ring networks, and we study the effect of population size, number of behavior options, different learning algorithms for behavior adoption, and the influence of fixed agents on the speed of convention emergence. We also observe and explain the formation of stable, distinct subconventions, and hence the lack of emergence of a global convention, when agents are connected in a scale-free network.
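The setup is simple enough to reproduce in a few lines. A toy version on a ring network, with all parameters illustrative: each round, a random agent plays a two-action coordination game with a random neighbor, and both update their action values without knowing whom they played.

```python
# Toy convention emergence on a ring: repeated two-action coordination game
# with random neighbors. Parameters and payoffs are illustrative.
import random

N, ACTIONS, ALPHA, EPS, ROUNDS = 50, [0, 1], 0.3, 0.1, 20000
Q = [[0.0, 0.0] for _ in range(N)]

def choose(i):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return 0 if Q[i][0] >= Q[i][1] else 1

for _ in range(ROUNDS):
    i = random.randrange(N)
    j = random.choice([(i - 1) % N, (i + 1) % N])   # ring neighbor
    ai, aj = choose(i), choose(j)
    r = 1.0 if ai == aj else -1.0                   # reward only for matching
    Q[i][ai] += ALPHA * (r - Q[i][ai])
    Q[j][aj] += ALPHA * (r - Q[j][aj])

adopted = [0 if q0 >= q1 else 1 for q0, q1 in Q]
print("fraction playing action 0:", adopted.count(0) / N)
```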

5.
The soft-margin support vector machine (SVM) classification algorithm is currently one of the best machine learning algorithms for classifying anomalous behavior in intrusion detection, but it is a supervised learning method and therefore cannot be applied to detecting new intrusion behaviors. The one-class SVM, in contrast, is an unsupervised learning method that can be used for anomaly detection, but it has a relatively high false alarm rate. Building on these two methods, an improved SVM method is proposed. Simulation experiments show that it is an unsupervised learning method with a low false alarm rate and detection capability similar to that of the soft-margin SVM.

6.
Multiagent systems and data mining have recently attracted considerable attention in the field of computing. Reinforcement learning is the most commonly used learning process for multiagent systems. However, it still has some drawbacks: other learning agents present in the domain must be modeled as part of the state of the environment, some states are experienced much less than others, and some state-action pairs are never visited during the learning phase. Further, before completing the learning process, an agent cannot exhibit a certain behavior even in states that may have been experienced sufficiently. In this study, we propose a novel multiagent learning approach to handle these problems. Our approach is based on utilizing a mining process for modular cooperative learning systems. It incorporates fuzziness and online analytical processing (OLAP)-based mining to effectively process the information reported by agents. First, we describe a fuzzy data cube OLAP architecture which facilitates effective storage and processing of the state information reported by agents. In this way, the action of another agent, even one outside the visual environment of the agent under consideration, can simply be predicted by extracting online association rules, a well-known data mining technique, from the constructed data cube. Second, we present a new action selection model, which is also based on association rules mining. Finally, we generalize insufficiently experienced states by mining multilevel association rules from the proposed fuzzy data cube. Experimental results obtained on two different versions of a well-known pursuit domain show the robustness and effectiveness of the proposed fuzzy OLAP mining based modular learning approach. Finally, we tested the scalability of the approach presented in this paper and compared it with our previous work on modular-fuzzy Q-learning and ordinary Q-learning.
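As a rough illustration of the rule-based prediction step (a stand-in for the paper's fuzzy data cube and OLAP machinery, which are not reproduced here), one can mine simple "context implies other agent's action" rules from logged observations and predict via a confidence threshold; the contexts and action names below are hypothetical.

```python
# Sketch: predict an unseen agent's action from mined association rules
# "context => action", using rule confidence. Names are illustrative.
from collections import Counter, defaultdict

MIN_CONF = 0.6
counts = defaultdict(Counter)          # counts[context][action]

def record(context, action):
    counts[context][action] += 1

def predict(context):
    total = sum(counts[context].values())
    if total == 0:
        return None
    action, n = counts[context].most_common(1)[0]
    return action if n / total >= MIN_CONF else None   # confidence test

record(("prey_north", "wall_east"), "move_north")
record(("prey_north", "wall_east"), "move_north")
record(("prey_north", "wall_east"), "move_west")
print(predict(("prey_north", "wall_east")))            # -> "move_north"
```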

7.
8.
This paper suggests an evolutionary approach to designing coordination strategies for multiagent systems. Emphasis is given to auction protocols, since they are of utmost importance in many real-world applications such as power markets. Power markets are one of the most relevant instances of multiagent systems, and finding a profitable bidding strategy is a key issue in preserving system functioning and improving social welfare. Bidding strategies are modeled as fuzzy rule-based systems due to their modeling power, transparency, and ability to naturally handle imprecision in input data, an essential ingredient for a multiagent system to act efficiently in practice. Specific genetic operators are suggested in this paper. Evolution of bidding strategies uncovers unknown and unexpected agent behaviors and allows a richer analysis of auction mechanisms and their role as a coordination protocol. Simulation experiments with a typical power market using actual thermal plant data show that the evolutionary, genetic-based design approach evolves strategies that enhance agents' profitability when compared with the marginal cost-based strategies commonly adopted.
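A minimal evolutionary sketch, with the market numbers and the single-parameter markup strategy as illustrative assumptions (the paper evolves fuzzy rule-based strategies, not a scalar): a population of bid markups competes in a repeated uniform-price auction, and the most profitable markups survive and mutate.

```python
# Evolving a bid markup against marginal-cost rivals in a toy uniform-price
# auction. All numbers and the scalar strategy encoding are illustrative.
import random

COST, DEMAND, RIVAL_BIDS = 20.0, 3, [22.0, 25.0, 30.0, 35.0]

def profit(markup):
    bid = COST * (1 + markup)
    cleared = sorted(RIVAL_BIDS + [bid])[:DEMAND]   # cheapest bids win
    price = cleared[-1]                             # uniform clearing price
    return (price - COST) if bid in cleared else 0.0

pop = [random.uniform(0.0, 1.0) for _ in range(20)]
for _ in range(100):
    pop.sort(key=profit, reverse=True)
    parents = pop[:10]                              # truncation selection
    pop = parents + [max(0.0, p + random.gauss(0, 0.05)) for p in parents]
print("evolved markup:", round(max(pop, key=profit), 3))
```

In this toy market the best markup drives the bid just below the marginal rival's offer, which is exactly the kind of emergent behavior, beyond marginal-cost bidding, that the evolutionary search is meant to uncover.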

9.
Conjectural Equilibrium in Multiagent Learning (Cited by: 2; self-citations: 0)
Wellman, Michael P.; Hu, Junling. Machine Learning, 1998, 33(2–3): 179–200
Learning in a multiagent environment is complicated by the fact that as other agents learn, the environment effectively changes. Moreover, other agents' actions are often not directly observable, and the actions taken by the learning agent can strongly bias which range of behaviors is encountered. We define the concept of a conjectural equilibrium, where all agents' expectations are realized and each agent responds optimally to its expectations. We present a generic multiagent exchange situation in which competitive behavior constitutes a conjectural equilibrium. We then introduce an agent that executes a more sophisticated strategic learning approach, building a model of the response of the other agents. We find that the system reliably converges to a conjectural equilibrium, but that the final result achieved is highly sensitive to the initial belief. In essence, the strategic learner's actions tend to fulfill its expectations. Depending on the starting point, the agent may be better or worse off than if it had not attempted to learn a model of the other agents at all.
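A tiny numeric sketch of such a fixed point, with all functional forms assumed for illustration: the learner conjectures a linear price response, best-responds to it, and re-anchors the conjecture to each observation. At convergence its expectation is exactly realized, and the equilibrium reached depends on the conjectured slope, mirroring the reported sensitivity to initial belief.

```python
# Conjectural-equilibrium toy: the learner believes p = a - b_conj * q,
# best-responds, then refits the intercept to the observed price. The true
# market and the linear conjecture are illustrative assumptions.
A, B, COST, Q_OTHERS = 100.0, 1.0, 10.0, 20.0

def market_price(q):                    # true (unknown) market response
    return A - B * (q + Q_OTHERS)

def run(b_conj, steps=50):
    a, q = A, 0.0                       # initial belief and output
    for _ in range(steps):
        q = max(0.0, (a - COST) / (2 * b_conj))  # best response to conjecture
        p = market_price(q)
        a = p + b_conj * q              # make conjecture fit the observation
    return q, market_price(q)

for b in (0.5, 1.0, 2.0):
    q, p = run(b)
    print(f"conjectured slope {b}: q={q:.1f}, realized price={p:.1f}")
```

Each conjectured slope converges to a different self-confirming point: the learner's expectation is met at its own chosen output, even though its model of the market may be wrong everywhere else.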

10.
Computer science in general, and artificial intelligence and multiagent systems in particular, are part of an effort to build intelligent transportation systems. Efficient use of the existing infrastructure relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. In particular, traffic signal controllers located at intersections can be seen as autonomous agents. However, this kind of modeling involves challenging issues: the number of agents is high; agents generally must be highly adaptive; and they must react to changes in the environment at the individual level while also producing an unpredictable collective pattern, as they act in a highly coupled environment. Therefore, traffic signal control poses many challenges for standard techniques from multiagent systems such as learning. Despite the progress in multiagent reinforcement learning via formalisms based on stochastic games, these cannot cope with a high number of agents due to the combinatorial explosion in the number of joint actions. One possible way to reduce the complexity of the problem is to have agents organized in groups of limited size, so that the number of joint actions is reduced. These groups are then coordinated by another agent, a tutor or supervisor. Thus, this paper investigates the task of multiagent reinforcement learning for the control of traffic signals in two situations: agents act individually (individual learners), or agents can be “tutored”, meaning that another agent with a broader view will recommend a joint action.

11.
This paper studies the cooperative control problem for a class of multiagent dynamical systems with partially unknown nonlinear system dynamics. In particular, the control objective is to solve the state consensus problem for multiagent systems based on the minimisation of certain cost functions for individual agents. Under the assumption that there exist admissible cooperative controls for this class of multiagent systems, the formulated problem is solved by finding the optimal cooperative control using the approximate dynamic programming and reinforcement learning approach. With the aid of neural network parameterisation and online adaptive learning, our method renders a practically implementable, approximately adaptive neural cooperative control for multiagent systems. Specifically, based on Bellman's principle of optimality, the Hamilton–Jacobi–Bellman (HJB) equation for multiagent systems is first derived. We then propose an approximately adaptive policy iteration algorithm for multiagent cooperative control based on neural network approximation of the value functions. The convergence of the proposed algorithm is rigorously proved using the contraction mapping method. Simulation results are included to validate the effectiveness of the proposed algorithm.
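For orientation, a generic continuous-time HJB condition of the kind invoked here can be written as follows; the notation (drift f, input maps g_j, running cost r_i, value function V_i) is illustrative and not taken from the paper. Policy iteration then alternates between evaluating V_i for fixed controls and improving u_i as the minimizer of the bracketed expression.

```latex
% Generic HJB condition for agent i's value function V_i under coupled
% dynamics \dot{x} = f(x) + \sum_j g_j(x) u_j (illustrative notation):
0 = \min_{u_i} \Big[\, r_i(x, u_i, u_{-i})
    + \nabla V_i(x)^{\top} \Big( f(x) + \sum\nolimits_{j} g_j(x)\, u_j \Big) \Big]
```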

12.
This paper presents an Artificial Immune System (AIS)-based model for cooperative control of multiagent systems. This cooperative control model describes the collective behaviors of autonomous agents, known as AIS agents, which are exemplified by the regulated activities performed by individual agents under the computation paradigm of the Artificial Immune System. The regulation and emergence of agent behaviors are derived from immune threshold measures that determine the activities performed by the AIS agents at an individual level. These threshold measures, together with the collective behavioral model, define the cooperative control of the AIS-based control framework, under which AIS agents behave and act strategically according to the changing environment. The cooperative control model is presented across the three domains in which AIS agents operate: exploration, achievement, and cooperation. In this research, we implemented the proposed cooperative control model in a case study of automated material handling with a group of AIS agents that cooperate to achieve the defined tasks.
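An illustrative threshold-regulation sketch (the paper's actual immune measures may well differ, so everything below is an assumption in the spirit of response-threshold models): each agent holds a per-behavior activation threshold and engages the behavior whose stimulus exceeds its threshold by the largest margin.

```python
# Hypothetical response-threshold regulation: a behavior activates only when
# its environmental stimulus exceeds the agent's threshold for it.
class AISAgent:
    def __init__(self):
        # one activation threshold per operating domain (illustrative values)
        self.thresholds = {"explore": 0.3, "achieve": 0.5, "cooperate": 0.7}

    def act(self, stimulus):
        """Pick the active behavior with the largest stimulus margin."""
        active = {b: s - self.thresholds[b]
                  for b, s in stimulus.items() if s > self.thresholds[b]}
        return max(active, key=active.get) if active else "idle"

agent = AISAgent()
print(agent.act({"explore": 0.4, "achieve": 0.2, "cooperate": 0.9}))  # cooperate
```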

13.
Cooperation among agents is important for multiagent systems with a shared goal. In this paper, an example of the pursuit problem is studied, in which four hunters collaborate to catch a target. A reinforcement learning algorithm is employed to model how the hunters acquire this cooperative behavior to achieve the task. In order to apply Q-learning, one form of reinforcement learning, two kinds of prediction are needed for each hunter agent: one is the location of the other hunter agents and the target agent, and the other is the movement direction of the target agent at the next time step. In our treatment we extend the standard problem to systems with heterogeneous agents. One motivation for this is that the target agent and hunter agents have differing abilities. In addition, even though the hunter agents are homogeneous at the beginning of the problem, their abilities become heterogeneous during the learning process. Simulations of this pursuit problem were performed on a continuous action state space, and the results are displayed, accompanied by a discussion of how the outcomes depend upon the initial locations of the hunters and the speeds of the hunters and the target.
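A brief sketch of the two predictions feeding a hunter's Q-learning state, with grid coordinates and helper names as illustrative assumptions (the paper itself works on a continuous space): the observed positions of the other agents, plus a one-step extrapolation of the target's last move.

```python
# Hunter-side Q-learning state built from (i) others' observed positions and
# (ii) a predicted next target cell. All names are illustrative.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
q = defaultdict(float)  # q[(state, action)]

def predict_target(prev_pos, pos):
    """Predict the target's next cell by repeating its last move."""
    dx, dy = pos[0] - prev_pos[0], pos[1] - prev_pos[1]
    return (pos[0] + dx, pos[1] + dy)

def make_state(my_pos, others, target_prev, target_pos):
    return (my_pos, tuple(sorted(others)), predict_target(target_prev, target_pos))

def q_update(state, action, reward, next_state, actions):
    best_next = max((q[(next_state, a)] for a in actions), default=0.0)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```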

14.
A hybrid machine learning approach to network anomaly detection (Cited by: 3; self-citations: 0)
Zero-day cyber attacks such as worms and spyware are becoming increasingly widespread and dangerous. The existing signature-based intrusion detection mechanisms are often not sufficient for detecting these types of attacks. As a result, anomaly intrusion detection methods have been developed to cope with such attacks. Among the variety of anomaly detection approaches, the Support Vector Machine (SVM) is known to be one of the best machine learning algorithms for classifying abnormal behaviors. The soft-margin SVM is one of the well-known basic SVM methods using supervised learning. However, it is not appropriate to use the soft-margin SVM method for detecting novel attacks in Internet traffic, since it requires pre-acquired learning information for the supervised learning procedure; such pre-acquired learning information must be separated into labeled normal and attack traffic. Alternatively, the one-class SVM approach uses unsupervised learning for detecting anomalies, which means it does not require labeled information. However, there is a downside to using the one-class SVM: it is difficult to use in the real world due to its high false positive rate. In this paper, we propose a new SVM approach, named Enhanced SVM, which combines these two methods in order to provide unsupervised learning and a low false alarm capability, similar to that of a supervised SVM approach. We use the following additional techniques to improve the performance of the proposed approach (referred to as the Anomaly Detector using Enhanced SVM). First, we create a profile of normal packets using a Self-Organized Feature Map (SOFM), for SVM learning without pre-existing knowledge. Second, we use a packet filtering scheme based on Passive TCP/IP Fingerprinting (PTF) to reject incomplete network traffic that violates either the TCP/IP standard or the generation policy of well-known platforms. Third, a feature selection technique using a Genetic Algorithm (GA) is used to extract optimized information from raw Internet packets. Fourth, we use flows of packets based on temporal relationships during data preprocessing, so that the temporal relationships among the inputs are considered in SVM learning. Lastly, we demonstrate the effectiveness of the Enhanced SVM approach using the above-mentioned techniques (SOFM, PTF, and GA) on MIT Lincoln Lab datasets and on a live dataset captured from a real network. The experimental results are verified by m-fold cross-validation, and the proposed approach is compared with real-world Network Intrusion Detection Systems (NIDS).
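The contrast between the two building blocks is easy to demonstrate on toy data (the paper's SOFM, PTF, and GA preprocessing and its combined Enhanced SVM are omitted here; the synthetic features are an assumption): a supervised soft-margin SVC needs labeled attack traffic, while a one-class SVM is fitted on normal traffic alone and flags outliers.

```python
# Soft-margin SVC (supervised, needs attack labels) vs. OneClassSVM
# (unsupervised, normal traffic only) on synthetic stand-in features.
import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))   # stand-in "normal" features
attack = rng.normal(4.0, 1.0, size=(20, 2))    # stand-in "attack" features

# Supervised soft-margin SVM: requires labels for both classes.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(np.vstack([normal, attack]), [0] * 200 + [1] * 20)

# Unsupervised one-class SVM: trained on normal traffic alone; predicts
# +1 for inliers and -1 for outliers (the anomaly flag).
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)
print("one-class flags on attacks:", (ocsvm.predict(attack) == -1).mean())
```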

15.
Bacteria, bees, and birds often work together in groups to find food. A group of mobile wheeled robots can be designed to coordinate their activities to achieve a goal. Networked cooperative uninhabited air vehicles (UAVs) are being developed for commercial and military applications. In order for such multiagent systems to succeed, it is often critical that they can both maintain cohesive behaviors and appropriately respond to environmental stimuli. In this paper, we characterize the cohesiveness of discrete-time multiagent systems as a boundedness or stability property of the agents' position trajectories and use a Lyapunov approach to develop conditions under which local agent actions will lead to cohesive group behaviors, even in the presence of i) an interagent "sensing topology" that constrains information flow, where by "information flow" we mean the sensing of positions and velocities of agents, ii) a random but bounded delay and "noise" in sensing other agents' positions and velocities, and iii) noise in sensing a resource profile that represents an environmental stimulus and quantifies the goal of the multiagent system. Simulations are used to illustrate the ideas for multivehicle systems and to make connections to the synchronization of coupled oscillators.
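A discrete-time toy of this setting, with gains and noise levels chosen for illustration (the paper derives Lyapunov-based bounds on such quantities; none of the values below come from it): each agent moves toward the noisily sensed average position of the others plus a noisy resource-gradient term, and the group's spread stays bounded while the mean drifts to the resource minimum.

```python
# Cohesion toy: noisy neighbor sensing + noisy resource gradient. One spatial
# dimension for brevity; all gains and noise levels are illustrative.
import random

N, STEPS, K_COH, K_GOAL, NOISE = 10, 200, 0.3, 0.1, 0.05
GOAL = 5.0                                   # minimum of the resource profile
x = [random.uniform(-10, 10) for _ in range(N)]

for _ in range(STEPS):
    x_new = []
    for i in range(N):
        sensed = [x[j] + random.gauss(0, NOISE) for j in range(N) if j != i]
        center = sum(sensed) / len(sensed)
        grad = (GOAL - x[i]) + random.gauss(0, NOISE)   # noisy stimulus
        x_new.append(x[i] + K_COH * (center - x[i]) + K_GOAL * grad)
    x = x_new

print("spread:", round(max(x) - min(x), 3), "mean:", round(sum(x) / N, 3))
```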

16.
Biasing Coevolutionary Search for Optimal Multiagent Behaviors (Cited by: 1; self-citations: 0)
Cooperative coevolutionary algorithms (CEAs) offer great potential for concurrent multiagent learning domains and are of special utility to domains involving teams of multiple agents. Unfortunately, they also exhibit pathologies resulting from their game-theoretic nature, and these pathologies interfere with finding solutions that correspond to optimal collaborations of interacting agents. We address this problem by biasing a cooperative CEA in such a way that the fitness of an individual is based partly on the result of interactions with other individuals (as is usual), and partly on an estimate of the best possible reward for that individual if partnered with its optimal collaborator. We justify this idea using existing theoretical models of a relevant subclass of CEAs, demonstrate how to apply biasing in a way that is robust with respect to parameterization, and provide experimental evidence to validate the biasing approach. We show that it is possible to bias coevolutionary methods to better search for optimal multiagent behaviors.
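The biasing idea reduces to one line of arithmetic in the fitness function. A sketch, with the mixing weight delta, the running best-seen estimate, and the payoff interface all as illustrative assumptions:

```python
# Biased fitness for cooperative coevolution: mix the usual collaboration
# average with a running estimate of the best collaborator's payoff.
best_seen = {}   # best_seen[individual] -> highest reward observed so far

def biased_fitness(individual, partners, payoff, delta=0.5):
    results = [payoff(individual, p) for p in partners]
    best_seen[individual] = max(best_seen.get(individual, float("-inf")),
                                max(results))
    # (1 - delta): standard coevolutionary evaluation; delta: optimistic bias
    return (1 - delta) * (sum(results) / len(results)) \
           + delta * best_seen[individual]

# usage with a toy payoff: payoff = lambda a, b: -(a - b) ** 2
```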

17.
To address the problem that the state space in Q-learning is very large, which makes convergence very slow, a multiagent online cooperative learning method based on coordination over boundary samples is presented: agents specialize on particular subspaces and coordinate with each other through switching functions defined on the boundary states, so that a local optimum can be learned relatively quickly. Simulation experiments show that this method achieves better online learning performance than global learning.

18.
In recent years, great strides have been made towards creating autonomous agents that can learn via interaction with their environment. When considering just an individual agent, it is often appropriate to model the world as stationary, meaning that the same action from the same state will always yield the same (possibly stochastic) effects. However, in the presence of other independent agents, the environment is not stationary: an action's effects may depend on the actions of the other agents. This non-stationarity poses the primary challenge of multiagent learning and is the main reason it is best considered distinctly from single-agent learning. The multiagent learning problem is often studied in the stylized settings provided by repeated matrix games. The goal of this article is to introduce a novel multiagent learning algorithm for such a setting, called Convergence with Model Learning and Safety (or CMLeS), that achieves a new set of objectives which have not been previously achieved. Specifically, CMLeS is the first multiagent learning algorithm to achieve the following three objectives: (1) it converges to following a Nash equilibrium joint-policy in self-play; (2) it achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper-bounded by a known value; and (3) it ensures an individual return that is very close to its security value when interacting with any other set of agents. Our presentation of CMLeS is backed by a rigorous theoretical analysis, including an analysis of sample complexity wherever applicable.

19.
Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents' actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level-0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that we may not obtain optimal team solutions in cooperative settings if it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.

20.
Reinforcement learning is one of the more prominent machine-learning technologies due to its unsupervised learning structure and ability to continually learn, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system, so the speed of learning can be accelerated. In the proposed learning algorithm, an agent adapts to comply with its peers by learning carefully when it obtains a positive reinforcement feedback signal, but learns more aggressively if a negative reward follows the action just taken. These two properties are applied to develop the proposed cooperative learning method. This research presents a novel use of the Win or Learn Fast (WoLF) policy hill-climbing method with policy sharing. Results from the multi-agent cooperative domain illustrate that the proposed algorithms perform better than Q-learning alone in a piano-mover environment. It also demonstrates that agents can learn to accomplish a task together efficiently through repetitive trials.
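A stateless sketch of WoLF policy hill-climbing for a repeated matrix game, with the learning rates chosen for illustration: the mixed policy is nudged toward the greedy action with a small step when "winning" (the current policy beats the long-run average policy in expected value under the current Q) and a larger step when "losing". The paper's policy sharing, which would additionally copy policies between cooperating agents, is omitted here.

```python
# WoLF policy hill-climbing, stateless (one-state) version for two actions.
# delta_lose > delta_win implements "learn fast when losing". Illustrative rates.
import random

ACTIONS = [0, 1]
ALPHA, D_WIN, D_LOSE = 0.1, 0.01, 0.04
Q = [0.0, 0.0]
pi = [0.5, 0.5]                                # current mixed policy
avg_pi, n = [0.5, 0.5], 0

def step(action, reward):
    global n
    Q[action] += ALPHA * (reward - Q[action])  # stateless Q update
    n += 1
    for a in ACTIONS:                          # incremental average policy
        avg_pi[a] += (pi[a] - avg_pi[a]) / n
    winning = sum(p * q for p, q in zip(pi, Q)) >= \
              sum(p * q for p, q in zip(avg_pi, Q))
    delta = D_WIN if winning else D_LOSE       # the WoLF principle
    greedy = max(ACTIONS, key=lambda a: Q[a])
    for a in ACTIONS:
        pi[a] += delta if a == greedy else -delta / (len(ACTIONS) - 1)
        pi[a] = min(1.0, max(0.0, pi[a]))
    s = sum(pi)
    pi[0], pi[1] = pi[0] / s, pi[1] / s        # renormalize

def act():
    return 0 if random.random() < pi[0] else 1
```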
