Similar Documents
Found 20 similar documents.
1.
This paper addresses the ‘curse of dimensionality’ problem that arises when scaling reinforcement learning to multi-agent systems: the joint state-action space grows exponentially with the number of agents, resulting in large memory requirements and slow learning. For cooperative systems, which are widespread in multi-agent settings, the paper proposes a new multi-agent Q-learning algorithm that decomposes joint state and joint action learning into two processes: learning individual actions, and approximating the maximum value of the joint state. The latter process takes other agents’ actions into account to ensure that the joint action is optimal, and supports the update of the former. Simulation results show that the proposed algorithm learns the optimal joint behavior with less memory and faster learning than friend-Q learning and independent learning.
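The decomposition described in this abstract can be sketched in a few lines: each agent keeps a Q-table over its own actions only and bootstraps from a shared estimate of the joint-state value, instead of storing a table over joint actions. All names, constants, and update details below are hypothetical illustrations, not the paper's algorithm.

```python
# Hypothetical sketch: the individual-action Q-table bootstraps from a
# shared value table V, so no table over joint actions is ever stored.
ALPHA, GAMMA = 0.5, 0.9

def update(q_i, v, state, action, reward, next_state):
    """Individual-action Q-update that bootstraps from the shared V."""
    old = q_i.get((state, action), 0.0)
    target = reward + GAMMA * v.get(next_state, 0.0)
    q_i[(state, action)] = old + ALPHA * (target - old)
    # V approximates the best value attainable in this (joint) state.
    v[state] = max(v.get(state, 0.0), q_i[(state, action)])

q_agent, v_shared = {}, {}
update(q_agent, v_shared, "s0", "right", 1.0, "s1")
update(q_agent, v_shared, "s0", "right", 1.0, "s1")
```

The memory saving comes from the table sizes: per agent O(|S|·|A|) plus a shared O(|S|) table, instead of O(|S|·|A|^n) for the joint-action table.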

2.
An agent is a piece of software capable of taking independent action on behalf of its user or owner: an entity with goals, actions, and domain knowledge, situated in an environment. A multiagent system comprises multiple autonomous, interacting software agents; such systems can successfully emulate the entities active in a distributed environment. This paper studies multiagent behavior in a specific board game similar to the well-known game of Go. A framework is developed to define the states of the multiagent entities and to measure convergence metrics for this problem, and the changes of state leading to the goal state are analyzed. The study is supported by simulations on a CORBA framework to substantiate the findings.

3.
In many applications, prior information about the regression function is unknown, so the function must be learned with valid tools. This paper investigates the regression problem in learning theory, i.e., the convergence rate of regression learning algorithms with least-squares schemes in multi-dimensional polynomial spaces. The main aim is to analyze the generalization error for multi-regression problems in learning theory. Using the classical Jackson operators from approximation theory, covering numbers, entropy numbers, and relative probability inequalities, estimates of upper and lower bounds for the convergence rate of the learning algorithm are obtained. In particular, it is shown that for smooth multivariate regression functions the estimates achieve an almost optimal rate of convergence, up to a logarithmic factor. These results are significant for the study of the convergence, stability, and complexity of regression learning algorithms.

4.
Mobile agents (MAs) have shown promise as a powerful means to complement and enhance existing technology in various application areas. In particular, existing work has demonstrated that MAs can simplify the development and improve the performance of certain classes of distributed applications, especially those running in wide-area, heterogeneous, and dynamic networking environments such as the Internet. In our previous work, we extended the application of MAs to the design of distributed control functions, which require maintaining logical relationships among, and coordination of, processing entities in a distributed system. A novel framework was presented for structuring and building distributed systems in which cooperating mobile agents carry out coordination and cooperation tasks; it has been used to design distributed control functions such as load balancing and mutual exclusion. In this paper, we use the framework to propose a novel approach to detecting deadlocks in distributed systems using mobile agents, which demonstrates the adaptivity and flexibility of mobile agents. We first describe the MAEDD (Mobile Agent Enabled Deadlock Detection) scheme, in which mobile agents are dispatched to collect and analyze deadlock information distributed across network sites and, based on the analysis, to detect and resolve deadlocks. We then present the design of an adaptive hybrid algorithm derived from the framework, which dynamically adapts to changes in system state by switching among different deadlock detection strategies. The performance of the proposed algorithm has been evaluated using simulations; the results show that it outperforms existing algorithms that use a fixed deadlock detection strategy.
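The analysis step behind any such scheme reduces to finding cycles in a wait-for graph: a deadlock exists exactly when some set of processes wait on each other circularly. The sketch below is a generic illustration of that check, not the MAEDD algorithm itself; the graph format and function name are invented.

```python
# Hypothetical sketch of the analysis a detection agent might run once
# it has collected a wait-for graph from the visited sites: a deadlock
# corresponds to a cycle in the graph {process: [processes it waits on]}.
def has_deadlock(wait_for):
    """DFS-based cycle detection on a wait-for graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def dfs(p):
        color[p] = GRAY
        for q in wait_for.get(p, []):
            if color.get(q, WHITE) == GRAY:
                return True          # back edge: a cycle, hence deadlock
            if color.get(q, WHITE) == WHITE and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in wait_for)
```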

5.
In this paper, we propose a framework that uses localization for multi-objective optimization to simultaneously guide an evolutionary algorithm in both the decision and objective spaces. The localization is built using a limited number of adaptive spheres (local models) in the decision space. These spheres are guided, using direction information, toward areas of the decision space containing non-dominated solutions. A second mechanism adjusts the spheres to specialize on different parts of the Pareto front, using a guided dominance technique in the objective space. Through this interleaved guidance in both spaces, the spheres are driven toward different parts of the Pareto front while also exploring the decision space efficiently. Experimental results show good performance for the local models with this dual guidance, in comparison with their original version.

6.
Document Classification Based on Active Learning
In text categorization, the number of unlabeled documents is generally much greater than the number of labeled ones. Text categorization is a classification problem in a high-dimensional vector space, and more training samples generally improve classifier accuracy, so how to use unlabeled documents to expand the training set is a valuable problem. This paper introduces the theory of active learning and applies it to text categorization, exploring how unlabeled documents can be used to improve the accuracy of a text classifier. An active-learning-based algorithm for text categorization is proposed, and experiments on the Reuters news corpus show that, when enough initial training samples are available, the algorithm effectively improves classifier accuracy by incorporating a relatively large number of unlabeled document samples.
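The core of pool-based active learning is the selection step: pick the unlabeled documents the current classifier is least sure about and send them for labeling. The few lines below illustrate one common selection rule, uncertainty sampling; the scores and names are invented, and the abstract does not specify which rule its algorithm uses.

```python
# Hypothetical sketch of uncertainty sampling: pick the unlabeled
# documents whose classifier score is closest to the 0.5 boundary.
def select_uncertain(scores, k):
    """scores: {doc_id: P(class=1)}; return the k most uncertain docs."""
    return sorted(scores, key=lambda d: abs(scores[d] - 0.5))[:k]

# Invented scores from some current classifier over an unlabeled pool.
pool = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.48}
```

After the selected documents are labeled, they join the training set and the classifier is retrained, repeating until a labeling budget is exhausted.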

7.
An improved clonal selection algorithm is proposed as a method to implement a nonlinear optimal iterative learning control algorithm. In the method, prior information is encoded into the clonal selection algorithm to shrink the search space and to handle constraints on the input. A second clonal selection algorithm is used as a model-modification device to cope with uncertainty in the plant model. Finally, simulations show that the convergence speed is satisfactory regardless of the nature of the plant and of whether the plant model is precise.
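For readers unfamiliar with clonal selection, one generation of the basic (unimproved) algorithm looks roughly like this: clone the fittest candidates, mutate the clones with a strength that grows as affinity falls, and keep the best survivors. This is a generic toy on a 1-D cost, with all parameters invented; it is not the paper's improved variant.

```python
import random
random.seed(1)

# Hypothetical one-generation step of clonal selection for minimising
# a cost function: better-ranked candidates get smaller mutations.
def clonal_step(population, cost, n_clones=5, scale=0.5):
    ranked = sorted(population, key=cost)
    clones = [x + random.gauss(0, scale * (i + 1))   # rank-scaled mutation
              for i, x in enumerate(ranked)
              for _ in range(n_clones)]
    # Elitist selection over parents plus clones.
    return sorted(population + clones, key=cost)[:len(population)]

f = lambda x: (x - 3.0) ** 2        # toy cost with minimum at x = 3
pop = [random.uniform(-10, 10) for _ in range(8)]
for _ in range(50):
    pop = clonal_step(pop, f)
```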

8.
We consider the problem of controlling a group of mobile agents so that they form a designated formation while flocking within a constrained environment. We first propose a potential-field-based method that drives the agents to move while staying connected with their neighbors and regulates their relative positions to achieve the specified formation; the communication topology is preserved during the motion. We then extend the method to flocking under environmental constraints. Stability properties are analyzed to guarantee that all agents eventually form the desired formation while flocking, and that they flock safely without colliding with the environment boundary. We verify our algorithm through simulations of a group of agents performing maximum-coverage flocking and traveling through an unknown constrained environment.
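A potential-field controller of this general kind combines an attractive term toward a goal (here, a formation slot) with a repulsive term active near obstacles or boundaries. The sketch below is a standard textbook form with invented gains, not the paper's specific fields.

```python
import math

# Hypothetical single-agent step: attraction to the formation slot plus
# repulsion from an obstacle that activates inside radius d0.
def potential_step(pos, goal, obstacle, k_att=0.5, k_rep=1.0, d0=1.0):
    ax = k_att * (goal[0] - pos[0])
    ay = k_att * (goal[1] - pos[1])
    d = math.hypot(pos[0] - obstacle[0], pos[1] - obstacle[1])
    if 0 < d < d0:   # repulsive term only near the obstacle
        ax += k_rep * (1 / d - 1 / d0) * (pos[0] - obstacle[0]) / d ** 2
        ay += k_rep * (1 / d - 1 / d0) * (pos[1] - obstacle[1]) / d ** 2
    return (pos[0] + ax, pos[1] + ay)

p = (0.0, 0.0)
for _ in range(20):
    p = potential_step(p, goal=(2.0, 1.0), obstacle=(5.0, 5.0))
```

With the obstacle far away the repulsive term never activates, and the agent converges geometrically to its slot.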

9.
We propose a novel approach, local reduction of networks, to extract the global core (GC) of a complex network. The algorithm builds on the small-community phenomenon of networks. The global cores found by local reduction on classical graphs and benchmarks convince us that the global core of a network is, intuitively, its supporting graph: it is "similar to" the original graph; it is small yet essential to the global properties of the network; and, together with the small communities, it gives rise to a clear picture of the structure of the network, namely the galaxy structure of networks. We apply local reduction to extract the global cores of a series of real networks and run a number of experiments to analyze the roles of the global cores. For each real network, our experiments show that the found global core is small; that it is similar to the original network in the sense that it follows a power-law degree distribution with an exponent close to that of the original; that it is sensitive to errors under both cascading-failure and physical-attack models, in the sense that a small number of random errors in the global core may cause a major failure of the whole network; and that it is a good approximate solution to the r-radius center problem, leading to a galaxy structure of the network.

10.
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm by which a learner can find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural-network-based value function approximators. Theoretical analysis and simulation examples verify the methods.

11.
Creating coordinated multiagent policies in environments with uncertainty is a challenging problem, which can be greatly simplified if the coordination needs are known to be limited to specific parts of the state space. In this work, we explore how such local interactions can simplify coordination in multiagent systems. We focus on problems in which the interaction between the agents is sparse and contribute a new decision-theoretic model for decentralized sparse-interaction multiagent systems, Dec-SIMDPs, that explicitly distinguishes the situations in which the agents in the team must coordinate from those in which they can act independently. We relate our new model to other existing models such as MMDPs and Dec-MDPs. We then propose a solution method that takes advantage of the particular structure of Dec-SIMDPs and provide theoretical error bounds on the quality of the obtained solution. Finally, we show a reinforcement learning algorithm in which independent agents learn both individual policies and when and how to coordinate. We illustrate the application of the algorithms throughout the paper in several multiagent navigation scenarios.

12.
The biggest problem facing reinforcement learning in multi-agent systems is the exponential growth of the state and action spaces as the number of agents increases, and the resulting slow learning. This paper adopts a locally cooperative Q-learning method: joint actions are examined only when there is explicit cooperation between agents; otherwise, each agent simply performs individual Q-learning, which greatly reduces the number of state-action pairs that must be examined. Experimental results on the predator-prey pursuit problem and on RoboCup 2D simulated robot soccer show better performance than common multi-agent reinforcement learning techniques.
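The action-selection side of this idea can be sketched as follows: joint Q-values are consulted only in states flagged as needing explicit cooperation, and the agent otherwise falls back on its individual Q-table. All names and values below are invented illustrations, not the paper's algorithm.

```python
# Hypothetical sketch of locally cooperative action selection.
# q_ind:   {(state, own_action): value}
# q_joint: {(state, own_action, partner_action): value}, kept only for
#          the small set of coordination states.
def best_action(state, q_ind, q_joint, coord_states, partner_action):
    if state in coord_states:
        pairs = [k for k in q_joint
                 if k[0] == state and k[2] == partner_action]
        return max(pairs, key=lambda k: q_joint[k])[1]
    own = [k for k in q_ind if k[0] == state]
    return max(own, key=lambda k: q_ind[k])[1]

q_ind = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
q_joint = {("door", "wait", "enter"): 1.0,
           ("door", "enter", "enter"): -1.0}
```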

13.
Multiagent learning provides a promising paradigm to study how autonomous agents learn to achieve coordinated behavior in multiagent systems. In multiagent learning, the concurrency of multiple distributed learning processes makes the environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents’ behavior in this dynamic environment is a difficult problem especially when agents do not know the domain structure and at the same time have only local observability of the environment. In this paper, a coordinated learning approach is proposed to enable agents to learn where and how to coordinate their behavior in loosely coupled multiagent systems where the sparse interactions of agents constrain coordination to some specific parts of the environment. In the proposed approach, an agent first collects statistical information to detect those states where coordination is most necessary by considering not only the potential contributions from all the domain states but also the direct causes of the miscoordination in a conflicting state. The agent then learns to coordinate its behavior with others through its local observability of the environment according to different scenarios of state transitions. To handle the uncertainties caused by agents’ local observability, an optimistic estimation mechanism is introduced to guide the learning process of the agents. Empirical studies show that the proposed approach can achieve a better performance by improving the average agent reward compared with an uncoordinated learning approach and by reducing the computational complexity significantly compared with a centralized learning approach. Copyright © 2012 John Wiley & Sons, Ltd.

14.
A key problem in multi-agent systems is cooperation among agents: a group of agents must select a joint action that maximizes the overall utility. This paper proposes a coordination graph based on value rules and improves the variable elimination algorithm; together, these allow multiple agents to perform action selection in a discrete state space under limited communication.
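The mechanics of variable elimination on a coordination graph are easiest to see with two agents: one agent is eliminated by recording its best response to each action of the other, and the remaining agent then maximizes the combined payoff. The payoff tables below are invented for illustration and are not the paper's value rules.

```python
# Hypothetical two-agent variable elimination on a coordination graph.
ACTIONS = ["a", "b"]
f1 = {"a": 1.0, "b": 0.0}                      # local payoff of agent 1
f12 = {("a", "a"): 0.0, ("a", "b"): 1.5,       # shared coordination payoff
       ("b", "a"): 3.0, ("b", "b"): 0.0}

# Eliminate agent 2: record its best response to each action of agent 1.
best_response = {u: max(ACTIONS, key=lambda v: f12[(u, v)]) for u in ACTIONS}
e = {u: f12[(u, best_response[u])] for u in ACTIONS}

# Agent 1 maximises its own payoff plus the eliminated term, then agent 2
# plays its recorded best response.
a1 = max(ACTIONS, key=lambda u: f1[u] + e[u])
a2 = best_response[a1]
```

The same conditioning-then-maximizing step repeats agent by agent in larger graphs, with cost governed by the graph's induced width rather than the full joint action space.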

15.
16.
We discuss the solution of complex multistage decision problems using methods based on the idea of policy iteration (PI): start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as its base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components, each selected (conceptually) by a separate agent. This is the class of multiagent problems in which the agents have a shared objective function and shared, perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby, at every stage, the agents sequentially (one at a time) execute a local rollout algorithm that uses a base policy together with some coordinating information from the other agents. The total computation required at every stage grows linearly with the number of agents; by contrast, in the standard rollout algorithm it grows exponentially. Despite this dramatic reduction in required computation, we show that our multiagent rollout algorithm retains the fundamental cost improvement property of standard rollout: it guarantees improved performance relative to the base policy.
We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms, based on policy and value neural networks, are also given. These PI algorithms, in both their exact and approximate forms, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
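The one-agent-at-a-time idea described in this abstract can be illustrated with a toy one-step version: each agent in turn picks the component that minimizes the shared cost, with components already chosen held fixed and the base policy filling in the agents not yet decided. The cost function, base policy, and names below are invented; the actual method applies this over multistage lookahead.

```python
# Hypothetical one-step illustration of one-agent-at-a-time rollout.
def multiagent_rollout(n_agents, actions, base_policy, cost):
    chosen = []
    for i in range(n_agents):
        def trial(a):
            # Agents not yet decided act according to the base policy.
            tail = [base_policy(j) for j in range(i + 1, n_agents)]
            return cost(chosen + [a] + tail)
        chosen.append(min(actions, key=trial))
    return chosen

# Toy shared objective: agents should occupy distinct actions.
cost = lambda joint: -len(set(joint))
result = multiagent_rollout(3, [0, 1, 2], base_policy=lambda j: 0, cost=cost)
```

Each agent evaluates only |A| candidates, so per-stage work is linear in the number of agents, in contrast with the |A|^n evaluations of standard joint-action rollout.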

17.
Coordination and cooperation among agents has become a key problem in multiagent systems (MAS), since one of the main research goals of MAS is to bring the beliefs, intentions, expectations, and behaviors of multiple agents into coordination or even cooperation. In an open, dynamic MAS environment, agents with different goals must coordinate their use of resources and the achievement of their goals [1,4]. For example, when resource conflicts arise, deadlock may occur in the absence of a good coordination mechanism; conversely, when a single agent cannot achieve a goal on its own and needs help from other agents, cooperation is required. This paper proposes a multi-agent coordination mechanism and coordination algorithm based on positive relationships. Using this mechanism, an agent can delegate or accept subplans during interaction, thereby balancing the system load and effectively reducing the system's running overhead.

18.
Computer science in general, and artificial intelligence and multiagent systems in particular, are part of the effort to build intelligent transportation systems. Efficient use of the existing infrastructure relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. In particular, traffic signal controllers located at intersections can be seen as autonomous agents. However, this kind of modeling raises challenging issues: the number of agents is high; agents must generally be highly adaptive; and they must react to changes in the environment at the individual level while collectively producing an unpredictable pattern, as they act in a highly coupled environment. Traffic signal control therefore poses many challenges for standard multiagent techniques such as learning. Despite the progress in multiagent reinforcement learning via formalisms based on stochastic games, these cannot cope with a high number of agents due to the combinatorial explosion in the number of joint actions. One possible way to reduce the complexity of the problem is to organize agents in groups of limited size so that the number of joint actions is reduced; these groups are then coordinated by another agent, a tutor or supervisor. Thus, this paper investigates multiagent reinforcement learning for control of traffic signals in two situations: agents act individually (individual learners), or agents can be “tutored”, meaning that another agent with a broader view recommends a joint action.

19.
Against the background of asteroid exploration by a cluster of microsatellites, this paper studies formation adjustment for space-target observation tasks using only local information exchange. Since a globally optimal solution for cluster formation adjustment is hard to obtain directly, a communication coordination graph is used to decompose the global coordination decision into multiple local subproblems, which are solved with reinforcement learning. For the cluster-level global coordination problem, a coordination algorithm based on the Max-plus algorithm is designed; for the local optimization problem of a single satellite, a neural-network-based local Q-learning algorithm is designed to plan the satellite's action adjustments. Simulation results show that the proposed cooperative planning algorithm can autonomously and effectively adjust the cluster to the desired formation and accomplish the cooperative observation task.
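The Max-plus step used for the global coordination can be illustrated on a tiny chain-shaped coordination graph (agents 1-2-3), where one message sweep in each direction is exact because the graph is a tree. The payoff tables below are invented and unrelated to the satellite problem; they only demonstrate the message-passing mechanics.

```python
# Hypothetical Max-plus message passing on a three-agent chain 1-2-3.
ACTIONS = [0, 1]
f12 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 0.0}  # edge 1-2
f23 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.5}  # edge 2-3

zero = {a: 0.0 for a in ACTIONS}
flip = lambda f: {(b, a): v for (a, b), v in f.items()}

def msg(payoff, incoming):
    """Message to receiver r: maximise over the sender's action s."""
    return {r: max(payoff[(s, r)] + incoming[s] for s in ACTIONS)
            for r in ACTIONS}

m12 = msg(f12, zero)         # agent 1 -> 2
m23 = msg(f23, m12)          # agent 2 -> 3 (absorbs 1's message)
m32 = msg(flip(f23), zero)   # agent 3 -> 2
m21 = msg(flip(f12), m32)    # agent 2 -> 1 (absorbs 3's message)

# Each agent decodes its action locally from incoming messages.
a1 = max(ACTIONS, key=lambda a: m21[a])
a2 = max(ACTIONS, key=lambda b: m12[b] + m32[b])
a3 = max(ACTIONS, key=lambda c: m23[c])
```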

20.
To date, many researchers have proposed various methods to improve the learning ability of multiagent systems. However, most of these studies are not appropriate for more complex multiagent learning problems, because the state space of each learning agent grows exponentially with the number of partners present in the environment, and modeling the other learning agents as part of the state of the environment is not a realistic approach. In this paper, we combine the advantages of the modular approach, fuzzy logic, and internal models in a single novel multiagent system architecture. The architecture is based on a fuzzy modular approach whose rule base is partitioned into several modules. Each module deals with a particular agent in the environment and maps input fuzzy sets to action Q-values; these represent the state space of each learning module and the action space, respectively. Each module also uses an internal model table to estimate the actions of the other agents. Finally, we investigate the integration of a parallel update method with the proposed architecture. Experimental results obtained on two different environments of a well-known pursuit domain show the effectiveness and robustness of the proposed multiagent architecture and learning approach.
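The key payoff of such a modular decomposition is at action-selection time: each module scores actions from its own (small) state space, and the agent combines the module outputs rather than consulting one exponentially large table. The combination rule below (summing module Q-values and taking the argmax) and all values are invented illustrations, not the paper's fuzzy inference.

```python
# Hypothetical combination step for a modular learner: one module per
# other agent, final action maximises the sum of module Q-values.
def select_action(modules, state, actions):
    return max(actions,
               key=lambda a: sum(m.get((state, a), 0.0) for m in modules))

# Invented module outputs: one module tracks the prey, one a partner.
module_prey = {("s", "north"): 0.9, ("s", "south"): 0.1}
module_partner = {("s", "north"): -0.2, ("s", "south"): 0.4}
chosen = select_action([module_prey, module_partner], "s", ["north", "south"])
```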


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) · 京ICP备09084417号