20 similar documents found (search time: 15 ms)
1.
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer playing situation. The method successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are shown and a discussion is given.
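The model-order selection step can be illustrated with a minimal sketch of Akaike's Information Criterion, which trades goodness of fit against model complexity. The candidate log-likelihood values below are hypothetical, not taken from the paper; in the paper's setting they would come from the Canonical Variate Analysis fit at each candidate state dimension.

```python
def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2 ln L; lower is better."""
    return 2 * k - 2 * log_likelihood

# Hypothetical log-likelihoods for candidate state dimensions 1..4:
# the fit improves with dimension, but AIC penalises the extra parameters.
log_likelihoods = {1: -120.0, 2: -95.0, 3: -93.5, 4: -93.0}
scores = {n: aic(ll, k=n) for n, ll in log_likelihoods.items()}
best_order = min(scores, key=scores.get)  # dimension with the lowest AIC
```

With these numbers, dimensions 3 and 4 fit almost equally well, so the complexity penalty makes dimension 3 the winner.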
2.
This paper presents an algorithm and analysis of distributed learning and cooperative control for a multi-agent system so that a global goal of the overall system can be achieved by locally acting agents. We consider a resource-constrained multi-agent system, in which each agent has limited capabilities in terms of sensing, computation, and communication. The proposed algorithm is executed by each agent independently to estimate an unknown field of interest from noisy measurements and to coordinate multiple agents in a distributed manner to discover peaks of the unknown field. Each mobile agent maintains its own local estimate of the field and updates the estimate using collective measurements from itself and nearby agents. Each agent then moves towards peaks of the field using the gradient of its estimated field while avoiding collision and maintaining communication connectivity. The proposed algorithm is based on a recursive spatial estimation of an unknown field. We show that the closed-loop dynamics of the proposed multi-agent system can be transformed into a form of a stochastic approximation algorithm and prove its convergence using Ljung’s ordinary differential equation (ODE) approach. We also present extensive simulation results supporting our theoretical results.
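The move-to-peak step can be sketched as gradient ascent on an agent's estimated field. The quadratic field below is a hypothetical stand-in for the paper's recursive spatial estimate, and collision avoidance and connectivity maintenance are omitted:

```python
def estimated_field(x, y):
    """Hypothetical single-peak field with its maximum at (2, 3)."""
    return -((x - 2.0) ** 2 + (y - 3.0) ** 2)

def gradient(f, x, y, h=1e-5):
    """Central-difference gradient of f at (x, y)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

x, y = 0.0, 0.0                   # agent's starting position
for _ in range(200):              # climb the estimated field
    gx, gy = gradient(estimated_field, x, y)
    x, y = x + 0.1 * gx, y + 0.1 * gy
```

After 200 steps the agent sits essentially at the peak (2, 3); in the paper the field estimate itself is refined online from noisy measurements while the agent moves.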
3.
This paper proposes the construction of a centralized hybrid metaheuristic cooperative strategy to solve optimization problems. Knowledge (intelligence) is incorporated into the coordinator to improve performance. This knowledge is incorporated through a set of rules and models obtained from a knowledge extraction process applied to the records of the results returned by individual metaheuristics. The effectiveness of the approach is tested in several computational experiments in which we compare the results obtained by the individual metaheuristics, by several non-cooperative and cooperative strategies and by the strategy proposed in this paper.
4.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. Through numerous experiments, the performance of the proposed learning algorithms is evaluated.
5.
In this study we proposed a web-based programming assisted system for cooperation (WPASC) and designed a learning activity to facilitate students' cooperative programming learning. The aim of this study was to investigate students' cooperative programming learning behavior and its relationship with learning performance. Students' opinions of and perceptions toward the learning activity and the WPASC were also investigated. The results revealed that most students perceived the learning activity and the WPASC as useful for cooperative programming learning. Students' learning behavior during the cooperative programming learning activity was classified into six categories, and we found that learning behavior is related to learning performance. Students in the completely independent, self-improving using assistance, confident after enlightenment, and imitating categories performed well owing to their effective and motivated learning behavior. However, students in the performing poorly without assistance and plagiarizing categories performed the worst; the former could not get assistance at all and the latter had no learning motivation. The results also showed that students' learning behavior may undergo increasing, decreasing, or no transition during problem solving. Therefore, performing poorly without assistance and plagiarizing behavior, as well as decreasing or no transition in learning behavior, should be identified right after a programming problem is completed, and the instructor should then intervene to make the learning behavior more effective. Besides, more incentives need to be given to increase students' learning motivation and to encourage students to post solutions and feedback early in a problem-solving period.
6.
Shaping multi-agent systems with gradient reinforcement learning
Olivier Buffet Alain Dutech François Charpillet 《Autonomous Agents and Multi-Agent Systems》2007,15(2):197-220
An original reinforcement learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. To that end, we design simple reactive agents in a decentralized way as independent learners. But to cope with the difficulties inherent to RL used in that framework, we have developed an incremental learning algorithm where agents face a sequence of progressively more complex tasks. We illustrate this general framework by computer experiments where agents have to coordinate to reach a global goal.
This work has been conducted in part in NICTA’s Canberra laboratory.
7.
In this article, we propose a new control method using reinforcement learning (RL) with the concept of sliding mode control (SMC). Notable characteristics of the SMC method are its good robustness and stability under deviations from control conditions. On the other hand, RL is applicable to complex systems that are difficult to model. However, applying reinforcement learning to a real system has a serious problem: many trials are required for learning. We aim to develop a new control method that combines the good characteristics of both approaches. To realize this, we employ the actor-critic method, a kind of RL, and unite it with SMC. We verify the effectiveness of the proposed control method through a computer simulation of inverted pendulum control without the use of the inverted pendulum dynamics. In particular, it is shown that the proposed method enables the RL to learn in fewer trials than reinforcement learning alone.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008
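The actor-critic structure can be sketched on a deliberately tiny problem, a single state with two actions; the reward values, noise level, and learning rates below are hypothetical, and the paper's SMC coupling and pendulum dynamics are not modeled. The critic maintains a value estimate and its TD error drives the actor's policy update:

```python
import math
import random

def softmax(prefs):
    """Softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train(episodes=5000, alpha_actor=0.05, alpha_critic=0.1, seed=1):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]         # actor: action preferences for the single state
    baseline = 0.0             # critic: value estimate of the single state
    mean_rewards = (0.2, 1.0)  # hypothetical expected rewards of the two actions
    for _ in range(episodes):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        r = mean_rewards[a] + rng.gauss(0.0, 0.1)  # noisy reward
        td_error = r - baseline                    # critic's TD error
        baseline += alpha_critic * td_error        # critic update
        # actor update: shift probability toward actions with positive TD error
        prefs[a] += alpha_actor * td_error * (1 - probs[a])
        prefs[1 - a] -= alpha_actor * td_error * probs[1 - a]
    return prefs

prefs = train()
```

The actor ends up strongly preferring the higher-reward action; the critic's baseline is what keeps the update low-variance, which is one reason actor-critic methods can learn in fewer trials.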
8.
Juan C. Martínez-Rosas Marco A. Arteaga 《Automatica》2006,42(2):329-336
One of the main practical problems in cooperative robotics is the complexity of integrating a large number of expensive velocity-force sensors. In this paper, the control of cooperative robots using only joint measurements is considered in order to manipulate an object firmly. Experimental results are shown to support the developed theory.
9.
Reinforcement learning is a learning scheme for finding the optimal policy to control a system, based on a scalar signal representing a reward or a punishment. If the observation of the system by the controller is sufficiently rich to represent the internal state of the system, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely using current sensory observations, the controller must learn a dynamic behavior to achieve the optimal policy.
In this paper, we propose a dynamic controller scheme which utilizes memory to uncover hidden states by using information about past system outputs, and makes control decisions using memory. This scheme integrates Q-learning, as proposed by Watkins, and recurrent neural networks of several types. It performs favorably in simulations which involve a task with hidden states.
This work was presented, in part, at the International Symposium on Artificial Life and Robotics, Oita, Japan, February 18–20, 1996
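The effect of memory on a hidden-state task can be sketched with tabular Q-learning over memory-augmented states. The corridor below is a toy construction of my own (a lookup table stands in for the paper's recurrent networks): states 1 and 2 both emit the observation 'B', so a memoryless policy cannot act optimally in both, but pairing the current observation with the previous one disambiguates them.

```python
import random
from collections import defaultdict

# Hypothetical corridor with aliased observations: states 1 and 2 both emit 'B'.
OBS = {0: 'A', 1: 'B', 2: 'B'}

def step(state, action):
    """Returns (next_state, reward, done). Action 1 advances along the corridor."""
    if state == 0:
        return (1, 0.0, False) if action == 1 else (0, 0.0, False)
    if state == 1:
        return (2, 0.0, False) if action == 1 else (0, 0.0, False)
    return (2, 1.0, True) if action == 0 else (2, 0.0, False)  # goal from state 2

def train(episodes=2000, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state, prev = 0, None
        for _ in range(20):                     # cap episode length
            s = (prev, OBS[state])              # memory-augmented state
            if rng.random() < eps:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda b: Q[(s, b)])
            nxt, r, done = step(state, a)
            s2 = (OBS[state], OBS[nxt])
            target = r if done else r + gamma * max(Q[(s2, b)] for b in (0, 1))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            prev, state = OBS[state], nxt
            if done:
                break
    return Q

Q = train()
```

After training, the learned policy takes different actions on the same observation 'B' depending on what was seen one step earlier: it advances from ('A', 'B') and stops at the goal from ('B', 'B'), which a purely reactive learner cannot do.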
10.
《Expert systems with applications》2014,41(16):7630-7640
This paper first proposes a bilateral optimized negotiation model based on reinforcement learning. This model negotiates over the issues of price and quantity, introduces a mediator agent as the mediation mechanism, and uses the improved reinforcement learning negotiation strategy to produce the optimal proposal. In order to further improve the performance of negotiation, the paper then proposes a negotiation method based on the adaptive learning of the mediator agent. The simulation results show that the proposed negotiation methods improve both the efficiency and the performance of the negotiation.
11.
Sahar Araghi Abbas Khosravi Michael Johnstone Douglas Creighton 《Engineering Applications of Artificial Intelligence》2013,26(9):2164-2171
Multi-agent reinforcement learning methods suffer from several deficiencies that are rooted in the large state space of multi-agent environments. This paper tackles two deficiencies of multi-agent reinforcement learning methods: their slow learning rate, and low-quality decision-making in the early stages of learning. The proposed methods are applied in a grid-world soccer game. In the proposed approach, modular reinforcement learning is applied to reduce the state space of the learning agents from exponential to linear in the number of agents. The modular model proposed here includes two new modules, a partial-module and a single-module. These two new modules are effective for increasing the speed of learning in a soccer game. We also apply instance-based learning concepts to choose proper actions in states that are not experienced adequately during learning. The key idea is to use neighbouring states that have been explored sufficiently during the learning phase. The results of experiments in a grid-soccer game environment show that our proposed methods produce a higher average reward than the modular structure without them.
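The core of the modular reduction can be sketched as summing per-module action values, so that no single table ever indexes the joint state space. The module names, states, and table entries below are hypothetical illustrations, not the paper's actual modules:

```python
def combined_q(modules, states, action):
    """Global action value = sum of per-module values; each Q table is keyed
    by (module_state, action), so each module sees only its own small state."""
    return sum(q[(s, action)] for q, s in zip(modules, states))

# Hypothetical tables for an 'attack' module and an 'avoid' module
attack_q = {('near_ball', 'shoot'): 1.0, ('near_ball', 'retreat'): 0.1}
avoid_q = {('opponent_close', 'shoot'): -0.8, ('opponent_close', 'retreat'): 0.4}

states = ('near_ball', 'opponent_close')
best = max(('shoot', 'retreat'),
           key=lambda a: combined_q((attack_q, avoid_q), states, a))
```

Here the avoid module's penalty outweighs the attack module's preference, so the combined policy retreats; each module learned over its own few states rather than over the full joint description.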
12.
Optimal consensus algorithms for cooperative team of agents subject to partial information
E. Semsar-Kazerooni 《Automatica》2008,44(11):2766-2777
The objectives of this work are the development and design of controllers for a team of agents that accomplish consensus for agents’ output in both leaderless (LL) and modified leader-follower (MLF) architectures. Towards this end, a semi-decentralized optimal control strategy is designed based on minimization of individual cost functions over a finite horizon using local information. Interactions among agents due to information flows are represented through the control channels in characterization of the dynamical model of each agent. It is shown that minimization of the proposed cost functions results in a modified consensus algorithm for LL and MLF architectures. In the latter case, the desired output is assumed to be available for only the leader while the followers should follow the leader using information exchanges existing among themselves and the leader through a predefined topology. Furthermore, the performance of the cooperative team under a member’s fault is formally analyzed and investigated. The robustness of the team to uncertainties and faults in the leader or followers and adaptability of the team members to these unanticipated situations are also shown rigorously. Finally, simulation results are presented to demonstrate the effectiveness of our proposed methodologies in achieving prespecified requirements.
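The leaderless case can be illustrated with the standard discrete-time consensus iteration, in which each agent nudges its output toward its neighbours' values. The ring topology, gain, and initial values below are hypothetical, and the paper's optimal-control design and fault analysis are not modeled:

```python
def consensus_step(x, neighbours, gain=0.3):
    """One synchronous update: x_i <- x_i + gain * sum_j (x_j - x_i)."""
    return [xi + gain * sum(x[j] - xi for j in neighbours[i])
            for i, xi in enumerate(x)]

# Hypothetical ring of four agents with scalar outputs
neighbours = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (2, 0)}
x = [1.0, 5.0, 3.0, 7.0]
for _ in range(100):
    x = consensus_step(x, neighbours)
# all agents converge to the average of the initial outputs (4.0)
```

Because each pairwise difference appears once with each sign, the sum of the outputs is invariant, so the common limit is the initial average; the gain must be small enough relative to the node degrees for the iteration to be stable.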
13.
This paper addresses adaptive control architectures for systems that respond autonomously to changing tasks. Such systems often have many sensory and motor alternatives, and behavior drawn from these produces solutions of varying quality. The objective is then to ground behavior in control laws which, combined with resources, enumerate closed-loop behavioral alternatives. Use of such controllers leads to analyzable and predictable composite systems, permitting the construction of abstract behavioral models. Here, discrete event system and reinforcement learning techniques are employed to constrain the behavioral alternatives and to synthesize behavior on-line. To illustrate this, a quadruped robot learning a turning gait subject to safety and kinematic constraints is presented.
14.
A cooperative multi-agent platform for invention based on patent document analysis and ontology
Von-Wun Soo Szu-Yin Lin Shih-Yao Yang Shih-Neng Lin Shian-Luen Cheng 《Expert systems with applications》2006,31(4):766-775
We propose a cooperative multi-agent platform to support the invention process based on patent document analysis. It helps industrial knowledge managers to retrieve and analyze existing patent documents and extract structural information from patents with the aid of ontology and natural language processing techniques. It allows the invention process to be carried out through the cooperation and coordination among software agents delegated by the various domain experts in the complex industrial R&D environment. Furthermore, it integrates the patent document analysis with the inventive problem solving method known as the TRIZ method, which can suggest invention directions based on heuristics or principles to resolve the contradictions among design objectives and engineering parameters. We chose the patent invention for chemical mechanical polishing (CMP) as our case study. However, the platform and techniques could be extended to most cooperative invention domains.
15.
A classifier system for the reinforcement learning control of autonomous mobile robots is proposed. The classifier system contains action selection, rule reproduction, and credit assignment mechanisms. An important feature of the classifier system is that it operates with continuous sensor and action spaces. The system is applied to the control of mobile robots. The local controllers use independent classifiers specified at the wheel level. The controllers work autonomously, and with respect to each other represent dynamic systems connected through the external environment. The feasibility of the proposed system is tested in an experiment with a Khepera robot. It is shown that some patterns of global behavior can emerge from locally organized classifiers.
This work was presented, in part, at the Third International Symposium on Artificial Life and Robotics, Oita, Japan, January 19–21, 1998
16.
Malrey Lee 《Artificial Intelligence Review》2006,25(3):195-209
The following paper introduces an evolution strategy based on cooperative behaviors within each group of agents. The evolution strategy helps each agent to be self-defendable and self-maintainable. To determine an optimal group behavior strategy under dynamically varying circumstances, agents in the same group cooperate with each other. The proposed method uses reinforcement learning, an enhanced neural network, and artificial life techniques. In the present paper, we apply two different reward models: reward model 1 and reward model 2. Each reward model is designed with the reinforcement or constraint of behaviors in mind. In competitive environments, behavior considered advantageous is reinforced by adding reward values; conversely, behavior considered disadvantageous is constrained by subtracting them. We also propose an enhanced neural network to add organism-level learning behavior to the artificial life simulation. In the future, the system models and results described in this paper will be applied to a healthcare framework consisting of biosensors, healthcare devices, and a healthcare system.
17.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of Reinforcement Learning (RL) to the self-emergence of a common lexicon in robot teams. By modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e. the objects encountered by the robots, or the states of the environment itself) into symbols or signals, we check whether it is possible for the robot team to converge in an autonomous, decentralized way to a common lexicon by means of RL, so that the communication efficiency of the entire robot team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean communication system. We have organized our experiments along two main lines: first, we have investigated the effect of team size, centered on teams of moderate size on the order of 5 and 10 individuals, typical of multi-robot systems. Second, and foremost, we have investigated the effect of lexicon size on the convergence results. To analyze the convergence of the robot team we define the team’s consensus as the situation in which all the robots (i.e. 100% of the population) share the same association matrix or lexicon. As a general conclusion, we have shown that RL allows convergence to lexicon consensus in a population of autonomous agents.
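Lexicon consensus of this kind can be sketched with a minimal naming-game dynamic, a classic stand-in rather than the paper's exact RL scheme; the population size, symbol set, and round count below are arbitrary choices. On a successful exchange, both participants collapse their inventories for that meaning to the winning symbol, which drives the population toward a shared lexicon:

```python
import random

def naming_game(n_agents=5, meanings=3, symbols=3, rounds=6000, seed=3):
    rng = random.Random(seed)
    # each agent's lexicon: meaning -> set of candidate symbols
    lex = [{m: set() for m in range(meanings)} for _ in range(n_agents)]
    for _ in range(rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        m = rng.randrange(meanings)
        if not lex[speaker][m]:
            lex[speaker][m].add(rng.randrange(symbols))  # invent a word for m
        s = rng.choice(sorted(lex[speaker][m]))
        if s in lex[hearer][m]:
            # success: both agents keep only the winning symbol for m
            lex[speaker][m], lex[hearer][m] = {s}, {s}
        else:
            lex[hearer][m].add(s)                        # failure: hearer learns s
    return lex

lex = naming_game()
```

The full-consensus state is absorbing, so with enough interactions every agent ends up with the same single symbol per meaning, matching the 100%-agreement criterion used in the paper.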
18.
Alok R. Chaturvedi George K. Hutchinson Derek L. Nazareth 《Journal of Intelligent Manufacturing》1992,3(1):43-57
This paper describes a synergistic approach that is applicable to a wide variety of system control problems. The approach utilizes a machine learning technique, goal-directed conceptual aggregation (GDCA), to facilitate dynamic decision-making. The application domain employed is Flexible Manufacturing System (FMS) scheduling and control. Simulation is used for the dual purpose of providing a realistic depiction of FMSs, and serves as an engine for demonstrating the viability of a synergistic system involving incremental learning. The paper briefly describes prior approaches to FMS scheduling and control, and machine learning. It outlines the GDCA approach, provides a generalized architecture for dynamic control problems, and describes the implementation of the system as applied to FMS scheduling and control. The paper concludes with a discussion of the general applicability of this approach.
19.
The learning of complex control behaviour for autonomous mobile robots is one of the current research topics. In this article an intelligent control architecture is presented which integrates learning methods and available domain knowledge. This control architecture is based on Reinforcement Learning and allows continuous input and output parameters, hierarchical learning, multiple goals, self-organized topology of the networks used, and online learning. As a testbed this architecture is applied to the six-legged walking machine LAURON to learn leg control and leg coordination.
20.
《Expert systems with applications》2014,41(6):2630-2637
In order to improve the ability of self-organizing teams to achieve good performance, this paper presents a self-adaptive learning algorithm for team members. Members of the self-organizing teams are simulated by agents. In the virtual self-organizing team, agents adapt their knowledge according to cooperative principles. The self-adaptive learning algorithm is designed to learn from other agents with minimal costs and to improve the performance of the self-organizing team. In the algorithm, agents learn how to behave (choose different game strategies) and how much to think about how to behave (choose the learning radius). The virtual team is self-adaptively improved according to the strategies’ ability to generate better-quality solutions in past generations. Six basic experiments are conducted to demonstrate the validity of the adaptive learning algorithm. It is found that the adaptive learning algorithm often causes agents to converge to optimal actions, based on agents’ continually updated cognitive maps of how actions influence the performance of the virtual self-organizing team. Unlike existing work, this paper considers the influence of relationships in self-organizing teams. It is illustrated that the adaptive learning algorithm is beneficial both to the development of self-organizing teams and to the performance of the individual agent.