Similar Documents
 A total of 20 similar documents were found (search time: 46 ms).
1.
Self-adaptive systems often operate in uncertain environments whose changes are hard to predict in advance, and supporting the development of complex self-adaptive systems under such conditions has become an important challenge for software engineering. Reinforcement learning is an important branch of machine learning: through continual trial and error, a reinforcement learning system can learn the optimal policy mapping environment states to executable actions. Addressing the uncertainty of the environments of self-adaptive systems, this paper combines Agent technology with reinforcement learning...
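
The trial-and-error learning of a state-to-action policy described above can be illustrated with a minimal tabular Q-learning sketch (a generic illustration rather than the method of the cited paper; the environment interface `env.reset()`/`env.step()`/`env.actions()` and all hyperparameters are assumptions):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn a state -> action policy by repeated trial and error."""
    q = defaultdict(float)                           # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore a random action with probability epsilon, otherwise exploit.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward reward + discounted future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions(next_state))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```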

2.
Genetic network programming (GNP) has been proposed as one of the evolutionary algorithms and extended with reinforcement learning (GNP-RL). The combination of evolution and learning can efficiently evolve programs, and the fitness improvement has been confirmed in simulations of tileworld problems, elevator group supervisory control systems, stock trading models, and wall-following behavior of a Khepera robot. However, its adaptability in testing environments, where situations change dynamically, has not yet been analyzed in detail. In this paper, the adaptation mechanism in the testing environment is introduced, and it is confirmed using the robot simulator WEBOTS that GNP-RL can adapt to environmental changes, especially when previously unexperienced sensor troubles suddenly occur. The simulation results show that GNP-RL works well in testing even if wrong sensor information is given, because GNP-RL has a function to automatically change programs using alternative actions. In addition, an analysis of the effects of the parameters of GNP-RL is carried out in both training and testing simulations.

3.
Engineers and researchers are paying more attention to reinforcement learning (RL) as a key technique for realizing computational intelligence such as adaptive and autonomous decentralized systems. In general, it is not easy to put RL into practical use. Our prior research dealt mainly with the problem of designing state and action spaces, for which we proposed an adaptive co-construction method of state and action spaces. However, designing state and action spaces is more difficult in dynamic environments than in static ones, so an adaptive co-construction method is even more valuable there. In this paper, our approach deals mainly with the problem of adaptation in dynamic environments. First, we classify tasks in dynamic environments and propose a method for detecting environmental changes in order to adapt to them. Next, we conducted computational experiments on a path planning problem with a slowly changing environment in which aging of the system is assumed. The performances of a conventional RL method and the proposed detection method were confirmed.

4.
Artificial Life and Robotics - Although the design of the reward function in reinforcement learning is important, it is difficult to design a system that can adapt to a variety of environments and...

5.
Adaptive immunity based reinforcement learning
Recently, much attention has been paid to intelligent systems that can adapt themselves to dynamic and/or unknown environments by the use of learning methods. However, traditional learning methods have the disadvantage that the time required for learning grows enormously with the complexity of the systems and environments considered. We therefore propose a novel reinforcement learning method based on adaptive immunity. The proposed method can provide a near-optimal solution with less learning time by self-learning using the concept of adaptive immunity. The validity of our method is demonstrated through simulations with Sutton's maze problem. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

6.
Existing reinforcement learning methods do not handle learning in dynamic environments well: when the environment changes, the optimal policy must be relearned, and if the interval between environmental changes is shorter than the time the policy needs to converge, the learning algorithm cannot converge at all. Building on Option-based hierarchical reinforcement learning, this paper proposes a hierarchical reinforcement learning method that adapts to dynamic environments. Exploiting the hierarchical structure of learning, the method attends only to changes in the subgoal states of the hierarchical task and in the environment states internal to the current Option, confining policy updates to a small local space or a low-dimensional high-level space and thereby speeding up learning. Simulation experiments on shortest-path planning between two points in a two-dimensional dynamic grid world show that the method learns policies markedly faster than previous methods, and that the convergence of the learning algorithm depends less on the frequency of environmental changes.
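
A hypothetical sketch of the core idea of confining updates to an Option's local space is given below; the class layout, the region/subgoal fields, and the relearning rule are invented for illustration and are not the paper's algorithm:

```python
from collections import defaultdict

class Option:
    """A sub-policy with its own local value table over a small region of the state space."""
    def __init__(self, region, subgoal, alpha=0.1, gamma=0.95):
        self.region = set(region)        # states this option is responsible for
        self.subgoal = subgoal           # the option terminates when its subgoal is reached
        self.q = defaultdict(float)      # local Q-table, indexed only by states in the region
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next, actions):
        """TD update confined to the option's local state space."""
        best_next = 0.0 if s_next == self.subgoal else max(self.q[(s_next, b)] for b in actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

def on_environment_change(options, changed_states):
    """Relearn only the options whose local region overlaps the detected change;
    the high-level policy over subgoals is left untouched."""
    for opt in options:
        if opt.region & set(changed_states):
            opt.q.clear()
```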

7.
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention; however, a recent survey of this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.

8.
To address the difficulty that current multi-agent reinforcement learning algorithms have in adapting to dynamic changes in the number of agents, this paper proposes a sequential multi-agent reinforcement learning algorithm (SMARL). The agents' control network is divided into an action network and a goal network, with deep deterministic policy gradient and sequence-to-sequence models as the underlying structures of the two parts respectively, decoupling the algorithm's architecture from the number of agents. At the same time, the algorithm's inputs and outputs receive special handling to decouple its policy from the number of agents. In SMARL, ...

9.
Due to increasing environmental concerns, manufacturers are forced to take back their products at the end of the products' useful functional life. Manufacturers explore various options, including disassembly operations, to recover components and subassemblies for reuse, remanufacturing, and recycling, in order to extend the life of materials in use and cut down the disposal volume. However, disassembly operations are problematic due to the high degree of uncertainty associated with the quality and configuration of product returns. In this research we address the disassembly line balancing problem (DLBP) using a Monte Carlo based reinforcement learning technique tailored to the underlying dynamics of a DLBP. The research results indicate that the reinforcement learning based method performs effectively, even on a complex large-scale problem, within a reasonable amount of computational time. The proposed method performed on par with or better than the benchmark methods for solving the DLBP reported in the literature. Unlike other methods, which are usually limited to deterministic environments, the reinforcement learning based method is able to operate in deterministic as well as stochastic environments.

10.
Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and the usual approaches for contact arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches for positioning and conveying an object. Reinforcement learning is a popular approach for finding controllers that are tailored exactly to the system without any prior model. We show how to apply reinforcement learning from a decentralized perspective in order to address the global-local trade-off. The simulation results demonstrate that the reinforcement learning method is a promising way to design control laws for such distributed systems.

11.
In recent years, evolution strategies have been widely applied in deep reinforcement learning owing to their gradient-free optimization and high parallel efficiency. However, traditional evolution-strategy-based deep reinforcement learning methods suffer from slow learning, a tendency to converge to local optima, and weak robustness. To address this, a maximum-entropy evolutionary reinforcement learning method based on adaptive noise is proposed. First, an improvement to the evolution strategy is introduced that strengthens "elimination of the unfit" on top of "survival of the fittest", improving the convergence speed of evolutionary reinforcement learning. Second, a maximum-entropy regularization term on the policy is added to the objective function to preserve policy stochasticity and thereby encourage the agent to explore new policies. Finally, an adaptive noise control scheme is proposed that adjusts the search range of the evolution strategy according to the current state of evolution, reducing dependence on prior knowledge and improving the algorithm's robustness. Experimental results show that, compared with traditional methods, the proposed approach achieves clear improvements in learning speed, convergence to optimality, and robustness.
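
A rough sketch of what an evolution-strategy update with a maximum-entropy bonus and a simple adaptive-noise rule might look like is shown below; `fitness_fn`, `entropy_fn`, and every constant are assumptions, and the paper's actual culling and noise-control rules may differ:

```python
import numpy as np

def es_step(theta, fitness_fn, entropy_fn, sigma, n_samples=50, lr=0.02, beta=0.01):
    """One evolution-strategy step whose objective adds a policy-entropy bonus.

    fitness_fn(theta) -> episodic return of the policy parameterised by theta (assumed).
    entropy_fn(theta) -> average action entropy of that policy (assumed).
    """
    eps = np.random.randn(n_samples, theta.size)              # Gaussian perturbations
    scores = np.array([fitness_fn(theta + sigma * e) + beta * entropy_fn(theta + sigma * e)
                       for e in eps])
    ranks = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalised fitness
    grad = ranks @ eps / (n_samples * sigma)                  # ES gradient estimate
    return theta + lr * grad, scores.max()

def adapt_sigma(sigma, best_now, best_prev, grow=1.2, shrink=0.9):
    """Widen the search noise when progress stalls and narrow it when improving;
    one simple heuristic for adaptive noise control."""
    return sigma * (grow if best_now <= best_prev else shrink)
```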

12.
In this paper, we study the relationship between learning and evolution in a simple abstract model, where neural networks capable of learning are evolved using genetic algorithms (GAs). Each individual tries to acquire a proper behavior under a given environment through its lifetime learning, and the best individuals are selected to reproduce offspring, which then conduct lifetime learning in the succeeding generation. The connective weights of individuals' neural networks are modified through their lifetime learning; that is, certain characters are acquired. By setting various rates for the heritability of acquired characters, which control the strength of the ‘Lamarckian’ strategy, we observe the adaptational processes of populations over successive generations. Taking the degree of environmental change into consideration, we show the following results. Under static environments, populations with higher rates of heritability adapt more quickly to the environment and thus perform well. Under nonstationary environments, on the other hand, populations with lower rates of heritability not only show more stable behavior against environmental changes but also maintain greater adaptability with respect to such changing environments. Consequently, the population with zero heritability, i.e., the Darwinian population, attains the highest level of adaptation towards dynamic environments.

13.
Autonomous agents that learn about their environment can be divided into two broad classes. One class of existing learners, reinforcement learners, typically employs weak learning methods to directly modify an agent's execution knowledge. These systems are robust in dynamic and complex environments but generally do not support planning or the pursuit of multiple goals. In contrast, symbolic theory revision systems learn declarative planning knowledge that allows them to pursue multiple goals in large state spaces, but these approaches are generally only applicable to fully sensed, deterministic environments with no exogenous events. This research investigates the hypothesis that by limiting an agent to procedural access to symbolic planning knowledge, the agent can combine the powerful, knowledge-intensive learning performance of theory revision systems with the robust performance in complex environments of reinforcement learners. The system, IMPROV, uses an expressive knowledge representation so that it can learn complex actions that produce conditional or sequential effects over time. By developing learning methods that require only limited procedural access to the agent's knowledge, IMPROV's learning remains tractable as the agent's knowledge is scaled to large problems. IMPROV learns to correct operator precondition and effect knowledge in complex environments that include such properties as noise, multiple agents, and time-critical tasks, and demonstrates a general learning method that can easily be strengthened through the addition of many different kinds of knowledge.

14.
Robot learning by demonstration is key to bringing robots into daily social environments to interact with and learn from humans and other agents. However, teaching a robot to acquire new knowledge is a tedious and repetitive process, and is often restricted to a specific setup of the environment. We propose a template-based learning framework for robot learning by demonstration that addresses both generalisation and adaptability. This novel framework is based upon a one-shot learning model integrated with spectral clustering and an online learning model to learn and adapt actions in similar scenarios. A set of statistical experiments is used to benchmark the framework components and shows that this approach requires no extensive training for generalisation and can adapt to environmental changes flexibly. Two real-world applications of an iCub humanoid robot playing the tic-tac-toe game and soldering a circuit board are used to demonstrate the relative merits of the framework.

15.
Present-day market dynamics have been forcing manufacturing systems to adapt quickly and continuously to an ever-changing environment. Self-evolution of manufacturing systems means a continuous process of adapting to the environment on the basis of autonomous goal-formation and goal-oriented dynamic organization. This paper proposes a goal-regulation mechanism that applies a reinforcement learning approach, which is the principal working mechanism for autonomous goal-formation. Individual goals are regulated by a neural-network-based fuzzy inference system, namely a goal-regulation network (GRN), which is updated by a reinforcement signal from another neural network called the goal-evaluation network (GEN). The GEN approximates the compatibility of goals with the current environmental situation. A production planning problem is also examined in a simulation study in order to validate the proposed goal-regulation mechanism.

16.
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA that integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network), which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA, so that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal, which usually accelerates GA learning since a reinforcement signal may only become available long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems, including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
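
A schematic sketch of the key idea, using the critic's TD prediction as an internal reinforcement that serves as the GA's fitness, might look as follows; `decode_action_network`, `critic.predict`, and the environment interface are assumed placeholders, not the authors' implementation:

```python
def internal_reinforcement(critic, state, next_state, external_r, gamma=0.95):
    """TD-style internal signal: available at every step, even while the external
    reinforcement has not yet arrived (external_r may simply be 0 in the meantime)."""
    return external_r + gamma * critic.predict(next_state) - critic.predict(state)

def ga_fitness(chromosome, env, critic, horizon=200):
    """Score one action-network candidate by the internal reinforcement it accumulates,
    so the GA can rank chromosomes without waiting for sparse external feedback."""
    net = decode_action_network(chromosome)   # hypothetical decoder: chromosome -> network
    state, total = env.reset(), 0.0
    for _ in range(horizon):
        next_state, external_r, done = env.step(net.act(state))
        total += internal_reinforcement(critic, state, next_state, external_r)
        state = next_state
        if done:
            break
    return total
```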

17.
A coordinated obstacle-avoidance path planning method for multiple mobile robots based on reinforcement learning
As applications of coordinated multi-mobile-robot systems move toward unknown environments, path planning methods that depend on an environment model are no longer applicable. Exploiting the fact that reinforcement learning interacts with the environment directly and requires neither prior knowledge nor sample data, this paper applies reinforcement learning to a multi-robot coordination system and proposes a reinforcement-learning-based obstacle-avoidance path planning method, in which the reward function is designed as a model-free, non-uniform structure based on behavior decomposition. Computer simulation results show that the method is effective and reasonably robust, and that the new reward-function structure improves learning speed.
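
A hypothetical sketch of a behavior-decomposed, non-uniform reward of the kind described might look as follows; the behaviors, weights, thresholds, and robot interface are invented for illustration and are not the paper's actual design:

```python
def decomposed_reward(robot, goal, obstacles, teammates, safe_dist=0.5):
    """Reward assembled from decomposed behaviors with non-uniform magnitudes."""
    r = 0.0
    # Obstacle-avoidance behavior: a large penalty that dominates the other terms.
    if obstacles and min(robot.distance_to(o) for o in obstacles) < safe_dist:
        r -= 10.0
    # Goal-seeking behavior: small shaping reward for progress toward the goal.
    r += 1.0 * (robot.prev_distance_to(goal) - robot.distance_to(goal))
    # Coordination behavior: mild penalty for crowding a teammate.
    if teammates and min(robot.distance_to(t) for t in teammates) < safe_dist:
        r -= 2.0
    return r
```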

18.
Research on a multi-agent reinforcement learning model and algorithm based on Markov games
In an MDP, a single agent can find the optimal solution to a problem through reinforcement learning, but in a multi-agent system the MDP model no longer applies. Likewise, the minimax-Q algorithm can only solve MAS learning problems modeled as zero-sum games. This paper adopts non-zero-sum Markov games as the learning framework for multi-agent systems, and proposes a meta-game reinforcement learning model and a meta-game Q algorithm. It is proved theoretically that the meta-game Q algorithm converges to the meta-game optimal solution of the non-zero-sum Markov game.
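
For reference, the minimax-Q baseline that the abstract contrasts with (Littman's standard formulation for two-player zero-sum Markov games, not a result of the cited paper) computes the state value by a maximin over the learner's mixed policy \(\pi\) and the opponent's action \(o\), and updates the joint-action value table as:

\[
V(s) = \max_{\pi \in \Delta(A)} \min_{o \in O} \sum_{a \in A} \pi(a)\, Q(s, a, o),
\qquad
Q(s, a, o) \leftarrow (1 - \alpha)\, Q(s, a, o) + \alpha \big[ r + \gamma\, V(s') \big].
\]

Because \(V(s)\) is defined by a zero-sum maximin, this update has no direct counterpart in general-sum games, which is the gap the meta-game Q algorithm above targets.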

19.
《Computers & Education》2005,44(3):343-355
In powerful learning environments, rich contexts and authentic tasks are presented to pupils. Active, autonomous and co-operative learning is stimulated, and the curriculum is adapted to the needs and capabilities of individual pupils. In this study, the characteristics of learning environments and the contribution of ICT to learning environments were investigated. A questionnaire was completed by 331 teachers in the highest grade of primary education. Results show that many teachers apply several elements of powerful learning environments in their classes. This especially goes for the presentation of authentic tasks and the fostering of active and autonomous learning. However, the methods employed by teachers to adapt education to the needs and abilities of individual pupils proved quite limited. The use of ICT in general merely showed characteristics of traditional approaches to learning. Chances of using open-ended ICT applications, which are expected to contribute to the power of learning environments, were greater with teachers who created powerful learning environments for their pupils, and when there were more computers available to pupils. In addition, teachers' views with regard to the contribution of ICT to active and autonomous learning, teachers' skills in using ICT, and the teacher's gender appeared to be relevant background variables in this respect.

20.
This paper proposes an approach to investigate norm-governed learning agents which combines a logic-based formalism with an equation-based counterpart. This dual formalism enables us to describe the reasoning of such agents and their interactions using argumentation, and, at the same time, to capture systemic features using equations. The approach is applied to norm emergence and internalisation in systems of learning agents. The logical formalism is rooted in a probabilistic defeasible logic instantiating Dung's argumentation framework. Rules of this logic are attached with probabilities to describe the agents' minds and behaviours as well as uncertain environments. Then, the equation-based model for reinforcement learning, defined over this probability distribution, allows agents to adapt to their environment and self-organise.
