Similar Literature
 Found 20 similar documents (search time: 46 ms)
1.
 We analyze learning classifier systems in the light of tabular reinforcement learning. We note that although genetic algorithms are the most distinctive feature of learning classifier systems, it is not clear whether genetic algorithms are important to learning classifier systems. In fact, there are models which are strongly based on evolutionary computation (e.g., Wilson's XCS) and others which do not exploit evolutionary computation at all (e.g., Stolzmann's ACS). To find some clarification, we try to develop learning classifier systems "from scratch", i.e., starting from one of the best-known reinforcement learning techniques, Q-learning. We first consider the basics of reinforcement learning: a problem modeled as a Markov decision process and tabular Q-learning. We introduce a formal framework to define a general-purpose rule-based representation, which we use to implement tabular Q-learning. We formally define generalization within rules and discuss possible approaches to extend our rule-based Q-learning with generalization capabilities. We suggest that genetic algorithms are probably the most general approach for adding generalization, although they might not be the only solution.
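For orientation, the tabular Q-learning starting point mentioned above can be sketched in a few lines. This is a generic textbook sketch, not the authors' rule-based implementation; the `env` interface (reset/step/actions) is an assumption made for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Generic tabular Q-learning. Assumes env exposes reset() -> state,
    step(a) -> (next_state, reward, done), and actions(s) -> list."""
    Q = defaultdict(float)  # Q[(state, action)], default 0.0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            # epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # bootstrap toward the greedy value of the successor state
            target = r if done else r + gamma * max(Q[(s2, x)] for x in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```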

2.
Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and the usual approaches for contact arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches in order to position and convey an object. Reinforcement learning is a popular approach to finding controllers that are tailored exactly to the system without any prior model. We show how to apply reinforcement learning from a decentralized perspective and how to address the global-local trade-off. The simulation results demonstrate that the reinforcement learning method is a promising way to design control laws for such distributed systems.

3.
We introduce a sensitivity-based view to the area of learning and optimization of stochastic dynamic systems. We show that this sensitivity-based view provides a unified framework for many different disciplines in this area, including perturbation analysis, Markov decision processes, reinforcement learning, identification and adaptive control, and singular stochastic control; and that this unified framework applies to both discrete event dynamic systems and continuous-time continuous-state systems. Many results in these disciplines can be simply derived and intuitively explained by using two performance sensitivity formulas. In addition, we show that this sensitivity-based view leads to new results and opens up new directions for future research. For example, the nth bias optimality of Markov processes has been established and event-based optimization may be developed; this approach has computational and other advantages over state-based approaches.
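For reference, the two performance sensitivity formulas alluded to are usually stated as follows (a hedged restatement in Cao's standard notation, not a quotation from this paper): for an ergodic Markov chain with transition matrix P, reward vector f, stationary distribution pi, average reward eta, and performance potential g,

```latex
% Performance difference formula, comparing two policies (P, f) and (P', f'):
\eta' - \eta = \pi' \left[ (f' - f) + (P' - P)\, g \right]
% Performance derivative formula along a parameterized path:
\frac{d\eta_\delta}{d\delta} = \pi \left[ \Delta f + (\Delta P)\, g \right],
\qquad \Delta P = P' - P, \quad \Delta f = f' - f
```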

4.
This paper juxtaposes the probability matching paradox of decision theory and the magnitude of reinforcement problem of animal learning theory to show that simple classifier system bidding structures are unable to match the range of behaviors required in the deterministic and probabilistic problems faced by real cognitive systems. The inclusion of a variance-sensitive bidding (VSB) mechanism is suggested, analyzed, and simulated to enable good bidding performance over a wide range of nonstationary probabilistic and deterministic environments.
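To illustrate the idea (a hedged sketch, not the paper's calibrated bidding rule), a variance-sensitive bid discounts a classifier's payoff estimate by the payoff variability it has observed, so rules in noisy niches bid more conservatively; the coefficients below are illustrative assumptions.

```python
def variance_sensitive_bid(strength: float, payoff_var: float,
                           c_bid: float = 0.1, k_var: float = 0.05) -> float:
    """Sketch of a variance-sensitive bid: the estimated payoff (strength)
    is penalized by the observed payoff standard deviation. c_bid and
    k_var are illustrative assumptions, not the paper's values."""
    return c_bid * strength - k_var * payoff_var ** 0.5
```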

5.
We explain the significance of a general theory of learning. We work at an informational resolution level that is defined on a population. The population's magnitude changes through the creation or destruction of individual systems; we express this creation or destruction by means of reproducibility, which produces a selective discrimination on the population. We define reinforcement with respect to a goal variable and separate reinforcement by selective discrimination from reinforcement by a change of the conditional probability. We also define memory accumulation and reinforcement on individual systems and study the effectiveness and the regularity of reinforcement in relation to the goal. We define the information on a variable and its learning; we study the relation between learning and reinforcement and the transference of learning. We define the controlled system and its information and study its learning. We define organized learning, with memory accumulation, and closed control. We introduce axiomatics and study their consequences, their testing through modeling, and the possible repercussions of the theory.

6.
Residential HVAC systems typically consume large amounts of energy and also strongly affect occupants' thermal comfort. Reinforcement learning is now widely used to optimize HVAC systems, but this approach demands substantial time and data resources. To address this problem, we propose a new event-driven Markov decision process (ED-MDP) framework and, building on it, an event-driven deep deterministic policy gradient (ED-DDPG) method that triggers optimal control on events and solves for the optimal control policy with a reinforcement learning algorithm. Experimental results show that, compared with baseline methods, ED-DDPG excels at accelerating learning and reducing decision frequency, and achieves notable results in saving energy while maintaining thermal comfort. Experiments verify that the method exhibits strong robustness and adaptability for optimizing residential HVAC control.
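A minimal sketch of the event-driven idea, under the assumption that an "event" fires when the observed state drifts past a threshold (the paper's own ED-MDP event definition and the DDPG networks are not reproduced here): the agent re-computes an action only on events rather than at every timestep, which is what cuts decision frequency.

```python
import numpy as np

def event_triggered_rollout(env, policy, horizon=1000, threshold=0.5):
    """Sketch: call the (e.g., DDPG) policy only when the state has moved
    more than `threshold` from the last decision point. The drift trigger
    is an illustrative stand-in for the paper's event definition."""
    s = env.reset()                 # assumed: states are numpy arrays
    s_last, a = s, policy(s)        # decide once at the start
    decisions = 1
    for _ in range(horizon):
        s, _, done = env.step(a)
        if np.linalg.norm(s - s_last) > threshold:  # event fires
            a, s_last = policy(s), s                # new decision on event only
            decisions += 1
        if done:
            break
    return decisions
```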

7.
We present an algorithm for real-time, robust, vision-based active tracking and pursuit. The algorithm was designed to overcome problems arising from active vision-based pursuit, such as target occlusion. Our method employs two layers to deal with occlusions of different lengths. The first layer is for short- or medium-term occlusions: those where a known method, such as mean shift combined with a Kalman filter, fails. For this layer we designed the hybrid filter for active pursuit (HAP). HAP utilizes a Kalman filter modified to respond to two different modes of action: one in which the target is positively identified and one in which the target identification is uncertain. For long-term occlusions we use the second layer. This layer is a decision algorithm that follows a learning procedure and is based on game-theoretic reinforcement (Cesa-Bianchi and Lugosi, Prediction, Learning, and Games, 2006). The learning process is based on trial and error and is designed to perform adequately with a small number of samples. The algorithm produces a data structure that can be shared among agents or sent to a central control of a multi-agent system. The learning process is designed so that agents perform tasks according to their skills: an efficient agent will pursue targets while an inefficient agent will search for entering targets. These capacities make this system well suited for embedding in a multi-agent control system.

8.
Active Learning for Vision-Based Robot Grasping
Salganicoff, Marcos; Ungar, Lyle H.; Bajcsy, Ruzena. 《Machine Learning》, 1996, 23(2-3): 251-278
Reliable vision-based grasping has proved elusive outside of controlled environments. One approach towards building more flexible and domain-independent robot grasping systems is to employ learning to adapt the robot's perceptual and motor system to the task. However, one pitfall in robot perceptual and motor learning is that the cost of gathering the learning set may be unacceptably high. Active learning algorithms address this shortcoming by intelligently selecting actions so as to decrease the number of examples necessary to achieve good performance, and they also avoid separate training and execution phases, leading to higher autonomy. We describe the IE-ID3 algorithm, which extends the Interval Estimation (IE) active learning approach from discrete to real-valued learning domains by combining IE with a classification tree learning algorithm (ID3). We present a robot system which rapidly learns to select grasp approach directions using IE-ID3, given simplified superquadric shape approximations of objects. Initial results on a small set of objects show that a robot with a laser scanner system can rapidly learn to pick up new objects, and simulation studies show the superiority of the active learning approach for a simulated grasping task using larger sets of objects. Extensions of the approach and future areas of research incorporating more sophisticated perceptual and action representations are discussed.
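In its original discrete form, Interval Estimation picks the action whose estimated success rate has the highest upper confidence bound; a minimal sketch follows (the IE-ID3 extension to real-valued domains via classification trees is not reproduced, and the normal approximation is an illustrative choice).

```python
import math

def ie_select(counts, successes, z=1.96):
    """Interval Estimation sketch: choose the action with the largest
    upper confidence bound on its success probability."""
    best, best_ub = None, float('-inf')
    for a, n in counts.items():
        if n == 0:
            return a                             # try untested actions first
        p = successes[a] / n
        ub = p + z * math.sqrt(p * (1 - p) / n)  # normal-approx. upper bound
        if ub > best_ub:
            best, best_ub = a, ub
    return best
```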

9.
The dynamic nature of sustainable energy and electric systems can vary significantly with environment and load changes, and these systems exhibit the features of multivariate, highly complex, and uncertain nonlinear systems. Moreover, the integration of intermittent renewable energy sources and the energy consumption behaviours of households introduce more uncertainty into sustainable energy and electric systems. Operation, control, and decision-making in such an environment require increasing intelligence and flexibility in control and optimization to ensure the quality of service of sustainable energy and electric systems. Reinforcement learning is a wide class of optimal control strategies that estimates value functions from experience, simulation, or search in order to learn in highly dynamic, stochastic environments. The interactive context enables reinforcement learning to develop strong learning ability and high adaptability. Reinforcement learning does not require a model of the system dynamics, which makes it suitable for sustainable energy and electric systems with complex nonlinearity and uncertainty. The use of reinforcement learning in sustainable energy and electric systems will certainly change the traditional energy utilization mode and bring more intelligence into the system. In this survey, an overview of reinforcement learning, the demand for reinforcement learning in sustainable energy and electric systems, reinforcement learning applications in sustainable energy and electric systems, and future challenges and opportunities are explicitly addressed.

10.
As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative abilities of multiple agents with the decision-making power of reinforcement learning, and by decomposing a complex reinforcement learning problem into several subproblems solved separately, it can effectively mitigate the curse of dimensionality. This makes multi-agent hierarchical reinforcement learning a potential route to intelligent decision-making in large-scale, complex settings. This paper first describes the main techniques involved, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning; then, from a hierarchical perspective, it surveys the algorithmic principles and research status of four families of multi-agent hierarchical reinforcement learning methods: option-based, hierarchical-abstract-machine-based, value-function-decomposition-based, and end-to-end; finally, it reviews applications in robot control, game decision-making, and task planning.

11.
Reinforcement learning has been widely used for applications in planning, control, and decision making. Rather than using instructive feedback as in supervised learning, reinforcement learning makes use of evaluative feedback to guide the learning process. In this paper, we formulate a pattern classification problem as a reinforcement learning problem. The problem is realized with a temporal difference method in a FALCON-R network. FALCON-R is constructed by integrating two basic FALCON-ART networks as function approximators, where one acts as a critic network (fuzzy predictor) and the other as an action network (fuzzy controller). This paper serves as a guideline for formulating a classification problem as a reinforcement learning problem using FALCON-R. The strengths of applying the reinforcement learning method to the pattern classification application are demonstrated. We show that such a system can converge faster, is able to escape from local minima, and has excellent disturbance rejection capability.
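The reformulation can be illustrated generically (this is not the FALCON-R architecture, only the evaluative-feedback loop it builds on): each pattern is a state, the predicted class is the action, and a reward replaces the supervised error signal.

```python
def classify_as_rl(Q, pattern, label, classes, alpha=0.1):
    """One evaluative-feedback update: +1 reward for a correct class,
    -1 otherwise. `pattern` is assumed hashable (e.g., a tuple); this
    is a stand-in for FALCON-R's critic/action network pair."""
    a = max(classes, key=lambda c: Q.get((pattern, c), 0.0))
    r = 1.0 if a == label else -1.0
    q = Q.get((pattern, a), 0.0)
    Q[(pattern, a)] = q + alpha * (r - q)   # move the estimate toward the reward
    return a, r
```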

12.
The main goal of this paper is modelling attention while using it for efficient path planning of mobile robots. The key challenge in pursuing these two goals concurrently is how to make an optimal, or near-optimal, decision in spite of the time and processing-power limitations that inherently exist in a typical multi-sensor real-world robotic application. To efficiently recognise the environment under these two limitations, the attention of an intelligent agent is controlled by employing the reinforcement learning framework. We propose an estimation method using estimated mixture-of-experts task and attention learning in perceptual space. An agent learns how to employ its sensory resources, and when to stop observing, by estimating its perceptual space. In this paper, static estimation of the state space in a learning task problem, which is examined in the Webots™ simulator, is performed. Simulation results show that a robot learns how to achieve an optimal policy with a controlled cost by estimating the state space instead of continually updating sensory information.

13.
Several Key Scientific Problems in Multi-Agent Deep Reinforcement Learning
孙长银, 穆朝絮. 《自动化学报》, 2020, 46(7): 1301-1312
Reinforcement learning, as a method for solving model-free sequential decision problems, has a history of several decades, but it often faces enormous challenges when dealing with high-dimensional variables. In recent years, the rapid development of deep learning has made it possible for reinforcement learning methods to provide optimized decision policies for complex, high-dimensional multi-agent systems and to execute goal tasks efficiently in challenging environments. This paper surveys the principles of reinforcement learning and deep reinforcement learning, proposes a closed-loop control framework for learning systems, and analyzes several important problems and solutions in multi-agent deep reinforcement learning, including algorithmic structures for multi-agent reinforcement learning, environmental non-stationarity, and partial observability, discussing the strengths, weaknesses, and applications of the surveyed methods. Finally, it outlines future research directions for multi-agent deep reinforcement learning, offering ideas for developing more powerful and more readily applicable multi-agent reinforcement learning control systems.

14.
This article is related to the research effort of constructing an intelligent agent, i.e., a computer system that is able to sense its environment (world), reason utilizing its internal knowledge, and execute actions upon the world (act). The specific part of this effort presented in this article is reinforcement learning, i.e., the process of acquiring new knowledge based upon an evaluative feedback, called reinforcement, received by the agent through interactions with the world. This article has two objectives: (1) to give a compact overview of reinforcement learning, and (2) to show that the evolution of the reinforcement learning paradigm has been driven by the need for more efficient learning through the addition of more structure to the learning agent. Therefore, the main ideas of reinforcement learning are introduced, and structural solutions to reinforcement learning are reviewed. Several architectural enhancements of the RL paradigm are discussed. These include incorporation of state information in the learning process, architectural solutions to learning with delayed reinforcement, dealing with structurally changing worlds through utilization of multiple models of the world, and focusing the attention of the learning agent through active perception. The paper closes with an overview of directions for applications and for future research in this area. © 1993 John Wiley & Sons, Inc.

15.
We discuss the role of state focus in reinforcement learning (RL) systems that are applicable to mechanical systems including robots. Although the concept of the state focus is similar to attention/focusing in visual domains, its implementation requires some theoretical background based on RL. We propose an RL system that effectively learns how to choose the focus simultaneously with how to achieve a task. This RL system does not need heuristics for the adaptation of its focus. We conducted a capture experiment to compare the learning speed between the proposed system and the traditional systems, SARSAs, and conducted a navigation experiment to confirm the applicability of the proposed system to a realistic task. In the capture experiment, the proposed system learned faster than SARSAs. We visualized the developmental process of the focusing strategy in the proposed system using a Q-value analysis technique. In the navigation task, the proposed system demonstrated faster learning than SARSAs in the realistic task. The proposed system is applicable to a wide class of RLs that are applicable to mechanical systems including robots.
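For context, the SARSA baselines compared against use the on-policy temporal-difference update below (a textbook sketch; the paper's state-focus mechanism is not shown).

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy TD(0): bootstrap from the action a2 actually taken in s2."""
    td_error = r + gamma * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
```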

16.
A Multi-Agent AGV Scheduling System Based on Hierarchical Reinforcement Learning
Hierarchical reinforcement learning is an effective method for agent decision-making in complex systems with huge state spaces. AGV scheduling systems, which have discrete dynamic characteristics, require real-time dynamic scheduling methods; multiple agents equipped with MaxQ hierarchical reinforcement learning can achieve real-time AGV scheduling through efficient reinforcement learning and cooperation. Simulation experiments demonstrate the effectiveness of this method.
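As background, the MaxQ decomposition referred to splits the value of a subtask into the value of the invoked child plus a completion term (a standard Dietterich-style statement; the paper's own notation may differ):

```latex
% MaxQ value-function decomposition:
Q(i, s, a) = V(a, s) + C(i, s, a)
% V(a, s):    expected return while executing child subtask a from state s
% C(i, s, a): completion value -- expected return for finishing parent
%             subtask i after a terminates
```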

17.
Perception is the interaction interface between an intelligent system and the real world. Without sophisticated and flexible perceptual capabilities, it is impossible to create advanced artificial intelligence (AI) systems. For the next-generation AI, called 'AI 2.0', one of the most significant features will be that AI is empowered with intelligent perceptual capabilities which can simulate the mechanisms of the human brain and are likely to surpass the human brain in performance. In this paper, we briefly review state-of-the-art advances across different areas of perception, including visual perception, auditory perception, speech perception, and perceptual information processing and learning engines. On this basis, we envision several R&D trends in intelligent perception for the forthcoming era of AI 2.0, including: (1) human-like and transhuman active vision; (2) auditory perception and computation in an actual auditory setting; (3) speech perception and computation in a natural interaction setting; (4) autonomous learning of perceptual information; (5) large-scale perceptual information processing and learning platforms; and (6) urban omnidirectional intelligent perception and reasoning engines. We believe these research directions should be highlighted in future plans for AI 2.0.

18.
《Advanced Robotics》, 2013, 27(1): 83-99
Reinforcement learning can be an adaptive and flexible control method for autonomous systems. It does not need a priori knowledge; behaviors to accomplish given tasks are obtained automatically by repeating trial and error. However, as the complexity of the system increases, the learning costs increase exponentially. Thus, application to complex systems, like robots with many redundant degrees of freedom and multi-agent systems, is very difficult. In previous works in this field, applications were restricted to simple robots and small multi-agent systems, and because of the restricted functions of simple systems that have little redundancy, the effectiveness of reinforcement learning is limited. In our previous works, we took these problems into consideration and proposed a new reinforcement learning algorithm, 'Q-learning with dynamic structuring of exploration space based on GA (QDSEGA)'. The effectiveness of QDSEGA for redundant robots has been demonstrated using a 12-legged robot and a 50-link manipulator. However, previous work on QDSEGA was restricted to redundant robots, and it was impossible to apply it to multiple mobile robots. In this paper, we extend our previous work on QDSEGA by combining it with rule-based distributed control and propose a hybrid autonomous control method for multiple mobile robots. To demonstrate the effectiveness of the proposed method, simulations of a transportation task with 10 mobile robots are carried out. As a result, effective behaviors have been obtained.

19.
张明悦, 金芝, 赵海燕, 罗懿行. 《软件学报》, 2020, 31(8): 2404-2431
Self-adaptation mechanisms in software systems provide technical solutions for coping with dynamically changing environments and uncertain requirements. Among existing research on software self-adaptivity, one line of work recasts self-adaptivity as regression, classification, clustering, or decision problems, models them with reinforcement learning, neural networks/deep learning, Bayesian decision theory and probabilistic graphical models, rule learning, and the like, and uses these models to construct self-adaptation mechanisms. Through a systematic literature survey, this paper reviews work on machine-learning-enabled software self-adaptivity. It first introduces basic concepts and then classifies current work from different perspectives, analyzing it by controlled system, monitoring and control process, learning algorithm, and mode of learning enablement, and discusses the entry points, strengths, and weaknesses of self-adaptivity enabled by different machine learning methods. Finally, it offers an outlook on future research.

20.
There are many adaptive learning systems that adapt learning materials to student properties, preferences, and activities. This study focuses on designing such a learning system by relating combinations of different learning styles to preferred types of multimedia materials. We explore a decision model aimed at proposing learning material of an appropriate multimedia type. The study includes 272 student participants. The resulting decision model shows that students prefer well-structured learning texts with color discrimination, and that the hemispheric learning style model is the most important criterion in deciding student preferences for different multimedia learning materials. To provide a more accurate and reliable model for recommending different multimedia types, more learning style models must be combined. Kolb's classification and the VAK classification allow us to learn whether students prefer an active role in the learning process, and which multimedia type they prefer.
