Similar Documents
Found 20 similar documents (search time: 515 ms)
1.
Selection of learning parameters for CMAC-based adaptive critic learning
The CMAC-based adaptive critic learning structure consists of two CMAC modules: the action module and the critic module. Learning occurs in both. The critic module learns to evaluate the system status: it transforms the system response, usually an occasionally provided reinforcement signal, into organized, useful information. Based on the knowledge developed in the critic module, the action module learns the control technique. One difficulty in using this scheme lies in the selection of learning parameters. In our previous study of the CMAC-based scheme, the best set of learning parameters was selected from a large number of test simulations, and the selected parameter values are not necessarily adequate for generic cases; a general guideline for parameter selection needs to be developed. In this study, the problem is investigated: the effects of the parameters are studied analytically and verified by simulations. The results provide a good guideline for parameter selection.
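The abstract does not spell out the CMAC machinery itself. As a point of reference, a minimal tile-coding (CMAC) approximator with the classic distribute-the-error update can be sketched as follows; all structure, names, and constants here are illustrative, not taken from the paper, and the learning rate `beta` is exactly the kind of parameter whose selection the paper studies:

```python
import numpy as np

class CMAC:
    """Minimal CMAC: several overlapping tilings over a 1-D input."""
    def __init__(self, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.width = (hi - lo) / n_tiles
        self.offsets = rng.uniform(0, self.width, n_tilings)  # one offset per tiling
        self.lo = lo
        self.w = np.zeros((n_tilings, n_tiles + 1))           # one weight per tile

    def active(self, x):
        """Index of the tile activated by x in each tiling."""
        idx = ((x - self.lo + self.offsets) // self.width).astype(int)
        return np.clip(idx, 0, self.n_tiles)

    def value(self, x):
        return float(self.w[np.arange(self.n_tilings), self.active(x)].sum())

    def update(self, x, target, beta=0.1):
        # Classic CMAC rule: distribute the error evenly over the active tiles.
        err = target - self.value(x)
        self.w[np.arange(self.n_tilings), self.active(x)] += beta * err / self.n_tilings

critic = CMAC()
for _ in range(200):
    critic.update(0.3, 1.0)   # train the evaluation at x = 0.3 toward 1.0
print(round(critic.value(0.3), 3))   # → 1.0
```

The overlapping tilings give local generalization: nearby inputs share tiles, so training at one state also adjusts the stored evaluation of its neighbors.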

2.
Software quality assurance is a vital component of software project development. A software quality estimation model is trained using software measurement and defect (software quality) data from a previously developed release or similar project. Such an approach assumes that the development organization has experience with systems similar to the current project and that defect data are available for all modules in the training data. In software engineering practice, however, various practical issues limit the availability of defect data for modules in the training data. In addition, the organization may not have experience developing a similar system. In such cases, the task of software quality estimation, i.e., labeling modules as fault-prone or not fault-prone, falls on the expert. We propose a semisupervised clustering scheme for software quality analysis of program modules with no defect data or quality-based class labels. It is a constraint-based semisupervised clustering scheme that uses k-means as the underlying clustering algorithm. Software measurement data sets obtained from multiple National Aeronautics and Space Administration software projects are used in our empirical investigation. The proposed technique is shown to help the expert make better estimates than when the expert labels the clusters formed by an unsupervised learning algorithm. In addition, the software quality knowledge learnt during the semisupervised process provides good generalization performance for multiple test data sets. An analysis of the program modules that remain unlabeled after our semisupervised clustering scheme provides useful insight into the characteristics of their software attributes.
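The paper's constraint-based scheme is not reproduced here, but the general flavor of semi-supervised k-means — modules with known labels seed the centroids and stay clamped to their clusters while everything else is assigned freely — can be sketched on made-up data (the data, seeds, and all names below are illustrative):

```python
import numpy as np

def seeded_kmeans(X, seeds, k, n_iter=20):
    """Semi-supervised k-means sketch: 'seed' rows with known labels both
    initialize the centroids and stay clamped to their cluster (the constraint)."""
    # seeds: dict cluster_id -> array of row indices of X with that known label
    centers = np.array([X[idx].mean(axis=0) for idx in seeds.values()])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c, idx in seeds.items():        # enforce the label constraints
            labels[idx] = c
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# Two well-separated synthetic "module measurement" clusters.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, _ = seeded_kmeans(X, {0: np.arange(3), 1: np.arange(20, 23)}, k=2)
print(labels[:20].sum(), labels[20:].sum())   # → 0 20
```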

3.
We address the important problem of software quality analysis when there is limited software fault or fault-proneness data. A software quality model is typically trained using software measurement and fault data obtained from a previous release or similar project. Such an approach assumes that fault data are available for all the training modules. Various issues in software development may limit the availability of fault-proneness data for all the training modules; consequently, the available labeled training data may be too limited for the trained software quality model to provide reliable predictions. More specifically, the small set of modules with known fault-proneness labels is not sufficient for capturing the software quality trends of the project. We investigate semi-supervised learning with the Expectation Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that the knowledge stored in the software attributes of the unlabeled program modules will aid in improving software quality estimation. Software data collected from a large NASA software project are used during the semi-supervised learning process, and the software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves the generalization performance of the software quality models.
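A toy illustration of the EM idea — labeled points keep their hard labels while unlabeled points contribute soft responsibilities to the parameter estimates — under the strong simplifying assumption of two 1-D Gaussian classes (this is not the paper's actual model; all data are synthetic):

```python
import numpy as np

def em_semisupervised(x_lab, y_lab, x_unl, n_iter=50):
    """EM sketch for two 1-D Gaussian classes. Labeled modules keep their
    fault-proneness labels fixed; unlabeled modules get soft responsibilities."""
    mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
    sd = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step on the unlabeled data only.
        lik = pi * np.exp(-(x_unl[:, None] - mu) ** 2 / (2 * sd ** 2)) / sd
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step combines hard labeled counts with soft unlabeled counts.
        for c in (0, 1):
            w = np.concatenate([(y_lab == c).astype(float), r[:, c]])
            xs = np.concatenate([x_lab, x_unl])
            mu[c] = (w * xs).sum() / w.sum()
            sd[c] = np.sqrt((w * (xs - mu[c]) ** 2).sum() / w.sum() + 1e-9)
            pi[c] = w.sum() / len(xs)
    return mu

rng = np.random.default_rng(0)
x_unl = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
# Only two labeled "modules", one per class.
mu = em_semisupervised(np.array([0.1, 4.9]), np.array([0, 1]), x_unl)
print(np.round(mu, 1))
```

The two labeled points only anchor which component is which; the 400 unlabeled points do most of the work of locating the class means, which is the hypothesis the abstract tests.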

4.
Adaptive fuzzy command acquisition with reinforcement learning
This paper proposes a four-layered adaptive fuzzy command acquisition network (AFCAN) for adaptively acquiring fuzzy commands via interactions with the user or environment. It can catch the intended information from a sentence (command) given in natural language with fuzzy predicates. The intended information includes a meaningful semantic action and the fuzzy linguistic information of that action. The proposed AFCAN has three important features. First, no restrictions at all are placed on the fuzzy command input, which is used to specify the desired information; the network requires no acoustic, prosodic, grammatical, or syntactic structure. Second, the linguistic information of an action is learned adaptively and is represented by fuzzy numbers based on α-level sets. Third, the network can learn during the course of performing the task. The AFCAN can perform offline as well as online learning. For offline learning, the mutual-information (MI) supervised learning scheme and the fuzzy backpropagation (FBP) learning scheme are employed when the training data are available in advance; the former is used to learn meaningful semantic actions and the latter to learn linguistic information. The AFCAN can also learn online, interactively, while it is in use for fuzzy command acquisition. For online learning, the MI-reinforcement learning scheme and the fuzzy reinforcement learning scheme are developed for learning meaningful actions and linguistic information, respectively. An experimental system is constructed to illustrate the performance and applicability of the proposed AFCAN.

5.
This paper proposes a TD (temporal difference) and GA (genetic algorithm) based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network), which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA, so that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal, which usually accelerates GA learning, since in reinforcement learning problems a reinforcement signal may become available only long after a sequence of actions has occurred. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems, including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
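The key concept — using the (internal) reinforcement signal as the GA's fitness function — can be illustrated with a bare-bones GA. The fitness function below is a hand-made stand-in for the critic's internal reinforcement signal, not the paper's network; population size, mutation scale, and the "controller gains" are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def internal_reinforcement(w):
    """Stand-in for the critic's internal reinforcement: a fitness surface
    peaking at the (unknown to the GA) best gain vector w = (2, 2, 2)."""
    return -np.sum((w - 2.0) ** 2)

def ga_step(pop, fitness, n_keep=10, sigma=0.1):
    """One GA generation: keep the fittest chromosomes, refill the population
    with mutated copies of them."""
    order = np.argsort([fitness(w) for w in pop])[::-1]
    parents = pop[order[:n_keep]]
    children = parents[rng.integers(0, n_keep, len(pop) - n_keep)] \
               + rng.normal(0, sigma, (len(pop) - n_keep, pop.shape[1]))
    return np.vstack([parents, children])

pop = rng.normal(0, 1, (40, 3))          # 40 chromosomes, 3 controller gains each
for _ in range(100):                     # generations advance at every step,
    pop = ga_step(pop, internal_reinforcement)   # no waiting for external reward
best = pop[np.argmax([internal_reinforcement(w) for w in pop])]
print(np.round(best, 1))
```

Because fitness is available at every generation, the GA never idles waiting for a delayed external reward, which is the acceleration the abstract describes.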

6.
In this paper we argue for building reactive autonomous mobile robots through reinforcement connectionist learning. Basic reinforcement learning, however, is a slow process. This paper describes an architecture which deals effectively with complex (high-dimensional and/or continuous) situation and action spaces. The architecture is based on two main ideas. The first is to organize the reactive component into a set of modules in such a way that, roughly, each of them encodes the prototypical action for a given cluster of situations. The second is to use a particular kind of planning to figure out what part of the action space deserves attention for each cluster of situations. Salient features of the planning process are that it is grounded and that it is invoked only when the reactive component does not correctly generalize its previous experience to the new situation. We also report our experience in solving a basic task that most autonomous mobile robots must face, namely path finding.

7.
This paper presents results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, and a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge of its dynamic model. The self-scaling reinforcement (SSR) learning algorithm was developed to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows easy incorporation of new modules that represent new knowledge or new requirements for the desired task.

8.
Because the radio spectrum is limited, managing the limited amount of resources is an important issue, especially for high-speed data applications. This paper presents a distributed multiagent scheme (DMAS) developed to support resource allocation in a customer-accepted and cost-effective fashion. The scheme consists of a collection of problem-solving agents with three modules built in: the knowledge source, the blackboard system, and the control engine. Through the operations of, and cooperation among, the active agents, an allocation policy is selected and a customer-accepted schedule that meets the specified quality of service (QoS) is generated. A comparison of the performance of the DMAS scheme with previously proposed schemes was derived using a 55-microcell model. The proposed scheme gave the highest quality-of-service satisfaction (around 90% at light loading and 60% at heavy loading).

9.
Objective: Video action quality assessment aims to evaluate how well a specific action in a video is executed and completed. Automated action quality assessment effectively reduces the consumption of human resources and evaluates video content more accurately and fairly. Traditional action quality assessment methods suffer from three main problems: 1) the multi-scale spatio-temporal features of the action subject in the video; 2) the inherent ambiguity of labels caused by cognitive differences among annotators; and 3) the redundancy of attention heads in multi-head self-attention. To address these problems, this paper proposes SALDL (self-attention and label distribution learning), an action quality assessment model that perceives different spatio-temporal positions in a video sequence and generates fine-grained labels. Method: SALDL introduces an Attention-Inc (attention-inception) structure, which progressively integrates self-attention into the Inception structure via Embedding, multi-head self-attention, and a multi-layer perceptron, enabling the model to capture contextual information across convolutional features of different scales. A positive-negative temporal attention module, PNTA (pos-neg temporal attention), mines temporal attention features through a PNTA loss, thereby reducing self-attention head redundancy and extracting attention features of different clips. SALDL generates fine-grained action quality labels through label enhancement and label distribution learning. Result: Extensive comparison and ablation experiments with the SALDL model on datasets including MTL-AQA (multitask learning-action quality assessment) and JIGSAWS (JHU-ISI gesture and skill assessment working set) yield Spearman rank correlation coefficients of 0.9416 and 0.8183, respectively. Conclusion: The SALDL model resolves the multi-scale spatio-temporal feature problem by fully exploiting spatio-temporal features at different scales, and introduces prior knowledge consistent with the label distribution for label enhancement, addressing the inherent ambiguity of labels as well as the redundancy of attention heads.
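One building block mentioned in the abstract, label enhancement via label distribution learning, amounts to spreading a scalar quality score into a distribution over score bins. A common Gaussian-prior sketch of this step (the bin count and σ below are illustrative, not the paper's settings):

```python
import numpy as np

def label_distribution(score, n_bins=101, sigma=2.0):
    """Label-enhancement sketch: turn a single quality score into a discrete
    Gaussian distribution over score bins (the 'fine-grained label')."""
    bins = np.arange(n_bins)
    d = np.exp(-(bins - score) ** 2 / (2 * sigma ** 2))
    return d / d.sum()

dist = label_distribution(76.5)
pred = (np.arange(101) * dist).sum()   # expected score under the distribution
print(round(pred, 1))                  # → 76.5
```

The model is then trained to match the whole distribution rather than the single score, so nearby scores are treated as partially correct, which is how the label-ambiguity problem is softened.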

10.
In this paper we propose a SoPC-based multiprocessor embedded system for controlling environmental parameters in an Intelligent Inhabited Environment. The intelligent features are achieved by means of a neuro-fuzzy system which has the ability to learn from samples, reason, and adapt itself to changes in the environment or in user preferences. In particular, a modified version of the well-known ANFIS (Adaptive Neuro-Fuzzy Inference System) scheme is used, which allows the development of very efficient implementations. The architecture proposed here is based on two soft-core microprocessors: one microprocessor is dedicated to the learning and adaptation procedures, whereas the other is dedicated to the online response. This second microprocessor is endowed with four efficient ad hoc hardware modules intended to accelerate the neuro-fuzzy algorithms. The implementation has been carried out on a Xilinx Virtex-5 FPGA, and the results show that a very high-performance system is achieved.

11.
Recently, Learning Classifier Systems (LCS), and particularly XCS, have arisen as promising methods for classification tasks and data mining. This paper investigates two models of accuracy-based learning classifier systems on different types of classification problems. Departing from XCS, we analyze the evolution of a complete action map as a knowledge representation. We propose an alternative, UCS, which evolves a best action map more efficiently. We also investigate how the fitness pressure guides the search towards accurate classifiers. While XCS bases fitness on a reinforcement learning scheme, UCS defines fitness from a supervised learning scheme. We find significant differences in how the fitness pressure leads towards accuracy, and suggest the use of a supervised approach, especially for multi-class problems and problems with unbalanced classes. We also investigate the complexity factors which arise in each type of accuracy-based LCS, and provide a model of the learning complexity of LCS based on the representative examples given to the system. The results and observations are also extended to a set of real-world classification problems, where accuracy-based LCS are shown to perform competitively with respect to other learning algorithms. The work presents an extended analysis of accuracy-based LCS, gives insight into LCS dynamics, and suggests open issues for further improvement of LCS on classification tasks.
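The contrast between the two fitness notions can be made concrete. The expressions below follow the commonly published XCS accuracy function (derived from reward-prediction error) and UCS's powered supervised accuracy; the constants are typical defaults from the LCS literature, not necessarily the ones used in this paper:

```python
def xcs_accuracy(pred_error, eps0=0.01, alpha=0.1, nu=5):
    """XCS-style accuracy: a classifier is fully accurate while its
    reward-prediction error is below eps0, then accuracy decays as a power law."""
    return 1.0 if pred_error < eps0 else alpha * (pred_error / eps0) ** -nu

def ucs_accuracy(n_correct, n_matched, nu=5):
    """UCS-style accuracy: supervised accuracy (correct / matched), raised to
    a power to sharpen the pressure toward accurate classifiers."""
    return (n_correct / n_matched) ** nu

print(round(xcs_accuracy(0.02), 4), round(ucs_accuracy(18, 20), 4))  # → 0.0031 0.5905
```

The difference visible even in this toy form is the one the abstract studies: XCS must first learn reward predictions and only indirectly rewards accuracy, while UCS scores classification performance directly, which is why the supervised route is suggested for multi-class and unbalanced problems.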

12.
This work is focused on an architecture for systems which act inside an unpredictable world (embedded systems). Several systems dealing with this issue have been proposed so far. We classify them by means of their architectures and algorithms, obtaining, for example, classical, deferred, and reactive planning. From the systems developed up to now, we can point out some of the features that embedded systems must have. Each system must have a flexible architecture, so it can deal with different problems. Each system must allow different basic activities, i.e., actuator and sensor control, plan formation and execution, and so on. Each system must have a flexible failure-handling mechanism, since no action is guaranteed to succeed. In this paper, we propose a system called MRG which addresses these features. Its architecture has several modules which can be combined in different ways depending on the problem; a module performs a basic activity. The system is able to detect and react to failures. The architecture allows MRG parallel activation of modules and quick reaction to external events. Control of the architecture is achieved by means of a planning language which has a small set of powerful control structures. MRG has been tested in a complex, large-scale application.

13.
Xia Ding, Wang Yali, Qiao Yu. 《集成技术》 (Journal of Integration Technology), 2021, 10(5): 23-33
Existing human action recognition algorithms mainly rely on coarse-grained video features, which are insufficient to describe the compositional structure of human actions and therefore limit a deep model's ability to distinguish easily confused actions. This study proposes a human-part-based video action recognition method that learns action representations of fine-grained body parts and builds video representations of human actions bottom-up. The method mainly comprises: (1) a part feature enhancement module, which enhances image-based human part features; (2) a part feature fusion module, which fuses the features of the body parts to form a human feature; and (3) a human feature enhancement module, which enhances the human features of all persons in a video frame. Experiments on the international standard benchmarks UCF101 and HMDB51 show that the human-part-based method is well complementary to existing methods and effectively improves human action recognition accuracy.

14.
This paper proposes a reinforcement fuzzy adaptive learning control network (RFALCON), constructed by integrating two fuzzy adaptive learning control networks (FALCONs), each of which has a feedforward multilayer network and is developed for the realization of a fuzzy controller. One FALCON performs as a critic network (fuzzy predictor), the other as an action network (fuzzy controller). Using temporal difference prediction, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network performs a stochastic exploratory algorithm to adapt itself according to the internal reinforcement signal. An ART-based reinforcement structure/parameter-learning algorithm is developed for constructing the RFALCON dynamically; during the learning process, structure and parameter learning are performed simultaneously. RFALCON can construct a fuzzy control system through a reward/penalty signal. It has two important features: it reduces the combinatorial demands of system adaptive linearization, and it is highly autonomous.

15.
Taking a three-party application scenario in mobile education as an example, this paper designs and implements a prototype mobile learning system for top-quality and open courses, integrating online teaching resources. Considering the characteristics of current mainstream mobile smart terminals, and targeting the waste of computing and communication resources caused by accidental touches on touchscreen devices, a practical method is given for reducing mis-operations in an application scenario with three participating parties. The correctness of the system is verified by means of a deterministic finite automaton. Tests show that the method can effectively reduce the load on the proxy service module and the resource service module in the application scenario and improve the user experience.
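The correctness check in this abstract uses a deterministic finite automaton. A toy DFA capturing the mis-touch-filtering idea — forward a request only after a press-then-confirm sequence, resetting on any stray event — might look like this (the states, alphabet, and transition table are invented for illustration, not taken from the paper):

```python
def accepts(events):
    """DFA sketch: the request is forwarded only if the touch sequence is
    'press' followed by 'confirm'; any undefined transition resets to 'idle'."""
    delta = {
        ("idle", "press"): "pressed",
        ("pressed", "confirm"): "sent",
    }
    state = "idle"
    for e in events:
        state = delta.get((state, e), "idle")   # stray event -> reset
    return state == "sent"

print(accepts(["press", "confirm"]), accepts(["press", "press", "confirm"]))  # → True False
```

Because the transition function is total and deterministic, properties like "no request is sent without a confirm" can be checked by enumerating states, which is the style of verification the abstract describes.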

16.
A modular robot can be built with a shape and function that match the working environment. We developed a four-arm modular robot system which can be configured in a planar structure. A learning mechanism is incorporated in each module constituting the robot. We aim to control the overall shape of the robot through an accumulation of the autonomous actions resulting from the individual learning functions. Considering that the overall shape of a modular robot depends on the learning conditions in each module, this control method can be treated as a distributed-control learning method. The learning target is cooperative motion between adjacent modules. The learning process proceeds by trial and error, based on Q-learning. We confirmed the effectiveness of the proposed technique by computer simulation.
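The per-module trial-and-error learning here is Q-learning. A minimal tabular sketch on an invented one-module toy task (the states, actions, and reward below are stand-ins for the paper's cooperative-motion setting, not its actual formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: 4 joint states, 2 actions (bend down / bend up); reward is a
# stand-in for "good cooperative motion", granted when the top state is reached.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):                       # episodes of trial and error
    s = int(rng.integers(n_states))
    for _ in range(10):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r = step(s, a)
        # Standard Q-learning update.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print([int(Q[s].argmax()) for s in range(n_states)])   # greedy action per state
```

Each module running such a learner independently, with rewards defined over its neighbors' states, is the dispersed-control picture the abstract sketches.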

17.
A reinforcement learning scheme for congestion control in a high-speed network is presented. Traditional methods for congestion control monitor the queue length, on which the source rate depends; however, in these methods it is difficult to couple the determination of the congestion threshold with that of the sending rate. We propose a simple and robust reinforcement learning congestion controller (RLCC) to solve this problem. The scheme consists of two subsystems: an expectation-return predictor, which is a long-term policy evaluator, and a short-term rate selector, which is composed of an action-value evaluator and a stochastic action selector. RLCC receives reinforcement signals generated by an immediate-reward evaluator and takes the best action to control the source flow with respect to high throughput and low cell-loss rate. Through online learning, RLCC can adaptively take more and more correct actions under time-varying environments. Simulation results show that, in comparison with the popular best-effort scheme, the proposed approach can increase system utilization and decrease packet losses simultaneously.
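The controller internals are not specified in the abstract; the action-value-evaluator/stochastic-selector pairing can nonetheless be illustrated with a toy bandit-style sketch, where a softmax over action values picks a source rate and a made-up immediate reward (throughput minus a cell-loss penalty above link capacity) reinforces the choice:

```python
import numpy as np

rng = np.random.default_rng(0)

rates = np.array([0.2, 0.5, 0.8, 1.1])   # candidate source rates; capacity = 1.0
q = np.zeros(len(rates))                  # action values

def immediate_reward(rate):
    """Toy reward: throughput is good, exceeding capacity is heavily penalized."""
    return rate if rate <= 1.0 else rate - 5.0 * (rate - 1.0)

for _ in range(3000):
    p = np.exp(q / 0.5)                   # stochastic action selector (softmax)
    p /= p.sum()
    a = rng.choice(len(rates), p=p)
    q[a] += 0.1 * (immediate_reward(rates[a]) - q[a])   # action-value update

print(rates[int(q.argmax())])   # → 0.8
```

The stochastic selector keeps exploring lower-value rates occasionally, which is what lets the controller track a time-varying environment instead of locking onto a stale rate.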

18.
Action model learning has attracted considerable research interest in recent years. However, although planning under uncertainty has been studied for over a decade, research on action model learning still focuses on classical, deterministic action models. This paper proposes an algorithm for learning nondeterministic action models in partially observable environments; it assumes no prior knowledge of the transition system, and its only input is action-observation sequences. Such settings are common in the real world. The work focuses on the class of problems in which actions are composed of simple logical structures and observations appear with a certain frequency. The learning process has three steps: first, compute the probability that each proposition holds in a state; then, extract the propositions into effect schemas and extract the preconditions; finally, cluster the effect schemas to remove redundancy. Experimental results on benchmark domains show that action model learning techniques can be extended to nondeterministic, partially observable environments.
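The first of the three learning steps — estimating, from action-observation sequences, the probability that a proposition holds after an action — can be sketched on toy traces (the actions, propositions, and traces below are invented for illustration):

```python
from collections import defaultdict

def effect_probabilities(traces):
    """For each (action, proposition) pair, estimate how often the proposition
    is observed to hold immediately after the action."""
    count, total = defaultdict(float), defaultdict(int)
    for trace in traces:
        for action, observed in trace:      # observed: set of propositions seen to hold
            total[action] += 1
            for p in observed:
                count[(action, p)] += 1
    return {k: count[k] / total[k[0]] for k in count}

traces = [
    [("pick", {"holding"}), ("move", {"holding", "at_b"})],
    [("pick", set()), ("move", {"at_b"})],   # noisy trace: pick failed once
]
probs = effect_probabilities(traces)
print(probs[("pick", "holding")], probs[("move", "at_b")])   # → 0.5 1.0
```

The subsequent steps of the abstract would then group these probabilistic effects into schemas and cluster away redundant ones; only this counting step is shown here.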

19.
Controlling chaos by GA-based reinforcement learning neural network
This paper proposes a TD (temporal difference) and GA (genetic algorithm) based reinforcement (TDGAR) neural learning scheme for controlling chaotic dynamical systems based on the technique of small perturbations. The TDGAR learning scheme is a new hybrid GA which integrates the TD prediction method and the GA to fulfil the reinforcement learning task. Structurally, the TDGAR learning system is composed of two integrated feedforward networks: one neural network acts as a critic network that helps the learning of the other network, the action network, which determines the outputs (actions) of the TDGAR learning system. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. This can usually accelerate GA learning, since in reinforcement learning problems an external reinforcement signal may become available only long after a sequence of actions has occurred. By defining a simple external reinforcement signal, the TDGAR learning system can learn to produce a series of small perturbations that convert the chaotic oscillations of a chaotic system into desired regular ones with periodic behavior. The proposed method is an adaptive search for the optimum control technique. Computer simulations on controlling two chaotic systems, the Henon map and the logistic map, have been conducted to illustrate the performance of the proposed method.
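Independently of the TDGAR network, the underlying small-perturbation idea can be shown directly on the logistic map: near the unstable fixed point, nudge the parameter r slightly so that the next iterate lands on the point. This is a hand-derived control law, not the learned controller of the paper; the engagement window and perturbation clip are illustrative:

```python
r = 3.9                       # chaotic regime of the logistic map x -> r*x*(1-x)
xstar = 1 - 1 / r             # unstable fixed point to be stabilized

def run(x, steps=300, control=False):
    out = []
    for _ in range(steps):
        dr = 0.0
        if control and abs(x - xstar) < 0.01:   # act only when the orbit is close
            dr = xstar / (x * (1 - x)) - r      # perturbation mapping x to xstar
            dr = max(-0.1, min(0.1, dr))        # keep the perturbation small
        x = (r + dr) * x * (1 - x)
        out.append(x)
    return out

free = run(0.74)                  # uncontrolled: stays chaotic
held = run(0.74, control=True)    # controlled: pinned to the fixed point
print(max(abs(v - xstar) for v in free) > 0.1, abs(held[-1] - xstar) < 1e-9)  # → True True
```

With the same initial condition, the free orbit wanders over the attractor while the tiny parameter nudges (|dr| ≤ 0.1 against r = 3.9) hold the controlled orbit on the otherwise unstable periodic point, which is the effect the abstract's learned controller achieves.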

20.
In this paper we propose a unified approach for integrating implicit and explicit knowledge in neurosymbolic systems as a combination of neural and neuro-fuzzy modules. In the developed hybrid system, the training data set is used to build the neuro-fuzzy modules and represents implicit domain knowledge. Explicit domain knowledge, on the other hand, is represented by fuzzy rules, which are directly mapped into equivalent neural structures. The aim of this approach is to improve the abilities of modular neural structures that are based on incomplete learning data sets, since the knowledge acquired from human experts is taken into account when adapting the general neural architecture. Three methods for combining the explicit and implicit knowledge modules are proposed, and the techniques used to extract fuzzy rules from the neural implicit-knowledge modules are described. These techniques improve the structure and behavior of the entire system. The proposed methodology has been applied in the field of air quality prediction with very encouraging results, and the experiments show that the method is worth further investigation.
