Similar literature
 20 similar documents found.
1.
Policy iteration is a reinforcement learning method that alternates between evaluating and improving the control policy. Policy evaluation with the least-squares method extracts more useful information from empirical data and therefore makes better use of that data. However, most existing online least-squares policy iteration methods use each sample only once, which results in a low utilization rate. To improve utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. ERLSPI combines online least-squares policy iteration with experience replay: it stores the samples generated online and reuses them with the least-squares method to update the control policy. We apply ERLSPI to the inverted pendulum system, a typical benchmark. The experimental results show that the method effectively exploits previous experience and knowledge, improves the utilization efficiency of the empirical data, and accelerates convergence.

2.
Learning a compact predictive model in an online setting has recently gained a great deal of attention. The combination of online learning with sparsity-inducing regularization enables faster learning with a smaller memory footprint than previous learning frameworks. Many optimization methods and learning algorithms have been developed on the basis of online learning with L1-regularization. L1-regularization tends to truncate some types of parameters, such as those that rarely occur or have a small range of values, unless they are emphasized in advance. However, the inclusion of a pre-processing step would make it very difficult to preserve the advantages of online learning. We propose a new regularization framework for sparse online learning. We focus on regularization terms, and we enhance the state-of-the-art regularization approach by integrating information on all previous subgradients of the loss function into a regularization term. The resulting algorithms enable online learning to adjust the intensity of each feature's truncation without pre-processing and eventually eliminate the bias of L1-regularization. We show the theoretical properties of our framework, including its computational complexity and an upper bound on regret. Experiments demonstrate that our algorithms outperform previous methods in many classification tasks.
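The truncation behaviour mentioned in this abstract can be illustrated with a plain L1-regularized online update. The sketch below is only the standard gradient step followed by soft-thresholding, not the framework the abstract proposes; the function names, the dictionary representation of the weights, and the parameter values are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_online_step(weights, gradient, learning_rate, l1_strength):
    """One online update: a gradient step followed by L1 truncation.

    weights and gradient are sparse dicts mapping feature name -> value
    (a hypothetical representation chosen for this sketch).
    """
    updated = {}
    for feature in set(weights) | set(gradient):
        stepped = weights.get(feature, 0.0) - learning_rate * gradient.get(feature, 0.0)
        shrunk = soft_threshold(stepped, learning_rate * l1_strength)
        if shrunk != 0.0:
            # Only non-zero weights are stored, which keeps the model sparse.
            updated[feature] = shrunk
    return updated


# A frequently updated feature survives; a small, rarely updated one is truncated.
new_weights = l1_online_step(
    weights={"common": 1.0, "rare": 0.05},
    gradient={"common": -0.5},
    learning_rate=0.1,
    l1_strength=1.0,
)
```

In this example the small weight of the rarely occurring feature falls inside the threshold and is removed, which is the bias toward truncating such parameters that the abstract describes.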

3.
Personalized web-based learning has become an important form of learning in the 21st century. To recommend appropriate online materials to a particular learner, several characteristics of the learner, such as learning style, learning modality, cognitive style, and competency, need to be considered. An earlier study showed that a fuzzy knowledge extraction model can be established to extract personalized recommendation knowledge by discovering effective learning paths from past learning experiences through an ant colony optimization model. Although those results revealed the theoretical potential of the method for discovering effective learning paths, critical limitations arose in real-world applications, such as the need for a large number of learners and many training cycles to discover good learning paths. These practical issues motivate this research. This paper aims to resolve them by devising more efficient algorithms that run on essentially the same ant colony model yet require only a reasonable number of learners and training cycles to find satisfactory results. The key approaches are a revised global update policy, an adaptive search policy, and a segmented-goal training strategy. Simulation results show that adding these ingredients to the original knowledge extraction algorithm yields more efficient algorithms that can be applied in practical situations.

4.
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the "curse of dimensionality" of the Q-table in reinforcement learning. When a robot walks in an unknown environment, it collects experience data that is used to train the neural network; this process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, compared with existing methods, our method converges to an optimal action strategy in less time and explores a path in an unknown environment with fewer steps and a larger average reward.
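The experience replay process described in this abstract is commonly implemented as a fixed-size buffer of transitions that is sampled uniformly at training time. The following is a minimal sketch under that assumption; the class name, the transition layout `(state, action, reward, next_state, done)`, and the capacity are illustrative and are not taken from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Never request more transitions than are currently stored.
        batch_size = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


# Usage: collect experience while acting, then draw mini-batches for training.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```

Sampling from stored transitions instead of training only on the most recent one lets each experience be reused several times and reduces the correlation between consecutive training samples.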

5.
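Objective: Object tracking is now commonly addressed through online learning and detection. During online learning, training the classifier to a high recognition accuracy takes considerable time. To address this, we propose cascading weak classifiers with the AdaBoost algorithm and performing detection only after training on a certain number of frames, as a trade-off between real-time performance and accuracy. Methods: First, Haar features are simplified for the tracking problem to reduce the cost of feature computation. Because the classic AdaBoost algorithm may not suit the imbalance between positive and negative samples that arises during tracking, we introduce a new adjustment factor into the sample weight update formula and combine it with cost-sensitive learning to improve the object recognition rate. Finally, we present an object tracking algorithm that uses the simplified Haar features as descriptors and the improved cost-sensitive AdaBoost as the classifier. Results: In tracking experiments on 20 video sequences, the average representative accuracy of the proposed algorithm is about 26% higher than that of the compressive tracking algorithm and about 11% higher than that of the original cost-sensitive algorithm; its average video processing frame rate is about 38% higher than that of the compressive tracking algorithm. Conclusion: The proposed cost-sensitive AdaBoost algorithm recognizes and tracks objects with high accuracy and fast processing speed, and it shows a degree of robustness to interference. It tracks non-rigid objects such as people particularly well.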

6.
In recent years, the use of multi-view data has attracted much attention, resulting in many multi-view batch learning algorithms. However, these algorithms prove expensive in terms of training time and memory when used on incremental data. In this paper, we propose Multi-view Incremental Discriminant Analysis (MvIDA), which updates the trained model to incorporate new data samples. MvIDA requires only the old model and the newly added data to update the model. Depending on the nature of the increments, MvIDA is presented in two variants, sequential MvIDA and chunk MvIDA. We compare the proposed method against batch Multi-view Discriminant Analysis (MvDA) in terms of discriminability, order independence, the effect of the number of views, training time, and memory requirements. We also compare our method with single-view Incremental Linear Discriminant Analysis (ILDA) in terms of accuracy and training time. The experiments are conducted on four datasets with a wide range of dimensions per view. The results show that, through order independence and faster construction of the optimal discriminant subspace, MvIDA addresses the issues faced by batch multi-view algorithms in the incremental setting.

7.
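To reduce the training time of the deep Q-network (DQN) algorithm, a DQN method that combines a prioritized experience replay mechanism with a dueling network architecture is applied to two classic control problems on the OpenAI Gym platform, cart pole and mountain car. The experience replay uses a rank-based mechanism, and the dueling architecture uses deep neural networks. Simulation results show that, compared with the conventional DQN algorithm, the DQN method based on the dueling network architecture, and the DQN method based on prioritized experience replay, the proposed method learns better and requires the least training time. The influence of the algorithm parameters on learning performance is also analyzed in detail, providing a valuable reference for practical use.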
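The rank-based mechanism mentioned in this abstract is usually defined by sorting transitions by the magnitude of their TD error and sampling transition i with probability proportional to (1 / rank(i))^alpha. The sketch below follows that common definition; the function name and the default value of `alpha` are assumptions and are not taken from the paper.

```python
def rank_based_probabilities(td_errors, alpha=0.7):
    """Return a sampling probability for each transition from its TD-error rank.

    The transition with the largest |TD error| gets rank 1. alpha controls how
    strongly sampling favours high-ranked transitions (alpha = 0 is uniform).
    """
    order = sorted(range(len(td_errors)), key=lambda i: abs(td_errors[i]), reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, index in enumerate(order, start=1):
        priorities[index] = (1.0 / rank) ** alpha
    total = sum(priorities)
    return [priority / total for priority in priorities]


# With alpha = 1 the priorities are 1, 1/2, 1/3 for ranks 1, 2, 3.
probabilities = rank_based_probabilities([0.1, -2.0, 0.5], alpha=1.0)
```

Because the probabilities depend only on the ordering of the TD errors and not on their magnitudes, a single outlier cannot dominate sampling.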

8.
歹杰  李青山  褚华  周洋涛  杨文勇  卫彪彪. Journal of Software (《软件学报》), 2022, 33(10): 3656-3672
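In recent years, with the rapid development of Internet technology, online education platforms represented by massive open online courses (MOOCs) have become widespread. To support personalized, intelligent education that teaches students according to their aptitude, artificial intelligence techniques such as recommendation algorithms have attracted broad attention from academia and industry. Although recommendation algorithms have been applied successfully in areas such as e-commerce, integrating them with online education still faces serious challenges: existing algorithms do not sufficiently mine implicit interaction data, the guiding role of knowledge behind recommendations is not evident, and practice-oriented recommender system software is lacking. To address this, an intelligent course recommendation system for industrial scenarios is designed: (1) a recommendation engine based on graph convolutional neural networks is proposed, which models the implicit "user-course" interaction data as a heterogeneous graph; (2) course knowledge information is integrated into the "user-course" heterogeneous graph to mine the "user-course-knowledge" associations in depth; (3) an efficient online recommendation system is designed, implementing a multi-stage pipeline prototype of "preprocessing, recall, offline ranking, online recommendation, result fusion", which not only responds quickly to course recommendation requests but also effectively alleviates the cold-start problem, the biggest obstacle to deploying recommendation algorithms. Finally, comparative experiments on a dataset from a real course learning platform show that the offline recommendation engine outperforms other mainstream recommendation algorithms, and an analysis of two typical use cases verifies the usability of the online recommendation system under industrial requirements.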

9.
This paper presents a method for combining domain knowledge and machine learning (CDKML) for classifier generation and online adaptation. The method exploits the advantages of domain knowledge and machine learning as complementary information sources. Whereas machine learning may discover patterns in a domain of interest that are too subtle for humans to detect, domain knowledge may contain information about the domain that is not present in the available dataset. CDKML has three steps. First, prior domain knowledge is enriched with relevant patterns obtained by machine learning to create an initial classifier. Second, genetic algorithms refine the classifier. Third, the classifier is adapted online on the basis of user feedback using a Markov decision process. CDKML was applied to fall detection. Tests showed that the classifiers developed by CDKML perform better than machine-learning classifiers generated from a training dataset that does not adequately represent all real-life cases of the learned concept. The accuracy of the initial classifier was 10 percentage points higher than that of the best machine-learning classifier, and the refinement added 3 percentage points. The online adaptation improved the accuracy of the refined classifier by an additional 15 percentage points.

10.
This paper presents a novel online learning method for automatically detecting anatomic structures in medical images. Conventional off-line learning methods require collecting a complete set of representative samples before training a detector. Once the detector is trained, its performance is fixed. To improve the performance, the detector must be completely retrained, which requires maintaining historical training samples. Our proposed online approach eliminates the need to store historical training samples and can continually improve performance with new samples. We evaluate our approach on three distinct thoracic structures, demonstrating that it yields performance competitive with the off-line approach. Furthermore, we compare the properties of our method with the state-of-the-art online learning method suggested by Grabner and Bischof (IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2006, vol. 1, pp. 260–267, 2006). The comparison indicates that our method runs faster, is more stable, handles "catastrophic forgetting" better, and still achieves a satisfactory level of adaptability. The enhanced performance is attributed to our novel online learning structure coupled with more accurate weak learners based on histograms.

11.
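Objective: Existing image recognition methods perform well when the training and test data are drawn from the same distribution, but they do not carry over to real-world scenarios, where recognition accuracy drops. Domain adaptation is an effective way to address this problem; it aims to handle data from two domains that are related but differently distributed. Methods: Based on an analysis of the data distribution, we propose a joint balanced adaptation method based on attention transfer, which transfers image features extracted from labeled source-domain data to the unlabeled target domain. First, an attention transfer mechanism transfers the spatial category information of the labeled source-domain data to the unlabeled target domain. By defining the attention of a convolutional neural network, attention information is used to improve image recognition accuracy. Second, a prior distribution over the network parameters is introduced based on the target dataset, and the network is given the ability to automatically adjust the feature alignment of each domain alignment layer. Finally, the cross-domain bias is used to describe the input distribution of the domain-specific feature alignment layers, quantifying the degree of domain adaptation learned at each layer. Results: The method achieves an average recognition accuracy of 77.6% on the Office-31 dataset and 90.7% on the Office-Caltech dataset, far ahead of traditional hand-crafted feature methods and comparable to the current best methods. Conclusion: The attention-transfer-based joint balanced domain adaptation method not only achieves high recognition accuracy but also automatically learns the degree of feature alignment between domains; the results also confirm that cross-domain feature transfer can improve network optimization.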

12.
Entity resolution (ER) is the problem of identifying and grouping different manifestations of the same real-world object. Algorithmic approaches have been developed, most of which perform best under supervised learning. However, the prohibitive cost of labeling training data is still a major obstacle to detecting duplicate query records from online sources. Furthermore, the unique combinations of noisy data with missing elements make ER tasks more challenging. To address this, transfer learning has been adopted to adaptively share learned common structures of similarity scoring problems between multiple sources. Although such techniques reduce the labeling cost so that it is linear in the number of sources, their random sampling strategy does not handle the common sample imbalance problem well. In this paper, we present a novel multi-source active transfer learning framework that jointly selects fewer data instances from all sources to train classifiers with constant precision/recall. The intuition behind our approach is to actively label the most informative samples while adaptively transferring collective knowledge between sources. In this way, the learned classifiers can be both label-economical and flexible, even for imbalanced or quality-diverse sources. We compare our method with state-of-the-art approaches on real-world datasets. Our experimental results demonstrate that our active transfer learning algorithm achieves strong performance with far fewer labeled samples for record matching across numerous and varied sources.

13.
Boosting algorithms, such as AdaBoost, have been widely used in a variety of applications in multimedia and computer vision. Relevance feedback-based image retrieval has been formulated as a classification problem with a small number of training samples. Several machine learning techniques have been applied to this problem recently. In this paper, we propose a novel paired feature AdaBoost learning system for relevance feedback-based image retrieval. To facilitate density estimation in our feature learning method, we propose an ID3-like balanced tree quantization method that preserves the most discriminative information. By using paired feature combination, we map all training samples obtained in the relevance feedback process onto paired feature spaces and employ the AdaBoost algorithm to select a few feature pairs with the best discrimination capabilities in the corresponding paired feature spaces. In the AdaBoost algorithm, we replace the traditional binary weak classifiers with Bayesian classifiers to enhance their classification power, thus producing a stronger classifier. Experimental results on content-based image retrieval (CBIR) show that the proposed system outperforms some previous methods.

14.
Computing Optimal Attribute Weight Settings for Nearest Neighbor Algorithms   Cited by: 2 (self-citations: 0; citations by others: 2)
Nearest neighbor (NN) learning algorithms, examples of the lazy learning paradigm, rely on a distance function to measure the similarity of testing examples to the stored training examples. Since certain attributes are more discriminative, while others can be less relevant or totally irrelevant, attributes should be weighted differently in the distance function. Most previous studies on weight setting for NN learning algorithms are empirical. In this paper we describe our attempt to determine theoretically optimal weights that minimize the predictive error of NN algorithms. Assuming a uniform distribution of examples in a 2-d continuous space, we first derive the average predictive error introduced by a linear classification boundary, and then determine the optimal weight setting for any polygonal classification region. Our theoretical results on optimal attribute weights can serve as a baseline or lower bound for comparing other empirical weight setting methods.
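The effect of attribute weights on a nearest neighbor prediction can be shown with a small sketch. It assumes a weighted Euclidean distance and hand-picked weights; the function names and the data are hypothetical, and the weights here are not the theoretically optimal ones derived in the paper.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; a weight of 0 ignores that attribute."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_neighbor_label(query, examples, weights):
    """Return the label of the stored example closest to query.

    examples is a list of (point, label) pairs.
    """
    _, label = min(examples, key=lambda example: weighted_distance(query, example[0], weights))
    return label


examples = [((0.0, 0.1), "A"), ((1.0, 3.0), "B")]
query = (0.9, 0.0)

# Equal weights: the second attribute dominates the distance, so "A" is nearest.
equal_weights_label = nearest_neighbor_label(query, examples, weights=(1.0, 1.0))
# Treating the second attribute as irrelevant flips the prediction to "B".
first_only_label = nearest_neighbor_label(query, examples, weights=(1.0, 0.0))
```

The same query receives different labels under the two weightings, which is why the choice of attribute weights directly affects predictive error.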

15.
It is challenging to model the performance of distributed graph computation. Explicit formulation cannot easily capture the diverse factors and complex interactions in the system. Statistical learning methods require a large number of training samples to generate an accurate prediction model. However, it is time-consuming to run the graph computation tests needed to obtain the training samples. In this paper, we propose TransGPerf, a transfer-learning-based solution that exploits prior knowledge from a source scenario and uses a manageable amount of training data to model the performance of a target graph computation scenario. Experimental results show that our proposed method generates accurate models for a wide range of graph computation tasks on PowerGraph and GraphX. It outperforms transfer learning methods proposed for other applications in the literature.

16.
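In recent years, deep reinforcement learning has been used as a model-free resource allocation method to address co-channel interference in wireless networks. However, networks based on the conventional experience replay strategy struggle to learn from valuable experience, which slows convergence; and manually specified exploration step sizes ignore how well the algorithm is learning in each training period, so exploration of the environment is blind and the gain in system spectral efficiency is limited. To address this, a distributed reinforcement learning power control method for frequency-division multiple access systems is proposed. It adopts a prioritized experience replay strategy that encourages agents to learn from more important data in the environment, accelerating learning. It also includes an exploration strategy, suited to distributed reinforcement learning, that adjusts the step size dynamically so that each agent explores its local environment according to its own learning progress, reducing the blindness caused by manually set step sizes. Experimental results show that, compared with existing algorithms, the proposed method converges faster, better suppresses co-channel interference in mobile scenarios, and performs better in large networks.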

17.
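Objective: Metric learning is a simple and effective approach to few-shot learning; learning an embedding space that is rich, discriminative, and generalizes well is the key to good classification with metric learning methods. Starting from the features of the samples themselves and the distribution of those features in the embedding space, and combining global and local data augmentation, this paper presents a meta-cosine loss for few-shot image classification (AMCL-FSIC). Methods: First, from the perspective of the data's own features, global and local data augmentation are combined so that local information provides more discriminative and transferable cues and the trained model attends more to the image foreground. An attention mechanism also combines global and local features to obtain richer, more discriminative features. Second, from the perspective of the distribution of sample features in the embedding space, a meta-cosine loss (MCL) function is proposed to optimize the few-shot image classification model. The difference between the similarities of a sample to the class prototypes is used to adjust the prototypes of different classes and enlarge the inter-class distance, so that class separation is clearer when the model is tested on new tasks and generalization improves. Results: Comparative experiments were run on five classic few-shot datasets. On FC100 (Few-shot Cifar100) and CUB (Caltech-UCSD Birds-200-2011), the proposed method achieves the best classification results to date; on MiniImageNet, TieredImageNet, and Cifar100, its results are comparable to those of the compared models. Comparative experiments on MiniImageNet, CUB, and Cifar100 were also run to verify the effectiveness of MCL, and the results show that MCL improves the classification performance of the cosine classifier. Conclusion: The proposed method fully extracts image features in few-shot image classification tasks and effectively improves the accuracy of metric learning for few-shot image classification.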

18.
Multi-task learning (MTL) aims to improve model performance by transferring and exploiting common knowledge among tasks. Existing MTL work mainly focuses on the scenario in which the label sets of the multiple tasks (MTs) are the same, so that they can be used for learning across tasks. However, the real world contains more general scenarios in which each task has only a small number of training samples and the label sets overlap only partially or do not overlap at all. Learning such MTs is more challenging because less correlation information is available among the tasks. For this setting, we propose a framework that learns these tasks by jointly leveraging both the abundant information from a learnt auxiliary big task, which has enough classes to cover those of all the tasks, and the information shared among the partially overlapping tasks. Our implementation uses the neural network architecture of the learnt auxiliary task to learn the individual tasks. The key idea is to use the available label information to adaptively prune the hidden-layer neurons of the auxiliary network and so construct a corresponding network for each task, together with joint learning across the individual tasks. Extensive experimental results demonstrate that our proposed method is highly competitive with state-of-the-art methods.

19.
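In a partially observable Markov decision process (POMDP) with an unknown model, the agent cannot directly observe the true state of the environment, and the uncertainty of perception makes learning an optimal policy challenging. To address this, a deep double Q-network reinforcement learning algorithm that incorporates contrastive predictive coding representations is proposed; it explicitly models the belief state to obtain a compact, efficient encoding of history for policy optimization. To improve data efficiency, the concept of a belief replay buffer is introduced, which stores belief transition pairs directly instead of observation and action sequences to reduce memory usage. In addition, a staged training strategy decouples representation learning from policy learning to improve training stability. POMDP navigation tasks were designed in the Gym-MiniGrid environment, and the experimental results show that the proposed algorithm captures state-related semantic information and thereby achieves stable, efficient policy learning under POMDPs.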

20.
赵英男  刘鹏  赵巍  唐降龙. Acta Automatica Sinica (《自动化学报》), 2019, 45(10): 1870-1882
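Deep Q-networks (DQN) are one way to implement deep Q-learning. Experience replay trains the deep Q-network with samples from an experience pool, but building the pool requires many interactions between the agent and the environment, which increases cost and risk. An effective way to reduce the number of interactions is to use samples efficiently. The cumulative return of the sequence to which a sample belongs affects the training of the deep Q-network: samples from sequences with large cumulative returns accelerate convergence and improve policy quality more than samples from sequences with small cumulative returns. This paper proposes a two-stage active sampling method for deep Q-learning. First, priorities are constructed from the distribution of sequence cumulative returns and used to sample sequences from the experience pool. Then, within the sampled sequences, priorities are constructed from the distribution of the samples' TD-errors (temporal-difference errors) and used to sample individual samples. The samples obtained from the two sampling stages are then used to train the deep Q-network. By selecting samples according to both sequence cumulative return and TD-error, the method aims to accelerate the convergence of the deep Q-network and improve policy quality. The method was validated on the Atari platform, and the experimental results show that training the deep Q-network with samples obtained through two-stage active sampling achieves good results.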
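The two sampling stages described in this abstract can be sketched as follows. The abstract does not give the exact priority formulas, so this sketch assumes priorities proportional to the shifted cumulative return (first stage) and to the absolute TD-error (second stage); the function name, the episode layout, and the `epsilon` offset are all hypothetical.

```python
import random


def two_stage_sample(episodes, rng, epsilon=1e-3):
    """Pick one transition: first a sequence by return, then a sample by |TD error|.

    episodes is a list of dicts with keys "return" (cumulative return of the
    sequence) and "transitions" (a list of (transition, td_error) pairs).
    epsilon keeps every priority positive so nothing is excluded outright.
    """
    # Stage 1: sequences with larger cumulative return get higher priority.
    returns = [episode["return"] for episode in episodes]
    lowest = min(returns)
    episode_weights = [value - lowest + epsilon for value in returns]
    episode = rng.choices(episodes, weights=episode_weights, k=1)[0]

    # Stage 2: within the chosen sequence, larger |TD error| gets higher priority.
    transition_weights = [abs(td_error) + epsilon for _, td_error in episode["transitions"]]
    transition, _ = rng.choices(episode["transitions"], weights=transition_weights, k=1)[0]
    return transition


episodes = [
    {"return": 0.0, "transitions": [("low_a", 1.0), ("low_b", 2.0)]},
    {"return": 10.0, "transitions": [("high_a", 0.0), ("high_b", 5.0)]},
]
sampled = two_stage_sample(episodes, random.Random(0))
```

Calling the function repeatedly yields a training batch that favours high-return sequences and, within them, transitions with large TD-errors.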
