首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The ability to predict a student’s performance could be useful in a great number of different ways associated with university-level distance learning. Students’ marks in a few written assignments can constitute the training set for a supervised machine learning algorithm. Along with the explosive increase of data and information, incremental learning ability has become more and more important for machine learning approaches. The online algorithms try to forget irrelevant information instead of synthesizing all available information (as opposed to classic batch learning algorithms). Nowadays, combining classifiers is proposed as a new direction for the improvement of the classification accuracy. However, most ensemble algorithms operate in batch mode. Therefore a better proposal is an online ensemble of classifiers that combines an incremental version of Naive Bayes, the 1-NN and the WINNOW algorithms using the voting methodology. Among other significant conclusions it was found that the proposed algorithm is the most appropriate to be used for the construction of a software support tool.  相似文献   

2.
Along with the increase of data and information, incremental learning ability turns out to be more and more important for machine learning approaches. The online algorithms try not to remember irrelevant information instead of synthesizing all available information (as opposed to classic batch learning algorithms). In this study, we attempted to increase the prediction accuracy of an incremental version of Naive Bayes model by integrating instance based learning. We performed a large-scale comparison of the proposed method with other state-of-the-art algorithms on several datasets and the proposed method produce better accuracy in most cases.  相似文献   

3.
将集成学习的思想引入到增量学习之中可以显著提升学习效果,近年关于集成式增量学习的研究大多采用加权投票的方式将多个同质分类器进行结合,并没有很好地解决增量学习中的稳定-可塑性难题。针对此提出了一种异构分类器集成增量学习算法。该算法在训练过程中,为使模型更具稳定性,用新数据训练多个基分类器加入到异构的集成模型之中,同时采用局部敏感哈希表保存数据梗概以备待测样本近邻的查找;为了适应不断变化的数据,还会用新获得的数据更新集成模型中基分类器的投票权重;对待测样本进行类别预测时,以局部敏感哈希表中与待测样本相似的数据作为桥梁,计算基分类器针对该待测样本的动态权重,结合多个基分类器的投票权重和动态权重判定待测样本所属类别。通过对比实验,证明了该增量算法有比较高的稳定性和泛化能力。  相似文献   

4.
Incremental learning has been used extensively for data stream classification. Most attention on the data stream classification paid on non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a new classification algorithm for the classification of batch data called harmony-based classifier and then give its incremental version for classification of data streams called incremental harmony-based classifier. Finally, we improve it to reduce its computational overhead in absence of drifts and increase its robustness in presence of noise. This improved version is called improved incremental harmony-based classifier. The proposed methods are evaluated on some real world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms some batch classifiers and also the proposed incremental methods can effectively address the issues usually encountered in the data stream environments. Improved incremental harmony-based classifier has significantly better speed and accuracy on capturing concept drifts than the non-incremental harmony based method and its accuracy is comparable to non-evolutionary algorithms. The experimental results also show the robustness of improved incremental harmony-based classifier.  相似文献   

5.
We present an evaluation of incremental learning algorithms for the estimation of hidden Markov model (HMM) parameters. The main goal is to investigate incremental learning algorithms that can provide as good performances as traditional batch learning techniques, but incorporating the advantages of incremental learning for designing complex pattern recognition systems. Experiments on handwritten characters have shown that a proposed variant of the ensemble training algorithm, employing ensembles of HMMs, can lead to very promising performances. Furthermore, the use of a validation dataset demonstrated that it is possible to reach better performances than the ones presented by batch learning.  相似文献   

6.
方丁  王刚 《计算机系统应用》2012,21(7):177-181,248
随着Web2.0的迅速发展,越来越多的用户乐于在互联网上分享自己的观点或体验。这类评论信息迅速膨胀,仅靠人工的方法难以应对网上海量信息的收集和处理,因此基于计算机的文本情感分类技术应运而生,并且研究的重点之一就是提高分类的精度。由于集成学习理论是提高分类精度的一种有效途径,并且已在许多领域显示出其优于单个分类器的良好性能,为此,提出基于集成学习理论的文本情感分类方法。实验结果显示三种常用的集成学习方法 Bagging、Boosting和Random Subspace对基础分类器的分类精度都有提高,并且在不同的基础分类器条件下,Random Subspace方法较Bagging和Boosting方法在统计意义上更优,以上结果进一步验证了集成学习理论在文本情感分类中应用的有效性。  相似文献   

7.
Mining data streams is the process of extracting information from non-stopping, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one pass and operate under strict limitations of memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in literature assume a 100 % labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass, incremental and work with limited memory to perform any-time classification on demand. Experimental comparison with state of the art concept drift handling systems demonstrate the GC3 frameworks ability to provide high classification performance, using fewer models in the ensemble and with only 4-6 % of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real world data stream classification applications.  相似文献   

8.
集成学习被广泛用于提高分类精度, 近年来的研究表明, 通过多模态扰乱策略来构建集成分类器可以进一步提高分类性能. 本文提出了一种基于近似约简与最优采样的集成剪枝算法(EPA_AO). 在EPA_AO中, 我们设计了一种多模态扰乱策略来构建不同的个体分类器. 该扰乱策略可以同时扰乱属性空间和训练集, 从而增加了个体分类器的多样性. 我们利用证据KNN (K-近邻)算法来训练个体分类器, 并在多个UCI数据集上比较了EPA_AO与现有同类型算法的性能. 实验结果表明, EPA_AO是一种有效的集成学习方法.  相似文献   

9.
在集成学习中使用平均法、投票法作为结合策略无法充分利用基分类器的有效信息,且根据波动性设置基分类器的权重不精确、不恰当。以上问题会降低集成学习的效果,为了进一步提高集成学习的性能,提出将证据推理(evidence reasoning, ER)规则作为结合策略,并使用多样性赋权法设置基分类器的权重。首先,由多个深度学习模型作为基分类器、ER规则作为结合策略,构建集成学习的基本结构;然后,通过多样性度量方法计算每个基分类器相对于其他基分类器的差异性;最后,将差异性归一化实现基分类器的权重设置。通过多个图像数据集的分类实验,结果表明提出的方法较实验选取的其他方法准确率更高且更稳定,证明了该方法可以充分利用基分类器的有效信息,且多样性赋权法更精确。  相似文献   

10.
Ke  Minlong  Fernanda L.  Xin   《Neurocomputing》2009,72(13-15):2796
Negative correlation learning (NCL) is a successful approach to constructing neural network ensembles. In batch learning mode, NCL outperforms many other ensemble learning approaches. Recently, NCL has also shown to be a potentially powerful approach to incremental learning, while the advantages of NCL have not yet been fully exploited. In this paper, we propose a selective NCL (SNCL) algorithm for incremental learning. Concretely, every time a new training data set is presented, the previously trained neural network ensemble is cloned. Then the cloned ensemble is trained on the new data set. After that, the new ensemble is combined with the previous ensemble and a selection process is applied to prune the whole ensemble to a fixed size. This paper is an extended version of our preliminary paper on SNCL. Compared to the previous work, this paper presents a deeper investigation into SNCL, considering different objective functions for the selection process and comparing SNCL to other NCL-based incremental learning algorithms on two more real world bioinformatics data sets. Experimental results demonstrate the advantage of SNCL. Further, comparisons between SNCL and other existing incremental learning algorithms, such Learn++ and ARTMAP, are also presented.  相似文献   

11.
Traditional nonlinear manifold learning methods have achieved great success in dimensionality reduction and feature extraction, most of which are batch modes. However, if new samples are observed, the batch methods need to be calculated repeatedly, which is computationally intensive, especially when the number or dimension of the input samples are large. This paper presents incremental learning algorithms for Laplacian eigenmaps, which computes the low-dimensional representation of data set by optimally preserving local neighborhood information in a certain sense. Sub-manifold analysis algorithm together with an alternative formulation of linear incremental method is proposed to learn the new samples incrementally. The locally linear reconstruction mechanism is introduced to update the existing samples’ embedding results. The algorithms are easy to be implemented and the computation procedure is simple. Simulation results testify the efficiency and accuracy of the proposed algorithms.  相似文献   

12.
Recent years have witnessed great success of manifold learning methods in understanding the structure of multidimensional patterns. However, most of these methods operate in a batch mode and cannot be effectively applied when data are collected sequentially. In this paper, we propose a general incremental learning framework, capable of dealing with one or more new samples each time, for the so-called spectral embedding methods. In the proposed framework, the incremental dimensionality reduction problem reduces to an incremental eigen-problem of matrices. Furthermore, we present, using this framework as a tool, an incremental version of Hessian eigenmaps, the IHLLE method. Finally, we show several experimental results on both synthetic and real world datasets, demonstrating the efficiency and accuracy of the proposed algorithm.  相似文献   

13.
集成多个传感器的智能片上系统( SoC)在物联网得到了广泛的应用.在融合多个传感器数据的分类算法方面,传统的支持向量机( SVM)单分类器不能直接对传感器数据流进行小样本增量学习.针对上述问题,提出一种基于Bagging-SVM的集成增量算法,该算法通过在增量数据中采用Bootstrap方式抽取训练集,构造能够反映新信息变化的集成分类器,然后将新老分类器集成,实现集成增量学习.实验结果表明:该算法相比SVM单分类器能够有效降低分类误差,提高分类准确率,且具有较好的泛化能力,可以满足当下智能传感器系统基于小样本数据流的在线学习需求.  相似文献   

14.
Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Researches have shown that a base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart diseases from literature.While making experiments, first the feature dimension of three datasets is reduced using correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for three datasets. Third, 30 classifier ensembles are constructed based on RF algorithm to assess performances of respective classifiers with the same disease data. All the experiments are carried out with leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics; classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC).Base classifiers succeeded 72.15%, 77.52% and 84.43% average accuracies for diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for respective diseases.RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems.  相似文献   

15.
In many real-world applications, pattern recognition systems are designed a priori using limited and imbalanced data acquired from complex changing environments. Since new reference data often becomes available during operations, performance could be maintained or improved by adapting these systems through supervised incremental learning. To avoid knowledge corruption and sustain a high level of accuracy over time, an adaptive multiclassifier system (AMCS) may integrate information from diverse classifiers that are guided by a population-based evolutionary optimization algorithm. In this paper, an incremental learning strategy based on dynamic particle swarm optimization (DPSO) is proposed to evolve heterogeneous ensembles of classifiers (where each classifier corresponds to a particle) in response to new reference samples. This new strategy is applied to video-based face recognition, using an AMCS that consists of a pool of fuzzy ARTMAP (FAM) neural networks for classification of facial regions, and a niching version of DPSO that optimizes all FAM parameters such that the classification rate is maximized. Given that diversity within a dynamic particle swarm is correlated with diversity within a corresponding pool of base classifiers, DPSO properties are exploited to generate and evolve diversified pools of FAM classifiers, and to efficiently select ensembles among the pools based on accuracy and particle swarm diversity. Performance of the proposed strategy is assessed in terms of classification rate and resource requirements under different incremental learning scenarios, where new reference data is extracted from real-world video streams. Simulation results indicate the DPSO strategy provides an efficient way to evolve ensembles of FAM networks in an AMCS. Maintaining particle diversity in the optimization space yields a level of accuracy that is comparable to AMCS using reference ensemble-based and batch learning techniques, but requires significantly lower computational complexity than assessing diversity among classifiers in the feature or decision spaces.  相似文献   

16.
基于多分类器的数据流中的概念漂移挖掘   总被引:4,自引:0,他引:4  
数据流中概念漂移的检测是当前数据挖掘领域的重要研究分支, 近年来得到了广泛的关注. 本文提出了一种称为 M_ID4 的数据流挖掘算法. 它是在大容量数据流挖掘中, 通过尽量少的训练样本来实现概念漂移检测的快速方法. 利用多分类器综合技术, M_ID4 实现了数据流中概念漂移的增量式检测和挖掘. 实验结果表明, M_ID4 算法在处理数据流的概念漂移上表现出比已有同类算法更高的精确度和适应性.  相似文献   

17.
This paper introduces Learn++, an ensemble of classifiers based algorithm originally developed for incremental learning, and now adapted for information/data fusion applications. Recognizing the conceptual similarity between incremental learning and data fusion, Learn++ follows an alternative approach to data fusion, i.e., sequentially generating an ensemble of classifiers that specifically seek the most discriminating information from each data set. It was observed that Learn++ based data fusion consistently outperforms a similarly configured ensemble classifier trained on any of the individual data sources across several applications. Furthermore, even if the classifiers trained on individual data sources are fine tuned for the given problem, Learn++ can still achieve a statistically significant improvement by combining them, if the additional data sets carry complementary information. The algorithm can also identify-albeit indirectly-those data sets that do not carry such additional information. Finally, it was shown that the algorithm can consecutively learn both the supplementary novel information coming from additional data of the same source, and the complementary information coming from new data sources without requiring access to any of the previously seen data.  相似文献   

18.
针对集成学习中bootstrap方法不能产生具有较大差异性的成员分类器,提出基于多模式扰动模型动态加权SVM集成方法。该方法在训练样本中使用bootstrap采样产生扰动,在输入特征中使用PCA特征滤波子空间法产生扰动,用自动模型选择法来动态扰动每个成员分类器的参数,用分类精度对成员分类器加权集成扰动输出。实验结果表明该方法比常用的bootstrap集成方法具有更好的集成效果。  相似文献   

19.
Classification is the most used supervized machine learning method. As each of the many existing classification algorithms can perform poorly on some data, different attempts have arisen to improve the original algorithms by combining them. Some of the best know results are produced by ensemble methods, like bagging or boosting. We developed a new ensemble method called allocation. Allocation method uses the allocator, an algorithm that separates the data instances based on anomaly detection and allocates them to one of the micro classifiers, built with the existing classification algorithms on a subset of training data. The outputs of micro classifiers are then fused together into one final classification. Our goal was to improve the results of original classifiers with this new allocation method and to compare the classification results with existing ensemble methods. The allocation method was tested on 30 benchmark datasets and was used with six well known basic classification algorithms (J48, NaiveBayes, IBk, SMO, OneR and NBTree). The obtained results were compared to those of the basic classifiers as well as other ensemble methods (bagging, MultiBoost and AdaBoost). Results show that our allocation method is superior to basic classifiers and also to tested ensembles in classification accuracy and f-score. The conducted statistical analysis, when all of the used classification algorithms are considered, confirmed that our allocation method performs significantly better both in classification accuracy and f-score. Although the differences are not significant for each of the used basic classifier alone, the allocation method achieved the biggest improvements on all six basic classification algorithms. In this manner, allocation method proved to be a competitive ensemble method for classification that can be used with various classification algorithms and can possibly outperform other ensembles on different types of data.  相似文献   

20.
Boosting Algorithms for Parallel and Distributed Learning   总被引:1,自引:0,他引:1  
The growing amount of available information and its distributed and heterogeneous nature has a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for efficient integrating specialized classifiers learned over very large, distributed and possibly heterogeneous databases that cannot fit into main computer memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances adaptively set according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared memory systems with a small number of processors, with an objective of achieving the maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, they are combined according to the confidence of their prediction. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged together, although it can also be used for parallel learning where a massive data set is partitioned into several disjoint subsets for a more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all classifier ensembles built on disjoint data sets. The new proposed methods applied to several data sets have shown that parallel boosting can achieve the same or even better prediction accuracy considerably faster than the standard sequential boosting. Results from the experiments also indicate that distributed boosting has comparable or slightly improved classification accuracy over standard boosting, while requiring much less memory and computational time since it uses smaller data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号