Similar Documents
20 similar documents found.
1.
Traditional lightning-data prediction methods usually rely on a single best-performing machine learning algorithm and rarely account for the spatio-temporal variability of meteorological data. To address this, a multi-classifier short-term lightning forecasting algorithm based on an ensemble strategy is proposed. First, attribute reduction is applied to the meteorological data to lower its dimensionality; next, several heterogeneous machine learning classifiers are trained on the data set and the best base classifiers are screened by prediction quality; finally, weights are trained for the selected base classifiers, which are combined under the ensemble strategy to produce the final classifier. Experiments show that the method outperforms the traditional single-best-model approach, raising average prediction accuracy by 9.5%.
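The final combination step described above can be sketched as a simple weighted vote; the labels and the accuracy-derived weights below are illustrative, not values from the paper:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-classifier predictions into one label by weighted vote.

    predictions: list of labels, one per base classifier.
    weights: matching list of non-negative classifier weights.
    """
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # Highest cumulative weight wins; sorting makes tie-breaks deterministic.
    return max(sorted(scores), key=lambda label: scores[label])

# Three heterogeneous classifiers predict for one sample; weights stand in
# for each model's validation accuracy (hypothetical values).
preds = ["storm", "clear", "storm"]
weights = [0.62, 0.80, 0.58]
result = weighted_vote(preds, weights)  # two weaker voters outweigh one strong one
```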

2.
Introducing ensemble learning into incremental learning can markedly improve learning performance. Most recent work on ensemble-based incremental learning combines multiple homogeneous classifiers by weighted voting and does not adequately address the stability-plasticity dilemma of incremental learning. This paper therefore proposes an incremental learning algorithm that ensembles heterogeneous classifiers. During training, to improve stability, several base classifiers trained on new data are added to the heterogeneous ensemble, while a locality-sensitive hashing table stores data sketches for later nearest-neighbour lookup of test samples. To adapt to evolving data, newly arrived data are also used to update the voting weights of the base classifiers in the ensemble. At prediction time, the entries in the locality-sensitive hashing table that resemble the test sample serve as a bridge for computing each base classifier's dynamic weight for that sample; the class label is then decided by combining the base classifiers' voting weights and dynamic weights. Comparative experiments show that the incremental algorithm achieves comparatively high stability and generalisation ability.

3.
Different classifiers with different characteristics and methodologies can complement each other and cover one another's internal weaknesses, so classifier ensembles are an important way to address the limitations of single-classifier systems. This article explores an automatic, fast function that approximates the accuracy of a given classifier on a typical dataset; with this function, ensemble learning can be cast as an optimisation problem. The target is thus a model that approximates the performance of a predetermined classifier over any dataset. Based on this model, an optimisation problem is formulated and a genetic algorithm is employed as the optimiser to find the best classifier set in each subspace. The proposed methodology is called classifier ensemble based on subspace learning (CEBSL). CEBSL is examined on several datasets and shows considerable improvements.

4.
Ensemble classification combines several weak classifiers under some rule and can effectively improve classification performance; in the combination, the individual weak classifiers usually contribute unequally to the result. The extreme learning machine (ELM) is a recently proposed algorithm for training single-hidden-layer feedforward neural networks. Using ELMs as base classifiers, a differential-evolution-based weighted ELM ensemble is proposed, in which a differential evolution algorithm optimises the weight of each base classifier. Experimental results show that, compared with simple-voting and AdaBoost-based ensembles, the method achieves higher classification accuracy and better generalisation.
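The weight-optimisation loop can be sketched in pure Python; a toy vote matrix stands in for real ELM base classifiers (not reproduced here), and a classic DE/rand/1/bin loop tunes the voting weights:

```python
import random

def ensemble_error(weights, votes, truth):
    """Fraction of samples the weighted-vote ensemble misclassifies."""
    wrong = 0
    for sample_votes, y in zip(votes, truth):
        score = {}
        for label, w in zip(sample_votes, weights):
            score[label] = score.get(label, 0.0) + w
        pred = max(sorted(score), key=lambda k: score[k])  # ties -> lowest label
        wrong += pred != y
    return wrong / len(truth)

def de_optimize(votes, truth, n_pop=12, gens=40, F=0.6, CR=0.9, seed=0):
    """DE/rand/1/bin over the vector of base-classifier voting weights."""
    rng = random.Random(seed)
    dim = len(votes[0])
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_pop)]
    cost = [ensemble_error(p, votes, truth) for p in pop]
    for _ in range(gens):
        for i in range(n_pop):
            a, b, c = rng.sample([j for j in range(n_pop) if j != i], 3)
            jrand = rng.randrange(dim)  # guarantee one mutated coordinate
            trial = [
                min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]), 0.0), 1.0)
                if (rng.random() < CR or d == jrand) else pop[i][d]
                for d in range(dim)
            ]
            c_trial = ensemble_error(trial, votes, truth)
            if c_trial <= cost[i]:  # greedy selection keeps the better vector
                pop[i], cost[i] = trial, c_trial
    best = min(range(n_pop), key=lambda i: cost[i])
    return pop[best], cost[best]

# Toy inputs: per-sample votes of 3 base classifiers, plus ground truth.
votes = [(0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1)]
truth = [0, 1, 1, 1, 0, 1]
weights, err = de_optimize(votes, truth)
```

On this toy data no weighting can classify every sample (two samples impose contradictory constraints), so the best reachable error is 1/6, which the DE loop finds.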

5.
Dynamic classifier ensemble selection (DCES) is an important research direction in ensemble learning, yet most current DCES algorithms are computationally expensive. To address this and further improve performance, this paper proposes clustering-based dynamic classifier ensemble selection (CDCES). By clustering the test samples, CDCES greatly reduces how often classifiers must be dynamically selected, lowering the computational cost. CDCES is also a more general algorithm: traditional static selective ensembles and dynamic classifier ensembles are special cases of it, making it more robust. Tests on UCI datasets and comparisons with other algorithms show that CDCES is effective and computationally inexpensive.
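The cluster-then-select idea can be sketched as follows; the 1-D task, the two "specialist" classifiers, and k = 2 are hypothetical stand-ins. The point is that selection runs once per cluster rather than once per test sample:

```python
def nearest(cents, x):
    """Index of the centroid closest to x."""
    return min(range(len(cents)), key=lambda i: abs(x - cents[i]))

def kmeans_1d(xs, k, iters=20):
    """Tiny 1-D k-means; returns the final centroids."""
    cents = sorted(xs)[:: max(1, len(xs) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in cents]
        for x in xs:
            groups[nearest(cents, x)].append(x)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    return cents

def select_per_cluster(cents, classifiers, val_x, val_y):
    """For each cluster, pick the classifier most accurate on the
    validation points that fall into that cluster's region."""
    chosen = []
    for ci in range(len(cents)):
        region = [(x, y) for x, y in zip(val_x, val_y) if nearest(cents, x) == ci]
        if not region:
            chosen.append(0)  # arbitrary fallback: first classifier
            continue
        chosen.append(max(
            range(len(classifiers)),
            key=lambda j: sum(classifiers[j](x) == y for x, y in region),
        ))
    return chosen

# Hypothetical 1-D task (label 1 iff x > 5) with one "specialist" per region.
clf_low, clf_high = (lambda x: 0), (lambda x: 1)
val_x, val_y = [1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]
test_x = [1.5, 2.5, 7.5, 8.5]

cents = kmeans_1d(test_x, k=2)  # cluster the test samples once
chosen = select_per_cluster(cents, [clf_low, clf_high], val_x, val_y)
preds = [[clf_low, clf_high][chosen[nearest(cents, x)]](x) for x in test_x]
```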

6.
Click fraud has become one of the most common forms of cybercrime in recent years, and the online advertising industry suffers huge losses from it every year. To detect fraudulent clicks effectively among massive click volumes, features that thoroughly couple ad clicks with their temporal attributes are constructed, and an ensemble learning framework for click-fraud detection, CAT-RFE, is proposed. The framework has three parts: the base classifier, recursive feature elimination (RFE), and voting-based ensemble learning. CatBoost (categorical boosting), a gradient boosting model well suited to categorical features, serves as the base classifier; RFE is a greedy feature-selection method that picks good feature combinations from multiple candidate sets; voting combines the outputs of multiple base classifiers by majority. The framework uses CatBoost with RFE to obtain several strong feature combinations in the feature space, then ensembles the models trained on those combinations via voting to produce the final click-fraud detection result. Because the framework uses the same base classifier and the same ensemble method throughout, it avoids both the mutual interference of widely differing classifiers that can degrade ensemble results and RFE's tendency to get stuck in local optima during feature selection, yielding better detection ability. Performance evaluation and comparison on a real-world click-fraud dataset show that CAT-RFE outperforms the plain CatBoost model, CatBoost combined with RFE, and other machine learning models, demonstrating its competitiveness. The framework offers a practical solution for click-fraud detection in online advertising.
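Greedy backward elimination in the spirit of RFE can be sketched as below; the feature names and the additive "usefulness" scorer are hypothetical stand-ins for CatBoost validation accuracy, which the real framework would use:

```python
def rfe(features, score, keep):
    """Greedy recursive feature elimination: repeatedly drop the feature
    whose removal hurts the score least, until `keep` features remain."""
    current = list(features)
    while len(current) > keep:
        # Try removing each feature in turn; keep the best-scoring subset.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score)
    return current

# Toy scorer: each feature has a fixed "usefulness"; a subset's score is the
# sum (a hypothetical stand-in for validation accuracy of a trained model).
usefulness = {"clicks_per_ip": 0.4, "hour_of_day": 0.3,
              "ua_entropy": 0.2, "noise": 0.05}
score = lambda subset: sum(usefulness[f] for f in subset)

best3 = rfe(list(usefulness), score, keep=3)  # the low-value feature is dropped
```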

7.
Hybrid models based on feature selection and machine learning techniques have significantly enhanced the accuracy of standalone models. This paper presents a feature selection-based hybrid-bagging algorithm (FS-HB) for improved credit risk evaluation. Two feature selection methods, chi-square and principal component analysis, were used for ranking and selecting the important features from the datasets. The classifiers were built on five training and test partitions of the input data set. The performance of the hybrid algorithm was compared with that of the standalone classifiers: feature selection-based classifiers and bagging. The hybrid FS-HB algorithm performed best on qualitative datasets with fewer features and a tree-based unstable base classifier. Its performance on numeric data was also better than that of the other standalone classifiers, and comparable to bagging with only selected features. It performed best on the 70:30 data partition, and the type II error, which is very significant in risk evaluation, was also reduced substantially. The improved performance of FS-HB is attributed to the important features used in building the classifier, which reduce the complexity of the algorithm, and to the ensemble methodology, which improves on the classical bias-variance trade-off and outperforms standalone classifiers.

8.
This paper proposes a method for combining multiple tree classifiers that draws on both classifier ensembles (bagging) and dynamic classifier selection (DCS). The proposed method comprises the following steps: (1) building individual tree classifiers from bootstrap samples; (2) calculating the distance between every pair of trees; (3) clustering the trees by single-linkage clustering; (4) selecting two clusters per local region in terms of accuracy and error diversity; and (5) voting over the tree classifiers selected in the two clusters. Empirical evaluation on publicly available data sets confirms the superiority of the proposed approach over other classifier-combining methods.
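Steps (2)-(3) can be sketched with a tiny single-linkage routine over a pairwise tree-distance matrix; the 4x4 matrix below is illustrative (e.g. one minus pairwise prediction agreement), not data from the paper:

```python
def single_linkage(dist, n_clusters):
    """Agglomerative single-linkage clustering from a full distance matrix.
    Returns a list of clusters, each a set of item indices."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest single-link distance
        # (the minimum distance over all cross-cluster item pairs).
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist[p][q] for p in clusters[ij[0]] for q in clusters[ij[1]]),
        )
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Symmetric toy distances between 4 trees: trees 0/1 agree closely,
# as do trees 2/3, while the two groups disagree.
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
groups = single_linkage(dist, n_clusters=2)
```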

9.
To improve SVM classification performance on imbalanced data, an ensemble classification algorithm based on data splitting is proposed. The majority-class samples are partitioned into multiple subsets by clustering, in proportion to the ratio between classes; each subset is merged with the minority class to form a training subset, and a classifier is learned on each. The resulting classifiers are combined with the WE classifier-ensemble method to obtain the final classifier, improving classification performance under class imbalance. Experimental results on UCI datasets demonstrate the algorithm's effectiveness, particularly on minority-class samples.

10.
To address the insufficient diversity of existing ensemble learning, which limits the ensemble's benefit, an ensemble method based on probability calibration is proposed, together with two methods for reducing the impact of multicollinearity. First, the probabilities output by the original classifiers are calibrated with different calibration methods; the calibrated probabilities generated in this step are then used for learning to predict the final result. Using different calibration methods in the first step supplies stronger diversity for the second-step ensemble. To tackle the multicollinearity between calibrated and original probabilities, choose-best and bootstrap (sampling with replacement) methods are proposed. The choose-best method selects, for each base classifier, the best among the original classifier and its several calibrated variants; the bootstrap method samples with replacement from the full set of base classifiers and ensembles the sampled classifiers. Experiments show that naive probability-calibration ensembling yields limited gains, while the choose-best and bootstrap methods improve performance substantially. This indicates that probability calibration provides stronger diversity for ensemble learning, and that the accompanying multicollinearity can be handled effectively by sampling and similar methods.
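The calibration step can be sketched with simple histogram binning, one of several possible calibration methods; the raw probabilities and outcomes below are invented for illustration:

```python
def bin_calibrate(probs, labels, n_bins=5):
    """Histogram-binning calibration: map each raw-probability bin to the
    empirical positive rate observed in that bin, and return the mapping."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append(y)
    # Empty bins fall back to the bin midpoint.
    rate = [sum(b) / len(b) if b else (i + 0.5) / n_bins
            for i, b in enumerate(bins)]
    return lambda p: rate[min(int(p * n_bins), n_bins - 1)]

# A hypothetical over-confident classifier: raw probabilities vs outcomes.
raw = [0.95, 0.90, 0.92, 0.10, 0.15, 0.05]
y =   [1,    0,    1,    0,    0,    1]
cal = bin_calibrate(raw, y, n_bins=2)  # high bin -> 2/3, low bin -> 1/3
```

Applying several such calibrators (binning, sigmoid fitting, etc.) to the same raw probabilities produces the diversified inputs that the second-stage ensemble learns from.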

11.
Ensemble methods aim to combine multiple learning machines to improve efficacy in a learning task in terms of prediction accuracy, scalability, and other measures, and have been applied to evolutionary machine learning techniques including learning classifier systems (LCSs). This article first proposes a conceptual framework for appropriately categorising ensemble-based methods, enabling fair comparison and highlighting gaps in the corresponding literature. The framework is generic and consists of three sequential stages: a pre-gate stage concerned with data preparation; the member stage, which accounts for the types of learning machines used to build the ensemble; and a post-gate stage concerned with the methods used to combine ensemble output. A taxonomy of LCS-based ensembles is then presented using this framework. The article then focuses on comparing LCS ensembles that use feature selection in the pre-gate stage, proposing an evaluation methodology for systematically analysing their performance. Specifically, random feature sampling and rough-set feature selection-based LCS ensemble methods are compared. Experimental results show that the rough-set-based approach performs significantly better than the random subspace method in classification accuracy on problems with many irrelevant features; the two approaches are comparable on problems with many redundant features.

12.
To address the problem that ensemble classifiers built from overly weak base classifiers must sacrifice long training times to reach high accuracy, a fast instance-based method for ensembling strong classifiers, FSE, is proposed. First, unqualified classifiers are removed by a base-classifier evaluation step, and the remainder are ranked by accuracy and diversity to obtain the most accurate, most diverse set of classifiers. The FSE ensemble algorithm then breaks the existing sample distribution, resampling so that the classifiers focus more on hard-to-learn samples, and on that basis determines each classifier's weight and builds the ensemble. In comparisons with the Boosting ensemble classifier on UCI databases and a real-world dataset, Boosting's ensembles reach at most 90.2% and 90.4% recognition accuracy respectively, while ensembles built with FSE reach 95.6% and 93.9%; at equal accuracy, FSE shortens training time by 75% and 80% respectively. The results show that the FSE ensemble model effectively improves recognition accuracy while reducing training time.

13.
An improved K-Means-Boosting-BP model for imbalanced police-incident data
Objective: Understanding the spatio-temporal distribution of police incidents, building spatio-temporal prediction models with machine learning, devising scientific policing and prevention plans, and effectively suppressing crime are key concerns of crime geography. Previous studies show that police incidents concentrate in central urban districts or densely populated residential areas, making the data spatio-temporally imbalanced; such imbalance usually turns models trained on the data into weak learners with low prediction accuracy. To solve this imbalanced regression problem, a Boosting algorithm based on K-Means clustering is proposed. Method: Building on the Boosting ensemble learning framework, the algorithm generates base classifiers with a GA-BP neural network and combines them via K-Means clustering, thereby promoting the weak learners to a strong learner. Result: Compared with SMOTEBoosting (the Synthetic Minority Oversampling Technique Boosting algorithm commonly used for imbalanced regression), the algorithm has two advantages. (1) It reduces the overall mean squared error of the data as well as the minority-class error: SMOTEBoosting's overall MSE is 2.14E-04, while KMeans-Boosting reaches 9.85E-05. (2) It better balances precision and recall on minority-class samples: KMeans-Boosting's recall is about 52% versus SMOTEBoosting's roughly 91%, but its precision is 85%, far above SMOTEBoosting's 19%. Conclusion: The KMeans-Boosting algorithm significantly reduces the overall mean squared error of imbalanced data and improves minority-class precision and recall; it is an effective algorithm for imbalanced regression and classification problems and can be extended to other domains that must handle imbalanced data.

14.
A study of k-means-clustering-based ensembles of neural network classifiers
Since diversity is a necessary condition for ensemble learning, this paper studies a k-means-based method for increasing the diversity of neural network classifier ensembles. Many classifier models are first trained on the training set with a neural network learning algorithm; each classifier's classification results on a validation set then serve as the data objects to be clustered. k-means clustering is applied to these objects, and one representative classifier is selected from each resulting cluster to form the ensemble members. Finally, with voting as the combination rule, the performance of this diversity-enhancing approach is studied experimentally and compared with the common ensemble methods bagging and AdaBoost.

15.
Traditional classifier ensembles usually add only the single best individual classifier to the strong classifier at each iteration, simply discarding other individual classifiers that might still be of use. To address this, a non-sparse multiple kernel learning method built on the Boosting framework, MKL-Boost, is proposed, drawing on the idea of classifier ensemble learning. At each iteration, a training subset is drawn from the training set, and a regularised non-sparse multiple kernel learning method trains the best individual classifier, which considers the optimal non-sparse linear convex combination of M basic kernels. By imposing an Lp-norm constraint on the kernel combination coefficients, good kernels are retained, preserving more useful feature information, while poor kernels are removed, ensuring selective kernel fusion; the resulting kernel-combination-based individual classifier is then added to the strong classifier. The algorithm combines the advantages of Boosting ensemble learning with those of regularised non-sparse multiple kernel learning; experiments show that, relative to other Boosting algorithms, MKL-Boost attains higher classification accuracy within fewer iterations.

16.
Compared with full ensemble learning, ensemble pruning searches for an optimal subset among multiple classifiers to improve generalisation and simplify the ensemble process. Pareto ensemble pruning considers both classifier accuracy and ensemble size, treating each as an optimisation objective. However, it considers only the base classifiers' accuracy and the ensemble size while ignoring the diversity among classifiers, which leaves the selected classifiers highly similar to one another. This paper proposes a diversity-aware Pareto ensemble pruning algorithm that merges classifier diversity and accuracy into the first optimisation objective and keeps ensemble size as the second, yielding a multi-objective optimisation. Experiments show that, at comparable ensemble sizes, the improved algorithm outperforms Pareto ensemble pruning thanks to the incorporated diversity.
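The two-objective selection can be sketched as a plain Pareto-dominance filter; the candidate sub-ensembles and their scores are invented, with "quality" standing in for the blended accuracy-plus-diversity objective:

```python
def pareto_front(items):
    """Non-dominated candidates: maximise quality, minimise size.
    `items` is a list of (name, quality, size) tuples."""
    front = []
    for name, q, s in items:
        # Dominated if some candidate is at least as good on both
        # objectives and strictly better on one.
        dominated = any(
            (q2 >= q and s2 <= s) and (q2 > q or s2 < s)
            for _, q2, s2 in items
        )
        if not dominated:
            front.append(name)
    return front

# Candidate sub-ensembles: quality blends accuracy and diversity
# (the first objective above); size is the number of base classifiers.
cands = [("A", 0.90, 5), ("B", 0.88, 3), ("C", 0.85, 3),
         ("D", 0.91, 9), ("E", 0.80, 2)]
front = pareto_front(cands)  # C is dominated by B (same size, lower quality)
```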

17.
To improve facial-expression classification and recognition, a Quadratic Optimization Choice (QOC) ensemble classification model grounded in ensemble learning theory is proposed. First, nine base classifiers are ranked by performance, and the top 30% are taken as the ensemble model's candidate base classifiers. Next, a family of ensemble models is generated under combination rules. Finally, a second optimisation pass selects, from that family, the ensemble-classifier subset with the smallest generalisation error, thereby fixing the optimal ensemble classification model. To validate the QOC model's performance, ensemble models using the max, min, and mean rules serve as baselines. Experimental results show that the QOC ensemble model classifies better than the base classifiers; in particular, for the poorly recognised sad-expression class, the average recognition rate improves by 21.11%. QOC also improves recognition performance markedly over non-selective ensemble models.

18.
Rough-subspace ensemble based on dynamic weighting
A dynamically weighted rough-subspace ensemble method, EROS-DW, is proposed. Rough-set attribute reduction produces multiple reduced feature subsets, on which base classifiers are trained. At classification time, each base classifier is dynamically assigned a weight according to the specific features of the given test sample, and a weighted-voting combination rule integrates the classifiers' outputs. The method's performance is tested on UCI benchmark datasets; experimental results show that EROS-DW achieves higher classification accuracy than classical ensemble methods.

19.
Credit scoring focuses on developing empirical models to support the financial decision-making processes of financial institutions and credit industries. It uses applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may contain redundant and noisy features that degrade the performance of credit scoring models. The main focus of this paper is a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The initial phase performs preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. Finally, in the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, since classifier placement affects the predictive performance of the ensemble framework. The proposed model is validated on real-world credit scoring datasets, namely the Australian, Japanese, German-categorical, and German-numerical datasets.

20.
In this paper, a measure of competence based on random classification (MCR) for classifier ensembles is presented. For each test example, the measure dynamically selects a subset of classifiers from the ensemble that perform better than a random classifier; weak (incompetent) classifiers that would adversely affect the performance of the classification system are thereby eliminated. When all classifiers in the ensemble are evaluated as incompetent, the classification accuracy of the system can be increased by using the random classifier instead. Theoretical justification for using the measure with the majority voting rule is given. Two MCR-based systems were developed, and their performance was compared against six multiple-classifier systems on data sets from the UCI Machine Learning Repository and the Ludmila Kuncheva Collection. The developed systems typically had the highest classification accuracies regardless of the ensemble type used (homogeneous or heterogeneous).
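The competence test against a random classifier can be sketched as below. This is a simplified reading of the idea (local accuracy on a test point's validation neighbours must beat 1/n_classes); the task and the two classifiers are hypothetical:

```python
import random

def competent(classifier, neighbours, n_classes):
    """Competent = beats random guessing (accuracy 1/n_classes) on the
    test point's validation neighbourhood of (input, label) pairs."""
    acc = sum(classifier(x) == y for x, y in neighbours) / len(neighbours)
    return acc > 1.0 / n_classes

def mcr_predict(classifiers, neighbours, x, n_classes, rng):
    """Majority vote over competent classifiers; fall back to the random
    classifier itself when every ensemble member is judged incompetent."""
    pool = [f for f in classifiers if competent(f, neighbours, n_classes)]
    if not pool:
        return rng.randrange(n_classes)
    votes = [f(x) for f in pool]
    return max(sorted(set(votes)), key=votes.count)

# Hypothetical 3-class task where the true label is x mod 3.
good = lambda x: x % 3   # always right on this task
bad = lambda x: 0        # right only by chance
neighbours = [(1, 1), (2, 2), (3, 0), (4, 1)]  # (input, true label) pairs
rng = random.Random(0)
pred = mcr_predict([good, bad], neighbours, 5, n_classes=3, rng=rng)
```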


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号