Similar Literature
 20 similar documents found (search time: 296 ms)
1.
Rough subspace ensemble based on dynamic weighting   (cited 1 time: 0 self-citations, 1 by others)
This paper proposes EROS-DW, a rough subspace ensemble method based on dynamic weighting. Rough-set attribute reduction is used to obtain multiple reduced feature subsets, on which base classifiers are trained. At classification time, each base classifier is dynamically assigned a weight according to the specific features of the given test sample, and the classifiers' outputs are combined by weighted voting. The method was evaluated on UCI benchmark data sets; experimental results show that EROS-DW achieves higher classification accuracy than classical ensemble methods.
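The weighted-voting combination step can be sketched as follows. This is a minimal illustration: the dynamic per-sample weight computation of EROS-DW is omitted, and the function name is our own.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine base-classifier outputs by weighted voting:
    each classifier's vote counts with its (per-sample) weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Two lighter votes for "B" outweigh one heavier vote for "A".
print(weighted_vote(["A", "B", "B"], [0.5, 0.3, 0.4]))  # -> B
```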

2.
Intrusion-detection data are typically high-dimensional and nonlinear, and contain substantial noise, redundancy, and continuous attributes, which general pattern-classification methods handle poorly. To improve detection performance, an ensemble intrusion-detection algorithm based on neighborhood rough sets is proposed. Bagging is used to generate multiple highly diverse training subsets; to accommodate the continuous attributes of intrusion-detection data, attribute reduction with neighborhood rough-set models of different radii is performed on each subset, removing redundancy and noise to improve the discriminative power of each attribute subset while further increasing subset diversity. SVMs are then trained as base classifiers, and the ensemble is formed by weighted voting with weights derived from each base classifier's detection accuracy. Simulations on the KDD99 data set show that the algorithm effectively improves both the accuracy and the efficiency of intrusion detection, with good generalization and stability.
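The Bagging step that produces the diverse training subsets can be sketched as follows (an illustrative helper, not the paper's code):

```python
import random

def bagging_subsets(n_samples, n_bags, seed=0):
    """Draw bootstrap index sets (sampling with replacement),
    one per base classifier -- the standard Bagging step."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_bags)]
```

Each returned index list selects a bootstrap replica of the training set; attribute reduction and SVM training would then run on each replica.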

3.
Most existing ensemble techniques assemble all trained classifiers into the ensemble, and the resulting system size incurs extra memory and computation. To improve the generalization ability and efficiency of ensemble classification, a self-sampling ensemble classification method based on attribute reduction is proposed, building on research into rough-set attribute reduction. A strategy combining ant colony optimization with attribute reduction is applied to the original feature set to obtain multiple optimal reduced feature subspaces; using any one of the reduced feature subsets as the ensemble's feature input reduces classifier memory consumption and computation time to some extent. Each base classifier is then trained iteratively with a self-sampling scheme constrained by the samples' learning results and learning speed. Experimental results verify the effectiveness of the method.

4.
To improve SVM performance on imbalanced data, a split-and-ensemble classification algorithm is proposed. The majority class is partitioned by clustering into multiple subsets according to the ratio between the classes; each subset is merged with the minority class to form a training subset, a classifier is learned on each training subset, and the resulting classifiers are combined with the WE ensemble method to obtain the final classifier. Experimental results on UCI data sets demonstrate the effectiveness of the algorithm, in particular its classification performance on minority-class samples.
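The split-and-merge step can be sketched as follows. The paper partitions the majority class by clustering; this illustration substitutes a simple contiguous split, and all names are hypothetical:

```python
def split_majority(majority, minority, ratio=1):
    """Split the majority class into chunks of roughly
    ratio * len(minority) samples; pair each chunk with the
    full minority set to form near-balanced training subsets."""
    target = max(1, ratio * len(minority))
    chunks = [majority[i:i + target]
              for i in range(0, len(majority), target)]
    return [(chunk, list(minority)) for chunk in chunks]
```

One classifier would then be trained per (chunk, minority) pair before the ensemble combination.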

6.
To reduce the computational complexity of ensemble feature selection, a neural-network ensemble classification method based on rough-set reduction is proposed. First, a dynamic reduction technique that combines genetic-algorithm-based reduct search with resampling yields stable attribute reducts with strong generalization ability. Then, BP networks are built on the different reducts as candidate base classifiers and, following the idea of selective ensembles, a search strategy finds the ensemble of networks with the best generalization performance. Finally, classification is performed by majority voting over the networks. The method was validated on the classification of a 7-band Landsat remote-sensing image of a study area. Because rough-set reduction filters out many feature subsets with poor discriminative power, the method has a lower time cost and computational complexity than traditional ensemble feature selection, with satisfactory classification performance.

7.
Imbalanced-data classification is an important research direction in machine learning, but most existing imbalance-learning algorithms target binary classification and cannot handle multi-class problems. For multi-class imbalanced data, this paper designs a new multi-class model by combining rough sets, resampling, and a dynamic ensemble classification strategy. The model uses hybrid sampling and rough-set attribute reduction to obtain multiple balanced data subsets, on which a dynamic ensemble classification model is built. Twenty-two experiments on real data sets verify that, compared with two classical algorithms, the model predicts minority-class samples better and can serve as a viable strategy for multi-class imbalanced classification.

8.
The naive Bayes classifier is simple and efficient, but its attribute-independence assumption limits its application to real data. A new algorithm is proposed to avoid the situation where, during preprocessing, noise in the training set and the scale of the data make attribute reduction ineffective and thereby degrade classification. Several attribute subsets are generated by random attribute selection on the training set, a Bayesian classifier is built on each subset, and a genetic algorithm then selects the best classifiers. Experiments show that, compared with the traditional naive Bayes method, this approach achieves better classification accuracy.

9.
郑芸芸, 王萍, 游强华. 《福建电脑》 2013, (10): 99-100, 134
The naive Bayes classifier rests on the assumption that the values of the attributes within a given class are mutually independent, an assumption that often fails to hold in practice. The rough-set model provides attribute discretization and reduction techniques that can weaken the dependencies between attributes and yield mutually independent core attributes. The two methods are therefore combined: rough sets first reduce the data, and a naive Bayes classifier then produces the classification result. Experiments show that this approach improves the naive Bayes classifier.

10.
Attribute reduction is a central topic in rough-set theory. Most attribute reduction for decision-theoretic rough sets is based on the global decision classes and uses a single reduction criterion. To address this, a class-specific attribute reduction algorithm under decision-theoretic rough sets is proposed. For a specific decision class, a definition of attribute reduction is given that maximizes the decision regions while reducing the cost of partitioning them as far as possible; a corresponding heuristic attribute reduction algorithm is then designed using ensemble learning. Experimental comparison with existing algorithms on UCI data sets verifies that the algorithm achieves better attribute reduction performance.

11.
One of the most widely used approaches to class imbalance is ensemble learning. In the conventional ensemble learning approach, base classifiers are trained on the unbalanced training set; although researchers have examined resampling strategies to balance the training set, there is no principled way to select the most suitable resampling method or base classifier for a given training set. A multi-armed bandit heterogeneous ensemble framework is developed to address these issues. The framework employs the multi-armed bandit technique to pick the best base classifier and resampling technique for building a heterogeneous ensemble model. Bagging first produces the training sets, and the out-of-bag instances serve as the validation set. The base-classifier combination with the highest validation-set score is taken as the best model on each bagging subset and added to the model pool. The classification performance of the multi-armed bandit heterogeneous ensemble model is then assessed on 30 real-world imbalanced data sets gathered from UCI, KEEL, and HDDT. The experimental results demonstrate that, under the AUC and Kappa metrics, the proposed heterogeneous ensemble model performs competitively with nine other state-of-the-art ensemble learning methods, and the findings are confirmed statistically by the Friedman test and Holm's post-hoc test.
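The abstract does not name the bandit policy used to pick a (base classifier, resampler) arm; an ε-greedy rule is one plausible choice, sketched here purely for illustration:

```python
import random

def epsilon_greedy(n_arms, rewards, counts, epsilon=0.1, rng=random):
    """Pick an arm index: explore uniformly with probability epsilon,
    otherwise exploit the arm with the best mean reward so far
    (untried arms are treated as maximally promising)."""
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    mean = lambda i: rewards[i] / counts[i] if counts[i] else float("inf")
    return max(range(n_arms), key=mean)
```

Here `rewards[i]` would accumulate the validation-set score of arm `i` and `counts[i]` the number of times it was tried.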

12.
The NB method assumes conditional independence, and the BAN method is hard to build from small training sets. An ensemble traffic-classification method based on Bayesian learning is therefore proposed. Separate NB and BAN classifiers are constructed, the weight of each classifier is obtained on a validation set, and the classifiers' outputs are combined by weighted averaging to classify network traffic. Experiments on the Moore data set, compared against the NB and BAN methods, show that this method achieves higher classification accuracy and stability.

13.
A classifier ensemble combines the predictions of a set of individual classifiers to produce more accurate results than any single classifier system. However, an ensemble with too many classifiers may consume a large amount of computation time. This paper proposes a new ensemble-subset evaluation method that integrates classifier diversity measures into a novel classifier ensemble reduction framework. The framework casts ensemble reduction as an optimization problem and uses the harmony search algorithm to find the optimized classifier ensemble. Both pairwise and non-pairwise diversity measures are applied by the subset evaluation method: for the pairwise case, three conventional diversity algorithms and one new diversity measure are used to calculate the diversity merits, while for the non-pairwise case three classical algorithms are used. The proposed subset evaluation methods are demonstrated on experimental data. In comparison with other classifier ensemble methods, the variant based on the interrater-agreement measure exhibits a high prediction accuracy relative to current ensembles' performance, and the framework with the new diversity measure achieves relatively good performance in less computation time.

14.
Training neural networks to distinguish different emotions from physiological signals frequently involves fuzzy definitions of each affective state. In addition, manually designed classification tasks often use sub-optimal classifier parameter settings, leading to only average classification performance. This study proposes a framework for multi-layered optimization of an ensemble of classifiers that maximizes the system's ability to learn and classify affect while minimizing human involvement in setting optimal parameters for the classification system. Using fuzzy adaptive resonance theory mapping (ARTMAP) as the classifier template, genetic algorithms (GAs) perform an exhaustive search for the combination of parameter settings giving the best individual classifier performance. Speciation is implemented via subset selection of the classification data attributes and via an island-model genetic algorithm. The generated population of optimal classifier configurations then provides candidates for an ensemble of classifiers, and a second set of GAs searches for the combination of classifiers yielding the best ensemble classification accuracy. The proposed methodology was tested on two affective data sets and produced relatively small ensembles of fuzzy ARTMAPs with excellent affect-recognition accuracy.

15.
章少平, 梁雪春. 《计算机应用》 2015, 35(5): 1306-1309
Traditional classification algorithms are mostly built on balanced data sets, and their performance often degrades markedly when the sample data are imbalanced. For imbalanced classification, an optimized support vector machine (SVM) ensemble classifier model is proposed. KSMOTE and Bootstrap are used to preprocess the imbalanced data, the corresponding SVM models are generated and their parameters optimized with the complex method, and the optimized parameters are then used to generate the SVM ensemble members in parallel, with a voting mechanism producing the final classification. Experiments on five UCI benchmark data sets show that the optimized SVM ensemble clearly improves classification accuracy over a plain SVM model and an optimized single SVM model; the effect of different bootNum values on classifier performance is also examined.
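KSMOTE's details are not given in the abstract; a simplified SMOTE-style interpolation step, shown here as an assumption-laden sketch, conveys the idea of synthesizing minority-class samples:

```python
import random

def smote_like(minority, n_new, seed=0):
    """Generate synthetic minority points by interpolating between
    random pairs of existing points (a simplified SMOTE-style step;
    real SMOTE interpolates toward k-nearest neighbours)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic
```

Every synthetic point lies on the segment between two real minority points, so it stays inside the minority class's convex hull.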

16.
To address the low accuracy, poor stability, and weak generalization of traditional models on imbalanced data, a multi-granularity ensemble classification algorithm based on sequential three-way decisions, MGE-S3WD, is proposed. Granular layers are dynamically partitioned using binary relations; thresholds computed from a cost matrix are used to build a multi-level granular structure that divides the data at each layer into positive, boundary, and negative regions; the partitions at each layer are then recombined (positive with negative, positive with boundary, and negative with boundary) into new data subsets, and a base classifier is built on each subset to achieve ensemble classification of the imbalanced data. Simulation results show that the algorithm effectively lowers the imbalance ratio of the data subsets and increases the diversity of the base classifiers; under the G-mean and F-measure evaluation metrics, its classification performance is better than, or partly better than, other ensemble classification algorithms, effectively improving the accuracy and stability of the classification model and offering a new approach to ensemble learning on imbalanced data sets.

17.
To overcome the limited performance of a single traditional classifier on imbalanced data, a new classification method for binary imbalanced data sets, GAN-AdaBoost-DT, is proposed based on generative adversarial networks (GAN) and ensemble learning. First, a GAN is trained to obtain a generative model that produces minority-class samples, reducing the data imbalance. Second, the generated minority samples are fed into the adaptive boosting (AdaBoost) framework and the weights are updated, improving the classification performance of an AdaBoost model with decision trees (DT) as base classifiers. Using the area under the ROC curve (AUC) as the evaluation metric, experiments on a credit-card fraud data set show that, compared with SMOTE-based ensemble learning, accuracy improves by 4.5% and AUC by 6.5%; compared with improved-SMOTE ensemble learning, accuracy improves by 4.9% and AUC by 5.9%; and compared with random-undersampling ensemble learning, accuracy improves by 4.5% and AUC by 5.4%. Experimental results on other UCI and KEEL data sets show that the algorithm raises overall accuracy on imbalanced binary classification and optimizes classifier performance.
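The AdaBoost weight update at the heart of this method can be sketched with the standard discrete AdaBoost rule (a textbook formulation, not the paper's code):

```python
import math

def adaboost_reweight(weights, correct, error):
    """One AdaBoost round: compute alpha = 0.5 * ln((1-err)/err),
    downweight correctly classified samples, upweight mistakes,
    then renormalise so the weights sum to 1."""
    alpha = 0.5 * math.log((1 - error) / error)
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha
```

After the update, the misclassified samples carry half the total weight, which is what forces the next base tree to focus on them.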

18.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute-subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built; the induced classifiers are then used for voting. This article compares the performance of AB with bagging and other algorithms on a hand-pose recognition data set, showing that AB gives consistently better results than bagging in both accuracy and stability. The voting performance of bagging and AB as a function of the attribute-subset size and the number of voters, for both weighted and unweighted voting, is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.
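The random feature-subset (projection) step of attribute bagging can be sketched as follows (an illustrative helper; the subset size would come from AB's size-selection stage):

```python
import random

def attribute_bags(n_features, subset_size, n_bags, seed=0):
    """Draw random feature subsets (without replacement within a bag),
    one per ensemble member -- the projection step of attribute bagging."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_bags)]
```

Each ensemble member would then be trained on the training set projected onto its feature subset.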

19.
A detailed comparison and assessment is presented of the performance of features extracted from space-borne interferometric SAR data and classified with different types of classifiers. Multi-seasonal ERS-1 and ERS-2 SAR data of the Czech Republic are used to classify the scene automatically into four land-cover classes. An exhaustive search is presented over the space of all possible feature subsets drawn from an overall set of 14 features taken from local statistics, fractal analysis, and co-occurrence matrices. Subset performance is evaluated both with the Jeffreys-Matusita distance in feature space and with classification performance measured on a validation set independent of the classifiers' training set. The classifiers investigated are maximum likelihood, fuzzy ARTMAP, and the multilayer perceptron. The exhaustive search shows which individual features are important or irrelevant depending on the classifier used. Furthermore, the best subsets contain only three to six features, thus decreasing overall computation time. Classifier performance is assessed with overall accuracy and tau statistics. The overall validation-set classification accuracy of 88.8% for the maximum-likelihood method and 91.35% for the multilayer perceptron is further improved to 90.9% by a simple Bayesian context classifier operating on class likelihoods, or to 93.2% when it operates on the multilayer perceptron outputs.
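The exhaustive feature-subset search described above can be sketched generically; the scoring function stands in for the Jeffreys-Matusita distance or validation accuracy, and is feasible only for small feature counts such as the 14 used here:

```python
from itertools import combinations

def best_subset(features, score):
    """Exhaustively score every non-empty feature subset and
    return the highest-scoring one (2**n - 1 candidates)."""
    best, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```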

20.
In this paper, we introduce a new adaptive rule-based classifier for multi-class classification of biological data that addresses several problems in classifying such data: overfitting, noisy instances, and class imbalance. Rules are a well-known, human-interpretable way of representing data. The proposed rule-based classifier combines the random subspace and boosting approaches with an ensemble of decision trees to construct a set of classification rules without global optimisation: the random subspace approach counters overfitting, boosting handles noisy instances, and the decision-tree ensemble deals with class imbalance. The classifier uses two popular classification techniques, decision trees and the k-nearest-neighbour algorithm. Decision trees evolve classification rules from the training data, while k-nearest-neighbour analyses the misclassified instances and removes vagueness between contradictory rules. The classifier runs a series of k iterations to develop the rule set, paying more attention in each iteration to the instances misclassified in the previous one, which gives it a boosting flavour. This paper focuses in particular on producing an optimal ensemble classifier that helps improve the prediction accuracy of DNA variant identification and classification. The performance of the proposed classifier is tested against well-established machine learning and data mining algorithms on genomic data (148 exome data sets) for Brugada syndrome and on 10 real benchmark life-sciences data sets from the UCI (University of California, Irvine) machine learning repository. The experimental results indicate that the proposed classifier achieves exemplary classification accuracy on different types of biological data, and it offers good prediction accuracy for new DNA variants where noisy and misclassified variants are handled to increase test performance.

