Similar literature
1.
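To address the large classification error caused by the high dimensionality and small sample size of biological omics data, an ensemble feature selection method based on constrained niching binary particle swarm optimization is proposed. The method uses binary particle swarm optimization to search for the feature subset with the highest classification accuracy, limits the number of selected features by constraining the number of set bits in the particle encoding, and adds the niching technique from multimodal optimization so that the algorithm obtains several mutually diverse feature subsets in a single run. Finally, ensemble learning combines the base classifiers built on these feature subsets into a strong classifier, which is then used to classify the data. Experimental results show that the method stably selects a small number of features on biological omics data while achieving good classification performance.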
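The cap on the number of set bits in a particle encoding can be illustrated with a small sketch. It is not the authors' implementation: the function name `constrain_bits` and the repair rule (keep the set bits whose velocity gives the highest sigmoid probability) are assumptions made purely for illustration.

```python
import math


def sigmoid(v):
    """Map a velocity to a probability, as is usual in binary PSO."""
    return 1.0 / (1.0 + math.exp(-v))


def constrain_bits(bits, velocities, max_features):
    """Clear surplus set bits so that at most max_features remain set.

    Hypothetical repair rule: among the set bits, keep those whose
    velocity yields the highest sigmoid probability.
    """
    selected = [i for i, b in enumerate(bits) if b == 1]
    if len(selected) <= max_features:
        return list(bits)
    ranked = sorted(selected, key=lambda i: sigmoid(velocities[i]), reverse=True)
    keep = set(ranked[:max_features])
    return [1 if i in keep else 0 for i in range(len(bits))]


# A particle that selects four of five features, capped at two.
particle = [1, 1, 0, 1, 1]
velocity = [0.2, 2.0, 1.5, -1.0, 0.9]
repaired = constrain_bits(particle, velocity, max_features=2)
```

Here feature 2 has a high velocity but is not set, so it is not considered; the repair step only removes features and never adds them.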

2.
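Ensemble learning based on randomized attribute selection and neighborhood covering reduction   Cited by: 2 (self-citations: 0, citations by others: 2)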
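Improving classification accuracy and reliability is the goal of classification modeling. To address the poor stability and low accuracy of current rule learning methods when applied to classification, this paper uses randomized neighborhood attribute reduction to search for a group of attribute subsets with high classification accuracy, and learns classification rules on each attribute subset with the neighborhood covering reduction method, yielding multiple rule sets. Finally, the class of an object is obtained by fusing the classification results of the different rule sets through simple voting. Experiments show that the ensemble learning method based on randomized neighborhood reduction performs better than or comparably to other related classifiers, and is more robust under noise perturbation.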
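The final fusion step, simple voting over the outputs of several rule sets, can be sketched as follows. The function name `majority_vote` and the tie-breaking rule (the label that appears first wins) are assumptions; the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# One prediction per rule set for a single object.
votes = ["A", "B", "A"]
label = majority_vote(votes)
```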

3.
Conventional hyperspectral image-based automatic target recognition (ATR) systems project high-dimensional reflectance signatures onto a lower dimensional subspace using techniques such as principal components analysis (PCA), Fisher's linear discriminant analysis (LDA), and stepwise LDA. Typically, these feature space projections are suboptimal. In a typical hyperspectral ATR setup, the number of training signatures (ground truth) is often less than the dimensionality of the signatures. Standard dimensionality reduction tools such as LDA and PCA cannot be applied in such situations. In this paper, we present a divide-and-conquer approach that addresses this problem for robust ATR. We partition the hyperspectral space into contiguous subspaces based on the optimization of a performance metric. We then make local classification decisions in every subspace using a multiclassifier system and employ a decision fusion system for making the final decision on the class label. In this work, we propose a metric that incorporates higher order statistical information for accurate partitioning of the hyperspectral space. We also propose an adaptive weight assignment method in the decision fusion process based on the strengths (as measured by the training accuracies) of individual classifiers that made the local decisions. The proposed methods are tested using hyperspectral data with known ground truth, such that the efficacy can be quantitatively measured in terms of target recognition accuracies. The proposed system was found to significantly outperform conventional approaches. For example, under moderate pixel mixing, the proposed approach resulted in classification accuracies around 90%, where traditional feature fusion resulted in accuracies around 65%.
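Decision fusion with weights derived from training accuracies can be sketched as below. The abstract does not give the weighting formula, so normalising each accuracy by the sum of all accuracies is an assumption, as are the function name `weighted_fusion` and the example labels.

```python
from collections import defaultdict


def weighted_fusion(local_decisions, training_accuracies):
    """Fuse per-subspace labels, weighting each vote by training accuracy.

    Assumed weighting: each classifier's weight is its training accuracy
    divided by the sum of all training accuracies.
    """
    total = sum(training_accuracies)
    scores = defaultdict(float)
    for label, accuracy in zip(local_decisions, training_accuracies):
        scores[label] += accuracy / total
    return max(scores, key=scores.get)


# Three subspace classifiers; the single strong one outweighs two weak ones.
decisions = ["target", "clutter", "clutter"]
accuracies = [0.95, 0.55, 0.35]
fused = weighted_fusion(decisions, accuracies)
```

With these made-up numbers an unweighted majority vote would return "clutter", which shows how the weighting changes the outcome.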

4.
An adaptive ensemble method based on random subspace and AdaBoost   Cited by: 4 (self-citations: 0, citations by others: 4)
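Constructing base classifiers that are both diverse and accurate is the key issue in ensemble learning. To this end, a new ensemble learning method is proposed: PSO is used to find the optimal feature weight distribution that minimizes the classification error rate on the data sets that AdaBoost draws according to sample weights; features are then randomly sampled according to this optimal weight distribution to generate random subspaces, which are applied in the AdaBoost training process. This increases the diversity among classifiers while preserving the accuracy of the base classifiers. Finally, majority voting fuses the decisions of the base classifiers, and simulation experiments verify the effectiveness of the method.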
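Generating a random subspace by sampling features according to a weight distribution can be sketched as follows. This covers only the sampling step, not the PSO search or AdaBoost training. The function name `sample_subspace`, sampling without replacement, and the example weights are assumptions.

```python
import random


def sample_subspace(feature_weights, subspace_size, rng):
    """Draw distinct feature indices with probability proportional to weight.

    Assumes at least subspace_size features have a positive weight.
    """
    remaining = list(range(len(feature_weights)))
    weights = list(feature_weights)
    chosen = []
    for _ in range(subspace_size):
        # Pick a position among the remaining features, then remove it
        # so the same feature cannot be drawn twice.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
subspace = sample_subspace([0.4, 0.3, 0.2, 0.1, 0.0], subspace_size=3, rng=rng)
```

Each base classifier would be trained on a different draw, so features with larger weights appear in more subspaces while the draws still differ from one another.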

5.
Jie WANG, Lili YANG, Min YANG. Journal on Communications (《通信学报》), 2018, 39(10): 155-165
A malicious network traffic detection method based on a multi-level distributed ensemble classifier is proposed. It addresses two problems of attack detection in the current network big data environment: attack models are trained inaccurately because samples of some attack steps are missing, and existing ensemble classifiers are deficient in constructing multi-level classifiers. The dataset is first preprocessed and aggregated into clusters, noise processing is performed on each cluster, and a multi-level distributed ensemble classifier, MLDE, is then built to detect malicious network traffic. In the MLDE framework, base classifiers are used at the bottom level, while different ensemble classifiers are used at the non-bottom levels. The framework is simple to build; it processes big data sets concurrently and adjusts the size of the ensemble classifier according to the size of the data sets. The experimental results show that the AUC value can reach 0.999 when MLDE uses random forest in the first (base) layer, bagging in the second layer, and AdaBoost in the third layer.

6.
A system for a regular updating of land-cover maps is proposed that is based on the use of multitemporal remote sensing images. Such a system is able to address the updating problem under the realistic but critical constraint that, for the image to be classified (i.e., the most recent of the considered multitemporal dataset) no ground truth information is available. The system is composed of an ensemble of partially unsupervised classifiers integrated in a multiple-classifier architecture. Each classifier of the ensemble exhibits the following novel characteristics: (1) it is developed in the framework of the cascade-classification approach to exploit the temporal correlation existing between images acquired at different times in the considered area; and (2) it is based on a partially unsupervised methodology capable of accomplishing the classification process under the aforementioned critical constraint. Both a parametric maximum-likelihood (ML) classification approach and a nonparametric radial basis function (RBF) neural-network classification approach are used as basic methods for the development of partially unsupervised cascade classifiers. In addition, in order to generate an effective ensemble of classification algorithms, hybrid ML and RBF neural-network cascade classifiers are defined by exploiting the characteristics of the cascade-classification methodology. The results yielded by the different classifiers are combined by using standard unsupervised combination strategies. This allows the definition of a robust and accurate partially unsupervised classification system capable of analyzing a wide typology of remote sensing data (e.g., images acquired by passive sensors, synthetic aperture radar images, and multisensor and multisource data). Experimental results obtained on a real multitemporal and multisource dataset confirm the effectiveness of the proposed system.

7.
Xuesong WANG, Yang GAO, Yuhu CHENG. Acta Electronica Sinica (《电子学报》), 2011, 39(8): 1746-1750
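For the classification of high-dimensional, small-sample data, a support vector machine based on random subspace and orthogonal locality preserving projection is proposed. The random subspace method repeatedly samples the feature space of the original high-dimensional samples at random, generating multiple base support vector machine (SVM) classifiers with different feature subsets; orthogonal locality preserving projection then extracts features from the samples of each base SVM classifier to reduce dimensionality; next, the dimension-reduced samples are used to train each base SVM classifier...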

8.
In this paper we propose a strategy to create ensembles of classifiers based on unsupervised feature selection. It takes into account a hierarchical multi-objective genetic algorithm that generates a set of classifiers by performing feature selection and then combines them to provide a set of powerful ensembles. The proposed method is evaluated in the context of handwritten month word recognition, using three different feature sets and Hidden Markov Models as classifiers. Comprehensive experiments demonstrate the effectiveness of the proposed strategy.

9.
Ting TUO, Huifang MA, Zhixin LI, Weizhong ZHAO. Acta Electronica Sinica (《电子学报》), 2020, 48(11): 2131-2137
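To address the feature sparsity of short texts, a short text classification method based on entropy-weight-constrained sparse representation is proposed. Because the initial dictionary has a high dimensionality, the Word2vec tool is first used to represent the words in the dictionary as word vectors, and the original dictionary is then reduced in dimension according to the weighted average of the vectors. Second, a fast feature subset selection algorithm removes irrelevant and redundant short texts from the dictionary, giving a filtered dictionary. Third, based on sparse representation theory, an entropy-weight-constrained sparse representation is designed for the objective function on the filtered dictionary, and the method of Lagrange multipliers is used to find the optimum of the objective function, yielding a subspace for each class. Finally, in the learned subspaces, the distance between the short text to be classified and the short texts of each class is computed, and the short text is classified according to three classification rules. Extensive experiments on real data sets show that the proposed method effectively alleviates the feature sparsity of short texts and outperforms existing short text classification methods.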

10.
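The multi-dimensional Bayesian classifier is a probabilistic graphical model for multi-dimensional classification problems, in which the attribute variables can determine one or more class variables. To address the high dimensionality and information redundancy of the attribute variables, the FastICA algorithm is used to reduce their dimension, so that the high-dimensional attribute variables are reduced to low-dimensional ones that still describe the data fairly completely. A multi-dimensional Bayesian classifier is then built on the reduced attribute variables; finally, theoretical analysis indicates that the ICA-based multi-dimensional Bayesian classifier performs well. Experimental results show that, on three benchmark data sets, the ICA-based multi-dimensional Bayesian classifier achieves higher classification accuracy than other algorithms.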

11.
The explosion of DNA and protein sequence data in public and private databases has been encouraging interdisciplinary research on biology and information technology. Gene expression profiles are just sequences of numbers, and the need for tools that analyze them to extract useful information has risen significantly. In order to predict the cancer class of patients from the gene expression profile, this paper presents a classification framework that combines a pair of classifiers trained with mutually exclusive features. The idea behind feature selection with nonoverlapping correlation is to encourage the classifier ensemble, which consists of multiple classifiers, to learn different aspects of the training data, so that the classifiers can search in a wide solution space. Experimental results show that the classifier ensemble produces higher recognition accuracy than conventional classifiers.

12.
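Deep learning has brought large performance gains to SAR image target recognition, but its ability to adapt to changes in local parts of vehicle targets in practical applications still needs to be strengthened. Using prior knowledge intrinsic to the data to learn the low-dimensional subspace structure within high-dimensional semantic features can improve the generalization of classification models under vehicle target variants. Based on the sparsity of target features, this paper proposes a SAR target recognition method in which a sparse prior guides the learning of a Convolution Neural Network (CNN), called CNN-TDDL. First, the method uses a CNN to extract high-dimensional semantic features of SAR image targets. Second, a sparse-prior guidance module exploits feature sparsity to learn the intrinsic low-dimensional subspace structure of the target features. A Task-Driven Dictionary Learning (TDDL) layer, driven by the classification task, represents the low-dimensional subspace of the target features as sparse codes, and non-negative elastic net regularization improves the stability of the sparse codes, so that they not only represent the low-dimensional subspace structure of the targets effectively but also extract more discriminative class features. Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set and the Synthetic and Measured Paired and Labeled Experiment (SAMPLE) data set show that, compared with traditional dictionary learning methods and typical deep learning methods, CNN-TDDL improves recognition accuracy under the MSTAR Standard Operating Conditions (SOC) by 0.85% to 5.28% and improves model-type recognition accuracy by more than 3.97%, showing better generalization. Feature visualization analysis shows that the sparse-prior guidance module significantly improves the separability of the feature representations of targets from different classes.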

13.
Statistical classification of hyperspectral data is challenging because the inputs are high in dimension and represent multiple classes that are sometimes quite mixed, while the amount and quality of ground truth in the form of labeled data is typically limited. The resulting classifiers are often unstable and have poor generalization. This work investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier system, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited. A new classifier is proposed that incorporates bagging of training samples and adaptive random subspace feature selection within a binary hierarchical classifier (BHC), such that the number of features that is selected at each node of the tree is dependent on the quantity of associated training data. Results are compared to a random forest implementation based on the framework of classification and regression trees. For both methods, classification results obtained from experiments on data acquired by the National Aeronautics and Space Administration (NASA) Airborne Visible/Infrared Imaging Spectrometer instrument over the Kennedy Space Center, Florida, and by Hyperion on the NASA Earth Observing 1 satellite over the Okavango Delta of Botswana are superior to those from the original best basis BHC algorithm and a random subspace extension of the BHC.

14.
In this paper, an empirical study of the development and application of a committee of neural networks on online pattern classification tasks is presented. A multiple classifier framework is designed by adopting an Adaptive Resonance Theory-based (ART) autonomously learning neural network as the building block. A number of algorithms for combining outputs from multiple neural classifiers are considered, and two benchmark data sets have been used to evaluate the applicability of the proposed system. Different learning strategies coupling offline and online learning approaches, as well as different input pattern representation schemes, including the "ensemble" and "modular" methods, have been examined experimentally. Benefits and shortcomings of each approach are systematically analyzed and discussed. The results are comparable, and in some cases superior, to those from other classification algorithms. The experiments demonstrate the potential of the proposed multiple neural network systems in offering an alternative to handle online pattern classification tasks in possibly nonstationary environments.

15.
In remotely sensed hyperspectral imagery, many samples are collected on a given flight and many variable factors contribute to the distribution of samples. Various factors transform spectral responses, causing them to appear differently in different contexts. We develop a method that infers context via spectra population distribution analysis. In this manner, feature space orientations of sets of spectral signatures are characterized using random set models. The models allow for the characterization of complex and irregular patterns in a feature space. The developed random set framework for context-based classification applies context-specific classifiers in an ensemble-like manner, and aggregates their decisions based on their contextual relevance to the spectra under test. Results indicate that the proposed method improves classification accuracy over similar classifiers, which make no use of contextual information, and performs well when compared to similar context-based approaches.

16.
This paper presents a distributed coevolutionary classifier (DCC) for extracting comprehensible rules in data mining. It allows different species to be evolved cooperatively and simultaneously, while the computational workload is shared among multiple computers over the Internet. Through the intercommunications among different species of rules and rule sets in a distributed manner, the concurrent processing and computational speed of the coevolutionary classifiers are enhanced. The advantage and performance of the proposed DCC are validated upon various datasets obtained from the UCI machine learning repository. It is shown that the predictive accuracy of DCC is robust and the computation time is reduced as the number of remote engines increases. Comparison results illustrate that the DCC produces good classification rules for the datasets, which are competitive as compared to existing classifiers in the literature.

17.
Sen XU, Tian ZHOU, Hualong YU, Xianfeng LI. Acta Electronica Sinica (《电子学报》), 2013, 41(6): 1219-1224
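The cluster ensemble problem is first cast as the intuitive problem of finding a best subspace. Using linear algebra, the problem is then formulated as a constrained optimization problem, which is further reduced to a low-rank matrix approximation problem by relaxing the discrete constraints. Finally, an orthonormal basis of the best subspace is obtained by solving the singular value decomposition of the weighted adjacency matrix of the hypergraph. On this basis, an algorithm based on low-rank matrix approximation is designed; it clusters the objects with the K-means algorithm according to their coordinates in the low-dimensional space to obtain the final result. Experiments on several benchmark data sets show that, compared with traditional cluster ensemble algorithms, the proposed algorithm obtains better clustering results with high efficiency.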
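A hypergraph view of a cluster ensemble is commonly built by treating every cluster of every base clustering as one hyperedge. The sketch below constructs only the unweighted binary incidence matrix of such a hypergraph. The abstract's weighting scheme, the singular value decomposition, and the K-means step are omitted, and the function name `hypergraph_incidence` is an assumption.

```python
def hypergraph_incidence(clusterings):
    """Build a binary objects-by-hyperedges matrix from base clusterings.

    Each base clustering is a list of cluster labels, one per object.
    Every cluster of every clustering contributes one column (hyperedge).
    """
    n_objects = len(clusterings[0])
    columns = []
    for labels in clusterings:
        for cluster_id in sorted(set(labels)):
            columns.append([1 if lab == cluster_id else 0 for lab in labels])
    # Transpose so that rows correspond to objects.
    return [[col[i] for col in columns] for i in range(n_objects)]


# Two base clusterings of four objects.
base_clusterings = [[0, 0, 1, 1], [0, 1, 1, 1]]
H = hypergraph_incidence(base_clusterings)
```

Each row contains exactly one 1 per base clustering, so objects that are grouped together in every clustering (the last two here) get identical rows.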

18.
Wei ZHANG, Lan DU. Journal of Electronics & Information Technology (《电子与信息学报》), 2021, 43(5): 1219-1227
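One-class classification distinguishes samples of the target class from all non-target samples. Traditional one-class methods build a single classifier for all training samples and ignore the intrinsic structure of the data, so their performance degrades severely when the sample distribution is complex. To improve classification under complex distributions, this paper proposes an ensemble Beta process max-margin one-class method. The method clusters the training samples with a Dirichlet process mixture (DPM) model while learning a Beta process max-margin one-class classifier in each cluster. Combining multiple classifiers yields a classifier with stronger descriptive power and improves classification under complex distributions. The DPM clustering model and the Beta process max-margin one-class classifiers are jointly optimized in a single Bayesian framework, which ensures the separability of the samples in each cluster. In addition, feature selection factors with a Beta process prior are added to the Beta process max-margin one-class classifier, which reduces feature redundancy and improves classification. Experimental results on simulated data, public data sets, and measured SAR image data demonstrate the effectiveness of the proposed method.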

19.
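Classifier combination can improve pattern recognition performance and has attracted wide attention from researchers in the field. Achieving diversity among member classifiers is the main means of improving the generalization ability of a classifier combination. Starting from how member classifiers are generated, this paper reviews methods for achieving diversity among member classifiers and techniques for measuring it, and proposes a technical approach to training diverse member classifiers.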

20.
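To address the unsatisfactory results of applying traditional ensemble learning methods directly to one-class classifiers, this paper first proves that ensemble learning can improve the performance of one-class classifiers, and also proves that using the set of base classifiers without selection degrades performance after combination. It then points out that classical ensemble methods, when applied directly to one-class classifier ensembles, suffer from a severe lack of diversity among base classifiers, and proposes a hybrid generation strategy for base one-class classifiers that increases diversity. Finally, the loss function of the one-class classifier ensemble is decomposed according to the components of the ensemble loss, a pruning strategy for one-class classifier ensembles is constructed accordingly, and a one-class classifier ensemble algorithm based on hybrid diversity generation and pruning, abbreviated PHD-EOC, is proposed. Experimental results on UCI standard data sets and a malware behavior detection data set show that PHD-EOC balances diversity with one-class classification performance, outperforms classical ensemble learning methods on various one-class classifier evaluation metrics, and reduces the time complexity of the decision stage.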
