Similar Literature
 20 similar records found.
1.
To address the difficulty of selecting features across different bearing datasets and the low accuracy of single-classifier methods in rolling bearing fault diagnosis, a random forest fault diagnosis algorithm based on classification and regression trees (CART) is proposed. A random forest is an ensemble learning method that combines multiple classifiers, and this "ensemble" idea is used to improve the accuracy of rolling bearing fault diagnosis. Time-domain statistical indicators are extracted from the vibration signals of rolling bearings and used as feature vectors, and a random forest then diagnoses the bearing faults. On bearing data from the SQI-MFS experimental platform, the random forest achieves higher diagnostic accuracy than traditional classifiers (SVM, kNN, and ANN) and than a single classification and regression tree.
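To make the pipeline concrete, the following is a minimal sketch of the idea, assuming synthetic vibration windows and a generic set of time-domain indicators rather than the SQI-MFS bearing data; it trains a CART-based random forest on the extracted statistics.

    # Sketch: time-domain statistics as features, random forest as the classifier.
    # Synthetic signals and the chosen statistics are illustrative assumptions.
    import numpy as np
    from scipy.stats import kurtosis, skew
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def time_domain_features(window):
        """Common time-domain statistical indicators of one vibration window."""
        rms = np.sqrt(np.mean(window ** 2))
        return [
            np.mean(window), np.std(window), rms,
            np.max(np.abs(window)) / rms,      # crest factor
            kurtosis(window), skew(window),
        ]

    rng = np.random.default_rng(0)
    # Hypothetical 2-class data: "normal" vs. "faulty" windows of 1024 samples,
    # where faulty windows carry sparse impulses.
    normal = rng.normal(0.0, 1.0, size=(200, 1024))
    faulty = rng.normal(0.0, 1.0, size=(200, 1024)) + rng.choice([0, 3], p=[0.98, 0.02], size=(200, 1024))
    X = np.array([time_domain_features(w) for w in np.vstack([normal, faulty])])
    y = np.array([0] * 200 + [1] * 200)

    # A random forest is an ensemble of CART trees built on bootstrap samples
    # with random feature sub-sampling at each split.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    print("RF cross-val accuracy:", cross_val_score(rf, X, y, cv=5).mean())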

2.
In traditional ensemble classification algorithms, the ensemble size is usually fixed, which can lead to low classification accuracy. To address this problem, an accuracy-climbing ensemble classification algorithm (C-ECA) is proposed. Instead of replacing the same number of worst-performing base classifiers with new ones, C-ECA updates the base classifiers based on accuracy and then determines the optimal ensemble size. Building on C-ECA, a climbing-based dynamically weighted ensemble classification algorithm (C-DWECA) is further proposed. It introduces a weighting function that obtains the best weight for each base classifier when the classifiers are trained on data streams with different characteristics, thereby improving ensemble performance. Finally, the Fast Hoeffding Drift Detection Method (FHDDM) is adopted to detect concept drift earlier and raise the final accuracy. Experimental results show that C-DWECA reaches an accuracy of up to 97.44%, that its average accuracy is about 40% higher than that of the adaptive diversity online boosting (ADOB) algorithm, and that it also outperforms leveraging bagging (LevBag), adaptive random forest (ARF), and other baselines.
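The following is a rough, self-contained sketch of the accuracy-based weighting idea on a simulated chunked stream; the chunking, the prequential weighting rule, and the synthetic data are assumptions made for illustration, and the authors' exact weighting function and the FHDDM drift detector are not reproduced.

    # Sketch: accuracy-weighted chunk ensemble on a simulated data stream.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=6000, n_features=20, random_state=1)
    chunks = np.array_split(np.arange(len(y)), 12)    # simulate a stream of chunks

    ensemble, weights = [], []
    for idx in chunks[:-1]:
        # Prequential update: score existing members on the incoming chunk first,
        # so poor members get small weight instead of being replaced outright...
        weights = [m.score(X[idx], y[idx]) for m in ensemble]
        # ...then train a new member on that chunk and give it a neutral weight.
        clf = DecisionTreeClassifier(max_depth=5).fit(X[idx], y[idx])
        ensemble.append(clf)
        weights.append(1.0)

    # Weighted majority vote on the held-out final chunk.
    test = chunks[-1]
    votes = np.zeros((len(test), 2))
    for w, m in zip(weights, ensemble):
        votes[np.arange(len(test)), m.predict(X[test])] += w
    print("weighted-vote accuracy:", np.mean(votes.argmax(axis=1) == y[test]))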

3.
Because pulmonary nodule lesion data are diverse and heterogeneous, a Dynamic Multiple Classifiers Selection (DMCS) ensemble algorithm is proposed. The feature space is randomly partitioned into several feature subsets; since the sample distribution differs across subsets, a suitable base classifier is selected for each subset, and the selected classifiers are then combined by ensemble learning. Experiments show that the algorithm is more stable and detects lesions better than representative pulmonary nodule classification algorithms.
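A small sketch of the selection idea, under simplifying assumptions: disjoint random feature subsets, per-subset classifier selection by cross-validated accuracy, a plain majority vote, and a public dataset instead of the pulmonary nodule data.

    # Sketch: dynamic selection of a base classifier per random feature subset.
    import numpy as np
    from sklearn.base import clone
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    subsets = np.array_split(rng.permutation(X.shape[1]), 3)   # random feature subsets
    candidates = [GaussianNB(), SVC(), DecisionTreeClassifier(max_depth=4)]

    selected = []
    for cols in subsets:
        # Pick the candidate classifier that cross-validates best on this subset.
        best = max(candidates, key=lambda c: cross_val_score(c, Xtr[:, cols], ytr, cv=5).mean())
        selected.append((cols, clone(best).fit(Xtr[:, cols], ytr)))

    # Majority vote of the per-subset classifiers.
    preds = np.array([clf.predict(Xte[:, cols]) for cols, clf in selected])
    vote = (preds.mean(axis=0) >= 0.5).astype(int)
    print("ensemble accuracy:", np.mean(vote == yte))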

4.
SVM selective ensemble algorithm based on an improved binary particle swarm optimizer
The SVM selective ensemble based on binary particle swarm optimization (BPSO) suffers from low classification accuracy and selects too many classifiers. To address these problems, an SVM selective ensemble based on an improved binary particle swarm optimizer (IBPSO) is proposed by combining IBPSO with the SVM selective ensemble. With an appropriate fitness function and tuning factor k, repeated simulations show that, for SVM pools generated by bootstrap sampling, the IBPSO-based selective ensemble outperforms the BPSO-based one in both accuracy and the number of selected classifiers, demonstrating the superiority of IBPSO.

5.
On the Google Earth Engine (GEE) cloud computing platform, Sentinel-2 imagery, WorldClim bioclimatic data, SRTM terrain data, and forest management inventory data were combined, and random forest (RF), support vector machine (SVM), and maximum entropy (MaxEnt) were used as component classifiers to study dominant tree species classification with multi-source features and multi-classifier decision fusion. Two serial ensembles and three Bayesian parallel ensembles were built from the three component classifiers to map the spatial distribution of ten major dominant tree species in the Shangri-La region of Yunnan. The results show that the overall accuracy of each component classifier is below 67.17%; the three parallel ensembles achieve comparable overall accuracy of about 72%; and the two serial ensembles exceed 78.48%, with the MaxEnt-SVM serial ensemble achieving the best accuracy (OA: 80.66%, Kappa: 0.78), at least 13.49% higher than the component classifiers. The study indicates that decision fusion yields higher accuracy for dominant tree species classification than the component classifiers, effectively improves accuracy for species with few samples, and can be applied to dominant tree species mapping over large mountainous areas.

6.
The traditional CHI (chi-square) algorithm ignores term frequency, so important feature terms are easily missed. Exploiting the fact that filter methods are fast and wrapper methods are accurate, a feature selection algorithm is proposed that combines an improved CHI (TDF-CHI) with random forest feature selection (RFFS). TDF-CHI first measures how strongly each term's document frequency and term frequency correlate with the class to select features and remove redundant ones; RFFS then measures the importance of the remaining features in a second selection stage, optimizing the feature set and further improving classifier performance. To verify the improvement, the algorithm was tested on news text with commonly used classifiers. Experiments show that the selected features yield better classification than those of traditional CHI, improving the classifier's accuracy and recall.
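A sketch of the two-stage selection on a public news corpus, using scikit-learn's standard chi-square filter as a stand-in for the authors' TDF-CHI variant and random-forest importances for the second stage; the subset sizes (1000 and 200 terms) are illustrative assumptions.

    # Sketch: chi-square filter, then random-forest importance, then a classifier.
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
    X = CountVectorizer(max_features=5000).fit_transform(data.data)
    y = data.target

    # Stage 1: chi-square filter keeps the 1000 terms most correlated with the class.
    X1 = SelectKBest(chi2, k=1000).fit_transform(X, y)

    # Stage 2: rank the surviving terms by random-forest importance, keep the top 200.
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X1, y)
    top = np.argsort(rf.feature_importances_)[-200:]
    X2 = X1[:, top]

    clf = LogisticRegression(max_iter=1000)
    print("accuracy after two-stage selection:", cross_val_score(clf, X2, y, cv=5).mean())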

7.
To improve the ensemble classification accuracy of decision trees, a rotation forest ensemble algorithm based on feature transformation is introduced. The attribute set is randomly split into subsets, and principal component analysis is applied to a drawn sub-sample of the data on each attribute subset to construct new training data, which increases the diversity of the base classifiers and the prediction accuracy. On the Weka platform, Bagging, AdaBoost, and rotation forest were compared as ensembles of pruned and unpruned J48 decision trees, using the average accuracy of ten runs of 10-fold cross-validation. The results show that rotation forest predicts more accurately than the other two algorithms, confirming that it is an effective ensemble method for decision tree classifiers.
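A compact sketch of the rotation-forest construction described above (random attribute subsets, per-subset PCA to form a block-diagonal rotation, one tree per rotation, majority vote); it simplifies the full algorithm and uses a public dataset rather than the Weka/J48 setup.

    # Sketch: rotation forest with per-subset PCA rotations and a tree per rotation.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    rng = np.random.default_rng(0)

    def build_rotation(Xtr):
        """Block-diagonal rotation: PCA fitted on a sample of each random attribute subset."""
        n = Xtr.shape[1]
        R = np.zeros((n, n))
        for cols in np.array_split(rng.permutation(n), 3):
            sample = Xtr[rng.choice(len(Xtr), size=int(0.75 * len(Xtr)), replace=False)]
            pca = PCA().fit(sample[:, cols])
            R[np.ix_(cols, cols)] = pca.components_.T
        return R

    forest = []
    for _ in range(10):
        R = build_rotation(Xtr)
        tree = DecisionTreeClassifier().fit(Xtr @ R, ytr)
        forest.append((R, tree))

    votes = np.array([tree.predict(Xte @ R) for R, tree in forest])
    pred = (votes.mean(axis=0) >= 0.5).astype(int)
    print("rotation-forest accuracy:", np.mean(pred == yte))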

8.
To handle the mild class imbalance in web spam detection, three random undersampling ensemble classifiers are proposed: single undersampling without replacement (RUS-once), repeated undersampling without replacement (RUS-multiple), and undersampling with replacement (RUS-replacement). One of the undersampling techniques first converts the training set into balanced sample sets; a classification and regression tree (CART) is then trained on each balanced set; finally, the trees are combined by simple voting to classify the test samples. Experiments show that all three ensembles perform well, with RUS-multiple and RUS-replacement outperforming RUS-once. Compared with CART and its Bagging and AdaBoost ensembles, RUS-multiple and RUS-replacement improve AUC by about 10% on the WEBSPAM UK-2006 dataset and by about 25% on the WEBSPAM UK-2007 dataset; compared with the best published results, they achieve the best AUC.
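A sketch of the RUS-multiple variant under stated assumptions: the web-spam features are replaced by a mildly imbalanced synthetic dataset, the majority class is undersampled without replacement several times, a CART tree is trained per balanced subset, and the trees vote.

    # Sketch: repeated random undersampling + CART + simple majority vote.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Mildly imbalanced synthetic stand-in for the web-spam data (80/20 split).
    X, y = make_classification(n_samples=4000, n_features=20, weights=[0.8, 0.2], random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

    rng = np.random.default_rng(0)
    minority = np.flatnonzero(ytr == 1)
    majority = np.flatnonzero(ytr == 0)

    trees = []
    for _ in range(11):                                   # odd count avoids vote ties
        picked = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([picked, minority])
        trees.append(DecisionTreeClassifier().fit(Xtr[idx], ytr[idx]))   # CART member

    votes = np.array([t.predict(Xte) for t in trees])
    pred = (votes.mean(axis=0) >= 0.5).astype(int)        # simple majority vote
    print("RUS ensemble accuracy:", np.mean(pred == yte))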

9.
To explore the potential of airborne point clouds and UAV visible-light imagery for tree species recognition and classification, a single-tree-scale species classification method with hybrid fusion of multimodal features and decisions is proposed. Features are first selected with the Kendall rank correlation coefficient and permutation importance (PI), and point-cloud and image features are fused with an efficient low-rank multimodal fusion (LMF) algorithm. Ensemble learning is then introduced: the point-cloud, image, and fused features are fed to three base classifiers in a stacking ensemble, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and random forest (RF), and a naive Bayes meta-classifier performs the final decision fusion. Experiments show that the proposed method reaches 99.4% accuracy on an independent test set, 22.58% higher than a traditional random forest classifier with concatenated features, with the Kappa coefficient improved by 0.2854. Comparison with a convolutional neural network (CNN) shows that, under small-sample training, the proposed algorithm's advantage is …

10.
李剑  江成顺  董丽英 《计算机工程》2010,36(13):180-182
A classification method for voice and voiceband data signals based on a selective ensemble of support vector machines is proposed. Following a diversity definition for ensemble algorithms, a two-layer cascaded dynamic stacking algorithm produces the decision output. During training, the method accurately selects member classifiers with high recognition accuracy and high diversity; during testing, it dynamically combines the member classifiers so that the final classification result is optimal. Feature vectors combining time-domain and frequency-domain information are constructed and provide good noise robustness. Experimental results show that the method performs well in both classification and computational complexity.

11.
Fully polarimetric synthetic aperture radar (PolSAR) Earth observations have shown great potential for mapping and monitoring agro-environmental systems. Numerous polarimetric features can be extracted from these complex observations, which may improve the accuracy of land-cover classification and object characterization. This article employed two well-known decision tree ensembles, i.e. bagged tree (BT) and random forest (RF), for land-cover mapping from PolSAR imagery. Moreover, two fast modified decision tree ensembles were proposed, namely balanced filter-based forest (BFF) and cost-sensitive filter-based forest (CFF). These algorithms, designed on the idea of RF, use fast filter feature selection and two extended majority voting schemes, and they embed solutions to the imbalanced data problem into their structures. Three different PolSAR datasets with imbalanced data were used to evaluate the efficiency of the proposed algorithms. The results indicated that all the tree ensembles are more efficient and reliable than an individual decision tree. Moreover, both proposed tree ensembles obtained higher mean overall accuracy (0.5–14% higher), producer's accuracy (0.5–10% higher), and user's accuracy (0.5–9% higher) than the classical tree ensembles, i.e. BT and RF. They were also much faster (e.g. 2–10 times) and more stable than their competitors on these three datasets. In addition, unlike BT and RF, which obtained higher accuracy only in large ensembles (i.e. with many decision trees), BFF and CFF can also be efficient and reliable in smaller ensembles. Furthermore, the extended majority voting techniques outperformed classical majority voting for decision fusion.

12.
Ensemble methods aim at combining multiple learning machines to improve the efficacy in a learning task in terms of prediction accuracy, scalability, and other measures. These methods have been applied to evolutionary machine learning techniques including learning classifier systems (LCSs). In this article, we first propose a conceptual framework that allows us to appropriately categorize ensemble-based methods for fair comparison and highlights the gaps in the corresponding literature. The framework is generic and consists of three sequential stages: a pre-gate stage concerned with data preparation; the member stage to account for the types of learning machines used to build the ensemble; and a post-gate stage concerned with the methods to combine ensemble output. A taxonomy of LCS-based ensembles is then presented using this framework. The article then focuses on comparing LCS ensembles that use feature selection in the pre-gate stage. An evaluation methodology is proposed to systematically analyze the performance of these methods. Specifically, random feature sampling and rough set feature selection-based LCS ensemble methods are compared. Experimental results show that the rough set-based approach performs significantly better than the random subspace method in terms of classification accuracy in problems with high numbers of irrelevant features. The performance of the two approaches is comparable in problems with high numbers of redundant features.

13.
In general, the analysis of microarray data requires two steps: feature selection and classification. From a variety of feature selection methods and classifiers, it is difficult to find optimal ensembles composed of any feature-classifier pairs. This paper proposes a novel method based on the evolutionary algorithm (EA) to form sophisticated ensembles of features and classifiers that can be used to obtain high classification performance. In spite of the exponential number of possible ensembles of individual feature-classifier pairs, an EA can produce the best ensemble in a reasonable amount of time. The chromosome is encoded with real values to decide the weight for each feature-classifier pair in an ensemble. Experimental results with two well-known microarray datasets in terms of time and classification rate indicate that the proposed method produces ensembles that are superior to individual classifiers, as well as other ensembles optimized by random and greedy strategies.

14.
Objective: Patch-level random-valued impulse noise (RVIN) denoising algorithms based on convolutional neural networks (CNN) are much more efficient than classical pixel-by-pixel switching denoisers, but their denoising quality depends on accurately estimating how heavily the image is contaminated (the noise ratio). A two-stage noise ratio prediction algorithm based on a multilayer perceptron is therefore proposed, so that the appropriate pretrained CNN denoising model can be invoked adaptively to obtain the best result. Method: First, RVIN noise at various ratios is added to a large set of clean images to build a noisy image collection. Second, using a visual codebook with soft-assignment coding, feature values that reflect the degree of impulse noise contamination are extracted and screened to form feature vectors. Third, the feature vectors and their corresponding noise ratios are used as the inputs and outputs of a multilayer perceptron to train a noise ratio prediction model, mapping feature vectors to noise ratios. Finally, a coarse-to-fine two-stage strategy further improves the prediction accuracy. Results: On images distorted with different RVIN ratios, the algorithm is evaluated in terms of prediction accuracy, actual denoising quality, and efficiency. Experiments show that the prediction error is below 2% for most noise ratios, the denoising quality (PSNR) is about 2~4 dB higher than that of other mainstream denoising algorithms, and processing a 512×512 image takes only about 3 s. Conclusion: The proposed RVIN noise ratio prediction algorithm is robustly accurate across noise ratios and clearly outperforms classical switching RVIN denoisers in both denoising quality and efficiency, making it more practical.
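A toy sketch of the regression stage only: hand-crafted residual statistics stand in for the visual-codebook features, and a multilayer perceptron regresses the RVIN noise ratio from them; the coarse-to-fine second stage and the CNN denoising models are omitted, and the smooth synthetic images are assumptions.

    # Sketch: predict the RVIN noise ratio of an image from simple features with an MLP.
    import numpy as np
    from scipy.ndimage import gaussian_filter, median_filter
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    def add_rvin(img, ratio):
        """Replace a `ratio` fraction of pixels with random values in [0, 255]."""
        noisy = img.copy()
        mask = rng.random(img.shape) < ratio
        noisy[mask] = rng.integers(0, 256, size=mask.sum())
        return noisy

    def features(img):
        """Statistics of the residual against a median-filtered image (assumed proxy features)."""
        resid = np.abs(img - median_filter(img, size=3))
        return [resid.mean(), resid.std(), (resid > 40).mean(), (resid > 80).mean()]

    X, y = [], []
    for _ in range(400):
        clean = gaussian_filter(rng.random((64, 64)) * 255, sigma=3)   # smooth stand-in "image"
        ratio = rng.uniform(0.05, 0.5)
        X.append(features(add_rvin(clean, ratio)))
        y.append(ratio)

    Xtr, Xte, ytr, yte = train_test_split(np.array(X), np.array(y), random_state=0)
    mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0).fit(Xtr, ytr)
    print("mean absolute error of predicted noise ratio:", np.abs(mlp.predict(Xte) - yte).mean())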

15.
Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Various techniques for label noise identification have therefore been investigated in the literature. The bias of each technique defines how suitable it is for each dataset. Besides, while some techniques identify a large number of examples as noisy and have a high false positive rate, others are very restrictive and therefore not able to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, individual and ensembles, are experimentally compared. Another concern in this paper is the computational cost of ensembles, since, for a particular dataset, an individual technique can have the same predictive performance as an ensemble; in this case the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend, for a new dataset, the best filter. An extensive experimental evaluation of individual filters, ensemble filters and meta-learning was performed using public datasets with injected label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.

16.
In classification, noise may deteriorate the system performance and increase the complexity of the models built. In order to mitigate its consequences, several approaches have been proposed in the literature. Among them, noise filtering, which removes noisy examples from the training data, is one of the most used techniques. This paper proposes a new noise filtering method that combines several filtering strategies in order to increase the accuracy of the classification algorithms used after the filtering process. The filtering is based on the fusion of the predictions of several classifiers used to detect the presence of noise. We translate the idea behind multiple classifier systems, where the information gathered from different models is combined, to noise filtering; in this way, we use a combination of classifiers, instead of a single one, to detect noise. Additionally, the proposed method follows an iterative noise filtering scheme that avoids using detected noisy examples in each new iteration of the filtering process. Finally, we introduce a noisy score to control the filtering sensitivity, so that the number of noisy examples removed in each iteration can be adapted to the needs of the practitioner. The first two strategies (use of multiple classifiers and iterative filtering) improve the filtering accuracy, whereas the last one (the noisy score) controls how conservative the filter is when removing potentially noisy examples. The validity of the proposed method is assessed in an exhaustive experimental study. We compare the new filtering method against several state-of-the-art methods for datasets with class noise and study their efficacy in three classifiers with different sensitivity to noise.
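A small sketch of the multi-classifier, iterative filtering idea: examples that a majority of cross-validated detectors misclassify are treated as noisy and removed, and the process repeats; the 0.5 threshold and the chosen detectors are illustrative stand-ins for the paper's noisy score.

    # Sketch: iterative ensemble noise filtering based on cross-validated predictions.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    rng = np.random.default_rng(0)
    flip = rng.choice(len(y), size=int(0.1 * len(y)), replace=False)
    y_noisy = y.copy()
    y_noisy[flip] = 1 - y_noisy[flip]                     # inject 10% label noise

    detectors = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(max_depth=5)]
    keep = np.arange(len(y))
    for _ in range(3):                                    # iterative filtering rounds
        preds = np.array([cross_val_predict(d, X[keep], y_noisy[keep], cv=5) for d in detectors])
        disagree = (preds != y_noisy[keep]).mean(axis=0)  # fraction of detectors that disagree
        keep = keep[disagree < 0.5]                       # drop examples flagged by a majority

    removed = np.setdiff1d(np.arange(len(y)), keep)
    print("removed:", len(removed), " of which truly noisy:", np.isin(removed, flip).sum())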

17.
Various methods for ensemble selection and classifier combination have been designed to optimize the performance of classifier ensembles. However, the use of a large number of features in the training data can affect the classification performance of machine learning algorithms. The objective of this paper is to present a novel feature elimination (FE) based ensemble learning method, implemented as an extension to an existing machine learning environment. Standard 12-lead ECG signal recordings are used to diagnose arrhythmia by classifying subjects as normal or abnormal. The advantage of the proposed approach is that it reduces the size of the feature space by using various feature elimination methods, and the decisions obtained from these methods are coalesced to form fused data. The idea behind this work is thus to discover a reduced feature space such that a classifier built on this small data set performs no worse than a classifier built on the original data set. A random subspace ensemble classifier is used with the PART tree as the base classifier. The proposed approach has been implemented and evaluated on the UCI ECG signal data. Classification performance is evaluated using measures such as mean absolute error, root mean squared error, relative absolute error, F-measure, classification accuracy, receiver operating characteristics and area under the curve. The proposed approach achieves an overall classification accuracy of 91.11% on an unseen test set, and it is shown to perform well with ensemble sizes of 15 and 20.

18.
Relief feature selection algorithms on imbalanced datasets
Relief denotes a family of feature selection methods, including the original Relief algorithm and the later ReliefF extension. The core idea is to assign larger weights to features that contribute more to classification; the algorithms are simple and efficient and are therefore widely used. However, applying Relief directly to noisy or imbalanced datasets gives unsatisfactory results. Based on Relief, a feature selection algorithm for noisy data, called threshold-Relief, is proposed to effectively remove the influence of noisy data on the classification results. Combined with K-means, two feature selection algorithms for imbalanced datasets, called K-means-ReliefF and K-means-Relief sampling, are proposed to compensate for Relief's weaknesses on imbalanced data. Experiments demonstrate the effectiveness of the proposed algorithms.
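For reference, here is a bare-bones sketch of the basic binary-class Relief weight update that these variants build on (the threshold- and K-means-based extensions themselves are not reproduced); the sample size and dataset are assumptions.

    # Sketch: basic Relief feature weighting (binary classes, numeric features).
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import MinMaxScaler

    X, y = load_breast_cancer(return_X_y=True)
    X = MinMaxScaler().fit_transform(X)                   # Relief assumes features scaled to [0, 1]
    rng = np.random.default_rng(0)

    w = np.zeros(X.shape[1])
    m = 200                                               # number of sampled instances
    for i in rng.choice(len(X), size=m, replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                                  # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))    # nearest same-class neighbor
        miss = np.argmin(np.where(y != y[i], dist, np.inf))   # nearest other-class neighbor
        # Weights rise with distance to the nearest miss, fall with distance to the nearest hit.
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / m

    print("top-5 features by Relief weight:", np.argsort(w)[-5:])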

19.
The decision tree method has grown fast in the past two decades and its performance in classification is promising. The tree-based ensemble algorithms have been used to improve the performance of an individual tree. In this study, we compared four basic ensemble methods, that is, bagging tree, random forest, AdaBoost tree and AdaBoost random tree in terms of the tree size, ensemble size, band selection (BS), random feature selection, classification accuracy and efficiency in ecological zone classification in Clark County, Nevada, through multi-temporal multi-source remote-sensing data. Furthermore, two BS schemes based on feature importance of the bagging tree and AdaBoost tree were also considered and compared. We conclude that random forest or AdaBoost random tree can achieve accuracies at least as high as bagging tree or AdaBoost tree with higher efficiency; and although bagging tree and random forest can be more efficient, AdaBoost tree and AdaBoost random tree can provide a significantly higher accuracy. All ensemble methods provided significantly higher accuracies than the single decision tree. Finally, our results showed that the classification accuracy could increase dramatically by combining multi-temporal and multi-source data set.
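A quick sketch comparing the four basic tree ensembles named above on a public dataset (not the Clark County remote-sensing data); "AdaBoost random tree" is approximated here by boosting trees that split on random feature subsets, which is an assumption about the intended variant.

    # Sketch: compare bagging tree, random forest, AdaBoost tree, AdaBoost random tree.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "bagging tree": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100),
        "random forest": RandomForestClassifier(n_estimators=100),
        "AdaBoost tree": AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=100),
        "AdaBoost random tree": AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=3, max_features="sqrt"), n_estimators=100),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())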

20.
Coronavirus Disease 2019 (COVID-19) has been declared a worldwide pandemic, and chest X-ray imaging is a key method for diagnosing it. Applying convolutional neural networks to medical imaging helps diagnose the disease accurately, and label quality plays an important role in the classification of COVID-19 chest X-rays. However, most existing classification methods ignore the fact that labels are rarely completely correct, and noisy labels significantly degrade the performance of image classification frameworks. In addition, because lesions are widely distributed and COVID-19 chest X-ray images contain many local features, existing label recovery algorithms face the bottleneck that noisy samples are difficult to reuse. This paper therefore introduces a general classification framework for COVID-19 chest X-ray images with noisy labels and proposes a noisy label recovery algorithm based on subset label iterative propagation and replacement (SLIPR). Specifically, the proposed algorithm first draws random subsets of the samples multiple times. It then integrates techniques such as principal component analysis, low-rank representation, neighborhood graph regularization, and k-nearest neighbors for feature extraction and image classification. Finally, multi-level weight distribution and replacement are performed on the labels to cleanse the noise. In addition, for the label-recovered dataset, high-confidence samples are further selected as the training set to improve the stability and accuracy of the classification framework without affecting its inherent performance. Three typical datasets are chosen for extensive experiments and comparisons with existing algorithms under different metrics. Experimental results on three publicly available COVID-19 chest X-ray image datasets show that the proposed algorithm can effectively recover noisy labels and improve the accuracy of the image classification framework by 18.9% on the Tawsifur dataset, 19.92% on the Skytells dataset, and 16.72% on the CXRs dataset. Compared with state-of-the-art algorithms, SLIPR improves classification accuracy on the three datasets by 8.67%-19.38%, and it also offers a degree of scalability while ensuring data integrity.
