Similar literature
20 similar documents found
1.
In Brazil, the National Cancer Institute (INCA) reports more than 50,000 new cases of breast cancer, with a risk of 51 cases per 100,000 women. Radiographic images obtained from mammography equipment are among the most frequently used aids to early diagnosis. Because of factors related to cost and professional experience, computer systems supporting detection (Computer-Aided Detection, CADe) and diagnosis (Computer-Aided Diagnosis, CADx) have been developed over the last two decades to assist experts in detecting abnormalities at their initial stages. Despite the large body of research on CADe and CADx systems, there is still a need for improved computerized methods. There is now growing concern with the sensitivity and reliability of abnormality diagnosis in both views of breast mammographic images, namely cranio-caudal (CC) and medio-lateral oblique (MLO). This paper presents a set of computational tools to aid segmentation and detection in mammograms containing one or more masses in CC and MLO views. An artifact removal algorithm is applied first, followed by an image denoising and gray-level enhancement method based on the wavelet transform and the Wiener filter. Finally, a method for detection and segmentation of masses using multiple thresholding, the wavelet transform and a genetic algorithm is applied to mammograms randomly selected from the Digital Database for Screening Mammography (DDSM). The method was quantitatively evaluated using the area overlap metric (AOM). The mean ± standard deviation of AOM for the proposed method was 79.2 ± 8%. The experiments demonstrate that the proposed method has strong potential to serve as the basis for mammogram mass segmentation in CC and MLO views. Another important aspect is that the method overcomes the limitation of analyzing only CC and MLO views.
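The abstract reports AOM without defining it. The sketch below assumes the definition commonly used for this metric, the area of the intersection of the segmented and reference regions divided by the area of their union; the function name, the list-of-lists mask format and the toy masks are illustrative assumptions, not taken from the paper.

```python
def area_overlap_metric(segmented, reference):
    """Return |A intersect B| / |A union B| for two equal-sized binary masks.

    Assumed definition of AOM; masks are 2-D lists of 0/1 (hypothetical format).
    """
    intersection = 0
    union = 0
    for row_seg, row_ref in zip(segmented, reference):
        for s, r in zip(row_seg, row_ref):
            if s and r:
                intersection += 1
            if s or r:
                union += 1
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed).
        return 1.0
    return intersection / union


# Toy masks: the reference mass is one column wider than the segmentation.
segmented = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
aom = area_overlap_metric(segmented, reference)
print(f"AOM = {aom:.3f}")
```

Here 4 pixels are shared and 6 pixels are covered by at least one mask, so the toy AOM is 4/6.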

2.
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes and heart disease datasets from the literature. First, the feature dimension of the three datasets is reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated on the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments use a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% on the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% on the respective datasets. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms in the design of advanced CADx systems.
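The leave-one-out validation strategy mentioned above can be sketched as follows. This is a minimal illustration only: the 1-nearest-neighbour classifier, the function names and the one-dimensional toy data are assumptions chosen to keep the example self-contained, and none of them comes from the study.

```python
def leave_one_out_accuracy(samples, labels, train_and_predict):
    """Hold out each sample once, train on the rest, and report accuracy."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if train_and_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Hypothetical stand-in classifier: label of the closest training point."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]


# Toy 1-D data; the last point sits closer to the other class.
samples = [1.0, 1.2, 1.4, 5.0, 5.3, 5.1, 3.4]
labels = [0, 0, 0, 1, 1, 1, 0]
accuracy = leave_one_out_accuracy(samples, labels, nearest_neighbour)
print(f"leave-one-out accuracy = {accuracy:.3f}")
```

Any classifier with the same `train_and_predict(train_x, train_y, query)` shape could be passed in, which is how a base classifier and its ensemble could be compared on identical folds.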

3.
This paper proposes a general local learning framework that alleviates the complexity of classifier design by means of the “divide and conquer” principle and an ensemble method. The framework consists of a quantization layer, which uses generalized learning vector quantization (GLVQ), and an ensemble layer, which uses multi-layer perceptrons (MLP). The proposed method is tested on public handwritten character data sets and consistently achieves promising performance. In contrast to other methods, the proposed method is especially suitable for large-scale real-world classification problems, although it also scales down easily to a small training set while preserving good performance.

4.
Snow cover products with high spatial and temporal resolution in mountain areas are important for hydrological applications. The GF-1 satellite provides multispectral images with 8-m resolution and a revisit time of up to 2 days, which makes it possible to produce such snow cover products. However, extracting snow cover from these images is challenging because of the limited spectral bands, severe mountain shadows, and the dataset-shift problem in multitemporal classification. To overcome these limitations, this study proposes a multitemporal ensemble learning framework to extract snow cover from high-spatial-resolution images in mountain areas. The principle behind ensemble learning, i.e. learning from disagreement, is extended from single-image classification to multitemporal classification. We assume that multitemporal training samples selected within time-invariant classes at the same locations can differ in feature space. Such disagreements are used in multitemporal ensemble learning to improve classification accuracy. To enhance both the accuracy and the diversity of the multiple classifiers trained on these samples, a joint feature selection method is suggested to select the optimal multitemporal feature space, and a joint parameter optimization method is designed to combine the classifiers trained for multitemporal images. The experiments show that the multitemporal ensemble classifiers outperform single classifiers, confirming the effectiveness of the proposed framework.

5.
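Ensemble learning is a common approach to data stream classification. To obtain better classification results, a data stream classification method based on stacking is presented. The method constructs a classifier that combines the base classifiers. Experimental results show that, compared with ensemble methods based on voting or weighted voting, the stacking-based method adapts faster to concept drift and achieves higher prediction accuracy.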

6.
Diagnosing a disease at an early stage enables physicians to overcome complications and treat them properly. The diagnosis method plays an important role in disease diagnosis and in the accuracy of treatment. A diagnosis expert system can help a great deal in identifying diseases and describing the treatment to be carried out, provided that it takes the user's capability into account so that the user can interact with the expert system easily and clearly. A good way to improve the diagnostic accuracy of expert systems is to use ensemble classifiers. This research presents an expert system using multi-layer classification with enhanced bagging and optimized weighting. The proposed method, named “M2-BagWeight”, overcomes the limitations of individual classifiers as well as other ensemble classifiers. The proposed model is evaluated on two different liver disease datasets, a chronic kidney disease dataset, a heart disease dataset, the diabetic retinopathy Debrecen dataset, a breast cancer dataset and a primary tumor dataset obtained from the UCI public repository. The analysis of the results shows that the proposed expert system achieves higher classification and prediction accuracy than individual and ensemble classifiers. Moreover, an application named “WebMAC” has been developed for practical implementation of the proposed model in hospitals for diagnostic advice.

7.
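乔善平 (Qiao Shanping), 闫宝强 (Yan Baoqiang). 《计算机应用》 (Journal of Computer Applications), 2016, 36(8): 2150-2156
Multi-label learning and ensemble learning are not yet mature in their application to predicting protein multi-subcellular localization, so this work studies a prediction method based on ensemble multi-label learning. First, combining multi-label learning with ensemble learning, a three-layer framework for an ensemble multi-label learning system is proposed. The framework classifies learning algorithms and classifiers hierarchically and integrates binary classification, multi-class classification, multi-label learning and ensemble learning into a general-purpose three-layer ensemble multi-label learning model. Second, the system model is designed using object-oriented technology and the Unified Modeling Language (UML), which gives the system good extensibility, so that its functionality and performance can be improved by extension. Finally, the model is extended in Java to implement a learning system, which is successfully applied to the prediction of protein multi-subcellular localization. Tests on a Gram-positive bacteria dataset verify that the system is workable and predicts well, and that it can serve as an effective tool for predicting protein multi-subcellular localization.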

8.
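Using averaging or voting as the combination strategy in ensemble learning cannot make full use of the useful information in the base classifiers, and setting base classifier weights according to volatility is imprecise and inappropriate. Both problems degrade ensemble learning. To further improve its performance, this work proposes using the evidence reasoning (ER) rule as the combination strategy and setting the base classifier weights by a diversity weighting method. First, the basic ensemble structure is built with multiple deep learning models as base classifiers and the ER rule as the combination strategy. Then, a diversity measure is used to compute how much each base classifier differs from the other base classifiers. Finally, the differences are normalized to obtain the base classifier weights. Classification experiments on several image datasets show that the proposed method is more accurate and more stable than the other methods selected for comparison, which indicates that it makes full use of the useful information in the base classifiers and that diversity weighting is more precise.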
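The weighting step described above (measure each base classifier's difference from the others, then normalize) can be sketched as follows. The abstract does not name the diversity measure, so the pairwise disagreement rate used here is an assumption, as are the function names and the toy predictions; the ER combination rule itself is not implemented.

```python
def disagreement(a, b):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def diversity_weights(predictions):
    """Weight each classifier by its mean disagreement with all the others.

    predictions[i][k] is classifier i's label for validation sample k.
    The disagreement measure is an assumed stand-in for the paper's measure.
    """
    n = len(predictions)
    scores = []
    for i in range(n):
        others = [
            disagreement(predictions[i], predictions[j])
            for j in range(n)
            if j != i
        ]
        scores.append(sum(others) / len(others))
    total = sum(scores)
    if total == 0:
        # All classifiers agree everywhere: fall back to uniform weights.
        return [1.0 / n] * n
    return [s / total for s in scores]


# Toy predictions of three base classifiers on four validation samples.
predictions = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
weights = diversity_weights(predictions)
print(weights)
```

In this toy case the third classifier disagrees with each of the other two on half of the samples, so it receives twice the weight of either of them.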

9.
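刁树民 (Diao Shumin), 王永利 (Wang Yongli). 《计算机应用》 (Journal of Computer Applications), 2009, 29(6): 1578-1581
When making a combined decision, existing ensemble classification methods require labeled training samples that are common to, and valid for, all of the classifiers being combined. To solve the problem of combined classification decisions on data streams when no labeled samples are available, a fusion strategy for data stream ensemble classifiers based on constraint learning is proposed. When deciding on a test sample, a method that satisfies the constraint measure of every local classifier is designed according to transductive learning theory, which guarantees the feasibility of the constraints and solves the transductive extension of maximum entropy when aggregating distributed classifiers. Experiments on test datasets show that the method achieves better decision accuracy than existing transductive learning methods and can be applied to the fusion of data stream ensemble classifiers.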

10.
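Structured ensemble learning for spam filtering (cited 4 times in total: 0 self-citations, 4 citations by others)
To resolve the tension between low computational complexity and high classification accuracy in spam filtering algorithms, a structured ensemble learning approach is proposed within a multi-field learning framework. It combines the results of multiple base classifiers according to document structure in pursuit of higher classification performance. String features of email documents are used to generate multiple lightweight base classifiers, and the labeled data are stored in a string-frequency index, so that each update and each query takes constant time. Based on the multi-field structure of email documents, two linear combination weights are proposed: one based on the historical effectiveness of each field classifier and one based on the classification ability of the current field document. Taking both into account, a comprehensive linear combination weight is also proposed that improves overall classification accuracy. Experiments on the TREC immediate full feedback spam filtering task show that the structured ensemble learning method based on the comprehensive linear combination weight completes the filtering task in a short time (47.24 min), and that its overall 1-ROCA performance reaches that of the best filter in the TREC 2007 evaluation (0.0055).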

11.
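Traditional single classifiers achieve limited performance on imbalanced data. Based on generative adversarial networks (GAN) and ensemble learning, a new classification method for two-class imbalanced datasets is proposed: the GAN-AdaBoost-DT (generative adversarial network, adaptive boosting, decision tree) algorithm. First, a generative model is trained with a GAN and used to generate minority-class samples, which reduces the imbalance of the data. Second, the generated minority-class samples are fed into the adaptive boosting (AdaBoost) framework and the weights are modified, improving the AdaBoost model and raising the classification performance of AdaBoost with decision trees (DT) as base classifiers. The area under the receiver operating characteristic curve (AUC) is used as the evaluation metric. Experiments on a credit card fraud dataset show that, compared with synthetic minority oversampling ensemble learning, the algorithm improves accuracy by 4.5% and AUC by 6.5%; compared with improved synthetic minority oversampling ensemble learning, it improves accuracy by 4.9% and AUC by 5.9%; and compared with random undersampling ensemble learning, it improves accuracy by 4.5% and AUC by 5.4%. Experiments on other datasets from UCI and KEEL show that the algorithm improves overall accuracy on imbalanced binary classification problems and optimizes classifier performance.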

12.
The aim of the present study is to comparatively assess the performance of different machine learning and statistical techniques with regard to their ability to estimate the risk of developing type 2 diabetes mellitus (Case 1) and cardiovascular disease complications (Case 2). This is the first work investigating the application of ensembles of artificial neural networks (EANN) to estimating the 5-year risk of developing type 2 diabetes mellitus and cardiovascular disease as a long-term diabetes complication. The performance of the proposed models has been compared with that obtained by applying logistic regression, Bayesian-based approaches, and decision trees. The models' discrimination and calibration have been evaluated using the classification accuracy (ACC), the area under the curve (AUC) criterion, and the Hosmer–Lemeshow goodness-of-fit test. The results demonstrate the superiority of the proposed EANN models over the other models. In Case 1, EANN with different topologies achieved high discrimination and good calibration performance (ACC = 80.20%, AUC = 0.849, p value = .886). In Case 2, EANN based on bagging achieved good discrimination and calibration performance (ACC = 92.86%, AUC = 0.739, p value = .755).

13.
The failure mode (FM) and bearing capacity of reinforced concrete (RC) columns are key concerns in structural design and performance assessment procedures. The failure type, i.e., flexure, shear, or a mix of the two, greatly affects the capacity and ductility of the structure. Moreover, the design methodologies for structures with different failure types are entirely different. Developing efficient and reliable methods to identify the FM and predict the corresponding capacity is therefore of special importance for structural design and assessment. In this paper, an intelligent approach based on ensemble machine learning techniques is presented for FM classification and bearing capacity prediction of RC columns. The most typical ensemble learning method, the adaptive boosting (AdaBoost) algorithm, is adopted for both the classification and the regression (prediction) problems. A total of 254 cyclic loading tests of RC columns are collected. The geometric dimensions, reinforcing details and material properties are the input variables, while the failure types (for the classification problem) and the peak capacity forces (for the regression problem) are the output variables. The results indicate that the model generated by the AdaBoost learning algorithm is highly accurate for both FM classification (accuracy = 0.96) and capacity prediction (R² = 0.98). Different learning algorithms are also compared, and the results show that ensemble learning (especially AdaBoost) performs better than single learners. In addition, the bearing capacity predicted by AdaBoost is compared with that given by the empirical formulas in the design codes, which shows the clear superiority of the proposed method. In summary, machine learning techniques, especially ensemble learning, can provide an alternative to conventional mechanics-driven models in structural design in the era of big data.

14.
Landslide susceptibility in the Uttarakhand area of India has been assessed by applying five machine learning methods: Support Vector Machines (SVM), Logistic Regression (LR), Fisher's Linear Discriminant Analysis (FLDA), Bayesian Network (BN), and Naïve Bayes (NB). The performance of these methods has been evaluated using the ROC curve and statistical-index-based methods. Analysis and comparison of the results show that all five landslide models performed well for landslide susceptibility assessment (AUC = 0.910–0.950). The SVM model (AUC = 0.950) performed best, followed by the LR model (AUC = 0.922), the FLDA model (AUC = 0.921), the BN model (AUC = 0.915), and the NB model (AUC = 0.910).

15.
Rotation Forest, an effective ensemble classifier generation technique, works by using principal component analysis (PCA) to rotate the original feature axes so that different training sets for learning base classifiers can be formed. This paper presents a variant of Rotation Forest that can be viewed as a combination of Bagging and Rotation Forest. Bagging is used here to inject more randomness into Rotation Forest in order to increase the diversity among the ensemble members. Experiments conducted on 33 benchmark classification data sets from the UCI repository, with a classification tree as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost and Rotation Forest. The bias–variance analysis of the error shows that the proposed method improves the prediction error of a single classifier by reducing the variance term much more than the other ensemble procedures considered. Furthermore, results computed on data sets with artificial classification noise indicate that the new method is more robust to noise. Kappa-error diagrams are employed to investigate the diversity–accuracy patterns of the ensemble classifiers.

16.
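AdaBoost is a typical ensemble learning framework that builds a strong learner as a linear combination of several weak classifiers. Its classification accuracy is far higher than that of a single weak classifier, and it has good generalization error and training error. However, AdaBoost cannot reduce the number of weak classifiers in its output model, so the model is not easy to interpret. This paper introduces a genetic algorithm into the AdaBoost model and proposes an ensemble evolutionary classification algorithm that limits the size of the output model (Ensemble evolve classification algorithm for controlling the size of final model, ECSM). Within the AdaBoost iteration framework, genetic operators and an evaluation function enforce the diversity of the population while retaining the better classifiers. Experimental results show that, compared with classical AdaBoost, the proposed algorithm greatly reduces the number of classifiers while largely preserving classification accuracy.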
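The linear combination of weak classifiers that classical AdaBoost builds can be sketched as follows. This is a minimal illustration of the standard discrete AdaBoost baseline only, not of ECSM or its genetic operators; the threshold stumps, the function names and the one-dimensional toy data are assumptions made for the example.

```python
import math


def train_adaboost(xs, ys, rounds):
    """Discrete AdaBoost with 1-D threshold stumps; labels ys are -1 or +1.

    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = sorted(set(xs))
    model = []
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error under current weights.
        for t in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x >= t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if x >= t else -pol) for alpha, t, pol in model)
    return 1 if score >= 0 else -1


# Toy data that no single threshold stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = train_adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Every round appends one more weak classifier to `model`, which is the growth in model size that the abstract says the genetic-algorithm variant is designed to limit.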

17.
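To address the poor interpretability of hierarchical Takagi-Sugeno-Kang (TSK) fuzzy classifiers, and the fact that a Boosting fuzzy classifier must retrain all of its TSK fuzzy sub-classifiers whenever one is added or removed, a parallel ensemble of highly interpretable TSK fuzzy classifiers, EP-Q-TSK, is proposed. Each TSK fuzzy sub-classifier in the ensemble can be built quickly and in parallel using the least learning machine (LLM). As a new form of ensemble learning, the classifier uses the incremental output of each TSK fuzzy sub-classifier to extend the original validation data space, then applies the classical fuzzy clustering algorithm FCM to obtain a set of representative center points, and finally uses KNN to classify the test data. On standard UCI datasets, the effectiveness of EP-Q-TSK is verified in terms of both classification performance and interpretability.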

18.
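In contrast to plain ensemble learning, ensemble pruning searches for an optimal subset of the classifiers in order to improve generalization performance and simplify the ensemble. Pareto ensemble pruning considers both the accuracy of the classifiers and the ensemble size, and treats both as optimization objectives. However, it considers only the accuracy of the base classifiers and the ensemble size and ignores the diversity among classifiers, so the selected classifiers tend to be highly similar. This paper proposes a Pareto ensemble pruning algorithm that incorporates diversity: the diversity and accuracy of the classifiers are combined into the first optimization objective, and the ensemble size is the second, yielding a multi-objective optimization. Experiments show that, at comparable ensemble sizes, the improved algorithm performs better than Pareto ensemble pruning because diversity is taken into account.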
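The two-objective view of pruning described above can be sketched by filtering candidate sub-ensembles down to the non-dominated ones. This is only an illustration of Pareto dominance with both objectives minimized; how the first objective is computed (for example, an error combined with a diversity term) is left open, and the candidate tuples below are made-up values, not results from the paper.

```python
def dominates(a, b):
    """True if candidate a is no worse than b in both objectives and better in one.

    Candidates are (objective_1, ensemble_size, name); both objectives are minimized.
    """
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical sub-ensembles: (first objective, number of classifiers, label).
candidates = [
    (0.30, 1, "a"),
    (0.20, 2, "b"),
    (0.25, 2, "c"),
    (0.20, 3, "d"),
    (0.10, 4, "e"),
]
front = pareto_front(candidates)
print([name for _, _, name in front])
```

Candidate "c" is dropped because "b" has the same size with a lower first objective, and "d" is dropped because "b" reaches the same first objective with fewer classifiers.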

19.
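Ensemble learning completes a learning task by building multiple classifiers with complementary strengths in order to reduce classification error. However, current research does not take the local effectiveness of classifiers into account. To address this, a hierarchical multi-class classification algorithm is proposed within an ensemble learning framework. The algorithm decomposes the problem by predicted class and, on top of this hierarchy, combines multiple classifiers to improve classification accuracy. Experiments on a real-world dataset of admissions at a US university and on 3 UCI datasets verify the effectiveness of the algorithm.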

20.
The global prediction of a homogeneous ensemble of classifiers generated in independent applications of a randomized learning algorithm on a fixed training set is analyzed within a Bayesian framework. Assuming that majority voting is used, the prediction of the complete ensemble can be estimated with a given confidence level by querying only a subset of the classifiers. For a particular instance to be classified, the polling of ensemble classifiers can be halted once the probability that the predicted class will not change when the remaining votes are taken into account exceeds the specified confidence level. Experiments on a collection of benchmark classification problems using representative parallel ensembles, such as bagging and random forests, confirm the validity of the analysis and demonstrate the effectiveness of the proposed instance-based ensemble pruning method.
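The idea of halting the poll early can be sketched in its simplest limiting case. The sketch below is not the Bayesian analysis of the paper: it stops only when the outcome is certain, that is, when the leading class is ahead by more votes than remain to be cast, which corresponds to a confidence level of 100%. The function name and the toy votes are assumptions.

```python
from collections import Counter


def poll_until_decided(votes, total):
    """Query votes one at a time and stop once the majority cannot change.

    votes: non-empty iterable of class labels, one per ensemble member.
    total: ensemble size. Returns (predicted_class, number_of_votes_polled).
    Deterministic stopping rule, assumed here as a simplification.
    """
    counts = Counter()
    polled = 0
    for vote in votes:
        counts[vote] += 1
        polled += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        lead = ranked[0][1] - runner_up
        # No class can overtake the leader with the votes still outstanding.
        if lead > total - polled:
            break
    return counts.most_common(1)[0][0], polled


votes = ["a", "a", "b", "a", "a", "b", "b"]
label, used = poll_until_decided(votes, total=len(votes))
print(label, used)
```

With these toy votes the leader is three ahead after the fifth vote while only two votes remain, so the last two classifiers are never queried. A probabilistic rule of the kind the abstract describes would generally stop earlier, at the cost of a small chance of disagreeing with the full ensemble.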
