Similar literature
 20 similar articles found (search time: 31 ms)
1.
This study aims to design a support vector machine (SVM)-based classifier for breast cancer detection with a higher degree of accuracy. It introduces the best possible training scheme for the features extracted from the mammogram, by first selecting the kernel function and then choosing a suitable training-test partition. Prior to classification, detailed statistical analysis (tests of significance and density estimation) was performed to identify the discriminating power of the features between the malignant and benign classes. A comparative study was performed with respect to diagnostic measures, namely the confusion matrix, sensitivity and specificity. Two data sets from the UCI machine learning database, with nine- and ten-dimensional feature spaces, were considered for classification. The overall classification accuracy obtained with the proposed classification strategy is 99.385% for dataset-I and 93.726% for dataset-II.
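The diagnostic measures named in this abstract can be sketched from the four cells of a binary confusion matrix. This is a minimal illustration, assuming malignant is the positive class; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of malignant cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of benign cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not results from the paper).
sens, spec, acc = diagnostic_measures(tp=45, fn=5, fp=10, tn=90)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, since accuracy alone can hide a poor detection rate for the rarer class.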

2.
J. Li  X. Tang  J. Liu  J. Huang  Y. Wang 《Pattern recognition》2008,41(6):1975-1984
Various microarray experiments are now done in many laboratories, resulting in the rapid accumulation of microarray data in public repositories. One of the major challenges of analyzing microarray data is how to extract and select efficient features from it for accurate cancer classification. Here we introduce a new feature extraction and selection method based on information gene pairs that change significantly between different tissue samples. Experimental results on five public microarray data sets demonstrate that the feature subset selected by the proposed method performs well and achieves higher classification accuracy on several classifiers. We perform an extensive experimental comparison of the features selected by the proposed method and features selected by other methods using different evaluation methods and classifiers. The results confirm that the proposed method performs as well as other methods on the acute lymphoblastic-acute myeloid leukemia, adenocarcinoma and breast cancer data sets using fewer information genes, and leads to a significant improvement in classification accuracy on the colon and diffuse large B cell lymphoma cancer data sets.

3.
Abstract: This paper gives an integrated view of implementing automated diagnostic systems for clinical decision-making. Because of the importance of making the right decision, better classification procedures are necessary for clinical decisions. The major objective of the paper is to be a guide for readers who want to develop an automated decision support system for clinical practice. The purpose was to determine an optimum classification scheme with high diagnostic accuracy for this problem. Several different classification algorithms were tested and benchmarked for their performance. The performance of the classification algorithms is illustrated on two data sets: the Pima Indians diabetes and Wisconsin breast cancer data sets. The present research demonstrates that support vector machines achieved diagnostic accuracies which were higher than those of the other automated diagnostic systems.

4.
This paper presents an automatic diagnosis system for detecting breast cancer based on association rules (AR) and a neural network (NN). In this study, AR is used to reduce the dimension of the breast cancer database and the NN is used for intelligent classification. The performance of the proposed AR + NN system is compared with that of the NN model. The dimension of the input feature space is reduced from nine to four by using AR. In the test stage, 3-fold cross validation was applied to the Wisconsin breast cancer database to evaluate the performance of the proposed system. The correct classification rate of the proposed system is 95.6%. This research demonstrated that AR can be used to reduce the dimension of the feature space and that the proposed AR + NN model can be used to obtain fast automatic diagnostic systems for other diseases.
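The 3-fold cross validation mentioned in this abstract can be sketched as an index split. This is a minimal sketch, assuming contiguous, unshuffled folds; the abstract does not say how the folds were drawn, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Ten hypothetical records split into three folds.
folds = list(k_fold_indices(10, k=3))
```

In practice the records would usually be shuffled (and often stratified by class) before splitting, and the reported rate would be the average over the three test folds.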

5.
Breast cancer is a major disease worldwide and one of the most widespread cancers among women. As per the survey, one in eight women worldwide is at risk of breast cancer at some point in her life. One way to reduce the breast cancer mortality rate is timely detection and effective treatment. That is why more accurate classification of breast cancer tumors has become a challenging problem in the medical field. Many classification techniques have been proposed in the literature. Today, expert systems and machine learning techniques are used extensively in the breast cancer classification problem. They provide high classification accuracy and effective diagnostic capabilities. In this paper, we propose a novel Gauss-Newton representation based algorithm (GNRBA) for breast cancer classification. It uses sparse representation with training sample selection. Until now, sparse representation has been successfully applied in pattern recognition only. The proposed method introduces a novel Gauss-Newton based approach to find the optimal weights for the training samples for classification. In addition, it evaluates the sparsity in a computationally efficient way compared with the conventional l1-norm method. The effectiveness of the GNRBA is examined on the Wisconsin Breast Cancer Database (WBCD) and the Wisconsin Diagnosis Breast Cancer (WDBC) database from the UCI Machine Learning repository. Various performance measures, including classification accuracy, sensitivity, specificity, confusion matrices, a statistical test and the area under the receiver operating characteristic curve (AUC), are reported to show the superiority of the proposed method over classical models. The experimental results show that the proposed GNRBA could be a good alternative for breast cancer classification for clinical experts.

6.
Implementing automated diagnostic systems for breast cancer detection   Cited by: 3 (self-citations: 0, citations by others: 3)
This paper intends to give an integrated view of implementing automated diagnostic systems for breast cancer detection. The major objective of the paper is to be a guide for readers who want to develop an automated decision support system for the detection of breast cancer. Because of the importance of making the right decision, better classification procedures for breast cancer have been sought. The classification accuracies of different classifiers, namely the multilayer perceptron neural network (MLPNN), combined neural network (CNN), probabilistic neural network (PNN), recurrent neural network (RNN) and support vector machine (SVM), which were trained on the attributes of each record in the Wisconsin breast cancer database, were compared. The purpose was to determine an optimum classification scheme with high diagnostic accuracy for this problem. This research demonstrated that the SVM achieved diagnostic accuracies which were higher than those of the other automated diagnostic systems.

7.
This paper intends to give an integrated view of implementing automated diagnostic systems for breast cancer detection. The major objective of the paper is to be a guide for readers who want to develop an automated decision support system for the detection of breast cancer. Because of the importance of making the right decision, better classification procedures for breast cancer have been sought. The classification accuracies of different classifiers, namely the multilayer perceptron neural network (MLPNN), combined neural network (CNN), probabilistic neural network (PNN), recurrent neural network (RNN) and support vector machine (SVM), which were trained on the attributes of each record in the Wisconsin breast cancer database, were compared. The purpose was to determine an optimum classification scheme with high diagnostic accuracy for this problem. This research demonstrated that the SVM achieved diagnostic accuracies which were higher than those of the other automated diagnostic systems.

8.

The incidence of breast cancer in women has increased significantly in recent years. Mammogram breast X-ray imaging is considered the most effective, low-cost, and reliable method for early detection of breast cancer. Although general rules for differentiating between benign and malignant breast lesions exist, only 15–30% of masses referred for surgical biopsy are actually malignant. Physicians' experience in detecting breast cancer can be assisted by computerized feature extraction and classification algorithms. A computer-aided classification system was used to help diagnose abnormalities faster than a traditional screening program, without the drawbacks attributable to human factors. In this work, an approach is proposed to develop a computer-aided classification system for cancer detection from digital mammograms. The proposed system consists of three major steps. The first step is extraction of a region of interest (ROI) of 256 × 256 pixels. The second step is feature extraction; we used a set of 26 features and found that these features are capable of differentiating between normal and cancerous breast tissues, which helps minimize the classification error. The third step is the classification process; we used association rule mining to classify tissues as normal or cancerous. The proposed system was shown to have great potential for cancer detection from digital mammograms.

9.
In this paper, we present a gene selection method based on a genetic algorithm (GA) and support vector machines (SVM) for cancer classification. First, the Wilcoxon rank sum test is used to filter noisy and redundant genes in high-dimensional microarray data. Then, different highly informative gene subsets are selected by GA/SVM using different training sets. The final subset, consisting of highly discriminating genes, is obtained by analyzing the frequency of appearance of each gene in the different gene subsets. The proposed method is tested on three open data sets: leukemia, breast cancer, and colon cancer data. The results show that the proposed method has excellent selection and classification performance, especially for the breast cancer data, where it can yield 100% classification accuracy using only four genes.
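The filtering step described here can be sketched as a per-gene Wilcoxon rank sum test. This is only an illustration under stated assumptions: it uses the normal approximation without a tie correction in the variance, and the function names, the cut-off `z_cut` and the expression values are hypothetical rather than taken from the paper.

```python
import math


def rank_sum_z(group_a, group_b):
    """Wilcoxon rank sum z-score for group_a versus group_b (normal approximation)."""
    pooled = sorted((v, g) for g, vals in enumerate((group_a, group_b)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # tied values share the average rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of group_a
    n1, n2 = len(group_a), len(group_b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def filter_genes(expr_a, expr_b, z_cut=2.0):
    """Keep genes whose expression differs between the two classes (|z| >= z_cut)."""
    return [g for g in expr_a if abs(rank_sum_z(expr_a[g], expr_b[g])) >= z_cut]


# Hypothetical expression values per gene for two classes of samples.
tumor = {"geneX": [1, 2, 3, 4], "geneY": [1, 4, 5, 8]}
normal = {"geneX": [5, 6, 7, 8], "geneY": [2, 3, 6, 7]}
kept = filter_genes(tumor, normal)
```

Only the genes that pass such a filter would then be handed to the more expensive GA/SVM search.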

10.
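Objective: Deep belief networks can automatically learn and extract features from data and have outstanding advantages in feature learning. Polarimetric SAR image classification suffers from low utilization of massive feature sets and highly subjective feature selection. To address this problem, a polarimetric SAR image classification method based on deep belief networks is proposed. Method: First, massive classification features are extracted to obtain a feature set composed of four categories: polarimetric, radiometric, spatial, and sub-aperture features. Then, samples are selected from the feature set and feature vectors are constructed as input to the deep belief network model. Finally, the deep belief network learns and abstracts the massive classification features layer by layer to obtain effective features for classification. Results: Experiments on AIRSAR data achieve a classification accuracy of 91.06%. Comparison with the classical Wishart supervised classification and logistic regression classification demonstrates the outstanding advantage of the deep belief network method in feature learning and verifies the applicability of the method. Conclusion: For the selection and use of massive features of polarimetric SAR images, a new classification method is proposed. It offers a new approach to polarimetric SAR image classification and a useful exploration toward wider application of deep belief networks.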

11.
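Ultrasound imaging is currently one of the main auxiliary means of breast cancer diagnosis. To achieve automatic computer-aided diagnosis of breast tumors in ultrasound images, a fully automatic tumor extraction algorithm is proposed that combines support vector machine (SVM) object detection with level-set image segmentation. First, block features are extracted from the ultrasound training images to train an SVM classifier, which is then applied to the test images to obtain suspicious lesion regions. Next, the edges of the suspicious regions are extracted as the initial contours of the level set, and an improved Chan-Vese active contour model with an added Bhattacharyya distance term is used to evolve the contours of the suspicious lesion regions, yielding accurate contours. Finally, region evaluation and screening criteria are designed that combine area, position, gray level, texture, and other factors to remove interfering regions among the suspicious lesions and obtain the final tumor segmentation. Tests on a data set of real cases show that the algorithm performs well in detecting and segmenting both benign and malignant tumors.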

12.
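Application of rough sets in electrocardiogram classification and diagnosis   Cited by: 2 (self-citations: 0, citations by others: 2)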
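The electrocardiogram is an important basis for diagnosing cardiovascular disease. This paper proposes applying rough-set-based multivariate decision trees to classification and diagnosis, builds a multivariate decision tree using sinus arrhythmia as an example, and derives the corresponding classification rules. Tests on real data show that arrhythmia cases can be identified effectively and quickly.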

13.
Breast cancer continues to be a significant public health problem in the world. Early detection is the key to improving breast cancer prognosis. Mammogram breast X-ray is considered the most reliable method for early detection of breast cancer. However, it is difficult for radiologists to provide both accurate and uniform evaluation of the enormous number of mammograms generated in widespread screening. Microcalcification clusters (MCCs) and masses are the two most important signs of breast cancer, and their automated detection is very valuable for early breast cancer diagnosis. The main objective is to discuss a computer-aided detection system proposed to assist radiologists in detecting specific abnormalities and improving diagnostic accuracy. The system applies a three-step procedure: enhancement using histogram equalization (HE) and morphological enhancement; segmentation of the region of interest based on Otsu's threshold, to identify microcalcifications and mass lesions; and a classification stage, which first distinguishes normal from microcalcification patterns and then benign from malignant microcalcifications. Three methods were used in the classification stage: the voting K-Nearest Neighbor classifier (K-NN) with a prediction accuracy of 73%, the Support Vector Machine classifier (SVM) with a prediction accuracy of 83%, and the Artificial Neural Network classifier (ANN) with a prediction accuracy of 77%.
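The segmentation step in this abstract relies on Otsu's threshold, which picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The sketch below is a minimal pure-Python version operating on a flat list of integer gray levels; the function name and the sample pixel values are hypothetical and not from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (class 0: pixel <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is in the lower class; nothing left to split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical image: dark background at level 10, bright lesion-like region at 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True marks candidate region-of-interest pixels
```

When several thresholds tie, this version returns the lowest one; a real pipeline would apply it to the enhanced image rather than to raw pixel values.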

14.
The mammogram (breast X-ray) is considered the most effective, low-cost, and reliable method for early detection of breast cancer. Although general rules for differentiating between benign and malignant breast lesions exist, only 15–30% of masses referred for surgical biopsy are actually malignant. In this work, an approach is proposed to develop a computer-aided classification system for cancer detection from digital mammograms. The proposed system consists of three major steps. The first step is extraction of a region of interest (ROI) of 256 × 256 pixels. The second step is feature extraction; we used a set of 19 features extracted from the gray-level co-occurrence matrix (GLCM) and the gray-level run-length matrix (GLRLM), which could distinguish malignant masses from benign masses with an accuracy of 96.7%. Further analysis was carried out using only 12 of the 19 features, consisting of 5 features extracted from the GLCM and 7 features extracted from the GLRLM. The 12 selected features are Energy, Inertia, Entropy, Maxprob, Inverse, SRE, LRE, GLN, RLN, LGRE, HGRE, and SRLGE; ARM with these 12 features as predictors can distinguish malignant mass images from benign mass images with an accuracy of 93.6%. Further analysis showed that the area under the receiver operating curve was 0.995, which means that the classification accuracy is good or very good. Based on these data, it was concluded that texture analysis based on the GLCM and GLRLM could distinguish malignant images from benign images with considerably good results. The third step is the classification process; we used a decision tree on image content to classify masses as normal or cancerous. The proposed system was shown to have great potential for cancer detection from digital mammograms.
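Three of the GLCM features named in this abstract (Energy, Entropy, Inertia) can be sketched as follows. This is an illustration under assumptions: the co-occurrence matrix is built for a single pixel offset and left non-symmetric, energy is taken as the sum of squared probabilities, and entropy uses base-2 logarithms. The paper's exact definitions, offsets and quantization are not given in the abstract, and the function names are hypothetical.

```python
import math


def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Energy, entropy and inertia (contrast) of a normalized co-occurrence matrix."""
    energy = sum(v * v for row in p for v in row)
    entropy = -sum(v * math.log2(v) for row in p for v in row if v > 0)
    inertia = sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))
    return {"energy": energy, "entropy": entropy, "inertia": inertia}


# Hypothetical 2 x 2 patches with two gray levels: a flat one and a checkerboard.
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
```

The flat patch gives maximal energy and zero inertia, while the checkerboard gives lower energy and higher inertia, which is the kind of contrast such features are meant to capture.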

15.
The receiver operating characteristic (ROC) formulation of the two-class signal detection problem is well known, and its present theory is based on decision theory and psychophysics. Statistical procedures developed for analyzing these human observer detection experiments can be extended to analyzing pattern recognition experiments with computer-based classification schemes. This article presents an introduction to statistical estimation and hypothesis testing methodology that can be employed in analyzing the performance of various classifiers. The methodology is illustrated by analyzing the performance of two classifiers in a breast cancer detection task.
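One common empirical summary of an ROC curve is the area under it, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes that estimate directly; it is an illustration only, not the article's estimation or hypothesis-testing procedure, and the function name and scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Empirical area under the ROC curve; tied scores count as half a win."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for cancer (positive) and normal (negative) cases.
area = auc([0.9, 0.4], [0.5, 0.1])
```

Comparing two classifiers on the same cases would additionally require an estimate of the variance of each area and of their correlation, which this sketch does not attempt.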

16.
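Machine learning and deep learning techniques can be used to solve many problems in medical classification and prediction; some classification algorithms achieve high prediction accuracy, while others are limited. An ensemble learning algorithm based on the C-AdaBoost model is proposed to predict breast cancer, and the optimal feature combinations are identified for determining whether breast cancer recurs and whether a breast tumor is benign. The existing features are re-selected by stepwise regression and combined with the C-AdaBoost model to improve prediction. Extensive experiments show that the prediction accuracy of the C-AdaBoost-based algorithm is up to 19.5% higher than that of machine learning classifiers such as SVM, Naive Bayes, RandomForest, and traditional ensemble learning models, so it can better help doctors make clinical decisions.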

17.
贾鹤鸣  李瑶  孙康健 《自动化学报》2022,48(6):1601-1615
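To address the low classification accuracy of the traditional support vector machine method in data classification, the support vector machine classifier is combined with simultaneous feature selection, and an intelligent optimization algorithm is used to optimize the algorithm parameters. First, the genetic algorithm (GA) and the sooty tern optimization algorithm (STOA) are hybridized: the average fitness value is evaluated, and when an individual's fitness value is below the average, the genetic algorithm is used to strengthen its local search; otherwise, the original sooty tern optimization process is carried out. The support vector machine kernel function and the feature selection objective are jointly taken as optimization targets, and the improved STOA-GA is used to find the best-fit solution and obtain the classification result for the selected features. Second, data classification is studied on 16 classic UCI data sets and a real breast cancer data set, comparing the best fitness value, number of selected features, specificity, sensitivity, and running time. The experimental results show that the algorithm processes data more accurately, avoids interference from redundant features, and has broad prospects for engineering application in data mining.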

18.
This paper presents a reciprocal-sigmoid model for pattern classification. The proposed classifier can be considered a Φ-machine since it preserves the theoretical advantage of linear machines, where the weight parameters can be estimated in a single step. The model can also be considered an approximation to logistic regression under the framework of Generalized Linear Models. While inheriting the necessary classification capability from logistic regression, the problems of local minima and tedious recursive search no longer exist in the proposed formulation. To handle possible over-fitting when using high-order models, the classifier is trained using multiple samples of uniformly scaled pattern features. Empirically, the classifier is evaluated using benchmark synthetic data from random sampling runs for initial statistical evidence regarding its classification accuracy and computational efficiency. Additional experiments based on ten runs of 10-fold cross validation on 40 data sets further support the effectiveness of the reciprocal-sigmoid model, whose classification accuracy is seen to be comparable to that of several top classifiers in the literature. The good performance is attributed mainly to the effective use of the reciprocal sigmoid for embedding nonlinearities and the effective use of bundled feature sets for smoothing the training error hyper-surface. Editor: Risto Miikkulainen

19.
A covariance matrix self-adaptation evolution strategy (CMSA-ES) was compared with several metaheuristic techniques for multilayer perceptron (MLP)-based function approximation and classification. Function approximation was based on simulations of several 2D functions and classification analysis was based on nine cancer DNA microarray data sets. Connection weight learning by MLPs was carried out using genetic algorithms (GA-MLP), covariance matrix self-adaptation-evolution strategies (CMSA-ES-MLP), back-propagation gradient-based learning (MLP), particle swarm optimization (PSO-MLP), and ant colony optimization (ACO-MLP). During function approximation runs, input-side activation functions evaluated included linear, logistic, tanh, Hermite, Laguerre, exponential, and radial basis functions, while the output-side function was always linear. For classification, the input-side activation function was always logistic, while the output-side function was always regularized softmax. Self-organizing maps and unsupervised neural gas were used to reduce dimensions of original gene expression input features used in classification. Results indicate that for function approximation, use of Hermite polynomials for activation functions at hidden nodes with CMSA-ES-MLP connection weight learning resulted in the greatest fitness levels. On average, the most elite chromosomes were observed for MLP ($\mathrm{MSE}=0.4977$), CMSA-ES-MLP (0.6484), PSO-MLP (0.7472), ACO-MLP (1.3471), and GA-MLP (1.4845). For classification analysis, overall average performance of classifiers used was 92.64% (CMSA-ES-MLP), 92.22% (PSO-MLP), 91.30% (ACO-MLP), 89.36% (MLP), and 60.72% (GA-MLP). We have shown that a reliable approach to function approximation can be achieved through application of MLP connection weight learning when the assumed function is unknown. In this scenario, the MLP architecture itself defines the equation used for solving the unknown parameters relating input and output target values. A major drawback of implementing CMSA-ES into an MLP is that when the number of MLP weights is large, the $\mathcal{O}(N^3)$ Cholesky factorization becomes a bottleneck for performance. As an alternative, feature reduction using SOM and NG can greatly enhance performance of CMSA-ES-MLP by reducing $N$. Future research into the speeding up of Cholesky factorization for CMSA-ES will be helpful in overcoming time complexity problems related to a large number of connection weights.

20.
In this study, we propose a set of new algorithms to enhance the effectiveness of classification of 5-year survivability of breast cancer patients from a massive, imbalanced data set. The proposed classifier algorithms combine the synthetic minority oversampling technique (SMOTE) and particle swarm optimization (PSO), while integrating some well-known classifiers, such as logistic regression, the C5 decision tree (C5) model, and 1-nearest neighbor search. To justify the effectiveness of this new set of classifiers, the g-mean and accuracy are used as performance indexes; moreover, the proposed classifiers are compared with those in the previous literature. Experimental results show that the hybrid algorithm SMOTE + PSO + C5 is the best of all algorithm combinations for classifying the 5-year survivability of breast cancer patients. We conclude that implementing SMOTE with appropriate search algorithms such as PSO and classifiers such as C5 can significantly improve the effectiveness of classification for massive imbalanced data sets.
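Two ingredients of this abstract, the g-mean index and SMOTE-style oversampling, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the g-mean is taken as the geometric mean of sensitivity and specificity, the oversampler interpolates between a minority sample and one of its nearest minority neighbours, and the function names, parameters and data points are hypothetical. The sketch needs at least two minority samples.

```python
import math
import random


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def smote_like(minority, n_new, k=1, seed=0):
    """Create n_new synthetic minority points by interpolating toward a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest other minority samples, by Euclidean distance.
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical minority-class samples lying on the diagonal of a 2D feature space.
minority = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
new_points = smote_like(minority, n_new=4)
score = g_mean(tp=45, fn=5, fp=10, tn=90)
```

Unlike plain accuracy, the g-mean drops toward zero when either class is classified poorly, which is why it suits imbalanced survivability data.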
