
1.

APPLYING DATA MINING TECHNIQUES FOR CANCER CLASSIFICATION ON GENE EXPRESSION DATA

Jinn-Yi Yeh 《Cybernetics and Systems》2013,44(6):583-602

Cancer classification through gene expression data analysis has recently emerged as an active area of research. This paper applies Genetic Algorithms (GA) for selecting a group of relevant genes from cancer microarray data. Then, the popular classifiers, such as OneR, Naïve Bayes, decision tree, and Support Vector Machine (SVM), are built on the basis of these selected genes. The performance of those classifiers is evaluated by using the publicly available gene expression data sets. Experimental results indicate that the cascade of GA and SVM has the highest rank among different methods. Moreover, the gene selection operation of GA is reproducible.

2.

Gene selection using genetic algorithm and support vectors machines

Shutao Li Xixian Wu Xiaoyan Hu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(7):693-698

In this paper, we present a gene selection method based on genetic algorithm (GA) and support vector machines (SVM) for cancer classification. First, the Wilcoxon rank sum test is used to filter noisy and redundant genes in high dimensional microarray data. Then, different highly informative gene subsets are selected by GA/SVM using different training sets. The final subset, consisting of highly discriminating genes, is obtained by analyzing the frequency of appearance of each gene in the different gene subsets. The proposed method is tested on three open datasets: leukemia, breast cancer, and colon cancer data. The results show that the proposed method has excellent selection and classification performance, especially for breast cancer data, which can yield 100% classification accuracy using only four genes.
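The filter and frequency-analysis steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `rank_sum_z`, `filter_genes`, and `frequent_genes`, the dictionary layout of the expression data, and the `min_count` threshold are all assumptions made for the example, and the GA/SVM search that would produce the candidate subsets is omitted.

```python
from collections import Counter
from math import sqrt


def rank_sum_z(a, b):
    """z-score of the Wilcoxon rank-sum statistic for sample a versus sample b.

    Uses the normal approximation; tied values receive average ranks, but the
    variance is not tie-corrected (a simplification for this sketch).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def filter_genes(expr, labels, keep):
    """Keep the `keep` genes whose two classes differ most by |z|.

    expr maps a gene name to its expression values, one per sample;
    labels holds the class (0 or 1) of each sample. Hypothetical layout.
    """
    def score(gene):
        a = [v for v, y in zip(expr[gene], labels) if y == 0]
        b = [v for v, y in zip(expr[gene], labels) if y == 1]
        return abs(rank_sum_z(a, b))
    return sorted(expr, key=score, reverse=True)[:keep]


def frequent_genes(subsets, min_count):
    """Genes that appear in at least `min_count` of the selected subsets."""
    counts = Counter(g for s in subsets for g in set(s))
    return sorted(g for g, c in counts.items() if c >= min_count)
```

In practice the subsets passed to `frequent_genes` would come from repeated wrapper runs on different training sets; a library routine such as a tie-corrected rank-sum test would normally replace the hand-written statistic.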

3.

Gene selection using hybrid particle swarm optimization and genetic algorithm (total citations: 2; self-citations: 0; citations by others: 2)

Shutao Li Xixian Wu Mingkui Tan 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(11):1039-1048

Selecting highly discriminative genes from gene expression data has become an important research topic. Not only can this improve the performance of cancer classification, but it can also cut down the cost of medical diagnoses when a large number of noisy, redundant genes are filtered. In this paper, a hybrid Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) method is used for gene selection, and Support Vector Machine (SVM) is adopted as the classifier. The proposed approach is tested on three benchmark gene expression datasets: leukemia, colon and breast cancer data. Experimental results show that the proposed method can reduce the dimensionality of the dataset, identify the most informative gene subset, and improve classification accuracy.

4.

A decomposition-based gene selection algorithm (total citations: 1; self-citations: 0; citations by others: 1)

王永全 焦娜 苗夺谦 《计算机科学》 (Computer Science) 2012,39(1):228-233

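Gene expression data consist of thousands of genes and only a few dozen samples, so effective gene selection algorithms are an important part of gene expression data research. Rough sets are an effective tool for removing redundant features. However, for gene expression data with thousands of features and a few dozen samples, the computational efficiency of existing rough-set-based feature selection algorithms becomes very low. To address this, the decomposition approach is applied to feature selection, and a decomposition-based feature selection algorithm is proposed. The algorithm splits a complex table into a simpler, more tractable master table and sub-tables, and then joins their results to solve the problem on the original table. Experimental results show that the algorithm markedly improves computational efficiency while preserving classification accuracy.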

5.

Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach

Sounak Chakraborty 《Computational statistics & data analysis》2009,53(4):1462-1474

Since most cancer treatments come with a certain degree of toxicity, it is essential to identify a cancer type correctly and then administer the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is slowly changing from morphological properties to molecular signatures. Several recent studies have demonstrated a marked improvement in prediction accuracy of tumor types based on gene expression microarray measurements over clinical markers. The main challenge in working with gene expression microarrays is that there is a huge number of genes to work with, of which only a small fraction are actually relevant for differentiating between different types of cancer. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model is able to classify different cancer types accurately and simultaneously identify the relevant or important genes. The proposed model is completely automatic in the sense that it adaptively picks up the neighborhood size and the important covariates. The method is successfully applied to three simulated data sets and four well known real data sets. To demonstrate the competitiveness of the method, a comparative study is also done with several other popular “off the shelf” classification methods. For all the simulated data sets and real life data sets, the proposed method produced highly competitive if not better results. While the standard approach is two-step model building for gene selection and then tumor prediction, this novel adaptive gene selection technique automatically selects the relevant genes along with tumor class prediction in one go. The biological relevance of the selected genes is also discussed to validate the claim.

6.

A gene selection method based on the tolerance relation (total citations: 1; self-citations: 0; citations by others: 1)

焦娜 苗夺谦 《计算机科学》 (Computer Science) 2010,37(10):217-220

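Effective gene selection is an important part of gene expression data analysis. Rough sets, as a soft computing method, can reduce attributes while keeping the classification ability of the data set unchanged. Because gene expression data are continuous, and to avoid the information loss caused by the discretization that rough set methods require, tolerance rough sets are applied to gene feature selection, and a gene selection method based on the tolerance relation is proposed. First, the gene expression data are ranked by the t-test and the top-scoring genes are selected; then these genes are further reduced with tolerance rough sets. Experiments on two standard gene expression data sets show that the method is feasible and effective.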

7.

A two-stage gene selection scheme utilizing MRMR filter and GA wrapper (total citations: 1; self-citations: 0; citations by others: 1)

Ali El Akadi Aouatif Amine Abdeljalil El Ouardighi Driss Aboutajdine 《Knowledge and Information Systems》2011,26(3):487-500

Gene expression data usually contain a large number of genes, but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminates biological samples of different types. In this paper, we propose a two-stage selection algorithm for genomic data by combining MRMR (Minimum Redundancy–Maximum Relevance) and GA (Genetic Algorithm). In the first stage, MRMR is used to filter noisy and redundant genes in high-dimensional microarray data. In the second stage, the GA uses the classifier accuracy as a fitness function to select the highly discriminating genes. The proposed method is tested for tumor classification on five open datasets: NCI, Lymphoma, Lung, Leukemia and Colon using Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers. The comparison of the MRMR-GA with MRMR filter and GA wrapper shows that our method is able to find the smallest gene subset that gives the highest classification accuracy in leave-one-out cross-validation (LOOCV).
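To make the idea of "classifier accuracy as a fitness function" concrete, the sketch below scores a candidate gene subset by leave-one-out cross-validation. It is only an illustration under stated assumptions: a 1-nearest-neighbour classifier stands in for the SVM and Naïve Bayes classifiers named in the abstract, and the function name `loocv_accuracy` and the list-of-lists data layout are hypothetical.

```python
def loocv_accuracy(samples, labels, genes):
    """LOOCV accuracy of a 1-nearest-neighbour classifier on a gene subset.

    samples: list of expression vectors (one list of floats per sample).
    labels:  class label of each sample.
    genes:   column indices of the candidate gene subset.
    The 1-NN classifier is a stand-in chosen to keep the sketch dependency-free.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        best_dist, best_label = None, None
        for j, (z, t) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # hold out the sample being classified
            dist = sum((x[g] - z[g]) ** 2 for g in genes)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, t
        correct += best_label == y
    return correct / len(samples)
```

A wrapper search such as a GA would call a function like this once per candidate subset, typically combining the accuracy with a penalty on subset size so that smaller subsets win ties.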

8.

A cancer-related gene selection method based on fused information

张树波 赖剑煌 《计算机科学》 (Computer Science) 2010,37(12):171-174

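The advent of gene expression data has opened up broad prospects for studying and exploring the pathogenesis of cancer from the perspective of molecular biology, and using gene expression data to discover cancer-related genes is of great significance for cancer diagnosis and treatment. Over the past decade or so, many computational methods have been successfully used to identify key cancer-related genes from gene expression data. However, different methods characterize a gene's ability to discriminate between sample types from different angles, and the key genes they select may be inconsistent, which complicates medical interpretation and application. A fusion method is proposed that combines a gene's discriminative ability on samples in different respects: first, the information gain, global discriminative ability, and local discriminative ability of each gene are computed; these are then weighted by their recognition rates to compute each gene's overall discriminative ability; finally, the gene subset with the highest discriminative ability is selected as the key gene subset. Experimental results show that this method achieves better recognition than using any single evaluation criterion alone.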
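One plausible reading of the weighting step in this abstract is a recognition-rate-weighted average of per-criterion scores, sketched below. The exact formula is not given in the abstract, so everything here is an assumption: the names `fused_scores` and `top_genes`, the criterion keys, and the premise that each criterion's scores have already been scaled to a comparable range.

```python
def fused_scores(criteria, recognition_rates):
    """Combine per-criterion gene scores into one overall score per gene.

    criteria:          {criterion name: {gene: score}}, scores assumed to be
                       pre-scaled to a comparable range (e.g. [0, 1]).
    recognition_rates: {criterion name: recognition rate used as the weight}.
    Returns {gene: weighted average of its scores}.
    """
    total = sum(recognition_rates[c] for c in criteria)
    genes = next(iter(criteria.values())).keys()
    return {
        g: sum(recognition_rates[c] * criteria[c][g] for c in criteria) / total
        for g in genes
    }


def top_genes(scores, k):
    """The k genes with the highest fused score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A weighted average is used so that a criterion that classifies poorly on its own contributes less to the final ranking; other fusion rules (for example rank aggregation) would fit the abstract's description equally well.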

9.

A heuristic breadth-first search algorithm for tumor informative genes (total citations: 6; self-citations: 0; citations by others: 6)

王树林 王戟 陈火旺 李树涛 张波云 《计算机学报》 (Chinese Journal of Computers) 2008,31(4):636-649

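Tumor detection based on gene expression profiles is expected to become a fast and effective method of molecular tumor diagnosis in clinical medicine. However, because gene expression profile data have very high dimensionality, very small sample sizes, and heavy noise, selecting tumor informative genes is a challenging task. Based on the characteristics of tumor gene expression profile sample sets, a heuristic breadth-first search algorithm for finding informative genes is proposed, with support vector machine classification performance as the evaluation criterion. Its advantage is that it can simultaneously find multiple informative gene subsets with as few genes and as much classification power as possible. Three tumor sample sets were used to verify the feasibility and effectiveness of the new algorithm: for the acute leukemia, hard-to-classify colon cancer, and multi-subtype small round blue cell tumor sample sets, only 2, 4, and 4 informative genes, respectively, are needed to reach 100% recognition accuracy under 4-fold cross-validation. Compared with other leading tumor classification methods, the experimental results are clearly superior in both the number of informative genes and classification performance. To avoid the influence of different partitions of the sample set on classification performance, a full-fold cross-validation evaluation method is proposed that reflects the classification performance of an informative gene subset more objectively.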

10.

Informative gene selection for tumor classification based on symmetric uncertainty and neighborhood rough sets

叶明全 高凌云 伍长荣 黄道斌 胡学钢 《数据采集与处理》 (Journal of Data Acquisition and Processing) 2018,33(3):426-435

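Selecting informative genes from gene expression profiles is a key problem in building effective tumor classification models. Tumor gene expression profiles are high-dimensional with few samples, noisy, and contain many irrelevant and redundant genes. To obtain a set of informative genes that is as small as possible yet as discriminative as possible, SUNRS, an informative gene selection method for tumor classification based on symmetric uncertainty and neighborhood rough sets, is proposed. First, the symmetric uncertainty measure is used to evaluate the importance of informative genes, so as to eliminate many irrelevant and redundant genes and obtain a candidate subset of informative genes; then a neighborhood rough set reduction algorithm searches the candidate subset to obtain the target subset of informative genes. Experimental results show that SUNRS achieves higher classification accuracy with fewer informative genes, which both improves the generalization performance of the algorithm and improves time efficiency.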
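For reference, symmetric uncertainty between a feature X and the class Y is commonly defined as SU(X, Y) = 2 · (H(X) + H(Y) − H(X, Y)) / (H(X) + H(Y)), where H is Shannon entropy. The sketch below computes it for discrete values. It is a minimal illustration, not the SUNRS implementation: the function names are hypothetical, and continuous expression values are assumed to have been discretized beforehand, a step the abstract does not describe.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in the range [0, 1].

    x: discretized expression levels of one gene, one per sample.
    y: class label of each sample.
    Returns 0.0 when both variables are constant (zero total entropy).
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2 * (hx + hy - hxy) / (hx + hy)
```

A value near 1 means the gene almost determines the class, while a value near 0 means it carries little class information; a filter stage would keep genes whose score against the class label exceeds some chosen threshold.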

11.

Feature clustering based support vector machine recursive feature elimination for gene selection

Xiaojuan Huang Li Zhang Bangjun Wang Fanzhang Li Zhao Zhang 《Applied Intelligence》2018,48(3):594-607

In a DNA microarray dataset, gene expression data often has a huge number of features (which are referred to as genes) versus a small number of samples. With the development of DNA microarray technology, the number of dimensions increases even faster than before, which could lead to the problem of the curse of dimensionality. To get good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering, called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs gene selection roughly and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which genes have similar expression profiles. Then, a representative gene is found to represent each gene group. By doing so, we can obtain a representative gene set. Then, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE can achieve better classification performance and lower computational complexity when compared with state-of-the-art methods, such as SVM-RFE.

12.

Gene subset selection using an iterative approach based on genetic algorithms

Mohd Saberi Mohamad Sigeru Omatu Safaai Deris Michifumi Yoshioka 《Artificial Life and Robotics》2009,14(1):12-15

Microarray data are expected to be useful for cancer classification. However, the process of gene selection for the classification contains a major problem due to properties of the data such as the small number of samples compared with the huge number of genes (higher-dimensional data), irrelevant genes, and noisy data. Hence, this article aims to select a near-optimal (small) subset of informative genes that is most relevant for the cancer classification. To achieve this aim, an iterative approach based on genetic algorithms has been proposed. Experimental results show that the performance of the proposed approach is superior to other previous related work, as well as to four methods tried in this work. In addition, a list of informative genes in the best gene subsets is also presented for biological usage.

13.

Class prediction and gene selection for DNA microarrays using regularized sliced inverse regression

Luca Scrucca 《Computational statistics & data analysis》2007,52(1):438-451

The monitoring of the expression profiles of thousands of genes has proved to be particularly promising for biological classification. DNA microarray data have been recently used for the development of classification rules, particularly for cancer diagnosis. However, microarray data present major challenges due to the complex, multiclass nature and the overwhelming number of variables characterizing gene expression profiles. A regularized form of sliced inverse regression (REGSIR) approach is proposed. It allows the simultaneous development of classification rules and the selection of those genes that are most important in terms of classification accuracy. The method is illustrated on some publicly available microarray data sets. Furthermore, an extensive comparison with other classification methods is reported. The REGSIR performance is comparable with the best classification methods available, and when appropriate feature selection is made the performance can be considerably improved.

14.

Markov blanket-embedded genetic algorithm for gene selection

Zexuan Zhu Yew-Soon Ong Manoranjan Dash 《Pattern recognition》2007,40(11):3236-3248

Microarray technologies enable quantitative simultaneous monitoring of expression levels for thousands of genes under various experimental conditions. This new technology has provided a new way of biological classification on a genome-wide scale. However, predictive accuracy is affected by the presence of thousands of genes, many of which are unnecessary from the classification point of view. So, a key issue of microarray data classification is to identify the smallest possible set of genes that can achieve good predictive accuracy. In this study, we propose a novel Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem. In particular, the embedded Markov blanket-based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results on synthetic and microarray benchmark datasets suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive power in the classifier model. A detailed comparative study with other methods from each of filter, wrapper, and standard GA shows that MBEGA gives the best compromise among all four evaluation criteria, i.e., classification accuracy, number of selected genes, computational cost, and robustness.

15.

Bayesian binary kernel probit model for microarray based cancer classification and gene selection

Sounak Chakraborty 《Computational statistics & data analysis》2009,53(12):4198-4209

With the arrival of gene expression microarrays, a new challenge has opened up for identification or classification of cancer tissues. Due to the large number of genes providing valuable information simultaneously compared to very few available tissue samples, cancer staging or classification becomes very tricky. In this paper we introduce a hierarchical Bayesian probit model for two-class cancer classification. Instead of assuming a linear structure for the function that relates the gene expressions with the cancer types, we only assume that the relationship is explained by an unknown function which belongs to an abstract functional space like the reproducing kernel Hilbert space. Our formulation automatically reduces the dimension of the problem from the large number of covariates or genes to a small sample size. We incorporate a Bayesian gene selection scheme with the automatic dimension reduction to adaptively select important genes and classify cancer types under a unified model. Our model is highly flexible in terms of explaining the relationship between the cancer types and gene expression measurements and picking up the differentially expressed genes. The proposed model is successfully tested on three simulated data sets and three publicly available real-life data sets: leukemia, colon cancer, and prostate cancer.

16.

A gene selection method for microarray data based on risk genes

Tzu-Tsung Wong Ding-Qun Chen 《Expert systems with applications》2011,38(11):14065-14071

Many gene selection methods have been proposed to select a subset of genes that can have a high prediction accuracy for cancer classification, and most set the same preference for all genes. However, many biological reports have pointed out that mutated or flawed genes, named as risk genes, can be one of the major causes of a specific disease. This study proposes a gene selection method based on the risk genes found in biological reports. The information provided by risk genes can reduce the time complexity for gene selection and increase the accuracy of cancer classification. This gene selection method is composed of two stages. Since all risk genes must be chosen, the first stage is to remove the genes that have similar expression levels or functions to risk genes. The next stage is to perform gene selection and gene replacement based on the results of a process that divides the remaining genes into clusters. Based on the test results from four microarray data sets, our gene selection method outperforms those proposed by previous studies, and genes that have the potential to be new risk genes are presented.

17.

A new gene selection method based on clustering and particle swarm optimization

杨善秀 韩飞 关健 《计算机应用》 (Journal of Computer Applications) 2013,33(5):1285-1288

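Because traditional gene selection methods select many redundant genes, which leads to low sample prediction accuracy, a gene selection algorithm based on clustering and particle swarm optimization is proposed. First, a clustering algorithm divides the genes into a fixed number of clusters; then an extreme learning machine is used as the classifier to evaluate the classification performance of the characteristic genes in each cluster, yielding a candidate gene pool; finally, a wrapper method based on particle swarm optimization and the extreme learning machine selects from the candidate gene pool the gene subset with the highest classification rate and the smallest size. The selected genes have good classification performance. Experimental results on two public microarray data sets show that, compared with several classical methods, the new method achieves higher classification performance with fewer genes.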

18.

Soft computing models based feature selection for TRUS prostate cancer image classification

K. Thangavel R. Manavalan 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(6):1165-1176

Ultrasound imaging is the most suitable method for early detection of prostate cancer. It is very difficult to distinguish the benign or malignant nature of the affliction in the early stage of cancer. This is reflected in the high percentage of unnecessary biopsies that are performed and the many deaths caused by late detection or misdiagnosis. A computer based classification system can provide a second opinion to the radiologists. Generally, objects are described in terms of a set of measurable features in pattern recognition. The selection and quality of the features representing each pattern will have a considerable bearing on the success of subsequent pattern classification. Feature selection is the process of selecting the most desirable or dominant feature set from the original feature set in order to reduce the cost of data visualization and increase classification efficiency and accuracy. The region of interest (ROI) is identified from transrectal ultrasound (TRUS) images using DBSCAN clustering with morphological operators after image enhancement using the M3-filter. Then 22 grey level co-occurrence matrix features are extracted from the ROIs. Soft computing model based feature selection algorithms, namely genetic algorithm (GA), ant colony optimization (ACO) and QR, are studied. In this paper, QR-ACO (hybridization of rough set based QR and ACO) and GA-ACO (hybridization of GA and ACO) are proposed for reducing the feature set in order to increase the accuracy and efficiency of the classification with regard to prostate cancer. The selected features may have the best discriminatory power for classifying prostate cancer based on TRUS images. A support vector machine is tailored for evaluation of the proposed feature selection methods through classification. Then, a comparative analysis is performed among these methods. Experimental results show that the proposed method QR-ACO produces significant results. The number of features selected by the QR-ACO algorithm is minimal, and the method is successful with high detection accuracy.

19.

Classification of human cancer diseases by gene expression profiles

《Applied Soft Computing》2017

Cancer, in virtually any of its types, is a significant cause of death around the world. In cancer analysis, classification of the various tumor types is of the greatest importance. Analysis of microarray gene expression datasets appears to provide a successful framework for studying tumors and genetic diseases. Although standard machine learning (ML) strategies have been effective at identifying significant genes and classifying new cases, common limitations of DNA microarray data analysis, such as the small number of samples and the very large number of features, still limit its investigative, medical and scientific uses. Improving the interpretability of prediction approaches while retaining high accuracy would help to analyze gene expression profile information in DNA microarray datasets more reasonably and efficiently. This paper presents a new methodology based on gene expression profiles to classify human cancer diseases. The proposed methodology combines both Information Gain (IG) and Standard Genetic Algorithm (SGA). It first uses Information Gain for feature selection, then uses Genetic Algorithm (GA) for feature reduction, and finally uses Genetic Programming (GP) for cancer type classification. The suggested system is evaluated by classifying cancer diseases in seven cancer datasets, and the results are compared with the most recent approaches. Applying the proposed system to the cancer datasets and comparing it with other machine learning methodologies shows that no classification technique consistently outperforms all the others; however, the Genetic Algorithm generally improves the classification performance of other classifiers.

20.

Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data

Pashaei Elham Pashaei Elnaz 《The Journal of supercomputing》2022,78(13):15598-15637

Gene expression data play a significant role in the development of effective cancer diagnosis and prognosis techniques. However, many redundant, noisy, and irrelevant genes (features) are present in the data, which negatively affect the predictive accuracy of diagnosis and increase the computational burden. To overcome these challenges, a new hybrid filter/wrapper gene selection method, called mRMR-BAOAC-SA, is put forward in this article. The suggested method uses Minimum Redundancy Maximum Relevance (mRMR) as a first-stage filter to pick top-ranked genes. Then, Simulated Annealing (SA) and a crossover operator are introduced into Binary Arithmetic Optimization Algorithm (BAOA) to propose a novel hybrid wrapper feature selection method that aims to discover the smallest set of informative genes for classification purposes. BAOAC-SA is an enhanced version of the BAOA in which SA and crossover are used to help the algorithm in escaping local optima and enhancing its global search capabilities. The proposed method was evaluated on 10 well-known microarray datasets, and its results were compared to other current state-of-the-art gene selection methods. The experimental results show that the proposed approach has a better performance compared to the existing methods in terms of classification accuracy and the minimum number of selected genes.
