Found 20 similar articles (search time: 15 ms)
1.
Cancer classification is one of the major applications of microarray technology. When standard machine learning techniques are applied to cancer classification, they face the small sample size (SSS) problem of gene expression data. The SSS problem arises from the large dimensionality of the feature space (due to the large number of genes) compared to the small number of samples available. To overcome the SSS problem, the dimensionality of the feature space is reduced either through feature selection or through feature extraction. Linear discriminant analysis (LDA) is a well-known technique for feature extraction-based dimensionality reduction. However, this technique cannot be applied directly to cancer classification because the within-class scatter matrix is singular under the SSS problem. In this paper, we use the Gradient LDA technique, which avoids the singularity problem associated with the within-class scatter matrix, and show its usefulness for cancer classification. The technique is applied to three gene expression datasets: acute leukemia, small round blue-cell tumour (SRBCT) and lung adenocarcinoma. It achieves lower misclassification error than several previous techniques.
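The singularity mentioned in this abstract can be illustrated with a small sketch. It does not implement Gradient LDA; it only builds the within-class scatter matrix for made-up toy data with more genes than samples and checks its rank. The function names, the toy values and the tolerance are assumptions chosen for illustration.

```python
def within_class_scatter(X, y):
    """Sum over classes of (x - class_mean)(x - class_mean)^T."""
    d = len(X[0])
    S = [[0.0] * d for _ in range(d)]
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        for r in rows:
            diff = [r[j] - mean[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
    return S


def matrix_rank(M, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (tol is an assumed cutoff)."""
    A = [row[:] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, n_rows):
            f = A[i][c] / A[r][c]
            for j in range(c, n_cols):
                A[i][j] -= f * A[r][j]
        r += 1
    return r


# Toy data (made up): 4 samples, 5 genes, 2 classes.
X = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [5, 4, 3, 2, 1], [6, 5, 1, 3, 2]]
y = [0, 0, 1, 1]
S_w = within_class_scatter(X, y)
rank_S_w = matrix_rank(S_w)  # at most n_samples - n_classes = 2, so S_w (5x5) is singular
```

Because the rank of the scatter matrix cannot exceed the number of samples minus the number of classes, it cannot be inverted whenever genes outnumber samples, which is the situation classical LDA cannot handle.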
2.
Currently, cancer diagnosis at a molecular level has been made possible through the analysis of gene expression data. More specifically, one usually uses machine learning (ML) techniques to build, from cancer gene expression data, automatic diagnosis models (classifiers). Cancer gene expression data often present some characteristics that can have a negative impact on the generalization ability of the classifiers generated. Some of these properties are data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract the intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which particular characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the induced classifiers.
3.
Microarray gene expression profiles can be exploited for the efficient and effective classification of cancers. This is a computationally challenging task because of the large number of genes and the relatively small number of experiments in gene expression data. The aim of this work is to devise a framework of techniques based on supervised machine learning for discriminating acute lymphoblastic leukemia from acute myeloid leukemia using microarray gene expression profiles. An artificial neural network (ANN) technique was employed for this classification. Moreover, the ANN was compared with five other machine learning techniques. These methods were assessed on eight different classification performance measures. This article reports a classification accuracy of 98% using the ANN, with no error in the identification of acute lymphoblastic leukemia and only one error in the identification of acute myeloid leukemia, on tenfold cross-validation and the leave-one-out approach. Furthermore, the models were validated on independent test data, and all samples were correctly classified.
4.
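Building on a review of applications of binary support vector machines, a new method is proposed for analyzing gene microarray cancer data. It uses the t-test and the Wilcoxon test for feature selection, with a support vector machine (SVM) as the classifier. Classification experiments on a leukemia dataset and a colon cancer dataset show that the proposed method not only achieves a high recognition rate but also requires only a small feature subset and classifies quickly, improving both classification accuracy and classification speed.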
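The t-test-based feature selection step described in this abstract might look roughly like the sketch below. It ranks genes by the absolute value of Welch's t-statistic between two classes; whether the original work uses Welch's or Student's variant is not stated, so that choice, the function names and the toy data are all assumptions. The SVM stage is omitted.

```python
import math


def welch_t(a, b):
    """Welch's t-statistic for two samples given as lists of floats."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Unbiased sample variances.
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def rank_genes(X, y, k):
    """Return indices of the k genes with the largest |t| between classes 0 and 1.

    X: list of samples, each a list of expression values; y: list of 0/1 labels.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


# Toy data (made up): gene 1 separates the classes, genes 0 and 2 do not.
X = [[1.0, 5.0, 2.0], [1.2, 5.5, 1.9], [0.9, 4.8, 2.2],
     [1.1, 1.0, 2.1], [1.0, 1.3, 2.0], [1.3, 0.9, 1.8]]
y = [0, 0, 0, 1, 1, 1]
top = rank_genes(X, y, 1)
```

The selected gene indices would then be used to reduce each sample before training the classifier.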
5.
Monitoring gene expression profiles is a novel approach to cancer diagnosis. Several studies have shown that sparse logistic regression is a useful classification method for gene expression data. Not only does it give a sparse solution with high accuracy, it also provides the user with explicit classification probabilities in addition to the class information. However, its optimal extension to more than two classes is not obvious. In this paper, we propose a multiclass extension of sparse logistic regression. Analysis of five publicly available gene expression data sets shows that the proposed method outperforms the standard multinomial logistic model in prediction accuracy as well as gene selectivity.
6.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression data which will maximize the classification accuracy. A model for gene selection and classification has been developed by using a filter approach, and an improved hybrid of the genetic algorithm and a support vector machine classifier. We show that the classification accuracy of the proposed model is useful for the cancer classification of one widely used gene expression benchmark data set.
7.
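Based on microarray expression data, this work explores new and effective feature extraction and classification methods. Wavelet multiresolution analysis is used to extract features from gene expression, and support vector machines and BP neural networks are used for classification. Gene expression exhibits clear multiscale characteristics; the classification rate reaches a maximum of 98.61%, and the results are stable. Analyzing gene expression data with multiscale theory is a new and effective bioinformatics method that merits further exploration and study.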
8.
Microarray data have significant potential in clinical medicine, but they typically contain a large number of genes relative to the number of samples. Finding a subset of discriminatory genes (features) through intelligent algorithms has become a trend, and a disease prognosis expert system built on such a subset could have a great effect on clinical medicine. In addition, the fewer genes selected, the lower the cost of the disease prognosis expert system, so a small gene set with high classification accuracy is what is needed. In this paper, a multi-objective model is built according to the analytic hierarchy process (AHP), which treats classification accuracy as absolutely more important than the number of selected genes. A multi-objective heuristic algorithm called MOEDA, an improvement of the Univariate Marginal Distribution Algorithm, is proposed to solve the model. Two main rules are designed: the 'Higher and Fewer Rule', which is used for evaluating and sorting individuals, and the 'Forcibly Decrease Rule', which is used to generate potential individuals with high classification accuracy and fewer genes. The proposed method is tested on both binary-class and multi-class microarray datasets. The results show that the gene set selected by MOEDA not only yields higher accuracy but also stays small, which both saves computational time and improves the interpretability and applicability of the result with a simple classification model. The proposed MOEDA opens up a new way of applying heuristic algorithms to microarray gene expression data.
9.
Cancer class prediction and discovery can improve on imperfect non-automated cancer diagnoses, which affect patients' cancer treatment. Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling automatic, precise and early diagnosis. A promising application of SAGE gene expression data is the classification of cancers. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE gene expression profiles. The event-model-based methods are compared with the standard Naïve Bayes method. Both binary classification and multicategory classification are investigated. Experimental results on several SAGE datasets show that event models are generally better than standard Naïve Bayes. Normalized Information Gain (NIG), an extension of Information Gain (IG), is proposed for gene selection. The impact of gene correlation on classification performance is also investigated.
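As a rough illustration of one of the event models named in this abstract, the sketch below fits a multinomial naive Bayes classifier to tag-count vectors. It is a generic textbook formulation, not the authors' implementation; the Laplace smoothing constant, the function names and the toy counts are assumptions. A multivariate Bernoulli variant would instead work on presence/absence indicators of each tag.

```python
import math


def train_multinomial_nb(X, y, alpha=1.0):
    """Fit a multinomial naive Bayes model on count vectors.

    X: list of count vectors (one per library); y: class labels.
    alpha: Laplace smoothing constant (an assumed default).
    """
    n_feat = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(r[j] for r in rows) for j in range(n_feat)]
        denom = sum(totals) + alpha * n_feat
        model[c] = {
            "log_prior": math.log(len(rows) / len(X)),
            "log_theta": [math.log((t + alpha) / denom) for t in totals],
        }
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(n * lt for n, lt in zip(x, m["log_theta"]))
    return max(model, key=score)


# Toy tag counts (made up): three tags, two classes.
X = [[9, 1, 0], [8, 2, 1], [1, 7, 3], [0, 9, 2]]
y = ["tumor", "tumor", "normal", "normal"]
model = train_multinomial_nb(X, y)
pred_a = predict(model, [7, 1, 1])
pred_b = predict(model, [1, 6, 2])
```

In the multinomial model each tag contributes to the score in proportion to its count, which is what distinguishes it from a model that only records whether a tag occurs.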
10.
Pattern Analysis and Applications - A typical microarray dataset usually contains thousands of genes, but only a small number of samples. It is in fact that most genes in a DNA microarray dataset...
11.
High-dimension, low-sample-size data, such as microarray gene expression levels, pose numerous challenges to conventional statistical methods. In the particular case of binary classification, some classification methods, such as the support vector machine (SVM), can efficiently deal with high-dimensional predictors but lack accuracy in estimating the probability of class membership. In contrast, traditional logistic regression (TLR) effectively estimates the probability of class membership for data with low-dimensional inputs but does not handle high-dimensional cases. The study bridges the gap between SVM and TLR through their loss functions. Based on the proposed new loss function, a pseudo-logistic regression and classification approach that combines the strengths of both SVM and TLR is also proposed. Simulation evaluations and real data applications demonstrate that for low-dimensional data, the proposed method produces regression estimates comparable to those of TLR and penalized logistic regression, and that for high-dimensional data, the new method achieves higher classification accuracy than SVM while also enjoying improved computational convergence and stability.
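The two standard loss functions that this abstract contrasts can be written as functions of the margin m = y·f(x), with labels y in {-1, +1}. The sketch below shows only the usual hinge loss (SVM) and logistic loss (logistic regression); the new loss proposed in the paper is not given in the abstract and is not reproduced here. Function names are illustrative.

```python
import math


def hinge_loss(margin):
    """SVM hinge loss: zero once the margin reaches 1, linear below that."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """Logistic (negative log-likelihood) loss log(1 + exp(-margin)).

    Written in a numerically stable form so large negative margins do not overflow.
    """
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Compare the two losses at a few margins.
table = [(m, hinge_loss(m), logistic_loss(m)) for m in (-2.0, 0.0, 1.0, 2.0)]
```

The hinge loss is exactly zero for well-classified points, which yields sparse support vectors but no probability estimate, whereas the logistic loss stays positive everywhere and corresponds to a probabilistic model.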
12.
Molecular level diagnostics based on microarray technologies can offer the methodology of precise, objective, and systematic cancer classification. Genome-wide expression patterns generally consist of thousands of genes. It is desirable to extract some significant genes for accurate diagnosis of cancer because not all genes are associated with a cancer. In this paper, we have used representative gene vectors that are highly discriminatory for cancer classes and extracted multiple significant gene subsets based on those representative vectors respectively. Also, an ensemble of neural networks learned from the multiple significant gene subsets is proposed to classify a sample into one of several cancer classes. The performance of the proposed method is systematically evaluated using three different cancer types: Leukemia, colon, and B-cell lymphoma.
14.
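In the life sciences, species and genes need to be classified in order to understand the inherent structure of populations. Data clustering methods are used to effectively identify patterns in gene expression data and classify them, grouping items with high feature similarity into the same class and items with high dissimilarity into different classes. This is important for studying the structure and function of genes and the relationships between different kinds of genes. Graph-theoretic methods are used to perform an initial clustering of gene expression data in molecular biology, which is then refined in combination with other algorithms, such as a K-nearest-neighbor self-learning clustering algorithm or a center-based self-learning clustering algorithm. For a given clustering criterion, the approach can produce globally optimal clusters. Finally, the algorithm is analyzed and discussed, and validated experimentally on simulated data.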
15.
This paper presents two new approaches to spatio-temporal data classification using complex-valued neural networks. The first approach uses an extended complex-valued back-propagation algorithm to train an MLP network whose output amplitudes are encoded in one-of-N coding. It makes a classification decision based on the accumulated distance between the network output and the trained pattern. The second approach is inspired by RBF networks and has a two-layer architecture. Neurons in the first layer have a fixed position in space and time encoded in their weights. This layer is trained by the presented extension of the neural gas algorithm to complex numbers. The second layer determines which neurons from the first layer belong to a specific class. The paper contains details of experiments with the proposed approaches on artificial data for hand-written character recognition and a comparison of both methods.
16.
Microarray technology presents a challenge due to the large dimensionality of the data, which can be difficult to interpret. To address this challenge, the article proposes a feature extraction-based cancer classification technique coupled with the artificial bee colony (ABC) optimization algorithm. The ABC-support vector machine (SVM) method is used to classify lung cancer datasets and is compared with existing techniques in terms of precision, recall, F-measure, and accuracy. The proposed ABC-SVM has the advantage of dealing with complex nonlinear data, providing good flexibility. Simulation analysis was conducted with 30% of the data reserved for testing the proposed method. The results indicate that the proposed attribute classification technique, which uses fewer genes, performs better than other modalities. Classifiers such as naïve Bayes, multi-class SVM, and linear discriminant analysis were also compared, and the proposed method outperformed these classifiers and state-of-the-art techniques. Overall, this study demonstrates the potential of using intelligent algorithms and feature extraction techniques to improve the accuracy of cancer diagnosis using microarray gene expression data.
17.
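To address the low accuracy and long computation time of algorithms for constructing large-scale gene regulatory networks, a new algorithm is proposed that starts from the analysis of gene expression data and combines parallel computing with threshold limiting to construct large-scale gene regulatory networks. In this algorithm, the interaction strength between genes is measured by conditional mutual information, and the parallel computation uses a CUDA and OpenMP architecture that combines GPU and CPU. Results on synthetic datasets show that, compared with recent construction algorithms (such as Bayesian model algorithms and differential equation model algorithms), the algorithm achieves higher accuracy and shorter running time when constructing large-scale gene regulatory networks.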
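The conditional mutual information used in this abstract as an interaction-strength measure can be estimated for discretized expression values as in the sketch below. This is a plain serial plug-in estimator, not the CUDA/OpenMP implementation described; the function name, the discretization into symbols and the toy sequences are assumptions.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in bits for discrete sequences of equal length."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) );
        # the 1/n factors cancel inside the ratio, so raw counts are used there.
        ratio = c_z[z] * n_xyz / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * math.log2(ratio)
    return cmi


# Toy discretized expression levels (made up).
gene_x = [0, 1, 0, 1]
gene_y = [0, 1, 0, 1]
constant_z = [0, 0, 0, 0]

# X and Y identical, Z uninformative: I(X;Y|Z) equals H(X) = 1 bit.
cmi_dependent = conditional_mutual_information(gene_x, gene_y, constant_z)
# Z fully explains both X and Y: nothing is left to share, so I(X;Y|Z) = 0.
cmi_explained = conditional_mutual_information(gene_x, gene_y, gene_x)
```

In a thresholding scheme of the kind the abstract mentions, an edge between two genes would be kept only when this value exceeds a chosen cutoff; the cutoff itself is not specified in the abstract.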
18.
Since most biological systems are developmental and dynamic, time-course gene expression profiles provide an important characterization of gene functions. Assigning functions to genes with unknown functions based on time-course gene expression is an important task in functional genomics. Recently, various methods have been proposed for the classification of gene functions based on time-course gene expression data. In this paper, we consider the classification of gene functions from a functional data analysis viewpoint, in which a functional support vector machine is adopted. The functional support vector machine can model temporal effects of time-course gene expression data by incorporating the coefficients as well as the basis matrix obtained from a finite expansion of gene expressions on a set of basis functions. We apply the functional support vector machine to both real microarray and simulated data. Our results indicate that the functional support vector machine is effective in discriminating gene functions of time-course gene expressions with predefined functions. The method also provides valuable functional information about interactions between genes and allows the assignment of new functions to genes with unknown functions.
19.
The data gravitation-based classification (DGC) model, a new classification model inspired by physical law, has been demonstrated to be effective for both standard and imbalanced tasks. However, due to the large scale of gravitational computation during the feature weighting process, DGC suffers from high computational complexity, especially for large data sets. In this paper, we address the problem of speeding up gravitational computation using a graphics processing unit (GPU). We design a GPU parallel algorithm, GPU–DGC, to accelerate the feature weighting process of the DGC model. GPU–DGC distributes the gravitational computing process across parallel GPU threads in order to compute gravitation simultaneously. We use 25 open classification data sets to evaluate the parallel performance of our algorithm. The relationship between the speedup ratio and the number of GPU threads is examined and discussed based on the empirical studies. The experimental results show the effectiveness of GPU–DGC, with a maximum speedup ratio of 87 over the serial DGC. Its sensitivity to the number of GPU threads is also observed in the empirical studies.
20.
Data classification is usually based on measurements recorded at the same time. This paper considers temporal data classification where the input is a temporal database that describes measurements over a period of time in history while the predicted class is expected to occur in the future. We describe a new temporal classification method that improves the accuracy of standard classification methods. The benefits of the method are tested on weather forecasting using the meteorological database from the Texas Commission on Environmental Quality and on influenza using the Google Flu Trends database.