Similar literature
Found 19 similar documents; search took 185 ms.
1.
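Microarray data used in gene expression analysis are high-dimensional and highly redundant, which makes classifying gene expression data difficult. The least squares support vector machine (LS-SVM), a machine learning algorithm, is computationally efficient and therefore offers an effective route for data mining. For two typical cancer microarray data sets (colon cancer and leukemia), the data are normalized and their correlation coefficient matrix is computed; principal component analysis is then used for dimensionality reduction, yielding an informative gene set for feature selection and classification (10 genes per data set); finally, an LS-SVM classifier classifies the informative gene sets. Experimental results show that the algorithm achieves leave-one-out cross-validation accuracies of 97.5% and 100% on the two cancer data sets, respectively, a higher test accuracy than all the other classifiers compared, providing a reliable diagnostic basis for further clinical application.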
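The classification and evaluation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear kernel, uses the regression-on-labels form of the LS-SVM linear system (equivalent to the classifier form for labels of +1 and -1), omits the normalization and PCA steps, and runs on made-up toy data. The function names and the `gamma` value are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def lssvm_fit(xs, ys, gamma=10.0):
    """Train a linear-kernel LS-SVM; labels must be +1 or -1.

    Solves [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T.
    gamma is an assumed regularization value, not taken from the abstract.
    """
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [dot(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + [float(y) for y in ys])
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(model, train_x, x):
    b, alpha = model
    score = b + sum(a * dot(xi, x) for a, xi in zip(alpha, train_x))
    return 1 if score >= 0 else -1

def loocv_accuracy(xs, ys, gamma=10.0):
    """Leave-one-out cross-validation: hold out each sample once."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = lssvm_fit(train_x, train_y, gamma)
        hits += lssvm_predict(model, train_x, xs[i]) == ys[i]
    return hits / len(xs)

# Toy data (made up): two well-separated groups of samples.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [3.0, 2.9], [2.8, 3.1], [3.2, 3.0]]
ys = [-1, -1, -1, 1, 1, 1]
print(loocv_accuracy(xs, ys))
```

On real microarray data the rows of `xs` would be the reduced feature vectors of each sample, and a library solver would normally replace the hand-written elimination.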

2.
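Application of SVM to the classification of gene microarray cancer data   Total citations: 1, self-citations: 0, citations by others: 1
Building on a review of applications of binary support vector machines, a new method is proposed for analyzing gene microarray cancer data: features are selected with the t-test and the Wilcoxon test, and a support vector machine (SVM) serves as the classifier. Classification experiments on the leukemia and colon cancer data sets show that the proposed method not only achieves a high recognition rate but also needs only a small feature subset and classifies quickly, improving both classification accuracy and speed.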
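The t-test-based feature selection mentioned above can be illustrated with a small ranking sketch. It assumes Welch's form of the t-statistic (the abstract does not say which variant is used), ranks genes by absolute value, and works on made-up toy data; the function names are hypothetical, and the Wilcoxon test and the SVM step are not shown.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def top_genes_by_t(expr, labels, k):
    """Return the indices of the k genes with the largest |t|.

    expr[i][g] is the expression of gene g in sample i; labels are 0 or 1.
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, y in zip(expr, labels) if y == 0]
        b = [row[g] for row, y in zip(expr, labels) if y == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

# Toy data (made up): 6 samples, 3 genes; only gene 1 separates the classes.
expr = [
    [5.0, 1.0, 3.1],
    [5.2, 1.2, 2.9],
    [4.9, 0.9, 3.0],
    [5.1, 4.0, 3.2],
    [5.0, 4.2, 2.8],
    [4.8, 3.9, 3.0],
]
labels = [0, 0, 0, 1, 1, 1]
print(top_genes_by_t(expr, labels, 2))
```

The selected column indices would then be used to build the reduced feature matrix passed to the classifier.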

3.
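Feature gene selection for tumor classification based on support vector machines   Total citations: 19, self-citations: 0, citations by others: 19
The key to building an effective tumor classification model from gene expression profiles is to accurately identify a set of feature genes that determine the sample class. To address this, the feature gene selection problem for tumor classification is studied on the basis of an analysis of the characteristics of tumor gene expression profiles. First, a new class separability criterion is proposed to filter out genes irrelevant to classification, and a support vector machine is used as the classifier to test the classification performance of the feature genes. Then, pairwise redundancy analysis and sensitivity analysis based on the SVM classification model are used to remove redundant genes. Experiments on feature gene selection for acute leukemia subtype classification demonstrate the feasibility and effectiveness of the method.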

4.
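Liu Qing (刘青), Zhou Peng (周鹏). Computer Engineering (《计算机工程》), 2005, 31(3): 189-191
DNA microarray technology allows the expression levels of thousands of genes to be observed simultaneously, and the analysis of such data has become a focus of bioinformatics research. Given that microarray gene expression data are high-dimensional, small-sample and nonlinear, a classification and recognition method for gene expression data is designed and implemented; experiments on the colon data set show that its generalization performance is improved.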

5.
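Chao Hao (晁浩), Ruan Xiaogang (阮晓钢). Computer Engineering and Applications (《计算机工程与应用》), 2005, 41(31): 178-179, 204
Building predictive classification models of tumors from tumor gene expression data with the methods and techniques of information science is important for tumor identification. To address this problem, the paper proposes a method for tumor classification and discrimination using support vector machines. Based on an analysis of the characteristics of gene expression profiles, the method first clusters all genes and picks a "representative gene" from each cluster as a feature gene, then uses a support vector machine as the classifier for tumor classification. Classification experiments on prostate cancer gene expression profile data give good results, demonstrating the effectiveness and feasibility of the method.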

6.
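Zhou Peng (周鹏). Computer Engineering and Design (《计算机工程与设计》), 2005, 26(11): 2966-2968, 2974
DNA microarray experiments make it possible to observe the expression levels of thousands of genes simultaneously, allowing life phenomena and their nature to be studied at the genome level from a systematic, global perspective. As a new machine learning method, the support vector machine has been widely studied in bioinformatics in recent years, and in many cases it has achieved performance better than or close to that of other methods. This paper surveys applications of support vector machines to DNA microarrays.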

7.
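Huberized multiclass support vector machine for microarray classification   Total citations: 2, self-citations: 0, citations by others: 2
A new multiclass support vector machine that performs gene selection and microarray classification simultaneously is proposed. By combining the huberized hinge loss function with the elastic net penalty, the proposed SVM selects genes automatically and encourages a grouping effect. Its coefficient paths are piecewise linear with respect to a single regularization parameter, and a solution path algorithm developed on this basis reduces the computational complexity. Experiments on the leukemia data set verify the effectiveness of the proposed method.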
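For orientation, the two ingredients named in this abstract can be written down in a few lines. The sketch below uses the binary form of the huberized hinge loss as it is commonly stated in the literature, together with the usual elastic net penalty; the abstract gives neither formula, so the exact multiclass formulation, the parameter names (`delta`, `lam1`, `lam2`) and the default value of `delta` are assumptions.

```python
def huberized_hinge(t, delta=2.0):
    """Huberized hinge loss of the margin t = y * f(x), with delta > 0.

    Zero beyond the margin, quadratic in a band of width delta,
    linear further out; the pieces join smoothly.
    """
    if t > 1.0:
        return 0.0
    if t > 1.0 - delta:
        return (1.0 - t) ** 2 / (2.0 * delta)
    return 1.0 - t - delta / 2.0

def elastic_net(beta, lam1, lam2):
    """Elastic net penalty: lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2."""
    return lam1 * sum(abs(b) for b in beta) + 0.5 * lam2 * sum(b * b for b in beta)

# Example evaluation on made-up values.
print(huberized_hinge(0.0), elastic_net([1.0, -2.0], 0.5, 1.0))
```

The L1 term drives coefficients of uninformative genes to zero, while the L2 term tends to keep correlated genes in or out of the model together, which is the grouping effect the abstract refers to.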

8.
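Based on microarray expression data, new effective feature extraction and classification methods are explored. Wavelet multiresolution analysis is used to extract features of gene expression, and support vector machines and BP neural networks are used for classification. Gene expression shows clear multiscale characteristics; the classification rate reaches a maximum of 98.61%, and the results are stable. Analyzing gene expression data with multiscale theory is a new and effective bioinformatics method that merits further exploration and research.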

9.
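To effectively select feature genes from high-dimensional, small-sample gene expression data and to eliminate data irrelevant to tumor classification, a tumor informative gene selection algorithm combining random matrix replacement with support vector machines (RD-SVM) is proposed. First, multiple informative gene subsets represented by 0/1 random vectors are constructed, and a support vector machine classifier is built to evaluate the quality of each subset. Then, taking interactions among features into account, a 0/1 replacement strategy is used to evaluate gene subsets and find the optimal one. Finally, the algorithm is tested on 5 tumor gene expression profile data sets. The results show that, compared with the reference algorithms, RD-SVM not only improves the recognition accuracy of tumor informative genes but also selects the fewest informative genes.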

10.
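Molecular classification of tumors based on genetic algorithms and support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
A new method based on a genetic algorithm (GA) and a support vector machine (SVM) is proposed for molecular classification of tumors and feature gene selection. Because gene expression data have few samples and high dimensionality, the method first filters out a large number of classification-irrelevant genes according to gene scatter, then removes redundant genes by correlation analysis to obtain a candidate gene subset. A genetic algorithm then searches the candidate feature gene space for feature subsets that contain few genes and classify well with the SVM classifier. Applied to colon cancer and acute leukemia gene expression profiles, this GA/SVM method selects several small gene subsets that achieve high classification accuracy, and the experimental results demonstrate its effectiveness.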

11.
Investigation of genes using data analysis and computer-based methods has gained widespread attention in solving the human cancer classification problem. DNA microarray gene expression datasets are readily utilized for this purpose. In this paper, we propose a feature selection method that uses an improved regularized linear discriminant analysis technique to select important genes that are crucial for human cancer classification. Experiments on several DNA microarray gene expression datasets give promising results compared with several existing feature selection methods.

12.
Identifying differentially expressed genes in microarray data has been studied extensively and several methods have been proposed. The most popular methods for gene expression microarray data analysis rely on a normal distribution assumption and are based on a Wald statistic. These methods may be inefficient when expression levels follow a skewed distribution. To deal with possible violations of the normality assumption, we propose a method based on the Generalized Logistic Distribution of Type II (GLDII). The motivation behind this distributional assumption is to allow longer tails than the normal distribution. This is important in analyzing gene expression data since extreme values are common in such experiments. The shape parameter of GLDII allows flexibility in modeling a wide range of distributions. To reduce the computational cost of carrying out Likelihood Ratio (LR) tests for several thousand genes, an Approximate LR Test (ALRT) is proposed. We also generalize the two-class ALRT method to multi-class microarray data. The performance of the ALRT method under the GLDII assumption is compared to methods based on Wald-type statistics using simulation. The simulation results show that our method performs quite well compared to the significance analysis of microarrays (SAM) approach using standardized Wilcoxon rank statistics and the empirical Bayes (E-B) t-statistics. Our method is also less sensitive to extreme values. We illustrate our method using two publicly available gene expression data sets.

13.
This paper combines a powerful algorithm, called Dongguang Li (DGL) global optimization, with methods of cancer diagnosis through gene selection and microarray analysis. A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is proposed and applied to two test cancer cases, colon and leukemia. The study attempts to analyze multiple sets of genes simultaneously, for an overall global solution to the genes' joint discriminative ability in assigning tumors to known classes. With the workable concepts and methodologies described here, an accurate classification of the type and seriousness of cancer can be made. Using orthogonal arrays for sampling and a search space reduction process, a computer program has been written that can run on a personal laptop computer. Both the colon cancer and the leukemia microarray data can be classified 100% correctly without previous knowledge of their classes. The classification process is automated once the gene expression data have been input. Instead of examining a single gene at a time, the DGL method can find the global optimum solutions and construct a multi-subset pyramidal hierarchy class predictor containing up to 23 gene subsets from a given microarray gene expression data collection within several hours. An automatically derived class predictor makes reliable cancer classification and accurate tumor diagnosis in clinical practice possible.

14.
Dimensionality reduction has been applied in a wide variety of areas, including the analysis of gene expression data obtained with the microarray approach. The data involved in this problem are challenging for machine learning algorithms because they have a small number of samples and a high number of attributes. This paper proposes a preprocessing phase for microarray data that uses attribute selection and the random projection method. Experimental results are promising and show that the use of these methods improves the performance of classification algorithms.
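As an illustration of the random projection step named in this abstract, the sketch below multiplies each sample by a random Gaussian matrix. It is a generic version of the technique rather than the paper's procedure: the Gaussian entries, the 1/sqrt(k) scaling, the function name and the `seed` parameter are all assumptions, and the toy input is made up.

```python
import random
from math import sqrt

def random_projection(rows, k, seed=0):
    """Project each d-dimensional row onto k random Gaussian directions."""
    rng = random.Random(seed)
    d = len(rows[0])
    # Entries drawn from N(0, 1/k), so squared lengths are preserved in expectation.
    r = [[rng.gauss(0.0, 1.0) / sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * r[i][j] for i in range(d)) for j in range(k)] for x in rows]

# Toy data (made up): 2 samples with 5 attributes, reduced to 2 dimensions.
rows = [[0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0, 5.0]]
print(random_projection(rows, 2, seed=1))
```

Because the projection matrix does not depend on the data, it is cheap to compute even when the number of attributes is in the thousands, which is the setting the abstract describes.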

15.
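Feature gene selection for tumor gene expression profile classification: problem and analysis methods   Total citations: 18, self-citations: 1, citations by others: 18
Research on feature gene selection for tumor classification is an important means of discovering tumor-specific expressed genes and studying tumor gene expression patterns. Using a multiclass tumor gene expression profile data set and starting from the classification of tumor versus normal tissue, the paper analyzes and studies the feature gene selection problem. First, a feature selection strategy based on the Relief algorithm is improved to generate a candidate feature set. Then a support vector machine is used as the classifier to test classification performance and select the classification feature genes. Finally, in combination with the classification model, sensitivity analysis is used for a precise search of feature genes to filter out redundancy. With this method, 52 feature genes with good classification performance are selected as genetic features of tumors, and their expression behavior is briefly analyzed.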

16.
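Gene selection is a key problem in gene expression data analysis. However, existing methods do not jointly consider sample imbalance and interactions among genes. Drawing on cluster validation techniques, a 0-1 programming model for gene selection is proposed that accounts for both sample imbalance and gene interactions. Based on the characteristics of the 0-1 programming model, a greedy heuristic algorithm is given to solve the resulting optimization problem. The method is tested on 3 real gene expression data sets and compared with two reference methods; the results show that the proposed model and algorithm are effective and robust.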

17.
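Advances in cluster analysis of gene expression data   Total citations: 4, self-citations: 1, citations by others: 3
The explosive growth of gene expression data creates an urgent need for automatic, effective data analysis tools. Cluster analysis has become a powerful tool for analyzing gene expression data to obtain biological information. To mine gene expression data better, many improved traditional clustering algorithms and new clustering algorithms have been proposed in recent years. This paper first briefly introduces how gene expression data are acquired and represented, then systematically reviews the clustering algorithms applied to gene expression data analysis in recent years. According to the clustering objective, the algorithms are divided into gene-based clustering, sample-based clustering and two-way clustering; for each category the biological meaning and the difficulties are described, and the basic principles, advantages and disadvantages of the algorithms are discussed in detail. Finally, current cluster analysis methods for gene expression data are summarized and future trends are outlined.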

18.
Microarrays are capable of detecting the expression levels of thousands of genes simultaneously. Gene expression data from DNA microarrays are therefore characterized by many measured variables (genes) on only a few samples. One important application of gene expression data is to classify the samples. In statistical terms, the very large number of predictors or variables compared to the small number of samples makes most classical “class prediction” methods unusable. Generally, this problem can be avoided by selecting only the relevant features or by extracting new features that contain the maximal information about the class label from the original data. In this paper, a new method for gene selection based on independent variable group analysis (IVGA) is proposed. In this method, we first use the t-statistic method to select a subset of genes from the original data. Then, we use IVGA to select the key genes for tumor classification from that subset. Finally, we use an SVM to classify tumors based on the key genes selected by IVGA. To validate its efficiency, the proposed method is applied to classify three different DNA microarray data sets. The prediction results show that our method is efficient and feasible.

19.
Cluster analysis of DNA microarray data is an important but difficult task in knowledge discovery processes. Many clustering methods are applied to the analysis of gene expression data, but none of them can deal definitively with the challenges that this technology raises. Because of this, many applications have been developed for visually representing the results of clustering algorithms on DNA microarray data, usually providing dendrogram and heat map visualizations. Most of these applications focus only on those visualizations and do not offer further visualization components to validate the clustering methods or to compare them with one another. This paper proposes using a visual analytics framework in cluster analysis of gene expression data. Additionally, it presents a new method for finding cluster boundaries based on properties of metric spaces. Our approach presents a set of visualization components able to interact with each other; namely, parallel coordinates, cluster boundary genes, 3D cluster surfaces and DNA microarray visualizations as heat maps. Experimental results have shown that our framework can be very useful in the process of more fully understanding DNA microarray data. The software has been implemented in Java, and the framework is publicly available at http://www.analiticavisual.com/jcastellanos/3DVisualCluster/3D-VisualCluster.
