Similar literature
 17 similar documents found; search took 218 ms
1.
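Classifying tissue samples from gene expression profiles is an important research topic in disease diagnosis. In gene expression data, the number of genes (several thousand) is usually much larger than the number of samples (a few dozen); that is, the dimensionality of the data is high relative to the number of data points (the undersampling problem). Excessive dimensionality (the number of features, or genes) poses a major challenge for classification. This paper combines uncorrelated linear discriminant analysis (ULDA) with a support vector machine (SVM) classifier to classify colon cancer tissue samples and compares the approach with other methods. Classification performance improved, and the results demonstrate that the method is feasible and effective.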

2.
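The number of genes (features) far exceeds the number of conditions (samples), gene expression data often contain substantial noise, and biologists and medical researchers hope to pick out, from a large number of genes, marker genes relevant to disease diagnosis. Gene selection is therefore the key step in using gene expression data for disease classification and prediction. The commonly used approaches are filter methods and wrapper methods. Combining the strengths of both, this paper proposes a multi-objective estimation of distribution algorithm (MOEDA) for gene selection. A scoring function first determines the candidate gene set for MOEDA. MOEDA then optimizes several objectives, including multiple performance measures of a KNN classifier and the number of genes, to select from the candidates the feature gene subset with the strongest overall discriminative power.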

3.
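PLS and SVM applied to gene expression data classification   Total citations: 4, self-citations: 3, citations by others: 4
An important application of gene expression data is the classification of disease samples, for example identifying tumor types. The rapid development of gene chips has made it possible to measure the expression of thousands of genes simultaneously. This measurement capacity yields, in a very short time, data matrices in which the number of variables p (genes) is far greater than the number of samples N. Standard statistical classification methods, when N…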

4.
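Xie Fen, 《福建电脑》 (Fujian Computer), 2010, 26(3): 2-3
Proposes a new method for tumor gene classification in which a genetic algorithm (GA) optimizes a support vector machine (SVM) classification decision tree. Because gene expression data have few samples and high dimensionality, the method uses the SVM classification margin as the GA fitness function. The genetic algorithm automatically selects an optimal or near-optimal classification decision at each node of the decision tree, thereby optimizing the tree. Experimental results show that, with limited samples, the method achieves higher classification accuracy than a conventional single decision tree algorithm.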

5.
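A clustering analysis method for gene expression data based on representative entropy   Total citations: 1, self-citations: 0, citations by others: 1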
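Gene expression data have few samples and high dimensionality, and prior knowledge about sample subtypes is often lacking. Drawing on the strengths of self-organizing feature maps, this paper proposes a two-way clustering algorithm based on representative entropy. The algorithm first clusters genes with a self-organizing map (SOM) network and selects feature genes according to a fluctuation coefficient. It then uses the representative entropy to judge the quality of the gene clustering and to determine the number of neurons in the network. Finally, the fuzzy c-means (FCM) clustering algorithm assigns samples to subtypes using the selected feature genes. Applied to two public gene expression datasets, the algorithm reduced the feature dimensionality while achieving high clustering accuracy.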
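
The final FCM step can be illustrated with one iteration of the standard fuzzy c-means updates. This is a minimal sketch, not the paper's implementation: the function names, the fuzzifier default `m=2.0`, and the Euclidean distance are assumptions, and the SOM and entropy stages are not shown.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[i][j] is the degree to which
    point i belongs to cluster j. Each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a centre: give that centre full membership.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM centre update: membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers
```

In practice the two functions are alternated until the centres stop moving; each sample is then assigned to the cluster with its largest membership.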

6.
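In recent years, tumor classification based on gene expression profiles has attracted wide attention and has greatly facilitated the accurate diagnosis and subtyping of cancer. However, gene expression profile data have few samples, high dimensionality, heavy noise, and high redundancy, which makes it very difficult both to mine the biomedical knowledge they contain in depth and to select informative tumor genes. This paper proposes an informative gene selection method based on iterative Lasso that yields a small subset of informative genes with strong classification ability. The method has two layers: the first uses a signal-to-noise ratio measure to assess gene importance and filter out irrelevant genes; the second uses an improved Lasso method to remove redundant genes. Experiments on five public tumor gene expression profile datasets verify the feasibility and effectiveness of the method, which achieves better classification performance than existing informative gene selection methods.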
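
The first (filter) layer can be sketched as follows. The abstract does not give its exact formula, so this assumes the widely used two-class signal-to-noise score (difference of class means divided by the sum of class standard deviations); the function names and the `expr[gene][sample]` layout are illustrative, and the Lasso layer is not shown.

```python
from statistics import mean, stdev

def snr_scores(expr, labels):
    """expr[g][s] is the expression of gene g in sample s; labels[s] is 0 or 1.
    Each class needs at least two samples so that stdev is defined."""
    scores = []
    for row in expr:
        pos = [v for v, y in zip(row, labels) if y == 1]
        neg = [v for v, y in zip(row, labels) if y == 0]
        denom = stdev(pos) + stdev(neg)
        # A gene that is constant within both classes gets score 0 in this sketch.
        scores.append(0.0 if denom == 0 else (mean(pos) - mean(neg)) / denom)
    return scores

def top_genes(expr, labels, k):
    """Indices of the k genes with the largest absolute score."""
    scores = snr_scores(expr, labels)
    order = sorted(range(len(expr)), key=lambda g: abs(scores[g]), reverse=True)
    return order[:k]
```

Ranking by absolute value keeps genes that are strongly over-expressed in either class; only the surviving genes would be passed on to the redundancy-removal layer.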

7.
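Uses ICA augmented with classification guidance information (Guide Independent Components Analysis, G-ICA) to extract, from known samples, the expression patterns hidden in the gene expression data that are closely related to tissue class, and classifies unknown tissue samples according to these patterns. Experimental results show that the method improves the classification of tissue samples, has low computational complexity, converges quickly, and is highly stable.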

8.
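To discriminate tumor genes from normal genes in a gene expression data matrix, the article proposes a tumor gene identification method based on the support vector machine (SVM). After feature selection on the genes, gene samples retaining only the optimal features are classified using the SVM approach. Comparison with results from other methods shows that, without reducing classification accuracy, the scheme effectively avoids the "over-learning" (overfitting) problem caused by a feature space dimension far larger than the sample space dimension, avoids large time and space overheads, and is highly practical.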

9.
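Research on a tumor classification and detection algorithm based on principal component analysis   Total citations: 1, self-citations: 0, citations by others: 1
Tumor diagnosis based on gene expression profiles is expected to become a fast and effective diagnostic method in clinical medicine. However, gene expression data have very high dimensionality, very small sample sizes, and heavy noise, which makes extracting tumor-related informative genes a challenging task. After analyzing the methods currently used for tumor classification and detection, this paper proposes a hybrid feature extraction method that combines gene feature scoring with principal component analysis. Experiments show that the method extracts classification feature information effectively and, while maintaining high tumor recognition accuracy, greatly reduces the dimensionality of the gene expression data, considerably improving classifier performance. Two tumor-related gene expression datasets were used to validate the hybrid feature extraction method. Classification experiments with a support vector machine show that the proposed hybrid method not only achieves high cross-validation recognition accuracy but also produces classification results that can be visualized. For the colon cancer tissue sample set, the cross-validation recognition accuracy reached 95.16%; for the acute leukemia tissue sample set, it reached 100%.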

10.
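The key to analyzing tumor gene expression profiles so as to effectively distinguish normal samples from tumor samples is to accurately identify the smallest set of feature genes that determine the sample class and to use a well-performing classifier for classification and prediction. To this end, a revised feature score criterion (RFSC) is used to remove genes irrelevant to classification; the pairwise redundancy method is improved, yielding a strong correlation tree method for removing redundant genes; and the rough support vector machine (RSVM) is improved, yielding an approximately equivalent rough support vector machine (AE-RSVM) for classifying the sample set. Tests on tumor sample sets demonstrate the feasibility and effectiveness of the proposed methods.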

11.
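Gene sets selected from DNA microarray data by commonly used ranking methods often contain highly correlated genes, and evaluating genes individually does not truly reflect the classification ability of the resulting feature set. In addition, the number of genes far exceeds the number of samples, which is a further challenge for disease diagnosis. This paper therefore proposes a feature extraction method for DNA microarray data for tissue classification. The method applies K-means to cluster the genes and obtain the center of each subclass of the DNA microarray data, uses a ranking method to remove subclasses irrelevant to classification, then uses ICA to extract features from the remaining subclasses, and builds a classifier with SVMs to classify tissues. Experiments on real biological data show that, by extracting a kind of composite gene, the method can evaluate the classification ability of genes comprehensively, reduce the number of features, and improve classification accuracy.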

12.
Microarrays can detect the expression levels of thousands of genes simultaneously, so gene expression data from DNA microarrays are characterized by many measured variables (genes) on only a few samples. One important application of gene expression data is sample classification. In statistical terms, the very large number of predictors (variables) compared with the small number of samples makes most classical “class prediction” methods unusable. Generally, this problem can be avoided by selecting only the relevant features or by extracting new features that contain the maximal information about the class label from the original data. In this paper, a new method for gene selection based on independent variable group analysis (IVGA) is proposed. In this method, we first use the t-statistic to select a subset of genes from the original data. We then use IVGA to select the key genes for tumor classification from that subset. Finally, we use an SVM to classify tumors based on the key genes selected by IVGA. To validate its efficiency, the proposed method is applied to three different DNA microarray data sets. The prediction results show that our method is efficient and feasible.

13.
A microarray machine offers the capacity to measure the expression levels of thousands of genes simultaneously. It is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for cancer classification. However, the urgent problems in the use of gene expression data are the availability of a huge number of genes relative to the small number of available samples, and the fact that many of the genes are not relevant to the classification. It has been shown that selecting a small subset of genes can lead to improved accuracy in the classification. Hence, this paper proposes a solution to the problems by using a multiobjective strategy in a genetic algorithm. This approach was tried on two benchmark gene expression data sets. It obtained encouraging results on those data sets as compared with an approach that used a single-objective strategy in a genetic algorithm. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
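
What a multiobjective strategy changes, compared with a single fitness value, is how candidate gene subsets are compared. The sketch below shows Pareto dominance under the assumption that the two objectives are classification accuracy (higher is better) and subset size (smaller is better); the abstract does not name its objectives, and the function names are illustrative.

```python
def dominates(a, b):
    """a and b are (accuracy, n_genes) pairs. a dominates b if it is no worse
    on both objectives and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For example, `pareto_front([(0.90, 50), (0.95, 80), (0.90, 60), (0.85, 10), (0.80, 20)])` returns `[(0.90, 50), (0.95, 80), (0.85, 10)]`: the other two candidates are each beaten on one objective without gaining on the other. A multiobjective genetic algorithm would typically favor such non-dominated subsets when choosing parents.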

14.
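Feature gene selection for tumor classification based on support vector machines   Total citations: 19, self-citations: 0, citations by others: 19
The key to building an effective tumor classification model from gene expression profiles is to accurately identify a set of feature genes that determine the sample class. To address this, the paper analyzes the characteristics of tumor gene expression profiles and studies feature gene selection for tumor classification. First, a new class separability criterion is proposed to filter out genes irrelevant to classification, and a support vector machine is used as the classifier to test the classification performance of the feature genes. Then, pairwise redundancy analysis and a sensitivity analysis based on the SVM classification model are used to remove redundant genes. Experiments on feature gene selection for acute leukemia subtype classification demonstrate the feasibility and effectiveness of these methods.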

15.
To select a small subset of informative genes from gene expression data for cancer classification, many researchers have recently analyzed gene expression data using various computational intelligence methods. However, because the number of samples is small compared with the huge number of genes (high dimensionality), and because of irrelevant and noisy genes, many of the computational methods face difficulties in selecting such a small subset. Therefore, we propose an enhancement of binary particle swarm optimization to select the small subset of informative genes that is relevant for classifying cancer samples more accurately. In this method, three approaches have been introduced to increase the probability of the bits in a particle’s position being zero. By performing experiments on two gene expression data sets, we have found that the performance of the proposed method is superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also produces lower running times compared with BPSO.

16.
The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways: we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and we reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to make maximal use of the abundant microarray data that is being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that is applicable to integrated microarrays and to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. First, we performed a direct integration of individual microarrays by transforming each expression value into a rank value within a sample, and identified informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we built a classifier, a parameter-free ensemble method, using only the pre-selected informative genes. By using our classifier that was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.
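
The first stage can be sketched as below, under stated assumptions. The abstract does not define its score precisely, so this reading takes "swaps" to mean swaps of adjacent elements and treats either class order as perfectly split; the function names are illustrative and ties between equal values are broken by position.

```python
def rank_within_sample(sample):
    """Replace each gene's expression value by its rank (1 = lowest) within
    one sample, which makes values from different arrays comparable."""
    order = sorted(range(len(sample)), key=lambda g: sample[g])
    ranks = [0] * len(sample)
    for r, g in enumerate(order, start=1):
        ranks[g] = r
    return ranks

def swaps_to_split(gene_ranks, labels):
    """For one gene: order the samples' class labels (0 or 1) by the gene's
    rank, then count the minimum number of adjacent swaps that would put all
    samples of one class before all samples of the other."""
    seq = [y for _, y in sorted(zip(gene_ranks, labels))]

    def inversions(first):
        # Pairs where a non-`first` label precedes a `first` label.
        count = others = 0
        for y in seq:
            if y == first:
                count += others
            else:
                others += 1
        return count

    return min(inversions(0), inversions(1))
```

A gene whose ranks separate the two classes cleanly needs zero swaps, so under this reading genes with the fewest swaps are the most informative.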

17.
In a DNA microarray dataset, gene expression data often has a huge number of features (referred to as genes) versus a small number of samples. With the development of DNA microarray technology, the number of dimensions increases even faster than before, which can lead to the curse of dimensionality. To get good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, the paper enhances SVM-RFE for gene selection by incorporating feature clustering, in a method called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs a rough gene selection and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which the genes have similar expression profiles. Then, a representative gene is found to represent each gene group, which yields a representative gene set. Finally, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE can achieve better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.
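
The rough selection stage (group similar genes, keep one representative per group) might look like the sketch below. The abstract names neither the clustering algorithm nor the rule for picking a representative, so the greedy "leader" clustering on Pearson correlation, the `threshold` value, and the choice of the leader as representative are all assumptions. The SVM-RFE ranking stage is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def leader_clusters(expr, threshold=0.9):
    """expr[g] is the expression profile of gene g across samples.
    Each gene joins the first group whose leader it correlates with at or
    above `threshold`; otherwise it starts a new group as its leader."""
    groups = []  # groups[i][0] is the leader of group i
    for g, profile in enumerate(expr):
        for group in groups:
            if pearson(profile, expr[group[0]]) >= threshold:
                group.append(g)
                break
        else:
            groups.append([g])
    return groups

def representatives(groups):
    """Use each group's leader as its representative gene."""
    return [group[0] for group in groups]
```

Only the representative genes would then be ranked, which is where the reduction in computational cost comes from: the recursive elimination runs over groups rather than over every gene.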

