Similar literature
 19 similar articles found
1.
段旭 《计算机工程与设计》2011,32(11):3836-3839
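A microarray data set contains thousands of genes but relatively few samples, and only a small fraction of those genes contribute to tumor classification. The central problem in tumor classification is therefore to identify and select the genes that contribute most. To select microarray genes effectively, a marginal distribution model (MDM) is proposed to describe microarray data. The model distinguishes not only whether a gene is differentially expressed between the two sample classes but also in which class it is expressed, so the selected genes are more biologically meaningful. Experiments on simulated data and real microarray data sets show that the method selects microarray genes effectively.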

2.
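Microarray data in gene expression analysis are high-dimensional and highly redundant, which makes classifying gene expression data very difficult. The least squares support vector machine (LS-SVM) algorithm from machine learning is computationally efficient and thus offers an effective route for data mining. For two typical cancer microarray data sets (colon cancer and leukemia), the data are normalized and their correlation coefficient matrix is computed; principal component analysis is then used for dimensionality reduction, yielding informative gene sets (10 genes each) for feature selection and classification; finally, an LS-SVM classifier classifies the informative gene sets. Experiments show leave-one-out cross-validation accuracies of 97.5% and 100% on the two cancer data sets, a higher test accuracy than the other classifiers compared, which provides a reliable diagnostic basis for further clinical application.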
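
The leave-one-out evaluation mentioned in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' pipeline: a nearest-centroid rule stands in for the LS-SVM classifier, and the names `nearest_centroid_predict` and `leave_one_out_accuracy` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List


def nearest_centroid_predict(train_x: List[List[float]],
                             train_y: List[Hashable],
                             query: List[float]) -> Hashable:
    """Stand-in classifier (NOT LS-SVM): return the class whose mean vector is closest."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    def squared_distance(label: Hashable) -> float:
        centroid = [s / counts[label] for s in sums[label]]
        return sum((c - q) ** 2 for c, q in zip(centroid, query))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(xs: List[List[float]],
                           ys: List[Hashable],
                           predict: Callable = nearest_centroid_predict) -> float:
    """Hold out each sample once, train on the rest, and report the hit rate."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)
```

Any classifier with the same `(train_x, train_y, query)` signature can be passed as `predict`. With only a few dozen samples, as is typical for microarray data, leave-one-out uses almost all of the data for training in every round, which is presumably why it is a common choice in this setting.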

3.
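Research progress on cancer classification based on DNA microarray data   Total citations: 1, self-citations: 1, citations by others: 0
Using DNA microarray data to diagnose and subtype cancer has gradually become one of the research hotspots in bioinformatics. The paper first outlines the current state and trends of research on cancer classification based on microarray data. It then briefly introduces the basic steps of a microarray experiment, the structure and characteristics of microarray data, and the basic workflow for cancer classification. Next, it reviews and compares in detail the research results of the past 10 years, focusing on data preprocessing, feature gene selection, classifier design, and classification performance evaluation. Finally, it summarizes the problems that remain open in the field and gives an outlook on possible future research directions.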

4.
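Microarray gene expression data analysis method based on support vector machines   Total citations: 5, self-citations: 0, citations by others: 5
DNA microarray technology makes it possible to observe the expression levels of thousands of genes simultaneously, and the analysis of such data has become a focus of bioinformatics research. To address the high dimensionality, small sample size, and nonlinearity of microarray gene expression data, a classification and recognition method for gene expression data based on support vector machines is designed. The method uses the signal-to-noise ratio for gene feature extraction and tests performance with different support vector machine kernel functions. Experiments on several typical data sets show good recognition results.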
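
Signal-to-noise gene ranking, as named in this abstract, might look like the sketch below. The abstract does not give the formula; the definition used here, (mean_a - mean_b) / (std_a + std_b), is a common one in the microarray literature and is an assumption, as are the function names and the dict-based data layout.

```python
import math
from statistics import mean, stdev
from typing import Dict, Hashable, List, Sequence


def signal_to_noise(values_a: Sequence[float], values_b: Sequence[float]) -> float:
    """Assumed definition: difference of class means over the sum of class std devs.

    Each class needs at least two values so that the sample std dev is defined.
    """
    diff = mean(values_a) - mean(values_b)
    noise = stdev(values_a) + stdev(values_b)
    if noise == 0:
        # Both classes are constant: no noise, so the score is 0 or signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / noise


def rank_genes_by_snr(expr: Dict[str, List[float]],
                      labels: List[Hashable],
                      positive: Hashable,
                      top_k: int) -> List[str]:
    """Return the top_k gene names ordered by decreasing |signal-to-noise|.

    expr maps a gene name to its expression values, one per sample, in the
    same order as labels.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == positive]
        b = [v for v, y in zip(values, labels) if y != positive]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]
```

The selected gene names would then be used to slice the expression matrix before training the support vector machine.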

5.
杨昆  李建中  徐德昌  戴国骏 《软件学报》2010,21(9):2148-2160
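An integrative analysis of different data sets that address the same research question is proposed to identify genes with unstable expression. The problem is formalized as a nonlinear integer programming problem, and three heuristic algorithms are proposed to solve it; a statistic is further designed to measure how unstable a gene's expression is. The method is applied to two real data sets. The results show that the identified unstable genes are expressed inconsistently across the two data sets, that using unstably expressed genes improves the screening of differentially expressed genes, and that removing them effectively improves microarray data classification. The experimental results indicate that the proposed method is effective and that unstably expressed genes provide valuable information for microarray data analysis.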

6.
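Feature selection and classification are important tasks in data mining. Feature selection picks out the features that most affect the result, making subsequent machine learning simpler and more effective. Classification maps each data item in a database to one of a set of given classes, a technique now widely used commercially. Building on the background and significance of feature selection and classification, this paper applies them to gene microarray data.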

7.
冯裕祺 《信息与电脑》2023,(6):65-69+73
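As technology advances, large amounts of data can be collected through sensors such as microarray chips. In cancer detection, machine learning methods can be used to analyze cancer microarray data, but they perform poorly in ultra-high-dimensional settings. The article proposes the Ball-Abess method, which combines the Ball correlation coefficient with the Abess method, to address the ultra-high dimensionality of cancer microarray data. Compared with other classification methods, this method yields better results.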

8.
项婧  任劼 《计算机工程与设计》2006,27(15):2905-2908
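In recent years, a growing number of gene expression technologies have called for in-depth study of cancer cells. Machine learning algorithms are widely used in many fields but are rarely applied in bioinformatics. This paper systematically studies the principles and algorithms of decision tree generation and pruning, along with other decision-tree-related issues. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) data sets provided by CAMDA2000 (Critical Assessment of Microarray Data Analysis), a decision tree classifier based on the ID3 algorithm is designed and implemented, and the tree is simplified with a post-pruning algorithm. Finally, experiments verify the effectiveness of the algorithm: applying the decision tree classifier to discriminant analysis of leukemia microarray data gives high classification accuracy, which indicates that decision tree algorithms have broad application prospects in medical data mining.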

9.
陈涛  洪增林  邓方安 《计算机科学》2014,41(10):291-294,316
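DNA microarray technology can measure the activity of thousands of genes in a cell simultaneously and is widely used in the clinical diagnosis of major genetic diseases. However, microarray data are typically high-dimensional with small sample sizes and contain many noisy and redundant genes. To further improve microarray data classification, a hybrid feature gene selection algorithm is proposed. First, the ReliefF algorithm removes a large number of irrelevant genes to obtain a candidate subset of feature genes. Then a neighborhood rough set model optimized by differential evolution performs feature gene selection. Finally, a support vector machine is used for classification to verify the algorithm's effectiveness. Simulation results show that the algorithm achieves higher classification accuracy with as few feature genes as possible, which both strengthens generalization and improves time efficiency, and it offers an important reference for the clinical diagnosis of disease-causing genes.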

10.
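Classification is an important topic in pattern recognition and data mining. Bayesian classification is a classic classification algorithm, and LVQ neural networks have also been widely studied and applied to classification problems. In practical applications, choosing an efficient and stable classification algorithm is very important. Using three data sets from the UCI Machine Learning Repository, the article experimentally compares Bayesian classification and LVQ neural networks in terms of classification accuracy, stability, and classification efficiency. The results show that Bayesian classification outperforms the LVQ neural network on all three criteria, which provides a reference for choosing between the two algorithms.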

11.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression data which will maximize the classification accuracy. A model for gene selection and classification has been developed by using a filter approach, and an improved hybrid of the genetic algorithm and a support vector machine classifier. We show that the classification accuracy of the proposed model is useful for the cancer classification of one widely used gene expression benchmark data set.

12.
The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways; we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to maximally use the abundant microarray data that is being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that can be applicable to integrated microarrays as well as to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. Firstly, we performed a direct integration of individual microarrays by transforming an expression value into a rank value within a sample and identified informative genes by calculating the number of swaps to reach a perfectly split sequence. Secondly, we built a classifier which is a parameter-free ensemble method using only the pre-selected informative genes. By using our classifier that was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.

13.
The application of microarray data for cancer classification has recently gained in popularity. The main problem that needs to be addressed is the selection of a small subset of genes from the thousands of genes in the data that contribute to a disease. This selection process is difficult due to the availability of a small number of samples compared with the huge number of genes, many irrelevant genes, and noisy genes. Therefore, this article proposes an improved binary particle swarm optimization to select a near-optimal (small) subset of informative genes that is relevant for the cancer classification. Experimental results show that the performance of the proposed method is superior to the standard version of particle swarm optimization (PSO) and other previous related work in terms of classification accuracy and the number of selected genes.
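
For orientation, one update step of the standard binary PSO that this abstract uses as its baseline can be sketched as below. This is the textbook sigmoid-based rule, not the improved variant the article proposes (its details are not given here); the function name `bpso_step` and the parameter defaults are illustrative assumptions. Each bit marks whether a gene is selected.

```python
import math
import random
from typing import List, Tuple


def bpso_step(position: List[int],
              velocity: List[float],
              personal_best: List[int],
              global_best: List[int],
              rng: random.Random,
              w: float = 0.7,
              c1: float = 1.5,
              c2: float = 1.5,
              v_max: float = 4.0) -> Tuple[List[int], List[float]]:
    """Advance one particle by one iteration of standard binary PSO."""
    new_position: List[int] = []
    new_velocity: List[float] = []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Velocity update: inertia plus pulls toward the personal and global bests.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        # Clamp so the sigmoid never saturates completely.
        v = max(-v_max, min(v_max, v))
        # The sigmoid of the velocity is the probability that the bit becomes 1.
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity
```

In a full gene-selection loop, each particle's fitness would typically combine classifier accuracy on the selected genes with a penalty on the number of selected genes; that part is omitted here because the abstract does not describe it.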

14.
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem in these data is that the number of genes greatly exceeds the number of tissue samples. These data also have noisy genes. It has been shown in literature reviews that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods has been proposed. This approach is assessed and evaluated on two well-known microarray data sets, showing competitive results. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

15.
Gene expression profiles are composed of thousands of genes at the same time, representing the complex relationships between them. One of the well-known constraints specifically related to microarray data is the large number of genes in comparison with the small number of available experiments or cases. In this context, the ability to design methods capable of overcoming current limitations of state-of-the-art algorithms is crucial to the development of successful applications. This paper presents gene-CBR, a hybrid model that can perform cancer classification based on microarray data. The system employs a case-based reasoning model that incorporates a set of fuzzy prototypes, a growing cell structure network and a set of rules to provide an accurate diagnosis. The hybrid model has been implemented and tested with microarray data belonging to bone marrow cases from forty-three adult patients with cancer plus a group of six cases corresponding to healthy persons.

16.
Cancer classification, through gene expression data analysis, has produced remarkable results, and has indicated that gene expression assays could significantly aid in the development of efficient cancer diagnosis and classification platforms. However, cancer classification, based on DNA array data, remains a difficult problem. The main challenge is the overwhelming number of genes relative to the number of training samples, which implies that there are a large number of irrelevant genes to be dealt with. Another challenge is from the presence of noise inherent in the data set. It makes accurate classification of data more difficult when the sample size is small. We apply genetic algorithms (GAs) with an initial solution provided by t statistics, called t‐GA, for selecting a group of relevant genes from cancer microarray data. The decision‐tree‐based cancer classifier is built on the basis of these selected genes. The performance of this approach is evaluated by comparing it to other gene selection methods using publicly available gene expression data sets. Experimental results indicate that t‐GA has the best performance among the different gene selection methods. The Z‐score figure also shows that some genes are consistently preferentially chosen by t‐GA in each data set.
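
The idea of seeding a genetic algorithm with t statistics can be illustrated with the sketch below. The abstract does not say which t statistic is used or how the initial solution is encoded, so this assumes Welch's two-sample t statistic and a bit-string chromosome that switches on the k genes with the largest |t|; `welch_t` and `seed_chromosome` are hypothetical names.

```python
import math
from statistics import mean, variance
from typing import Hashable, List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's two-sample t statistic (each group needs at least two values)."""
    diff = mean(a) - mean(b)
    std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if std_err == 0:
        # Both groups are constant: report 0 or signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / std_err


def seed_chromosome(gene_rows: List[List[float]],
                    labels: List[Hashable],
                    positive: Hashable,
                    k: int) -> List[int]:
    """Build an initial GA chromosome: bit i is 1 if gene i is among the top k by |t|.

    gene_rows[i] holds gene i's expression values, one per sample, in the
    same order as labels.
    """
    scores = []
    for row in gene_rows:
        a = [v for v, y in zip(row, labels) if y == positive]
        b = [v for v, y in zip(row, labels) if y != positive]
        scores.append(abs(welch_t(a, b)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    chosen = set(top)
    return [1 if i in chosen else 0 for i in range(len(gene_rows))]
```

Starting the search from a chromosome like this, rather than from a random bit string, gives the genetic algorithm a subset that already separates the classes gene by gene; crossover and mutation can then explore combinations that a univariate filter alone would miss.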

17.
A microarray machine offers the capacity to measure the expression levels of thousands of genes simultaneously. It is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for cancer classification. However, the urgent problems in the use of gene expression data are the availability of a huge number of genes relative to the small number of available samples, and the fact that many of the genes are not relevant to the classification. It has been shown that selecting a small subset of genes can lead to improved accuracy in the classification. Hence, this paper proposes a solution to the problems by using a multiobjective strategy in a genetic algorithm. This approach was tried on two benchmark gene expression data sets. It obtained encouraging results on those data sets as compared with an approach that used a single-objective strategy in a genetic algorithm. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008

18.
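Classifying tissue samples based on gene expression profiles is a very important research topic in disease diagnosis. In gene expression data, the number of genes (several thousand) is usually large relative to the number of samples (a few dozen); that is, the dimensionality of the data is high compared with the number of data points (the undersampling problem). Such high dimensionality (number of features or genes) poses a great challenge for classification. A method combining uncorrelated linear discriminant analysis (ULDA) with the support vector machine (SVM) classification algorithm is proposed to classify colon cancer tissue samples. In a comparative study with other methods, classification performance improved, and the results demonstrate the feasibility and effectiveness of the method.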

19.
Molecular level diagnostics based on microarray technologies can offer the methodology of precise, objective, and systematic cancer classification. Genome-wide expression patterns generally consist of thousands of genes. It is desirable to extract some significant genes for accurate diagnosis of cancer because not all genes are associated with a cancer. In this paper, we have used representative gene vectors that are highly discriminatory for cancer classes and extracted multiple significant gene subsets based on those representative vectors respectively. Also, an ensemble of neural networks learned from the multiple significant gene subsets is proposed to classify a sample into one of several cancer classes. The performance of the proposed method is systematically evaluated using three different cancer types: Leukemia, colon, and B-cell lymphoma.
