Similar literature
 19 similar articles found
1.
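Microarray technology is a principal tool for functional genomics research in the post-genome era. Because it relies on efficient parallel hybridization, each experiment yields a large amount of rich data, which makes analyzing the results a challenging and important task. Cluster analysis is the most widely used class of methods in microarray data analysis. Clustering the large volumes of data produced by microarray experiments yields much useful information, and it has been applied successfully across many areas of gene function research and biomedical research. This paper introduces cluster analysis methods for gene microarray data and their important applications.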

2.
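陈涛, 洪增林, 邓方安. 《计算机科学》 (Computer Science), 2014, 41(10): 291-294, 316
DNA microarray technology can measure the activity of thousands of genes in a cell simultaneously and is widely used in the clinical diagnosis of major genetic diseases. However, microarray data are typically high-dimensional with small sample sizes and contain many noisy and redundant genes. To further improve classification performance on microarray data, a hybrid feature gene selection algorithm is proposed. First, the ReliefF algorithm discards a large number of irrelevant genes to obtain a candidate subset of feature genes; then a neighborhood rough set model optimized by a differential evolution algorithm performs the feature gene selection; finally, a support vector machine is used for classification to verify the effectiveness of the algorithm. Simulation results show that the algorithm achieves higher classification accuracy with as few feature genes as possible, which both strengthens generalization and improves time efficiency, and that it offers a useful reference for the clinical diagnosis of disease-causing genes.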

3.
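Gene selection for cancer classification on microarray data   Cited by: 1 (self-citations: 0; by others: 1)
Microarray data have been widely and successfully applied to cancer classification in biomedical research. A typical microarray data set contains a large number of genes (usually thousands to tens of thousands, sometimes hundreds of thousands) and a relatively small number of samples (often fewer than one hundred). Among these thousands of genes, only a small fraction contribute to cancer classification. The most important problem in cancer classification is therefore to identify the genes that contribute most to it. This identification process is called gene selection. Gene selection has been studied extensively in statistical pattern recognition, machine learning, and data mining. This paper introduces the background and basic concepts of the gene selection problem; comprehensively reviews the solutions proposed in statistics, machine learning, and data mining; demonstrates experimentally the performance of several typical algorithms on microarray data; and points out open problems and future research directions.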

4.
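于化龙, 高尚, 赵靖, 秦斌. 《计算机科学》 (Computer Science), 2012, 39(5): 190-194
In recent years, using DNA microarray technology to diagnose diseases, especially cancer, has become one of the research hotspots in bioinformatics. Compared with other data carriers, microarray data usually have some distinctive characteristics. To address the imbalanced sample distribution of microarray data, an oversampling technique based on probability distributions is proposed. It generates plausible pseudo-samples for the minority class so that the class sizes become balanced, after which a random forest classifier is used for classification. The effectiveness and feasibility of the method have been verified on two standard microarray data sets. Experimental results show that the method achieves better classification performance than traditional methods.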
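
The abstract does not say which probability distribution the oversampling uses. As a minimal sketch only, assuming an independent normal distribution fitted to each gene of the minority class (the function name `gaussian_oversample` and its parameters are hypothetical, not taken from the paper), the balancing step could look like this:

```python
import random
from statistics import mean, stdev

def gaussian_oversample(minority, target_size, seed=0):
    """Pad the minority class with pseudo-samples.

    Assumption (not from the paper): each gene is modelled by its own
    normal distribution fitted to the existing minority samples.
    minority: list of samples, each a list of gene values (>= 2 samples).
    """
    rng = random.Random(seed)
    n_genes = len(minority[0])
    mus = [mean([s[g] for s in minority]) for g in range(n_genes)]
    sigmas = [stdev([s[g] for s in minority]) for g in range(n_genes)]
    n_new = max(0, target_size - len(minority))
    synthetic = [
        [rng.gauss(mus[g], sigmas[g]) for g in range(n_genes)]
        for _ in range(n_new)
    ]
    # Original samples come first, pseudo-samples are appended.
    return minority + synthetic
```

Calling `gaussian_oversample(minority, len(majority))` would bring the minority class up to the size of the majority class before a classifier such as a random forest is trained.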

5.
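杨昆, 李建中, 徐德昌, 戴国骏. 《软件学报》 (Journal of Software), 2010, 21(9): 2148-2160
This paper proposes an integrated analysis of different data sets addressing the same research question to identify genes with unstable expression. The problem is formalized as a nonlinear integer programming problem, and three heuristic algorithms are proposed to solve it; in addition, a statistic is designed to measure the degree to which a gene's expression is unstable. The method is applied to two real data sets. The experiments show that the identified unstable genes are expressed inconsistently across the two data sets, that using unstably expressed genes improves the screening of differentially expressed genes, and that removing them effectively improves microarray data classification. The results indicate that the proposed method is effective and that unstably expressed genes provide valuable information for microarray data analysis.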

6.
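Analysis method for microarray gene expression data based on support vector machines   Cited by: 5 (self-citations: 0; by others: 5)
DNA microarray technology makes it possible to observe the expression levels of thousands of genes simultaneously, and the analysis of the resulting data has become a focus of bioinformatics research. Given that microarray gene expression data are high-dimensional, small-sample, and nonlinear, a support vector machine based method for classifying gene expression data is designed. The method uses the signal-to-noise ratio for gene feature extraction and tests performance with different SVM kernel functions. Experiments on several typical data sets show good recognition results.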
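
The signal-to-noise criterion is not spelled out in the abstract. A minimal sketch, assuming the commonly used two-class form (mean_a - mean_b) / (sd_a + sd_b); the function names and the handling of constant genes are illustrative assumptions, not the paper's definitions:

```python
from statistics import mean, stdev

def snr_score(class_a, class_b):
    """Signal-to-noise ratio of one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        return 0.0  # assumption: a gene with no spread is scored as uninformative
    return (mean(class_a) - mean(class_b)) / spread

def rank_genes(expr_a, expr_b, top_k):
    """Return the indices of the top_k genes by absolute SNR.

    expr_a, expr_b: lists of samples per class, each sample a list of
    gene values; every class needs at least two samples.
    """
    n_genes = len(expr_a[0])
    scores = []
    for g in range(n_genes):
        a = [sample[g] for sample in expr_a]
        b = [sample[g] for sample in expr_b]
        scores.append((abs(snr_score(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]
```

The selected gene indices would then define the reduced feature vectors passed to an SVM with whichever kernel is being tested.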

7.
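周鹏. 《计算机工程与设计》 (Computer Engineering and Design), 2005, 26(11): 2966-2968, 2974
DNA microarray experiments make it possible to observe the expression levels of thousands of genes simultaneously, allowing life phenomena and their nature to be studied systematically and globally at the genome level. As a new machine learning method, the support vector machine has been widely studied in bioinformatics in recent years, and in many cases it has performed better than or comparably to other methods. This paper reviews applications of support vector machines to DNA microarrays.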

8.
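The gene sets selected from DNA microarray data by commonly used ranking methods often contain highly correlated genes, and evaluating genes one at a time cannot truly reflect the classification ability of the resulting feature set. In addition, the number of genes far exceeds the number of samples, which is another challenge for disease diagnosis. A feature extraction method for DNA microarray data is therefore proposed for tissue classification. The method uses K-means to cluster the genes and obtain the centre of each DNA microarray data subclass, uses a ranking method to remove subclasses irrelevant to classification, then uses ICA to extract features from the remaining subclasses and builds an SVM classifier to classify the tissues. Experiments on real biological data show that, by extracting a kind of composite gene, the method can evaluate the classification ability of genes comprehensively, reduce the number of features, and improve classification accuracy.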
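
The first step of the pipeline, clustering genes with K-means to obtain subclass centres, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `kmeans` is hypothetical, the first `k` profiles serve as initial centres to keep the run deterministic, and squared Euclidean distance is used.

```python
def kmeans(profiles, k, n_iter=50):
    """Cluster gene expression profiles; return (centres, labels).

    profiles: list of genes, each a list of expression values.
    Assumption: the first k profiles are used as initial centres.
    """
    centres = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centres[c])))
            for p in profiles
        ]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if not members:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
                continue
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return centres, labels
```

Each returned centre plays the role of one subclass centre; the later steps (removing irrelevant subclasses, ICA, and the SVM classifier) are outside the scope of this sketch.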

9.
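Automatic extraction of DNA microarray image information   Cited by: 6 (self-citations: 0; by others: 6)
Microarray image analysis is an important part of measuring the data of DNA microarray hybridization experiments in molecular biology. Its purpose is to reduce the large amount of pixel gray-level information in a microarray image to signal values for the microarray spots. This paper presents a method for automatically extracting information from DNA microarray images. The method requires no manual operation at all, which avoids the influence of operator subjectivity on the experiment and improves the efficiency and reproducibility of data extraction. Its key steps are image filtering, extraction of image gray-level information, determination of the microarray spacing, spot location, and calculation of spot signal values. Experimental results show that extracting DNA microarray image information with this algorithm is reproducible, efficient, and accurate.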

10.
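Cluster analysis is an important research topic in data mining. In many practical applications the data to be clustered are very high-dimensional; document data and gene microarray data, for example, can reach thousands of dimensions, and data are sparsely distributed in high-dimensional space. As a result, many classical clustering algorithms that work well on low-dimensional data often fail on high-dimensional data. To address this problem, this paper proposes a new genetic algorithm based method for clustering high-dimensional data. The method uses the global search capability of genetic algorithms to search the feature space for effective clustering feature subspaces. To examine the role of each feature dimension in subspace clustering, a fitness function based on the contribution of feature dimensions to subspace clustering is designed. Experiments on artificial and real data, together with a comparison against the k-means algorithm, demonstrate the feasibility and effectiveness of the method.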

11.
Cluster analysis of DNA microarray data is an important but difficult task in knowledge discovery processes. Many clustering methods are applied to the analysis of gene expression data, but none of them can fully address the challenges that this technology raises. For this reason, many applications have been developed for visually representing the results of clustering algorithms on DNA microarray data, usually providing dendrogram and heat map visualizations. Most of these applications focus only on the above visualizations and do not offer further visualization components to validate the clustering methods or to validate them against one another. This paper proposes using a visual analytics framework in cluster analysis of gene expression data. Additionally, it presents a new method for finding cluster boundaries based on properties of metric spaces. Our approach presents a set of visualization components able to interact with each other; namely, parallel coordinates, cluster boundary genes, 3D cluster surfaces and DNA microarray visualizations as heat maps. Experimental results have shown that our framework can be very useful in the process of more fully understanding DNA microarray data. The software has been implemented in Java, and the framework is publicly available at http://www.analiticavisual.com/jcastellanos/3DVisualCluster/3D-VisualCluster.

12.
Cluster analysis for gene expression data: a survey   Cited by: 16 (self-citations: 0; by others: 16)
DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods to assess the quality and reliability of clustering results. Finally, we conclude this paper and suggest the promising trends in this field.

13.
The emerging field of bioinformatics has recently created much interest in the computer science and engineering communities. With the wealth of sequence data in many public online databases and the huge amount of data generated from the Human Genome Project, computer analysis has become indispensable. This calls for novel algorithms and opens up new areas of applications for many pattern recognition techniques. In this article, we review two major avenues of research in bioinformatics, namely DNA sequence analysis and DNA microarray data analysis. In DNA sequence analysis, we focus on the topics of sequence comparison and gene recognition. For DNA microarray data analysis, we discuss key issues such as image analysis for gene expression data extraction, data pre-processing, clustering analysis for pattern discovery and gene expression time series data analysis. We describe current methods and show how computational techniques could be useful in these areas. It is our hope that this review article could demonstrate how the pattern recognition community could have an impact on the fascinating and challenging area of genomic research.

14.
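Advances in cluster analysis of gene expression data   Cited by: 4 (self-citations: 1; by others: 3)
The explosive growth of gene expression data creates an urgent need for automatic and effective data analysis tools. Cluster analysis has become a powerful tool for extracting biological information from gene expression data. To mine gene expression data better, many improved traditional clustering algorithms and new clustering algorithms have been proposed in recent years. This paper first briefly introduces how gene expression data are acquired and represented, and then systematically reviews the clustering algorithms applied to gene expression data analysis in recent years. According to the clustering objective, the algorithms are divided into gene-based clustering, sample-based clustering, and two-way clustering; for each category the biological meaning and difficulties are described, and the basic principles, advantages, and disadvantages of the individual algorithms are discussed in detail. Finally, current cluster analysis methods for gene expression data are summarized and future trends are outlined.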

15.
In a DNA microarray dataset, gene expression data often have a huge number of features (referred to as genes) but a small number of samples. With the development of DNA microarray technology, the number of dimensions increases even faster than before, which can lead to the curse of dimensionality. To get good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering; the result is called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs a rough gene selection and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which the genes have similar expression profiles. Then, a representative gene is found to represent each gene group, which yields a representative gene set. Finally, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE can achieve better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.

16.
Recent advances in microarray technology permit monitoring of the expression levels of a large set of genes across a number of time points simultaneously. Extracting knowledge from such a huge volume of microarray gene expression data requires computational analysis. Clustering is one of the important data mining tools for analyzing such microarray data to group similar genes into clusters. Researchers have proposed a number of clustering algorithms for this purpose. In this article, an attempt has been made to improve the performance of fuzzy clustering by combining it with a support vector machine (SVM) classifier. A recently proposed real-coded variable string length genetic algorithm based clustering technique and an iterated version of fuzzy C-means clustering have been utilized for this purpose. The performance of the proposed clustering scheme has been compared with that of some well-known existing clustering algorithms and their SVM boosted versions on one simulated and six real-life gene expression data sets. A statistical significance test based on analysis of variance (ANOVA), followed by a post hoc Tukey-Kramer multiple comparison test, has been conducted to establish the statistical significance of the superior performance of the proposed clustering scheme. Moreover, the biological significance of the clustering solutions has been established.

17.
The rapid advancement of DNA chip (microarray) technology has revolutionized genetic research in bioscience. However, the enormous amount of data produced from a microarray image makes automatic computer analysis indispensable. An important first step in analyzing a microarray image is the accurate determination of the DNA spots in the image. We report here a novel spot segmentation method for DNA microarray images. The algorithm makes use of adaptive thresholding and statistical intensity modeling to: (i) generate the grid structure automatically, where each subregion in the grid contains only one spot, and (ii) segment the spot, if any, within each subregion. The algorithm is fully automatic, robust, and can aid in the high throughput computer analysis of microarray data.
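
The abstract does not specify the thresholding rule. As a generic illustration of adaptive thresholding (not the paper's algorithm), the sketch below marks a pixel as foreground when it is brighter than the mean of its local window; the function name, the `window` radius, and the `offset` parameter are hypothetical.

```python
def adaptive_threshold(image, window=1, offset=0.0):
    """Return a boolean mask of pixels brighter than their local mean.

    image: 2D list of intensities. window: radius of the square
    neighbourhood (clipped at the image border). offset: extra margin
    a pixel must exceed the local mean by.
    """
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - window), min(rows, r + window + 1)
            c0, c1 = max(0, c - window), min(cols, c + window + 1)
            block = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            local_mean = sum(block) / len(block)
            mask[r][c] = image[r][c] > local_mean + offset
    return mask
```

Because the threshold follows the local background, a spot on a bright region and a spot on a dark region can both be picked out, which a single global threshold would not guarantee.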

18.
This paper proposes a new hierarchical clustering method using genetic algorithms for the analysis of gene expression data. This method is based on the mathematical proof of several results, showing its effectiveness with regard to other clustering methods. Genetic algorithms applied to cluster analysis have yielded good results on biological data and many studies have been carried out in this direction, although most of them focus on partitional clustering methods. Although a few studies attempt to use genetic algorithms for building hierarchical clusterings, they do not include constraints that reduce the complexity of the problem. These approaches therefore become intractable for large data sets. On the other hand, deterministic hierarchical clustering methods generally face the problem of convergence towards local optima due to their greedy strategy. The method introduced here is an alternative that solves some of the problems existing methods face. The results of the experiments have shown that our approach can be very effective in cluster analysis of DNA microarray data.

19.
DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on data, neither method can always dominate the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.
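
The paper's three structured multivariate models and their updating formulas are not given in the abstract. To show only the general shape of EM for a normal mixture, the sketch below fits a deliberately simplified stand-in: a two-component univariate mixture. The function names, the initialisation at the data extremes, and the variance floor `min_var` are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_normals(xs, n_iter=100, min_var=1e-6):
    """Fit a two-component univariate normal mixture by EM.

    Returns (weights, means, variances, resp), where resp[i] is the
    posterior probability that xs[i] belongs to component 1.
    """
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    # Assumption: start the two means at the smallest and largest value.
    w, mu, var = [0.5, 0.5], [min(xs), max(xs)], [grand_var, grand_var]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability of component 1 for every point.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], var[0])
            p1 = w[1] * normal_pdf(x, mu[1], var[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: closed-form updates of weights, means and variances.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            nk = sum(r)
            if nk == 0:
                continue  # leave an empty component unchanged
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var[k] = max(sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk, min_var)
    return w, mu, var, resp
```

In a differential expression setting, `xs` could hold one summary statistic per gene, and genes with a high `resp` value would be candidates for the "differentially expressed" component; the multivariate models in the paper replace the scalar means and variances with structured mean vectors and covariance matrices.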
