Similar Articles
1.
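Ma Yu, Chen Li, Fang Hehe. Microcomputer Development (微机发展), 2006, 16(2): 117-119
Microarray technology is the main tool for functional genomics research in the post-genomic era. Because it uses efficient parallel hybridization, each experiment yields a large volume of data, which makes analyzing the results a challenging and important task. Cluster analysis is the most widely used class of methods in microarray data analysis. Applying it to the large volumes of data produced by microarray experiments yields much useful information, and it has been applied successfully across many areas of gene function and biomedical research. This paper introduces cluster analysis methods for gene microarray data and their important applications.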

2.
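Microarray Gene Expression Data Analysis Based on Support Vector Machines   Total citations: 5, self-citations: 0, citations by others: 5
DNA microarray technology lets researchers observe the expression levels of thousands of genes simultaneously, and analyzing these data has become a focus of bioinformatics research. Because microarray gene expression data are high-dimensional, have few samples, and are nonlinear, a classification method for gene expression data based on support vector machines (SVMs) was designed. The method uses the signal-to-noise ratio to select gene features and tests performance with different SVM kernel functions. Experiments on several typical datasets show that it classifies well.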
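
The abstract does not give the exact signal-to-noise formula. The sketch below is minimal and assumes the common Golub-style definition SNR = (mu1 - mu0) / (sd1 + sd0), using population standard deviations. Genes are ranked by |SNR| and the top k are kept. The names `snr_scores` and `top_k_genes` are hypothetical, and the SVM classification stage is not shown.

```python
import statistics


def snr_scores(expr, labels):
    """Signal-to-noise ratio per gene (assumed Golub-style definition).

    expr:   list of genes, each a list of expression values (one per sample).
    labels: list of 0/1 class labels, one per sample.
    """
    scores = []
    for gene in expr:
        pos = [x for x, y in zip(gene, labels) if y == 1]
        neg = [x for x, y in zip(gene, labels) if y == 0]
        # population standard deviations; small epsilon guards a zero denominator
        spread = statistics.pstdev(pos) + statistics.pstdev(neg)
        scores.append((statistics.mean(pos) - statistics.mean(neg)) / (spread or 1e-12))
    return scores


def top_k_genes(expr, labels, k):
    """Indices of the k genes with the largest |SNR|, i.e. the selected features."""
    scores = snr_scores(expr, labels)
    return sorted(range(len(expr)), key=lambda g: abs(scores[g]), reverse=True)[:k]
```

The selected gene indices would then be used to subset the expression matrix before training an SVM with whichever kernel is under test.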

3.
A new classification method has been developed for isolating steam generator tube defects in nuclear power plants using Eddy Current Test (ECT) signals. The method uses Self-Organizing Maps (SOMs) with different data signatures to identify and classify these defects. A multiple inference system is proposed that evaluates different extracted characteristic SOMs to infer the defect type. Wavelet zero-crossing representation, linear predictive coding (LPC), and other basic signal representations, such as magnitude and phase, are used to construct characteristic vectors that combine one or more of these features. These vectors are evaluated for their ability to classify tube defects, and those with the best performance are used in the multiple inference system. The effectiveness of the method is demonstrated by applying the characteristic maps to ECT data from various cases of tube defects in pressurized water reactor plant steam generators. The developed algorithm enables real-time applications such as fast tube defect classification systems and visualization of ECT signal feature prototypes. These may speed up time-critical decision making during power plant maintenance outages.

4.
This paper addresses the problem of finding the intrinsic dimension of speech. A structured vector quantization lattice, the Self-Organizing Map (SOM), is used as a projection space for the data. The goal is to find a hypercubical SOM lattice on which sequences of projected speech feature vectors form continuous trajectories. The effect of varying the lattice dimension is investigated using feature vector sequences computed from the TIMIT database.
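
The abstract treats the SOM as a lattice onto which feature vectors are projected. The sketch below is minimal and assumes a rectangular 2-D lattice, a Gaussian neighborhood, and a learning rate and radius that both decay linearly. These are common textbook choices, not the paper's settings. `train_som`, `bmu`, and `project` are hypothetical names. A hypercubical lattice of higher dimension would only need more axes in `coords`.

```python
import math
import random


def bmu(weights, x):
    """Index of the best-matching unit: the node with the smallest squared Euclidean distance to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols rectangular SOM and return (lattice coords, weight vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                     # learning rate decays linearly toward 0
            sigma = max(sigma0 * (1 - frac), 0.5)     # neighborhood radius shrinks, floored at 0.5
            br, bc = coords[bmu(weights, x)]
            for i, (r, c) in enumerate(coords):
                d2 = (r - br) ** 2 + (c - bc) ** 2    # distance on the lattice, not in data space
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood function
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
            step += 1
    return coords, weights


def project(coords, weights, x):
    """Project a feature vector onto the lattice: return the grid coordinates of its BMU."""
    return coords[bmu(weights, x)]
```

Projecting each frame of a feature sequence with `project` gives a path of lattice coordinates. Whether consecutive points land on neighboring nodes is the "continuous trajectory" property the abstract describes.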

5.
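Gene Selection for Cancer Classification with Microarray Data   Total citations: 1, self-citations: 0, citations by others: 1
Microarray data have been widely and successfully used in biomedical research on cancer classification. A typical microarray dataset contains a large number of genes (usually thousands or tens of thousands, sometimes hundreds of thousands) and a relatively small number of samples (often fewer than one hundred). Of these thousands of genes, only a small fraction contribute to cancer classification. The most important problem in cancer classification is therefore identifying the genes that contribute most; this identification process is called gene selection. Gene selection has been studied extensively in statistical pattern recognition, machine learning, and data mining. This paper introduces the background and basic concepts of the gene selection problem. It comprehensively reviews approaches to the problem from statistics, machine learning, and data mining. It also presents experiments showing how several representative algorithms perform on microarray data, and it identifies open problems and future research directions.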

6.
The identification of overrepresented motifs in a collection of biological sequences remains a relevant and challenging problem in computational biology. Currently popular motif discovery methods are based on statistical learning theory. This paper explores a machine-learning approach to motif discovery based on a Self-Organizing Map (SOM) in which the output-layer neuron weight vectors are replaced by position weight matrices. The approach can characterise features present in a set of sequences and can therefore aid overrepresented motif discovery. It is demonstrated on both real and simulated biological sequence datasets.

7.
Stefano Monti, Pablo Tamayo, Jill Mesirov, Todd Golub. Machine Learning, 2003, 52(1-2): 91-118
In this paper we present a new methodology for class discovery and clustering validation tailored to the analysis of gene expression data. The method is best thought of as an analysis approach that guides and assists the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering. In conjunction with resampling techniques, it provides a way to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also represent the consensus over multiple runs of a clustering algorithm with random restarts (such as K-means, model-based Bayesian clustering, or SOM), so as to account for the algorithm's sensitivity to initial conditions. Finally, it provides a visualization tool for inspecting cluster number, membership, and boundaries. We present the results of experiments on both simulated and real gene expression data aimed at evaluating how effectively the methodology discovers biologically meaningful clusters.
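
At the core of consensus clustering is a consensus matrix. Its entry (i, j) is the number of runs in which items i and j land in the same cluster, divided by the number of runs in which both were sampled. The sketch below is minimal and assumes each run has already clustered a random subsample and reports labels only for the items it saw. The resampling driver and the choice of clustering algorithm are left out, and `consensus_matrix` is a hypothetical name.

```python
def consensus_matrix(runs, n):
    """Consensus matrix over resampled clustering runs.

    runs: list of dicts, one per run, mapping each sampled item index (0..n-1)
          to the cluster label it received in that run.
    Entry (i, j) = (# runs where i and j share a cluster) / (# runs where both were sampled).
    """
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, li in labels.items():
            for j, lj in labels.items():
                sampled[i][j] += 1
                if li == lj:
                    together[i][j] += 1
    # pairs never sampled together get 0.0 by convention in this sketch
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Entries near 0 or 1 indicate stable co-membership. Sorting items by cluster and displaying this matrix as a heat map gives the kind of visualization the abstract describes.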

8.
Self-Organizing Maps and Learning Vector Quantization for Feature Sequences   Total citations: 2, self-citations: 0, citations by others: 2
In this work, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed for variable-length and warped feature sequences. The novelty is that each SOM node is associated with an entire feature vector sequence as its model, instead of a single feature vector. Dynamic time warping is used to obtain time-normalized distances between sequences of different lengths. Starting from random initialization, ordered feature sequence maps emerge, and LVQ can then fine-tune the prototype sequences for optimal class separation. The resulting SOM models, the prototype sequences, can be used for both recognition and synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.
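
Dynamic time warping is what lets a SOM node hold a whole sequence. The sketch below is minimal and uses the classic unweighted DTW recursion with Euclidean frame distance. It finds the best-matching node for a sequence by DTW distance. The paper's exact step weights and time normalization are not given in the abstract, and this sketch returns the raw accumulated cost. Dividing by len(a) + len(b) would be one possible normalization. `dtw_distance` and `best_matching_sequence` are hypothetical names.

```python
import math


def dtw_distance(a, b, dist=math.dist):
    """Dynamic time warping distance between two sequences of feature vectors.

    D[i][j] is the cost of the best alignment of a[:i] with b[:j]; each step may
    advance a, advance b, or advance both (symmetric, unweighted recursion).
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def best_matching_sequence(prototypes, seq):
    """Index of the prototype sequence (one per SOM node) closest to seq under DTW."""
    return min(range(len(prototypes)), key=lambda k: dtw_distance(prototypes[k], seq))
```

Because the recursion can repeat frames, a sequence that is a time-stretched copy of a prototype gets distance 0. This is why sequences of different lengths become comparable.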

9.
A novel self-organizing map (SOM) based retrieval system is proposed for face matching in large databases. Given a query face, the system returns a small subset of the most similar faces, from which the user can easily verify the matched images. The system consists of two major parts. First, it provides a generalized integration of multiple feature sets using multiple self-organizing maps. The feature sets come from different feature extraction methods, such as Gabor filters and Local Autocorrelation Coefficients. In this framework, multiple facial features are integrated into a compressed feature vector regardless of the scaling and length of the individual feature sets. Second, an SOM is trained on the compressed feature vectors to organize all face images in the database. Using the organized map, faces similar to a query can be identified efficiently. The system also includes relevance feedback to improve face retrieval performance. The proposed method is computationally efficient, and comparative results show that it is promising for identifying faces in large image databases.

10.
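Research under the Human Genome Project has entered the post-genomic era, in which the focus has shifted from sequencing to functional studies. Gene expression profiles are analyzed and gene function is identified mainly with unsupervised and supervised techniques. Gene transcriptional regulatory network analysis gives an overall picture of the interactions among genes within a cell and shows how biological functions manifest at the level of gene expression. This paper surveys current techniques for gene expression profile analysis and their development, analyzes their strengths and weaknesses, and proposes ideas and methods for addressing open problems. It offers new directions for further research on gene expression profiles.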

11.
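Microarray technology is the main tool for functional genomics research in the post-genomic era. Cluster analysis of gene expression profile data is important for studying gene function and gene regulatory mechanisms. Many clustering algorithms require the number of clusters to be fixed in advance, are sensitive to noise, and scale poorly. To address these problems, a new "nearest-neighbor-first absorption" clustering algorithm is proposed. It draws on the different characteristics of the density-based clustering algorithm DBSCAN and of shared nearest neighbors (SNN). The algorithm is applied to a public synchronized yeast cell-cycle dataset. Its results are compared with those of K-means using the FOM (figure of merit) evaluation method. The results show that the proposed algorithm outperforms the other clustering algorithms and that its clusters have clear biological significance. It also predicts and estimates the number of classes in the data well.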
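
The abstract does not define the "nearest-neighbor-first absorption" procedure itself. The sketch below shows only the shared-nearest-neighbor building block it cites. The SNN similarity of two points is taken as the number of neighbors they share among their k nearest neighbors (Euclidean distance, ties broken by index). This plain shared-count form is an assumption; some SNN variants additionally require the two points to be in each other's neighbor lists. `knn_sets` and `snn_similarity` are hypothetical names.

```python
import math


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points (Euclidean, ties by index)."""
    n = len(points)
    return [set(sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))[:k])
            for i in range(n)]


def snn_similarity(points, k):
    """Shared-nearest-neighbor similarity matrix; the diagonal is set to k by convention."""
    nn = knn_sets(points, k)
    n = len(points)
    return [[k if i == j else len(nn[i] & nn[j]) for j in range(n)] for i in range(n)]
```

SNN similarity depends on neighborhood overlap rather than raw distance, so it adapts to regions of different density. This is the property that makes it a natural partner for density-based methods such as DBSCAN.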

12.
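Yan Wensheng. Computer Engineering (计算机工程), 2011, 37(5): 202-203, 206
Based on the characteristics of gene expression data, a visual clustering method for gene expression data based on a spring model is proposed. It maps multidimensional gene expression data into two-dimensional space while largely preserving the spatiotemporal similarity among the original multidimensional data. Experimental results show that the method can uncover hidden cluster structure and co-expressed gene patterns in gene expression datasets.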

13.
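This paper introduces several current classification methods for DNA microarray gene expression data. It explains the algorithmic ideas behind recursive partitioning, forest construction, and information fusion methods, describes each method in depth, and analyzes and compares them. Finally, it discusses the outlook for classification techniques based on gene expression microarray data.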

14.
We have found one reason why AdaBoost tends not to perform well on gene expression data, and we identified simple modifications that improve its ability to find accurate class prediction rules. These modifications appear to be especially needed when there is a strong association between expression profiles and class designations. Cross-validation analysis of six microarray datasets with different characteristics suggests that, suitably modified, boosting provides competitive classification accuracy in general. Sometimes the goal in a microarray analysis is to find a class prediction rule that is not only accurate but also depends on the expression levels of only a few genes. Boosting tries to find genes that are complementary sources of evidence for the correct classification of a tissue sample. For that reason it appears especially useful for such gene-efficient class prediction. This appears to be particularly true when there is a strong association between expression profiles and class designations, as is often the case when comparing tumor and normal samples, for example.

15.
The self-organizing map (SOM) is a powerful method for visualization, cluster extraction, and data mining. It has been used successfully on data of high dimensionality and complexity, where traditional methods may often be insufficient. To analyze data structure and capture cluster boundaries from the SOM, a common approach is to represent the SOM's knowledge through visualization. Existing methods present different aspects of the information learned by the SOM. However, data topology, which is present in the SOM's knowledge, is greatly underutilized. In this paper we show that data topology can be integrated into the visualization of the SOM, which provides a more detailed view of the cluster structure than existing schemes. We achieve this by introducing a weighted Delaunay triangulation (a connectivity matrix) and draping it over the SOM. This new visualization, CONNvis, also shows both forward and backward topology violations, along with the severity of the forward ones. These violations indicate the quality of SOM learning and the complexity of the data. CONNvis greatly assists in the detailed identification of cluster boundaries. We demonstrate its capabilities on synthetic datasets and on a real 8D remote sensing spectral image.
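
The abstract describes the connectivity matrix only as a weighted Delaunay triangulation. The sketch below assumes the definition commonly associated with CONNvis. For each data vector, its best-matching and second-best-matching prototypes are found, and the link between those two prototypes gets a count of 1, in either order. The result is a symmetric matrix of link weights between SOM nodes. `conn_matrix` is a hypothetical name, and at least two prototypes are assumed.

```python
import math


def conn_matrix(prototypes, data):
    """Symmetric link-weight matrix: conn[i][j] counts data vectors whose two
    closest prototypes are i and j (in either order). Assumes >= 2 prototypes."""
    n = len(prototypes)
    conn = [[0] * n for _ in range(n)]
    for x in data:
        order = sorted(range(n), key=lambda i: math.dist(prototypes[i], x))
        first, second = order[0], order[1]     # best- and second-best-matching units
        conn[first][second] += 1
        conn[second][first] += 1
    return conn
```

Drawing a line between lattice nodes i and j whenever conn[i][j] > 0, with line width scaled by the count, gives the "draped" view. Links between nodes that are not lattice neighbors would point to topology violations.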

16.
Recently, biology has been confronted with large multidimensional gene expression datasets in which the expression of thousands of genes is measured over dozens of conditions. Patterns in gene expression are frequently explained retrospectively by underlying biological principles. Here we present a method that uses text analysis to help find meaningful gene expression patterns that correlate with the underlying biology described in the scientific literature. The main challenge is that the literature about an individual gene is not homogeneous and may address many unrelated aspects of the gene. In the first part of the paper we present and evaluate the neighbor divergence per gene (NDPG) method. It assigns a score to a given subgroup of genes indicating the likelihood that the genes share a biological property or function. To do this, it uses only a reference index that connects genes to documents and a corpus containing those documents. In the second part we present an approach, optimizing separating projections (OSP), that searches gene expression data for linear projections that separate functionally related groups of genes from the rest. The objective function of the search is the NDPG score of the positively projected genes. A successful search should therefore identify patterns in gene expression data that correlate with meaningful biology. Applied to a published gene expression dataset, OSP discovers many biologically relevant projections. The method requires only numerical measurements (here, expression) about entities (genes) with textual documentation (literature). We therefore conjecture that it could be transferred easily to other domains. The method should be able to identify relevant patterns even if the documentation for each entity covers many disparate, mutually unrelated subjects.

17.
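The improved self-organizing map (SOM) neural network method combines the conventional SOM method with a certainty level to preprocess the network's input vectors. Experiments compared this improved SOM recognition method with the conventional SOM recognition method and showed a substantial improvement in recognition performance.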

18.
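Feature Selection and Classification of Microarray Data Based on GA/SVM   Total citations: 2, self-citations: 0, citations by others: 2
Microarray data have small sample sizes and high dimensionality, which makes them difficult to analyze, and selecting the key genes is very important. This paper uses a genetic algorithm to select key genes, with k-nearest-neighbor distance as the pattern recognition method. It builds a diagnostic system with support vector machines and tests predictive classification performance with different kernel functions. On the classic leukemia dataset, classification accuracy on the 34-sample test set was 100%.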

19.
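Current biclustering algorithms seldom consider the overall partition quality of the resulting clusters. To address this, a biclustering algorithm based on the PA index is proposed. The PA index measures the partition quality of all clusters, and the algorithm uses it to construct the biclustering model. A heuristic greedy strategy then iteratively adds and removes rows and columns to mine several biclusters with high partition quality. The algorithm's performance is compared with that of the CC and FLOC algorithms. Experimental results show that it obtains better results, indicating that it is better at mining local patterns that are both statistically and biologically meaningful.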

20.
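A microarray experiment is a complex, multi-step process. Uncertainty exists at every step, so the final results contain some data noise. Many algorithms have been proposed for computing gene expression values, with the aim of extracting more meaningful biological information from these noisy data. The currently popular mmgMOS model improves the accuracy of array data analysis. Its main drawback is that its parameter φ takes a single fixed value over the entire dataset, and a single value cannot represent the true signal of different probes. This paper improves the parameter φ in the mmgMOS model to further increase the accuracy of subsequent differential gene identification.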
