Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
In the context of microarray data analysis, biclustering allows the simultaneous identification of a maximal group of genes that show highly correlated expression patterns across a maximal group of experimental conditions (samples). This paper introduces a heuristic algorithm called BicFinder (The BicFinder software is available at: ) for extracting biclusters from microarray data. BicFinder relies on a new evaluation function called Average Correspondence Similarity Index (ACSI) to assess the coherence of a given bicluster and uses a directed acyclic graph to construct its biclusters. The performance of BicFinder is evaluated on synthetic data and three DNA microarray datasets. We test the biological significance using a gene annotation web tool to show that the proposed algorithm is able to produce biologically relevant biclusters. Experimental results show that BicFinder is able to identify coherent and overlapping biclusters.

2.
ABSTRACT

A bicluster in gene-expression data is a subset of the genes that shows consistent patterns over a subset of the conditions. Most recent research in biclustering involves statistical and graph-theoretic approaches that add or delete rows and/or columns of the data matrix subject to some constraints. This is an exhaustive search of the space, and hence the solutions may not be feasible. The proposed work finds the significant biclusters in large expression data using shuffled cuckoo search with Nelder–Mead (SCS-NM). Diversification and intensification of the search are obtained through shuffling and the NM simplex method, respectively. The proposed work is tested on four benchmark datasets, and the results are compared with swarm intelligence techniques and various biclustering algorithms. The results show a significant improvement in the fitness value obtained by SCS-NM. In addition, the work determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.

3.
4.
Several methods have been proposed for microarray data analysis that make it possible to identify groups of genes with similar expression profiles under only a subset of examples. We propose to improve the performance of these biclustering methods by adapting the bagging approach to biclustering problems. The principle consists of generating a set of biclusters and aggregating the results. Our method has been tested successfully on both synthetic and real datasets.

5.
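Biclustering is currently an important research direction in the analysis of gene expression data; its mining goal is to discover which genes have similar expression levels or are closely related under which experimental conditions. Many biclustering algorithms have been proposed to mine different types of biclusters, but most of them are not efficient. In view of this, a novel mining algorithm, MRCluster, is proposed, mainly for mining maximal row-constant bicluster patterns from raw gene expression data. For efficiency, it adopts a depth-first, gene-expansion mining strategy based on the Apriori principle, and introduces some novel pruning techniques during mining. MRCluster is compared with RAP (range support pattern), a method for mining row-constant bicluster patterns; the experimental results show that MRCluster is more efficient than RAP at mining maximal row-constant bicluster patterns from raw gene expression data. MRCluster can therefore effectively mine maximal row-constant biclusters from raw gene expression data.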

6.
Biclustering of gene expression data aims at finding localized patterns in a subspace. A bicluster (sometimes called a co-cluster), in the context of gene expression data, is a set of genes that exhibit similar expression intensity under a subset of experimental features (conditions). Most biclustering algorithms proposed in the literature aim at finding sub-matrices that exhibit some sort of coherence by selecting an initial sub-matrix and iteratively adding or subtracting rows and columns. These algorithms generally depend on the initial, hard selection of the gene and condition clusters. In this work, we adapt a recently proposed approach for clustering textual data to find biclusters in gene expression data. Our proposed technique is based on the concept of co-similarity between genes (and between conditions), which exploits weighted higher-order paths in a bipartite graph representation of the gene expression data. We therefore build statistical relations between genes and between conditions by comparing all genes and conditions before finally extracting biclusters from the data. We show that the proposed technique is able to find meaningful non-overlapping biclusters on both synthetically generated data and real cancer data. Our results indicate that the proposed technique is resistant to noise in the data and can successfully retrieve biclusters even in the presence of a relatively large amount of noise. We also analyze our results with respect to the discovered genes and observe that our extracted biclusters are supported by biological evidence, such as enrichment of gene functions and biological processes.

7.
Biclustering is an important method in DNA microarray analysis which can be applied when only a subset of genes is co-expressed in a subset of conditions. Unlike standard clustering analyses, biclustering methodology can perform simultaneous classification on two dimensions of genes and conditions in a microarray data matrix. However, the performance of biclustering algorithms is affected by the inherent noise in data, types of biclusters and computational complexity. In this paper, we present a geometric biclustering method based on the Hough transform and the relaxation labeling technique. Unlike many existing biclustering algorithms, we first consider the biclustering patterns through geometric interpretation. Such a perspective makes it possible to unify the formulation of different types of biclusters as hyperplanes in spatial space and facilitates the use of a generic plane finding algorithm for bicluster detection. In our algorithm, the Hough transform is employed for hyperplane detection in sub-spaces to reduce the computational complexity. Then sub-biclusters are combined into larger ones under the probabilistic relaxation labeling framework. Our simulation studies demonstrate the robustness of the algorithm against noise and outliers. In addition, our method is able to extract biologically meaningful biclusters from real microarray gene expression data.

8.
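Gene expression data are large-scale data matrices produced by DNA microarray experiments. Biclustering algorithms mine highly correlated sub-matrices from the data matrix and can effectively extract biological information. To address the tendency of current multi-objective biclustering optimization algorithms to converge prematurely and become trapped in local optima, this paper proposes a discrete artificial bee colony biclustering algorithm based on logical operations (the LOABCB algorithm). On the one hand, the artificial bee colony algorithm is introduced to strengthen the global search ability of biclustering; on the other hand, through...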

9.
10.
Biclustering algorithms have become popular tools for gene expression data analysis. They can identify local patterns defined by subsets of genes and subsets of samples, which cannot be detected by traditional clustering algorithms. Although useful, biclustering is an NP-hard problem. Therefore, the majority of biclustering algorithms look for biclusters by optimizing a pre-established coherence measure. Many heuristics and validation measures have been proposed for biclustering over the last 20 years. However, an extensive comparison of bicluster coherence measures in practical scenarios is lacking. To address this gap, this paper experimentally analyzes 17 bicluster coherence measures and external measures calculated from information obtained from the gene ontologies. In this analysis, results were produced by 10 algorithms from the literature on 19 gene expression datasets. According to the experimental results, a few pairs of strongly correlated coherence measures could be identified, which suggests redundancy. Moreover, the pairs of strongly correlated measures might change when dealing with normalized or non-normalized data and with biclusters enriched by different ontologies. Finally, there was no clear relation between coherence measures and assessment using information from gene ontology.

11.
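Existing biclustering algorithms lack the ability to discover biclusters with overlapping structure, cannot effectively discover the corresponding bicluster structures hidden in gene expression data, and do not consider the influence of condition importance on the biclustering result when adding or deleting conditions. To address these problems, this paper proposes an improved biclustering algorithm based on the weighted mean squared residue. First, the gene set is divided into initial biclusters using a fuzzy partition controlled by overlap rate and membership degree. Then, the weights of the conditions in each bicluster are iteratively modified while minimizing the objective function. Finally, the weighted mean squared residue is used to add qualifying genes and to delete genes with poorly consistent fluctuation from the optimized biclusters, yielding the final set of biclusters. Experiments show that the algorithm not only generates biclusters with different co-expression levels but also keeps the overlap rate within a reasonable range.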

12.
Unlike traditional clustering analysis, a biclustering algorithm works simultaneously on the two dimensions of samples (rows) and variables (columns). In recent years, biclustering methods have developed rapidly and have been widely applied in biological data analysis, text clustering, recommendation systems and other fields. Traditional clustering algorithms are not well suited to processing high-dimensional and/or large-scale data. At present, most biclustering algorithms are designed for differentially expressed big biological data. However, there is little discussion of biclustering for binary data such as miRNA-targeted gene data. Here, we propose a novel biclustering method for miRNA-targeted gene data based on a graph autoencoder, named GAEBic. GAEBic applies a graph autoencoder to capture the similarity of sample sets or variable sets, and takes a new irregular clustering strategy to mine biclusters with excellent generalization. Based on the miRNA-targeted gene data of soybean, we benchmark several different types of biclustering algorithms, and find that GAEBic performs better than Bimax, Bibit and the Spectral Biclustering algorithm in terms of target gene enrichment. This biclustering method achieves comparable performance on the high-throughput miRNA data of soybean and can also be used for other species.

13.
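Most biclustering algorithms currently applied to gene expression data are designed for real-valued data and are therefore susceptible to noise; moreover, these algorithms rarely consider the temporal order of the samples. An effective algorithm, DTCB, is proposed for mining biclusters over contiguous time points: it mines maximal co-expressed biclusters with contiguous time points from discretized time-series gene expression data. The algorithm uses a new data discretization method and defines three co-expression relationships between genes on discretized datasets. To improve mining efficiency, DTCB uses effective pruning and output strategies, and can mine all maximal co-expressed biclusters in one pass without generating candidate sets. Experimental analysis shows that DTCB is efficient and robust, and that its results have good statistical and biological significance.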

14.
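To address the fact that current biclustering algorithms rarely consider the overall partition quality of the resulting clusters, a biclustering algorithm based on the PA index is proposed. The algorithm selects the PA index, which measures the partition quality of all clusters, to construct the biclustering model, and uses a heuristic greedy strategy that iteratively adds and deletes rows and columns to mine several biclusters with high partition quality. The proposed algorithm is compared with the CC and FLOC algorithms in terms of performance. The experimental results show that the algorithm obtains better results, which indicates that it is better able to mine local patterns that are both statistically and biologically significant.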

15.
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address a complete, correct and non-redundant enumeration of such patterns, a well-known intractable problem, while no formal framework exists. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among others, a methodology for biclustering binary data. Handling numerical data is not direct, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We hence discuss both theoretical and computational aspects of biclustering numerical data with triadic concept analysis. These results also scale to n-dimensional numerical datasets.
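The remark that FCA can serve as a biclustering method for binary data can be illustrated with a small sketch. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute of the intent, and the intent is exactly the set of attributes shared by all objects of the extent; in a binary table this is a maximal all-ones sub-table. The code below is a brute-force enumeration written for illustration only, not the method of the abstract above; the function name `formal_concepts` and the toy gene/condition labels are assumptions.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small binary context.

    `context` maps each object (e.g. a gene) to the set of attributes
    (e.g. conditions) it has. Brute force over object subsets, so this
    is only suitable for toy inputs.
    """
    objects = sorted(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object in the subset.
            intent = set(all_attributes)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical toy context: three genes, three conditions.
toy_context = {
    "g1": {"c1", "c2"},
    "g2": {"c1", "c2", "c3"},
    "g3": {"c3"},
}
toy_concepts = formal_concepts(toy_context)
for extent, intent in sorted(toy_concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal rectangle of ones in the toy table, which is the sense in which a formal concept acts as a bicluster of binary data. Practical FCA algorithms avoid this exponential enumeration.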

16.
Biclustering of expression data with evolutionary computation   Total citations: 2, self-citations: 0, citations by others: 2
Microarray techniques are leading to the development of sophisticated algorithms capable of extracting novel and useful knowledge from a biomedical point of view. In this work, we address the biclustering of gene expression data with evolutionary computation. Our approach is based on evolutionary algorithms, which have been proven to have excellent performance on complex problems, and searches for biclusters following a sequential covering strategy. The goal is to find biclusters of maximum size with mean squared residue lower than a given δ. In addition, we pay special attention to finding high-quality biclusters with large variation, i.e., with a relatively high row variance, and with a low level of overlap among biclusters. The quality of the biclusters found by our evolutionary approach is discussed and the results are compared to those reported by Cheng and Church, and Yang et al. In general, our approach, named SEBI, shows excellent performance at finding patterns in gene expression data.
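The two quantities named in this abstract, mean squared residue and row variance, can be sketched as follows. The sketch uses the standard Cheng and Church definition, in which the residue of an entry is the entry minus its row mean, minus its column mean, plus the overall mean of the sub-matrix. The function names and the toy matrix are assumptions made for illustration; this is not the SEBI implementation.

```python
def submatrix(matrix, rows, cols):
    """Extract the selected rows and columns as a list of lists."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue (Cheng and Church) of a bicluster."""
    sub = submatrix(matrix, rows, cols)
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def row_variance(matrix, rows, cols):
    """Average squared deviation of each entry from its own row mean."""
    sub = submatrix(matrix, rows, cols)
    total = 0.0
    for r in sub:
        mean = sum(r) / len(r)
        total += sum((x - mean) ** 2 for x in r)
    return total / (len(sub) * len(sub[0]))


# Hypothetical toy expression matrix (rows: genes, columns: conditions).
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [5.0, 1.0, 8.0, 2.0],
]
# The first three rows and columns follow a purely additive (shifting)
# pattern, so their residue is zero up to floating-point rounding.
shifted_msr = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
whole_msr = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print(shifted_msr, whole_msr, row_variance(data, [0, 1, 2], [0, 1, 2]))
```

A bicluster is accepted under this criterion when its mean squared residue is below the chosen δ; the row variance helps rule out trivial biclusters whose rows are nearly constant.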

17.

Background

Biclustering, one of the emerging techniques for analyzing DNA microarray data, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as a quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that new coherence patterns are involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured.

Results

The proposed measure, called Spearman's biclustering measure (SBM), estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as its fitness function. This approach has been examined from different points of view using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of reference bicluster patterns that includes new patterns, and a set of statistical tests. Performance has also been examined on real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs.

Conclusions

SBM shows several advantages, such as the ability to recognize more complex coherence patterns, including shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on their statistical significance.
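Why a rank correlation can recognize shifting, scaling and inversion patterns can be shown with a short sketch. It computes the ordinary Spearman correlation between two expression profiles as the Pearson correlation of their ranks. This is not the SBM formula, which the abstract does not give; the helper names and the toy profiles are assumptions.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        average = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = average
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression profile of one gene over five conditions.
base = [1.0, 3.0, 2.0, 5.0, 4.0]
shifted = [v + 10.0 for v in base]   # shifting pattern
scaled = [v * 3.0 for v in base]     # scaling pattern
inverted = [-v for v in base]        # inverse relationship

print(spearman(base, shifted), spearman(base, scaled), spearman(base, inverted))
```

Shifted and scaled profiles keep the same ranks as the base profile, so their correlation with it is 1 up to rounding, while the inverted profile gives -1. A measure built on the absolute value of such correlations could therefore group all three patterns together.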

18.
MicroCluster: efficient deterministic biclustering of microarray data   Total citations: 1, self-citations: 0, citations by others: 1
MicroCluster can mine different types of arbitrarily positioned and overlapping clusters of genetic data to find interesting patterns. Our approach has four key features. First, we mine only the maximal biclusters satisfying certain homogeneity criteria. Second, the clusters can be arbitrarily positioned anywhere in the input data matrix, and they can have arbitrary overlapping regions. Third, MicroCluster uses a flexible definition of a cluster that lets it mine several types of biclusters (which previously were studied independently). Finally, MicroCluster can delete or merge biclusters that have large overlaps. So, it can tolerate some noise in the data set and let users focus on the most important clusters. We've developed a set of metrics to evaluate the clustering quality and have tested MicroCluster's effectiveness on several synthetic and real data sets.

19.
A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Different strategies to escape local minima are introduced and compared. Experimental results on several microarray data sets show that the method is able to find significant biclusters, also from a biological point of view.

20.
Co-regulation is a common phenomenon in gene expression. Finding positively and negatively co-regulated gene clusters from gene expression data is a real need. Existing techniques based on global similarity are unable to detect true up- and down-regulated gene clusters. This paper presents an expression-pattern-based biclustering technique, CoBi, for grouping both positively and negatively regulated genes from microarray expression data. Regulation pattern and similarity in degree of fluctuation are accounted for while computing the similarity between two genes. Unlike traditional biclustering techniques, which use greedy iterative approaches, it uses a BiClust tree that needs a single pass over the entire dataset to find a set of biologically relevant biclusters. Biclusters determined from different gene expression datasets by the technique show highly enriched functional categories.
