Similar Documents
19 similar documents found.
1.
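Biclustering models help cluster local patterns that exhibit correlations. This paper proposes a biclustering algorithm that can identify multiple kinds of correlation patterns. It uses quadratic mutual information as the correlation criterion and estimates the mutual information among high-dimensional variables efficiently with the Parzen window method; it also introduces the concept of the maximal correlated dimension cluster. Using several maximal correlated dimension clusters as seeds, the algorithm iteratively refines the clustering and can effectively discover correlated long patterns in high-dimensional data. Experiments on real gene expression data demonstrate the effectiveness of the algorithm.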
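The abstract does not give its estimator, so the sketch below assumes the Euclidean-distance form of quadratic mutual information with Gaussian Parzen kernels, for a pair of one-dimensional variables only. With Gaussian kernels, every integral of products of Parzen densities reduces to pairwise kernel sums of doubled variance. The function names and the kernel width `sigma` are hypothetical choices, not the paper's.

```python
import math

def gaussian(d, var):
    """1-D Gaussian kernel with variance `var`, evaluated at distance d."""
    return math.exp(-d * d / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def quadratic_mi(x, y, sigma=1.0):
    """Parzen estimate of Euclidean-distance quadratic MI, i.e. the integral of
    (f_xy - f_x * f_y)^2, for two equally long 1-D samples x and y."""
    n = len(x)
    var = 2.0 * sigma ** 2  # convolving two kernels of variance sigma^2 doubles it
    gx = [[gaussian(a - b, var) for b in x] for a in x]
    gy = [[gaussian(a - b, var) for b in y] for a in y]
    # joint term: integral of f_xy^2
    v_joint = sum(gx[i][j] * gy[i][j] for i in range(n) for j in range(n)) / n ** 2
    # marginal term: integral of (f_x * f_y)^2
    v_marg = (sum(map(sum, gx)) / n ** 2) * (sum(map(sum, gy)) / n ** 2)
    # cross term: integral of f_xy * f_x * f_y
    v_cross = sum(sum(gx[i]) * sum(gy[i]) for i in range(n)) / n ** 3
    return v_joint + v_marg - 2.0 * v_cross

x = [0.0, 1.0, 2.0, 3.0, 4.0]
print(quadratic_mi(x, x))          # strictly positive: y is fully determined by x
print(quadratic_mi(x, [5.0] * 5))  # about 0: a constant carries no information
```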

2.
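To address the problem of mining multi-view patterns in data, a non-redundant multi-view clustering algorithm based on the information bottleneck (IB) method, NrMIB, is proposed. On the one hand, the algorithm follows the IB idea of preserving as much information as possible in the clustering result, which ensures high-quality clusterings; on the other hand, it minimizes the mutual information between the clustering result and a known data partition, which ensures that the new clustering is non-redundant with respect to the known partition. NrMIB is suitable both for co-occurrence data and for non-co-occurrence data in Euclidean space, can mine linearly and nonlinearly separable patterns, and needs no extra parameters to estimate information in Euclidean space. Experimental results on synthetic-data pattern recognition, face recognition and document clustering show that NrMIB effectively mines multiple reasonable partitions contained in the data and outperforms traditional single-view clustering algorithms as well as three existing non-redundant multi-view clustering algorithms.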
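The non-redundancy term described above is the mutual information between a candidate clustering and the known partition. A minimal sketch of that term alone, for hard labelings, is below; it does not reproduce the IB optimization, and `partition_mi` is a hypothetical name.

```python
import math
from collections import Counter

def partition_mi(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    # sum over label pairs of p_ij * log(p_ij / (p_i * p_j)), written with counts
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j])) for (i, j), c in joint.items())

known = [0, 0, 1, 1]
print(partition_mi(known, known))         # ln 2: fully redundant with the known partition
print(partition_mi([0, 1, 0, 1], known))  # 0: an orthogonal, non-redundant clustering
```

A non-redundant method would prefer the second candidate, provided it also preserves enough information about the data.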

3.
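Compared with traditional hard-partition clustering, fuzzy clustering algorithms (FCM, for example) are robust to changes in data scale and reflect the actual relationship between data points and cluster centers more accurately, and they are now widely used. For time-series gene expression data, however, traditional clustering algorithms often fail to exploit the dynamic temporal associations in the data. An autoregressive (AR) model can therefore be introduced on top of fuzzy clustering, treating the time-series gene expression data as a set of time series for dynamic cluster analysis. This not only exploits the internal autocorrelation of the data but also uses the membership function to fuzzily adjust the AR model's prediction process, yielding better clustering results.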
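The abstract does not specify how the AR model and the fuzzy memberships are coupled, so the sketch below shows only two standard building blocks under that assumption: a least-squares AR(1) coefficient per series, and the usual FCM membership formula. Representing each gene by its AR coefficient is an illustrative choice, and all names are hypothetical.

```python
import math

def ar1_coefficient(series):
    """Least-squares AR(1) coefficient phi in x_t ~ phi * x_{t-1} (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM memberships of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # point coincides with one or more centers: crisp membership
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]

phi = ar1_coefficient([8.0, 4.0, 2.0, 1.0])      # 0.5 for a halving series
print(fcm_memberships((phi,), [(0.0,), (2.0,)]))  # [0.9, 0.1] up to rounding
```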

4.
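Because the feature space contains latent groups of correlated features, this work uses spectral clustering to explore correlations among features and neighborhood mutual information to find a maximally relevant feature subset, yielding a feature selection algorithm that combines spectral clustering with neighborhood mutual information. First, neighborhood mutual information is used to remove features irrelevant to the labels. Then spectral clustering groups the features so that features in the same group are strongly correlated and features in different groups are strongly dissimilar. Next, based on neighborhood mutual information, a subset of features that is strongly relevant to the class labels and has low redundancy within its group is selected from each feature group. Finally, all selected subsets are combined into the final feature selection result. Experiments with two base classifiers show that the algorithm achieves higher classification performance with fewer, well-chosen features.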
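The first step above scores each feature by its neighborhood mutual information with the labels. A minimal sketch for one numeric feature, assuming the usual definition based on δ-neighborhood sizes (the feature neighborhood of a sample is every sample within `delta`; the label neighborhood is every sample of the same class), is below. The function name and `delta` value are hypothetical.

```python
import math

def neighborhood_mi(feature, labels, delta):
    """Neighborhood mutual information between a numeric feature and class labels."""
    n = len(feature)
    total = 0.0
    for i in range(n):
        nx = {j for j in range(n) if abs(feature[j] - feature[i]) <= delta}
        ny = {j for j in range(n) if labels[j] == labels[i]}
        nxy = nx & ny  # always contains i, so never empty
        total += math.log(len(nx) * len(ny) / (n * len(nxy)))
    return -total / n

y = [0, 0, 1, 1]
print(neighborhood_mi([0.0, 0.1, 5.0, 5.1], y, 0.5))  # ln 2: feature separates the classes
print(neighborhood_mi([0.0, 5.0, 0.1, 5.1], y, 0.5))  # 0: feature is irrelevant to labels
```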

5.
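Hierarchical Co-clustering Algorithm for High-order Heterogeneous Data   Total citations: 1, self-citations: 0, citations by others: 1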
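In practical applications, high-order heterogeneous data containing information from multiple feature spaces is widespread. Because high-order co-clustering algorithms can fuse information from multiple feature spaces to improve clustering, they have become a research hotspot in recent years. Most current high-order co-clustering algorithms are non-hierarchical, yet high-order heterogeneous data often hides hierarchical cluster structures. To mine these hidden hierarchical cluster patterns more effectively, a high-order hierarchical co-clustering algorithm (HHCC) is proposed. The algorithm uses the Goodman-Kruskal τ association measure to quantify the association between object variables and feature variables, placing strongly associated objects in the same object cluster and strongly associated features in the same feature cluster. HHCC adopts a top-down hierarchical clustering strategy: at each level it evaluates the clustering quality of objects and features with Goodman-Kruskal τ, optimizes τ by local search, determines the number of clusters automatically and obtains that level's clustering, finally forming a tree-like cluster structure. Experimental results show that HHCC outperforms four classic homogeneous hierarchical clustering algorithms and five existing non-hierarchical high-order co-clustering algorithms.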
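Goodman-Kruskal τ measures how much knowing the row category reduces the error of predicting the column category in a contingency table. A minimal sketch of the measure itself is below; in a co-clustering setting the table would cross object clusters with feature clusters, but how HHCC builds that table and runs its local search is not reproduced here. The function name is hypothetical, and the sketch assumes the table has at least two non-empty columns.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau of the column variable given the row variable,
    computed from a contingency table of counts (list of rows)."""
    total = sum(map(sum, table))
    col_totals = [sum(col) for col in zip(*table)]
    baseline = sum((c / total) ** 2 for c in col_totals)  # no row information
    explained = sum(
        (cell / total) ** 2 / (sum(row) / total)
        for row in table if sum(row) > 0
        for cell in row
    )
    return (explained - baseline) / (1.0 - baseline)

print(goodman_kruskal_tau([[5, 0], [0, 5]]))  # 1.0: rows fully determine columns
print(goodman_kruskal_tau([[1, 1], [1, 1]]))  # 0.0: no association
```

Note that τ is asymmetric: swapping the roles of rows and columns generally gives a different value, which is why co-clustering methods built on it evaluate both directions.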

6.
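A Bagging-based ensemble clustering method is proposed. A new dataset sampling technique generates data subsets while preserving the diversity and maximal correlation of the subsamples as far as possible; an improved k-means clustering algorithm then generates the individual learners, and the different clustering results of the dataset are processed based on mutual information. Finally, disputed data objects are reassigned to the new clustering result according to their distances to each cluster center. Experimental results on several UCI benchmark datasets show that the method effectively improves clustering quality.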
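The final step above can be sketched as follows, under stated assumptions: the base labelings are already aligned to common label names (the paper does this via mutual information, which is not shown), an object is "disputed" when its majority vote is not unanimous, and centers are computed from undisputed objects only. The function and parameter names are hypothetical.

```python
import math
from collections import Counter

def consensus_with_reassignment(points, labelings, min_agreement=1.0):
    """Majority-vote consensus; objects whose vote share is below min_agreement are
    'disputed' and reassigned to the nearest center of the undisputed clusters."""
    consensus, disputed = [], []
    for i in range(len(points)):
        label, count = Counter(run[i] for run in labelings).most_common(1)[0]
        consensus.append(label)
        if count / len(labelings) < min_agreement:
            disputed.append(i)
    members = {}
    for i, label in enumerate(consensus):
        if i not in disputed:
            members.setdefault(label, []).append(points[i])
    centers = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in members.items()
    }
    for i in disputed:  # move each disputed object to its nearest center
        consensus[i] = min(centers, key=lambda lab: math.dist(points[i], centers[lab]))
    return consensus, disputed

points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0)]
runs = [[0, 0, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 1, 1, 0]]
print(consensus_with_reassignment(points, runs))  # ([0, 0, 1, 1, 0], [4])
```

Here the last point wins a 2-of-3 vote for cluster 1 but lies next to cluster 0, so the distance-based reassignment overrides the vote.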

7.
Yang Hui, Peng Han, Zhu Jianyong, Nie Feiping. Computer Simulation, 2021, 38(8): 328-332, 343
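Spectral clustering can cluster data of arbitrary shape and can effectively improve the quality of base clusterings in a clustering ensemble. In previous clustering ensemble algorithms, the ensemble output is not the final clustering: another clustering algorithm must still be applied to obtain it, so the solution passes through a discrete-continuous-discrete transformation. A bilateral clustering ensemble algorithm based on spectral clustering is proposed. In the generation stage, the algorithm obtains base clusterings with spectral clustering and selects among them using normalized mutual information. The selected base clusterings and the samples become the vertices of a graph, and a bilateral clustering algorithm clusters base clusterings and samples simultaneously on this graph, yielding the final clustering directly. In experiments, the proposed method was compared with several clustering ensemble algorithms and achieved good results.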
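One common way to put base clusterings and samples into a single graph is to give each base cluster its own vertex and link every sample to the clusters that contain it. The sketch below builds that sample-by-cluster incidence matrix, assuming this cluster-level construction; the abstract does not confirm it, and the subsequent bilateral (co-)clustering step on the graph is not shown. The function name is hypothetical.

```python
def bipartite_incidence(base_clusterings):
    """Sample-by-cluster incidence matrix of the bipartite graph linking every sample
    to each base cluster that contains it."""
    columns = [
        (b, label)  # one vertex per (base clustering, cluster label)
        for b, labels in enumerate(base_clusterings)
        for label in sorted(set(labels))
    ]
    n_samples = len(base_clusterings[0])
    matrix = [
        [1 if base_clusterings[b][i] == label else 0 for b, label in columns]
        for i in range(n_samples)
    ]
    return columns, matrix

cols, mat = bipartite_incidence([[0, 0, 1], [0, 1, 1]])
print(cols)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(mat)   # [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]]
```

Each row sums to the number of base clusterings, because every sample belongs to exactly one cluster per base clustering.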

8.
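Clustering ensembles are a new problem in machine learning: multiple clustering partitions of the same dataset are combined to improve the performance of cluster analysis. Finding a "consensus clustering" from multiple partitions is a difficult problem that many researchers have studied. This paper proposes a mutual-information-based fuzzy clustering ensemble algorithm. It extends the mutual-information-based clustering ensemble objective function proposed by Strehl & Ghosh, applies it to ensembles of fuzzy partitions, and solves it with an algorithm similar to information bottleneck clustering. Experimental results on four UCI datasets show that the mutual-information-based clustering ensemble achieves good performance.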
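Strehl & Ghosh's objective scores a candidate consensus by its average normalized mutual information (ANMI) with the ensemble members, normalizing I(a;b) by sqrt(H(a)H(b)). A minimal sketch for hard labelings is below; the paper's extension to fuzzy partitions and its IB-style solver are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    mi = sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
             for (i, j), c in Counter(zip(a, b)).items())
    denom = math.sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0  # define NMI as 0 for a constant labeling

def anmi(candidate, ensemble):
    """Average NMI between a candidate consensus labeling and every ensemble member."""
    return sum(nmi(candidate, member) for member in ensemble) / len(ensemble)

ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(anmi([0, 0, 1, 1], ensemble))  # about 0.667: agrees with two members up to relabeling
```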

9.
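Missing data poses a challenge for clustering algorithms. Traditional methods usually impute incomplete data with mean or regression imputation and then cluster the imputed data. Because the imputation accuracy and clustering quality of mean and regression imputation degrade as the missing rate grows, a new similarity computation method for incomplete data is proposed. The attributes of the dataset are ranked by expected mutual information, taking full account of position-related attribute value characteristics; elements of the dataset itself serve as the source for filling missing values; similarity-based imputation is computed on the ranked incomplete dataset; and a local-density-based clustering algorithm performs the final clustering. The imputation-and-clustering algorithm was validated on datasets from the UCI Machine Learning Repository. Experimental results show that as the number of missing values increases, the algorithm tolerates missing values well, recovers missing elements effectively, and performs well in both imputation accuracy and final clustering quality. Because the method fills missing values one at a time while considering every attribute value in the dataset, it is time-consuming.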
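The idea of filling gaps with values drawn from the dataset itself can be sketched as a nearest-donor imputation. The similarity used here (root-mean-square difference over attributes observed in both rows) is an assumption for illustration; the sketch omits the expected-mutual-information attribute ranking and the density-based clustering step, and `impute_from_nearest` is a hypothetical name.

```python
import math

def impute_from_nearest(rows):
    """Fill each None with the value, in the same attribute, of the most similar
    other row; similarity is RMS difference over attributes observed in both rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for a, value in enumerate(row):
            if value is not None:
                continue
            best, best_dist = None, math.inf
            for j, other in enumerate(rows):
                if j == i or other[a] is None:
                    continue  # a donor must observe attribute a
                shared = [(p, q) for p, q in zip(row, other) if p is not None and q is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((p - q) ** 2 for p, q in shared) / len(shared))
                if dist < best_dist:
                    best, best_dist = other[a], dist
            filled[i][a] = best  # stays None if no donor exists
    return filled

rows = [[1.0, 2.0, 3.0], [1.1, None, 3.1], [9.0, 8.0, 7.0]]
print(impute_from_nearest(rows)[1])  # [1.1, 2.0, 3.1]: donor is the first row
```

Donors are taken from the original rows rather than from already imputed ones, so imputed values never propagate; the nested loops also show why such per-value filling becomes slow on large datasets.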

10.
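Dynamic Clustering Algorithm Based on Association Function and Its Application   Total citations: 1, self-citations: 0, citations by others: 1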
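Based on the characteristics of time-series three-dimensional data, a dynamic clustering algorithm based on an association-function consistency matrix is proposed. An improved standard association function formula suited to time-series three-dimensional data is given, and the algorithm is applied to the alarm system of an ethylene cracking furnace: using the process's time-series three-dimensional data, dynamic clustering results for the furnace alarm system are obtained, which verifies the effectiveness of the proposed algorithm. The algorithm is highly robust when clustering time-series data.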

11.
Biclustering algorithms have become popular tools for gene expression data analysis. They can identify local patterns defined by subsets of genes and subsets of samples, which traditional clustering algorithms cannot detect. Despite its usefulness, biclustering is an NP-hard problem, so most biclustering algorithms search for biclusters that optimize a pre-established coherence measure. Many heuristics and validation measures for biclustering have been proposed over the last 20 years, yet no extensive comparison of bicluster coherence measures in practical scenarios exists. To address this gap, this paper experimentally analyzes 17 bicluster coherence measures, together with external measures calculated from Gene Ontology information. In this analysis, results were produced by 10 algorithms from the literature on 19 gene expression datasets. The experimental results reveal a few pairs of strongly correlated coherence measures, which suggests redundancy. Moreover, which pairs are strongly correlated may change depending on whether the data are normalized and on the ontology by which the biclusters are enriched. Finally, there was no clear relation between coherence measures and assessment based on Gene Ontology information.
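As a concrete instance of a coherence measure, the sketch below computes the classic Cheng-Church mean squared residue (MSR). The abstract does not list its 17 measures, so this is an illustration rather than a claim about which measures were compared. MSR is zero for purely additive (shifting) patterns but not for scaling patterns, one reason several coherence measures coexist.

```python
def mean_squared_residue(bicluster):
    """Cheng-Church mean squared residue of a bicluster given as a list of rows."""
    n_rows, n_cols = len(bicluster), len(bicluster[0])
    row_means = [sum(r) / n_cols for r in bicluster]
    col_means = [sum(c) / n_rows for c in zip(*bicluster)]
    overall = sum(row_means) / n_rows
    return sum(
        (bicluster[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows) for j in range(n_cols)
    ) / (n_rows * n_cols)

print(mean_squared_residue([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))  # about 0: shifted rows
print(mean_squared_residue([[1, 0], [0, 1]]))                   # 0.25: incoherent
```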

12.
This paper presents a scatter search approach based on linear correlations among genes to find biclusters that include shifting and scaling patterns as well as negatively correlated patterns, unlike most correlation-based algorithms published in the literature. The comparison methodology is based on a priori biological information stored in the well-known Gene Ontology (GO) repository; in particular, all three GO categories (Biological Process, Cellular Component and Molecular Function) were used. The proposed algorithm was compared with benchmark biclustering algorithms, specifically a group of classical biclustering algorithms and two algorithms that use correlation-based merit functions. It outperforms the benchmark algorithms and finds patterns based on negative correlations; although such patterns contain important relationships among genes, most biclustering algorithms do not find them. The experimental study also shows that the size of a bicluster matters in addition to its correlation value: in particular, bicluster size influences its enrichment in a GO term.
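The paper's merit function is not given in the abstract. A generic correlation-based score with the properties it describes is the average absolute Pearson correlation over all gene pairs of a bicluster: shifted, scaled and negatively correlated rows all reach |r| = 1. The sketch below assumes that score; the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def mean_abs_correlation(rows):
    """Average |Pearson r| over all gene pairs; shifting, scaling and negatively
    correlated rows all score close to 1."""
    pairs = list(combinations(rows, 2))
    return sum(abs(pearson(u, v)) for u, v in pairs) / len(pairs)

genes = [[1, 2, 3, 4],      # reference profile
         [2, 4, 6, 8],      # scaled
         [11, 12, 13, 14],  # shifted
         [4, 3, 2, 1]]      # negatively correlated
print(mean_abs_correlation(genes))  # about 1.0
```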

13.
Biclustering is an important tool in exploratory statistical analysis that can be used to detect latent row and column groups with different response patterns. However, few studies incorporate covariate data directly into their biclustering models to explain these variations. To address this modeling problem, a novel biclustering framework that considers both stochastic block structures and covariate effects is proposed. Fast approximate estimation algorithms are also developed to handle a large number of latent variables and covariate coefficients. These algorithms are derived from the variational generalized expectation–maximization (EM) framework, in which the goal in both the E and M steps is to increase, rather than maximize, the lower bound of the likelihood. The utility of the proposed framework is demonstrated through two block-modeling applications: model-based collaborative filtering and microarray analysis.

14.
Biclustering is an important method in DNA microarray analysis that applies when only a subset of genes is co-expressed under a subset of conditions. Unlike standard clustering, biclustering classifies both dimensions of a microarray data matrix, genes and conditions, simultaneously. However, the performance of biclustering algorithms is affected by the inherent noise in the data, the types of biclusters and computational complexity. In this paper, we present a geometric biclustering method based on the Hough transform and the relaxation labeling technique. Unlike many existing biclustering algorithms, we first interpret biclustering patterns geometrically. This perspective unifies the formulation of different types of biclusters as hyperplanes and enables a generic plane-finding algorithm to be used for bicluster detection. In our algorithm, the Hough transform detects hyperplanes in sub-spaces to reduce computational complexity; sub-biclusters are then combined into larger ones under a probabilistic relaxation labeling framework. Simulation studies demonstrate the robustness of the algorithm against noise and outliers. In addition, our method extracts biologically meaningful biclusters from real microarray gene expression data.
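In the simplest geometric case there are two conditions, and the genes of an additive bicluster lie on a line y = x + c in the plane. The toy sketch below detects such a line with a standard (θ, ρ) Hough accumulator, where each point votes for every line ρ = x·cos θ + y·sin θ through it. It does not cover the paper's higher-dimensional hyperplanes or the relaxation labeling merge, and the function name and bin sizes are hypothetical.

```python
import math
from collections import Counter

def hough_line(points, rho_step=0.1):
    """Vote each 2-D point into (theta, rho) cells with rho = x*cos(theta) + y*sin(theta);
    return the winning angle (degrees), its rho, and the points that voted for it."""
    def cell(p, theta):
        t = math.radians(theta)
        return round((p[0] * math.cos(t) + p[1] * math.sin(t)) / rho_step)
    votes = Counter((theta, cell(p, theta)) for p in points for theta in range(180))
    (theta, rho_bin), _ = votes.most_common(1)[0]
    inliers = [p for p in points if cell(p, theta) == rho_bin]
    return theta, rho_bin * rho_step, inliers

# genes measured under two conditions; four follow y = x + 1, one is an outlier
gene_points = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 0)]
print(hough_line(gene_points))  # theta 135, rho about 0.7, the four collinear genes
```

The angle θ = 135° with ρ ≈ 1/√2 is the normal form of y = x + 1, and the outlier simply never lands in the winning cell, which is the source of the Hough transform's robustness to noise.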

15.
16.
Statistical evaluation of biclustering solutions is essential to rule out spurious relations and to validate the many scientific statements inferred from unsupervised data analysis without proper statistical grounding. Most biclustering methods rely on merit functions to discover biclusters that satisfy specific homogeneity criteria; however, strong homogeneity does not guarantee the statistical significance of a biclustering solution. Furthermore, although some biclustering methods test the statistical significance of specific types of biclusters, no methods exist to assess the significance of flexible biclustering models. This work proposes a method to evaluate the statistical significance of biclustering solutions. It integrates state-of-the-art statistical views on the significance of local patterns and extends them with new principles to assess the significance of biclusters with additive, multiplicative, symmetric, order-preserving and plaid coherencies. The proposed statistical tests make it possible, for the first time, to minimize the number of false positive biclusters without incurring false negatives, and to compare state-of-the-art biclustering algorithms by the statistical significance of their outputs. Results on synthetic and real data support the soundness and relevance of these contributions and stress the need to combine significance and homogeneity criteria to guide the search for biclusters.
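The basic principle behind such tests can be shown with a much simpler case than the paper handles: a constant pattern on discretized data under an assumed null model in which every cell is an independent, uniform draw from `n_symbols` values. The number of rows matching a fixed pattern over `n_cols` columns is then binomial, and the p-value is its upper tail. The null model and all names are illustrative assumptions; the paper's tests for additive, multiplicative, symmetric, order-preserving and plaid coherencies are not reproduced.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def constant_pattern_pvalue(n_rows, n_matching, n_cols, n_symbols):
    """p-value of at least n_matching of n_rows rows sharing one fixed pattern over
    n_cols columns, if each cell is an independent uniform draw from n_symbols values."""
    p_row = (1.0 / n_symbols) ** n_cols
    return binomial_tail(n_matching, n_rows, p_row)

# 8 of 100 genes share one 4-column pattern over 3 expression levels
print(constant_pattern_pvalue(100, 8, 4, 3))  # far below 0.001 under this null
```

The same bicluster shape with only 5 matching rows gives a much larger p-value, which illustrates the abstract's point that homogeneity alone does not establish significance: size relative to the background probability matters.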

17.
The advent of microarray technology enables us to monitor an entire genome on a single chip using a systematic approach. Clustering, a widely used data mining approach, has been used to discover phenotypes from raw expression data. However, traditional clustering algorithms are limited because they cannot identify substructures of samples and features hidden in the data. Unlike clustering, biclustering is a newer methodology for discovering genes that are highly related to a subset of samples. Several biclustering models and methods have been presented and used for tumor clinical diagnosis and pathological research. In this paper, we present a new biclustering model using Binary Matrix Factorization (BMF), a new variant rooted in non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF, then present and compare two different algorithms that implement the model. We show that the microarray data biclustering problem can be formulated as a BMF problem and solved effectively with the proposed algorithms. Unlike greedy strategy-based algorithms, our BMF algorithms are more likely to find the global optima. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides attractive clustering performance, BMF generates sparse results (i.e., the number of genes/features involved in each bicluster is very small relative to the total number of genes/features), consistent with common practice in molecular biology.
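To show what a pair of binary factors means for biclustering, the sketch below uses a simpler scheme than the paper's NMF-based algorithms: a rank-1 binary factorization X ≈ w·hᵀ found by alternating updates (a PROXIMUS-style heuristic, swapped in here for illustration). With h fixed, including row i lowers the squared error exactly when the row has ones in more than half of the selected columns, and symmetrically for columns, so each update is optimal for the other factor held fixed and the error never increases. The function name and starting vector are hypothetical.

```python
def rank1_binary_factorization(x, h, max_iter=100):
    """Alternately choose binary row vector w and column vector h minimizing
    ||X - w h^T||^2 with the other factor fixed; returns (w, h)."""
    n_rows, n_cols = len(x), len(x[0])
    w = None
    for _ in range(max_iter):
        # row i joins iff it has ones in more than half of the selected columns
        new_w = [1 if 2 * sum(x[i][j] for j in range(n_cols) if h[j]) > sum(h) else 0
                 for i in range(n_rows)]
        # column j joins iff it has ones in more than half of the selected rows
        new_h = [1 if 2 * sum(x[i][j] for i in range(n_rows) if new_w[i]) > sum(new_w) else 0
                 for j in range(n_cols)]
        if new_w == w and new_h == h:
            break  # fixed point reached
        w, h = new_w, new_h
    return w, h

X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0],   # partial member, dropped after refinement
     [0, 0, 0, 1]]   # isolated noise
print(rank1_binary_factorization(X, [1, 0, 0, 0]))  # ([1, 1, 0, 0], [1, 1, 0, 0])
```

The nonzero entries of w and h are the rows and columns of one bicluster, and the binary constraint is what makes the result sparse.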

18.
Unlike traditional clustering analysis, biclustering algorithms work simultaneously on two dimensions: samples (rows) and variables (columns). In recent years, biclustering methods have developed rapidly and have been widely applied in biological data analysis, text clustering, recommendation systems and other fields. Traditional clustering algorithms do not adapt well to high-dimensional and/or large-scale data. At present, most biclustering algorithms are designed for big biological data on differential expression, and there is little discussion of clustering binary data such as miRNA-targeted gene data. Here, we propose GAEBic, a novel biclustering method for miRNA-targeted gene data based on a graph autoencoder. GAEBic applies a graph autoencoder to capture the similarity of sample sets or variable sets, and adopts a new irregular clustering strategy to mine biclusters that generalize well. On miRNA-targeted gene data from soybean, we benchmark several types of biclustering algorithms and find that GAEBic outperforms Bimax, Bibit and the Spectral Biclustering algorithm in terms of target gene enrichment. GAEBic achieves comparable performance on high-throughput soybean miRNA data and can also be applied to other species.

19.
Several biclustering algorithms have been proposed in different fields of microarray data analysis. We present a new approach that improves their performance by using ensemble methods. Ensemble biclustering is formalized as a binary triclustering problem, and we propose a simple and efficient algorithm to solve it. To illustrate the value of our ensemble approach, numerical experiments are performed on both artificial and real datasets with two biclustering algorithms commonly used in bioinformatics.
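One natural reading of "ensemble biclustering as binary triclustering" (an assumption, since the abstract gives no construction) is to stack the biclusters found by all runs into a binary rows × columns × biclusters array. A consensus bicluster then corresponds to a dense tricluster: a block of rows and columns that many runs agree on. The sketch below builds only that array; the triclustering solver is not reproduced, and the function name is hypothetical.

```python
def biclusters_to_tensor(biclusters, n_rows, n_cols):
    """Stack biclusters (each a (row_set, col_set) pair) into a binary 3-D array
    with T[i][j][k] = 1 iff row i and column j both belong to bicluster k."""
    return [
        [[1 if (i in rows and j in cols) else 0 for rows, cols in biclusters]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

runs = [({0, 1}, {0}), ({1, 2}, {1, 2})]  # biclusters from two hypothetical runs
t = biclusters_to_tensor(runs, 3, 3)
print(t[1][1])  # [0, 1]: cell (1, 1) lies only in the second bicluster
```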

