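Initialization of subspace clustering algorithms for binary data*   Cited: 2 (self: 1, others: 1)
The finite mixture Bernoulli model, proposed for the high dimensionality and sparsity of binary data spaces, provides a model framework for quickly finding projected clusters. The EM algorithm is an important method for iteratively estimating the parameters of such models, and its quality depends largely on its initial parameters. Many studies have used the EM algorithm to implement clustering with the finite mixture Bernoulli model, and the choice of parameters in EM directly affects clustering performance. This work initializes the algorithm by introducing the Binning method and by changing both the similarity measure between data points and the way center points are selected, which greatly reduces the dependence of the clustering result on the initial parameters. Experiments show that the algorithm is efficient and correct.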
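The EM iteration for a finite mixture of Bernoulli distributions can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name `bernoulli_mixture_em`, the random initialization, and the clamping constant `eps` are assumptions made here, and the Binning-based initialization described in the abstract is not reproduced.

```python
import math
import random


def bernoulli_mixture_em(data, k, n_iter=50, seed=0):
    """Fit a k-component multivariate Bernoulli mixture with EM.

    data: list of equal-length lists of 0/1 values.
    Returns (weights, probs, resp); resp[i][j] is the responsibility of
    component j for row i from the last E-step.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    eps = 1e-6  # assumed clamp that keeps every logarithm finite
    weights = [1.0 / k] * k
    # Plain random start (an assumption); results depend on this choice.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[0.0] * k for _ in range(n)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row.
        for i, x in enumerate(data):
            log_p = []
            for j in range(k):
                s = math.log(weights[j])
                for t in range(d):
                    p = probs[j][t]
                    s += math.log(p) if x[t] else math.log(1.0 - p)
                log_p.append(s)
            m = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(v - m) for v in log_p]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]
        # M-step: re-estimate mixing weights and Bernoulli parameters.
        for j in range(k):
            nj = sum(resp[i][j] for i in range(n))
            weights[j] = max(nj / n, eps)
            for t in range(d):
                num = sum(resp[i][j] * data[i][t] for i in range(n))
                p = num / nj if nj > 0 else 0.5
                probs[j][t] = min(max(p, eps), 1.0 - eps)
    return weights, probs, resp
```

Running the function with different `seed` values on the same data is a simple way to observe the sensitivity to initial parameters that the abstract describes.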

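Affinity propagation is a fast and effective clustering method, but the instability of its clustering results hurts its performance. To address this, a nearest-neighbor-based affinity propagation algorithm (AP-NN) is proposed: affinity propagation generates the initial clusters, representative clusters are selected from them, and the samples of the non-representative clusters are then clustered by nearest neighbor. Experiments on time series data sets show that AP-NN produces good clustering results and is suitable for cluster analysis.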

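Because traditional methods cannot cluster multivariate time series data directly and effectively, a clustering method for multivariate time series based on affinity propagation over component attributes is proposed. Dynamic time warping measures the overall distance between multivariate time series; affinity propagation is applied separately to the overall distance matrix and to the component approximate distance matrices; the relationships between series under these two views are then combined, and affinity propagation is applied to a comprehensive relationship matrix that reflects the original multivariate time series to obtain a higher-quality clustering. Numerical experiments show that, compared with traditional clustering methods, the proposed method not only reflects the relationships among overall data features effectively but also improves the clustering of the original time series by analyzing the relationships among important component attribute series.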
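The dynamic time warping distance used as the overall distance can be illustrated with the textbook dynamic-programming recurrence. This is only a sketch under assumptions: the function name `dtw_distance` and the Euclidean point-to-point cost are choices made here, and the abstract does not say which local cost or constraints the authors use.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two multivariate series.

    a, b: sequences of equal-dimension points (tuples of floats).
    The local cost is the Euclidean distance between aligned points.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Evaluating `dtw_distance` on every pair of series gives a pairwise distance matrix; its negation is one common way to form the similarity input for affinity propagation.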

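This paper studies community detection in complex networks. It analyzes the similarity between community detection and clustering, solves the community detection problem with an adaptive affinity propagation clustering algorithm, gives worked examples of the algorithm, and tests and compares different parameter settings. The results show that the algorithm achieves good accuracy and running efficiency.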

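A clustering algorithm that can discover natural clusters*   Cited: 1 (self: 0, others: 1)
Current clustering algorithms such as K-means and DBSCAN use global parameters and therefore have difficulty discovering the natural clusters in data. A new hierarchical clustering algorithm, CluFNC, is proposed that can discover internal clustering features in the data space. Its parameters are the grid size, the noise threshold, and the number of neural nodes. The algorithm first partitions the data space into a grid according to the parameters, then computes the field strength of each cell with a Gaussian influence function, then clusters on grid position and field strength with a SOM, and finally applies the Chameleon algorithm to the weights of the neural network nodes produced by the SOM and maps the clustering result back to the original data space to obtain the final clusters. Theoretical analysis and experimental results show that the algorithm can discover ...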

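To improve the clustering quality of evolving data streams, a data stream clustering algorithm based on semi-supervised affinity propagation (SAPStream) is proposed. Borrowing the idea of semi-supervised clustering, the algorithm builds a similarity matrix for the initial data stream, clusters it with affinity propagation, and establishes an online clustering model. As the data stream evolves, a decay window technique adjusts the clustering model in a timely manner, and the resulting exemplars are clustered again together with newly arriving data points to obtain the clustering of the stream. Experiments on dynamic clustering of data streams show that the algorithm is effective and produces high-quality results.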

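For data missing not at random in clustering problems, this paper, based on the Gaussian mixture clustering model, analyzes the effectiveness of the expectation-maximization algorithm for censored data and reveals how the likelihood function of censored data acts on the model algorithm. The proposed method is compared with the standard expectation-maximization algorithm using indicators such as the Akaike information criterion and information divergence. Using a partition of the censored data and indicator variables, the posterior probabilities and likelihood function of the clustering model parameters are derived, and the first- and second-order estimators of the truncated normal function are adjusted. Following the theory of estimator efficiency, the optimal parameter estimates are obtained from the equation for the expectation of the score vector. On the same censored data set, the proposed clustering algorithm estimates cluster centers more accurately. Experimental results confirm that, for Gaussian mixture clustering, the proposed algorithm outperforms the standard expectation-maximization algorithm for data missing at random.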

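To improve the clustering performance of affinity propagation, a bacterial colony algorithm is used to optimize the preference parameter. First, a similarity matrix is built from the samples to be clustered and the preference parameter is initialized. The bacterial colony algorithm then optimizes the preference parameter, treating the preference parameters as colonies to be trained and using the Silhouette index as the fitness function. Next, the optimized preference parameter is updated from the colony positions and affinity propagation clustering is run, continually updating ...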
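The Silhouette index named as the fitness function can be computed as in the sketch below. The function name `silhouette_score`, the Euclidean metric, and the convention that singleton clusters score 0 are assumptions for illustration; the abstract does not specify how the authors evaluate the index.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette value over all points (Euclidean distance).

    points: list of coordinate tuples; labels: cluster label per point.
    Points in singleton clusters contribute 0, a common convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster: contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

A search over the preference parameter would call such a function on each candidate clustering and keep the candidate with the highest score.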

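To address the shortcomings of the traditional fuzzy C-means (FCM) clustering algorithm, which can only find roughly spherical clusters and is sensitive to component attribute data, a classification algorithm based on FCM is proposed that decomposes the attributes, clusters, and then fuses the results. Applying the idea of information fusion to clustering, the algorithm first clusters along each component attribute dimension and then fuses and analyzes the per-attribute results to obtain the final clustering. Clustering each attribute independently also makes a parallel implementation straightforward. Experiments show that the algorithm not only improves clustering accuracy effectively but also needs no prior normalization of the data, and it remains robust when the measurements of component attributes are biased.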

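A cluster center selection algorithm for protein structure clustering   Cited: 1 (self: 0, others: 1)
Huang Xu (黄旭), Lü Qiang (吕强), Qian Peide (钱培德). Acta Automatica Sinica (《自动化学报》), 2011, 37(6): 682-692
An algorithm for selecting cluster centers in protein structure clustering is proposed. Clustering is an indispensable post-processing step in protein structure prediction, but the quality threshold (QT) clustering algorithm commonly used there depends on an empirically chosen cluster radius; other clustering algorithms, such as affinity propagation (AP), also have parameters that affect the cluster distribution. To remove the dependence on subjective empirical parameters, this paper proposes an exemplar selection algorithm (ESA) that analyzes the clustering results obtained under different parameters, selects the best cluster centers, and from them determines empirical parameters such as the cluster radius. Tested on real protein structure data sets, the algorithm selected the best cluster centers without knowing the empirical parameters, and it also helps different clustering algorithms find objective clustering parameters suited to the data set at hand.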

Finite mixture models are being increasingly used to provide model-based cluster analysis. To tackle the problem of block clustering which aims to organize the data into homogeneous blocks, recently we have proposed a block mixture model; we have considered this model under the classification maximum likelihood approach and we have developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. From the estimation point of view, classification maximum likelihood approach yields inconsistent estimates of the parameters and in this paper we consider the block clustering problem under the maximum likelihood approach; unfortunately, the application of the classical EM algorithm for the block mixture model is not direct: difficulties arise due to the dependence structure in the model and approximations are required. Considering the block clustering problem under a fuzzy approach, we propose a fuzzy block clustering algorithm to approximate the EM algorithm. To illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.

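To address the problem that color image quality cannot be evaluated and optimized in real time, a color image quality evaluation and optimization algorithm based on K-means clustering is proposed. First, K-means clustering is used to build a sample data set; a feature set is then built from the similarity between the image under evaluation and the clustered data set; next, the image to be optimized is fused with the clustered data set, and a PCA transform of the fused matrix optimizes the image quality. Experiments show that the proposed evaluation algorithm agrees well with subjective human visual perception and that the evaluation results can also drive adaptive optimization of image quality.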

A novel texture mapping technique is proposed based on nonlinear dimension reduction, called Bernoulli logistic embedding (BLE). Our probabilistic embedding model builds texture mapping with minimal shearing effects. A log-likelihood function, related to the Bregman distance, is used to measure the similarity between two related matrices defined over the spaces before and after embedding. Low-dimensional embeddings can then be obtained through minimizing this function by a fast block relaxation algorithm. To achieve better quality of texture mapping, the embedded results are adopted as initial values for mapping enhancement by stretch-minimizing. Our method can be applied to both complex mesh surfaces and dense point clouds.

Microarray technology has been widely applied to measure the expression levels of thousands of genes simultaneously. In this technology, gene cluster analysis is useful for discovering gene function because co-expressed genes are likely to share the same biological function. Many clustering algorithms have been used in the field of gene clustering. This paper proposes a new scheme for clustering gene expression datasets based on a modified version of the Quantum-behaved Particle Swarm Optimization (QPSO) algorithm, known as the Multi-Elitist QPSO (MEQPSO) model. The proposed clustering method also employs a one-step K-means operator to effectively accelerate the convergence speed of the algorithm. The MEQPSO algorithm is tested and compared with some other recently proposed PSO and QPSO variants on a suite of benchmark functions. Based on the computer simulations, some empirical guidelines have been provided for selecting suitable parameters for MEQPSO clustering. The performance of the MEQPSO clustering algorithm has been extensively compared with several optimization-based algorithms and classical clustering algorithms over several artificial and real gene expression datasets. Our results indicate that the MEQPSO clustering algorithm is a promising technique and can be widely used for gene clustering.

An EM algorithm for the block mixture model   Cited: 1 (self: 0, others: 1)
Although many clustering procedures aim to construct an optimal partition of objects or, sometimes, of variables, there are other methods, called block clustering methods, which consider simultaneously the two sets and organize the data into homogeneous blocks. Recently, we have proposed a new mixture model called block mixture model which takes into account this situation. This model allows one to embed simultaneous clustering of objects and variables in a mixture approach. We have studied this probabilistic model under the classification likelihood approach and developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. In this paper, we consider the block clustering problem under the maximum likelihood approach and the goal of our contribution is to estimate the parameters of this model. Unfortunately, the application of the EM algorithm for the block mixture model cannot be made directly; difficulties arise due to the dependence structure in the model and approximations are required. Using a variational approximation, we propose a generalized EM algorithm to estimate the parameters of the block mixture model and, to illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.

This paper presents an iterative spectral framework for pairwise clustering and perceptual grouping. Our model is expressed in terms of two sets of parameters. Firstly, there are cluster memberships which represent the affinity of objects to clusters. Secondly, there is a matrix of link weights for pairs of tokens. We adopt a model in which these two sets of variables are governed by a Bernoulli model. We show how the likelihood function resulting from this model may be maximised with respect to both the elements of link-weight matrix and the cluster membership variables. We establish the link between the maximisation of the log-likelihood function and the eigenvectors of the link-weight matrix. This leads us to an algorithm in which we iteratively update the link-weight matrix by repeatedly refining its modal structure. Each iteration of the algorithm is a three-step process. First, we compute a link-weight matrix for each cluster by taking the outer-product of the vectors of current cluster-membership indicators for that cluster. Second, we extract the leading eigenvector from each modal link-weight matrix. Third, we compute a revised link weight matrix by taking the sum of the outer products of the leading eigenvectors of the modal link-weight matrices.

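For the problem of clustering data points distributed in various diffusion patterns, a clustering algorithm based on density change (CDD) is proposed. CDD uses the classic density-based clustering algorithm DBSCAN to find core points, analyzes how density diffuses across each data sample and its surrounding points, computes the direction, velocity, and acceleration of density diffusion, and clusters the samples accordingly. Experiments show that, compared with DBSCAN, CDD clusters diffusion-pattern data accurately and, on non-diffusion-pattern data, offers strong resistance to noise and parameters that are easier to determine.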
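The core-point step borrowed from DBSCAN can be sketched as below. Only the standard DBSCAN core-point definition is shown; the function name `core_points` and the brute-force neighbor search are assumptions for illustration, and the density-diffusion analysis that CDD adds is not reproduced.

```python
import math


def core_points(points, eps, min_pts):
    """Return indices of DBSCAN core points.

    A point is a core point when at least min_pts points, itself
    included, lie within distance eps of it.
    """
    core = []
    for i, p in enumerate(points):
        # Brute-force neighbor count; fine for small illustrative inputs.
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            core.append(i)
    return core
```

For larger data sets, a spatial index such as a k-d tree would replace the quadratic neighbor scan.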

The class of finite mixtures of multivariate Bernoulli distributions is known to be nonidentifiable; that is, different values of the mixture parameters can correspond to exactly the same probability distribution. In principle, this would mean that sample estimates using this model would give rise to different interpretations. We give empirical support to the fact that estimation of this class of mixtures can still produce meaningful results in practice, thus lessening the importance of the identifiability problem. We also show that the expectation-maximization algorithm is guaranteed to converge to a proper maximum likelihood estimate, owing to a property of the log-likelihood surface. Experiments with synthetic data sets show that an original generating distribution can be estimated from a sample. Experiments with an electropalatography data set show important structure in the data.
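Nonidentifiability can be seen in a small numerical case. The sketch below builds two different two-component mixtures over two binary variables and checks that they define the same distribution. The parameter values and the helper name `mixture_pmf` are chosen here for illustration and do not come from the paper.

```python
from itertools import product


def mixture_pmf(weights, probs):
    """Probability of every binary vector under a Bernoulli mixture.

    weights: mixing proportions; probs[j][t] = P(x_t = 1 | component j).
    """
    d = len(probs[0])
    pmf = {}
    for x in product((0, 1), repeat=d):
        total = 0.0
        for w, p in zip(weights, probs):
            term = w
            for xt, pt in zip(x, p):
                term *= pt if xt else 1.0 - pt
            total += term
        pmf[x] = total
    return pmf


# Two distinct parameter settings (illustrative values).
pmf_a = mixture_pmf([0.5, 0.5], [(0.2, 0.2), (0.8, 0.8)])
pmf_b = mixture_pmf([0.5, 0.5], [(0.1, 0.275), (0.9, 0.725)])

# Both settings assign the same probability to every outcome.
same = all(abs(pmf_a[x] - pmf_b[x]) < 1e-12 for x in pmf_a)
```

Here `same` evaluates to true although the component parameters differ, so a sample from this distribution cannot distinguish the two settings.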
