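Spectral clustering has recently become a research hotspot in machine learning, but it has seen little use in text clustering. This paper designs a spectral algorithm for text clustering. It first builds a document similarity matrix and derives the Laplacian matrix from it, then performs an eigendecomposition to obtain the eigenvectors of the k smallest eigenvalues, and finally applies the K-means algorithm to obtain k document clusters. Experiments on real text datasets show that the proposed algorithm produces better clustering results than spherical K-means.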
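A minimal NumPy sketch of the four steps above. The abstract does not fix the details, so the following are assumptions: cosine similarity on term-frequency vectors, the symmetric normalized Laplacian, row-normalized eigenvectors (the Ng-Jordan-Weiss variant), and K-means with deterministic farthest-first seeding so the toy example is reproducible.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_text_clustering(tf, k):
    """Cluster documents (rows of a term-frequency matrix) into k groups."""
    tf = np.asarray(tf, dtype=float)
    # 1. Cosine similarity matrix between documents (self-loops removed)
    Xn = tf / np.maximum(np.linalg.norm(tf, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    # 2. Symmetric normalized Laplacian L = I - D^{-1/2} S D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), 1e-12))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # 3. Eigenvectors of the k smallest eigenvalues (eigh sorts ascending)
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # 4. K-means on the rows of the spectral embedding
    return kmeans(U, k)
```

For four toy documents over four terms, `spectral_text_clustering([[3,2,0,0],[2,3,0,0],[0,0,4,1],[0,0,1,4]], 2)` places the first two documents in one cluster and the last two in the other.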

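Text clustering is a core data mining technique that helps users navigate, summarize, and organize textual information effectively. Through a study of text clustering applications, this paper discusses the principles and characteristics of several clustering algorithms, and presents and analyzes the concrete implementation steps of the K-means algorithm and the hierarchical agglomerative algorithm.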
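The abstract does not say which linkage criterion the hierarchical agglomerative algorithm uses. The sketch below assumes average linkage on Euclidean distances: start with one cluster per point and repeatedly merge the closest pair until `n_clusters` remain. It is a naive O(n^3) illustration, not an efficient implementation.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Average-linkage agglomerative clustering (naive sketch)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(X))]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters
                dist = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a)
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```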

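Random forest is a recently developed ensemble learning algorithm with good classification accuracy. To address its relatively high computational complexity, a random forest algorithm based on spectral clustering partitioning is proposed. First, spectral clustering, which yields good clustering results, is applied separately to the samples of each class in the original sample set. Next, one sample is chosen at random from each cluster as its representative, and these representatives form a new training set. Finally, a random forest classifier is trained on the new training set. By pre-partitioning the original samples with spectral clustering, the algorithm represents several nearby samples by a single sample from their cluster, which substantially reduces the number of training samples. Experiments on the Corel Image recognition dataset show that the algorithm achieves high classification accuracy with less classification time.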
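The training-set reduction step can be sketched as follows, assuming the per-class cluster labels (`cluster_ids`, a hypothetical input) have already been computed, for instance by spectral clustering within each class. One random sample is kept per (class, cluster) pair; the reduced set would then be passed to any random forest implementation.

```python
import numpy as np

def reduce_by_clusters(X, y, cluster_ids, seed=0):
    """Keep one randomly chosen sample per (class, cluster) pair."""
    rng = np.random.default_rng(seed)
    X, y, cluster_ids = np.asarray(X), np.asarray(y), np.asarray(cluster_ids)
    keep = []
    for label in np.unique(y):
        # Clusters were computed separately inside each class
        for c in np.unique(cluster_ids[y == label]):
            members = np.flatnonzero((y == label) & (cluster_ids == c))
            keep.append(rng.choice(members))  # the cluster's representative
    keep = np.array(keep)
    return X[keep], y[keep]
```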

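SAR Image Segmentation via Spectral Clustering Ensemble Based on Non-negative Matrix Factorization   Total citations: 4; self-citations: 0; by others: 4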
Deng Xiaozheng, Jiao Licheng, Lu Shan. Acta Electronica Sinica (电子学报), 2011, 39(12): 2905-2909
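 This paper proposes a novel framework for SAR image segmentation based on a spectral clustering ensemble combined by non-negative matrix factorization (NMF). First, individual segmentations are produced by spectral clustering with the Nyström approximation, using different scale parameters to obtain diverse individual results. Second, NMF is used to merge these individual segmentations; its advantages are that it is consistent with intuitive human perception and has a clear physical interpretation. Finally, the SAR image segmentation is obtained from the merged pixel membership relations. To verify its effectiveness, the method was tested on 3 texture images and 4 SAR images and compared with K-means, spectral clustering with the Nyström approximation, and Meta-clustering. The proposed method performs better in both qualitative and quantitative analysis and has practical value.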
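The merging step might look roughly like the sketch below, which is an assumption rather than the paper's exact formulation: each individual segmentation is one-hot encoded, the encodings are concatenated into a non-negative matrix V (pixels by segments), V is factorized with the standard Lee-Seung multiplicative updates for the Frobenius loss, and each pixel is assigned to the factor with the largest membership. The Nyström spectral clustering that produces the individual segmentations is not shown.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_consensus(segmentations, k):
    """Merge several label vectors (one per segmentation) into k groups."""
    blocks = []
    for labels in segmentations:
        labels = np.asarray(labels)
        # One-hot membership matrix: pixels x segments of this segmentation
        blocks.append((labels[:, None] == np.unique(labels)[None, :]).astype(float))
    V = np.hstack(blocks)
    W, _ = nmf(V, k)
    return W.argmax(axis=1)  # consensus membership of each pixel
```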

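Cluster ensembles are a hot topic in data mining research: they combine multiple clustering partitions of the same dataset to improve clustering performance. Most existing work ignores the quality of the clustering members being combined, so poor members can degrade the ensemble result. This paper proposes a cluster ensemble algorithm based on a weighted co-occurrence matrix (WCSCE). The method first computes the attribute-value-based co-occurrence matrix of the clustering members, then makes a simple assessment of each member's quality and assigns it a weight to produce a weighted co-occurrence matrix, from which the ensemble result is generated. Experiments verify the effectiveness of the algorithm and show that it improves clustering quality.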
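A sketch of the general idea using the common weighted co-association formulation, which may differ from WCSCE's attribute-value-based co-occurrence matrix: entry (i, j) is the weighted fraction of members that put objects i and j in the same cluster. The member weights are taken as given (the paper's quality assessment is not reproduced), and the consensus is read off by linking pairs whose weighted co-occurrence exceeds a threshold (an assumed rule).

```python
import numpy as np

def weighted_coassociation(partitions, weights):
    """M[i, j] = weighted fraction of partitions placing i and j together."""
    P = np.asarray(partitions)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = P.shape[1]
    M = np.zeros((n, n))
    for labels, wm in zip(P, w):
        M += wm * (labels[:, None] == labels[None, :])
    return M

def consensus_labels(M, threshold=0.5):
    """Link i and j when M[i, j] > threshold; return connected components."""
    n = len(M)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-occurring pairs
            i = stack.pop()
            for j in np.flatnonzero(M[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```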

Text Image Binarization Based on Gray-Level Histogram and Spectral Clustering   Total citations: 7; self-citations: 0; by others: 7
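In automatic text extraction, located character regions must be binarized before they can be recognized effectively; because of complex backgrounds, common thresholding methods cannot reliably segment character images taken in natural environments. This paper proposes an image binarization method based on spectral clustering. It uses the normalized cut (Ncut) as the spectral clustering criterion, computes the similarity matrix from the gray-level histogram, and determines the best number of histogram bins experimentally. Compared with the usual pixel-level similarity matrix, this greatly reduces both space and computational complexity. Experimental results show that, for character images of natural scenes, the method produces better binarization than common threshold-based segmentation.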
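A simplified sketch of histogram-level Ncut, with several assumptions not stated in the abstract: bins are graph nodes, the weight between bins i and j is count_i * count_j * exp(-(g_i - g_j)^2 / (2 sigma^2)) (which matches a pixel-level Gaussian gray-level similarity summed over the pixels in each bin), and, because gray-level bins are one-dimensional, the partition is found by evaluating Ncut exactly for every threshold instead of through the spectral (eigenvector) relaxation the paper uses. `n_bins` and `sigma` are illustrative defaults.

```python
import numpy as np

def ncut_threshold(gray, n_bins=16, sigma=20.0):
    """Gray-level threshold minimizing Ncut on a histogram-bin graph."""
    hist, edges = np.histogram(gray, bins=n_bins, range=(0, 256))
    centers = (edges[:-1] + edges[1:]) / 2
    # Bin-to-bin weight: pixel counts times a Gaussian on gray-level difference
    K = np.exp(-((centers[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))
    W = hist[:, None] * hist[None, :] * K
    degree = W.sum(axis=1)
    best_t, best_cost = None, np.inf
    for t in range(1, n_bins):  # split bins into [0, t) and [t, n_bins)
        assoc_a, assoc_b = degree[:t].sum(), degree[t:].sum()
        if assoc_a == 0 or assoc_b == 0:
            continue  # one side has no pixels
        cut = W[:t, t:].sum()
        cost = cut / assoc_a + cut / assoc_b  # Ncut(A, B)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return None if best_t is None else edges[best_t]
```

Pixels with `gray >= ncut_threshold(gray)` would then form one class (for example the character strokes) and the rest the background.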

Chinese Text Clustering and Algorithm Design   Total citations: 1; self-citations: 0; by others: 1
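Traditional clustering algorithms tend to find spherical clusters of similar size and are sensitive to outliers. To address these problems, this paper designs an effective clustering algorithm that selects representative points for each cluster and follows the basic principle humans apply when judging clusters, namely that distances between objects within a cluster should be smaller than distances between clusters. Experimental results show that the algorithm is effective.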

Density-Sensitive Spectral Clustering   Total citations: 13; self-citations: 2; by others: 13
Wang Ling, Bo Liefeng, Jiao Licheng. Acta Electronica Sinica (电子学报), 2007, 35(8): 1577-1581
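Spectral clustering is a recently emerged and highly competitive clustering method whose success depends largely on the choice of similarity measure. By analyzing this property together with the characteristics of clustered data, this paper proposes a data-dependent similarity measure, the density-sensitive similarity measure, which can effectively describe the actual cluster distribution of the data. Introducing it into spectral clustering yields the density-sensitive spectral clustering algorithm. Compared with the original spectral clustering algorithm, the new algorithm not only handles multi-scale clustering problems but is also relatively insensitive to parameter choice. Analysis and experiments verify the effectiveness and feasibility of the proposed algorithm.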
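The abstract does not give the formula, so the sketch below assumes one formulation of a density-sensitive measure: a "density-adjusted" edge length l(x, y) = rho^d(x, y) - 1 with rho > 1, the distance D as the shortest-path sum of these lengths (computed here with Floyd-Warshall), and similarity 1 / (D + 1). Because the edge length grows exponentially, a path through many short hops in a dense region becomes cheaper than one long jump, so points in the same dense structure end up more similar. Large rho or large distances can overflow, so coordinates should be scaled accordingly.

```python
import numpy as np

def density_sensitive_similarity(X, rho=2.0):
    """Similarity from shortest paths over exponentially stretched edges."""
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    D = rho ** dist - 1.0  # density-adjusted segment lengths
    for k in range(len(X)):  # Floyd-Warshall all-pairs shortest paths
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return 1.0 / (D + 1.0)
```

For points 0, 1, 2 on a line with rho = 2, the direct edge from 0 to 2 costs 3, but the path through 1 costs 2, so their similarity is 1/3 rather than 1/4.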

A Clustering Algorithm Based on Random Walks   Total citations: 2; self-citations: 0; by others: 2
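This paper proposes an improved random walk model and develops a data clustering algorithm based on it. In this algorithm, the sample points of the dataset generate a weighted undirected graph G(V,E,d) according to the improved model, where each sample point corresponds to a vertex of G and each vertex is treated as an agent that can move in space. The algorithm then computes the probability of each vertex moving toward each vertex in its neighbor set; after a neighbor is chosen at random as the direction of movement, the vertex moves one unit of distance. As all sample points keep walking randomly, points of the same class gradually gather together while points of different classes move apart, so that clusters eventually form on their own. Experimental results show that the random-walk-based algorithm clusters the sample points reasonably and effectively, and comparisons with other algorithms also demonstrate its effectiveness.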
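A minimal sketch of the agent movement, with hypothetical choices the abstract leaves open: neighbors are points within `radius`, the transition probability is proportional to inverse distance, each step moves at most `step` toward the chosen neighbor (never past it), all agents move simultaneously, and final clusters are the connected components of the radius graph on the moved points. Because each agent moves along a segment toward a neighbor, a group never spreads beyond its original extent.

```python
import numpy as np

def random_walk_cluster(X, radius=1.0, step=0.1, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    P = np.asarray(X, dtype=float).copy()
    n = len(P)
    for _ in range(n_iter):
        new_P = P.copy()
        for i in range(n):
            d = np.linalg.norm(P - P[i], axis=1)
            nbrs = np.flatnonzero((d > 1e-9) & (d <= radius))
            if len(nbrs) == 0:
                continue
            # Hypothetical transition rule: closer neighbours are more likely
            prob = 1.0 / d[nbrs]
            j = rng.choice(nbrs, p=prob / prob.sum())
            # Move up to one step towards the chosen neighbour (never past it)
            new_P[i] = P[i] + min(step, d[j]) * (P[j] - P[i]) / d[j]
        P = new_P
    # Final clusters: connected components of the radius graph on moved points
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(np.linalg.norm(P - P[i], axis=1) <= radius):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels, P
```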

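To address the low accuracy and uncertain clustering direction of traditional clustering algorithms in sentiment analysis, this paper proposes a new method that introduces a user feedback mechanism into spectral clustering. The method first decomposes the Laplacian matrix to obtain eigenvectors, which correspond to different aspects of the data. The user then reviews part of this feature information and chooses the desired clustering direction, and finally the system automatically re-clusters according to the user's choice, yielding the sentiment categories the user needs. Experimental results show that the method improves clustering accuracy to some extent and resolves the problem of uncertain clustering direction.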
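The feedback loop can be sketched as: compute several Laplacian eigenvectors, let the user pick which ones (the `selected_columns` argument, a hypothetical interface) represent the desired aspect, and re-run K-means on those columns only. The normalized Laplacian and farthest-first K-means are assumptions. Different selections can produce different, equally valid groupings of the same data.

```python
import numpy as np

def spectral_embedding(S, n_vectors):
    """Eigenvectors of the normalized Laplacian for the smallest eigenvalues."""
    S = np.asarray(S, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), 1e-12))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :n_vectors]

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def recluster(U, selected_columns, k):
    """Re-cluster using only the eigenvectors the user selected."""
    return kmeans(U[:, selected_columns], k)
```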

Research on Clustering Algorithms and Clustering Ensemble Algorithms   Total citations: 1; self-citations: 0; by others: 1
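This paper studies common clustering algorithms and clustering ensemble algorithms. It first describes the clustering algorithms commonly used in data mining and their characteristics, then reviews recent methods and the state of research on clustering ensembles, and explains how to generate effective clustering members and how to construct the consensus function so as to obtain an efficient clustering ensemble algorithm. An improved random projection algorithm is used to generate clustering members, and experiments show that random projection is a very effective way to generate them. Finally, it concludes that using clust...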
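The paper's improved random projection is not described, so the sketch below uses plain Gaussian random projection as a stand-in: each ensemble member projects the data to `target_dim` dimensions with a fresh random matrix and runs K-means there. The diversity of the members comes from the different projections; the resulting label matrix is what a consensus function would then combine.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def random_projection_members(X, k, n_members=5, target_dim=2, seed=0):
    """Generate ensemble members: one K-means run per random projection."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    members = []
    for _ in range(n_members):
        # Gaussian random projection to a low-dimensional space
        R = rng.normal(size=(X.shape[1], target_dim)) / np.sqrt(target_dim)
        members.append(kmeans(X @ R, k))
    return np.array(members)  # shape: (n_members, n_samples)
```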

We propose a novel multi-view document clustering method with the graph-regularized concept factorization (MVCF). MVCF makes full use of multi-view features for a more comprehensive understanding of the data and learns weights for each view adaptively. It also preserves the local geometrical structure of the manifolds for multi-view clustering. We have derived an efficient optimization algorithm to solve the objective function of MVCF and proven its convergence by utilizing the auxiliary function method. Experiments carried out on three benchmark datasets have demonstrated the effectiveness of MVCF in comparison to several state-of-the-art approaches in terms of accuracy, normalized mutual information and purity.

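To meet the need for data mining on collections of XML documents, this paper computes the structural similarity of XML document trees from their semantic and structural information, builds a structural similarity matrix from these values, and on that basis applies the DBSCAN algorithm to cluster the document collection. Compared with other clustering algorithms, clustering speed is greatly improved.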
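A sketch of the pipeline under a simplifying assumption: each document is represented by its set of root-to-node tag paths, and structural similarity is the Jaccard similarity of these sets (the paper's measure, which also uses semantic information, is not given). DBSCAN then runs on the precomputed distance matrix 1 - S, written out here so the example needs only the standard library and NumPy.

```python
import xml.etree.ElementTree as ET
import numpy as np

def tag_paths(xml_text):
    """Set of root-to-node tag paths, e.g. {'a', 'a/b', 'a/b/c'}."""
    paths = set()
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
    return paths

def structural_similarity(path_sets):
    n = len(path_sets)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = path_sets[i], path_sets[j]
            S[i, j] = S[j, i] = len(a & b) / len(a | b)  # Jaccard similarity
    return S

def dbscan_precomputed(D, eps, min_pts):
    """DBSCAN on a distance matrix; label -1 marks noise."""
    n = len(D)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = list(np.flatnonzero(D[i] <= eps))
        if len(nbrs) < min_pts:
            continue  # noise unless later reached from a core point
        labels[i] = cluster
        queue = nbrs
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            j_nbrs = np.flatnonzero(D[j] <= eps)
            if len(j_nbrs) >= min_pts:  # j is a core point: expand through it
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```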

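To address the shortcomings of the LEACH algorithm in cluster head selection and energy consumption, a clustering routing algorithm based on energy, distance, and node degree, CMEDD, is proposed. It reduces re-clustering through uniform clustering and improves the cluster head election formula so that cluster heads are chosen sensibly, thereby balancing node energy consumption. Data are transmitted along optimal paths built by combining single-hop and multi-hop transmission according to a cost factor. Simulation results show that, compared with LEACH and RMCRW, CMEDD effectively balances node energy consumption and extends network lifetime.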
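For reference, the classic LEACH election that CMEDD modifies uses the threshold T(n) = p / (1 - p (r mod 1/p)) for nodes that have not been cluster head in the current epoch, and 0 otherwise; a node becomes head when a uniform random number falls below T(n). CMEDD's improved formula is not given in the abstract, so the `weight` factor below is a hypothetical hook (for example a residual-energy ratio) standing in for it.

```python
import random

def leach_threshold(p, r, eligible):
    """Classic LEACH threshold T(n) for round r and desired head fraction p."""
    if not eligible:  # node already served as head in the current epoch
        return 0.0
    epoch = round(1 / p)
    return p / (1 - p * (r % epoch))

def elect_cluster_heads(nodes, p, r, rng=None):
    """nodes: iterable of (node_id, eligible, weight) tuples."""
    rng = rng or random.Random()
    heads = []
    for node_id, eligible, weight in nodes:
        # weight is a hypothetical stand-in for CMEDD's energy/distance/degree
        # terms; weight = 1.0 gives plain LEACH
        if rng.random() < weight * leach_threshold(p, r, eligible):
            heads.append(node_id)
    return heads
```

In the last round of an epoch (r mod 1/p = 1/p - 1) the threshold reaches 1, so every node that has not yet served is elected.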

Traditional multi-view document clustering methods separate the multi-view document representation from the clustering process and ignore the complementary characteristics of multi-view document clustering. To address these problems, an iterative algorithm for complementary multi-view document clustering, CMDC, was proposed, in which the multi-view document clustering process and the multi-view feature adjustment were carried out in a mutually unified manner. In CMDC, complementary text documents were selected from the clustering results to help adjust the contribution of each view's features by learning a local measurement metric for each document view; these complementary documents, drawn from the results of the clusters in the different views, were used to promote the feature tuning of the clusters. The partition consistency of multi-view document clustering was achieved through measure consistency across views. Experimental results show that CMDC effectively improves multi-view clustering performance.

Rajesh, David. Ad Hoc Networks, 2006, 4(1): 36-59
Self-organization of wireless sensor networks, which involves network decomposition into connected clusters, is a challenging task because of the limited bandwidth and energy resources available in these networks. In this paper, we make contributions towards improving the efficiency of self-organization in wireless sensor networks. We first present a novel approach for message-efficient clustering, in which nodes allocate local “growth budgets” to neighbors. We introduce two algorithms that make use of this approach. We analyze the message complexity of these algorithms and provide performance results from simulations. The algorithms produce clusters of bounded size and low diameter, using significantly fewer messages than the earlier, commonly used, Expanding Ring approach. Next, we present a new randomized methodology for designing the timers of cluster initiators. This methodology provides a probabilistic guarantee that initiators will not interfere with each other. We derive an upper bound on the expected time for network decomposition that is logarithmic in the number of nodes in the network. We also present a variant that optimistically allows more concurrency among initiators and significantly reduces the network decomposition time. However, it produces slightly more clusters than the first method. Extensive simulations over different topologies confirm the analytical results and demonstrate that our proposed methodology scales to large networks.

Using partitioning in sensor networks to create clusters for routing, data management, and controlling communication has proven to be a way to ensure long-range deployment and to deal with sensor network shortcomings such as limited energy and short communication ranges. Choosing a cluster head within each cluster is important because cluster heads use additional energy for their responsibilities, and that burden needs to be carefully passed around among the nodes in a cluster. Many existing protocols either choose cluster heads randomly or use the nodes with the highest remaining energy. We present an Energy Constrained minimum Dominating Set based efficient clustering, called ECDS, to model the problem of optimally choosing cluster heads under energy constraints. Our proposed randomized distributed algorithm for the constrained dominating set runs in O(log n log Δ) rounds with high probability, where Δ is the maximum degree of a node in the graph. We provide an approximation ratio for the ECDS algorithm: the expected size of its solution is 8H_Δ|OPT|, and with high probability the size is O(|OPT| log n), where n is the number of nodes, H is the harmonic function, and |OPT| is the size of an optimal solution. We propose multiple extensions to the distributed algorithm for the energy constrained dominating set. We show experimentally that these extensions perform well in comparison in terms of energy usage, node lifetime, and clustering time, and are thus well suited to wireless sensor networks.
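ECDS itself is a randomized distributed algorithm whose details are not in the abstract. As a centralized point of comparison, the sketch below uses the classic greedy dominating-set heuristic (whose approximation ratio is a harmonic number in the maximum degree), restricted so that only nodes with energy at least `min_energy` may become cluster heads. The graph format (a dict of neighbor sets) is an assumption.

```python
def greedy_energy_dominating_set(adj, energy, min_energy):
    """Greedy dominating set using only nodes with energy >= min_energy as heads.

    adj maps each node to the set of its neighbours.
    """
    eligible = {v for v in adj if energy[v] >= min_energy}
    undominated = set(adj)
    heads = []

    def gain(v):
        # Number of still-undominated nodes that v would cover (itself + neighbours)
        return len(({v} | adj[v]) & undominated)

    while undominated:
        best = max(sorted(eligible), key=gain, default=None)  # ties: smallest id
        if best is None or gain(best) == 0:
            raise ValueError("remaining nodes cannot be dominated by eligible nodes")
        heads.append(best)
        undominated -= {best} | adj[best]
        eligible.discard(best)
    return heads
```

On the path 0-1-2-3-4 with all nodes eligible this picks heads [1, 3]; if nodes 1 and 3 are below the energy threshold it falls back to [2, 0, 4].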

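Among all data mining algorithms, cluster analysis is especially important. Partition-based clustering algorithms study classification problems with statistical analysis methods. This paper introduces the definition of clustering and the types of clustering algorithms, explains in detail the basic principles of the K-means and K-medoids algorithms and analyzes their performance, reviews recent research on partition-based clustering algorithms, and briefly introduces concrete application examples.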
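To contrast K-medoids with K-means: K-medoids restricts each cluster center to an actual data point (the member with the smallest total distance to the other members), which makes it less sensitive to outliers. The sketch below is the simple alternating (Voronoi-iteration) variant with farthest-first seeding, not the full PAM swap search.

```python
import numpy as np

def k_medoids(X, k, n_iter=100):
    """Alternating K-medoids; returns medoid indices and labels."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Farthest-first seeding for deterministic starting medoids
    medoids = [0]
    for _ in range(1, k):
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            # New medoid: member with the smallest total distance to the others
            new[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, D[:, medoids].argmin(axis=1)
```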

Low-rank representation (LRR) and its variations have achieved great success in subspace segmentation tasks. However, the segmentation processes of existing LRR-related methods are all divided into two separate steps: constructing the affinity graphs and obtaining the segmentation results. In the second step, the normalized cut (Ncut) algorithm is used to get the final results based on the constructed graphs. This implies that the affinity graphs obtained by LRR-related algorithms may not be the most suitable for Ncut, and the best results are not guaranteed. In this paper, we propose a spectral clustering steered LRR representation algorithm (SCSLRR), which combines the objective functions of Ncut, K-means and LRR. By solving a joint optimization problem, SCSLRR is able to find low-rank affinity matrices that are most beneficial for Ncut to obtain the best segmentation results. Extensive subspace segmentation experiments on several benchmark datasets show that SCSLRR outperforms the related methods.
