Similar literature
 Found 20 similar articles; search took 31 ms
1.
Density-sensitive spectral clustering   Total citations: 13, self-citations: 2, citations by others: 13
王玲  薄列峰  焦李成 《电子学报》2007,35(8):1577-1581
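Spectral clustering is a recently developed clustering method with highly competitive performance, and its success depends largely on the choice of similarity measure. By analyzing this property and taking the clustering characteristics of data into account, this paper proposes a data-dependent similarity measure, the density-sensitive similarity measure, which effectively describes the actual cluster distribution of the data. Introducing it into spectral clustering yields a density-sensitive spectral clustering algorithm. Compared with the original spectral clustering algorithm, the new algorithm not only handles multi-scale clustering problems but is also relatively insensitive to parameter selection. An analysis of the algorithm and experiments verify its effectiveness and feasibility.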

2.
徐宁  张沪寅  王晶  徐方  汪志勇 《电子学报》2016,44(10):2323-2329
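Because traditional clustering algorithms are unsuited to cognitive Ad Hoc networks, whose channels change dynamically, a distributed clustering algorithm based on channel similarity is proposed. The algorithm first computes the channel similarity between nodes, uses an improved EM algorithm to estimate the probability that each node belongs to each cluster, and then applies a graph minimum-cut algorithm to obtain the optimal clustering. It both maximizes intra-cluster similarity and minimizes inter-cluster similarity. Finally, a coordination mechanism is proposed to synchronize global clustering information. The whole process runs in a fully distributed manner and does not rely on a common control channel. Simulation results show that the algorithm dynamically adjusts the cluster structure as channels change and increases the number of common channels within a cluster. At the same time, it effectively reduces the number of common channels between clusters, lowering inter-cluster communication interference.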

3.
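Fuzzy co-clustering algorithm for high-order heterogeneous data   Total citations: 1, self-citations: 0, citations by others: 1
To analyze more effectively the clustering results of high-order heterogeneous data in the overlapping regions of clusters, a fuzzy co-clustering algorithm for high-order heterogeneous data (HFCC) is proposed. The algorithm minimizes the weighted distance between objects and cluster centers in each feature space. Iterative update formulas for object memberships and feature weights are derived, an iterative clustering algorithm is designed, and its convergence is proved theoretically. In addition, by generalizing the XB index, an index GXB is proposed for evaluating the clustering quality of high-order heterogeneous data and is used to determine the number of clusters. Experiments show that HFCC effectively detects the overlapping cluster structures hidden in the data and clearly outperforms five representative hard-partition algorithms, and that the GXB index effectively determines the number of clusters in high-order heterogeneous data.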

4.
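Similarity-based word clustering algorithm   Total citations: 1, self-citations: 1, citations by others: 0
Class-based statistical language models are an important way to address data sparsity in statistical models. Traditional statistical methods follow a greedy principle and usually use the likelihood function or the perplexity of the corpus as the evaluation criterion. The main drawbacks of traditional clustering methods are slow clustering, strong dependence of the result on initial values, and a tendency to fall into local optima. This paper proposes definitions of word similarity and word-set similarity, together with a bottom-up hierarchical clustering algorithm. The method not only improves clustering quality but also allows different similarity definitions to be chosen for different models, thereby improving the practical effectiveness of the clustering.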

5.
Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity with a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points, and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform the traditional spectral clustering algorithms.
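As a rough illustration of the count-based variant mentioned in this abstract, the sketch below builds a directed kNN graph and scores each pair of points by how many nearest neighbors they share. The function names `knn_lists` and `snn_similarity`, the Euclidean metric, and the exact scoring rule are assumptions made for illustration; the abstract does not give the paper's precise definition, and the closeness-based variant is not covered.

```python
import math

def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest
    neighbours (the outgoing edges of a directed kNN graph)."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours

def snn_similarity(points, k):
    """Similarity of i and j = number of neighbours their kNN lists share.
    This is an assumed, simplified reading of 'number of shared nearest
    neighbours', not the paper's exact formula."""
    nn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim
```

The resulting matrix could be passed to any spectral clustering routine in place of a Gaussian-kernel affinity; note that it depends on the single parameter k only.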

6.
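A major problem of the fuzzy C-means (FCM) clustering algorithm is that the number of clusters must be determined in advance. To address this, intra-class difference and inter-class overlap are defined to measure, respectively, the similarity of data within a cluster and the degree of separation between clusters, and a new validity function based on these two measures is proposed for determining the optimal number of clusters. Experimental results show that the validity function effectively determines the number of clusters and is fairly robust.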

7.
Traditional clustering algorithms (e.g., the K-means algorithm and its variants) are used only for a fixed number of clusters. However, in many clustering applications, the actual number of clusters is unknown beforehand. The general solution to this type of clustering problem is to select or define a cluster validity index and run a traditional clustering algorithm for every possible number of clusters in sequence to find the clustering with the best cluster validity. This is tedious and time-consuming work. To easily and effectively determine the optimal number of clusters and, at the same time, construct the clusters with good validity, we propose a framework of automatic clustering algorithms (called ETSAs) that do not require users to give each possible value of required parameters (including the number of clusters). ETSAs treat the number of clusters as a variable, and evolve it to an optimal number. Through experiments conducted on nine test data sets, we compared the ETSA with five traditional clustering algorithms. We demonstrate the superiority of the ETSA in finding the correct number of clusters while constructing clusters with good validity.
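The "general solution" this abstract describes as tedious (run a fixed-k algorithm for each candidate k and keep the clustering with the best validity index) can be sketched as follows. This is the baseline, not ETSA itself. The choice of the silhouette coefficient as the validity index, the deterministic farthest-first seeding, and the names `kmeans`, `silhouette`, and `best_k` are all assumptions for illustration.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first seeding
    (seeding rule assumed here for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                        # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points, k_values):
    """Sweep the candidate cluster counts and keep the best-scoring one."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))
```

Each candidate k costs a full clustering run plus a quadratic-time index evaluation, which is the expense that motivates treating the number of clusters as a variable to be evolved.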

8.
Semantics-based clustering of high-dimensional data   Total citations: 2, self-citations: 2, citations by others: 0
刘铭  王晓龙  刘远超 《电子学报》2009,37(5):925-929
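This paper proposes a clustering algorithm that handles high-dimensional data effectively. The algorithm first partitions the document collection into multiple categories by constructing feature chains. During similarity computation and weight adjustment it takes the influence of similar features into account so as to group semantically similar documents, and it dynamically adjusts document weights so that documents with an unbalanced distribution are adequately trained. Experiments show that the algorithm obtains good clustering results in high-dimensional space, with high intra-class similarity, good inter-class discrimination, and few iterations.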

9.
Identification of the correct number of clusters is an important consideration in clustering, and several cluster validity indexes, primarily utilizing the Euclidean distance, have been used in the literature. The property of symmetry is observed in most clustering solutions. In this paper, the symmetry versions of nine cluster validity indexes, namely, Davies–Bouldin index, Dunn index, generalized Dunn index, point symmetry (PS) index, $I$ index, Xie–Beni index, FS index, $K$ index, and SV index, are proposed. It is empirically established that incorporation of the property of symmetry significantly improves the capabilities of these indexes in identifying the appropriate number of clusters. A recently developed PS-based genetic clustering technique, GAPS clustering, is used as the underlying partitioning algorithm. Results on six artificially generated and five real-life datasets show that the symmetry-distance-based $I$ index performs best compared with the other eight indexes.

10.
Dynamic estimation of number of clusters in data sets   Total citations: 3, self-citations: 0, citations by others: 3
Boudraa  A.-O. 《Electronics letters》1999,35(19):1606-1608
A new method for estimating the number of clusters in data sets during clustering is proposed. The cluster validity index, Bcrit, takes the homogeneity in each cluster into account and is connected to the geometrical properties of the data set. Bcrit represents the combination of two validity indices. Comparisons between Bcrit and six cluster validity indices, conducted on real data sets, are presented.

11.
《电子学报:英文版》2017,(6):1221-1226
The category-based statistical language model is an important method for solving the data sparsity problem in statistical language models. However, this model has two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; 2) class-based methods always lose some predictive ability when adapting to text from different domains. To solve these problems, a novel definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word-set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity is better than the conventional greedy clustering method in speed and performance; the perplexity is reduced from 283 to 207.8.

12.
毛健  倪云霞  陈佳 《通信技术》2010,43(5):92-94
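To address the long training time and low detection accuracy of existing wireless intrusion detection methods, an intrusion detection algorithm for wireless Mesh networks is proposed, based on MBIRCH, a modified version of the BIRCH algorithm. The algorithm first scans the data set once to obtain the CFs (clustering features) and then computes a cluster validity index at different levels from the bottom up. The index mainly considers the geometric structure of the data set: it measures the compactness of the data points within a cluster and the similarity between clusters, and keeps the two in balance. The cluster nodes of the CF tree are determined according to this index until the best clustering result is obtained. That result is then used as the training sample to define a discriminant function, which is used to locate network data. Experimental results show that the algorithm not only markedly reduces sample training time but also improves detection accuracy, meeting the intrusion detection needs of wireless Mesh networks.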
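For readers unfamiliar with the CF (clustering feature) obtained in the single scan, the sketch below shows the standard BIRCH summary: a triple of point count, linear sum, and sum of squared norms, which can be merged by simple addition. The class name `CF` and its methods are illustrative; the modifications that distinguish MBIRCH, and its validity index, are not described in the abstract and are not reproduced here.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    """Standard BIRCH clustering feature (N, LS, SS)."""
    n: int        # number of points summarized
    ls: tuple     # per-dimension linear sum of the points
    ss: float     # sum of squared norms of the points

    @classmethod
    def of(cls, point):
        """CF of a single point."""
        return cls(1, tuple(point), sum(x * x for x in point))

    def merge(self, other):
        """CFs are additive, so two sub-clusters merge without rescanning data."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        """Root-mean-square distance of the summarized points to the centroid."""
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because centroid and radius are recoverable from the triple alone, compactness-style measures such as those the abstract mentions can be evaluated at each level of the CF tree without revisiting the raw data.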

13.
In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirement for similarity in the clustering method. Additionally, the new similarity carries the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experimental results show the superiority of the new similarity: 1) The max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) It is robust and not sensitive to the parameters.

14.
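To meet the need for data mining over collections of XML documents, this paper proposes computing the structural similarity of XML documents from the semantic and structural information of their document trees, building a structural similarity matrix from these similarities, and then applying the DBSCAN algorithm to cluster the collection. Compared with other clustering algorithms, clustering speed is greatly improved.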

15.
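A clustering algorithm based on distance adjustment   Total citations: 2, self-citations: 1, citations by others: 1
Because the k-means algorithm is unsuited to concave sample spaces, a clustering algorithm based on distance adjustment is proposed. The algorithm introduces an adjusted shortest-path distance as its similarity function. This function shortens the distance between two points connected through high-density data regions and lengthens the distance between two points connected through low-density regions, thereby reducing the similarity between samples of different classes while increasing the similarity within a class, which yields better clustering. Experimental results show that the algorithm clusters concave sample spaces very well.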

16.
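Cluster analysis is one of the main techniques in gene expression data analysis. Its algorithms start from partitioning objects into classes according to the similarity between objects, so choosing an appropriate similarity measure is key to obtaining effective clustering results. Preprocessed gene data sets are clustered with different clustering algorithms under different similarity measures, and the clustering results are evaluated. Both the shortcomings of the algorithms themselves and the limitations of distance-based similarity measures affect the evaluation. To obtain more effective clustering results, the relevant clustering algorithms are improved and a proportional similarity measure is proposed.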

17.
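When traditional spectral clustering algorithms construct the similarity matrix, the lack of a principled way to choose the Gaussian kernel parameter can seriously affect the clustering result. To address this shortcoming, a spectral clustering algorithm based on density mean is proposed. Unlike traditional algorithms, it takes the average distance from a sample point to its K surrounding sample points as the scale parameter and introduces the density information of sample points, so that the clustering result better matches the actual distribution of the samples. Moreover, because the similarity matrix adapts to different local densities, the algorithm is insensitive to the spatial distribution of the samples. Experiments on different types of data sets verify the effectiveness and robustness of the algorithm.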
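The scale-parameter choice described in this abstract (the mean distance from each sample to its K nearest samples) can be sketched as below. How the per-point scales enter the kernel is not stated in the abstract; the form exp(-d²/(σᵢσⱼ)) used here is an assumption borrowed from the common local-scaling construction, and the additional density term the abstract mentions is omitted. The names `local_scales` and `affinity_matrix` are illustrative, and the sketch assumes no duplicate points (a zero scale would cause a division by zero).

```python
import math

def local_scales(points, k):
    """sigma_i = mean distance from point i to its k nearest neighbours."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scales.append(sum(nearest) / len(nearest))
    return scales

def affinity_matrix(points, k):
    """Gaussian affinity with a per-pair bandwidth sigma_i * sigma_j
    (assumed combination rule; zero diagonal)."""
    sigma = local_scales(points, k)
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(points[i], points[j]) ** 2
            A[i][j] = A[j][i] = math.exp(-d2 / (sigma[i] * sigma[j]))
    return A
```

With per-point scales, sparse regions get a wider kernel and dense regions a narrower one, so a single global Gaussian width no longer has to be tuned by hand.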

18.
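A clustering algorithm suited to road obstacle recognition and detection is proposed for processing anisotropically distributed laser point cloud data. The basic idea is as follows. To cope with real-time changes in the spatial distribution of the point cloud, a hierarchical clustering algorithm with an online-learned merging threshold is proposed to determine the upper bound of the search range for the number of clusters and the candidate set of initial cluster centers. A distance-product maximization method is then proposed to order the candidate set for initialization; this both improves the clustering result by taking the spatial density distribution of the point cloud into account and overcomes the difficulty of choosing initial cluster centers in the traditional K-means algorithm. Finally, the Silhouette index and a distance evaluation function are used as cluster validity indices to analyze clustering quality and determine the optimal number of clusters. The adaptive, online-learning algorithm is applied to point cloud data collected by a 2.5D lidar, and practical tests comparing it with two other clustering algorithms show that it correctly segments most obstacles that have different spatial distributions and are connected to one another.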
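One plausible reading of the distance-product maximization step is a greedy ordering: starting from one candidate, repeatedly pick the candidate whose product of distances to all already-chosen candidates is largest, so the first few entries are mutually far apart and can serve as initial K-means centers. The abstract does not spell out the rule, so the starting point, the greedy scheme, and the name `order_by_distance_product` are assumptions.

```python
import math

def order_by_distance_product(candidates, first=0):
    """Return candidate indices ordered greedily by maximum product of
    distances to the candidates already selected (assumed rule)."""
    remaining = list(range(len(candidates)))
    order = [remaining.pop(first)]  # the starting candidate is an assumption
    while remaining:
        nxt = max(remaining,
                  key=lambda i: math.prod(
                      math.dist(candidates[i], candidates[j]) for j in order))
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

Taking the first k indices of the returned order for each k up to the upper bound gives one set of initial centers per candidate cluster count, which a validity index such as Silhouette could then compare.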

19.
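This paper proposes a method for constructing a text classifier based on fuzzy clustering, introduces a measure of the fuzzy similarity between feature words in text, and gives an algorithm that implements fuzzy clustering using the idea of the "netting method" (编网法). By comparing the fuzzy similarity between feature words, the feature words are clustered, finally yielding sets of feature words that can identify the topic category of a text. Test results on classifier performance are also given.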

20.
The radar signal sorting method based on the traditional support vector clustering (SVC) algorithm has high time complexity, and the traditional validity index cannot efficiently indicate the best sorting result. To solve this problem, we study a new sorting method based on the cone cluster labeling (CCL) method. The CCL method relies on the theory of approximate coverings both in feature space and data space. A new cluster validity index, similitude entropy (SE), is also proposed. It can be used to evaluate the compactness and separation of clusters with information entropy theory. Simulations including the performance comparison between the proposed method and the conventional methods are presented. Results show that while maintaining the sorting accuracy, the proposed method can reduce the computing complexity effectively in sorting the signals.
