Similar Documents
20 similar documents found (search time: 359 ms)
1.
Implementation of the Fuzzy C-Means (FCM) Clustering Algorithm   Cited 11 times (self-citations: 0, by others: 11)
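The traditional FCM algorithm can merge two clusters with intrinsic shapes that lie close to a boundary into one large cluster. However, for somewhat more complex data, FCM has difficulty grouping very close classes together unless it has an additional mechanism such as removing small clusters. The proposed algorithm adds a step that removes empty clusters after the main loop of traditional FCM, which solves this problem of grouping very close classes into one cluster. In addition, to make it easier to select the best result, a step that computes cluster validity is added after the iterations. Finally, the algorithm was implemented in Java and tested on data sets, confirming the effectiveness of the improvement.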
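The two additions described above can be sketched in Python on top of standard FCM. This is a minimal illustration, not the authors' Java code: it assumes an "empty" cluster is one that is no point's highest-membership cluster, and it uses Bezdek's partition coefficient as the validity index because the abstract does not name one. All function names are hypothetical.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, U), U[i][j] = membership of point i in cluster j."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])

    def centers_from(U):
        # Each center is the mean of all points weighted by membership**m.
        cs = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            cs.append([sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(d)])
        return cs

    for _ in range(max_iter):
        centers = centers_from(U)
        new_U = []
        for i in range(n):
            dist = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            new_U.append([1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                          for j in range(c)])
        delta = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if delta < tol:
            break
    return centers_from(U), U


def remove_empty_clusters(centers, U):
    """Drop clusters that are no point's highest-membership cluster, then renormalize rows."""
    c = len(centers)
    keep = sorted({max(range(c), key=lambda j: row[j]) for row in U})
    new_U = []
    for row in U:
        total = sum(row[j] for j in keep)
        new_U.append([row[j] / total for j in keep])
    return [centers[j] for j in keep], new_U


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1/c for the fuzziest partition, 1 for a crisp one."""
    return sum(u * u for row in U for u in row) / len(U)
```

Running `fcm` for several values of `c`, pruning empty clusters, and keeping the result with the highest partition coefficient mirrors the "select the best result" step in spirit.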

2.
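Semi-supervised clustering uses supervisory information about samples to improve the performance of unsupervised learning. In semi-supervised clustering, pairwise constraints (must-link and cannot-link constraints) are widely used as prior knowledge about the samples. Agglomerative hierarchical clustering (AHC), also called merging clustering, is one form of hierarchical clustering. This paper proposes a semi-supervised agglomerative hierarchical clustering algorithm based on pairwise constraints (PS-AHC), which uses the pairwise constraints to adjust the distances between clusters so that they better reflect the true relationships. Experiments on UCI data sets show that PS-AHC effectively improves clustering accuracy and is a promising semi-supervised clustering algorithm.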
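A minimal sketch of how pairwise constraints can override inter-cluster distances in average-linkage AHC. The specific rule used here (a cannot-link pair makes the distance infinite, a must-link pair makes it zero) is an assumption for illustration; the abstract does not give PS-AHC's exact adjustment, and `constrained_ahc` is a hypothetical name.

```python
import math


def constrained_ahc(points, k, must_link=(), cannot_link=()):
    """Average-linkage AHC in which pairwise constraints override cluster distances:
    a cannot-link pair across two clusters makes their distance infinite,
    a must-link pair makes it zero (cannot-link wins if both apply)."""
    ml = {frozenset(p) for p in must_link}
    cl = {frozenset(p) for p in cannot_link}
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        pairs = [frozenset((i, j)) for i in a for j in b]
        if any(p in cl for p in pairs):
            return math.inf
        if any(p in ml for p in pairs):
            return 0.0
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        d, x, y = min((distance(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        if math.isinf(d):
            break  # every remaining merge would violate a cannot-link constraint
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return [sorted(c) for c in clusters]
```

On the 1-D points 0, 1, 10, 11 with k = 2, plain average linkage yields {0, 1} and {10, 11}; adding must-link (1, 10) and cannot-link (0, 1) instead yields {0} and {1, 10, 11}.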

3.
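In the identification of unknown bitstream protocols, an improved agglomerative hierarchical clustering algorithm is proposed to solve the problem of separating captured multi-protocol data frames into single-protocol groups. Building on the traditional agglomerative hierarchical clustering approach and the characteristics of bitstream data frames, the algorithm defines similarity between data frames and between clusters, and extracts qualifying clusters while it is still clustering, which lets it cluster data frames quickly and effectively. The algorithm also determines the number of clusters automatically, and each resulting cluster carries a similarity evaluation index. Tests on the data set published by Lincoln Laboratory show that the algorithm clusters protocol data frames with high accuracy.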

4.
Inter-Cluster Distance Computation Based on the Gaussian Distribution   Cited 2 times (self-citations: 0, by others: 2)
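Agglomerative hierarchical clustering is a high-performing clustering algorithm that repeatedly merges the closest clusters until the data set is partitioned into the number of categories specified by the user. During clustering, the accuracy of the inter-cluster distance computation is an important factor in the algorithm's performance. This paper proposes a new method for computing inter-cluster distance based on the Gaussian distribution, which improves accuracy by taking into account factors such as each cluster's size and density distribution. Comparative experiments against existing inter-cluster distance methods on different text collections show that the method effectively improves the performance of hierarchical clustering.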
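One way to let a cluster's spread enter an inter-cluster distance is to fit each cluster with a diagonal Gaussian and compare the fits with the Bhattacharyya distance. This is a swapped-in, well-known measure used only to illustrate the idea; it is not the formula from the paper, and it models spread but not cluster size. Function names are hypothetical.

```python
import math


def gaussian_fit(cluster, eps=1e-6):
    """Per-dimension mean and (population) variance of a cluster; variance floored at eps."""
    n, dim = len(cluster), len(cluster[0])
    means = [sum(p[k] for p in cluster) / n for k in range(dim)]
    variances = [max(sum((p[k] - means[k]) ** 2 for p in cluster) / n, eps) for k in range(dim)]
    return means, variances


def bhattacharyya_distance(a, b):
    """Bhattacharyya distance between diagonal-Gaussian fits of clusters a and b:
    sum over k of (mu_a - mu_b)^2 / (8 s_k) + 0.5 * ln(s_k / sqrt(var_a * var_b)),
    where s_k = (var_a + var_b) / 2."""
    (ma, va), (mb, vb) = gaussian_fit(a), gaussian_fit(b)
    total = 0.0
    for k in range(len(ma)):
        s = (va[k] + vb[k]) / 2.0
        total += (ma[k] - mb[k]) ** 2 / (8.0 * s)                 # separation of means
        total += 0.5 * math.log(s / math.sqrt(va[k] * vb[k]))    # mismatch in spread
    return total
```

Unlike a centroid distance, two clusters with the same mean but different spreads get a positive distance here.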

5.
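The enhanced intrusion detection method based on GCA (Gravity-based Clustering Approach) first clusters the training set with GCA. Then, following the idea of agglomerative hierarchical clustering and using inter-cluster dissimilarity and overall similarity as clustering-quality criteria, it merges some of the clusters produced by GCA; after merging, cluster centers are more concentrated and objects within each cluster are more compact. A labeling algorithm then marks which clusters are normal and which are anomalous, and finally a detection algorithm checks the test data. Experiments show that the method improves the detection of unknown attacks and, in particular, effectively reduces the false-alarm rate.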
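The labeling and detection steps can be sketched as follows, under two assumptions that the abstract does not state: a cluster is labeled normal when it holds at least a given fraction of the training points (the common large-cluster heuristic in clustering-based intrusion detection), and a test record is flagged when its nearest cluster centroid belongs to an anomalous cluster. `label_clusters`, `detect`, and `normal_fraction` are hypothetical names.

```python
import math


def label_clusters(clusters, normal_fraction=0.2):
    """Mark a cluster as normal if it holds at least `normal_fraction` of all training points."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total >= normal_fraction for c in clusters]


def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def detect(point, clusters, is_normal):
    """Return True (anomaly) if the nearest cluster centroid belongs to an anomalous cluster."""
    centers = [centroid(c) for c in clusters]
    nearest = min(range(len(clusters)), key=lambda j: math.dist(point, centers[j]))
    return not is_normal[nearest]
```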

6.
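Unsupervised Anomaly Detection Based on Partitioning and Agglomerative Hierarchical Clustering   Cited 3 times (self-citations: 1, by others: 2)
Li Na, Zhong Cheng. Computer Engineering, 2008, 34(2): 120-123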
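Applying information entropy theory to the clustering problem in intrusion detection, this paper defines, for mixed (numeric and categorical) attributes, the distance between data points, between a data point and a cluster, and between clusters. Using an overall-similarity clustering-quality criterion as the merging strategy, it proposes an unsupervised anomaly detection algorithm based on partitioning and agglomerative hierarchical clustering. Analysis and experimental results show that the algorithm has good detection performance and can effectively detect unknown intrusions.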
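A hypothetical mixed-attribute point-to-cluster distance, together with the Shannon entropy helper that an entropy-based scheme would build on. The range normalization of numeric attributes and the "1 minus value frequency" term for categorical attributes are assumptions for illustration; the paper's own entropy-based definitions are not reproduced here.

```python
import math
from collections import Counter


def categorical_entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def point_cluster_distance(point, cluster, numeric_idx, categorical_idx, ranges):
    """Mixed-attribute distance: range-normalized numeric gap to the cluster mean,
    plus, per categorical attribute, 1 - share of cluster members with the same value."""
    n = len(cluster)
    d = 0.0
    for k in numeric_idx:
        mean = sum(r[k] for r in cluster) / n
        d += abs(point[k] - mean) / ranges[k]
    for k in categorical_idx:
        same = sum(1 for r in cluster if r[k] == point[k])
        d += 1.0 - same / n
    return d
```

For example, with a cluster of connection records `(10, "tcp")`, `(12, "tcp")`, `(11, "udp")`, `(11, "tcp")` and a numeric range of 100, the record `(11, "tcp")` is at distance 0.25 and `(61, "icmp")` at 1.5.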

7.
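CURE is an agglomerative hierarchical clustering algorithm that first introduced the idea of describing a cluster by multiple representative points. Based on an analysis of existing multi-representative-point hierarchical clustering algorithms, this paper proposes a new one, WRPC. It uses an influence-factor-based mechanism for selecting cluster representatives and a k-nearest-neighbor-based mechanism for merging small clusters, which allows it to find clusters with more complex shapes and sizes. Experimental results show that the algorithm achieves better clustering while maintaining execution efficiency.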
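CURE's multi-representative idea can be sketched as follows: choose well-scattered points by farthest-point selection, shrink them toward the centroid by a factor alpha, and measure cluster distance as the closest pair of representatives. This follows the original CURE scheme; WRPC's influence-factor selection and k-NN small-cluster merging are not shown, and the defaults `c=4`, `alpha=0.3` are illustrative.

```python
import math


def cure_representatives(cluster, c=4, alpha=0.3):
    """CURE-style representatives: pick up to c well-scattered points by farthest-point
    selection, then shrink each toward the centroid by fraction alpha."""
    dim = len(cluster[0])
    centroid = [sum(p[k] for p in cluster) / len(cluster) for k in range(dim)]
    # First representative: the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(reps) < min(c, len(cluster)):
        # Next: the point whose nearest already-chosen representative is farthest away.
        reps.append(max(cluster, key=lambda p: min(math.dist(p, r) for r in reps)))
    return [[r[k] + alpha * (centroid[k] - r[k]) for k in range(dim)] for r in reps]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters = closest pair of their representative points."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

Shrinking toward the centroid dampens the effect of outliers, while using several representatives lets elongated or non-convex clusters be matched along their edges.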

8.
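The traditional K-Means algorithm is guaranteed to converge only to a local optimum, so its result is very sensitive to the choice of initial representative points. Agglomerative hierarchical clustering needs no initial cluster centers, but its computational complexity is high and its merges cannot be undone. Taking into account the characteristics of online public opinion, this paper analyzes the strengths and weaknesses of K-Means and agglomerative hierarchical clustering in depth and improves K-Means. The core idea of the improved algorithm is to integrate the advantages of the two algorithms: agglomerative clustering in the choice of initial points and K-Means in the clustering process. Experiments and practical applications show that the improved text clustering algorithm substantially improves the accuracy and validity of clustering results for online public-opinion information, as well as the algorithm's efficiency.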
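A minimal sketch of the general hybrid: agglomerative clustering supplies the initial centers and K-means then refines them. It assumes centroid linkage and plain Lloyd iterations; the paper's exact integration is not specified in the abstract, and the function names are hypothetical. Because the naive HAC below is cubic in the number of points, in practice it would be run on a sample.

```python
import math


def _mean(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]


def hac_centroids(points, k):
    """Centroid-linkage agglomerative clustering down to k clusters; returns their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [_mean(c) for c in clusters]
        _, x, y = min((math.dist(cents[x], cents[y]), x, y)
                      for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return [_mean(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means starting from the given centers; returns (centers, labels)."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(_mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Usage: `kmeans(points, hac_centroids(points, k))`. The HAC seeds remove K-means' dependence on random initialization, and K-means can still move points that an early, irreversible merge put in the wrong cluster.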

9.
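Fuzzy C-means (FCM) is a popular clustering algorithm that is widely used in many engineering fields. Density Weighted FCM (DWFCM) improves on traditional FCM and effectively addresses FCM's sensitivity to noise. However, neither DWFCM nor FCM solves the problem that the clustering result depends heavily on the choice of initial cluster centers. This paper proposes Improved-DWFCM, an FCM variant based on nearest-neighbor density, which estimates each node's density from its nearest neighbors to remove the dependence on the initial cluster centers. Simulation results show that the initial cluster centers selected by this algorithm are very close to the final cluster centers, which greatly speeds up convergence and improves clustering quality.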
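Initial centers can be chosen from nearest-neighbor density estimates roughly as follows: estimate each point's density as the inverse of its mean distance to its k nearest neighbors, then take the densest points that are at least `min_sep` apart. Both the density formula and the separation rule are assumptions; the abstract says only that density is estimated from nearest neighbors. Function names are hypothetical.

```python
import math


def knn_density(points, k=3):
    """Density of each point = 1 / mean distance to its k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        dens.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return dens


def density_initial_centers(points, c, k=3, min_sep=1.0):
    """Pick up to c initial centers in decreasing density order, skipping any point
    closer than min_sep to an already chosen center (may return fewer than c)."""
    dens = knn_density(points, k)
    order = sorted(range(len(points)), key=lambda i: -dens[i])
    chosen = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= min_sep for j in chosen):
            chosen.append(i)
        if len(chosen) == c:
            break
    return [points[i] for i in chosen]
```

The returned points would seed FCM (or DWFCM) in place of random initialization; isolated noise points have low density and are not picked.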

10.
Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data   Cited 1 time (self-citations: 0, by others: 1)
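In practice, high-order heterogeneous data containing information from multiple feature spaces are widespread. Because high-order co-clustering algorithms can effectively fuse information from multiple feature spaces to improve clustering, they have become a research focus in recent years. Most current high-order co-clustering algorithms are non-hierarchical. However, high-order heterogeneous data often contain hidden hierarchical cluster structures. To mine these hidden hierarchical cluster patterns more effectively, this paper proposes a high-order hierarchical co-clustering algorithm (HHCC). The algorithm uses the Goodman-Kruskal τ association measure to quantify the association between object variables and feature variables, assigning strongly associated objects to the same object cluster and strongly associated features to the same feature cluster. HHCC uses a top-down hierarchical clustering strategy: at each level it evaluates the clustering quality of objects and features with Goodman-Kruskal τ, optimizes τ by local search, determines the number of clusters automatically, and obtains that level's clustering result, finally forming a tree of clusters. Experimental results show that HHCC outperforms four classic homogeneous hierarchical clustering algorithms and five existing non-hierarchical high-order co-clustering algorithms.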
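Goodman-Kruskal τ(Y|X), the association measure HHCC optimizes, can be computed from a contingency table as the proportional reduction in the Gini variation of Y once X is known. The sketch below implements that standard definition; how HHCC builds the table from object and feature clusters is not shown, and the function name is hypothetical.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau(Y|X) for a contingency table (rows = X categories, columns = Y):
    tau = (sum_ij p_ij^2 / p_i+  -  sum_j p_+j^2) / (1 - sum_j p_+j^2)."""
    n = sum(sum(row) for row in table)
    col_tot = [sum(row[j] for row in table) for j in range(len(table[0]))]
    base = 1.0 - sum((c / n) ** 2 for c in col_tot)  # Gini variation of Y ignoring X
    if base == 0:
        return 0.0  # Y is constant: nothing left to explain
    # Expected Gini variation of Y within each X category (empty rows skipped).
    cond = 1.0 - sum(cell ** 2 / (n * sum(row)) for row in table if sum(row) for cell in row)
    return (base - cond) / base
```

τ is 1 when X determines Y exactly and 0 when they are independent; it is asymmetric, so τ(Y|X) and τ(X|Y) generally differ.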

11.
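HAC (Hierarchical Agglomerative Clustering) and K-means are both frequently used for web page clustering, but each has its own shortcomings. This paper proposes a two-stage clustering method. The first stage uses HAC to cluster the titles of web search results; the second stage uses the first-stage results as initial centers and clusters titles plus summaries with K-means to obtain a more reasonable result. Because titles are generally short, the running time of HAC is greatly reduced. The method thus meets the time requirements of web search while still producing good clustering results.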

12.
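Cao Yi, Zhang Ning. Computer Systems & Applications, 2012, 21(7): 65-68, 109
User group interests are analyzed by mining web browsing logs. Statistics are collected on the interest categories, access times, and numbers of users of site visits, from which regular patterns are derived. Next, an improved algorithm based on HAC and k-means is proposed to cluster users by interest and mine their access patterns. Finally, the stability of the dominant interest is verified: as the logs grow, each user's strongest interest tends to stabilize.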

13.
In Wireless Sensor Networks (WSNs), energy efficiency is one of the most important factors influencing network performance. A well-designed routing algorithm can improve a WSN's energy efficiency markedly. Among the various routing algorithms, hierarchical routing algorithms offer advantages in network robustness and flexibility and are better suited to large-scale networks. In this paper, several typical hierarchical routing algorithms are introduced and their advantages and drawbacks are analyzed. Based on this analysis, a new energy-efficient hierarchical routing algorithm named EESSC is proposed, built on an improved HAC clustering approach. In EESSC, the sensor nodes' residual energy is taken into account during clustering, and a special packet header is defined to help update nodes' energy information as messages are transmitted between nodes. Once the clusters have been formed, the nodes in each cluster are arranged in a list and the cluster head role rotates automatically in list order. A re-clustering mechanism is also designed to adjust the clustering result dynamically so that the organization of sensor nodes stays reasonable. Finally, EESSC is compared with other typical hierarchical routing algorithms in a series of experiments, and analysis of the results shows that EESSC clearly improves WSN energy efficiency.
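The list-ordered head rotation can be sketched as a queue. The ordering by residual energy, the per-round `head_cost`, and the `min_energy` cut-off are hypothetical choices for illustration; EESSC's actual energy bookkeeping (via its packet header) is not modeled, and `HeadRotation` is a hypothetical name.

```python
from collections import deque


class HeadRotation:
    """Members are queued by residual energy (highest first); the head role passes
    down the queue each round, skipping nodes that could not afford to serve."""

    def __init__(self, energies, min_energy=0.1):
        # energies: node_id -> residual energy (illustrative units)
        self.energy = dict(energies)
        self.min_energy = min_energy
        self.queue = deque(sorted(self.energy, key=lambda node: -self.energy[node]))

    def next_head(self, head_cost=0.5):
        """Return the next eligible head and charge it head_cost; None if no node qualifies."""
        for _ in range(len(self.queue)):
            node = self.queue.popleft()
            self.queue.append(node)  # rotate: the node moves to the back of the list
            if self.energy[node] - head_cost >= self.min_energy:
                self.energy[node] -= head_cost
                return node
        return None  # no eligible head: a re-clustering step would be triggered here
```

With energies a = 2.0, b = 1.0, c = 0.55, the first four rounds elect a, b, a, a (c can never afford the role), and the fifth round returns None.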

14.
Efficient Phrase-Based Document Similarity for Clustering   Cited 1 time (self-citations: 0, by others: 1)
In this paper, we propose a phrase-based document similarity that computes the pairwise similarities of documents based on the Suffix Tree Document (STD) model. By mapping each node in the suffix tree of the STD model to a unique feature term in the Vector Space Document (VSD) model, the phrase-based document similarity naturally inherits the tf-idf term-weighting scheme when computing document similarity with phrases. We apply the phrase-based document similarity to the group-average Hierarchical Agglomerative Clustering (HAC) algorithm and develop a new document clustering approach. Our evaluation experiments indicate that the new approach is very effective at clustering the documents of two standard benchmark corpora, OHSUMED and RCV1. The quality of its clustering results significantly surpasses that of the traditional single-word tf-idf similarity measure in the same HAC algorithm, especially on large document sets. Furthermore, by studying the properties of the STD model, we conclude that the feature vector of phrase terms in the STD model can be considered an expanded version of the feature vector of traditional single-word terms in the VSD model. This conclusion explains why the phrase-based document similarity works much better than the single-word tf-idf similarity measure.
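A simplified stand-in for the STD model: instead of building a suffix tree, every contiguous word sequence of up to `max_len` words is treated as a feature term, weighted by tf-idf, and documents are compared by cosine similarity. The `max_len` cut-off is an assumption; a real suffix tree would index all shared phrases without a length limit. Function names are hypothetical.

```python
import math
from collections import Counter


def phrases(text, max_len=3):
    """All contiguous word sequences of length 1..max_len (stand-in for STD suffix-tree nodes)."""
    words = text.lower().split()
    return Counter(" ".join(words[i:i + n]) for n in range(1, max_len + 1)
                   for i in range(len(words) - n + 1))


def tfidf_vectors(docs, max_len=3):
    """tf-idf weight per phrase term; terms that occur in every document get weight 0."""
    counts = [phrases(d, max_len) for d in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents sharing the phrase "new york city" score higher than documents that merely share the single words "york" and "city", which is the effect the abstract attributes to phrase terms. The resulting pairwise similarities can feed a group-average HAC.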

15.
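Traditional cluster ensemble methods combine all members, so they cannot fully eliminate the influence of poor-quality members on the ensemble. This work builds the ensemble through both member selection and member weighting: it first selects members by pairwise fusion instead of fusing all clustering results, and then weights the selected members by attribute, an approach with better theoretical properties. Experiments on real and synthetic data show that the algorithm handles differences in member quality effectively, produces better results than traditional cluster ensembles, and scales well.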

16.
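Because traditional K-means cannot handle large-scale data clustering efficiently, an accelerated K-means method based on random sampling (K-means Clustering Algorithm Based on Random Sampling, Kmeans_RS) is proposed to improve its efficiency. First, a random sample is drawn from the large data set to obtain a smaller working set, on which traditional K-means is run to obtain the cluster centers and radii, i.e., the sampling result. Then the remaining samples are assigned to clusters by measuring their relationship to the sampling result. By shrinking the size of the problem that K-means must solve, random sampling substantially improves clustering efficiency and makes large-scale clustering tractable. Experimental results show that on large data sets Kmeans_RS greatly improves efficiency while maintaining clustering quality.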
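A minimal sketch of the sampling scheme, assuming the remaining points are simply assigned to the nearest center found on the sample (the abstract says only that assignment uses their relationship to the sampling result, so the per-cluster radii are computed but not used for assignment here). Function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, rng=None):
    """Plain Lloyd's K-means with centers initialized from a random sample of the points."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centers[j]))].append(p)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def kmeans_rs(points, k, sample_size, seed=0):
    """Run K-means on a random working set, then label every point by its nearest center.
    Also returns each cluster's radius on the sample (max member-to-center distance)."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centers = kmeans(sample, k, rng=rng)

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    radii = [0.0] * k
    for p in sample:
        j = nearest(p)
        radii[j] = max(radii[j], math.dist(p, centers[j]))
    labels = [nearest(p) for p in points]
    return centers, radii, labels
```

The expensive iterative part touches only `sample_size` points; the full data set is scanned once for the final assignment.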

17.
Cluster formation in vehicular ad hoc networks (VANETs) is a challenging problem because of rapidly changing network topology and frequent disconnections of vehicles. Dynamic clustering is a technique for grouping vehicles on the fly. We propose a multiagent-driven dynamic clustering scheme for VANETs on a lane between two intersections that considers vehicle speed, direction, connectivity degree to other vehicles, and mobility pattern. The scheme comprises heavy-weight static agents and light-weight mobile agents. Initially, cluster members are identified for dynamic clustering based on the vehicles' relative speed and direction. The cluster head is selected from among the cluster members using a stability metric derived from connectivity degree, average speed, and time to leave the road intersection. The cluster head predicts the future association of cluster members based on their mobility patterns and announces the cluster mobility pattern to all cluster members. Cluster members with similar mobility patterns can reconnect with the cluster head after passing an intersection of the lane. We evaluated the performance and effectiveness of the proposed scheme by comparing it with an existing clustering scheme. The proposed scheme performs better than the existing stable clustering scheme in terms of cluster formation time, cluster member selection time, cluster head selection time, and control overhead.

18.
Wu Yong-Hao, Li Zheng, Liu Yong, Chen Xiang. Journal of Computer Science and Technology, 2020, 35(5): 979-998

Bug isolation is a popular approach to multi-fault localization (MFL): all failed test cases are clustered into several groups, and the failed test cases in each group, combined with all passed test cases, are used to localize a single fault. However, existing clustering algorithms do not always produce completely correct clustering results, which is a potential threat to bug-isolation-based MFL approaches. To address this issue, we first analyze the influence of clustering accuracy on MFL performance; the results of a controlled study indicate that using the clustering algorithm with the highest accuracy achieves the best MFL performance. Moreover, previous studies on clustering algorithms show that elements in a higher-density cluster have higher similarity. Based on this motivation, we propose a novel approach, FATOC (One-Fault-at-a-Time via OPTICS Clustering). FATOC first uses the OPTICS (Ordering Points to Identify the Clustering Structure) clustering algorithm to group failed test cases and then identifies the cluster with the highest density. OPTICS is a density-based clustering algorithm that can reduce misgrouping and compute a density value for each cluster; this density value helps find the cluster with the highest clustering effectiveness. FATOC then combines the failed test cases in this cluster with all passed test cases to localize a single fault through a traditional spectrum-based fault localization (SBFL) formula. After this fault is localized and fixed, FATOC uses the same method to localize the next single fault, until all test cases pass. Our evaluation results show that FATOC significantly outperforms the traditional SBFL technique and a state-of-the-art MFL approach, MSeer, on 804 multi-fault versions from nine real-world programs.
Specifically, FATOC's performance is 10.32% higher than that of traditional SBFL with the Ochiai formula in terms of the A-EXAM metric. The results also show that when 1%, 3%, and 5% of the statements of all subject programs are checked, FATOC locates 36.91%, 48.50%, and 66.93% of all faults, respectively, which is also better than traditional SBFL and MSeer.
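The SBFL step can be illustrated with the Ochiai formula, suspiciousness(s) = ef / sqrt((ef + nf) * (ef + ep)), where ef and ep count the failing and passing tests that execute statement s and nf counts the failing tests that do not. The sketch shows how restricting the failing tests to one cluster (while keeping all passing tests) changes the ranking; the OPTICS clustering step is not shown, and `ochiai_scores` is a hypothetical name.

```python
import math


def ochiai_scores(coverage, passed, failed_subset=None):
    """Ochiai suspiciousness per statement.
    coverage[t] is the set of statements executed by test t; passed[t] is its verdict.
    If failed_subset is given, only those failing tests are used, together with all
    passing tests, as in bug isolation."""
    tests = [t for t in coverage if passed[t] or failed_subset is None or t in failed_subset]
    total_failed = sum(1 for t in tests if not passed[t])  # ef + nf
    stmts = set().union(*(coverage[t] for t in tests))
    scores = {}
    for s in stmts:
        ef = sum(1 for t in tests if not passed[t] and s in coverage[t])
        ep = sum(1 for t in tests if passed[t] and s in coverage[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom else 0.0
    return scores
```

With two failing tests caused by different faults, statements 1 and 4 tie at about 0.707; isolating the failing test `t4` makes statement 4 the unique top candidate with score 1.0.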

19.
We present FLOC, a fast, local clustering service that partitions a multihop wireless network into nonoverlapping, approximately equal-sized clusters. Each cluster has a clusterhead such that all nodes within unit distance of the clusterhead, and some nodes within distance m of it, belong to the cluster. We show that, by asserting a stretch factor m ≥ 2, FLOC achieves locality of clustering and fault-local self-stabilization: the effects of cluster formation and of faults or changes in any part of the network are contained within at most m+1 units. Through simulations and experiments with actual deployments, we analyze the trade-offs between clustering time and clustering quality and suggest suitable parameters for FLOC to achieve fast completion without compromising the quality of the resulting clustering.

20.
We present a fast multiclass classification algorithm that addresses multiclass problems with a new clustering method called cooperative clustering. In cooperative clustering, the cluster centers of all classes are computed iteratively and simultaneously. For every cluster center in a class, a cluster center in an adjacent class is selected, and the pair of centers is drawn towards the boundary. In this way, the data around a class are found, and these data together with the data in the class can be used to train a classifier. With cooperative clustering, each binary classifier in the one-vs-all approach can be trained with far fewer samples. Furthermore, a kNN method is proposed to accelerate classification. With this algorithm, both training and classification efficiency improve, with only a slight impact on classification accuracy.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号