Similar Articles
17 similar articles found
1.
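A Semi-Supervised K-Means Clustering Algorithm for Multi-Relational Data   Total citations: 3, self-citations: 1, other citations: 3  
A semi-supervised K-means clustering algorithm for multi-relational data is proposed. Building on the K-means algorithm, it extends both the selection of initial clusters and the object-similarity measure so that they support semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes, and the various kinds of relational information during clustering. Experimental results on the multi-relational Movie database validate the algorithm's effectiveness.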
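A minimal sketch of the seeding idea, assuming plain numeric vectors rather than multi-relational data: labeled objects fix the initial centroid of each cluster, after which ordinary K-means iterations run. The names `seeded_kmeans`, `mean`, and `dist2` are hypothetical, and the relational similarity measure described in the abstract is not modeled.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def seeded_kmeans(points, seeds, k, iters=100, rng=None):
    """K-means whose initial centroids come from labeled points.

    points: list of tuples; seeds: dict {point index: cluster id in range(k)}.
    A cluster with no labeled point starts from a randomly chosen point.
    """
    rng = rng or random.Random(0)
    centroids = []
    for j in range(k):
        members = [points[i] for i, lab in seeds.items() if lab == j]
        centroids.append(mean(members) if members else rng.choice(points))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new == labels:
            break  # converged: assignments no longer change
        labels = new
        # Update step: recompute each non-empty cluster's centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids
```

Because the seeds decide which centroid each cluster starts from, they also fix the cluster ids: swapping the seed labels swaps the output labels.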

2.
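Lei Xiaofeng, Xie Kunqing, Lin Fan, Xia Zhengyi. Journal of Software, 2008, 19(7): 1683-1692
The K-Means clustering algorithm is only guaranteed to converge to a local optimum, which makes its results highly sensitive to the choice of initial representative points. Much research has aimed at reducing this sensitivity. In contrast, the local optimality and result sensitivity of K-Means form the very basis of the K-MeanSCAN clustering algorithm. K-MeanSCAN samples the dataset several times and runs K-Means pre-clustering on each sample, producing several different clustering results; subclusters from different results inevitably intersect. The core idea of the algorithm is to use these intersections to build a weighted connectivity graph over the subclusters and to merge subclusters according to their connectivity. Theoretical analysis and experiments show that K-MeanSCAN can substantially improve both clustering quality and efficiency.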
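The merging step can be sketched as follows, assuming the pre-clustering runs have already produced one label per point per run (the sampling and K-Means runs themselves are omitted). Subclusters from different runs are joined when their Jaccard overlap reaches a threshold; the function `merge_subclusters`, the Jaccard edge weight, and the `threshold` parameter are illustrative assumptions, not the paper's exact graph construction.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def merge_subclusters(labelings, threshold=0.3):
    """Merge subclusters from several clustering runs via their overlaps.

    labelings: list of runs, each a list giving one label per point.
    Nodes are (run, label) subclusters; two subclusters from different runs
    are connected when their Jaccard overlap is >= threshold. Each point is
    reported under the connected component of its subcluster in run 0.
    """
    members = {}
    for r, labels in enumerate(labelings):
        for i, lab in enumerate(labels):
            members.setdefault((r, lab), set()).add(i)
    parent = {node: node for node in members}
    for a, b in combinations(members, 2):
        if a[0] == b[0]:
            continue  # subclusters of the same run are disjoint
        inter = len(members[a] & members[b])
        if inter and inter / len(members[a] | members[b]) >= threshold:
            parent[find(parent, a)] = find(parent, b)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids, result = {}, []
    for lab in labelings[0]:
        root = find(parent, (0, lab))
        result.append(ids.setdefault(root, len(ids)))
    return result
```

With a low threshold, two run-0 subclusters that both overlap the same run-1 subcluster end up in one connected component and are merged.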

3.
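Wang Jiao, Luo Siwei, Wang Li. Computer Science, 2012, 39(103): 635-539
Semi-supervised learning is an active research topic in machine learning. Co-training addresses semi-supervised learning when the data have multiple feature sets. This work introduces a graph representation into co-training, using multiple graph structures to represent multi-relational data. Semi-supervised learning is performed on each graph, and co-learning across graphs drives the learners on the different graphs toward consistent predictions. A novel semi-supervised co-training algorithm for multi-relational data is proposed, and the learning process is analyzed from a probabilistic perspective. Experiments on real datasets show that the proposed algorithm performs well on multi-relational data.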

4.
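Semi-Supervised Weighted Fuzzy C-Means Clustering Algorithm   Total citations: 1, self-citations: 1, other citations: 1  
Jiang Xiuqin. Computer Engineering, 2009, 35(17): 170-171
For datasets with blob-shaped clusters whose class sizes differ greatly, neither FCM nor semi-supervised fuzzy C-means is the best clustering method, because both tend to split the dataset into equal-sized parts. To address this, a semi-supervised point-density-weighted fuzzy C-means clustering algorithm is proposed, which uses the distribution density around each sample point as its weight and combines it with semi-supervised learning. During semi-supervised learning, simulated annealing is used to solve the optimization (extremum-finding) problem. Results show that point-density-weighted fuzzy C-means does improve clustering accuracy.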
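A sketch of point-density weighting inside fuzzy C-means, assuming density is the count of points within a fixed `radius` and that each point's weight scales its contribution to the centroid update. The semi-supervised term and the simulated-annealing search from the abstract are not modeled; `density_weights` and `weighted_fcm` are hypothetical names.

```python
import math


def density_weights(points, radius):
    """Weight of each point: how many points (itself included) lie within `radius`."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def weighted_fcm(points, weights, centers, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means where each point's centroid contribution is scaled by its weight."""
    c, dim = len(centers), len(points[0])
    centers = [tuple(v) for v in centers]
    u = []
    for _ in range(iters):
        # Membership update: the standard FCM formula (distances floored to avoid /0).
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # Centroid update: weight_i * u_ij^m replaces the usual u_ij^m.
        new_centers = []
        for j in range(c):
            coef = [w * row[j] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(sum(cf * p[t] for cf, p in zip(coef, points)) / total
                                     for t in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Points in dense regions receive larger weights, so they pull their centroid harder than isolated points do.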

5.
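Semi-Supervised Improved K-Means Clustering Algorithm   Total citations: 4, self-citations: 1, other citations: 3  
The K-means clustering algorithm requires the number of clusters to be known in advance, and choosing the initial cluster centers at random makes the results unstable and prone to stopping at a local optimum. An improved K-means clustering algorithm based on semi-supervised learning theory is proposed: a small amount of labeled data is used to build a minimum spanning tree (MST) of a graph, which is split iteratively to obtain the number of clusters and the initial cluster centers that K-means needs. Experiments on the IRIS dataset show that although different random samples yield different spanning trees and different cluster centers, the resulting clusterings are consistent and stable and need few iterations, which validates the algorithm's effectiveness.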
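A sketch of MST-based initialization on the labeled points, assuming a splitting rule that the abstract does not give: repeatedly cut the longest remaining MST edge while it exceeds `factor` times the mean length of the other kept edges. Each resulting component yields one cluster, and its mean serves as an initial center. `prim_mst`, `mst_init`, and `factor` are hypothetical.

```python
import math


def prim_mst(points):
    """MST edges (i, j, length) over `points`, via Prim's O(n^2) algorithm."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_init(points, factor=2.0):
    """Split the MST of labeled points; return (k, initial centers)."""
    kept = sorted(prim_mst(points), key=lambda e: e[2])
    # Iteratively cut the longest edge while it is much longer than the rest.
    while len(kept) >= 2:
        rest = kept[:-1]
        if kept[-1][2] > factor * sum(e[2] for e in rest) / len(rest):
            kept.pop()
        else:
            break
    # Connected components of the remaining forest (union-find).
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j, _ in kept:
        root[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    centers = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]
    return len(centers), centers
```

The returned `k` and `centers` would then be handed to an ordinary K-means run on the full dataset.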

6.
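A semi-supervised clustering algorithm based on an active learning strategy is proposed, which selects the most informative data for labeling. First, the traditional K-means algorithm produces a coarse clustering of the dataset. Next, from the coarse clustering, the membership of each data point in each cluster is computed; candidate points whose difference between the largest and second-largest membership is below a threshold are selected, and among them those with the smallest differences are taken as the most informative data and labeled. Finally, each unlabeled point in the candidate set is assigned to the cluster whose labeled data have the smallest average distance to it. Experiments show that the proposed active learning strategy identifies the most informative data well, and the semi-supervised clustering algorithm built on it achieves high accuracy on a range of test datasets.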
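The selection and assignment steps can be sketched as below, assuming FCM-style memberships derived from distances to the coarse K-means centroids (the abstract does not give the membership formula). `select_informative` returns the candidates with the smallest top-two membership margin, and `assign_by_labeled` applies the final minimum-average-distance rule. All names and parameters are illustrative.

```python
import math


def memberships(point, centers, m=2.0):
    """FCM-style memberships of `point` in each cluster (they sum to 1)."""
    d = [max(math.dist(point, c), 1e-12) for c in centers]
    return [1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(len(d)))
            for j in range(len(d))]


def select_informative(points, centers, threshold, budget):
    """Indices of up to `budget` points whose top-two membership margin is
    below `threshold`, smallest margin first (needs at least two centers)."""
    candidates = []
    for i, p in enumerate(points):
        top = sorted(memberships(p, centers), reverse=True)
        margin = top[0] - top[1]
        if margin < threshold:
            candidates.append((margin, i))
    candidates.sort()
    return [i for _, i in candidates[:budget]]


def assign_by_labeled(point, labeled):
    """Class whose labeled points are, on average, closest to `point`.

    labeled: dict {class id: non-empty list of labeled points}.
    """
    return min(labeled,
               key=lambda c: sum(math.dist(point, q) for q in labeled[c]) / len(labeled[c]))
```

A point exactly halfway between two centroids has a margin of zero, so it is the first one offered for labeling.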

7.
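Research on a K-Means Clustering Algorithm Based on Semi-Supervised Learning   Total citations: 1, self-citations: 3, other citations: 1  
A new nearest-neighbor function that combines Euclidean distance with supervision information is defined, which allows the K-means algorithm to be applied effectively to semi-supervised clustering. To address K-means' sensitivity to the initial centroids, the search space of particle swarm optimization (PSO) is used to model the Euclidean space of the clustering, and an iterative search finds better cluster centroids; a dynamic population-management strategy is also proposed to improve PSO's search efficiency. Tests on several UCI datasets yield good clustering accuracy.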
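A minimal sketch of a mixed distance, assuming the supervision enters as a fixed penalty `lam` that is added to the squared Euclidean distance whenever a labeled point is compared with a cluster other than its own label. The abstract does not give the exact function, and the PSO centroid search is omitted; `mixed_distance` and `assign` are hypothetical.

```python
import math


def mixed_distance(x, centroid, j, label=None, lam=100.0):
    """Squared Euclidean distance, plus `lam` if x is labeled with a class other than j."""
    d = math.dist(x, centroid) ** 2
    if label is not None and label != j:
        d += lam
    return d


def assign(points, labels, centroids, lam=100.0):
    """Nearest-centroid assignment under the mixed distance (labels: class id or None)."""
    return [min(range(len(centroids)),
                key=lambda j: mixed_distance(p, centroids[j], j, lab, lam))
            for p, lab in zip(points, labels)]
```

With a large `lam`, a labeled point stays with its own class even when it lies geometrically closer to another centroid; with `lam=0.0` the rule reduces to plain K-means assignment.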

8.
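Current State of Research on Semi-Supervised Clustering Algorithms   Total citations: 1, self-citations: 0, other citations: 1  
Semi-supervised clustering is a new research direction in machine learning in recent years and an important branch of data mining, and it is becoming a useful tool in many fields. This paper analyzes and summarizes the current state and development trends of semi-supervised clustering algorithms in data mining, and compares the strengths and limitations of several typical semi-supervised clustering algorithms to support further research on the topic.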

9.
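A Classification-Based Semi-Supervised Clustering Method   Total citations: 1, self-citations: 0, other citations: 1  
A classification-based semi-supervised clustering algorithm is proposed. It makes full use of the few labeled objects in the dataset to coarsely classify the original data, which extends how cluster centers are chosen in the traditional k-means algorithm; the k-meansGuider method is then used to coarsely cluster the dataset, and the coarse clustering results are combined as an ensemble. Experiments on several standard UCI datasets show that the proposed algorithm effectively improves clustering quality.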

10.
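A Semi-Supervised Clustering Method Based on Spectral Clustering   Total citations: 6, self-citations: 1, other citations: 6  
Si Wenwu, Qian Yuntao. Journal of Computer Applications, 2005, 25(6): 1347-1349
Semi-supervised clustering uses a small amount of labeled data to guide unsupervised learning on a large amount of unlabeled data, thereby improving clustering performance. A semi-supervised clustering algorithm based on spectral clustering is proposed: it uses the information in the labeled data to adjust the matrix of pairwise point distances, and then performs spectral clustering on the adjusted distance matrix. Experiments show that the algorithm achieves better clustering performance than previously proposed semi-supervised clustering algorithms.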
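A sketch of the distance-adjustment step, assuming a simple rule the abstract does not specify: pairs with the same label get distance 0, and pairs with different labels get a large distance `far`. The adjusted distances are turned into a Gaussian affinity matrix; the spectral step itself (eigen-decomposition of a normalized Laplacian, for example with `numpy.linalg.eigh`) is not shown. `adjusted_affinity`, `sigma`, and `far` are hypothetical.

```python
import math


def adjusted_affinity(points, labels, sigma=1.0, far=None):
    """Gaussian affinity matrix built from a label-adjusted distance matrix.

    labels[i] is a class id for labeled points and None otherwise. Same-label
    pairs get distance 0; different-label pairs get distance `far`
    (default: the largest original distance).
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    if far is None:
        far = max(max(row) for row in d)
    for i in range(n):
        for j in range(n):
            if i != j and labels[i] is not None and labels[j] is not None:
                d[i][j] = 0.0 if labels[i] == labels[j] else far
    return [[math.exp(-d[i][j] ** 2 / (2 * sigma ** 2)) for j in range(n)]
            for i in range(n)]
```

Two labeled points of the same class become maximally similar regardless of how far apart they are, which is what lets the spectral step group them together.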

11.
Interval Set Clustering of Web Users with Rough K-Means   Total citations: 1, self-citations: 0, other citations: 1  
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, genetic-algorithm-based clustering may not be able to handle the large amounts of data typical of web mining applications. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
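A sketch of rough K-means in the general style of this line of work, assuming a distance-difference threshold `eps` (some variants use a distance ratio instead) and a fixed lower-approximation weight `w_lower`. Each cluster keeps a lower approximation (points that surely belong) and a boundary region (points that possibly belong); the upper approximation is their union. `rough_kmeans` and its parameters are illustrative, not the paper's exact formulation.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def rough_kmeans(points, centroids, eps=0.5, w_lower=0.7, iters=100):
    """Rough K-means sketch: returns (centroids, lower, boundary) as index lists.

    The upper approximation of cluster j is lower[j] + boundary[j].
    """
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    for _ in range(iters):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            d = [math.dist(p, c) for c in centroids]
            dmin = min(d)
            close = [j for j in range(k) if d[j] - dmin <= eps]
            if len(close) == 1:
                lower[close[0]].append(idx)  # unambiguous: surely in this cluster
            else:
                for j in close:
                    boundary[j].append(idx)  # ambiguous: possibly in each close cluster
        new = []
        for j in range(k):
            lo = [points[i] for i in lower[j]]
            bd = [points[i] for i in boundary[j]]
            if lo and bd:
                new.append(tuple(w_lower * a + (1 - w_lower) * b
                                 for a, b in zip(mean(lo), mean(bd))))
            elif lo or bd:
                new.append(mean(lo or bd))
            else:
                new.append(centroids[j])  # empty cluster keeps its centroid
        if new == centroids:
            break
        centroids = new
    return centroids, lower, boundary
```

A point midway between two clusters lands in both boundary regions instead of being forced into one, which matches the abstract's point that web-user clusters need not have crisp boundaries.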

12.
Harmony K-means algorithm for document clustering   Total citations: 2, self-citations: 0, other citations: 2  
Fast, high-quality document clustering is a crucial task in organizing information and search engine results, enhancing web crawling, and information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, K-means, is well suited to large datasets. However, K-means can converge to a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) that performs document clustering based on the Harmony Search (HS) optimization method. Using finite Markov chain theory, we prove that HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we applied it to several standard datasets and compared it with other meta-heuristic and model-based document clustering approaches. Experimental results show that HKA converges to the best known optimum faster than the other methods, and the quality of its clusters is comparable.
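A sketch of Harmony Search applied to centroid selection, assuming 1-D data and SSE as the objective for brevity; document vectors, the K-means refinement that HKA combines with HS, and the paper's parameter settings are not modeled. `harmony_search` and its parameters (`hms`, `hmcr`, `par`, `bw`) follow common HS terminology but are illustrative choices here.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each 1-D point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def harmony_search(points, k, hms=10, hmcr=0.9, par=0.3, bw=0.5, iters=2000, seed=0):
    """Harmony Search over k 1-D centroids, minimizing SSE.

    Returns (best centers sorted, best SSE, best SSE after each improvisation).
    """
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    memory = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(hms)]
    scores = [sse(points, h) for h in memory]
    history = []
    for _ in range(iters):
        new = []
        for d in range(k):
            if rng.random() < hmcr:
                # Memory consideration, optionally followed by pitch adjustment.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-bw, bw)
            else:
                value = rng.uniform(lo, hi)  # random selection
            new.append(min(max(value, lo), hi))
        score = sse(points, new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = new, score  # replace the worst harmony
        history.append(min(scores))
    best = min(range(hms), key=scores.__getitem__)
    return sorted(memory[best]), scores[best], history
```

Because a new harmony only ever replaces the worst one, the best score in memory can never get worse, so `history` is non-increasing.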

13.
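To address the premature convergence and slow convergence of the basic particle swarm optimization (PSO) algorithm, a particle swarm clustering algorithm with a mutation operation is proposed. When the swarm converges prematurely, a mutation operation is applied so that it can escape the local optimum. Tests on the Iris plant data show that the algorithm has good global convergence and converges quickly.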
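A sketch of PSO-based clustering with a mutation step, assuming 1-D data, SSE as the fitness, and "premature convergence" detected as `stall` consecutive iterations without a global-best improvement; on a stall, each particle is re-randomized with probability `p_mut`. The detection rule and all parameter values are assumptions, not taken from the abstract.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each 1-D point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def pso_cluster(points, k, n_particles=15, iters=200, w=0.7, c1=1.5, c2=1.5,
                stall=10, p_mut=0.3, seed=0):
    """PSO over k 1-D centroids; mutates particles when the swarm stalls.

    Returns (best centers sorted, best SSE, best SSE after each iteration).
    """
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    pos = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_particles)]
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [sse(points, p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    stalled, history = 0, []
    for _ in range(iters):
        improved = False
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = sse(points, pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f, improved = pos[i][:], f, True
        stalled = 0 if improved else stalled + 1
        if stalled >= stall:
            # Mutation: treat a long stall as premature convergence and scatter
            # some particles to random positions (personal bests are kept).
            for i in range(n_particles):
                if rng.random() < p_mut:
                    pos[i] = [rng.uniform(lo, hi) for _ in range(k)]
                    vel[i] = [0.0] * k
            stalled = 0
        history.append(gbest_f)
    return sorted(gbest), gbest_f, history
```

Keeping personal bests during mutation means scattered particles still remember good regions while they explore new ones.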

14.
In this paper we consider clustering of data that are corrupted by noise or suffer from imprecision due to the finite resolution of the feature-measuring device. Our work is motivated by the fact that no measurement can be perfect, and added noise is not uncommon in telemetric data. We show how the classical k-means algorithm should be modified to take the noise/imprecision into account. Experimental results on Fisher's Iris data and a nutrition dataset are presented.

15.
Clustering is a very powerful data mining technique for topic discovery from text documents. Partitional clustering algorithms, such as the k-means family, are reported to perform well on document clustering. They treat clustering as an optimization process of grouping documents into k clusters so that a particular criterion function is minimized or maximized. Usually, the cosine function is used in the criterion function to measure the similarity between two documents, but it may not work well when the clusters are not well separated. To solve this problem, we applied the concepts of neighbors and link, introduced in [S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366], to document clustering. If two documents are similar enough, they are considered neighbors of each other, and the link between two documents is the number of their common neighbors. Rather than considering only pairwise similarity, neighbors and link bring global information into the measurement of the closeness of two documents. In this paper, we propose to use neighbors and link in the k-means family of algorithms in three ways: a new method to select initial cluster centroids based on the ranks of candidate documents; a new similarity measure that combines the cosine and link functions; and a new heuristic function for selecting a cluster to split based on the neighbors of the cluster centroids. Our experimental results on real-life data sets demonstrate that the proposed methods can significantly improve the accuracy of document clustering without significantly increasing execution time.
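The neighbor, link, and combined-similarity definitions can be sketched as below. Documents are neighbors when their cosine similarity is at least `theta`, and `link[i][j]` counts common neighbors. The combined measure `alpha * link / n + (1 - alpha) * cosine` follows the "combination of the cosine and link functions" described above, but normalizing `link` by the number of documents `n` is an assumption, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def neighbor_matrix(docs, theta):
    """nbr[i][j] = 1 when cos(d_i, d_j) >= theta (a nonzero document is normally its own neighbor)."""
    return [[1 if cosine(a, b) >= theta else 0 for b in docs] for a in docs]


def link_matrix(nbr):
    """link[i][j] = number of common neighbors of i and j (the matrix product nbr @ nbr)."""
    n = len(nbr)
    return [[sum(nbr[i][m] * nbr[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def combined_similarity(docs, theta=0.5, alpha=0.5):
    """alpha * link / n + (1 - alpha) * cosine, with link normalized by n (an assumption)."""
    n = len(docs)
    link = link_matrix(neighbor_matrix(docs, theta))
    return [[alpha * link[i][j] / n + (1 - alpha) * cosine(docs[i], docs[j])
             for j in range(n)] for i in range(n)]
```

For example, with term vectors (1, 0), (1, 1), and (0, 1), the first and last documents have cosine 0 but share the middle document as a neighbor, so their link is 1 and their combined similarity is positive.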

16.
Semi-supervised graph clustering: a kernel approach   Total citations: 6, self-citations: 0, other citations: 6  
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
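A sketch of weighted kernel k-means on a precomputed kernel matrix, plus a helper that folds pairwise constraints into the kernel. It assumes the constraint-augmented form K = S + W (must-link pairs add a reward `w`, cannot-link pairs subtract a penalty `w`, with an optional diagonal shift `sigma`) and uses uniform constraint weights for simplicity. The assignment uses the standard feature-space distance ||φ(a) − m_c||² = K_aa − 2 Σ_{b∈c} w_b K_ab / Σ_{b∈c} w_b + Σ_{b,d∈c} w_b w_d K_bd / (Σ_{b∈c} w_b)². Both function names are hypothetical.

```python
import math


def constrained_kernel(S, must, cannot, w=1.0, sigma=0.0):
    """Return S + W (+ sigma*I): must-link pairs get +w, cannot-link pairs get -w."""
    K = [row[:] for row in S]
    for i, j in must:
        K[i][j] += w
        K[j][i] += w
    for i, j in cannot:
        K[i][j] -= w
        K[j][i] -= w
    for i in range(len(K)):
        K[i][i] += sigma
    return K


def weighted_kernel_kmeans(K, labels, weights=None, iters=50):
    """Weighted kernel k-means on kernel matrix K, starting from `labels`."""
    n = len(K)
    wt = weights or [1.0] * n
    k = max(labels) + 1
    labels = list(labels)
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        sw = [sum(wt[i] for i in m) for m in members]
        # Third term of the feature-space distance: depends only on the cluster.
        third = [sum(wt[a] * wt[b] * K[a][b] for a in m for b in m) / s ** 2 if s else 0.0
                 for m, s in zip(members, sw)]
        new = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c in range(k):
                if not sw[c]:
                    continue  # empty cluster: no centroid to compare against
                d = K[i][i] - 2 * sum(wt[j] * K[i][j] for j in members[c]) / sw[c] + third[c]
                if d < best_d:
                    best, best_d = c, d
            new.append(best)
        if new == labels:
            break
        labels = new
    return labels
```

Because only kernel entries are used, the same routine applies whether `S` comes from a vector kernel or from a graph's similarity matrix, which is the unification the abstract describes.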

17.
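Sparse coding has been widely applied to denoising complex-valued images. Among such methods, the recently proposed group sparse coding, which fully exploits the similarity among image patches in the same group, has greater advantages in removing noise and raising the signal-to-noise ratio after denoising. This paper studies a group sparse denoising algorithm for complex-valued images based on K-means clustering; by improving the clustering algorithm, it verifies that K-means produces effective groupings for group sparse coding. An online complex-valued dictionary training algorithm is used to obtain the coding dictionary quickly, and group orthogonal matching pursuit is used to sparsely code the grouped image patches. By constraining the codes within each group of patches to be similar, the coding of noise in the patches is effectively suppressed, which improves the denoising of complex-valued images. To validate the algorithm, simulated noise on both simulated and real interferometric synthetic aperture radar (InSAR) images is analyzed quantitatively, showing that the proposed algorithm achieves some improvement in peak signal-to-noise ratio (PSNR) over previous group sparse coding algorithms. Finally, real InSAR images are denoised, further verifying the algorithm's ability to remove real noise.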
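A real-valued sketch of group (simultaneous) orthogonal matching pursuit, assuming the patches of one K-means group share a single support: each step adds the atom with the largest total absolute correlation to the group's residuals, then refits every patch by least squares on the shared support. Complex-valued data would need conjugate inner products, dictionary training is not shown, and the atoms are assumed linearly independent so the small normal-equation system is solvable. `somp` and `solve` are hypothetical names.

```python
def dot(a, b):
    """Real inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def somp(D, Y, sparsity):
    """Code a group of signals Y over dictionary D (list of atoms) with a shared support.

    Returns (support as atom indices, coefficient list per signal).
    """
    residuals = [list(y) for y in Y]
    support, coefs = [], [[] for _ in Y]
    for _ in range(sparsity):
        # Atom with the largest summed correlation over the whole group.
        scores = [-1.0 if j in support else sum(abs(dot(atom, r)) for r in residuals)
                  for j, atom in enumerate(D)]
        support.append(max(range(len(D)), key=scores.__getitem__))
        atoms = [D[j] for j in support]
        gram = [[dot(a, b) for b in atoms] for a in atoms]
        for s, y in enumerate(Y):
            # Least-squares refit of this signal on the shared support.
            coefs[s] = solve(gram, [dot(a, y) for a in atoms])
            approx = [sum(c * a[t] for c, a in zip(coefs[s], atoms)) for t in range(len(y))]
            residuals[s] = [yt - at for yt, at in zip(y, approx)]
    return support, coefs
```

Forcing one support per group is what limits the coding of noise: an atom is only selected if it helps the group as a whole, not a single noisy patch.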
