1.

冯征《计算机工程与应用》2006,42(20):141-142,146

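In traditional hard clustering, the data objects in each resulting cluster are determinate; in the real world, however, boundary data cannot be assigned precisely to any single cluster. Rough sets are a tool for handling this kind of boundary uncertainty. On this basis, a rough-set-based K-Means clustering algorithm is proposed; the clusters it produces consist of an upper approximation and a lower approximation, so boundary objects can be handled. Experiments show that the algorithm is effective.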
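The abstract does not give the assignment rule, so the following is only a minimal sketch of how lower and upper approximations might be filled. It assumes a distance-ratio test: the `ratio` threshold and the function name `rough_assign` are hypothetical and are not taken from the paper.

```python
import math

def rough_assign(points, centers, ratio=1.3):
    """Assign points to lower/upper approximations of clusters.

    `ratio` is a hypothetical threshold: a point counts as a boundary
    object when another center is at most `ratio` times as far away as
    its nearest center.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        nearest = min(range(k), key=dists.__getitem__)
        # Other clusters whose centers are almost as close as the nearest one.
        close = [j for j in range(k)
                 if j != nearest and dists[j] <= ratio * dists[nearest]]
        upper[nearest].append(p)
        if close:
            for j in close:
                upper[j].append(p)    # boundary object: upper approximations only
        else:
            lower[nearest].append(p)  # certain member: lower approximation as well
    return lower, upper

centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 0.0), (5.0, 0.5)]
lower, upper = rough_assign(points, centers)
```

In this toy input the point (5.0, 0.5) is equidistant from both centers, so it lands in both upper approximations and in no lower approximation.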

2.

Discriminative K-Means Laplacian Clustering

Chao Guoqing 《Neural Processing Letters》2019,49(1):393-405

Recently, more and more multi-source data are widely used in many real-world applications. This kind of data is high dimensional and comes from different resources,...

3.

Adaptive K-Means Clustering Algorithm (自适应K-均值聚类算法)

李玉鑑《计算机研究与发展》2007,44(Z2):100-104

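To improve the stability and reliability of traditional K-means clustering, an adaptive K-means clustering algorithm is proposed. Its basic idea is to analyze the minimum spanning tree of the sample set and cut all of its longer edges that exceed a given threshold, so that a reasonable number of clusters and reasonable initial cluster centers are computed automatically in advance from the structural characteristics of the sample set. Theoretical analysis and computational experiments show that the algorithm not only guarantees a unique clustering result but also, when the clusters of the sample set are roughly convex and the between-cluster distance is clearly larger than the within-cluster distance, obtains good clustering results directly without manual parameter selection. On the same data set, traditional K-means may give unreasonable clustering results even when the correct number of clusters is chosen, so the adaptive K-means clustering algorithm performs better.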
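The spanning-tree idea can be sketched roughly as follows. This is an illustration under assumptions, not the paper's procedure: the abstract does not say how the threshold is derived, so `threshold` is passed in by hand here, and the helper names `mst_edges` and `initial_centers` are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            d = math.dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return edges

def initial_centers(points, threshold):
    """Cut MST edges longer than `threshold`; each remaining component
    contributes one initial center (the mean of its points)."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            i = root[i]
        return i

    for i, j, length in mst_edges(points):
        if length <= threshold:       # keep short edges, drop long ones
            root[find(i)] = find(j)
    groups = {}
    for idx, p in enumerate(points):
        groups.setdefault(find(idx), []).append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = initial_centers(points, threshold=3.0)
```

The number of components left after cutting plays the role of the cluster count, and the component means can then seed an ordinary K-means run.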

4.

Information-Theoretic Distance Measures for Clustering Validation: Generalization and Normalization

Luo Ping Xiong Hui Zhan Guoxing Wu Junjie Shi Zhongzhi 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(9):1249-1262

This paper studies the generalization and normalization issues of information-theoretic distance measures for clustering validation. Along this line, we first introduce a uniform representation of distance measures, defined as quasi-distance, which is induced based on a general form of conditional entropy. The quasi-distance possesses three properties: symmetry, the triangle law, and the minimum reachable. These properties ensure that the quasi-distance naturally lends itself as the external measure for clustering validation. In addition, we observe that the ranges of the distance measures are different when they are applied to clustering validation on different data sets. Therefore, when comparing the performances of clustering algorithms on different data sets, distance normalization is required to equalize ranges of the distance measures. A critical challenge for distance normalization is to obtain the ranges of a distance measure when a data set is provided. To that end, we theoretically analyze the computation of the maximum value of a distance measure for a data set. Finally, we compare the performances of the partition clustering algorithm K-means on various real-world data sets. The experiments show that the normalized distance measures have better performance than the original distance measures when comparing clusterings of different data sets. Also, the normalized Shannon distance has the best performance among four distance measures under study.
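The abstract does not spell out the quasi-distance itself. As a hedged illustration of an external measure built from conditional entropy, the sketch below computes variation of information, H(A|B) + H(B|A), between two label assignments. Whether this coincides with any of the four measures studied in the paper is not stated here, and the normalization step is left out because its formula is not given; the function names are illustrative.

```python
import math
from collections import Counter

def conditional_entropy(a, b):
    """H(A | B) in bits for two label sequences of equal length."""
    n = len(a)
    joint = Counter(zip(a, b))
    marg_b = Counter(b)
    return -sum(c / n * math.log2(c / marg_b[y]) for (_, y), c in joint.items())

def variation_of_information(a, b):
    """Symmetric distance between two clusterings of the same objects."""
    return conditional_entropy(a, b) + conditional_entropy(b, a)

truth = [0, 0, 1, 1]
same = [1, 1, 0, 0]     # identical partition with the labels renamed
split = [0, 1, 0, 1]    # partition that is independent of `truth`

d_same = variation_of_information(truth, same)
d_split = variation_of_information(truth, split)
```

A renamed but identical partition is at distance zero, while the independent partition is farther away, which is the behaviour an external validation measure needs.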

5.

Experience with a Hybrid Processor: K-Means Clustering (cited by: 2; self-citations: 0; citations by others: 2)

Maya Gokhale Jan Frigo Kevin Mccabe James Theiler Christophe Wolinski Dominique Lavenier 《The Journal of supercomputing》2003,26(2):131-148

We discuss hardware/software co-processing on a hybrid processor for a compute- and data-intensive multispectral imaging algorithm, k-means clustering. The experiments are performed on two models of the Altera Excalibur board, the first using the soft IP core 32-bit NIOS 1.1 RISC processor, and the second with the hard IP core ARM processor. In our experiments, we compare performance of the sequential k-means algorithm with three different accelerated versions. We consider granularity and synchronization issues when mapping an algorithm to a hybrid processor. Our results show that speedup of 11.8X is achieved by migrating computation to the Excalibur ARM hardware/software as compared to software only on a Gigahertz Pentium III. Speedup on the Excalibur NIOS is limited by the communication cost of transferring data from external memory through the processor to the customized circuits. This limitation is overcome on the Excalibur ARM, in which dual-port memories, accessible to both the processor and configurable logic, have the biggest performance impact of all the techniques studied.

6.

An Improved K-Means Algorithm (一种改进的K-Means算法)

尹成祥张宏军张睿綦秀利王彬《计算机技术与发展》2014,(10):30-33

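To address the excessive number of iterations caused by the random choice of initial centers in the typical K-Means algorithm, a data segmentation method is adopted: the data points are divided into k segments by distance, and one center is selected within each segment as an initial center for the iterations. To find the optimal number of clusters k, a new cluster validity function, the clustering index, is defined; it comprises two measures, cluster compactness and cluster significance, and the optimal k is sought in [1, n] by optimizing the clustering index. Simulation experiments on the IRIS data set show that the number of iterations drops markedly and that the optimal k found is close to the true structure of the data set, which verifies the effectiveness of the algorithm.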
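A rough sketch of segment-based initialization follows. The abstract does not say which distance is used to order the points or how the center of a segment is picked, so two assumptions are made here and labeled in the code: points are ordered by distance to the per-dimension minimum corner of the data, and each segment contributes its mean. The name `segment_init` is hypothetical.

```python
import math

def segment_init(points, k):
    """Order points by distance to a reference corner, cut the order into k
    near-equal segments, and return the mean of each segment."""
    dim = len(points[0])
    # Assumed reference point: the per-dimension minimum of the data.
    ref = tuple(min(p[d] for p in points) for d in range(dim))
    ordered = sorted(points, key=lambda p: math.dist(p, ref))
    n = len(ordered)
    centers = []
    for s in range(k):
        seg = ordered[s * n // k:(s + 1) * n // k]   # non-empty while k <= n
        centers.append(tuple(sum(p[d] for p in seg) / len(seg) for d in range(dim)))
    return centers

points = [(10, 10), (0, 0), (11, 10), (1, 0), (10, 11), (0, 1)]
centers = segment_init(points, k=2)
```

Because the result is deterministic, repeated runs start K-Means from the same centers, unlike random seeding.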

7.

Development and Validation of Team Creativity Measures: A Complex Systems Perspective

Hui Jiang Qing‐pu Zhang 《Creativity & Innovation Management》2014,23(3):264-275

Without formalizing the team creativity (TC) concept with reliable and valid measurement, it is difficult to conduct rigorous research to help teams generate creative ideas and problem solving at a high level, of good quality and great value. The one-sidedness and lack of depth of existing research on team creativity leads to the limited reliability and validity of team creativity measurements. In order to solve these problems, we introduce the complex system theory and develop the TC Scale with nine items for team creativity from three dimensions: team creative thinking, team creative action and team creative outcome. The data is collected from three distinct positions of respondents (managers, team leaders and senior staff) in 183 creative teams. The results of reliability measures, exploratory factor analysis and confirmatory factor analysis strongly support our scale. Further, we test the correlation between team trust and team creativity to establish its predictive validity and make a further verification on the scale structure through second-order confirmatory factor analysis. Finally, we discuss the implications for research and practice.

8.

A Fuzzy K-Means Customer Clustering Algorithm Combined with PSOA (一种结合PSOA的模糊K-均值客户聚类算法)

朱沅海林泉万杰《计算机工程与科学》2009,31(12)

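Customer clustering analysis using fuzzy means clustering combined with the PSO (particle swarm optimization) algorithm is a new research direction in CRM. This paper proposes taking the N most frequent field values in specified fields of M customer records as the customers' characteristic attributes, forming the pattern sample set for fuzzy customer clustering from the characteristic attributes of the M customers, and combining the PSO algorithm with the means clustering algorithm to minimize the total within-class scatter sum, thereby obtaining the best customer clustering. Experiments show that the algorithm yields satisfactory customer clustering results.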

9.

Research on the K-Means Clustering Algorithm (K-Means聚类算法的研究)

周爱武于亚飞《微机发展》2011,(2):62-65

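K-Means is a classic clustering algorithm with many advantages and also many shortcomings: for example, the number of clusters K must be specified in advance, the choice of initial cluster centers is random, the algorithm easily yields locally optimal solutions, and it is strongly affected by isolated points. This paper improves K-Means mainly with respect to the choice of initial cluster centers and the isolated-point problem. It first computes the distances between all data objects and uses the idea of distance sums to eliminate the influence of isolated points, then proposes a new method for selecting initial cluster centers, and compares the improved algorithm with the original experimentally. The experiments show that the improved algorithm is markedly less affected by isolated points and that its clustering results are closer to the actual data distribution.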
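The distance-sum idea for isolated points might look like the sketch below. The abstract gives no cut-off rule, so the `factor` parameter (a multiple of the average distance sum) and the name `drop_isolated` are assumptions made purely for illustration.

```python
import math

def drop_isolated(points, factor=1.5):
    """Remove points whose summed distance to all other points exceeds
    `factor` times the average summed distance (hypothetical rule)."""
    sums = [sum(math.dist(p, q) for q in points) for p in points]
    mean_sum = sum(sums) / len(sums)
    return [p for p, s in zip(points, sums) if s <= factor * mean_sum]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (50, 50)]
kept = drop_isolated(points)
```

Here the far-away point (50, 50) has a much larger distance sum than the four clustered points and is filtered out before any centers are chosen. The pairwise computation is quadratic in the number of points, which is acceptable only for small data sets.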

10.

Research on the K-Means Clustering Algorithm (K-Means聚类算法的研究) (cited by: 6; self-citations: 0; citations by others: 6)

周爱武于亚飞《计算机技术与发展》2011,21(2)

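K-Means is a classic clustering algorithm with many advantages and also many shortcomings: for example, the number of clusters K must be specified in advance, the choice of initial cluster centers is random, the algorithm easily yields locally optimal solutions, and it is strongly affected by isolated points. This paper improves K-Means mainly with respect to the choice of initial cluster centers and the isolated-point problem. It first computes the distances between all data objects and uses the idea of distance sums to eliminate the influence of isolated points, then proposes a new method for selecting initial cluster centers, and compares the improved algorithm with the original experimentally. The experiments show that the improved algorithm is markedly less affected by isolated points and that its clustering results are closer to the actual data distribution.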

11.

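An Iterated Local Search Clustering Algorithm Based on K-Means (基于K均值的迭代局部搜索聚类算法) (cited by: 1; self-citations: 0; citations by others: 1)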

吴景岚朱文兴《计算机工程与应用》2004,40(22):37-41

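The K-means clustering algorithm (KM) is a commonly used method for clustering problems; its main drawback is that the local minimum it finds often deviates considerably from the global optimum. This paper constructs an iterated local search algorithm based on KM (called IKM). The algorithm takes the solution obtained by KM as the initial solution and performs a local search from it, accepting some inferior solutions during the search. When the solution can no longer be improved, the algorithm perturbs the local minimum with suitable strength and then starts the next iteration, so as to escape the local minimum and widen the search. Experimental results show that the clustering results of IKM are clearly better than those of KM, with an average improvement of more than 100%. The larger the data set and the more clusters there are, the more pronounced the improvement, which can exceed 300%. IKM is therefore a feasible and effective method.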
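The perturb-and-restart loop can be sketched as below. This is a simplified illustration rather than IKM itself: it uses plain Lloyd iterations as the local search, Gaussian noise on the centers as the perturbation, and accepts only improving solutions, whereas the abstract says IKM also accepts some inferior solutions. All names and the `rounds`, `strength`, and `seed` parameters are assumptions.

```python
import math
import random

def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)

def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its old center instead of dividing by zero.
            new.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else old)
        if new == centers:
            break
        centers = new
    return centers

def iterated_kmeans(points, k, rounds=20, strength=1.0, seed=0):
    rng = random.Random(seed)
    best = kmeans(points, rng.sample(points, k))
    initial_cost = best_cost = cost(points, best)
    for _ in range(rounds):
        # Shake the current local minimum, then descend again with K-means.
        shaken = [tuple(x + rng.gauss(0, strength) for x in c) for c in best]
        cand = kmeans(points, shaken)
        cand_cost = cost(points, cand)
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, initial_cost, best_cost

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
best, initial_cost, best_cost = iterated_kmeans(points, k=2)
```

By construction the returned cost is never worse than the cost of the first K-means run; how much it improves depends on the data and on the perturbation strength.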

12.

Approximate Distributed K-Means Clustering over a Peer-to-Peer Network (cited by: 4; self-citations: 0; citations by others: 4)

Datta Souptik Giannella Chris Kargupta Hillol 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(10):1372-1388

Data intensive Peer-to-Peer (P2P) networks are finding an increasing number of applications. Data mining in such P2P environments is a natural extension. However, common monolithic data mining architectures do not fit well in such environments since they typically require centralizing the distributed data which is usually not practical in a large P2P network. Distributed data mining algorithms that avoid large-scale synchronization or data centralization offer an alternate choice. This paper considers the distributed K-means clustering problem where the data and computing resources are distributed over a large P2P network. It offers two algorithms which produce an approximation of the result produced by the standard centralized K-means clustering algorithm. The first is designed to operate in a dynamic P2P network that can produce clusterings by “local” synchronization only. The second algorithm uses uniformly sampled peers and provides analytical guarantees regarding the accuracy of clustering on a P2P network. Empirical results show that both algorithms demonstrate good performance compared to their centralized counterparts at a modest communication cost.

13.

A Divisive K-Means Clustering Algorithm Based on Density Weighting (基于密度加权的分裂式K均值聚类算法)

张鸿雁杜文锋武丽芬《计算机仿真》2021,38(4):254-257

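To keep the initial cluster centers from falling into a local optimum and isolated points from affecting clustering accuracy, a density-weighted K-means clustering algorithm is proposed that incorporates the divisive idea. Taking the K-means clustering algorithm as the basis, the divisive idea is introduced: the attribute values of all data objects are extracted to form vectors; by solving for all attributes of all data objects, a normalized, preprocessed data object matrix is obtained; a divisive K-means clustering algorithm is constructed according to the minimum and maximum distances between sample points and point groups; using sample...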

14.

An Image Segmentation Algorithm Based on Improved K-Means Clustering (基于改进的K-均值聚类图像分割算法)

柳娟满家巨《数字社区&智能家居》2008,(6):1275-1276

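K-means clustering is a widely used method. This paper proposes an improved algorithm based on K-means clustering and applies it to image segmentation. To address the K-means algorithm's excessive sensitivity to outliers, the partition is improved by replacing center points and comparing the cost function. Experimental results show that the method effectively improves the cluster centers and raises classification precision and accuracy.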
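One plausible reading of "replacing center points and comparing the cost function" is a swap step in the style of k-medoids: try a data point as the new center and keep the swap only if the total cost drops. The sketch below follows that reading; it is an assumption rather than the paper's stated procedure, and the function names are illustrative.

```python
import math

def total_cost(points, centers):
    """Sum of distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def swap_improve(points, centers):
    """Greedy passes: try each data point as a replacement for each center
    and keep a swap only when it lowers the total cost."""
    centers = list(centers)
    best = total_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for j in range(len(centers)):
            for p in points:
                trial = centers[:j] + [p] + centers[j + 1:]
                c = total_cost(points, trial)
                if c < best:
                    centers, best, improved = trial, c, True
    return centers, best

# The mean (25.75, 0) is dragged toward the outlier at x = 100.
points = [(0, 0), (1, 0), (2, 0), (100, 0)]
centers, best = swap_improve(points, [(25.75, 0)])
```

In this example the center moves from the outlier-distorted mean to the data point (1, 0), which shows why a swap-based center is less sensitive to outliers than the mean.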

15.

An Improved Grid-Based K-Means Clustering Algorithm (一种基于网格的改进的K-Means聚类算法) (cited by: 1; self-citations: 0; citations by others: 1)

任家东孟丽丽张冬梅《计算机研究与发展》2009,46(Z2)

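The K-Means algorithm processes every data point in the data set many times, so its time efficiency on large data sets is low. To improve the time performance of K-Means and obtain better clustering results, the concept of cell density aggregation degree is defined using a grid method, and an improved grid-based K-Means clustering algorithm (IKMG) is proposed. Using the principle of grid connectivity and a tree structure, IKMG takes several dense grid cells as initial root nodes and the surrounding cells as their child nodes, and so on, expanding the trees breadth-first until K clustering trees are obtained. Experimental results show that IKMG not only greatly shortens the time K-Means needs to process large data sets but also effectively eliminates the sensitivity of the clustering results to the initial cluster centers; it needs no manually specified K and can find clusters of different sizes and shapes.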

16.

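A Semi-Supervised K-Means Clustering Algorithm for Multi-Relational Data (一种半监督K均值多关系数据聚类算法) (cited by: 3; self-citations: 1; citations by others: 3)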

高滢刘大有齐红刘赫《软件学报》2008,19(11)

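A semi-supervised K-means clustering algorithm for multi-relational data is proposed. Building on the K-means clustering algorithm, it extends the method for selecting initial clusters and the object similarity measure so that they can be used for semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes, and the various kinds of relational information during clustering. Experimental results on the multi-relational database Movie verify the effectiveness of the algorithm.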

17.

Feature Selection Based on Clustering and Binary PSO (基于聚类和二进制PSO的特征选择)

张家柏王小玲《计算机技术与发展》2010,20(6):25-28

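Feature selection is one of the important problems in fields such as pattern recognition and data mining. Feature selection can not only improve classification accuracy and efficiency but also identify information-rich feature subsets. After analyzing several commonly used feature selection algorithms, this paper proposes a feature selection method based on clustering and the binary PSO algorithm: features are first grouped and screened by clustering on the correlations between features, and the binary particle swarm algorithm then performs a random search over the screened, reduced feature subset. Experimental results show that the algorithm effectively finds feature subsets with good linear separability and offers a large reduction in the number of features and high running efficiency.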

18.

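An Improved K-means Weighted Adaptive Multi-View Data Clustering Algorithm (改进K-means加权自适应多视图数据聚类算法) (cited by: 1; self-citations: 0; citations by others: 1)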

李丽亚闫宏印《计算机仿真》2021,38(8):314-317,429

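In today's big data era, view data are increasingly abundant. Because these data show marked diversity and heterogeneity, multi-view data clustering has become one of the key research problems in big data. For the multi-view data clustering problem, a weighted adaptive multi-view clustering algorithm based on improved K-means is proposed. First, a weighted adaptive multi-view clustering algorithm is proposed to reduce the complexity of transforming views to the same dimension. Then, considering data errors and outliers, for the data condition...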

19.

An Image Segmentation Algorithm Based on Improved K-Means Clustering (基于改进的K-均值聚类图像分割算法)

LIU Juan MAN Jia-ju 《数字社区&智能家居》2008,(16)

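K-means clustering is a widely used method. This paper proposes an improved algorithm based on K-means clustering and applies it to image segmentation. To address the K-means algorithm's excessive sensitivity to outliers, the partition is improved by replacing center points and comparing the cost function. Experimental results show that the method effectively improves the cluster centers and raises classification precision and accuracy.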

20.

Medical Image Registration Based on Improved K-Means Clustering (基于改进K-Means聚类医学图像配准)

《软件》2018,(1):75-82

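The ICP algorithm is widely used in medical image registration, but it has problems: the initial translation and rotation matrices of the floating point set strongly affect ICP, registration easily traps the objective function in a local optimum, and the computational cost is high. This paper proposes a medical image registration algorithm based on improved K-Means clustering. The method computes the centroids of the reference image and the floating image to obtain the initial registration translation; centers the medical image coordinates and groups them into 2 clusters with an improved K-Means clustering method; fits a straight line through the 2 cluster centers, finds its slope, and from it the inclination angle, giving the initial registration rotation; and uses BSGO to select feature points automatically, yielding the reference point set and the floating point set. Experiments show that the algorithm can be used for both single-modality and multi-modality image registration; it has a low computational load, fairly fast registration, fairly simple computation, and fairly high accuracy, and it solves the problem of image registration easily falling into a local optimum.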