Similar literature
 Found 20 similar documents; search took 15 ms
1.
This paper presents partitioning dynamic clustering methods for interval-valued data based on suitable adaptive quadratic distances. These methods furnish a partition and a prototype for each cluster by optimizing an adequacy criterion that measures the fitting between the clusters and their representatives. These adaptive quadratic distances change at each algorithm iteration and can either be the same for all clusters or different from one cluster to another. Moreover, various tools for the partition and cluster interpretation of interval-valued data are also presented. Experiments with real and synthetic interval-valued data sets show the usefulness of these adaptive clustering methods and the merit of the partition and cluster interpretation tools.

2.
This paper presents variable-wise kernel hard clustering algorithms in the feature space in which dissimilarity measures are obtained as sums of squared distances between patterns and centroids computed individually for each variable by means of kernels. The methods proposed in this paper are supported by the fact that a kernel function can be written as a sum of kernel functions evaluated on each variable separately. The main advantage of this approach is that it allows the use of adaptive distances, which are suitable to learn the weights of the variables on each cluster, providing a better performance. Moreover, various partition and cluster interpretation tools are introduced. Experiments with synthetic and benchmark datasets show the usefulness of the proposed algorithms and the merit of the partition and cluster interpretation tools.

3.
This paper introduces dynamic clustering methods for partitioning symbolic interval data. These methods furnish a partition and a prototype for each cluster by optimizing an adequacy criterion that measures the fitting between clusters and their representatives. To compare symbolic interval data, these methods use single adaptive (city-block and Hausdorff) distances that change at each iteration, but are the same for all clusters. Moreover, various tools for the partition and cluster interpretation of symbolic interval data furnished by these algorithms are also presented. Experiments with real and synthetic symbolic interval data sets demonstrate the usefulness of these adaptive clustering methods and the merit of the partition and cluster interpretation tools.
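
The two interval distances named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, intervals are assumed to be stored as `[lower, upper]` pairs, and the weight rule in `product_one_weights` (weights constrained to have product 1) is an assumption about how a single adaptive distance might be normalized, not something stated in the abstract.

```python
import numpy as np

def city_block(x, y):
    """Per-variable city-block distance |a - c| + |b - d| between intervals [a, b] and [c, d].

    x, y: array-likes of shape (p, 2), one [lower, upper] row per variable.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.abs(x[:, 0] - y[:, 0]) + np.abs(x[:, 1] - y[:, 1])

def hausdorff(x, y):
    """Per-variable Hausdorff distance max(|a - c|, |b - d|) between intervals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.maximum(np.abs(x[:, 0] - y[:, 0]), np.abs(x[:, 1] - y[:, 1]))

def product_one_weights(dispersion):
    """Hypothetical weight rule: geometric mean of the per-variable dispersions
    divided by each dispersion, so the weights multiply to 1 (assumption)."""
    d = np.asarray(dispersion, float)
    return np.exp(np.mean(np.log(d))) / d

def adaptive_distance(x, y, weights, base=hausdorff):
    """Weighted sum of per-variable distances; one weight vector shared by all clusters."""
    return float(np.dot(weights, base(x, y)))
```

In an iterative scheme, `dispersion` would be recomputed from the current partition at each step, which is what makes the distance "adaptive"; variables with a large within-cluster spread receive a small weight.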

4.
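Kernel clustering algorithm   Cited by: 112 (self-citations: 0, citations by others: 112)
This paper proposes a kernel clustering method for cluster analysis. Using a Mercer kernel, the authors map the samples in the input space to a high-dimensional feature space and perform the clustering there. Because the kernel mapping brings out features that were not previously apparent, the data cluster better. The method substantially outperforms classical clustering algorithms, converging faster and clustering more accurately. Simulation results confirm the feasibility and effectiveness of the kernel clustering method.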
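
Clustering in the feature space never needs the mapped samples explicitly, because the squared distance to a cluster mean can be written with kernel values only. The sketch below is a generic kernel k-means under that identity, not the algorithm of the paper above; the function names, the Gaussian kernel, and the optional `init` argument are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (Mercer) kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, n_iter=50, init=None, seed=0):
    """Hard clustering in feature space using only the kernel matrix K."""
    n = K.shape[0]
    if init is None:
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinite distance
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
            dist[:, c] = (np.diag(K) - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

Like ordinary k-means, this only finds a local optimum, so the result depends on the initial labels and on the kernel width `gamma`.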

5.
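A fuzzy kernel c-means clustering algorithm for location fingerprints is proposed. Location fingerprints are modeled as normally distributed interval-valued data to reflect the uncertainty in the sampled signal strength of access points. A normal distribution function determined by the interval midpoint and size maps each fingerprint to a point in the feature space, where a kernel-based fuzzy c-means algorithm clusters the fingerprints. ZigBee positioning experiments show that the method classifies location fingerprints markedly better than c-means clustering based on the mean signal strength, and that it can effectively reduce the computational cost of positioning while maintaining positioning accuracy.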

6.
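Xu Kunpeng, Chen Lifei, Sun Haojun, Wang Beizhan. Journal of Software (《软件学报》), 2020, 31(11): 3492-3505
Most existing subspace clustering methods for categorical data assume that features are mutually independent and ignore the linear or nonlinear correlations between attributes. This paper proposes a kernel subspace clustering method for categorical data. First, kernel functions originally designed for continuous data are introduced to project categorical data into a kernel space, and a feature-weighted similarity measure for categorical data is defined in that space. Second, an objective function for kernel subspace clustering of categorical data is derived from this measure, and an efficient optimization method for solving it is proposed. Finally, a kernel subspace clustering algorithm for categorical data is defined. The algorithm accounts for the relationships between attributes in a nonlinear space and, during clustering, assigns each attribute a feature weight that measures its relevance to each cluster, thereby achieving embedded feature selection for categorical attributes. A cluster validity index is also defined to evaluate the quality of clustering results on categorical data. Experiments on synthetic and real data sets show that, compared with existing subspace clustering algorithms, the kernel subspace clustering algorithm can uncover nonlinear relationships between categorical attributes and effectively improves the quality of the clustering results.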

7.
In this paper, support vector clustering is extended to an adaptive cell growing model which maps data points to a high-dimensional feature space through a desired kernel function. This generalized model is called multiple spheres support vector clustering, which essentially identifies dense regions in the original space by finding their corresponding spheres with minimal radius in the feature space. A multisphere clustering algorithm based on an adaptive cluster cell growing method is developed, whereby it is possible to obtain the grades of membership as well as the cluster prototypes in the partition. The effectiveness of the proposed algorithm is demonstrated for the problem of arbitrary cluster shapes and for prototype identification in an actual application to a handwritten digit data set.

8.
This paper presents adaptive and non-adaptive fuzzy c-means clustering methods for partitioning symbolic interval data. The proposed methods furnish a fuzzy partition and prototype for each cluster by optimizing an adequacy criterion based on suitable squared Euclidean distances between vectors of intervals. Moreover, various cluster interpretation tools are introduced. Experiments with real and synthetic data sets show the usefulness of these fuzzy c-means clustering methods and the merit of the cluster interpretation tools.
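
A non-adaptive variant of this idea can be sketched by treating each interval vector as its stacked lower and upper bounds and running the standard fuzzy c-means updates on them. This is an illustrative sketch only: the function name, the `(n, p, 2)` array layout, and the optional `U0` argument are assumptions, and the adaptive (weighted) distances of the paper are not included.

```python
import numpy as np

def interval_fcm(X, c, m=2.0, n_iter=100, tol=1e-6, U0=None, seed=0):
    """Fuzzy c-means for interval data.

    X: array of shape (n, p, 2) with [lower, upper] per variable.
    Returns (U, G): memberships of shape (n, c) and interval prototypes (c, p, 2).
    """
    X = np.asarray(X, float)
    n = X.shape[0]
    Z = X.reshape(n, -1)  # stacked bounds: squared Euclidean distance between interval vectors
    if U0 is None:
        U = np.random.default_rng(seed).random((n, c))
        U /= U.sum(axis=1, keepdims=True)
    else:
        U = np.asarray(U0, float)
    for _ in range(n_iter):
        W = U ** m
        G = (W.T @ Z) / W.sum(axis=0)[:, None]            # prototypes: weighted means of bounds
        d2 = ((Z[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                        # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)      # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, G.reshape(c, -1, 2)
```

Taking `U.argmax(axis=1)` turns the fuzzy partition into a hard one when a single label per object is needed.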

9.
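Dynamic weighted fuzzy kernel clustering algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
To overcome the influence of noisy feature vectors on clustering and to account for the differing contributions of the feature vectors to the clustering result, the data to be clustered are mapped to a high-dimensional space with a Mercer kernel, and a new dynamic weighted fuzzy kernel clustering algorithm is proposed. Through dynamic weighting, the algorithm automatically attenuates the role of noisy feature vectors in classification. Without any prior information about the data, it can accurately partition linear data and can also partition non-blob-shaped data nonlinearly. Classification results on simulated and real data show that noise in the data has little effect on the results and that the algorithm is highly practical.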

10.
Clustering of symbolic data is a necessary process in many scientific disciplines, and fuzzy c-means clustering for interval data (IFCM) is one of the most popular algorithms. This paper presents an adaptive fuzzy c-means clustering algorithm for interval-valued data based on an interval-dividing technique. This method gives a fuzzy partition and a prototype for each fuzzy cluster by optimizing an objective function. The adaptive distance between a pattern and its cluster center varies with each algorithm iteration and may be either different from one cluster to another or the same for all clusters. The novel part of this approach is that it takes into account every point in both intervals when computing the distance between the cluster and its representative. Experiments are conducted on synthetic data sets and a real data set. To compare the overall performance of the proposed method with four other existing methods, the corrected Rand index, the value of the objective function, and the number of iterations are used as evaluation criteria. Clustering results demonstrate that the algorithm proposed in this paper has remarkable advantages.

11.
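A kernel-based clustering algorithm is proposed and applied to intrusion detection, yielding a new detection model. Using a Mercer kernel, the samples in the input space are mapped to a high-dimensional feature space, where the clustering is performed. Because the kernel mapping brings out features that were not previously apparent, the data cluster better. In addition, the initial cluster centers are selected with a data-segmentation method. The method substantially outperforms classical clustering algorithms, converging faster and clustering more accurately. Simulation results confirm the feasibility and effectiveness of the method.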

12.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm   Cited by: 3 (self-citations: 0, citations by others: 3)

13.
Clustering is the process of organizing objects into groups whose members are similar in some way. Most clustering methods involve numeric data only. However, this representation may not be adequate to model complex information such as histograms, distributions, or intervals. To deal with these types of data, Symbolic Data Analysis (SDA) was developed. In multivariate data analysis, it is common for some variables to be more or less relevant than others, and less relevant variables can mask the cluster structure. This work proposes a clustering method based on a fuzzy approach that produces weighted multivariate memberships for interval-valued data. These memberships can change at each iteration of the algorithm and they are different from one variable to another and from one cluster to another. Furthermore, there is a different relevance weight associated with each variable that may also be different from one cluster to another. The advantage of this method is that it is robust to ambiguous cluster membership assignment since weights represent how important the different variables are to the clusters. Experiments are performed with synthetic data sets to compare the performance of the proposed method against other methods already established in the clustering literature. Also, an application with interval-valued scientific production data is presented in this work. Clustering quality results have shown that the proposed method offers higher accuracy when variables have different variabilities.

14.
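To improve clustering accuracy on categorical data sets and adaptability to a wide range of data sets, three kernel functions are introduced, and kernel K-means based on the mountain method is then used to cluster categorical data. The kernel functions map the categorical data to a high-dimensional feature space, which gives categorical data, which lack a metric, the metric of numerical data. The improved methods were evaluated experimentally on several public data sets, and the results show that they are effective for clustering categorical data.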

15.
In this paper, a novel clustering method in the kernel space is proposed. It effectively integrates several existing algorithms into an iterative clustering scheme that can handle clusters with arbitrary shapes. In our proposed approach, a reasonable initial core for each cluster is estimated. This allows us to adopt a cluster growing technique, and the growing cores offer partial hints on the cluster association. Consequently, the methods used for classification, such as support vector machines (SVMs), can be useful in our approach. To obtain initial clusters effectively, the notion of the incomplete Cholesky decomposition is adopted so that fuzzy c-means (FCM) can be used to partition the data in a kernel defined-like space. Then one-class and multiclass soft margin SVMs are adopted to detect the data within the main distributions (the cores) of the clusters and to repartition the data into new clusters iteratively. The structure of the data set is explored by pruning the data in the low-density region of the clusters. Then data are gradually added back to the main distributions to assure exact cluster boundaries. Unlike the ordinary SVM algorithm, whose performance relies heavily on the kernel parameters given by the user, the parameters are estimated from the data set naturally in our approach. The experimental evaluations on two synthetic data sets and four University of California Irvine real data benchmarks indicate that the proposed algorithms outperform several popular clustering algorithms, such as FCM, support vector clustering (SVC), hierarchical clustering (HC), self-organizing maps (SOM), and non-Euclidean norm fuzzy c-means (NEFCM). © 2009 Wiley Periodicals, Inc.

16.
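The classical fuzzy C-means clustering algorithm is sensitive to noisy data, ignores the imbalance among the attribute features of the samples, and performs poorly on high-dimensional data. The possibilistic clustering algorithm addresses the problems of noise sensitivity and consistent clustering, but it assumes that every sample contributes equally to the clustering. To address these issues, a possibilistic fuzzy kernel clustering algorithm based on sample-feature weighting is proposed. It brings possibilistic clustering into fuzzy clustering to improve robustness to noise and outliers. At the same time, according to the specific characteristics of each class, it dynamically computes an importance weight of each attribute feature for each class and an importance weight of each sample for the clustering, and it optimizes the kernel parameters, continually adjusting the kernel function so that a data set that is not linearly separable in the original space is mapped to a separable data set in a high-dimensional space. Experimental results show that the sample-feature weighted fuzzy clustering algorithm reduces the influence of noisy data and outliers and achieves better clustering accuracy than traditional clustering algorithms.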

17.
Zhong Zhi, Chen Long. Multimedia Tools and Applications, 2019, 78(23): 33339-33356

For many machine learning and data mining tasks in the information explosion environment, one is often confronted with very high-dimensional heterogeneous data. Demand has increased for new methods that select discriminative and valuable features beneficial to classification and clustering. In this paper, we propose a novel feature selection method that jointly maps the original data from the input space to a kernel space and conducts both subspace learning (via locality preserving projection) and feature selection (via a sparsity constraint). Specifically, the nonlinear relationship between data is explored adequately by mapping the data from the original low-dimensional space to the kernel space. Meanwhile, the subspace learning technique is leveraged to preserve the available information about local structure in the ambient space. Last, by restricting the sparsity of the coefficient matrix, the weights of some features are set to 0. As a result, we eliminate redundant and irrelevant features, so that our method selects informative and distinguishing features. The experimental results demonstrate that the proposed method outperforms several state-of-the-art methods on clustering tasks.

18.
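Semi-supervised kernel clustering based on seed sets   Cited by: 2 (self-citations: 1, citations by others: 1)
A new semi-supervised kernel clustering algorithm, the SKK-means algorithm, is proposed. The algorithm forms a seed set from a number of labeled samples and uses it as supervisory information to initialize the cluster centers of the K-means algorithm, guide the clustering process, and constrain the data partition. It also uses the kernel method to map the input data to a high-dimensional feature space and computes the distances between samples with a kernel function. Numerical experiments were carried out on UCI data sets, with comparisons against the K-means and kernel K-means algorithms.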
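
The seeding idea can be sketched as kernel k-means whose initial cluster means are the feature-space means of the labeled seeds. This is a hedged illustration, not the SKK-means algorithm itself: the function names and the `seeds` dictionary are hypothetical, and keeping the seed labels fixed during the iterations is an assumption about how the partition is constrained.

```python
import numpy as np

def feature_dist2(K, members):
    """Squared feature-space distance from every sample to the mean of `members`,
    computed from the kernel matrix K only."""
    return (np.diag(K) - 2.0 * K[:, members].mean(axis=1)
            + K[np.ix_(members, members)].mean())

def seeded_kernel_kmeans(K, seeds, n_iter=50):
    """seeds: dict {cluster_id: list of indices of labeled samples} (hypothetical format)."""
    n = K.shape[0]
    ids = sorted(seeds)
    members = {c: np.asarray(seeds[c]) for c in ids}  # initial clusters are the seed sets
    labels = np.full(n, -1)
    for _ in range(n_iter):
        dist = np.column_stack([feature_dist2(K, members[c]) for c in ids])
        new = np.asarray(ids)[dist.argmin(axis=1)]
        for c in ids:
            new[seeds[c]] = c  # assumption: seeds always keep their given label
        if np.array_equal(new, labels):
            break
        labels = new
        members = {c: np.flatnonzero(labels == c) for c in ids}
    return labels
```

Because every cluster always contains its seeds, no cluster can become empty, which removes one common failure mode of randomly initialized kernel k-means.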

19.
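A new robust kernel fuzzy C-means clustering algorithm is proposed. Combining the connectivity kernel with the AFCM (Alternative fuzzy C-means) clustering algorithm yields a connectivity-kernel-based kernel AFCM: CRKFCM (Connectivity kernel based robust fuzzy C-means). On the one hand, CRKFCM makes effective use of the connectivity kernel, so it can cluster data of arbitrary shape and avoids the problem of choosing kernel parameters; on the other hand, it uses a non-Euclidean distance in the feature space, so it can effectively handle the clustering of noisy data. Experimental results show that, compared with the original AFCM and the connectivity kernel hard C-means (CKHCM, Connectivity kernel based hard C-means) clustering algorithms, the new algorithm is more effective at clustering arbitrary shapes in noisy environments.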

20.
A new data clustering algorithm, Density oriented Kernelized version of Fuzzy c-means with new distance metric (DKFCM-new), is proposed. It creates noiseless clusters by identifying noise points and assigning them to a separate cluster. In an earlier work, the Density Based Fuzzy C-Means (DOFCM) algorithm with a Euclidean distance metric was proposed, which only considered the distance between the cluster centroid and the data points. In this paper, we try to improve the performance of DOFCM by incorporating a new distance measure that also considers the distance variation within a cluster to regularize the distance between a data point and the cluster centroid. This paper presents the kernel version of the method. Experiments are done using two-dimensional synthetic data sets, standard data sets from previous papers such as the DUNN and Bensaid data sets, and real-life high-dimensional data sets such as the Wisconsin Breast Cancer and Iris data. The proposed method is compared with other kernel methods, various noise-resistant methods such as PCM, PFCM, CFCM, and NC, and credal-partition-based clustering methods such as ECM, RECM, and CECM. Results show that the proposed algorithm significantly outperforms its earlier version and other competitive algorithms.
