1.
This article describes a multiobjective spatial fuzzy clustering algorithm for image segmentation. To obtain satisfactory segmentation of noisy images, the proposed method introduces non-local spatial information derived from the image into fitness functions that consider, respectively, the global fuzzy compactness of and the fuzzy separation among the clusters. After the set of non-dominated solutions is produced, the final clustering solution is chosen by a cluster validity index that also uses the non-local spatial information. Moreover, to evolve the number of clusters automatically, a real-coded variable-string-length technique is used to encode the cluster centers in the chromosomes. The proposed method is applied to synthetic and real images contaminated by noise and compared with k-means, fuzzy c-means, two fuzzy c-means clustering algorithms with spatial information, and a multiobjective variable-string-length genetic fuzzy clustering algorithm. The experimental results show that the proposed method evolves the number of clusters well and achieves satisfactory segmentation of noisy images.
2.
To optimize transfer prototype clustering, this paper builds on fuzzy-knowledge-matching transfer prototype clustering and describes the transfer learning mechanism from source domain to target domain in clustering scenarios, in which source-domain cluster centers help the target domain achieve better clustering. However, existing transfer mechanisms of this kind still face the following challenges: 1) how to overcome the negative effects of forced knowledge matching between different classes in existing transfer prototype clustering methods; 2) when the similarity between the source and target domains is low, how to avoid the unreasonableness of fuzzy forced matching and the amplified drawback of over-reliance on source-domain knowledge. To this end, a new transfer prototype clustering mechanism, possibilistic-matching knowledge transfer, is studied, and two concrete transfer clustering algorithms are implemented on its basis. Borrowing the idea of possibilistic matching, these algorithms can automatically select and emphasize useful source-domain knowledge, overcome the forced-matching restriction between source and target domains, and offer good adjustability. Experiments on simulated datasets and the real NG20groups dataset under different transfer scenarios show that the proposed algorithms outperform existing related algorithms.
3.
The statistical properties of training, validation and test data play an important role in assuring optimal performance of artificial neural networks (ANNs). Researchers have proposed optimized data partitioning (ODP) and stratified data partitioning (SDP) methods to partition input data into training, validation and test datasets. ODP methods based on genetic algorithms (GA) are computationally expensive, as the random search space can be in the power of twenty or more for an average-sized dataset. For SDP methods, clustering algorithms such as the self-organizing map (SOM) and fuzzy clustering (FC) are used to form strata, under the assumption that data points within any individual stratum are in close statistical agreement. The reported clustering algorithms are designed to form natural clusters; in large multivariate datasets, some of these natural clusters can be big enough that the furthest data vectors are statistically far from the mean, and these algorithms are computationally expensive as well. We propose a custom-design clustering algorithm (CDCA) to overcome these shortcomings. Comparisons are made using three benchmark case studies, one each from the classification, function approximation and prediction domains. The proposed CDCA data partitioning method is evaluated against SOM-, FC- and GA-based data partitioning methods. It is found that the CDCA method not only performs well but also reduces the average CPU time.
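The stratified-partitioning idea above, independent of which clustering algorithm forms the strata, can be sketched as follows; the function name, the 60/20/20 split, and the per-stratum proportional sampling are illustrative assumptions, not the paper's CDCA.

```python
import numpy as np

def stratified_partition(X, labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split every stratum (cluster) proportionally into
    training/validation/test index sets, so each subset inherits
    the statistical make-up of the whole dataset."""
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_val])
        parts[2].extend(idx[n_train + n_val:])
    return tuple(np.array(sorted(p)) for p in parts)
```

Because every stratum contributes to every subset in the same proportion, the training, validation and test sets stay in close statistical agreement, which is the property SDP methods rely on.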
4.
Density-based clustering algorithms are attractive for the task of class identification in spatial databases. However, in many cases clusters of very different local density exist in different regions of the data space, so the DBSCAN method [M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: E. Simoudis, J. Han, U.M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI, Menlo Park, CA, 1996, pp. 226–231], which uses a global density parameter, is not suitable. Although OPTICS [M. Ankerst, M.M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, in: A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), Proceedings of ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, ACM, New York, 1999, pp. 49–60] provides an augmented ordering of the database to represent its density-based clustering structure, it only generates clusters whose local density exceeds certain thresholds, not clusters of similar local density; in addition, it does not produce the clusters of a data set explicitly. Furthermore, the parameters required by almost all the major clustering algorithms are hard to determine, although they significantly impact the clustering result. In this paper, a new clustering algorithm, LDBSCAN, relying on a local-density-based notion of clusters is proposed. In this technique, the selection of appropriate parameters is not difficult; compared with other density-based clustering algorithms, it also takes advantage of the LOF [M.M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, in: W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), Proceedings of ACM SIGMOD International Conference on Management of Data, Dallas, TX, ACM, New York, 2000, pp. 93–104] to detect noise.
The proposed algorithm has potential applications in business intelligence.
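For reference, the LOF score from the cited Breunig et al. paper, which supplies the local-density notion LDBSCAN builds on, can be sketched in a brute-force O(n²) form; the exhaustive neighbour search is a simplification for illustration.

```python
import numpy as np

def lof(X, k=3):
    """Local Outlier Factor: lrd = inverse mean reachability distance
    to the k nearest neighbours; LOF = mean ratio of the neighbours'
    lrd to the point's own lrd (about 1 inside clusters, large for
    points in sparse regions)."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours
    kdist = np.sort(d, axis=1)[:, k - 1]        # k-distance of each point
    reach = np.maximum(kdist[nn], np.take_along_axis(d, nn, axis=1))
    lrd = 1.0 / reach.mean(axis=1)              # local reachability density
    return lrd[nn].mean(axis=1) / lrd
```

Points deep inside a cluster score near 1, while a point far from any dense region scores much higher, which is how a local-density method can label it as noise.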
5.
In this paper, we define a validity measure for fuzzy criterion clustering, a novel approach to fuzzy clustering that, in addition to being non-distance-based, addresses the cluster validity problem. The model is then recast as a bilevel fuzzy criterion clustering problem, and we propose an algorithm for this model that solves both the validity and clustering problems. Our approach is validated on several sample problems.
6.
Joris D’hondt Joris Vertommen Paul-Armand Verhaegen Dirk Cattrysse Joost R. Duflou 《Information Sciences》2010,180(12):2341-3797
This paper introduces a novel pairwise-adaptive dissimilarity measure for large, high-dimensional document datasets that improves unsupervised clustering quality and speed compared to the original cosine dissimilarity measure. The measure dynamically selects a number of important features of the compared pair of document vectors. Two approaches to selecting the number of features in the application of the measure are discussed. The proposed feature selection process makes this dissimilarity measure especially applicable to large, high-dimensional document collections. Its performance is validated on several test sets originating from standardized datasets. The dissimilarity measure is compared to the well-known cosine dissimilarity measure using the average F-measures of the hierarchical agglomerative clustering results. The new dissimilarity measure yields an improved clustering result at a lower computational cost.
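A minimal sketch of the pairwise-adaptive idea, under the assumption that "important features" means the highest-weighted terms of each vector in the compared pair; the paper's two schemes for choosing the number of features are not reproduced, so `k` is a hypothetical fixed parameter.

```python
import numpy as np

def pairwise_adaptive_dissim(u, v, k=5):
    """Cosine dissimilarity restricted to the union of the top-k
    features (by weight) of each of the two document vectors."""
    sel = np.union1d(np.argsort(u)[-k:], np.argsort(v)[-k:])
    us, vs = u[sel], v[sel]
    denom = np.linalg.norm(us) * np.linalg.norm(vs)
    if denom == 0.0:
        return 1.0          # no shared mass: maximally dissimilar
    return 1.0 - float(us @ vs) / denom
```

Restricting the cosine computation to the union of the two top-k index sets is what makes the measure cheap on sparse, high-dimensional document vectors while ignoring low-weight noise terms.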
7.
Partitional clustering of categorical data is normally performed with the K-modes clustering algorithm, which works well for large datasets. Even though the design and implementation of the K-modes algorithm is simple and efficient, it has the pitfall of randomly choosing the initial cluster centers on every new execution, which may lead to non-repeatable clustering results. This paper addresses the randomized center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clusterings of the data based on the values in different attributes and yields deterministic modes that are used as initial cluster centers. We propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with an existing method for finding Significant attributes for unsupervised learning, and perform multiple clusterings of the data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. Its worst-case time complexity is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets, comparing it against random initialization and two other initialization methods, and show that it performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the data we tested, which leads to faster convergence of the K-modes clustering algorithm in conjunction with better clustering results.
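For context, the base K-modes iteration that any such initialization feeds into can be sketched as follows (categories encoded as small non-negative integers; the proposed Prominent-attribute initialization itself is not reproduced here).

```python
import numpy as np

def k_modes(X, modes, iters=10):
    """Plain K-modes: dissimilarity = number of mismatching attributes,
    cluster centers = column-wise modes of the assigned objects."""
    modes = modes.copy()
    for _ in range(iters):
        d = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        a = d.argmin(axis=1)
        for c in range(len(modes)):
            if (a == c).any():
                modes[c] = [np.bincount(col).argmax() for col in X[a == c].T]
    return modes, a
```

Because the only randomness in K-modes is the choice of the `modes` argument, a deterministic initialization makes the whole run repeatable.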
8.
This paper studies fuzzy clustering algorithms for interval-valued data and presents a new fuzzy C-means clustering algorithm with a weighting factor that controls the influence of interval size on the clustering result. For this algorithm, an elastic, distance-adaptive coefficient is proposed as an improvement: the method retains the traditional FCM algorithm while accounting for the influence of interval size on the clustering result, and it can also discover irregular cluster subsets, making the clustering result more accurate.
9.
Masud Moshtaghi, Sutharshan Rajasegarar, Shanika Karunasekera 《Pattern recognition》2011,44(9):2197-2209
Clustering has been widely used as a fundamental data mining tool for the automated analysis of complex datasets. There has been a growing need for the use of clustering algorithms in embedded systems with restricted computational capabilities, such as wireless sensor nodes, in order to support automated knowledge extraction from such systems. Although there has been considerable research on clustering algorithms, many of the proposed methods are computationally expensive. We propose a robust clustering algorithm with low computational complexity, suitable for computationally constrained environments. Our evaluation using both synthetic and real-life datasets demonstrates lower computational complexity and comparable accuracy of our approach compared to a range of existing methods.
10.
Fuzzy c-means (FCM) is one of the most popular techniques for data clustering. Since FCM tends to balance the number of data points in each cluster, the centers of smaller clusters are forced to drift toward larger adjacent clusters, so for datasets with unbalanced clusters the partition results of FCM are usually unsatisfactory. Cluster-size-insensitive FCM (csiFCM) deals with this "cluster-size sensitivity" problem by dynamically adjusting the condition value for the membership of each data point, based on cluster size, after the defuzzification step in each iterative cycle. However, the performance of csiFCM is sensitive to both the initial positions of the cluster centers and the "distance" between adjacent clusters. In this paper, we present a cluster-size-insensitive, integrity-based FCM method called siibFCM to remedy this deficiency of csiFCM. The siibFCM method determines the membership contribution of every data point to each individual cluster by considering the cluster's integrity, a combination of compactness and purity: "compactness" represents the distribution of data points within a cluster, while "purity" represents how far a cluster is from its adjacent cluster. We tested siibFCM extensively against the traditional FCM and csiFCM methods using artificially generated datasets with different shapes and data distributions, synthetic images, real images, and the Escherichia coli dataset. Experimental results show that siibFCM is superior to both traditional FCM and csiFCM in terms of tolerance for the "distance" between adjacent clusters and flexibility in selecting initial cluster centers when dealing with datasets with unbalanced clusters.
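The baseline FCM that csiFCM and siibFCM modify alternates the standard membership and center updates; a minimal sketch of plain FCM only, with none of the size-insensitive adjustments:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: centers are membership-weighted means,
    memberships are inverse-distance ratios with fuzzifier m > 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

The membership-balancing behaviour criticized above is visible in the update: every row of U sums to 1, so small clusters near large ones bleed membership mass toward the larger cluster.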
11.
Liang Bai, Jiye Liang, Chuangyin Dang, Fuyuan Cao 《Pattern recognition》2011,44(12):2843-2861
Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. To address this issue effectively, this paper presents a new optimization algorithm for clustering high-dimensional categorical data, an extension of the k-modes clustering algorithm. In the proposed algorithm, a novel weighting technique for categorical data is developed to calculate two weights for each attribute (or dimension) in each cluster, and the weight values are used to identify the subsets of important attributes that categorize different clusters. The convergence of the algorithm under an optimization framework is proved. The performance and scalability of the algorithm are evaluated experimentally on both synthetic and real data sets. The experimental studies show that the proposed algorithm is effective in clustering categorical data sets and scalable to large data sets owing to its linear time complexity with respect to the number of data objects, attributes or clusters.
12.
Jung-Hsien Chiang 《Pattern recognition》2007,40(11):3132-3145
This paper presents a new partitioning algorithm, designated as the Adaptive C-Populations (ACP) clustering algorithm, capable of identifying natural subgroups and influential minor prototypes in an unlabeled dataset. In contrast to traditional Fuzzy C-Means clustering algorithms, which partition the whole dataset equally, adaptive clustering algorithms, such as that presented in this study, identify the natural subgroups in unlabeled datasets. In this paper, data points within a small, dense region located at a relatively large distance from any of the major cluster centers are considered to form a minor prototype. The aim of ACP is to adaptively separate these isolated minor clusters from the major clusters in the dataset. The study commences by introducing the mathematical model of the proposed ACP algorithm and demonstrates its convergence to a stable solution. The ability of ACP to detect minor prototypes is confirmed via its application to the clustering of three different datasets with different sizes and characteristics.
13.
To address the poor performance of the data competitive clustering algorithm on datasets with complex structure, a density-sensitive data competitive clustering algorithm is proposed. First, a local distance is defined on the basis of a density-sensitive distance measure to describe the local consistency of the data distribution. Second, a global distance between data points is computed from the local distances to describe the global consistency of the data distribution and mine its spatial structure, compensating for the inability of the Euclidean distance to describe global consistency. Finally, the global distance is used in the data competitive clustering algorithm. Experiments on both artificial and real datasets, comparing the new algorithm with the Euclidean-distance-based data competitive clustering algorithm, show that it overcomes the difficulty of handling data with complex structure and produces more accurate clustering results.
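A sketch of the local/global construction described above, assuming the common density-sensitive formulation l(x, y) = ρ^d(x,y) − 1 with flexing factor ρ > 1 (the paper's exact measure may differ):

```python
import numpy as np

def density_sensitive_global_dist(X, rho=2.0):
    """Local distance l(x, y) = rho**d(x, y) - 1 stretches edges that
    cross low-density gaps; the global distance is the shortest path
    under l, so paths through chains of nearby points stay short."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    g = rho ** d - 1.0                       # local, density-sensitive edges
    for k in range(len(X)):                  # Floyd-Warshall shortest paths
        g = np.minimum(g, g[:, k:k + 1] + g[k:k + 1, :])
    return g
```

Under the shortest-path global distance, two points joined by a chain of close neighbours are near each other even when their Euclidean distance is large, which is the global-consistency property the algorithm exploits.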
14.
The application of spectral clustering has begun to extend from image segmentation into text mining, with promising results. On the basis of automatically determining the number of clusters, this paper combines fuzzy theory with spectral clustering and proposes a fuzzy clustering algorithm for multi-document clustering; the algorithm mainly describes a fuzzy spectral clustering method in which a single document can belong to several document classes simultaneously. Simulation results show that the algorithm achieves good clustering performance.
15.
Ming-Chao Chiang, Chun-Wei Tsai, Chu-Sing Yang 《Information Sciences》2011,181(4):716-3410
This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. The proposed algorithm works by compressing and removing at each iteration patterns that are unlikely to change their membership thereafter. Not only is the proposed algorithm simple and easy to implement, but it can also be applied to many other iterative clustering algorithms such as kernel-based and population-based clustering algorithms. Our experiments—from 2 to 1000 dimensions and 150 to 10,000,000 patterns—indicate that with a small loss of quality, the proposed algorithm can significantly reduce the computation time of all state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets.
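The compress-and-remove step can be sketched as follows; the stability test (`patience` consecutive unchanged assignments) and the weighted merging of removed patterns are assumptions about the mechanism, not the paper's exact rules.

```python
import numpy as np

def pr_kmeans(X, k, iters=20, patience=2, seed=0):
    """k-means with pattern reduction: points whose assignment has not
    changed for `patience` consecutive iterations are merged into one
    weighted representative per cluster and removed from later
    distance computations."""
    rng = np.random.default_rng(seed)
    pts = X.astype(float).copy()
    w = np.ones(len(pts))                       # weight carried by each point
    centers = pts[rng.choice(len(pts), k, replace=False)]
    prev = np.full(len(pts), -1)
    stable = np.zeros(len(pts), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        a = d.argmin(axis=1)
        stable = np.where(a == prev, stable + 1, 0)
        for c in range(k):                      # weighted center update
            m = a == c
            if m.any():
                centers[c] = np.average(pts[m], axis=0, weights=w[m])
        drop = stable >= patience               # compress stable patterns
        if drop.any():
            keep = ~drop
            reps, rw, ra = [], [], []
            for c in range(k):
                m = drop & (a == c)
                if m.any():
                    reps.append(np.average(pts[m], axis=0, weights=w[m]))
                    rw.append(w[m].sum())
                    ra.append(c)
            pts = np.vstack([pts[keep], np.array(reps)])
            w = np.concatenate([w[keep], rw])
            a = np.concatenate([a[keep], ra])
            stable = np.concatenate([stable[keep], np.zeros(len(ra), dtype=int)])
        prev = a
    return centers
```

Merging stable points into weighted representatives keeps the center updates exact for those points while shrinking the distance matrix, which is where the reported speed-up comes from.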
16.
The performance of the k-means clustering algorithm depends on the choice of distance measure, and k-means most commonly uses the Euclidean distance. The Euclidean distance treats all attributes as equally important in clustering, but this does not accurately reflect the dissimilarity between samples. To address this shortcoming, a k-means clustering method incorporating the coefficient of variation (CV-k-means) is proposed, which uses a weight vector of coefficients of variation to reduce the influence of irrelevant attributes. Experimental results show that the clustering results of this method are better than those of k-means.
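A sketch of the CV-weighting idea, under the assumption that the weight vector is the normalized per-attribute coefficient of variation and that it enters k-means through a weighted Euclidean distance (the paper's exact normalization is not reproduced here).

```python
import numpy as np

def cv_weights(X):
    """Per-attribute weight = coefficient of variation (std / |mean|),
    normalized to sum to one; near-constant attributes get near-zero
    weight. Assumes no attribute has zero mean."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    return cv / cv.sum()

def weighted_kmeans(X, k, w, iters=50, seed=0):
    """k-means using the squared Euclidean distance weighted by w."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        a = d.argmin(axis=1)
        for c in range(k):
            if (a == c).any():
                centers[c] = X[a == c].mean(axis=0)
    return centers, a
```

Attributes that barely vary contribute almost nothing to the weighted distance, so irrelevant attributes no longer dilute the cluster structure.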
17.
Clustering is an important research area with numerous applications in pattern recognition, machine learning, and data mining. Since the clustering problem on numeric data sets can be formulated as a typical combinatorial optimization problem, many studies have addressed the design of heuristic algorithms for finding sub-optimal solutions in a reasonable period of time. However, most heuristic clustering algorithms are sensitive to initialization and do not guarantee high-quality results. Recently, the Approximate Backbone (AB), i.e., the commonly shared intersection of several sub-optimal solutions, has been proposed to address this sensitivity to initialization. In this paper, we introduce the AB into heuristic clustering to overcome the initialization sensitivity of conventional heuristic clustering algorithms. The main advantage of the proposed method is its capability to restrict the initial search space around the optimal result by defining the AB, in turn reducing the impact of initialization on clustering and eventually improving the performance of heuristic clustering. Experiments on synthetic and real-world data sets validate the effectiveness of the proposed approach in comparison with three conventional heuristic clustering algorithms and three other algorithms with improved initialization.
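The AB construction can be sketched as follows, under the assumption that labelings are aligned across runs by matching nearest centers and that the backbone is the set of consistently labeled points.

```python
import numpy as np

def kmeans(X, k, init, iters=50):
    """Plain k-means from a given initialization."""
    centers = init.copy()
    for _ in range(iters):
        a = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if (a == c).any():
                centers[c] = X[a == c].mean(axis=0)
    return centers, a

def approximate_backbone(X, k, runs=5, seed=0):
    """Run k-means from several random initializations, align each
    labeling to the first run via nearest centers, and keep the points
    whose label agrees across all runs as the approximate backbone."""
    rng = np.random.default_rng(seed)
    base_c, base_a = kmeans(X, k, X[rng.choice(len(X), k, replace=False)])
    agree = np.ones(len(X), dtype=bool)
    for _ in range(runs - 1):
        c, a = kmeans(X, k, X[rng.choice(len(X), k, replace=False)])
        mapping = np.linalg.norm(c[:, None] - base_c[None], axis=2).argmin(axis=1)
        agree &= mapping[a] == base_a
    return agree                                # mask of backbone points
```

Restarting the heuristic with initial centers drawn from the backbone points, rather than from the whole dataset, is the restriction of the search space that the paper attributes the improvement to.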
18.
This paper demonstrates the similarity between the single-link clustering method and a recently proposed clustering method based on mode-seeking. A user of clustering algorithms should be aware of this fact when attempting to assess clusters from different points of view.
19.
To address the distortion caused by using a single type of data in classification research, a two-stage optimized clustering algorithm that jointly considers product attributes and sales time series is proposed. Products are first clustered separately, using attribute-based similarity ranking and hierarchical optimized clustering of the time series; then, based on the initial clustering results and parameterized dynamic relative weights, a hierarchical clustering method with noise handling is proposed to produce an integrated, optimized product classification. An applied enterprise case study shows that the integrated clustering model and the two-stage algorithm have clear advantages in clustering accuracy and time complexity, and that the dynamic parameterization of the relative weights effectively represents the individual differences between products. Simulations on general-purpose datasets further verify the superiority and broad applicability of the algorithm for clustering products with mixed attributes.
20.
Spectral clustering: A semi-supervised approach
Recently, graph-based spectral clustering algorithms have been developing rapidly; they are posed as discrete combinatorial optimization problems and approximately solved by relaxing them into tractable eigenvalue decomposition problems. In this paper, we first review the existing spectral clustering algorithms within a unified framework and give a straightforward explanation of spectral clustering. We also present a novel model that generalizes unsupervised spectral clustering to semi-supervised spectral clustering. Under this model, prior information given by instance-level constraints can be generalized to space-level constraints. We find that the (undirected) graph built on the enlarged prior information is more meaningful, and hence the boundaries of the clusters are more correct. Experimental results on toy data, real-world data and image segmentation demonstrate the advantages of the proposed model.
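For reference, the unsupervised baseline that the semi-supervised model extends can be sketched as follows (Ng/Jordan/Weiss-style normalized spectral clustering; the constraint step that would modify the affinity matrix W with instance-level prior information is not reproduced).

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0):
    """Build an RBF affinity, take the top-k eigenvectors of the
    normalized affinity D^{-1/2} W D^{-1/2}, and run k-means on the
    row-normalized spectral embedding."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    L = dinv[:, None] * W * dinv[None, :]
    V = np.linalg.eigh(L)[1][:, -k:]             # top-k eigenvectors
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    centers = [V[0]]                             # farthest-first seeding
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(V - c, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(50):                          # k-means on the embedding
        a = np.linalg.norm(V[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if (a == c).any():
                centers[c] = V[a == c].mean(axis=0)
    return a
```

In the semi-supervised setting described above, must-link and cannot-link constraints would raise or lower entries of W before the eigendecomposition, propagating instance-level prior knowledge to whole regions of the space.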