20 similar documents found; search time: 0 ms
1.
Sriparna Saha, Sanghamitra Bandyopadhyay. Pattern Recognition, 2010, 43(3): 738-751
In this paper the problem of automatically clustering a data set is posed as a multiobjective optimization (MOO) problem in which a set of cluster validity indices is optimized simultaneously. The proposed multiobjective clustering technique uses a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. A variable number of cluster centers is encoded in each string, so the number of clusters varies over a range across strings. Points are assigned to clusters using the newly developed point symmetry based distance rather than the Euclidean distance. Two cluster validity indices, the Euclidean distance based XB-index and the recently developed point symmetry distance based Sym-index, are optimized simultaneously to determine the appropriate number of clusters in a data set. The proposed technique is thus able to detect both the proper number of clusters and the appropriate partitioning in data sets having either hyperspherical clusters or point symmetric clusters. A new semi-supervised method is also proposed to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown on seven artificial and six real-life data sets of varying complexity. Results are also compared with those of another multiobjective clustering technique, MOCK, and two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.
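The point symmetry based distance used for cluster assignment above can be sketched as follows. This is a minimal sketch assuming the commonly cited form d_ps(x, c) = d_sym(x, c) * d_e(x, c), where d_sym is the average distance from the reflected point 2c - x to its knear = 2 nearest data points; the threshold test the published method applies before falling back to Euclidean assignment is omitted, and the helper names are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ps_distance(x, c, data, knear=2):
    """Point symmetry based distance d_ps(x, c) = d_sym(x, c) * d_e(x, c).

    d_sym is the mean distance from the reflected point 2c - x to its
    `knear` nearest neighbours in `data` (brute-force search; knear=2 is an assumption).
    """
    reflected = [2 * ci - xi for xi, ci in zip(x, c)]
    nearest = sorted(euclid(reflected, p) for p in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * euclid(x, c)

def assign(data, centers, knear=2):
    """Assign each point to the center with the smallest PS distance.

    Hypothetical simplification: no symmetry threshold, no Euclidean fallback.
    """
    return [min(range(len(centers)),
                key=lambda k: ps_distance(x, centers[k], data, knear))
            for x in data]
```

A point lying on a symmetric shape gets a small d_sym because its mirror image through the center is close to an actual data point.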
2.
In this paper a new framework based on multiobjective optimization (MOO), namely FeaClusMOO, is proposed that is capable of identifying the correct partitioning as well as the most relevant set of features in a data set. A newly developed multiobjective simulated annealing based optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization strategy. Features and cluster centers are encoded together in a string. Three objective functions are used: two internal cluster validity indices, measuring the goodness of the obtained partitioning using Euclidean distance and point symmetry based distance respectively, and a count of the number of features. These three objectives are optimized simultaneously using AMOSA in order to detect the appropriate subset of features, the appropriate number of clusters and the appropriate partitioning. Points are allocated to clusters using a point symmetry based distance. Mutation changes the feature combination as well as the set of cluster centers. Since AMOSA, like any other MOO technique, provides a set of solutions on the final Pareto front, a technique based on the concept of semi-supervised classification is developed to select a single solution from this set. The effectiveness of FeaClusMOO is shown on seven higher-dimensional real-life data sets in comparison with other clustering techniques: its Euclidean distance based version, in which Euclidean distance is used for cluster assignment; a genetic algorithm based automatic clustering technique (VGAPS-clustering) that uses point symmetry based distance with all the features; and K-means clustering with all the features.
3.
In this paper a fuzzy point symmetry based genetic clustering technique (Fuzzy-VGAPS) is proposed that can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape or convexity as long as they possess the property of symmetry. The membership values of points to different clusters are computed using the newly proposed point symmetry based distance. A variable number of cluster centers is encoded in the chromosomes. A new fuzzy symmetry based cluster validity index, FSym-index, is first proposed here and then used to measure the fitness of the chromosomes. The proposed index can detect non-convex as well as convex non-hyperspherical partitionings with a variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function, Dunn's index, for which the condition of uniqueness has already been established. The results of Fuzzy-VGAPS are compared with those of seven other algorithms, including both fuzzy and crisp methods, on four artificial and four real-life data sets. Real-life applications of Fuzzy-VGAPS to automatically clustering gene expression data and to segmenting magnetic resonance brain images with multiple sclerosis lesions are also demonstrated.
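Since FSym-index is justified through its relationship to Dunn's index, a minimal sketch of the hard Dunn's index may help. It assumes the classic variant (smallest point-to-point distance between clusters divided by the largest within-cluster diameter); other variants exist, and FSym-index itself is not implemented here.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn's index for a hard partition given as a list of point lists.

    DI = (smallest distance between points of different clusters)
         / (largest within-cluster diameter).
    Larger values indicate compact, well-separated clusters.
    """
    min_sep = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a for q in b
    )
    max_diam = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,  # every cluster is a singleton
    )
    return min_sep / max_diam if max_diam > 0 else math.inf
```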
4.
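A new clustering algorithm based on the concept of point (central) symmetry is proposed. Using this symmetry based algorithm, hyperspherical clusters in a given data set can be detected. In detecting hyperspherical objects, the algorithm substantially outperforms traditional algorithms. It can be applied to data clustering and face recognition, and experimental results demonstrate its effectiveness.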
5.
6.
Information Processing Letters, 2014, 114(6): 287-293
Segmentation in the feature space of an image can be formulated as an optimization problem. Recent research has shown that clustering techniques using only one objective may not obtain a suitable solution, because a single objective function gives satisfactory results only for the kind of data set it suits. In this letter, a novel multiobjective clustering approach, the quantum-inspired multiobjective evolutionary clustering algorithm (QMEC), is proposed for image segmentation, in which two objectives are optimized simultaneously. Based on the concepts and principles of quantum computing, multi-state quantum bits are used to represent individuals, and a quantum rotation gate strategy is used to update the probabilistic individuals. The proposed algorithm takes advantage of both the multiobjective optimization mechanism and the superposition of quantum states, and therefore has good population diversity and search capability. Because multiobjective clustering yields a set of nondominated solutions, a simple heuristic method is adopted to select a preferred solution from the final Pareto front, and the results show that it selects a good segmentation. Experiments on one simulated synthetic aperture radar (SAR) image and two real SAR images show the superiority of QMEC over three other known algorithms.
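The quantum rotation gate update can be illustrated with a minimal sketch using ordinary binary qubits (alpha, beta) in the style of classic quantum-inspired evolutionary algorithms. This is an assumption: QMEC uses multi-state quantum bits, and the simple sign rule and step size below are hypothetical choices, not the paper's lookup table.

```python
import math
import random

def observe(qubits, rng=random):
    """Collapse each qubit (alpha, beta) to bit 1 with probability beta**2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in qubits]

def rotate(qubit, delta):
    """Apply the rotation gate [[cos d, -sin d], [sin d, cos d]] to (alpha, beta)."""
    alpha, beta = qubit
    c, s = math.cos(delta), math.sin(delta)
    return (c * alpha - s * beta, s * alpha + c * beta)

def update(qubits, observed, best, step=0.01 * math.pi):
    """Rotate each qubit toward the bit value of the best solution found so far.

    Hypothetical sign rule: rotate toward |1> if the best bit is 1 and the
    observed bit differs, toward |0> if it is 0, and leave agreeing qubits alone.
    Assumes (alpha, beta) stays in the first quadrant, where a positive
    rotation increases beta.
    """
    new = []
    for q, x, b in zip(qubits, observed, best):
        if x == b:
            new.append(q)  # observed bit already agrees with the best solution
        else:
            new.append(rotate(q, step if b == 1 else -step))
    return new
```

Because the gate is a rotation, it preserves alpha**2 + beta**2 = 1, so each updated individual remains a valid probability distribution.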
7.
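To address the generally high computational complexity of existing clustering algorithms, a localization-based method is proposed. The algorithm uses spatial localization to map data objects into a feature space and locates any data point by means of certain special vertices of a spatial cube; clustering is then completed by computing the differences in distance between each data point and the group of cube vertices. Experimental results on a telecommunications data set show that the time complexity of the algorithm is reduced to O(N).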
8.
A new clustering technique for function approximation (total citations: 5; self-citations: 0; citations by others: 5)
To date, clustering techniques have always been oriented toward solving classification and pattern recognition problems. However, some authors have applied them unchanged to construct initial models for function approximators. Classification and function approximation problems nevertheless have quite different objectives, so it is necessary to design new clustering algorithms specialized for function approximation. This paper presents a new clustering technique, specially designed for function approximation problems, which improves the performance of the resulting approximator compared with models derived from traditional classification-oriented clustering algorithms and input-output clustering techniques.
9.
With the increasing use of online resources, the size of bibliographic databases grows day by day. The huge amount of available data belongs to various entities, and it is difficult to automatically identify the records that belong to a particular entity. Mapping records to their corresponding entity is termed the entity matching problem. In a bibliographic database many attributes change over time: for example, an author's affiliation changes frequently, many authors use different email IDs, and the names of co-authors also change with time. All these aspects make the entity matching problem challenging. Generally, an entity matching task is carried out by constructing a feature vector to represent each record and then training a classifier to classify the feature vectors. For bibliographic databases, however, it is very difficult and time consuming to generate manually annotated labeled data to train a classifier. Motivated by this observation, we propose an unsupervised approach to the entity matching problem using the non-dominated sorting genetic algorithm II (NSGA-II). A new encoding strategy is used to encode the clusters in the form of a chromosome, and new mutation and crossover operators suitable for bibliographic data clustering are proposed. Different distance measures are used to measure the dissimilarities between records. Finally, solutions are evolved using the search capability of NSGA-II. Experimental evaluations are carried out with 247 different combinations of eight objective functions on eight different bibliographic datasets. A comparative analysis with two existing systems, DBLP and ArnetMiner, shows that the proposed technique can produce better results in many cases.
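The non-dominated sorting step that gives NSGA-II its name can be sketched as below. This follows the standard fast non-dominated sort, assuming all objectives are minimised; the chromosome encoding, the crossover and mutation operators, and the crowding-distance step described or implied above are not shown.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objs):
    """Return a list of fronts, each a list of indices into `objs`."""
    n = len(objs)
    dominated_by = [[] for _ in range(n)]  # solutions that p dominates
    domination_count = [0] * n             # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)            # p belongs to the first (Pareto) front
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    nxt.append(q)          # all of q's dominators are in earlier fronts
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                     # drop the trailing empty front
```

For example, the objective vectors (1, 5), (2, 2), (5, 1), (3, 3), (4, 4) split into the fronts [0, 1, 2], [3] and [4].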
10.
The principal aim of this paper is to present a new and unconventional approach for locating the equilibrium point of a real-valued nonlinear objective function subject to nonlinear constraints. Most of the methods described in the literature rely on the fact that the stationary points of a function can be found by setting all its first derivatives to zero. In the method proposed here, the search is instead carried out on a positive-definite Liapunov Type of Function (LTF) of the gradients of a suitably modified objective function. By extending the concept of asymptotic stability of a discrete data control system, a search method is developed that ensures that the LTF progressively decreases in the direction of movement and eventually reaches a value of zero. This method provides an infallible and robust search technique for the solution of even highly complex nonlinear programming problems.
11.
M. Falasconi, A. Gutierrez, M. Pardo, G. Sberveglieri. Pattern Recognition, 2010, 43(4): 1292-74
An important goal in cluster analysis is the internal validation of results using an objective criterion. Of particular relevance in this respect is the estimation of the optimum number of clusters capturing the intrinsic structure of the data. This paper proposes a method to determine this optimum number based on evaluating the stability of fuzzy partitions under bootstrap resampling. The method is first characterized on synthetic data with respect to hyper-parameters, such as the fuzzifier, and spatial clustering parameters, such as feature space dimensionality, the clusters' degree of overlap, and the number of clusters. The method is then validated on experimental datasets. Furthermore, its performance is compared with that of a number of traditional fuzzy validity rules based on the cluster compactness-to-separation criterion. The proposed method provides accurate and reliable results and offers better generalization capability than the classical approaches.
12.
13.
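A general theoretical model of video object segmentation based on frame differencing is first discussed. On this basis, an algorithm for automatically detecting motion-change regions in video is proposed using fuzzy entropy clustering: a fuzzy classification criterion partitions the difference image into motion-change regions and relatively noisy regions, thereby yielding the motion-change regions. Simulation results show that applying the fuzzy entropy algorithm to difference images to detect motion-change regions is practical and feasible.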
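The frame-difference pipeline above can be sketched as follows. This assumes that "fuzzy entropy clustering" takes the common entropy-regularised fuzzy c-means form, with memberships u_k(x) proportional to exp(-(x - v_k)^2 / lam); the exact criterion used in the paper is not given, and lam = 50, the two-cluster setup and the 0.5 membership cut are hypothetical choices.

```python
import math

def frame_difference(prev, curr):
    """Flattened absolute per-pixel difference of two grey-level frames (lists of rows)."""
    return [abs(a - b) for ra, rb in zip(prev, curr) for a, b in zip(ra, rb)]

def entropy_fuzzy_2means(values, lam=50.0, iters=50):
    """Two-cluster fuzzy clustering with entropy regularisation (assumed form).

    u_k(x) = exp(-(x - v_k)^2 / lam) / sum_j exp(-(x - v_j)^2 / lam).
    Centers start at the minimum and maximum difference value.
    """
    centers = [min(values), max(values)]
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in values:
            d2 = [(x - v) ** 2 for v in centers]
            m = min(d2)  # subtract the minimum so the nearest center has weight 1
            w = [math.exp(-(d - m) / lam) for d in d2]
            s = sum(w)
            memberships.append([wk / s for wk in w])
        new_centers = []
        for k in range(2):
            total = sum(u[k] for u in memberships)
            new_centers.append(
                sum(u[k] * x for u, x in zip(memberships, values)) / total
                if total > 0 else centers[k])
        centers = new_centers
    return centers, memberships

def motion_mask(prev, curr, lam=50.0):
    """Flag pixels whose membership in the high-difference cluster exceeds 0.5."""
    diffs = frame_difference(prev, curr)
    centers, memberships = entropy_fuzzy_2means(diffs, lam)
    moving = 0 if centers[0] > centers[1] else 1
    return [u[moving] > 0.5 for u in memberships]
```

Small differences caused by noise fall into the low-valued cluster, while large differences caused by motion fall into the high-valued one.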
14.
In this paper, we develop a novel framework, called Monitoring Vehicle Outliers based on a Clustering technique (MVOC), for monitoring vehicle outliers caused by complex vehicle states. Vehicle outlier monitoring continuously checks the current condition of a vehicle. Most previous monitoring methods perform simple operations based on uncomplicated analyses or on the expected lifetimes of vehicle components. However, many serious vehicle outliers, such as a vehicle turning off while being driven, result from complex vehicle states influenced by correlated components. Rather than monitoring simple individual components as previous methods do, the proposed method monitors the current vehicle condition through more complex and varied vehicle states, using a clustering technique. We cluster vehicle data and then analyze the generated clusters together with information on vehicle outliers caused by complex correlations among vehicle components, which lets us learn vehicle information in more detail. To support MVOC, we also propose related techniques, such as sampling cluster data with representative attributes and deciding cluster characteristics on the basis of the relations between vehicle data and states. We then demonstrate, through various experiments, the performance of our approach in monitoring vehicle outliers on the basis of real complex correlations between outliers and vehicle data. Experimental results show that the proposed method can not only monitor complex outliers by predicting their likelihood of occurrence in advance but also outperform a standard technique. Moreover, we establish the statistical significance of the results through significance tests.
15.
A new fast prototype selection method based on clustering (total citations: 1; self-citations: 1; citations by others: 1)
J. Arturo Olvera-López, J. Ariel Carrasco-Ochoa, J. Francisco Martínez-Trinidad. Pattern Analysis & Applications, 2010, 13(2): 131-141
In supervised classification, a training set T is given to a classifier for classifying new prototypes. In practice, not all the information in T is useful to classifiers; therefore, it is convenient to discard irrelevant prototypes from T. This process, known as prototype selection, is an important task for classifiers because it can reduce the time needed for classification or training. In this work, we propose a new fast prototype selection method for large datasets, based on clustering, which selects border prototypes and some interior prototypes. Experimental results showing the performance of our method and comparing its accuracy and runtime against other prototype selection methods are reported.
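The idea of keeping border prototypes plus some interior prototypes can be illustrated with a minimal sketch. It assumes the clusters have already been computed (for example by C-means) and uses a hypothetical simplified rule: a single-class cluster keeps only the prototype nearest its mean, and a mixed-class cluster keeps each prototype whose nearest neighbour inside the cluster has a different label. This is not necessarily the exact selection rule of the paper.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def select_prototypes(clusters):
    """clusters: list of clusters, each a list of (point, label) pairs.

    Returns the selected (point, label) pairs (hypothetical simplified rule).
    """
    selected = []
    for cluster in clusters:
        labels = {lab for _, lab in cluster}
        if len(labels) == 1:
            # Homogeneous cluster: keep one interior representative near the mean.
            m = centroid([p for p, _ in cluster])
            selected.append(min(cluster, key=lambda pl: math.dist(pl[0], m)))
        else:
            # Mixed cluster: keep prototypes whose nearest neighbour in the
            # cluster carries a different label (border prototypes).
            for i, (p, lab) in enumerate(cluster):
                others = [pl for j, pl in enumerate(cluster) if j != i]
                _, nn_lab = min(others, key=lambda pl: math.dist(p, pl[0]))
                if nn_lab != lab:
                    selected.append((p, lab))
    return selected
```

Homogeneous clusters shrink to one representative each, which is where most of the reduction in training-set size comes from in this sketch.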
16.
In this paper a new multiobjective (MO) clustering technique (GenClustMOO) is proposed that can automatically partition the data into an appropriate number of clusters. Each cluster is divided into several small hyperspherical sub-clusters, and the centers of all these sub-clusters are encoded in a string to represent the whole clustering. For assigning points to clusters, these local sub-clusters are considered individually. For objective function evaluation, the sub-clusters are merged appropriately to form a variable number of global clusters. Three objective functions are considered: the first reflects the total compactness of the partitioning based on the Euclidean distance, the second the total symmetry of the clusters, and the third the cluster connectedness. These are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method, in order to detect the appropriate number of clusters as well as the appropriate partitioning. The symmetry present in a partitioning is measured using a newly developed point symmetry based distance, and its connectedness is measured using the relative neighborhood graph concept. Since AMOSA, like any other MO optimization technique, provides a set of Pareto-optimal solutions, a new method is also developed to select a single solution from this set. GenClustMOO is thus able to detect the appropriate number of clusters and the appropriate partitioning in data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps.
The effectiveness of GenClustMOO is comprehensively demonstrated on nineteen artificial and seven real-life data sets of varying complexity, in comparison with another recent multiobjective clustering technique (MOCK), a single objective genetic algorithm based automatic clustering technique (VGAPS-clustering), and the K-means and single linkage clustering techniques. In part of the experiments, the effectiveness of AMOSA as the underlying optimization technique in GenClustMOO is also demonstrated in comparison with another evolutionary MO algorithm, PESA2.
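The relative neighborhood graph mentioned for measuring connectedness can be built as in the sketch below, using the standard definition: an edge (i, j) is kept unless some third point r satisfies max(d(i, r), d(j, r)) < d(i, j). This brute-force O(n^3) version is only for illustration; how GenClustMOO turns the graph into its connectedness objective is not reproduced here.

```python
import math
from itertools import combinations

def relative_neighborhood_graph(points):
    """Return the RNG edges (i, j), i < j, by brute force.

    An edge is kept iff no other point r is closer to both i and j than
    i and j are to each other: max(d(i, r), d(j, r)) < d(i, j) for no r.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        dij = math.dist(points[i], points[j])
        blocked = any(
            max(math.dist(points[i], points[r]), math.dist(points[j], points[r])) < dij
            for r in range(len(points)) if r not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges
```

For the four corners of a unit square, the sketch keeps the four sides and drops both diagonals, since each diagonal is blocked by either of the other two corners.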
17.
This article describes a multiobjective spatial fuzzy clustering algorithm for image segmentation. To obtain satisfactory segmentation performance on noisy images, the proposed method introduces non-local spatial information derived from the image into its fitness functions, which consider the global fuzzy compactness and the fuzzy separation among the clusters, respectively. After the set of non-dominated solutions is produced, the final clustering solution is chosen by a cluster validity index that uses the non-local spatial information. Moreover, to evolve the number of clusters automatically, a real-coded variable string length technique is used to encode the cluster centers in the chromosomes. The proposed method is applied to synthetic and real images contaminated by noise and compared with k-means, fuzzy c-means, two fuzzy c-means clustering algorithms with spatial information, and a multiobjective variable string length genetic fuzzy clustering algorithm. The experimental results show that the proposed method behaves well in evolving the number of clusters and achieves satisfactory performance in noisy image segmentation.
18.
Hua Jiang, Jing Li, Shenghe Yi, Xiangyang Wang, Xin Hu. Expert Systems with Applications, 2011, 38(8): 9373-9381
Clustering is an unsupervised learning problem: a procedure that partitions data objects into clusters such that objects in the same cluster are quite similar to each other and dissimilar to objects in other clusters. Density-based clustering algorithms find clusters based on the density of data points in a region. DBSCAN is one such density-based algorithm; it can discover clusters with arbitrary shapes and requires only two input parameters. DBSCAN has proved very effective for analyzing large and complex spatial databases. However, it needs a large amount of memory and often has difficulties with high-dimensional data and with clusters of very different densities. The partitioning-based DBSCAN algorithm (PDBSCAN) was therefore proposed to solve these problems, but PDBSCAN gives poor results when the density of the data is non-uniform. Moreover, both DBSCAN and PDBSCAN are, to some extent, sensitive to their initial parameters. In this paper, we propose a new hybrid algorithm based on PDBSCAN. We use a modified ant clustering algorithm (ACA) and design a new partitioning algorithm based on ‘point density’ (PD) for the data preprocessing phase. We name the new hybrid algorithm PACA-DBSCAN. Its performance is compared with that of DBSCAN and PDBSCAN on five data sets, and the experimental results indicate the superiority of PACA-DBSCAN.
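For reference, plain DBSCAN with its two input parameters (eps and min_pts) can be sketched as below. This is the classic algorithm with a brute-force neighbourhood query, not PDBSCAN or PACA-DBSCAN; the neighbourhood of a point includes the point itself, following the usual convention.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return a cluster label per point (0, 1, ...) or NOISE (-1)."""
    def region(i):
        # Brute-force eps-neighbourhood query (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE      # may later be relabelled as a border point
            continue
        cluster += 1               # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels
```

The brute-force region query makes this O(n^2); practical implementations use a spatial index for the neighbourhood search.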
19.
In this paper, we present a new approach to speech clustering with regard to speaker identity. It consists of grouping the homogeneous speech segments obtained at the end of the segmentation process by using the spatial information provided by stereophonic speech signals. The proposed method uses the differential energy of the two stereophonic signals collected by two cardioid microphones in order to cluster all the speech segments that belong to the same speaker. The total number of clusters obtained at the end should equal the real number of speakers present in the meeting room, and each cluster should contain all the interventions of only one speaker. The proposed system is suitable for debates or multi-speaker conferences in which the speakers are located at fixed positions. Essentially, our approach performs speaker localization relative to the position of the microphones, taken as a spatial reference. Based on this localization, the proposed method can recognize the speaker identity of any speech segment during the meeting, so each speaker's interventions are automatically detected and attributed to that speaker by estimating his or her relative position. For comparison, two clustering methods have been implemented and tested: the new approach, which we call Energy Differential based Spatial Clustering (EDSC), and a classical statistical approach called "Mono-Gaussian based Sequential Clustering" (MGSC). Speaker clustering experiments are conducted on a stereophonic speech corpus called DB15, composed of 15 stereophonic scenarios of about 3.5 minutes each. Every scenario corresponds to a free discussion between two or three speakers seated at fixed positions in the meeting room. Results show the outstanding performance of the new approach in terms of precision and speed, especially for short speech segments, on which most clustering techniques fail badly.
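The energy-differential idea can be illustrated with a hypothetical sketch: each segment is reduced to the left/right energy ratio in decibels, and segments with similar values are grouped greedily. The exact EDSC feature and decision rule are not given in the abstract, so the dB formulation, the greedy grouping and the 3 dB tolerance below are all assumptions.

```python
import math

def differential_energy_db(left, right, eps=1e-12):
    """10 * log10(E_left / E_right) for one stereo speech segment (assumed feature)."""
    e_left = sum(s * s for s in left)
    e_right = sum(s * s for s in right)
    return 10 * math.log10((e_left + eps) / (e_right + eps))

def cluster_segments(segments, tol_db=3.0):
    """Hypothetical greedy 1-D grouping of (left, right) segments.

    A segment joins the first cluster whose mean differential energy is within
    tol_db; otherwise it starts a new cluster (one cluster per speaker position).
    """
    means, counts, labels = [], [], []
    for left, right in segments:
        d = differential_energy_db(left, right)
        for k, m in enumerate(means):
            if abs(d - m) <= tol_db:
                counts[k] += 1
                means[k] += (d - m) / counts[k]  # update the running mean
                labels.append(k)
                break
        else:
            means.append(d)
            counts.append(1)
            labels.append(len(means) - 1)
    return labels
```

Because a speaker at a fixed seat produces a roughly constant energy balance between the two microphones, the differential value acts as a position cue that does not depend on segment length.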
20.
In this paper, we show how one can take advantage of the stability and effectiveness of object data clustering algorithms when the data to be clustered are available in the form of mutual numerical relationships between pairs of objects. More precisely, we propose a new fuzzy relational algorithm, based on the popular fuzzy C-means (FCM) algorithm, which does not require any particular restriction on the relation matrix. We describe the application of the algorithm to four real and four synthetic data sets, and show that our algorithm performs better than well-known fuzzy relational clustering algorithms on all these sets.
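Since the relational algorithm above is built on fuzzy C-means, a sketch of standard object-data FCM may help. This is only the base algorithm operating on coordinate vectors, not the paper's relational variant on a relation matrix; initialising the centers with the first c points is an arbitrary assumption.

```python
import math

def fcm(points, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy C-means on object data (points given as coordinate tuples).

    Returns (centers, U), where U[i][k] is the membership of point k in cluster i.
    Centers are initialised with the first c points (an arbitrary choice).
    """
    n, dims = len(points), len(points[0])
    centers = [list(p) for p in points[:c]]
    U = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        for k, x in enumerate(points):
            d = [math.dist(x, v) for v in centers]
            zero = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zero:  # point sits on a center: share membership among those centers
                    U[i][k] = (1.0 / len(zero)) if i in zero else 0.0
                else:
                    U[i][k] = 1.0 / sum((d[i] / dj) ** (2 / (m - 1)) for dj in d)
        # Center update: v_i = sum_k u_ik**m * x_k / sum_k u_ik**m
        new_centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            new_centers.append([sum(wk * x[a] for wk, x in zip(w, points)) / total
                                for a in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U
```

A relational variant replaces the explicit centers and distances d_ik with quantities derived from the pairwise relation matrix.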