Similar literature
1.
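贺杨成, 王士同, 江南. 《计算机应用》 (Journal of Computer Applications), 2010, 30(12): 3380-3384
The k-medoids algorithm represents an entire cluster by a single point, which is clearly insufficient and inevitably degrades the accuracy of the clustering result. A center-weighted fuzzy clustering algorithm for relational data is therefore proposed. The algorithm assigns every object that belongs to a cluster a center weight expressing the degree of possibility that the object serves as a representative of that cluster, so that several objects in a cluster, rather than a single one, represent the whole cluster. Experimental results show that the algorithm better uncovers the latent internal structure of a data set and the relationships among its objects, and yields a more accurate description of each resulting cluster.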

2.
As one of the most fundamental yet important methods of data clustering, the center-based partitioning approach clusters the dataset into k subsets, each of which is represented by a centroid or medoid. In this paper, we propose a new medoid-based k-partitions approach called Clustering Around Weighted Prototypes (CAWP), which works with a similarity matrix. In CAWP, each cluster is characterized by multiple objects with different representative weights. With this new cluster representation scheme, CAWP aims to simultaneously produce clusters of improved quality and a set of ranked representative objects for each cluster. An efficient algorithm is derived to alternately update the clusters and the representative weights of objects with respect to each cluster. An annealing-like optimization procedure is incorporated to alleviate the local optimum problem for better clustering results and, at the same time, to make the algorithm less sensitive to parameter setting. Experimental results on benchmark document datasets show that CAWP achieves favorable effectiveness and efficiency in clustering, and also provides useful information for cluster-specific analysis.

3.
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
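
The alternation described in this abstract (update fuzzy memberships, then pick the object that minimizes the total fuzzy dissimilarity as each cluster's medoid) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `memberships` and `fcmdd`, the inverse-dissimilarity membership formula, the fuzzifier default m = 2, and the caller-supplied initial medoids are all assumptions made for the example.

```python
def memberships(R, medoids, m):
    """Fuzzy memberships U[i][j] of object j in cluster i (assumed inverse-dissimilarity form)."""
    n, c = len(R), len(medoids)
    U = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d = [R[v][j] for v in medoids]
        zeros = [i for i, dij in enumerate(d) if dij == 0.0]
        if zeros:
            # The object coincides with a medoid: give it full membership there.
            U[zeros[0]][j] = 1.0
            continue
        inv = [(1.0 / dij) ** (1.0 / (m - 1.0)) for dij in d]
        total = sum(inv)
        for i in range(c):
            U[i][j] = inv[i] / total
    return U


def fcmdd(R, init_medoids, m=2.0, max_iter=100):
    """Sketch of a fuzzy c-medoids loop on a dissimilarity matrix R (list of lists)."""
    n = len(R)
    medoids = list(init_medoids)  # initial medoid indices chosen by the caller (assumption)
    for _ in range(max_iter):
        U = memberships(R, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            # Medoid update: the object with the smallest total fuzzy dissimilarity.
            costs = [sum((U[i][j] ** m) * R[k][j] for j in range(n)) for k in range(n)]
            new_medoids.append(min(range(n), key=costs.__getitem__))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(R, medoids, m)
```

Because only the matrix R is consulted, the same loop applies to any pairwise dissimilarity, whether or not it comes from feature vectors. The result depends on the initial medoids, so several restarts are advisable in practice.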

4.
The first stage of organizing objects is to partition them into groups or clusters. The clustering is generally done on individual object data representing the entities, such as feature vectors, or on object relational data incorporated in a proximity matrix. This paper describes another method for finding a fuzzy membership matrix that provides cluster membership values for all the objects based strictly on the proximity matrix. This is generally referred to as relational data clustering. The fuzzy membership matrix is found by first finding a set of vectors that have approximately the same inter-vector Euclidean distances as the proximities that are provided. These vectors can be of very low dimension, such as 5 or less. Fuzzy c-means (FCM) is then applied to these vectors to obtain a fuzzy membership matrix. In addition, two-dimensional vectors are also created to provide a visual representation of the proximity matrix. This allows comparison of the result of automatic clustering to visual clustering. The method proposed here is compared to other relational clustering methods, including NERFCM, Roubens' method and Windham's A-P method. Various clustering quality indices are also calculated for the comparison, using various proximity matrices as input. Simulations show the method to be very effective and no more computationally expensive than other relational data clustering methods. The membership matrices that are produced by the proposed method are less crisp than those produced by NERFCM and more representative of the proximity matrix that is used as input to the clustering process.

5.
Fuzzy c-means (FCM) is an important and popular unsupervised partitioning algorithm used in several application domains such as pattern recognition, machine learning and data mining. Although FCM has shown good performance in detecting clusters, the membership values computed for each individual with respect to each cluster cannot indicate how well the individuals are classified. In this paper, a new approach to handling the memberships, based on the inherent information in each feature, is presented. The algorithm produces a membership matrix for each individual. The membership values lie between zero and one and measure the similarity of the individual to the center of each cluster according to each feature. These values can change at each iteration of the algorithm, and they differ from one feature to another and from one cluster to another in order to increase the performance of the fuzzy c-means clustering algorithm. To obtain a fuzzy partition by class of the input data set, a way to compute the class membership values is also proposed in this work. Experiments with synthetic and real data sets show that the proposed approach produces clusterings of good quality.

6.
7.
In this paper, we introduce a new algorithm for clustering and aggregating relational data (CARD). We assume that data is available in a relational form, where we only have information about the degrees to which pairs of objects in the data set are related. Moreover, we assume that the relational information is represented by multiple dissimilarity matrices. These matrices could have been generated using different sensors, features, or mappings. CARD is designed to aggregate pairwise distances from multiple relational matrices, partition the data into clusters, and learn a relevance weight for each matrix in each cluster simultaneously. The cluster dependent relevance weights offer two advantages. First, they guide the clustering process to partition the data set into more meaningful clusters. Second, they can be used in subsequent steps of a learning system to improve its learning behavior. The performance of the proposed algorithm is illustrated by using it to categorize a collection of 500 color images. We represent the pairwise image dissimilarities by six different relational matrices that encode color, texture, and structure information.

8.
Clustering is the process of organizing objects into groups whose members are similar in some way. Most clustering methods involve numeric data only. However, this representation may not be adequate to model complex information such as histograms, distributions, or intervals. To deal with these types of data, Symbolic Data Analysis (SDA) was developed. In multivariate data analysis, it is common for some variables to be more or less relevant than others, and less relevant variables can mask the cluster structure. This work proposes a clustering method based on a fuzzy approach that produces weighted multivariate memberships for interval-valued data. These memberships can change at each iteration of the algorithm, and they differ from one variable to another and from one cluster to another. Furthermore, there is a different relevance weight associated with each variable that may also differ from one cluster to another. The advantage of this method is that it is robust to ambiguous cluster membership assignment, since the weights represent how important the different variables are to the clusters. Experiments are performed with synthetic data sets to compare the performance of the proposed method against other methods already established in the clustering literature. An application with interval-valued scientific production data is also presented. Clustering quality results show that the proposed method offers higher accuracy when variables have different variabilities.

9.
Since 1998, a graphical representation used in visual clustering called the reordered dissimilarity image or cluster heat map has appeared in more than 4000 biological or biomedical publications. These images are typically used to visually estimate the number of clusters in a data set, which is the most important input to most clustering algorithms, including the popularly chosen fuzzy c-means and crisp k-means. This paper presents a new formulation of a matrix reordering algorithm, coVAT, which is the only known method for providing visual clustering information on all four types of cluster structure in rectangular relational data. Finite rectangular relational data are an m × n array R of relational values between m row objects O_r and n column objects O_c. R presents four clustering problems: clusters in O_r, O_c, O_{r∪c}, and coclusters containing some objects from each of O_r and O_c. coVAT1 is a clustering tendency algorithm that provides visual estimates of the number of clusters to seek in each of these problems by displaying reordered dissimilarity images. We provide several examples where coVAT1 fails to do its job. These examples justify the introduction of coVAT2, a modification of coVAT1 based on a different reordering scheme. We offer several examples to illustrate that coVAT2 may detect coclusters in R when coVAT1 does not. Furthermore, coVAT2 is not limited to just relational data R. The R matrix can also take the form of feature data, such as gene microarray data where each data element is a real number: positive values indicate upregulation, and negative values indicate downregulation. We show examples of coVAT2 on microarray data that indicate coVAT2 shows cluster tendency in these data. © 2012 Wiley Periodicals, Inc.

10.
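Using the probabilistic constraint relation and the possibility distribution of the feature weights of data points, two improved possibilistic fuzzy clustering models are proposed, built on probabilistic and possibilistic feature weighting respectively. The improved PCM algorithm built on the possibilistic constraint extends the original algorithm and is more widely applicable. Experimental results show that the algorithms can perform fuzzy clustering under different probabilistic weights or possibility-distribution feature conditions. Compared with PCM and its improved variants, the clustering improvement is noticeable.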

11.
Although there has been much research on cluster analysis considering feature (or variable) weights, little effort has been made regarding sample weights in clustering. In practice, not every sample in a data set has the same importance in cluster analysis. Therefore, it is interesting to obtain the proper sample weights for clustering a data set. In this paper, we consider a probability distribution over a data set to represent its sample weights. We then apply the maximum entropy principle to automatically compute these sample weights for clustering. This method can generate sample-weighted versions of most clustering algorithms, such as k-means, fuzzy c-means (FCM) and expectation-maximization (EM). The proposed sample-weighted clustering algorithms are robust for data sets with noise and outliers. Furthermore, we analyze the convergence properties of the proposed algorithms. This study also uses some numerical data and real data sets for demonstration and comparison. Experimental results and comparisons demonstrate that the proposed sample-weighted clustering algorithms are effective and robust clustering methods.

12.
The mountain clustering method and the subtractive clustering method are useful methods for finding cluster centers based on local density in object data. These methods have been extended to shell clustering. In this article, we propose a relational mountain clustering method (RMCM), which produces a set of (proto)typical objects as well as a crisp partition of the objects generating the relation, using a new concept that we call relational density. We exemplify RMCM by clustering several relational data sets that come from object data. Finally, RMCM is applied to web log analysis, where it produces useful user profiles from web log data. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 375–392, 2005.

13.
Basing cluster analysis on mixture models has become a classical and powerful approach. Until now, this approach, which makes it possible to explain some classic clustering criteria such as the well-known k-means criterion and to propose general criteria, has been developed to classify a set of objects measured on a set of variables. For this kind of data, most clustering procedures are designed to construct an optimal partition of objects or, sometimes, of variables, but there exist other methods, named block clustering methods, which consider the two sets simultaneously and organize the data into homogeneous blocks. In this work, a new mixture model called the block mixture model is proposed to take this situation into account. This model makes it possible to embed simultaneous clustering of objects and variables in a mixture approach. We first consider this probabilistic model in a general context and develop a new algorithm of simultaneous partitioning based on the CEM algorithm. Then, we focus on the case of binary data and show that our approach allows us to extend a block clustering method that had been proposed for this case. Simplicity, fast convergence and the possibility of processing large data sets are the major advantages of the proposed approach.

14.
The Fuzzy k-Means clustering model (FkM) is a powerful tool for classifying objects into a set of k homogeneous clusters by means of the membership degrees of an object in a cluster. In FkM, for each object, the sum of the membership degrees in the clusters must be equal to one. Such a constraint may cause meaningless results, especially when noise is present. To avoid this drawback, it is possible to relax the constraint, leading to the so-called Possibilistic k-Means clustering model (PkM). In particular, attention is paid to the case in which the empirical information is affected by imprecision or vagueness. This is handled by means of LR fuzzy numbers. An FkM model for LR fuzzy data is first developed, and a PkM model for the same type of data is then proposed. The results of a simulation experiment and of two applications to real-world fuzzy data confirm the validity of both models, while providing indications as to some advantages connected with the use of the possibilistic approach.
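
The difference between the two constraints in this abstract can be written out for ordinary (crisp) data. The block below is a sketch under stated assumptions: the symbols $u_{ig}$ (membership of object $i$ in cluster $g$), $h_g$ (prototype), $m$ (fuzzifier) and $\eta_g$ (scale parameter) are introduced here for illustration, the possibilistic objective follows the commonly used penalty form, and the LR fuzzy-data models of the paper may differ in their distance and details.

```latex
% Fuzzy k-means: memberships of each object must sum to one over the clusters.
\min_{U,H}\; \sum_{i=1}^{n}\sum_{g=1}^{k} u_{ig}^{m}\, d^{2}(x_i, h_g)
\quad \text{s.t.}\quad \sum_{g=1}^{k} u_{ig} = 1,\;\; u_{ig} \ge 0 .

% Possibilistic k-means (assumed penalty form): the sum-to-one constraint is dropped,
% and a penalty term keeps the memberships from collapsing to zero.
\min_{U,H}\; \sum_{i=1}^{n}\sum_{g=1}^{k} u_{ig}^{m}\, d^{2}(x_i, h_g)
 + \sum_{g=1}^{k} \eta_g \sum_{i=1}^{n} \left(1 - u_{ig}\right)^{m}
\quad \text{s.t.}\quad u_{ig} \in [0, 1] .
```

Under the first constraint a noisy object far from every prototype still receives memberships summing to one; under the second its memberships can all be small, which is the motivation the abstract gives for the possibilistic model.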

15.
The first stage of knowledge acquisition and reduction of complexity concerning a group of entities is to partition or divide the entities into groups or clusters based on their attributes or characteristics. Clustering algorithms normally require both a method of measuring proximity between patterns and prototypes and a method for aggregating patterns. However, sometimes feature vectors or patterns may not be available for objects and only the proximities between the objects are known. Even if feature vectors are available, some of the features may not be numeric and it may not be possible to find a satisfactory method of aggregating patterns for the purpose of determining prototypes. Clustering of objects can, however, be performed on the basis of data describing the objects in terms of feature vectors or on the basis of relational data. The relational data is in terms of proximities between objects. Clustering of objects on the basis of relational data rather than individual object data is called relational clustering. The premise of this paper is that the proximities between the membership vectors, which are obtained as the objective of clustering, should be proportional to the proximities between the objects. The values of the components of the membership vector corresponding to an object are the membership degrees of the object in the various clusters. The membership vector is just a type of feature vector. Based on this premise, this paper describes another fuzzy relational clustering method for finding a fuzzy membership matrix. The method involves solving a rather challenging optimization problem, since the objective function has many local minima. This makes the use of a global optimization method such as particle swarm optimization (PSO) attractive for determining the membership matrix for the clustering. To minimize computational effort, a Bayesian stopping criterion is used in combination with a multi-start strategy for the PSO. Other relational clustering methods generally find a local optimum of their objective function.

16.
Robust fuzzy clustering of relational data   Cited by: 1 (self-citations: 0, citations by others: 1)
Popular relational-data clustering algorithms, namely the relational dual of fuzzy c-means (RFCM), non-Euclidean RFCM (NERFCM) (both by Hathaway et al.), and FANNY (by Kaufman and Rousseeuw), are examined. A new algorithm, which is a generalization of FANNY, called the fuzzy relational data clustering (FRC) algorithm, is introduced, having the same objective functional as RFCM. However, FRC does not have the restriction of RFCM, which is that the relational data is derived from Euclidean distance as the measure of dissimilarity between the objects, and it also does not have the limitations of FANNY, including the use of a fixed membership exponent, or fuzzifier exponent, m. The FRC algorithm is further improved by incorporating the concept of Dave's object data noise clustering (NC) algorithm, done by proposing a concept of noise-dissimilarity. Next, based on constrained minimization, which includes an inequality constraint for the memberships and the corresponding Kuhn-Tucker conditions, a noise-resistant FRC algorithm is derived which works well for all types of non-Euclidean dissimilarity data. Thus it is shown that the extra computations for data expansion (β-spread transformation) required by the NERFCM algorithm are not necessary. This new algorithm is called robust non-Euclidean fuzzy relational data clustering (robust-NE-FRC), and its robustness is demonstrated through several numerical examples. Advantages of this new algorithm are faster convergence, robustness against outliers, and the ability to handle all kinds of relational data, including non-Euclidean. The paper also presents a new and better interpretation of the noise class.

17.
18.
This paper proposes a fuzzy clustering-based algorithm for fuzzy modeling. The algorithm incorporates unsupervised learning with an iterative process into a framework, which is based on the use of the weighted fuzzy c-means. In the first step, the learning vector quantization (LVQ) algorithm is exploited as a data pre-processor unit to group the training data into a number of clusters. Since different clusters may contain different numbers of objects, the centers of these clusters are assigned weight factors, the values of which are calculated from the respective cluster cardinalities. These centers, accompanied by their weights, are considered to be a new data set, which is further elaborated by an iterative process. This process consists of applying in sequence the weighted fuzzy c-means and the back-propagation algorithm. The application of the weighted fuzzy c-means ensures that the contribution of each cluster center to the final fuzzy partition is determined by its cardinality, meaning that the real data structure can be discovered more easily. The algorithm is successfully applied to three test cases, where the produced fuzzy models prove to be very accurate as well as compact in size.
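
The weighted fuzzy c-means step mentioned in this abstract can be illustrated with a short sketch in which each data point (here, a pre-computed cluster center) carries a weight such as its cluster cardinality. The function name `weighted_fcm`, the standard FCM membership formula, the fixed iteration count and the default m = 2 are assumptions for the example; the LVQ pre-processing and back-propagation stages of the paper are not shown.

```python
def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted FCM sketch: points are tuples, weights scale each point's pull on the centers."""
    c, dim = len(centers), len(points[0])
    centers = [tuple(v) for v in centers]
    U = []
    for _ in range(iters):
        # Membership step (standard FCM form on squared Euclidean distances).
        U = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
            if 0.0 in d2:
                # The point sits exactly on a center: crisp membership there.
                hit = d2.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
                s = sum(inv)
                row = [t / s for t in inv]
            U.append(row)
        # Center step: each point contributes in proportion to weight * membership^m.
        new_centers = []
        for i in range(c):
            coef = [w * (u[i] ** m) for w, u in zip(weights, U)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * x[k] for cf, x in zip(coef, points)) / total
                for k in range(dim)))
        centers = new_centers
    # U holds the memberships from the last membership step.
    return centers, U
```

With a single cluster and two points at 0 and 10, equal weights place the center at 5, whereas weights of 3 and 1 pull it to 2.5, which is the sense in which a heavier (higher-cardinality) point contributes more to the final partition.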

19.
Document clustering using synthetic cluster prototypes   Cited by: 3 (self-citations: 0, citations by others: 3)
The use of centroids as prototypes for clustering text documents with the k-means family of methods is not always the best choice for representing text clusters, due to the high dimensionality, sparsity, and low quality of text data. Especially in cases where we seek clusters with a small number of objects, the use of centroids may lead to poor solutions near the bad initial conditions. To overcome this problem, we propose the idea of a synthetic cluster prototype that is computed by first selecting a subset of cluster objects (instances), then computing the representative of these objects, and finally selecting important features. In this spirit, we introduce the MedoidKNN synthetic prototype, which favors the representation of the dominant class in a cluster. These synthetic cluster prototypes are incorporated into the generic spherical k-means procedure, leading to a robust clustering method called k-synthetic prototypes (k-sp). Comparative experimental evaluation demonstrates the robustness of the approach, especially for small datasets and clusters overlapping in many dimensions, and its superior performance against traditional and subspace clustering methods.

20.
In this paper, we show how one can take advantage of the stability and effectiveness of object data clustering algorithms when the data to be clustered are available in the form of mutual numerical relationships between pairs of objects. More precisely, we propose a new fuzzy relational algorithm, based on the popular fuzzy C-means (FCM) algorithm, which does not require any particular restriction on the relation matrix. We describe the application of the algorithm to four real and four synthetic data sets, and show that our algorithm performs better than well-known fuzzy relational clustering algorithms on all these sets.
