Similar documents (20 found)
1.
The first stage of knowledge acquisition and reduction of complexity concerning a group of entities is to partition the entities into groups or clusters based on their attributes or characteristics. Clustering algorithms normally require both a method of measuring proximity between patterns and prototypes and a method for aggregating patterns. However, feature vectors or patterns are sometimes not available for the objects, and only the proximities between the objects are known. Even when feature vectors are available, some of the features may not be numeric, and it may not be possible to find a satisfactory method of aggregating patterns for the purpose of determining prototypes. Clustering of objects can therefore be performed either on the basis of feature vectors describing the objects or on the basis of relational data, i.e., the proximities between objects; the latter is called relational clustering. The premise of this paper is that the proximities between the membership vectors, which are the output of clustering, should be proportional to the proximities between the objects. The components of the membership vector of an object are its membership degrees in the various clusters, so the membership vector is itself a type of feature vector. Based on this premise, the paper describes a fuzzy relational clustering method for finding a fuzzy membership matrix. The method involves solving a challenging optimization problem, since the objective function has many local minima; this makes a global optimization method such as particle swarm optimization (PSO) attractive for determining the membership matrix. To minimize computational effort, a Bayesian stopping criterion is used in combination with a multi-start strategy for the PSO. Other relational clustering methods generally find only a local optimum of their objective function.
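The multi-start PSO strategy described in this abstract can be sketched as follows. This is an illustrative simplification: the Bayesian stopping criterion is replaced by a fixed number of restarts, the objective is the standard Rastrigin test function (a stand-in for the clustering objective, chosen because it also has many local minima), and all parameter values are made up.

```python
import math, random

def rastrigin(x):
    # Many local minima; global minimum f(0, ..., 0) = 0.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def pso(f, dim, n_particles=30, iters=200, lo=-5.12, hi=5.12, seed=0):
    """One run of basic particle swarm optimization (illustrative parameters)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.72, 1.49, 1.49  # common inertia/acceleration settings
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

def multistart_pso(f, dim, starts=5):
    # Restart with different seeds and keep the best run: a crude stand-in
    # for the Bayesian stopping rule the abstract describes.
    return min((pso(f, dim, seed=s) for s in range(starts)), key=lambda r: r[1])

best_x, best_val = multistart_pso(rastrigin, 2)
```

With several restarts, at least one run typically escapes the local minima and lands near the global optimum, which is the point of combining a multi-start strategy with a global optimizer.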

2.
The mountain clustering method and the subtractive clustering method are useful for finding cluster centers based on local density in object data, and both have been extended to shell clustering. In this article, we propose a relational mountain clustering method (RMCM), which produces a set of (proto)typical objects as well as a crisp partition of the objects generating the relation, using a new concept that we call relational density. We exemplify RMCM by clustering several relational data sets derived from object data. Finally, RMCM is applied to web log analysis, where it produces useful user profiles from web log data. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 375–392, 2005.

3.
Robust fuzzy clustering of relational data
Popular relational-data clustering algorithms are examined: the relational dual of fuzzy c-means (RFCM), non-Euclidean RFCM (NERFCM) (both by Hathaway et al.), and FANNY (by Kaufman and Rousseeuw). A new algorithm called fuzzy relational data clustering (FRC), a generalization of FANNY with an objective functional identical to that of RFCM, is introduced. However, FRC does not share RFCM's restriction that the relational data be derived from Euclidean distances between the objects, nor FANNY's limitations, such as the use of a fixed membership (fuzzifier) exponent m. The FRC algorithm is further improved by incorporating the concept of Dave's object-data noise clustering (NC) algorithm through a proposed notion of noise-dissimilarity. Next, based on constrained minimization with an inequality constraint on the memberships and the corresponding Kuhn-Tucker conditions, a noise-resistant FRC algorithm is derived that works well for all types of non-Euclidean dissimilarity data. It is thus shown that the extra computations for data expansion (β-spread transformation) required by NERFCM are unnecessary. This new algorithm is called robust non-Euclidean fuzzy relational data clustering (robust-NE-FRC), and its robustness is demonstrated through several numerical examples. Its advantages are faster convergence, robustness against outliers, and the ability to handle all kinds of relational data, including non-Euclidean. The paper also presents a new and better interpretation of the noise class.

4.
The first stage of organizing objects is to partition them into groups or clusters. Clustering is generally done either on individual object data, such as feature vectors representing the entities, or on object relational data given as a proximity matrix. This paper describes a method for finding a fuzzy membership matrix that provides cluster membership values for all the objects based strictly on the proximity matrix; this is generally referred to as relational data clustering. The fuzzy membership matrix is found by first finding a set of vectors whose inter-vector Euclidean distances approximately match the given proximities. These vectors can be of very low dimension, such as 5 or less. Fuzzy c-means (FCM) is then applied to these vectors to obtain the fuzzy membership matrix. In addition, two-dimensional vectors are created to provide a visual representation of the proximity matrix, allowing comparison of the automatic clustering with visual clustering. The proposed method is compared with other relational clustering methods, including NERFCM, Roubens' method and Windham's A-P method, and various clustering quality indices are calculated for a range of input proximity matrices. Simulations show the method to be very effective and no more computationally expensive than other relational data clustering methods. The membership matrices produced by the proposed method are less crisp than those produced by NERFCM and more representative of the proximity matrix used as input to the clustering process.
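The fuzzy c-means step applied to the recovered vectors can be sketched as follows. This is a minimal plain FCM, not the paper's implementation: the embedding of the proximity matrix into low-dimensional vectors is omitted, and the toy points and parameter values are illustrative.

```python
import math, random

def fcm(points, c=2, m=2.0, iters=50, seed=1):
    """Plain fuzzy c-means on a list of equal-length vectors.

    In the relational setting above, `points` would be the low-dimensional
    embedding recovered from the proximity matrix."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random row-stochastic initial membership matrix U (n x c).
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(iters):
        # Center update: means weighted by u^m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / tot
                            for d in range(dim)])
        # Membership update from distances to centers.
        for i in range(n):
            d = [math.dist(points[i], centers[j]) or 1e-12 for j in range(c)]
            for j in range(c):
                U[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return U, centers

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
U, centers = fcm(pts, c=2)
```

Each row of `U` sums to 1, and the dominant entry of a row gives the crisp cluster of that object if one is needed.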

5.
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
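The alternating optimization behind fuzzy c-medoids can be sketched on a toy dissimilarity matrix as below. This is an illustrative simplification, not the paper's FCMdd/RFCMdd: the random-restart initialisation scheme and all parameter values are made up.

```python
import random

def fcmdd(D, c=2, m=2.0, iters=20, restarts=10):
    """Sketch of fuzzy c-medoids on a dissimilarity matrix D (list of lists).

    Alternates a membership update with a medoid update, and keeps the best
    of several random restarts (initialisation here is illustrative)."""
    n = len(D)
    best = None
    for s in range(restarts):
        medoids = random.Random(s).sample(range(n), c)
        U = []
        for _ in range(iters):
            # Membership update from dissimilarities to the current medoids.
            U = []
            for i in range(n):
                d = [max(D[i][q], 1e-12) for q in medoids]
                U.append([1.0 / sum((d[j] / d[k]) ** (1.0 / (m - 1.0))
                                    for k in range(c)) for j in range(c)])
            # Medoid update: the object minimising the fuzzy within-cluster cost.
            new = [min(range(n),
                       key=lambda q: sum(U[i][j] ** m * D[i][q] for i in range(n)))
                   for j in range(c)]
            if new == medoids:
                break
            medoids = new
        # Total fuzzy dissimilarity: the quantity FCMdd minimises.
        J = sum(U[i][j] ** m * D[i][medoids[j]]
                for i in range(n) for j in range(c))
        if best is None or J < best[0]:
            best = (J, medoids, U)
    return best[1], best[2]

# Toy relational data: two blocks of mutually close objects.
pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
D = [[abs(a - b) for b in pts] for a in pts]
medoids, U = fcmdd(D)
```

Because the prototypes are objects rather than computed means, only the dissimilarity matrix is ever needed, which is what makes the method relational.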

6.
A belief classification rule for imprecise data
The classification of imprecise data is difficult in general because the different classes can partially overlap; moreover, the available attributes are often insufficient to discriminate precisely between objects in the overlapping zones. A credal partition (classification) based on belief functions has already been proposed in the literature for data clustering. It allows objects to belong, with different masses of belief, not only to specific classes but also to sets of classes called meta-classes, which correspond to disjunctions of several specific classes. In this paper, we propose a new belief classification rule (BCR) for the credal classification of uncertain and imprecise data. Thanks to the introduction of meta-classes, this BCR approach reduces the misclassification errors for objects that are difficult to classify by conventional methods. Objects too far from all the others are treated as outliers. The basic belief assignment (bba) of an object is computed from the Mahalanobis distance between the object and the center of each specific class, and the credal classification is finally obtained by combining the bba's associated with the different classes. The approach has a relatively low computational burden. Several experiments using both artificial and real data sets are presented at the end of the paper to evaluate and compare the performance of BCR with respect to other classification methods.
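A toy illustration of building a basic belief assignment from class distances is shown below. This is not the paper's exact BCR construction: plain distances stand in for Mahalanobis distances, and the leftover mass is simply assigned to the meta-class of all classes (full ignorance); the decay parameter is made up.

```python
import math

def simple_bba(d, gamma=1.0):
    """Toy basic belief assignment from an object's distances to class centers.

    Each specific class receives a mass that decays with distance; whatever
    mass is left over goes to the meta-class of all classes, which absorbs
    ambiguous objects. Illustrative only, not the BCR formulas."""
    m = [math.exp(-gamma * di * di) for di in d]
    s = sum(m)
    if s > 1.0:
        # Keep total mass at 1 by scaling the singleton masses.
        m = [mi / s for mi in m]
        s = 1.0
    return m, 1.0 - s  # (masses of specific classes, mass of the meta-class)

m_close, ign_close = simple_bba([0.2, 3.0])  # clearly near class 0
m_ambig, ign_ambig = simple_bba([1.5, 1.6])  # between the two classes
```

An object near one class keeps almost all its mass on that class, while an object equidistant from two classes pushes most of its mass onto the meta-class, which is the behaviour the abstract exploits to reduce misclassification in overlapping zones.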

7.
8.
9.
An additive spectral method for fuzzy clustering is proposed. The method operates on a clustering model that extends the spectral decomposition of a square matrix. The computation proceeds by extracting clusters one by one, which makes the spectral approach quite natural; the iterative extraction of clusters also allows us to derive several stopping rules for the procedure. The method applies to several differently normalized relational data types: network structure data (with the first eigenvector subtracted), affinities between multidimensional vectors (via the pseudo-inverse Laplacian transformation), and conventional relational data, including in-house data on similarity between the research topics of a research center. The method is experimentally compared with several classic and recent techniques and shown to be competitive.

10.
GML spatial clustering algorithms based on point-region containment relations
Most existing spatial clustering algorithms target relational (attribute) data and do not consider the similarity of spatial topological relations. To address this, spatial clustering based on spatial topological relations is studied, and two GML spatial clustering algorithms based on point-region containment relations, SCGML_IR and SCGML_IR*, are proposed. Both algorithms take the containment relations between point and region spatial objects in a GML document as the similarity criterion between spatial objects, and cluster the objects with the CLOPE algorithm. Building on SCGML_IR, SCGML_IR* adds a spatial containment index to speed up the computation of containment relations. Experimental results show that SCGML_IR and SCGML_IR* achieve spatial clustering of GML data with high efficiency.
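The point-in-region containment predicate such algorithms build on can be sketched with standard ray casting. This is an illustrative sketch, not the SCGML_IR code: GML parsing, the containment index, and the CLOPE clustering step are all omitted.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is point `pt` inside the simple polygon `poly`
    (a list of (x, y) vertices)? Counts crossings of a horizontal ray
    cast rightwards from the point."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does the ray cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Evaluating this predicate for every point object against every region object yields the binary containment relation that the abstract uses as the similarity criterion between spatial objects.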

11.
The well-known fuzzy c-means (FCM) algorithm for data clustering has been extended to the evidential c-means (ECM) algorithm, which works in the belief-functions framework with credal partitions of the data. In some clustering problems, the cluster barycenters produced by ECM can become very close to each other, which seriously degrades its performance. To circumvent this problem, we introduce the notion of an imprecise cluster. The principle of our approach is that objects lying midway between the barycenters of specific classes (clusters) should be committed with equal belief to each specific cluster, instead of being assigned to an imprecise meta-cluster as is done classically in ECM. Outlier objects far from the centers of two (or more) specific clusters that are hard to distinguish will be committed to the imprecise cluster (a disjunctive meta-cluster) composed of these specific clusters. The new belief c-means (BCM) algorithm proposed in this paper follows this simple principle. In BCM, the mass of belief of each specific cluster for an object is computed from the distance between the object and the center of that cluster. Both the distances from the object to the centers of the specific clusters and the distances among those centers are taken into account when determining the mass of belief of a meta-cluster; unlike ECM, BCM does not use the barycenter of the meta-cluster. We also present several examples to illustrate the interest of BCM and to show its main differences with respect to clustering techniques based on FCM and ECM.

12.
We propose a new relational clustering approach, called fuzzy clustering with learnable cluster-dependent kernels (FLeCK), that learns the underlying cluster-dependent dissimilarity measure while seeking compact clusters. The learned dissimilarity is based on a Gaussian kernel function with cluster-dependent parameters; each cluster's parameter learned by FLeCK reflects its relative intra-cluster and inter-cluster characteristics. The parameters are learned by optimizing both the intra-cluster and the inter-cluster distances, which is achieved iteratively by dynamically updating the partition and the local kernel. In this way the kernel-learning task takes advantage of the available unlabeled data and, reciprocally, the categorization task takes advantage of the learned local kernels. Another key advantage of FLeCK is that it is formulated to work on relational data, which makes it applicable when objects cannot be represented by vectors or when clusters of similar objects cannot be represented efficiently by a single prototype. Using synthetic and real data sets, we show that FLeCK learns meaningful parameters and outperforms several other algorithms. In particular, when data include clusters with various inter- and intra-cluster distances, learning cluster-dependent parameters is crucial for obtaining a good partition.
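The effect of a cluster-dependent Gaussian-kernel dissimilarity of the kind FLeCK learns can be illustrated directly. The mapping D = 1 - exp(-d^2 / sigma^2) is the standard Gaussian-kernel form; the sigma values below are made up, and the learning procedure itself is omitted.

```python
import math

def kernel_dissimilarity(d2, sigma2):
    """Gaussian-kernel dissimilarity with a cluster-dependent bandwidth:
    D_c(i, j) = 1 - exp(-d2 / sigma2), where d2 is the squared distance
    between objects i and j and sigma2 is the parameter of cluster c."""
    return 1.0 - math.exp(-d2 / sigma2)

# The same pair of objects looks close under a wide kernel and far apart
# under a narrow one, which is why a single global bandwidth can be
# inadequate when clusters have very different scales.
wide = kernel_dissimilarity(4.0, sigma2=100.0)   # spread-out cluster
narrow = kernel_dissimilarity(4.0, sigma2=1.0)   # tight cluster
```

Letting each cluster carry its own sigma lets the partition treat a 2-unit gap as negligible inside a diffuse cluster yet decisive inside a compact one.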

13.
Since 1998, a graphical representation used in visual clustering, called the reordered dissimilarity image or cluster heat map, has appeared in more than 4000 biological or biomedical publications. These images are typically used to visually estimate the number of clusters in a data set, the most important input to most clustering algorithms, including the popular fuzzy c-means and crisp k-means. This paper presents a new formulation of a matrix reordering algorithm, coVAT, which is the only known method for providing visual clustering information on all four types of cluster structure in rectangular relational data. Finite rectangular relational data are an m × n array R of relational values between m row objects Or and n column objects Oc. R presents four clustering problems: clusters in Or, clusters in Oc, clusters in Or∪c, and coclusters containing some objects from each of Or and Oc. coVAT1 is a clustering tendency algorithm that provides visual estimates of the number of clusters to seek in each of these problems by displaying reordered dissimilarity images. We provide several examples where coVAT1 fails to do its job; these examples justify the introduction of coVAT2, a modification of coVAT1 based on a different reordering scheme. We offer several examples to illustrate that coVAT2 may detect coclusters in R when coVAT1 does not. Furthermore, coVAT2 is not limited to relational data: R can also be feature data, such as gene microarray data in which each element is a real number, with positive values indicating upregulation and negative values indicating downregulation. Examples of coVAT2 on microarray data indicate that it reveals cluster tendency in such data. © 2012 Wiley Periodicals, Inc.

14.
15.
Recent advances in clustering consider incorporating background knowledge into the partitioning algorithm, for example as pairwise constraints between objects. Prior information, when available, often makes it possible to retrieve more meaningful clusters in the data. Here, this approach is investigated in the framework of belief functions, which allows us to handle the imprecision and uncertainty of the clustering process. In this context, the EVCLUS algorithm was proposed for partitioning objects described by a dissimilarity matrix. It is extended here to take pairwise constraints into account by adding a penalty term to its objective function that expresses the constraints in the belief-function framework. Various synthetic and real data sets are used to demonstrate the interest of the proposed method, called CEVCLUS, and two applications are presented. The performance of CEVCLUS is also compared with that of other constrained clustering algorithms.

16.
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set, the most common proximity measures being distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure, together with a clustering algorithm (LEGClust) that builds layers of subgraphs based on this matrix and combines them with a hierarchical agglomerative clustering technique to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure we are able, with no assumption about cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence of the superior performance of this new algorithm compared with competing ones.


18.
The simplicity principle, an updating of Ockham's razor to take account of modern information theory, states that the preferred theory for a set of data is the one that allows the most efficient encoding of the data. We consider this in the context of classification, or clustering, as a data-reduction technique that helps describe a set of objects by dividing them into groups. The simplicity model we present favors clusters in which the similarity of items within a cluster is maximal while the similarity of items between clusters is minimal. Several novel features of our clustering criterion make it especially appropriate for clustering data derived from psychological procedures (e.g., similarity ratings): it is non-parametric, and it may be applied in situations where the metric axioms are violated, without requiring (information-forgetting) transformation procedures. We illustrate the use of the criterion with a selection of data sets. A distinctive aspect of this research is that it motivates a clustering algorithm from psychological principles.

19.
As one of the current hot topics in data-stream mining, multiple-data-stream clustering requires tracking the evolution of several data streams over time while partitioning them by similarity. This paper proposes a multiple-data-stream clustering method based on grey relational analysis combined with affinity propagation. Using a grey relational degree, the raw data of the streams are compressed into incrementally updatable grey relational summaries; the grey relational degrees between streams are then computed from these summaries as the similarity measure, and finally the affinity propagation algorithm produces the clustering. Comparative experiments on real data sets demonstrate the effectiveness of the method.
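The grey relational degree at the heart of this method can be sketched with the standard Deng formulation. The incremental summaries and the affinity-propagation step are omitted, and the data values below are illustrative.

```python
def grey_relational_grade(ref, series_list, rho=0.5):
    """Grey relational grade of each comparison series against `ref`.

    For each point k: gamma(k) = (d_min + rho * d_max) / (delta(k) + rho * d_max),
    where delta(k) = |ref(k) - x(k)| and d_min / d_max are the global extremes
    of delta over all series. The grade is the mean coefficient."""
    deltas = [[abs(r - x) for r, x in zip(ref, s)] for s in series_list]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    grades = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

ref = [1.0, 2.0, 3.0, 4.0]
g = grey_relational_grade(ref, [[1.1, 2.1, 3.1, 4.1],   # tracks ref closely
                                [4.0, 3.0, 2.0, 1.0]])  # moves against ref
```

A series that shadows the reference gets a grade near 1, while a divergent one scores much lower, which is what makes the grade usable as a similarity measure between streams.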

20.