Similar literature
 Found 20 similar documents; search took 78 ms
1.
In this study, we introduce a novel clustering architecture in which several subsets of patterns can be processed together with the objective of finding a common structure. The structure revealed at the global level is determined by exchanging prototypes of the subsets of data and by moving prototypes of the corresponding clusters toward each other. The required communication links are thereby established at the level of cluster prototypes and partition matrices, without compromising data security. A detailed clustering algorithm is developed by integrating the advantages of both fuzzy sets and rough sets, and a quantitative measure for analyzing the experimental results is provided for synthetic and real-world data.

2.
Most of the prototype reduction schemes (PRS) reported in the literature process the data in its entirety to yield a subset of prototypes that are useful in nearest-neighbor-like classification. Foremost among these are the prototypes for nearest neighbor classifiers, the vector quantization technique, and support vector machines. These methods suffer from a major disadvantage, namely the excessive computational burden of processing all the data. In this paper, we suggest a recursive and computationally superior mechanism referred to as adaptive recursive partitioning (ARP_PRS). Rather than process all the data using a PRS, we propose that the data be recursively subdivided into smaller subsets. This recursive subdivision can be arbitrary and need not utilize any underlying clustering philosophy. The advantage of ARP_PRS is that the PRS processes subsets of data points that effectively sample the entire space to yield smaller subsets of prototypes. These prototypes are then, in turn, gathered and processed by the PRS to yield more refined prototypes. In this manner, prototypes that are in the interior of the Voronoi spaces, and thus ineffective in the classification, are eliminated at the subsequent invocations of the PRS. We are unaware of any PRS that employs such a recursive philosophy. Although we marginally forfeit accuracy in return for computational efficiency, our experimental results demonstrate that the proposed recursive mechanism yields classification comparable to the best prototype condensation schemes reported to date. Indeed, this is true for both artificial data sets and samples involving real-life data sets. The results especially demonstrate that a fair computational advantage can be obtained by using such a recursive strategy for "large" data sets, such as those involved in data mining and text categorization applications.
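
The recursive idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which PRS is used, so a Hart-style condensed nearest neighbor rule stands in for it, the split is a plain halving, and the names `condense`, `recursive_prs` and `max_size` are hypothetical.

```python
import math

def condense(samples):
    """Hart-style condensed nearest neighbor, used here as a stand-in PRS.

    samples is a list of (point, label) pairs; point is a tuple of floats.
    """
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for x, y in samples:
            nearest = min(store, key=lambda s: math.dist(s[0], x))
            # Keep any sample that the current store misclassifies.
            if nearest[1] != y and (x, y) not in store:
                store.append((x, y))
                changed = True
    return store

def recursive_prs(samples, max_size):
    """Apply the PRS to small pieces, then again to the gathered prototypes."""
    if len(samples) <= max_size:
        return condense(samples)
    mid = len(samples) // 2  # arbitrary split; no clustering is required
    gathered = (recursive_prs(samples[:mid], max_size)
                + recursive_prs(samples[mid:], max_size))
    return condense(gathered)

samples = [((float(i),), 0) for i in range(5)] + [((float(i + 10),), 1) for i in range(5)]
prototypes = recursive_prs(samples, max_size=4)
```

Prototypes that survive a subset-level pass but lie far from the class boundary tend to be dropped when the gathered prototypes are condensed again, which is the effect the abstract describes.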

3.
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits the clustering performance on complex data structures. This paper proposes a clustering algorithm that represents one cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate the regions of high density, because of its low computational cost and good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Multiple prototypes with small separations are grouped into a given number of clusters by an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure and is robust to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.

4.
Collaborative fuzzy clustering   Total citations: 3, self-citations: 0, citations by others: 3
In this study, we introduce a new clustering architecture in which several subsets of patterns can be processed together with the objective of finding a structure that is common to all of them. To reveal this structure, the clustering algorithms operating on the separate subsets of data collaborate by exchanging information about local partition matrices. In this sense, the required communication links are established at the level of information granules (more specifically, fuzzy sets forming the partition matrices) rather than patterns that are directly available in the databases. We discuss how this form of collaboration helps meet requirements of data confidentiality. A detailed clustering algorithm is developed on the basis of the standard FCM method and illustrated by means of numeric examples.
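
For orientation, the standard FCM method that this abstract builds on alternates a membership update and a prototype update. The sketch below shows only that baseline on a single subset. The collaboration term that couples the partition matrices of different subsets is not described in the abstract and is not reproduced here. The function name `fcm` and the choice of the first `c` points as starting prototypes are assumptions made for illustration.

```python
import math

def fcm(points, c, m=2.0, iters=50):
    """Plain fuzzy C-means on a list of equal-length tuples.

    Returns (u, protos): u[k][i] is the membership of point k in cluster i.
    """
    # Assumption: the first c points serve as starting prototypes.
    protos = [list(p) for p in points[:c]]
    dims = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        u = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in protos]
            u.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ])
        # Prototype update: mean of the points weighted by u^m.
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            protos[i] = [
                sum(wk * p[dim] for wk, p in zip(w, points)) / total
                for dim in range(dims)
            ]
    return u, protos

pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
u, protos = fcm(pts, 2)
```

Each row of `u` sums to one, so a row can be read as the degree to which a point belongs to each cluster. In a collaborative setting it is matrices of this kind, not the raw patterns, that would be exchanged.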

5.
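The K-multiple-means clustering algorithm with a specified number K of clusters sets up multiple sub-clusters on top of the K-means algorithm to mitigate the weakness of K-means on non-convex data sets, and it formalizes multiple-means clustering as an optimization problem, which yields better clustering results. However, the algorithm is sensitive to the initial prototypes, and selecting prototypes at random makes the clustering results unstable. To address these problems, a stable K-multiple-means clustering algorithm is proposed, and its complexity and convergence are briefly discussed. The algorithm first constructs a graph based on the nearest-neighbor relations among the data samples, divides the data into several groups according to the connected components of the graph, and takes the mean point of each group as an initial prototype; it then solves the optimization problem by alternating iteration to obtain the final clustering result. Experiments on synthetic and real data sets show that the algorithm achieves more stable and better clustering results.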
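
The initialization step described here (nearest-neighbor graph, connected components, group means) can be sketched as below. This is a minimal illustration, assuming that each sample is linked only to its single nearest neighbor under Euclidean distance; the abstract does not state the exact graph construction, and the name `nn_graph_prototypes` is hypothetical.

```python
import math

def nn_graph_prototypes(points):
    """Initial prototypes = means of the connected components of the 1-NN graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Link sample i to its nearest neighbor (excluding itself).
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(points[i], points[k]))
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(tuple(sum(col) / len(g) for col in zip(*g))
                  for g in groups.values())

init = nn_graph_prototypes([(0, 0), (0, 1), (10, 10), (10, 11), (10, 12)])
```

Because no random choice is involved, repeated runs on the same data give the same starting prototypes, which is the source of the stability the abstract claims.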

6.
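The adaptive recursive support vector machine (AR-SVM) algorithm is introduced. AR-SVM recursively partitions the data into small subsets in an arbitrary manner, and a standard SVM is invoked on each subset to produce a small reduced set of prototypes (support vectors). These prototypes are merged, and the SVM is invoked again to produce more refined prototypes. In this way, prototypes (vectors) that are not close to the classification boundary and have no effect on classification are removed, so that they do not appear in subsequent SVM invocations. Experimental results show that this recursive strategy significantly improves computational efficiency on large data sets.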

7.
The traditional approach to clustering is to fit a model (partition or prototypes) to the given data. We propose a completely opposite approach: fitting the data into a given clustering model that is optimal for similar pathological data of equal size and dimensions. We then perform an inverse transform from this pathological data back to the original data while refining the optimal clustering structure during the process. The key idea is that we do not need to find an optimal global allocation of the prototypes. Instead, we only need to perform local fine-tuning of the clustering prototypes during the transformation in order to preserve the already optimal clustering structure.

8.
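Distributed clustering analysis of large data sets   Total citations: 1, self-citations: 0, citations by others: 1
To improve clustering efficiency, a distributed clustering algorithm for large data sets is proposed. Rather than clustering all the data at once, the method randomly divides the large data set into several subsets, clusters each subset concurrently, and finally merges the resulting clusters. Experimental results show that in most cases the method produces the same results as traditional all-at-once clustering while greatly increasing clustering speed.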
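
A split, cluster, merge pipeline of this kind can be sketched as follows. The abstract does not name the base clustering algorithm or the merging rule, so this sketch assumes k-means for the subsets and a second k-means over the pooled local centers for the merge; the subsets are processed sequentially here rather than concurrently. The names `kmeans` and `split_cluster_merge` are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def split_cluster_merge(points, k, n_subsets, seed=0):
    """Randomly split the data, cluster each subset, then merge the centers."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    local_centers = [c for s in subsets for c in kmeans(s, k)]
    return sorted(kmeans(local_centers, k))

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
merged = split_cluster_merge(data, k=2, n_subsets=2)
```

The merge step sees only `k * n_subsets` centers instead of the full data set, which is where the speedup would come from once the subset runs are executed in parallel.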

9.
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on multiple weighted norms to measure the distance between the feature vectors and their prototypes. Clustering and LVQ are formulated in this paper as the minimization of a reformulation function that employs distinct weighted norms to measure the distance between each of the prototypes and the feature vectors under a set of equality constraints imposed on the weight matrices. Fuzzy LVQ and clustering algorithms are obtained as special cases of the proposed formulation. The resulting clustering algorithm is evaluated and benchmarked on three data sets that differ in terms of the data structure and the dimensionality of the feature vectors. This experimental evaluation indicates that the proposed multinorm algorithm outperforms algorithms employing the Euclidean norm as well as existing clustering algorithms employing weighted norms.

10.
Granular prototyping in fuzzy clustering   Total citations: 5, self-citations: 0, citations by others: 5
We introduce a logic-driven clustering in which prototypes are formed and evaluated in a sequential manner. The structure in the data is revealed by maximizing a certain performance index (objective function) that takes into consideration an overall level of matching (to be maximized) and a similarity level between the prototypes (the component to be minimized). The prototypes identified in the process come with an optimal weight vector that indicates the significance of the individual features (coordinates) in the data grouping represented by the prototype. Since the topologies of these groupings are in general quite diverse, the optimal weight vectors reflect the anisotropy of the feature space, i.e., they show some local ranking of features in the data space. Having found the prototypes, we consider an inverse similarity problem and show how the relevance of the prototypes translates into their granularity.

11.
Data clustering is aimed at finding groups of data that share common hidden properties. These kinds of techniques are especially critical at early stages of data analysis, where no information about the dataset is available. One of the major shortcomings of clustering algorithms is the difficulty non-expert users have in configuring them and, in some cases, interpreting the results. In this work, a computational approach with a two-layer structure based on the Self-Organizing Map (SOM) is presented for cluster analysis. In the first level, a quantization of the data samples using topology-preserving metrics to automatically determine the number of units in the SOM is proposed. In the second level, the obtained SOM prototypes are clustered by means of a connectivity analysis to explore the quality of the partitioning with different numbers of clusters. The most important benefit of this two-layer procedure is that the computational load decreases considerably in comparison with data-based clustering methods, making it possible to cluster large data sets and to consider several different clustering alternatives in a limited time. This methodology produces a two-dimensional map representation of the usually high-dimensional input space, along with quantitative information on viable clustering alternatives, which facilitates the exploration of the possible partitions in a dataset. The efficiency and interpretability of the methodology are illustrated by its application to artificial, benchmark and real complex biological datasets. The experimental results demonstrate the ability of the method to identify possible segmentations in a dataset, compared to algorithms that only yield a single clustering solution. The proposed algorithm tackles the intrinsic limitations of SOM and the parameter settings associated with the clustering methodology, without requiring the number of clusters or the SOM architecture as a prerequisite, among others. It can therefore be applied even by researchers with limited expertise in machine learning.

12.
In this paper, an efficient K-medians clustering (unsupervised) algorithm for prototype selection and a Supervised K-medians (SKM) classification technique for protein sequences are presented. For sequence data sets, a median string/sequence can be used as the cluster/group representative. In the K-medians clustering technique, a desired number of clusters, K, each represented by a median string/sequence, is generated, and these median sequences are used as prototypes for classifying a new/test sequence, whereas in the SKM classification technique, the median sequence in each group/class of labelled protein sequences is determined and the set of median sequences is used as prototypes for classification. It is found that the K-medians clustering technique outperforms the leader-based technique and that the SKM classification technique performs better than the motif-based approach for the data sets used. We further use a simple technique to reduce time and space requirements during protein sequence clustering and classification. During the training and testing phases, the similarity score between a pair of sequences is determined from a selected portion of each sequence instead of the entire sequence. This is like selecting a subset of features for sequence data sets. The experimental results of the proposed method on the K-medians, SKM and Nearest Neighbour Classifier (NNC) techniques show that the Classification Accuracy (CA) using the prototypes generated/used does not degrade much, but the training and testing times are reduced significantly. Thus the experimental results indicate that the similarity score does not need to be calculated over the entire length of the sequence to achieve a good CA. The space requirement is also reduced during both training and classification.

13.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition   Total citations: 1, self-citations: 0, citations by others: 1
A modular, unsupervised neural network architecture that can be used for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns online in a stable and efficient manner. The system uses a control structure similar to that found in the adaptive resonance theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two-stage process: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid position from fuzzy C-means (FCM) system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The AFLC algorithm is applied to the Anderson iris data and laser-luminescent finger image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets.

14.
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may be distributed at different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.

15.
P.A.  M.  D.K.   《Pattern recognition》2006,39(12):2344-2355
Hybrid hierarchical clustering techniques, which combine the characteristics of different partitional clustering techniques or of partitional and hierarchical clustering techniques, are of interest. In this paper, efficient bottom-up hybrid hierarchical clustering (BHHC) techniques are proposed for prototype selection in protein sequence classification. In the first stage, an incremental partitional clustering technique such as the leader algorithm (ordered leader no update (OLNU) method), which requires only one database (db) scan, is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or a partitional clustering algorithm, 'K-medians', is used on these subcluster representatives to obtain the required number of clusters. This hybrid scheme is therefore scalable and suitable for clustering large data sets, and it also yields a hierarchical structure of clusters and subclusters whose representatives are used for pattern classification. Even if a larger number of prototypes is generated, classification time does not increase much, as only a part of the hierarchical structure is searched. The experimental results (classification accuracy (CA) using the prototypes obtained, and the computation time) of the proposed algorithms are compared with those of the hierarchical agglomerative schemes, K-medians and nearest neighbour classifier (NNC) methods. The proposed methods are found to be computationally efficient with reasonably good CA.
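
The first stage relies on the leader algorithm, which needs a single pass over the data. A minimal sketch of the plain leader algorithm is given below. It assumes numeric points and a Euclidean distance threshold, whereas the paper works with protein sequences and similarity scores, and it does not try to reproduce the details of the OLNU variant. The name `leader_clustering` is hypothetical.

```python
import math

def leader_clustering(points, threshold):
    """One-scan leader clustering.

    A point joins the first leader within `threshold`; otherwise it
    becomes a new leader. Leaders are never updated afterwards.
    """
    leaders = []
    members = []
    for p in points:
        for i, lead in enumerate(leaders):
            if math.dist(p, lead) <= threshold:
                members[i].append(p)
                break
        else:
            # No existing leader is close enough: start a new subcluster.
            leaders.append(p)
            members.append([p])
    return leaders, members

leaders, members = leader_clustering(
    [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)], threshold=1.0)
```

The resulting leaders play the role of subcluster representatives; a second-stage method such as HAC or K-medians would then be run on `leaders` rather than on the full data set. Note that the result depends on the order in which the points are scanned.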

16.
Information granules form an abstract and efficient characterization of large volumes of numeric data. Fuzzy clustering is a commonly encountered information granulation approach. Reconstruction (degranulation) decodes information granules back into numeric data. In this study, to enhance the quality of reconstruction, we augment the generic data reconstruction approach by introducing a transformation mapping of the originally produced partition matrix and setting up an adjustment mechanism that modifies the localization of the prototypes. We engage several population-based search algorithms to optimize interaction matrices and prototypes. A series of experimental results dealing with both synthetic and publicly available data sets are reported to show the enhancement of the data reconstruction performance provided by the proposed method.

17.
Incomplete data clustering is often encountered in practice. Here the treatment of missing attribute values and the optimization procedure of clustering are the important factors affecting the clustering performance. In this study, a missing attribute value becomes an information granule and is represented as a certain interval. To avoid intervals determined by different cluster information, we propose a congeneric nearest-neighbor rule-based architecture of the preclassification result, which can improve the effectiveness of estimating the missing attribute interval. Furthermore, a global fuzzy clustering approach using particle swarm optimization assisted by Fuzzy C-Means is proposed. A novel encoding scheme, in which particles are composed of the cluster prototypes and the missing attribute values, is used in the optimization procedure. The proposed approach improves the accuracy of clustering results; moreover, the missing attribute imputation can be carried out at the same time. The experimental results on several UCI data sets show the efficiency of the proposed approach.

18.
The mountain method of clustering and its relative, the subtractive clustering method, are studied here. A scheme to improve the accuracy of the prototypes obtained by the mountain method is proposed. Finally the mountain circular shell method to detect circular shells by using the mountain function is proposed. The proposed method is tested extensively on several synthetic data sets, and the results obtained are quite satisfactory. © 2000 John Wiley & Sons, Inc.
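
For readers unfamiliar with the mountain method, its basic loop is: build a "mountain" over a grid of candidate vertices, take the highest vertex as a prototype, flatten the mountain around it, and repeat. The sketch below illustrates that loop only. It assumes a Gaussian-style kernel with hypothetical parameters `alpha` and `beta`; the kernel used in the cited work may differ, and neither the accuracy improvement scheme nor the circular shell variant from the abstract is shown.

```python
import math

def mountain_prototypes(points, grid, n_protos, alpha=1.0, beta=1.0):
    """Pick prototypes as successive peaks of a mountain function on a grid."""
    # Each data point adds height to the grid vertices near it.
    height = [sum(math.exp(-alpha * math.dist(v, x) ** 2) for x in points)
              for v in grid]
    protos = []
    for _ in range(n_protos):
        best = max(range(len(grid)), key=lambda i: height[i])
        peak = height[best]
        protos.append(grid[best])
        # Destroy the peak just found so the next maximum lies elsewhere.
        height = [h - peak * math.exp(-beta * math.dist(v, grid[best]) ** 2)
                  for h, v in zip(height, grid)]
    return protos

pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (4.0, 4.0), (4.2, 4.0)]
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
protos = mountain_prototypes(pts, grid, n_protos=2)
```

The prototypes are restricted to grid vertices, so their accuracy is bounded by the grid resolution; this limitation is what motivates refinement schemes such as the one the abstract proposes.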

19.
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on a weighted norm to measure the distance between the feature vectors and their prototypes. The development of LVQ and clustering algorithms is based on the minimization of a reformulation function under the constraint that the generalized mean of the norm weights be constant. According to the proposed formulation, the norm weights can be computed from the data in an iterative fashion together with the prototypes. An error analysis provides some guidelines for selecting the parameter involved in the definition of the generalized mean in terms of the feature variances. The algorithms produced from this formulation are easy to implement, and they are almost as fast as clustering algorithms relying on the Euclidean norm. An experimental evaluation on four data sets indicates that the proposed algorithms consistently outperform clustering algorithms relying on the Euclidean norm and are strong competitors to non-Euclidean algorithms, which are computationally more demanding.

20.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory by considering that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations that affect the outcome of many clustering algorithms. As an application of this algorithm, it is used to enhance the Hough transform technique.
