Similar literature (20 documents found)
1.
Clustering properties of hierarchical self-organizing maps   Cited by: 1 (self-citations: 0, citations by others: 1)
A multilayer hierarchical self-organizing map (HSOM) is discussed as an unsupervised clustering method. The HSOM is shown to form arbitrarily complex clusters, in analogy with multilayer feedforward networks. In addition, the HSOM provides a natural measure for the distance of a point from a cluster that weighs all the points belonging to the cluster appropriately. In experiments with both artificial and real data it is demonstrated that the multilayer SOM forms clusters that match the desired classes better than do direct SOMs, classical k-means, or Isodata algorithms.

2.
The spatially unique properties of degraded meadows (heitutan) on the Qinghai–Tibet Plateau offer an excellent opportunity to assess the utility and effectiveness of various kinds of knowledge in mapping them automatically and accurately from satellite imagery. A Landsat Operational Land Imager image was K-means clustered to produce a degradation severity map; the same image was also used to generate a normalized difference vegetation index (NDVI) map that was subsequently converted to a degradation severity map as well. Both maps were further refined with spatial knowledge derived from topographic data and from the image via onscreen digitization. Elevation is found to be more useful than channel knowledge in refining the image-derived results. It can reduce the area of the K-means clustering results by 37% through exclusion of non-genuine heitutan at very high elevations. This knowledge is especially beneficial for severe and slight heitutan. Channel knowledge is less effective, reducing the mapped heitutan by 15%, with a similar rate of reduction across all three classes of degradation severity. However, it is more useful in refining the NDVI-derived results than the K-means results, as all sparsely vegetated areas were indiscriminately lumped together. K-means clustering and NDVI produced drastically different results, but after the application of the spatial knowledge they converge closely, with a disparity of only 6% between them. Both methods achieved a similar overall mapping accuracy of around 70%. Slope gradient and aspect are of limited use to the mapping owing to the lack of distinction between degraded heitutan and intact meadows. More research should focus on the universality of the knowledge and the impact of scale on the findings.
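
The NDVI step mentioned in this abstract can be illustrated with a minimal Python sketch. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red). The function names, the cutoff values, and the sample pixel values are assumptions made for illustration; the abstract does not report the thresholds actually used to convert NDVI into severity classes.

```python
from bisect import bisect_right

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: a pixel with no signal in either band gets NDVI 0
    return (nir - red) / total

def ndvi_bin(value, cutoffs):
    """Index of the NDVI interval that `value` falls into, given ascending cutoffs."""
    return bisect_right(cutoffs, value)

# Hypothetical cutoffs, for illustration only (not taken from the abstract).
CUTOFFS = [0.2, 0.4]
# Made-up (NIR, Red) reflectance pairs.
pixels = [(0.30, 0.25), (0.40, 0.20), (0.50, 0.10)]
bins = [ndvi_bin(ndvi(nir, red), CUTOFFS) for nir, red in pixels]
```

Each bin index would then be mapped to a severity class by whatever rule the study applies; that mapping is deliberately left out here because the source does not state it.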

3.

Organizing and optimizing production in small and medium enterprises with small batch production and many different products can be very difficult. This paper presents an approach to organizing production cells by clustering manufactured products into groups with similar product properties. Several clustering methods are compared, including hierarchical clustering, k-means and self-organizing map (SOM) clustering. The clustering methods are applied to production data describing 252 products from the Slovenian company KGL. The best clustering result, evaluated by the average silhouette width over the total data set, is obtained by SOM clustering. To make the clustering results applicable to industrial production cell planning, an interpretation method is proposed. The method is based on percentile margins that reflect the requirements of each production cell and is further improved by incorporating the economic value of each product and consequently the economic impact of each production cell. The results can be considered a recommendation for production floor planning that will optimize production resources and minimize the transfer of work and material flow between production cells.
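
The evaluation criterion named in this abstract, average silhouette width, can be sketched as follows. This is a minimal pure-Python version under stated assumptions: Euclidean distance, a silhouette of 0 for singleton clusters, and hypothetical function names. The paper's own distance measure and preprocessing are not given in the abstract.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumption: a singleton cluster contributes s(i) = 0
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

Values close to 1 indicate compact, well-separated clusters; negative values indicate points that sit closer to another cluster than to their own.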


4.
Modern-day computers cannot provide an optimal solution to the clustering problem. There are many clustering algorithms that attempt to approximate the optimal solution. These clustering techniques can be broadly classified into two categories. Techniques in the first category directly assign objects to clusters and then analyze the resulting clusters. Methods in the second category adjust representations of clusters and then determine the object assignments. In terms of disciplines, these techniques can be classified as statistical, genetic-algorithm based, and neural-network based. This paper reports the results of experiments comparing five different approaches: hierarchical grouping, object-based genetic algorithms, cluster-based genetic algorithms, Kohonen neural networks, and the K-means method. The comparisons cover time requirements and within-group errors. The theoretical analyses were tested on the clustering of highway sections and supermarket customers. All the techniques were applied to the clustering of highway sections. The hierarchical grouping and genetic algorithm approaches were computationally infeasible for clustering the larger set of supermarket customers. Hence only the Kohonen neural network and K-means techniques were applied to the second set, to confirm some of the results from the previous experiments.

5.
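Objective: Hyperspectral images contain a very large number of bands, which leads to the "curse of dimensionality" during interpretation and classification. To address this problem, a hyperspectral image classification algorithm based on entropy-weighted K-means clustering with global information is proposed; it builds on the K-means clustering algorithm, accounts for the importance of each band to the different clusters, and also takes between-cluster information into account. Methods: First, band weights are introduced to characterize the importance of each band to the different clusters, and an entropy information measure is defined to express these weights. Second, to avoid locally optimal clusterings, a between-cluster distance measure is introduced to achieve a globally optimal clustering. Finally, both measures are incorporated into the K-means clustering objective function, and the optimal classification result is obtained by minimizing the objective function. Results: To verify the effectiveness of the proposed method, the land-cover classes in the ground-truth maps of the Salinas and Pavia University hyperspectral images were merged according to the degree of difference in their spectral reflectance, and the merged maps were used as the new reference classification maps. The proposed algorithm and the traditional K-means algorithm were each applied to the Salinas and Pavia University hyperspectral images, and the experimental results were evaluated and analyzed both qualitatively and quantitatively. For the merged land-cover classes, whose spectral reflectance differs greatly, the proposed algorithm visually produces better classification results than the traditional K-means algorithm. In terms of classification accuracy, the overall accuracy of the proposed algorithm is 92.20% and 82.96%, respectively, against 83.39% and 67.06% for the K-means algorithm, an increase of 8.81 and 15.9 percentage points. Conclusion: A hyperspectral image classification algorithm based on entropy-weighted K-means clustering with global information is proposed. The experimental results show that the algorithm achieves good classification results for all kinds of land-cover targets with different degrees of spectral reflectance difference in hyperspectral images.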

6.
To cluster web documents, all of which share the same name entities, we attempted to use existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, these algorithms turned out not to be effective for clustering such web documents. Through intensive investigation, we found that clustering such web pages is more complicated because (1) the number of clusters (known from the ground truth) is larger than the two or three clusters found in general clustering problems and (2) clusters in the data set have extremely skewed distributions of cluster sizes. To overcome these problems, in this paper we propose an effective clustering algorithm that boosts the accuracy of the K-means and spectral clustering algorithms. In particular, to deal with skewed distributions of cluster sizes, our algorithm performs both bisection and merge steps based on normalized cuts of the similarity graph G to correctly cluster web documents. Our experimental results show that our algorithm improves performance by approximately 56% compared to spectral bisection and 36% compared to K-means.

7.
Unlike conventional unsupervised classification methods, such as K‐means and ISODATA, which are based on partitional clustering techniques, the methodology proposed in this work attempts to take advantage of the properties of Kohonen's self‐organizing map (SOM) together with agglomerative hierarchical clustering methods to perform the automatic classification of remotely sensed images. The key point of the proposed method is to execute the cluster analysis process by means of a set of SOM prototypes, instead of working directly with the original patterns of the image. This strategy significantly reduces the complexity of the data analysis, making it possible to use techniques that have not normally been considered viable in the processing of remotely sensed images, such as hierarchical clustering methods and cluster validation indices. Through the use of the SOM, the proposed method maps the original patterns of the image to a two‐dimensional neural grid, attempting to preserve the probability distribution and topology of the input space. Afterwards, an agglomerative hierarchical clustering method with restricted connectivity is applied to the trained neural grid, generating a simplified dendrogram for the image data. Utilizing the statistical properties of the SOM, the method employs modified versions of cluster validation indices to automatically determine the ideal number of clusters for the image. The experimental results show examples of the application of the proposed methodology and compare its performance to that of the K‐means algorithm.

8.
Security assessment is a major concern in planning and operation studies of a power system. The conventional method of security evaluation, performed by simulation, requires long computation time and generates voluminous results. This paper presents a K-means clustering approach for classifying power system states as secure or insecure under a given operating condition and contingency. It demonstrates how the traditional K-means clustering algorithm can be profitably modified for use as a classifier algorithm. The proposed algorithm combines particle swarm optimization (PSO) with the traditional K-means algorithm to satisfy the requirements of a classifier. The proposed PSO-based K-means clustering technique is implemented on the IEEE 30 Bus, 57 Bus, 118 Bus and 300 Bus standard test systems for static security and transient security evaluation. The simulation results of the proposed algorithm are compared with those of unsupervised K-means clustering, which uses different methods for cluster center initialization.

9.
Harmony K-means algorithm for document clustering   Cited by: 2 (self-citations: 0, citations by others: 2)
Fast and high-quality document clustering is a crucial task in organizing information and search engine results, enhancing web crawling, and information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm can generate a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) that performs document clustering based on the Harmony Search (HS) optimization method. It is proved by means of finite Markov chain theory that the HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we applied it to some standard datasets. We also compare the HKA with other meta-heuristic and model-based document clustering approaches. Experimental results reveal that the HKA converges to the best known optimum faster than other methods and that the quality of the clusters is comparable.

10.
Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem in which an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been proven that these two seemingly different approaches have the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.

11.
Data clustering is a process of extracting similar groups from underlying data whose labels are hidden. This paper describes different approaches to solving the data clustering problem. Particle swarm optimization (PSO) has recently been used to address the clustering task. An overview of PSO-based clustering approaches is presented in this paper. These approaches mimic the behavior of biological swarms seeking food located in different places. The best locations for finding food are in dense areas and in regions far enough from others. PSO-based clustering approaches are evaluated using different data sets. Experimental results indicate that these approaches outperform the K-means, K-harmonic means, and fuzzy c-means clustering algorithms.

12.
Phenoregion delineation facilitates more effective monitoring and more accurate forecasting of land-surface phenology (LSP), and can thereby greatly improve natural resources management. This article delineated a series of phenoregion maps by applying Dynamic-Time-Warping (DTW)-based k-means++ clustering to normalized difference vegetation index (NDVI) time series. The DTW distance, a well-known shape-based similarity measure for time series data, was used as the distance measure instead of the traditional Euclidean distance in k-means++ clustering. These phenoregion maps were compared with ones clustered on the similarity of phenological forcing variables. The results demonstrated that DTW-based k-means++ clustering can capture much more homogeneous phenological cycles within each phenoregion; the two types of phenoregion maps have a medium degree of spatial concordance, and their representativeness of vegetation types is comparable. The phenocycle-based phenoregion map with 15 phenoregions was selected as the optimal one, based on the criteria of cluster cohesion and separation.
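
The DTW distance that replaces the Euclidean distance in this abstract can be sketched with the classic dynamic-programming recurrence. This is a minimal sketch, assuming an absolute-difference local cost and no warping-window constraint; the abstract does not say which local cost or constraints the article used, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier element of b
                cost[i][j - 1],      # b[j-1] also matches an earlier element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Because the alignment may stretch or compress the time axis, two seasonal curves with the same shape but a shifted onset can have a DTW distance of zero, whereas a pointwise Euclidean comparison would penalize the shift.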

13.
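A semi-supervised K-means clustering algorithm for multi-relational data   Cited by: 1 (self-citations: 0, citations by others: 1)
Gao Ying, Liu Dayou, Qi Hong, Liu He. Journal of Software (《软件学报》), 2008, 19(11): 2814-2821
A semi-supervised K-means clustering algorithm for multi-relational data is proposed. Building on the K-means clustering algorithm, it extends the method for selecting initial clusters and the measure of object similarity so that they can be used for semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes, and the various kinds of relational information during clustering. Experimental results on the multi-relational database Movie verify the effectiveness of the algorithm.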

14.
A self-organizing map (SOM) is a nonlinear, unsupervised neural network model that can be used for data clustering and visualization. One of the major shortcomings of the SOM algorithm is that non-expert users find it difficult to interpret the information contained in a trained SOM. In this paper, this problem is tackled by introducing an enhanced version of the proposed visualization method, which consists of three major steps: (1) calculating the single-linkage inter-neuron distance, (2) calculating the number of data points in each neuron, and (3) finding the cluster boundary. The experimental results show that the proposed approach effectively displays the data distribution, inter-neuron distances, and cluster boundary. The experimental results also indicate that the proposed algorithm visualizes better than other visualization methods. Furthermore, the proposed visualization scheme not only makes the clustering results intuitively easy to understand but also visualizes unlabeled data sets well.

15.
Search results of spatio-temporal data are often displayed on a map, but when the number of matching search results is large, it can be time-consuming to individually examine all results, even when using methods such as filtered search to narrow the content focus. This suggests the need to aggregate results via a clustering method. However, standard unsupervised clustering algorithms like K-means (i) ignore relevance scores that can help with the extraction of highly relevant clusters, and (ii) do not necessarily optimize search results for purposes of visual presentation. In this article, we address both deficiencies by framing the clustering problem for search-driven user interfaces in a novel optimization framework that (i) aims to maximize the relevance of aggregated content according to cluster-based extensions of standard information retrieval metrics and (ii) defines clusters via constraints that naturally reflect interface-driven desiderata of spatial, temporal, and keyword coherence that do not require complex ad-hoc distance metric specifications as in K-means. After comparatively benchmarking algorithmic variants of our proposed approach – RadiCAL – in offline experiments, we undertake a user study with 24 subjects to evaluate whether RadiCAL improves human performance on visual search tasks in comparison to K-means clustering and a filtered search baseline. Our results show that (a) our binary partitioning search (BPS) variant of RadiCAL is fast, near-optimal, and extracts higher-relevance clusters than K-means, and (b) clusters optimized via RadiCAL result in faster search task completion with higher accuracy while requiring minimal workload, leading to high effectiveness, efficiency, and user satisfaction among the alternatives.

16.
We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clustering algorithms that approximate the optimal clustering. We show that K-median clustering, as well as the K-means and Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center-based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k. Editors: Olivier Bousquet and Andre Elisseeff. A preliminary version of this work appeared in the proceedings of COLT’04 (Ben-David, 2004). This work is supported in part by the Multidisciplinary University Research Initiative (MURI) under the Office of Naval Research Contract N00014-00-1-0564.

17.
Strategic group analysis comprises clustering the firms within an industry according to their similarities with respect to a set of strategic dimensions and investigating the performance implications of strategic group membership. One of the challenges of strategic group analysis is the selection of the clustering method. In this study, the results of a strategic group analysis of Turkish contractors are presented to compare the performance of traditional cluster analysis techniques, self-organizing maps (SOM) and the fuzzy C-means method (FCM) for strategic grouping. The findings reveal that traditional cluster analysis methods cannot disclose the overlapping strategic group structure or the position of companies within the same strategic group. It is concluded that SOM and FCM can reveal the typology of the strategic groups better than traditional cluster analysis and are more likely to provide useful information about the real strategic group structure.

18.
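A constrained partition model and K-means algorithm   Cited by: 7 (self-citations: 0, citations by others: 7)
He Zhenfeng, Xiong Fanlun. Journal of Software (《软件学报》), 2005, 16(5): 799-809
Combining association constraints between data objects with the K-means algorithm can give good results. However, because the partition is determined by K centers and each class is determined by a single center, this way of representing the partition limits further improvement. Based on the two kinds of constraints between data objects, two kinds of association between a data object and a set and three kinds of association between sets are defined, and on this basis a partition model with constraints is given. In the model, based on positive association between sets, several subset centers can be used to represent the same class, so that the representation of the partition becomes more flexible and fine-grained. Based on this model, a corresponding algorithm, CKS (constrained K-means with subsets), is given to generate partitions that incorporate the constraints. Experimental results on three UCI data sets show that CKS is significantly better in accuracy and robustness than COP-K-means, another K-means-type algorithm that incorporates association constraints, and also has a considerable advantage over CCL, another representative algorithm; CKS also has some advantage in time cost.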

19.
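Visual clustering using self-organizing feature map neural networks   Cited by: 5 (self-citations: 0, citations by others: 5)
Bai Yaohui, Chen Ming. Computer Simulation (《计算机仿真》), 2006, 23(1): 180-183
As a neural network method, the self-organizing feature map has been widely applied in data mining, machine learning, and pattern classification. It maps data from a high-dimensional input space onto a low-dimensional, regular grid, so that visualization techniques can be used to explore the inherent characteristics of the data. This paper explains the working principle of the self-organizing feature map neural network and the algorithm that implements it, and uses a worked example to show the visualization properties of clustering with self-organizing feature maps, including visualization of the clustering process and of the clustering results, which is one of the reasons self-organizing feature maps are so widely used.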
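
The mapping onto a regular grid described in this abstract rests on one repeated training step, sketched below. This is the textbook Kohonen update with a Gaussian neighborhood, not the specific implementation of the paper, which the abstract does not detail. The dictionary layout (grid coordinate to weight vector), the function names, and the parameter names are assumptions for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Grid coordinate of the neuron whose weight vector is closest to input x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )

def som_update(weights, x, learning_rate, radius):
    """One SOM training step: pull the winner and its grid neighbors toward x.

    weights maps (row, col) grid coordinates to weight vectors (lists of floats).
    """
    bmu = best_matching_unit(weights, x)
    for pos, w in list(weights.items()):
        grid_dist_sq = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        # Gaussian neighborhood: 1 at the winner, decaying with distance on the grid.
        h = math.exp(-grid_dist_sq / (2 * radius ** 2))
        weights[pos] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

Training would repeat this step over many inputs while shrinking `learning_rate` and `radius`; because neighbors on the grid move together, nearby neurons end up representing similar inputs, which is what makes the grid usable for visualization.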

20.
The present paper considers the problem of partitioning a dataset into a known number of clusters using the sum of squared errors (SSE) criterion. A new clustering method, called DE-KM, which combines the differential evolution algorithm (DE) with the well-known K-means procedure, is described. In the method, the K-means algorithm is used to fine-tune each candidate solution obtained by the mutation and crossover operators of DE. Additionally, a reordering procedure that allows the evolutionary algorithm to tackle the redundant representation problem is proposed. The performance of the DE-KM clustering method is compared to that of differential evolution, the global K-means method, the genetic K-means algorithm and two variants of the K-means algorithm. The experimental results show that if the number of clusters K is sufficiently large, DE-KM obtains solutions with lower SSE values than the other five algorithms.
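
The SSE criterion and the K-means fine-tuning step named in this abstract can be sketched as follows. This covers only plain Lloyd-style K-means started from given centers, as it might be applied to one candidate solution; the mutation, crossover, and reordering parts of DE-KM are not shown because the abstract does not specify them. Function names are hypothetical.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def sse(points, centers, labels):
    """Sum of squared errors: squared distance of every point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def kmeans(points, centers, max_iter=100):
    """Refine the given centers with Lloyd iterations; returns (centers, labels)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Neither step of an iteration can increase the SSE, which is why K-means is a natural local refinement for candidates produced by a global search; it also explains why K-means alone can stall in a local optimum.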
