Similar Documents
20 similar documents found (search time: 31 ms)
1.
Generalizing self-organizing map for categorical data   Total citations: 1, self-citations: 0, citations by others: 1
The self-organizing map (SOM) is an unsupervised neural network that projects high-dimensional data onto a low-dimensional grid and visually reveals the topological order of the original data. Self-organizing maps have been successfully applied in many fields, including engineering and business. However, the conventional SOM training algorithm handles only numeric data. Categorical data are usually converted to a set of binary attributes before SOM training takes place. If a simple transformation scheme is adopted, the similarity information embedded among categorical values may be lost, and consequently the trained SOM cannot reflect the correct topological order. This paper proposes a generalized self-organizing map model that offers an intuitive way to specify the similarity between categorical values via distance hierarchies and hence enables categorical values to be processed directly during training. Distance hierarchies unify the distance computation for numeric and categorical values: the values are mapped to distance hierarchies, and distances are then measured within the hierarchies. Experiments on synthetic and real datasets demonstrate the effectiveness of the generalized SOM model.
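The distance-hierarchy idea can be illustrated with a small sketch. Here a hierarchy is assumed to be a dict mapping each node to `(parent, link_weight)`, a point is assumed to be `(anchor_leaf, offset_from_root)`, and the distance between two points is taken as `offset_x + offset_y - 2 * common`, where `common` is the offset of the deepest point the two share. The `DRINKS` and `PRICE` hierarchies and all weights are hypothetical; a numeric attribute is modelled as a one-link hierarchy whose length is the value range, so the same formula reduces to `|a - b|`.

```python
# Hypothetical categorical hierarchy: node -> (parent, weight of link to parent).
DRINKS = {
    "Any": (None, 0.0),
    "Carbonated": ("Any", 1.0), "Juice": ("Any", 1.0),
    "Coke": ("Carbonated", 1.0), "Pepsi": ("Carbonated", 1.0),
    "Apple": ("Juice", 1.0), "Orange": ("Juice", 1.0),
}
# Hypothetical numeric attribute with range [0, 100] as a single-link hierarchy.
PRICE = {"Min": (None, 0.0), "Max": ("Min", 100.0)}

def depth(h, node):
    """Total link weight from the root down to `node`."""
    total = 0.0
    while h[node][0] is not None:
        parent, weight = h[node]
        total += weight
        node = parent
    return total

def ancestors(h, node):
    """`node` followed by all of its ancestors up to the root."""
    chain = [node]
    while h[node][0] is not None:
        node = h[node][0]
        chain.append(node)
    return chain

def lca(h, a, b):
    """Lowest common ancestor of nodes a and b."""
    b_chain = set(ancestors(h, b))
    for node in ancestors(h, a):
        if node in b_chain:
            return node
    raise ValueError("nodes are not in the same hierarchy")

def point_distance(h, x, y):
    """Distance between points x = (anchor, offset) and y = (anchor, offset)."""
    (ax, ox), (ay, oy) = x, y
    common = min(ox, oy, depth(h, lca(h, ax, ay)))
    return ox + oy - 2 * common

def value_distance(h, a, b):
    """A categorical value sits at its own leaf, i.e. offset = depth(leaf)."""
    return point_distance(h, (a, depth(h, a)), (b, depth(h, b)))

print(value_distance(DRINKS, "Coke", "Pepsi"))  # 2.0 (siblings)
print(value_distance(DRINKS, "Coke", "Apple"))  # 4.0 (different branches)
```

Because sibling values share more of their root path, they end up closer than values in different branches, which is the similarity information a plain binary encoding discards.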

2.
This paper introduces a self-organizing map dedicated to clustering, analyzing, and visualizing categorical data. Usually, topological maps handle categorical data through an encoding stage: the categorical data are converted into numerical vectors, and traditional numerical algorithms (SOM) are then run. In this paper, we propose a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no coding of the variables is needed. We evaluate the effectiveness of our model on four examples using real data. Our experiments show that the model yields good-quality results on categorical data.
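A neuron that is "a probability table" can be sketched as one discrete distribution per variable, with the winner chosen by log-likelihood instead of Euclidean distance. This is a minimal stochastic sketch, not the authors' estimation procedure: the 1-D map, the Gaussian neighbourhood, the linear width schedule, and the update rule that blends each table toward the observed value are all assumptions.

```python
import math
import random

def init_map(n_neurons, domains, rng):
    """One probability table (dict: value -> probability) per variable per neuron."""
    neurons = []
    for _ in range(n_neurons):
        tables = []
        for domain in domains:
            raw = [0.5 + rng.random() for _ in domain]
            total = sum(raw)
            tables.append({v: r / total for v, r in zip(domain, raw)})
        neurons.append(tables)
    return neurons

def log_likelihood(tables, x):
    """Log-probability of observation x under one neuron's tables."""
    return sum(math.log(table[v]) for table, v in zip(tables, x))

def winner(neurons, x):
    """The most likely neuron replaces the nearest one."""
    return max(range(len(neurons)), key=lambda i: log_likelihood(neurons[i], x))

def train(neurons, data, epochs=30, lr=0.3, sigma0=1.5):
    """Online updates on a 1-D map (assumed schedule); tables stay normalized."""
    for epoch in range(epochs):
        sigma = max(0.1, sigma0 * (1 - epoch / epochs))
        for x in data:
            c = winner(neurons, x)
            for i, tables in enumerate(neurons):
                h = lr * math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
                for table, observed in zip(tables, x):
                    for v in table:
                        target = 1.0 if v == observed else 0.0
                        # Convex blend toward a one-hot table keeps sum == 1, p > 0.
                        table[v] += h * (target - table[v])
    return neurons
```

Since `h` never exceeds `lr < 1`, every probability stays strictly positive and `math.log` is always defined.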

3.
In this paper, a new algorithm named polar self-organizing map (PolSOM) is proposed. PolSOM is constructed on a 2-D polar map with two variables, radius and angle, which represent data weight and feature, respectively. Compared with traditional algorithms, which project data onto a Cartesian map using the Euclidean distance as the only variable, PolSOM not only preserves the data topology and the inter-neuron distances but also visualizes the differences among clusters in terms of weight and feature. In PolSOM, the visualization map is divided into tori and circular sectors by radial and angular coordinates, and neurons are placed at the boundary intersections of the circular sectors and tori as benchmarks that attract data with similar attributes. Every datum is projected onto the map with polar coordinates that are trained toward the winning neuron. As a result, similar data group together, and data characteristics are reflected by their positions on the map. Simulations and comparisons with Sammon's mapping, SOM, and ViSOM are provided on four data sets. The results demonstrate the effectiveness of PolSOM for multidimensional data visualization.
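The geometric part of the description, neurons at ring/sector intersections and a datum's polar coordinates being pulled toward its winner, can be sketched as follows. Equal ring spacing, equal sector widths, and the linear pull with a single rate are assumptions; winner selection and the training of neuron weights are omitted. The angle update takes the shortest way around the circle so that, for example, 350° moves toward 10° through 0°.

```python
import math

def polar_grid(n_rings, n_sectors, ring_step=1.0):
    """Neuron positions (r, theta) at ring-boundary / sector-boundary intersections."""
    return [(ring_step * (i + 1), 2 * math.pi * j / n_sectors)
            for i in range(n_rings) for j in range(n_sectors)]

def angle_diff(target, current):
    """Signed shortest angular difference target - current, in (-pi, pi]."""
    d = (target - current) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def pull_toward(point, neuron, rate):
    """Move a datum's polar coordinates (r, theta) toward the winning neuron."""
    r, theta = point
    r_w, theta_w = neuron
    r += rate * (r_w - r)
    theta = (theta + rate * angle_diff(theta_w, theta)) % (2 * math.pi)
    return r, theta
```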

4.
Understanding the inherent structure of high-dimensional datasets is a very challenging task. It can be approached from the viewpoints of visualization, summarization, or simply clustering. The Self-Organizing Map (SOM) is a powerful unsupervised neural network for such problems. By preserving the data topology on a grid, the SOM can facilitate visualization of the data structure. However, the classical SOM is still limited by its predefined structure. Growing variants of the SOM can overcome this problem, since their dynamically growing architectures define the network structure without fixing the number of output units in advance. In this paper we propose a new dynamic SOM called MIGSOM (Multilevel Interior Growing SOMs) for high-dimensional data clustering. MIGSOM has a different architecture from the dynamic variants presented in the literature. Using an unsupervised training process, MIGSOM can grow the map from the boundaries as well as from the interior of the network in order to represent the structure of a data collection more faithfully. As a result, MIGSOM can have a three-dimensional (3-D) structure with different levels of oriented maps developed according to the data direction. We demonstrate the potential of MIGSOM on real-world high-dimensional datasets in terms of topology-preserving visualization, vector summarization by efficient quantization, and data clustering. In addition, MIGSOM achieves better performance than the growing grid and the classical SOM.
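Growing SOMs typically decide where to add units by accumulating each node's quantization error. The sketch below borrows the growth threshold `GT = -D * ln(SF)` from the GSOM literature (an assumption; the MIGSOM abstract does not give its criterion) and separates the nodes that exceed it into boundary nodes, which have a free 4-neighbour slot on the grid and can grow outward, and interior nodes, which are fully surrounded and would need interior insertion.

```python
import math

def growth_threshold(dim, spread_factor):
    """GSOM-style growth threshold GT = -D * ln(SF), with 0 < SF < 1."""
    return -dim * math.log(spread_factor)

def growth_candidates(errors, dim, spread_factor):
    """
    errors: dict mapping grid position (row, col) -> accumulated quantization error.
    Returns (boundary, interior) lists of positions whose error exceeds GT.
    """
    gt = growth_threshold(dim, spread_factor)
    occupied = set(errors)
    boundary, interior = [], []
    for (r, c), err in errors.items():
        if err <= gt:
            continue
        neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        if any(n not in occupied for n in neighbours):
            boundary.append((r, c))   # a free slot exists: grow outward
        else:
            interior.append((r, c))   # fully surrounded: needs interior growth
    return boundary, interior
```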

5.
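Traditional density-based algorithms cannot handle mixed-attribute data, and most current mixed-attribute clustering algorithms produce low-quality clusters. To address these problems, a mixed-attribute clustering algorithm based on density and a mixed distance measure is proposed. By analyzing the characteristics of mixed-attribute data, the algorithm divides such data into three classes: numeric-dominant, categorical-dominant, and balanced mixed-attribute data, and selects a distance measure suited to the characteristics of each case. Using preset parameters, it finds the dense regions of the data and identifies core points, then uses the core points to determine density-connected objects, yielding the final clustering. Experimental results on a variety of datasets show that the algorithm achieves high clustering quality and handles mixed-attribute data effectively.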
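The pipeline described above (classify the data type, pick a distance, find core points, expand density-connected objects) can be sketched with a DBSCAN-style loop. The share-of-numeric-attributes rule, its thresholds, and the per-type weights in `mixed_distance` are hypothetical stand-ins for the paper's measures; the density expansion itself is standard DBSCAN.

```python
import math

def data_type(n_numeric, n_categorical, low=1/3, high=2/3):
    """Hypothetical rule: classify by the share of numeric attributes."""
    share = n_numeric / (n_numeric + n_categorical)
    if share >= high:
        return "numeric-dominant"
    if share <= low:
        return "categorical-dominant"
    return "balanced"

# Hypothetical weight of the numeric part for each data type.
NUM_WEIGHT = {"numeric-dominant": 0.75, "balanced": 0.5, "categorical-dominant": 0.25}

def mixed_distance(x, y, num_idx, cat_idx, kind):
    """Weighted Euclidean part plus weighted count of categorical mismatches."""
    num = math.sqrt(sum((x[i] - y[i]) ** 2 for i in num_idx))
    cat = sum(x[i] != y[i] for i in cat_idx)
    w = NUM_WEIGHT[kind]
    return w * num + (1 - w) * cat

def dbscan(points, dist, eps, min_pts):
    """Labels: cluster ids from 0, or -1 for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neigh) < min_pts:
            labels[i] = -1                      # noise (may become a border point)
            continue
        labels[i] = cluster                     # i is a core point
        queue = list(neigh)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # former noise becomes border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(nj) >= min_pts:              # j is also core: keep expanding
                queue.extend(nj)
        cluster += 1
    return labels
```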

6.
A New SOM-Based Data Visualization Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
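The topology-preserving property of the SOM (self-organizing map) allows it to present high-dimensional data in a low-dimensional space. However, because the distance information between data items is lost when they are mapped onto neurons fixed in an ordered low-dimensional grid, the structure of the data is usually distorted. To present the data structure more naturally, a new SOM-based data visualization algorithm, DPSOM (distance-preserving SOM), is proposed. It adaptively adjusts the positions of the neurons according to the corresponding distance information, thereby displaying the distances between data items intuitively. In particular, the algorithm automatically avoids excessive contraction of the neurons, which greatly improves its controllability and the quality of the visualization.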
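One way to let neuron positions carry distance information is a Sammon-like correction: after finding the winner, move each neuron's 2-D position so that its output-space distance to the winner approaches the input-space distance between their prototypes. The rule `p_k += rate * (1 - d_in / d_out) * (p_c - p_k)` is an assumption for illustration, not necessarily DPSOM's exact update. Note how the coefficient turns negative when `d_out < d_in`, pushing neurons apart, which is one mechanism that prevents over-contraction.

```python
import math

def adjust_positions(positions, weights, winner, rate):
    """
    positions: list of [px, py] output-space coordinates (modified in place)
    weights:   list of input-space prototype vectors
    Moves each neuron so its output distance to the winner tends toward the
    input-space distance between their prototypes (hypothetical rule).
    """
    pc, wc = positions[winner], weights[winner]
    for k, (pk, wk) in enumerate(zip(positions, weights)):
        if k == winner:
            continue
        d_out = math.dist(pc, pk)
        if d_out == 0:
            continue
        d_in = math.dist(wc, wk)
        coeff = rate * (1 - d_in / d_out)   # > 0 pulls in, < 0 pushes out
        pk[0] += coeff * (pc[0] - pk[0])
        pk[1] += coeff * (pc[1] - pk[1])
```

Each call shrinks the gap `d_out - d_in` by the factor `1 - rate`, so repeated calls converge to the input-space distance.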

7.
A Self-Organizing Neural Network for Categorical Data   Total citations: 1, self-citations: 2, citations by others: 1
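As an excellent tool for clustering and dimensionality reduction, the self-organizing neural network SOM (Self-Organizing Feature Maps) has been widely applied. Its shortcoming is that it is suitable only for numeric data, which is insufficient for data mining applications that often need to handle categorical-valued data or mixed numeric and categorical-valued data. This paper proposes a new overlap-based distance function and uses it for SOM training. Experimental results show that good clustering results can be obtained without increasing time or space overhead.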
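An overlap distance simply counts the attributes on which two categorical vectors disagree. Below is a minimal batch-SOM step built on it. The prototype update (per attribute, the most frequent value among data whose winner lies within `radius` grid steps) is a hypothetical choice, since the abstract does not state the paper's update rule; `grid_dist` is supplied by the caller.

```python
from collections import Counter

def overlap_distance(x, w):
    """Number of attributes on which x and prototype w disagree."""
    return sum(a != b for a, b in zip(x, w))

def batch_update(data, prototypes, grid_dist, radius):
    """
    One batch-SOM step for categorical prototypes (hypothetical update rule).
    Returns (new_prototypes, winners).
    """
    winners = [min(range(len(prototypes)),
                   key=lambda k: overlap_distance(x, prototypes[k]))
               for x in data]
    new = []
    for k, proto in enumerate(prototypes):
        members = [x for x, c in zip(data, winners) if grid_dist(k, c) <= radius]
        if not members:
            new.append(proto)      # no data nearby: keep the old prototype
            continue
        # Per-attribute mode over neighbourhood members.
        new.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return new, winners
```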

8.
When used to visualize high-dimensional data, the self-organizing map (SOM) requires a coloring scheme, such as the U-matrix, to mark the distances between neurons. Even so, the structures of the data clusters may not be apparent, and their shapes are often distorted. In this paper, a visualization-induced SOM (ViSOM) is proposed to overcome these shortcomings. The algorithm constrains and regularizes the inter-neuron distances with a parameter that controls the resolution of the map. The mapping preserves the inter-point distances of the input data on the map as well as the topology. It produces a graded mesh in the data space such that the distances between mapped data points on the map resemble those in the original space, as in Sammon mapping. Unlike Sammon mapping, however, ViSOM can accommodate both training data and new arrivals and is computationally much simpler. Several experimental results and comparisons with other methods are presented.
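The regularization described above is usually written as an extra term in the neighbour update: a neighbour k of the winner v is moved by `(x - w_v) + (w_v - w_k) * (d_vk - lam * D_vk) / (lam * D_vk)`, where `d_vk` is the data-space distance, `D_vk` the map distance, and `lam` the resolution parameter. The sketch below follows that form as commonly stated in the ViSOM literature; the Gaussian neighbourhood and plain-list vectors are assumptions.

```python
import math

def visom_step(weights, grid, x, alpha, sigma, lam):
    """
    weights: list of prototype vectors (lists), modified in place
    grid:    list of map coordinates, one tuple per neuron (all distinct)
    lam:     resolution parameter lambda
    Returns the index of the winning neuron.
    """
    v = min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))
    wv = list(weights[v])                        # winner before this update
    for k, wk in enumerate(weights):
        map_d = math.dist(grid[v], grid[k])      # D_vk on the map
        if k == v:
            for i in range(len(wk)):             # winner: ordinary SOM update
                wk[i] += alpha * (x[i] - wk[i])
            continue
        h = math.exp(-map_d ** 2 / (2 * sigma ** 2))
        data_d = math.dist(wv, wk)               # d_vk in data space
        reg = (data_d - lam * map_d) / (lam * map_d)
        for i in range(len(wk)):
            # Neighbour follows the winner's error plus a pull/push that
            # drives d_vk toward lam * D_vk.
            wk[i] += alpha * h * ((x[i] - wv[i]) + (wv[i] - wk[i]) * reg)
    return v
```

When a neighbour already sits at exactly `lam * D_vk` from the winner, `reg` is zero and only the usual SOM term acts, so `lam` directly sets the spacing of the graded mesh.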

9.
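A kernel-based dual self-organizing feature map network is proposed. The network extends the original self-organizing neural network by using two related mapping networks simultaneously. Because self-organizing feature maps are easily affected by high noise, kernel learning is applied to self-organizing map clustering: a kernel function replaces the inner product of the images of the original data in feature space. Whereas the traditional SOM algorithm uses the Euclidean distance, KSOM induces different Euclidean distances on the original space through different kernel functions, which improves the robustness of the algorithm. The improved neural network was applied to financial time-series forecasting, and the experimental results show that it is highly robust.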
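The "kernel-induced distance" follows from expanding the squared feature-space norm: `||phi(x) - phi(w)||^2 = k(x, x) - 2 k(x, w) + k(w, w)`, so the winner can be chosen without ever computing `phi`. The sketch assumes prototypes live in the input space (one common KSOM variant); with a Gaussian kernel the distance saturates at 2, which bounds the pull of outliers and is one way to read the robustness claim.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def kernel_distance_sq(x, w, kernel):
    """||phi(x) - phi(w)||^2 via the kernel trick."""
    return kernel(x, x) - 2 * kernel(x, w) + kernel(w, w)

def kernel_winner(x, prototypes, kernel):
    """Best-matching unit under the kernel-induced distance."""
    return min(range(len(prototypes)),
               key=lambda k: kernel_distance_sq(x, prototypes[k], kernel))
```

With the linear kernel `k(x, y) = x . y`, `kernel_distance_sq` reduces to the ordinary squared Euclidean distance, so the classical SOM is the special case.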

10.
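Data with mixed numeric and categorical attributes are frequently encountered in the real world, and k-prototypes is one of the main algorithms for clustering such data. To address the shortcomings of existing mixed-attribute clustering algorithms, an improved k-prototypes algorithm based on distributed centroids and a new dissimilarity measure is proposed. In the new algorithm, distributed centroids are first introduced to represent the cluster centers of the categorical attributes; means and distributed centroids are then combined to represent the cluster centers of mixed attributes; and a new dissimilarity measure, which takes into account the importance of different attributes in the clustering process, is proposed to compute the distance between data objects and cluster centers. Simulation experiments on three real datasets show that the proposed algorithm achieves higher clustering accuracy than traditional clustering algorithms, verifying its effectiveness.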
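A "distributed centroid" can be read as keeping the full value-frequency distribution of each categorical attribute instead of a single mode. In this sketch the center combines numeric means with those distributions, and the categorical dissimilarity to a center is `1 - relative frequency of the object's value`. That formula and the caller-supplied `weights` dict (standing in for the paper's attribute-importance terms) are assumptions.

```python
from collections import Counter

def cluster_center(members, num_idx, cat_idx):
    """Mean for numeric attributes; value-frequency distribution for categorical ones."""
    n = len(members)
    means = {i: sum(m[i] for m in members) / n for i in num_idx}
    dists = {i: {v: c / n for v, c in Counter(m[i] for m in members).items()}
             for i in cat_idx}
    return means, dists

def dissimilarity(x, center, num_idx, cat_idx, weights):
    """Weighted squared numeric error plus weighted (1 - frequency of x's value)."""
    means, dists = center
    num = sum(weights[i] * (x[i] - means[i]) ** 2 for i in num_idx)
    cat = sum(weights[i] * (1 - dists[i].get(x[i], 0.0)) for i in cat_idx)
    return num + cat
```

Unlike a mode, the distribution distinguishes a value held by 25% of the cluster from one held by nobody.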

11.
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially for analyzing textual data (i.e., document collections), are reviewed and further developed. A novel SOM-based clustering and visualization approach is proposed for text mining. The approach first transforms the document space into a multidimensional vector space by means of document encoding. Next, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. After GHSOM training, a new projection method, ranked centroid projection (RCP), is applied to project the input vectors onto a hierarchy of 2-D output maps. RCP serves both as a data analysis tool and as a direct interface to the data. In a set of simulations, the approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.
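A rank-based centroid projection places each input at a weighted average of the map positions of its closest prototypes rather than snapping it onto its single winner. In this hypothetical sketch the `r` nearest prototypes receive weights `r, r-1, ..., 1` by rank; the paper's actual weighting scheme may differ.

```python
import math

def ranked_centroid_projection(x, weights, positions, r=3):
    """
    Hypothetical rank-weighted projection: average the 2-D map positions of the
    r prototypes closest to x, weighting the closest by r, the next by r-1, etc.
    """
    order = sorted(range(len(weights)), key=lambda k: math.dist(x, weights[k]))[:r]
    rank_w = [r - i for i in range(len(order))]
    total = sum(rank_w)
    px = sum(w * positions[k][0] for w, k in zip(rank_w, order)) / total
    py = sum(w * positions[k][1] for w, k in zip(rank_w, order)) / total
    return px, py
```

Inputs that fall between two prototypes land between their map positions, which gives a finer picture than a hit map.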

12.
A new multi-layer self-organizing map (MLSOM) is proposed for unsupervised processing of tree-structured data. The MLSOM is an improved self-organizing map for handling structured data. By introducing multiple SOM layers, the MLSOM overcomes the computational-speed and visualization problems of the SOM for structured data (SOM-SD). Node data at different levels of a tree are processed in different layers of the MLSOM. Root nodes are processed exclusively on the top SOM layer, which lets the MLSOM make better use of the SOM map than the SOM-SD. Thus, the MLSOM yields better data organization, clustering, visualization, and classification of tree-structured data. Experimental results on three different data sets demonstrate that the proposed MLSOM approach can be more efficient and effective than the SOM-SD.
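The layer-per-level organization can be sketched as grouping tree nodes by depth, with the root on the top layer and deeper levels processed first. The input format is a hypothetical assumption: each node's input is its own feature vector followed by the 2-D winner coordinates of its children on the layer below, zero-padded to a fixed number of children. Training of the individual SOM layers is omitted.

```python
from collections import defaultdict

def nodes_by_level(children, root):
    """Group tree nodes by depth: level 0 holds the root (top SOM layer)."""
    levels = defaultdict(list)
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        levels[d].append(node)
        for child in children.get(node, []):
            stack.append((child, d + 1))
    return [levels[d] for d in range(len(levels))]

def layer_input(features, child_positions, max_children):
    """Own features followed by children's winner positions (x, y), zero-padded."""
    flat = []
    for pos in child_positions[:max_children]:
        flat.extend(pos)
    flat.extend([0.0] * (2 * max_children - len(flat)))
    return list(features) + flat

# Bottom-up processing order: deepest level first, root level last.
tree = {"r": ["a", "b"], "a": ["c"]}
for level in reversed(nodes_by_level(tree, "r")):
    print(sorted(level))  # ['c'], then ['a', 'b'], then ['r']
```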

13.
CHEN Lifei, GUO Gongde. Journal of Software (软件学报), 2013, 24(11): 2628-2641
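Categorical data are widespread in many application domains, such as bioinformatics, and their discrete values make categorical data clustering a difficult task in statistical machine learning. Current mainstream methods rely on the modes of categorical attributes both to optimize the clustering and to compute the weights of relevant attributes. This paper proposes a non-mode statistical clustering method for categorical data. First, based on a newly defined dissimilarity measure, an attribute-weighted objective function for categorical data clustering is derived. The function is based on the average distance between objects and clusters, thereby avoiding the problems caused by the mode-centered approach of existing methods. Second, a soft subspace clustering algorithm for categorical data is defined. During clustering, the algorithm assigns each attribute a weight that measures its relevance to each cluster according to the overall distribution of the attribute's values, rather than only the attribute's mode, thereby achieving automatic feature selection. Experimental results on synthetic and real-world datasets show that, compared with existing mode-based clustering algorithms and other non-mode algorithms based on Monte Carlo optimization, the algorithm effectively improves clustering quality.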
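Basing the objective on the average object-to-cluster distance rather than on a mode has a convenient closed form under simple matching: the mean mismatch between `x_j` and all members on attribute `j` equals `1 - freq(x_j) / |cluster|`. The sketch checks this against brute force. The attribute weights are assumed to be given; the paper's procedure for computing them from value distributions is not reproduced.

```python
from collections import Counter

def avg_distance_bruteforce(x, cluster, j):
    """Mean simple-matching distance on attribute j between x and every member."""
    return sum(x[j] != y[j] for y in cluster) / len(cluster)

def avg_distance(x, cluster, j):
    """Same quantity from value frequencies: 1 - freq(x_j) / |cluster|."""
    freq = Counter(y[j] for y in cluster)
    return 1 - freq[x[j]] / len(cluster)

def weighted_object_cluster_distance(x, cluster, weights):
    """Attribute-weighted average distance (weights assumed to be given)."""
    return sum(w * avg_distance(x, cluster, j) for j, w in enumerate(weights))
```

The frequency form costs one count per attribute, and it depends on the whole value distribution, not just on whether `x_j` matches the mode.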

14.
Interpreting the geometry of geological objects is a standard activity of field geologists. We present new graphics tools that will help extend this activity from 2-D geological mapping into a 3-D environment. Much of the existing 3-D geological modeling software supports the construction of objects from dense control data. However, for regional mapping and near-mine exploration work, sparse data are the norm. Tools are therefore required that give the expert interpreter full control of the graphics objects while constraining interpretations to specific control data from field observations. We present the initial results of a software design and programming project for visualizing complex regional-scale geologic objects using Bézier-based graphics tools optimized for sparse-data interpretation. We also introduce the concept of a structural ribbon, a 3-D extended map trace, along with methods for optimizing surface construction using graphical grip frames.
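The Bézier curves such tools are built on can be evaluated with de Casteljau's algorithm, which repeatedly interpolates between neighbouring control points. The control points below are hypothetical; the paper's structural ribbons and grip frames are not shown. The curve passes through the first and last control points, while the interior points act as handles the interpreter can drag.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve of any degree, in any dimension, at t in [0, 1]."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Linear interpolation between each pair of neighbouring points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

# Hypothetical 3-D quadratic trace with one interior "handle" point.
trace = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0)]
print(de_casteljau(trace, 0.5))  # (1.0, 1.0, 0.0)
```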

15.
The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used to cluster real-world data containing categorical values. In this paper we present two algorithms that extend the k-means algorithm to categorical domains and to domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes during clustering so as to minimise the clustering cost function. With these extensions, the k-modes algorithm clusters categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow clustering of objects described by mixed numeric and categorical attributes. We use the well-known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real-world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.
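The three k-modes ingredients named above (simple matching dissimilarity, modes in place of means, frequency-based mode updates) fit in a short loop. This is the batch variant that recomputes all modes after each full assignment pass; the original description updates modes during allocation. The k-prototypes combination `sum((x - y)^2) + gamma * mismatches` is also shown, with `gamma` left to the caller.

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Simple matching: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))

def k_modes(data, modes, max_iter=10):
    """Batch k-modes: assign to nearest mode, then set each mode to per-attribute modes."""
    modes = [tuple(m) for m in modes]
    for _ in range(max_iter):
        labels = [min(range(len(modes)), key=lambda k: matching_dissimilarity(x, modes[k]))
                  for x in data]
        new_modes = []
        for k, old in enumerate(modes):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if not members:
                new_modes.append(old)
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new_modes == modes:      # converged: assignments can no longer change
            break
        modes = new_modes
    return labels, modes

def prototype_dissimilarity(x_num, y_num, x_cat, y_cat, gamma):
    """k-prototypes: squared Euclidean on numeric part + gamma * categorical mismatches."""
    return (sum((a - b) ** 2 for a, b in zip(x_num, y_num))
            + gamma * matching_dissimilarity(x_cat, y_cat))
```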

16.
The Self-Organizing Map (SOM) is a neural network model that performs an ordered projection of a high-dimensional input space onto a low-dimensional topological structure. The process by which this mapping is formed is defined by the SOM algorithm, a competitive, unsupervised, and nonparametric method, since it makes no assumptions about the input data distribution. The feature maps provided by this algorithm have been successfully applied to vector quantization, clustering, and high-dimensional data visualization. However, initializing the network topology and selecting the SOM training parameters are two difficult tasks because the distribution of the input signals is unknown. A misconfiguration of these parameters can produce a low-quality feature map, so some measure of how well the SOM network has adapted to the input data model is needed. Topology preservation is the concept most commonly used to implement such a measure. Several qualitative and quantitative methods have been proposed for measuring the degree of SOM topology preservation, particularly for Kohonen's model. In this work, two methods for measuring the topology preservation of the Growing Cell Structures (GCS) model are proposed: the topographic function and the topology-preserving map.
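The paper's topographic function is too involved for a short example, so the sketch below swaps in a simpler and widely used quantitative measure, the topographic error: the fraction of inputs whose best- and second-best-matching units are not connected in the network. It accepts an arbitrary edge set, so it applies to a GCS graph as well as to a regular SOM grid.

```python
import math

def topographic_error(data, weights, edges):
    """
    Fraction of inputs whose best and second-best matching units are not
    connected. edges: set of frozenset({i, j}) pairs (e.g. GCS triangle edges).
    """
    errors = 0
    for x in data:
        ranked = sorted(range(len(weights)), key=lambda k: math.dist(x, weights[k]))
        if frozenset(ranked[:2]) not in edges:
            errors += 1
    return errors / len(data)
```

A value near 0 means neighbouring regions of the input space map to neighbouring units; a high value flags folds or twists in the network.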

17.
Automatic cluster detection in Kohonen's SOM   Total citations: 1, self-citations: 0, citations by others: 1
Kohonen's self-organizing map (SOM) is a popular neural network architecture for solving problems in explorative data analysis, clustering, and data visualization. One of the major drawbacks of the SOM algorithm is that nonexpert users find it difficult to interpret the information contained in a trained SOM. In this paper, this problem is addressed by introducing an enhanced version of the Clusot algorithm, which consists of two main steps: 1) computing the Clusot surface from the information contained in a trained SOM and 2) automatically detecting clusters in this surface. In the Clusot surface, clusters present in the underlying SOM are indicated by local maxima of the surface. For SOMs with 2-D topology, the Clusot surface can therefore be considered a convenient visualization technique. However, the approach is not restricted to a particular type of 2-D SOM topology; it is also applicable to SOMs with an n-dimensional grid topology.
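Since clusters appear as local maxima of the Clusot surface, the starting point of detection can be sketched as finding grid cells strictly higher than all of their (up to 8) neighbours. This assumes the surface has already been sampled on a 2-D grid; computing the surface itself and growing full cluster regions from the peaks are not shown.

```python
def local_maxima(surface):
    """Grid cells strictly higher than all of their (up to 8) neighbours."""
    rows, cols = len(surface), len(surface[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = surface[r][c]
            neighbours = [surface[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

Each peak is a candidate cluster centre; plateaus of equal height produce no peak under this strict test.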

18.
“Best K”: critical clustering structures in categorical datasets   Total citations: 2, self-citations: 2, citations by others: 0
The demand for cluster analysis of categorical data has continued to grow over the last decade. A well-known problem in categorical clustering is determining the best number K of clusters. Although several categorical clustering algorithms have been developed, surprisingly, none has satisfactorily addressed the problem of the best K for categorical clustering. Since categorical data have no inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shapes and density distributions are not appropriate for categorical data. In this paper, we study the entropy property between the clustering results of categorical data with different numbers K of clusters, and propose the BKPlot method to address three important cluster validation problems: (1) How can we determine whether a categorical dataset has significant clustering structure? (2) If it does, what is the set of candidate “best Ks”? (3) If the dataset is large, how can we determine the best Ks efficiently and reliably?
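The entropy criterion behind this kind of analysis can be sketched as the expected entropy of a partition, `sum_k (n_k / n) * sum_j H(A_j | C_k)`, computed for clustering results at successive K, followed by first and second differences. Treating pronounced second-difference values as candidate best Ks is the assumption made here; the BKPlot's exact construction, and the clustering that produces the K-cluster results, are not reproduced.

```python
import math
from collections import Counter

def column_entropy(values):
    """Shannon entropy (natural log) of one attribute's values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())

def expected_entropy(data, labels):
    """Sum over clusters of (cluster share) * (sum of per-attribute entropies)."""
    n = len(data)
    total = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == k]
        total += len(members) / n * sum(column_entropy(col) for col in zip(*members))
    return total

def entropy_differences(entropies):
    """entropies[K-1] is the expected entropy of the K-cluster result."""
    inc = [entropies[i] - entropies[i + 1] for i in range(len(entropies) - 1)]
    second = [inc[i] - inc[i + 1] for i in range(len(inc) - 1)]
    return inc, second
```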

19.
A self-organizing map (SOM) is a nonlinear, unsupervised neural network model that can be used for data clustering and visualization. One of the major shortcomings of the SOM algorithm is that non-expert users find it difficult to interpret the information in a trained SOM. In this paper, this problem is tackled by introducing an enhanced visualization method consisting of three major steps: (1) calculating the single-linkage inter-neuron distance, (2) counting the data points mapped to each neuron, and (3) finding cluster boundaries. The experimental results show that the proposed approach effectively displays the data distribution, inter-neuron distances, and cluster boundaries, and that its visualization is better than that of other visualization methods. Furthermore, the proposed visualization scheme not only makes the clustering results intuitive and easy to understand but also visualizes unlabeled data sets well.
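The three steps listed above can be sketched directly. The single-linkage inter-neuron distance is taken here to be the smallest distance between any data point mapped to one neuron and any point mapped to the other. The boundary rule in step 3 (an adjacent pair whose single-linkage distance exceeds a caller-chosen `threshold`) is a hypothetical stand-in for the paper's procedure.

```python
import math

def bmu(x, weights):
    """Index of the best-matching unit."""
    return min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))

def hits_and_members(data, weights):
    """Step 2: data points mapped to each neuron and their counts."""
    members = [[] for _ in weights]
    for x in data:
        members[bmu(x, weights)].append(x)
    return [len(m) for m in members], members

def single_linkage(members_a, members_b):
    """Step 1: smallest distance between the data of two neurons."""
    if not members_a or not members_b:
        return math.inf
    return min(math.dist(p, q) for p in members_a for q in members_b)

def boundaries(members, adjacent_pairs, threshold):
    """Step 3 (hypothetical rule): adjacent neurons separated by a large gap."""
    return [(i, j) for i, j in adjacent_pairs
            if single_linkage(members[i], members[j]) > threshold]
```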

20.
Hierarchical clustering of mixed data based on distance hierarchy   Total citations: 1, self-citations: 0, citations by others: 1
Data clustering is an important data mining technique that partitions data according to some similarity criterion. Abundant algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike numerical attributes, whose distance is measured by subtraction, categorical values have no standard distance measure. In this article, we propose a distance representation scheme, the distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies distance measurement for numerical and categorical values. We then apply the scheme to mixed data clustering, in particular by integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can handle numerical and categorical data uniformly and also takes the similarity between categorical values into account. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
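Integrating graded categorical similarity with hierarchical clustering can be sketched as single-linkage agglomeration over a mixed distance. The `CAT_DIST` table is a hypothetical stand-in for distances read off a distance hierarchy (Coke and Pepsi closer than either is to Juice), and the numeric part is scaled by an assumed attribute range.

```python
from itertools import combinations

# Hypothetical categorical distances, as if read off a distance hierarchy in
# which Coke and Pepsi share a parent node but Juice does not.
CAT_DIST = {frozenset({"Coke", "Pepsi"}): 0.5,
            frozenset({"Coke", "Juice"}): 1.0,
            frozenset({"Pepsi", "Juice"}): 1.0}

def mixed_distance(x, y, num_range):
    """x, y = (numeric value, categorical value); numeric part scaled by its range."""
    num = abs(x[0] - y[0]) / num_range
    cat = 0.0 if x[1] == y[1] else CAT_DIST[frozenset({x[1], y[1]})]
    return num + cat

def single_linkage_clusters(points, k, dist):
    """Agglomerative clustering: merge the closest pair of clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: min(dist(points[i], points[j])
                                      for i in clusters[ab[0]] for j in clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]           # b > a, so index a is unaffected
    return clusters
```

With a plain 0/1 mismatch for categories, Coke and Pepsi would look as different as Coke and Juice; the graded table lets them merge first.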

