1.

PolSOM: A new method for multidimensional data visualization

Lu Xu, Tommy W.S. Chow 《Pattern Recognition》2010,43(4):1668-1675

In this paper, a new algorithm named polar self-organizing map (PolSOM) is proposed. PolSOM is constructed on a 2-D polar map with two variables, radius and angle, which represent data weight and feature, respectively. Compared with traditional algorithms that project data on a Cartesian map using the Euclidean distance as the only variable, PolSOM not only preserves the data topology and the inter-neuron distance but also visualizes the differences among clusters in terms of weight and feature. In PolSOM, the visualization map is divided into tori and circular sectors by radial and angular coordinates, and neurons are set on the boundary intersections of circular sectors and tori as benchmarks to attract data with similar attributes. Every datum is projected on the map with polar coordinates that are trained towards the winning neuron. As a result, similar data group together, and data characteristics are reflected by their positions on the map. Simulations and comparisons with Sammon's mapping, SOM and ViSOM are provided based on four data sets. The results demonstrate the effectiveness of the PolSOM algorithm for multidimensional data visualization.

2.

Research on the Visualization of Clustering Results

许翔燕, 江永全, 杨燕, 张仕斌 《微计算机信息》2007,23(12):190-191

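Cluster analysis occupies an important place in data mining research. Visualizing clustering results means showing the quality of a clustering intuitively in graphical form. Most current methods for visualizing clustering results are statistical charts such as pie charts and bar charts. These methods, however, only reflect the quantitative relationships between clusters and the proportions of components within a cluster; they do not get down to individual objects and make no use of the information each object carries. To address this, the paper proposes three visualization methods for clustering results: the random dot plot, the sequential dot plot, and the electron cloud plot. The random dot plot is simple and easy to implement; the sequential dot plot shows exactly which objects were misclassified and is suited to displaying the clustering process dynamically; the electron cloud plot shows the distance between each object and its cluster center.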

3.

Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density

Sitao Wu, Tommy W.S. Chow 《Pattern Recognition》2004,37(2):175-188

The self-organizing map (SOM) has been widely used in many industrial applications. Classical clustering methods based on the SOM often fail to deliver satisfactory results, especially when clusters have arbitrary shapes. In this paper, after applying some preprocessing techniques to filter out noise and outliers, we propose a new two-level SOM-based clustering algorithm using a clustering validity index based on inter-cluster and intra-cluster density. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data better than the classical clustering algorithms based on the SOM, and to find an optimal number of clusters.

4.

Post-retrieval search hit clustering to improve information retrieval effectiveness: Two digital forensics case studies

Nicole Lang Beebe, Jan Guynes Clark, Glenn B. Dietrich, Myung S. Ko, Daijin Ko 《Decision Support Systems》2011,51(4):732-744

This research extends text mining and information retrieval research to the digital forensic text string search process. Specifically, we used a self-organizing neural network (a Kohonen Self-Organizing Map) to conceptually cluster search hits retrieved during a real-world digital forensic investigation. We measured information retrieval effectiveness (e.g., precision, recall, and overhead) of the new approach and compared them against the current approach. The empirical results indicate that the clustering process significantly reduces information retrieval overhead of the digital forensic text string search process, which is currently a very burdensome endeavor.

5.

Comparison of visualization of optimal clustering using self-organizing map and growing hierarchical self-organizing map in cellular manufacturing system

《Applied Soft Computing》2014

The present research deals with the cell formation problem (CFP) of cellular manufacturing systems, which is an NP-hard problem; the development of optimum machine-part cell formation algorithms has therefore always been the primary attraction in the design of cellular manufacturing systems. In this work, the self-organizing map (SOM) approach is used. Because it can project data from a high-dimensional space to a low-dimensional space, it is considered a visual approach for explaining a complicated CFP data set. However, for a large data set with high dimensionality, a traditional flat SOM has difficulty explaining the concepts inside the clusters. We propose one possible solution for a large CFP data set: using the SOM in a hierarchical manner, known as the growing hierarchical self-organizing map (GHSOM). In the present work, the two novel contributions using GHSOM are: the choice of optimum architecture through the minimum pattern units extracted at layer 1 for the respective threshold values and selection. Furthermore, the experimental results clearly indicate that machine-part visual clustering using GHSOM can be successfully applied to identify a cohesive part family that is processed by a machine group. Computational experience with the proposed GHSOM algorithm on a set of 15 CFP problems from the literature shows that it performs remarkably well. The GHSOM algorithm obtained solutions that are at least as good as the ones found in the literature. For 75% of the cell formation problems, the GHSOM algorithm improved the goodness of cell formation, as measured by GTE, over both the SOM and the best result from the literature, in some cases by as much as 12.81% or more (GTE). Finally, comparing the SOM and GHSOM results of the experiment using the paired t-test reveals that the GHSOM approach performs better than the SOM approach as far as the group technology efficiency (GTE) measure of the goodness of cell formation is concerned.

6.

Apply extended self-organizing map to cluster and classify mixed-type data

Chung-Chian Hsu, Shu-Han Lin, Wei-Shen Tai 《Neurocomputing》2011,74(18):3832-3842

Mixed numeric and categorical data are commonly seen nowadays in corporate databases in which precious patterns may be hidden. Analyzing mixed-type data to extract the hidden patterns valuable to decision-making is therefore beneficial and critical for corporations to remain competitive. In addition, visualization facilitates exploration in the early stage of data analysis. In this paper, we present a visualized approach to analyzing multivariate mixed-type data. The proposed framework, based on an extended self-organizing map, allows visualized data cluster analysis as well as classification. We demonstrate the feasibility of the approach by analyzing two real-world datasets and compare it with other existing models to show its advantages.

7.

Application of Self-Organizing Maps in Web Structure Mining (total citations: 1; self-citations: 0; citations by others: 1)

周晓峥, 刘勘, 孟波, 周洞汝 《计算机工程与应用》2003,39(3):31-33

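This paper discusses the basic method of using self-organizing maps for Web structure mining. A SOM can represent data similarity intuitively and perform classification, makes cluster analysis of the data convenient, and can find useful information such as authority pages in Web mining.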

8.

Self-organizing map network as an interactive clustering tool — An application to group technology

Melody Y. Uday R. Kar Yan 《Decision Support Systems》1995,15(4)

The Self-Organizing Map (SOM) network, a variation of neural computing networks, is a categorization network developed by Kohonen. The theory of the SOM network is motivated by the observation of the operation of the brain. This paper presents the technique of SOM and shows how it may be applied as a clustering tool to group technology. A computer program for implementing the SOM neural networks is developed and the results are compared with other clustering approaches used in group technology. The study demonstrates the potential of using the Self-Organizing Map as the clustering tool for part family formation in group technology.

9.

Research on Clustering and Visualization in the I-Miner Environment

侯天子, 杨燕, 谭维 《计算机工程与应用》2010,46(2):113-117

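Cluster analysis is a core technique in data mining. Displaying clustering results with suitable visualization methods presents the data distribution to decision makers in intuitive, vivid graphical form, so that they can analyze the data directly. I-Miner is an enterprise-level data mining tool; the paper uses I-Miner for cluster analysis and visualizes the clustering results with several methods. The software's functionality is extended through the S language, implementing the K-Medoid algorithm, the SOM algorithm, and a combined clustering algorithm that couples SOM with K-Medoids. For visualizing high-dimensional data in particular, the star plot method and the SOM U-matrix method are implemented, compensating for the small number of clustering and visualization modules in the software.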

10.

PKOM: A tool for clustering, analysis and comparison of big chemical collections

《Digital Signal Processing》2016

We describe the algorithm underlying PKOM, a methodology for clustering, analysis and visualization of multi-dimensional data onto a two-dimensional map. PKOM is based on the mixture of two very popular methods that have been widely used by the pharmaceutical industry for the clustering of genomic or SAR (Structure Activity Relationship) chemical information. The first method at the origin of PKOM is SOM (Self-Organizing Maps), a clustering technique based on neural networks. The second method is TREE MAPS, a visualization method based on hierarchical clustering by dendrograms. We initially describe herein the two methods and the reasons why we have taken the best of both to merge them into PKOM. We then describe in detail the PKOM algorithm and its advantages compared to the two former. Examples are given on how to apply this kind of 2-D topological clustering technique to the organization of big pharmaceutical collections in practical cases.

11.

Streaming association rule (SAR) mining with a weighted order-dependent representation of Web navigation patterns

YongSeog Kim 《Expert systems with applications》2009,36(4):7933-7946

This paper considers the problem of finding predictive and useful association rules with a new Web mining algorithm, a streaming association rule (SAR) model. We first adopt a weighted order-dependent scheme (assigning more weight to early visited pages) rather than a traditional Boolean scheme (assigning 1 for visited and 0 for non-visited pages). In this way, we intend to improve the limited representation of navigation patterns in previous association rule mining (ARM) algorithms. We also note that most traditional association rule models are not scalable because they require multiple scans of all records to re-calibrate a predictive model when there are new updates in the original databases. The proposed SAR model takes a “divide-and-conquer” approach and requires only a single scan of the data sets to avoid the curse of dimensionality. Through comparative experiments on a real-world data set, we show that prediction models based on a weighted order-dependent representation are more accurate in predicting the next moves of Web navigators than models based on a Boolean representation. In particular, when combined with several heuristics developed to eliminate redundant association rules, SAR models show a very comparable prediction accuracy while maintaining a small fraction of the association rules of traditional ARM models. Finally, we quantify and graphically show the significance or contribution of each page to forming unique rule sets in each database segment.
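
The difference between the Boolean and the weighted order-dependent representations mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the weighting formula, so the linear decay used here (weight `(n - i) / n` for the page first visited at position `i` of an `n`-page session) is an assumption made only for illustration, and the function names and page names are hypothetical.

```python
def boolean_vector(session, pages):
    """Boolean scheme: 1 if the page was visited in the session, else 0."""
    visited = set(session)
    return [1 if page in visited else 0 for page in pages]


def weighted_vector(session, pages):
    """Order-dependent scheme: earlier visits get larger weights.

    Assumption (not from the paper): linear decay, weight = (n - i) / n
    for the page first seen at 0-based position i in a session of length n.
    """
    n = len(session)
    weights = {}
    for i, page in enumerate(session):
        if page not in weights:  # keep the weight of the first visit only
            weights[page] = (n - i) / n
    return [weights.get(page, 0.0) for page in pages]


# Hypothetical site with four pages and one user session.
pages = ["home", "search", "cart", "checkout"]
session = ["home", "cart", "checkout"]

bool_repr = boolean_vector(session, pages)
weighted_repr = weighted_vector(session, pages)
```

With this session, the Boolean vector marks "home", "cart" and "checkout" identically, while the weighted vector also records that "home" was visited before "cart" and "cart" before "checkout".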

12.

A combined measure for quantifying and qualifying the topology preservation of growing self-organizing maps

Soledad Delgado, Consuelo Gonzalo 《Neurocomputing》2011,74(16):2624-2632

The Self-Organizing Map (SOM) is a neural network model that performs an ordered projection of a high-dimensional input space onto a low-dimensional topological structure. The process by which this mapping is formed is defined by the SOM algorithm, which is a competitive, unsupervised and nonparametric method, since it does not make any assumption about the input data distribution. The feature maps provided by this algorithm have been successfully applied to vector quantization, clustering and high-dimensional data visualization. However, the initialization of the network topology and the selection of the SOM training parameters are two difficult tasks because the distribution of the input signals is unknown. A misconfiguration of these parameters can generate a low-quality feature map, so it is necessary to have some measure of the degree of adaptation of the SOM network to the input data model. Topology preservation is the most common concept used to implement this measure. Several qualitative and quantitative methods have been proposed for measuring the degree of SOM topology preservation, particularly using Kohonen's model. In this work, two methods for measuring the topology preservation of the Growing Cell Structures (GCSs) model are proposed: the topographic function and the topology preserving map.
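
The competitive, unsupervised training that this abstract attributes to the SOM algorithm can be sketched as follows. This is a minimal illustration of the standard Kohonen update on a fixed rectangular grid, not the Growing Cell Structures model the paper studies; the linear decay schedules, the Gaussian neighborhood, and all names and parameter values are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM with the classic online update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)     # competition step
            for node, w in weights.items():          # cooperation step
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])   # pull weights toward x
            step += 1
    return weights


# Toy data: two small groups of 2-D points inside the unit square.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=2, cols=2)
winner = best_matching_unit(som, [0.0, 0.0])
```

Because the update needs no labels and no distributional assumption, the quality of the resulting map depends on the grid size and decay parameters, which is the configuration problem the abstract raises.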

13.

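Self-Organizing Feature Map Neural Network Based on Particle Swarm Optimization and Its Application (total citations: 6; self-citations: 1; citations by others: 5)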

吕强, 俞金寿 《控制与决策》2005,20(10):1115-1119

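The particle swarm optimization (PSO) algorithm is used to optimize the weighted distortion index (LWDI), yielding a PSO-based SOM (PSO-SOM) training algorithm. This algorithm replaces the heuristic training algorithm proposed by Kohonen, and a kernel function is introduced to strengthen the nonlinear clustering ability of PSO-SOM. Data from an acrylonitrile reactor at a plant are used as the clustering case study. The results show that, compared with the heuristic training algorithm, PSO-SOM obtains better clusters; the algorithm is also simple to implement and convenient for engineering use, and it provides significant guidance for adjusting acrylonitrile reactor parameters and monitoring yield.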

14.

A text mining approach on automatic generation of web directories and hierarchies (total citations: 1; self-citations: 0; citations by others: 1)

Hsin-Chang Yang Chung-Hong Lee 《Expert systems with applications》2004,27(4):10274-663

The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for information retrieval and knowledge discovery communities. A tremendous amount of knowledge is recorded using various types of media, producing an enormous number of web pages in the WWW. Retrieval of required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies a text mining technique to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.

15.

Quick and precise clustering of arbitrarily shaped flat patterns based on stringy effect

S. K. Cheng, K. P. Rao 《Computers & Industrial Engineering》1997,33(3-4):485-488

Grouping a given number of arbitrarily shaped flat patterns to form a cluster that occupies a minimal-area convex enclosure is very useful in solving the cutting stock problem. This study aims to improve the effectiveness of conventional clustering processes by incorporating a new technique for determining optimal conditions for the sliding process. The new technique, referred to as the ‘stringy effect’, is based on minimizing the distance between centroids of the patterns during clustering. The efficiency of the proposed method is shown with the help of some typical multiple flat patterns.

16.

Efficient clustering of databases induced by local patterns (total citations: 2; self-citations: 0; citations by others: 2)

Animesh P.R. 《Decision Support Systems》2008,44(4):925-943

Many large organizations have multiple large databases as they transact from multiple branches. Most of the previous pieces of work are based on a single database. Thus, it is necessary to study data mining on multiple databases. In this paper, we propose two measures of similarity between a pair of databases. Also, we propose an algorithm for clustering a set of databases. Efficiency of the clustering process has been improved using the following strategies: reducing execution time of clustering algorithm, using more appropriate similarity measure, and storing frequent itemsets space efficiently.

17.

Research on a SOM Clustering-Based Visualization Method and Its Application

刘芳 《计算机应用研究》2012,29(4):1300-1303

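The paper proposes clustering financial data with the unsupervised self-organizing map method and displaying the result on a two-dimensional plane using parallel coordinates and interactive circular parallel coordinates. The approach produces clear visual clustering results; it not only summarizes the data's features effectively but also improves the visual quality of the clustering, making trends in the data easier to spot.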

18.

Dynamic clustering of energy markets: An extended hidden Markov approach

《Expert systems with applications》2014,41(17):7722-7729

This paper studies the synchronization of energy markets using an extended hidden Markov model that captures between- and within-heterogeneity in time series by defining clusters and hidden states, respectively. The model is applied to U.S. data in the period from 1999 to 2012. While oil and natural gas returns are well portrayed by two volatility states, electricity markets need three additional states: two transitory and one to capture a period of abnormally high volatility. Although some states are common to both clusters, results favor the segmentation of energy markets as they are not in the same state at the same time.

19.

Prediction of user navigation patterns by mining the temporal web usage evolution

Vincent S. Tseng, Kawuu Weicheng Lin, Jeng-Chuan Chang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(2):157-163

Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing the hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users’ patterns are important in supporting intelligent Web applications like personalized services. Although numerous studies have been done on Web usage mining, few of them consider the temporal evolution characteristic in discovering Web users’ patterns. In this paper, we propose a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation by considering the temporality property in Web usage evolution. Moreover, three new kinds of measures are proposed for evaluating the temporal evolution of navigation patterns under different time periods. Through experimental evaluation on both real-life and simulated datasets, the proposed TN-Gram model is shown to outperform other approaches like N-gram modeling in terms of prediction precision, in particular when the Web users’ navigating behavior changes significantly with temporal evolution.
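
For orientation, the plain N-gram baseline that this abstract compares against can be sketched as below. This is not the TN-Gram algorithm, whose temporal handling is not described in the abstract; it is only a minimal frequency-based N-gram predictor, and the function names, session data and default `n=2` are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_model(sessions, n=2):
    """Count which page follows each (n-1)-page context in the sessions."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n + 1):
            context = tuple(session[i:i + n - 1])
            next_page = session[i + n - 1]
            model[context][next_page] += 1
    return model


def predict_next(model, recent_pages, n=2):
    """Predict the most frequent follower of the last n-1 visited pages."""
    context = tuple(recent_pages[-(n - 1):]) if n > 1 else ()
    counts = model.get(context)
    if not counts:
        return None  # context never seen in the training sessions
    return counts.most_common(1)[0][0]


# Hypothetical Web log: each inner list is one user's page sequence.
sessions = [
    ["home", "search", "item"],
    ["home", "search", "cart"],
    ["home", "search", "item"],
]
model = build_ngram_model(sessions, n=2)
prediction = predict_next(model, ["home", "search"], n=2)
unseen = predict_next(model, ["unknown"], n=2)
```

A model like this is fitted once on all logs, so it cannot react when navigation behavior drifts over time, which is the limitation the abstract's temporal approach targets.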

20.

Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm (total citations: 1; self-citations: 0; citations by others: 1)

Sungjune Nallan C. Bong-Keun 《Data & Knowledge Engineering》2008,65(3):512-543

We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed and its superior performance is demonstrated.