This research addresses the cell formation problem (CFP) of cellular manufacturing systems. Because the CFP is NP-hard, the development of effective machine-part cell formation algorithms has long been a primary concern in the design of cellular manufacturing systems. The proposed work uses the self-organizing map (SOM), which projects data from a high-dimensional space onto a low-dimensional one and therefore serves as a visual approach for explaining a complicated CFP data set. For a large, high-dimensional data set, however, a traditional flat SOM has difficulty explaining the concepts inside the clusters. We propose one possible solution for large CFP data sets: using the SOM in a hierarchical manner, known as the growing hierarchical self-organizing map (GHSOM). In the present work, the two novel contributions using GHSOM are: the choice of optimum architecture through the minimum pattern units extracted at layer 1 for the respective threshold values and selection. The experimental results indicate that machine-part visual clustering using GHSOM can be successfully applied to identify a cohesive part family that is processed by a machine group. Computational experience with the proposed GHSOM algorithm on a set of 15 CFP problems from the literature shows that it performs remarkably well. The GHSOM algorithm obtained solutions at least as good as those found in the literature. For 75% of the cell formation problems, the GHSOM algorithm improved the goodness of cell formation, as measured by group technology efficiency (GTE), over both SOM and the best result from the literature, in some cases by more than 12.81%. A paired t-test comparing the SOM and GHSOM results from the experiments in this paper shows that the GHSOM approach performed better than the SOM approach as far as the GTE measure of the goodness of cell formation is concerned.
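The paired t-test mentioned above can be sketched in a few lines. This is a minimal illustration only: the function name `paired_t_statistic` and the GTE scores are hypothetical and are not taken from the paper, and the sketch computes only the test statistic, not a p-value.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical GTE scores (percent) on the same three problems; NOT from the paper.
ghsom_gte = [82.0, 75.0, 91.0]
som_gte = [81.0, 73.0, 88.0]
t, dof = paired_t_statistic(ghsom_gte, som_gte)
print(f"t = {t:.4f} with {dof} degrees of freedom")
```

A large positive t on the per-problem differences would support the claim that one method scores higher than the other on the same problems; the significance level would still have to be read from a t distribution with the reported degrees of freedom.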
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster density, we propose that the clustering validity index be used not only globally to find optimal partitions of input data, but also locally to determine which two neighboring clusters are to be merged in a hierarchical clustering of Self-Organizing Map (SOM). A new two-level SOM-based clustering algorithm using the clustering validity index is also proposed. Experimental results on synthetic and real data sets demonstrate that the proposed clustering algorithm is able to cluster data in a better way than classical clustering algorithms on an SOM.
Even though Self-Organizing Maps (SOMs) constitute a powerful and essential tool for pattern recognition and data mining, the common SOM algorithm is not suited to processing categorical data, which is present in many real datasets. For this reason, categorical values are commonly converted into a binary code, a solution that unfortunately distorts the network training and the subsequent analysis. The present work proposes a SOM architecture that processes categorical values directly, without any prior transformation. This architecture is also capable of properly mixing numerical and categorical data, in such a manner that all features carry the same weight. The proposed implementation is scalable, and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets.
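To make the idea of equal feature weight concrete, the sketch below computes a Gower-style dissimilarity in which every feature, numeric or categorical, contributes a value between 0 and 1. This is an assumed illustration and not the architecture proposed in the paper; the function name `mixed_distance`, the `numeric_ranges` parameter, and the sample records are all hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-feature dissimilarity; each feature contributes a value in [0, 1].

    numeric_ranges maps a feature index to its (min, max) range.
    Any feature index not listed there is treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            lo, hi = numeric_ranges[i]
            # Range-normalized absolute difference for numeric features.
            total += abs(x - y) / (hi - lo) if hi > lo else 0.0
        else:
            # Simple mismatch for categorical features: 0 if equal, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Hypothetical records: (age, colour, area); only feature 0 is numeric.
rec_a = (30, "red", "urban")
rec_b = (50, "blue", "urban")
d = mixed_distance(rec_a, rec_b, numeric_ranges={0: (20, 60)})
print(d)
```

By contrast, a one-hot encoding followed by squared Euclidean distance would add 2 for each mismatched categorical feature while a range-normalized numeric feature adds at most 1, which is one way the binary conversion can skew the relative weight of features.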
This paper presents a methodology to estimate the future success of a collaborative recommender in a citizen web portal. The methodology consists of four stages, three of which are developed in this study. First, a user model that takes into account some usual characteristics of web data is developed to produce artificial data sets. These data sets are used to carry out a clustering algorithm comparison in the second stage of our approach. This comparison provides information about the suitability of each algorithm in different scenarios. The benchmarked clustering algorithms are those most commonly used in the literature: c-Means, Fuzzy c-Means, a set of hierarchical algorithms, Gaussian mixtures trained by the expectation-maximization algorithm, and Kohonen's self-organizing maps (SOM). The most accurate clustering is yielded by SOM. Afterwards, we turn to real data. The users of a citizen web portal (Infoville XXI, http://www.infoville.es) are clustered. The clustering achieved enables us to study the future success of a collaborative recommender by means of a prediction strategy. New users receive recommendations according to the cluster in which they have been classified. The suitability of the recommendation is evaluated by checking whether or not the recommended objects correspond to those actually selected by the user. The results show the relevance of the information provided by clustering algorithms in this web portal and, therefore, the relevance of developing a collaborative recommender for this web site.
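The evaluation step described above, checking whether recommended objects match what a user actually selected, might be scored as a simple hit rate. The sketch below is an assumption about one reasonable scoring rule, not the paper's exact procedure; `hit_rate`, the user identifiers, and the service names are hypothetical.

```python
def hit_rate(recommended, selected):
    """Fraction of users for whom at least one recommended item was actually selected.

    Both arguments map a user id to a list of item ids.
    Users without recorded selections are ignored.
    """
    users = [u for u in recommended if u in selected]
    if not users:
        return 0.0
    hits = sum(1 for u in users if set(recommended[u]) & set(selected[u]))
    return hits / len(users)

# Hypothetical data; not from the Infoville XXI portal.
recommended = {"u1": ["news", "taxes"], "u2": ["jobs"], "u3": ["health"]}
selected = {"u1": ["taxes"], "u2": ["news"], "u3": ["health", "jobs"]}
rate = hit_rate(recommended, selected)
print(f"hit rate = {rate:.3f}")
```

Here `recommended[u]` would be derived from the cluster to which user `u` was assigned, so a higher hit rate suggests that cluster membership carries useful information for recommendation.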
The self-organizing map (SOM) is a very popular unsupervised neural-network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are related to the static architecture of this model as well as to the limited capabilities for the representation of hierarchical relations of the data. With our novel growing hierarchical SOM (GHSOM) we address both limitations. The GHSOM is an artificial neural-network model with hierarchical architecture composed of independent growing SOMs. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are a problem-dependent architecture and the intuitive representation of hierarchical relations in the data. This is especially appealing in explorative data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.
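For readers unfamiliar with the basic model, a single training step of a flat, fixed-size SOM can be sketched as follows. This shows only the standard best-matching-unit search and neighborhood update, not the growth or hierarchy mechanisms of the GHSOM; the function names, the Gaussian neighborhood, and the parameter values `lr` and `sigma` are illustrative assumptions.

```python
import math

def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def som_step(weights, x, lr=0.5, sigma=1.0):
    """Move every unit toward x, scaled by a Gaussian neighborhood around the BMU."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2      # squared distance on the map grid
            h = math.exp(-grid_d2 / (2 * sigma ** 2))    # neighborhood strength, 1.0 at the BMU
            row[c] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return br, bc

# A 2x2 map with 2-dimensional weight vectors (illustrative values).
weights = [[[0.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [1.0, 1.0]]]
bmu = som_step(weights, [0.9, 0.1])
print(bmu, weights[0][1])
```

In a flat SOM the grid size is chosen up front and never changes, which is the static-architecture limitation the abstract refers to; a growing hierarchical model would additionally decide when to add units or spawn child maps.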
In this paper, a new hierarchical color quantization method based on self-organizing maps that provides different levels of quantization is presented. Color quantization (CQ) is a typical image processing task, which consists of selecting a small number of code vectors from a set of available colors to represent a high color resolution image with minimum perceptual distortion. Several techniques have been proposed for CQ based on splitting algorithms or cluster analysis. Artificial neural networks and, more specifically, self-organizing models have often been used for this purpose. The self-organizing map (SOM) is one of the most useful algorithms for color image quantization. However, it has some difficulties related to its fixed network architecture and its lack of representation of hierarchical relationships among data. The growing hierarchical SOM (GHSOM) attempts to address these problems of the SOM model. The architecture of the GHSOM is established during the unsupervised learning process according to the input data. Furthermore, the proposed color quantizer allows the evaluation of different color quantization rates under different codebook sizes, according to the number of levels of the generated neural hierarchy. The experimental results show the good performance of this approach compared to other quantizers based on self-organization.
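Once a codebook has been chosen, applying it is a nearest-neighbor lookup. The sketch below shows only that lookup step under the assumption of RGB tuples and squared Euclidean distance; it does not show how a SOM or GHSOM would learn the code vectors, and the `quantize` function, codebook, and pixel values are hypothetical.

```python
def quantize(pixels, codebook):
    """Replace each RGB pixel with its nearest code vector (squared Euclidean distance)."""
    def nearest(p):
        return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c)))
    return [nearest(p) for p in pixels]

# Hypothetical 3-color codebook and a few pixels.
codebook = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
pixels = [(10, 10, 10), (250, 20, 30), (200, 200, 210)]
quantized = quantize(pixels, codebook)
print(quantized)
```

In a hierarchical quantizer, each level of the hierarchy would supply a codebook of a different size, so the same lookup could be repeated per level to compare quantization rates.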
Web sites contain an ever-increasing amount of information within their pages. As the amount of information increases, so does the complexity of the structure of the web site. Consequently, it has become difficult for visitors to find the information relevant to their needs. To overcome this problem, various clustering methods have been proposed to cluster data in an effort to help visitors find the relevant information. These clustering methods have typically focused either on the content or on the context of the web pages. In this paper we propose a method based on Kohonen’s self-organizing map (SOM) that utilizes both content and context mining clustering techniques to help visitors identify relevant information more quickly. The input of the content mining is the set of web pages of the web site, whereas the source of the context mining is the access logs of the web site. SOM can be used to identify clusters of web sessions with similar context and also clusters of web pages with similar content. It can also provide a means of visualizing the outcome of this processing. In this paper we show how this two-level clustering can help visitors identify the relevant information faster. This procedure has been tested on the access logs and web pages of the Department of Informatics and Telecommunications of the University of Athens.
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.
Fraudulent financial reporting (FFR) involves conscious efforts to mislead others regarding the financial condition of a business. It usually consists of deliberate actions to deceive regulators, investors or the general public, and these actions also hinder systematic approaches from detecting it effectively. The challenge lies in distinguishing dichotomous samples whose major attributes fall in the same distribution. This study pioneers a novel dual GHSOM (Growing Hierarchical Self-Organizing Map) approach to discover the topological patterns of FFR, achieving effective FFR detection and feature extraction. Specifically, the proposed approach uses fraudulent samples and non-fraudulent samples to train a pair of dual GHSOMs under the same training parameters and examines the hypotheses for counterpart relationships among their subgroups, taking advantage of the unsupervised learning nature and growing hierarchical structures of GHSOMs. This study further presents (1) an effective classification rule to detect FFR based on the topological patterns and (2) an expert-competitive feature extraction mechanism to capture the salient characteristics of fraud behaviors. The experimental results on 762 annual financial statements from 144 publicly traded companies in Taiwan (of which 72 are fraudulent and 72 are non-fraudulent) reveal that the topological pattern of FFR follows the non-fraud-central spatial relationship, and show the promise of using the topological patterns for FFR detection and feature extraction.
The prevention of subscriber churn through customer retention is a core issue of Customer Relationship Management (CRM). By minimizing customer churn a company maximizes its profit. This paper proposes a hybridized architecture to deal with customer retention problems. It does so not only through predicting churn probability but also by proposing retention policies. The architecture works in two modes: learning and usage.
In the learning mode, the churn model learner seeks potential associations from the subscriber database. This historical information is used to form a churn model. This mode also calls for a policy model constructor to use the attributes identified in the churn model to divide all ‘churners’ into distinct groups. The policy model constructor is also responsible for developing a policy model for each churner group. In the usage mode, a churn predictor uses the churn model to predict the churn probability of a given subscriber. When the churn model finds that the subscriber has a high churn probability the policy model is used to suggest specific retention policies.
This study’s experiments show that the churn model has an evaluation accuracy of approximately eighty-five percent. This suggests that policy model construction represents an interesting and important technique in investigating the characteristics of churner groups. Furthermore, this study indicates that understanding the relationships between churns is essential in creating effective retention policy models for dealing with ‘churners’.