1.

Adjusting Fuzzy Similarity Functions for use with standard data mining tools

Avichai Meged Roy Gelbard 《Journal of Systems and Software》2011,84(12):2374-2383

Data mining is crucial in many areas and there are ongoing efforts to improve its effectiveness in both the scientific and the business world. There is an obvious need to improve the outcomes of mining techniques such as clustering and other classifiers without abandoning the standard mining tools that are popular with researchers and practitioners alike. Currently, however, standard tools do not have the flexibility to control similarity relations between attribute values, a critical feature in improving mining-clustering results. The study presented here introduces the Similarity Adjustment Model (SAM) where adjusted Fuzzy Similarity Functions (FSF) control similarity relations between attribute values and hence ameliorate clustering results obtained with standard data mining tools such as SPSS and SAS. The SAM draws on principles of binary database representation models and employs FSF adjusted via an iterative learning process that yields improved segmentation regardless of the choice of mining-clustering algorithm. The SAM model is illustrated and evaluated on three common datasets with the standard SPSS package. The datasets were run with several clustering algorithms. Comparison of “Naïve” runs (which used original data) and “Fuzzy” runs (which used SAM) shows that the SAM improves segmentation in all cases.
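As a rough illustration of why control over similarity between attribute values matters, the sketch below contrasts a plain binary (one-hot) encoding with a fuzzy encoding in which each value also partially activates its neighbouring values. The triangular similarity function, the `width` parameter, and every function name here are assumptions made for this example. The abstract does not give the form of the FSF or the iterative adjustment step, so this is not the authors' SAM.

```python
def one_hot(value, categories):
    """Binary representation: 1.0 for the matching category, 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


def triangular_similarity(width):
    """Hypothetical similarity function: 1.0 at distance 0, falling to 0.0 at `width`."""
    def sim(a, b):
        return max(0.0, 1.0 - abs(a - b) / width)
    return sim


def fuzzy_encode(value, categories, similarity):
    """Fuzzy representation: each category receives its similarity to `value`."""
    return [similarity(value, c) for c in categories]


def squared_distance(u, v):
    """Squared Euclidean distance, as a distance-based clustering tool might use."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


ages = [20, 30, 40, 50, 60]           # illustrative attribute values
sim = triangular_similarity(width=20)  # illustrative width

naive_near = squared_distance(one_hot(20, ages), one_hot(30, ages))
naive_far = squared_distance(one_hot(20, ages), one_hot(60, ages))
fuzzy_near = squared_distance(fuzzy_encode(20, ages, sim), fuzzy_encode(30, ages, sim))
fuzzy_far = squared_distance(fuzzy_encode(20, ages, sim), fuzzy_encode(60, ages, sim))
```

Under the one-hot encoding, 20 is exactly as far from 30 as it is from 60, so a clustering algorithm cannot tell near values from distant ones. Under the fuzzy encoding, 20 ends up closer to 30 than to 60, and the encoded columns can still be passed to a standard tool unchanged.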

2.

Towards supporting expert evaluation of clustering results using a data mining process model

Kweku-Muata Osei-Bryson 《Information Sciences》2010,180(3):414-47

Clustering is a popular non-directed learning data mining technique for partitioning a dataset into a set of clusters (i.e. a segmentation). Although there are many clustering algorithms, none is superior on all datasets, and so it is never clear which algorithm and which parameter settings are the most appropriate for a given dataset. This suggests that an appropriate approach to clustering should involve the application of multiple clustering algorithms with different parameter settings and a non-taxing approach for comparing the various segmentations that would be generated by these algorithms. In this paper we are concerned with the situation where a domain expert has to evaluate several segmentations in order to determine the most appropriate segmentation (set of clusters) based on his/her specified objective(s). We illustrate how a data mining process model could be applied to address this problem.
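The idea of comparing several candidate segmentations against an expert-specified objective can be sketched as follows. The summary measures (number of clusters, within-cluster sum of squared errors, smallest cluster size) and the names `summarize` and `rank_segmentations` are assumptions chosen for illustration. The abstract does not say which measures or which process model the paper uses.

```python
from collections import defaultdict


def summarize(points, labels):
    """Compute simple descriptive measures for one segmentation."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        sse += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )

    return {
        "k": len(clusters),
        "sse": sse,
        "min_size": min(len(m) for m in clusters.values()),
    }


def rank_segmentations(points, candidates, objective):
    """Sort candidate segmentations by an expert-supplied objective (lower is better)."""
    rows = [(name, summarize(points, labels)) for name, labels in candidates.items()]
    return sorted(rows, key=lambda row: objective(row[1]))


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
candidates = {
    "A": [0, 0, 1, 1],  # e.g. output of one algorithm/parameter setting
    "B": [0, 1, 0, 1],  # e.g. output of another
}
ranking = rank_segmentations(points, candidates, objective=lambda s: s["sse"])
```

Passing the objective as a function keeps the comparison step independent of the algorithms that produced the segmentations, so the expert can swap in a different criterion (for example, penalizing very small clusters) without rerunning anything.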

3.

A novel approach to fuzzy clustering based on a dissimilarity relation extracted from data using a TS system

Mario G.C.A. Cimino Francesco Marcelloni 《Pattern recognition》2006,39(11):2077-2091

4.

A new approach on search for similar documents with multiple categories using fuzzy clustering

Rıdvan Saraçoğlu Kemal Tütüncü Novruz Allahverdi 《Expert systems with applications》2008,34(4):2545-2554

Searching for similar documents plays an important role in text mining and document management. In similar-document search, as in other text mining applications, the focus is generally on document classification: determining the class or category to which each document belongs. The present study investigates the case of documents that belong to more than one category. The system used is a similar-document search system based on fuzzy clustering, which accommodates documents belonging to more than one category. The proposed approach solves the multi-category problem in two stages. The first stage identifies the documents that belong to more than one category. The second stage determines the categories to which those documents belong. For these two stages, the α-threshold Fuzzy Similarity Classification Method (α-FSCM) and the Multiple Categories Vector Method (MCVM) are proposed, respectively. Experimental results showed that the proposed system can efficiently distinguish documents that belong to more than one category. In determining which documents belong to which classes, the proposed system performs better than the traditional approach.
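The two stages described above can be illustrated with a minimal sketch that applies a threshold to fuzzy membership degrees. The threshold name `alpha`, the membership values, and both function names are hypothetical. The abstract does not define how the paper's classification method or category vectors are computed, so this only shows the general thresholding idea.

```python
def categories_above_threshold(memberships, alpha):
    """Return every category whose fuzzy membership degree is at least `alpha`."""
    return sorted(c for c, degree in memberships.items() if degree >= alpha)


def is_multi_category(memberships, alpha):
    """Stage 1 (illustrative): flag a document that passes the threshold in 2+ categories."""
    return len(categories_above_threshold(memberships, alpha)) > 1


# Hypothetical membership degrees produced by some fuzzy clustering step.
documents = {
    "doc1": {"sports": 0.55, "politics": 0.40, "science": 0.05},
    "doc2": {"sports": 0.05, "politics": 0.10, "science": 0.85},
}

alpha = 0.3  # illustrative threshold value
multi = [name for name, m in documents.items() if is_multi_category(m, alpha)]
# Stage 2 (illustrative): list the categories of the flagged documents.
assigned = {name: categories_above_threshold(documents[name], alpha) for name in multi}
```

A lower threshold flags more documents as multi-category, while a higher one approaches ordinary single-label classification, so the threshold value is the main tuning choice in a scheme of this kind.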

5.

An information granulation based data mining approach for classifying imbalanced data (Total citations: 2; self-citations: 0; citations by others: 2)

Mu-Chen Chen Long-Sheng Chen 《Information Sciences》2008,178(16):3214-3227

Recently, the class imbalance problem has attracted much attention from researchers in the field of data mining. When learning from imbalanced data in which most examples are labeled as one class and only a few belong to another class, traditional data mining approaches do not have a good ability to predict the crucial minority instances. Unfortunately, many real-world data sets, such as those from health examination, inspection, credit fraud detection, spam identification and text mining, are faced with this situation. In this study, we present a novel model called the “Information Granulation Based Data Mining Approach” to tackle this problem. The proposed methodology, which imitates the human ability to process information, acquires knowledge from Information Granules rather than from numerical data. This method also introduces a Latent Semantic Indexing based feature extraction tool by using Singular Value Decomposition, to dramatically reduce the data dimensions. In addition, several data sets from the UCI Machine Learning Repository are employed to demonstrate the effectiveness of our method. Experimental results show that our method can significantly increase the ability of classifying imbalanced data.
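The class imbalance problem itself can be made concrete with a small example: a trivial classifier that always predicts the majority class scores high accuracy while finding none of the minority instances. The 95/5 class split and the function names are assumptions for illustration only. The sketch does not implement the information granulation approach.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` instances that were predicted as `positive`."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not actual:
        return 0.0
    return sum(1 for p in actual if p == positive) / len(actual)


# Illustrative data: 95 majority instances (label 0), 5 minority instances (label 1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

overall_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
```

Here accuracy is 0.95 while minority recall is 0.0, which is why work on imbalanced data typically reports minority-class measures rather than accuracy alone.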

6.

A global clustering approach to point cloud simplification with a specified data reduction ratio (Total citations: 2; self-citations: 0; citations by others: 2)

Hao Song 《Computer aided design》2008,40(3):281-292

This paper studies the problem of point cloud simplification by searching for a subset of the original input data set according to a specified data reduction ratio (desired number of points). The unique feature of the proposed approach is that it aims at minimizing the geometric deviation between the input and simplified data sets. The underlying simplification principle is based on clustering of the input data set. The cluster representation essentially partitions the input data set into a fixed number of point clusters and each cluster is represented by a single representative point. The set of the representatives is then considered as the simplified data set and the resulting geometric deviation is evaluated against the input data set on a cluster-by-cluster basis. Due to the fact that the change to a representative selection only affects the configuration of a few neighboring clusters, an efficient scheme is employed to update the overall geometric deviation during the search process. The search involves two interrelated steps. It first focuses on a good layout of the clusters and then on fine tuning the local composition of each cluster. The effectiveness and performance of the proposed approach are validated and illustrated through case studies using synthetic as well as practical data sets.
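The cluster representation described above (a fixed number of clusters, one representative input point per cluster, deviation measured cluster by cluster) can be sketched as follows. This is not the paper's search procedure: representatives are chosen here by farthest-point sampling, a simple stand-in technique, and the deviation of a cluster is taken to be the largest distance from a member to its representative. The interpretation of `ratio` as the fraction of points kept, and all function names, are likewise assumptions.

```python
import math


def farthest_point_representatives(points, k):
    """Pick k input points greedily, each as far as possible from those already chosen."""
    reps = [points[0]]
    while len(reps) < k:
        candidate = max(points, key=lambda p: min(math.dist(p, r) for r in reps))
        reps.append(candidate)
    return reps


def simplify(points, ratio):
    """Return representatives (a subset of the input) and a per-cluster deviation."""
    k = max(1, round(len(points) * ratio))
    reps = farthest_point_representatives(points, k)

    # Each point joins the cluster of its nearest representative; the cluster's
    # deviation is the largest member-to-representative distance seen so far.
    deviation = {r: 0.0 for r in reps}
    for p in points:
        nearest = min(reps, key=lambda r: math.dist(p, r))
        deviation[nearest] = max(deviation[nearest], math.dist(p, nearest))
    return reps, deviation


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
representatives, per_cluster_deviation = simplify(cloud, ratio=0.5)
```

Because each point contributes only to the deviation of its own cluster, moving one representative changes the totals of nearby clusters only, which is the locality property the abstract relies on for efficient updates during the search.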

7.

Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set

Marie Plasse Ndeye Niang Alexandre Villeminot 《Computational statistics & data analysis》2007,52(1):596-613

A method to analyse links between binary attributes in a large sparse data set is proposed. Initially the variables are clustered to obtain homogeneous clusters of attributes. Association rules are then mined in each cluster. A graphical comparison of some rule relevancy indexes is presented. It is used to extract best rules depending on the application concerned. The proposed methodology is illustrated by an industrial application from the automotive industry with more than 80 000 vehicles each described by more than 3000 rare attributes.
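To make the notion of a rule relevancy index concrete, the sketch below computes three standard measures (support, confidence and lift) for a rule between two binary attributes. The abstract does not say which indexes the paper compares, so the choice of these three is an assumption, as are the attribute names and the toy data.

```python
def rule_metrics(rows, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    `rows` is a list of dicts mapping attribute names to 0/1 values.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if r[antecedent])
    n_b = sum(1 for r in rows if r[consequent])
    n_ab = sum(1 for r in rows if r[antecedent] and r[consequent])

    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy data: 10 vehicles, two rare attributes that always occur together.
vehicles = [{"attr_x": 1, "attr_y": 1}] * 2 + [{"attr_x": 0, "attr_y": 0}] * 8
metrics = rule_metrics(vehicles, "attr_x", "attr_y")
```

With rare attributes, support is necessarily low even for a perfectly reliable rule (0.2 here, with confidence 1.0 and lift 5.0), which illustrates why a comparison of several indexes is more informative than a support threshold alone.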

8.

Dynamic data mining technique for rules extraction in a process of battery charging

R.A. Aliev R.R. Aliev B. Guirimov K. Uyar 《Applied Soft Computing》2008,8(3):1252-1258

The design and application of battery charging controllers is a growing area of industry. Fast and efficient charging of battery packs is a problem that is difficult and often expensive to solve using conventional techniques. The majority of existing works on intelligent charging systems are based on expert knowledge and heuristics. Not all features of the desired charging behavior can be attained by the hard-wired logic implemented by expert-generated rules. Because battery charging is a highly dynamic process and the chemical technology varies significantly across battery types, data mining techniques can be of real importance for extracting charging rules from large databases, especially when the charging logic must change continuously over the life of the battery depending on the type and characteristics of the battery and the conditions of use. In this paper we use a soft computing-based data mining technique to extract control rules for an effective and fast battery charging process. The obtained rules were used for NiCd battery charging. A comparative performance evaluation of existing charging control methods and the proposed system demonstrated a significant increase in performance (minimum charging time and minimum overheating) using the soft computing-based approach.

9.

KEEL: a software tool to assess evolutionary algorithms for data mining problems (Total citations: 4; self-citations: 6; citations by others: 4)

J. Alcalá-Fdez L. Sánchez S. García M. J. del Jesus S. Ventura J. M. Garrell J. Otero C. Romero J. Bacardit V. M. Rivas J. C. Fernández F. Herrera 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2009,13(3):307-318

This paper introduces KEEL, a software tool to assess evolutionary algorithms for Data Mining problems of various kinds, including regression, classification and unsupervised learning. It includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL, as well as the integration of evolutionary learning techniques with different pre-processing techniques, allowing it to perform a complete analysis of any learning model in comparison to existing software tools. Moreover, KEEL has been designed with a double goal: research and education. Supported by the Spanish Ministry of Science and Technology under Projects TIN-2005-08386-C05-(01, 02, 03, 04 and 05). The work of Dr. Bacardit is also supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant GR/T07534/01.