Distributed Mining of Maximal Frequent Itemsets on a Data Grid System |
| |
Authors: | Congnan Luo Anil L. Pereira Soon M. Chung |
| |
Affiliation: | (1) Department of Computer Science and Engineering, Wright State University, Dayton, Ohio 45435, USA |
| |
Abstract: | In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its supersets is frequent. GridDMM is specifically suitable for use in Grid environments due to low communication and synchronization overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms, and its performance is scalable in terms of the database size and the number of nodes. This research was supported in part by LexisNexis, NCR and AFRL/Wright Brothers Institute (WBI). |
| |
Keywords: | Data Grid distributed data mining maximal frequent itemsets association rules scalability |
本文献已被 SpringerLink 等数据库收录! |
|