A new concise representation of frequent itemsets using generators and a positive border |
| |
Authors: | Guimei Liu Jinyan Li Limsoon Wong |
| |
Affiliation: | (1) School of Computing, National University of Singapore, COM1, Law Link, Singapore, 117590, Singapore;(2) School of Computer Engineering, Nanyang Technological University, Singapore, Singapore |
| |
Abstract: | A complete set of frequent itemsets can get undesirably large due to redundancy when the minimum support threshold is low
or when the database is dense. Several concise representations have been previously proposed to eliminate the redundancy.
Generator based representations rely on a negative border to make the representation lossless. However, the number of itemsets
on a negative border sometimes even exceeds the total number of frequent itemsets. In this paper, we propose to use a positive
border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude
smaller than its corresponding negative border. A set of frequent generators plus its positive border is always no larger
than the corresponding complete set of frequent itemsets, thus it is a true concise representation. The generalized form of
this representation is also proposed. We develop an efficient algorithm, called GrGrowth, to mine generators and positive
borders as well as their generalizations. The GrGrowth algorithm uses the depth-first-search strategy to explore the search
space, which is much more efficient than the breadth-first-search strategy adopted by most of the existing generator mining
algorithms. Our experiment results show that the GrGrowth algorithm is significantly faster than level-wise algorithms for
mining generator based representations, and is comparable to the state-of-the-art algorithms for mining frequent closed itemsets.
|
| |
Keywords: | Frequent itemset mining Generator Concise representation Positive border Data mining |
本文献已被 SpringerLink 等数据库收录! |
|