A new algorithm for fast mining frequent itemsets using N-lists期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A new algorithm for fast mining frequent itemsets using N-lists

Authors:	DENG ZhiHong WANG ZhongHui & JIANG JiaJian

Affiliation:	Key Laboratory of Machine Perception(Ministry of Education),School of Electronics Engineering and Computer Science,Peking University,Beijing 100871,China

Abstract:	Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks.In this paper,we propose a novel vertical data representation called N-list,which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets.Based on the N-list data structure,we develop an efficient mining algorithm,PrePost,for mining all frequent itemsets.Efficiency of PrePost is achieved by the following three reasons.First,N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree.Second,the counting of itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy,where m and n are the cardinalities of the two N-lists respectively.Third,PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list.We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets.The experimental results show that the PrePost algorithm is the fastest in most cases.Even though the algorithm consumes more memory when the datasets are sparse,it is still the fastest one.

Keywords:	data mining frequent itemset mining data structure N-lists algorithm
本文献已被 CNKI 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏