

BitTableFI: An efficient mining frequent itemsets algorithm
Affiliation: 1. Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, DE-53754 Sankt Augustin, Germany; 2. Information Systems and Database Technology, RWTH Aachen University, DE-52056 Aachen, Germany; 3. Department of Computer Science & Engineering, University of Dhaka, Dhaka-1000, Bangladesh; 4. Faculty of Information Technology, University of Jyvaskyla, FI-40014 University of Jyvaskyla, Finland; 1. Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230039, China; 2. University of Chinese Academy of Sciences, Beijing 100080, China; 3. University of Science and Technology of China, Hefei 230026, China; 4. Google Research, Mountain View, CA 94043, USA
Abstract: Mining frequent itemsets in transaction databases, time-series databases, and many other kinds of databases is an important task that has been studied extensively in data mining research. The problem can be solved by first constructing a candidate set of itemsets and then identifying the itemsets in that candidate set that meet the frequent-itemset requirement. Most previous research focuses on pruning to reduce the number of candidate itemsets and the number of database scans. However, many algorithms adopt an Apriori-like approach to candidate itemset generation and support counting, which is the most time-consuming part of the process. To address this issue, the paper proposes an effective algorithm named BitTableFI. In the algorithm, a special data structure, BitTable, is used horizontally and vertically to compress the database for quick candidate itemset generation and quick support counting, respectively. The technique can also be applied to many Apriori-like algorithms to improve their performance. Experiments with both synthetic and real databases show that BitTableFI outperforms Apriori and CBAR, which uses ClusterTable for quick support counting.
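The abstract does not give the layout of BitTable, so the following is only a minimal sketch of the general idea of bitmap-based support counting, not the paper's implementation. It assumes each item is stored as a vertical bit vector (bit t is set when transaction t contains the item), so the support of an itemset is the number of set bits in the bitwise AND of its items' vectors. The function names (`build_bit_columns`, `popcount`, `frequent_itemsets`) are hypothetical, and candidate generation here uses the standard Apriori prefix join rather than the horizontal BitTable described in the abstract.

```python
from itertools import combinations


def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def build_bit_columns(transactions):
    """Map each item to an int whose bit t is set if transaction t contains it."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def frequent_itemsets(transactions, min_support):
    """Level-wise mining; support is counted with bitwise AND, not a rescan.

    Returns a dict mapping each frequent itemset (a sorted tuple) to its
    absolute support count.
    """
    columns = build_bit_columns(transactions)
    # Level 1: frequent single items, each paired with its bit vector.
    level = {
        (item,): bits
        for item, bits in columns.items()
        if popcount(bits) >= min_support
    }
    result = {}
    while level:
        result.update({itemset: popcount(b) for itemset, b in level.items()})
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join only itemsets that share a (k-1)-prefix
            candidate = a + (b[-1],)
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if not all(s in level for s in combinations(candidate, len(a))):
                continue
            bits = level[a] & level[b]  # transactions containing all items
            if popcount(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level
    return result


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(db, min_support=3).items()):
        print(itemset, count)
```

Because every candidate carries its own bit vector, the database is read once to build the columns, and each support count afterwards costs one AND plus a bit count.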
Keywords:
This article is indexed in ScienceDirect and other databases.