首页 | 本学科首页   官方微博 | 高级检索  
     

基于磁盘表存储FP-TREE的关联规则挖掘算法
引用本文:申彦,宋顺林,朱玉全.基于磁盘表存储FP-TREE的关联规则挖掘算法[J].计算机研究与发展,2012,49(6):1313-1322.
作者姓名:申彦  宋顺林  朱玉全
作者单位:1. 江苏大学信息管理与信息系统系 江苏镇江212013;江苏大学计算机科学与通信工程系 江苏镇江212013
2. 江苏大学计算机科学与通信工程系 江苏镇江212013
基金项目:国家自然科学基金项目,国家科技支撑计划基金项目,江苏省基础研究计划基金项目,博士研究生创新计划基金项目,江苏大学高级人才基金项目
摘    要:随着现实待挖掘数据库规模不断增长,系统可使用的内存成为用FP-GROWTH算法进行关联规则挖掘的瓶颈.为了摆脱内存的束缚,对大规模数据库中的数据进行关联规则挖掘,基于磁盘的关联规则挖掘成为重要的研究方向.对此,改进原始的FP-TREE数据结构,提出了一种新颖的基于磁盘表的DTRFP-GROWTH(disk table resident FP-TREE growth)算法.该算法利用磁盘表存储FP-TREE,降低内存使用,在传统FP-GROWTH算法占用过多内存、挖掘工作无法进行时,以独特的磁盘表存储FP-TREE技术,减少内存使用,能够继续完成挖掘工作,适合空间性能优先的场合.不仅如此,该算法还将关联规则挖掘和关系型数据库整合,克服了基于文件系统相关算法效率较低、开发难度较大等问题.在真实数据集上进行了验证实验以及性能分析.实验结果表明,在内存空间有限的情况下,DTRFP-GROWTH算法是一种有效的基于磁盘的关联规则挖掘算法.

关 键 词:FP-TREE  关联规则  磁盘存储  频繁项目集  DTRFP-GROWTH算法  FP-GROWTH算法  数据挖掘

Mining Algorithm of Association Rules Based on Disk Table Resident FP-TREE
Shen Yan , Song Shunlin , Zhu Yuquan.Mining Algorithm of Association Rules Based on Disk Table Resident FP-TREE[J].Journal of Computer Research and Development,2012,49(6):1313-1322.
Authors:Shen Yan  Song Shunlin  Zhu Yuquan
Affiliation:1(Department of Information Management and Information System,Jiangsu University,Zhenjiang,Jiangsu 212013) 2(Department of Computer Science and Communication Engineering,Jiangsu University,Zhenjiang,Jiangsu 212013)
Abstract:As the size of the database to be mined is increasing constantly,the size of physical memory available has become a bottleneck when using FP-GROWTH algorithm for association rules mining.So,it is necessary to tackle space scalability by some new algorithms in order to mine association rules in huge database.Nowadays,disk-resident algorithm has become the main target.Therefore,the original data structure of FP-TREE is improved and a novel algorithm called DTRFP-GROWTH(disk table resident FP-TREE growth) is presented.This algorithm uses disk table for storing FP-TREE to decrease memory usage.When the mining works failed for FP-GROWTH using too much memory,DTRFP-GROWTH can continue to mine association rules from huge database by its special skill called disk table resident FP-TREE,which is suitable to occasions of space performance priority.In addition,this algorithm also integrates association rules mining with RDBMS system.It overcomes the problems of some relative solutions based on file system,such as low performance,high difficulty in development,etc.The correctness verification and performance analysis of the above mentioned algorithm are presented.The experiment results on real world data sets also support effectiveness of this algorithm.The study shows that in memory limited occasion,DTRFP-GROWTH is an effective association rules mining algorithm based on disk.
Keywords:FP-TREE  association rules  disk resident  frequent itemsets  DTRFP-GROWTH  FP-GROWTH  data mining
本文献已被 CNKI 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号