首页 | 本学科首页   官方微博 | 高级检索  
     

基于R-list的Top-K高效用项集挖掘算法
引用本文:何登平,何宗浩.基于R-list的Top-K高效用项集挖掘算法[J].计算机工程与科学,2019,41(7):1318-1324.
作者姓名:何登平  何宗浩
作者单位:;1.重庆邮电大学通信与信息工程学院;2.重庆邮电大学通信新技术应用研究中心;3.重庆信科设计有限公司
摘    要:针对现有的一阶段Top-K高效用项集挖掘算法挖掘过程中阈值提升慢,迭代时生成大量候选项集造成内存占用过多等问题,提出一种基于重用链表(R-list)的Top-K高效用挖掘算法RHUM。使用一种新的数据结构R-list来存储并快速访问项集信息,无需第2次扫描数据库进行项集挖掘。该算法重用内存以保存候选集信息,结合改进的RSD阈值提升策略对数据进行预处理,期间采用更严格的剪枝参数在递归搜索的过程中同时计算多个项集的效用来缩小搜索空间。在不同类型数据集中的实验结果表明:RHUM算法在内存效率方面均优于其他一阶段算法,且在K值变化时能保持稳定。

关 键 词:高效用项集  一阶段挖掘  重用链表  数据挖掘  Top-K
收稿时间:2018-10-18
修稿时间:2019-07-25

A top-k high utility itemset mining algorithm based on R-list
HE Deng ping,HE Zong hao.A top-k high utility itemset mining algorithm based on R-list[J].Computer Engineering & Science,2019,41(7):1318-1324.
Authors:HE Deng ping  HE Zong hao
Affiliation:(1.School of Telecommunications and Information Engineering, Chongqing University of Posts and Telecommunications,Chongqing 400065; 2.Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications,Chongqing 400065; 3.Chongqing Information Technology Designing Company Limited,Chongqing 401121,China)
Abstract:Aiming at the problem that the existing one-phase top-k high utility itemset mining algorithm is slow to raise the threshold and generates a large number of candidate sets, thus occupying too large memory space during the iteration, we propose a top-k high utility mining algorithm RHUM based on reused list (R-list). This algorithm uses a new data structure called R-list to store and quickly access itemset information without having to scan the database a second time for mining. It reuses the memory to save the information of candidate sets, and preprocesses data jointly with the improved RSD threshold increment strategy. During the recursive search process, stricter pruning parameters are used to calculate the effect of multiple item sets simultaneously to narrow the search space. Experimental results on different types of data sets show that the RHUM is superior to other one phase algorithms in memory efficiency and stable under the change of K value.
Keywords:high utility item set  one-phase mining  R-list  data mining  top-K  
点击此处可从《计算机工程与科学》浏览原始摘要信息
点击此处可从《计算机工程与科学》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号