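Distributed high-utility sequential pattern mining based on multiple utility thresholds (基于多效用阈值的分布式高效用序列模式挖掘)
Cite this article: ZENG Yi, ZHANG Fu-quan. Distributed high-utility sequential pattern mining based on multiple utility thresholds [J]. Computer Engineering and Design (计算机工程与设计), 2020, 41(2): 449-457.
Authors: ZENG Yi (曾毅)  ZHANG Fu-quan (张福泉)
Affiliations: Department of Computer and Information Engineering, Guangxi University Xingjian College of Science and Liberal Arts, Nanning 530005, Guangxi; School of Computer Science, Beijing Institute of Technology, Beijing 100081
Funding: Guangxi Zhuang Autonomous Region project; Guiding Fund project of the Fujian Provincial Department of Science and Technology
Abstract: To address the large search space and high computational complexity of high-utility mining over sequential patterns, a distributed high-utility sequential pattern mining algorithm based on multiple utility thresholds is proposed. An array structure stores the utility information of patterns, which avoids the high memory consumption caused by a utility matrix. Deep pruning strategies for 1-itemsets and 2-itemsets are designed to further shrink the search space of candidate patterns and to reduce search time and cache cost. A distributed implementation of the mining algorithm is proposed, which further reduces mining time through parallel processing. Experiments were conducted on medium-scale and large-scale sequence datasets. The results show that the algorithm effectively reduces the number of candidate patterns, lowers the time and storage costs of mining, and shows good scalability and stability on large datasets.

Keywords: sequential pattern  big data  high-utility pattern mining  distributed computing  frequent itemset  pruning strategy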

Distributed high utility sequence pattern mining based on multi utility thresholds
ZENG Yi, ZHANG Fu-quan. Distributed high utility sequence pattern mining based on multi utility thresholds [J]. Computer Engineering and Design, 2020, 41(2): 449-457.
Authors: ZENG Yi  ZHANG Fu-quan
Affiliation: Computer and Information Engineering Department, Guangxi University Xingjian College of Science and Liberal Arts, Nanning 530005, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: To address the large search space and high computational complexity of high utility pattern mining over sequence patterns, a distributed high utility sequence pattern mining algorithm based on multiple utility thresholds is proposed. An array structure is adopted to store the utility information of patterns, which avoids the large memory consumption of a utility matrix. Deep pruning strategies for 1-itemsets and 2-itemsets are designed to further reduce the search space of candidate patterns, lowering both time cost and memory cost. A distributed implementation scheme for the mining algorithm is proposed, and parallel processing further reduces the mining time. Experiments on medium-scale and large-scale sequence datasets show that the proposed algorithm effectively reduces the number of candidate patterns, the mining time, and the storage cost, and that it exhibits good scalability and stability on big datasets.
Keywords: sequence pattern  big data  high utility pattern mining  distributed computing  frequent itemsets  pruning strategy
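
The abstract does not define what makes a sequence pattern "high utility" under multiple thresholds, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes the usual convention that a pattern's utility in one sequence is the maximum over its occurrences, and it assumes (hypothetically) that a pattern's threshold is the smallest threshold among its items. The function names, the toy database, the unit profits, and the threshold values are all illustrative.

```python
# Sketch only: utility of a sequence pattern and a multi-threshold check.
# A q-sequence is a list of elements; each element maps item -> quantity.
# A pattern is a list of itemsets (tuples of items).

NEG_INF = float("-inf")


def pattern_utility_in_sequence(pattern, qseq, profit):
    """Maximum utility over all occurrences of `pattern` in one q-sequence."""
    # best[j] = best utility found so far for matching the first j itemsets
    best = [0] + [NEG_INF] * len(pattern)
    for element in qseq:
        # Go backwards so one element matches at most one pattern itemset.
        for j in range(len(pattern), 0, -1):
            if best[j - 1] == NEG_INF:
                continue
            itemset = pattern[j - 1]
            if all(item in element for item in itemset):
                u = sum(element[item] * profit[item] for item in itemset)
                best[j] = max(best[j], best[j - 1] + u)
    return 0 if best[-1] == NEG_INF else best[-1]


def database_utility(pattern, database, profit):
    """Sum of the pattern's per-sequence utilities over the database."""
    return sum(pattern_utility_in_sequence(pattern, s, profit) for s in database)


def pattern_threshold(pattern, thresholds):
    """ASSUMPTION: the pattern threshold is the minimum of its items' thresholds."""
    return min(thresholds[item] for itemset in pattern for item in itemset)


def is_high_utility(pattern, database, profit, thresholds):
    return database_utility(pattern, database, profit) >= pattern_threshold(
        pattern, thresholds
    )


# Toy data (illustrative values, not taken from the paper).
profit = {"a": 2, "b": 5, "c": 1}
thresholds = {"a": 15, "b": 30, "c": 20}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}],
    [{"a": 3}, {"b": 1, "c": 2}],
    [{"b": 2}, {"a": 1}, {"c": 4}],
]
pattern_ac = [("a",), ("c",)]  # item a followed later by item c
pattern_ca = [("c",), ("a",)]  # item c followed later by item a
```

With this toy data, `is_high_utility(pattern_ac, db, profit, thresholds)` compares the pattern's total utility against the smaller of the thresholds for `a` and `c`. A pruning strategy of the kind the abstract mentions would aim to discard candidates before this full utility computation is needed.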
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).