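Mining Frequent and High Utility Itemsets
Cite this article:LI Hui, LIU Gui-quan, QU Chun-yan. Mining Frequent and High Utility Itemsets[J]. Computer Science, 2015, 42(5): 82-87, 123
Authors:LI Hui  LIU Gui-quan  QU Chun-yan
Affiliation:School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000
Funding:This work was supported by the Fundamental Research Funds for the Central Universities (WK2100100021), the National Key Technology R&D Program (2012BAH17B03), and the Anhui Province Independent Innovation Special Project on Intelligent Speech Technology R&D and Industrialization (13Z02008-5).
Abstract:Mining meaningful itemsets from transaction databases has been studied for more than ten years. However, most studies use either frequency or support (as in frequent itemset mining) or utility or profit (as in high utility itemset mining) as the main measure. Each measure has its own limitations when used alone: an itemset with high frequency may have low utility, while an itemset with high utility often has very low frequency, and recommending such itemsets to users is meaningless. This work considers the two measures together, aiming to find itemsets whose frequency and utility are both high. The main challenge is that utility is neither monotone nor anti-monotone. An efficient algorithm, FHIMA, is therefore proposed. FHIMA adopts the idea of PrefixSpan and avoids generating infrequent candidate itemsets during mining. In addition, some properties of utility and of the quality upper bound are used to shrink the search space effectively, which greatly improves the efficiency of FHIMA.

Keywords:Top-k  Frequent  High utility  High-quality itemsets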

Mining Frequent and High Utility Itemsets
LI Hui,LIU Gui-quan and QU Chun-yan. Mining Frequent and High Utility Itemsets[J]. Computer Science, 2015, 42(5): 82-87, 123
Authors:LI Hui  LIU Gui-quan  QU Chun-yan
Affiliation:School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, China
Abstract:Mining interesting itemsets from transaction databases has attracted research attention for more than a decade. However, most studies use either frequency/support (e.g., frequent itemset mining) or utility/profit (e.g., high utility itemset mining) as the key interestingness measure. In other words, the two measures are considered individually, which has shortcomings: frequent itemsets may have low profit, and high-profit itemsets may have very low support, so recommending such itemsets to users is of little value. To this end, we consider the two measures from a unified perspective. Specifically, we propose to identify the qualified itemsets that are both frequent and of high utility. The key challenge is that utility does not change monotonically when more items are added to a given itemset. Thus, we propose an efficient algorithm named FHIMA (Frequent and High utility Itemset Mining Algorithms), in which an effective upper bound based on frequency and utility is designed to prune the search space. Moreover, FHIMA incorporates the idea of PrefixSpan to avoid generating candidates, which leads to high efficiency. Finally, experimental results demonstrate the efficiency of FHIMA on real and synthetic datasets.
Keywords:Top-k  Frequent  High utility  Qualified itemsets
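
The two measures discussed in the abstract can be illustrated with a minimal Python sketch. It is not FHIMA: it enumerates every itemset by brute force, with no PrefixSpan-style projection and no upper-bound pruning, and only shows what "frequent and high utility" means and why utility is hard to prune on. The transactions, unit profits, thresholds, and function names are all hypothetical, and utility is assumed to be quantity times unit profit summed over the transactions that contain the itemset.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 1, "c": 10}


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db, unit_profit):
    """Total profit the itemset earns over the transactions that contain it."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def frequent_high_utility(db, unit_profit, min_sup, min_util):
    """Brute-force enumeration (not FHIMA): keep itemsets passing both thresholds."""
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            s = support(itemset, db)
            u = utility(itemset, db, unit_profit)
            if s >= min_sup and u >= min_util:
                result[itemset] = (s, u)
    return result


if __name__ == "__main__":
    # Extending {a} lowers utility with b but raises it with c.
    for itemset in [("a",), ("a", "b"), ("a", "c")]:
        print(itemset, support(itemset, transactions),
              utility(itemset, transactions, profit))
    print(frequent_high_utility(transactions, profit, min_sup=2, min_util=20))
```

On this toy data, support never grows when an item is added, whereas the utility of {a} falls when b is added and rises when c is added. That is why a support-style downward-closure argument cannot be applied to utility directly and a separate upper bound is needed for pruning.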
This article is indexed in Wanfang Data and other databases.