首页 | 本学科首页   官方微博 | 高级检索  
     

不确定数据频繁闭项集挖掘算法
引用本文:刘慧婷,沈盛霞,赵鹏,姚晟.不确定数据频繁闭项集挖掘算法[J].计算机应用,2015,35(10):2911-2914.
作者姓名:刘慧婷  沈盛霞  赵鹏  姚晟
作者单位:安徽大学 计算机科学与技术学院, 合肥 230601
基金项目:国家自然科学基金资助项目(61202227);安徽省自然科学基金资助项目(1408085MF122)。
摘    要:由于不确定数据的向下封闭属性,挖掘全部频繁项集的方法会得到一个指数级的结果。为获得一个较小的合适的结果集,研究了在不确定数据上挖掘频繁闭项集,并提出了一种新的频繁闭项集挖掘算法——NA-PFCIM。该算法将项集挖掘过程看作一个概率分布函数,考虑到基于正态分布模型的方法提取的频繁项集精确度较高,而且支持大型数据库,采用了正态分布模型提取频繁项集。同时,为了减少搜索空间以及避免冗余计算,利用基于深度优先搜索的策略来获得所有的概率频繁闭项集。该算法还设计了两个剪枝策略:超集修剪和子集修剪。最后,在常用的数据集(T10I4D100K、Accidents、Mushroom、Chess)上,将提出的NA-PFCIM算法和基于泊松分布的A-PFCIM算法进行比较。实验结果表明,NA-PFCIM算法能够减少所要扩展的项集,同时减少项集频繁概率的计算,其性能优于对比算法。

关 键 词:不确定数据  频繁项集  频繁闭项集  剪枝策略  正态分布  
收稿时间:2015-05-08
修稿时间:2015-08-07

Frequent closed itemset mining algorithm over uncertain data
LIU Huiting,SHEN Shengxia,ZHAO Peng,YAO Sheng.Frequent closed itemset mining algorithm over uncertain data[J].journal of Computer Applications,2015,35(10):2911-2914.
Authors:LIU Huiting  SHEN Shengxia  ZHAO Peng  YAO Sheng
Affiliation:College of Computer Science and Technology, Anhui University, Hefei Anhui 230601, China
Abstract:Due to the downward closure property over uncertain data, existing solutions of mining all the frequent itemsets may lead an exponential number of results. In order to obtain a reasonable result set with small size, frequent closed itemsets discovering over uncertain data were studied, and a new algorithm called Normal Approximation-based Probabilistic Frequent Closed Itemsets Mining (NA-PFCIM) was proposed. The new method regarded the itemset mining process as a probability distribution function, and mined frequent itemsets by using the normal distribution model which supports large databases and can extract frequent itemsets with a high degree of accuracy. Then, the algorithm adopted the depth-first search strategy to obtain all probabilistic frequent closed itemsets, so as to reduce the search space and avoid redundant computation. Two probabilistic pruning techniques including superset pruning and subset pruning were also used in this method. Finally, the effectiveness and efficiency of the proposed methods were verified by comparing with the Possion distribution based algorithm called A-PFCIM. The experimental results show that NA-PFCIM can decrease the number of extending itemsets and reduce the complexity of calculation, it has better performance than the compared algorithm.
Keywords:uncertain data                                                                                                                        frequent itemset                                                                                                                        frequent closed itemset                                                                                                                        pruning strategy                                                                                                                        normal distribution
本文献已被 万方数据 等数据库收录!
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号