首页 | 本学科首页   官方微博 | 高级检索  
     

基于模式增长的不确定数据的频繁模式挖掘算法
引用本文:王乐,常艳芬,王水.基于模式增长的不确定数据的频繁模式挖掘算法[J].计算机应用,2015,35(7):1921-1926.
作者姓名:王乐  常艳芬  王水
作者单位:宁波大红鹰学院 信息工程学院, 浙江 宁波 315175
基金项目:国家自然科学基金资助项目(61370200);宁波市自然科学基金资助项目(2013A610115, 2014A610073);浙江省教育厅一般科研项目(Y201432717);宁波大红鹰学院大宗商品专项课题(1320133004)。
摘    要:为提高不确定数据频繁模式(FP)挖掘算法的时空效率,提出了基于最大概率的不确定频繁模式挖掘(UFPM-MP)算法。首先,利用事务项集中的最大概率值预估期望支持数;然后,使用该期望支持数与最小期望支持数阈值进行比较,以确定某一项集是否为候选频繁项集,并对候选项集建立子树以递归挖掘频繁模式。实验中,UFPM-MP算法与AT-Mine算法进行了对比,并在6个典型的数据集上进行实验验证。实验结果表明,UFPM-MP算法的时空效率得到了提高,稀疏数据集上提高约30%,稠密数据集上的效率提高更为明显(约3~4倍)。预估期望支持数的策略有效地减少了子树和头表项的数量,从而提高了算法的时空效率;且最小期望支持数越小,或需要挖掘的频繁模式越多的时候,算法的时间效率提高越多。

关 键 词:不确定数据  频繁模式  频繁项集  模式增长  
收稿时间:2015-02-02
修稿时间:2015-04-02

Frequent pattern mining algorithm from uncertain data based on pattern-growth
WANG Le,CHANG Yanfeng,WANG Shui.Frequent pattern mining algorithm from uncertain data based on pattern-growth[J].journal of Computer Applications,2015,35(7):1921-1926.
Authors:WANG Le  CHANG Yanfeng  WANG Shui
Affiliation:School of Information Engineering, Ningbo Dahongying University, Ningbo Zhejiang 315175, China
Abstract:To improve the time and space efficiency of Frequent Pattern (FP) mining algorithm over uncertain dataset, the Uncertain Frequent Pattern Mining based on Max Probability (UFPM-MP) algorithm was proposed. First, the expected support number was estimated using maximum probability of the transaction itemset. Second, by comparing this expected support number to the minimum expected support number threshold, the candidate frequent itemsets were identified. Finally, the corresponding sub-trees were built for recursively mining frequent patterns. The UFPM-MP algorithm was tested on 6 classical datasets against the state-of-the-art algorithm AT (Array based tail node Tree structure)-Mine with positive results (about 30% improvement for sparse datasets, and 3-4 times more efficient for dense datasets). The expected support number estimation strategy effectively reduces the number of sub-trees and items of header table, and improves the algorithm's time and space efficiency; and when the minimum expected support threshold is low or there are lots of potential frequent patterns, time efficiency of the proposed algorithm performs more remarkably.
Keywords:uncertain data                                                                                                                        Frequent Pattern (FP)                                                                                                                        frequent itemset                                                                                                                        pattern-growth
本文献已被 万方数据 等数据库收录!
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号