
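Efficient mining algorithm for uncertain data in probabilistic frequent itemsets (有效的不确定数据概率频繁项集挖掘算法)
Cite this article:LIU Haoran, LIU Fang'ai, LI Xu, WANG Jiwei. Efficient mining algorithm for uncertain data in probabilistic frequent itemsets[J]. Journal of Computer Applications (计算机应用), 2015, 35(6): 1757-1761.
Authors:LIU Haoran  LIU Fang'ai  LI Xu  WANG Jiwei
Affiliation:College of Information Science and Engineering, Shandong Normal University, Jinan 250014
Funding:National Natural Science Foundation of China; Natural Science Foundation of Shandong Province; Shandong Province Science and Technology Development Program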
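Abstract (translated from the Chinese):Existing algorithms for mining probabilistic frequent itemsets build their trees by pattern growth, which generates a large number of tree nodes and leads to high memory consumption and low mining efficiency. To address these problems, an improved uncertain-data frequent pattern growth algorithm, PUFP-Growth, is proposed. The algorithm reads the uncertain transaction database one transaction at a time and constructs a compact tree structure similar to the frequent pattern tree (FP-Tree), while updating the dynamic arrays in the header table that store the expected values of all itemsets sharing the same tail node. Once all transactions have been inserted into the improved uncertain-data frequent pattern tree (PUFP-Tree), all probabilistic frequent itemsets are obtained by traversing the arrays. Experimental results and theoretical analysis show that PUFP-Growth finds probabilistic frequent itemsets effectively; compared with the uncertain-data frequent pattern growth (UF-Growth) algorithm and the compressed uncertain frequent pattern mining (CUFP-Mine) algorithm, it improves the efficiency of mining probabilistic frequent itemsets from uncertain data and reduces memory usage.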

Keywords:data mining  uncertain data  possible world model  probabilistic frequent itemset  frequent pattern
Received:2014-12-22
Revised:2015-03-23

Efficient mining algorithm for uncertain data in probabilistic frequent itemsets
LIU Haoran, LIU Fang'ai, LI Xu, WANG Jiwei. Efficient mining algorithm for uncertain data in probabilistic frequent itemsets[J]. Journal of Computer Applications, 2015, 35(6): 1757-1761.
Authors:LIU Haoran  LIU Fang'ai  LI Xu  WANG Jiwei
Affiliation:College of Information Science and Engineering, Shandong Normal University, Jinan Shandong 250014, China
Abstract:Existing algorithms for mining probabilistic frequent itemsets build their tree structures by pattern growth and therefore suffer from several problems: they generate a large number of tree nodes, occupy a large amount of memory, and run inefficiently. To solve these problems, a Progressive Uncertain Frequent Pattern Growth algorithm named PUFP-Growth was proposed. Reading the uncertain database tuple by tuple, the proposed algorithm constructed a compact tree structure similar to the Frequent Pattern Tree (FP-Tree) and, at the same time, updated the dynamic arrays in the header table that store the expected values of all itemsets sharing the same tail node. Once all transactions had been inserted into the Progressive Uncertain Frequent Pattern tree (PUFP-Tree), all the probabilistic frequent itemsets could be mined by traversing the dynamic arrays. The experimental results and theoretical analysis show that the PUFP-Growth algorithm can find the probabilistic frequent itemsets effectively. Compared with the Uncertain Frequent pattern Growth (UF-Growth) algorithm and the Compressed Uncertain Frequent-Pattern Mine (CUFP-Mine) algorithm, the proposed PUFP-Growth algorithm improves the efficiency of mining probabilistic frequent itemsets from uncertain datasets and reduces memory usage to some extent.
Keywords:data mining  uncertain data  possible world model  probabilistic frequent itemset  frequent pattern
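
The abstract refers to the "expected value" of an itemset over an uncertain database but does not define it on this page. As a rough illustration only, the sketch below computes the expected support commonly used in uncertain frequent pattern mining: the sum over transactions of the product of the item existence probabilities. It is a brute-force enumeration, not PUFP-Growth or the PUFP-Tree. The toy database, the names `expected_support` and `frequent_by_expected_support`, the threshold, and the assumption that items within a transaction are independent are all hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import prod

# Hypothetical toy uncertain database: each transaction maps an item to its
# existence probability. Values are invented for illustration only.
TRANSACTIONS = [
    {"a": 0.5, "b": 0.5, "c": 1.0},
    {"a": 1.0, "b": 0.5},
    {"b": 1.0, "c": 0.5},
]


def expected_support(itemset, transactions):
    """Sum, over transactions containing every item of the itemset,
    the product of the item probabilities (items assumed independent)."""
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total


def frequent_by_expected_support(transactions, min_esup):
    """Brute force: test every non-empty itemset against the threshold.
    Exponential in the number of items; for tiny examples only."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            esup = expected_support(itemset, transactions)
            if esup >= min_esup:
                result[itemset] = esup
    return result


if __name__ == "__main__":
    for itemset, esup in frequent_by_expected_support(TRANSACTIONS, 1.0).items():
        print(itemset, esup)
```

A tree-based method such as the one the abstract describes would aim to reach the same kind of per-itemset totals without enumerating every candidate; how the paper organizes its header-table arrays is not specified here and is not reproduced by this sketch.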