首页 | 本学科首页   官方微博 | 高级检索  
     

改进的基于频繁模式树的最大频繁项集挖掘算法——FP-MFIA
引用本文:杨鹏坤,彭慧,周晓锋,孙玉庆.改进的基于频繁模式树的最大频繁项集挖掘算法——FP-MFIA[J].计算机应用,2015,35(3):775-778.
作者姓名:杨鹏坤  彭慧  周晓锋  孙玉庆
作者单位:1. 中国科学院 物联网研究发展中心, 江苏 无锡 214135; 2. 江苏物联网研究发展中心, 江苏 无锡 214135; 3. 无锡中科泛在信息技术研发中心有限公司, 江苏 无锡 214135; 4. 国网枣庄供电公司, 山东 枣庄 277100
基金项目:国家科技支撑计划项目(2012BAF12B08)
摘    要:针对最大频繁项目集挖掘算法(DMFIA)当候选项目集维数高而最大频繁项目集维数较低的情况下要产生大量的候选项目集的缺点,提出了一种改进的基于频繁模式树(FP-tree)结构的最大频繁项目集挖掘算法--FP-MFIA。该算法根据FP-tree的项目头表,采用自底向上的搜索策略逐层挖掘最大频繁项目集,从而加速每次对候选集计数的操作。在挖掘时根据每层的条件模式基产生维数较低的非频繁项目集,尽早对候选项目集进行剪枝和降维,可大量减少候选项目集的数量。同时在挖掘时充分利用最大频繁项集的性质,减少搜索空间。通过算法在不同支持度下挖掘时间的对比可知,算法FP-MFIA在最小支持度较低的情况下时间效率是DMFIA以及基于降维的最大频繁模式挖掘算法(BDRFI)的2倍以上,说明FP-MFIA在候选集维数较高的时候优势明显。

关 键 词:最大频繁项集    频繁模式树    数据挖掘    关联规则    非频繁项集
收稿时间:2014-09-05
修稿时间:2014-11-25

FP-MFIA: improved algorithm for mining maximum frequent itemsets based on frequent-pattern tree
YANG Pengkun , PENG Hui , ZHOU Xiaofeng , SUN Yuqing.FP-MFIA: improved algorithm for mining maximum frequent itemsets based on frequent-pattern tree[J].journal of Computer Applications,2015,35(3):775-778.
Authors:YANG Pengkun  PENG Hui  ZHOU Xiaofeng  SUN Yuqing
Affiliation:1. Research and Development Center for Internet of Things, Chinese Academy of Sciences, Wuxi Jiangsu 214135, China;
2. Jiangsu Research and Development Center for Internet of Things, Wuxi Jiangsu 214135, China;
3. Chinese Academy of Sciences Ubiquitous Information Technology Research and Development Center Company Limited, Wuxi Jiangsu 214135, China;
4. State Grid Zaozhuang Power Supply Company, Zaozhuang Shandong 277100, China
Abstract:Focusing on the drawback that Discovering Maximum Frequent Itemsets Algorithm (DMFIA) has to generate lots of maximum frequent candidate itemsets in each dimension when given datasets with many candidate items and each maximum frequent itemset is not long, an improved Algorithm for mining Maximum Frequent Itemsets based of Frequent-Pattern tree (FP-MFIA) for mining maximum frequent itemsets based on FP-tree was proposed. According to Htable of FP-tree, this algorithm used bottom-up searches to mine maximum frequent itemsets, thus accelerated the count of candidates. Producing infrequent itemsets with lower dimension according to conditional pattern base of every layer when mining, cutting and reducing dimensions of candidate itemsets can largely reduce the amount of candidate itemsets. At the same time taking full advantage of properties of maximum frequent itemsets will reduce the search space. The time efficiency of FP-MFIA is at least two times as much as the algorithm of DMFIA and BDRFI (algorithm for mining frequent itemsets based on dimensionality reduction of frequent itemset) according to computational time contrast based on different supports. It shows that FP-MFIA has a clear advantage when candidate itemsets are with high dimension.
Keywords:maximum frequent itemset  Frequent Pattern tree (FP-tree)  data mining  association rule  infrequent itemset
本文献已被 CNKI 万方数据 等数据库收录!
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号