
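Fast Mining of Constrained Association Rules in Distributed Environments
Cite this article: JI Gen-lin, WEI Su-yun. Fast Mining of Constrained Association Rules in Distributed Environments[J]. Mini-micro Systems, 2007, 28(5): 882-885.
Authors: JI Gen-lin, WEI Su-yun
Affiliation: Department of Computer Science, Nanjing Normal University, Nanjing, Jiangsu 210097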
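Abstract: Researchers have proposed algorithms for mining constrained association rules on a single machine, but these algorithms are not applicable to distributed environments. This paper therefore discusses fast mining of constrained association rules in distributed environments and proposes DCAR, a fast algorithm for this setting. DCAR comprises MLFC, an algorithm for mining local constrained frequent itemsets, and MGFC, an algorithm for mining global constrained frequent itemsets. The algorithm derives a guide set from the Boolean constraint and uses a new candidate-itemset generation function, Reorder-gen, which uses the guide set to efficiently generate a small yet complete set of candidate itemsets that satisfy the constraint in the distributed environment. In addition, when the global constrained frequent itemsets are computed, the communication cost of transmitting the support counts of local candidate itemsets is O(n), which improves mining efficiency. The proposed algorithm was implemented, and the experimental results show that DCAR is efficient and feasible, with roughly 2 to 3 times the efficiency of the DMA-IC algorithm.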

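Keywords: distributed data mining  distributed association rules  distributed databases  frequent itemsets  constraint items
Article ID: 1000-1220(2007)05-0882-04
Revision received: 2006-03-01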

Fast Algorithms for Mining Constrained Association Rules in Distributed Systems
JI Gen-lin,WEI Su-yun.Fast Algorithms for Mining Constrained Association Rules in Distributed Systems[J].Mini-micro Systems,2007,28(5):882-885.
Authors:JI Gen-lin  WEI Su-yun
Affiliation:Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
Abstract:Researchers have presented several algorithms for mining constrained association rules in a centralized database. Integrating such constraints into the mining algorithm, instead of applying them as a post-processing step, can dramatically reduce execution time. However, these algorithms are inapplicable to distributed databases. We introduce the problem of integrating constraints that are Boolean expressions into distributed association rule mining, and in particular investigate the key problem of finding all frequent itemsets that satisfy the Boolean expression in distributed databases. We propose DCAR, a fast algorithm for mining constrained association rules in distributed systems, which includes the efficient algorithms MLFC and MGFC for distributed mining of frequent itemsets that satisfy the Boolean expression. It generates a small number of candidate itemsets and requires only O(n) messages to exchange the support counts of each candidate itemset, where n is the number of sites in the distributed database. The algorithms were implemented and their performance was studied. The experimental results show that the algorithms are effective and efficient: DCAR is 2 to 3 times faster than DMA-IC.
Keywords:distributed data mining  distributed association rules  distributed databases  frequent itemsets  item constraints
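
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DCAR, MLFC, MGFC, or Reorder-gen, whose details are not given here. It assumes a Boolean constraint written in disjunctive normal form, enumerates candidates by brute force instead of using a guide set, and has each of the n sites send a single message of local support counts to a coordinator. The toy data, the constraint, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: one list of transactions per site (n = 3 sites).
SITES = [
    [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}],
    [{"a", "c", "d"}, {"a", "b"}, {"c", "d"}],
    [{"a", "c"}, {"b", "c"}, {"a", "c", "d"}],
]

# Assumed Boolean constraint in disjunctive normal form: (a AND c) OR d.
CONSTRAINT = [{"a", "c"}, {"d"}]


def satisfies(itemset, dnf):
    """True if the itemset contains every item of at least one conjunct."""
    return any(conj <= itemset for conj in dnf)


def candidates(items, size, dnf):
    """Brute-force stand-in for candidate generation: enumerate all
    itemsets of the given size and keep those meeting the constraint."""
    return [frozenset(c) for c in combinations(sorted(items), size)
            if satisfies(frozenset(c), dnf)]


def local_counts(transactions, cands):
    """One site's support counts; this dict is the single message it sends."""
    return {c: sum(1 for t in transactions if c <= t) for c in cands}


def global_frequent(sites, cands, min_support):
    """Sum one message per site, then apply the global support threshold."""
    messages = [local_counts(site, cands) for site in sites]  # n messages
    total = sum(len(site) for site in sites)
    frequent = {}
    for c in cands:
        count = sum(m[c] for m in messages)
        if count / total >= min_support:
            frequent[c] = count
    return frequent, len(messages)


if __name__ == "__main__":
    cands = candidates({"a", "b", "c", "d"}, 2, CONSTRAINT)
    frequent, n_messages = global_frequent(SITES, cands, min_support=0.3)
    for itemset, count in frequent.items():
        print(sorted(itemset), count)
    print("messages:", n_messages)
```

Filtering candidates against the constraint before counting, instead of after mining, is what keeps the per-site messages small: only itemsets that can appear in the final answer are ever counted or transmitted.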
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.