
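Parallelization of MIC Algorithm Based on MapReduce
Cite this article: 吕瑞, 蔡国永, 裴广战. Parallelization of MIC Algorithm Based on MapReduce [J]. 计算机科学 (Computer Science), 2015, 42(11): 80-83, 103
Authors: 吕瑞 (LV Rui)  蔡国永 (CAI Guo-yong)  裴广战 (PEI Guang-zhan)
Affiliation (all three authors): Guangxi Key Lab of Trusted Software, Guilin University of Electronic Technology, Guilin 541004
Funding: Supported by the Guangxi Natural Science Foundation (2011GXNSFA018156) and the Graduate Innovation Project (GDYCSZ201464)
Abstract: MIC is a method for analyzing relationships that may exist between variables. It can not only effectively identify many complex types of relationships between variables, but also accurately describe how noisy data affects an existing relationship, which makes it important for exploring relationships between variables in large data sets. To address the method's insufficient performance on data sets containing a large number of variables, this paper parallelizes it on the MapReduce model for the first time. The proposed method first partitions the original algorithm at a finer granularity and then adopts a parallel model based on a Map-Reduce-Map task chain; this model not only effectively increases the number of parallel computing units but also greatly reduces unnecessary system overhead. Finally, theoretical analysis and experimental validation show that, compared with the original algorithm, the improved algorithm is equivalent in accuracy, runs substantially faster, and scales well; the experiments also show how the performance gain relates to the available system resources.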

Keywords: Big data  MIC  Relationship mining  MapReduce  Parallelization
Received: 2014-11-07
Revised: 2015-01-18

Parallelization of MIC Algorithm Based on MapReduce
LV Rui, CAI Guo-yong, PEI Guang-zhan. Parallelization of MIC Algorithm Based on MapReduce [J]. Computer Science, 2015, 42(11): 80-83, 103
Authors: LV Rui  CAI Guo-yong  PEI Guang-zhan
Affiliation (all three authors): Guangxi Key Lab of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: MIC is a method for analyzing possible relationships between variables. It can not only effectively identify various complex types of relationships, but also accurately describe the impact of noise on those relationships. Exploring variable relationships in large data sets is considered significant for big data mining. To address the method's insufficient performance on data sets containing a large number of variables, this paper proposes a parallelization method based on MapReduce. First, the original algorithm is partitioned at a finer granularity, and then a parallel model based on the Map-Reduce-Map task chain is adopted. The model not only effectively increases the number of parallel computing units, but also greatly reduces unnecessary consumption of system resources. Theoretical analysis and experimental verification demonstrate that the improved algorithm has the same accuracy as the original algorithm and runs much faster. The relationship between the speed-up ratio and the number of Map2 processes shows that the method scales well with system resources.
Keywords: Big data  MIC  Relationship mining  MapReduce  Parallelization
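
The abstract names a Map-Reduce-Map task chain but does not say what each stage does. The single-process Python sketch below shows one possible reading, offered only as an illustration: the first map splits records into per-variable values, the reduce rebuilds each variable's column, and the second map scores each variable pair independently, so it needs no further shuffle. The stage assignment is an assumption, all function names (`map1`, `reduce_columns`, `map2`, `run_chain`, `dependence_score`) are hypothetical, and the score is a simplified equal-width-grid stand-in, not the actual MIC algorithm, which optimizes the grid partitions. The grid budget of n^0.6 cells follows the default commonly used for MIC.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def bin_index(values, bins):
    """Equal-width bin index per value (assumption: real MIC optimizes partitions)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: avoid division by zero
    return [min(int((v - lo) / width), bins - 1) for v in values]


def normalized_mi(xs, ys, nx, ny):
    """Mutual information on an nx-by-ny grid, divided by log(min(nx, ny))."""
    n = len(xs)
    bx, by = bin_index(xs, nx), bin_index(ys, ny)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = sum(c / n * math.log(c * n / (px[i] * py[j]))
             for (i, j), c in joint.items())
    return mi / math.log(min(nx, ny))


def dependence_score(xs, ys):
    """Simplified stand-in for MIC: best normalized MI with nx * ny <= n ** 0.6."""
    budget = max(4, int(len(xs) ** 0.6))
    return max(normalized_mi(xs, ys, nx, ny)
               for nx in range(2, budget // 2 + 1)
               for ny in range(2, budget // nx + 1))


def map1(row_id, row):
    """Map 1 (hypothetical): split one record into (variable, (row_id, value))."""
    return [(name, (row_id, value)) for name, value in row.items()]


def reduce_columns(name, tagged_values):
    """Reduce (hypothetical): rebuild one variable's column in row order."""
    return name, [value for _, value in sorted(tagged_values)]


def map2(pair, columns):
    """Map 2 (hypothetical): score one variable pair; pairs are independent."""
    a, b = pair
    return pair, dependence_score(columns[a], columns[b])


def run_chain(rows):
    """Simulate the Map-Reduce-Map chain sequentially in one process."""
    shuffled = defaultdict(list)
    for row_id, row in enumerate(rows):
        for key, value in map1(row_id, row):
            shuffled[key].append(value)  # stands in for the shuffle phase
    columns = dict(reduce_columns(k, v) for k, v in shuffled.items())
    return dict(map2(p, columns) for p in combinations(sorted(columns), 2))


# Toy data: y depends linearly on x, c is constant and carries no information.
rows = [{"x": i, "y": 2 * i + 1, "c": 1.0} for i in range(50)]
scores = run_chain(rows)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Because each call to `map2` touches only one variable pair, the number of independent work units in this sketch grows quadratically with the number of variables, which is one way a finer-grained partition can expose more parallelism.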
This article is indexed in 万方数据 (Wanfang Data) and other databases.