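Parallel Algorithm Model for Knowledge Reduction Using MapReduce
Cite this article: QIAN Jin, MIAO Duoqian, ZHANG Zehua, ZHANG Zhifei. Parallel Algorithm Model for Knowledge Reduction Using MapReduce[J]. 计算机科学与探索, 2013(1): 35-45.
Authors: QIAN Jin  MIAO Duoqian  ZHANG Zehua  ZHANG Zhifei
Affiliation: Department of Computer Science and Technology, Tongji University; School of Computer Engineering, Jiangsu University of Technology; Key Laboratory of Embedded System & Service Computing, Ministry of Education, Tongji University
Funding: National Natural Science Foundation of China (Nos. 60970061, 61075056, 61103067); Fundamental Research Funds for the Central Universities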
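Abstract: Knowledge reduction for large-scale data has become a research hotspot in rough set theory in recent years. Classical knowledge reduction algorithms load a small dataset into the main memory of a single machine in one pass and therefore cannot handle massive data. This paper analyzes in depth the parallelizable operations in knowledge reduction algorithms; designs and implements Map and Reduce functions that combine data and task parallelism to compute the equivalence classes and attribute significance induced by different candidate attribute sets; and constructs a parallel knowledge reduction algorithm model under the MapReduce framework, which computes a reduct for knowledge reduction algorithms based on the positive region, the discernibility matrix, or information entropy. Experiments on the Hadoop platform show that the model can process massive datasets efficiently.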

Keywords: MapReduce  rough set  knowledge reduction  data parallelism  task parallelism

Parallel Algorithm Model for Knowledge Reduction Using MapReduce
QIAN Jin, MIAO Duoqian, ZHANG Zehua, ZHANG Zhifei. Parallel Algorithm Model for Knowledge Reduction Using MapReduce[J]. Journal of Frontiers of Computer Science and Technology, 2013(1): 35-45.
Authors: QIAN Jin  MIAO Duoqian  ZHANG Zehua  ZHANG Zhifei
Affiliation: 1. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; 2. School of Computer Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China; 3. Key Laboratory of Embedded System & Service Computing, Ministry of Education of China, Tongji University, Shanghai 201804, China
Abstract: Knowledge reduction for massive datasets has attracted considerable research interest in rough set theory. Classical knowledge reduction algorithms assume that the whole dataset can be loaded into the main memory of a single machine, which is infeasible for large-scale data. First, this paper analyzes the computations in classical knowledge reduction algorithms that can be parallelized. Then, to compute the equivalence classes and attribute significance induced by different candidate attribute sets, it designs and implements Map and Reduce functions that combine data and task parallelism. Finally, it constructs a parallel algorithm model for knowledge reduction using MapReduce, which can compute a reduct for algorithms based on the positive region, the discernibility matrix, or information entropy. Experimental results on the Hadoop platform demonstrate that the proposed parallel knowledge reduction model can process massive datasets efficiently.
Keywords: MapReduce  rough set  knowledge reduction  data parallelism  task parallelism
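
The abstract does not reproduce the paper's Map and Reduce functions, so the following is only a minimal single-process sketch of the general idea, under stated assumptions. It assumes a decision table held as a list of Python dictionaries, uses the positive-region size as the significance measure, and simulates the shuffle step in memory instead of running on Hadoop. The names `map_record`, `reduce_class`, and `positive_region_sizes` are hypothetical and do not come from the paper. Emitting one key per candidate attribute set from each record is how the sketch combines data parallelism (records) with task parallelism (candidates).

```python
from collections import defaultdict


def map_record(record, candidates, decision):
    """Map: for each candidate attribute set, emit a key that identifies
    the record's equivalence class, with the decision value as the value."""
    for cand in candidates:
        key = (cand, tuple(record[a] for a in cand))
        yield key, record[decision]


def reduce_class(key, decisions):
    """Reduce: one call per equivalence class. The class contributes its
    size to the positive region only if all decision values agree."""
    cand, _ = key
    consistent = len(set(decisions)) == 1
    return cand, (len(decisions) if consistent else 0)


def positive_region_sizes(records, candidates, decision):
    # Simulated shuffle: group mapped values by key (assumption: in memory).
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_record(rec, candidates, decision):
            groups[key].append(value)
    # Sum the per-class contributions for each candidate attribute set.
    sizes = defaultdict(int)
    for key, decisions in groups.items():
        cand, n = reduce_class(key, decisions)
        sizes[cand] += n
    return dict(sizes)


# Toy decision table (made up for illustration): attributes a, b; decision d.
records = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
candidates = [("a",), ("b",), ("a", "b")]
sizes = positive_region_sizes(records, candidates, "d")
```

In this toy table, neither attribute alone separates the decision values for every record, while the pair does, so a greedy reduction step would prefer the candidate with the largest positive region. A variant based on information entropy would keep the same Map function and change only what the Reduce step computes per class.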
This article is indexed in CNKI and other databases.