Fault-Tolerant Algorithm Combined with Distributed Coding Computing in MapReduce Framework (MapReduce框架下结合分布式编码计算的容错算法)

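Cite this article:	ZHANG Ji, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, LI Bowen. Fault-Tolerant Algorithm Combined with Distributed Coding Computing in MapReduce Framework [J]. Computer Engineering (计算机工程), 2021, 47(4): 173-179.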

Authors:	ZHANG Ji (张基), XIE Zaipeng (谢在鹏), MAO Yingchi (毛莺池), XU Yuanyuan (徐媛媛), ZHU Xiaorui (朱晓瑞), LI Bowen (李博文)

Affiliation:	School of Computer and Information, Hohai University, Nanjing 211100, China

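Funding:	National Key Research and Development Program of China; Key Program of the National Natural Science Foundation of China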

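Abstract:	As distributed systems grow in scale and computational complexity, both the Mean Time to Repair (MTTR) of distributed computing and the communication overhead incurred by fault-tolerant computing keep rising. This paper combines distributed coding computing with replica redundancy to propose a new fault-tolerant algorithm. Map nodes apply the idea of distributed coding computing: data is redundantly assigned to multiple computing nodes to create coded intermediate results, which reduces the amount of data the computing nodes transmit in the shuffle phase. Reduce nodes decode the coded intermediate results they receive, thereby verifying the correctness of the intermediate results and obtaining the final computing result. Experimental results show that, in a MapReduce-based distributed computing framework, compared with the Triple Modular Redundancy (TMR) and two-stage TMR fault-tolerant algorithms, the proposed algorithm completes fault-tolerant computing while reducing communication overhead and MTTR during computation, and improves the availability and reliability of the distributed system.
Keywords:	distributed system; distributed computing; fault-tolerant algorithm; distributed coding computing; Triple Modular Redundancy (TMR)
Received:	2020-03-13
Revised:	2020-04-28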
Fault-Tolerant Algorithm Combined with Distributed Coding Computing in MapReduce Framework

ZHANG Ji, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, LI Bowen. Fault-Tolerant Algorithm Combined with Distributed Coding Computing in MapReduce Framework [J]. Computer Engineering, 2021, 47(4): 173-179.

Authors:	ZHANG Ji, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, LI Bowen

Affiliation:	School of Computer and Information, Hohai University, Nanjing 211100, China

Abstract:	The growing size and computational complexity of distributed systems lead to an increase in the Mean Time to Repair (MTTR) of distributed computing systems and in the communication load caused by fault-tolerant computing. To address these problems, this paper integrates distributed coding computing with replica redundancy to propose a novel fault-tolerant algorithm. The map node uses the idea of distributed coding computing to allocate data replicas to multiple computing nodes, creating coded intermediate results and reducing the amount of data transmitted by the computing nodes in the shuffle phase. The reduce node decodes the received coded intermediate results to verify their correctness and obtain the final computing result. Experimental results show that, in the MapReduce framework, the proposed algorithm reduces the communication overhead and MTTR compared with the Triple Modular Redundancy (TMR) and two-stage TMR fault-tolerant algorithms. It also improves the availability and reliability of distributed systems.

Keywords:	distributed system; distributed computing; fault-tolerant algorithm; distributed coding computing; Triple Modular Redundancy (TMR)
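
The abstract does not give the encoding scheme, so the following is only a minimal sketch of the general idea it describes: data is placed redundantly on several map nodes, and that redundancy lets one coded message serve two reduce nodes during the shuffle phase. It assumes a textbook XOR-based coded shuffle with three nodes and six input files, each stored on two nodes. The placement table, the toy `map_value` function, and all other names are hypothetical illustrations, not the authors' algorithm, and the sketch does not model fault detection or repair.

```python
# Hypothetical placement: file id -> nodes holding a replica (2 replicas each).
PLACEMENT = {1: {1, 3}, 2: {1, 3}, 3: {1, 2}, 4: {1, 2}, 5: {2, 3}, 6: {2, 3}}
NODES = (1, 2, 3)          # node q also runs reducer q
FILES = tuple(PLACEMENT)   # (1, 2, 3, 4, 5, 6)


def map_value(reducer: int, file_id: int) -> int:
    """Toy map function: integer intermediate value for (reducer, file)."""
    return (reducer * 1000 + file_id * 37) & 0xFFFF


def local_values(node: int) -> dict:
    """Intermediate values a node can compute from the files it stores."""
    return {
        (q, f): map_value(q, f)
        for f, holders in PLACEMENT.items() if node in holders
        for q in NODES
    }


# Each entry: (sender, value_a, value_b). The sender multicasts a XOR b.
# value = (reducer, file); the reducer that wants `a` already knows `b`,
# and vice versa, because of the redundant placement above.
CODED_PLAN = [
    (1, (2, 1), (3, 3)),
    (2, (3, 4), (1, 5)),
    (3, (1, 6), (2, 2)),
]


def coded_shuffle():
    """Run the coded shuffle; return per-node known values and message count."""
    known = {n: local_values(n) for n in NODES}
    messages = 0
    for sender, a, b in CODED_PLAN:
        payload = known[sender][a] ^ known[sender][b]  # one coded multicast
        messages += 1
        for want, side in ((a, b), (b, a)):
            receiver = want[0]
            # Decode: cancel the value the receiver computed locally.
            known[receiver][want] = payload ^ known[receiver][side]
    return known, messages


def uncoded_message_count() -> int:
    """Unicast messages needed if every missing value is sent separately."""
    return sum(1 for q in NODES for f in FILES if q not in PLACEMENT[f])


if __name__ == "__main__":
    known, coded = coded_shuffle()
    print("coded messages:", coded)
    print("uncoded messages:", uncoded_message_count())
```

In this toy setting every reducer ends up with all six of its intermediate values after three coded multicasts, whereas sending each missing value separately would take six unicasts. How the paper combines such coding with replica comparison to verify intermediate results is not specified in the abstract and is not modeled here.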
This article is indexed in databases including VIP and Wanfang Data.