基于迭代填充的内存计算框架分区映射算法 Partitioning and mapping algorithm for in-memory computing framework based on iterative filling期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

基于迭代填充的内存计算框架分区映射算法

引用本文：	卞琛,于炯,修位蓉,英昌甜,钱育蓉. 基于迭代填充的内存计算框架分区映射算法[J]. 计算机应用, 2017, 37(3): 647-653. DOI: 10.11772/j.issn.1001-9081.2017.03.647

作者姓名：	卞琛于炯修位蓉英昌甜钱育蓉

作者单位：	新疆大学信息科学与工程学院, 乌鲁木齐 830046

基金项目：	国家自然科学基金资助项目（61262088，61462079，61363083，61562086）；新疆维吾尔自治区高校科研计划项目（XJEDU2016S106）。

摘要：	针对内存计算框架Spark在作业Shuffle阶段一次分区产生的数据倾斜问题，提出一种内存计算框架的迭代填充分区映射算法（IFPM）。首先，分析Spark作业的执行机制，建立作业效率模型和分区映射模型，给出作业执行时间和分配倾斜度的定义，证明这些定义与作业执行效率的因果逻辑关系；然后，根据模型和定义求解，设计扩展式数据分区算法（EPA）和迭代式分区映射算法（IMA），在Map端建立一对多分区函数，并通过分区函数将部分数据填入扩展区内，在数据分布局部感知后再执行扩展区迭代式的多轮数据分配，根据Reduce端已分配数据量建立适应性的扩展区映射规则，对原生区的数据倾斜进行逐步修正，以此保障数据分配的均衡性。实验结果表明，在不同源数据分布条件下，算法均提高了作业Shuffle过程分区映射合理性，缩减了宽依赖Stage的同步时间，提高了作业执行效率。
关键词：	内存计算数据均衡扩展式分区迭代式映射
收稿时间：	2016-09-26
修稿时间：	2016-10-17
Partitioning and mapping algorithm for in-memory computing framework based on iterative filling

BIAN Chen,YU Jiong,XIU Weirong,YING Changtian,QIAN Yurong. Partitioning and mapping algorithm for in-memory computing framework based on iterative filling[J]. Journal of Computer Applications, 2017, 37(3): 647-653. DOI: 10.11772/j.issn.1001-9081.2017.03.647

Authors:	BIAN Chen YU Jiong XIU Weirong YING Changtian QIAN Yurong

Affiliation:	School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China

Abstract:	Focusing on the issue that the only one Hash/Range partitioning strategy in Spark usually results in unbalanced data load at Reduce phase and increases job duration sharply, an Iterative Filling data Partitioning and Mapping algorithm (IFPM) which include several innovative approaches was proposed. First of all, according to the analysis of job execute scheme of Spark, the job efficiency model and partition mapping model were established, the definitions of job execute timespan and allocation incline degree were given. Moreover, the Extendible Partitioning Algorithm (EPA) and Iterative Mapping Algorithm (IMA) were proposed, which reserved partial data into extend region by one-to-many partition function at Map phase. Data in extended region would be mapped by extra iterative allocation until the approximate data distribution was obtained, and the adaptive mapping function was executed by awareness of calculated data size at Reduce phase to revise the unbalanced data load in original region allocation. Experimental results demonstrate that for any distribution of the data, IFPM promotes the rationality of data load allocation from Map phase to Reduce phase and optimize the job efficiency of in-memory computing framework.

Keywords:	in-memory computing load balance extendible partitioning iterative mapping

	点击此处可从《计算机应用》浏览原始摘要信息
	点击此处可从《计算机应用》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏