
A Programming Framework for Iterative Applications Using In-Memory Caching
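Cite this article: LIAN Wen-Bo, WANG Mei-Ling, TAO Qiu-Ming, ZHAO Chen. A programming framework for iterative applications using in-memory caching[J]. Computer Systems & Applications, 2015, 24(3): 44-49.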
Authors: LIAN Wen-Bo, WANG Mei-Ling, TAO Qiu-Ming, ZHAO Chen
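Affiliations: 1. National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100190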
Funding: National Natural Science Foundation of China (61100067)
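Abstract: Iterative computation is an important class of big data analysis applications. When an iterative computation is implemented on the distributed computing framework MapReduce, it is decomposed into multiple jobs that run sequentially according to their dependencies. The program therefore interacts with the distributed file system (DFS) many times, which increases its execution time. Caching the data involved in these interactions reduces the time spent interacting with the DFS and thereby improves the program's overall performance. Since a large amount of the memory in a cluster is idle most of the time, we propose MemLoop, a programming framework for iterative applications that uses an in-memory cache. The system implements cache management across the job submission API, the scheduling algorithm, and the cache management module, making full use of memory to cache inter-iteration resident data and intra-iteration dependent data. We compared this framework with existing related frameworks, and the experimental results show that it improves the performance of iterative programs.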

Keywords: job dependency; in-memory cache; iterative program; inter-iteration resident data; intra-iteration dependent data
Received: 7/4/2014
Revised: 2014/8/11
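The abstract argues that chaining MapReduce jobs multiplies DFS interactions and that an in-memory cache removes most of them. The sketch below makes that cost argument concrete by counting DFS accesses for a hypothetical chain of jobs. It is a minimal illustration under stated assumptions, not MemLoop's implementation or measured behavior: every job rereads a loop-invariant input, every job after the first reads its predecessor's output, and with caching enabled only the last job of each iteration persists its output. `SimulatedDfs` and `run_iterative_program` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedDfs:
    """Hypothetical stand-in for a distributed file system that only counts accesses."""
    files: dict = field(default_factory=dict)
    reads: int = 0
    writes: int = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, value):
        self.writes += 1
        self.files[path] = value


def run_iterative_program(num_iterations, jobs_per_iteration, use_cache):
    """Simulate a chain of dependent jobs and return (dfs_reads, dfs_writes).

    Assumed access pattern (illustrative only, not MemLoop's actual design):
    every job rereads the loop-invariant input, every job after the first
    reads its predecessor's output, and, when caching is enabled, only the
    last job of each iteration persists its output to the DFS.
    """
    dfs = SimulatedDfs(files={"invariant": "graph"})
    cache = {}  # in-memory cache: path -> value, used only when use_cache is True

    def load(path):
        # Serve from memory when possible; otherwise fall back to the DFS.
        if use_cache and path in cache:
            return cache[path]
        value = dfs.read(path)
        if use_cache:
            cache[path] = value
        return value

    previous = None  # output path that the next job depends on
    for i in range(num_iterations):
        for j in range(jobs_per_iteration):
            load("invariant")  # loop-invariant input, reused by every iteration
            if previous is not None:
                load(previous)  # dependency on the preceding job's output
            out = f"iter{i}/job{j}"
            if use_cache:
                cache[out] = out  # keep the intermediate result in memory
                if j == jobs_per_iteration - 1:
                    dfs.write(out, out)  # persist only the iteration's final result
            else:
                dfs.write(out, out)  # plain chained MapReduce: every job writes to the DFS
            previous = out
    return dfs.reads, dfs.writes


print(run_iterative_program(10, 3, use_cache=False))  # (59, 30)
print(run_iterative_program(10, 3, use_cache=True))   # (1, 10)
```

Under these assumptions, N iterations of J jobs cost 2NJ - 1 DFS reads and NJ writes without caching, versus 1 read and N writes with caching. Real savings depend on data sizes and on how much of the data actually fits in free memory.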

MemLoop: A Programming Framework Using In-Memory Cache for Iterative Application
LIAN Wen-Bo, WANG Mei-Ling, TAO Qiu-Ming and ZHAO Chen. MemLoop: A Programming Framework Using In-Memory Cache for Iterative Application[J]. Computer Systems & Applications, 2015, 24(3): 44-49.
Authors: LIAN Wen-Bo, WANG Mei-Ling, TAO Qiu-Ming and ZHAO Chen
Affiliation: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Abstract: Iterative computation is an important class of big data analysis applications. When iterative computation is implemented on the distributed computing framework MapReduce, the iterative program is divided into multiple jobs that run in the order defined by the dependencies between them. This leads to many interactions between the program and the distributed file system (DFS), which increase the program's execution time. Caching the data involved in these interactions reduces the time spent interacting with the DFS and hence improves the overall performance of the application. Since a large amount of memory on cluster nodes is idle most of the time, this paper proposes MemLoop, a programming framework for iterative applications that uses an in-memory cache. The system makes full use of the free memory on the cluster's nodes to cache data by implementing cache management in three modules: the job submission API, the task scheduling algorithm, and cache management. The cached data is classified into two categories: inter-iteration resident data and intra-iteration dependent data. We compare this framework with previous related frameworks, and the results show that MemLoop can improve the performance of iterative programs.
Keywords: job dependency; in-memory cache; iterative program; inter-iteration resident data; intra-iteration dependent data
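The abstract names two categories of cached data but does not say how their lifetimes are managed. The following is a minimal sketch under stated assumptions, not the paper's design: a per-node cache with a fixed memory budget keeps inter-iteration resident data across iterations, releases intra-iteration dependent data when an iteration finishes, and refuses values that do not fit so the caller can fall back to the DFS. `IterationCache`, `DataKind`, `put`, `get`, and `end_iteration` are hypothetical names. The job submission API and the scheduling algorithm mentioned in the abstract are not modeled.

```python
from enum import Enum


class DataKind(Enum):
    # Assumed meaning: data reused unchanged by every iteration (e.g. a graph's edges).
    INTER_ITERATION_RESIDENT = "resident"
    # Assumed meaning: output of one job consumed by a later job in the same iteration.
    INTRA_ITERATION_DEPENDENT = "dependent"


class IterationCache:
    """Hypothetical node-local cache that tracks the two data categories."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = {}  # key -> (kind, size_bytes, value)

    def put(self, key, kind, value, size_bytes):
        """Cache a value if it fits in the free memory; return True on success.

        Any previous version of `key` is discarded first. On False the caller
        is expected to read/write the DFS instead.
        """
        if key in self.entries:
            self._drop(key)
        if self.used_bytes + size_bytes > self.capacity_bytes:
            return False
        self.entries[key] = (kind, size_bytes, value)
        self.used_bytes += size_bytes
        return True

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        entry = self.entries.get(key)
        return None if entry is None else entry[2]

    def end_iteration(self):
        """Release intra-iteration data; resident data stays for the next iteration."""
        expired = [k for k, (kind, _, _) in self.entries.items()
                   if kind is DataKind.INTRA_ITERATION_DEPENDENT]
        for key in expired:
            self._drop(key)

    def _drop(self, key):
        _, size, _ = self.entries.pop(key)
        self.used_bytes -= size


cache = IterationCache(capacity_bytes=1000)
cache.put("graph", DataKind.INTER_ITERATION_RESIDENT, "adjacency lists", 600)
cache.put("ranks/job0", DataKind.INTRA_ITERATION_DEPENDENT, "partial ranks", 300)
print(cache.put("ranks/job1", DataKind.INTRA_ITERATION_DEPENDENT, "more", 200))  # False
cache.end_iteration()
print(cache.get("graph"), cache.get("ranks/job0"), cache.used_bytes)  # adjacency lists None 600
```

Releasing intra-iteration data at the iteration boundary is what frees the budget for the next iteration's intermediate results. A real system would also need an eviction policy for the case where resident data alone exceeds the free memory.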
This article is indexed in Wanfang Data and other databases.