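Streamlined Asynchronous Graph Processing Framework
Cite this article:LI Jin-Ji, ZHANG Yan-Feng, GONG Shu-Feng, YU Ge, GAO Li-Xin. Streamlined Asynchronous Graph Processing Framework[J]. Journal of Software, 2018, 29(3): 528-544.
Authors:LI Jin-Ji  ZHANG Yan-Feng  GONG Shu-Feng  YU Ge  GAO Li-Xin
Affiliation:College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, USA 01003
Funding:National Natural Science Foundation of China (61672141, 61528203); Open Project of the State Key Laboratory of Computer Architecture (CARCH201610); Fundamental Research Funds for the Central Universities (N161604008)
Received:2017-07-31; Revised:2017-09-05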
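Abstract:Processing and analyzing big graph data has been a hot research topic in recent years. Distributed graph computing is currently the mainstream technique for processing big graphs, but it has several unavoidable problems: balancing load across a distributed computation, and debugging and optimizing a distributed implementation, remain very difficult. On the other hand, recent studies show that, with well-designed data structures and processing models, big graph computation on a single PC backed by a large-capacity disk can often match the performance of distributed graph computing. Reference [14] shows that GraphChi on a single machine performs nearly as well as Spark on 50 nodes. Combining accumulative iterative computation with single-machine parallel processing techniques, this paper proposes ASP, a streamlined asynchronous computation model. ASP accesses the disk fully sequentially and allows asynchronous update computation to proceed while the graph structure data is streamed in sequentially. Based on the ASP model, we propose S-Maiter, a streamlined asynchronous graph processing framework that provides efficient external-memory big graph processing on a single machine. Optimizations such as I/O thread optimization, memory resource monitoring, and shard-level priority scheduling greatly improve its performance on big graph data. Experimental results show that, on a big graph (13 million vertices and 500 million edges), S-Maiter using the computing resources of only one PC performs almost on par with distributed Maiter running on 16 PCs. Moreover, S-Maiter is 1.5 times faster than GraphChi, another popular single-machine big graph processing system.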

Keywords:external storage  asynchronous accumulation  I/O  streamlined processing
Received:2017-07-31
Revised:2017-09-05

Streamlined Asynchronous Graph Processing Framework
LI Jin-Ji, ZHANG Yan-Feng, GONG Shu-Feng, YU Ge and GAO Li-Xin. Streamlined Asynchronous Graph Processing Framework[J]. Journal of Software, 2018, 29(3): 528-544.
Authors:LI Jin-Ji  ZHANG Yan-Feng  GONG Shu-Feng  YU Ge and GAO Li-Xin
Affiliation:College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; College of Computer Science and Engineering, Northeastern University, Shenyang 110819; and Department of Electrical and Computer Engineering, University of Massachusetts Amherst, U.S. 01003
Abstract:Big graph data processing and analysis is an active research topic. Distributed graph processing is the mainstream approach, but it suffers from several unavoidable issues, such as workload imbalance and the difficulty of debugging and optimizing distributed programs. On the other hand, recent research shows that, with well-designed data structures and processing models, graph processing on a single PC can achieve performance comparable to that of tens of machines. Reference [14] shows that GraphChi on a single PC achieves almost the same performance as Spark on 50 nodes. In this paper, we propose ASP, a streamlined asynchronous graph processing model based on the accumulative iterative model and external-storage-based parallel computing techniques. ASP relies on sequential disk access and allows asynchronous computation while the graph structure data is being streamed in. Based on ASP, we design and implement S-Maiter, a streamlined graph processing framework that provides high-performance graph processing on a single PC. Optimizations including I/O threading, memory monitoring, and shard-level priority scheduling greatly improve the performance of S-Maiter. Experimental results on a big graph dataset (13 million nodes and 500 million edges) show that S-Maiter on one node achieves performance comparable to distributed Maiter on 16 nodes. Furthermore, S-Maiter is 1.5 times faster than GraphChi, another popular single-PC graph processing system.
Keywords:external storage  asynchronous accumulated model  I/O  streamlined processing
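
The abstract describes ASP only at a high level: shards of graph structure are read sequentially while vertex updates are applied asynchronously in an accumulative fashion. The Python sketch below is one illustrative reading of that idea, not the S-Maiter implementation. It assumes PageRank as the example algorithm (the abstract does not name one), keeps the shards in memory instead of on disk, and uses the hypothetical names make_shards and accumulative_pagerank. The I/O threading, memory monitoring, and shard-level priority scheduling mentioned in the abstract are left out.

```python
def make_shards(adjacency, num_shards):
    """Split the adjacency lists into contiguous shards ordered by source id.

    Each shard is a list of (source, targets) records; iterating over it
    stands in for one sequential read of a shard file from disk.
    """
    sources = sorted(adjacency)
    size = max(1, (len(sources) + num_shards - 1) // num_shards)
    return [[(s, adjacency[s]) for s in sources[i:i + size]]
            for i in range(0, len(sources), size)]


def accumulative_pagerank(adjacency, num_shards=2, damping=0.8,
                          tol=1e-10, max_passes=1000):
    """Delta-accumulative PageRank, updated in place shard by shard.

    Assumes every vertex appears as a key of `adjacency` (possibly with an
    empty target list). Scores follow PR(v) = (1 - d) + d * sum(PR(u) / out(u)).
    """
    value = {v: 0.0 for v in adjacency}            # accumulated result
    delta = {v: 1.0 - damping for v in adjacency}  # pending, not yet applied
    shards = make_shards(adjacency, num_shards)

    for _ in range(max_passes):
        for shard in shards:              # shards are visited in order
            for src, targets in shard:    # records are scanned in order
                pending = delta[src]
                if pending == 0.0:
                    continue
                value[src] += pending     # fold the pending change in
                delta[src] = 0.0
                if targets:
                    share = damping * pending / len(targets)
                    for dst in targets:
                        # Visible immediately, even to records later in
                        # the same pass: this is the asynchronous part.
                        delta[dst] += share
        if sum(delta.values()) < tol:     # little pending change is left
            break
    return value
```

Calling accumulative_pagerank({0: [1, 2], 1: [2], 2: [0]}) returns one score per vertex. Because each vertex carries its own pending delta, a scheduler could in principle rank shards by how much pending change they hold; this sketch simply visits them in a fixed order.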