首页 | 本学科首页   官方微博 | 高级检索  
     

Spark环境下基于子图的异步迭代更新方法
引用本文:李超,董新华,陈建峡.Spark环境下基于子图的异步迭代更新方法[J].计算机工程与应用,2020,56(7):67-73.
作者姓名:李超  董新华  陈建峡
作者单位:湖北工业大学 计算机学院,武汉 430068
基金项目:湖北省科技厅自然科学基金;国家自然科学基金
摘    要:全局同步计算模型简单易用,但是路障同步导致收敛速度变慢。以顶点为中心的异步迭代虽然提高了收敛速度,但在计算节点之间需要频繁发送信息。在Spark环境下提出一种基于子图的异步迭代更新方法。在子图之间建立异步消息通信连接后,子图能以异步方式发送数据块;通过多线程同步避免数据读写冲突,保证异步更新时顶点状态的一致性。在大规模样本数据集上分别从收敛结果、收敛速度和通信代价验证方法有效性。实验结果表明,与全局同步迭代相比,该方法有效提高了计算收敛速度。与顶点为中心的异步更新方式相比,该方法在收敛时间上略有增长,但是显著降低了通信开销。

关 键 词:子图  异步更新  Spark环境  图数据  图切分  

Asynchronous Iterative Updates Method Based on Subgraph in Spark
LI Chao,DONG Xinhua,CHEN Jianxia.Asynchronous Iterative Updates Method Based on Subgraph in Spark[J].Computer Engineering and Applications,2020,56(7):67-73.
Authors:LI Chao  DONG Xinhua  CHEN Jianxia
Affiliation:School of Computer Science, Hubei University of Technology, Wuhan 430068, China
Abstract:Bulk synchronous parallel model is easy to program. However, long waiting time is required for each vertex to step into next round in the BSP models, and frequent messages-passing are incurred by vertex-centric asynchronous methods. To accelerate the execution of iterative graph computations, this paper proposes an asynchronous iterative method in Spark, and exploits two means to guarantee the validity. Firstly, by leveraging remote procedure call to establish connections, data blocks can be transmitted asynchronously among vertex partitions and edge partitions. Secondly, to guarantee data consistency during updating, synchronization is adopted to make threads exclusive access to vertex state. Experiments on large scale graph data are conducted on a local cluster. Comparing with the BSP and vertex-centric model, the proposed method not only accelerates iteration, but also improves communication efficiency.
Keywords:subgraph  asynchronous update  Spark  graph data  graph partition
本文献已被 维普 万方数据 等数据库收录!
点击此处可从《计算机工程与应用》浏览原始摘要信息
点击此处可从《计算机工程与应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号