A high efficiency large-scale graph data processing mechanism in the environment of Spark

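Cite this article:	Yang Tianqing, Wang Jin, Yang Xutao, Zhang Xuejie. A high efficiency large-scale graph data processing mechanism in the environment of Spark[J]. Application Research of Computers, 2016, 33(12).

Authors:	Yang Tianqing, Wang Jin, Yang Xutao, Zhang Xuejie

Affiliation:	School of Information Science and Engineering, Yunnan University (all four authors)

Funding:	Research on key technologies for application migration strategies and optimized resource allocation in cloud computing environments (61170222)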

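Abstract:	To address the low efficiency and the data storage structure problems of existing graph processing and graph management frameworks, this paper proposes a mechanism suited to large-scale graph data processing. It first analyzes the strengths and shortcomings of current graph processing models and graph storage frameworks. Next, based on an analysis of the characteristics of distributed computing, it designs the graph data processing framework around three aspects: a partitioning algorithm suited to large-scale graphs, optimization and caching of data extraction, and a mechanism that combines the computation layer with the persistence layer. Finally, experiments built on the PageRank and SSSP algorithms compare its performance with the MapReduce framework and with a Spark framework that uses HDFS as its persistence layer. The experiments show that the proposed framework is 90 times faster than the MapReduce framework and 2 times faster than the Spark framework that uses HDFS as its persistence layer, and that it can meet the needs of high-efficiency graph data processing applications.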
Keywords:	graph computing; in-memory computing; graph database; Hadoop; Spark; PageRank
Received:	2016/1/1
Revised:	2016/10/19
A high efficiency large-scale graph data processing mechanism in the environment of Spark

Yang Tianqing, Wang Jin, Yang Xutao and Zhang Xuejie. A high efficiency large-scale graph data processing mechanism in the environment of Spark[J]. Application Research of Computers, 2016, 33(12).

Authors:	Yang Tianqing, Wang Jin, Yang Xutao and Zhang Xuejie

Affiliation:	School of Information Science and Engineering, Yunnan University (all authors)

Abstract:	Existing graph processing and graph management frameworks suffer from low efficiency and from limitations in their data storage structures. This paper proposes a mechanism for processing large-scale graph data. We first review existing graph processing models and the strengths and weaknesses of graph data storage frameworks. Based on an analysis of the characteristics of distributed computing, we then design our graph data processing framework around three parts: a partitioning algorithm for large-scale graphs, optimization and caching of data extraction, and a mechanism that combines the computation layer with the persistence layer. Experiments with the PageRank and SSSP algorithms compare the proposed framework against MapReduce and against Spark with HDFS as the persistence layer. The results show that the proposed framework is 90 times faster than MapReduce and 2 times faster than Spark with HDFS, which suggests it can meet the needs of high-performance graph data processing.
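
For readers unfamiliar with the benchmark workload, the iterative PageRank update named in the abstract can be sketched in plain Python as follows. This is only an illustration on a hypothetical in-memory edge list, not the authors' Spark implementation; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Iterative PageRank over a list of (src, dst) edges (illustrative sketch)."""
    # Build the adjacency list; every vertex gets an entry, even with no out-links.
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        out_links.setdefault(dst, [])

    n = len(out_links)
    ranks = {v: 1.0 / n for v in out_links}  # uniform starting ranks

    for _ in range(iterations):
        # Rank held by vertices without out-links is spread evenly over all vertices.
        dangling = sum(ranks[v] for v, dsts in out_links.items() if not dsts)

        # Each vertex sends an equal share of its rank along every out-link.
        contribs = {v: 0.0 for v in out_links}
        for src, dsts in out_links.items():
            for dst in dsts:
                contribs[dst] += ranks[src] / len(dsts)

        ranks = {
            v: (1.0 - damping) / n + damping * (contribs[v] + dangling / n)
            for v in out_links
        }
    return ranks


# Hypothetical four-vertex graph: a -> b -> c -> a, plus d -> a.
example_ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
```

Each iteration reads the whole edge set and rewrites every rank, which is why repeated disk round-trips between iterations are costly and why the abstract compares in-memory and HDFS-backed configurations on this kind of workload.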

Keywords:	graph computing; graph database; memory computing; Hadoop; Spark; PageRank
