Received: 2017-09-10

Scheduling Extension of TensorFlow GPU Cluster Based on Hadoop YARN
Authors: Lu Zhonghua, Sun Kun, Wang Yangang, Wang Jue, Liu Fang
Affiliations: 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: This paper studies and implements the integration of the big data platform Hadoop YARN with the deep learning framework TensorFlow. By extending the DRF (Dominant Resource Fairness) algorithm, Hadoop YARN can manage and schedule GPU resources in addition to the CPU and memory it originally supported. Using YARN's application interface, TensorFlow is packaged as a YARN application: distributed TensorFlow programs, which previously had to be distributed and launched manually on each node, are now distributed and launched automatically from a single node, while the stand-alone version is unchanged. Multiple groups of experiments test YARN + TensorFlow from several angles. The results show that YARN + TensorFlow achieves a speedup similar to that of native TensorFlow programs, allows multiple users to share the GPU resources of a single system, and effectively improves both GPU utilization and programmer productivity while increasing system reuse.
Keywords: Hadoop YARN; TensorFlow; scheduling; deep learning; GPU
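The DRF extension described in the abstract can be illustrated with a minimal sketch of Dominant Resource Fairness over three resource types: CPU, memory, and GPU. This is not the authors' YARN implementation. The resource names, the cluster capacity, the two hypothetical jobs `cnn` and `etl`, and the choice to drop only a blocked user while continuing to serve the others are all assumptions made for illustration. A user's dominant share is the largest fraction of any single resource it holds, and the scheduler repeatedly grants one task to the user with the smallest dominant share.

```python
from typing import Dict, Tuple

# Resource dimensions: CPU and memory (YARN's original pair) plus GPU.
# The names are illustrative assumptions, not YARN configuration keys.
RESOURCES = ("cpu", "memory_gb", "gpu")

Vector = Dict[str, int]


def dominant_share(used: Vector, capacity: Vector) -> float:
    """Largest fraction of any single resource type held by one user."""
    return max(used[r] / capacity[r] for r in RESOURCES if capacity[r] > 0)


def drf_allocate(
    capacity: Vector, demands: Dict[str, Vector]
) -> Tuple[Dict[str, int], Dict[str, Vector]]:
    """Progressive filling under DRF: always serve the lowest dominant share."""
    for user, d in demands.items():
        if not any(d[r] > 0 for r in RESOURCES):
            raise ValueError(f"user {user!r} has an all-zero per-task demand")
    consumed = {r: 0 for r in RESOURCES}
    used = {u: {r: 0 for r in RESOURCES} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)
    while active:
        # Lowest dominant share first; ties broken by name for determinism.
        user = min(active, key=lambda u: (dominant_share(used[u], capacity), u))
        d = demands[user]
        if all(consumed[r] + d[r] <= capacity[r] for r in RESOURCES):
            for r in RESOURCES:
                consumed[r] += d[r]
                used[user][r] += d[r]
            tasks[user] += 1
        else:
            # This user's next task no longer fits: stop serving it,
            # but keep filling the cluster for the remaining users.
            active.discard(user)
    return tasks, used


# Hypothetical cluster and jobs: "cnn" is a GPU training job, "etl" is CPU-only.
CAPACITY = {"cpu": 12, "memory_gb": 48, "gpu": 4}
DEMANDS = {
    "cnn": {"cpu": 1, "memory_gb": 4, "gpu": 1},
    "etl": {"cpu": 2, "memory_gb": 2, "gpu": 0},
}

if __name__ == "__main__":
    tasks, used = drf_allocate(CAPACITY, DEMANDS)
    print(tasks)  # {'cnn': 4, 'etl': 4}
```

With these made-up numbers, the GPU job ends up holding all 4 GPUs (dominant share 1.0) and the CPU-only job holds 8 of 12 cores (dominant share about 0.67). The CPU-only job stops when a single free core cannot fit its two-core task. Adding GPU as a third dimension changes only the resource vector; the selection rule is unchanged.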