Received: 2021-04-14
Revised: 2021-05-11

Caffe Inference Acceleration Method on Heterogeneous Parallel Platform
WANG Zi-Xi, SHAO Pei-Nan, DENG Chang. Caffe Inference Acceleration Method on Heterogeneous Parallel Platform[J]. Computer Systems & Applications, 2022, 31(2): 220-226
Authors:WANG Zi-Xi  SHAO Pei-Nan  DENG Chang
Affiliation:The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
Abstract: With the improvement of computer hardware performance, pre-trained machine learning models are increasingly used for inference on personal devices. Caffe is a popular deep learning framework well-suited to tasks such as image classification, but without customization it can only run inference on a single CPU core or a single GPU, which leaves the computing power of heterogeneous parallel devices underused. Deep learning inference is computationally demanding; for a better user experience and faster inference, it is important to use all computing cores of a device in parallel. Because the CPU-to-GPU performance ratio varies across deep learning models, tasks should not simply be assigned equally to all computing cores. Moreover, extra overhead is introduced if tasks are divided into too many portions or if synchronized scheduling algorithms are used, so a well-designed scheduling algorithm that reduces idle time is crucial for good performance. Some existing approaches improve Caffe performance on heterogeneous parallel devices, but they impose limits on the platform hardware and usage and therefore cannot fully exploit these devices. This study extends the Caffe interface so that user programs can employ multiple computing cores or devices of a heterogeneous parallel platform for deep learning inference with Caffe. Several existing scheduling algorithms are ported and tested, and to avoid synchronization overhead, two novel asynchronous scheduling algorithms, async-FIFO and fast-split, are proposed. Test results show that with fast-split, Caffe inference on heterogeneous parallel devices is significantly faster than with a single computing core. Compared with HAT, the best existing heterogeneous parallel scheduling algorithm, fast-split on average reduces performance waste by 7.4% on the MNIST dataset and 21.0% on the Cifar-10 dataset.
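This record does not give the internals of the proposed algorithms. As a rough, hypothetical illustration of the asynchronous-FIFO idea the abstract describes (workers pull the next chunk of work from a shared queue as soon as they finish, instead of waiting at synchronization barriers), here is a minimal Python sketch; the `infer` callables stand in for per-device Caffe forward passes and are not part of the paper's API.

```python
import queue
import threading

def run_async_fifo(chunks, workers):
    """Distribute task chunks to heterogeneous workers without barriers:
    each worker pulls the next chunk from a shared FIFO queue as soon as
    it finishes, so faster devices naturally process more chunks and no
    device sits idle waiting for a synchronization point."""
    q = queue.Queue()
    for c in chunks:
        q.put(c)

    results = []
    lock = threading.Lock()

    def worker_loop(infer):
        while True:
            try:
                chunk = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            out = infer(chunk)  # stand-in for a per-device inference call
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker_loop, args=(w,)) for w in workers]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Under this reading, fast-split would additionally size the chunks (e.g., in proportion to each device's measured throughput) rather than use fixed equal portions, which is consistent with the abstract's point that equal assignment is suboptimal when the CPU-to-GPU performance ratio varies by model; the actual chunking rule is not specified here.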
Keywords:scheduling algorithm  Caffe inference acceleration  fast-split scheduling algorithm  scheduling on heterogeneous parallel platform  deep learning performance optimization
This article is indexed in databases including VIP (维普).