
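Accelerating 3D-GVF field computation on the Xeon Phi platform based on stencil optimization
Cite this article: QI Jin, LI Kuan, YANG Can-qun, DU Yun-fei. Accelerating 3D GVF field computation on Xeon Phi using stencil optimization[J]. Computer Engineering & Science, 2014, 36(8): 1435-1440.
Authors: QI Jin  LI Kuan  YANG Can-qun  DU Yun-fei
Funding: National High Technology Research and Development Program of China (863 Program) (2012AA010903); National Natural Science Foundation of China (61170049, 61303189)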
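Abstract: The 3D gradient vector flow field (3D GVF field) is widely used in many 3D image analysis algorithms. Its computation requires many iterations and is computationally expensive, so speeding it up is of considerable research significance. Targeting the Intel Xeon Phi many integrated core architecture, this work accelerates and optimizes 3D GVF field computation for the first time. First, it exploits the natural parallelism among the pixels of a 3D image and the strengths of the many-core architecture by applying thread-level parallelism (multi-core) and data-level parallelism (SIMD). Second, because 3D GVF field computation is a typical 3D 7-point stencil computation, an efficient data blocking strategy is proposed that takes the L2 cache specification of the Xeon Phi architecture into account; it fully exploits the temporal and spatial locality of the data, effectively reduces the cache misses caused by stencil computation, and improves performance. Experimental results show that stencil optimization significantly speeds up 3D GVF field computation: for an image of size 512³, the proposed method on a 57-core Xeon Phi platform achieves a speedup of up to 2.77 over a 2.6 GHz 8-core, 16-thread Intel Xeon E5-2670 CPU.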
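The blocking idea described in the abstract can be illustrated with a minimal C sketch. It is not the authors' implementation: the function names, the tile parameters `by` and `bz`, and the `OMP` helper macro are all hypothetical, and the update keeps only a diffusion term (centre value plus `mu` times a 7-point Laplacian) rather than the full GVF iteration, which the abstract does not spell out. The volume is tiled in y and z so that the planes touched by a tile can stay in cache, the tiles are distributed across threads, and the unit-stride x loop is left whole for SIMD.

```c
#include <assert.h>
#include <stddef.h>

/* Expand to an OpenMP pragma only when the compiler has OpenMP enabled. */
#ifdef _OPENMP
#define OMP(directive) _Pragma(#directive)
#else
#define OMP(directive)
#endif

/* Linear index into an nx*ny*nz volume stored with x varying fastest. */
static size_t idx3(size_t x, size_t y, size_t z, size_t nx, size_t ny) {
    return (z * ny + y) * nx + x;
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* One interior point: centre + mu * (sum of 6 neighbours - 6 * centre).
   Simplified stand-in for the GVF update (diffusion term only). */
static float update_point(const float *in, size_t x, size_t y, size_t z,
                          size_t nx, size_t ny, float mu) {
    size_t c = idx3(x, y, z, nx, ny);
    float lap = in[c - 1] + in[c + 1]
              + in[c - nx] + in[c + nx]
              + in[c - nx * ny] + in[c + nx * ny]
              - 6.0f * in[c];
    return in[c] + mu * lap;
}

/* Reference sweep over all interior points, without blocking. */
void sweep_naive(const float *restrict in, float *restrict out,
                 size_t nx, size_t ny, size_t nz, float mu) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t z = 1; z < nz - 1; ++z)
        for (size_t y = 1; y < ny - 1; ++y)
            for (size_t x = 1; x < nx - 1; ++x)
                out[idx3(x, y, z, nx, ny)] =
                    update_point(in, x, y, z, nx, ny, mu);
}

/* Same sweep, tiled in y and z (tile size by * bz, full rows in x).
   Tiles are independent, so they are shared among threads; the inner
   x loop is contiguous in memory and marked for SIMD. */
void sweep_blocked(const float *restrict in, float *restrict out,
                   size_t nx, size_t ny, size_t nz, float mu,
                   size_t by, size_t bz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3 && by > 0 && bz > 0);
    OMP(omp parallel for collapse(2) schedule(static))
    for (size_t zb = 1; zb < nz - 1; zb += bz) {
        for (size_t yb = 1; yb < ny - 1; yb += by) {
            size_t zend = min_sz(zb + bz, nz - 1);
            size_t yend = min_sz(yb + by, ny - 1);
            for (size_t z = zb; z < zend; ++z) {
                for (size_t y = yb; y < yend; ++y) {
                    OMP(omp simd)
                    for (size_t x = 1; x < nx - 1; ++x)
                        out[idx3(x, y, z, nx, ny)] =
                            update_point(in, x, y, z, nx, ny, mu);
                }
            }
        }
    }
}
```

Both functions write only interior points and leave boundary handling to the caller. In this sketch `by` and `bz` are tuning parameters that would be chosen against the per-core L2 capacity; the abstract does not state the tile sizes the authors used.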

Keywords: 3D gradient vector flow field  Xeon Phi  stencil optimization  cache blocking
Received: 2013-08-12
Revised: 2014-08-25

Accelerating 3D GVF field computation on Xeon Phi using stencil optimization
QI Jin, LI Kuan, YANG Can-qun, DU Yun-fei. Accelerating 3D GVF field computation on Xeon Phi using stencil optimization[J]. Computer Engineering & Science, 2014, 36(8): 1435-1440.
Authors: QI Jin  LI Kuan  YANG Can-qun  DU Yun-fei
Affiliation: (1. National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073; 2. College of Computer Science, National University of Defense Technology, Changsha 410073, China)
Abstract: The 3D gradient vector flow (GVF) field is widely used in image processing algorithms. Computing the GVF field typically takes many iterations and is time-consuming, so improving the computation speed of the 3D GVF field is important. For the first time, data-level and thread-level parallelism are introduced to accelerate GVF field computation on the Intel Xeon Phi many-core integrated platform. Moreover, GVF field computation is a stencil computation with a low computation-to-memory-access ratio. A novel cache blocking strategy is proposed to make full use of the L2 cache of the Xeon Phi architecture and to improve the computation speed of the GVF field. Experimental results show that the proposed optimizations effectively improve the speed of GVF field computation. In particular, for a 512³ 3D image, the speedup achieved on Xeon Phi over a 2.6 GHz 8-core, 16-thread Intel Xeon E5-2670 CPU is 2.77x.
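As a rough illustration of why the computation-to-memory-access ratio of a stencil is low, consider a generic single-precision 7-point update of the form out = c0 * centre + c1 * (sum of six neighbours). This form is an assumption made for the estimate, not the paper's exact GVF update, and write-allocate traffic is ignored. Summing six neighbours costs 5 additions, the two scalings cost 2 multiplications, and combining them costs 1 more addition:

```latex
\[
\begin{aligned}
\text{work per point} &= 5 + 2 + 1 = 8\ \text{flops},\\
I_{\text{no reuse}} &= \frac{8\ \text{flops}}{(7 + 1)\times 4\ \text{bytes}} = 0.25\ \text{flop/byte},\\
I_{\text{ideal reuse}} &= \frac{8\ \text{flops}}{(1 + 1)\times 4\ \text{bytes}} = 1\ \text{flop/byte}.
\end{aligned}
\]
```

In the first case every one of the seven inputs is fetched from memory; in the second, cache reuse means each input element is fetched only once. Cache blocking aims to move the code from the first regime toward the second.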
Keywords:3D GVF field  Xeon Phi  stencil optimization  cache blocking  
This article is indexed in CNKI and other databases.