
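Research on Image Blur Algorithm Optimization Using OpenCL
Cite this article: ZHANG Ying, ZHANG Yun-quan, LONG Guo-ping. Research on Image Blur Algorithm Optimization Using OpenCL[J]. Computer Science (计算机科学), 2012, 39(3): 260-264
Authors: ZHANG Ying  ZHANG Yun-quan  LONG Guo-ping
Affiliations: (Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190); (Graduate University of Chinese Academy of Sciences, Beijing 100190)
Funding: Supported by the National Natural Science Foundation of China (60303020), the Key Program of the National Natural Science Foundation of China (60533020), and the National High Technology Research and Development Program of China (2006AA01A102)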
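Abstract: Modern GPUs generally provide dedicated hardware, such as texture units, rasterization units and various on-chip caches, to accelerate the processing and display of 2D images. The corresponding programming models define specific programming interfaces so that image applications can use this hardware: CUDA provides texture memory and OpenCL provides image objects. Taking the optimization of a typical image blur algorithm on an AMD GPU as an example, this paper examines where OpenCL image objects are useful for optimizing image algorithms. In particular, it analyzes their advantages and disadvantages relative to the more general approach of optimizing with global memory plus on-chip local memory. Experimental results show that image objects bring a significant performance improvement only when the image has four channels and the amount of data that must be cached during computation is small. In all other cases, global memory plus local memory achieves good performance. The optimized algorithm achieves a speedup of 200x to 1000x over a carefully implemented CPU version, and 1.3x to 5x over the corresponding functions in the NVIDIA NPP library.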
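The abstract gives no kernel code. The pure-Python sketch below is a rough illustration of the "global memory plus on-chip local memory" approach that the paper compares against image objects. It emulates on the CPU what one OpenCL work-group would do. First, the work-group copies its output tile plus an `R`-pixel halo from the input image into a small scratch buffer, which stands in for `__local` memory. Then it computes every output pixel from that buffer.

The box filter, the image size, the radius `R`, the tile edge `TILE` and all helper names are hypothetical choices for illustration, not the paper's parameters. The barrier that a real kernel needs between the load and compute stages is only noted in a comment.

```python
# Hypothetical parameters, chosen only for illustration (not taken from the paper).
W, H = 37, 23   # image width and height
R = 2           # blur radius: box filter of width 2R+1
TILE = 8        # edge of the output tile handled by one "work-group"


def read_clamped(img, x, y):
    """Clamp-to-edge read from the 'global' image (cf. CLK_ADDRESS_CLAMP_TO_EDGE)."""
    x = min(max(x, 0), W - 1)
    y = min(max(y, 0), H - 1)
    return img[y][x]


def blur_naive(img):
    """Every output pixel reads (2R+1)^2 inputs straight from 'global memory'."""
    n = (2 * R + 1) ** 2
    return [[sum(read_clamped(img, x + dx, y + dy)
                 for dy in range(-R, R + 1) for dx in range(-R, R + 1)) / n
             for x in range(W)]
            for y in range(H)]


def blur_tiled(img):
    """Same box blur, but each tile first stages tile + halo in a scratch buffer."""
    n = (2 * R + 1) ** 2
    out = [[0.0] * W for _ in range(H)]
    for ty in range(0, H, TILE):
        for tx in range(0, W, TILE):
            # Stage 1: cooperative load of the tile plus an R-pixel halo
            # (a real OpenCL kernel would follow this with barrier(CLK_LOCAL_MEM_FENCE)).
            local_tile = [[read_clamped(img, tx + lx - R, ty + ly - R)
                           for lx in range(TILE + 2 * R)]
                          for ly in range(TILE + 2 * R)]
            # Stage 2: each "work-item" reads only from the staged tile.
            for ly in range(min(TILE, H - ty)):
                for lx in range(min(TILE, W - tx)):
                    out[ty + ly][tx + lx] = sum(
                        local_tile[ly + R + dy][lx + R + dx]
                        for dy in range(-R, R + 1) for dx in range(-R, R + 1)) / n
    return out


img = [[float((x * 7 + y * 13) % 256) for x in range(W)] for y in range(H)]
print("tiled == naive:", blur_tiled(img) == blur_naive(img))
```

With `TILE = 8` and `R = 2`, each tile reads (8 + 4)² = 144 input pixels to produce 64 outputs. That is about 2.25 reads per output pixel, compared with (2·2 + 1)² = 25 reads per pixel in the naive version. On a GPU, this reduction in global-memory traffic is what staging through local memory is meant to buy.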

Keywords: AMD GPU  Blur  OpenCL  image object

Research on Image Blur Algorithm Optimization Using OpenCL
ZHANG Ying, ZHANG Yun-quan, LONG Guo-ping. Research on Image Blur Algorithm Optimization Using OpenCL[J]. Computer Science, 2012, 39(3): 260-264
Authors:ZHANG Ying  ZHANG Yun-quan  LONG Guo-ping
Affiliation: (Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China); (Graduate University, Chinese Academy of Sciences, Beijing 100190, China)
Abstract: Modern GPUs generally provide dedicated hardware, such as texture units, rasterization units and various on-chip caches, to accelerate 2D image processing and display. The corresponding programming models define specific APIs, such as CUDA's texture memory and OpenCL's image objects, so that image applications can take advantage of this image-related GPU hardware. Taking the optimization of an image blur algorithm on an AMD GPU as an example, the paper examines the use of OpenCL image objects in image applications. In particular, it compares their advantages and disadvantages with those of the more general optimization method based on global memory and on-chip local memory. The experimental results demonstrate that image objects provide better performance only when the processed image is four-channel and the amount of data to be cached is small. In other cases, optimizing with global memory and local memory yields better performance. After optimization, the speedup reaches 200x to 1000x over well-optimized CPU code, and 1.3x to 5x over the NVIDIA NPP version.
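The abstract ties the benefit of image objects to how much data must be cached. One back-of-the-envelope way to see how that amount grows is to count the bytes a single work-group would stage in local memory for its tile plus halo. This count grows with both the blur radius and the number of channels.

The sketch below is only an estimate under stated assumptions: a 16×16 tile, 4-byte (float32) channels and a 32 KB local-memory budget. None of these values are taken from the paper. The helper `local_bytes` is hypothetical. On a real device, the budget would be queried with `clGetDeviceInfo` using `CL_DEVICE_LOCAL_MEM_SIZE`.

```python
def local_bytes(tile_w, tile_h, radius, channels, bytes_per_channel):
    """Bytes one work-group needs to stage its tile plus a radius-pixel halo (hypothetical helper)."""
    return (tile_w + 2 * radius) * (tile_h + 2 * radius) * channels * bytes_per_channel


# Assumed per-work-group local-memory budget in bytes (illustrative, not a measured device value).
LOCAL_MEM_BUDGET = 32 * 1024

for channels in (1, 4):
    for radius in (1, 4, 16):
        b = local_bytes(16, 16, radius, channels, 4)  # 16x16 tile, float32 per channel
        print(f"channels={channels} radius={radius:2d} -> {b:6d} bytes, "
              f"fits={b <= LOCAL_MEM_BUDGET}")
```

Under these assumptions, a single-channel image with radius 1 needs only 18 × 18 × 4 = 1296 bytes per work-group. A four-channel float image with radius 16 needs 48 × 48 × 4 × 4 = 36864 bytes, which no longer fits the assumed 32 KB budget.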
Keywords: AMD GPU  Blur  OpenCL  Image object
This article is indexed in CNKI and other databases.