
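Fast implementation of the Canny operator based on GPU+CPU
Cite this article: TANG Bin, LONG Wen. Fast implementation of the Canny operator based on GPU+CPU [J]. Chinese Journal of Liquid Crystals and Displays (液晶与显示), 2016, 31(7): 714-720.
Authors: TANG Bin, LONG Wen
Affiliations: 1. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, Guizhou, China;
2. Guizhou Key Laboratory of Economics System Simulation, Guizhou University of Finance and Economics, Guiyang 550025, Guizhou, China
Funding: National Natural Science Foundation of China (No. 61463009)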
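Abstract: This paper proposes a method for fast implementation of the Canny operator on a GPU+CPU platform. The operator is first split into a parallel part and a serial part. Gaussian filtering, computation of gradient magnitude and direction, non-maximum suppression, and double thresholding run on the GPU, where the two-dimensional Gaussian filter is decomposed into two one-dimensional passes, one horizontal and one vertical, to reduce computational complexity. CUDA is then used to carry out the computation with many parallel threads, and shared memory is used to hide the latency of the threads' global-memory accesses. On the CPU, a FIFO queue performs edge linking. Simulation tests show that an 8-bit image with a resolution of 1024×1024 is processed in 122 ms, and that the speedup over a CPU-only implementation reaches up to 5.39 times. The method therefore takes full advantage of both the parallelism of the GPU and the serial processing capability of the CPU.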

Keywords: Canny  CUDA  GPU  acceleration
Received: 2015-12-21
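
The decomposition of the two-dimensional Gaussian filter into a horizontal and a vertical one-dimensional pass can be illustrated with a minimal CPU-side sketch in Python. This is not the authors' CUDA kernel: the function names, the clamped border handling, and the choice of sigma and radius are assumptions made for illustration. For a kernel of width k, the two passes need about 2k multiplications per pixel instead of k² for the direct two-dimensional filter.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def clamp(value, low, high):
    """Restrict an index to [low, high]; used for border handling (an assumption)."""
    return max(low, min(high, value))


def blur_rows(image, kernel):
    """Horizontal 1-D pass over every row."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    return [[sum(kernel[k + radius] * image[y][clamp(x + k, 0, width - 1)]
                 for k in range(-radius, radius + 1))
             for x in range(width)]
            for y in range(height)]


def blur_cols(image, kernel):
    """Vertical 1-D pass over every column."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    return [[sum(kernel[k + radius] * image[clamp(y + k, 0, height - 1)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)]
            for y in range(height)]


def separable_blur(image, kernel):
    """Two 1-D passes: horizontal first, then vertical."""
    return blur_cols(blur_rows(image, kernel), kernel)


def direct_blur_2d(image, kernel):
    """Reference 2-D filter whose weights are the outer product of the 1-D kernel."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    return [[sum(kernel[dy + radius] * kernel[dx + radius]
                 * image[clamp(y + dy, 0, height - 1)][clamp(x + dx, 0, width - 1)]
                 for dy in range(-radius, radius + 1)
                 for dx in range(-radius, radius + 1))
             for x in range(width)]
            for y in range(height)]
```

Both functions produce the same smoothed image up to floating-point rounding, which is why the cheaper two-pass form can replace the direct filter. On a GPU, each output pixel of either pass would typically be assigned to its own thread.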

Fast Canny algorithm based on GPU+CPU
TANG Bin, LONG Wen. Fast Canny algorithm based on GPU+CPU [J]. Chinese Journal of Liquid Crystals and Displays, 2016, 31(7): 714-720.
Authors: TANG Bin, LONG Wen
Affiliation: 1. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China;
2. Guizhou Key Laboratory of Economics System Simulation, Guizhou University of Finance and Economics, Guiyang 550025, China
Abstract: This paper presents a fast implementation of the Canny algorithm based on GPU+CPU. The algorithm is divided into a parallel part and a serial part: Gaussian filtering, gradient computation, non-maximum suppression, and double thresholding are processed on the GPU. The two-dimensional Gaussian filter is converted into two separable one-dimensional convolutions to reduce the computational complexity. Multiple threads then execute the kernels in parallel in the CUDA program to speed up the computation. Finally, threads access shared memory instead of global memory to hide global-memory latency. On the CPU, a FIFO queue is used to link edges. Simulation results show that the processing time for an 8-bit image with a resolution of 1 024×1 024 is 122 ms, a speedup of up to 5.39 times over using the CPU alone. The method therefore takes full advantage of the parallelism of the GPU and the serial processing capability of the CPU.
Keywords: Canny  CUDA  GPU  acceleration
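
The serial step, edge linking with a FIFO queue, might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' code: the function name `link_edges`, the 8-connected neighborhood, and the `>=` threshold comparisons are hypothetical choices, and the double-threshold classification that the paper assigns to the GPU is folded into the same function here only to keep the example self-contained.

```python
from collections import deque


def link_edges(magnitude, low, high):
    """Double thresholding followed by FIFO-based edge linking.

    magnitude: 2-D list of gradient magnitudes after non-maximum suppression.
    Returns a 2-D list of 0/1 flags marking the final edge pixels.
    """
    height, width = len(magnitude), len(magnitude[0])
    edges = [[0] * width for _ in range(height)]
    queue = deque()

    # Seed the queue with strong pixels (magnitude at or above the high threshold).
    for y in range(height):
        for x in range(width):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))

    # Pop pixels in first-in, first-out order and promote any weak
    # 8-connected neighbor (magnitude at or above the low threshold).
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges
```

Each pixel enters the queue at most once, so the pass is linear in the number of pixels. Because a pixel can only be promoted after one of its neighbors has been accepted, the step is order-dependent, which fits the abstract's choice to run it serially on the CPU.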
