Parallel design and implementation of scale invariant feature transform algorithm based on OpenCL (基于OpenCL的尺度不变特征变换算法的并行设计与实现)

Cite this article:	XU Chuanpei, WANG Guang. Parallel design and implementation of scale invariant feature transform algorithm based on OpenCL[J]. Journal of Computer Applications (计算机应用), 2016, 36(7): 1801-1806.

Authors:	XU Chuanpei (许川佩), WANG Guang (王光)

Affiliations:	1. School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; 2. Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China

Abstract:	The Scale Invariant Feature Transform (SIFT) algorithm has poor real-time performance. To address this problem, a SIFT algorithm optimized for parallel execution with the Open Computing Language (OpenCL) is proposed. First, the original algorithm is restructured for parallelism by splitting and recombining its steps and by reorganizing how feature points are indexed in memory, so that intermediate results are exchanged entirely within video memory. Second, each step is designed for parallel execution using strategies such as reusing global memory objects, sharing local memory, and optimizing memory reads, which improves data-read efficiency and reduces transfer latency. Finally, fine-grained parallel acceleration of SIFT is implemented in OpenCL on a Graphics Processing Unit (GPU) and then ported to a Central Processing Unit (CPU). With registration quality close to that of the original SIFT algorithm, the parallel version speeds up feature extraction by factors of 10.51 to 19.33 on the GPU platform and 2.34 to 4.74 on the CPU platform. The experimental results show that the OpenCL-accelerated SIFT algorithm effectively improves the real-time performance of image registration. It also overcomes a drawback of the Compute Unified Device Architecture (CUDA): because CUDA is difficult to port, it cannot make full use of the multiple kinds of computing cores in a heterogeneous system.

Keywords:	Scale Invariant Feature Transform (SIFT) algorithm; Open Computing Language (OpenCL); multiplexed memory object; fine-grained parallelism; heterogeneous system

Received:	2015-12-10
Revised:	2016-02-22
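
The abstract mentions two ideas that a small example can make concrete: reusing memory objects so that intermediate results never leave a fixed pool of buffers, and fine-grained parallelism in which each output element is computed by its own work-item. The sketch below is not the authors' implementation and is not OpenCL code. It is a plain-Python stand-in under several assumptions: a 1D signal replaces the image, a 3-tap [1, 2, 1]/4 filter replaces the Gaussian blur, and the names `blur_kernel`, `diff_kernel`, `run_kernel`, and `build_dog_levels` are hypothetical.

```python
def blur_kernel(src, dst, gid):
    """Work-item body: 3-tap [1, 2, 1]/4 blur of element gid, clamped at the borders.

    Reads only from src and writes only dst[gid], so work-items are independent.
    """
    n = len(src)
    left = src[max(gid - 1, 0)]
    right = src[min(gid + 1, n - 1)]
    dst[gid] = (left + 2.0 * src[gid] + right) / 4.0


def diff_kernel(a, b, dst, gid):
    """Work-item body: difference of two blur levels at element gid."""
    dst[gid] = b[gid] - a[gid]


def run_kernel(kernel, global_size, *buffers):
    """Stand-in for a kernel launch: one call per work-item id.

    The loop is sequential here; on a real device the iterations could run in parallel
    because no work-item reads what another one writes.
    """
    for gid in range(global_size):
        kernel(*buffers, gid)


def build_dog_levels(signal, levels):
    """Build difference-of-blur levels using two buffers that are allocated once."""
    n = len(signal)
    ping = [float(x) for x in signal]  # current level
    pong = [0.0] * n                   # next level, overwritten on every pass
    dog = [[0.0] * n for _ in range(levels)]
    for level in range(levels):
        run_kernel(blur_kernel, n, ping, pong)              # pong = blur(ping)
        run_kernel(diff_kernel, n, ping, pong, dog[level])  # dog = pong - ping
        ping, pong = pong, ping  # swap roles instead of allocating a new buffer
    return dog


if __name__ == "__main__":
    print(build_dog_levels([0, 0, 4, 0, 0], 1))
```

Swapping `ping` and `pong` keeps the number of intermediate buffers constant regardless of how many levels are built, which is the general motivation for reusing memory objects. How the paper actually organizes its buffers and kernels is not stated in the abstract.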
