
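Optimized implementation of the Gaussian filtering algorithm for FT-M7002
Cite this article: CHEN Yun, WANG Meng-yuan, CHAI Xiao-nan, SHANG Jian-dong. Optimized implementation of the Gaussian filtering algorithm for FT-M7002[J]. 计算机工程与科学 (Computer Engineering & Science), 2021, 43(5): 799-806. DOI: 10.3969/j.issn.1007-130X.2021.05.005
Authors: CHEN Yun  WANG Meng-yuan  CHAI Xiao-nan  SHANG Jian-dong
Affiliation: (1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, Henan; 2. Supercomputing Center of Henan Province (Zhengzhou University), Zhengzhou 450052, Henan)
Abstract (translated from the Chinese; truncated on the source page): The use of the domestically developed Feiteng series of high-performance DSP processors in image processing has created strong demand for high-performance image processing algorithms targeting this platform. As a basic image processing algorithm, Gaussian filtering effectively removes Gaussian noise from images and is widely used in the field. Based on the architectural characteristics of the Feiteng high-performance DSP and the properties of the Gaussian filtering algorithm, an optimized Gaussian filtering implementation for the Feiteng high-performance DSP is presented. Through manual vectorization, control flow...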

Keywords: high-performance DSP  Gaussian filtering  vector parallel optimization  DMA transfer optimization
Received: 2020-12-17
Revised: 2021-03-04

Optimization of Gaussian filtering algorithm on FT-M7002
CHEN Yun,WANG Meng-yuan,CHAI Xiao-nan,SHANG Jian-dong. Optimization of Gaussian filtering algorithm on FT-M7002[J]. Computer Engineering & Science, 2021, 43(5): 799-806. DOI: 10.3969/j.issn.1007-130X.2021.05.005
Authors: CHEN Yun  WANG Meng-yuan  CHAI Xiao-nan  SHANG Jian-dong
Affiliation: (1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001; 2. Supercomputing Center of Henan Province (Zhengzhou University), Zhengzhou 450052, China)
Abstract: The use of the domestically developed Feiteng series of high-performance DSP processors in image processing has created strong demand for high-performance image processing algorithms on this platform. Gaussian filtering is a basic image processing algorithm that effectively removes Gaussian noise from images and is widely used in the field. Based on the architectural characteristics of the Feiteng high-performance DSP and the properties of the Gaussian filtering algorithm, this paper implements an optimized Gaussian filtering algorithm for the Feiteng high-performance DSP. Manual vectorization, control flow elimination, and loop unrolling are used to exploit data-level and instruction-level parallelism, reducing the number of data accesses and improving instruction efficiency. Based on the characteristics of the DMA hardware and the vector memory structure in the FT-MT2 core, ping-pong buffering and DMA array transposition are applied to reduce data transfer time and improve data locality. Test results across various filter kernel sizes and image matrix sizes show that, compared with the serial implementation of the Gaussian filtering algorithm, the parallel optimized implementation achieves a speedup of 1.3~1.41. With the cache enabled, it achieves a speedup of 1.15~1.71 times over the Gaussian filtering routine in the dsplib library running on the TMS320C6678 platform.
Keywords: high performance DSP  Gaussian filtering  vector parallel optimization  DMA transmission optimization
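
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a serial Gaussian filter. It is not the authors' implementation and does not target FT-M7002; the function names `gaussian_kernel` and `gaussian_filter`, the default kernel size and sigma, and the border handling (edge replication) are all assumptions made for illustration.

```python
import math


def gaussian_kernel(size, sigma):
    """Build a normalized size x size Gaussian kernel (hypothetical helper)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return [[v / total for v in row] for row in kernel]


def gaussian_filter(image, size=3, sigma=1.0):
    """Serial 2D Gaussian filter over a list-of-lists image.

    Assumption: out-of-range pixels are taken from the nearest edge pixel.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for kr in range(size):
                # Clamp the row index to stay inside the image.
                rr = min(max(r + kr - half, 0), height - 1)
                for kc in range(size):
                    # Clamp the column index to stay inside the image.
                    cc = min(max(c + kc - half, 0), width - 1)
                    acc += kernel[kr][kc] * image[rr][cc]
            out[r][c] = acc
    return out
```

The four nested loops and the per-pixel clamping are the kind of scalar work and control flow that the optimizations named in the abstract (vectorization, control flow elimination, loop unrolling) aim to reduce.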
This article is indexed in 万方数据 (Wanfang Data) and other databases.