Implementation and Optimization of High-precision Dot Product Algorithm Based on SW1621 Processor
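Citation: XU Fang-Jie, WANG Lei, WANG Yi-Zhuo, ZHANG Ya-Guang. Implementation and Optimization of High-precision Dot Product Algorithm Based on SW1621 Processor[J]. Computer Systems & Applications, 2023, 32(2): 400-405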
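Authors: XU Fang-Jie, WANG Lei, WANG Yi-Zhuo, ZHANG Ya-Guang
Affiliation: Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China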
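Abstract: The dot product is a Level 1 BLAS routine and is called extensively in scientific computing and other fields. Because floating-point arithmetic introduces rounding errors, the double-precision dot product in existing BLAS libraries cannot meet the accuracy requirements of some applications, so a high-precision algorithm is needed for more accurate and reliable results. This study targets the domestic SW1621 (Sunway 1621) platform and adds a high-precision dot product interface to an existing BLAS library to meet these requirements. The high-precision dot product is further hand-optimized at the assembly level using loop unrolling, memory-access optimization, and instruction reordering. Experimental results show that the accuracy of the high-precision dot product is approximately twice that of the double-precision dot product, a clear improvement over the original algorithm. While retaining this accuracy gain, the optimized high-precision dot product achieves an average speedup of 1.61 over the unoptimized version.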
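The abstract does not name the specific high-precision algorithm. A widely used compensated dot product whose result is about as accurate as if it were computed in twice the working precision is Dot2 by Ogita, Rump, and Oishi (2005), built from the error-free transformations TwoSum and TwoProduct. The Python sketch below assumes an algorithm of that kind; it is not the authors' SW1621 assembly code, and the names `two_sum`, `two_prod`, `dot2`, `dot2_unrolled2`, `naive_dot`, and `exact_dot` are hypothetical. TwoProduct uses Dekker's splitting so that it runs on any Python version; on hardware with a fused multiply-add, the product error is usually obtained as `fma(a, b, -p)` instead.

```python
from collections.abc import Sequence
from fractions import Fraction

# Veltkamp splitting constant 2**27 + 1 for IEEE 754 binary64 (53-bit significand).
_SPLITTER = 134217729.0


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e


def split(a: float) -> tuple[float, float]:
    """Veltkamp split: a == hi + lo, with halves short enough that their products are exact.
    Assumes |a| is far from overflow (the multiplication by _SPLITTER must not overflow)."""
    c = _SPLITTER * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct: returns (p, e) with p = fl(a * b) and a * b == p + e exactly,
    barring overflow or underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def naive_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain left-to-right double-precision dot product, for comparison."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def exact_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference value: exact rational dot product, rounded once to float."""
    return float(sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0)))


def dot2(x: Sequence[float], y: Sequence[float]) -> float:
    """Compensated dot product in the style of Ogita-Rump-Oishi Dot2."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    p = 0.0  # running sum of the rounded products
    s = 0.0  # running sum of all rounding errors (product and addition)
    for a, b in zip(x, y):
        h, r = two_prod(a, b)  # h + r == a * b exactly
        p, q = two_sum(p, h)   # p_new + q == p_old + h exactly
        s += q + r
    return p + s


def dot2_unrolled2(x: Sequence[float], y: Sequence[float]) -> float:
    """Same idea, unrolled by 2 with two independent (p, s) accumulator pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    p0 = s0 = p1 = s1 = 0.0
    i = 0
    while i + 1 < n:
        # The two lanes do not depend on each other, so their operations can overlap.
        h0, r0 = two_prod(x[i], y[i])
        h1, r1 = two_prod(x[i + 1], y[i + 1])
        p0, q0 = two_sum(p0, h0)
        p1, q1 = two_sum(p1, h1)
        s0 += q0 + r0
        s1 += q1 + r1
        i += 2
    if i < n:  # odd-length tail goes to lane 0
        h0, r0 = two_prod(x[i], y[i])
        p0, q0 = two_sum(p0, h0)
        s0 += q0 + r0
    p, q = two_sum(p0, p1)  # merge the lanes, keeping the merge's rounding error
    return p + (s0 + s1 + q)
```

For example, `dot2([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0`, the exact value, whereas `naive_dot` returns `0.0` because `1e16 + 1.0` rounds back to `1e16`. The two-lane variant only illustrates the restructuring that loop unrolling and instruction reordering rely on: splitting the single dependency chain on `p` gives a pipelined core independent work to schedule. Interpreted Python gains no speed from this, and because the lanes reassociate the sum, results may differ from `dot2` in the last bits.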

Keywords: SW1621; dot product; high precision; BLAS library interface; performance optimization
Received: 2022-06-20
Revised: 2022-07-18
