
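Implementation and optimization of high-precision summation and dot product algorithms on Phytium processor
Cite this article: HUANG Chun, JIANG Hao, GU Tong-xiang, QI Jin, LIU Wen-chao. Implementation and optimization of high-precision summation and dot product algorithms on Phytium processor[J]. Computer Engineering & Science, 2021, 43(1): 1-8.
Authors: HUANG Chun, JIANG Hao, GU Tong-xiang, QI Jin, LIU Wen-chao
Affiliations: (1. College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073; 2. Institute of Applied Physics and Computational Mathematics, Beijing 100088, China)
Funding: National Natural Science Foundation of China; National Key Research and Development Program of China; Science Challenge Project; Natural Science Foundation of Hunan Province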
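Abstract: In large-scale and long-time numerical computations, the accumulation of floating-point rounding errors can make numerical results unreliable. Summation and dot product are the most fundamental floating-point operations. Because they are invoked very frequently in large-scale scientific computing, the accuracy of their results is critical. This paper targets the domestically produced Phytium processor and builds on OpenBLAS. It uses error-free transformations to design efficient assembly kernel functions, and on that basis it implements and optimizes high-precision summation and dot product algorithms. Numerical experiments show that the high-precision algorithms are as accurate as the original algorithms run in twice the working precision, which confirms their effectiveness. In the single-threaded case, the proposed summation and dot product algorithms take 1.57 and 1.76 times the running time of the original algorithms, respectively, so the accuracy gain comes without a significant loss of efficiency. In the multi-threaded case, their running times are nearly identical to those of the original algorithms, which demonstrates their efficiency. A theoretical error analysis further confirms the reliability of the algorithms.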
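The abstract does not name the specific error-free transformation scheme behind the assembly kernels. As an illustration only, the sketch below shows the textbook building blocks and how they combine into compensated summation and dot product:

- Knuth's TwoSum returns a rounded sum plus its exact rounding error.
- Dekker's TwoProduct returns a rounded product plus its exact rounding error.
- These are combined in the style of the Sum2/Dot2 algorithms of Ogita, Rump and Oishi, which are known to give results about as accurate as if computed in twice the working precision.

This is plain Python, not the authors' OpenBLAS kernels, and the function names are hypothetical. TwoProduct here uses Veltkamp splitting for portability. On ARMv8-class cores such as Phytium, it would typically be a single fused multiply-add instead.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


_SPLITTER = 134217729.0  # 2**27 + 1, Veltkamp constant for IEEE binary64


def split(a: float) -> tuple[float, float]:
    """Veltkamp split: a == hi + lo with each half short enough for exact products.
    (Sketch only: no overflow handling for very large |a|.)"""
    c = _SPLITTER * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and a * b == p + e exactly
    (barring underflow/overflow). With FMA hardware: e = fma(a, b, -p)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def sum2(xs: list[float]) -> float:
    """Compensated summation (Sum2 style): rounding errors of every
    addition are accumulated separately and added back at the end."""
    s, c = 0.0, 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s + c


def dot2(xs: list[float], ys: list[float]) -> float:
    """Compensated dot product (Dot2 style): errors of both the products
    and the running additions go into the correction term c."""
    s, c = 0.0, 0.0
    for x, y in zip(xs, ys):
        p, ep = two_prod(x, y)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c
```

For example, plain left-to-right `sum([1e16, 1.0, -1e16])` returns `0.0`, because `1.0` is absorbed when it is added to `1e16`. `sum2` returns the exact `1.0`, recovered from the accumulated error term. The correction term costs only a few extra flops per element and vectorizes like the original loop, which is consistent with the modest slowdown the abstract reports.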

Keywords: error-free transformation; floating-point number; high precision; summation; dot product
Received: 2020-05-30
Revised: 2020-06-30
