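Optimization of BLAS Level 1 Functions on the SW1621 (Shenwei 1621) Processor (基于申威1621处理器的BLAS一级函数优化)
Cite this article: LI Hao-Ran, WANG Lei. Optimization of BLAS Level 1 Functions on the SW1621 Processor [J]. Computer Systems & Applications (计算机系统应用), 2021, 30(7): 246-252.
Authors: LI Hao-Ran  WANG Lei
Affiliation: School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007; Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007
Abstract: BLAS (Basic Linear Algebra Subprograms) is a standard set of mathematical routines for basic linear algebra operations. The library is organized into three levels, providing vector-vector (Level 1), matrix-vector (Level 2), and matrix-matrix (Level 3) operations. This paper studies the optimization of BLAS Level 1 functions on the SW1621 processor. Taking the function AXPY as an example, it tunes performance by exploiting the architectural features of the platform and designs an automatic thread allocation scheme. Experimental results show that the optimized AXPY achieves single-core and multi-core speedups of up to 4.36 and 9.50, respectively, over the GotoBLAS reference implementation, and that each optimization method contributes some performance improvement.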

Keywords: SW1621  BLAS  parallelism  thread allocation  SIMD vectorization
Received: 2020/11/7
Revised: 2020/12/12

Optimization of BLAS Level 1 Functions on SW1621 Processor
LI Hao-Ran, WANG Lei. Optimization of BLAS Level 1 Functions on SW1621 Processor [J]. Computer Systems & Applications, 2021, 30(7): 246-252.
Authors:LI Hao-Ran  WANG Lei
Affiliation: School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China; Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China
Abstract: The Basic Linear Algebra Subprograms (BLAS) are a standard set of routines for basic linear algebra operations. The library is divided into three levels, offering operations between vectors (Level 1), between matrices and vectors (Level 2), and between matrices (Level 3). This paper studies optimization schemes for BLAS Level 1 functions on the SW1621 processor. Taking the function AXPY as an example, we exploit the architectural characteristics of the platform to tune its performance and design an automatic thread allocation scheme. Experimental results show that, compared with the GotoBLAS reference implementation, the optimized AXPY reaches single-core and multi-core speedups of up to 4.36 and 9.50, respectively, and that every optimization step yields some performance gain.
Keywords:SW1621  Basic Linear Algebra Subprograms (BLAS)  parallel  automatic thread allocation  SIMD vectorization
This article is indexed in Wanfang Data and other databases.
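
For readers unfamiliar with the routine named in the abstract, AXPY is the Level 1 operation y ← αx + y. The sketch below is illustrative only and is not the authors' implementation: it shows a plain reference loop, a 4-way unrolled loop of the kind that typically precedes SIMD vectorization, and a thread-count heuristic. The function names `axpy_ref`, `axpy_unrolled4`, and `pick_threads`, the unroll factor, and the `min_chunk` threshold are all assumptions made for illustration; the paper's actual thread allocation scheme and SW1621-specific vector instructions are not described on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Reference AXPY for unit-stride vectors: y[i] = alpha * x[i] + y[i]. */
void axpy_ref(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* 4-way unrolled AXPY (unroll factor is an illustrative assumption).
 * Each iteration handles four independent elements, which exposes
 * work that a compiler or hand-written SIMD code can vectorize. */
void axpy_unrolled4(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Hypothetical thread-count heuristic (NOT the paper's scheme):
 * use one thread per min_chunk elements, clamped to [1, max_threads],
 * so short vectors do not pay threading overhead. */
size_t pick_threads(size_t n, size_t max_threads, size_t min_chunk)
{
    assert(max_threads >= 1 && min_chunk >= 1);
    size_t t = n / min_chunk;
    if (t < 1)
        t = 1;
    if (t > max_threads)
        t = max_threads;
    return t;
}
```

A caller might first compute `pick_threads(n, max_threads, min_chunk)` and then give each thread a contiguous slice of `x` and `y` to process with `axpy_unrolled4`. Suitable values for `max_threads` and `min_chunk` depend on the target machine and would need to be measured.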