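BLAS Library Optimization for the Loongson-3A Architecture
Citation: HE Song-song, GU Nai-jie, ZHU Hai-tao, LIU Yan-jun. BLAS library optimization for the Loongson-3A architecture[J]. Mini-micro Systems (小型微型计算机系统), 2012, 33(3): 571-575.
Authors: HE Song-song, GU Nai-jie, ZHU Hai-tao, LIU Yan-jun
Affiliations: 1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; Anhui Province Key Laboratory of Computing and Communication Software, Hefei 230027
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
3. School of Computer Science and Technology, Anhui University, Hefei 230039
Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2008AA010902) and the National Natural Science Foundation of China (60833004).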
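Abstract: Double-precision general matrix multiplication (DGEMM) is one of the core functions of the BLAS library; most Level-3 BLAS functions perform their core computation by calling DGEMM. Exploiting the 128-bit memory-access instructions of the Loongson-3A, this paper uses theoretical analysis to find the optimal loop-unrolling scheme. Because the Loongson-3A cache uses a random replacement policy, address interleaving is applied to reduce cache conflict misses. Because the Loongson-3A has limited memory bandwidth, a task-partitioning scheme based on data sharing is used to reduce the volume of memory accesses. On both a single core and multiple cores, the optimized DGEMM runs more than twice as fast as GotoBLAS, the highest-performing open-source BLAS library.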

Keywords: matrix multiplication; BLAS; task partitioning; Linpack
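The abstract does not reproduce the kernel itself. As an illustration only, the following is a minimal C sketch of a register-blocked, loop-unrolled DGEMM kernel, assuming column-major storage, a 4x4 tile of C, and m and n that are multiples of 4. The function name dgemm_unroll4x4 and the 4x4 tile size are hypothetical and are not the optimum derived in the paper. Keeping the tile in local accumulators lets every step of the k loop reuse four contiguous elements of a column of A, which a processor with 128-bit loads could fetch as two paired loads.

```c
#include <stddef.h>

/* Hypothetical sketch (not the authors' kernel): column-major
 * C[m x n] += A[m x k] * B[k x n] with leading dimensions lda, ldb, ldc.
 * A 4x4 tile of C is accumulated in local variables across the whole k loop.
 * The i direction is unrolled by hand; the 4-iteration jj loop is left for
 * the compiler to unroll. m and n must be multiples of 4 in this sketch. */
void dgemm_unroll4x4(size_t m, size_t n, size_t k,
                     const double *A, size_t lda,
                     const double *B, size_t ldb,
                     double *C, size_t ldc)
{
    for (size_t j = 0; j < n; j += 4) {
        for (size_t i = 0; i < m; i += 4) {
            double c[4][4] = {{0.0}};  /* c[jj][ii] accumulates C(i+ii, j+jj) */
            for (size_t p = 0; p < k; ++p) {
                const double *a = &A[i + p * lda];   /* A(i..i+3, p), contiguous */
                for (size_t jj = 0; jj < 4; ++jj) {
                    double b = B[p + (j + jj) * ldb]; /* B(p, j+jj) */
                    c[jj][0] += a[0] * b;
                    c[jj][1] += a[1] * b;
                    c[jj][2] += a[2] * b;
                    c[jj][3] += a[3] * b;
                }
            }
            /* Write the finished tile back once, adding to the existing C. */
            for (size_t jj = 0; jj < 4; ++jj)
                for (size_t ii = 0; ii < 4; ++ii)
                    C[(i + ii) + (j + jj) * ldc] += c[jj][ii];
        }
    }
}
```

The tile size is bounded by register pressure: a 4x4 tile already needs 16 accumulators plus operand registers, so larger tiles only help if the target has enough floating-point registers to hold them without spilling.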

Optimization of BLAS for Loongson-3A Architecture
HE Song-song , GU Nai-jie , ZHU Hai-tao , LIU Yan-jun.Optimization of BLAS for Loongson-3A Architecture[J].Mini-micro Systems,2012,33(3):571-575.
Authors:HE Song-song  GU Nai-jie  ZHU Hai-tao  LIU Yan-jun
Affiliation:1(School of Computer Science and Technology,University of Science and Technology of China,Hefei 230027,China) 2(Anhui Province Key Laboratory of Computing and Communication Software,Hefei 230027,China) 3(Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China) 4(School of Computer Science and Technology,Anhui University,Hefei 230039,China)
Abstract:Double-precision general matrix multiplication (DGEMM) is one of the most important functions in the BLAS library and is called by many Level-3 BLAS functions. Theoretical analysis of the Loongson-3A's 128-bit memory-access instructions identifies the best loop-unrolling scheme. Because the Loongson-3A uses a random cache replacement policy, address interleaving is applied to reduce cache conflict misses. Given the limited memory bandwidth of the Loongson-3A, a task-partitioning scheme based on data sharing is adopted to reduce memory traffic. On both a single core and multiple cores, the optimized DGEMM runs more than twice as fast as GotoBLAS, the highest-performing open-source BLAS library.
Keywords:matrix multiplication; BLAS; task partitioning; Linpack
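The abstract does not describe the exact address-interleaving scheme. As a stand-in, the sketch below shows one generic way to keep concurrently accessed data out of the same cache set: padding the leading dimension so that consecutive columns start in different sets. The cache geometry (32-byte lines, 512 sets, i.e. a 64 KB 4-way cache) and the names cache_set, distinct_sets and padded_ld are assumptions for illustration, and offsets are measured from a line-aligned base address.

```c
#include <stddef.h>

/* Assumed cache geometry (hypothetical, not taken from the paper):
 * 32-byte lines and 512 sets, i.e. 64 KB with 4 ways. */
#define LINE_BYTES 32u
#define NUM_SETS   512u

/* Set index for a byte offset from a line-aligned base address. */
static unsigned cache_set(size_t byte_offset)
{
    return (unsigned)((byte_offset / LINE_BYTES) % NUM_SETS);
}

/* Number of distinct sets hit by the first element of each of `cols`
 * consecutive columns of a column-major double matrix with leading
 * dimension ld. Fewer distinct sets means more lines compete per set. */
unsigned distinct_sets(size_t ld, unsigned cols)
{
    unsigned char used[NUM_SETS] = {0};
    unsigned count = 0;
    for (unsigned j = 0; j < cols; ++j) {
        unsigned s = cache_set((size_t)j * ld * sizeof(double));
        if (!used[s]) { used[s] = 1; ++count; }
    }
    return count;
}

/* Round ld up to whole cache lines, then make the column stride (in lines)
 * odd. An odd stride is coprime with a power-of-two set count, so up to
 * NUM_SETS consecutive columns start in distinct sets. */
size_t padded_ld(size_t ld)
{
    const size_t per_line = LINE_BYTES / sizeof(double);  /* doubles per line */
    size_t lines = (ld + per_line - 1) / per_line;
    if (lines % 2 == 0)
        ++lines;
    return lines * per_line;
}
```

With a leading dimension of 2048 doubles, the first elements of 16 consecutive columns all map to a single set under these assumptions, so a 4-way cache must evict some of them, and with random replacement which ones is unpredictable; padding to 2052 spreads them over 16 sets.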
This article is indexed in CNKI, Wanfang Data, and other databases.
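The abstract only names a task partition based on data sharing. A minimal POSIX-threads sketch of one such partition follows, under the assumption that all threads read the same matrix A while each computes a contiguous slice of the columns of C, so that on a chip with a shared last-level cache A can be fetched from memory once rather than once per core. The names split_columns, dgemm_worker and dgemm_shared_a, the 16-thread cap and the naive inner loops are hypothetical simplifications, not the authors' implementation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: every thread shares A; thread t owns columns [j0, j1)
 * of B and C. Column-major storage; naive loops keep the example short. */
typedef struct {
    size_t m, k, j0, j1;          /* this thread computes columns [j0, j1) */
    const double *A; size_t lda;  /* shared by all threads */
    const double *B; size_t ldb;
    double *C; size_t ldc;
} dgemm_task;

/* Contiguous column range for thread t of nt; the first n % nt threads get one extra column. */
void split_columns(size_t n, size_t nt, size_t t, size_t *j0, size_t *j1)
{
    size_t base = n / nt, rem = n % nt;
    *j0 = t * base + (t < rem ? t : rem);
    *j1 = *j0 + base + (t < rem ? 1 : 0);
}

static void *dgemm_worker(void *arg)
{
    const dgemm_task *w = arg;
    for (size_t j = w->j0; j < w->j1; ++j)
        for (size_t p = 0; p < w->k; ++p) {
            double b = w->B[p + j * w->ldb];
            for (size_t i = 0; i < w->m; ++i)
                w->C[i + j * w->ldc] += w->A[i + p * w->lda] * b;
        }
    return NULL;
}

/* C += A * B using nt threads (1 <= nt <= 16). Returns 0 on success, -1 on error. */
int dgemm_shared_a(size_t m, size_t n, size_t k, size_t nt,
                   const double *A, size_t lda, const double *B, size_t ldb,
                   double *C, size_t ldc)
{
    pthread_t tid[16];
    dgemm_task task[16];
    if (nt == 0 || nt > 16) return -1;
    for (size_t t = 0; t < nt; ++t) {
        task[t] = (dgemm_task){m, k, 0, 0, A, lda, B, ldb, C, ldc};
        split_columns(n, nt, t, &task[t].j0, &task[t].j1);
        if (pthread_create(&tid[t], NULL, dgemm_worker, &task[t]) != 0) {
            for (size_t u = 0; u < t; ++u) pthread_join(tid[u], NULL);
            return -1;
        }
    }
    for (size_t t = 0; t < nt; ++t) pthread_join(tid[t], NULL);
    return 0;
}
```

Build with `-pthread`. Because each thread writes a disjoint set of columns of C, no locking is needed; the shared operand A is only read.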