Optimization of BLAS Level 2 Based on Multi-Core Loongson 3A
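Cite this article: LI Yi, HE Song-Song, LI Kai. Optimization of BLAS Level 2 Based on Multi-Core Loongson 3A[J]. Computer Systems & Applications, 2011, 20(1): 163-167.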
Authors: LI Yi, HE Song-Song, LI Kai
Affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
Funding: National High Technology Research and Development Program of China (863 Program) (2008AA010902); Natural Science Foundation (60833004)
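Abstract: Based on the characteristics of the Loongson 3A architecture and of the Level 2 BLAS functions, this paper extracts parallelism at the instruction, memory-hierarchy, and thread levels, summarizes a set of suitable optimization methods, and analyzes them quantitatively. Experiments show that these optimizations improve the single-threaded performance of the Level 2 BLAS functions by more than 20%, and that the multithreaded versions reach a speedup of about 2.5. These results should be of some help to future system-software optimization on multi-core Loongson processors.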

Keywords: Loongson 3A; BLAS; optimization; Gemv; Ger; memory access; multithreading
Received: 2010/4/29
Revised: 2010/5/27
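The abstract names three levels at which parallelism is extracted but does not show code. Below is a minimal sketch, not the authors' implementation, of how those three levels can map onto the non-transposed double-precision Gemv, y := alpha*A*x + beta*y, with A stored column-major and unit vector strides. The function names `dgemv_n_rows` and `dgemv_n_blocked`, the 4-column unroll factor, the `block_rows` parameter, and the use of OpenMP for threading are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial kernel: updates rows [r0, r1) of y for a column-major
   m-by-n matrix A with leading dimension lda. */
static void dgemv_n_rows(size_t r0, size_t r1, size_t n, double alpha,
                         const double *A, size_t lda, const double *x,
                         double beta, double *y)
{
    /* Scale this slice of y once; beta == 0 overwrites y, so stale values
       already in y cannot leak into the result. */
    for (size_t i = r0; i < r1; i++)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    /* Instruction level (assumed unroll factor 4): each sweep over the slice
       combines four columns, so y[i] is loaded and stored once per four
       columns and each iteration carries four independent multiplies. */
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double x0 = alpha * x[j],     x1 = alpha * x[j + 1];
        const double x2 = alpha * x[j + 2], x3 = alpha * x[j + 3];
        const double *c0 = A + j * lda, *c1 = c0 + lda;
        const double *c2 = c1 + lda,    *c3 = c2 + lda;
        for (size_t i = r0; i < r1; i++)
            y[i] += c0[i] * x0 + c1[i] * x1 + c2[i] * x2 + c3[i] * x3;
    }
    for (; j < n; j++) {                 /* remaining 0 to 3 columns */
        const double xj = alpha * x[j];
        const double *c = A + j * lda;
        for (size_t i = r0; i < r1; i++)
            y[i] += c[i] * xj;
    }
}

/* Hypothetical driver for y := alpha*A*x + beta*y (no transpose).
   Memory level: rows are handled in blocks of block_rows so the active slice
   of y can stay in cache while all n columns stream past it.
   Thread level: blocks write disjoint parts of y, so they can be shared among
   threads without locks. OpenMP is an assumption here; without -fopenmp the
   pragma is ignored and the loop simply runs serially. */
void dgemv_n_blocked(size_t m, size_t n, double alpha, const double *A,
                     size_t lda, const double *x, double beta, double *y,
                     size_t block_rows)
{
    assert(block_rows > 0 && lda >= m);
    if (m == 0 || n == 0)                /* quick return, as reference BLAS does */
        return;
    const long nblocks = (long)((m + block_rows - 1) / block_rows);
    #pragma omp parallel for schedule(static)
    for (long b = 0; b < nblocks; b++) {
        size_t r0 = (size_t)b * block_rows;
        size_t r1 = (r0 + block_rows < m) ? r0 + block_rows : m;
        dgemv_n_rows(r0, r1, n, alpha, A, lda, x, beta, y);
    }
}
```

For instance, with A = [[1, 4], [2, 5], [3, 6]] stored column-major as {1, 2, 3, 4, 5, 6}, x = (1, 1), alpha = 2, beta = 1 and y = (1, 1, 1), the call `dgemv_n_blocked(3, 2, 2.0, A, 3, x, 1.0, y, 2)` leaves y = (11, 15, 19). The same ownership idea carries over to Ger (A := alpha*x*y^T + A): splitting the columns of A among threads gives each thread a disjoint part of the output.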
