Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability
Cite this article: Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju. Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability[J]. Journal of Computer Research and Development, 2023, 60(9): 1973-1987. DOI: 10.7544/issn1000-1239.202330421
Authors: Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju
Affiliations: 1. National Key Laboratory of Human-Machine Hybrid Augmented Intelligence (Xi'an Jiaotong University), Xi'an 710049; 2. National Engineering Research Center of Visual Information and Applications (Xi'an Jiaotong University), Xi'an 710049; 3. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049
Funding: National Key Research and Development Program of China (2022YFB4500500); Key Research and Development Program of Shaanxi Province (2022ZDLGY01-08)
Abstract:

Sparse matrix-vector multiplication (SpMV) is an important algorithm widely used in scientific computing, industrial simulation, and intelligent computing, and it is one of the core computational kernels in these fields. Many application scenarios require repeated SpMV iterations to satisfy the requirements of accurate numerical simulation, linear-algebra solving, and convergence of graph analytics. Because the high randomness and sparsity of SpMV lead to very poor data locality, very low cache efficiency, and highly irregular computation patterns, this workload has become both a difficult optimization target and an active research topic on current high-performance processors. Based on the architectural characteristics of modern high-performance superscalar out-of-order processors, this paper studies the performance bottlenecks of SpMV in depth and proposes a comprehensive optimization built on improving predictability and reducing program complexity. The core ideas are: constructing serially accessed data structures to improve the regularity and locality of data accesses, which greatly raises data-prefetching efficiency and cache utilization; constructing regular branch conditions to raise branch-prediction accuracy and thus program execution efficiency; and flexibly applying the SIMD instruction set to improve the utilization of computational resources. With these optimizations, the proposed method significantly alleviates the performance bottlenecks, greatly improves the utilization of processor resources, cache resources, and memory bandwidth, and achieves an average speedup of 2.6x over the mainstream commercial library MKL and an average speedup of 1.3x over the existing state-of-the-art algorithm.
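For readers less familiar with the kernel, the sketch below is a plain CSR (compressed sparse row) SpMV in C. It is the textbook formulation, not the optimized kernel or format proposed in the paper, and the array names (row_ptr, col, val) are generic conventions rather than identifiers taken from the paper; it is included only to make the bottlenecks mentioned above concrete.

#include <stddef.h>

/* Baseline CSR SpMV: y = A * x.
 * Bottlenecks visible in this form:
 *   - x[col[j]] is an indirect, effectively random gather, so hardware
 *     prefetchers and caches work poorly on it;
 *   - the inner-loop trip count (row_ptr[i+1] - row_ptr[i]) varies per row,
 *     so the loop-exit branch is hard to predict for short, irregular rows. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,   /* n_rows + 1 entries */
              const int    *col,       /* column index of each nonzero */
              const double *val,       /* value of each nonzero */
              const double *x,
              double       *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col[j]];   /* irregular access to x */
        y[i] = sum;
    }
}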



Keywords: matrix-vector multiplication; sparse matrix computation; matrix format; branch prediction; data prefetching
Received: 2023-05-31
Revised: 2023-07-25

Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability
Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju. Optimization of Parallel Computation on Sparse Matrix-Vector Multiplication with High Predictability[J]. Journal of Computer Research and Development, 2023, 60(9): 1973-1987. DOI: 10.7544/issn1000-1239.202330421
Authors: Xia Tian, Fu Gelin, Qu Shaoru, Luo Zhongpei, Ren Pengju
Affiliation: 1. National Key Laboratory of Human-Machine Hybrid Augmented Intelligence (Xi'an Jiaotong University), Xi'an 710049; 2. National Engineering Research Center of Visual Information and Applications (Xi'an Jiaotong University), Xi'an 710049; 3. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049
Abstract: Sparse matrix-vector multiplication (SpMV) is widely applied in scientific computing, industrial simulation, and intelligent computing, and is a critical kernel in all of these applications. SpMV is usually computed iteratively to fulfill the requirements of precise numerical simulation, linear-algebra solving, and graph analytics. However, because of the poor data locality, low cache utilization, and extremely irregular computation patterns caused by its highly sparse and random data distribution, SpMV has become one of the most challenging optimization problems on modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose to improve its performance by pursuing high predictability and low program complexity. Specifically, we improve memory-access regularity and locality by creating serialized access patterns, so that data-prefetching efficiency and cache utilization are optimized. We also improve pipeline efficiency by creating regular branch patterns that make branch prediction more accurate. Meanwhile, we flexibly leverage SIMD instructions to optimize parallel execution and fully utilize the CPU's computation resources. Experimental results show that, with these optimizations, our SpMV kernel significantly alleviates the critical bottlenecks and improves the efficiency of the CPU pipeline, the caches, and the memory bandwidth. The resulting performance achieves an average 2.6x speedup over Intel's commercial MKL library and an average 1.3x speedup over the best existing SpMV algorithm.
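The fragment below sketches one generic way SIMD can be applied to such a kernel, using AVX2/FMA intrinsics to process four nonzeros of a CSR row per iteration (compile with -mavx2 -mfma). It is an illustrative assumption, not the vectorization scheme or data layout actually used in the paper, whose kernel first reorganizes the data for serialized access; the helper name spmv_row_avx2 is hypothetical.

#include <immintrin.h>
#include <stddef.h>

/* Generic AVX2 + FMA sketch of a vectorized reduction over one CSR row,
 * returning y[i] = sum_j val[j] * x[col[j]] for j in [begin, end). */
static double spmv_row_avx2(const int *col, const double *val,
                            size_t begin, size_t end, const double *x)
{
    __m256d acc = _mm256_setzero_pd();
    size_t j = begin;
    for (; j + 4 <= end; j += 4) {
        __m128i idx = _mm_loadu_si128((const __m128i *)&col[j]);
        __m256d xv  = _mm256_i32gather_pd(x, idx, 8);   /* gather x[col[j..j+3]] */
        __m256d av  = _mm256_loadu_pd(&val[j]);
        acc = _mm256_fmadd_pd(av, xv, acc);              /* acc += av * xv */
    }
    /* horizontal sum of the four partial results */
    __m128d lo = _mm256_castpd256_pd128(acc);
    __m128d hi = _mm256_extractf128_pd(acc, 1);
    lo = _mm_add_pd(lo, hi);
    double sum = _mm_cvtsd_f64(_mm_add_sd(lo, _mm_unpackhi_pd(lo, lo)));
    for (; j < end; ++j)                                  /* scalar remainder */
        sum += val[j] * x[col[j]];
    return sum;
}

Calling spmv_row_avx2(col, val, row_ptr[i], row_ptr[i+1], x) for each row reproduces the baseline result; note that the gather instruction alone does not fix the locality problem, which is why the paper additionally restructures the data and branch patterns.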
Keywords: matrix-vector multiplication; sparse matrix computation; matrix format; branch prediction; data prefetching