AMT: asynchronous in-place matrix transpose mechanism for Sunway many-core processor
Authors: Chen Zhengbo, Wang Di, Yu Qi, Zheng Fang, Guo Feng, Chen Zuoning
Affiliations:
1. Information Engineering University, Zhengzhou, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, China
3. Chinese Academy of Engineering, Beijing, China
Abstract:

Matrix multiplication is widely used across many application domains. When the input matrices and the product are stored in different memory formats (for example, row-major versus column-major), a matrix transpose is required, and its efficiency has a non-negligible impact on overall performance. However, the state-of-the-art software solution and its optimizations are inefficient for two reasons: they frequently interfere with the main pipeline, and they cannot perform matrix transpose in parallel with matrix multiplication. To address this issue, we propose AMT, an asynchronous, in-place matrix transpose mechanism based on the C2R algorithm. AMT performs the transpose in an asynchronous processing module and uses two customized asynchronous matrix transpose instructions to drive that module. We implement the logic design of AMT in RTL and verify its correctness. Simulation results show that AMT achieves an average speedup of 1.27x (up to 1.48x) over a state-of-the-art software baseline and reaches 95.4% of the performance of an ideal method. Overhead analysis shows that AMT incurs only small area and power overheads.
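The C2R algorithm that AMT builds on transposes a matrix in place by breaking the work into independent per-column and per-row permutations. The Python sketch below illustrates that idea in its simplest form only. It is neither AMT nor the full C2R algorithm. It converts an m x n column-major matrix to row-major in place and assumes gcd(m, n) == 1; the general case needs extra steps that are omitted here. It uses O(max(m, n)) scratch space per pass. The function name `c2r_coprime` is hypothetical, and the modular inverse via `pow(m, -1, n)` requires Python 3.8 or later.

```python
from math import gcd


def c2r_coprime(buf, m, n):
    """Hypothetical sketch: convert an m x n column-major matrix in `buf`
    to row-major order in place, using one column pass and one row pass.

    Assumption: gcd(m, n) == 1 (the general case needs extra steps).
    Scratch space: one temporary list of length max(m, n) per pass.
    """
    if gcd(m, n) != 1:
        raise ValueError("this sketch only handles coprime m and n")
    m_inv = pow(m, -1, n)  # inverse of m modulo n (Python 3.8+)

    # View buf as an m x n row-major grid: cell (r, c) is buf[r * n + c].
    # Cell (q, c) has linear index L = q*n + c and, in column-major order,
    # holds A[L % m][L // m].

    # Pass 1: permute within each column so that grid row i ends up
    # holding exactly the elements A[i][*]. Since gcd(m, n) == 1, the
    # map q -> (q*n + c) % m is a bijection on the m rows.
    for c in range(n):
        tmp = [None] * m
        for q in range(m):
            tmp[(q * n + c) % m] = buf[q * n + c]
        for r in range(m):
            buf[r * n + c] = tmp[r]

    # Pass 2: permute within each row. A[i][j] currently sits in column
    # c = (i + j*m) % n, so its target column is j = (c - i) * m^-1 mod n.
    for r in range(m):
        tmp = [None] * n
        for c in range(n):
            j = ((c - r) * m_inv) % n
            tmp[j] = buf[r * n + c]
        buf[r * n : r * n + n] = tmp
    return buf


if __name__ == "__main__":
    # 2 x 3 matrix stored column-major.
    buf = ["a00", "a10", "a01", "a11", "a02", "a12"]
    print(c2r_coprime(buf, 2, 3))
    # prints ['a00', 'a01', 'a02', 'a10', 'a11', 'a12']
```

Each pass moves elements only within a single column or a single row. Every column (and every row) can therefore be permuted independently of the others, and the whole matrix never needs a second full-size buffer.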

This article is indexed in SpringerLink and other databases.