Related Articles
19 related articles found.
1.
With the growing popularity of the RISC-V instruction set, a range of open-source and commercial IP soft cores has emerged for IoT smart hardware, embedded systems, AI chips, security devices, and high-performance computing. Balancing performance, power, and area requires an instruction set that can be trimmed and easily extended, together with matching support from the software development environment. To this end, a customized RISC-V processor was rapidly implemented on an FPGA following a flow of adding custom instructions, extending the ALU functional units, wiring up control signals and datapaths, FPGA prototyping, customizing the cross-compilation environment, and application testing. Taking matrix-computation acceleration as an example, a custom instruction for computing vector dot products was designed on the open-source Hummingbird E203 IP and verified on an FPGA prototype. Application tests show that the customized RISC-V processor delivers a significant performance improvement, achieving a speedup of 5.3 to 7.6 for matrix multiplication.
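The abstract above does not give the encoding of the E203 dot-product instruction. The C sketch below only illustrates how a custom RISC-V instruction of this kind might be invoked from C before the toolchain has been taught about it, via the GNU assembler's .insn directive; the custom-0 opcode (0x0b), the funct fields, and the two-operand register form are assumptions, not the paper's actual design.

    #include <stdint.h>

    /* Hypothetical wrapper for a custom "vector dot product" instruction.
     * Opcode custom-0 (0x0b) and funct3 = 0, funct7 = 0 are assumptions; the
     * real E203 extension described above may use a different encoding and
     * may pass vector length or addresses through other registers or CSRs. */
    static inline int32_t dot_step(int32_t a, int32_t b)
    {
        int32_t r;
        __asm__ volatile(".insn r 0x0b, 0x0, 0x0, %0, %1, %2"
                         : "=r"(r) : "r"(a), "r"(b));
        return r;
    }

    /* Reference loop that a dot-product instruction is meant to accelerate. */
    static int32_t dot_ref(const int32_t *x, const int32_t *y, int n)
    {
        int32_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += x[i] * y[i];
        return acc;
    }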

2.
Liu Chang, Wu Yanjun, Wu Jingzheng, Zhao Chen. Journal of Software, 2021, 32(12): 3992-4024
As the interface specification between hardware and software, the instruction set is the starting point of the information technology ecosystem. RISC-V is the natural outcome of computer architecture moving toward openness; its emergence brings new perspectives to systems research, in that the study of system software problems can now extend further down to the instruction set architecture, broadening and even upending the notion of the "full stack" in software. This paper surveys recent research on the RISC-V instruction set architecture. It first reviews the current state of RISC-V development and identifies the instruction set subsets that RISC-V research should focus on, then analyzes the key design considerations and application scope of RISC-V processors. Around RISC-V system design, it discusses the basic research approaches to RISC-V processors from four aspects—instruction set, functional implementation, performance improvement, and security policy—and analyzes recent results in each. Finally, concrete case studies are used to illustrate the value of RISC-V in domain applications, and possible entry points and future directions for RISC-V research are outlined.

3.
musl libc is a lightweight standard C library with a compact codebase, comprehensive POSIX interface support, and high portability across many architectures and operating systems; it is widely used in embedded systems, network servers, and containers. The open-source RISC-V instruction set has now published a relatively stable SIMD (vector) extension, and the RISC-V software ecosystem is seeing a new wave of optimization work, yet RVV-based optimization of musl libc remains untouched. Starting from the joint study of the musl libc base library and the RISC-V RVV extension, this paper proposes an implementation scheme compatible with both the base instruction set and the vector extension, uses the vector extension to optimize the common C library functions strlen and memset, and performs a comparative analysis on the gem5 simulator. The experimental results show that, compared with the plain C implementations, the RVV-optimized strlen improves performance by 83%–703% on average and memset by 85%–334% on average.
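An RVV strlen of this kind typically follows the canonical strip-mined pattern: vsetvli picks a chunk length vl, vle8ff.v performs a fault-only-first byte load, vmseq.vi marks zero bytes, and vfirst.m returns the index of the first match (or -1). The paper's assembly is not reproduced in the abstract, so the portable C sketch below only models that chunked scan to make the control flow visible; CHUNK stands in for the hardware-chosen vl and is an illustrative assumption.

    #include <stddef.h>

    /* Scalar model of a strip-mined RVV strlen loop. CHUNK plays the role
     * of the vl returned by vsetvli; the inner scan models one iteration of
     * vle8ff.v + vmseq.vi + vfirst.m over a vector's worth of bytes. A real
     * RVV version relies on the fault-only-first load to stop safely at
     * unmapped memory past the terminator. */
    #define CHUNK 16  /* assumption: stands in for vl */

    size_t strlen_chunked(const char *s)
    {
        const char *p = s;
        for (;;) {
            /* vle8ff.v: load up to CHUNK bytes starting at p        */
            /* vmseq.vi + vfirst.m: index of first zero byte, or -1  */
            for (int i = 0; i < CHUNK; i++)
                if (p[i] == '\0')
                    return (size_t)(p - s) + (size_t)i;
            p += CHUNK;   /* bump the pointer by the chunk length (vl) */
        }
    }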

4.
The RISC-V instruction set architecture is permanently open source, with a lean and efficient instruction set, a modular processor microarchitecture, and strong extensibility, and it is increasingly used in cloud computing, edge computing, and in-vehicle intelligent computing. Its vector extension unit can greatly improve computational efficiency while reducing unnecessary hardware overhead. As processor compute capability and register widths continue to grow, vector units have become a common technique in processor chip architectures for boosting performance. The vector control module is the core control unit of the vector unit; it has complex timing relationships and a specification that is difficult to describe. Targeting these characteristics, this paper optimizes the design-verification flow, builds an efficient verification platform, and uses functional coverage and code coverage to quantify verification progress. Verification of the RISC-V vector control module effectively improves its reliability, lowers tape-out risk, and reduces the burden on subsystem-level and system-level verification, allowing those stages to focus on interconnect, interaction-response, and interface verification.

5.
The RISC-V instruction set architecture is modular and extensible. A RISC-V processor can, on top of the base integer instruction set, selectively support the official standard extensions as well as non-standard, user-defined extensions. This also means that for every newly added custom extension, users must implement the corresponding support in the compiler toolchain themselves. By analyzing the LLVM compilation framework, this work studies a general approach to supporting RISC-V custom extension instructions, and implements and validates it using the XuanTie C910 custom instruction set as an example, providing a reference for the study and implementation of RISC-V custom instruction set extensions based on the LLVM infrastructure.

6.
With its wide adoption and attention in cloud computing, the cluster container orchestration platform Kubernetes is widely used for automated deployment and release of containerized application services, elastic scaling and rollback updates, and failure detection and self-healing. The fifth-generation reduced instruction-set computer (RISC-V) has four key technical strengths—simplicity, modularity, extensibility, and openness—and has attracted broad attention from academia and industry. Building on the joint study of the Kubernetes and RISC-V ecosystems, this paper adds support for scheduling cloud service tasks across heterogeneous instruction set architectures (ISAs) to the Kubernetes scheduler. A quantitative analysis of the compute-task requirements of the RISC-V ISA in production environments shows that the existing Kubernetes platform cannot schedule RISC-V compute tasks; in particular, its scheduling algorithm cannot exploit RISC-V's user-defined, extensible ISA features to provide high-performance, reliable service. To address this, the paper proposes ISAMatch, a schedule-at-creation model that jointly considers ISA affinity, the number of nodes with the same ISA, node resource utilization, and several other ...

7.
The open-source instruction set RISC-V was introduced in 2011 and has now been around for ten years. As an emerging instruction set architecture it has developed very rapidly and drawn broad attention from industry and academia. The rise of RISC-V brings new opportunities and challenges to computer architecture, system software, and related fields, and its openness, modularity, and high degree of customizability also make it an ideal experimental platform for innovation in architecture and system software.

8.
RISC-V is a new instruction set architecture that has attracted substantial attention since its release. After describing the background and basic design of RISC-V, this paper briefly compares its strengths and weaknesses against existing open-source and commercial instruction set architectures, then introduces in detail the existing open-source processors and open-source SoCs based on RISC-V, and looks ahead to its future development.

9.
RISC-V is a free and open instruction set architecture built on reduced-instruction-set principles, featuring full openness, a simple architecture, easy porting, and modular design. As networks develop rapidly, security risks are everywhere, and exploiting RISC-V's extensibility is a very effective way to improve the security of RISC-V devices. Focusing on the security capability of RISC-V custom instructions, this paper combines trusted computing and stream cipher techniques to design simple and efficient RISC-V custom instructions that implement secure data storage on top of a trusted computing base, adds compilation support for the custom instructions via the GNU toolchain, and tests the invocation and execution of the custom instructions by application programs on a simulator. The instructions fully combine the security properties of trusted computing and stream ciphers and can achieve strong security.

10.
With the continuing progress of communication and chip technologies and the emerging vision of connecting everything, the Internet of Things is set for major growth, and research on IoT terminal devices is a top priority. IoT terminal devices must not only operate within a power budget of a few mW but also provide flexible compute capability, which requires their processors to achieve a higher energy efficiency. This paper designs a microcontroller based on the RISC-V instruction set, first describing in detail its microarchitecture, memory subsystem, and the RISC-V instruction set architecture, and finally verifying the microcontroller's logical function in a VCS verification environment.

11.
Sparse matrix-vector multiplication (SpMV) is an important kernel in solving sparse linear systems, but because the non-zero elements are sparse, its computational density is low and its efficiency suffers. Given the irregularity of sparse matrices, using a hybrid storage format for SpMV improves compression efficiency and broadens the range of matrices that can be handled. HYB is a widely used hybrid compressed format with relatively stable performance. As GPU parallel computing becomes commonplace and CPUs grow increasingly multi-core, building heterogeneous parallel systems from GPUs and multi-core CPUs has gained wide acceptance. Exploiting the ELL and COO parts of the HYB format, this work partitions the two parts onto the CPU and the GPU for cooperative parallel computation, which both fully utilizes CPU and GPU resources and plays to their respective computational strengths, improving resource utilization. Based on an analysis of the CPU+GPU heterogeneous computing model, the data partitioning and sharing of the hybrid format are optimized, which better exploits the advantages of the heterogeneous environment and improves performance.
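The approach above relies on HYB's structure: a regular ELL part (a fixed number of entries per row, padded) plus a COO part holding the overflow non-zeros. The C sketch below shows only the serial arithmetic of the two parts, which is what gets distributed between the GPU (ELL) and the CPU (COO); the column-major ELL layout, the use of -1 for padding, and all array names are illustrative assumptions, not the paper's code.

    /* y = A*x for a matrix stored in HYB = ELL + COO (illustrative layout).
     * ELL part: 'width' entries per row, column-major, padded slots marked
     * with col = -1.  COO part: the overflow non-zeros as triplets.  In the
     * CPU+GPU scheme, the regular ELL loop suits the GPU while the irregular
     * COO remainder runs on the CPU; both accumulate into the same y. */
    void spmv_hyb(int n, int width,
                  const int *ell_col, const double *ell_val,  /* n*width, column-major */
                  int coo_nnz, const int *coo_row, const int *coo_col,
                  const double *coo_val,
                  const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = 0.0;

        /* ELL part (regular, GPU-friendly) */
        for (int k = 0; k < width; k++)
            for (int i = 0; i < n; i++) {
                int c = ell_col[k * n + i];
                if (c >= 0)
                    y[i] += ell_val[k * n + i] * x[c];
            }

        /* COO part (irregular remainder, kept on the CPU) */
        for (int t = 0; t < coo_nnz; t++)
            y[coo_row[t]] += coo_val[t] * x[coo_col[t]];
    }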

12.
In this work, efficient algorithms for sparse computations (reordering algorithms, storage schemes, symbolic factorization, master degree-of-freedom, L1D1U numerical factorization, forward and backward solutions) are developed and integrated into the proposed procedures. In order to exploit fast saxpy operations offered by many vector computers, take advantage of available cache in many workstations, and minimize data movements into fast memory, special storage schemes are designed to store the coefficient (unsymmetrical) matrix. Thus, the upper triangular portion of the coefficient matrix is stored in a compressed sparse row format, while the lower triangular portion of the same matrix is stored in a compressed sparse column format. A reordering algorithm is applied on one portion of the matrix to minimize fill-ins. Unsymmetrical matrix–vector multiplication has also been sparsely computed for "error-norm check" purposes. The entire sparse procedures have been coded in standard Fortran.
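The abstract above stores the upper triangle of the unsymmetric coefficient matrix in compressed sparse row (CSR) form and the lower triangle in compressed sparse column (CSC) form. The C sketch below shows a matrix-vector product over such a split; keeping the diagonal in the CSR upper portion and all array names are assumptions, since the paper's Fortran data structures are not reproduced in the abstract.

    /* y = A*x where the upper triangle (including the diagonal, by
     * assumption) is stored in CSR and the strict lower triangle in CSC.
     * Traversing the CSC part column by column is the usual "scatter"
     * form of sparse matrix-vector multiplication. */
    void spmv_split(int n,
                    const int *u_ptr, const int *u_col, const double *u_val, /* CSR upper */
                    const int *l_ptr, const int *l_row, const double *l_val, /* CSC lower */
                    const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = 0.0;

        /* Upper triangle: row-wise gather (CSR). */
        for (int i = 0; i < n; i++)
            for (int k = u_ptr[i]; k < u_ptr[i + 1]; k++)
                y[i] += u_val[k] * x[u_col[k]];

        /* Strict lower triangle: column-wise scatter (CSC). */
        for (int j = 0; j < n; j++)
            for (int k = l_ptr[j]; k < l_ptr[j + 1]; k++)
                y[l_row[k]] += l_val[k] * x[j];
    }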

13.
Sparse matrix–vector multiplication (SpMV) is one of the most important high-level operations in basic linear algebra. Nowadays, the GPU has evolved into a highly parallel coprocessor suited to compute-intensive, highly parallel computation. Achieving high SpMV performance on GPUs is relatively challenging, especially when the matrix has no specific structure. For these general sparse matrices, a new data structure based on the bisection ELLPACK format, BiELL, is designed to achieve better load balance and thus improve SpMV performance. Following the same idea applied to the JAD format, the BiJAD format can also be obtained. Experimental results on various matrices show that the BiELL and BiJAD formats perform better than other similar formats, especially when the number of non-zero elements per row varies a lot.

14.
Many high performance computing applications require computing both the sparse matrix-vector product (SMVP) and the sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such circumstances, it is critical to maintain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse block (CSB) format proposed by Buluç et al. allows computing both problems on multi-core CPUs with nearly identical throughputs. On the other hand, a direct port of CSB to graphics processing units (GPUs), which have recently been recognized as a powerful general-purpose computing platform, turns out to be inefficient. In this work, we propose a new data structure, designated expanded CSB (eCSB), to minimize the throughput gap between SMVP and SMTVP computations on GPUs while enabling a high computing throughput. We also use a hybrid storage format to store the elements in each block, which can be selected dynamically at runtime. Experimental results show that the proposed techniques implemented on a Kepler GPU deliver similar throughput for SMVP and SMTVP, up to 13 times faster than the CPU-based CSB implementation. In addition, our eCSB procedure outperforms previous GPU results by up to 188% and 914% in computing SMVP and SMTVP, respectively. We also validate the effectiveness of eCSB by the wall-clock time of a bi-conjugate gradient solver, in which eCSB is 25% faster than Compressed Sparse Row (CSR) and 6% faster than HYB.

15.
The irregular nature of sparse matrix-vector multiplication, Ax=y, has led to the development of a variety of compressed storage formats, which are widely used because they do not store any unnecessary elements. One of these methods, the Jagged Diagonal Storage format (JDS), is in addition considered appropriate for the implementation of iterative methods on parallel and vector processors. In this work we present the Transpose Jagged Diagonal Storage format (TJDS), which drew inspiration from the Jagged Diagonal Storage scheme but requires less storage space than JDS. We propose an alternative storage scheme which makes no assumptions about the sparsity pattern of the matrix and needs only three linear arrays instead of the four linear arrays required by JDS. Specifically, the data is aligned in such a way that the permutation array used in JDS, to permute the solution vector back to the original ordering, is unnecessary. This allows us to save the memory space required to store an integer vector of length n, where n stands for the number of columns in the sparse matrix A. For the selection of matrices used in this work, this storage saving ranges from 14% up to 45% of the number of non-zero values of the sparse matrices. We present a case study of a 6×6 sparse matrix to show the data structures and the algorithm to compute Ax=y using the TJDS format.
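For reference, in JDS the rows are sorted by decreasing non-zero count, the k-th non-zeros of the remaining rows form the k-th "jagged diagonal", and a permutation array maps results back to the original row order; TJDS, described above, avoids that array by permuting the columns (and hence the entries of x) instead. The C sketch below shows only the standard JDS product with the perm[] array that TJDS eliminates; array names are illustrative and the TJDS variant itself is not reproduced here.

    /* Standard JDS SpMV.  jd_ptr[d] .. jd_ptr[d+1]-1 are the entries of
     * jagged diagonal d; perm[i] maps the i-th row in sorted order back to
     * its original row index.  y must be zeroed by the caller. */
    void spmv_jds(int ndiag,            /* number of jagged diagonals      */
                  const int *jd_ptr,    /* ndiag+1 offsets into val/col    */
                  const int *col, const double *val,
                  const int *perm,      /* sorted position -> original row */
                  const double *x, double *y)
    {
        for (int d = 0; d < ndiag; d++) {
            int len = jd_ptr[d + 1] - jd_ptr[d];  /* rows long enough to own a d-th non-zero */
            for (int i = 0; i < len; i++) {
                int k = jd_ptr[d] + i;
                y[perm[i]] += val[k] * x[col[k]];
            }
        }
    }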

16.
Yin Mengjia, Xu Xianbin, He Shuibing, Hu Jing, Ye Conghuan, Zhang Tao. Computer Science, 2017, 44(4): 182-187, 206
Sparse matrix-vector multiplication (SpMV) is a basic operation widely used in solving large-scale linear systems and matrix eigenvalue problems, but it often becomes the bottleneck of iterative processing and limits the overall performance of the algorithm. For matrices of different shapes, the choice of storage format can have a large impact on the performance of the corresponding algorithm. Through experimental analysis, this work identifies how performance varies for different matrix shapes under different storage structures and builds an effective performance model that provides practical guidance for estimating SpMV cost and choosing a suitable storage format. Over 14 test cases in the CSR, COO, and HYB formats and 8 in the ELL format, the difference between the model's predictions and the measurements is below 9%.

17.
A sparse matrix is a matrix in which most elements are zero. Exploiting this "sparsity" for storage and computation can greatly reduce storage space and improve computational efficiency. This paper designs and implements a sparse matrix multiplication calculator in standard C.
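The abstract above does not include the program itself; the toy C sketch below only illustrates the "store and compute on non-zeros only" idea using a triplet (row, column, value) representation and a matrix-vector product, which is far simpler than a full sparse multiplication calculator.

    #include <stdio.h>

    /* Minimal triplet (COO) representation of a sparse matrix and a
     * matrix-vector product over it; a toy illustration of storing only
     * the non-zeros, not the paper's actual calculator. */
    typedef struct { int row, col; double val; } Triplet;

    void coo_matvec(const Triplet *a, int nnz, const double *x,
                    double *y, int rows)
    {
        for (int i = 0; i < rows; i++)
            y[i] = 0.0;
        for (int k = 0; k < nnz; k++)
            y[a[k].row] += a[k].val * x[a[k].col];
    }

    int main(void)
    {
        /* 3x3 matrix with 4 non-zeros. */
        Triplet a[] = { {0, 0, 2.0}, {0, 2, 1.0}, {1, 1, 3.0}, {2, 0, 4.0} };
        double x[3] = {1.0, 2.0, 3.0}, y[3];
        coo_matvec(a, 4, x, y, 3);
        for (int i = 0; i < 3; i++)
            printf("y[%d] = %g\n", i, y[i]);
        return 0;
    }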

18.
Improving the performance of sparse matrix-vector multiplication (SpMV) is important but difficult because of its irregular memory access. General-purpose GPUs (GPGPUs) provide high computing ability and substantial bandwidth, which SpMV cannot fully exploit due to this irregularity. In this paper, we propose two novel methods to optimize memory bandwidth for SpMV on GPGPUs. First, a new storage format is proposed to exploit the memory bandwidth of the GPU architecture more efficiently; it packs as many non-zeros as possible into a layout suited to the GPU's memory system. Second, we propose a cache-blocking method to improve the performance of SpMV on the GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With this blocking, the corresponding part of vector x can be reused in the GPU cache, so the time spent accessing global memory for vector x is greatly reduced. Experiments are carried out on three GPU platforms: GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. The results show that both methods efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU.
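The cache-blocking idea above partitions the matrix into vertical sub-blocks so that only a small slice of x is touched while a block is processed. The C sketch below shows that column-blocked CSR traversal in plain C; the paper's implementation is CUDA-based and its data layout is not given, so the per-block CSR layout and all names here are illustrative assumptions.

    /* Column-blocked CSR SpMV: the matrix is split into vertical blocks of
     * width bw, each stored in its own CSR structure with column indices
     * local to the block, so that while block b is processed only the slice
     * x[b*bw .. b*bw+bw-1] is touched and can stay resident in cache (or,
     * in a GPU setting, in on-chip memory). */
    void spmv_colblocked(int n, int nblocks, int bw,      /* bw = block width */
                         const int *const *ptr,           /* ptr[b]: n+1 row offsets */
                         const int *const *col,           /* col[b]: block-local column index */
                         const double *const *val,
                         const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = 0.0;
        for (int b = 0; b < nblocks; b++) {
            const double *xb = x + b * bw;                /* reused slice of x */
            for (int i = 0; i < n; i++)
                for (int k = ptr[b][i]; k < ptr[b][i + 1]; k++)
                    y[i] += val[b][k] * xb[col[b][k]];
        }
    }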

19.
A sparse matrix is a matrix in which most elements are zero. Exploiting this "sparsity" for storage and computation can greatly reduce storage space and improve computational efficiency. This paper designs and implements a sparse matrix multiplication calculator in standard C++.
