Similar literature
 19 similar documents found; search took 208 ms
1.
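VLIW DSPs obtain temporal parallelism through software pipelining and spatial parallelism through instruction clustering. Instruction clustering is essentially a resource allocation problem. Traditional clustering assumes that each instruction is assigned to a single cluster for execution, but some architectures provide SIMD instructions, and traditional clustering algorithms are not fully suitable for them. The proposed clustering algorithm, based on an evaluation model, can cluster both SIMD instructions and ordinary instructions sensibly. After clustering, inter-cluster transfer instructions are scheduled and, where appropriate, combined into inter-cluster double-word transfer instructions. Thanks to SIMD, inter-cluster double-word transfers, and better clustering decisions, the overall schedule latency of the program is shortened. For many digital signal processing programs, performance improves by a factor of 2 to 3 relative to the unclustered case and by about 7~10% relative to the register-pressure-based clustering algorithm.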

2.
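A register-pressure-based clustering algorithm for VLIW DSPs   Total citations: 1, self-citations: 0, citations by others: 1
Registers are among the most valuable resources at program run time. While software pipelining schedules instructions for a VLIW DSP, it also significantly increases register pressure, which causes register spills and aborts software pipelining. In earlier work, instruction clustering performed before software pipelining focused mostly on instruction-level parallelism and left register pressure to the register allocation phase; when the physical registers run out, spills result. This work clusters instructions by examining register pressure at run time, so that dynamic information about each cluster's register pressure can be used to reduce register spills and improve execution efficiency.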
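The idea of steering cluster assignment by per-cluster register pressure can be illustrated with a small sketch. This is not the algorithm from the paper above: the function name `assign_clusters`, the live-interval input format, and the greedy rule (place each value on the cluster with the fewest live values) are all assumptions made for illustration, and the sketch ignores data dependences and inter-cluster copy costs.

```python
def assign_clusters(values, num_clusters):
    """Greedy, pressure-aware cluster assignment (illustrative assumption).

    values: list of (name, start, end) live intervals; a value occupies one
            register of its cluster on the half-open interval [start, end).
    Returns (assignment, peak) where assignment maps name -> cluster index
    and peak[c] is the highest number of simultaneously live values on c.
    """
    live = [[] for _ in range(num_clusters)]  # end times of live values per cluster
    peak = [0] * num_clusters
    assignment = {}
    for name, start, end in sorted(values, key=lambda v: v[1]):
        # Release registers whose values are dead by this point.
        for c in range(num_clusters):
            live[c] = [e for e in live[c] if e > start]
        # Pick the cluster with the lowest current register pressure
        # (ties go to the lowest cluster index).
        target = min(range(num_clusters), key=lambda c: len(live[c]))
        live[target].append(end)
        assignment[name] = target
        peak[target] = max(peak[target], len(live[target]))
    return assignment, peak


# Hypothetical live intervals, not taken from any benchmark.
VALUES = [("a", 0, 4), ("b", 1, 3), ("c", 2, 6), ("d", 3, 5), ("e", 4, 7)]
```

With these made-up intervals, one cluster would need three registers at its peak, whereas spreading the values over two clusters lowers the peak on each, which is the effect a pressure-aware assignment aims for.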

3.
田祖伟  孙光 《计算机科学》2010,37(5):130-133
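The large number of branch instructions in programs severely limits the ability of architectures and compilers to exploit parallelism. A major challenge in exploiting instruction-level parallelism effectively is to overcome the limits imposed by branch instructions. Predicated execution can remove branches by converting branch instructions into predicated code, which enlarges the scope of instruction scheduling and eliminates the performance loss caused by branch misprediction. The paper describes compiler optimizations based on predicated code, including instruction scheduling, software pipelining, register allocation, and instruction merging. An instruction scheduling algorithm based on predicated code is designed and implemented. Experiments show that compiler optimization of predicated code increases instruction-level parallelism, shortens code execution time, and improves program performance.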

4.
Virtual register architecture   Total citations: 3, self-citations: 1, citations by others: 2
廖恒  李三立 《计算机学报》1996,19(11):801-809
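The concept of virtual memory has been in use for nearly 30 years. In view of the rapid development of register-oriented RISC architectures and the importance of registers to instruction-level parallelism, this paper proposes, for the first time, the new concept of virtual registers. The virtual register architecture is a way of implementing the Trace Merging algorithm for instruction-level parallel scheduling and issue in a processor architecture.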

5.
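Design and implementation of a compiler for a clustered VLIW DSP   Total citations: 5, self-citations: 0, citations by others: 5
Very long instruction word (VLIW) is the architecture commonly adopted by high-end DSPs. A VLIW DSP has no hardware mechanism for scheduling or conflict resolution, so its performance depends entirely on how well the compiler optimizes. Based on the retargetable compiler infrastructure IMPACT, an optimizing compiler was designed and implemented for the clustered VLIW DSP YHFT-D4. The discussion focuses on the definition of retargeting information, code annotation, support for SIMD instructions, clustered register allocation, exploitation of instruction-level parallelism, and resolution of resource conflicts. Experimental results show that the compiler achieves good optimization results.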

6.
王显著  李三立  黄震春 《计算机学报》1998,21(12):1112-1118
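This paper discusses strategies for exploiting instruction-level parallelism in Java processors, proposes a Java processor architecture that uses virtual register technology (VRJP), and gives methods for determining dependences and managing virtual registers. Analysis and experiments show that VRJP can effectively exploit instruction-level parallelism in Java and improve the execution efficiency of Java programs. In VRJP, most virtual registers need no corresponding physical register, which greatly reduces the frequency of physical register accesses.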

7.
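The main obstacles a compiler faces in increasing program parallelism are frequent control transfers and ambiguous memory accesses. Predication and speculation are new features of VLIW processor architectures, intended to remove the effect of branches and memory accesses on the identification of instruction-level parallelism. Instruction scheduling is one of the key techniques by which a compiler exploits instruction-level parallelism. This paper discusses how to use predication and speculation effectively in instruction scheduling to improve program performance.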

8.
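Cryptographic application-specific processors often use a clustered very long instruction word (VLIW) architecture, whose performance depends on the compiler implementation. Existing compiler back-end optimization schemes for general-purpose VLIW architectures are all, to some degree, ill suited to cryptographic processors. This paper therefore proposes a compiler back-end optimization method for cryptographic processors that performs cluster assignment, instruction scheduling, and register allocation simultaneously. It builds definition-use chains and intersects each variable's sets of candidate register types to determine its register type; it evaluates available resources in real time to perform priority-based instruction selection and cluster assignment based on balanced register pressure; and it improves the linear scan algorithm so that registers are allocated in real time from a list of each variable's "remaining reference count". Experimental results show that the method improves the performance of the generated code and that the algorithm is non-heuristic, which reduces compilation time.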

9.
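Very long instruction word (VLIW) processors generally use a bus-interconnected multi-cluster structure: the functional units in each cluster share a local register file, and data are transferred between clusters over a bus. This avoids the rapid growth in delay, area, and power that a fully connected structure suffers as the number of functional units increases, but the copies and delays involved in sharing data between clusters reduce processor performance. This paper proposes a multi-cluster VLIW structure interconnected through register files, in which register files connect the clusters, avoiding both the delay of inter-cluster data transfer and the extra data copy operations. An instruction scheduling algorithm for this structure is also proposed to improve scheduling performance. Experimental results show that, compared with a fully connected VLIW structure, the register-file-interconnected structure loses only about 13% in performance while code size remains essentially unchanged; both results are better than those of the bus-interconnected multi-cluster structure.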

10.
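Instruction-level parallelism, also called fine-grained parallelism, is defined mainly in contrast to coarse-grained parallelism, which refers to the parallelism that exists within a program (mainly between processes or threads). As the name suggests, instruction-level parallelism is the parallelism that exists at the instruction level, that is, between instructions; it refers mainly to ...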

11.
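To improve the efficiency of solving the maximum clique problem on large-scale graph instances on big data platforms, a maximum clique identification algorithm based on parallel constraint programming is proposed. A BMT graph partitioning strategy splits a complex graph instance into several independently computable subgraphs, which are assigned to the compute nodes of a Spark cluster; each compute node uses constraint programming to model and solve the subproblems produced by the split, which parallelizes the maximum clique problem. A time prediction model is introduced, and a parallel graph partitioning method based on a task run-time prediction model is designed, which effectively addresses load balancing across compute nodes. Experimental results show that, compared with the parallel maximum clique identification algorithm based on the BMC graph partitioning strategy, the proposed algorithm solves more efficiently and achieves near-linear speedup.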

12.
An increasing number of supercomputers adopt a heterogeneous architecture consisting of both general-purpose CPUs and specialized accelerators. Such a design is beneficial for scalability and power, but heterogeneity brings new challenges for the communication system, which must connect heterogeneous components and support programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intra-node layer between heterogeneous components. To connect heterogeneous components efficiently, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; over its InfiniBand network, it provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC), with a modified compiler based on the global address space. A special collective network is also implemented for collective operations. Results obtained from a prototype system show these features to be both feasible and efficient.

13.
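Inter-cluster data transfers in multi-cluster VLIW processors degrade processor performance. For the register-file-interconnected multi-cluster VLIW (RFCC-VLIW) structure, this paper proposes a new two-dimensional force-directed scheduling algorithm whose force expression is a two-dimensional force with cycle and cluster as its independent variables. Experimental results show that, with the RFCC-VLIW structure as the target, the two-dimensional force-directed scheduling algorithm outperforms other existing scheduling algorithms for multi-cluster VLIW processors.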

14.
The multicluster architecture that we introduce offers a decentralized, dynamically scheduled architecture in which the register files, dispatch queue, and functional units are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm and evaluate its effectiveness using trace-driven simulations of SPEC92 benchmarks. This evaluation indicates that, for the configurations considered, the multicluster architecture may have significant performance advantages at feature sizes below 0.35 μm, and warrants further investigation.

15.
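Very long instruction word technology   Total citations: 2, self-citations: 0, citations by others: 2
The instruction set is the most central factor determining the characteristics of a computer architecture. This paper first gives a brief introduction to the basic principles of the very long instruction word technology developed in recent years, then describes the performance characteristics of VLIW computers through the design of the IA-64 VLIW computer developed by Intel, and finally outlines the influence of VLIW technology on the development of computer architecture and its recent progress.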

16.
NoSQL databases are known for high scalability, high availability, and high fault tolerance, and are therefore used in many applications. The data partitioning strategy and the fragment allocation strategy directly affect the performance of a NoSQL database system. Large, global databases are partitioned horizontally, vertically, or by a combination of both. In general, the system scatters related fragments as widely as possible to increase the degree of parallelism of operations. In some applications, however, operations are usually not very complicated, and a single operation may access more than one fragment. At the same time, the fragments accessed by an operation may interact with each other. General allocation strategies therefore increase the system's communication cost when operations execute across sites. To improve the performance of such applications and let NoSQL database systems work efficiently, their fragments have to be allocated in a way that reduces the communication cost, that is, minimizes the total volume of data transmitted when operations execute across sites. A strategy for clustering fragments based on a hypergraph is proposed, which places fragments that are accessed together in most operations in the same cluster. The method uses a weighted hypergraph to represent the fragment access patterns of operations, and a hypergraph partitioning algorithm clusters the fragments. This reduces the number of sites that an operation has to span and thus the communication cost across sites. Experimental results confirm that the proposed technique contributes effectively to solving the fragment re-allocation problem in a specific application environment of a NoSQL database system.
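A weighted hypergraph of fragment accesses, as described in the abstract above, can be sketched in a few lines. The abstract does not say which hypergraph partitioner is used, so the greedy merging below is only a stand-in, and the names `OPERATIONS`, `cluster_fragments`, `weighted_span`, and the `max_cluster_size` cap are hypothetical. Each operation is a hyperedge over the fragments it touches, weighted by an assumed access frequency.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical workload: (fragments touched by an operation, frequency).
OPERATIONS = [
    ({"f1", "f2"}, 10),
    ({"f1", "f2", "f3"}, 4),
    ({"f4", "f5"}, 8),
    ({"f3", "f4"}, 1),
]


def cluster_fragments(operations, max_cluster_size):
    """Greedy stand-in for a hypergraph partitioner (an assumption).

    Repeatedly merges the two clusters joined by the heaviest total
    hyperedge weight, as long as the merged cluster respects the size cap.
    Returns a dict mapping fragment -> cluster id.
    """
    fragments = sorted(set().union(*(frags for frags, _ in operations)))
    cluster_of = {f: i for i, f in enumerate(fragments)}
    members = {i: {f} for i, f in enumerate(fragments)}
    while True:
        weight = defaultdict(int)
        for frags, w in operations:
            touched = sorted({cluster_of[f] for f in frags})
            for a, b in combinations(touched, 2):
                if len(members[a]) + len(members[b]) <= max_cluster_size:
                    weight[(a, b)] += w
        if not weight:
            break  # no admissible merge is left
        # Heaviest pair first; ties broken toward the lowest cluster ids.
        (a, b), _ = max(weight.items(),
                        key=lambda kv: (kv[1], -kv[0][0], -kv[0][1]))
        for f in members[b]:
            cluster_of[f] = a
        members[a] |= members.pop(b)
    return cluster_of


def weighted_span(operations, cluster_of):
    """Sum over operations of frequency times number of clusters touched."""
    return sum(w * len({cluster_of[f] for f in frags})
               for frags, w in operations)
```

If each cluster is placed on one site, `weighted_span` serves as a rough proxy for how many sites the workload has to reach; comparing it before and after clustering shows the effect the abstract describes. A production system would more likely use a dedicated hypergraph partitioning tool than this greedy loop.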

17.
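Large-scale data growth is sharply increasing the demand for computing power, and traditional stand-alone machines cannot provide the data processing capacity needed to perform community detection in large-scale network communities. To address this problem, a parallel decentralized iterative community clustering method based on the network weighted Voronoi diagram (NWVD-PDICCM) is proposed. The decentralized iterative community clustering method based on the network weighted Voronoi diagram (NWVD-DICCM) is used to extract the effective community structure of large networks. Combined with parallel clustering, the operations of the DICCM method are converted from a serial process into parallel computation. Graph partitioning during parallel community clustering accelerates the process by minimizing communication among slave workers. Simulation results show that NWVD-PDICCM can run on a range of computer architecture platforms and supports parallel operation on the Spark platform; compared with several other recent methods, its capacity to process large-scale network data is significantly improved.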

18.
李辉楷  韩军  翁新钎  贺中柱  曾晓洋 《计算机工程》2012,38(23):240-242,246
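To address the slow software execution of AES and of the SHA-3 candidate algorithm Grøstl, a design is proposed that accelerates the algorithms with a reduced instruction set computer (RISC) coprocessor. The coprocessor reuses the on-chip cache as a lookup table to accelerate computation and adds special instructions to the basic instruction set architecture of the RISC processor. Experimental results show that, compared with traditional schemes based on parallel lookup tables, the design accelerates AES and Grøstl at a smaller hardware cost.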

19.
The recent rapid development of information systems (ISs) has created a critical need for integration and interoperability between heterogeneous ISs in various domains, using specific commonalities. However, stovepipe systems have arisen from inconsistencies among stakeholders in planning IS architecture. So far, there has been no research on an enterprise architecture framework (EAF) that can satisfy the coefficient factors of both system architecture (SA) and enterprise architecture (EA). This paper proposes a new EAF that can resolve the problems inherent in existing legacy EAFs and their features. EAFoC (Enterprise Architecture Framework based on Commonality) is based on commonality that can satisfy the coefficient factors of both SA and EA within a common information technology (IT) domain. Thus, it should be possible to integrate an established heterogeneous framework for each stakeholder's view. The most important contribution of this paper is therefore to establish an appropriate EAFoC for the development of consistent IS architecture, smooth communication among stakeholders, systematic integration management of diversified and complicated new information technologies, interoperability among heterogeneous ISs, and reusability based on commonality with other platforms.
