Similar literature
 19 similar documents found
1.
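An instruction control pipeline is a pipeline designed specifically for the instruction control system inside a general-purpose EPIC processor. It runs in lockstep with the execution pipeline and carries shared information and global control information. This paper proposes an instruction control pipeline technique for general-purpose EPIC microprocessor design and describes its concrete design and implementation. Practical application shows that the technique effectively reduces the design complexity of EPIC microprocessors.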

2.
胡正伟  仲顺安  陈禾 《计算机工程》2007,33(21):237-239
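This paper studies the pipelined read/write principle of the register file in a VelociTI-architecture floating-point digital signal processor and proposes a design method. For single-operand double-precision floating-point instructions, the method uses two 32-bit data paths to read the source operand in one pipeline cycle. For two-operand double-precision floating-point instructions, it locks the decode unit and reads the source operands over several pipeline cycles. A write control vector is used to carry out write operations that span multiple pipeline cycles. The method correctly transfers IEEE 754 double-precision floating-point data between the register file and the functional units over 32-bit data paths, and simulation results verify its correctness.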

3.
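This paper designs a systematic pipeline control method and its recursive theoretical formulas. Based on this method, a microprocessor compatible with the ARMv4 instruction set was implemented in a 0.18 µm CMOS process, with a 200 MHz system clock and a minimum instruction execution rate of 150 MIPS. In essence, each pipeline stage comes down to three decisions: whether the stage saves the instruction's execution result, whether the instruction in the stage advances to the next stage, and whether the stage self-locks and executes again. Given this, only the execution process of each class of instruction needs to be considered on its own, without regard to the complex relationships among instructions, which simplifies system design. The method is insensitive to pipeline depth and yields a similar control structure at every stage, so it suits deep and optimal-depth pipeline designs, reduces the difficulty of designing pipeline control logic, and improves the instruction execution efficiency of the microprocessor.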

4.
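YHFT-DX is a high-performance fixed-point DSP designed by the National University of Defense Technology. This paper designs and implements the YHFT-DX instruction control pipeline and proposes methods for dispatching across fetch-packet boundaries and for instruction prefetching in the YHFT-DX very long instruction word (VLIW) architecture, which effectively improve pipeline performance. The instruction pipeline was structurally optimized for high frequency: the critical-path delay of the dispatch unit was reduced by 40%, meeting the 600 MHz design target.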

5.
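A five-stage pipelined CPU was designed and implemented on an FPGA platform. Modeled on the MIPS machine, it abstracts instruction execution into five pipeline stages: instruction fetch, decode, execute, memory access, and write-back. First, the system-level structure is designed, which fixes the CPU organization and the instruction set. Next, the overall structure is decomposed, the signal connections between modules are determined, and the CPU is implemented in VHDL. Finally, the five-stage pipelined CPU is debugged with the Debug-controller debugging software. The results demonstrate the effectiveness of the designed pipelined CPU.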

6.
Structural Hazards and Data Hazards in CPU Pipelining   Total citations: 1; self-citations: 0; citations by others: 1
孙启良 《福建电脑》2010,26(7):49-50
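Pipelining is a parallel processing technique widely used in CPU design. It improves instruction execution efficiency, but pipeline hazards are the main obstacle during pipelined execution and adversely affect the smooth execution of the instruction sequence. The main hazards in a pipeline are structural hazards and data hazards, and this paper focuses on how to resolve them. Data hazards are the more common of the two, and bypassing (forwarding) is the main solution.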
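
The bypassing idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming a classic five-stage pipeline with EX/MEM and MEM/WB pipeline registers; the `Instr` type and the function `select_operand_sources` are hypothetical names, not taken from the paper, and load-use stalls are deliberately ignored.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: Optional[int]       # destination register number, or None
    srcs: Tuple[int, ...]     # source register numbers


def select_operand_sources(in_ex: Instr,
                           in_mem: Optional[Instr],
                           in_wb: Optional[Instr]) -> Dict[int, str]:
    """For each source register of the instruction in EX, decide where
    its value should come from. The newest producer (the one in MEM)
    takes priority over the older one (in WB)."""
    sources = {}
    for reg in in_ex.srcs:
        if in_mem is not None and in_mem.dest == reg:
            sources[reg] = "EX/MEM bypass"
        elif in_wb is not None and in_wb.dest == reg:
            sources[reg] = "MEM/WB bypass"
        else:
            sources[reg] = "register file"
    return sources


# add r1, r2, r3 followed immediately by sub r4, r1, r5:
add = Instr("add", 1, (2, 3))
sub = Instr("sub", 4, (1, 5))
print(select_operand_sources(sub, add, None))
```

Without the bypass, `sub` would have to wait until `add` had written r1 back to the register file; with it, the value is taken directly from the pipeline register.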

7.
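This paper studies pipelined CPUs for the MIPS instruction set and proposes a design for a MIPS-based pipelined model processor. The data path allows instructions to be added one at a time, which suits teaching practice. The design uses a five-stage pipeline and implements 52 instructions, including multiply and divide, and the paper analyzes how each pipeline stage can be used in teaching. To handle data and instruction hazards in the pipeline, a dedicated exception-handling module was designed. The model machine was tested on an EDA platform, and the results show that the design meets its requirements.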

8.
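A pipeline-based method for simulating arithmetic units is proposed. The basic idea is to simulate on the pipeline the whole process of an instruction, from decode and dispatch through execution. The pipeline design and the arithmetic-unit simulation are described in detail, and the key data structures and function interfaces are given.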

9.
曹学飞 《微处理机》2011,32(5):78-80
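Pipelining changed the traditional sequential execution model of computers: by overlapping instruction execution, it increases the parallelism of instruction execution and greatly improves system performance. Addressing several difficulties in microprocessor pipeline design, and based on an analysis of pipeline performance, this paper presents two key techniques to consider in pipeline design: choosing the optimal number of pipeline stages and partitioning the pipeline. It also briefly discusses clock distribution.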
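
To make the stage-count trade-off concrete, here is a small sketch under a textbook idealization, which is not necessarily the model used in the paper: n instructions with no hazards, a total logic delay T split evenly over k stages, and a fixed latch overhead d per stage. Total time is then (n + k - 1)(T/k + d), which is minimized near k = sqrt((n - 1)T/d). All names below are hypothetical.

```python
import math


def total_time(n_instr: int, k: int, logic_delay: float,
               latch_overhead: float) -> float:
    """Time to run n_instr instructions on an ideal k-stage pipeline."""
    cycle = logic_delay / k + latch_overhead
    return (n_instr + k - 1) * cycle


def optimal_stages(n_instr: int, logic_delay: float,
                   latch_overhead: float) -> int:
    """Integer stage count minimizing total_time under this model.

    The cost is convex in k, so the integer optimum is one of the two
    integers around the real-valued minimizer."""
    k_real = math.sqrt((n_instr - 1) * logic_delay / latch_overhead)
    candidates = sorted({max(1, math.floor(k_real)),
                         max(1, math.ceil(k_real))})
    return min(candidates,
               key=lambda k: total_time(n_instr, k, logic_delay,
                                        latch_overhead))


print(optimal_stages(100, 10.0, 1.0))
```

Real designs stop well short of this ideal depth, because hazards, branch penalties, and clock skew all grow with the number of stages.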

10.
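To meet the need for finite-field modular multiplication in elliptic curve cryptography, a dedicated modular multiplication instruction is proposed. A group parameter in the instruction field implements multiple groups of modular multiplications, and configuring the parameters lets the instruction support extended operand lengths. The modular multiplication unit implements the Montgomery modular multiplication algorithm, with a unified hardware pipeline for prime fields and binary fields and a dual-field multiplier unit structure. Experimental results show that the finite-field modular multiplication instruction and the hardware unit offer high execution efficiency and good flexibility.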
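
For readers unfamiliar with Montgomery multiplication, the following software sketch shows the prime-field case only. It assumes an odd modulus n smaller than R = 2^k and Python 3.8 or later (for `pow` with a negative exponent); the function names are hypothetical, and the sketch says nothing about the paper's hardware pipeline or its binary-field path.

```python
def montgomery_setup(n: int, k: int) -> int:
    """Return n' such that n * n' == -1 (mod 2**k). Requires odd n."""
    r = 1 << k
    return (-pow(n, -1, r)) % r


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: returns t * R^-1 mod n for 0 <= t < n * R,
    where R = 2**k. Only shifts, masks, and multiplies are needed."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k          # exact division by R
    return u - n if u >= n else u


def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Compute (a * b) mod n by way of the Montgomery domain."""
    n_prime = montgomery_setup(n, k)
    a_bar = (a << k) % n          # a * R mod n
    b_bar = (b << k) % n          # b * R mod n
    p_bar = redc(a_bar * b_bar, n, k, n_prime)   # a * b * R mod n
    return redc(p_bar, n, k, n_prime)            # back to a * b mod n


print(mont_mul(7, 9, 13, 4))  # 63 mod 13 = 11
```

The conversions into and out of the Montgomery domain cost ordinary reductions, so the method pays off when many multiplications share one modulus, as in elliptic curve point arithmetic.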

11.
李勇  胡慧俐  杨焕荣 《计算机应用》2014,34(4):1005-1009
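In digital signal processing software, loops account for a large share of execution time. Holding loop code in an instruction buffer reduces the number of program memory accesses and improves processor performance. A buffer that supports loop instructions is added to the instruction pipeline of a VLIW processor; it caches the loop's instructions and dispatches them to the functional units in software-pipelined form. Loop code is thus fetched from memory once but executed many times, which greatly reduces memory accesses. While loop instructions are running, the buffer signals the program memory to enter a sleep state, which lowers processor power consumption. Tests on typical applications show that with the loop buffer the idle rate of the fetch pipeline can exceed 90% and overall processor performance improves by about 10%, while the hardware area overhead of the loop buffer is about 9% of the fetch pipeline.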
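
The "fetch once, execute many times" effect can be illustrated with a toy counting model. It assumes the whole loop body fits in the buffer and ignores software pipelining and the sleep signal; `count_program_memory_fetches` is a hypothetical name, and the numbers it produces are not the paper's measurements.

```python
def count_program_memory_fetches(loop_body_len: int, iterations: int,
                                 use_loop_buffer: bool) -> int:
    """Count program-memory accesses needed to run a loop."""
    buffered = set()      # instruction addresses held in the loop buffer
    fetches = 0
    for _ in range(iterations):
        for pc in range(loop_body_len):
            if use_loop_buffer and pc in buffered:
                continue              # served from the loop buffer
            fetches += 1              # access program memory
            if use_loop_buffer:
                buffered.add(pc)
    return fetches


print(count_program_memory_fetches(8, 100, use_loop_buffer=False))  # 800
print(count_program_memory_fetches(8, 100, use_loop_buffer=True))   # 8
```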

12.
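YHFT-D4 is a DSP with a clustered VLIW architecture. It has multiple functional units and can execute several instructions in parallel in a single clock cycle. Which functional unit executes each instruction, and which instructions execute in parallel, are decided statically by the compiler or the programmer. This paper presents the design and implementation of the YHFT-D4 assembler.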

13.
The popularity of multimedia applications has made them a major theme in embedded systems. The key component for supporting multimedia applications well is the embedded processor. We have therefore designed and implemented an embedded processor, called the UniDual processor, to achieve this objective. Its key features are the integration of instructions of reduced instruction set computers (RISCs) and digital signal processors (DSPs), as well as support for a special instruction set and a shared‐based clustered register architecture. However, an important issue of UniDual that remains open is how to allocate registers efficiently. In this paper, we present a scheduling and instruction transformation approach to resolve this issue. The proposed approach schedules instructions and then transforms overlapped instructions into RISC and DSP instructions, taking communication overhead and hardware limitations into account. Compared with the greedy approach, the evaluation shows that our work is relatively effective in performance and code size reduction. Copyright © 2012 John Wiley & Sons, Ltd.

14.
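Implementation of Instruction Dispatch for a DSP Based on the VelociTI Architecture   Total citations: 1; self-citations: 0; citations by others: 1
In designing a digital signal processor based on the VelociTI architecture, a dispatch method for this architecture, called the sorting method, is proposed to dispatch parallel instructions at high speed. The method applies decision-tree principles to test the parallelism of the instructions in a fetch packet. It arranges the processor's functional units in a prescribed order so that each functional unit corresponds to one field of the execute packet, and then reorders the instructions in the execute packet according to the decode results and the functional-unit order, which completes the assignment of instructions to functional units. Simulation results show that the method is effective.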
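
A rough software analogue of the sorting idea is sketched below. It relies on the VelociTI convention that an instruction's p-bit set to 1 means the next instruction executes in parallel with it. The unit order in `UNIT_ORDER` and both function names are assumptions made for illustration, the paper's prescribed order is not given in the abstract, and the decision-tree parallelism test is not reproduced.

```python
from typing import List, Optional, Tuple

# Assumed slot order; the paper's actual order is not stated here.
UNIT_ORDER = ("L1", "S1", "M1", "D1", "L2", "S2", "M2", "D2")


def split_execute_packets(
        fetch_packet: List[Tuple[str, str, int]]) -> List[List[Tuple[str, str]]]:
    """Group (mnemonic, unit, p_bit) entries into execute packets.
    p_bit == 1 chains the following instruction into the same packet."""
    packets, current = [], []
    for mnemonic, unit, p_bit in fetch_packet:
        current.append((mnemonic, unit))
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:
        packets.append(current)
    return packets


def dispatch(execute_packet: List[Tuple[str, str]]) -> List[Optional[str]]:
    """Reorder one execute packet so slot i holds the instruction for
    UNIT_ORDER[i], or None if that unit is idle this cycle."""
    slots = {unit: None for unit in UNIT_ORDER}
    for mnemonic, unit in execute_packet:
        if slots[unit] is not None:
            raise ValueError(f"two instructions target unit {unit}")
        slots[unit] = mnemonic
    return [slots[unit] for unit in UNIT_ORDER]


fetch = [("MPY", "M1", 1), ("ADD", "L1", 0), ("SUB", "S2", 0)]
for packet in split_execute_packets(fetch):
    print(dispatch(packet))
```

Because every unit has a fixed slot, routing an instruction needs no search: the decoded unit name alone determines its position.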

15.
An Energy-Efficient Processor Architecture for Embedded Systems   Total citations: 1; self-citations: 0; citations by others: 1
We present an efficient programmable architecture for compute-intensive embedded applications. The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. Instruction registers capture instruction reuse and locality in inexpensive storage structures that are located near the functional units. The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. Exposed communication resources eliminate pipeline registers and control logic, and allow the compiler to schedule efficient instruction and data movement. The architecture keeps a significant fraction of instruction and data bandwidth local to the functional units, which reduces the cost of supplying instructions and data to large numbers of functional units. This architecture achieves an energy efficiency that is 23× greater than an embedded RISC processor.

16.
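To address pipeline stalls caused by physical register resource conflicts in deep superscalar pipelines, a non-blocking instruction issue method is proposed in which multiple instructions share the same physical register. The method keeps allocating physical registers when resources conflict, uses an issue buffer queue to hold the conflicting instructions temporarily, increases the number of physical registers that the issue stage can actually allocate, frees the issue window, and improves the parallelism of physical register use. Experimental results show that, compared with conventional renaming, the method achieves the same performance with 27.3% fewer physical registers.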

17.
The high speed needed in solving digital signal processing problems in real time has often given rise to multiple processor hardware designs. Devices such as the TMS32020 digital signal processor possess features designed to support concurrent processing. Progress in this area is currently hampered by the lack of suitable multiprocessor development tools. It is suggested that an incremental approach to multiprocessor development, using several methods of simulating the signal processor, may be used. Two simulation environments specifically for the development and testing of multiple digital signal processor designs are described. The first is a single-processor simulation system in which the algorithms to be performed by the other concurrent processors may be executed in a high-level language, without any need to simulate the instructions of those processors. The second is a multiple TMS32020 digital signal processor system in which the processors are simulated as several communicating tasks on a host computer using the IBM AIX (UNIX derived) multitasking operating system.

18.
In a superscalar processor, instructions of various types flow through an execution pipeline, traversing hardware resources which are mostly shared among many different instruction types. A notable exception to shared pipeline resources is the collection of functional units, the hardware that performs specific computations. In a trade-off of cost versus performance, a pipeline designer must decide how many of each type of functional unit to place in a processor's pipeline. In this paper, we model a superscalar processor's issue queue and functional units as a novel queuing network. We treat the issue queue as a finite-sized waiting area and the functional units as servers. In addition to common queuing problems, customers of the network share the queue but wait for specific servers to become ready (e.g., addition instructions wait for adders). Furthermore, the customers in this queue are not necessarily ready for service, since instructions may be waiting for operands. The network provides a solution to the expected queue length of each type of instruction. This network and its solution can also be generalized to other problems, notably other resource-allocation issues that arise in superscalar pipelines.

19.
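For applications that mix embedded control and digital signal processing, a load-ahead mechanism is established for a processor with a fused MCU-DSP architecture. The core uses static superscalar techniques, has three pipelines (integer, load/store, and loop), and adopts a special four-stage pipeline. In the load/store pipeline, the load-ahead mechanism dynamically schedules the order of memory accesses so that Load instructions proceed ahead of Store instructions. This makes the operands needed by the integer pipeline available earlier and speeds up pipeline processing.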
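
The reordering rule behind letting loads run ahead of stores can be modeled in a few lines. This is a software sketch only, assuming all addresses are known when the load is scheduled and that a load may pass any earlier store to a different address; `load_ahead_order` is a hypothetical name, and the paper's hardware mechanism may differ.

```python
from typing import List, Tuple

MemOp = Tuple[str, int]   # ("load" | "store", address)


def load_ahead_order(ops: List[MemOp]) -> List[MemOp]:
    """Return an issue order in which each load is hoisted above the
    pending earlier stores, unless one of them writes the same address.
    Stores keep program order among themselves."""
    pending_stores: List[MemOp] = []
    issued: List[MemOp] = []
    for kind, addr in ops:
        if kind == "store":
            pending_stores.append((kind, addr))
            continue
        # Find the youngest pending store that writes this load's address.
        last_conflict = -1
        for i, (_, store_addr) in enumerate(pending_stores):
            if store_addr == addr:
                last_conflict = i
        # Stores up to and including the conflict must go first.
        issued.extend(pending_stores[: last_conflict + 1])
        del pending_stores[: last_conflict + 1]
        issued.append((kind, addr))
    issued.extend(pending_stores)
    return issued


program = [("store", 0xA), ("store", 0xB), ("load", 0xC),
           ("load", 0xA), ("store", 0xC)]
print(load_ahead_order(program))
```

In the example, the load from 0xC issues first because no earlier store touches that address, while the load from 0xA must wait for the store to 0xA.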
