Similar literature
 19 similar documents found.
1.
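Design and Instantiation of a Generation Algorithm for RISC Address Generators   Cited by: 1 (self: 0, others: 1)
Increasing the parallelism of functional units is a basic route to high-performance microprocessors. Designing an independent address generator into a RISC processor allows arithmetic operations and address computations to proceed in parallel, which improves the processor's performance. Based on the addressing modes commonly used in current RISC processors, this paper proposes a generation algorithm for RISC address generators and instantiates it. The instantiated result can be used as an IP core in RISC processor designs.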

2.
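DSP processors target digital signal processing and have stringent real-time requirements. This paper designs an address generation unit that supports the special addressing modes of DSP processors as well as the execution of parallel instructions, so that arithmetic operations and address computations proceed in parallel, effectively increasing the speed of digital signal processing.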

3.
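In the competition over microprocessor speed, the Am29000, designed by Advanced Micro Devices (AMD), has clearly gained the upper hand. The Am29000 Streamlined Instruction Processor uses an enhanced reduced instruction set computer (RISC) design and is regarded as the fastest 32-bit microprocessor in the world at present. For this new microprocessor, the Sunnyvale, California company ...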

4.
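Tracking New Technologies
Because software routers that rely solely on reduced instruction set computing (RISC) processors can no longer meet the demand for bandwidth and other advanced functions, many router designers are considering replacing RISC processors with full-custom application-specific integrated circuits (ASICs). An ASIC is a chip designed for a specific purpose and offers high density, high speed, and low cost, among other advantages. ASIC technology is advancing quickly, and many functions previously performed in software may move to ASICs. The latest 0.25 μm ASIC technology can, on a single 150 MHz chip, ...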

5.
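In domestically developed Sunway high-performance multicore server systems, the base compiler does not take the processor's instruction characteristics into account when it generates code for memory accesses in applications. As a result, the address computation code it generates is inefficient, which limits the performance of these high-performance processors. To exploit the processors' computing capability fully, this paper proposes a compiler optimization that accelerates memory address computation. The optimization relies on the processor's support for arithmetic instructions with a scaling factor. In the legality check for memory address expressions in the compiler back end, it adds a legality-checking algorithm for address expressions in multiply-add form, which automatically identifies multiply-add operations in address expressions and checks their legality. For address expressions that qualify, the code generation stage matches and emits scaled arithmetic instructions to compute the memory address quickly. This speeds up the issue and execution of memory access instructions and the generation of memory addresses in applications, improving memory access efficiency. The optimization was evaluated with the industry-standard benchmark suite SPEC CPU2006. Compared with the unoptimized results, average performance on the SPECspeed Integer and SPECspeed Floating Point subsets improved by 2.53% and 1.50%, respectively.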
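The multiply-add legality check described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `MulAdd` expression type, the set of supported scale factors, and the `s4add`/`s8add`, `mul`, and `add` mnemonics are hypothetical and are not taken from the paper or from the Sunway instruction set manual.

```python
from dataclasses import dataclass

# Assumption: the hypothetical scaled-add instruction supports only these factors.
SUPPORTED_SCALES = {4, 8}


@dataclass(frozen=True)
class MulAdd:
    """Address expression of the form base + index * scale."""
    base: str
    index: str
    scale: int


def is_legal_scaled_address(expr: MulAdd) -> bool:
    """Legality check: the scale must match one the scaled instruction supports."""
    return expr.scale in SUPPORTED_SCALES


def select_instructions(expr: MulAdd, dest: str) -> list[str]:
    """Emit one scaled-add when legal, otherwise a separate multiply and add."""
    if is_legal_scaled_address(expr):
        # Hypothetical mnemonic: dest = index * scale + base in one instruction.
        return [f"s{expr.scale}add {expr.index}, {expr.base}, {dest}"]
    return [
        f"mul {expr.index}, {expr.scale}, {dest}",
        f"add {dest}, {expr.base}, {dest}",
    ]


fast = select_instructions(MulAdd("r1", "r2", 8), "r3")
slow = select_instructions(MulAdd("r1", "r2", 12), "r3")
```

With these assumptions, an access such as `a[i]` on 8-byte elements needs one instruction for its address instead of two, which is the kind of saving the abstract attributes to the optimization.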

6.
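Li Huikai, Han Jun, Weng Xinqian, He Zhongzhu, Zeng Xiaoyang. Computer Engineering, 2012, 38(23): 240-242, 246
To address the slow software execution of AES and the SHA-3 candidate algorithm Grøstl, this paper proposes a design that accelerates the algorithms with a reduced instruction set computer (RISC) coprocessor. The coprocessor reuses the on-chip cache as lookup tables to accelerate computation and adds special instructions to the RISC processor's base instruction set architecture. Experimental results show that, compared with conventional schemes based on parallel lookup tables, this scheme accelerates AES and Grøstl at a small hardware cost.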

7.
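Apple Computer International Ltd. announced the RISC (reduced instruction set computing) PowerPC 604e processor, manufactured by Motorola and IBM, running at up to 200 MHz. Its first multiprocessor desktop computer has two built-in PowerPC 604e processors, each running at 180 MHz, which significantly improves efficiency on complex computing tasks. Apple also announced a standalone 180 MHz PowerPC 604e processor card that lets users of certain Power Macintosh models upgrade system performance to 180 MHz. Significant performance gains: the PowerPC 604e processor, supplied by Motorola and IBM, has a clock speed of up to 200 MHz, and its internal Level-1 cache capacity has been increased from 32K to ...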

8.
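To give RISC processor platforms the ability to detect code-reuse attacks, this paper combines a control-flow integrity mechanism with the dynamic remote attestation protocol from trusted computing and proposes a hardware-assisted control-flow attestation scheme for RISC processors. Building on an open-source RISC processor, it adds a hardware monitoring unit that is tightly coupled to the processor, gives the attestation protocol of the scheme, and designs a hardware encoding method for tracking execution paths in order to compress the information. Experimental results show that, compared with C-FLAT, the scheme has lower transmission latency and lower resource consumption and can ensure trustworthy, secure control flow on RISC processors.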
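The idea of compressing an execution path into a short attestation value can be sketched as a running hash over control-flow edges. This is a generic hash-chain illustration, not the hardware encoding the paper designs; the choice of SHA-256, the 8-byte address width, and the function names `extend` and `attest_path` are assumptions.

```python
import hashlib


def extend(digest: bytes, src: int, dst: int) -> bytes:
    """Fold one control-flow edge (src -> dst) into the running digest."""
    edge = src.to_bytes(8, "little") + dst.to_bytes(8, "little")
    return hashlib.sha256(digest + edge).digest()


def attest_path(edges, nonce: bytes) -> bytes:
    """Compress a whole execution path into a single 32-byte value.

    The verifier's nonce seeds the chain so that an old report cannot be replayed.
    """
    digest = hashlib.sha256(nonce).digest()
    for src, dst in edges:
        digest = extend(digest, src, dst)
    return digest


nonce = b"verifier-challenge"
expected_path = [(0x1000, 0x1040), (0x1048, 0x2000), (0x2010, 0x1050)]
# Same path, except the final return is redirected to an attacker-chosen target.
hijacked_path = [(0x1000, 0x1040), (0x1048, 0x2000), (0x2010, 0x3000)]

report_ok = attest_path(expected_path, nonce)
report_bad = attest_path(hijacked_path, nonce)
```

A verifier that knows the legitimate paths recomputes the digest for each and compares it with the report; a redirected branch or return changes the digest.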

9.
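Application-specific instruction set processors (ASIPs) combine the efficiency of ASIC coprocessors with the flexibility of general-purpose processors and have broad application prospects in information security. For the RSA and ECC cryptographic algorithms, this paper presents the design and VLSI implementation of an application-specific instruction set security processor. The ASIP is based on a 32-bit RISC architecture and uses a dedicated instruction set and special arithmetic units to execute the cryptographic algorithms efficiently at a small hardware and software cost. The design was synthesized in a TSMC 0.25 μm standard CMOS process; the core circuit is equivalent to 28K gates, the maximum clock frequency reaches 150 MHz, and one 1024-bit RSA operation takes only 200 ms.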
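The core operation of RSA that such a processor accelerates is modular exponentiation. The sketch below uses the textbook right-to-left square-and-multiply method purely as an illustration; the abstract does not say which exponentiation or multiplication algorithm the ASIP implements, and the tiny key (n = 3233, e = 17, d = 2753) is a toy example, not a 1024-bit key.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply: base ** exponent mod modulus."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current exponent bit is 1: multiply
            result = (result * base) % modulus
        base = (base * base) % modulus      # square once per exponent bit
        exponent >>= 1
    return result


# Toy RSA key built from p = 61 and q = 53 (far too small for real use).
n, e, d = 3233, 17, 2753
message = 65
ciphertext = mod_exp(message, e, n)
recovered = mod_exp(ciphertext, d, n)
```

Each exponent bit costs one modular squaring and, for a set bit, one modular multiplication, so a 1024-bit exponent needs on the order of a thousand or more wide multiplications. That is why dedicated arithmetic units pay off for this workload.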

10.
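The Sunway 26010 high-performance many-core processor builds on the Sunway 1600 multicore processor. Using system-on-chip (SoC) technology, it integrates 4 computing control cores and 256 computing cores on a single chip, uses the independently designed 64-bit Sunway RISC (reduced instruction set computer) instruction set, and supports 256-bit SIMD (single ...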

11.
Segars, S. IEEE Micro, 1997, 17(4): 12-19
Portable and handheld products require processors that consume less power than those in desktop and other powered applications. As a result, designers must analyze power use in the early stage of design both at the circuit and system levels. RISC processors, such as our ARM7TDMI, have both strengths and weaknesses as far as power consumption is concerned. From a system perspective, RISC processors should consume more power than CISC processors since RISCs need to be fed with an instruction virtually every cycle. RISC processors usually have a fixed 32-bit instruction format, which forces a 32-bit memory access every cycle. Thus, the processor consumes power both in accessing the memory and in driving 32 address and 32 data wires across a PCB.

12.
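To overcome the slow speed of the median filtering algorithm, this paper designs an application-specific processor based on the transport triggered architecture, which greatly increases the speed of median filtering. The data access unit uses two-dimensional addressing, which, compared with a general-purpose processor, reduces the number of add and multiply instructions used for addressing and speeds up data access. A dedicated sorting functional unit is also designed, which reduces the number of compare and jump instructions compared with a general-purpose processor. Simulation and verification results show that, for median filtering of images, the processor is considerably more efficient than a conventional general-purpose RISC processor.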
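To make the two costs named in this abstract concrete, the sketch below shows a plain software 3x3 median filter. The 3x3 window, the unchanged border pixels, and the function names are assumptions for illustration; the paper's window size and hardware sorting network are not given in the abstract. `linear_address` shows the multiply and adds that a general-purpose processor spends on each pixel access and that a two-dimensional addressing unit would absorb.

```python
def linear_address(base: int, row: int, col: int, width: int) -> int:
    """Software address of pixel (row, col): one multiply and two adds per access."""
    return base + row * width + col


def median_filter_3x3(image: list[list[int]]) -> list[list[int]]:
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]         # border pixels are copied unchanged
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            window.sort()                   # compare-and-branch heavy in software
            out[r][c] = window[4]           # middle of 9 sorted values
    return out


noisy = [
    [10, 10, 10],
    [10, 255, 10],                          # single impulse-noise pixel
    [10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

Nine address computations and one nine-element sort per output pixel is the inner-loop work that the dedicated addressing and sorting units are meant to remove.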

13.
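This paper presents the design and implementation of a memory management unit (MMU) that supports paged address management in a 32-bit reduced instruction set processor. The unit implements complete virtual-to-physical address translation and protection mechanisms and supports two modes: fixed mapping and translation through a translation lookaside buffer (TLB). It was designed with a full-custom methodology in SMIC's 0.18 μm process. Test results after two tape-outs show that a 32-bit microprocessor using this design correctly performs all of the defined memory management functions and can boot and run the Linux operating system.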
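The two translation modes and the protection check can be sketched in software as follows. The 4 KiB page size, the location and size of the fixed-mapped window, the single write-permission bit, and all names are assumptions chosen for illustration; the abstract does not give the MMU's actual parameters.

```python
PAGE_SHIFT = 12                       # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
FIXED_BASE = 0x8000_0000              # assumption: start of the fixed-mapped window
FIXED_SIZE = 0x2000_0000              # assumption: size of the fixed-mapped window


class TlbMiss(Exception):
    """No TLB entry covers the virtual page; software would refill the TLB."""


class ProtectionFault(Exception):
    """The access violates the permissions stored in the TLB entry."""


class Mmu:
    def __init__(self) -> None:
        self.tlb: dict[int, tuple[int, bool]] = {}   # vpn -> (pfn, writable)

    def map(self, vpn: int, pfn: int, writable: bool) -> None:
        self.tlb[vpn] = (pfn, writable)

    def translate(self, vaddr: int, write: bool = False) -> int:
        # Fixed mapping mode: subtract a constant, no TLB lookup.
        if FIXED_BASE <= vaddr < FIXED_BASE + FIXED_SIZE:
            return vaddr - FIXED_BASE
        # TLB mode: look up the virtual page number, then check permissions.
        entry = self.tlb.get(vaddr >> PAGE_SHIFT)
        if entry is None:
            raise TlbMiss(hex(vaddr))
        pfn, writable = entry
        if write and not writable:
            raise ProtectionFault(hex(vaddr))
        return (pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK)


mmu = Mmu()
mmu.map(vpn=0x10, pfn=0x80, writable=False)
paddr = mmu.translate(0x10ABC)        # read through the read-only TLB entry
```

An operating system such as Linux depends on both paths: the fixed window lets the kernel run before any page tables exist, and the TLB path provides per-page protection for user processes.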

14.
Experience with a Hybrid Processor: K-Means Clustering   Cited by: 2 (self: 0, others: 2)
We discuss hardware/software co-processing on a hybrid processor for a compute- and data-intensive multispectral imaging algorithm, k-means clustering. The experiments are performed on two models of the Altera Excalibur board, the first using the soft IP core 32-bit NIOS 1.1 RISC processor, and the second with the hard IP core ARM processor. In our experiments, we compare performance of the sequential k-means algorithm with three different accelerated versions. We consider granularity and synchronization issues when mapping an algorithm to a hybrid processor. Our results show that speedup of 11.8X is achieved by migrating computation to the Excalibur ARM hardware/software as compared to software only on a Gigahertz Pentium III. Speedup on the Excalibur NIOS is limited by the communication cost of transferring data from external memory through the processor to the customized circuits. This limitation is overcome on the Excalibur ARM, in which dual-port memories, accessible to both the processor and configurable logic, have the biggest performance impact of all the techniques studied.

15.
This paper describes an architecture for video coding on a processor with ARM and DSP cores. The platform is designed for the MPEG-4 Visual Simple Profile, and its results improve on those of a single core. Dual-core processors that combine a RISC core and a DSP are widely used as baseband processors in cell phones: the RISC core suits I/O control, the DSP suits computation, and integrating the two is operationally efficient. Because video compression requires a great deal of computation, we take both the features of the coding algorithm and the hardware platform into account. We analyze the key components of the video codec and propose a framework that uses DMA to shorten the time spent on communication between the two cores. Experimental results indicate that, during inter-frame processing, the dual-core design with DMA cuts processing time by 1/4 more than using the ARM or the DSP alone. It also saves 3/4 of the encode/decode time in inter-frame processing. In motion estimation in particular, performance improves by a factor of 4.

16.
This work presents a static method implemented in a compiler for extracting high instruction level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use of QueueCore's reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.
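The queue computation model behind this abstract can be illustrated with a small interpreter, offered as a sketch of the general model rather than of the QueueCore itself. The instruction names `ld`, `add`, `sub`, and `mul` and the tuple encoding are assumptions. Each operation takes its operands from the head of a FIFO queue and appends its result to the tail, so no instruction names a register.

```python
from collections import deque
import operator

# Hypothetical opcodes for the sketch; not the QueueCore instruction set.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_queue_program(program, env):
    """Execute a queue program: operands come from the head, results go to the tail."""
    queue = deque()
    for instr in program:
        if instr[0] == "ld":
            queue.append(env[instr[1]])     # load a named value onto the tail
        else:
            left = queue.popleft()          # operands are implicit: head of queue
            right = queue.popleft()
            queue.append(OPS[instr[0]](left, right))
    return queue.popleft()


# (a + b) * (c - d), written in level order from the leaves up to the root.
program = [
    ("ld", "a"), ("ld", "b"), ("ld", "c"), ("ld", "d"),
    ("add",), ("sub",),
    ("mul",),
]
result = run_queue_program(program, {"a": 1, "b": 2, "c": 7, "d": 3})
```

Here `result` is 12. The `add` and `sub` on the same tree level touch disjoint queue entries and share no named destination, which is the absence of false dependencies that the abstract credits for the extra parallelism.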

17.
This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, which is a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. The adoption of aggressive out-of-order execution and memory hierarchy techniques helps Godson-2E to achieve high performance. The Godson-2E processor has been physically designed in a 7-metal 90 nm CMOS process using the cell-based methodology with some bitsliced manual placement and a number of crafted cells and macros. The processor can be run at 1 GHz and achieves a SPEC CPU2000 rate higher than 500.

18.
Organization of the Motorola 88110 superscalar RISC microprocessor   Cited by: 1 (self: 0, others: 1)
Diefendorff, K.; Allen, M. IEEE Micro, 1992, 12(2): 40-63
Motorola's second-generation RISC microprocessor, which uses advanced techniques for exploiting instruction-level parallelism, including superscalar instruction issue, out-of-order instruction completion, speculative execution, dynamic instruction rescheduling, and two parallel, high-bandwidth, on-chip caches, is discussed. The microprocessor was designed to serve as the central processor in low-cost personal computers and workstations, and support demanding graphics and digital signal processing applications. The 88110's instruction set architecture, instruction sequencer, register files, execution units, address translation facilities, caches, and external bus interface are described.

19.
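Design and Implementation of a High-Speed Embedded Processor Based on Memory Technology   Cited by: 1 (self: 0, others: 1)
Zhang Qin, Han Chengde. Chinese Journal of Computers, 2007, 30(5): 831-837
SoPC (System on a Programmable Chip) is widely used in embedded systems and is usually implemented on an FPGA (Field Programmable Gate Array). One class of embedded processors, such as wavelet transform processors, compression and decompression processors, and FFT processors, can be designed with a memory-based method. Because on-chip memory resources in an FPGA are relatively scarce, how to use them effectively to build high-speed embedded processors is a problem that needs study. This paper uses an FFT processor as an example to demonstrate the method's effectiveness: by adopting one address-mapping scheduling strategy and two conflict-free operand address mapping schemes, it reduces the FPGA on-chip memory used and increases processing speed. The FFT processor plays a key role in a real system.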
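The abstract does not describe its two conflict-free mappings, so the sketch below shows a different, well-known scheme as an illustration of what "conflict-free operand address mapping" means: for an in-place radix-2 FFT, placing each data word in the memory bank given by the parity of its address bits guarantees that the two operands of every butterfly sit in different banks and can be read in the same cycle. The function names are assumptions.

```python
def bank(address: int) -> int:
    """Bank select = parity (XOR) of all address bits, for two memory banks."""
    return bin(address).count("1") & 1


def butterfly_pairs(n: int):
    """Yield (stage, a, b) operand index pairs of an in-place radix-2 FFT of size n."""
    stages = n.bit_length() - 1             # n is assumed to be a power of two
    for stage in range(stages):
        stride = 1 << stage
        for i in range(n):
            if i & stride == 0:
                yield stage, i, i | stride  # operands differ in exactly one bit


def is_conflict_free(n: int) -> bool:
    """True if no butterfly reads both of its operands from the same bank."""
    return all(bank(a) != bank(b) for _, a, b in butterfly_pairs(n))


ok_16 = is_conflict_free(16)
ok_1024 = is_conflict_free(1024)
```

Because the two operand indices of a butterfly differ in exactly one bit, their parities always differ, so two single-port banks behave like one dual-read memory for this access pattern.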
