Similar literature
1.
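ASIP Design Guided by a Configuration-Stream-Driven Computing Architecture   Cited by: 1 (self-citations: 0, citations by others: 1)
To balance flexibility and efficiency in embedded processor design, this paper proposes a configuration-stream-driven computing architecture. The architecture moves the hardware/software interface downward: the interconnection network between functional units is made visible to the compiler, and the compiler performs the transport routing, which makes more complex but more efficient interconnection networks practical. Guided by this architecture, the paper proposes a design method for application-specific instruction-set processors (ASIPs) that supports a segmented reconfigurable interconnection network. Applying the method to three classes of ASIP designs in the cryptography domain shows that, compared with a simple bus interconnect, it saves on average 53% of the interconnect power and 38.7% of the buses without affecting performance.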

2.
Time predictability is an important requirement for real-time embedded application domains such as automotive, air transportation, and multimedia processing. However, the architectural design of modern microprocessors mainly concentrates on improving average-case performance, which can significantly compromise time predictability and can make accurate worst-case performance analysis extremely difficult, if not impossible. This paper studies the time predictability of VLIW (Very Long Instruction Word) processors and its compiler support. We analyze the impediments to time predictability for VLIW processors and propose compiler-based techniques that address these problems with minimal disturbance to the VLIW hardware design. The VLIW compiler is enhanced to support full if-conversion, hyperblock scheduling, and intra-block nop insertion to enable efficient WCET (Worst-Case Execution Time) analysis for VLIW processors. Our experiments indicate that the time predictability of VLIW processors can be improved significantly.
Wei Zhang
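As a rough illustration of why full if-conversion helps WCET analysis, the sketch below compares a branch-based if/else with its predicated form under a deliberately simplified cost model. The function names, the one-cycle compare, the fixed branch penalty, and the sequential issue of both arms are all assumptions made for this example; they are not taken from the paper.

```python
def branchy_cycles(cond: bool, then_len: int, else_len: int,
                   branch_penalty: int = 2) -> int:
    """Cycles for an if/else implemented with a branch (hypothetical model).

    One cycle for the compare, a fixed branch penalty, then only the arm
    that is actually taken. The total depends on the run-time condition.
    """
    return 1 + branch_penalty + (then_len if cond else else_len)


def if_converted_cycles(cond: bool, then_len: int, else_len: int) -> int:
    """Cycles for the same if/else after full if-conversion (hypothetical model).

    One cycle for the compare, then both arms are issued under complementary
    predicates. The condition only decides which results are committed, so
    the cycle count is identical on every path.
    """
    return 1 + then_len + else_len


if __name__ == "__main__":
    for cond in (True, False):
        print(cond, branchy_cycles(cond, 3, 5), if_converted_cycles(cond, 3, 5))
```

In this model the predicated version is slower on each individual path, but its execution time no longer depends on the input, which is the property a WCET analyzer needs.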

3.
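Research and Implementation of a Dynamic VLIW Scheduling Mechanism   Cited by: 2 (self-citations: 0, citations by others: 2)
The VLIW architecture is an important means of exploiting instruction-level parallelism (ILP); its advantages are a regular, simple structure and low hardware complexity. However, relying entirely on the compiler for instruction scheduling limits the performance gains a VLIW architecture can achieve. This paper proposes a dynamic VLIW scheduling mechanism based on deterministic instruction latencies. Taking advantage of the fact that most instructions have fixed execution times, the mechanism reschedules the execution order of instructions according to run-time information in order to exploit further ILP. Experimental results on an FPGA show that the mechanism has linear hardware complexity.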

4.
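Digital image processing is widely used in aerospace, biomedical engineering, communications engineering, industry and engineering, military and public security, culture and the arts, and other fields. Because of the real-time and environmental requirements of some applications, images are usually processed on a digital signal processor (DSP). DSPs based on the very long instruction word (VLIW) architecture are widely used in real-time image processing because of their low power consumption, simple hardware structure, and good parallelism. Based on the characteristics of image processing algorithms and of the VLIW DSP architecture, this paper proposes a general method for optimizing image processing algorithms on VLIW DSPs, comprising memory optimization and instruction-level parallelism optimization. Finally, the method is applied to several commonly used image processing algorithms, and the experimental results show good optimization effects.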

5.
The MAJC architecture enhances application performance by exploiting parallelism at multiple levels: instruction, data, thread, and process. Supporting vertical multithreading, speculative multithreading, and chip multiprocessors, the scalable VLIW architecture is also capable of advanced speculation and predication and treats all data types similarly.

6.
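DSP processors adopt the VLIW architecture to increase instruction-level parallelism, but this also makes it harder to develop assemblers for them. Building on the GNU Assembler (GAS), this paper discusses the key techniques for developing an assembler for a VLIW DSP. The approach generates instruction packets for the DSP by analyzing the serial/parallel information of the assembly instructions, and it mitigates code bloat through dependency checking, improving performance while keeping the assembler functionally correct.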
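The packet-formation step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation or GAS's internal API: the `Insn` record, the `parallel` flag, the packet width of 8, and the rule that a packet must not contain a register conflict are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    text: str             # assembly text, kept only for error messages
    dst: str              # destination register
    srcs: tuple           # source registers
    parallel: bool        # True if marked to issue with the previous instruction


def form_packets(insns, width=8):
    """Group instructions into issue packets using their serial/parallel marks.

    Assumed rules for this hypothetical machine: an instruction marked
    parallel joins the current packet; inside a packet no instruction may
    write a register that another one writes or read a register that
    another one writes; a packet holds at most `width` instructions.
    """
    packets = []
    for insn in insns:
        if not insn.parallel or not packets:
            packets.append([insn])        # serial instruction starts a new packet
            continue
        current = packets[-1]
        if len(current) >= width:
            raise ValueError(f"packet overflow at: {insn.text}")
        written = {i.dst for i in current}
        if insn.dst in written or written & set(insn.srcs):
            raise ValueError(f"dependency inside packet at: {insn.text}")
        current.append(insn)
    return packets


if __name__ == "__main__":
    program = [
        Insn("add r1, r2, r3", "r1", ("r2", "r3"), False),
        Insn("mul r4, r5, r6", "r4", ("r5", "r6"), True),
        Insn("sub r7, r1, r4", "r7", ("r1", "r4"), False),
    ]
    for packet in form_packets(program):
        print(" || ".join(i.text for i in packet))
```

Rejecting conflicting packets at assembly time is one possible design; an assembler could instead split the packet and continue, at the cost of silently changing the programmer's schedule.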

7.
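Research on Machine-Dependent Optimizing Compilation Techniques for VLIW   Cited by: 2 (self-citations: 0, citations by others: 2)
The performance of a VLIW architecture depends to a large extent on its compiler. Compiler optimization has two main aspects: traditional compiler optimization techniques, and optimization techniques specific to a particular machine platform. Machine-dependent compiler optimization for VLIW should target the specific machine platform and, based on the characteristics of the very long instruction word architecture, consider how to make full use of the hardware resources the machine provides, so as to achieve the best match between software (the compiler) and hardware (the CPU) and generate efficient, highly parallel object code. Starting from the characteristics of very long instruction words, the paper discusses implementation schemes for machine-dependent compiler optimization on VLIW architectures and identifies several key techniques for carrying out such optimization in practice.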

8.
This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, which includes analysis of migrating disambiguation hardware for speculative load operations to the compiler and of the sensitivity of the model to the variation of branch misprediction, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model.

9.
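Advances in integrated-circuit process technology now make it possible to integrate a system or subsystem on a single chip, known as a system-on-chip. This paper surveys the hardware organization of systems-on-chip, the very long instruction word (VLIW) architecture, on-chip embedded software, and hardware/software co-design methods.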

10.
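Design and Implementation of a Compiler for a Clustered VLIW DSP   Cited by: 5 (self-citations: 0, citations by others: 5)
Very long instruction word (VLIW) is the architecture commonly adopted by high-end DSPs. A VLIW DSP has no hardware mechanism for scheduling or conflict resolution, so its performance depends entirely on how well the compiler optimizes. Based on the retargetable compiler infrastructure IMPACT, an optimizing compiler was designed and implemented for the clustered VLIW DSP YHFT-D4. The paper focuses on the definition of retargeting information, code annotation, support for SIMD instructions, clustered register allocation, and the exploitation of instruction-level parallelism and resolution of resource conflicts. Experimental results show that the compiler achieves good optimization effects.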

11.
Suga, A.; Matsunami, K. IEEE Micro, 2000, 20(4): 21-27
Because conventional RISC processors have insufficient processing power to support the continuing development of digital consumer products, we need a new high performance processor for multimedia applications. Processing multimedia video images requires more than 10 times the currently available performance. At Fujitsu, we provide this higher performance in software to attain a high degree of flexibility. We developed the FR500 microprocessor with a novel embedded VLIW (very long instruction word) architecture for use in such digital consumer products. The FR500 is the first product in the FR-V line, Fujitsu's generic name for VLIW architecture microprocessors. The FR-V line offers the flexibility to develop new products optimized for a wide variety of digital consumer products. In this paper, we describe the FR-V architecture, which includes our variable-length VLIW and instruction set architectures, speculative execution control, and conditional execution control. We also evaluate its performance.

12.
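To address the increasingly severe debugging challenges of embedded systems, an on-chip debug and real-time trace architecture based on a 32-bit superscalar DSP core is proposed and implemented. By designing a dedicated trace interface and other hardware resources, and by extending the JTAG port, the memory protection logic, and the pipeline control logic, the architecture supports the following typical features at low hardware cost: real-time run control of the core, non-intrusive access to internal registers and memory, breakpoints and watchpoints with complex trigger conditions, hardware single-stepping, and real-time program-flow tracing. It can meet the development and debugging needs of the vast majority of embedded systems.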

13.
The Syte workstation architecture closely couples the graphics system and the processor to improve interactive performance and reduce hardware and software overhead without added support mechanisms.

14.
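XP-ADL: A Power-Architecture Description Language and Its Design Environment   Cited by: 2 (self-citations: 0, citations by others: 2)
Reducing the power consumption of computer systems is increasingly an important goal in system design, and configurable VLIW architectures have significant advantages in low-power system design. This paper proposes a power-architecture description language, XP-ADL, and introduces an architecture design environment based on it. XP-ADL separates the structural representation of the functional units in a system from their execution (functional) semantics, which simplifies the description of configurable VLIW architectures. In addition, to facilitate architecture design-space exploration under a power model, XP-ADL allows various power models and power constraints to be included in the design environment.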

15.
With the development of real-time ray tracing in recent years, it is now very interesting to ask if real-time performance can be achieved for high-quality rendering algorithms based on ray tracing. In this paper, we propose a pipelined architecture to implement reverse photon mapping. Our architecture can use real-time ray tracing to generate photon points and camera points, so the main challenge is how to implement the gathering phase that computes the final image. Traditionally, the gathering phase of photon mapping has only allowed coarse-grain parallelism, and this situation has been a source of inefficiency, cache thrashing, and limited throughput. To exploit fine-grain pipelining and data parallelism, we arrange computations so that photons can be processed independently, similar to the way that triangles are efficiently processed in traditional real-time graphics hardware. We employ several techniques to improve cache behavior and to reduce communication overhead. Simulations show that the bandwidth requirements of this architecture are within the capacity of current and future hardware, and this suggests that photon mapping may be a good choice for real-time performance in the future.

16.
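Control-Flow Analysis of VLIW Microprocessors and Design of Their Simulation Software   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on a study of the very long instruction word (VLIW) architecture, this paper summarizes its instruction-format, processor-structure, and execution characteristics. After comparing two design approaches for a VLIW microprocessor simulator, it selects the structure-based approach and resolves the main design difficulty of the simulation, namely serial/parallel conflicts.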

17.
Parallel Computing, 2013, 39(10): 586-602
Multimedia applications have become increasingly important in daily computing. These applications are composed of heterogeneous regions of code mixed with data-level parallelism (DLP) and instruction-level parallelism (ILP). A standard solution for a multimedia coprocessor integrates single-instruction multiple-data (SIMD) engines into architectures that exploit ILP at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA). However, the ILP regions fail to scale with the increased vector length needed to achieve high performance in the DLP regions. Furthermore, the register-to-register nature of SIMD instructions causes current SIMD engines to have limitations in handling memory alignment, data reorganization, and control flow. Many supporting instructions, such as data permutations, address generations, and loop branches, are required to aid in the execution of the real SIMD computation instructions. To mitigate these problems, we propose optimized SIMD engines that combine VLIW or TTA processing with unified scalar and long vector computations as well as efficient SIMD hardware for real computation. Our new architecture is based on TTA and is called the multimedia coprocessor (MCP). This architecture includes the following features: (1) a simple coprocessor structure with 8-way TTA, (2) cost-effective SIMD hardware capable of performing floating-point operations, (3) long vector capabilities built upon existing SIMD hardware and a single register file and processor data path for both scalar operands and vector elements, and (4) an optimized SIMD architecture that addresses the SIMD limitations. Our experimental evaluations show that MCP can outperform conventional SIMD techniques by an average of 39% and 12% in performance for multimedia kernels and applications, respectively.

18.
Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With a VLIW architecture, the processor's effectiveness depends on the ability of compilers to extract sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented in an optimal manner on an FPGA. Some common image processing algorithms were tested and validated using the proposed development cycle. We also realized the rapid prototyping of embedded contactless palmprint extraction on a Virtex-6 FPGA based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies some criteria for co-design tools: flexibility, modularity, performance, and reusability.

19.
Vector computing can effectively improve the computing efficiency of computers and reduce unnecessary hardware overhead. With the improvement of CPU computing capability, the growth in the number of registers, and other hardware development trends, vector computing has become one of the widely used technologies for improving CPU performance. The RISC-V architecture, which is attracting considerable attention, also needs vector technology to improve its performance. The open source RISC-V assembler supports only standard instructions and does not yet support vector instructions. To support RISC-V vector instructions, this paper details the design and implementation of a RISC-V assembler that supports vector instructions.

20.