共查询到20条相似文献,搜索用时 15 毫秒
1.
Time predictability is an important requirement for real-time embedded application domains such as automotive, air transportation,
and multimedia processing. However, the architectural design of modern microprocessors mainly concentrates on improving the
average-case performance, which can significantly compromise the time predictability and can make accurate worst-case performance
analysis extremely difficult if not impossible.
This paper studies the time predictability of VLIW (Very Long Instruction Word) processors and its compiler support. We analyze
the impediments to time predictability for VLIW processors and propose compiler-based techniques to address these problems
with minimal disturbance on the VLIW hardware design. The VLIW compiler is enhanced to support full if conversion, hyperblock
scheduling, and intra-block nop insertion to enable efficient WCET (Worst Case Execution Time) analysis for VLIW processors.
Our experiments indicate that the time-predictability of VLIW processor can be improved significantly.
相似文献
Wei ZhangEmail: |
2.
数字信号处理软件中循环程序在执行时间上占有很大比例,用指令缓冲器暂存循环代码可以减少程序存储器的访问次数,提高处理器性能。在VLIW处理器指令流水线中增加一个支持循环指令的缓冲器,该缓冲器能够缓存循环程序指令,并以软件流水的形式向功能部件派发循环程序指令。这样循环程序代码只需访存一次而执行多次,大大减少了访存次数。在循环指令运行期间,缓冲器发出信号使程序存储器进入睡眠状态可以降低处理器功耗。典型的应用程序测试表明,使用了循环缓冲后,取指流水线空闲率可达90%以上,处理器整体性能提高10%左右,而循环缓冲的硬件面积开销大约占取指流水线的9%。 相似文献
3.
在基于VLIW结构的分组密码专用处理器设计过程中,研究了VLIW处理器的指令集体系结构建模技术.设计了一个指令精确的指令集模拟器,通过附加一个流水线相关及停顿统计模块,实现了周期精确的程序运行统计和流水线停顿统计.结合指令集模拟器、汇编器以及调试器,设计了一个面向VLIW处理器的辅助程序优化环境.利用模拟器和调试器来评估程序的指令级并行度以及资源占用情况,辅助程序开发者优化VLIW处理器程序,从而达到软硬件协作开发VLIW处理器指令级并行性的最终目的. 相似文献
4.
VLIW是DSP芯片上使用最多的一种技术,要发挥DSP芯片的性能优势,需要编译器的支持.目前关于VLLW技术的研究主要集中在如何形成更长的基本块,以及基本块之间的代码优化算法上,对于如何选择指令从而形成一个超长指令字的算法,却没有仔细地描述和实现,但这是在编译器的指令调度模块中需要具体考虑的问题,具有工程实践意义.本文通过改进编译器的lisf算法实现了支持VLIW技术的指令调度优化算法,改进的算法可以充分利用芯片的VLIW结构的优势,加速程序运行,具有较好性能. 相似文献
5.
This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, which includes analysis of migrating disambiguation hardware for speculative load operations to the compiler and of the sensitivity of the model to the variation of branch misprediction, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model. 相似文献
6.
In the high-speed free-form surface machining, the real-time motion planning and interpolation is a challenging task. This paper presents the design and implementation of a dedicated processor for the interpolation task in computerized numerical control (CNC) machine tools. The jerk-limited look-ahead motion planning and interpolation algorithm has been integrated in the interpolation processor to achieve smooth motion in the high-speed machining. The processor features a compactly designed floating-point parallel computing architecture, which employs a 3-stage pipelined reduced instruction set computer (RISC) core and a very long instruction word (VLIW) floating-point arithmetic unit. A new asynchronous execution mechanism has been employed in the processor to allow multi-cycle instructions to be performed in parallel. The proposed processor has been verified on a low-cost field programmable gate array (FPGA) chip in a prototype controller. Experimental result has demonstrated the significant improvement of the computing performance with the interpolation processor in the free-form surface machining. 相似文献
7.
8.
The popularity of multimedia applications made them a major theme in embedded systems. The key component for supporting multimedia application well is embedded processor. Thus, we have designed and implemented an embedded processor, called UniDual processor, to achieve this objective. Its key features are the integration of instructions of reduced instruction set computers (RISCs) and digital signal processors (DSPs) as well as the support of special instruction set and shared‐based clustered register architecture. However, an important issue of UniDual that remains open is how to efficiently allocate registers. In this paper, we present a scheduling and instruction transformation approach to resolve the aforementioned issue. The proposed approach schedules instructions and then transforms overlapped instructions into RISC and DSP instructions by taking communication overhead and hardware limitations into account. Compared with the greedy approach, the evaluation shows that our work is relatively effective in performance and code size reduction. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献
9.
Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With VLIW architecture, the processor effectiveness depends on the ability of compilers to provide sufficient ILP (instruction-level parallelism) from program code. This paper describes research result about enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture which allows exploiting intrinsic parallelism of a target application using advanced compiler technology and implementing it in an optimal manner on FPGA. Some common algorithms of image processing were tested and validated using the proposed development cycle. We also realized the rapid prototyping of embedded contactless palmprint extraction on an FPGA Virtex-6 based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies some criteria for co-design tools: flexibility, modularity, performance, and reusability. 相似文献
10.
本文主要针对VLIW结构的处理器结构特点,提出了一种基于伪随机和模块化指令流激励的面向对象的逻辑验证平台的设计方法,提高了处理器验证的覆盖率和效率。验证人员可以在平台上协同工作,开发相应的验证程序,完成指定的逻辑功能验证任务。实验结果表明该平台的建立解决了逻辑功能验证的瓶颈,该平台具有高效、灵活、易用和维护性好等特点。 相似文献
11.
Pipelining in multi-query optimization 总被引:1,自引:0,他引:1
Nilesh N. Dalvi Sumit K. Sanghai Prasan Roy S. Sudarshan 《Journal of Computer and System Sciences》2003,66(4):728-762
Database systems frequently have to execute a set of related queries, which share several common subexpressions. Multi-query optimization exploits this, by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be had if common subexpressions are pipelined to their uses, without being materialized. However, plans with pipelining may not always be realizable with limited buffer space, as we show. We present a general model for schedules with pipelining, and present a necessary and sufficient condition for determining validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard. We present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPCD benchmark. 相似文献
12.
13.
14.
本介绍一个采用VLIW超长指令字体系结构的高性能单片多处理机,在这个体系结构中采用流水寄存器堆来消除循环程序内的数据相关,从而使程序能够在指令级以极高的并行度并行运行。模拟实验结果表明这个体系结构具有很高的运算速度和很好的性能价格比。 相似文献
15.
仿真器是进行硬件设计评估,系统软件设计开发和计算机体系结构研究的有力工具。文章以一款VLIW(超长指令字)结构的CPU仿真器———MCS为实例,讨论了指令集仿真器设计的一般原理和关键技术。在对目标CPU进行仿真的基础上,通过模拟部分操作系统功能,MCS可以导入并且运行经过目标机编译器编译的目标可执行代码,进行可配置的数据收集和数据分析,从而达到评估CPU设计,分析指令效率,支持编译系统调试的目的。 相似文献
16.
弹性数据相关与软件流水 总被引:1,自引:0,他引:1
最差路径是有分支循环软件流水的一大障碍.对于有分支循环,某些数据相关(称为弹性相关)在循环的动态执行中可能产生、也可能不产生实例.据此,可将严重限制并行性的弹性相关用限制较松的虚构相关代替,再进行软件流水.若调度没有遵守原来的弹性相关,则使用下推变换修正.从而缓解或者完全解除了最差路径的限制.该方法与经典的控制猜测互补,特点是允许调度含错,然后纠错. 相似文献
17.
仿真器是进行硬件设计评估,系统软件设计开发和计算机体系结构研究的有力工具.文章以一款VuW(超长指令字)结构的CPU仿真器--MCS为实例,讨论了指令集仿真器设计的一般原理和关键技术.在对目标CPU进行仿真的基础上,通过模拟部分操作系统功能,MCS可以导入并且运行经过目标机编译器编译的目标可执行代码,进行可配置的数据收集和数据分析,从而达到评估CPU设计,分析指令效率,支持编译系统调试的目的. 相似文献
18.
PB数据管道技术在高校招生管理系统中的应用 总被引:1,自引:0,他引:1
MIS系统开发中会遇到很多在不同数据库管理系统之间共享、璺输数据 问题,这些问题制约了系统在跨数据库平台上使用的方便性。本文首先介绍了PB数据管道技术的使用方法,然后以一个已完成的MIS系统,某高校信息管理网络系统中的招生模块为例,具体说明了如何在应用程序中动态实现数据管道技术。 相似文献
19.
介绍了基于FPGA实现VLIW微处理器的基本方法,对VLIW微处理器具体划分为5个主要功能模块.依据FPGA的设计思想,采用自顶向下和文本与原理图相结合的流水线方式的设计方法,进行VLIW微处理器的5个模块功能设计,从而最终实现VLIW微处理器的功能,并进行了板级功能验证. 相似文献
20.
VLIW是一种早已出现但一直未能广泛使用而现今又被重新重点研究的微处理器设计思想与技术,它跟超标量技术一样支持每周期执行多条指令,但并行度更高。本文将详细介绍VLIW的概念及其发展历程,讨论VLIW微处理器的特征与所需的编译技术支持,并与超标量微处理器进行比较分析。 相似文献