1.
Time predictability is an important requirement for real-time embedded application domains such as automotive, air transportation, and multimedia processing. However, the architectural design of modern microprocessors concentrates mainly on improving average-case performance, which can significantly compromise time predictability and can make accurate worst-case performance analysis extremely difficult, if not impossible. This paper studies the time predictability of VLIW (Very Long Instruction Word) processors and its compiler support. We analyze the impediments to time predictability for VLIW processors and propose compiler-based techniques to address these problems with minimal disturbance to the VLIW hardware design. The VLIW compiler is enhanced to support full if-conversion, hyperblock scheduling, and intra-block NOP insertion to enable efficient WCET (Worst-Case Execution Time) analysis for VLIW processors. Our experiments indicate that the time predictability of VLIW processors can be improved significantly.
Wei Zhang
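The paper's techniques are compiler-based (full if-conversion, hyperblock scheduling, intra-block NOP insertion), and the abstract gives no code. The following is only a minimal Python sketch of the NOP-insertion idea, namely padding every scheduled block in a region with NOP bundles up to the longest block's cycle count so that block timing becomes constant; the 4-slot bundle format, the block representation, and pad_blocks_to_worst_case are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch: pad each scheduled basic block with NOP bundles so
    # that every block in the region takes the same, statically known number
    # of issue cycles.  All data structures here are invented for illustration.
    NOP_BUNDLE = ("nop",) * 4  # assume a 4-issue VLIW; one NOP per slot

    def pad_blocks_to_worst_case(blocks):
        """blocks: dict name -> list of bundles (one bundle = one issue cycle)."""
        worst = max(len(bundles) for bundles in blocks.values())
        padded = {}
        for name, bundles in blocks.items():
            padded[name] = list(bundles) + [NOP_BUNDLE] * (worst - len(bundles))
        return padded

    if __name__ == "__main__":
        hyperblock = {
            "then_path": [("add", "mul", "nop", "ld"), ("st", "nop", "nop", "nop")],
            "else_path": [("sub", "nop", "nop", "nop")],
        }
        for name, bundles in pad_blocks_to_worst_case(hyperblock).items():
            print(name, "->", len(bundles), "cycles")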
2.
VLIW is the most widely used technique on DSP chips, and compiler support is needed to realize the performance advantages of a DSP chip. Current research on VLIW technology concentrates mainly on how to form longer basic blocks and on code optimization algorithms across basic blocks, while the algorithm that selects instructions to form a very long instruction word has not been described or implemented in detail; yet this is exactly what the instruction scheduling module of a compiler must address, and it is of practical engineering significance. By improving the compiler's list scheduling algorithm, this paper implements an instruction scheduling optimization algorithm that supports VLIW. The improved algorithm makes full use of the chip's VLIW structure, speeds up program execution, and shows good performance.
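As a rough, hypothetical illustration of the instruction-selection step the entry refers to (not the paper's improved algorithm), the Python sketch below packs ready instructions into fixed-width bundles with a greedy list scheduler; the issue width, the (name, dependences) instruction format, and the simple program-order priority are all assumed.

    # Illustrative greedy list scheduling for a VLIW with a fixed issue width.
    # An instruction becomes "ready" once all of its dependences have been
    # scheduled in an earlier bundle.
    ISSUE_WIDTH = 4  # assumed machine width

    def list_schedule(instrs):
        """instrs: list of (name, set_of_dependence_names). Returns bundles."""
        remaining = {name: set(deps) for name, deps in instrs}
        done, bundles = set(), []
        while remaining:
            ready = [n for n, deps in remaining.items() if deps <= done]
            if not ready:
                raise ValueError("cyclic dependences")
            bundle = tuple(ready[:ISSUE_WIDTH])   # naive priority: program order
            bundles.append(bundle)
            done |= set(bundle)                   # results visible next cycle
            for n in bundle:
                del remaining[n]
        return bundles

    if __name__ == "__main__":
        prog = [("i1", set()), ("i2", set()), ("i3", {"i1"}), ("i4", {"i1", "i2"})]
        for cycle, bundle in enumerate(list_schedule(prog)):
            print(f"cycle {cycle}: {bundle}")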
3.
In the design of a dedicated block cipher processor based on a VLIW architecture, instruction set architecture modeling techniques for VLIW processors were studied. An instruction-accurate instruction set simulator was designed; by attaching a module that tracks pipeline dependences and stalls, cycle-accurate statistics on program execution and pipeline stalls were obtained. Combining the instruction set simulator with an assembler and a debugger, a program optimization environment for the VLIW processor was built. The simulator and debugger are used to evaluate the instruction-level parallelism and resource utilization of programs, helping developers optimize programs for the VLIW processor and thus achieving the goal of exploiting the VLIW processor's instruction-level parallelism through hardware/software co-development.
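The statistics side of such a simulator can be pictured with the hedged Python sketch below: a toy loop that charges one cycle per bundle plus one stall cycle whenever a bundle reads a register written by the immediately preceding bundle. The bundle format, the single-cycle latency, and the hazard rule are invented for illustration and are far simpler than a real pipeline model.

    # Toy cycle/stall accounting in the spirit of a pipeline statistics module.
    # A bundle is a list of (dest_reg, src_regs) tuples; a read-after-write on
    # the previous bundle costs one assumed stall cycle.
    def simulate(bundles):
        cycles = stalls = 0
        prev_writes = set()
        for bundle in bundles:
            reads = {r for _, srcs in bundle for r in srcs}
            if reads & prev_writes:     # hazard against the previous bundle
                stalls += 1
                cycles += 1
            cycles += 1                 # one issue cycle per bundle
            prev_writes = {dst for dst, _ in bundle}
        return cycles, stalls

    if __name__ == "__main__":
        program = [
            [("r1", ["r2", "r3"]), ("r4", ["r5"])],
            [("r6", ["r1"])],           # reads r1 just written -> one stall
            [("r7", ["r2"])],
        ]
        total, stalled = simulate(program)
        print(f"cycles={total}, stall cycles={stalled}")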
4.
This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor, with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, including the effect of migrating the disambiguation hardware for speculative load operations to the compiler and the sensitivity of the model to variations in branch misprediction penalties, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model.
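The memory disambiguation mentioned above can be illustrated, very loosely, by the Python sketch below: speculative load addresses are recorded, and a later non-speculative store to one of those addresses signals that the speculative thread must be squashed. The buffer structure and its interface are assumptions made for illustration, not the Weld or dual-thread hardware.

    # Hypothetical sketch of the disambiguation check: remember addresses read
    # speculatively; a conflicting store from the main thread forces a squash.
    class SpeculativeLoadBuffer:
        def __init__(self):
            self.addresses = set()

        def speculative_load(self, addr):
            self.addresses.add(addr)

        def nonspeculative_store(self, addr):
            """Return True if the speculative thread must be squashed."""
            return addr in self.addresses

    if __name__ == "__main__":
        slb = SpeculativeLoadBuffer()
        slb.speculative_load(0x1000)
        print("squash needed:", slb.nonspeculative_store(0x1000))  # True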
5.
In high-speed free-form surface machining, real-time motion planning and interpolation is a challenging task. This paper presents the design and implementation of a dedicated processor for the interpolation task in computerized numerical control (CNC) machine tools. A jerk-limited look-ahead motion planning and interpolation algorithm has been integrated into the interpolation processor to achieve smooth motion in high-speed machining. The processor features a compactly designed floating-point parallel computing architecture, which employs a 3-stage pipelined reduced instruction set computer (RISC) core and a very long instruction word (VLIW) floating-point arithmetic unit. A new asynchronous execution mechanism allows multi-cycle instructions to be performed in parallel. The proposed processor has been verified on a low-cost field programmable gate array (FPGA) chip in a prototype controller. Experimental results demonstrate a significant improvement in computing performance with the interpolation processor in free-form surface machining.
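For readers unfamiliar with jerk-limited motion, the hedged Python sketch below shows only the bounded-jerk update rule (acceleration may change by at most j_max*dt per cycle and is clamped to a_max); the paper's look-ahead planning, the RISC/VLIW partitioning, and the actual interpolation algorithm are far more elaborate, and every constant here is invented.

    # Minimal sketch of one jerk-limited interpolation step along the path
    # parameter s: bounded jerk keeps the commanded feed motion smooth.
    def jerk_limited_step(s, v, a, v_cmd, dt, j_max, a_max):
        a_des = (v_cmd - v) / dt                           # acceleration that would reach v_cmd now
        a_des = max(-a_max, min(a_max, a_des))             # acceleration limit
        a += max(-j_max * dt, min(j_max * dt, a_des - a))  # jerk limit
        v += a * dt
        s += v * dt
        return s, v, a

    if __name__ == "__main__":
        s, v, a = 0.0, 0.0, 0.0
        for _ in range(50):
            s, v, a = jerk_limited_step(s, v, a, v_cmd=100.0, dt=0.001,
                                        j_max=5e4, a_max=2e3)
        print(f"s={s:.4f} mm, v={v:.2f} mm/s, a={a:.1f} mm/s^2")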
6.
7.
Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With a VLIW architecture, processor effectiveness depends on the ability of the compiler to extract sufficient ILP (instruction-level parallelism) from the program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented in an optimal manner on the FPGA. Several common image processing algorithms were tested and validated using the proposed development cycle. We also realized rapid prototyping of embedded contactless palmprint extraction on an FPGA Virtex-6 based board for a biometric application, obtaining a processing time of 145.6 ms per image. Our approach meets several criteria for co-design tools: flexibility, modularity, performance, and reusability.
8.
Targeting the architectural characteristics of VLIW processors, this paper proposes a design method for an object-oriented logic verification platform driven by pseudo-random and modular instruction stream stimuli, which improves the coverage and efficiency of processor verification. Verification engineers can work collaboratively on the platform and develop verification programs to complete assigned logic function verification tasks. Experimental results show that the platform removes the bottleneck in logic function verification and is efficient, flexible, easy to use, and easy to maintain.
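The stimulus side of such a platform can be sketched, purely as an assumption-laden illustration in Python, as a seeded generator that emits well-formed random VLIW bundles so the same stream can drive both the design under test and a golden reference model; the opcode list, issue width, and register count below are placeholders.

    import random

    # Hypothetical pseudo-random instruction-stream generator: a fixed seed
    # makes the stimulus reproducible for regression runs.
    OPCODES = ["add", "sub", "xor", "rotl", "ld", "st", "nop"]
    ISSUE_WIDTH = 4
    NUM_REGS = 32

    def random_bundle(rng):
        bundle = []
        for _ in range(ISSUE_WIDTH):
            op = rng.choice(OPCODES)
            dst = f"r{rng.randrange(NUM_REGS)}"
            srcs = [f"r{rng.randrange(NUM_REGS)}" for _ in range(2)]
            bundle.append((op, dst, srcs))
        return bundle

    def stimulus(seed, length):
        rng = random.Random(seed)
        return [random_bundle(rng) for _ in range(length)]

    if __name__ == "__main__":
        for i, bundle in enumerate(stimulus(seed=42, length=3)):
            print(f"bundle {i}:", bundle)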
9.
10.
Pipelining in multi-query optimization
Nilesh N. Dalvi, Sumit K. Sanghai, Prasan Roy, S. Sudarshan. Journal of Computer and System Sciences, 2003, 66(4): 728-762
Database systems frequently have to execute a set of related queries, which share several common subexpressions. Multi-query optimization exploits this by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be obtained if common subexpressions are pipelined to their uses, without being materialized. However, as we show, plans with pipelining may not always be realizable with limited buffer space. We present a general model for schedules with pipelining, and present a necessary and sufficient condition for determining the validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard. We present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPCD benchmark.
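The pipeline-versus-materialize trade-off discussed above can be caricatured with the toy Python sketch below, which greedily keeps the shared results that are most expensive to materialize pipelined under a buffer budget; the cost figures, the budget, and the selection rule are invented and are not the paper's schedule model or heuristic.

    # Toy greedy selection: pipeline the costliest-to-materialize shared
    # subexpressions as long as their pipeline buffers fit in the budget.
    def choose_pipelined(shared, buffer_budget):
        """shared: list of (name, materialize_cost, buffer_need)."""
        chosen, used = [], 0
        for name, mat_cost, need in sorted(shared, key=lambda s: -s[1]):
            if used + need <= buffer_budget:
                chosen.append(name)
                used += need
        return chosen

    if __name__ == "__main__":
        shared_subexprs = [("scan_orders", 120, 40), ("join_cust", 300, 80),
                           ("agg_sales", 90, 30)]
        print(choose_pipelined(shared_subexprs, buffer_budget=100))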
11.
12.
This paper presents a high-performance single-chip multiprocessor based on a VLIW (very long instruction word) architecture. The architecture uses a pipelined register file to eliminate data dependences inside loops, so programs can run at the instruction level with a very high degree of parallelism. Simulation results show that the architecture achieves very high execution speed and a good performance/cost ratio.
13.
VLIW is a microprocessor design philosophy and technique that appeared early on but never achieved widespread use, and it is now once again a focus of research. Like superscalar techniques, it supports executing multiple instructions per cycle, but with a higher degree of parallelism. This paper introduces the concept of VLIW and its history in detail, discusses the characteristics of VLIW microprocessors and the compiler support they require, and compares them with superscalar microprocessors.
14.
Simulators are powerful tools for evaluating hardware designs, developing system software, and researching computer architecture. Taking MCS, an instruction set simulator for a CPU with a VLIW (very long instruction word) architecture, as an example, this paper discusses the general principles and key techniques of instruction set simulator design. On top of simulating the target CPU, MCS emulates part of the operating system functionality, so it can load and run target executables produced by the target machine's compiler and perform configurable data collection and analysis, thereby supporting CPU design evaluation, instruction efficiency analysis, and compiler system debugging.
15.
16.
Application of the PB Data Pipeline Technique in a University Enrollment Management System
MIS development often runs into problems of sharing and transferring data between different database management systems, and these problems limit how conveniently a system can be used across database platforms. This paper first introduces how to use the PB (PowerBuilder) data pipeline technique, and then, using the enrollment module of a completed MIS system, the information management network system of a university, as an example, explains in detail how to implement the data pipeline dynamically within an application.
17.
This paper introduces a basic method for implementing a VLIW microprocessor on an FPGA, partitioning the VLIW microprocessor into five main functional modules. Following FPGA design principles, a top-down, pipelined design method combining HDL text and schematic entry is used to design the five modules, thereby implementing the complete VLIW microprocessor, which is then verified at the board level.
18.
Nagm Mohamed, Nazeih Botros, Mohamad Alweh. 通讯和计算机, 2009, 6(12): 70-73, 84
This is a comparative study of cache energy dissipation in Very Long Instruction Word (VLIW) and classical superscalar microprocessors. While architecturally different, the two types are analyzed in this work under the assumption of similar underlying silicon fabrication platforms. The outcomes of the study reveal how energy is used in the cache system of the former, which makes it more appealing for low-power applications compared to the latter.
19.
Ying-Hsiang Wen, Jen-Wei Huang, Ming-Syan Chen. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 784-795
Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and Pipelined (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experimental results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.
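The bookkeeping HAPPI performs in hardware (candidate counting, per-item trimming information, and hashing of itemsets) can be mimicked in software. The Python sketch below does so for 2-itemset candidates and a small hash table of 3-itemsets; the hash function, bucket count, support threshold, and data are all invented for illustration and do not reproduce the HAPPI design.

    from itertools import combinations

    # Software sketch in the spirit of hash-based pruning: one pass over the
    # transactions counts candidate 2-itemsets, records per-item "trimming"
    # counts, and hashes every 3-itemset into a small table.
    NUM_BUCKETS = 13
    MIN_SUP = 2

    def bucket(itemset):
        return hash(frozenset(itemset)) % NUM_BUCKETS

    def scan(transactions, candidates2):
        support = {c: 0 for c in candidates2}
        trim_count = {}              # how often each item occurs in a matched candidate
        table = [0] * NUM_BUCKETS
        for t in transactions:
            tset = set(t)
            for c in candidates2:
                if set(c) <= tset:
                    support[c] += 1
                    for item in c:
                        trim_count[item] = trim_count.get(item, 0) + 1
            for triple in combinations(sorted(tset), 3):
                table[bucket(triple)] += 1
        return support, trim_count, table

    if __name__ == "__main__":
        txns = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "d"], ["a", "c", "d"]]
        cands = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
        sup, trim, tbl = scan(txns, cands)
        print("frequent 2-itemsets:", [c for c, s in sup.items() if s >= MIN_SUP])
        # a 3-itemset can be pruned outright if its bucket count is below MIN_SUP
        print("bucket counts:", tbl)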