1.

A time-predictable VLIW processor and its compiler support

Jun Yan Wei Zhang 《Real-Time Systems》2008,38(1):67-84

Time predictability is an important requirement for real-time embedded application domains such as automotive, air transportation, and multimedia processing. However, the architectural design of modern microprocessors mainly concentrates on improving the average-case performance, which can significantly compromise the time predictability and can make accurate worst-case performance analysis extremely difficult if not impossible. This paper studies the time predictability of VLIW (Very Long Instruction Word) processors and their compiler support. We analyze the impediments to time predictability for VLIW processors and propose compiler-based techniques to address these problems with minimal disturbance to the VLIW hardware design. The VLIW compiler is enhanced to support full if-conversion, hyperblock scheduling, and intra-block nop insertion to enable efficient WCET (Worst-Case Execution Time) analysis for VLIW processors. Our experiments indicate that the time predictability of VLIW processors can be improved significantly.
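
To illustrate why if-conversion helps WCET analysis, the following is a minimal toy model, not the authors' compiler. The cycle counts and both function names are assumptions made up for illustration, and the model assumes the machine has enough issue slots to schedule both predicated paths side by side.

```python
# Toy timing model. All cycle counts are hypothetical illustrative values.
COMPARE = 1     # cycles to evaluate the condition
THEN_PATH = 2   # assumed length of the "then" path
ELSE_PATH = 5   # assumed length of the "else" path


def branchy_cycles(cond: bool) -> int:
    """Cycle count with a real branch: timing depends on the path taken."""
    return COMPARE + (THEN_PATH if cond else ELSE_PATH)


def if_converted_cycles(cond: bool) -> int:
    """Cycle count after if-conversion.

    Both paths are issued under complementary predicates in one block.
    Operations whose predicate is false are nullified but still occupy
    their slots, so the block length no longer depends on `cond`.
    Assumes both paths can be scheduled in parallel.
    """
    return COMPARE + max(THEN_PATH, ELSE_PATH)
```

In this model the branchy version takes 3 or 6 cycles depending on the condition, while the if-converted version always takes 6, so the worst case equals every case and a static analysis has a single path to bound.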

2.

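A linear instruction scheduling algorithm for VLIW chips

甘玲 汤睿 《微计算机信息》2009,25(2)

VLIW is the technique most widely used in DSP chips, and exploiting the performance advantages of a DSP chip requires compiler support. Current research on VLIW focuses mainly on forming longer basic blocks and on code optimization algorithms across basic blocks. Algorithms for selecting the instructions that make up a single very long instruction word have not been described or implemented in detail, although this selection must be handled concretely in the compiler's instruction scheduling module and is of practical engineering significance. By improving the compiler's list algorithm, this paper implements an instruction scheduling optimization algorithm that supports VLIW. The improved algorithm makes full use of the chip's VLIW structure, speeds up program execution, and performs well.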
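
As a rough illustration of how instructions can be selected to form one long instruction word, here is a generic greedy list-scheduling sketch. It is not the improved algorithm from the paper. The function name `list_schedule`, the unit-latency assumption, and the use of identical functional units are all simplifications introduced here.

```python
from typing import Dict, List, Set


def list_schedule(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Pack operations into VLIW words of at most `width` slots.

    `deps` maps each operation to the set of operations it depends on.
    Assumes unit latency and identical functional units (a simplification).
    Insertion order of `deps` serves as the scheduling priority.
    """
    done: Set[str] = set()
    remaining = list(deps)
    words: List[List[str]] = []
    while remaining:
        # An operation is ready once all its predecessors were issued
        # in earlier words.
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        word = ready[:width]          # fill the word with the first ready ops
        words.append(word)
        done.update(word)
        remaining = [op for op in remaining if op not in word]
    return words
```

For example, with `deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"a"}, "e": {"c", "d"}}` and `width=2`, the sketch produces three words: `a` with `b`, then `c` with `d`, then `e`. A production scheduler would also model operation latencies, functional unit types, and a better priority function such as critical-path length.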

3.

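ISA modeling and assisted software optimization techniques for VLIW processors

严迎建 叶建森 刘军伟 徐劲松 《计算机工程与设计》2009,30(11)

In the course of designing a VLIW-based application-specific processor for block ciphers, this paper studies instruction set architecture (ISA) modeling techniques for VLIW processors. An instruction-accurate instruction set simulator is designed. By attaching a module that tracks pipeline hazards and stalls, it provides cycle-accurate statistics on program execution and pipeline stalls. Combining the instruction set simulator with an assembler and a debugger, an assisted program optimization environment for VLIW processors is built. The simulator and debugger are used to evaluate a program's instruction-level parallelism and resource usage, helping developers optimize programs for the VLIW processor, so that hardware and software cooperate in exploiting the processor's instruction-level parallelism.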

4.

High-performance and low-cost dual-thread VLIW processor using Weld architecture paradigm

Ozer E. Conte T.M. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(12):1132-1142

This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, which includes analysis of migrating disambiguation hardware for speculative load operations to the compiler and of the sensitivity of the model to the variation of branch misprediction, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model.

5.

An FPGA-based low-cost VLIW floating-point processor for CNC applications

《Microprocessors and Microsystems》2017

In high-speed free-form surface machining, real-time motion planning and interpolation is a challenging task. This paper presents the design and implementation of a dedicated processor for the interpolation task in computerized numerical control (CNC) machine tools. The jerk-limited look-ahead motion planning and interpolation algorithm has been integrated in the interpolation processor to achieve smooth motion in high-speed machining. The processor features a compactly designed floating-point parallel computing architecture, which employs a 3-stage pipelined reduced instruction set computer (RISC) core and a very long instruction word (VLIW) floating-point arithmetic unit. A new asynchronous execution mechanism has been employed in the processor to allow multi-cycle instructions to be performed in parallel. The proposed processor has been verified on a low-cost field programmable gate array (FPGA) chip in a prototype controller. Experimental results demonstrate a significant improvement in computing performance with the interpolation processor in free-form surface machining.

6.

Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project

《Microprocessors and Microsystems》2017

7.

Flexible VLIW processor based on FPGA for efficient embedded real-time image processing

Vincent Brost Fan Yang Charles Meunier 《Journal of Real-Time Image Processing》2014,9(1):47-59

Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. With VLIW architecture, the processor effectiveness depends on the ability of compilers to provide sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture which allows exploiting intrinsic parallelism of a target application using advanced compiler technology and implementing it in an optimal manner on FPGA. Some common image processing algorithms were tested and validated using the proposed development cycle. We also realized the rapid prototyping of embedded contactless palmprint extraction on an FPGA Virtex-6 based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach applies some criteria for co-design tools: flexibility, modularity, performance, and reusability.

8.

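Design of a verification platform for a VLIW processor

李罗生 侯朝焕 《微处理机》2004,25(4):46-48

Targeting the structural characteristics of VLIW processors, this paper proposes a design method for an object-oriented logic verification platform driven by pseudo-random, modular instruction-stream stimuli, which improves the coverage and efficiency of processor verification. Verification engineers can work collaboratively on the platform, develop the corresponding verification programs, and complete the assigned logic function verification tasks. Experimental results show that the platform removes the bottleneck in logic function verification and is efficient, flexible, easy to use, and easy to maintain.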

9.

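Implementation of a GAS assembler for VLIW processors

王红梅 张铁军 王东辉 《微计算机应用》2010,31(5)

Adopting a VLIW structure in DSP processors raises instruction-level parallelism but also makes it harder to develop an assembler for them. Building on the assembler GAS (GNU Assembler), this paper discusses the key techniques for developing an assembler for a VLIW DSP. The approach generates instruction packets for the DSP by analyzing the serial/parallel information of the assembly instructions, and it mitigates code bloat through dependency checking, improving performance while keeping the assembler functionally correct.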
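
The idea of forming instruction packets from serial/parallel information can be sketched as follows. This is only an illustration, not the paper's GAS port. The `||` prefix used to mark an instruction as parallel with the previous one, the default issue width of 4, and the name `build_packets` are assumptions introduced here.

```python
from typing import List


def build_packets(lines: List[str], max_width: int = 4) -> List[List[str]]:
    """Group assembly lines into instruction packets.

    A line starting with the hypothetical marker "||" joins the packet of
    the preceding instruction; any other line starts a new packet.
    """
    packets: List[List[str]] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # skip blank lines
        if text.startswith("||"):
            if not packets:
                raise ValueError("parallel marker on the first instruction")
            if len(packets[-1]) >= max_width:
                raise ValueError("packet exceeds the issue width")
            packets[-1].append(text[2:].strip())
        else:
            packets.append([text])        # serial instruction: new packet
    return packets
```

Given the lines `add r1, r2, r3`, `|| mul r4, r5, r6`, and `sub r7, r1, r4`, the sketch returns two packets: the first holds the `add` and `mul`, the second holds the `sub`. A real assembler would additionally check functional unit conflicts and register dependencies inside each packet.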

10.

Pipelining in multi-query optimization   Cited: 1 (self-citations: 0, citations by others: 1)

Nilesh N. Dalvi Sumit K. Sanghai Prasan Roy S. Sudarshan 《Journal of Computer and System Sciences》2003,66(4):728-762

Database systems frequently have to execute a set of related queries, which share several common subexpressions. Multi-query optimization exploits this, by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be had if common subexpressions are pipelined to their uses, without being materialized. However, plans with pipelining may not always be realizable with limited buffer space, as we show. We present a general model for schedules with pipelining, and present a necessary and sufficient condition for determining validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard. We present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPCD benchmark.

11.

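Design and implementation of a compiler for a VLIW DSP architecture

王敏 王红梅 张铁军 单睿 王东辉 《微计算机应用》2009,30(7)

A VLIW compiler is responsible for extracting instruction-level parallelism, checking dependencies, and scheduling instructions, and it strongly affects the performance of a VLIW processor. Based on a VLIW DSP chip, this paper designs and implements a high-performance VLIW compiler using the front end and code generator templates of the retargetable compiler IMPACT. Support for SIMD is built into the compiler by combining pseudo data types with intrinsic functions. Experimental results show that, compared with the GCC-based compiler, the number of generated instructions drops by 42% on average and the number of parallel packets drops by 30%.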

12.

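A single-chip multiprocessor with a VLIW architecture

汤志忠 张赤红 《计算机研究与发展》1993,30(10):1-8

This paper presents a high-performance single-chip multiprocessor based on a VLIW (very long instruction word) architecture. The architecture uses a pipelined register file to eliminate data dependences within loops, so that programs can run at the instruction level with a very high degree of parallelism. Simulation results show that the architecture achieves high computing speed and a good price/performance ratio.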

13.

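Characteristics of VLIW microprocessors and their compiler technology support

郑飞 陆鑫达 《微处理机》1996,(3):1-4

VLIW is a microprocessor design approach that appeared long ago, never came into wide use, and is now once again a focus of research. Like superscalar techniques, it supports executing multiple instructions per cycle, but with a higher degree of parallelism. This paper introduces the concept of VLIW and its development history in detail, discusses the characteristics of VLIW microprocessors and the compiler techniques they require, and compares them with superscalar microprocessors.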

14.

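Design and implementation of MCS, a CPU simulator for the VLIW architecture

李锋 王雷 刘又诚 周伯生 《计算机工程与应用》2001,37(21):165-168

Simulators are powerful tools for evaluating hardware designs, developing system software, and studying computer architecture. Using MCS, a simulator for a CPU with a VLIW (very long instruction word) structure, as an example, this article discusses the general principles and key techniques of instruction set simulator design. In addition to simulating the target CPU, MCS emulates part of the operating system's functionality, so it can load and run target executables produced by the target machine's compiler and perform configurable data collection and analysis. This supports evaluating the CPU design, analyzing instruction efficiency, and debugging the compilation system.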

15.

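Design and implementation of MCS, a CPU simulator for the VLIW architecture

李锋 王雷 《计算机工程与应用》2001,37(21):165-168

Simulators are powerful tools for evaluating hardware designs, developing system software, and studying computer architecture. Using MCS, a simulator for a CPU with a VLIW (very long instruction word) structure, as an example, this article discusses the general principles and key techniques of instruction set simulator design. In addition to simulating the target CPU, MCS emulates part of the operating system's functionality, so it can load and run target executables produced by the target machine's compiler and perform configurable data collection and analysis. This supports evaluating the CPU design, analyzing instruction efficiency, and debugging the compilation system.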

16.

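Application of PB data pipeline technology in a university admissions management system   Cited: 1 (self-citations: 0, citations by others: 1)

时希杰 李波 《微型电脑应用》2002,18(9):42-44

MIS development often runs into problems of sharing and transferring data between different database management systems, and these problems limit how conveniently a system can be used across database platforms. This paper first introduces how to use PB data pipeline technology. Then, taking the admissions module of a completed MIS (the information management network system of a university) as an example, it explains in detail how to implement data pipelines dynamically in an application.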

17.

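Design and implementation of a VLIW processor

唐骞 杨小雪 《微型机与应用》2010,29(11)

This paper introduces a basic method for implementing a VLIW microprocessor on an FPGA and divides the microprocessor into five main functional modules. Following FPGA design principles, it adopts a top-down, pipelined design approach that combines textual descriptions with schematics to design the five modules, thereby implementing the functions of the VLIW microprocessor, which are then verified at the board level.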

18.

Cache memory energy minimization in VLIW processors

Nagm Mohamed Nazeih Botros Mohamad Alweh 《通讯和计算机》2009,6(12):70-73,84

This is a comparative study of cache energy dissipation in Very Long Instruction Word (VLIW) and classical superscalar microprocessors. While architecturally different, the two types are analyzed in this work under the assumption of having similar underlying silicon fabrication platforms. The outcomes of the study reveal how energy is exploited in the cache system of the former, which makes it more appealing for low-power applications than the latter.

19.

Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

Ying-Hsiang Wen Jen-Wei Huang Ming-Syan Chen 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(6):784-795

Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and Pipelined (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experimental results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.
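
The role of the hash table in reducing candidate itemsets can be illustrated with a small software sketch. It shows only the general hash-filtering idea for 2-itemsets and is not the HAPPI hardware pipeline. The hash function, the bucket count, the integer item IDs, and the names `bucket_of` and `prune_pairs` are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def bucket_of(pair: Pair, n_buckets: int) -> int:
    # Hypothetical hash function, chosen only for illustration.
    return (pair[0] * 31 + pair[1]) % n_buckets


def prune_pairs(transactions: List[Set[int]], candidates: List[Pair],
                min_support: int, n_buckets: int = 7) -> List[Pair]:
    """Drop candidate 2-itemsets whose hash bucket count is below min_support.

    Every pair occurring in a transaction increments its bucket, so a bucket
    count is an upper bound on the support of each pair hashed into it.
    A candidate in a bucket below min_support therefore cannot be frequent.
    Candidates must be (smaller_id, larger_id) tuples.
    """
    counts = [0] * n_buckets
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[bucket_of(pair, n_buckets)] += 1
    return [c for c in candidates
            if counts[bucket_of(c, n_buckets)] >= min_support]
```

With transactions `{1, 2, 3}`, `{1, 2}`, `{2, 3}`, `{1, 2, 4}`, a minimum support of 3, and the candidates `(1, 2)`, `(1, 3)`, `(2, 3)`, `(1, 4)`, `(2, 4)`, only `(1, 2)` survives the filter. Surviving candidates still need an exact support count, because hash collisions can let infrequent pairs through.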

20.

Research on a technique for supporting conditional branch instructions in a VLIW DSP

余锋林 耿锐 戴福泉 《工业控制计算机》2009,22(2):35-37

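Conditional branch instructions are used frequently in VLIW DSPs, and loops are one of their main application areas. An efficient design of conditional branch instructions is key to efficient VLIW DSP operation. To address the complexity of implementing such instructions, this paper discusses a new structure, the hyperblock, and uses it to design and implement the conditional branch instructions of the BWDSP100 processor. Experiments show that the method yields good optimization results for both DSP kernel algorithms and real applications and improves instruction-level parallelism.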