Found 20 similar documents (search time: 62 ms)
1.
Ashok Sudarsanam, Sharad Malik, Masahiro Fujita 《Design Automation for Embedded Systems》1999, 4(2-3):187-206
We address the problem of code generation for embedded DSP systems. Such systems devote a limited quantity of silicon to program
memory, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various
high-performance constraints. Unfortunately, current compiler technology is unable to generate dense, high-performance code
for DSPs, due to the fact that it does not provide adequate support for the specialized architectural features of DSPs via
machine-dependent code optimizations. Thus, designers often program the embedded software in assembly, a very time-consuming
task. In order to increase productivity, compilers must be developed that are capable of generating high-quality code for
DSPs. The compilation process must also be made retargetable, so that a variety of DSPs may be efficiently evaluated for potential
use in an embedded system. We present a retargetable compilation methodology that enables high-quality code to be generated
for a wide range of DSPs. Previous work in retargetable DSP compilation has focused on complete automation, and this desire
for automation has limited the number of machine-dependent optimizations that can be supported. In our efforts, we have given
code quality higher priority than complete automation. We demonstrate how, by using a library of machine-dependent optimization
routines accessible via a programming interface, it is possible to support a wide range of machine-dependent optimizations,
albeit at some cost to automation. Experimental results demonstrate the effectiveness of our methodology, which has been used
to build good-quality compilers for three fixed-point DSPs.
This revised version was published online in July 2006 with corrections to the Cover Date.
2.
We address the problem of code generation for DSP systems on a chip. In such systems, the amount of silicon devoted to program ROM is limited, so in addition to meeting various high-performance constraints, the application software must be sufficiently dense. Unfortunately, existing compiler technology is unable to generate high-quality code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. Thus, designers often resort to programming application software in assembly, which is a very tedious and time-consuming task. In this paper, we focus on providing compiler support for a group of specialized architectural features that exist in many DSPs, namely indirect addressing modes with auto-increment/decrement arithmetic. In these DSPs, an indexed addressing mode is generally not available, so automatic variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective is to provide a method for comprehensively analyzing the performance benefits and hardware cost due to an auto-increment/decrement feature that varies from -l to +l, and allowing access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Thus, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features.
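The effect of the auto-increment/decrement range described in this abstract can be illustrated with a small cost model. The sketch below is not the paper's algorithm: it assumes a single address register (k = 1) and simply counts how many consecutive accesses fall outside the free range. The function name `explicit_address_ops`, the variable names, and the two layouts are hypothetical and chosen only for illustration.

```python
def explicit_address_ops(accesses, layout, max_step=1):
    """Count accesses that need an explicit address-arithmetic instruction.

    accesses: variable names in the order the procedure touches them.
    layout:   variable names in memory order (list index = offset).
    max_step: largest distance covered for free by auto-increment/decrement
              (the "l" of the abstract). Assumes one address register.
    """
    offset = {name: i for i, name in enumerate(layout)}
    cost = 0
    for prev, cur in zip(accesses, accesses[1:]):
        # A move of at most max_step is folded into the previous access;
        # anything larger costs one extra instruction.
        if abs(offset[cur] - offset[prev]) > max_step:
            cost += 1
    return cost


# Hypothetical access sequence and two candidate memory layouts.
accesses = ["a", "c", "a", "d", "b", "d"]
naive_layout = ["a", "b", "c", "d"]   # declaration order
tuned_layout = ["c", "a", "d", "b"]   # neighbours in the sequence placed adjacently

cost_naive = explicit_address_ops(accesses, naive_layout, max_step=1)
cost_tuned = explicit_address_ops(accesses, tuned_layout, max_step=1)
cost_wider = explicit_address_ops(accesses, naive_layout, max_step=2)
```

Comparing `cost_naive` with `cost_tuned` shows what a better variable layout buys on fixed hardware, and comparing `cost_naive` with `cost_wider` shows what a wider auto-modify range buys for a fixed layout, which is the kind of trade-off the abstract says the parameterizable algorithm is meant to expose.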
3.
Mittal G., Zaretsky D., Xiaoyong Tang, Banerjee P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007, 15(11):1177-1190
As new applications in embedded communications and control systems push the computational limits of digital signal processing (DSP) functions, there will be an increasing need for software applications to be migrated to hardware in the form of a hardware-software codesign system. In many cases, access to the high-level source code may not be available. It is thus desirable to have a technology to translate the software binaries intended for processors to hardware implementations. This paper provides details on the retargetable FREEDOM compiler. The compiler automatically translates DSP software binaries to register-transfer level (RTL) VHDL and Verilog for implementation on field-programmable gate arrays (FPGAs) as standalone or system-on-chip implementations. We describe the underlying optimizations and some novel algorithms for alias analysis, data dependency analysis, memory optimizations, procedure call recovery, and back-end code scheduling. Experimental results on resource usage and performance are shown for several program binaries intended for the Texas Instruments C6211 DSP (VLIW) and the ARM922T reduced instruction set computer (RISC) processors. Implementation results for four kernels from the Simulink demo library and others from commonly used DSP applications, such as MPEG-4, Viterbi, and JPEG, are also discussed. The compiler-generated RTL code is mapped to Xilinx Virtex II and Altera Stratix FPGAs. We record overall performance gains of 1.5-26.9 for the hardware implementations of the kernels. Comparisons with the power-aware compiler techniques (PACT) high-level synthesis compiler are used to show that software binaries can be used as intermediate representations from any high-level language and generate efficient hardware implementations.
4.
《Signal Processing Magazine, IEEE》1998, 15(2)
The Texas Instruments VelociTI architecture is a very long instruction word (VLIW) architecture. The TMS320C6x family of digital signal processors (DSPs) is the first to employ the VelociTI architecture, with the TMS320C6201 (C6201) being the first device in this family. The C6201 is based on the fixed-point TMS320C62x (C62x) CPU. This article describes the VelociTI VLIW architecture and discusses the C62x, C67x, C6201, and the VelociTI development tools. An overview of the VelociTI architecture, including architectural principles, data path, instruction set, and pipeline operation, is presented, and both the C62x fixed-point CPU and the C67x floating-point CPU are described. A summary of the C62x benchmark performance is also presented. The chip-level support outside the CPU that allows the C6201 to operate in a variety of high-performance DSP environments is also described. An overview of the C6x development environment is also given, demonstrating the breadth of the development environment and illustrating the programming methodology. The article concludes with a performance analysis of the C compiler.
5.
W. Kreuzer, S. Fröhlich, M. Gotschlich, A. Helm, B. Wess 《e & i Elektrotechnik und Informationstechnik》1998, 115(1):41-47
The synthesis of efficient programs for digital signal processors with non-homogeneous register sets is still a challenge of compiler design. In this paper, we introduce the concept of a data flow graph compiler for digital signal processors. In a first step, the data flow graph is decomposed into constrained expression trees and represented by trellis trees, which allows the application of a straight-line code generation algorithm whose complexity depends only linearly on the size of the graph. Registers are assigned by taking into account the constraints of multi-function instructions. The execution time of the resulting assembly code is minimized by exploiting instruction level parallelism and memory layout optimizations.
6.
As DSP (Digital Signal Processing) applications become more complex, there is also a growing need for new architectures supporting efficient high-level language compilers. We try to synthesize a new DSP processor architecture by adding several DSP-processor-specific features to a RISC core that has a compiler-friendly structure, such as many general-purpose registers and orthogonal instructions. The synthesized digital signal processor supports single-cycle MAC (Multiply-and-ACcumulate), direct memory access, automatic address generation, and hardware looping capabilities in addition to ordinary RISC instructions. The compiler for the new architecture is quickly implemented by developing a code converter that modifies the assembly code generated by the RISC compiler. The performance effects of adding each of these features, as well as all of them combined, are evaluated using seven DSP-kernel benchmarks, a QCELP vocoder, and an MPEG video decoder. The effects of CPU clock frequency change due to the addition of these features are also considered. Finally, we also compare the performance with several existing DSP processors, such as the TMS320C3x, TMS320C54x, and TMS320C5x.
7.
Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications
software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language.
In order to eliminate this bottleneck and to enable the use of high-level language compilers also for embedded software, new
code generation and optimization techniques are required. This paper describes a novel code generation technique for embedded
processors with irregular data path architectures, such as typically found in fixed-point DSPs. The proposed code generation
technique maps data flow graph representation of a program into highly efficient machine code for a target processor modeled
by instruction set behavior. High code quality is ensured by tight coupling of different code generation phases. In contrast
to earlier works, mainly based on heuristics, our approach is constraint-based. An initial set of constraints on code generation
are prescribed by the given processor model. Further constraints arise during code generation based on decisions concerning
code selection, register allocation, and scheduling. Whenever possible, decisions are postponed until sufficient information
about a good decision has been collected. The constraints are active in the "background" and guarantee local satisfiability
at any point in time during code generation. This mechanism makes it possible to cope simultaneously with special-purpose registers
and instruction level parallelism. We describe the detailed integration of code generation phases. The implementation is based
on the constraint logic programming (CLP) language ECLiPSe. For a standard DSP, we show that the quality of generated code
comes close to hand-written assembly code. Since the input processor model can be edited by the user, also retargetability
of the code generation technique is achieved within a certain processor class.
This revised version was published online in July 2006 with corrections to the Cover Date.
8.
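The application background of multi-DSP hardware modules is briefly introduced. The paper mainly describes how to implement an 8-DSP hardware module based on the TMS320VC5416 DSP chip from Texas Instruments (TI). The module consists mainly of multiple DSPs, FLASH program loading, JTAG hardware emulation, and FPGA submodules. The connections between the DSPs and the FPGA, the connections of the FLASH memory to the DSPs and the FPGA, and the JTAG daisy chain used for hardware emulation are discussed in detail. Verification shows that the hardware module operates correctly.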
9.
Increasing mask costs and decreasing feature sizes together with productivity demand have led to the trend of platform design. Software programmable embedded cores are used to provide the necessary flexibility in integrated systems. Facing increasing system complexity, single-issue digital signal processors (DSPs) have been replaced by cores providing the execution of several instructions in parallel. The most common programming model for multi-issue DSP core architectures is Very Long Instruction Word (VLIW), which is based on static scheduling, enables minimization of the worst case execution time, and reduces core complexity. The drawback of traditional VLIW is poor code density, which leads to high program memory requirements and, therefore, requires a large silicon area for the DSP subsystem. To overcome this problem without limiting the core performance, a scalable long instruction word (xLIW) is introduced. A special align unit is used for implementing the xLIW program memory interface. In this paper, the align unit and its main architectural feature, a scalable instruction buffer, are introduced in detail. xLIW is part of a project for a parameterized DSP core.
10.
11.
Stan Liao, Srinivas Devadas, Kurt Keutzer, Steve Tjiang, Albert Wang 《Design Automation for Embedded Systems》1998, 3(1):59-73
We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.
12.
Hen-Geul Yeh 《Industrial Electronics, IEEE Transactions on》1994, 41(1):132-135
The author used two fixed-point TMS320C25 digital signal processors (DSPs) to implement the FFT in parallel. The significance of this multiprocessing system is: (1) the number of block data transfers between the two DSPs is minimal, (2) each DSP can independently perform the same FFT routine on a different data set, and (3) the total computational load is nearly equally distributed between the two DSPs. The speedup of this system over a single sequential processor is close to two.
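One common way to let two processors run the same FFT routine on different data is a radix-2 decimation-in-time split, sketched below. The abstract does not say which decomposition the author used, so this is an assumption, and the function names `fft` and `two_processor_fft` are hypothetical. The two half-size transforms are independent and stand in for the work of the two DSPs; here they simply run one after the other.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_processor_fft(x):
    """Split an N-point FFT into two independent N/2-point FFTs."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    # Same routine, different data: even samples for one (hypothetical)
    # processor, odd samples for the other. Neither needs the other's data.
    half_a = fft(x[0::2])
    half_b = fft(x[1::2])
    # Final butterfly stage, the only point where the two halves meet.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * half_b[k]
        out[k] = half_a[k] + t
        out[k + n // 2] = half_a[k] - t
    return out
```

In this sketch the two halves do the same amount of work and exchange data only at the final combining stage, which is in the spirit of the balanced load and minimal block transfers the abstract reports.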
13.
Ritz S., Pankert M., Zivojinovic V., Meyr H. 《Selected Areas in Communications, IEEE Journal on》1993, 11(3):348-358
A synthesis environment that targets software programmable architectures such as digital signal processors (DSPs) is presented. These processors are well suited for implementation of real-time signal processing systems with medium throughput requirements. Techniques that tightly couple the synthesis environment to an existing communication system simulator are also presented. This enables a seamless transition between the simulation and implementation design level of communication systems. Special focus is on optimization techniques for mapping data flow oriented block diagrams onto DSPs. The combination of different mapping and optimization strategies allows comfortable synthesis of real-time code that is highly adapted to application-specific needs imposed by constraints on memory space, sampling rate, or latency. Thus, tradeoff analysis is supported by efficient interactive or automatic exploration of the design space. All presented concepts are illustrated by the design of a phase synchronizer with automatic gain control on a floating-point DSP.
14.
15.
TMS320F2812 Memory Management and Implementation of the FLASH Boot Mode (total citations: 1; self-citations: 0; citations by others: 1)
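By discussing the memory management of the TMS320F2812 digital signal processor and the writing of linker command files, this paper describes how the FLASH boot mode is implemented when developing TMS320F2812-based systems, and proposes solutions to the problems encountered. The method simplifies the system's software and hardware design and is easy to implement; it has been proven feasible and effective in the design of a test instrument.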
16.
17.
Kandemir M., Vijaykrishnan N., Irwin M.J., Wu Ye 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001, 9(6):801-804
Optimizing for energy constraints is of critical importance due to the proliferation of battery-operated embedded devices. Thus, it is important to explore both hardware and software solutions for optimizing energy. The focus of high-level compiler optimizations has traditionally been on improving performance. In this paper, we present an experimental evaluation of several state-of-the-art high-level compiler optimizations on energy consumption, considering both the processor core (datapath) and memory system. This is in contrast to many of the previous works that have considered them in isolation.
18.
Shuvra S. Bhattacharyya, Praveen K. Murthy, Edward A. Lee 《The Journal of VLSI Signal Processing》1999, 21(2):151-166
The implementation of software for embedded digital signal processing (DSP) applications is an extremely complex process. The complexity arises from escalating functionality in the applications; intense time-to-market pressures; and stringent cost, power and speed constraints. To help cope with such complexity, DSP system designers have increasingly been employing high-level, graphical design environments in which system specification is based on hierarchical dataflow graphs. Consequently, a significant industry has emerged for the development of data-flow-based DSP design environments. Leading products in this industry include SPW from Cadence, COSSAP from Synopsys, ADS from Hewlett Packard, and DSP Station from Mentor Graphics. This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size, and the minimization of the memory required for the buffers that implement the communication channels in the input dataflow graph. These are critical problems because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and cost penalties for using off-chip memory are often prohibitively high for embedded applications. Furthermore, memory demands of applications are increasing at a significantly higher rate than the rate of increase in on-chip memory capacity offered by improved integrated circuit technology.
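The tension between code size and buffer memory mentioned in this abstract can be seen on a single dataflow edge. The sketch below is an illustration, not one of the reviewed algorithms: it assumes fixed token rates per firing (a synchronous-dataflow-style model that the abstract does not spell out), and the actor names `A` and `B`, the rates, and the function `max_buffer_tokens` are hypothetical.

```python
def max_buffer_tokens(schedule, produce, consume, producer="A", consumer="B"):
    """Peak number of tokens on one edge while running a firing schedule.

    schedule: string of actor names, one character per firing.
    produce:  tokens the producer adds per firing (assumed constant).
    consume:  tokens the consumer removes per firing (assumed constant).
    """
    tokens = 0
    peak = 0
    for actor in schedule:
        if actor == producer:
            tokens += produce
        elif actor == consumer:
            if tokens < consume:
                raise ValueError("consumer fired before enough tokens were available")
            tokens -= consume
        peak = max(peak, tokens)
    return peak


# A produces 2 tokens per firing and B consumes 3, so one period needs
# three firings of A and two of B.
grouped = "AAABB"      # each actor appears in one block: compact looped code
interleaved = "AABAB"  # more actor appearances in the code, smaller buffer

peak_grouped = max_buffer_tokens(grouped, produce=2, consume=3)
peak_interleaved = max_buffer_tokens(interleaved, produce=2, consume=3)
```

The grouped schedule can be emitted as two short loops, one per actor, but needs a larger buffer than the interleaved schedule, which repeats actor code in exchange for a lower peak. Comparing `peak_grouped` and `peak_interleaved` makes that trade-off concrete for this one edge.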
19.
Manual Assembly Optimization of C Code in a DSP Environment (total citations: 3; self-citations: 0; citations by others: 3)
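Because of the special architecture of DSP devices, C compilers on this platform are relatively inefficient: the generated assembly code contains a great deal of redundancy and cannot fully exploit the DSP's powerful computing capability. Manual assembly optimization of C programs has therefore become a common method in DSP software development and porting. The TMS320C5410 is a 16-bit fixed-point DSP chip from TI. Drawing on experience in optimizing an implementation of the G.729 speech coding and compression algorithm on this chip, the paper discusses in detail the optimization strategies used during manual assembly optimization and other points to note.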
20.
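Dot-matrix liquid crystal display (LCD) modules offer stable performance and are well suited to portable intelligent instruments; they are relatively low-cost display devices with fairly rich display capabilities. This paper introduces the characteristics and display modes of an LCD module with the built-in T6963C LCD controller. On this basis, it presents the hardware interface circuit between the LCD module and an embedded system based on the TMS320LF2407A DSP (digital signal processor), together with part of the C code. Finally, the display function of the LCD module is implemented in the TMS320LF2407A embedded system, where it forms an important part of the on-site temperature monitoring system. The program and hardware logic diagram can also serve as a reference for other DSP systems.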