1.

A Retargetable Compilation Methodology for Embedded Digital Signal Processors Using a Machine-Dependent Code Optimization Library

Ashok Sudarsanam Sharad Malik Masahiro Fujita 《Design Automation for Embedded Systems》1999,4(2-3):187-206

We address the problem of code generation for embedded DSP systems. Such systems devote a limited quantity of silicon to program memory, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various high-performance constraints. Unfortunately, current compiler technology is unable to generate dense, high-performance code for DSPs because it does not provide adequate support for the specialized architectural features of DSPs via machine-dependent code optimizations. Thus, designers often program the embedded software in assembly, a very time-consuming task. In order to increase productivity, compilers must be developed that are capable of generating high-quality code for DSPs. The compilation process must also be made retargetable, so that a variety of DSPs may be efficiently evaluated for potential use in an embedded system. We present a retargetable compilation methodology that enables high-quality code to be generated for a wide range of DSPs. Previous work in retargetable DSP compilation has focused on complete automation, and this desire for automation has limited the number of machine-dependent optimizations that can be supported. In our efforts, we have given code quality higher priority than complete automation. We demonstrate how, by using a library of machine-dependent optimization routines accessible via a programming interface, it is possible to support a wide range of machine-dependent optimizations, albeit at some cost to automation. Experimental results demonstrate the effectiveness of our methodology, which has been used to build good-quality compilers for three fixed-point DSPs. This revised version was published online in July 2006 with corrections to the Cover Date.

2.

Analysis and Evaluation of Address Arithmetic Capabilities in Custom DSP Architectures

Ashok Sudarsanam Stan Liao Srinivas Devadas 《Design Automation for Embedded Systems》1999,4(1):5-22

We address the problem of code generation for DSP systems on a chip. In such systems, the amount of silicon devoted to program ROM is limited, so in addition to meeting various high-performance constraints, the application software must be sufficiently dense. Unfortunately, existing compiler technology is unable to generate high-quality code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. Thus, designers often resort to programming application software in assembly, which is a very tedious and time-consuming task. In this paper, we focus on providing compiler support for a group of specialized architectural features that exist in many DSPs, namely indirect addressing modes with auto-increment/decrement arithmetic. In these DSPs, an indexed addressing mode is generally not available, so automatic variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective is to provide a method for comprehensively analyzing the performance benefits and hardware cost of an auto-increment/decrement feature that varies from -l to +l and allows access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Thus, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features.
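
The cost model behind this kind of analysis can be illustrated with a small sketch. It is not the authors' algorithm: it only counts, for a single address register, how many accesses in a sequence cannot be reached by a free auto-increment/decrement of at most l words and therefore need an explicit address update. The function name, the variable names, and the two frame layouts are hypothetical.

```python
def explicit_address_updates(access_sequence, layout, max_step=1):
    """Count accesses that need an explicit address-register update.

    access_sequence: variable names in the order they are accessed.
    layout: variable name -> memory offset (hypothetical frame layout).
    max_step: largest auto-increment/decrement distance l that is free.
    Assumption: one address register walks the whole sequence.
    """
    cost = 0
    for prev, curr in zip(access_sequence, access_sequence[1:]):
        distance = abs(layout[curr] - layout[prev])
        if distance > max_step:
            cost += 1  # out of auto-modify range: separate address arithmetic
    return cost


sequence = ["a", "c", "a", "c", "b", "d", "b", "d"]
declaration_order = {"a": 0, "b": 1, "c": 2, "d": 3}
access_aware = {"a": 0, "c": 1, "b": 2, "d": 3}

print(explicit_address_updates(sequence, declaration_order))              # 6
print(explicit_address_updates(sequence, access_aware))                   # 0
print(explicit_address_updates(sequence, declaration_order, max_step=2))  # 0
```

In this toy case a better layout and a wider auto-modify range (l = 2) remove the same six explicit updates, which is the kind of software-versus-hardware trade-off the abstract says the method is meant to expose.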

3.

An Overview of a Compiler for Mapping Software Binaries to Hardware

Mittal G. Zaretsky D. Xiaoyong Tang Banerjee P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(11):1177-1190

As new applications in embedded communications and control systems push the computational limits of digital signal processing (DSP) functions, there will be an increasing need for software applications to be migrated to hardware in the form of a hardware-software codesign system. In many cases, access to the high-level source code may not be available. It is thus desirable to have a technology to translate the software binaries intended for processors to hardware implementations. This paper provides details on the retargetable FREEDOM compiler. The compiler automatically translates DSP software binaries to register-transfer level (RTL) VHDL and Verilog for implementation on field-programmable gate arrays (FPGAs) as standalone or system-on-chip implementations. We describe the underlying optimizations and some novel algorithms for alias analysis, data dependency analysis, memory optimizations, procedure call recovery, and back-end code scheduling. Experimental results on resource usage and performance are shown for several program binaries intended for the Texas Instruments C6211 DSP (VLIW) and the ARM922T reduced instruction set computer (RISC) processors. Implementation results for four kernels from the Simulink demo library and others from commonly used DSP applications, such as MPEG-4, Viterbi, and JPEG, are also discussed. The compiler-generated RTL code is mapped to Xilinx Virtex II and Altera Stratix FPGAs. We record overall performance gains of 1.5-26.9 for the hardware implementations of the kernels. Comparisons with the power aware compiler techniques (PACT) high-level synthesis compiler are used to show that software binaries can be used as intermediate representations from any high-level language and generate efficient hardware implementations.

4.

High VelociTI processing [Texas Instruments VLIW DSP architecture]

《Signal Processing Magazine, IEEE》1998,15(2)

The Texas Instruments VelociTI architecture is a very long instruction word (VLIW) architecture. The TMS320C6x family of digital signal processors (DSPs) is the first to employ the VelociTI architecture, with the TMS320C6201 (C6201) being the first device in this family. The C6201 is based on the fixed-point TMS320C62x (C62x) CPU. This article describes the VelociTI VLIW architecture and discusses the C62x, C67x, C6201, and the VelociTI development tools. An overview of VelociTI, including architectural principles, data path, instruction set, and pipeline operation, is presented, and both the C62x fixed-point CPU and the C67x floating-point CPU are described. A summary of the C62x benchmark performance is also presented. The chip-level support outside the CPU that allows the C6201 to operate in a variety of high-performance DSP environments is also described. An overview of the C6x development environment is also given, demonstrating the breadth of the development environment and illustrating the programming methodology. The article concludes with a performance analysis of the C compiler.

5.

Translation of Data Flow Graphs into Optimized Assembly Programs for Signal Processors

W. Kreuzer S. Fröhlich M. Gotschlich A. Helm B. Wess 《e & i Elektrotechnik und Informationstechnik》1998,115(1):41-47

The synthesis of efficient programs for digital signal processors with non-homogeneous register sets is still a challenge in compiler design. In this paper, we introduce the concept of a data flow graph compiler for digital signal processors. In a first step, the data flow graph is decomposed into constrained expression trees and represented by trellis trees, which allows a straight-line code generation algorithm to be applied whose complexity depends only linearly on the size of the graph. Registers are assigned by taking into account the constraints of multi-function instructions. The execution time of the resulting assembly code is minimized by exploiting instruction-level parallelism and memory layout optimizations.

6.

A Compiler-Friendly RISC-Based Digital Signal Processor Synthesis and Performance Evaluation

Jiyang Kang Jongbok Lee Wonyong Sung 《The Journal of VLSI Signal Processing》2001,27(3):297-312

As DSP (Digital Signal Processing) applications become more complex, there is also a growing need for new architectures supporting efficient high-level language compilers. We try to synthesize a new DSP processor architecture by adding several DSP-specific features to a RISC core that has a compiler-friendly structure, such as many general-purpose registers and orthogonal instructions. The synthesized digital signal processor supports single-cycle MAC (Multiply-and-ACcumulate), direct memory access, automatic address generation, and hardware looping capabilities in addition to ordinary RISC instructions. The compiler for the new architecture is quickly implemented by developing a code converter that modifies the assembly code generated by the RISC compiler. The performance effects of adding each of these features, as well as all of them combined, are evaluated using seven DSP-kernel benchmarks, a QCELP vocoder, and an MPEG video decoder. The effects of CPU clock frequency changes due to the addition of these features are also considered. Finally, we also compare the performance with that of several existing DSP processors, such as the TMS320C3x, TMS320C54x, and TMS320C5x.

7.

Phase-Coupled Mapping of Data Flow Graphs to Irregular Data Paths

Steven Bashford Rainer Leupers 《Design Automation for Embedded Systems》1999,4(2-3):119-165

Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language. In order to eliminate this bottleneck and to enable the use of high-level language compilers for embedded software as well, new code generation and optimization techniques are required. This paper describes a novel code generation technique for embedded processors with irregular data path architectures, such as those typically found in fixed-point DSPs. The proposed code generation technique maps a data flow graph representation of a program into highly efficient machine code for a target processor modeled by instruction set behavior. High code quality is ensured by tight coupling of different code generation phases. In contrast to earlier works, mainly based on heuristics, our approach is constraint-based. An initial set of constraints on code generation is prescribed by the given processor model. Further constraints arise during code generation based on decisions concerning code selection, register allocation, and scheduling. Whenever possible, decisions are postponed until sufficient information about a good decision has been collected. The constraints are active in the "background" and guarantee local satisfiability at any point of time during code generation. This mechanism makes it possible to cope simultaneously with special-purpose registers and instruction-level parallelism. We describe the detailed integration of code generation phases. The implementation is based on the constraint logic programming (CLP) language ECLiPSe. For a standard DSP, we show that the quality of generated code comes close to hand-written assembly code. Since the input processor model can be edited by the user, retargetability of the code generation technique is also achieved within a certain processor class.
This revised version was published online in July 2006 with corrections to the Cover Date.

8.

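An Implementation Method for a Common Multi-DSP Hardware Module

沈会敏 《无线电工程》2007,37(6):57-59

The application background of multi-DSP hardware modules is briefly introduced. The paper mainly describes a method for implementing an 8-DSP hardware module based on the TMS320VC5416 DSP chip from Texas Instruments (TI). The module consists mainly of several DSP chips, FLASH program loading, JTAG hardware emulation, and FPGA submodules. The connections between the DSPs and the FPGA, the connections between the FLASH memory and the DSPs and FPGA, and the JTAG daisy chain used for hardware emulation are discussed in detail. Verification shows that the hardware module operates correctly.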

9.

A scalable instruction buffer and align unit for xDSPcore

Panis C. Grunbacher H. Nurmi J. 《Solid-State Circuits, IEEE Journal of》2004,39(7):1094-1100

Increasing mask costs and decreasing feature sizes together with productivity demand have led to the trend of platform design. Software programmable embedded cores are used to provide the necessary flexibility in integrated systems. Facing increasing system complexity, single-issue digital signal processors (DSPs) have been replaced by cores providing the execution of several instructions in parallel. The most common programming model for multi-issue DSP core architectures is Very Long Instruction Word (VLIW), which is based on static scheduling, enables minimization of the worst-case execution time, and reduces core complexity. The drawback of traditional VLIW is poor code density, which leads to high program memory requirements and, therefore, requires a large silicon area for the DSP subsystem. To overcome this problem without limiting the core performance, a scalable long instruction word (xLIW) is introduced. A special align unit is used for implementing the xLIW program memory interface. In this paper, the align unit and its main architectural feature, a scalable instruction buffer, are described in detail. xLIW is part of a project for a parameterized DSP core.

10.

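Implementation of a High-Quality 0.6 kb/s Vocoder on the TMS320VC55x  Total citations: 1, self-citations: 0, citations by others: 1

田秋玲 崔慧娟 唐昆 《电声技术》2005,(8):50-53

A high-quality vocoder algorithm with a coding rate of 600 b/s and its hardware implementation on a DSP chip are presented. The paper introduces the principles of the speech coding and decoding algorithm, the hardware structure and workflow of the vocoder system, and the software implementation and code optimization. Exploiting the structural characteristics of the C55x DSP chip, methods such as mixed C and assembly programming and assembly instruction optimization greatly reduce the storage and computational complexity of the algorithm and meet the real-time requirement.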

11.

Code Optimization Techniques in Embedded DSP Microprocessors

Stan Liao Srinivas Devadas Kurt Keutzer Steve Tjiang Albert Wang 《Design Automation for Embedded Systems》1998,3(1):59-73

We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.

12.

Parallel implementation of the fast Fourier transform on two TMS320C25 digital signal processors

Hen-Geul Yeh 《Industrial Electronics, IEEE Transactions on》1994,41(1):132-135

The author used two fixed-point TMS320C25 digital signal processors (DSPs) to implement the FFT in parallel. The significance of this multiprocessing system is: (1) the number of block data transfers between the two DSPs is minimal, (2) each DSP can independently perform the same FFT routine on a different data set, and (3) the total computational load is nearly equally distributed between the two DSPs. The speedup of this system over a single sequential processor is close to two.
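
One way such a two-processor split can work is sketched below. The abstract does not say how the paper partitions the data, so the decimation-in-time split used here is an assumption, as are the function names. The two half-size transforms are independent and use the same routine on different data, and a single combining stage follows. The sketch runs sequentially; it only shows how the work divides.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_processor_fft(x):
    """Split one FFT into two independent half-size FFTs plus one combining stage."""
    n = len(x)
    # Same routine, different data set: each call could run on its own DSP.
    half_a = fft(x[0::2])  # even-indexed samples, "processor A"
    half_b = fft(x[1::2])  # odd-indexed samples, "processor B"
    # After one exchange of results, a single butterfly stage finishes the transform.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * half_b[k]
        out[k] = half_a[k] + t
        out[k + n // 2] = half_a[k] - t
    return out
```

Because each half-size FFT does nearly half of the arithmetic and the halves only meet in the last stage, a split of this shape is consistent with the balanced load and low transfer count that the abstract reports.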

13.

High-level software synthesis for the design of communication systems

Ritz S. Pankert M. Zivojinovic V. Meyr H. 《Selected Areas in Communications, IEEE Journal on》1993,11(3):348-358

A synthesis environment that targets software programmable architectures such as digital signal processors (DSPs) is presented. These processors are well suited for implementation of real-time signal processing systems with medium throughput requirements. Techniques that tightly couple the synthesis environment to an existing communication system simulator are also presented. This enables a seamless transition between the simulation and implementation design levels of communication systems. Special focus is on optimization techniques for mapping data flow oriented block diagrams onto DSPs. The combination of different mapping and optimization strategies allows convenient synthesis of real-time code that is highly adapted to application-specific needs imposed by constraints on memory space, sampling rate, or latency. Thus, tradeoff analysis is supported by efficient interactive or automatic exploration of the design space. All presented concepts are illustrated by the design of a phase synchronizer with automatic gain control on a floating-point DSP.

14.

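Design and Optimization of an AVS Video Encoder on a DSP Platform

张妍 薛永林 赵康 《电视技术》2008,32(12)

The algorithm design and optimization methods for implementing an AVS video encoder on the TMS320DM6446 DSP platform are presented. Building on optimization of the overall software design, the work focuses on improving algorithms such as motion estimation, and it also gives structural optimizations tailored to the platform, mainly improving code parallelism and optimizing memory use and data movement. Test results show that, with only a small loss in image quality, the optimizations significantly increase the encoder's encoding speed.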

15.

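TMS320F2812 Memory Management and Implementation of the FLASH Boot Mode  Total citations: 1, self-citations: 0, citations by others: 1

奚刚 吴清华 《现代电子技术》2007,30(22):58-60

Through a discussion of memory management on the TMS320F2812 digital signal processor and the writing of linker command files, the paper describes how the FLASH boot mode is implemented in TMS320F2812-based system development and proposes solutions to the problems encountered. The method simplifies the software and hardware design of the system and is easy to implement; it has been shown to be feasible and effective in the design of a test instrument.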

16.

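DSP Implementation and Optimization of the G.718 Wideband Speech Codec

申星海 陈德宏 王春柳 《电声技术》2016,40(1)

G.718 is an embedded variable-bit-rate wideband speech and audio coding standard recently proposed by ITU-T. The algorithm classifies the speech signal before encoding it, which greatly increases algorithmic complexity but achieves excellent speech quality in both narrowband and wideband operation. Based on an analysis of the algorithm's principles and key techniques, and taking into account the characteristics of the TMS320C55x DSP platform and of the G.718 algorithm, a suitable assembly optimization scheme is proposed, and a real-time wideband speech codec is implemented on the TMS320C5505EVM. Experimental tests show that the speech quality of the G.718 algorithm is better than that of wideband speech codecs using other algorithms of the same type.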

17.

Influence of compiler optimizations on system power

Kandemir M. Vijaykrishnan N. Irwin M.J. Wu Ye 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(6):801-804

Optimizing for energy constraints is of critical importance due to the proliferation of battery-operated embedded devices. Thus, it is important to explore both hardware and software solutions for optimizing energy. The focus of high-level compiler optimizations has traditionally been on improving performance. In this paper, we present an experimental evaluation of several state-of-the-art high-level compiler optimizations on energy consumption, considering both the processor core (datapath) and memory system. This is in contrast to many of the previous works that have considered them in isolation.

18.

Synthesis of Embedded Software from Synchronous Dataflow Specifications  Total citations: 3, self-citations: 0, citations by others: 3

Shuvra S. Bhattacharyya Praveen K. Murthy Edward A. Lee 《The Journal of VLSI Signal Processing》1999,21(2):151-166

The implementation of software for embedded digital signal processing (DSP) applications is an extremely complex process. The complexity arises from escalating functionality in the applications; intense time-to-market pressures; and stringent cost, power and speed constraints. To help cope with such complexity, DSP system designers have increasingly been employing high-level, graphical design environments in which system specification is based on hierarchical dataflow graphs. Consequently, a significant industry has emerged for the development of data-flow-based DSP design environments. Leading products in this industry include SPW from Cadence, COSSAP from Synopsys, ADS from Hewlett Packard, and DSP Station from Mentor Graphics. This paper reviews a set of algorithms for compiling dataflow programs for embedded DSP applications into efficient implementations on programmable digital signal processors. The algorithms focus primarily on the minimization of code size, and the minimization of the memory required for the buffers that implement the communication channels in the input dataflow graph. These are critical problems because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and cost penalties for using off-chip memory are often prohibitively high for embedded applications. Furthermore, memory demands of applications are increasing at a significantly higher rate than the rate of increase in on-chip memory capacity offered by improved integrated circuit technology.
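
The abstract does not spell out the algorithms it reviews. As background, the sketch below shows the usual first step in compiling a synchronous dataflow graph: solving the balance equations for the smallest number of firings of each actor per schedule period. Code-size and buffer-size decisions are then made over schedules with those firing counts. The function name, the edge tuple format, and the three-actor example are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Solve the balance equations q[src] * produced == q[dst] * consumed.

    edges: list of (src, dst, produced, consumed) tuples.
    Assumes a connected graph; returns None if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    progress = True
    while pending and progress:
        progress = False
        for edge in list(pending):
            src, dst, produced, consumed = edge
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
            elif src in rate and dst in rate:
                if rate[src] * produced != rate[dst] * consumed:
                    return None  # sample rates cannot be balanced
            else:
                continue  # neither end reached yet; retry on a later pass
            pending.remove(edge)
            progress = True
    # Scale the rational solution to the smallest positive integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# A produces 2 tokens per firing and B consumes 3; B produces 1 and C consumes 2.
graph = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetitions_vector(["A", "B", "C"], graph))  # {'A': 3, 'B': 2, 'C': 1}
```

With these counts, a flat schedule such as A A A B B C needs a 6-token buffer on the first edge, while interleaving firings (A A B A B C) needs only 4, which illustrates why schedule choice drives the buffer memory the abstract is concerned with.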

19.

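Manual Assembly Optimization of C Code in a DSP Environment  Total citations: 3, self-citations: 0, citations by others: 3

刘浩然 王冠 《现代电子技术》2006,29(12):92-95

Because of the special architecture of DSP devices, C compilers on these platforms are relatively inefficient: the assembly code they generate contains a great deal of redundancy and cannot fully exploit the DSP's computing power. Manual assembly optimization of C programs has therefore become a common practice in DSP software development and porting. The TMS320C5410 is a 16-bit fixed-point DSP chip from TI. Drawing on experience in optimizing an implementation of the G.729 speech coding and compression algorithm on this chip, the paper discusses in detail the optimization strategies used during manual assembly optimization and other points to note.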

20.

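Interface Application of the T6963C Built-in LCD Controller with a DSP

何轩 徐国旺 阎旭东 占林松 《电子工程师》2007,33(4):41-43

Dot-matrix LCD modules offer stable performance and are well suited to portable intelligent instruments; they are relatively low-cost display devices with comparatively rich display capabilities. This paper introduces the characteristics and display modes of an LCD module with a built-in T6963C LCD controller. On this basis, it presents the hardware interface circuit between the LCD module and an embedded system based on the TMS320LF2407A DSP (digital signal processor), along with part of the C code. Finally, the display function of the LCD module is implemented in the TMS320LF2407A embedded system, where it forms an important part of an on-site temperature monitoring system. The program and hardware logic diagram can also serve as a reference for other DSP systems.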