Similar Documents
20 similar documents found.
1.
We address the problem of code generation for embedded DSP systems. In such systems, it is typical for one or more digital signal processors (DSPs), program memory, and custom circuitry to be integrated onto a single IC. Consequently, the amount of silicon area that is dedicated to program memory is limited, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various high-performance constraints, which may include hard real-time constraints. Unfortunately, existing compiler technology is unable to generate dense, high-performance code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. These specialized features not only allow for the fast execution of common DSP operations, but they also allow for the generation of dense assembly code that specifies these operations. Thus, system designers often hand-program the embedded software in assembly, which is a very time-consuming task. In this paper, we focus on providing compiler support for one particular specialized architectural feature, namely the paged absolute addressing mode – this feature is found in two commercial DSPs, the Texas Instruments' TMS320C25 and TMS320C50 fixed-point DSPs; however, it may also be featured in application-specific processors (ASIPs). We present some machine-dependent code optimizations that improve code density by exploiting this architectural feature. Experimental results demonstrate that for a set of typical DSP benchmarks, some of our optimizations reduce overall code size and data memory consumption by an average of 5.0% and 16.0%, respectively. Our experimental vehicle throughout this research is the TMS320C25.
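As a rough illustration of the kind of optimization described above: the Python sketch below is my own toy model, not the paper's algorithm; only the 128-word page size is taken from the TMS320C25's DP-relative direct addressing, and all function names are hypothetical. It packs variables that are accessed close together onto the same page so that fewer page-pointer loads remain.

```python
# Hypothetical sketch: group variables into 128-word pages so that
# consecutive accesses stay on the same page and redundant page-pointer
# loads (LDPK-style instructions on the TMS320C25) can be removed.

PAGE_WORDS = 128  # size of one direct-addressing page on the C25

def page_loads(access_seq, page_of):
    """Count page-pointer loads needed for a straight-line access sequence."""
    loads, current = 0, None
    for var in access_seq:
        if page_of[var] != current:
            loads += 1              # a page switch is required here
            current = page_of[var]
    return loads

def assign_pages(access_seq, size_of):
    """Greedy packing: keep each new variable on the page of the variable
    accessed just before it, as long as that page still has room."""
    page_of, used = {}, {}          # var -> page, page -> words used
    next_page, prev = 0, None
    for var in access_seq:
        if var not in page_of:
            target = page_of[prev] if prev in page_of else next_page
            if used.get(target, 0) + size_of[var] > PAGE_WORDS:
                next_page += 1      # open a fresh page
                target = next_page
            page_of[var] = target
            used[target] = used.get(target, 0) + size_of[var]
            next_page = max(next_page, target)
        prev = var
    return page_of

if __name__ == "__main__":
    seq = ["x", "y", "x", "z", "y", "x"]
    sizes = {"x": 1, "y": 1, "z": 1}
    naive = {"x": 0, "y": 1, "z": 2}        # one variable per page
    packed = assign_pages(seq, sizes)       # all three fit on one page
    print(page_loads(seq, naive), "->", page_loads(seq, packed))  # 6 -> 1
```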

2.
We address the problem of code generation for DSP systems on a chip. In such systems, the amount of silicon devoted to program ROM is limited, so in addition to meeting various high-performance constraints, the application software must be sufficiently dense. Unfortunately, existing compiler technology is unable to generate high-quality code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. Thus, designers often resort to programming application software in assembly, which is a very tedious and time-consuming task. In this paper, we focus on providing compiler support for a group of specialized architectural features that exist in many DSPs, namely indirect addressing modes with auto-increment/decrement arithmetic. In these DSPs, an indexed addressing mode is generally not available, so automatic variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective is to provide a method for comprehensively analyzing the performance benefits and hardware cost of an auto-increment/decrement feature that varies from −l to +l and allows access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Thus, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features.
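To make the ±l / k trade-off concrete, here is a minimal cost-model sketch. It is an illustration only, assuming a single address register, a fixed variable layout, and unit-cost address loads — the paper's parameterizable algorithm is far more general.

```python
# Illustrative cost model (not the paper's algorithm): with one address
# register, an access is "free" if the distance from the previous access
# fits the auto-increment/decrement range [-l, +l]; otherwise an explicit
# address-register load must be inserted.

def address_load_count(access_seq, offset_of, l):
    loads, addr = 0, None
    for var in access_seq:
        target = offset_of[var]
        if addr is None or abs(target - addr) > l:
            loads += 1                 # explicit load of the address register
        addr = target                  # auto-increment/decrement covers the rest
    return loads

if __name__ == "__main__":
    seq = ["a", "b", "c", "b", "a", "d", "c"]
    layout = {"a": 0, "b": 1, "c": 2, "d": 3}   # one possible stack layout
    for l in (0, 1, 2, 3):
        print(f"l = {l}: {address_load_count(seq, layout, l)} address loads")
```

Running the example shows the diminishing return as l grows (7, 2, 2, 1 loads), which is exactly the kind of benefit/cost curve a designer would weigh against the hardware cost of a wider auto-modify range.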

3.
Many software compilers for embedded processors produce machine code of insufficient quality. Since for most applications the software must meet tight code speed and size constraints, embedded software is still largely developed in assembly language. In order to eliminate this bottleneck and to enable the use of high-level language compilers for embedded software as well, new code generation and optimization techniques are required. This paper describes a novel code generation technique for embedded processors with irregular data path architectures, such as those typically found in fixed-point DSPs. The proposed code generation technique maps a data flow graph representation of a program into highly efficient machine code for a target processor modeled by its instruction set behavior. High code quality is ensured by tight coupling of the different code generation phases. In contrast to earlier works, which are mainly based on heuristics, our approach is constraint-based. An initial set of constraints on code generation is prescribed by the given processor model. Further constraints arise during code generation based on decisions concerning code selection, register allocation, and scheduling. Whenever possible, decisions are postponed until sufficient information about a good decision has been collected. The constraints are active in the "background" and guarantee local satisfiability at any point of time during code generation. This mechanism makes it possible to cope simultaneously with special-purpose registers and instruction-level parallelism. We describe the detailed integration of the code generation phases. The implementation is based on the constraint logic programming (CLP) language ECLiPSe. For a standard DSP, we show that the quality of the generated code comes close to that of hand-written assembly code. Since the input processor model can be edited by the user, retargetability of the code generation technique is also achieved within a certain processor class.
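The "constraints active in the background, decisions postponed" idea can be pictured with a toy propagation loop. The sketch below is my own simplification and does not reflect the paper's actual CLP/ECLiPSe model: each value keeps a domain of admissible register classes, def-use edges force producer and consumer to agree, and a binding is committed only when a domain collapses to a single alternative.

```python
# Toy constraint propagation for register-class binding on an irregular
# data path.  Each value keeps a domain of admissible register classes;
# def-use edges require producer and consumer to agree.  A decision is
# only "committed" when a domain shrinks to one element.

def propagate(domains, edges):
    changed = True
    while changed:
        changed = False
        for producer, consumer in edges:
            common = domains[producer] & domains[consumer]
            if not common:
                raise ValueError(f"unsatisfiable: {producer} -> {consumer}")
            for node in (producer, consumer):
                if domains[node] != common:
                    domains[node] = common
                    changed = True
    return domains

if __name__ == "__main__":
    # Hypothetical register classes: accumulator 'A', pointer 'P', general 'R'.
    domains = {
        "load":  {"A", "R"},       # load may target accumulator or general reg
        "mul":   {"A"},            # multiplier writes the accumulator only
        "store": {"A", "R", "P"},  # store can read from any class
    }
    edges = [("load", "mul"), ("mul", "store")]
    print(propagate(domains, edges))
    # -> every domain collapses to {'A'}: the binding is forced by the
    #    multiplier constraint, without an early heuristic decision.
```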

4.
As new applications in embedded communications and control systems push the computational limits of digital signal processing (DSP) functions, there will be an increasing need for software applications to be migrated to hardware in the form of a hardware-software codesign system. In many cases, access to the high-level source code may not be available. It is thus desirable to have a technology to translate software binaries intended for processors into hardware implementations. This paper provides details on the retargetable FREEDOM compiler. The compiler automatically translates DSP software binaries to register-transfer level (RTL) VHDL and Verilog for implementation on field-programmable gate arrays (FPGAs) as standalone or system-on-chip implementations. We describe the underlying optimizations and some novel algorithms for alias analysis, data dependency analysis, memory optimizations, procedure call recovery, and back-end code scheduling. Experimental results on resource usage and performance are shown for several program binaries intended for the Texas Instruments C6211 DSP (VLIW) and the ARM922T reduced instruction set computer (RISC) processors. Implementation results for four kernels from the Simulink demo library and others from commonly used DSP applications, such as MPEG-4, Viterbi, and JPEG, are also discussed. The compiler-generated RTL code is mapped to Xilinx Virtex II and Altera Stratix FPGAs. We record overall performance gains of 1.5–26.9× for the hardware implementations of the kernels. Comparisons with the Power Aware Compiler Techniques (PACT) high-level synthesis compiler are used to show that software binaries can be used as intermediate representations from any high-level language to generate efficient hardware implementations.

5.
High-Performance Code Generation Techniques for VLIW Architectures
DSP processors achieve high performance by adopting VLIW architectures, but this also makes it harder for a compiler to generate assembly code for them. The code generator, as the code-generation component of the compiler, is the key to realizing the performance of a VLIW architecture. This paper therefore proposes and implements a code generator based on a retargetable compilation framework. The code generator fully exploits the architectural characteristics of VLIW, supports SIMD instructions and predicated execution, and can generate highly instruction-level-parallel assembly code, significantly improving the execution performance of applications.
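As a rough picture of how independent operations are packed into VLIW issue slots, the following list-scheduling sketch may help. It is a generic illustration (unit latencies, a single issue width, no SIMD or predication) and is unrelated to the actual code generator described above.

```python
# Minimal list-scheduling sketch for a VLIW target: pack mutually
# independent operations into bundles of at most ISSUE_WIDTH slots,
# assuming unit latency for every operation.

ISSUE_WIDTH = 4

def schedule(ops, deps):
    """ops: list of op names; deps: dict op -> set of ops it depends on."""
    done, bundles = set(), []
    remaining = list(ops)
    while remaining:
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependences")
        bundle = ready[:ISSUE_WIDTH]          # fill up to the issue width
        bundles.append(bundle)
        done |= set(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles

if __name__ == "__main__":
    ops = ["ld1", "ld2", "mul", "add", "st"]
    deps = {"mul": {"ld1", "ld2"}, "add": {"mul"}, "st": {"add"}}
    for cycle, bundle in enumerate(schedule(ops, deps)):
        print(cycle, "||".join(bundle))   # ld1||ld2 issue together in cycle 0
```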

6.
7.
Designing compilers for Explicitly Parallel Instruction Computing (EPIC) architectures presents challenges substantially different from those encountered in designing compilers for traditional sequential architectures. These challenges are addressed not only by employing new optimizations that are specific to EPIC, but also by employing new ways to architect compilers. EPIC architectures provide features that allow compilers to take a proactive role in exploiting instruction-level parallelism. Compiler technology is intimately intertwined with the target processor architecture, and compiler architects must solve new analysis and optimization problems to achieve the highest levels of performance. When complex optimizations are uniformly applied to large applications, the resulting slow compile speeds are unacceptable. Demanding requirements to produce high-quality code at high compile speed shape the fundamental structure of EPIC compilers.

8.
Due to ever-growing consumer demand for more complex applications, embedded systems have evolved in terms of both complexity and heterogeneity. The architecture of such systems often includes several kinds of computing resources (DSPs, GPUs, etc.). As a consequence, software designers face significant performance and portability issues when targeting these devices. Software relies more and more on virtualization technologies to maximize the portability of applications. In order to balance portability and performance, most virtualization technologies leverage just-in-time (JIT) compilation to produce runtime-optimized code from portable code. Nevertheless, the efficiency of JIT compilation depends on its ability to compensate for its overhead with execution speedups of the generated code. While most research efforts focus on limiting the overhead of JIT compilation phases by reducing their occurrences, this paper investigates opportunities for speeding up JIT compilation itself. We first present a performance analysis of different JIT compilation technologies in order to identify hardware and software optimization opportunities. Second, we propose a solution based on a dedicated processor with specialized instructions for critical functions of JIT compilers. These specialized instructions provide an average 5× speedup on manipulations of associative arrays and on dynamic memory allocation. Based on the LLVM framework, we show a 15% overall speedup of the code generator's execution time. Because our specialized instructions are hidden behind standard libraries, we also argue that they may be transparently reused for a wider range of applications.

9.
10.
Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focused on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieving it. We propose the use of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe the input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which demonstrates the feasibility of our approach with respect to retargetability. We discuss the capabilities and limitations of MSSQ, and identify possible areas of improvement.
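The flavor of a structural, netlist-based processor model can be sketched very roughly as follows. This is purely illustrative — MSSQ's RT-level models and pattern matching are far richer — and every module and port name here is hypothetical: a register transfer is accepted only if its operands can be routed to a functional unit's inputs and its result can be routed back to the destination.

```python
# Toy structural processor model in the spirit of RT-level netlists:
# modules plus their port-to-port connections.  Code generation amounts
# to checking that a register transfer can be routed through the netlist.

connections = {
    # destination port       <- source ports that can drive it
    ("alu", "in_a"):   {("regfile", "out"), ("acc", "out")},
    ("alu", "in_b"):   {("regfile", "out"), ("mem", "out")},
    ("acc", "in"):     {("alu", "out")},
    ("regfile", "in"): {("alu", "out"), ("mem", "out")},
}

def reaches(src_module, dst_port):
    return any(m == src_module for (m, _) in connections.get(dst_port, set()))

def implementable(dest, src1, src2):
    """Can 'dest <- src1 OP src2' be routed through the ALU?"""
    operands_ok = ((reaches(src1, ("alu", "in_a")) and reaches(src2, ("alu", "in_b")))
                   or (reaches(src2, ("alu", "in_a")) and reaches(src1, ("alu", "in_b"))))
    result_ok = reaches("alu", (dest, "in"))
    return operands_ok and result_ok

if __name__ == "__main__":
    print(implementable("acc", "regfile", "mem"))   # True: reg OP mem -> acc
    print(implementable("acc", "mem", "mem"))       # False: mem cannot drive in_a
```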

11.
To meet strict speed and power requirements for embedded applications, many high-end digital signal processors (DSPs) commonly employ non-orthogonal architectures that are typically characterized by irregular data paths, heterogeneous registers, and multiple memory banks. Obviously, to harvest the benefits provided by such non-orthogonal architectures, sufficient compiler support is necessary and important. However, the complexity of such architectures presents a great challenge to compiler design, and the usual compilation techniques for general-purpose CPUs do not adapt well to the irregularity of DSPs. The entire code generation process must include the following phases: intermediate representation, code compaction, instruction scheduling, memory bank assignment (or variable partitioning), and register/accumulator assignment. Much related research considers only some of these phases, which is inadequate. In this paper, we present an effective code generation algorithm named Rotation Scheduling with Spill Codes Predicting (RSSP) to maximally exploit the benefits of non-orthogonal architectures. It contains six parts that cover almost the entire code generation process. As well as introducing the detailed principles and algorithms of the proposed RSSP, we use an analytic model to evaluate its preliminary performance. Evaluation results clearly demonstrate the effectiveness of the proposed method. Furthermore, we also present some preliminary ideas for generalizing RSSP, which can make it more practicable and suitable for various DSPs with similar architectural features.
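One of the phases listed above, memory bank assignment, can be sketched in isolation. The fragment below is a generic greedy partitioning illustration, not the RSSP phase itself: variables fetched together by one instruction are preferably placed in different banks so that both operands can be read in parallel.

```python
# Greedy two-bank variable partitioning: variables accessed in the same
# instruction want to sit in different banks (X/Y-style dual memories)
# so both operands can be fetched in parallel.

from collections import defaultdict

def partition(pairs):
    """pairs: list of (u, v) variable pairs accessed simultaneously."""
    weight = defaultdict(int)
    for u, v in pairs:
        weight[frozenset((u, v))] += 1
    bank = {}
    # Place variables one at a time into the bank that separates them from
    # as many already-placed partners as possible (greedy max-cut).
    variables = sorted({x for p in pairs for x in p})
    for var in variables:
        gain = {0: 0, 1: 0}
        for other, b in bank.items():
            w = weight.get(frozenset((var, other)), 0)
            gain[1 - b] += w          # putting var opposite 'other' wins w
        bank[var] = 0 if gain[0] >= gain[1] else 1
    return bank

if __name__ == "__main__":
    # Each pair below is fetched by one instruction in the same cycle.
    pairs = [("a", "b"), ("a", "b"), ("c", "d"), ("b", "c")]
    print(partition(pairs))   # a, c end up in bank 0 and b, d in bank 1
```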

12.
DSP Real-Time Operating Systems and Their Applications
Zhang Xin, 现代电子技术 (Modern Electronics Technique), 2004, 27(1): 4-6, 10
A new generation of DSP systems, in which a DSP processor alone takes on tasks that previously required both a microcontroller and a DSP processor, has begun to become mainstream in embedded DSP system development. Moreover, in order to exploit the ever-increasing performance of DSP processors effectively, a single DSP is now often used to handle multiple concurrent tasks. In systems with multiple tasks or multiple DSP processors, adopting a real-time operating system can effectively reduce development difficulty and improve system reliability and upgradability. This paper studies a DSP real-time operating system kernel based on VDK, describes in detail the multithreaded scheduling mechanism adopted by the kernel, and, taking a multitask application system as an example, implements the creation and removal of threads as well as the priority ordering and scheduling policy among multiple threads, and explains how the corresponding API functions are used.

13.
Retargetable pipeline hazard detection for partially bypassed processors
Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has a significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing presents challenges for compilation. Traditional data hazard detection and avoidance techniques used in retargetable compilers, which assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs), which can be used to accurately detect data hazards even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework and can therefore be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor, running embedded applications from the MiBench suite, demonstrate that accurate pipeline hazard detection by OTs can result in up to 20% performance improvement over the best-performing GCC-generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale.
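The operation-table idea is close in spirit to classic reservation tables, and a stripped-down version is easy to sketch. The fragment below is only schematic — the paper's OTs also model operand reads, writes, and bypass availability, which this toy resource table does not: each operation lists the pipeline resources it holds in each cycle after issue, and an issue cycle is accepted only if no resource is claimed twice.

```python
# Stripped-down hazard detection with per-operation resource tables.
# Real operation tables (OTs) also describe register reads/writes and
# bypasses; this sketch only checks structural conflicts.

def conflicts(usage, table, issue_cycle):
    """usage: dict cycle -> set of busy resources (the partial schedule).
    table: list of resource sets, one per cycle relative to issue."""
    for offset, resources in enumerate(table):
        if usage.get(issue_cycle + offset, set()) & resources:
            return True
    return False

def place(usage, table, earliest):
    """Return the first legal issue cycle at or after 'earliest'."""
    cycle = earliest
    while conflicts(usage, table, cycle):
        cycle += 1
    for offset, resources in enumerate(table):
        usage.setdefault(cycle + offset, set()).update(resources)
    return cycle

if __name__ == "__main__":
    # Hypothetical per-cycle resource usage of two operation kinds.
    MUL = [{"decode"}, {"mul_unit"}, {"mul_unit"}, {"writeback"}]
    ADD = [{"decode"}, {"alu"}, {"writeback"}]
    usage = {}
    print(place(usage, MUL, 0))   # 0
    print(place(usage, MUL, 1))   # 2: delayed because mul_unit is still busy
    print(place(usage, ADD, 2))   # 4: avoids the writeback clash at cycle 5
```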

14.
15.
The increasing use of embedded software, often implemented on a core processor in a single-chip system, is a clear trend in the telecommunications, multimedia, and consumer electronics industries. A companion paper (Paulin et al., 1997) presents a survey of application and architecture trends for embedded systems in these growth markets. However, the lack of suitable design technology remains a significant obstacle in the development of such systems. One of the key requirements is more efficient software compilation technology. Especially in the case of fixed-point digital signal processor (DSP) cores, it is often cited that commercially available compilers are unable to take full advantage of the architectural features of the processor. Moreover, due to the shorter lifetimes and the architectural specialization of many processor cores, processor designers are often compelled to neglect the issue of compiler support. This situation has resulted in increased research activity in the area of design tool support for embedded processors. This paper discusses design technology issues for embedded systems using processor cores, with a focus on software compilation tools. Architectural characteristics of contemporary processor cores are reviewed and tool requirements are formulated. This is followed by a comprehensive survey of both existing and new software compilation techniques that are considered important in the context of embedded processors.

16.
An essential component of today's embedded systems is an instruction-set processor running real-time software. All variations of these core components contain at least minimal data-flow processing capabilities, while a certain class contains specialized units for highly data-intensive digital signal processing (DSP) operations. For the required level of memory interaction, a parallel-executing address calculation unit (ACU) is often used to tune the architecture to the memory access characteristics of the application. The design of the ACU is performance critical. In today's typical design flow, this design task is somewhat driven by intuition, as the transformation from application algorithm to architecture is complex and the exploration space is immense. Automatic utilities to aid the designer are essential; however, the key compilation techniques that map high-level language constructs onto addressing units have lagged far behind the emergence of these units. This paper presents a new retargetable approach, and a prototype tool, for the analysis of array references and traversals for efficient use of ACUs. In addition to being an enhancement to existing compiler systems, the ArrSyn utility may be used as an aid to architecture exploration. A simple specification of the addressing resources and basic operations drives the available transformations and allows the designer to quickly evaluate the effects on the speed and code size of his or her algorithm. Thus, the designer can tune the design of the ACU toward the application constraints. ArrSyn has been successfully used together with a C compiler developed for a VLIW architecture for an MPEG audio decoding application. The combination of these methods with the C compiler showed, on average, a 39% speedup and a 29% code size reduction for a representative set of DSP benchmarks.
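The kind of array-traversal analysis involved can be pictured with a tiny affine-stride example. The sketch below is a generic illustration, not ArrSyn itself, and the ACU description in it is hypothetical: for a reference A[a*i + b] inside a loop with step s, the per-iteration address increment is a constant that a post-modify addressing mode can absorb.

```python
# Generic illustration of array-traversal analysis for an address
# calculation unit (ACU): an affine reference A[a*i + b] in a loop with
# step s advances the address by a*s*element_size words per iteration,
# which a post-modify addressing mode can absorb.

def post_modify_step(coeff_a, loop_step, element_size):
    """Constant address increment per loop iteration."""
    return coeff_a * loop_step * element_size

def acu_can_absorb(step, acu):
    """Hypothetical ACU description: free steps plus a modify-register range."""
    return step in acu["free_steps"] or abs(step) <= acu["modify_range"]

if __name__ == "__main__":
    acu = {"free_steps": {-1, 0, 1}, "modify_range": 255}
    refs = [(1, 1, 1),     # A[i],   i += 1            -> step 1
            (2, 1, 1),     # A[2*i], i += 1            -> step 2
            (1, 4, 16),    # A[i] of 16-word records,  i += 4
            (5, 8, 32)]    # too large: needs an explicit address computation
    for a, s, size in refs:
        step = post_modify_step(a, s, size)
        print(step, acu_can_absorb(step, acu))
```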

17.
The C compiler for the DM642 can automatically compile C programs into C6000 DSP assembly, so developers can carry out the initial design of DSP software directly in a high-level language and shorten the development cycle. However, C programs targeting the DSP perform real-time processing on only a small number of data points at a time, so the C code must be optimized to meet the real-time requirements. Taking the implementation of the JPEG image compression algorithm on the TMS320DM642 as an example, this paper briefly describes the process of optimizing C code on the DM642.

18.
19.
20.
This paper addresses instruction-level parallelism in code generation for digital signal processors (DSPs). In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms in most cases are required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects, and it therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs.
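A much-simplified version of such an IP model can be written down directly. The sketch below is my own toy formulation (no alternative encodings or side effects, unit latencies) using the PuLP modelling library: binary variables choose an issue cycle for each operation, and the schedule length is minimized under dependency, resource, and deadline constraints.

```python
# Minimal 0-1 ILP sketch of local code compaction under a time constraint.
# This is a simplified illustration, not the paper's IP model.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpInteger

ops = ["ld1", "ld2", "mul", "add"]
deps = [("ld1", "mul"), ("ld2", "mul"), ("mul", "add")]   # unit latencies
unit_of = {"ld1": "mem", "ld2": "mem", "mul": "mac", "add": "alu"}
capacity = {"mem": 1, "mac": 1, "alu": 1}    # one issue slot per unit and cycle
T = 6                                        # real-time horizon in cycles

prob = LpProblem("code_compaction", LpMinimize)
x = {(i, t): LpVariable(f"x_{i}_{t}", cat=LpBinary) for i in ops for t in range(T)}
length = LpVariable("length", lowBound=0, cat=LpInteger)
prob += length                                               # objective: schedule length

for i in ops:                                                # issue each op exactly once
    prob += lpSum(x[i, t] for t in range(T)) == 1
    prob += length >= lpSum(t * x[i, t] for t in range(T))   # track latest issue cycle
for i, j in deps:                                            # data dependences
    prob += (lpSum(t * x[j, t] for t in range(T))
             >= lpSum(t * x[i, t] for t in range(T)) + 1)
for t in range(T):                                           # resource constraints
    for unit, cap in capacity.items():
        prob += lpSum(x[i, t] for i in ops if unit_of[i] == unit) <= cap

prob.solve()
for i in ops:
    cycle = next(t for t in range(T) if x[i, t].value() > 0.5)
    print(i, "issued in cycle", cycle)
```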
