Similar literature
20 similar documents found (search time: 15 ms)
1.
We address the problem of code generation for embedded DSP systems. Such systems devote a limited quantity of silicon to program memory, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various high-performance constraints. Unfortunately, current compiler technology is unable to generate dense, high-performance code for DSPs because it does not provide adequate support for the specialized architectural features of DSPs via machine-dependent code optimizations. Thus, designers often program the embedded software in assembly, a very time-consuming task. In order to increase productivity, compilers must be developed that are capable of generating high-quality code for DSPs. The compilation process must also be made retargetable, so that a variety of DSPs may be efficiently evaluated for potential use in an embedded system. We present a retargetable compilation methodology that enables high-quality code to be generated for a wide range of DSPs. Previous work in retargetable DSP compilation has focused on complete automation, and this desire for automation has limited the number of machine-dependent optimizations that can be supported. In our efforts, we have given code quality higher priority than complete automation. We demonstrate how, by using a library of machine-dependent optimization routines accessible via a programming interface, it is possible to support a wide range of machine-dependent optimizations, albeit at some cost to automation. Experimental results demonstrate the effectiveness of our methodology, which has been used to build good-quality compilers for three fixed-point DSPs. This revised version was published online in July 2006 with corrections to the Cover Date.

2.
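Che Deliang, Shen Xubang, Wang Zhong. Signal Processing (《信号处理》), 2005, 21(5): 534-538
Because conventional embedded address generators cannot effectively support the needs of digital signal processing applications, designing an address generator that does so became an important part of developing LS-DSP, a high-speed signal processor for aerospace applications. By studying the data-address computation characteristics of common digital signal processing computations, this paper proposes a generation algorithm for the LS-DSP address generator. When the address generator is implemented in logic according to this algorithm, a fast method for dynamically generating START and END for loop-type address computations is also proposed in order to reduce the area of the address generator. Experimental results show that LS-DSP with the proposed address generator performs digital signal processing operations faster than with a conventional address generator.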

3.
Mixed-Mode BIST Using Embedded Processors   Cited by: 2 (self-citations: 0, citations by others: 2)
In complex systems, embedded processors may be used to run software routines for test pattern generation and response evaluation. For system components which are not completely random pattern testable, the test programs have to generate deterministic patterns after random testing. Usually the random test part of the program requires long run times whereas the part for deterministic testing has high memory requirements. In this paper it is shown that an appropriate selection of the random pattern test method can significantly reduce the memory requirements of the deterministic part. A new, highly efficient scheme for software-based random pattern testing is proposed, and it is shown how to extend the scheme for deterministic test pattern generation. The entire test scheme may also be used for implementing a scan based BIST in hardware.
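As a rough illustration of what a software routine for pseudo-random test pattern generation can look like, the sketch below steps a 4-bit linear feedback shift register (LFSR). The LFSR, its width, and its feedback taps are assumptions chosen for illustration; the abstract does not say which random pattern method the paper selects, and the function name `lfsr4_patterns` is hypothetical.

```python
def lfsr4_patterns(seed: int = 0b0001, count: int = 15) -> list[int]:
    """Return `count` successive states of a 4-bit Fibonacci LFSR.

    Assumption: feedback is the XOR of the two lowest bits, which gives a
    maximal-length sequence (all 15 non-zero states) for this register.
    """
    if not 1 <= seed <= 0b1111:
        raise ValueError("seed must be a non-zero 4-bit value")
    state = seed
    patterns = []
    for _ in range(count):
        patterns.append(state)
        # XOR of bit 0 and bit 1 becomes the new most significant bit.
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)
    return patterns


if __name__ == "__main__":
    for pattern in lfsr4_patterns():
        print(format(pattern, "04b"))
```

Each state can be applied as one test pattern; a wider register with a suitable feedback polynomial would be needed for a real circuit.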

4.
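An Efficient Fast Address Generation Method for FFT Processors   Cited by: 3 (self-citations: 0, citations by others: 3)
The address generator is a major component of an FFT processor. Fast address generation and the number of twiddle-factor reads are its two key metrics, but few algorithms achieve both. This paper adopts a new operand-address generation order and constructs a new representation of the FFT loop stages. Based on bit-reversed ordering of operand addresses, it proposes a variable-length address generation method that combines simple, fast address generation with avoidance of repeated twiddle-factor reads. This resolves the earlier conflict between generation speed and repeated twiddle-factor reads, achieving both high speed and reduced system power consumption.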
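The bit-reversed operand ordering that the abstract builds on can be sketched as follows. This is only the textbook bit-reversal permutation, not the paper's variable-length generation method or its twiddle-factor scheduling; the function names are hypothetical.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reversed_addresses(n: int) -> list[int]:
    """Operand addresses of an n-point FFT (n a power of two) in bit-reversed order."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_addresses(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

In hardware the same permutation is typically obtained by wiring a counter's output bits in reverse order, which is one reason bit-reversed addressing is cheap to generate.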

5.
We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.

6.
In this paper we present an efficient data fetch circuitry to retrieve several operands from an n-way parallel memory system in a single machine cycle. The proposed address generation unit operates with an improved version of the low-order parallel memory access approach. Our design supports data structures of arbitrary lengths and different odd strides. The experimental results show that our address generation unit is capable of generating eight 32-bit addresses every 6 ns for different strides when implemented on a VIRTEX-II PRO xc2vp30-7ff1696 FPGA device using only trivial hardware resources.
Georgi N. Gaydadjiev

Carlo Galuzzi received the M.Sc. in Mathematics (summa cum laude) from Università Degli Studi di Milano, Italy in 2003. He is currently at the final stage of his Ph.D. in Computer Engineering at TU Delft, The Netherlands. He is a reviewer for more than 20 international conferences and journals. He served as publication chair for many conferences, e.g. MICRO-41, SAMOS 2006-08, DTIS 2007. His research interests include instruction-set extension, hardware-software partitioning and graph theory. Carlo received the best paper award at ARC 2008.
Chunyang Gou was born in Sichuan, China in 1981. He received the Bachelor's degree from University of Electronic Science and Technology of China (UESTC), Chengdu, China in 2003 and the MSc degree from Tsinghua University, Beijing, China in 2006. He is currently working towards the Ph.D. in Computer Engineering at the Delft University of Technology, The Netherlands. His research interests include computer architecture in general, with particular emphasis on high-performance memory hierarchies.
Humberto Calderón was born in La Paz, Bolivia, in 1964. He received the M.Sc. degree in Computer Sciences from the ITCR (Costa Rica) in 1997 and the Ph.D. degree in computer engineering from TU Delft, The Netherlands, in 2007. His current research interests include reconfigurable computing, multimedia embedded systems, computer arithmetic, intelligent control and robotics. He has joined the Istituto Italiano di Tecnologia in Genova, Italy, as a senior engineer and researcher.
Georgi N. Gaydadjiev was born in Plovdiv, Bulgaria, in 1964. He is currently assistant professor at the Computer Engineering Laboratory, Delft University of Technology, The Netherlands. His research and development industrial experience includes more than 15 years in hardware and software design at System Engineering Ltd. in Pravetz, Bulgaria and Pijnenburg Microelectronics and Software B.V. in Vught, the Netherlands. His research interests include embedded systems design, advanced computer architectures, hardware/software co-design, VLSI design, cryptographic systems and computer systems testing. Georgi has been a member of many conference program committees at different levels, e.g. ISC, ICS, Computing Frontiers, ICCD, HiPC and more. He was program chair of SAMOS in 2006 and was a general chair in 2007. Georgi received the best paper awards at Usenix/SAGE LISA 2006 and WiSTP 2007. He is an IEEE and ACM member.
Stamatis Vassiliadis (M’86-SM’92-F’97) was born in Manolates, Samos, Greece, in 1951. Regrettably, Prof. Vassiliadis passed away in April 2007. He was a chair professor in the Electrical Engineering department of Delft University of Technology (TU Delft), The Netherlands. He had also served in the EE faculties of Cornell University, Ithaca, NY and the State University of New York (S.U.N.Y.), Binghamton, NY. He worked for a decade with IBM where he had been involved in a number of advanced research and development projects. For his work he received numerous awards including 24 publication awards, 15 invention awards and an outstanding innovation award for engineering/scientific hardware design. His 72 USA patents rank him as the top all-time IBM inventor. Dr. Vassiliadis received an honorable mention Best Paper award at the ACM/IEEE MICRO25 in 1992 and Best Paper awards in the IEEE CAS (1998, 2001), IEEE ICCD (2001), PDCS (2002) and the best poster award in the IEEE NANO (2005). He is an IEEE and ACM fellow and a member of the Royal Dutch Academy of Science.
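The abstract of entry 6 restricts the address generation unit to odd strides. A plausible reason, sketched below under the assumption of plain low-order interleaving with a power-of-two number of banks, is that an odd stride sends consecutive operands to distinct banks, so all of them can be fetched in one cycle. The eight-bank figure mirrors the "eight 32-bit addresses" in the abstract; the function names are hypothetical and the paper's improved access scheme is not reproduced here.

```python
def bank_of(address: int, num_banks: int) -> int:
    """Low-order interleaving: the low address bits select the memory bank."""
    return address % num_banks


def banks_for_strided_access(base: int, stride: int, num_banks: int) -> list[int]:
    """Banks touched when fetching `num_banks` operands from `base` with `stride`."""
    return [bank_of(base + i * stride, num_banks) for i in range(num_banks)]


def is_conflict_free(base: int, stride: int, num_banks: int) -> bool:
    """True if every operand of the access falls into a different bank."""
    banks = banks_for_strided_access(base, stride, num_banks)
    return len(set(banks)) == num_banks


if __name__ == "__main__":
    print(banks_for_strided_access(0, 3, 8))  # [0, 3, 6, 1, 4, 7, 2, 5]
    print(is_conflict_free(0, 3, 8))          # True: odd stride
    print(is_conflict_free(0, 2, 8))          # False: even stride collides
```

The property holds because an odd stride is coprime with any power of two, so multiplying by it permutes the bank indices.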

7.
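Targeting the characteristics of embedded applications, a branch target buffer (BTB) that compares tags using RAM is designed. Hardware simulation (BTB control logic implemented in RTL, storage arrays implemented in custom logic) is used to study how the BTB's structural parameters affect its performance and energy consumption, as well as the performance and energy consumption of the whole processor system. The optimal BTB parameters for an embedded processor are selected from the simulation results. Using these parameters, a BTB with CAM-based tag comparison is then designed. Evaluated on SPEC2000, the CAM-based BTB reduces power consumption by 37.17% relative to the RAM-based BTB.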

8.
Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded processors. Unlike conventional general purpose processors, ASIPs and embedded processors typically run a single application and hence must be optimized extensively for this in order to extract maximum performance. Further, low power and low cost requirements of ASIPs may demand reuse of pipeline stages, causing pipelines with complex structural hazards. In such architectures, exploiting higher ILP is a major challenge to the designer. Existing techniques deal with either scheduling hardware pipelines to obtain higher throughput or software pipelining (an instruction scheduling technique for iterative computation) for exploiting greater ILP. We integrate these techniques to co-schedule hardware and software pipelines to achieve greater instruction throughput. In this paper, we develop the underlying theory of Co-Scheduling, called the Modulo-Scheduled Pipeline (or MS-Pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. Further, we establish a sufficient condition to achieve a specified throughput, based on which we also develop a methodology for designing the hardware pipelines that achieve such a throughput. Further, we present initial experimental results which help to establish the usefulness of MS-pipeline theory in software pipelining. As the proposed theory helps to analyze and improve the throughput of Modulo-Scheduled Pipelines (MS-pipelines), it is especially useful in designing ASIPs and embedded processors.
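To make the link between structural hazards and throughput concrete, the sketch below checks whether a pipeline can accept a new operation every `ii` cycles, given a reservation table that lists the cycles in which each stage is busy. It uses the standard modulo-reservation argument (two uses of one stage must not fall into the same slot modulo the initiation interval). It is not the necessary and sufficient condition derived in the paper, and the table format and function names are assumptions.

```python
def is_feasible_interval(reservation_table: dict[str, list[int]], ii: int) -> bool:
    """True if operations can be issued every `ii` cycles without stage conflicts."""
    if ii < 1:
        raise ValueError("initiation interval must be at least 1")
    for cycles in reservation_table.values():
        used = set(cycles)                   # cycles in which this stage is busy
        slots = {cycle % ii for cycle in used}
        if len(slots) != len(used):          # two uses share a modulo slot
            return False
    return True


def min_feasible_interval(reservation_table: dict[str, list[int]]) -> int:
    """Smallest constant initiation interval that avoids all stage conflicts."""
    # A stage used k times per operation needs at least k cycles per operation.
    busiest = max((len(set(c)) for c in reservation_table.values()), default=1)
    ii = max(1, busiest)
    while not is_feasible_interval(reservation_table, ii):
        ii += 1
    return ii


if __name__ == "__main__":
    table = {"S1": [0, 2], "S2": [1]}    # stage S1 is reused two cycles later
    print(min_feasible_interval(table))  # 3
```

With `S1` busy in cycles 0 and 2, an interval of 2 would make successive operations collide in `S1`, so the pipeline sustains at most one operation every 3 cycles even though no stage is used more than twice.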

9.
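Automatic Generation of Motor Control Code Based on Matlab and STM32   Cited by: 1 (self-citations: 2, citations by others: 1)
Combining a Matlab- and STM32-based automatic code generation method with a brushless DC motor control system makes the design and implementation of the control system more convenient and faster. A Matlab simulation model of the brushless DC motor control system is designed first. The Simulink library for STM32 microcontrollers, STM32 MAT/Target, is then used together with related software tools to automatically generate readable, portable C code project files, which are run on an STM32F103. The running behavior is basically consistent with the simulation results. The method uses Matlab simulation to design the control system precisely and uses automatic code generation to ease implementation of the control algorithm. Combined, they speed up the whole process from design to implementation while preserving accuracy.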

10.
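The address generation unit (AGU) is an important component of a DSP chip. By supporting multiple addressing modes, it improves instruction execution efficiency. The addressing modes and instruction encoding structure of an embedded DSP are described in detail, and on this basis the AGU of that DSP is designed so that it supports not only several special addressing modes but also three addressing operations in a single cycle. Finally, the AGU is optimized. The results show that the optimized AGU improves performance and power consumption while effectively reducing the execution cycles of digital signal processing algorithms.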

11.
This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.

12.
Architectural verification is a critical aspect of the microprocessor design cycle. In this paper, we present a design verification environment centered around a biased random instruction generator for simulation-based architectural verification of pipelined microprocessors. The instruction generator uses biases specified by the user to generate instruction sequences for simulation. These biases are not hard-coded and can thus be changed depending on the specific areas in the design and type of design errors being targeted. Correctness checking is achieved using assertion checking and end-of-state comparison with a high-level architectural model. Several architectural-level errors are introduced into a behavioral model of the DLX processor to investigate the processor's response in the presence of design errors. Simulation experiments conducted using the behavioral model show that biased random instruction sequences provide higher coverage of RTL conditional branches and design errors than random instruction sequences or manually-generated test programs. Furthermore, instruction sequences containing a high percentage of read-after-write (RAW) and control dependencies are the most useful.
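A biased random instruction generator of the kind described can be approximated, in its simplest form, by weighted sampling over opcodes. The sketch below is an assumption-laden illustration: the opcode names and weights are made up, operands and dependencies are ignored, and `generate_sequence` is a hypothetical name, not part of the paper's environment.

```python
import random


def generate_sequence(biases: dict[str, float], length: int, seed=None) -> list[str]:
    """Draw `length` opcodes, each chosen with probability proportional to its bias.

    The biases are plain data supplied by the caller, so they can be changed
    per run instead of being hard-coded in the generator.
    """
    rng = random.Random(seed)
    opcodes = list(biases)
    weights = [biases[op] for op in opcodes]
    return rng.choices(opcodes, weights=weights, k=length)


if __name__ == "__main__":
    # Hypothetical bias set that favours loads and branches.
    biases = {"ADD": 1.0, "LW": 4.0, "SW": 1.0, "BEQZ": 3.0}
    print(generate_sequence(biases, length=10, seed=1))
```

Steering a run toward read-after-write or control dependencies would additionally require biasing register and branch-target selection, which this sketch leaves out.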

13.
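Yu Le, Li Renwei, Wang Yao, Li Yangyang, Wu Chao, Jia Rui. Acta Electronica Sinica (《电子学报》), 2018, 46(4): 992-1004
In recent years, with the wide use of various IP cores, the application areas of SoC-FPGAs have been expanding. As the core IP of an SoC-FPGA, the processor has a critical effect on system performance. Using open-source processor IP can greatly improve the efficiency of SoC-FPGA system-level design and has become a common practice in project development. This paper studies the key technical indicators of the great majority of existing open-source processors and proposes a method for selecting open-source processors based on usability and stability. Using this method, a number of open-source processors with high usability and stability are selected. Finally, these processors are synthesized and implemented with FPGA EDA tools from different vendors, and compared and discussed against the commercial soft cores Nios II and MicroBlaze offered by FPGA vendors.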

14.
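Automatic Instruction Set Extension for Embedded Applications   Cited by: 1 (self-citations: 1, citations by others: 1)
Extending the instruction set for a specific application, and implementing the extended instructions in custom hardware, can greatly improve embedded processor performance. This paper proposes a fully automatic application-specific instruction set extension flow that estimates the speedup and hardware cost of extended instructions fairly accurately and performs instruction template matching efficiently. Experimental results show that, under a given hardware cost limit, the extended instructions produced by this method significantly improve the performance of embedded applications.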

15.
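A Survey of Real-Time Embedded Dual Operating System Architectures   Cited by: 2 (self-citations: 0, citations by others: 2)
Zhang Meiyu, Zhang Qianying, Meng Ziqi, Shi Zhiping, Guan Yong. Acta Electronica Sinica (《电子学报》), 2018, 46(11): 2787-2796
With the global Industry 4.0 strategy, fields such as industrial control and aerospace place ever higher demands on embedded systems: strict real-time requirements together with support from a feature-rich general-purpose operating system. In response, academia and industry have proposed the dual operating system approach, that is, running a real-time operating system and a general-purpose operating system simultaneously. This paper summarizes the state of research on embedded dual operating systems in China and abroad, discusses and analyzes the various implementation techniques in depth, lists typical applications and main application areas of embedded dual operating systems, and finally summarizes research trends for this class of techniques.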

16.
We address the problem of code generation for embedded DSP systems. In such systems, it is typical for one or more digital signal processors (DSPs), program memory, and custom circuitry to be integrated onto a single IC. Consequently, the amount of silicon area that is dedicated to program memory is limited, so the embedded software must be sufficiently dense. Additionally, this software must be written so as to meet various high-performance constraints, which may include hard real-time constraints. Unfortunately, existing compiler technology is unable to generate dense, high-performance code for DSPs since it does not provide adequate support for the specialized architectural features of DSPs. These specialized features not only allow for the fast execution of common DSP operations, but they also allow for the generation of dense assembly code that specifies these operations. Thus, system designers often hand-program the embedded software in assembly, which is a very time-consuming task. In this paper, we focus on providing compiler support for one particular specialized architectural feature, namely the paged absolute addressing mode. This feature is found in two commercial DSPs, the Texas Instruments' TMS320C25 and TMS320C50 fixed-point DSPs; however, it may also be featured in application-specific processors (ASIPs). We present some machine-dependent code optimizations that improve code density by exploiting this architectural feature. Experimental results demonstrate that for a set of typical DSP benchmarks, some of our optimizations reduce overall code size and data memory consumption by an average of 5.0% and 16.0%, respectively. Our experimental vehicle throughout this research is the TMS320C25.
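With paged absolute addressing, an instruction encodes only an offset within the current data page, so every switch to a different page costs an extra instruction that reloads the page pointer. The sketch below is a simple cost model for that effect, not one of the paper's optimizations. The 128-word page size is an assumption about the target, and the variable names, layouts, and function names are hypothetical.

```python
PAGE_SIZE = 128  # words per data page (assumed for illustration)


def page_of(address: int) -> int:
    """Data page that contains the given word address."""
    return address // PAGE_SIZE


def count_page_loads(access_sequence: list[str], layout: dict[str, int]) -> int:
    """Number of page-pointer loads needed to execute the accesses in order."""
    loads = 0
    current_page = None  # the page pointer is unknown on entry
    for variable in access_sequence:
        page = page_of(layout[variable])
        if page != current_page:
            loads += 1           # one extra instruction to switch pages
            current_page = page
    return loads


if __name__ == "__main__":
    accesses = ["a", "b", "a", "b"]
    scattered = {"a": 0, "b": 128}   # variables placed on different pages
    packed = {"a": 0, "b": 1}        # variables placed on the same page
    print(count_page_loads(accesses, scattered))  # 4
    print(count_page_loads(accesses, packed))     # 1
```

Comparing the two layouts shows why data placement affects code size on such a machine: placing variables that are accessed together on one page removes three of the four page-pointer loads.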

17.
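A Review of Path-Oriented Automatic Test Data Generation Methods   Cited by: 19 (self-citations: 1, citations by others: 19)
Shan Jinhui, Wang Ji, Qi Zhichang. Acta Electronica Sinica (《电子学报》), 2004, 32(1): 109-113
Automatically generating test data for a specified program path is a basic problem in software unit testing. Solving it essentially means building and solving a constraint system, and one of the main difficulties is that solving nonlinear constraints is theoretically hard. This paper divides path-oriented automatic test data generation methods into four classes (random, static, dynamic, and heuristic), analyzes and compares representative methods in each class, and discusses research directions.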

18.
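To reduce the number of evolutionary generations and raise the path-coverage success rate, a multi-neighborhood Kalman-filter PSO method for test data generation is proposed. Particles are partitioned into fixed neighborhoods. In each neighborhood, one designated particle learns from the global best particle while the remaining particles learn from the best particle in their own neighborhood, and the global best particle evolves using a simplified PSO without the velocity term. During this process, all particles other than the global best update their positions using the Kalman filter equations. Experiments show that, compared with basic PSO and other PSO methods, the proposed method needs fewer generations, achieves a higher path-coverage success rate, and is stable in performance, even for paths that are hard to cover.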

19.
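Li Peizhao, Yao Guoliang. Electronic Devices (《电子器件》), 2008, 31(2): 634-637
To meet the needs of porting the embedded Linux kernel, the implementation of virtual address space mapping in the embedded Linux kernel is described. The basic principles of virtual address space mapping are analyzed from both the hardware and software perspectives, along with the respective roles of hardware and software and how they cooperate. A concrete implementation of virtual address space mapping on the 805plus microprocessor platform is also described. Experiments show that, with the MMU enabled, the 805plus platform runs correctly in the virtual address space and completes kernel initialization.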

20.
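Parallel Generation of Complex Coalitions Using Discrete Particle Swarm Optimization   Cited by: 2 (self-citations: 0, citations by others: 2)
Coalition generation is a key problem in multi-agent systems. This paper introduces discrete particle swarm optimization to solve it. Random perturbation of particles prevents premature convergence, a two-dimensional binary encoding enables parallel generation of complex coalitions, and encoding feasibility checks, conflict resolution, and a compensation strategy overcome resource conflicts and coalition deadlock during the solution process. Simulation experiments demonstrate the effectiveness of the algorithm.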
