20 similar documents found (search time: 31 ms)
1.
Philippe Faes Peter Bertels Jan Van Campenhout Dirk Stroobandt 《Design Automation for Embedded Systems》2009,13(4):223-243
In many embedded systems, the computational power of an instruction set processor is combined with hardware accelerators.
Building such combined systems implies co-design of the software that runs on the processor and the hardware that accelerates the embedded application. During the co-design
process, the application is partitioned into a software part (running on the processor) and a hardware part (running on the
accelerator). In order to ease the iterative process of partitioning, we introduce a novel design methodology. In our methodology,
the interface between hardware and software is transparent to the software designer, and is based on dynamic method interception.
Because the interface is transparent and generated automatically, the initial all-software prototype of the system can more
easily be refined and partitioned. We show that method interception is inexpensive, and we demonstrate method interception
in a real-life application.
Using our methodology, embedded systems can be designed quickly, reducing time-to-market, while still achieving high run-time
performance.
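The idea of dynamic method interception in entry 1 can be illustrated with a minimal sketch. This is not the authors' implementation: the `Accelerator` class, the `intercept` helper, and the `Filter.dot` method are hypothetical names chosen for illustration, and the "hardware" is simulated in plain Python. The sketch only shows how a call site can stay unchanged while the method body is rerouted at run time.

```python
import functools


class Accelerator:
    """Hypothetical stand-in for a hardware accelerator (simulated in software)."""

    def __init__(self):
        self.calls = 0  # counts how many calls were rerouted here

    def dot(self, a, b):
        self.calls += 1
        return sum(x * y for x, y in zip(a, b))


def intercept(cls, method_name, handler):
    """Replace cls.method_name at run time so that calls go to handler.

    Returns the original method so it could be restored later.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # The caller's object is ignored; only the arguments are forwarded.
        return handler(*args, **kwargs)

    setattr(cls, method_name, wrapper)
    return original


class Filter:
    """All-software prototype: the method the designer originally wrote."""

    def dot(self, a, b):
        total = 0
        for x, y in zip(a, b):
            total += x * y
        return total


accel = Accelerator()
f = Filter()

software_result = f.dot([1, 2, 3], [4, 5, 6])   # runs the software version
intercept(Filter, "dot", accel.dot)             # repartition: move dot to "hardware"
hardware_result = f.dot([1, 2, 3], [4, 5, 6])   # same call site, now intercepted
```

The calling code (`f.dot(...)`) is identical before and after the interception, which is the sense in which such an interface is transparent to the software designer.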
2.
《电子学报:英文版》2017,(6):1198-1205
FPGA-based soft vector processing accelerators are frequently used to perform highly parallel data processing tasks. Since they cannot implement complex control operations in software, most FPGA systems now incorporate either a soft or a hard processor. An FPGA-based, AXI-bus-compatible vector accelerator architecture is proposed that uses a fully pipelined and heterogeneous ALU for performance and employs microcoding for reusability. The design is tested with several design examples in four different lane configurations. Compared with central processing unit (CPU), digital signal processor (DSP), Altera C2H tool, and OpenCL SDK implementations, the vector processor improves execution time and energy consumption by factors of up to 6.6 and 6.4, respectively.
3.
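Application of Phase-Locked Loops in Processor Clock Design  Total citations: 2, self-citations: 1, citations by others: 1
The article first presents the basic principles of the phase-locked loop (PLL) and the related mathematical foundations. It then describes the application of classical PLLs to clock generation in high-performance processors and summarizes the types of analog voltage-controlled oscillators as well as noise types and their suppression. Next, it introduces the application of recently developed all-digital PLLs to clock generation. Finally, it summarizes the article and discusses the performance characteristics of the two PLL structures and development trends in phase-locking technology.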
4.
《Solid-State Circuits, IEEE Journal of》2009,44(8):2251-2259
5.
《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(11):870-874
6.
General-purpose multicore processors are being accepted in all segments of the industry, including signal processing and embedded space, as the need for more performance and general-purpose programmability has grown. Parallel processing increases performance by adding more parallel resources while maintaining manageable power characteristics. The implementations of multicore processors are numerous and diverse. Designs range from conventional multiprocessor machines to designs that consist of a "sea" of programmable arithmetic logic units (ALUs). In this article, we cover some of the attributes common to all multicore processor implementations and illustrate these attributes with current and future commercial multicore designs. The characteristics we focus on are application domain, power/performance, processing elements, memory system, and accelerators/integrated peripherals.
7.
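Using the high-performance S3C44BOX chip as the processor core together with the embedded real-time operating system uC/OS-Ⅱ, a farmland information acquisition system with strong real-time performance and an optimized structure was designed and implemented. The embedded hardware and software platform was built, and the task design of the application software, the priority assignment, and the dependencies among tasks are described in detail. Theoretical analysis and experiments show that the system performs well and is highly reliable.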
8.
William E. Dougherty David J. Pursley Donald E. Thomas 《The Journal of VLSI Signal Processing》1999,21(3):209-218
Power consumption is an increasingly important consideration in the design of mixed hardware/software systems. This work defines the notion of instruction subsetting and explores its use as a means of reducing power consumption from the system level of design. Instruction subsetting is defined as creating an application specific instruction set processor from a more general processor, such as a DSP. Although not as effective as an ASIC solution, instruction subsetting provides much of the power savings while maintaining some level of programmability. Beyond energy savings, instruction subsetting also offers the opportunity to reduce the design cycle through the re-use of existing processor intellectual property including behavioral and structural designs, hardware simulators, application code, and compilers. We synthesized 9 ASIPs through place and route and found that a poorly chosen instruction set may consume more than 4 times the energy of an ASIP with a proper instruction set choice. This finding will allow designers to consider another set of trade-offs in their hardware/software design space exploration.
9.
Heiner Giefers Raphael Polig Christoph Hagleitner 《Journal of Signal Processing Systems》2016,85(3):307-323
In this paper we analyze the power consumption and energy efficiency of general matrix-matrix multiplication (GEMM) and Fast Fourier Transform (FFT) implemented as streaming applications for an FPGA-based coprocessor card. The power consumption is measured with internal voltage sensors, and the power draw is broken down by system component in order to classify the energy consumed by the processor cores, the memory, the I/O links, and the FPGA card. We present an abstract model that allows for estimating the power consumption of FPGA accelerators at the system level and validate the model using the measured kernels. The performance and energy consumption are compared against optimized multi-threaded software running on the POWER7 host CPUs. Our experimental results show that the accelerator can improve the energy efficiency by an order of magnitude when the computations can be undertaken in a fixed-point format. Using floating-point data, the gain in energy efficiency was measured as up to 30% for the double-precision GEMM accelerator and up to 5× for a 1k complex FFT.
10.
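Based on a fixed-point SoC (System-on-Chip) architecture consisting of a general-purpose RISC processor core and additional fixed-point hardware accelerators, a novel word-length design method for fixed-point hardware accelerators based on statistical analysis is proposed. The method uses statistical parameters to solve mathematically for the minimum word length that satisfies different signal-to-noise ratio requirements. It can effectively reduce chip area, power consumption, and manufacturing cost, so that applications with high computational complexity can run on low-cost RISC-core SoC chips without a DSP coprocessor.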
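The link between a signal-to-noise requirement and a minimum word length in entry 10 can be made concrete with a rough sketch. It does not reproduce the statistical method of that paper. It swaps in the textbook rule of thumb that an ideal uniform quantizer driven by a full-scale sinusoid achieves a signal-to-quantization-noise ratio of about 6.02·b + 1.76 dB for b bits. The function name `min_word_length` is a hypothetical choice for illustration.

```python
import math


def min_word_length(target_snr_db):
    """Smallest bit count b such that 6.02 * b + 1.76 >= target_snr_db.

    Assumption: ideal uniform quantizer and a full-scale sinusoidal input,
    which is a textbook approximation and not the method of the cited paper.
    """
    bits = math.ceil((target_snr_db - 1.76) / 6.02)
    return max(bits, 1)  # a word needs at least one bit


# Example: tabulate the word length needed for a few SNR targets.
for snr_db in (40, 60, 98):
    print(snr_db, "dB ->", min_word_length(snr_db), "bits")
```

Every bit removed from the word length saves datapath area and power, which is why a method that finds the minimum word length for a given SNR target matters for low-cost SoC designs.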
11.
Bosi B. Bois G. Savaria Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(3):299-308
In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides a high-level interface and mechanisms, so developers only need to know how to use the algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor.
12.
Ishmael Sameen Yoong Choon Chang Mow Song Ng Bok-Min Goi Chee-Pun Ooi 《Journal of Signal Processing Systems》2013,71(2):123-142
This paper presents a novel unified and programmable 2-D Discrete Wavelet Transform (DWT) system architecture, which was implemented using a Field Programmable Gate Array (FPGA)-based Nios II soft-core processor working in combination with custom hardware accelerators generated through high-level synthesis. The proposed system architecture, synthesized on an Altera DE3 Stratix III FPGA board, was developed through an iterative design space exploration methodology using Altera’s C2H compiler. Experimental results show that the proposed system architecture is capable of real-time video processing performance for grayscale image resolutions of up to 1920 × 1080 (1080p) when run on the Altera DE3 board, and it outperforms the existing 2-D DWT architecture implementations known in the literature by a considerable margin in terms of throughput. While the proposed 2-D DWT system architecture satisfies real-time performance constraints, it can also perform both forward and inverse DWT, support a number of popular DWT filters used for image and video compression, and provide architecture programmability in terms of the number of levels of decomposition as well as image width and height. Based on the design principles used to implement the proposed 2-D DWT system architecture, a system design guideline can be formulated for SOC designs that plan to incorporate dedicated 2-D DWT hardware acceleration.
13.
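Grid-connected photovoltaic (PV) generation is the trend in PV system development. Based on the characteristics of grid-connected PV generation systems, this article designs a single-phase grid-connected PV inverter controlled by the TNS320F2407 digital signal processor. The system structure and control principle are analyzed, and a maximum power point tracking algorithm and the software design flowchart of the phase-locked loop are designed. Experimental results show that the grid-connected current waveform is good, the inverter output current has essentially the same frequency and phase as the grid voltage, and the grid-connected power factor is approximately 1.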
14.
Holtmann U. Ernst R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(3):262-267
Coprocessor design is one application of high-level synthesis. We want to focus on high-performance coprocessors to speed up time-critical parts in hardware-software codesign of embedded controllers. Time-critical software parts often contain nested loops, often with data-dependent branches and a data-dependent number of iterations. When (loop) pipelining is employed for high performance, the control dependencies become a dominant limitation to pipeline utilization. Branch prediction is a possible approach, but is usually restricted to few instructions and to one branch because of hardware and control overhead. Multiple branch prediction and speculative computation take a more global view of the program flow. We give practical examples of how speculative computation with multiple branch prediction increases performance far beyond a usual ASAP scheduling based on a CDFG. For scheduling, speculative computation requires a modification of the CDFG and, for the allocation phase, the insertion of register sets to save the processor status. The controller needs slight modification. We conclude that manual application of our approach will in general be too difficult, such that it can only be used in connection with synthesis.
15.
In recent years, deep learning algorithms have been widely deployed, from cloud servers to terminal units, and researchers have proposed various neural network accelerators and software development environments. In this article, we review representative neural network accelerators. Taken as a whole, the corresponding software stack must consider the hardware architecture of the specific accelerator to enhance end-to-end performance. We summarize the programming environments of neural network accelerators and the optimizations in the software stack. Finally, we comment on future trends in neural network accelerators and programming environments.
16.
Viterbi decoding is widely used in many radio systems. Because of its high computational complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of multicore technology, multicore platforms have become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-the-art multicore processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft-input Viterbi decoder for a WiMAX SR baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell processor running at 3.2 GHz, our Viterbi decoder can achieve a throughput of up to 30 Mb/s when decoding the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated into the SR system and can provide a highly integrated SR solution. The optimization methodology used in this module design can be extended to other modules on the Cell platform.
17.
18.
Michael Wu Yang Sun Siddharth Gupta Joseph R. Cavallaro 《Journal of Signal Processing Systems》2011,64(1):123-136
Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture-aware software design is needed to leverage the performance offered by the GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on the GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.
19.
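Hardware/Software Co-Verification Methodology for SoC Design  Total citations: 3, self-citations: 3, citations by others: 0
This article introduces a hardware/software co-verification methodology and its verification flow. On the software side, a complete toolchain for compiling, debugging, and simulating software is used, comprising a virtual prototype for processor simulation and a basic assembler, linker, and debugger. On the hardware side, the application debugged in software is simulated at RTL and synthesized, and is finally implemented and verified on the hardware image accelerator (FPGA) of the SoC design.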
20.
Because fast and highly efficient processors are available, many embedded system developers are attracted to implementing the majority of the system components in software rather than hardware. Software implementation offers a great level of flexibility and scalability of the design. At the same time, a wide choice exists between generic processors, DSP processors, network processors, etc. This enlarges manyfold the design space that must be explored to select an appropriate processor or processor version for a specific application or application component. In this review, recent prominent directions for embedded software performance estimation are discussed and their salient features are summarized.