1.

Using method interception for hardware/software co-development

Philippe Faes Peter Bertels Jan Van Campenhout Dirk Stroobandt 《Design Automation for Embedded Systems》2009,13(4):223-243

In many embedded systems, the computational power of an instruction set processor is combined with hardware accelerators. Building such combined systems implies co-design of the software that runs on the processor and the hardware that accelerates the embedded application. During the co-design process, the application is partitioned into a software part (running on the processor) and a hardware part (running on the accelerator). In order to ease the iterative process of partitioning, we introduce a novel design methodology. In our methodology, the interface between hardware and software is transparent to the software designer, and is based on dynamic method interception. Because the interface is transparent and generated automatically, the initial all-software prototype of the system can more easily be refined and partitioned. We show that method interception is inexpensive, and we demonstrate method interception in a real-life application. Using our methodology, embedded systems can be designed quickly, reducing time-to-market, while still achieving high run-time performance.
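
The idea of dynamic method interception in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Interceptor`, `Proxy`, `Filter`, and `fake_accelerator_scale` are hypothetical, and the "accelerator" is an ordinary function standing in for a hardware driver call. The sketch only shows how a call can be rerouted without changing the calling code.

```python
from typing import Any, Callable, Dict, List


class Proxy:
    """Forwards attribute access to a target, rerouting offloaded methods."""

    def __init__(self, target: Any, handlers: Dict[str, Callable[..., Any]]) -> None:
        self._target = target
        self._handlers = handlers  # shared with the Interceptor, so later changes apply

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        handler = self._handlers.get(name)
        if callable(attr) and handler is not None:
            return handler  # intercepted: the caller gets the replacement
        return attr  # not intercepted: the original software method


class Interceptor:
    """Keeps the table of method names that are currently offloaded."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def offload(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def wrap(self, obj: Any) -> Proxy:
        return Proxy(obj, self._handlers)


class Filter:
    """Hypothetical all-software prototype."""

    def scale(self, samples: List[int], gain: int) -> List[int]:
        return [gain * s for s in samples]


calls: List[str] = []  # records which implementation was used


def fake_accelerator_scale(samples: List[int], gain: int) -> List[int]:
    # Stand-in for a call into an accelerator driver (assumption).
    calls.append("hw")
    return [gain * s for s in samples]
```

In this sketch the caller keeps invoking `scale` on the wrapped object; calling `offload("scale", fake_accelerator_scale)` on the interceptor moves the method to the stand-in accelerator, which mirrors the repartitioning step the abstract describes.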

2.

Fully Pipelined Soft Vector Processor as a CPU Accelerator

《电子学报:英文版》2017,(6):1198-1205

FPGA-based soft vector processing accelerators are frequently used to perform highly parallel data processing tasks. Since they are not able to implement complex control manipulations using software, most FPGA systems now incorporate either a soft processor or a hard processor. An FPGA-based, AXI-bus-compatible vector accelerator architecture is proposed, which utilises a fully pipelined and heterogeneous ALU for performance and employs microcoding for reusability. The design is tested with several design examples in four different lane configurations. Compared with central processing unit (CPU), digital signal processor (DSP), Altera C2H tool, and OpenCL SDK implementations, the vector processor improves execution time and energy consumption by factors of up to 6.6 and 6.4, respectively.

3.

Application of Phase-Locked Loops in Processor Clock Design (cited: 2; self-citations: 1; citations by others: 1)

杨丰林沈绪榜《微电子学与计算机》2002,19(6):32-38

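This article first explains the basic principles of the phase-locked loop (PLL) and the related mathematical foundations. It then describes the use of classical PLLs for clock generation in high-performance processors, and summarizes both the types of analog voltage-controlled oscillators and the types of noise and their suppression. Next, it introduces the application of the more recently developed all-digital PLL to clock generation. Finally, it concludes with the performance characteristics of the two PLL structures and the development trends of phase-locking technology.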

4.

Heterogeneous Multicore SoC With SiP for Secure Multimedia Applications

《Solid-State Circuits, IEEE Journal of》2009,44(8):2251-2259

A heterogeneous multicore system-on-chip (SoC) has been developed for high-definition (HD) multimedia applications that require secure DRM (digital rights management). The SoC integrates three types of processors: two specific-purpose accelerators for cipher and high-resolution video decoding; one general-purpose accelerator (MX); and three CPUs. This is how our SoC achieves high performance and low power consumption with hardware customized for video processing applications that process a large amount of data. To achieve secure data control, hardware memory management and software system virtualization are adopted. The security of the system is the result of the cooperation between the hardware and software on the system. Furthermore, a highly tamper-resistant system is provided on our SiP (System in a package), through DDR2 SDRAMs and a flash memory that contain confidential information in one package. This secure multimedia processor provides a solution to protect contents and to safely deliver secure sensitive information when processing billing transactions that involve digital content delivery. The SoC was implemented in a 90 nm generic CMOS technology.

5.

A Fast Spline Curve Rendering Accelerator Architecture

《Circuits and Systems II: Express Briefs, IEEE Transactions on》2009,56(11):870-874

Spline curve rendering is an essential operation in modern 2-D graphics applications. In contrast to the software acceleration approach using graphics processing units, this brief presents a very-large-scale-integration hardware accelerator architecture for fast curve rendering. Many existing accelerators employ a sequential forward-difference algorithm, where a step size is used in calculating the next sample on the curve. The problem of hardware-based curve rendering is that feedback loops are required to accumulate the difference, and these loops inhibit many traditional performance-enhancement strategies such as unfolding and pipelining. This brief proposes a different, parallel design approach by transforming the difference equation set into parallel ones. Each equation uses the same increased step size but is accumulated starting from a different initial value. Although more initial values must be precomputed, this computation can itself be sped up by using the accelerator. The proposed design can be applied not only to cubic spline curves but also to any curves defined by parameterized polynomial functions.
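
The sequential forward-difference scheme and its parallel transformation described above can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' architecture: the function names are hypothetical, a single cubic polynomial stands in for one spline coordinate, and the "lanes" run one after another here, whereas in hardware they would operate concurrently.

```python
from typing import List, Sequence


def cubic(coeffs: Sequence, t):
    """Evaluate a*t^3 + b*t^2 + c*t + d by Horner's rule."""
    a, b, c, d = coeffs
    return ((a * t + b) * t + c) * t + d


def forward_difference(coeffs: Sequence, t0, h, n: int) -> List:
    """Return p(t0), p(t0+h), ..., using only additions inside the loop."""
    p0, p1, p2, p3 = (cubic(coeffs, t0 + i * h) for i in range(4))
    f = p0                              # current sample
    d1 = p1 - p0                        # first difference
    d2 = p2 - 2 * p1 + p0               # second difference
    d3 = p3 - 3 * p2 + 3 * p1 - p0      # third difference (constant for a cubic)
    out = []
    for _ in range(n):
        out.append(f)
        # The feedback loop: each value depends on the previous iteration.
        f += d1
        d1 += d2
        d2 += d3
    return out


def parallel_forward_difference(coeffs: Sequence, t0, h, n: int, lanes: int) -> List:
    """Split the recurrence into `lanes` independent recurrences.

    Lane k starts at t0 + k*h and uses the enlarged step lanes*h, so it
    produces samples k, k + lanes, k + 2*lanes, ... of the original sequence.
    """
    out = [None] * n
    for k in range(lanes):
        count = len(range(k, n, lanes))
        out[k::lanes] = forward_difference(coeffs, t0 + k * h, lanes * h, count)
    return out
```

Each lane needs its own set of initial differences, which is the extra precomputation the abstract mentions; in exchange, no lane depends on another lane's feedback loop.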

6.

A survey of multicore processors

Blake G. Dreslinski R.G. Mudge T. 《Signal Processing Magazine, IEEE》2009,26(6):26-37

General-purpose multicore processors are being accepted in all segments of the industry, including signal processing and embedded space, as the need for more performance and general-purpose programmability has grown. Parallel processing increases performance by adding more parallel resources while maintaining manageable power characteristics. The implementations of multicore processors are numerous and diverse. Designs range from conventional multiprocessor machines to designs that consist of a "sea" of programmable arithmetic logic units (ALUs). In this article, we cover some of the attributes common to all multicore processor implementations and illustrate these attributes with current and future commercial multicore designs. The characteristics we focus on are application domain, power/performance, processing elements, memory system, and accelerators/integrated peripherals.

7.

Research and Design of a Farmland Information Acquisition System Based on uC/OS-II

胡侃《山西电子技术》2011,(6):95-97

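Using the high-performance S3C44B0X chip as the processor core together with the embedded real-time operating system uC/OS-II, a farmland information acquisition system with strong real-time performance and an optimized structure was designed and implemented. An embedded hardware and software platform was built, and the task design of the application software, the assignment of priorities, and the relationships among tasks are described in detail. Theoretical analysis and experiments show that the system performs well and is highly reliable.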

8.

Subsetting Behavioral Intellectual Property for Low Power ASIP Design

William E. Dougherty David J. Pursley Donald E. Thomas 《The Journal of VLSI Signal Processing》1999,21(3):209-218

Power consumption is an increasingly important consideration in the design of mixed hardware/software systems. This work defines the notion of instruction subsetting and explores its use as a means of reducing power consumption from the system level of design. Instruction subsetting is defined as creating an application specific instruction set processor from a more general processor, such as a DSP. Although not as effective as an ASIC solution, instruction subsetting provides much of the power savings while maintaining some level of programmability. Beyond energy savings, instruction subsetting also offers the opportunity to reduce the design cycle through the re-use of existing processor intellectual property including behavioral and structural designs, hardware simulators, application code, and compilers. We synthesized 9 ASIPs through place and route and found that a poorly chosen instruction set may consume more than 4 times the energy of an ASIP with a proper instruction set choice. This finding will allow designers to consider another set of trade-offs in their hardware/software design space exploration.

9.

Measuring and Modeling the Power Consumption of Energy-Efficient FPGA Coprocessors for GEMM and FFT

Heiner Giefers Raphael Polig Christoph Hagleitner 《Journal of Signal Processing Systems》2016,85(3):307-323

In this paper we analyze the power consumption and energy efficiency of general matrix-matrix multiplication (GEMM) and Fast Fourier Transform (FFT) implemented as streaming applications for an FPGA-based coprocessor card. The power consumption is measured with internal voltage sensors, and the power draw is broken down onto the system's components in order to classify the energy consumed by the processor cores, the memory, the I/O links, and the FPGA card. We present an abstract model that allows for estimating the power consumption of FPGA accelerators on the system level and validate the model using the measured kernels. The performance and energy consumption are compared against optimized multi-threaded software running on the POWER7 host CPUs. Our experimental results show that the accelerator can improve the energy efficiency by an order of magnitude when the computations can be undertaken in a fixed-point format. Using floating-point data, the gain in energy efficiency was measured as up to 30% for the double-precision GEMM accelerator and up to 5× for a 1k complex FFT.

10.

Statistical-Analysis-Based Word-Length Design for Fixed-Point Hardware Accelerators in SoCs

周凡时龙兴杨军《固体电子学研究与进展》2007,27(2):240-245,274

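Building on a fixed-point SoC (System-on-Chip) architecture that consists of a general-purpose RISC processor core and additional fixed-point hardware accelerators, a novel statistical-analysis-based word-length design method for fixed-point hardware accelerators is proposed. The method uses statistical parameters to solve mathematically for the minimum word length that satisfies different signal-to-noise ratio requirements. It effectively reduces chip area, power consumption, and manufacturing cost, so that applications of high computational complexity can run on low-cost RISC-core SoC chips that have no DSP coprocessor.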
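
To make the link between word length and signal-to-noise ratio concrete, the following Python sketch computes a minimum number of fractional bits under the textbook uniform quantization-noise model (rounding error with variance Δ²/12 for step size Δ = 2^-f). This model is an assumption chosen for illustration and is not necessarily the statistical method of the paper; the function names are hypothetical.

```python
import math


def quantization_noise_power(frac_bits: int) -> float:
    """Noise variance of rounding to `frac_bits` fractional bits (uniform model)."""
    step = 2.0 ** (-frac_bits)
    return step * step / 12.0


def sqnr_db(signal_power: float, frac_bits: int) -> float:
    """Signal-to-quantization-noise ratio in dB for a signal of given power."""
    if signal_power <= 0.0:
        raise ValueError("signal_power must be positive")
    return 10.0 * math.log10(signal_power / quantization_noise_power(frac_bits))


def min_fraction_bits(signal_power: float, target_snr_db: float,
                      max_bits: int = 64) -> int:
    """Smallest number of fractional bits whose SQNR meets the target."""
    for frac_bits in range(max_bits + 1):
        if sqnr_db(signal_power, frac_bits) >= target_snr_db:
            return frac_bits
    raise ValueError("target SNR not reachable within max_bits")
```

Under this model each additional fractional bit raises the SQNR by about 6.02 dB, so a tighter SNR requirement translates directly into a wider datapath and therefore more area and power.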

11.

Reconfigurable pipelined 2-D convolvers for fast digital signal processing

Bosi B. Bois G. Savaria Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(3):299-308

In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides a high-level interface and mechanisms, so developers only need to know how to use algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor

12.

A Unified FPGA-Based System Architecture for 2-D Discrete Wavelet Transform

Ishmael Sameen Yoong Choon Chang Mow Song Ng Bok-Min Goi Chee-Pun Ooi 《Journal of Signal Processing Systems》2013,71(2):123-142

This paper presents a novel unified and programmable 2-D Discrete Wavelet Transform (DWT) system architecture, which was implemented using a Field Programmable Gate Array (FPGA)-based Nios II soft-core processor working in combination with custom hardware accelerators generated through high-level synthesis. The proposed system architecture, synthesized on an Altera DE3 Stratix III FPGA board, was developed through an iterative design space exploration methodology using Altera's C2H compiler. Experimental results show that the proposed system architecture is capable of real-time video processing performance for grayscale image resolutions of up to 1920 × 1080 (1080p) when run on the Altera DE3 board, and it outperforms the existing 2-D DWT architecture implementations known in the literature by a considerable margin in terms of throughput. While the proposed 2-D DWT system architecture satisfies real-time performance constraints, it can also perform both forward and inverse DWT, support a number of popular DWT filters used for image and video compression, and provide architecture programmability in terms of the number of levels of decomposition as well as image width and height. Based on the design principles used to implement the proposed 2-D DWT system architecture, a system design guideline can be formulated for SoC designs that plan to incorporate dedicated 2-D DWT hardware acceleration.

13.

Design of a DSP-Based Grid-Connected Photovoltaic Inverter

蒲鹏鹏刘广思《电子质量》2009,(7):20-23

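Grid-connected photovoltaic (PV) power generation is the trend in PV system development. Based on the characteristics of grid-connected PV systems, this article designs a single-phase grid-connected PV inverter controlled by the TMS320F2407 digital signal processor. The system structure and control principle are analyzed, and the maximum power point tracking (MPPT) algorithm and the software flowchart for the phase-locked loop are designed. Experimental results show a good grid-connected current waveform: the inverter output current has essentially the same frequency and phase as the grid voltage, and the grid-connected power factor is approximately 1.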
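
The abstract does not say which maximum power point tracking algorithm was used. As an illustration only, the sketch below shows perturb-and-observe, a common MPPT scheme, in Python. The function names and the toy power curve (peak of 100 W at 17 V) are hypothetical and do not come from the paper.

```python
from typing import Callable


def perturb_and_observe(power_at: Callable[[float], float],
                        v_start: float, step: float, iterations: int) -> float:
    """Track the maximum power point by perturbing the operating voltage.

    The voltage is nudged by `step` each iteration; if the measured power
    dropped, the direction of the perturbation is reversed.
    """
    voltage = v_start
    previous_power = power_at(voltage)
    direction = 1.0
    for _ in range(iterations):
        voltage += direction * step
        power = power_at(voltage)
        if power < previous_power:
            direction = -direction  # last move reduced power: turn around
        previous_power = power
    return voltage


def toy_pv_power(voltage: float) -> float:
    """Hypothetical concave power curve peaking at 17 V (assumption)."""
    return 100.0 - (voltage - 17.0) ** 2
```

Once the peak is reached, this scheme keeps oscillating around it by roughly one step size, which is the usual trade-off between tracking speed and steady-state ripple when choosing `step`.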

14.

Experiments with low-level speculative computation based on multiple branch prediction

Holtmann U. Ernst R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(3):262-267

Coprocessor design is one application of high-level synthesis. We want to focus on high-performance coprocessors to speed up time critical parts in hardware-software codesign of embedded controllers. Time critical software parts often contain nested loops, often with data-dependent branches and data-dependent number of iterations. When (loop) pipelining is employed for high performance, the control dependencies become a dominant limitation to pipeline utilization. Branch prediction is a possible approach, but is usually restricted to few instructions and to one branch because of hardware and control overhead. Multiple branch prediction and speculative computation take a more global view on the program flow. We give practical examples of how speculative computation with multiple branch prediction increases performance far beyond a usual ASAP scheduling based on a CDFG. For scheduling, speculative computation requires a modification of the CDFG and, for the allocation phase, the insertion of register sets to save the processor status. The controller needs slight modification. We conclude that manual application of our approach will in general be too difficult, such that it can only be used in connection with synthesis

15.

A survey of neural network accelerator with software development environments

Jin Song Xuemeng Wang Zhipeng Zhao Wei Li Tian Zhi 《半导体学报》2020,(2):20-28

In recent years, deep learning algorithms have been widely deployed from cloud servers to terminal units, and researchers have proposed various neural network accelerators and software development environments. In this article, we review the representative neural network accelerators. Taken as a whole, the corresponding software stack must consider the hardware architecture of the specific accelerator to enhance end-to-end performance. We also summarize the programming environments of neural network accelerators and the optimizations in the software stack. Finally, we comment on the future trends of neural network accelerators and programming environments.

16.

High-Performance Viterbi Decoding on CELL/B.E.

Lai Junjie Tang Jun Peng Yingning Chen Jianwen 《中国通信》2009,6(2):150-156

Viterbi decoding is widely used in many radio systems. Because of its high computational complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of multicore technology, multicore platforms have become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-the-art multicore processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft-input Viterbi decoder for a WiMAX SR baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell processor running at 3.2 GHz, our Viterbi decoder achieves a throughput of up to 30 Mb/s when decoding the tail-biting convolutional code. This performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated into the SR system and can provide a highly integrated SR solution. The optimization methodology used in this module design can be extended to other modules on the Cell platform.

17.

Design of a Multi-Loop Measurement and Control System Based on the MAX5307 and a DSP

庞晓东贾凯徐方《电子工程师》2005,31(11):64-67

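A design for a multi-loop measurement and control system is proposed. The design uses only one DSP (digital signal processor) and one multi-channel integrated D/A converter, the MAX5307. It guarantees the real-time behavior and control accuracy of multiple measurement and control loops at the same time, while remaining simple to implement and low in cost. The concrete hardware and software implementation is given for a real system. The method is widely applicable and serves as a useful reference for the design of similar systems.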

18.

Implementation of a High Throughput Soft MIMO Detector on GPU

Michael Wu Yang Sun Siddharth Gupta Joseph R. Cavallaro 《Journal of Signal Processing Systems》2011,64(1):123-136

Multiple-input multiple-output (MIMO) significantly increases the throughput of a communication system by employing multiple antennas at the transmitter and the receiver. To extract maximum performance from a MIMO system, a computationally intensive search-based detector is needed. To meet the challenge of MIMO detection, typical suboptimal MIMO detectors are ASIC or FPGA designs. We aim to show that a MIMO detector on a graphics processing unit (GPU), a low-cost parallel programmable co-processor, can achieve high throughput and can serve as an alternative to ASIC/FPGA designs. However, careful architecture-aware software design is needed to leverage the performance offered by the GPU. We propose a novel soft MIMO detection algorithm, multi-pass trellis traversal (MTT), and show that we can achieve ASIC/FPGA-like performance and handle different configurations in software on the GPU. The proposed design can be used to accelerate wireless physical layer simulations and to offload MIMO detection processing in wireless testbed platforms.

19.

A Hardware/Software Co-Verification Methodology for SoC Design (cited: 3; self-citations: 3; citations by others: 0)

赵刚侯立刚刘源朱修殿吴武臣《微电子学与计算机》2006,23(6):24-26

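This article presents a hardware/software co-verification methodology and its verification flow. On the software side, a complete toolchain for compiling, debugging, and simulating software is used; it comprises a simulated virtual prototype of the processor and a basic assembler, linker, and debugger. On the hardware side, the application that has been debugged in software is taken through RTL simulation and synthesis, and is finally implemented and verified on the hardware emulation accelerator (FPGA) of the SoC design.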

20.

Recent trends in embedded system software performance estimation

Rajendra Patel Arvind Rajawat 《Design Automation for Embedded Systems》2013,17(1):193-213

It is observed that, due to the availability of fast and highly efficient processors, many embedded system developers are inclined to implement the majority of the system components in software rather than hardware. Software implementation offers a great level of flexibility and scalability of the design. At the same time, a wide choice exists between generic processors, DSP processors, network processors, etc. This enlarges the design space to be explored manyfold when selecting an appropriate processor or processor version for a specific application or application component. In this review, recent prominent directions for embedded software performance estimation are discussed and their salient features are summarized.