共查询到20条相似文献,搜索用时 15 毫秒
1.
Gschwind M. Salapura V. Maurer D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(2):241-250
Application-specific processors offer an attractive option in the design of embedded systems by providing high performance for a specific application domain. In this work, we describe the use of a reconfigurable processor core based on an RISC architecture as starting point for application-specific processor design. By using a common base instruction set, development cost can be reduced and design space exploration is focused on the application-specific aspects of performance. An important aspect of deploying any new architecture is verification which usually requires lengthy software simulation of a design model. We show how hardware emulation based on programmable logic can be integrated into the hardware/software codesign flow. While previously hardware emulation required massive investment in design effort and special purpose emulators, an emulation approach based on high-density field-programmable gate array (FPGA) devices now makes hardware emulation practical and cost effective for embedded processor designs. To reduce development cost and avoid duplication of design effort, FPGA prototypes and ASIC implementations are derived from a common source: We show how to perform targeted optimizations to fully exploit the capabilities of the target technology while maintaining a common source base 相似文献
2.
针对片上系统(SoC)开发周期较长和现场可编程门阵列(FPGA)可重用的特点,设计了基于ARM7TDMI处理器核的SoC的百万门级FPGA验证平台。介绍了怎样设计平台并利用该平台进行IP核验证、底层硬件驱动和实时操作系统及高层应用软件的验证。使用该平台能够基本验证SoC系统的设计,并加快SoC系统的开发。整个系统原理清晰,结构简单,扩展灵活、方便。 相似文献
3.
Kiran Kumar Anumandla Rangababu Peesapati Samrat L. Sabat Siba K. Udgata 《Design Automation for Embedded Systems》2012,16(4):221-240
This paper presents floating point design and implementation of System on Chip (SoC) based Differential Evolution (DE) algorithm using Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to enhance the execution speed of the embedded applications. Intellectual Property (IP) of DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor using processor local bus (PLB) of Xilinx Virtex-5 FPGA. In the proposed architecture the algorithmic parameters of DE are scalable. The software and hardware implementation of the DE algorithm is carried out in PowerPC embedded processor and hardware IP respectively. The optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of acceleration gain of the DE algorithm. The optimization problems are solved by using floating point arithmetic in both embedded processor and hardware. The experimental result concludes that the hardware DE IP accelerates the execution speed approximately by 200 times compared to equivalent software implementation of DE algorithm on PowerPC 440 processor. Further, as a case study an Infinite Impulse Response (IIR) based system identification task on SoC using the developed hardware accelerator is implemented. 相似文献
4.
5.
An increasing number of standards in wireless communications have encouraged to study programmable processors as platforms for flexible receivers. A multiple-input multiple-output (MIMO) antenna system combined with orthogonal frequency division multiplexing (OFDM) technique has been introduced in many wireless communications standards, such as in the third generation long term evolution (3G LTE). The MIMO-OFDM system requires an efficient detector and a platform support for parallel processing of multiple subcarriers. A K-best list sphere detector (LSD) provides for near optimal decoding performance and a fixed throughput making it an interesting algorithm from the point of view of practical implementations.In this paper, we compare the implementations of the K-best LSD on four processor platforms: a digital signal processor (DSP), software defined radio (SDR), application-specific processor (ASP) and application-specific instruction-set processor (ASIP). The DSP is a popular very long instruction word (VLIW) device (TMS320C6455), the SDR processor employs multithreading and multiple cores (SB3500 core processor), the ASP is based on transport triggered architecture (TTA), while the ASIP is the SDR processor enhanced with a special instruction-set extension for sorting.A 2×2 MIMO antenna system with 64-quadrature amplitude modulation (64-QAM) is assumed. The chosen list sizes K=8 and 16 are based on simulation results carried out in MATLAB environment with the third generation long term evolution (3G LTE) parameters. The proposed ASIP achieved a promising throughput of 32.0 Mbps, where the software defined radio (SDR) implementation on the SB3500 core processor suffers from an inefficient software sorter. The ASP, in which the minimized hardware complexity has been the goal, achieves a throughput of 7.6 Mbps. However, more essential examination is related to the symbol time, which sets strict parallel processing requirements to the programmable processors. 相似文献
6.
利用SOPC Builder可以在短时间内把Nios Ⅱ CPU、Avalon总线、外围设备、片内调试模块等集成在一起生成系统需要的NiosⅡ处理器,然后用QuartusⅡ软件把NIOSⅡ处理器其它外部设备接口结合在一起编译下载到FPGA芯片中,即完成系统的硬件设计;软件设计通常采用C/C++语言编写并用NoisⅡIDE编译后下载到FPGA中来实现一个SOPC系统。 相似文献
7.
针对电路设计中经常碰到数据的噪声干扰现象,提出了一种Kalman滤波的FPGA实现方法。该方法采用了Ⅱ公司的高精度模数转换器ADS1251以及Altera公司的EPIC12,首先用卡尔曼滤波算法设计了一个滤波器,然后将该滤波器分解成简单的加、减、乘、除运算。通过基于FPGA平台的硬件与软件的合理设计,成功地实现了数据噪声的滤除设计,并通过实践仿真计算,验证了所实现滤波的有效性。 相似文献
8.
9.
10.
11.
12.
13.
Andrews D. Sass R. Anderson E. Agron J. Peck W. Stevens J. Baijot F. Komp E. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(1):34-44
This paper introduces hthreads, a unifying programming model for specifying application threads running within a hybrid computer processing unit (CPU)/field-programmable gate-array (FPGA) system. Presently accepted hybrid CPU/FPGA computational models-and access to these computational models via high level languages-focus on programming language extensions to increase accessibility and portability. However, this paper argues that new high-level programming models built on common software abstractions better address these goals. The hthreads system, in general, is unique within the reconfigurable computing community as it includes operating system and middleware layer abstractions that extend across the CPU/FPGA boundary. This enables all platform components to be abstracted into a unified multiprocessor architecture platform. Application programmers can then express their computations using threads specified from a single POSIX threads (pthreads) multithreaded application program and can then compile the threads to either run on the CPU or synthesize them to run within an FPGA. To enable this seamless framework, we have created the hardware thread interface (HWTI) component to provide an abstract, platform-independent compilation target for hardware-resident computations. The HWTI enables the use of standard thread communication and synchronization operations across the software/hardware boundary. Key operating system primitives have been mapped into hardware to provide threads running in both hardware and software uniform access to a set of sub-microsecond, minimal-jitter services. Migrating the operating system into hardware removes the potential bottleneck of routing all system service requests through a central CPU. 相似文献
14.
15.
Jack Jean Xuejun Liang Brian Drozd Karen Tomko Yan Wang 《The Journal of VLSI Signal Processing》2000,25(1):39-53
This paper describes the acceleration of an infrared automatic target recognition (IR ATR) application with a co-processor board that contains multiple field programmable gate array (FPGA) chips. Template and pixel level parallelism is exploited in an FPGA design for the bottleneck portion of the application. The implementation of this design achieved a speedup of 21 compared to running on the host processor. The paper then describes an FPGA resource manager (RM) developed to support concurrent applications sharing the FPGA board. With the RM, the system is dynamically reconfigurable. That is, while part of the co-processor board is busy computing, another part can be reconfigured for other purposes. The IR ATR application was ported to work with the RM and has been shown to adapt to the amount of reconfigurable hardware that is available at the time the application is executed. 相似文献
16.
Bosi B. Bois G. Savaria Y. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(3):299-308
In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides high-level interface and mechanisms, therefore, developers only need to know how to use algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSP's, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor 相似文献
17.
Philippe Faes Peter Bertels Jan Van Campenhout Dirk Stroobandt 《Design Automation for Embedded Systems》2009,13(4):223-243
In many embedded systems, the computational power of an instruction set processor is combined with hardware accelerators.
Building such combined systems implies co-design of the software that runs on the processor and the hardware that accelerates the embedded application. During the co-design
process, the application is partitioned into a software part (running on the processor) and a hardware part (running on the
accelerator). In order to ease the iterative process of partitioning, we introduce a novel design methodology. In our methodology,
the interface between hardware and software is transparent to the software designer, and is based on dynamic method interception.
Because the interface is transparent and generated automatically, the initial all-software prototype of the system can more
easily be refined and partitioned. We show that method interception is inexpensive, and we demonstrate method interception
in a real-life application.
Using our methodology, embedded systems can be designed fast, reducing time-to-market, while still achieving a high run-time
performance. 相似文献
18.
Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of the multicore technology, multicore platforms become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-art multi-core processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft input Viterbi decoder for WiMAX SR Baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell Processor running at 3.2GHz, our Viterbi decoder can achieve the throughput up to 30Mb/s to decode the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated to the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on Cell platform. 相似文献
19.