1.

DSP system integration and prototyping with FPGAs

Jouni Isoaho Jari Pasanen Olli Vainio Hannu Tenhunen 《The Journal of VLSI Signal Processing》1993,6(2):155-172

Field Programmable Gate Arrays (FPGAs) offer a cost-effective and flexible technology for DSP ASIC prototype development. In this article, the fast ASIC prototyping concept based on the use of multiple FPGAs is reviewed in different engineering applications. The design experiences of the proposed approach, applied to four different DSP ASIC design projects, are presented. The design experiences concerning the selection of the design methodology, application architectures and prototyping technologies are analyzed with respect to efficient system integration and ASIC migration from the FPGA prototype onto first-time functional silicon. Novel prototyping techniques based on using configurable hardware modellers concerning the same objective are studied. Some future goals are outlined to develop an integrated, multipurpose DSP ASIC prototyping environment.

2.

FPGAs as reconfigurable processing elements

Fawcett B. 《Circuits and Devices Magazine, IEEE》1996,12(2):8-10

In most applications, FPGAs are used to implement “glue logic”, providing the advantages of high integration levels without the expense and risk of custom ASIC development. However, as SRAM-based FPGA devices have increased in capability, their use as in-system-configurable computing elements is receiving considerable attention. Indeed, reconfigurable FPGA technology holds the potential for reshaping the future of computing by providing the capability to dynamically alter a computer's hardware resources to optimally service immediate computational needs. Computing circuits built from SRAM-based FPGAs can meet the true goal of parallel processing: executing algorithms in circuitry with the inherent parallelism of hardware, while avoiding the instruction fetch and load/store bottlenecks of traditional von Neumann architectures. There are many computationally intensive algorithms that can benefit from being partially or wholly implemented in hardware. Typically, these algorithms are too specialized to justify the expense of manufacturing custom IC devices.

3.

Efficient Implementations for AES Encryption and Decryption (cited by: 1; self-citations: 0; citations by others: 1)

Rashmi Ramesh Rachh P. V. Ananda Mohan B. S. Anami 《Circuits, Systems, and Signal Processing》2012,31(5):1765-1785

This paper proposes two efficient architectures for hardware implementation of the Advanced Encryption Standard (AES) algorithm. The composite field arithmetic for implementing SubBytes (S-box) and InvSubBytes (Inverse S-box) transformations investigated by several authors is used as the basis for deriving the proposed architectures. The first architecture for encryption is based on an optimized S-box followed by bit-wise implementation of MixColumns and AddRoundKey, and an optimized Inverse S-box followed by bit-wise implementation of InvMixColumns and AddMixRoundKey for decryption. The proposed S-box and Inverse S-box used in this architecture are designed as a cascade of three blocks. In the second proposed architecture, block III of the proposed S-box is combined with the MixColumns and AddRoundKey transformations, forming an integrated unit for encryption. An integrated unit for decryption combining block III of the proposed InvSubBytes with InvMixColumns and AddMixRoundKey is formed on similar lines. The delays of the proposed architectures for VLSI implementation are found to be the shortest compared to the state-of-the-art implementations of AES operating in non-feedback mode. Iterative and fully unrolled sub-pipelined designs including key schedule are implemented using FPGA and ASIC. The proposed designs are efficient in terms of Kgates/Giga-bits per second ratio compared with a few recent state-of-the-art ASIC (0.18-μm CMOS standard cell) based designs and throughput per area (TPA) for FPGA implementations.
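
As background for the MixColumns transformation named in this abstract, here is a minimal Python sketch of the standard operation on a single state column, as specified in FIPS-197. It is not the authors' composite-field or integrated architecture; the names `xtime` and `mix_column` are illustrative choices.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:          # overflow past 8 bits: reduce by the AES polynomial
        b ^= 0x11B
    return b


def mix_column(col):
    """Apply the standard AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    # Multiplication by 3 is xtime(a) ^ a; everything else is XOR.
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]
```

Because the step reduces to shifts and XORs, it maps naturally onto the bit-wise hardware implementations the abstract mentions.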

4.

NEDA: a low-power high-performance DCT architecture (cited by: 4; self-citations: 0; citations by others: 4)

Shams A.M. Chidanandan A. Pan W. Bayoumi M.A. 《Signal Processing, IEEE Transactions on》2006,54(3):955-964

Conventional distributed arithmetic (DA) is popular in application-specific integrated circuit (ASIC) design, and it features on-chip ROM to achieve high speed and regularity. In this paper, a new DA architecture called NEDA is proposed, aimed at reducing the cost metrics of power and area while maintaining high speed and accuracy in digital signal processing (DSP) applications. Mathematical analysis proves that DA can implement inner product of vectors in the form of two's complement numbers using only additions, followed by a small number of shifts at the final stage. Comparative studies show that NEDA outperforms widely used approaches such as multiply/accumulate (MAC) and DA in many aspects. Being a high-speed architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy existing in the adder array consisting of entries of 0 and 1. A hardware compression scheme is introduced to generate a butterfly structure with minimum number of additions. NEDA-based architectures for 8 × 8 discrete cosine transform (DCT) core are presented as an example. Savings exceeding 88% are achieved when the compression scheme is applied along with NEDA. Finite word-length simulations demonstrate the viability and excellent performance of NEDA.
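
The bit-level idea behind distributed arithmetic can be illustrated with a short Python sketch. It shows the generic textbook decomposition of an inner product over the bits of two's complement inputs, not the NEDA architecture itself (which, per the abstract, also avoids subtraction). The function name `da_inner_product` and its parameters are assumptions made for illustration.

```python
def da_inner_product(coeffs, xs, bits):
    """Compute sum(c * x) without multiplying, one input bit-plane at a time.

    Each x is a signed integer representable in `bits`-bit two's complement.
    """
    mask = (1 << bits) - 1
    acc = 0
    for j in range(bits):
        # Add up the coefficients whose input has bit j set.
        # Conventional DA would read this partial sum from a ROM.
        partial = sum(c for c, x in zip(coeffs, xs) if ((x & mask) >> j) & 1)
        if j == bits - 1:
            acc -= partial << j   # the sign bit has weight -2^(bits-1)
        else:
            acc += partial << j
    return acc
```

For example, `da_inner_product([3, -2, 5], [7, -4, -1], 4)` evaluates 3·7 + (−2)·(−4) + 5·(−1) using only additions, shifts, and one final subtraction for the sign bit-plane.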

5.

A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS

Samuel Antão Leonel Sousa 《Journal of Signal Processing Systems》2014,76(3):249-259

Modular arithmetic is a building block for a variety of applications potentially supported on embedded systems. An approach to make modular arithmetic more efficient is to identify algorithmic modifications that would enhance the parallelization of the target arithmetic in order to exploit the properties of parallel devices and platforms. The Residue Number System (RNS) introduces data-level parallelism, enabling the parallelization even for algorithms based on modular arithmetic with several data dependencies. However, the mapping of generic algorithms to full RNS-based implementations can be complex and the utilization of suitable hardware architectures that are scalable and adaptable to different demands is required. This paper proposes and discusses an architecture with scalability features for the parallel implementation of algorithms relying on modular arithmetic fully supported by the Residue Number System (RNS). The systematic mapping of a generic modular arithmetic algorithm to the architecture is presented. It can be applied as a high level synthesis step for an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA) design flow targeting modular arithmetic algorithms. An implementation with the Xilinx Virtex 4 and Altera Stratix II Field Programmable Gate Array (FPGA) technologies of the modular exponentiation and Elliptic Curve (EC) point multiplication, used in the Rivest-Shamir-Adleman (RSA) and (EC) cryptographic algorithms, suggests latency results in the same order of magnitude of the fastest hardware implementations of these operations known to date.
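
The data-level parallelism that RNS provides can be seen in a small Python sketch (Python 3.8 or later). The moduli set and all function names are hypothetical choices for illustration and are unrelated to the architecture proposed in the paper. Each residue channel is processed independently, and the Chinese Remainder Theorem recovers the result.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 7 * 11 * 13 * 15 = 15015.
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; no carries cross between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication; each channel could run in parallel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reverse conversion using the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # modular inverse of m_i mod m
    return total % big_m
```

Results are exact only while the true value stays below the product of the moduli, which is why practical designs choose the moduli set to cover the required dynamic range.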

6.

RAPID PROTOTYPING - Area efficient FIR filters for high speed FPGA implementation

Macpherson K.N. Stewart R.W. 《Vision, Image and Signal Processing, IEE Proceedings -》2006,153(6):711-720

A new algorithm that synthesises multiplier blocks with low hardware requirement suitable for implementation as part of full-parallel finite impulse response (FIR) filters is presented. Although the techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. Fully pipelined, full-parallel transposed-form FIR filters with multiplier block were generated using the new and previous algorithms, implemented on an FPGA target and the results compared. Previous research in this field has concentrated on minimising multiplier block adder cost but the results presented here demonstrate that this optimisation goal does not minimise FPGA hardware. Minimising multiplier block logic depth and pipeline registers is shown to have the greatest influence in reducing FPGA area cost. In addition to providing lower area solutions than existing algorithms, comparisons with equivalent filters generated using the distributed arithmetic technique demonstrate further area advantages of the new algorithm.
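
A multiplier block replaces the constant multipliers of a transposed-form FIR filter with a shared network of shifts and adds. The Python sketch below hand-builds such a block for the hypothetical coefficient set (3, 5, 15); it illustrates the structure only and is not the synthesis algorithm described in the paper.

```python
def multiplier_block(x):
    """Produce 3x, 5x and 15x from one input using three adders and no multipliers."""
    t3 = (x << 1) + x        # 3x  = 2x + x
    t5 = (x << 2) + x        # 5x  = 4x + x
    t15 = (t5 << 1) + t5     # 15x = 10x + 5x, reusing the 5x term
    return t3, t5, t15


def fir_transposed(samples):
    """Transposed-form FIR: y[n] = 3*x[n] + 5*x[n-1] + 15*x[n-2]."""
    regs = [0, 0]            # registers between the tap adders
    out = []
    for x in samples:
        p0, p1, p2 = multiplier_block(x)
        out.append(p0 + regs[0])
        regs[0] = p1 + regs[1]
        regs[1] = p2
    return out
```

Here the 15x product reuses the 5x intermediate, so the adder count is lower than building each product separately. The abstract's point is that on FPGAs, logic depth and pipeline registers matter more than this adder count alone.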

7.

The flexibility of configurable computing

Villasenor J. Hutchings B. 《Signal Processing Magazine, IEEE》1998,15(5):67-84

There has been growing recent interest in configurable computing, which can be viewed as a hybrid between ASICs and programmable processors. Configurable computing machines are implemented with programmable logic: flexible hardware that can be structured to fit the natural organization and data flow of a computation. The enabling device for configurable computing is the field-programmable gate array (FPGA). For applications characterized by deeply pipelined, highly parallel, and integer arithmetic processing, configurable computing machines can outperform alternative solutions by up to an order of magnitude. The combination in a single device of dedicated hardware and rapid, submillisecond-scale reprogrammability constitutes an exciting and promising development whose implications are only just beginning to be exploited. We begin with a brief tutorial on FPGAs that describes the most common FPGA architectures and how these architectures are used to support computation, memory access, and data flow. We then present FPGAs as computing machines and focus on devices that are reconfigured during run time. Ongoing research involving FPGAs and future directions are also discussed.

8.

Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems

Mohamed El-Hadedy Xinfei Guo Martin Margala Mircea R. Stan Kevin Skadron 《Journal of Signal Processing Systems》2017,88(2):167-184

This paper presents a novel type of high-speed and area-efficient register-based transpose memory architecture enabled by reporting on both edges of the clock. The proposed new architecture, by using the double-edge triggered registers, doubles the throughput and increases the maximum frequency by avoiding some of the combinational circuit used in prior work. The proposed design is evaluated with both FPGA and ASIC flow in 28/32 nm technology. The experimental results show that the proposed memory achieves almost 4X improvement in throughput while consuming 46% less area with the FPGA implementations compared to prior work. For ASIC implementations, it achieves more than 60% area reduction and at least 2X performance improvement while burning 60% less power compared to other register-based designs implemented with the same flow. As an example, a proposed 8×8 transpose memory with 12-bit input/output resolution is able to achieve a throughput of 107.83 Gbps at 647 MHz by taking only 140 slices on a Virtex-7 Xilinx FPGA platform, and achieve a throughput of 88.2 Gbps at 529 MHz by taking 0.024 mm² silicon area for ASIC. The proposed transpose memory is integrated in both 2D-DCT and 2D-IDCT blocks for signal processing applications on the same FPGA platform. The new architecture allows a 3.5X speed-up in performance for the 2D-DCT algorithm, compared to the previous work, while consuming 28% less area, and 2D-IDCT achieves a 3X speed-up while consuming 20% less area.

9.

FPGA vs. ASIC for low power applications

Amara Amara 《Microelectronics Journal》2006,37(8):669-677

Field Programmable Gate Arrays (FPGAs) are becoming more and more popular and are used in many applications. It is well known that their performance is limited compared to a full ASIC implementation, but for many applications the speed requirements are already met by existing FPGA circuits. Power consumption seems to be one of the most important limiting factors, and so far it is in favour of Application Specific Integrated Circuits (ASIC) [Varghese Georges, Jan M. Rabaey, Low-Energy FPGA, Architecture and Design, Kluwer Academic Publishers, 2001; Tadahiro Kuroda, Power-Aware Electronics: Challenges and Opportunities, Tutorial at FTFC 2003, Paris, May 2003]. In this paper, we will present results obtained by characterizing various circuits implemented using both FPGA and ASIC technologies in order to determine the power consumption ratio and evaluate the efficiency of power optimization techniques such as clock gating [Amara AMARA, Philippe Royannez, VHDL for Low Power, (Chapter 11), Low Power Electronics Design, Edited by Christian Piguet, CRC Press 2005; Luca Benini, Giovanni De Micheli, Dynamic Power Management, Kluwer Academic Publishers, 1998]. We have started a study in order to compare the power consumption of two Intellectual Property (IP) blocks, a counter circuit and an image transform circuit. Both circuits have been implemented using FPGA Family circuits from ALTERA and Hardware Copy of the circuits, which are close to the ASIC implementation. A full ASIC implementation using UMC 0.13 μm has also been characterized in terms of power. The FPGA power consumption estimation flow is based on ALTERA tools (QuartusII) that provide accurate overall power consumption for a set of input stimuli, on various targets: FPGA families and Hardware Copy. The ASIC power consumption estimation flow is based on Synopsys Power tools.

10.

Achieving SCA Conformance Testing with Model-Based Testing

Julien Botella Jean-Philippe Delahaye Eddie Jaffuel Bruno Legeard Fabien Peureux 《Journal of Signal Processing Systems》2016,85(1):113-128

Time-to-market and implementation cost are high-priority considerations in the automation of digital hardware design. Nowadays, digital signal processing applications are implemented in fixed-point architectures because these manipulate data with a lower word-length. Thus, floating-point to fixed-point conversion is mandatory. This conversion amounts to optimizing the integer word-length and the fractional word-length. Optimizing the integer word-length can significantly reduce the cost when the application is tolerant to a low probability of overflow. In this paper, a new selective simulation technique to accelerate overflow effect analysis is introduced. A new integer word-length optimization algorithm that exploits this selective simulation technique is proposed to reduce both implementation cost and optimization time. The efficiency of our proposals is illustrated through experiments, where the selective simulation technique accelerates execution by factors of up to 1200 and 1000 when applied to a Global Positioning System and to the Fast Fourier Transform (FFT) part of an Orthogonal Frequency Division Multiplexing chain, respectively. Moreover, applying the optimization algorithm to the FFT part leads to a cost reduction of between 17 and 22% with respect to interval arithmetic and an acceleration factor of up to 617 with respect to the classical max-1 algorithm.

11.

An FPGA Verification Platform Based on the SEP3203 Microprocessor

Xu Xiaoyu Wen Xiaojing 《Modern Electronics Technique (现代电子技术)》2007,30(6):20-22

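To maximize the effectiveness of simulation in ASIC design, this paper studies FPGA-based verification of ASIC designs by building physical hardware, and proposes an FPGA verification platform built around the SEP3203 microprocessor. The description first gives the overall framework of the verification platform, then introduces in turn the platform's CPU (the SEP3203 microprocessor), the core-board hardware design, the FPGA-board hardware design, the system bus design, and the power supply design, and finally shows a photograph of the finished platform. So that verified code can be used for tape-out without any modification, the SRAM interface and the AMBA AHB interface are studied, and a method is proposed that adds an AMBA AHB-to-SRAM interface conversion circuit behind the FPGA's SRAM interface. Experiments show that the platform improves functional simulation of SoC peripherals and meets its design goals.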

12.

The Latest ASIC Design Automation Tool: The FPGA Development System

Gao Yanmin 《Microelectronics (微电子学)》1992,22(4):31-34

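This paper introduces the latest tool for ASIC design automation, the FPGA development system: its software and hardware support environment; an overview of FPGAs, their features, and their basic structure; FPGA device families and operating frequencies; and how to design ASIC circuits on a PC-based FPGA development system. It concludes with the flow of a design example.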

13.

FPGA and ASIC implementation of robust invisible binary image watermarking algorithm using connectivity preserving criteria

P. Karthigaikumar K. Baskaran 《Microelectronics Journal》2011,42(1):82-88

Digital watermarking is the process of hiding information in a digital signal to authenticate the contents of digital data. There are a number of watermarking algorithms implemented in software and a few in hardware. This paper discusses the implementation of a robust invisible binary image watermarking algorithm in Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuits (ASIC) using connectivity preserving criteria. The algorithm is processed in the spatial domain. The algorithm is prototyped in (i) XILINX FPGA (ii) 130 nm ASIC. The algorithm is tested in Virtex-E (xcv50e-8-cs144) FPGA and implemented in an ASIC.

14.

High Speed FPGA-Based Implementations of Delayed-LMS Filters

Y. Yi R. Woods L.K. Ting C.F.N. Cowan 《Journal of Signal Processing Systems》2005,39(1-2):113-131

A variation of the least mean squares (LMS) algorithm, called the delayed LMS (DLMS) algorithm, is ideally suited for highly pipelined, adaptive digital filter implementations. In this paper, we present an efficient method to determine the delays in the DLMS filter. Furthermore, in order to achieve fully pipelined circuit architectures for FPGA implementation, we transfer these delays using retiming. The method has been used to derive a series of retimed delayed LMS (RDLMS) architectures, which allow a 66.7% reduction in delays and 5 times faster convergence time, thereby giving superior performance in terms of throughput rate when compared to previous work. Three circuit architectures and three hardware shared versions are presented which have been implemented using the Virtex-II FPGA technology, resulting in a throughput rate of 182 Msample/s.
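
The defining feature of DLMS is that the coefficient update uses an error and input vector that are several cycles old, which is what allows the update path to be pipelined. The Python sketch below illustrates this on a noise-free system identification problem. The function name, step size, delay, and test signal are all assumptions for illustration; it does not model the retimed architectures from the paper.

```python
import random
from collections import deque


def dlms_identify(h_true, n_samples=5000, mu=0.01, delay=2, seed=0):
    """Estimate the taps of h_true with a delayed-LMS adaptive filter."""
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0.0] * taps                 # adaptive weights
    x_hist = [0.0] * taps            # input history, most recent sample first
    pending = deque()                # (error, input vector) pairs awaiting use
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        desired = sum(h * x for h, x in zip(h_true, x_hist))
        output = sum(wi * x for wi, x in zip(w, x_hist))
        pending.append((desired - output, x_hist))
        if len(pending) > delay:
            # Update with the error and inputs from `delay` iterations ago.
            e_old, x_old = pending.popleft()
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w
```

Setting `delay=0` reduces the loop to the standard LMS update. Larger delays tighten the stability bound on `mu`, which is one reason reducing the number of delays, as the paper reports, helps convergence.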

15.

Application of Software Radio Technology in Radar Design (cited by: 1; self-citations: 0; citations by others: 1)

Pan Sheng 《Modern Electronics Technique (现代电子技术)》2005,28(4):70-71,74

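Software radio is a new concept in wireless communication whose applied research has focused mainly on military communications and civilian mobile communications. With advances in DDS, DSP, FPGA, and ASIC technologies, the application of software radio techniques to radar signal generation and processing has gradually matured, and practical software-defined radar has become quite feasible. Applying software radio technology makes a radar system programmable, extensible, and highly flexible. This paper applies the design philosophy of software radio to radar system design, briefly describes the system hardware design, component selection, and software design flow, and presents a corresponding implementation scheme.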

16.

FPGA Prototype Verification of an ASIC Based on TotalRecall Technology

Guo Anhua Huang Shizhen 《Chinese Journal of Electron Devices (电子器件)》2012,35(3):313-316

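Verification is a critical step in chip design. With the rapid development of FPGA technology, FPGA-based prototype verification has come to be widely used in ASIC development and is an effective way to verify an ASIC, but the visibility offered by traditional FPGA prototype verification is very poor. To address this visibility problem, the verification engineers applied an FPGA prototype verification method combined with TotalRecall technology to a mouse chip. The method not only provides 100% visibility but also ensures that the FPGA prototype runs at real-time hardware speed. The method is a new contribution to ASIC verification methodology.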

17.

Building Highly Reliable Airborne Control Equipment with System-Level ASIC Technology

Liu Qiang Yin Zhijun 《Electro-Optic Technology Application (光电技术应用)》2012,27(2):17-20

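The problems facing airborne control equipment are described, and a new method of building such equipment with system-level ASIC technology is proposed. The functions of airborne control equipment are analyzed, the main system-level ASIC technologies are introduced and compared, and an implementation scheme using FPGA-based SOPC technology is proposed, with the software and hardware design described in detail. The development flow for designing an SOPC system with Xilinx's EDK design tools is presented. Finally, system functional tests on the control system platform demonstrate the stability and feasibility of airborne control equipment designed with SOPC technology.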

18.

Using bus-based connections to improve field-programmable gate-array density for implementing datapath circuits

Ye A. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(5):462-473

As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.

19.

Hardware Simulator Design for MIMO Propagation Channel on Shipboard at 2.2 GHz

Bachir Habib Hanna Farhat Gheorghe Zaharia Ghaïs El Zein 《Wireless Personal Communications》2013,71(4):2535-2561

A wireless communication system can be tested either in actual conditions or with a hardware simulator reproducing actual conditions. With a hardware simulator it is possible to freely simulate a desired radio channel, making it possible to test mobile radio equipment “on table”. This paper presents new architectures for the digital block of a hardware simulator of MIMO propagation channels. This simulator can be used for LTE and WLAN IEEE 802.11ac applications, in indoor and outdoor environments. However, in this paper, specific architectures of the digital block of the simulator for shipboard environment are presented. A hardware simulator must reproduce the behavior of the radio propagation channel. Thus, a measurement campaign has been conducted to obtain the impulse responses of the shipboard channel using a channel sounder designed and realized at IETR. After the presentation of the channel sounder, the channel impulse responses are described and implemented. Then, the new architectures of the digital block of the hardware simulator, implemented on a Xilinx Virtex-IV FPGA, are presented. The accuracy, the occupation on the FPGA and the latency of the architectures are analyzed.

20.

Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures

Nuno Roma Leonel Sousa 《The Journal of VLSI Signal Processing》2003,34(3):277-290

A new class of fully parameterizable multiple array architectures for motion estimation in video sequences based on the Full-Search Block-Matching algorithm is proposed in this paper. This class is based on a new and efficient AB2 single array architecture with minimum latency, maximum throughput and full utilization of the hardware resources. It provides the ability to configure the target processor within the boundary values imposed for the configuration parameters concerning the algorithm setup, the processing time and the circuit area. With this purpose, a software configuration tool has been implemented to determine the set of possible configurations which fulfill the requisites of a given video coder. Experimental results using both FPGA and ASIC technologies are presented. In particular, the implementation of a single array processor configuration on a single-chip is illustrated, evidencing the ability to estimate motion vectors in real-time.
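
For reference, the Full-Search Block-Matching algorithm that these architectures accelerate can be written in a few lines of Python. This is a plain software sketch using the sum of absolute differences (SAD) as the matching cost. The function names and the frame representation (lists of rows) are assumptions for illustration, and nothing here reflects the array architecture itself.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and a displaced candidate."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p]^2 and return (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every block evaluates (2p + 1)² candidates with n² absolute differences each, which is the regular, compute-heavy workload that motivates dedicated hardware arrays.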