首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 481 毫秒
1.
2.
3.
Field Programmable Gate Array (FPGA) is an efficient reconfigurable integrated circuit platform and has become a core signal processing mieroehip device of digital systems over the last decade. With the rapid development of semiconductor technology, the performance and system inte- gration of FPGA devices have been significantly progressed, and at the same time new challenges arise. The design of FPGA architecture is required to evolve to meet these challenges, while also taking advantage of ever increased microchip density. This survey reviews the recent development of advanced FPGA architectures, including improvement of the programming technologies, logic blocks, intercon- nects, and embedded resources. Moreover, some important emerging design issues of FPGA archi- tectures, such as novel memory based FPGAs and 3D FPGAs, are also presented to provide an outlook for future FPGA development.  相似文献   

4.
In most commercial field programmable gate arrays (FPGA's) the number of wiring tracks in each channel is the same across the entire chip. A long-standing open question for both FPGA's and channeled gate arrays is whether or not some uneven distribution of routing tracks across the chip would lead to an area benefit. For example, many circuit designers intuitively believe that most congestion occurs near the center of a chip, and hence expect that having wider routing channels near the chip center would be beneficial. In this paper, we determine the relative area-efficiency of several different routing track distributions. We first investigate FPGA's in which horizontal and vertical channels contain different numbers of tracks in order to determine if such a directional bias provides a density advantage. Second, we examine routing track distributions in which the track capacities vary from channel to channel. We compare the area efficiency of these nonuniform routing architectures to that of an FPGA with uniform channel capacities across the entire chip. The main result is that the most area-efficient global routing architecture is one with uniform (or very nearly uniform) channel capacities across the entire chip in both the horizontal and vertical directions. This paper shows why this result, which is contrary to the intuition of many FPGA architects, is true. While a uniform routing architecture is the most area-efficient, several nonuniform and directionally biased architectures are fairly area-efficient provided that appropriate choices are made for the pin positions on the logic blocks and the logic block array aspect ratio  相似文献   

5.
As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.  相似文献   

6.
This authors explore the effect of logic block architecture on the speed of a field-programmable gate array (FPGA). Four classes of logic block architecture are investigated: NAND gates, multiplexer configurations, lookup tables, and wide-input AND-OR gates. An experimental approach is taken, in which each of a set of benchmark logic circuits is synthesized into FPGAs that use different logic blocks. The speed of the resulting FPGA implementations using each logic block is measured. While the results depend on the delay of the programmable routing, experiments indicate that five- and six-input lookup tables and certain multiplexer configurations produce the lowest total delay over realistic values of routing delay. The fine grain blocks, such as the two-input NAND gate, exhibit poor performance because these gates require many levels of logic block to implement the circuits and hence require a large routing delay  相似文献   

7.
The relationship between the routability of a field-programmable gate array (FPGA) and the flexibility of its interconnection structures is examined. The flexibility of an FPGA is determined by the number and distribution of switches used in the interconnection. While good routability can be obtained with a high flexibility, a large number of switches will result in poor performance and logical density because each switch has significant delay and area. The minimum number of switches required to achieve good routability is determined by implementing several industrial circuits in a variety of interconnection architectures. These experiments indicate that high flexibility is essential for the connection block that joins the logic blocks to the routing channel, but a relative low flexibility is sufficient for switch blocks at the junction of horizontal and vertical channels. Furthermore, it is necessary to use only a few more routing tracks than the absolute minimum possible with structures of surprisingly low flexibility  相似文献   

8.
The relationship between the functionality of a field-programmable gate array (FPGA) logic block and the area required to implement digital circuits using that logic block is examined. The investigation is done experimentally by implementing a set of industrial circuits as FPGAs using CAD (computer-aided design) tools for technology mapping, placement, and routing. A range of programming technologies (the method of FPGA customization) is explored using a simple model of the interconnection and logic block area. The experiments are based on logic blocks that use lookup tables for implementing combinational logic. Results indicate that the best number of inputs to use (a measure of the block's functionality) is between three and four, and that a D flip-flop should be included in the logic block. The results are largely independent of the programming technology. More generally, it was observed that the area efficiency of a logic block depends not only on its functionality but also on the average number of pins connected per logic block  相似文献   

9.
This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3%  相似文献   

10.
A new application-independent approach for evaluating the fault tolerance of field-programmable gate-array (FPGA) interconnect structures is presented. Signal routing in the presence of faulty resources at switch block and FPGA levels is analyzed; this problem is directly related to the fault tolerance of FPGA interconnects for testing and reconfiguration at manufacturing and run-time applications. Two criteria are proposed and used as figure-of-merit for evaluating different FPGA interconnect architectures. The proposed approach is based on the number of available paths between pairs of end points and the probability to establish a one-to-one mapping between all input and output end points. A probabilistic approach is also presented to evaluate the fault-tolerant routing of the entire FPGA by connecting switch blocks in chains, as required for testing and to account for the input–output (I/O) pin restrictions of an FPGA chip. All possible interconnect faults for programmable switches and wiring channels are considered in the fault model. The proposed method is applicable to arbitrary switch block structures. Experimental results on commercial as well as academic designed FPGAs are presented and analyzed.  相似文献   

11.
《Microelectronics Journal》2015,46(6):551-562
Most commercial Field Programmable Gate Arrays (FPGAs) have limitations in terms of density, speed, configuration overhead and power consumption mostly due to the use of SRAM cells in Look-Up Tables (LUTs), configuration memory and programmable interconnects. Also, hardwired Application Specific Integrated Circuit (ASIC) blocks designed for high performance arithmetic circuits in FPGA reduce the area available for reconfiguration. In this paper, we propose a novel generalized hybrid CMOS-memristor based architecture using stateful-NOR gates as basic building blocks for implementation of logic functions. These logic functions are implemented on memristor nanocrossbar layers, while the CMOS layer is used for selection and connection of memristors. The proposed pipelined architecture combines the features of ASIC, FPGA and microprocessor based designs. It has high density due to the use of nanocrossbar layer and high throughput especially for arithmetic circuits. The proposed architecture for three input one output logic block is compared with conventional LUT based Configurable Logic Block (CLB) having the same number of inputs and outputs; which shows 1.82×area saving, 1.57×speedup and 3.63×less power consumption. The automation algorithm to implement any logic function using proposed architecture is also presented.  相似文献   

12.
《Microelectronics Journal》2015,46(8):706-715
Detailed routing solutions for island style FPGA architectures using Boolean satisfiability (SAT) based formulations have been proposed in this paper. Due to decreasing size of ICs and hence, the increasing complexity of the routing resource constraints, routing has been a big challenge in electronic design automation field. Our proposed techniques work on multi-pin net routing where all nets are considered for routing in their intact form whereas, most of the existing routing solutions decompose multi-pin nets into two-pin nets for detailed routing to ease the problem. However this approach, apart from increasing the number of nets in the circuits, may also introduce pin doglegging which, when not permitted by the architecture of FPGA, would require extra constraints to eliminate. Many detailed routers adopt sequential detailed routing approaches which are vulnerable to the net ordering problem which may cause a routable circuit to be erroneously classified as unroutable. Our proposed techniques avoid these pitfalls by keeping the multi-pin nets intact and solve all nets simultaneously using SAT. The SAT-based multi-pin net dogleg-free formulations presented here achieve significant improvement over existing SAT-based solutions with respect to the number of variables and clauses used, thereby achieving greater scalability and also display comparable and sometimes better routability results on benchmark circuits when compared with other detailed routing solutions. Detailed routing is also significantly affected by the architecture of the switching blocks. This paper proposes SAT-based formulation for three different switch box architectures i.e. Subset, Wilton, and Universal switches. Our experiments clearly demonstrate how routing solutions for a circuit can differ significantly for different types of switch boxes.  相似文献   

13.
Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits  相似文献   

14.
Efficient Implementations for AES Encryption and Decryption   总被引:1,自引:0,他引:1  
This paper proposes two efficient architectures for hardware implementation of the Advanced Encryption Standard (AES) algorithm. The composite field arithmetic for implementing SubBytes (S-box) and InvSubBytes (Inverse S-box) transformations investigated by several authors is used as the basis for deriving the proposed architectures. The first architecture for encryption is based on optimized S-box followed by bit-wise implementation of MixColumns and AddRoundKey and optimized Inverse S-box followed by bit-wise implementation of InvMixColumns and AddMixRoundKey for decryption. The proposed S-box and Inverse S-box used in this architecture are designed as a cascade of three blocks. In the second proposed architecture, the block III of the proposed S-box is combined with the MixColumns and AddRoundKey transformations forming an integrated unit for encryption. An integrated unit for decryption combining the block III of the proposed InvSubBytes with InvMixColumns and AddMixRoundKey is formed on similar lines. The delays of the proposed architectures for VLSI implementation are found to be the shortest compared to the state-of-the-art implementations of AES operating in non-feedback mode. Iterative and fully unrolled sub-pipelined designs including key schedule are implemented using FPGA and ASIC. The proposed designs are efficient in terms of Kgates/Giga-bits per second ratio compared with few recent state-of-the-art ASIC (0.18-μm CMOS standard cell) based designs and throughput per area (TPA) for FPGA implementations.  相似文献   

15.
Ultra high-speed block turbo decoder architectures meet the demand for even higher data rates and open up new opportunities for the next generations of communication systems such as fiber optic transmissions. This paper presents the implementation, onto an FPGA device of an ultra high throughput block turbo code decoder. An innovative architecture of a block turbo decoder which enables the memory blocks between all half-iterations to be removed is presented. A complexity analysis of the elementary decoder leads to a low complexity decoder architecture for a negligible performance degradation. The resulting turbo decoder is implemented on a Xilinx Virtex II-Pro FPGA in a communication experimental setup which also includes an innovative parallel product encoder. The implemented block turbo decoder processes input data at 600 Mb/s. The component code is an extended Bose, Ray-Chaudhuri, Hocquenghem (eBCH(16,11)) code. Some solutions to reach even higher data rates are finally presented.  相似文献   

16.
A new algorithm that synthesises multiplier blocks with low hardware requirement suitable for implementation as part of full-parallel finite impulse response (FIR) filters is presented. Although the techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. Fully pipelined, full-parallel transposed-form FIR filters with multiplier block were generated using the new and previous algorithms, implemented on an FPGA target and the results compared. Previous research in this field has concentrated on minimising multiplier block adder cost but the results presented here demonstrate that this optimisation goal does not minimise FPGA hardware. Minimising multiplier block logic depth and pipeline registers is shown to have the greatest influence in reducing FPGA area cost. In addition to providing lower area solutions than existing algorithms, comparisons with equivalent filters generated using the distributed arithmetic technique demonstrate further area advantages of the new algorithm  相似文献   

17.
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP) which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools were developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design), is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar were on average 20% more than the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture-the proportion of hard-wired connections versus programmable connections-to determine its best value  相似文献   

18.
A novel FPGA chip FDP2008 (Fudan Programmable Logic) has been designed and implemented with the SMIC 0.18μm CMOS logic 1P6M process. The new design method means that the configurable logic block can be configured as distributed RAM and a shift register. A universal programmable routing circuit is also presented; by adopting offset lines, complementary hanged end-lines and MUX + Buffer routing switches, the whole FPGA chip is highly repeatable, and the signal delay is uniform and predictable over the total chip. A standard configuration interface SPI is added in the configuration circuit, and a group of highly sensitive amplifiers is used to magnify the read back data. FDP2008 contains 20 × 30 logic TILEs, 200 programmable IOBs and 10 × 4 kbit dual port block RAMs. The hardware software cooperation test shows that FDP2008 works correctly and efficiently.  相似文献   

19.
20.
Distributed arithmetic techniques are the key to efficient implementation of DSP algorithms in FPGAs. The distributed arithmetic process is briefly described. A representative DSP design application in the form of an 8 tap FIR filter is offered for the Xilinx XC3042 field programmable logic array (FPGA). The design is presented in sufficient detail—from filter specifications via filter design software through detailed logic of salient data and control functions to obtain a realistic placing and routing of configurable logic block (CLBs) and in/out block (IOBs) components for simulation verification and performance evaluation vis-a-vis commercially available dedicated 8 tap FIR filter chips.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号