Similar literature
20 similar documents found (search time: 31 ms)
1.
Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have been proposed recently to reduce FPGA power, where the Vdd-level can be customized for FPGA circuit elements and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with a minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that, for all evaluated architectures, LUT size 4 gives the lowest energy consumption and LUT size 7 leads to the highest performance.

2.
《Microelectronics Journal》2015,46(6):551-562
Most commercial Field Programmable Gate Arrays (FPGAs) have limitations in terms of density, speed, configuration overhead and power consumption, mostly due to the use of SRAM cells in Look-Up Tables (LUTs), configuration memory and programmable interconnects. Also, hardwired Application Specific Integrated Circuit (ASIC) blocks designed for high-performance arithmetic circuits in FPGAs reduce the area available for reconfiguration. In this paper, we propose a novel generalized hybrid CMOS-memristor based architecture using stateful-NOR gates as basic building blocks for the implementation of logic functions. These logic functions are implemented on memristor nanocrossbar layers, while the CMOS layer is used for selection and connection of memristors. The proposed pipelined architecture combines the features of ASIC, FPGA and microprocessor based designs. It has high density due to the use of the nanocrossbar layer and high throughput, especially for arithmetic circuits. The proposed architecture for a three-input, one-output logic block is compared with a conventional LUT-based Configurable Logic Block (CLB) having the same number of inputs and outputs, and shows 1.82× area saving, 1.57× speedup and 3.63× less power consumption. The automation algorithm to implement any logic function using the proposed architecture is also presented.

3.
As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.

4.
This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times of the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. Moreover, the delay elements also behave as filters that eliminate other glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases the overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires little or no modifications to the routing architecture or computer-aided design (CAD) flow.
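
The alignment step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the picosecond units, and the 50 ps delay-element step are all assumptions made for illustration. Each LUT input is delayed so that it arrives with the latest input, and the required delay is rounded down to the programmable step so that no input is pushed past the latest arrival.

```python
def alignment_delays(arrival_ps, step_ps=1):
    """Return the delay (in ps) to program on each LUT input.

    arrival_ps: arrival time of each input from static timing analysis.
    step_ps:    resolution of the (hypothetical) programmable delay element.
    Delays are rounded down to a multiple of step_ps, so an aligned input
    never arrives later than the slowest original input.
    """
    latest = max(arrival_ps)
    return [((latest - t) // step_ps) * step_ps for t in arrival_ps]


# Illustrative arrival times for a 4-input LUT (made-up values).
arrivals = [1200, 950, 400, 1180]
delays = alignment_delays(arrivals, step_ps=50)
aligned = [t + d for t, d in zip(arrivals, delays)]
print(delays)   # extra delay programmed on each input
print(aligned)  # arrival times after alignment
```

With a finite step, inputs that trail the latest one by less than a step are left untouched, which is why some residual skew remains in this sketch.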

5.
The authors explore the effect of logic block architecture on the speed of a field-programmable gate array (FPGA). Four classes of logic block architecture are investigated: NAND gates, multiplexer configurations, lookup tables, and wide-input AND-OR gates. An experimental approach is taken, in which each of a set of benchmark logic circuits is synthesized into FPGAs that use different logic blocks. The speed of the resulting FPGA implementations using each logic block is measured. While the results depend on the delay of the programmable routing, experiments indicate that five- and six-input lookup tables and certain multiplexer configurations produce the lowest total delay over realistic values of routing delay. The fine-grain blocks, such as the two-input NAND gate, exhibit poor performance because these gates require many levels of logic blocks to implement the circuits and hence incur a large routing delay.

6.
One way to reduce the delay and area of field-programmable gate arrays (FPGAs) is to employ logic-cluster-based architectures, where a logic cluster is a group of logic elements connected with high-speed local interconnections. In this paper, we empirically evaluate FPGA architectures with logic clusters ranging in size from 1 to 20, and show that compared to architectures with size 1 clusters, architectures with size 8 clusters have 23% less delay (30% faster clock speed) and require 14% less area. We also show that FPGA architectures with large cluster sizes can significantly reduce design compile time, an increasingly important concern as the logic capacity of FPGAs rises. For example, an architecture that uses size 20 clusters requires seven times less compile time than an architecture with size 1 clusters.
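
The pairing of 23% less delay with a 30% faster clock follows if clock frequency is taken as the reciprocal of critical-path delay. That relationship is an assumption here, not something the abstract states, and the function name below is hypothetical. A minimal Python sketch of the arithmetic:

```python
def clock_speedup_from_delay_reduction(reduction):
    """Fractional clock-frequency gain for a fractional delay reduction.

    Assumes frequency = 1 / critical-path delay, so a delay scaled by
    (1 - reduction) scales the frequency by 1 / (1 - reduction).
    """
    return 1.0 / (1.0 - reduction) - 1.0


gain = clock_speedup_from_delay_reduction(0.23)
print(f"{gain * 100:.1f}% faster clock")
```

Under that assumption the gain works out to just under 30%, consistent with the rounded figure in the abstract.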

7.
8.
This paper describes a new programmable routing fabric for field-programmable gate arrays (FPGAs). Our results show that an FPGA using this fabric can achieve 1.57 times lower dynamic power consumption and 1.35 times lower average net delays with only 9% reduction in logic density over a baseline island-style FPGA implemented in the same 65-nm CMOS technology. These improvements in power and delay are achieved by 1) using only short interconnect segments to reduce routed net lengths, and 2) reducing interconnect segment loading due to programming overhead relative to the baseline FPGA without compromising routability. The new routing fabric is also well-suited to monolithically stacked 3-D-IC implementation. It is shown that a 3-D-FPGA using this fabric can achieve a 3.3 times improvement in logic density, a 2.51 times improvement in delay, and a 2.93 times improvement in dynamic power consumption over the same baseline 2-D-FPGA.

9.
Architecture of field-programmable gate arrays   Total citations: 8, self-citations: 0, citations by others: 8
A survey of field-programmable gate array (FPGA) architectures and the programming technologies used to customize them is presented. Programming technologies are compared on the basis of their volatility, size, parasitic capacitance, resistance, and process technology complexity. FPGA architectures are divided into two constituents: logic block architectures and routing architectures. A classification of logic blocks based on their granularity is proposed, and several logic blocks used in commercially available FPGAs are described. A brief review of recent results on the effect of logic block granularity on logic density and performance of an FPGA is then presented. Several commercial routing architectures are described in the context of a general routing architecture model. Finally, recent results on the tradeoff between the flexibility of an FPGA routing architecture, its routability, and its density are reviewed.

10.
The fabrication cost of application-specific integrated circuits (ASICs) is rising exponentially in the deep submicron region due to rapidly rising non-recurring engineering cost. Field programmable gate arrays (FPGAs) provide an attractive alternative to ASICs but consume an order of magnitude more power. There is a need to explore ways of reducing FPGA power consumption so that they can also be employed in ultra low power (ULP) applications instead of ASICs. The subthreshold region of operation is an ideal choice for ULP low-throughput FPGAs. The routing of an FPGA consumes most of the chip area and primarily determines the circuit delay and power consumption. There is a need to design moderate-speed ULP routing switches for subthreshold FPGAs. This article proposes a novel subthreshold FPGA routing switch box (SB) that utilises the leakage voltage through the transistor as the biasing voltage, which shows 69%, 61.2% and 30% improvement in delay, power-delay product and delay variation, respectively, over a conventional routing SB.

11.
With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.

12.
Field-programmable gate array (FPGA) usage has been growing steadily for years. Their popularity stems from the fact that they can be reprogrammed to implement any function, with any amount of parallelism. Unfortunately, precisely because of their flexibility, FPGAs require a huge amount of resources, in the form of LUTs and routing switches, and these can take up to 90% of the chip area. In this paper we present the development of a low-power full CMOS multiple-valued logic to build a LUT for FPGAs. Several circuits are mapped to quaternary LUTs and compared to their binary counterparts. Results show great improvements in terms of area and power consumption. Moreover, we show the positive impact of the proposed architecture on the global reduction of routing switches and wiring, and hence on the total FPGA area.

13.
Low leakage techniques for FPGAs   Total citations: 1, self-citations: 0, citations by others: 1
Reconfigurable architectures are well suited for wireless applications since they provide high performance computation together with the capability to adapt to changing communication protocols. Moving to 90-nm technology and below, FPGAs could suffer from leakage energy consumption due to the large number of inactive transistors. This paper presents an extensive study on the application of different low-leakage techniques to the design of FPGAs. The approaches are compared and mixed to find an implementation of switch blocks and look-up tables which reduces leakage without affecting delay and area. The circuits we propose achieve an 86% stand-by energy saving and 46% active leakage reduction with respect to standard implementations. The FPGA delay is not affected, while area is increased by only 3%.

14.
陈星  王丽云  王元  吴方  王健  陈利光  来金梅 《电子学报》2011,39(5):1165-1168
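Traditional programmable interconnect structures typically use single pass transistors for short-distance interconnect and bidirectional wires for medium distances, so that once the number of look-up tables (LUTs) in a CLB grows, interconnect delay increases exponentially with wire length. This paper proposes an improved high-performance interconnect structure that refines the short-, medium- and long-distance interconnects, giving the chip better interconnect delay characteristics as the number of LUTs in a CLB increases. By modeling this interconnect structure and the traditional interconnect structure...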

15.
Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits.

16.
This article proposes a Configurable Memristive Logic Block (CMLB) that comprises novel memristive logic cells. The memristive logic cells are constructed from a memristive D flip-flop, a 6-bit non-volatile look-up table (NVLUT), and multiplexers. The memristive logic cells are interconnected using memristive switch matrix cells to form the CMLB. The CMLB is then used to construct a memristor-based FPGA architecture. The proposed CMLB shows a reduction of 8.6% in device area and a 1.094 times lower critical path delay against the SRAM-based FPGA architecture. Against similar CMOS-based circuits, the memristive D flip-flop provides a switching speed 1.08 times faster, the NVLUT reduces power consumption by 6.25 nW, and the memristive logic cells reduce device area by 60.416 µm². This work also compares various memristor-based FPGA architectures found in the literature against the SRAM-based FPGA architecture.

17.
Distributed arithmetic techniques are the key to efficient implementation of DSP algorithms in FPGAs. The distributed arithmetic process is briefly described. A representative DSP design application in the form of an 8-tap FIR filter is offered for the Xilinx XC3042 field programmable gate array (FPGA). The design is presented in sufficient detail (from filter specifications, via filter design software, through detailed logic of salient data and control functions) to obtain a realistic placement and routing of configurable logic block (CLB) and input/output block (IOB) components for simulation verification and performance evaluation vis-a-vis commercially available dedicated 8-tap FIR filter chips.
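
Distributed arithmetic replaces the multipliers of a sum of products with a precomputed table indexed by one bit from every input sample, processed bit-serially. The Python sketch below shows the general technique for an 8-tap filter. It is not the XC3042 design from this paper: the coefficients, the 4-bit two's-complement word width, and the function names are assumptions chosen only for illustration.

```python
def build_da_lut(coeffs):
    """Table of partial sums: entry `addr` adds coeffs[k] for every set bit k."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_dot(lut, samples, bits):
    """Compute sum(coeffs[k] * samples[k]) without multiplications.

    samples are two's-complement integers that fit in `bits` bits.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc


# Hypothetical symmetric 8-tap coefficients and one window of input samples.
coeffs = [1, -2, 3, 4, 4, 3, -2, 1]
samples = [5, -3, 7, 0, -8, 2, 1, -1]
lut = build_da_lut(coeffs)
result = da_dot(lut, samples, bits=4)
direct = sum(c * x for c, x in zip(coeffs, samples))
print(result, direct)
```

The table has 2^8 = 256 entries for eight taps. Practical designs often split it into smaller tables to save memory, a trade-off this sketch does not explore.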

18.
Logic emulation is so far the fastest method to verify system functionality at the gate level before chip fabrication. A field-programmable gate array (FPGA)-based logic emulator with large gate capacity generally comprises a large number of FPGAs or special processors connected in a mesh or crossbar topology. However, the gate utilization of the FPGAs and the speed of emulation are limited by the number of signal pins among FPGAs and the interconnection architecture of the logic emulator. This paper first describes a new interconnection architecture called TOMi (Time-multiplexed, Off-chip, Multicasting interconnection) and proposes a circuit partitioning algorithm called ATOMi (Algorithm for TOMi) for multi-FPGA systems incorporating four to eight FPGAs, where the FPGAs are interconnected through TOMi. ATOMi reduces the number of off-chip signal transfers to optimize the performance of multi-FPGA systems implemented with TOMi. Experimental results using Partitioning93 benchmarks show that, by adopting the proposed TOMi interconnection architecture along with ATOMi, the pin count is reduced to 14.4%–88.6% while the critical path delay is reduced to 66.1%–90.1% compared to traditional architectures including mesh, crossbar, and VirtualWire architecture.

19.
This paper introduces a novel ultra-low-power SRAM. A large power reduction is obtained by the use of four new techniques that allow for a wider and better trade-off between area, delay and active and passive energy consumption for low-power embedded SRAMs. The design targets wireless applications that require moderate performance at ultra-low power consumption. The implemented design techniques consist of a more efficient memory databus, the exploitation of the dynamic read stability of SRAM cells, a new low-swing write technique and a distributed decoder. An 8-KB 5T SRAM was fabricated in a 0.18-µm technology. The measurement results confirm the feasibility and the usefulness of the proposed techniques. A reduction of active power consumption by a factor of 2 is reported as compared to the current state of the art. The results are generalized towards a 32-KB SRAM.

20.
Asynchronous serial transceivers have recently been used for data serializing in large on-chip systems to alleviate routing congestion and improve routability. FPGAs have considerable potential for using asynchronous serial transmission, but they face serious challenges in adopting this technology. In this paper, we present a new FPGA architecture with a corresponding new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (by 18.81% and 48.73%, respectively) by using asynchronous data serializing without any performance degradation, at the cost of a reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.
