1.

Using the Minimum Set of Input Combinations to Minimize the Area of Local Routing Networks in Logic Clusters Containing Logically Equivalent I/Os in FPGAs

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):95-107

Mapping digital circuits onto field-programmable gate arrays (FPGAs) usually consists of two steps. First, circuits are mapped into look-up tables (LUTs). Then, LUTs are mapped onto physical resources. The configuration of LUTs is usually determined during the first step and remains unchanged throughout the second. In this paper, we demonstrate that by reconfiguring LUTs during the second step, one can increase the flexibility of FPGA routing resources. This increase in flexibility can then be used to reduce the implementation area of FPGAs. In particular, it is shown that, for a logic cluster with $I$ inputs and $N$ $k$-input LUTs, a set of $N \times k$ $(I+N-k+1):1$ multiplexers can be used to connect logic cluster inputs to LUT inputs while maintaining logic equivalency among the logic cluster I/Os. The multiplexers (called a local routing network) are shown to be the minimum required to maintain logic equivalency. Compared to the previous design, which employs a fully connected local routing network, the proposed design can reduce logic cluster area by 3%–25% and can significantly reduce the fanout of logic cluster inputs.
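
As a rough illustration of the multiplexer counts quoted in this abstract, the sketch below tallies the total number of multiplexer inputs in a local routing network. It assumes that the fully connected baseline uses $N \times k$ multiplexers of width $(I+N):1$, i.e. every cluster input and every LUT output can reach every LUT input. That baseline width, the function name, and the sample parameters are illustrative assumptions, not figures taken from the paper, and multiplexer input count is only a crude proxy for area.

```python
def local_routing_mux_inputs(I, N, k, minimal=True):
    """Total number of multiplexer inputs in a cluster's local routing network.

    I: cluster inputs, N: LUTs per cluster, k: inputs per LUT.
    minimal=True  -> (I + N - k + 1):1 multiplexers, as stated in the abstract.
    minimal=False -> (I + N):1 multiplexers (ASSUMED fully connected baseline).
    """
    if k > I + N:
        raise ValueError("k cannot exceed the number of available signals")
    width = I + N - k + 1 if minimal else I + N
    return N * k * width  # N * k multiplexers, each `width` inputs wide


def relative_saving(I, N, k):
    """Fraction of multiplexer inputs saved versus the assumed baseline."""
    full = local_routing_mux_inputs(I, N, k, minimal=False)
    reduced = local_routing_mux_inputs(I, N, k, minimal=True)
    return (full - reduced) / full


if __name__ == "__main__":
    # Hypothetical cluster: 22 inputs, ten 4-input LUTs.
    print(local_routing_mux_inputs(22, 10, 4, minimal=False))
    print(local_routing_mux_inputs(22, 10, 4, minimal=True))
    print(relative_saving(22, 10, 4))
```

Each multiplexer shrinks by $k-1$ inputs under this assumption, so the relative saving in this proxy grows with the LUT size and shrinks as $I+N$ grows.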

2.

Sharing of SRAM Tables Among NPN-Equivalent LUTs in SRAM-Based FPGAs

Meyer J. Kocan F. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):182-195

This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3%.
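
To make the notion of NPN equivalence concrete, here is a minimal brute-force sketch. Two Boolean functions are NPN-equivalent when one can be turned into the other by negating inputs, permuting inputs, and optionally negating the output. The truth-table encoding (bit `x` of an integer holds the function value for input pattern `x`) and the function names are assumptions made for illustration. The code says nothing about how the paper's hardware detects or exploits the equivalence, and exhaustive search is practical only for small `k`.

```python
from itertools import permutations


def npn_canonical(tt, k):
    """Smallest truth table in the NPN class of `tt`, a k-input function.

    `tt` is an integer whose bit x is the function value for input pattern x.
    """
    rows = 1 << k                 # number of input patterns
    mask = (1 << rows) - 1        # all-ones truth table, used to negate the output
    best = None
    for perm in permutations(range(k)):
        for neg in range(1 << k):             # bit i set -> negate input i
            out = 0
            for x in range(rows):
                y = 0                          # pattern fed to the original function
                for i in range(k):
                    bit = ((x >> i) & 1) ^ ((neg >> i) & 1)
                    y |= bit << perm[i]
                if (tt >> y) & 1:
                    out |= 1 << x
            for cand in (out, out ^ mask):     # with and without output negation
                if best is None or cand < best:
                    best = cand
    return best


def npn_equivalent(a, b, k):
    """True if truth tables a and b describe NPN-equivalent k-input functions."""
    return npn_canonical(a, k) == npn_canonical(b, k)


if __name__ == "__main__":
    AND2, OR2, XOR2 = 0b1000, 0b1110, 0b0110
    print(npn_equivalent(AND2, OR2, 2))    # related by De Morgan's law
    print(npn_equivalent(AND2, XOR2, 2))
```

Functions that share a canonical form could, in principle, be served by one stored table plus input/output inversion and permutation logic, which is the kind of sharing the abstract describes.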

3.

A routing algorithm for FPGAs with time-multiplexed interconnects

Ruiqi Luo Xiaolei Chen Yajun Ha 《Journal of Semiconductors (半导体学报)》2020,(2):73-82

Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs. In this work, we propose a time-multiplexing technique on FPGA interconnects. In order to fully exploit this interconnect architecture, we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires. We validate the algorithm by using the router to implement 20 benchmark circuits on time-multiplexed FPGAs. We achieve a 38% smaller minimum channel width and a 3.8% smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle.

4.

Metro-on-FPGA: A feasible solution to improve the congestion and routing resource management in future FPGAs

A. Belghadr A. Jahanian 《Integration, the VLSI Journal》2014

Asynchronous serial transceivers have recently been used for data serializing in large on-chip systems to alleviate routing congestion and improve routability. FPGAs have considerable potential for using asynchronous serial transmission, but they face serious challenges in adopting this technology. In this paper, we present a new FPGA architecture, together with a new routing algorithm, to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using asynchronous data serializing without any performance degradation, at the cost of a reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.

5.

The memory/logic interface in FPGAs with large embedded memory arrays

Wilton S.J.E. Rose J. Vranesic Z.G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):80-91

As the capacities of field-programmable gate arrays (FPGAs) grow, they will be used to implement much larger circuits than ever before. These larger circuits often require significant amounts of storage. In order to address these storage requirements, FPGAs with large embedded memory arrays are now being developed by several vendors. One of the crucial components of an FPGA with on-chip memory is the routing structure between the memory arrays and logic resources. If this memory/logic interface is not flexible enough, many circuits will be unroutable, while if it is too flexible, it will be slower and consume more chip area than is necessary. In this paper, we show that an interconnect in which each memory pin can connect to between four and seven logic routing tracks is best in terms of both area and speed. We also show that by adding switches to support nets that connect multiple memory arrays, we can reduce the memory access time by up to 25% and improve the routability slightly.

6.

The effect of LUT and cluster size on deep-submicron FPGA performance and density (total citations: 2; self-citations: 0; citations by others: 2)

Ahmed E. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):288-298

In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997), we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits is synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. First, across all architectures with LUT sizes in the range of 2 to 7 inputs and cluster sizes from 1 to 10 LUTs, we have experimentally determined the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than with larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size of between 3 and 10 provide the best area-delay product for an FPGA.

7.

An automated approach for locating multiple faulty LUTs in an FPGA

T. Nandha Kumar Chia Wai Chong 《Microelectronics Reliability》2008,48(11-12):1900-1906

8.

A novel robust FPGA routing switch box design for ultra low power applications

S.D. Pable Mohd. Hasan 《International Journal of Electronics》2013,100(1):15-27

The fabrication cost of application-specific integrated circuits (ASICs) is rising exponentially in the deep-submicron regime owing to rapidly rising non-recurring engineering costs. Field programmable gate arrays (FPGAs) provide an attractive alternative to ASICs but consume an order of magnitude more power. There is a need to explore ways of reducing FPGA power consumption so that FPGAs can also be employed in ultra low power (ULP) applications instead of ASICs. The subthreshold region of operation is an ideal choice for ULP low-throughput FPGAs. The routing of an FPGA consumes most of the chip area and primarily determines the circuit delay and power consumption. There is a need to design moderate-speed ULP routing switches for subthreshold FPGAs. This article proposes a novel subthreshold FPGA routing switch box (SB) that uses the leakage voltage through a transistor as the biasing voltage and shows 69%, 61.2% and 30% improvement in delay, power-delay product and delay variation, respectively, over a conventional routing SB.

9.

Low-Power Programmable FPGA Routing Circuitry

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):1048-1060

We consider circuit techniques for reducing field-programmable gate-array (FPGA) power consumption and propose a family of new FPGA routing switch designs that are programmable to operate in three different modes: high-speed, low-power, or sleep. High-speed mode provides similar power and performance to traditional FPGA routing switches. In low-power mode, speed is curtailed in order to reduce power consumption. Leakage is reduced by 28%–52% in low-power versus high-speed mode, depending on the particular switch design selected. Dynamic power is reduced by 28%–31% in low-power mode. Leakage power in sleep mode, which is suitable for unused routing switches, is 61%–79% lower than in high-speed mode. Each of the proposed switch designs has a different power/area/speed tradeoff. All of the designs require only minor changes to a traditional routing switch and involve relatively small area overhead, making them easy to incorporate into current commercial FPGAs. The applicability of the new switches is motivated through an analysis of timing slack in industrial FPGA designs. It is observed that a considerable fraction of routing switches may be slowed down (operate in low-power mode), without impacting overall design performance.

10.

Analysis of ageing effects on ARTIX7 XILINX FPGA

《Microelectronics Reliability》2017

FPGAs are considered an attractive alternative to ASICs, thanks to their reconfigurability and their low development costs. However, since they are the first to experience new technology nodes, their ability to cope with VLSI ageing mechanisms is crucial, especially in critical applications such as space and avionics. This work aims to understand ageing degradation in FPGAs. An experimental approach is adopted in order to characterize the effects of degradation on FPGA look-up tables (LUTs). Different stress conditions were tested to accelerate the ageing process and identify the underlying mechanisms. Ageing tests have been executed on a total of 17 FPGAs belonging to the Xilinx Artix-7 family. Results show that Negative-Bias Temperature Instability ageing damage is the main cause of timing degradation in the studied FPGAs.

11.

Architectural Modifications to Enhance the Floating-Point Performance of FPGAs

Beauchamp M.J. Hauck S. Underwood K.D. Hemmert K.S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(2):177-187

With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.

12.

Circuit Design of FPGA Routing Switches

ZHENG Quanzhi YANG Yintang GAO Haixia 《Chinese Journal of Electron Devices (电子器件)》2003,26(4):344-347,364

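Based on an analysis of the island-style FPGA routing architecture, pass-transistor routing switches and tri-state buffer routing switches are designed. A level-restoring circuit is designed to solve the static power consumption problem caused by pass-transistor switches. A fan-in-based tri-state buffer switch, bufm, is proposed, which avoids the fan-out problem of ordinary buffer switches. Finally, the delay characteristics of the various routing switches are compared and some reasonable recommendations are made.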

13.

Region-Based Routing: A Mechanism to Support Efficient Routing Algorithms in NoCs

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(3):356-369

An efficient routing algorithm is important for large on-chip networks [network-on-chip (NoC)] to provide the required communication performance to applications. Implementing NoCs using table-based switches provides many advantages, including the possibility of changing routing algorithms and fault tolerance, due to the option of table reconfiguration. However, table-based switches have been considered unsuitable for NoCs due to their perceived high area and power consumption. In this paper, we describe the region-based routing (RBR) mechanism, which groups destinations into network regions, allowing an efficient implementation with logic blocks. RBR can also be viewed as a mechanism to reduce the number of entries in routing tables. RBR is general and can be used in conjunction with any adaptive routing algorithm. In particular, we have evaluated the proposed scheme in conjunction with a general routing algorithm, namely segment-based routing (SR), and an Application Specific Routing Algorithm (APSRA) using regular and irregular mesh topologies. Our study shows that the number of entries in the table is significantly reduced, especially for large networks. Evaluation results show that RBR requires only four regions to support several routing algorithms in a 2-D mesh with no performance degradation. Considering link failures, our results indicate that RBR combined with SR is able to tolerate up to 7 link failures in an $8 \times 8$ mesh. RBR also reduces area and power dissipation of an equivalent table-based implementation by factors of 8 and 10, respectively. Moreover, the degradation in performance of the network is insignificant when using APSRA combined with RBR.

14.

The Triptych FPGA architecture

Borriello G. Ebeling C. Hauck S.A. Burns S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(4):491-501

Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits.

15.

New Non-Volatile Memory Structures for FPGA Architectures

Choi D. Kyu Choi Villasenor J.D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(7):874-881

A new set of programmable elements (PEs) using a new non-volatile device for use with routing switches and logical elements within a field-programmable gate array (FPGA) is described. The PEs have small area, can be combined with components that use low operational voltage on the same CMOS logic process, are non-volatile, enable the use of fast thin-oxide pass transistors, and are reprogrammable. A novel non-volatile flip-flop for use within the logical elements is presented as well. In combination, these methods enable programmable logic devices with improved area efficiency, the speed advantages of SRAM-based FPGAs, and a wide range of opportunities for power down strategies.

16.

Resource requirements and layouts for field programmable interconnection chips

Bhatia D. Haralambides J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(3):346-355

Field-programmable interconnection chips (FPIC's) provide the capability of realizing user programmable interconnection for any desired permutation. Such an interconnection is very much desired for supporting rapid prototyping of hardware systems and for providing programmable communication networks for parallel and distributed computing. An FPIC should realize any possible permutation of input to output pins via a set of programmable switches. In this paper, we show that any such architecture requires a minimum of $\Omega(n \log n)$ switches, where $n$ is the number of I/O pins. The result stems from an analysis of the underlying permutation network. In addition, for networks of bounded degree $d$, we prove an $\Omega(\log_{d-1} n)$ bound on the routing delay (maximum length of routing paths for specific I/O permutations) and an $\Omega(n \log_{d-1} n)$ bound on the average utilization of programmable switches used by the FPIC to implement a specific permutation. For the same type of networks, we prove an $\Omega(n \log_{d-1} n)$ bound on the number of nodes of the network. Furthermore, we design efficient architectures for FPIC's offering a wide variety of routing delays, high average programmable resource utilization, and $O(n^2)$-area two-layer layouts. The proposed structures are called hybrid Benes-Crossbar (HBC) architectures and clearly exhibit a tradeoff between performance (routing delay, utilization) and area of the layout.
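
One way to see why a bound of this order is plausible is a standard counting argument, sketched here under the assumption that each programmable switch has two states. It is not necessarily the proof used in the paper, and the symbol $S$ for the number of switches is introduced only for this sketch. A network with $S$ two-state switches has at most $2^{S}$ configurations, and it must realize all $n!$ permutations of the $n$ I/O pins, so

$$
2^{S} \ge n! \quad\Longrightarrow\quad S \ge \log_2 n! = \sum_{i=1}^{n} \log_2 i \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} = \Omega(n \log n).
$$

The middle inequality keeps only the $n/2$ largest terms of the sum, each of which is at least $\log_2 (n/2)$. Switches with any constant number of states change the bound only by a constant factor.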

17.

Optimized Implementation of RNS FIR Filters Based on FPGAs

Salvatore Pontarelli Gian Carlo Cardarilli Marco Re Adelio Salsano 《Journal of Signal Processing Systems》2012,67(3):201-212

This paper presents optimized Residue Number System (RNS) arithmetic blocks that better exploit some of the architectural characteristics of the latest-generation FPGAs. Implementations of modulo m adders, modulo m constant and general multipliers, and input and output converters are presented. These architectures are based on moduli sets chosen in order to optimally use the 6-input Look-Up Tables (LUTs) available in the Complex Logic Blocks (CLBs) of the new-generation FPGAs. Experiments based on the implementation of Finite Impulse Response (FIR) filters with different numbers of taps and wordlengths show that the use of RNS together with suitable moduli sets optimally fits the 6-input LUTs in the latest-generation FPGA architectures.

18.

Evaluating logic functionality of cascaded fracturable LUTs

GUO Zhenhong LIN Yu LI Tianyi JIA Rui GAO Tongqiang YANG Haigang 《Journal of Terahertz Science and Electronic Information Technology (太赫兹科学与电子信息学报)》2016,14(3):474-480

Look-up tables (LUTs) are the key components of field-programmable gate arrays (FPGAs). Many LUT architectures have been studied; nevertheless, it is difficult to evaluate an LUT-based architecture quantitatively. Traditionally, evaluating an LUT architecture has required dedicated modifications to the technology mapping tools. A more practical method for evaluating logic functionality is therefore needed for the design of LUT architectures. In this paper, a mathematical method for calculating logic functionality is proposed, and conventional and fracturable LUT architectures are analyzed. Furthermore, a cascaded fracturable LUT architecture is presented, which achieves twice the logic functionality of conventional LUTs and fracturable LUTs.

19.

Mesh routing topologies for multi-FPGA systems

Hauck S. Borriello G. Ebeling C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(3):400-408

There is currently great interest in using fixed arrays of FPGAs for logic emulators, custom computing devices, and software accelerators. An important part of designing such a system is determining the proper routing topology to use to interconnect the FPGAs. This topology can have a great effect on the area and delay of the resulting system. Crossbar, Hierarchical Crossbar, and Mesh interconnection schemes have all been proposed for use in FPGA-based systems. In this paper, we examine Mesh interconnection schemes, and propose several constructs for more efficient topologies. These reduce interchip delays by more than 60% over the basic four-way Mesh.

20.

A novel and efficient routing architecture for multi-FPGA systems

Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39

Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than that of the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than on the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture (the proportion of hard-wired connections versus programmable connections) to determine its best value.