共查询到20条相似文献,搜索用时 15 毫秒
1.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):95-107
2.
Meyer J. Kocan F. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):182-195
This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3% 相似文献
3.
Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs.In this work,we propose a time-multiplexing technique on FPGA interconnects.In order to fully exploit this interconnect architecture,we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires.We validate the algorithm by using the router to implement 20 benchmark circuits to time-multiplexed FPGAs.We achieve a 38%smaller minimum channel width and 3.8%smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle. 相似文献
4.
Asynchronous serial transceivers have been recently used for data serializing in large on-chip systems to alleviate the routing congestion and improve the routability. FPGAs have considerable potential for using the asynchronous serial transmission but they have serious challenges to use this technology. In this paper, we present a new FPGA architecture corresponding with a new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using the asynchronous data serializing without any performance degradation in cost of reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs. 相似文献
5.
Wilton S.J.E. Rose J. Vranesic Z.G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):80-91
As the capacities of field-programmable gate arrays (FPGAs) grow, they will be used to implement much larger circuits than ever before. These larger circuits often require significant amounts of storage. In order to address these storage requirements, FPGAs with large embedded memory arrays are now being developed by several vendors. One of the crucial components of an FPGA with on-chip memory is the routing structure between the memory arrays and logic resources. If this memory/logic interface is not flexible enough, many circuits will be unroutable, while if it is too flexible, it will be slower and consume more chip area than is necessary. In this paper, we show that an interconnect in which each memory pin can connect to between four and seven logic routing tracks is best in terms of both area and speed. We also show that by adding switches to support nets that connect multiple memory arrays, we can reduce the memory access time by up to 25% and improve the routability slightly 相似文献
6.
Ahmed E. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):288-298
In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and cluster size of between 3-10 provides the best area-delay product for an FPGA. 相似文献
7.
8.
Fabrication cost of application-specific integrated circuits (ASICs) is exponentially rising in deep submicron region due to rapidly rising non-recurring engineering cost. Field programmable gate arrays (FPGAs) provide an attractive alternative to ASICs but consume an order of magnitude higher power. There is a need to explore ways of reducing FPGA power consumption so that they can also be employed in ultra low power (ULP) applications instead of ASICs. Subthreshold region of operation is an ideal choice for ULP low-throughput FPGAs. The routing of an FPGA consumes most of the chip area and primarily determines the circuit delay and power consumption. There is a need to design moderate-speed ULP routing switches for subthreshold FPGA. This article proposes a novel subthreshold FPGA routing switch box (SB) that utilises the leakage voltage through transistor as biasing voltage which shows 69%, 61.2% and 30% improvement in delay, power delay product and delay variation, respectively, over conventional routing SB. 相似文献
9.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):1048-1060
10.
FPGAs are considered as an attractive alternative to ASICs, thanks to their reconfigurability and their low development costs. However, since they are the first experiencing new technology nodes, their ability to tackle VLSI ageing mechanisms is crucial, especially in critical applications such as space and avionics ones. This work aims to understand ageing degradation on FPGAs. An experimental approach is adopted in order to characterize the effects of degradation on FPGAs Look up tables (LUTs). Different stress conditions were tested to accelerate ageing process and identify the mechanisms behind. Ageing tests have been executed on a total of 17 FPGAs belonging to Artix7 XILINX family. Results show that Negative-Bias Temperature Instability ageing damage is the main cause of timing degradation on the studied FPGAs. 相似文献
11.
Beauchamp M.J. Hauck S. Underwood K.D. Hemmert K.S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(2):177-187
With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB. 相似文献
12.
13.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(3):356-369
14.
Borriello G. Ebeling C. Hauck S.A. Burns S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(4):491-501
Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits 相似文献
15.
Choi D. Kyu Choi Villasenor J.D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(7):874-881
A new set of programmable elements (PEs) using a new non-volatile device for use with routing switches and logical elements within a field-programmable gate array (FPGA) is described. The PEs have small area, can be combined with components that use low operational voltage on the same CMOS logic process, are non-volatile, enable the use of fast thin-oxide pass transistors, and are reprogrammable. A novel non-volatile flip-flop for use within the logical elements is presented as well. In combination, these methods enable programmable logic devices with improved area efficiency, the speed advantages of SRAM-based FPGAs, and a wide range of opportunities for power down strategies. 相似文献
16.
Bhatia D. Haralambides J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(3):346-355
Field-programmable interconnection chips (FPIC's) provide the capability of realizing user programmable interconnection for any desired permutation. Such an interconnection is very much desired for supporting rapid prototyping of hardware systems and for providing programmable communication networks for parallel and distributed computing. An FPIC should realize any possible permutation of input to output pins via a set of programmable switches. In this paper, we show that any such architecture requires a minimum of Ω(n log n) switches, where Ω is the number of I/O pins. The result stems from an analysis of the underlying permutation network. In addition, for networks of bounded degree d, we prove an Ω(logd-1 n) bound on the routing delay (maximum length of routing paths for specific I/O permutations) and an Ω(n logd-1 n) bound on the average utilization of programmable switches used by the FPIC to implement a specific permutation. For the same type of networks, we prove an Ω(n logd-1 n) bound on the number of nodes of the network. Furthermore, we design efficient architectures for FPIC's offering a wide variety of routing delays, high average programmable resource utilization, and O(n2)-area two-layer layouts. The proposed structures are called hybrid Benes-Crossbar (HBC) architectures and clearly exhibit a tradeoff between performance (routing delay utilization) and area of the layout 相似文献
17.
Salvatore Pontarelli Gian Carlo Cardarilli Marco Re Adelio Salsano 《Journal of Signal Processing Systems》2012,67(3):201-212
In this paper optimized Residue Number System (RNS) arithmetic blocks to better exploit some of the architectural characteristics
of the last generation FPGAs are presented. The implementation of modulo m adders, modulo m constant and general multipliers, input and output converters are presented. These architectures are based on moduli sets
chosen in order to optimally use the 6-input Look-Up Tables (LUTs) available in the Complex Logic Blocks (CLBs) of the new
generation FPGAs. Experiments based on the implementation of Finite Impulse Response (FIR) filters characterized by different
number of taps and wordlengths shows that the use of RNS together with suitable moduli sets optimally fits the 6-input LUTs
in the last generation FPGAs architectures. 相似文献
18.
Look Up Tables(LUTs) are the key components of Field-Programmable Gate Arrays(FPGAs). Many LUT architectures have been studied; nevertheless, it is difficult to quantificationally evaluate an LUT based architecture. Traditionally, dedicated efforts on specific modifications to the technology mapping tools are required for LUT architecture evaluation. A more feasible evaluation method for logic functionality is strongly required for the design of LUT architecture. In this paper, a mathematical method for logic functionality calculation is proposed and conventional and fracturable LUT architectures are analyzed. Furthermore, a cascaded fracturable LUT architecture is presented, which achieves twice logic functionality compared with the conventional LUTs and fracturable LUTs. 相似文献
19.
Hauck S. Borriello G. Ebeling C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(3):400-408
There is currently great interest in using fixed arrays of FPGAs for logic emulators, custom computing devices, and software accelerators. An important part of designing such a system is determining the proper routing topology to use to interconnect the FPGAs. This topology can have a great effect on the area and delay of the resulting system. Crossbar, Hierarchical Crossbar, and Mesh interconnection schemes have all been proposed for use in FPGA-based systems. In this paper, we examine Mesh interconnection schemes, and propose several constructs for more efficient topologies. These reduce interchip delays by more than 60% over the basic four-way Mesh 相似文献
20.
Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP) which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools were developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design), is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar were on average 20% more than the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture-the proportion of hard-wired connections versus programmable connections-to determine its best value 相似文献