期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Sharing of SRAM Tables Among NPN-Equivalent LUTs in SRAM-Based FPGAs

Meyer J. Kocan F. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):182-195

This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3% 相似文献

2.

基于LUT的SRAM-FPGA结构研究 总被引：3，自引：0，他引：3

下载免费PDF全文

马群刚杨银堂李跃进高海霞《电子器件》2003,26(1):10-14

作为微电子工业中发展最迅速的一个领域，现场可编程门阵列（FPGA）的内部结构设计越来越受到业内人士的关注。为此针对目前普遍采用的基于查找表（LUT）的SRAM－FPGA，着重研究了其逻辑模块设计、布线结构设计和输入输出模块设计，同时也对此类FPGA基本结构进行了优化设计。相似文献

3.

LUT尺寸对FPGA面积和延迟影响的理论分析

高海霞杨银堂董刚《半导体学报》2005,26(5):893-898

在分析隔离岛式FPGA结构的基础上,提出了基于LUT的面积和延迟模型,用于分析LUT尺寸对FPGA面积和性能的影响.结果表明利用计算模型得到的最佳LUT尺寸与实验结论一致:4-LUT获得最好的面积有效性,5-LUT获得较好的延迟. 相似文献

4.

基于LUT 的SRAM2FPGA 结构研究

下载免费PDF全文

马群刚杨银堂李跃进高海霞《电子器件》2003,26(1)

作为微电子工业中发展最迅速的一个领域,现场可编程门阵列(FPGA) 的内部结构设计越来越受到业内人士的关注。为此针对目前普遍采用的基于查找表(LUT) 的SRAM2FPGA ,着重研究了其逻辑模块设计、布线结构设计和输入输出模块设计,同时也对此类FPGA 基本结构进行了优化设计。相似文献

5.

FPGA高性能查找表的设计与实现

张惠国唐玉兰于宗光陶宇峰《固体电子学研究与进展》2009,29(4)

从电路角度探讨了查找表(LUT)实现原理,基于双相不交叠时钟,设计实现了一种LUT,能高效地完成移位寄存器与RAM的功能扩展。基于SMIC0.25μmCMOS工艺优化设计了对应的版图,给出了相应的HSPICE仿真结果。此电路结构增强了逻辑块的性能,提高了FPGA的整体效率与灵活性,已被应用于FPGA的设计中。相似文献

6.

Theoretical Analysis of Effect of LUT Size on Area and Delay of FPGA 总被引：1，自引：1，他引：0

Gao Haixi Yang Yintang an Dong Gang 《半导体学报》2005,26(5):893-898

Based on architecture analysis of island style FPGA,area and delay models of LUT FPGA are proposed.The models are used to analyze the effect of LUT size on FPGA area and performance.Results show optimal LUT size obtained by computation models is the same as that from experiments:a LUT size of 4 produces the best area results,and a LUT size of 5 provides the better performance. 相似文献

7.

The effect of LUT and cluster size on deep-submicron FPGA performance and density 总被引：2，自引：0，他引：2

Ahmed E. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):288-298

In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and cluster size of between 3-10 provides the best area-delay product for an FPGA. 相似文献

8.

Speed and area tradeoffs in cluster-based FPGA architectures

Marquardt A. Betz V. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):84-93

One way to reduce the delay and area of field-programmable gate arrays (FPGAs) is to employ logic-cluster-based architectures, where a logic cluster is a group of logic elements connected with high-speed local interconnections. In this paper, we empirically evaluate FPGA architectures with logic clusters ranging in size from 1 to 20, and show that compared to architectures with size 1 clusters, architectures with size 8 clusters have 23% less delay (30% faster clock speed) and require 14% less area. We also show that FPGA architectures with large cluster sizes can significantly reduce design compile time-an increasingly important concern as the logic capacity of FPGA's rises. For example, an architecture that uses size 20 clusters requires seven times less compile time than an architecture with size 1 clusters 相似文献

9.

Design and implementation of a programming circuit in radiation-hardened FPGA

吴利华韩小炜赵岩刘忠立于芳陈陵都《半导体学报》2011,32(8):132-137

We present a novel programming circuit used in our radiation-hardened field programmable gate array （FPGA） chip.This circuit provides the ability to write user-defined configuration data into an FPGA and then read it back.The proposed circuit adopts the direct-access programming point scheme instead of the typical long token shift register chain.It not only saves area but also provides more flexible configuration operations.By configuring the proposed partial configuration control register,our smallest configuration section can be conveniently configured as a single data and a flexible partial configuration can be easily implemented.The hierarchical simulation scheme, optimization of the critical path and the elaborate layout plan make this circuit work well.Also,the radiation hardened by design programming point is introduced.This circuit has been implemented in a static random access memory（SRAM）-based FPGA fabricated by a 0.5μm partial-depletion silicon-on-insulator CMOS process.The function test results of the fabricated chip indicate that this programming circuit successfully realizes the desired functions in the configuration and read-back.Moreover,the radiation test results indicate that the programming circuit has total dose tolerance of 1×10⁵ rad（Si）,dose rate survivability of 1.5×10¹¹ rad（Si）/s and neutron fluence immunity of 1×10¹⁴ n/cm². 相似文献

10.

Stateful-NOR based reconfigurable architecture for logic implementation

《Microelectronics Journal》2015,46(6):551-562

Most commercial Field Programmable Gate Arrays (FPGAs) have limitations in terms of density, speed, configuration overhead and power consumption mostly due to the use of SRAM cells in Look-Up Tables (LUTs), configuration memory and programmable interconnects. Also, hardwired Application Specific Integrated Circuit (ASIC) blocks designed for high performance arithmetic circuits in FPGA reduce the area available for reconfiguration. In this paper, we propose a novel generalized hybrid CMOS-memristor based architecture using stateful-NOR gates as basic building blocks for implementation of logic functions. These logic functions are implemented on memristor nanocrossbar layers, while the CMOS layer is used for selection and connection of memristors. The proposed pipelined architecture combines the features of ASIC, FPGA and microprocessor based designs. It has high density due to the use of nanocrossbar layer and high throughput especially for arithmetic circuits. The proposed architecture for three input one output logic block is compared with conventional LUT based Configurable Logic Block (CLB) having the same number of inputs and outputs; which shows 1.82×area saving, 1.57×speedup and 3.63×less power consumption. The automation algorithm to implement any logic function using proposed architecture is also presented. 相似文献

11.

A novel FPGA architecture supporting wide, shallow memories

Oldridge S.W. Wilton S.J.E. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(6):758-762

This paper investigates an architecture designed to implement wide, shallow memories on a field programmable gate array (FPGA). In the proposed architecture, existing configuration memory normally used to control the connectivity pattern of the FPGA is made user accessible. Typically, not all the switch blocks in an FPGA are used to transport signals. By adding only a modest amount of circuitry, the configuration memory in these unused switch blocks (or unused paths within used switch blocks) can be used to implement wide, shallow buffers and other similar memory structures. The size of FPGA required to implement a benchmark circuit that makes use of the wide, shallow memories, is 20% smaller than a standard memory architecture. In addition, the benchmark circuit is on average 40% faster using the proposed architecture. 相似文献

12.

A Library-Based Early Soft Error Sensitivity Analysis Technique for SRAM-Based FPGA Design

C. Thibeault Y. Hariri S. R. Hasan C. Hobeika Y. Savaria Y. Audet F. Z. Tazi 《Journal of Electronic Testing》2013,29(4):457-471

相似文献

13.

Evaluating Different Solutions to Design Fault Tolerant Systems with SRAM-based FPGAs

L. Sterpone M. Sonza Reorda M. Violante F. Lima Kastensmidt L. Carro 《Journal of Electronic Testing》2007,23(1):47-54

The latest SRAM-based FPGA devices are making the development of low-cost, high-performance, re-configurable systems feasible, paving the way for innovative architectures suitable for mission- or safety-critical applications, such as those dominating the space or avionic fields. Unfortunately, SRAM-based FPGAs are extremely sensitive to Single Event Upsets (SEUs) induced by radiation. SEUs may alter the logic value stored in the memory elements the FPGAs embed. A large part of the FPGA memory elements is dedicated to the configuration memory, whose content dictates how the resources inside the FPGA have to be used to implement any given user circuit, SEUs affecting configuration memory cells can be extremely critics. Facing the effects of SEUs through radiation-hardened FPGAs is not cost-effective. Therefore, various fault-tolerant design techniques have been devised for developing dependable solutions, starting from Commercial-Off-The-Shelf (COTS) SRAM-based FPGAs. These techniques present advantages and disadvantages that must be evaluated carefully to exploit them successfully. In this paper we mainly adopted an empirical analysis approach. We evaluated the reliability of a multiplier, a digital FIR filter, and an 8051 microprocessor implemented in SRAM-based FPGA’s, by means of extensive fault-injection experiments, assessing the capability provided by different design techniques of tolerating SEUs within the FPGA configuration memory. Experimental results demonstrate that by combining architecture-level solutions (based on redundancy) with layout-level solutions (based on reliability-oriented place and route) designers may implement reliable re-configurable systems choosing the best solution that minimizes the penalty in terms of area and speed degradation. 相似文献

14.

GlitchLess: Dynamic Power Minimization in FPGAs Through Edge Alignment and Glitch Filtering

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(11):1521-1534

This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times of the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. Moreover, the delay elements also behave as filters that eliminate other glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases the overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires little or no modifications to the routing architecture or computer-aided design (CAD) flow. 相似文献

15.

SRAM-Based FPGAs: Testing the Embedded RAM Modules

M. Renovell J.M. Portal J. Figueras Y. Zorian 《Journal of Electronic Testing》1999,14(1-2):159-167

This paper addresses the problem of testing the RAM mode of the LUT/RAM modules of configurable SRAM-based Field Programmable Gate Arrays (FPGAs) using a minimum number of test configurations. A model of architecture for the LUT/RAM module with N inputs and 2N memory cells is proposed taking into account the LUT and RAM modes. Targeting the RAM mode, we demonstrate that a unique test configuration is required for a single module. The problem is shown equivalent to the test of a classical SRAM circuit allowing to use existing algorithms such as the March tests. We also propose a unique test configuration called pseudo shift register for an m × m array of modules. In the proposed configuration, the circuit operates as a shift register and an adapted version of the MATS++ algorithm called shifted MATS++ is described. 相似文献

16.

Evaluating logic functionality of cascaded fracturable LUTs

下载免费PDF全文

GUO Zhenhong LIN Yu LI Tianyi JIA Rui GAO Tongqiang YANG Haigang 《太赫兹科学与电子信息学报》2016,14(3):474-480

Look Up Tables(LUTs) are the key components of Field-Programmable Gate Arrays(FPGAs). Many LUT architectures have been studied; nevertheless, it is difficult to quantificationally evaluate an LUT based architecture. Traditionally, dedicated efforts on specific modifications to the technology mapping tools are required for LUT architecture evaluation. A more feasible evaluation method for logic functionality is strongly required for the design of LUT architecture. In this paper, a mathematical method for logic functionality calculation is proposed and conventional and fracturable LUT architectures are analyzed. Furthermore, a cascaded fracturable LUT architecture is presented, which achieves twice logic functionality compared with the conventional LUTs and fracturable LUTs. 相似文献

17.

一种SRAM型FPGA互连资源的位流码配置方法

下载免费PDF全文

李智华黄娟李威杨立群黄俊英杨海钢《太赫兹科学与电子信息学报》2016,14(1):136-142

针对静态随机存取存储器(SRAM)型现场可编程门阵列(FPGA)位流码配置问题,提出一种自动配置互连资源的方法。该方法从描述FPGA结构的行为级Verilog文件中,采用基于端口映射的记忆FPGA配置模型搜索(MCMS)算法自动提取互连资源的配置位模型,然后结合布线结果生成布线路径上互连资源的位流码。实验结果表明,对于包含30 Mb配置位的3 000万门SRAM型同质FPGA,采用人工方法提取互连资源配置位模型需要6天时间,而采用端口映射MCMS算法仅需要29分钟,效率提高了298倍;对于同等规模的异质FPGA,采用人工方法需要7天时间,而采用端口映射MCMS算法仅需26分钟,效率提高了394倍。该算法作为一种通用的互连资源配置位模型提取方法,可以应用于不同的FPGA芯片。在缩短位流码配置时间的同时,提高位流码配置的准确性。相似文献

18.

Efficient Parallelization of Polyphase Arbitrary Resampling FIR Filters for High-Speed Applications

Hannes Ramon Haolin Li Piet Demeester Johan Bauwelinck Guy Torfs 《Journal of Signal Processing Systems》2018,90(3):295-303

This article describes a method for increasing the sampling rate of efficient polyphase arbitrary resampling FIR filters. An FPGA proof of concept prototype of this architecture has been implemented in a Xilinx Kintex-7 FPGA which is able to convert the sampling rate of a signal from 500 MHz to 600 MHz. This article compares this new architecture with other best known efficient resampling architectures implemented on the same FPGA. The area usage on the FPGA shows that our proposed implementation is very proficient in high bandwidth applications without requiring significantly more resources on the FPGA. A theoretical calculation of the resampling error introduced on a modulated data stream is provided to evaluate the new architecture against other existing resampling architectures. 相似文献

19.

Architecture level optimization of 3-dimensional tree-based FPGA

Vinod Pangracious Emna Amouri Zied Marakchi Habib Mehrez 《Microelectronics Journal》2014

We describe a methodology to design and optimize Three-dimensional (3D) Tree-based FPGA by introducing a break-point at particular tree level interconnect to optimize the speed, area, and power consumption. The ability of the design flow to decide a horizontal or vertical network break-point based on design specifications is a defining feature of our design methodology. The vertical partitioning is organized in such a way to balance the placement of logic blocks and switch blocks into multiple tiers while the horizontal partitioning optimizes the interconnect delay by segregating the logic blocks and programmable interconnect resources into multiple tiers to build a 3D stacked Tree-based FPGA. We finally evaluate the effect of Look-Up-Table (LUT) size, cluster size, speed, area and power consumption of the proposed 3D Tree-based FPGA using our home grown experimental flow and show that the horizontal partitioned 3D stacked Tree-based FPGA with LUT and cluster sizes equal to 4 has the best area-delay product to design and manufacture 3D Tree-based FPGA. 相似文献

20.

Automatic Design of Reconfigurable Domain-Specific Flexible Cores

Compton K. Hauck S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(5):493-503

Reconfigurable hardware is ideal for use in systems-on-a-chip (SoC), as it provides both hardware-level performance and post-fabrication flexibility. However, any one architecture is rarely equally optimized for all applications. SoCs targeting a specific set of applications can greatly benefit from incorporating customized reconfigurable logic instead of generic field-programmable gate-array (FPGA) logic. Unfortunately, manually designing a domain-specific architecture for every SoC would require significant design time. Instead, this paper discusses our initial efforts towards creating a reconfigurable hardware generator capable of automatically creating flexible, yet domain-specific, designs. Our tests indicate that our generated architectures are more than 5times smaller than equivalent FPGA implementations and nearly as area-efficient as standard cell designs. We also use a novel technique employing synthetic circuit generation to demonstrate the flexibility of our architecture generation techniques. 相似文献