1.

Gao Haixia, Yang Yintang, Dong Gang 《Chinese Journal of Semiconductors》2005,26(5):893-898

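Based on an analysis of the island-style FPGA architecture, LUT-based area and delay models are proposed to analyze how LUT size affects FPGA area and performance. The results show that the optimal LUT size obtained from the computational models agrees with experimental findings: 4-LUTs give the best area efficiency, while 5-LUTs give better delay.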
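The area/delay trade-off above follows from how a K-input LUT is usually built: 2^K configuration bits feeding a 2^K:1 multiplexer tree. The sketch below is a generic textbook model, not the authors' area and delay model; the names `LutCost`, `lut_cost` and `lut_eval` are hypothetical. It counts the elements of a K-LUT and evaluates one by reducing its truth table through K levels of 2:1 muxes.

```python
from dataclasses import dataclass


@dataclass
class LutCost:
    sram_bits: int   # one configuration bit per truth-table entry
    mux2_count: int  # 2:1 multiplexers in a full binary mux tree
    levels: int      # mux levels an input signal traverses


def lut_cost(k: int) -> LutCost:
    """Element counts for a K-input LUT built as a 2^K:1 mux tree (generic model, assumed)."""
    return LutCost(sram_bits=1 << k, mux2_count=(1 << k) - 1, levels=k)


def lut_eval(truth_table: list[int], inputs: list[int]) -> int:
    """Evaluate a K-LUT by passing its SRAM bits through K levels of 2:1 muxes.
    inputs[0] drives the first level (closest to the SRAM bits), so the result is
    truth_table[sum(inputs[j] << j)]."""
    level = list(truth_table)
    for sel in inputs:
        # Each 2:1 mux selects element 2i (sel = 0) or 2i + 1 (sel = 1)
        level = [level[2 * i + sel] for i in range(len(level) // 2)]
    return level[0]


if __name__ == "__main__":
    for k in range(2, 8):
        print(k, lut_cost(k))
```

Area (bits and muxes) grows as 2^K while the number of mux levels grows only linearly in K, which is why larger LUTs trade area for fewer logic levels on a critical path.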

2.

The effect of LUT and cluster size on deep-submicron FPGA performance and density (cited 2 times: 0 self-citations, 2 by others)

Ahmed E., Rose J. 《IEEE Transactions on Very Large Scale Integration (VLSI) Systems》2004,12(3):288-298

In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of LUT-based, clustered island-style FPGAs (Betz et al. 1997), we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits is synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. First, across all architectures with LUT sizes from 2 to 7 inputs and cluster sizes from 1 to 10 LUTs, we have experimentally determined the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than previously reported. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than that of larger LUTs. Hence, as measured by area-delay product or by performance, these would be a bad choice. We have also found that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size of 3 to 10 provide the best area-delay product for an FPGA.
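The cluster-input relationship mentioned above is often quoted as I = (K/2)(N + 1); treat that formula as an assumption to check against the paper, since the abstract does not state it. The sketch below, with hypothetical function names, turns the rule into integer arithmetic and also counts the truth-table SRAM bits in a cluster, which grow as N·2^K.

```python
def cluster_inputs(k: int, n: int) -> int:
    """Cluster input count from the rule I = (K/2) * (N + 1), rounded up (rule assumed)."""
    return (k * (n + 1) + 1) // 2


def lut_config_bits(k: int, n: int) -> int:
    """SRAM bits needed just for the truth tables of N K-input LUTs."""
    return n * (1 << k)


if __name__ == "__main__":
    # Sweep the LUT and cluster sizes studied in the paper (K = 2..7, N = 1..10)
    for k in range(2, 8):
        for n in (1, 4, 10):
            print(f"K={k} N={n} I={cluster_inputs(k, n)} bits={lut_config_bits(k, n)}")
```

For K = 4 and N = 10 the rule gives 22 cluster inputs, which is the input count used by the widely distributed k4_N10 VPR architecture.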

3.

Architecture level optimization of 3-dimensional tree-based FPGA

Vinod Pangracious, Emna Amouri, Zied Marrakchi, Habib Mehrez 《Microelectronics Journal》2014

We describe a methodology to design and optimize three-dimensional (3D) tree-based FPGAs by introducing a break-point at a particular level of the tree interconnect to optimize speed, area, and power consumption. A defining feature of our methodology is the ability of the design flow to choose a horizontal or vertical network break-point based on design specifications. Vertical partitioning balances the placement of logic blocks and switch blocks across multiple tiers, while horizontal partitioning optimizes interconnect delay by segregating the logic blocks and programmable interconnect resources into multiple tiers to build a 3D stacked tree-based FPGA. Finally, using our in-house experimental flow, we evaluate the effect of look-up table (LUT) size and cluster size on the speed, area, and power consumption of the proposed 3D tree-based FPGA, and show that the horizontally partitioned 3D stacked tree-based FPGA with LUT and cluster sizes of 4 gives the best area-delay product for designing and manufacturing 3D tree-based FPGAs.

4.

Circuits and architectures for field programmable gate array with configurable supply voltage

Lin Y., Fei Li, Lei He 《IEEE Transactions on Very Large Scale Integration (VLSI) Systems》2005,13(9):1035-1047

Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have recently been proposed to reduce FPGA power: the Vdd level can be customized for FPGA circuit elements, and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with a minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that, for all evaluated architectures, a LUT size of 4 gives the lowest energy consumption and a LUT size of 7 gives the highest performance.

5.

An Improved Look-Up Table Circuit Structure with High Resource Utilization and Its Optimization Method

Gao Lijiang, Yang Haigang, Li Wei, Hao Yanan, Liu Changlong, Shi Caixia 《Journal of Electronics & Information Technology》2019,41(10):2382-2388

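This paper focuses on the circuit structure and design optimization of the basic logic element (BLE), the core programmable logic module of an FPGA chip. To address the low resource utilization of the conventional 4-input look-up table (LUT) when performing logic operations and arithmetic, an improved LUT structure that incorporates multiplexers is proposed, giving higher area utilization. A statistics-based evaluation and optimization method for mapped netlists is also proposed: it regroups the synthesized and mapped netlist and produces an optimized netlist through pre-packing. Finally, the proposed structure is evaluated and verified experimentally. The results show that, compared with Intel's Stratix-series FPGAs, the proposed optimized structure improves resource utilization by an average of 10.428% and 10.433% on the MCNC and VTR benchmark suites, respectively, effectively improving the logic efficiency of the FPGA.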

6.

Exploration and optimization of a homogeneous tree-based application specific inflexible FPGA

Umer Farooq, Husain Parvez, Habib Mehrez, Zied Marrakchi 《Microelectronics Journal》2013

An Application Specific Inflexible FPGA (ASIF) is a modified form of an FPGA designed for a predefined set of applications that operate at mutually exclusive times. An ASIF is a compromise between FPGAs and Application Specific Integrated Circuits (ASICs): compared to an FPGA, an ASIF has reduced flexibility and improved density, while compared to an ASIC, it has larger area but improved flexibility. This work presents a new homogeneous tree-based ASIF and uses a set of 16 MCNC benchmarks for experimentation. Experimental results show that, on average, a homogeneous tree-based ASIF gives a 64% area gain compared to an equivalent tree-based FPGA. Further experiments explore the effect of look-up table (LUT) size and arity on a tree-based ASIF. A comparison between tree-based and mesh-based ASIFs then shows that the tree-based ASIF is 12% smaller in routing area and uses 77% fewer wires than the mesh-based ASIF. Finally, a quality comparison between the two ASIFs reveals that, on average, the tree-based ASIF gives a 33% area gain over the mesh-based ASIF.

7.

ALTO: an iterative area/performance tradeoff algorithm for LUT-based FPGA technology mapping

Juinn-Dar Huang, Jing-Yang Jou, Wen-Zen Shen 《IEEE Transactions on Very Large Scale Integration (VLSI) Systems》2000,8(4):392-400

In this paper, we propose an iterative area/performance tradeoff algorithm for look-up table (LUT)-based field programmable gate array (FPGA) technology mapping. First, it finds an area-optimized, performance-considered initial network using a modified area optimization technique. Then, an iterative algorithm consisting of several resynthesis techniques is applied to gracefully trade area for performance in the network. Experimental results show that this approach can efficiently provide a complete set of mapping solutions, from the area-optimized one to the performance-optimized one, for a given design. Furthermore, the two extreme solutions produced by our algorithm outperform the results of most existing algorithms. Therefore, our algorithm is useful for timing-driven, LUT-based FPGA synthesis.

8.

Design of a memristor-based look-up table (LUT) for low-energy operation of FPGAs

《Integration, the VLSI Journal》2016

This paper presents a scheme for designing a memristor-based look-up table (LUT) in which the memristors are connected in rows and columns. Because the columns are isolated, the states of unselected memristors in the proposed scheme are not affected by WRITE/READ operations; therefore, the problems prevalent in nanocrossbars (such as write half-select and sneak-path currents) are not encountered. Extensive simulation results for the WRITE and READ operations of the proposed scheme are presented, and its performance is compared with previous LUT schemes using memristors as well as SRAMs. The proposed scheme is shown to be significantly better in terms of WRITE time and of energy dissipation for both memory operations (i.e., WRITE and READ); moreover, the READ delay is shown to be nearly independent of the LUT dimension. Simulations using benchmark circuits for FPGA implementation show that the proposed LUT also offers significant improvements at this level.

9.

Filter Design Based on Distributed Arithmetic on FPGA

Cui Yongqiang, Gao Xiaoding, He Suxin 《Modern Electronics Technique》2010,33(16):117-119

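A distributed arithmetic (DA) structure for FPGAs and the corresponding hardware environment are designed. The FPGA-based DA design exploits the parallel processing capability of FPGAs, simplifying the filter system design. A partitioned look-up table technique is used to save FPGA hardware resources. Low-pass, high-pass, and band-pass filtering can be implemented simply by changing the contents of the look-up table (LUT). The DA-based FPGA filter was simulated and tested under real operating conditions. The experimental results show that the approach not only increases the system's operating speed but also saves a large amount of FPGA resources and offers great flexibility.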
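The DA scheme above replaces multipliers with look-ups: for each bit position b, the b-th bits of all tap samples form an address into a table of precomputed coefficient sums, and the results are shift-accumulated. The sketch below shows bit-serial DA with a partitioned table, assuming two's-complement samples of `bits` bits and a hypothetical group size of 4 taps per table; the abstract does not say how the authors partitioned their LUT, and all function names are illustrative.

```python
def build_partitioned_luts(coeffs, group):
    """One DA look-up table per group of `group` taps.
    Entry `addr` of a table holds the sum of the coefficients whose bit is set in `addr`."""
    luts = []
    for start in range(0, len(coeffs), group):
        part = coeffs[start:start + group]
        luts.append([sum(c for k, c in enumerate(part) if (addr >> k) & 1)
                     for addr in range(1 << len(part))])
    return luts


def da_inner_product(luts, group, samples, bits):
    """Bit-serial distributed arithmetic for sum_k coeffs[k] * samples[k].
    Samples are `bits`-bit two's-complement integers, so the MSB slice is subtracted."""
    acc = 0
    for b in range(bits):
        partial = 0
        for g, lut in enumerate(luts):
            addr = 0
            for k, x in enumerate(samples[g * group:(g + 1) * group]):
                addr |= ((x >> b) & 1) << k  # bit b of each sample forms the address
            partial += lut[addr]
        weighted = partial * (1 << b)
        acc += -weighted if b == bits - 1 else weighted
    return acc


def da_fir(coeffs, signal, bits, group=4):
    """Direct-form FIR filter whose multiply-accumulate is done with DA look-ups."""
    luts = build_partitioned_luts(coeffs, group)
    delay = [0] * len(coeffs)
    out = []
    for s in signal:
        delay = [s] + delay[:-1]  # shift the new sample into the tap delay line
        out.append(da_inner_product(luts, group, delay, bits))
    return out
```

With 8 taps, a single table needs 2^8 = 256 entries, while two 4-tap tables need only 2 × 16 = 32 entries at the cost of one extra adder; this is the resource saving that table partitioning buys. Changing the filter response only means rebuilding the tables from new coefficients.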

10.

On area/depth trade-off in LUT-based FPGA technology mapping

Cong J., Yuzheng Ding 《IEEE Transactions on Very Large Scale Integration (VLSI) Systems》1994,2(2):137-148

In this paper, we study the area and depth trade-off in lookup-table (LUT) based FPGA technology mapping. Starting from a depth-optimal mapping solution, we perform a sequence of depth relaxation operations and area-minimizing mapping procedures to produce a set of mapping solutions for a given design with a smooth area and depth trade-off. As the core of the area minimization step, we have developed a polynomial-time optimal algorithm for computing an area-minimum mapping solution without node duplication for a K-bounded general Boolean network, which is a significant step toward a complete understanding of the general area minimization problem in FPGA technology mapping. Experimental results on MCNC benchmark circuits show that our solution sets outperform the solutions produced by most existing mapping algorithms in terms of both area and depth minimization.
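The flow above starts from a depth-optimal mapping. One simple way to compute depth-optimal LUT labels is exhaustive K-feasible cut enumeration, sketched below. This is a swapped-in technique (exponential in the worst case), not the paper's algorithm, and it omits the depth-relaxation and area-minimization steps; the function names and the dict-of-fanins network encoding are assumptions, and the network is assumed to be K-bounded as in the paper.

```python
from itertools import product


def enumerate_cuts(fanins, order, k):
    """All K-feasible cuts of every node in a DAG.
    fanins: node -> list of fanin nodes ([] for primary inputs); order: topological order."""
    cuts = {}
    for v in order:
        node_cuts = {frozenset([v])}  # the trivial cut
        if fanins[v]:
            # Merge one cut from each fanin; keep the merge if it has at most k leaves
            for combo in product(*(cuts[u] for u in fanins[v])):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[v] = node_cuts
    return cuts


def depth_optimal_labels(fanins, order, k):
    """Minimum LUT depth of every node, plus the cut chosen to realize it."""
    cuts = enumerate_cuts(fanins, order, k)
    label, best_cut = {}, {}
    for v in order:
        if not fanins[v]:
            label[v] = 0  # primary inputs need no LUT
            continue
        choices = [c for c in cuts[v] if v not in c]  # drop the trivial cut
        # Prefer the lowest depth, then the smallest cut (a crude area tie-break)
        best = min(choices, key=lambda c: (1 + max(label[u] for u in c), len(c), sorted(c)))
        label[v] = 1 + max(label[u] for u in best)
        best_cut[v] = best
    return label, best_cut


if __name__ == "__main__":
    # g3 = (a AND b) AND (c AND d): one 4-LUT, or two levels of 2-LUTs
    net = {"a": [], "b": [], "c": [], "d": [],
           "g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
    topo = ["a", "b", "c", "d", "g1", "g2", "g3"]
    for k in (2, 3, 4):
        print(k, depth_optimal_labels(net, topo, k)[0]["g3"])
```

Relaxing the depth bound on non-critical nodes and then choosing cheaper cuts among those that still meet the bound is the general idea behind area/depth trade-off, though the paper's optimal area algorithm is more involved.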

11.

Design and Implementation of a High-Performance Look-Up Table for FPGAs

Zhang Huiguo, Tang Yulan, Yu Zongguang, Tao Yufeng 《Research & Progress of Solid State Electronics》2009,29(4)

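The implementation principle of the look-up table (LUT) is discussed from a circuit perspective. Based on a two-phase non-overlapping clock, a LUT is designed and implemented that efficiently extends to shift-register and RAM functions. The corresponding layout was optimized in SMIC 0.25 μm CMOS technology, and HSPICE simulation results are given. This circuit structure improves the performance of the logic block and the overall efficiency and flexibility of the FPGA, and it has been used in an FPGA design.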
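Functionally, the dual use described above means the same 2^K storage cells can be read as logic, written as a small RAM, or chained as a shift register. The behavioral sketch below, with a hypothetical `ConfigurableLut` class, shows only that functional view; it does not model the two-phase non-overlapping clock or any of the circuit details from the paper.

```python
class ConfigurableLut:
    """Behavioral model of a K-input LUT that can also act as a 2^K x 1 RAM
    or a 2^K-bit shift register (hypothetical sketch, no timing or circuit detail)."""

    def __init__(self, k: int = 4):
        self.bits = [0] * (1 << k)

    def read(self, addr: int) -> int:
        """Logic or RAM read: the address inputs select one stored bit."""
        return self.bits[addr]

    def write(self, addr: int, value: int) -> None:
        """RAM mode: a write-enabled cycle updates one stored bit."""
        self.bits[addr] = value & 1

    def shift_in(self, value: int) -> int:
        """Shift-register mode: each stored bit moves up one position.
        Returns the bit shifted out of the last position."""
        out = self.bits[-1]
        self.bits = [value & 1] + self.bits[:-1]
        return out


if __name__ == "__main__":
    lut = ConfigurableLut()
    for bit in (1, 0, 1, 1):
        lut.shift_in(bit)
    # read(n) taps the bit shifted in n cycles ago: a variable-length delay line
    print([lut.read(n) for n in range(4)])
```

Reading address n in shift-register mode yields the input delayed by n + 1 shifts, so one LUT can serve as a variable-length delay line without using general routing or flip-flops.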

12.

Tree-structured method for LUT inverse halftoning and for image halftoning

Mese M., Vaidyanathan P.P. 《IEEE Transactions on Image Processing》2002,11(6):644-655

The authors previously proposed a look-up table (LUT) based method for inverse halftoning of images. The LUT for inverse halftoning is obtained from the histogram gathered from a few sample halftone images and the corresponding original images. Many of the entries in the LUT are unused because the corresponding binary patterns hardly ever occur in commonly encountered halftones; these are called nonexistent patterns. In this paper, we propose a tree structure that reduces the storage requirements of an LUT by avoiding nonexistent patterns. We demonstrate its performance on error-diffused images and ordered-dither images. We then introduce LUT-based halftoning and tree-structured LUT (TLUT) halftoning. Although the TLUT method is more complex than LUT halftoning, it produces better halftones and requires much less storage. We demonstrate how error diffusion characteristics can be achieved with this method. Afterwards, our algorithm is trained on halftones obtained by direct binary search (DBS). The complexity of TLUT halftoning is higher than that of error diffusion but much lower than that of DBS. Also, the halftone quality of TLUT halftoning improves as the TLUT grows. Thus, depending on the size of the tree structure in the TLUT algorithm, halftone quality between that of error diffusion and that of DBS can be achieved.
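The LUT training step above can be sketched in a few lines: each pixel's binary neighborhood is packed into an integer pattern, and the LUT stores the average original gray level observed for that pattern. This is a minimal sketch, not the authors' method. The 3x3 window, zero padding at the borders, and the local-mean fallback for unseen patterns are all assumptions, and using a dict (which stores only patterns that occur) is not the paper's tree structure, although it has the same goal of not spending storage on nonexistent patterns.

```python
def window_pattern(img, r, c, radius=1):
    """Pack the binary pixels of the (2*radius+1)^2 window centred at (r, c) into an int.
    Pixels outside the image are read as 0 (an assumption)."""
    h, w = len(img), len(img[0])
    code = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            bit = img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            code = (code << 1) | bit
    return code


def train_lut(pairs, radius=1):
    """Average the original gray level observed for each halftone pattern.
    Only patterns that actually occur get an entry."""
    sums, counts = {}, {}
    for halftone, original in pairs:
        for r in range(len(halftone)):
            for c in range(len(halftone[0])):
                p = window_pattern(halftone, r, c, radius)
                sums[p] = sums.get(p, 0) + original[r][c]
                counts[p] = counts.get(p, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def inverse_halftone(halftone, lut, radius=1, white=255):
    """Look up each pixel's pattern; unseen patterns fall back to the window's
    local mean (a hypothetical fallback, not taken from the paper)."""
    n = (2 * radius + 1) ** 2
    out = []
    for r in range(len(halftone)):
        row = []
        for c in range(len(halftone[0])):
            p = window_pattern(halftone, r, c, radius)
            row.append(lut[p] if p in lut else white * bin(p).count("1") / n)
        out.append(row)
    return out
```

A 3x3 binary window already has 2^9 = 512 possible patterns, and the count grows as 2^(window area), which is why skipping nonexistent patterns matters for storage as the window grows.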

13.

Research on a Novel FPGA Logic Cluster Interconnect Structure Based on And-Inverter Cones

Huang Zhihong, Yang Haigang, Yang Liqun, Li Wei, Jiang Zhenghong, Lin Yu 《Journal of Electronics & Information Technology》2015,37(12):3030-3040

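Targeting the structural characteristics of the And-Inverter Cone (AIC), a novel FPGA programmable logic unit, this paper proposes a series of schemes to obtain an optimized logic cluster interconnect structure, including: removing the output-stage crossbar, using a single-stage inverting crossbar, optimizing low-load circuits, separating the feedback and output-selection functions, removing the intermediate-stage crossbar while limiting the number of AIC output stages, and hybridizing with the LUT architecture. Through extensive experiments, the AIC cluster interconnect structure with the best area-delay product is obtained. Compared with the architecture of Altera's Stratix IV FPGA, this structure reduces the area of the logic cluster itself by 9.06%, and reduces the average area-delay product by 40.82% for the MCNC benchmark suite and by 17.38% for the VTR benchmark suite implemented on the optimized AIC FPGA architecture. Compared with the original AIC structure, it reduces cluster area by 23.16% and the average area-delay product by 27.15% for the MCNC suite and by 15.26% for the VTR suite.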

14.

Optimization of EDGE terminal power amplifiers using memoryless digital predistortion (cited 1 time: 0 self-citations, 1 by others)

Ceylan N., Mueller J.-E., Weigel R. 《IEEE Transactions on Microwave Theory and Techniques》2005,53(2):515-522

This paper describes a lookup-table (LUT)-based digital predistortion system for enhanced data rates for GSM evolution (EDGE) handset transmitters. The system is memoryless and improves average efficiency as well as performance in terms of leakage power at offset frequencies and error vector magnitude. The achievable efficiency at maximum linear output power is comparable to that of commercial EDGE power amplifiers (PAs), and superior at back-off. An investigation of the minimum system requirements on word length and LUT size shows that a LUT with approximately 500 coefficients and a system word length of 13 bits are sufficient for EDGE. The proposed system is simple compared to base-station implementations that include PA memory compensation, and it can easily be implemented in handsets to improve overall system performance. The effects of antenna mismatch on system performance are also investigated.
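A memoryless LUT predistorter like the one above indexes a table of complex gains by the input amplitude and multiplies the sample by the selected gain before it reaches the PA. The sketch below assumes a hypothetical PA model (tanh AM/AM compression plus a quadratic AM/PM rotation with coefficient `ALPHA`), builds the table by inverting that model analytically, and uses nearest-entry lookup. The 512-entry size is chosen to be close to the roughly 500 coefficients the abstract reports; word-length quantization is not modeled, and none of this is the paper's implementation.

```python
import cmath
import math

ALPHA = 0.1  # hypothetical AM/PM coefficient in rad per (unit amplitude)^2


def pa_model(x: complex) -> complex:
    """Toy memoryless PA: tanh AM/AM compression and quadratic AM/PM rotation (assumed)."""
    r = abs(x)
    return math.tanh(r) * cmath.exp(1j * (cmath.phase(x) + ALPHA * r * r))


def build_dpd_lut(n_entries: int, r_max: float) -> list[complex]:
    """Complex predistortion gains indexed by quantized input amplitude in [0, r_max].
    r_max must be below 1 because the toy PA saturates at amplitude 1."""
    lut = []
    for i in range(n_entries):
        r = r_max * i / (n_entries - 1)
        if r == 0.0:
            lut.append(1 + 0j)  # small-signal gain of the toy PA is 1
            continue
        u = math.atanh(r)  # drive amplitude for which tanh(u) == r
        lut.append((u / r) * cmath.exp(-1j * ALPHA * u * u))  # also cancel the AM/PM rotation
    return lut


def predistort(x: complex, lut: list[complex], r_max: float) -> complex:
    """Nearest-entry lookup on |x|; amplitudes above r_max are clamped to the last entry."""
    r = min(abs(x), r_max)
    idx = round(r / r_max * (len(lut) - 1))
    return x * lut[idx]


if __name__ == "__main__":
    table = build_dpd_lut(512, 0.8)
    x = 0.75 * cmath.exp(-1.2j)
    print("without DPD:", abs(pa_model(x) - x))
    print("with DPD:   ", abs(pa_model(predistort(x, table, 0.8)) - x))
```

Because the table is indexed only by the current sample's amplitude, it cannot compensate PA memory effects; that is the simplification that makes this structure cheap enough for handsets compared with base-station predistorters.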

15.

An Optimization Method for Input Crossbar Interconnect Design in And-Inverter Cone Cluster-Based FPGAs

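Huang Zhihong, Li Wei, Yang Liqun, Jiang Zhenghong, Wei Xing, Lin Yu, Yang Haigang 《Journal of Electronics & Information Technology》2016,38(9):2397-2404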

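To address the bottleneck of excessive cluster area in developing And-Inverter Cone (AIC) cluster-based FPGAs, this paper studies the design optimization of the input crossbar interconnect in depth. At the level of the evaluation and optimization flow, it proposes, for the first time, a packed-netlist statistics method to analyze how AIC cluster inputs and feedback resources are used, which guides the design and optimization of the input crossbar structure and yields optimized parameters more efficiently. At the structural-parameter level of the input crossbar module, it proposes, for the first time, designing the connectivity of pin inputs and of output feedback separately and independently, and obtains the optimal combination of connectivity rates through extensive experiments. At the circuit-implementation level, it exploits the structure of AIC logic cones and proposes, for the first time, a dual-phase input crossbar circuit. Compared with the existing AIC cluster structure, the proposed optimization method reduces the area of the AIC cluster itself by 21.21%, markedly easing the area constraint. When implementing the MCNC and VTR benchmark suites, an AIC-architecture FPGA with the proposed input crossbar reduces the average area-delay product by 48.49% and 26.29%, respectively, compared with Altera's Stratix IV FPGA (a LUT architecture), and by 28.48% and 28.37%, respectively, compared with a conventional AIC-architecture FPGA, significantly improving overall FPGA performance.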

16.

Sharing of SRAM Tables Among NPN-Equivalent LUTs in SRAM-Based FPGAs

Meyer J., Kocan F. 《IEEE Transactions on Very Large Scale Integration (VLSI) Systems》2007,15(2):182-195

This article introduces a novel lookup table (LUT) and its use in configurable logic block (CLB) architectures for SRAM-based field-programmable gate arrays (FPGAs). The proposed CLB allows LUTs implementing NPN-equivalent functions to share SRAM tables, reducing the size of the memories used to store the functions as well as the number of configuration bits required. We measured many characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We found experimentally that, for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), reducing both power consumption and area without adversely affecting routing or wirelength, at the cost of a negligible 0.27% increase in critical-path delay. Specifically, FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate 2% reduction in overall power consumption and an estimated 3% reduction in area.
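Two functions are NPN-equivalent when one can be turned into the other by negating inputs, permuting inputs, and/or negating the output, which is what makes table sharing possible. The brute-force sketch below computes a canonical representative (the smallest truth table in the equivalence class) for small K; functions with equal canonical forms could share one SRAM table. It only identifies equivalent functions and does not model the paper's CLB or sharing hardware; the function names are hypothetical.

```python
from itertools import permutations


def apply_transform(tt, k, perm, neg_mask, neg_out):
    """Truth table of g(x) = f(y) XOR neg_out, where y is x with inputs negated by
    neg_mask and then permuted by perm. Bit x of tt holds f(x)."""
    out = 0
    for x in range(1 << k):
        y = 0
        for j in range(k):
            bit = ((x >> j) & 1) ^ ((neg_mask >> j) & 1)
            y |= bit << perm[j]
        val = ((tt >> y) & 1) ^ neg_out
        out |= val << x
    return out


def npn_canonical(tt, k):
    """Smallest truth table reachable by any input negation, input permutation,
    and output negation: an exhaustive (2^K * K! * 2 transforms) canonical form."""
    return min(apply_transform(tt, k, p, m, o)
               for p in permutations(range(k))
               for m in range(1 << k)
               for o in (0, 1))


if __name__ == "__main__":
    # 2-input AND (0b1000), OR (0b1110) and NAND (0b0111) fall in one class; XOR does not
    for name, tt in [("AND", 0b1000), ("OR", 0b1110), ("NAND", 0b0111), ("XOR", 0b0110)]:
        print(name, npn_canonical(tt, 2))
```

The 256 three-input functions collapse into only 14 NPN classes, so many LUTs in a mapped design implement functions from the same class; brute force is fine for small K, but practical tools use faster canonicalization.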

17.

Stateful-NOR based reconfigurable architecture for logic implementation

《Microelectronics Journal》2015,46(6):551-562

Most commercial Field Programmable Gate Arrays (FPGAs) have limitations in density, speed, configuration overhead, and power consumption, mostly due to the use of SRAM cells in Look-Up Tables (LUTs), configuration memory, and programmable interconnects. Also, hardwired Application Specific Integrated Circuit (ASIC) blocks designed for high-performance arithmetic circuits in FPGAs reduce the area available for reconfiguration. In this paper, we propose a novel generalized hybrid CMOS-memristor architecture that uses stateful-NOR gates as basic building blocks for implementing logic functions. These logic functions are implemented on memristor nanocrossbar layers, while the CMOS layer is used for selecting and connecting memristors. The proposed pipelined architecture combines the features of ASIC, FPGA, and microprocessor-based designs. It offers high density due to the nanocrossbar layer and high throughput, especially for arithmetic circuits. The proposed architecture for a three-input, one-output logic block is compared with a conventional LUT-based Configurable Logic Block (CLB) having the same number of inputs and outputs, showing a 1.82× area saving, a 1.57× speedup, and 3.63× lower power consumption. An automation algorithm for implementing any logic function on the proposed architecture is also presented.
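Stateful NOR computes in place: the output cell is first preset to logic 1 and is then conditionally reset if any input cell holds a 1, leaving NOR of the inputs stored in the output cell. Because NOR is functionally complete, any logic function becomes a sequence of such steps. The behavioral sketch below executes a step list on a dict of cell states; the XOR decomposition and the names `run_nor_program` and `XOR_PROGRAM` are illustrative choices, not taken from the paper, and no device physics, crossbar mapping, or pipelining is modeled.

```python
def run_nor_program(inputs, program):
    """Execute a sequence of stateful-NOR steps on a dict of cell states.
    Each step (out, a, b) presets cell `out` to 1 and then resets it to 0 if
    cell a or cell b holds 1, leaving out = NOR(a, b). Behavioral only."""
    cells = dict(inputs)
    for out, a, b in program:
        cells[out] = 1  # preset the output cell to logic 1
        if cells[a] or cells[b]:
            cells[out] = 0  # any input at logic 1 resets the output cell
    return cells


# XOR in five NOR steps: n1..n4 compute XNOR, n5 inverts it.
XOR_PROGRAM = [
    ("n1", "a", "b"),
    ("n2", "a", "n1"),
    ("n3", "b", "n1"),
    ("n4", "n2", "n3"),
    ("n5", "n4", "n4"),
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, run_nor_program({"a": a, "b": b}, XOR_PROGRAM)["n5"])
```

Each step depends only on cells written by earlier steps, so the step count (five here) sets the latency of one evaluation; pipelining independent evaluations across crossbar rows is one way such an architecture can recover throughput.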

18.

Design and Simulation of an FPGA-Based FIR Filter

Song Chengwen, Wei Xuanping, Liu Haomiao 《Electronic Technology》2011,38(4):49-51

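Owing to their good linear-phase characteristics, FIR digital filters are widely used and are one of the basic building blocks of digital signal processing. The flexible programmable logic of FPGAs makes high-speed digital signal processing easy to implement. To increase the speed of real-time digital signal processing, an FIR digital filter with a look-up table structure is implemented using the ROM inside the FPGA chip. The experimental results are simulated and analyzed in MATLAB, demonstrating the feasibility of the design.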

19.

A Novel Low-Leakage-Power Look-Up Table for FPGAs

Yang Song, Wang Hong, Yang Zhijia 《Microelectronics & Computer》2007,24(3):27-29,33

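A novel low-leakage-power look-up table (LUT) structure for FPGAs is proposed. LUTs with this structure can operate in three modes: high-speed mode, power-saving mode, and sleep mode. In high-speed mode, the LUT has performance and power consumption similar to those of a conventional LUT. In power-saving mode, circuit speed is traded for lower power, and leakage power is reduced by about 68% to 73% compared with high-speed mode. In sleep mode, leakage power can be reduced by more than 95%.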

20.

Research on the Architecture of LUT-Based SRAM-FPGAs

Ma Qungang, Yang Yintang, Li Yuejin, Gao Haixia 《Chinese Journal of Electron Devices》2003,26(1)

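Field-programmable gate arrays (FPGAs) are one of the fastest-growing areas of the microelectronics industry, and their internal architecture is attracting increasing attention from practitioners. Focusing on the widely used look-up table (LUT)-based SRAM-FPGA, this paper studies the design of its logic blocks, routing architecture, and input/output blocks, and also optimizes the basic architecture of this class of FPGA.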