1.

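闫永鹏 于海勋 《微电子学》2010,40(5)

At present, FPGA-based filter designs still cannot reduce area consumption and propagation delay well at the same time. Distributed arithmetic (DA) is widely used in FIR filter design because it saves hardware resources, but in some system designs its resource consumption and propagation delay still fail to meet requirements. To address these problems with DA, an improved LUT-based FIR filter is designed that achieves lower area consumption and lower propagation delay under the same filtering requirements. Filters implemented with the DA algorithm and with the improved LUT structure were each synthesized and simulated in Quartus II. The results show that the filter built on the improved LUT structure saves nearly 15% of the area and also has lower delay.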

2.

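High-performance DCT architecture based on DA and a dedicated accumulator

郭向东 赵峰 《微电子学》2006,36(6):830-833

A high-performance DCT architecture based on distributed arithmetic (DA) and a dedicated accumulator is proposed. The dedicated accumulator consists of 3:2 compressors, 4:2 compressors, a conditional sum selector (CSS), and a carry-lookahead adder, and can accumulate four partial products from the LUTs in a single cycle. At the cost of 50% additional hardware resources, the proposed architecture achieves 8 times the data processing speed of the conventional DA architecture based on iterative accumulation. The area and speed optimization of the DCT architecture under different arithmetic precisions is analyzed. The DCT design uses the TSMC 0.18 μm process library, runs at up to 120 MHz, and reaches a processing capacity of 480 megapixels per second.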
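The accumulator in this abstract combines compressors with a final carry-propagate adder. As a rough software illustration of that general idea (not the circuit from the paper), the sketch below adds four partial products with two chained 3:2 carry-save stages and one ordinary addition. The function names `compress_3_2` and `add_four` are hypothetical, and non-negative integer operands are assumed.

```python
def compress_3_2(a, b, c):
    """Word-wide 3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved up one position
    return s, carry


def add_four(p0, p1, p2, p3):
    """Add four partial products with two compressor stages and one final add."""
    s, c = compress_3_2(p0, p1, p2)   # three operands reduced to two
    s, c = compress_3_2(s, c, p3)     # fourth operand folded in, still no carry chain
    return s + c                      # the only carry-propagating addition


print(add_four(13, 7, 250, 96))  # 366
```

In hardware the point of this arrangement is that only the last step needs a carry chain; in Python the sketch only demonstrates that the arithmetic identity holds.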

3.

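Power supply design for a high-speed digital signal processing platform based on DSP+FPGA

王溦 王广君 《现代电子技术》2007,30(6):44-47

A power supply design for a high-speed digital signal processing platform is presented. The platform performs high-speed digital signal processing mainly on an FPGA+DSP structure. The scheme uses an advanced FPGA, a D/A converter, and a DSP chip, and builds a general-purpose, extensible hardware platform by correctly powering the DSP, FPGA, and D/A chips and monitoring their supplies. Several key parameters in the power supply design are analyzed and explained.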

4.

FPGA implementation of a parallel distributed-arithmetic FIR filter

王一海《电子器件》2012,35(5):545-548

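Distributed arithmetic (DA) is an important means of implementing FIR filters on FPGAs. When the basic DA algorithm is used for higher-order FIR filters, it occupies substantial hardware resources, and as the word length of the variables grows, its bit-serial operation also limits the running speed. To address this, a parallel form of distributed arithmetic is used: the original LUT is decomposed into several smaller LUTs, and the bit combinations of all variables taking part in the computation are delivered to the lookup tables simultaneously. Quartus II simulation results show good filtering performance, reduced resource consumption, and markedly higher running speed.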
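The LUT decomposition mentioned in this abstract can be illustrated with a small Python sketch. It is an assumption-laden model rather than the paper's design: the helper names (`build_lut`, `split_luts`, `lookup_split`), the 8-tap integer coefficients, and the group size of 4 are all chosen here for illustration. It shows that several small tables whose outputs are summed return the same value as one large table, with far fewer stored entries.

```python
def build_lut(coeffs):
    """Entry at address a is the sum of the coefficients whose bit is set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def split_luts(coeffs, group):
    """One small LUT per group of `group` consecutive coefficients."""
    return [build_lut(coeffs[i:i + group]) for i in range(0, len(coeffs), group)]


def lookup_split(luts, bits, group):
    """Look up one bit plane (a list of 0/1, one per tap) in the small LUTs and add."""
    total = 0
    for g, lut in enumerate(luts):
        addr = 0
        for k, bit in enumerate(bits[g * group:(g + 1) * group]):
            addr |= bit << k
        total += lut[addr]
    return total


coeffs = [3, -1, 4, 2, -5, 7, 1, -2]          # hypothetical 8-tap filter
full = build_lut(coeffs)                       # 2**8 = 256 entries
small = split_luts(coeffs, 4)                  # 2 * 2**4 = 32 entries
print(len(full), sum(len(t) for t in small))   # 256 32
```

The price of the smaller tables is one extra adder per additional group, which is the usual trade-off when a DA table is partitioned.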

5.

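A string matching algorithm based on FPGA

杜旭 邱庆哲 黄建 《微电子学与计算机》2007,24(3):91-94

Building on the full-byte comparison method, an FPGA-based substring LUT reuse algorithm is proposed. Through bit-width expansion and the sharing of strings and LUTs between pipeline stages, the algorithm solves the high-speed string matching problem on a low-end FPGA. Compared with traditional string matching algorithms, it greatly reduces the chip resources occupied by the matching logic and is an efficient parallel multi-pattern string matching algorithm.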

6.

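FPGA implementation of a high-speed, high-order FIR filter based on the DA algorithm

孙建明 赵刚 张迎华 《太赫兹科学与电子信息学报》2007,5(6):432-436

In digital signal processing, high-order finite impulse response (FIR) filters designed with dedicated DSP chips are relatively slow. To address this, a new method for implementing high-speed, high-order filters based on distributed arithmetic (DA) and a field-programmable gate array (FPGA) is proposed. The design process is illustrated with a 16th-order FIR filter implemented on a Xilinx xc2v500 device. Simulation results show that the circuit works correctly and reliably and meets the design requirements.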

7.

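High-order FIR filter implementation based on improved DA algorithms

王哲 周前能 《现代电子技术》2014,(4):8-12

Several algorithms used for high-order FIR filters are introduced, with emphasis on the DA algorithm. DA is mainly LUT-based; it fundamentally reduces the consumption of multiplier resources and improves the real-time performance of the system. On this basis, several improved algorithms are introduced: DA variants based on LUT decomposition, on OBC coding, and on a bidirectional selector. To varying degrees, these algorithms reduce resource usage, increase signal processing speed, and lower system complexity. Finally, the improved FIR filter algorithms are comprehensively analyzed, compared, and simulated; all simulation results are correct and meet the design requirements.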

8.

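Basic architecture and characteristics of DSPs

窦海霓 朱铭锆 《今日电子》2003,(6):32-34

Using TI's fifth-generation DSP chips as a reference, the article introduces the basics of digital signal processing technology, the basic architecture of DSP chips, and their instruction set, and presents basic DSP algorithms using the application of DSP on an FPGA as an example.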

9.

Design of a high-speed FIR digital filter based on FPGA

王心焕《现代电子技术》2007,30(15):184-187

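A high-speed FIR digital filter is designed on an FPGA using distributed arithmetic, the Booth algorithm, a Wallace tree, carry-lookahead and carry-select adder structures, and pipelining. Taking a low-pass FIR digital filter as an example, Matlab is used to assist the filter design and to verify its spectral characteristics. Functional simulation, timing simulation, and synthesis are carried out in ISE, and the synthesized circuit block diagram, resource usage, and maximum operating frequency are given. By applying several fast algorithms together with pipelining, the FPGA's lack of an efficient structure for multiply-accumulate operations can be overcome, enabling high-speed FIR digital filter designs and allowing FPGAs to make considerable progress in digital signal processing.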

10.

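Application of distributed arithmetic in the implementation of FIR digital filters (total citations: 2; self-citations: 1; citations by others: 1)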

LI Mei 王兰勋《通信技术》2008,41(8)

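This paper proposes a design scheme for implementing an FIR digital filter on an FPGA, applying distributed arithmetic (DA) in the design. FPGAs have a regular internal logic array and abundant routing resources, which makes them particularly suitable for digital signal processing tasks. DA is an important FPGA technique: it converts the key operation of an FIR filter on an FPGA, the multiply-accumulate operation, into table lookups, greatly increasing the speed of the FIR filter. The VHDL program and simulation waveforms are given.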
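How DA turns multiply-accumulate into table lookups can be shown with a short Python model. This is a minimal sketch under stated assumptions, not the VHDL from the paper: the names `build_da_lut` and `da_dot` are hypothetical, coefficients are integers, and samples are signed two's-complement integers that fit in `width` bits. For each bit position, one bit from every sample forms a LUT address; the looked-up partial sum is shifted and accumulated, and the sign-bit plane is subtracted.

```python
def build_da_lut(coeffs):
    """Entry at address a is the sum of the coefficients whose bit is set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot(lut, samples, width):
    """Inner product of the LUT's coefficients with `samples`, one bit plane at a time."""
    acc = 0
    for b in range(width):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k        # address bit k is bit b of sample k
        term = lut[addr] << b                  # weight the partial sum by 2**b
        acc += -term if b == width - 1 else term   # the sign bit carries negative weight
    return acc


coeffs = [3, -1, 4, 2]                 # hypothetical 4-tap filter
samples = [5, -7, 0, -8]               # each fits in 4 signed bits
print(da_dot(build_da_lut(coeffs), samples, 4))   # 6
print(sum(c * x for c, x in zip(coeffs, samples)))  # 6, direct multiply-accumulate
```

The loop runs once per bit of word length regardless of the number of taps, which is why the table grows with filter length while the cycle count does not.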

11.

A Library-Based Early Soft Error Sensitivity Analysis Technique for SRAM-Based FPGA Design

C. Thibeault Y. Hariri S. R. Hasan C. Hobeika Y. Savaria Y. Audet F. Z. Tazi 《Journal of Electronic Testing》2013,29(4):457-471

12.

A novel and efficient routing architecture for multi-FPGA systems

Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39

Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture, the proportion of hard-wired connections versus programmable connections, to determine its best value.

13.

Implementation of RNS-Based Distributed Arithmetic Discrete Wavelet Transform Architectures Using Field-Programmable Logic

Javier Ramírez Antonio García Uwe Meyer-Bäse Fred Taylor Antonio Lloris 《The Journal of VLSI Signal Processing》2003,33(1-2):171-190

Currently there are design barriers inhibiting the implementation of high-precision digital signal processing (DSP) objects with field programmable logic (FPL) devices. This paper explores overcoming these barriers by fusing together the popular distributed arithmetic (DA) method with the residue number system (RNS) for use in FPL-centric designs. The new design paradigm is studied in the context of a high-performance filter bank and a discrete wavelet transform (DWT). The proposed design paradigm is facilitated by a new RNS accumulator structure based on a carry save adder (CSA). The reported methodology also introduces a polyphase filter structure that results in a reduced look-up table (LUT) budget. The 2C-DA and RNS-DA are compared, in the context of an FPL implementation strategy, using a DWT filter bank as a common design theme. The results show that the RNS-DA, compared to a traditional 2C-DA design, enjoys a performance advantage that increases with precision (wordlength).

14.

Sharing of SRAM Tables Among NPN-Equivalent LUTs in SRAM-Based FPGAs

Meyer J. Kocan F. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):182-195

This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3%.

15.

Evaluating logic functionality of cascaded fracturable LUTs

GUO Zhenhong LIN Yu LI Tianyi JIA Rui GAO Tongqiang YANG Haigang 《太赫兹科学与电子信息学报》2016,14(3):474-480

Look-Up Tables (LUTs) are the key components of Field-Programmable Gate Arrays (FPGAs). Many LUT architectures have been studied; nevertheless, it is difficult to evaluate an LUT-based architecture quantitatively. Traditionally, LUT architecture evaluation has required dedicated effort on specific modifications to the technology mapping tools. A more feasible evaluation method for logic functionality is strongly needed for the design of LUT architectures. In this paper, a mathematical method for logic functionality calculation is proposed, and conventional and fracturable LUT architectures are analyzed. Furthermore, a cascaded fracturable LUT architecture is presented, which achieves twice the logic functionality of conventional LUTs and fracturable LUTs.

16.

The effect of LUT and cluster size on deep-submicron FPGA performance and density (total citations: 2; self-citations: 0; citations by others: 2)

Ahmed E. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):288-298

In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size between 3 and 10 provide the best area-delay product for an FPGA.

17.

Circuits and architectures for field programmable gate array with configurable supply voltage

Lin Y. Fei Li Lei He 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(9):1035-1047

Field programmable gate arrays (FPGAs) with supply voltage (Vdd) programmability have been proposed recently to reduce FPGA power, where the Vdd-level can be customized for FPGA circuit elements and unused circuit elements can be power-gated. In this paper, we first design novel Vdd-programmable and Vdd-gateable interconnect switches with a minimal number of configuration SRAM cells. We then evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to the baseline architecture similar to the leading commercial architecture, our best architecture reduces the minimal energy-delay product by 54.39% with 17% more area and 3% more configuration SRAM cells. Our evaluation results also show that LUT size 4 gives the lowest energy consumption, and LUT size 7 leads to the highest performance, both for all evaluated architectures.

18.

Path reuse-aware routing for non-volatile memory based FPGAs

《Integration, the VLSI Journal》2017

Non-volatile memory-based FPGAs (NV-FPGAs) are expected to replace traditional SRAM-based FPGAs to achieve higher scalability and lower power consumption. Yet the slow write performance of NVMs not only challenges FPGA reconfiguration speed and overhead but also constrains the programming cycles of FPGAs. To efficiently configure switch boxes, the majority component of an FPGA, this paper presents a routing path reuse technique. The reconfiguration cost of routing resources is first modeled mathematically and then minimized through a reuse-aware routing algorithm, which is incorporated into the standard VTR CAD tool. Experiments on standard MCNC and Titan benchmarks show that the proposed scheme achieves a path reuse rate of as much as 58% and reduces the configuration cost for routing resources by as much as 45%.

19.

LMS adaptive filters using distributed arithmetic for high throughput (total citations: 1; self-citations: 0; citations by others: 1)

Allred D.J. Heejong Yoo Krishnan V. Huang W. Anderson D.V. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(7):1327-1337

We present a new hardware adaptive filter architecture for very high throughput LMS adaptive filters using distributed arithmetic (DA). DA uses bit-serial operations and look-up tables (LUTs) to implement high throughput filters that use only about one cycle per bit of resolution regardless of filter length. However, building adaptive DA filters requires recalculating the LUTs for each adaptation which can negate any performance advantages of DA filtering. By using an auxiliary LUT with special addressing, the efficiency and throughput of DA adaptive filters can be of the same order as fixed DA filters. In this paper, we discuss a new hardware adaptive filter structure for very high throughput LMS adaptive filters. We describe the development of DA adaptive filters and show that practical implementations of DA adaptive filters have very high throughput relative to multiply and accumulate architectures. We also show that DA adaptive filters have a potential area and power consumption advantage over digital signal processing microprocessor architectures.

20.

Framework for Digital Filter Design Optimization (DiFiDOT) using MCM Based Register Minimization Retiming for Noise Removal ECG Filters

Deepa Yagain VijayaKrishna A 《Journal of Signal Processing Systems》2016,82(2):197-206

In DSP (Digital Signal Processing) blocks such as digital filters, the filter coefficients are known beforehand. Hence, the full flexibility of a multiplier is not necessary. The multiplierless Multiple Constant Multiplication (MCM) technique can be used along with retiming for better digital filter optimization. This method is more efficient than shift-and-add multiplication because intermediate results in the MCM technique can be shared, which reduces the area of a multiplierless implementation of digital filters. The multiplierless filter circuit is further retimed to reduce the overall clock period, which increases the clock frequency. Critical path and shortest path computations consume most of the time in retiming computation. Retiming minimizes the overall clock period by reducing the filter critical path. On a general purpose processor where the actual retiming vectors are computed for digital filters, the speed of the retiming transformation suffers because the entire transformation code is written in the form of a soft core. Hence, the FPGA-based path solver architectures proposed in this paper can reduce the burden on general purpose processors while retiming. This work contributes to reduced processing time for retiming using FPGA-based path solvers. Due to complexity and transistor size reduction, designing VLSI architectures for DSP blocks has become very challenging. Automated tools are most often required to bring products to market in a timely manner and to make VLSI designs more stable, reliable and tractable. A framework called DiFiDOT (Digital Filter Design Optimization Tool) is developed in this work for synthesizing the optimized filter architectures. Finally, an application for electrocardiography (ECG) is designed using MCM-based retimed digital filters to remove the power supply interference, baseline drift and broadband noise from the ECG signal.
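The sharing of intermediate results that this abstract attributes to MCM can be seen in a tiny Python sketch. The constants 5 and 45 and the function name `mcm_5_45` are chosen here purely for illustration and do not come from the paper. Both products of the same input are formed with two additions in total, because 45x is built from the already computed 5x.

```python
def mcm_5_45(x):
    """Return (5*x, 45*x) using only shifts and adds, sharing the 5*x term."""
    x5 = (x << 2) + x        # 5x = 4x + x             (first adder)
    x45 = (x5 << 3) + x5     # 45x = 8*(5x) + 5x = 9*5x (second adder, reuses x5)
    return x5, x45


print(mcm_5_45(7))   # (35, 315)
```

Computed independently from their binary forms, 5 = 4 + 1 needs one addition and 45 = 32 + 8 + 4 + 1 needs three, so the shared version saves two of the four adders in this small case.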