共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
3.
针对高阶FIR抽取滤波器直接型结构和多相滤波结构中存在乘法器资源使用较多,导致实际系统实现困难的问题,提出了一种适合FPGA实现的高效多相结构。该结构采用分时复用技术,通过提高FPGA工作时钟频率,对降采样后的滤波路数和每一路FIR滤波器中乘积和操作均复用一个乘法器,从而大幅节约了FPGA中乘法器资源的使用。结果表明,针对4 096阶滤波器和降采样率为512的实际抽取滤波器系统,只需要8个乘法器,且在Xilinx公司Virtex IV芯片上能稳定工作在204.8 MHz的时钟频率上。 相似文献
4.
通过分析FPGA可配置逻辑块的细致结构,提出了一种基于FPGA的细粒度映射方法,并使用该方法高效实现了大数模乘脉动阵列.在保持高速计算特点的同时,将模乘脉动阵列的资源消耗降低为原来的三分之一.在低成本的20万门级FPGA器件中即可实现1024位模乘器.该实现每秒可进行20次RSA签名.如果换用高性能FPGA,签名速度更可提高至每秒40次. 相似文献
5.
Low Complexity Reconfigurable DSP Circuit Implementations Based on Common Sub-expression Elimination
A design technique based on a combination of Common Sub-Expression Elimination and Bit-Slice (CSE-BitSlice) arithmetic for
hardware and performance optimization of multiplier designs with variable operands is presented in this paper. The CSE-BitSlice
technique can be extended to hardware optimization of multiplier circuits operating on vectors or matrices of variables. The
CSE-BitSlice technique has been applied to the design and implementation of 12 × 12 and 42 × 42 bit real multipliers, a complex
multiplier, a 6-tap FIR filter, and a 5-point DFT circuit. For comparison purposes, circuit implementations of the same arithmetic
and DSP functions have been carried out using Radix-4 Booth and CSA algorithms. Simulation results based on implementations
using the Xilinx FPGA 5VLX330FF1760-2 device shows that the circuits based on the CSE-BitSlice techniques require fewer logic
resources and yield higher throughput as compared to the CSA and Radix-4 Booth based circuits. 相似文献
6.
7.
Macpherson K.N. Stewart R.W. 《Vision, Image and Signal Processing, IEE Proceedings -》2006,153(6):711-720
A new algorithm that synthesises multiplier blocks with low hardware requirement suitable for implementation as part of full-parallel finite impulse response (FIR) filters is presented. Although the techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. Fully pipelined, full-parallel transposed-form FIR filters with multiplier block were generated using the new and previous algorithms, implemented on an FPGA target and the results compared. Previous research in this field has concentrated on minimising multiplier block adder cost but the results presented here demonstrate that this optimisation goal does not minimise FPGA hardware. Minimising multiplier block logic depth and pipeline registers is shown to have the greatest influence in reducing FPGA area cost. In addition to providing lower area solutions than existing algorithms, comparisons with equivalent filters generated using the distributed arithmetic technique demonstrate further area advantages of the new algorithm 相似文献
8.
9.
Recently, a number of classification techniques have been introduced. However, processing large dataset in a reasonable time has become a major challenge. This made classification task more complex and expensive in calculation. Thus, the need for solutions to overcome these constraints such as field programmable gate arrays (FPGAs). In this paper, we give an overview of the various classification techniques. Then, we present the existing FPGA based implementation of these classification methods. After that, we investigate the confronted challenges and the optimizations strategies. Finally, we highlight the hardware accelerator architectures and tools for hardware design suggested to improve the FPGA implementation of classification methods. 相似文献
11.
Floating-point (FP) multipliers are the main energy consumers in many modern embedded digital signal processing (DSP) and multimedia systems. For lossy applications, minimizing the precision of FP multiplication operations under the acceptable accuracy loss is a well-known approach for reducing the energy consumption of FP multipliers. This paper proposes a multiple-precision FP multiplier to efficiently trade the energy consumption with the output quality. The proposed FP multiplier can perform low-precision multiplication that generates 8?, 14?, 20?, or 26-bit mantissa product through an iterative and truncated modified Booth multiplier. Energy saving for low-precision multiplication is achieved by partially suppressing the computation of mantissa multiplier. In addition, the proposed multiplier allows the bitwidth of mantissa in the multiplicand, multiplier, and output product to be dynamically changed when it performs different FP multiplication operations to further reduce energy consumption. Experimental results show that the proposed multiplier can achieve 59 %, 71 %, 73 %, and 82 % energy saving under 0.1 %, 1 %, 5 %, and 11 % accuracy loss, respectively, for the RGB-to-YUV & YUV-to-RGB conversion when compared to the conventional IEEE single-precision multiplier. In addition, the results also exhibit that the proposed multiplier can obtain more energy reduction than previous multiple-precision iterative FP multipliers. 相似文献
12.
阐述了系统中断扩展的方法,并分析了其优缺点.基于常用的微处理器 FPGA系统构架,提出了一种基于FPGA的外部中断管理扩展方法,可以充分利用FPGA的优点,从而克服传统方法的不足. 相似文献
13.
In this paper, we have developed a new full-adder cell using multiplexing control input techniques (MCIT) for the sum operation and the Shannon-based technique to implement the carry. The proposed adder cell is applied to the design of several 8-bit array multipliers, namely a Braun array multiplier, a CSA multiplier, and Baugh–Wooley multipliers. The multiplier circuits are designed using DSCH2 VLSI CAD tools and their layouts are generated by Microwind 3 VLSI CAD tools. The output parameters such as propagation delay, total chip area, and power dissipation are calculated from the simulated results. We have also calculated energy per instruction (EPI), throughput, latency, signal-to-noise ratio (SNR), and the effect of temperature on the drain current by using the generated layout output parameter of a BSIM 4 advanced analyzer. The simulated results of the proposed adder-based multiplier circuit are compared with a cell multiplier that utilizes a MCIT-based adder, a cell multiplier composed of complementary pass transistor logic-based (CPL) adders and those of other published multipliers circuits. From the analysis of these simulated results, it was found that the proposed multiplier circuit gives better performance in terms of power, propagation delay, latency and throughput than other published results. 相似文献
14.
Sangjin Hong Kyoung-Su Park Jun-Hee Mun 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(4):380-392
This paper presents a flexible 2/spl times/2 matrix multiplier architecture. The architecture is based on word-width decomposition for flexible but high-speed operation. The elements in the matrices are successively decomposed so that a set of small multipliers and simple adders are used to generate partial results, which are combined to generate the final results. An energy reduction mechanism is incorporated in the architecture to minimize the power dissipation due to unnecessary switching of logic. Two types of decomposition schemes are discussed, which support 2's complement inputs, and its overall functionality is verified and designed with a field-programmable gate array (FPGA). The architecture can be easily extended to a reconfigurable matrix multiplier. We provide results on performance of the proposed architecture from FPGA post-synthesis results. We summarize design factors influencing the overall execution speed and complexity. 相似文献
15.
乘法器是科学计算的重要硬件内核。为验证两种结构乘法器(串行、流水线)的功耗差异,分别采用三种功耗分析技术,包括QuartusⅡ自带功耗分析工具PowerPlay Power Analyzer Tool、Altera提供的FPGA估算实际功耗的经验公式,以及Synopsys的综合工具DC,结果表明,在可比较即计算位宽相同、所加工作频率相等的前提下,流水线乘法器的时延仅为串行乘法器的38.5%(因为前者强调了并行),消耗的硬件资源为后者的2.76倍。利用QuartusⅡ自带功耗分析功能得到的结果不明显,而经验公式估算法和DC工具法都得出串行与流水线乘法器功耗之比为0.36。 相似文献
16.
Shuli Gao Dhamin Al-Khalili Noureddine Chabini 《Circuits, Systems, and Signal Processing》2008,27(5):713-731
This paper presents an optimized design approach for two’s complement large size multipliers using smaller size embedded multiplier
blocks available as resources in field programmable gate arrays (FPGAs). The realization is based on the Baugh–Wooley algorithm,
which segments the multiplication into unsigned and signed components. To achieve efficient implementation results, a set
of optimized schemes for the realization of the additions required for the unsigned multipliers and for combining the unsigned
and signed components is proposed. The implementations of the multipliers have been carried out for operands with sizes ranging
from 20 to 128 bits. The designs are synthesized and implemented on Xilinx’s Spartan-3 in the ISE 8.1 design platform and
compared with three other realizations using the following approaches: (1) conventional sign-extension approach, (2) Xilinx’s
IP-Core generator, and (3) sign-and-magnitude-based approach. The experimental results indicate that our proposed method outperforms
the other techniques. 相似文献
17.
Training deep neural networks(DNNs)requires a significant amount of time and resources to obtain acceptable results,which severely limits its deployment in resource-limited platforms.This paper proposes DarkFPGA,a novel customizable framework to efficiently accelerate the entire DNN training on a single FPGA platform.First,we explore batch-level parallelism to enable efficient FPGA-based DNN training.Second,we devise a novel hardware architecture optimised by a batch-oriented data pattern and tiling techniques to effectively exploit parallelism.Moreover,an analytical model is developed to determine the optimal design parameters for the DarkFPGA accelerator with respect to a specific network specification and FPGA resource constraints.Our results show that the accelerator is able to perform about 10 times faster than CPU training and about a third of the energy consumption than GPU training using 8-bit integers for training VGG-like networks on the CIFAR dataset for the Maxeler MAX5 platform. 相似文献
18.
Mojtaba Ebrahimi Abbas Mohammadi Alireza Ejlali Seyed Ghassem Miremadi 《Microelectronics Reliability》2014
By technology down scaling in nowadays digital circuits, their sensitivity to radiation effects increases, making the occurrence of soft errors more probable. As a consequence, soft error rate estimation of complex circuits such as processors is becoming an important issue in safety- and mission-critical applications. Fault injection is a well-known and widely used approach for soft error rate estimation. Development of previous FPGA-based fault injection techniques is very time consuming mainly because they do not adequately exploit supplementary FPGA tools. This paper proposes an easy-to-develop and flexible FPGA-based fault injection technique. This technique utilizes debugging facilities of Altera FPGAs in order to inject single event upset (SEU) and multiple bit upset (MBU) fault models in both flip-flops and memory units. As this technique uses FPGA built-in facilities, it imposes negligible performance and area overheads on the system. The experimental results show that the proposed technique is on average four orders of magnitude faster than a pure simulation-based fault injection. These features make the proposed technique applicable to industrial-scale circuits. 相似文献
19.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(7):783-793
In this paper, techniques for efficient implementation of field-programmable gate-array (FPGA)-based wave-pipelined (WP) multipliers, accumulators, and filters are presented. A comparison of the performance of WP and pipelined systems has been made. Major contributions of this paper are development of an on-chip clock generation scheme which permits finer tuning of the frequency, a synthesis technique that reduces the area and latency by 25%, a placement utility that results in 10%–40% increase in speed and proposal of an interleaving scheme for filters that reduces the number of multipliers required by 50%. WP multipliers of size 2$times$ 6 and the filters using them are found to be 11% faster and require lower power than those using pipelined multipliers. Filters with higher order WP multipliers also operate with lower power at the cost of speed. The delay-register products of such filters are found to be about 60% lower than those using the pipelined multipliers. The paper also outlines applications of these techniques for the Spartan II FPGAs and a self-tuning scheme for optimizing the speed. 相似文献