首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
《Electronics letters》1998,34(7):638-639
A new architecture for configurable blocks is proposed which can be used to construct multipliers. An array of these blocks is capable of being configured to perform any 4m bit×4n bit signed/unsigned binary multiplication. The blocks are designed to be embedded within a conventional FPGA structure to increase the functionality of the device by freeing valuable general reconfigurable resources, particularly when used in the area of image processing  相似文献   

2.
With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. Fabricated using 130 nm process, the total area overhead due to additional hardware for floating-point operations is about 7.4% compared to the previous architecture which does not support floating-point operations. The fabricated chip runs at 125 MHz clock frequency and 1.2 V power supply. Experiments show 11.6× speedup on average compared to ARM9 with a vector-floating-point unit for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.  相似文献   

3.
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are speeded-up on the coarse-grain reconfigurable hardware for meeting the timing requirements of application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware by our developed high-performance data-path. The methodology mainly consists of three stages; the analysis, the mapping of the application parts onto fine and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Analytical partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.  相似文献   

4.
Field Programmable Gate Array (FPGA) is an efficient reconfigurable integrated circuit platform and has become a core signal processing mieroehip device of digital systems over the last decade. With the rapid development of semiconductor technology, the performance and system inte- gration of FPGA devices have been significantly progressed, and at the same time new challenges arise. The design of FPGA architecture is required to evolve to meet these challenges, while also taking advantage of ever increased microchip density. This survey reviews the recent development of advanced FPGA architectures, including improvement of the programming technologies, logic blocks, intercon- nects, and embedded resources. Moreover, some important emerging design issues of FPGA archi- tectures, such as novel memory based FPGAs and 3D FPGAs, are also presented to provide an outlook for future FPGA development.  相似文献   

5.
In this paper, we propose a method for speeding-up Digital Signal Processing applications by partitioning them between the reconfigurable hardware blocks of different granularity and mapping critical parts of applications on coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our developed high-performance coarse-grain data-path is used. The design flow mainly consists of three steps; the analysis procedure, the mapping onto coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications; an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.  相似文献   

6.
Hou  Junjie  Zhu  Yongxin  Du  Sen  Song  Shijin 《Journal of Signal Processing Systems》2019,91(10):1137-1148

The high performance, power efficiency and reconfigurable characteristic of FPGA attract more and more attention in big data processing. In scientific data analytics, besides the consideration of computing performance, accuracy of the results and dynamic range of data representation are critical features that must be considered. At present, the floating-point IP cores in FPGA design use IEEE standard for floating-point arithmetic – IEEE 754. For FPGA based scientific data application, improving existing floating-point IP cores is a significant way to obtain better results. Posit is a floating-point arithmetic format first proposed by John L. Gustafson in 2017. In posit, the variable precision and efficient representation of exponent contribute a higher accuracy and larger dynamic range than IEEE 754. This work researches on the FPGA implementation of posit arithmetic for extending floating-point IP cores for FPGA based scientific data analytics. We design the logic for hardware implementation and implement it on FPGA. We compare the precision representation, dynamic range and performance of implemented posit FPU (Floating-Point Unit) with IEEE 754 floating-point IP cores. Posit exhibits better superiority in precision representation and dynamic range than IEEE 754, and through further optimization of the implementation, posit can be a good candidate for floating-point IP cores.

  相似文献   

7.
Image processing algorithms for template matching, two-dimensional (2-D) digital filtering, morphologic operations, and motion estimation share some common properties. They can all benefit from using reconfigurable computers that use coprocessor boards based on field-programmable gate array (FPGA) chips. This paper characterizes those applications as generalized template matching (GTM) operations and describes the mapping of the GTM operations onto reconfigurable computers. A three-step approach is described. The first two steps enumerate and prune the design space of basic GTM building blocks, which consist of FPGA buffers and GTM computation cores. The last step is to achieve a solution through an optimal combination of these building blocks where the cost function is the FPGA computation time and the constraints are FPGA coprocessor board resources. Various FPGA buffers are presented so as to introduce design options of basic GTM building blocks. Algorithms used for the mapping are described. Experimental results are summarized to reveal the relationship between the GTM mapping results and FPGA board resource parameters.  相似文献   

8.
With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.  相似文献   

9.
This paper presents a novel scalable and runtime dynamically reconfigurable FFT architecture for different wireless standards. With only 8 butterfly units, a reconfigurable FFT architecture for three different FFT points is realized using mixed radix-22/23/24 FFT algorithm in a modified Single-path Delay Feedback (SDF) pipelined architecture. Via a proper data flow reconfiguration it can support 64, 128 and 256. It can even be extended up to 8192-point transforms and uses only 13 butterfly units to realize 8192 points. This paper describes the implementation method of 256 and 128 point FFT, which is reconfigured partially from 64 point FFT. The whole system is implemented on a Xilinx XC2VP30 FPGA device. The implementation design addresses area efficiency and flexibility allowing the insertion of the partial modules dynamically to realize various FFT sizes. To verify the efficacy of this dynamic partial reconfigurable FFT design method, a conventional multiplexer based reconfigurable architecture was designed and tested on the same platform. Tested FPGA results for the Dynamic Partial Reconfigurable (DPR) method show the configuration time improvement and good area efficiency as compared to the reconfigurable architecture using conventional multiplexer techniques.  相似文献   

10.
基于动态可重构的FFT处理器的设计与实现   总被引:3,自引:1,他引:2  
提出了一种基于局部动态可重构(DPR)的新型可重构FFT处理器.相比传统的FFT设计,该设计方法在重构时间上得到了很大改进,同时,处理器能够动态地添加或移除重构单元.采用新颖的FFT控制算法,使得可重构部分面积很小.该处理器结构在Xilinx Viirtex2p系列FPGA上进行了综合及后仿真.较之Xilinx IPcore,其运算效率明显提高,而且还实现了IP核所不具备的动态可重构性.  相似文献   

11.
This paper presents a flexible 2/spl times/2 matrix multiplier architecture. The architecture is based on word-width decomposition for flexible but high-speed operation. The elements in the matrices are successively decomposed so that a set of small multipliers and simple adders are used to generate partial results, which are combined to generate the final results. An energy reduction mechanism is incorporated in the architecture to minimize the power dissipation due to unnecessary switching of logic. Two types of decomposition schemes are discussed, which support 2's complement inputs, and its overall functionality is verified and designed with a field-programmable gate array (FPGA). The architecture can be easily extended to a reconfigurable matrix multiplier. We provide results on performance of the proposed architecture from FPGA post-synthesis results. We summarize design factors influencing the overall execution speed and complexity.  相似文献   

12.
In this work, a new technologic strategy that allows implementing large crossbars formed with memFETs, a new device concept, is introduced. This memFET is an electrically reconfigurable field effect and resistive switching device that can be used to implement logic functions and memory blocks into a crossbar structure, allowing the dynamic logic configuration of the crossbar and simplifying both the design and the implementation of computing hardware. Moreover, taking the advantage of reconfiguration capability of such a technology and architecture we introduce a novel technique to design evolvable hardware where not only the logic functions are changeable (as is the case of the Field Programmable Gate Array, FPGA) but also the physical position of the components on the surface of the integrated circuit. This technology and principle leads towards a new computing paradigm based on what we name Shape Shifting Digital Hardware (SSDH).  相似文献   

13.
This work proposes a new FPGA architecture, to meet the requirements of signal processing and testing of current system-on-chip designs. The proposed architecture provides the hardware reuse and the reconfigurability advantages of an FPGA, not only for the system functionality, but also for the system testing, while keeping the performance level required by current signal processing applications. This paper presents the new FPGA model, along with preliminary experimental results that clearly show the possible advantages at the system level of merging design and test in a reconfigurable device.  相似文献   

14.
ABSTRACT

This paper presents a unified reconfigurable coordinate rotation digital computer (CORDIC) processor for floating-point arithmetic. It can be configured to operate in multi-mode to achieve a variety of operations and replaces multiple single-mode CORDIC processors. A reconfigurable pipeline-parallel mixed architecture is proposed to adapt different operations, which maximises the sharing of common hardware circuit and achieves the area-delay efficiency. Compared with previous unified floating-point CORDIC processors, the consumption of hardware resources is greatly reduced. As a proof of concept, we apply it to 16384 × 16384 points target synthetic aperture radar (SAR) imaging system, which is implemented on Xilinx XC7VX690T field programmable gate array platform. The maximum relative error of each phase function between hardware and software computation and the corresponding SAR imaging result can meet the accuracy index requirements.  相似文献   

15.
This paper investigates an architecture designed to implement wide, shallow memories on a field programmable gate array (FPGA). In the proposed architecture, existing configuration memory normally used to control the connectivity pattern of the FPGA is made user accessible. Typically, not all the switch blocks in an FPGA are used to transport signals. By adding only a modest amount of circuitry, the configuration memory in these unused switch blocks (or unused paths within used switch blocks) can be used to implement wide, shallow buffers and other similar memory structures. The size of FPGA required to implement a benchmark circuit that makes use of the wide, shallow memories, is 20% smaller than a standard memory architecture. In addition, the benchmark circuit is on average 40% faster using the proposed architecture.  相似文献   

16.
高吞吐浮点可灵活重构的快速傅里叶变换(FFT)处理器可满足尖端雷达实时成像和高精度科学计算等多种应用需求。与定点FFT相比,浮点运算复杂度更高,使得浮点型FFT的运算吞吐率与其实现面积、功耗之间的矛盾问题尤为突出。鉴于此,为降低运算复杂度,首先将大点数FFT分解成若干个小点数基2k 级联子级实现,提出分别针对128/256/512/1024/2048点FFT的优化混合基算法。同时,结合所提出同时支持单通道单精度和双通道半精度两种浮点模式的新型融合加减与点乘运算单元,首次提出一款高吞吐率双模浮点可变点FFT处理器结构,并在28 nm标准CMOS工艺下进行设计并实现。实验结果表明,单通道单精度和双通道半精度浮点两种模式下的运算吞吐率和输出平均信号量化噪声比分别为3.478 GSample/s, 135 dB和6.957 GSample/s, 60 dB。归一化吞吐率面积比相比于现有其他浮点FFT实现可提高约12倍。  相似文献   

17.
The ever increasing adoption of field programmable devices in various application domains for building complex embedded systems based on FPGA processors along with the reliability issues having emerged for FPGA devices built with the latest nanometer technologies, have raised the need for new fault tolerant techniques in order to improve dependability and extend system lifetime. In addition, the runtime partial reconfiguration technology highly mature in the modern FPGA families along with the availability of unused programmable resources in most FPGA designs provide new and interesting opportunities to build advanced fault tolerance mechanisms. In this paper, we exploit the dynamic reconfiguration potential of today’s FPGA architectures and the advances in the related design support tools and we propose a fault-tolerant approach for FPGA embedded processors based on runtime partial reconfiguration. According to the proposed methodology, the processor core is partitioned into reconfigurable modules and each module is duplicated to implement a concurrent error detection mechanism. Precompiled configurations containing spare resources are generated for each duplicated module and are used to repair at runtime the defective modules. Also, a fault tolerance scheme for the proxy logic of the reconfigurable modules, which cannot move in the alternative configurations along with the rest logic, is proposed. Moreover, a compression method for the alternative partial bitstreams, which significantly reduces the high storage space requirements of the proposed approach, is presented. Two different hardware decompression schemes have been implemented in a Virtex-5 device and compared in terms of area overhead and decompression latency. Furthermore, a thorough examination has been performed, regarding how the percentage of the spare resources and their allocation in the reconfigurable regions affect the compression efficiency and the processor performance. Finally, the proposed approach has been demonstrated in three different components – ALU, multiplier-accumulator, and instruction-fetch unit – of an open-source embedded processor.  相似文献   

18.
Lightweight ciphers are increasingly employed in cryptography because of the high demand for secure data transmission in wireless sensor network, embedded devices, and Internet of Things. The PRESENT algorithm as an ultra-lightweight block cipher provides better solution for secure hardware cryptography with low power consumption and minimum resource. This study generates the key using key rotation and substitution method, which contains key rotation, key switching, and binary-coded decimal-based key generation used in image encryption. The key rotation and substitution-based PRESENT architecture is proposed to increase security level for data stream and randomness in cipher through providing high resistance to attacks. Lookup table is used to design the key scheduling module, thus reducing the area of architecture. Field-programmable gate array (FPGA) performances are evaluated for the proposed and conventional methods. In Virtex 6 device, the proposed key rotation and substitution PRESENT architecture occupied 72 lookup tables, 65 flip flops, and 35 slices which are comparably less to the existing architecture.  相似文献   

19.
陈其聪  顾明剑 《红外》2018,39(7):19-24
随着信号处理算法的发展,人们对航天用现场可编程门阵列(Field Programmable Gate Array, FPGA)提出了算法可更新的需求。而传统的固定算法模式已经无法满足要求,所以星上FPGA在轨可重构设计成为了解决这一问题的关键。提出了一种基于星地链路的FPGA在轨可重构设计方案。通过星地链路上载配置数据并将其存入电可擦除只读存储器(Electrically Erasable Programmable Read Only Memory, EEPROM)内,然后利用反熔丝器件对FPGA进行大规模算法重配置操作。这项设计方案已经通过了相关验证,同时也提升了星载FPGA的灵活性。  相似文献   

20.
张小妍  邵杰 《电子工程师》2009,35(11):24-27
运用流水线技术对单精度浮点乘法和加法运算单元进行了优化设计。浮点加法器采用了改进的双路径结构,重点对移位单元和前导1检测单元的结构进行了优化。浮点乘法器在对被乘数进行Booth编码后,采用改进的4-2压缩器构成Wallace树,在简化逻辑的同时,提高了系统的吞吐率。经过仿真验证,在Virtex-4系列FPGA(现场可编程门阵列)上,浮点加法器的最高运行速率达到405MHz,浮点乘法器的最高运行速率达到429MHz。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号