1.

Evaluation of performance and architectural efficiency of FPGAs and GPUs in the 40 and 28 nm generations for algorithms in 3D ultrasound computer tomography

Matthias Birk Matthias Balzer Nicole V. Ruiter Juergen Becker 《Computers & Electrical Engineering》2014

In heterogeneous computing, application developers have to identify the best-suited target platform from a variety of alternatives. In this work, we compare the performance and architectural efficiency of Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) for two algorithms taken from a novel medical imaging method named 3D ultrasound computer tomography. From the 40 nm and 28 nm generations, we use top-notch devices and those with similar power consumption values. For our two benchmark algorithms from the signal processing and imaging domain, the results show that if power consumption is not considered, the GPU and FPGA from the 40 nm generation deliver both similar performance and similar efficiency per transistor. In the 28 nm process, in contrast, the FPGA is superior to its GPU counterpart by 86% and 39%, depending on the algorithm. If power is limited, FPGAs outperform GPUs in each investigated case by at least a factor of four.

2.

Building an automatic test platform for Xilinx XC4000-series FPGAs

Liu Yizhuo (刘肄倬) Yang Zhijia (杨志家) Wang Hong (王宏) 《微计算机信息》 (Microcomputer Information) 2010,(11)

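FPGA test technology has advanced along with FPGAs themselves. Because the structure of an FPGA differs fundamentally from that of a conventional application-specific integrated circuit (ASIC), design-for-testability circuitry cannot be built into an FPGA; its programmability, however, means that test circuits can be realized by programming the device. This paper discusses automatic test methods for the CLB resources and interconnect resources of Xilinx XC4000-series FPGAs. It also proposes a new coordinate-based method for locating test resources, which enables the step from software simulation to testing of real devices, and describes the hardware test platform that was built.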

3.

Minimizing FPGA interconnect delays

Brown S. Khellah M. Vranesic N. 《Design & Test of Computers, IEEE》1996,13(4):16-23

Optimizing FPGA routing architectures for speed performance also involves improving the CAD tools for mapping circuits. We provide a detailed example of how to design FPGA architectures by examining several important issues associated with interconnect resources for FPGAs that use SRAM programming technology. Our experiments examine two important metrics: the speed performance of implemented circuits and the effective use of available interconnect resources. The goal is to improve upon FPGA speed performance by decreasing delays associated with the interconnect. Our results are most directly applicable to FPGA architectures similar in style to the Xilinx XC4000 series. However, some significant results are of a more general nature and perhaps applicable to other styles of FPGAs as well. In addition to routing architectures, we address the CAD tools that allocate these routing resources to implement circuits.

4.

Soft error susceptibility analysis methodology of HLS designs in SRAM-based FPGAs

《Microprocessors and Microsystems》2017

SRAM-based FPGAs are attractive for critical applications because of their reconfiguration capability, which allows the design to be adapted in the field under different upset-rate environments. High-Level Synthesis (HLS) is a powerful method to explore different design architectures in FPGAs. In this paper, the HLS tool from Xilinx is used to generate different design architectures and then analyze the probability of errors in those architectures. Two different case-study scenarios are investigated. First, the influence of control flow and pipeline architectures combined with the use of specialized DSP blocks in the FPGA is evaluated. The number of errors classified as silent data corruption and timeout is analyzed according to the architectures and DSP block usage. Moreover, further HLS design options are explored, such as data organization, aggressive pipeline insertion, and the implementation of the algorithm in a soft processor like the MicroBlaze from Xilinx. These architectures are strongly optimized for performance, and the design least susceptible to soft errors is investigated. All case-study designs are evaluated in a 28 nm SRAM-based FPGA under fault injection. The dynamic cross section, soft error rate and mean work between failures are calculated based on the experimental results. The proposed characterization method can be used to guide designers to select better architectures concerning the susceptibility to upsets and performance efficiency.

5.

Performance evaluation and optimal design for FPGA-based digit-serial DSP functions

Hanho Lee Gerald E. Sobelman 《Computers & Electrical Engineering》2003,29(2):357-377

As field programmable gate array (FPGA) technology has steadily improved, FPGAs are now viable alternatives to other technology implementations for high-speed classes of digital signal processing (DSP) applications. Digit-serial DSP architectures have been an effective implementation method for FPGAs. In this work, a method of implementing digit-serial DSP architectures on FPGAs is presented, and their performance is evaluated with the objective of finding and developing the most efficient digit-serial DSP architectures on FPGAs. This paper discusses area costs and operational delays of the various digit-serial DSP functions and presents the area/delay models on Xilinx XC4000-series FPGAs. These area/delay models can make predictions of performance and hardware resource utilization before a lengthy layout and synthesis process is undertaken. The results show that the area/delay models proposed here are valid and the digit-serial DSP designs are promising candidates for efficient FPGA implementations.
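As a rough illustration of the digit-serial idea named in the abstract above, the following Python sketch models an adder that consumes one digit of each operand per clock cycle and passes the carry on to the next cycle. The function name `digit_serial_add`, the 16-bit word length, and the 4-bit digit width are assumptions chosen for illustration; they are not taken from the paper, whose circuits and area/delay models are not described in the abstract.

```python
from typing import Tuple


def digit_serial_add(a: int, b: int, word_bits: int = 16,
                     digit_bits: int = 4) -> Tuple[int, int]:
    """Add two unsigned integers one digit per simulated clock cycle.

    Returns (sum modulo 2**word_bits, number of cycles used).
    Bits of a and b above word_bits are ignored.
    """
    if word_bits % digit_bits != 0:
        raise ValueError("word_bits must be a multiple of digit_bits")

    mask = (1 << digit_bits) - 1
    cycles = word_bits // digit_bits
    carry = 0
    result = 0

    for cycle in range(cycles):
        shift = cycle * digit_bits
        digit_a = (a >> shift) & mask      # next digit of a, least significant first
        digit_b = (b >> shift) & mask      # next digit of b
        s = digit_a + digit_b + carry      # small adder reused every cycle
        result |= (s & mask) << shift      # store this digit of the sum
        carry = s >> digit_bits            # carry into the next cycle

    return result, cycles
```

With a digit width of 1 the model degenerates to a bit-serial adder, and with a digit width equal to the word length it becomes a fully parallel one, which is the area/delay trade-off that digit-serial designs span.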

6.

Reconfiguring one-time programmable FPGAs

《Micro, IEEE》1999,19(6):53-63

Field-programmable gate arrays can suffer from a variety of faults, ranging from wire anomalies and defects to inoperative programmable connections. The solution to these faults depends on whether we are dealing with a reprogrammable FPGA or a one-time programmable (OTP) FPGA. To correct faults, developers can reconfigure FPGAs such as those made by Xilinx and Altera by reprogramming. These devices can be programmed many times, for different designs and applications. Correcting faults in OTP FPGAs, such as those made by Actel, is more difficult. For one thing, OTP FPGAs are based on antifuses. With an antifuse, the FPGA's configuration information has an initial (default) value that can be changed, but once changed cannot be restored. Therefore, the procedures to bypass faulty cells or faulty routing in an OTP FPGA must meet more stringent requirements than for reprogrammable FPGAs. The “Reconfiguration Approaches” sidebar describes two methods other researchers have tried. This article describes our approach to reconfiguring OTP FPGAs. We explain how we determine if reconfiguration is feasible, the algorithms we used, and the results of our experiments on a generic OTP FPGA model and a generic detail router.

7.

Managing Security in FPGA-Based Embedded Systems

Huffmire Ted Brotherton Brett Sherwood Timothy Kastner Ryan Levin Timothy Nguyen Thuy D. Irvine Cynthia 《Design & Test of Computers, IEEE》2008,25(6):590-598

FPGAs combine the programmability of processors with the performance of custom hardware. As they become more common in critical embedded systems, new techniques are necessary to manage security in FPGA designs. This article discusses FPGA security problems and current research on reconfigurable devices and security, and presents security primitives and a component architecture for building highly secure systems on FPGAs.

8.

FPGA implementation of image processing technique for blood samples characterization

Telnaz Zarifi Mahsa Malek 《Computers & Electrical Engineering》2014

This work presents a hardware implementation of an image processing algorithm for blood type determination. The image processing technique proposed in this paper uses the appearance of agglutination to determine blood type by detecting edges and contrast within the agglutinated sample. An FPGA implementation and parallel processing algorithms are used in conjunction with image processing techniques to make this system reliable for the characterization of large numbers of blood samples. The program was developed using Matlab software, then transferred and implemented on a Virtex-6 FPGA from Xilinx employing ISE software. Hardware implementation of the proposed algorithm on the FPGA demonstrates a power consumption of 770 mW from a 2.5 V power supply. Blood type characterization using our FPGA implementation requires only 6.6 s, while a desktop computer-based algorithm with a Matlab implementation on a Pentium 4 processor with a 3 GHz clock takes 90 s. The presented device is faster, more portable, less expensive, and consumes less power than conventional instruments. The proposed hardware solution achieved an accuracy of 99.5% when tested with over 500 different blood samples.

9.

Implementation of Digital Electronic Arithmetics and its application in image processing

Khader Mohammad Sos Agaian Fred Hudson 《Computers & Electrical Engineering》2010,36(3):424-434

In this paper we introduce new algorithm implementations of a new parametric image processing framework that will accurately process images and speed up computation for addition, subtraction, and multiplication. Its potential applications include computer graphics, digital signal processing and other multimedia applications. This Parameterized Digital Electronic Arithmetic (PDEA) model replaces linear operations with non-linear ones. The implementation of a parameterized model is presented. We also present the design of arithmetic circuits including parallel counters, adders and multipliers based on two high-performance threshold logic gate implementations that we have developed. We will also explore new microprocessor architectures to take advantage of arithmetic. The experiments executed have shown that the algorithm provides faster and better enhancements than those described in the literature. The FPGA chip used is a Spartan-3E from Xilinx. For the circuit implemented on the FPGA, the minimum period of the proposed subsystem is 10.209 ns (maximum frequency 97.957 MHz). Maximum power consumed is 2.4 mW using a 32 nm process, and we used parallelism and reuse of the hardware components to speed up the process.

10.

High performance hardware support for elliptic curve cryptography over general prime field

《Microprocessors and Microsystems》2017

Secure information exchange in resource-constrained devices can be accomplished efficiently through elliptic curve cryptography (ECC). Due to the high computational complexity of ECC arithmetic, a high-performance dedicated hardware architecture is essential to provide sufficient performance in the computation of elliptic curve scalar multiplication. This paper presents high-performance hardware support for elliptic curve cryptography over a prime field GF(p). It exploits the best available parallelism of elliptic curve point operations in projective representation. The proposed hardware for ECC is implemented on Xilinx Virtex-4, Virtex-5 and Virtex-6 FPGAs. A 256-bit scalar multiplication is completed in 2.01 ms, 2.62 ms and 3.91 ms on Virtex-6, Virtex-5 and Virtex-4 FPGA platforms, respectively. The results show that the proposed design is 1.96 times faster, with an insignificant increase in area consumption, compared with other reported designs. Therefore, it is a good choice for many ECC-based schemes.
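For readers unfamiliar with the operation being accelerated, the sketch below shows elliptic curve scalar multiplication over a prime field GF(p) using the textbook double-and-add method. It is a software illustration only and makes several assumptions: it uses affine coordinates rather than the projective representation mentioned in the abstract, it runs sequentially rather than in parallel, and the names `point_add` and `scalar_mult` as well as the toy curve used in the usage note are illustrative choices, not the paper's design.

```python
from typing import Optional, Tuple

# A point is (x, y); None stands for the point at infinity.
Point = Optional[Tuple[int, int]]


def point_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over GF(p), p an odd prime."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = infinity
    if P == Q:
        # Tangent slope for point doubling.
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        # Chord slope for adding distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k: int, P: Point, a: int, p: int) -> Point:
    """Compute k*P by scanning the bits of k from least to most significant."""
    result: Point = None
    addend = P
    while k > 0:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)      # doubling step
        k >>= 1
    return result
```

On the small teaching curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1), `scalar_mult(2, (5, 1), 2, 17)` gives `(6, 3)`. Modular inversion via `pow(x, -1, p)` requires Python 3.8 or later; hardware designs typically avoid that inversion by working in projective coordinates.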

11.

A novel BRAM content accessing and processing method based on FPGA configuration bitstream

《Microprocessors and Microsystems》2017

This paper presents a new approach to manage data content of memories implemented in FPGAs through the configuration bitstream. The proposed approach is able to read and write the data content of Block RAMs (BRAMs) in FPGA-based designs by reading and processing the information stored in the bitstream. Thanks to this method it is possible to extract, load, copy or compare the information of BRAMs with neither resource overhead nor performance penalty in the design. It can also be applied to existing designs without the need for re-synthesis. Due to its advantages it becomes an interesting tool to carry out several applications, such as error detection and recovery or fault injection. It also opens the door to the design of cutting-edge applications. The approach has been implemented in a Xilinx ZYNQ System-on-Chip (SoC) device, which combines an FPGA and an ARM9 microprocessor. The access to the configuration bitstream has been performed using the ZYNQ’s Processor Configuration Access Port (PCAP). Nevertheless, the flow presented in this article can be adapted to devices from other Xilinx families or vendors. The proposed approach has been fully tested and compared with specifically designed memory controllers. The results obtained in the experimental tests confirm that the proposed approach works properly without increasing the resource overhead but at a penalty in terms of processing time.

12.

Analog and digital FPGA implementation of BRIN for optimization problems.

H S Ng K P Lam 《Neural Networks, IEEE Transactions on》2003,14(5):1413-1425

The binary relation inference network (BRIN) shows promise in obtaining the globally optimal solution of optimization problems in a time that is independent of the problem size. However, the realization of this method depends on the implementation platform. We studied analog and digital FPGA implementation platforms. Analog implementation of BRIN for two different directed graph problems is studied. As transitive closure problems can be transformed into a special case of shortest path problems or a special case of maximum spanning tree problems, two different forms of BRIN are discussed. Their circuits using common analog integrated circuits are investigated. The BRIN solution for critical path problems is expressed and is implemented using the separated building block circuit and the combined building block circuit. As these circuits are different, the response times of these networks will be different. The advancement of field programmable gate arrays (FPGAs) in recent years, which allows millions of gates on a single chip and is accompanied by high-level design tools, has allowed the implementation of very complex networks. With manual circuit construction no longer required and an efficient design platform available, the BRIN architecture can be built much more efficiently. Bandwidth problems are removed by moving all previously external connections inside the chip. By transforming BRIN to FPGA (Xilinx XC4010XL and XCV800 Virtex), we implement a synchronous network with computations in a finite number of steps. Two case studies are presented, with correct results verified from simulation implementation. Resource consumption on FPGAs is studied, showing that Virtex devices are more suitable for expanding the network in future developments.

13.

Fast and standalone Design Space Exploration for High-Level Synthesis under resource constraints

《Journal of Systems Architecture》2014,60(1):79-93


14.

Implementation of efficient SR-Latch PUF on FPGA and SoC devices

《Microprocessors and Microsystems》2017

In this paper we present a reliable and efficient SR-Latch based PUF design, with two times improvement in area over the state of the art, thus making it very attractive for low-area designs. This PUF is able to reliably generate a cryptographic key. The PUF response is generated by quantifying the number of oscillations during the metastability state for preselected latches. The derived design has been verified on 25 Xilinx Spartan-6 FPGAs (XC6SLX16) and 10 Xilinx Zynq SoC (XC7Z010) devices. The design exhibited ∼49% uniqueness figures when tested on both types of FPGAs. The reliability figures were >94% for temperature variation (0–85 °C) and ±5% of core voltage variation.
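The uniqueness and reliability figures quoted above are usually derived from Hamming distances between PUF responses. The sketch below uses the commonly cited definitions (uniqueness as the mean pairwise inter-device Hamming distance, ideally 50%; reliability as 100% minus the mean intra-device Hamming distance to a reference response). The abstract does not state the exact definitions the authors used, so treat these formulas and the function names as assumptions.

```python
from itertools import combinations
from typing import Sequence


def hamming_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bit positions in which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniqueness(responses: Sequence[Sequence[int]]) -> float:
    """Mean pairwise inter-device Hamming distance, in percent (ideal: 50)."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need responses from at least two devices")
    return 100.0 * sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def reliability(reference: Sequence[int],
                remeasured: Sequence[Sequence[int]]) -> float:
    """100% minus the mean distance between a device's reference response
    and responses re-measured under other conditions (ideal: 100)."""
    if not remeasured:
        raise ValueError("need at least one re-measured response")
    mean_hd = sum(hamming_fraction(reference, r) for r in remeasured) / len(remeasured)
    return 100.0 * (1.0 - mean_hd)
```

In practice `responses` would hold one bit vector per device measured under nominal conditions, and `remeasured` would hold the same device's bit vectors collected across the temperature and voltage corners.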

15.

A high level FPGA-based abstract machine for image processing

《Journal of Systems Architecture》1999,45(10):809-824

Image processing requires high computational power, plus the ability to experiment with algorithms. Recently, reconfigurable hardware devices in the form of field programmable gate arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. At present, however, users must program FPGAs at a very low level and have a detailed knowledge of the architecture of the device being used. They do not therefore facilitate easy development of, or experimentation with, image processing algorithms. To try to reconcile the dual requirements of high performance and ease of development, this paper reports on the design and realisation of an FPGA based image processing machine and its associated high level programming model. This abstract programming model allows an application developer to concentrate on the image processing algorithm in hand rather than on its hardware implementation. The abstract machine is based on a PC host system with a PCI-bus add-on card containing Xilinx XC6200 series FPGA(s). The machine's high level instruction set is based on the operators of image algebra. XC6200 series FPGA configurations have been developed to implement each high level instruction.

16.

High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs

Ling Zhuo Morris G.R. Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(10):1377-1392

Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
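The data hazard described above arises because a pipelined adder with a latency of several cycles cannot add the next input to a running sum that is still in flight. One generic way around this, loosely in the spirit of striding, is to keep as many independent partial sums as the adder has pipeline stages and to combine them at the end. The Python sketch below models only that idea; the abstract does not describe the paper's actual circuits, so the function name `strided_accumulate` and the default latency of 4 are illustrative assumptions.

```python
from typing import Iterable


def strided_accumulate(values: Iterable[float], adder_latency: int = 4) -> float:
    """Sum a stream using adder_latency interleaved partial sums.

    Input i is added to partial sum (i mod adder_latency), so consecutive
    inputs never depend on a result still inside the adder pipeline.
    """
    if adder_latency < 1:
        raise ValueError("adder_latency must be at least 1")

    partial = [0.0] * adder_latency
    for i, v in enumerate(values):
        # Each slot is revisited only every adder_latency inputs, which is
        # when a pipelined adder would have finished the previous addition.
        partial[i % adder_latency] += v

    # Final step: combine the independent partial sums into one result.
    total = 0.0
    for p in partial:
        total += p
    return total
```

Because floating-point addition is not associative, the interleaved order can produce a result that differs in the last bits from a strictly sequential sum; the extra partial-sum registers and the final combining step are the buffer and latency costs that such designs trade against the number of adders.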

17.

Speed optimal FPGA implementation of the encryption algorithms for telecom applications

《Microprocessors and Microsystems》2020

The last two decades have seen a revolution in telecom technology with three transitions between wireless mobile communication standards, namely, GPRS to 3G, 3G to 4G, and 4G to 5G. 5G offers faster download speeds and enables high connectivity between devices such as mobile phones, displays, smart homes, and smart cars because of its high reliability and high bandwidths (up to 10 Gbps). However, at the same time, data and personal information are also more susceptible to theft because of the high connectivity. Such threats can be addressed by encrypting electronic data with the advanced encryption standard (AES). Because of their reconfigurable and parallel architectures, Field-Programmable Gate Arrays (FPGAs) are becoming popular in VLSI design flows to enable the pre-silicon validation of designs at faster data rates in real time. FPGAs also serve as platforms for software development in the pre-silicon environment owing to their faster speeds. The design community is also relying heavily on High-Level Synthesis (HLS) tools in VLSI design flows. HLS platforms enable the new designs to improve the process with sustained authentication between two analytical selections from conventional functional specifications. We propose a high-throughput FPGA implementation of the AES algorithm based on High-Level Synthesis. The implementation uses a 128-bit key and is highly suited for telecom applications such as 5G. Researchers have developed and tested the setup and then used the Vivado HLS tool to evaluate various HLS guidelines as per the implementation. The generated Verilog RTL was verified and implemented on Xilinx Kintex 7 and Virtex 6 FPGAs. Using the same resources, we observed significantly better results than existing methods reported by other investigators. We have also verified the design for functionality by checking the ciphertext output from our design against a reference design output for the same input plaintext.

18.

Optimal utilization of available reconfigurable hardware resources

Kashif Latif Arshad Aziz Athar Mahboob 《Computers & Electrical Engineering》2011,37(6):1043-1057

Field programmable gate arrays (FPGAs) are continuously gaining momentum and becoming an essential part of today’s digital systems and applications. The growing use of these devices, coupled with increasingly complex and integrated designs, necessitates a search for techniques that use their internal resources efficiently. Standard HDL coding techniques and synthesis tools map logic onto a look-up table (LUT) based architecture. The resulting design utilizes more area on the chip, and some fast, dedicated areas and resources of the chip remain unutilized. This in turn results in slower clock rates and longer critical paths; hence the design remains inefficient in terms of both speed and area. In this paper we present and discuss techniques to effectively utilize the FPGA dedicated resources in order to speed up achievable clock rates and reduce the FPGA area utilization. Various useful HDL constructs are presented that utilize dedicated hardware resources of modern Xilinx FPGAs. Optimization techniques are presented with implementation examples and corresponding quantitative performance evaluation. In most of the cases we have achieved a 50% reduction in chip area utilization and simultaneously improved timing results significantly.

19.

Research on heavy-ion irradiation test techniques for single-event effects in Virtex-5 series SRAM-based FPGAs

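Lai Xiaoling (赖晓玲) Guo Yangming (郭阳明) Ju Ting (巨艇) Zhu Qi (朱启) Jia Liang (贾亮) 《计算机测量与控制》 (Computer Measurement & Control) 2024,32(1):304-311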

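SRAM-based FPGAs are prone to single-event effects in the space radiation environment, which can disrupt the normal operation of on-board satellite equipment or even interrupt its function. To address this problem, a ground irradiation test method for single-event effects in SRAM-based FPGAs was studied, and a single-event upset test method for configuration registers and BRAM was proposed. Taking Xilinx industrial-grade Virtex-5 series SRAM-based FPGAs as the devices under test, a single-event effect test system was designed and heavy-ion irradiation tests were carried out. The tests yielded single-event upset data for the configuration registers, BRAM, and a typical user circuit before and after triple modular redundancy, as well as single-event latch-up data for the device. Finally, on-orbit prediction and analysis software was used to calculate the on-orbit upset rate for a high-orbit environment. The results can provide baseline data for radiation sensitivity analysis of the device in space applications and guidance for hardening design.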

20.

Verification of FPGA Layout Generators in Higher-Order Logic

Oliver Pell 《Journal of Automated Reasoning》2006,37(1-2):117-152
