1.

A bitstream readback-based automatic functional test and diagnosis method for Xilinx FPGAs

Aiwu Ruan Bairui Jie Li Wan Junhao Yang Chuanyin Xiang Zujian Zhu Yu Wang 《Microelectronics Reliability》2014

In this paper, a novel bitstream readback-based test and diagnosis method for Xilinx FPGAs is presented, including a bitstream parsing algorithm as well as a corresponding bitstream readback-based fault and diagnosis algorithm. The proposed method can be applied to both configurable logic block (CLB) and interconnect resource (IR) test. Further, the algorithm is suitable for all Virtex and Spartan series FPGAs. Issues such as fault coverage, diagnostic resolution, I/O numbers, and configuration numbers, which some previous works did not address well, can be solved or partly relieved. The proposed method is evaluated by testing several Xilinx series FPGAs, and experimental results are provided.

2.

A Novel BIST Approach for Testing Input/Output Buffers in SoCs

Lei Chen Zhi-Ping Wen Zhi-Quan Zhang Min Wang 《中国电子科技》2009,7(4):322-325

A novel built-in self-test (BIST) approach to test the configurable input/output buffers in Xilinx Virtex series SoCs (system on a chip) using hard macros has been proposed in this paper. The proposed approach can completely detect single and multiple stuck-at gate-level faults as well as associated routing resources in I/O buffers. The proposed BIST architecture has been implemented and verified on Xilinx Virtex series FPGAs (field programmable gate array). Only a total of 10 configurations are required to completely test the I/O buffers of Virtex devices.

3.

Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration

Corbetta S. Morandi M. Novati M. Santambrogio M.D. Sciuto D. Spoletini P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(11):1650-1654

The research described in this paper shows how the runtime relocation of a reconfigurable component can be obtained using a system component that is able to update the bitstream information, moving the reconfigurable module in the desired position. This scenario defines the so-called partial bitstream relocation activity. This paper proposes a relocation filter that can be implemented both as a hardware and a software component. The former is hosted in the static part of the reconfigurable architecture, while the latter is made to be run on the processor placed on the field-programmable gate array (FPGA). The proposed approach has also been validated over different FPGAs, i.e., Virtex II Pro, Virtex 4, and Virtex 5, proposing a runtime relocation support that can be customized to meet all the different constraints associated with these different target architectures.

4.

FPGA Based Implementation and Comparison of Beamformers for CDMA2000

Sener Dikmese Adnan Kavak Kerem Kucuk Suhap Sahin Ali Tangel 《Wireless Personal Communications》2011,57(2):233-253

For the integration of smart antennas into third generation code division multiple access (CDMA) base stations, it remains a challenging task to implement smart antenna algorithms on programmable processors. In this paper, we study implementations of some CDMA compatible beamforming algorithms, namely least mean square (LMS), constant modulus (CM), and space code correlator (SCC) algorithms, using Xilinx's Virtex family FPGAs. This study exhibits feasibility of implementing even simple, practical, and computationally small algorithms based on today's most powerful FPGA technologies. 16- and 32-bit floating-point implementations of the algorithms are investigated using both Virtex II and Virtex IV FPGAs. The CDMA2000 reverse link baseband signal format is used in the signal modeling. Randomly changing fading and directions of arrival (DOAs) of multipaths are considered as the channel condition. The implementation results in terms of beamforming accuracy, FPGA resource utilization, weight vector computation time, and DOA estimation error are presented. Beamformer weight vectors using LMS and CM can be computed in less than 20 μs on a Virtex II FPGA and 10 μs on a Virtex IV FPGA, and using SCC in less than 22 μs on a Virtex IV FPGA. These results show that FPGAs provide approximately 500 times faster speed in implementations than our previous work with DSPs.
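
As a rough illustration of the LMS weight update named in the abstract above, the following is a minimal textbook sketch and not the authors' implementation. The array geometry (uniform linear array, half-wavelength spacing), the BPSK reference symbols, the single interferer, the step size, and all function names are assumptions made for this example; it uses double-precision Python numbers rather than the 16- and 32-bit formats studied in the paper and does not model the CDMA2000 signal format.

```python
import cmath
import math
import random


def steering(theta_deg, n_elems):
    """Steering vector of a uniform linear array with half-wavelength spacing (assumed geometry)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * n) for n in range(n_elems)]


def lms_beamformer(n_elems=4, theta_d=0.0, theta_i=40.0, mu=0.01,
                   n_iter=2000, noise_std=0.05, seed=1):
    """Train complex LMS weights against a known reference symbol stream."""
    rng = random.Random(seed)
    a_d = steering(theta_d, n_elems)   # desired user direction
    a_i = steering(theta_i, n_elems)   # interferer direction
    w = [0j] * n_elems
    for _ in range(n_iter):
        s = rng.choice((-1.0, 1.0))    # reference (desired) symbol
        q = rng.choice((-1.0, 1.0))    # interferer symbol
        x = [s * ad + q * ai + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
             for ad, ai in zip(a_d, a_i)]
        y = sum(wn.conjugate() * xn for wn, xn in zip(w, x))   # array output y = w^H x
        e = s - y                                              # error against the reference
        w = [wn + mu * xn * e.conjugate() for wn, xn in zip(w, x)]  # w <- w + mu * x * conj(e)
    return w


def response(w, theta_deg):
    """Magnitude of the array response |w^H a(theta)|."""
    a = steering(theta_deg, len(w))
    return abs(sum(wn.conjugate() * an for wn, an in zip(w, a)))


if __name__ == "__main__":
    weights = lms_beamformer()
    print("gain toward desired user:", response(weights, 0.0))
    print("gain toward interferer:  ", response(weights, 40.0))
```

After training, the response toward the desired direction should be close to one and the response toward the interferer close to zero; each iteration costs only a handful of complex multiply-accumulates per antenna element, which is the kind of workload the abstract maps onto FPGA logic.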

5.

Hybrid Multi-FPGA Board Evaluation by Permitting Limited Multi-Hop Routing

Sushil Chandra Jain Anshul Kumar Shashi Kumar 《Design Automation for Embedded Systems》2003,8(4):309-326

Multi-FPGA Boards (MFBs) have been in use for more than a decade for implementing systems requiring high performance and for emulation/prototyping of multimillion gate chips. It is important to develop an MFB architecture which can be used for emulation or prototyping of a large number of circuits. A key feature of an MFB is its routing architecture defined by its inter-Field-Programmable Gate Array (FPGA) connections. There are two types of inter-FPGA connections, namely fixed connections (FCs), which connect a pair of FPGAs through dedicated wires, and programmable connections (PCs), which connect a pair of FPGAs through a programmable switch. An architecture which has a mix of both these types of connections is called a hybrid routing architecture. It has been shown in the literature [7] that a hybrid MFB architecture is more efficient for emulation than an architecture with only one type of connection. The cost of an MFB and the delay of the emulated circuit on it depend on the number of PCs used for emulation. An objective of a designer of an MFB for circuit emulation is to minimize the required number of PCs. In this paper, we describe algorithms to evaluate the requirement of PCs for many hybrid routing architectures. The requirement of PCs can be reduced if some programmable connections are replaced by a connection using only FCs by routing through FPGAs. Such a routing is called multi-hop routing. We present an optimal and a heuristic algorithm for estimation of PCs when a limited number of hops through FPGAs is permitted. The unique feature of our evaluation scheme is that it is generic and treats the routing architecture as a parameter. We have used benchmark circuits as well as synthetic cloned circuits for testing our algorithms. Our heuristic algorithm is very fast and gives optimal results most of the time. Our algorithms can be used for actual routing during circuit emulation.

6.

Placement and routing tools for the Triptych FPGA

Ebeling C. McMurchie L. Hauck S.A. Burns S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(4):473-482

Field-programmable gate arrays (FPGAs) are becoming an increasingly important implementation medium for digital logic. One of the most important keys to using FPGAs effectively is a complete, automated software system for mapping onto the FPGA architecture. Unfortunately, many of the tools necessary require different techniques than traditional circuit implementation options, and these techniques are often developed specifically for only a single FPGA architecture. In this paper we describe automatic mapping tools for Triptych, an FPGA architecture with improved logic density and performance over commercial FPGAs. These tools include a simulated-annealing placement algorithm that handles the routability issues of fine-grained FPGAs, and an architecture-adaptive routing algorithm that can easily be retargeted to other FPGAs. We also describe extensions to these algorithms for mapping asynchronous circuits to Montage, the first FPGA architecture to completely support asynchronous and synchronous interface applications.

7.

A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS

Samuel Antão Leonel Sousa 《Journal of Signal Processing Systems》2014,76(3):249-259

Modular arithmetic is a building block for a variety of applications potentially supported on embedded systems. One approach to making modular arithmetic more efficient is to identify algorithmic modifications that would enhance the parallelization of the target arithmetic in order to exploit the properties of parallel devices and platforms. The Residue Number System (RNS) introduces data-level parallelism, enabling parallelization even for algorithms based on modular arithmetic with several data dependencies. However, the mapping of generic algorithms to full RNS-based implementations can be complex, and suitable hardware architectures that are scalable and adaptable to different demands are required. This paper proposes and discusses an architecture with scalability features for the parallel implementation of algorithms relying on modular arithmetic fully supported by the RNS. The systematic mapping of a generic modular arithmetic algorithm to the architecture is presented. It can be applied as a high level synthesis step for an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA) design flow targeting modular arithmetic algorithms. An implementation with the Xilinx Virtex 4 and Altera Stratix II FPGA technologies of the modular exponentiation and Elliptic Curve (EC) point multiplication, used in the Rivest-Shamir-Adleman (RSA) and EC cryptographic algorithms, suggests latency results in the same order of magnitude as the fastest hardware implementations of these operations known to date.
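
The data-level parallelism that the abstract above attributes to the RNS can be seen in a few lines. The sketch below is only an illustration of the number system, not the architecture proposed in the paper; the moduli set and the function names are assumptions chosen for this example. Each residue channel is processed independently, and the Chinese Remainder Theorem recovers the integer result.

```python
from math import prod

# Pairwise coprime moduli (hypothetical choice); the dynamic range is their product.
MODULI = (13, 17, 19, 23)
M = prod(MODULI)  # 96577


def to_rns(x):
    """Represent an integer as one residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise addition; no carries propagate between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise multiplication; each channel could run on its own hardware unit."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer in [0, M) with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = M // m
        total += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m) is the modular inverse (Python 3.8+)
    return total % M


if __name__ == "__main__":
    a, b = 123, 456
    print(from_rns(rns_mul(to_rns(a), to_rns(b))))  # equals a * b as long as a * b < M
```

The result is only correct while the true value stays below the dynamic range M, which is one reason mapping generic algorithms onto a full RNS implementation needs care.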

8.

From application descriptions to hardware in seconds: a logic-based approach to bridging the gap

Benkrid K. Crookes D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(4):420-436

9.

Metro-on-FPGA: A feasible solution to improve the congestion and routing resource management in future FPGAs

A. Belghadr A. Jahanian 《Integration, the VLSI Journal》2014

Asynchronous serial transceivers have recently been used for data serializing in large on-chip systems to alleviate routing congestion and improve routability. FPGAs have considerable potential for using asynchronous serial transmission, but they face serious challenges in using this technology. In this paper, we present a new FPGA architecture together with a new routing algorithm to use the asynchronous data serializing technique in modern FPGAs. Experimental results show that allocated routing tracks and routing congestion can be reduced considerably (18.81% and 48.73%, respectively) by using asynchronous data serializing without any performance degradation, at the cost of reasonable overhead in area and power consumption. The resulting improvements will increase for larger and more complex FPGAs.

10.

Floating-Point Divider Design for FPGAs

Hemmert K. S. Underwood K. D. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(1):115-118

Growth in floating-point applications for field-programmable gate arrays (FPGAs) has made it critical to optimize floating-point units for FPGA technology. The divider is of particular interest because the design space is large and divider usage in applications varies widely. Obtaining the right balance between clock speed, latency, throughput, and area in FPGAs can be challenging. The designs presented here cover a range of performance, throughput, and area constraints. On a Xilinx Virtex4-11 FPGA, the range includes 250-MHz IEEE compliant double precision divides that are fully pipelined to 187-MHz iterative cores. Similarly, area requirements range from 4100 slices down to a mere 334 slices.

11.

Overview of a compiler for synthesizing MATLAB programs onto FPGAs

Banerjee P. Haldar M. Nayak A. Kim V. Saxena V. Parkes S. Bagchi D. Pal S. Tripathi N. Zaretsky D. Anderson R. Uribe J.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):312-324

12.

A routing algorithm for FPGAs with time-multiplexed interconnects

Ruiqi Luo Xiaolei Chen Yajun Ha 《半导体学报》2020,(2):73-82

Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs. In this work, we propose a time-multiplexing technique on FPGA interconnects. In order to fully exploit this interconnect architecture, we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires. We validate the algorithm by using the router to implement 20 benchmark circuits on time-multiplexed FPGAs. We achieve a 38% smaller minimum channel width and a 3.8% smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle.

13.

Hierarchical Resampling Algorithm and Architecture for Distributed Particle Filters

Yun Pan Ning Zheng Qinglin Tian Xiaolang Yan Ruohong Huan 《Journal of Signal Processing Systems》2013,71(3):237-246

In this paper, we introduce a hierarchical resampling (HR) algorithm and architecture for distributed particle filters (PFs). While maintaining the same accuracy as centralized resampling in statistics, the proposed HR algorithm decomposes the resampling step into two hierarchies, intermediate resampling (IR) and unitary resampling (UR), which suits PFs for distributed hardware implementation. Also presented is a residual cumulative resampling (RCR) method that pipelines and accelerates the UR step. The corresponding architecture, when compared with traditional distributed architectures, eliminates the particle redistribution step, and has such advantages as short execution time and high memory efficiency. A prototype containing 8 PEs has been developed in a Xilinx Virtex IV FPGA (XC4VFX100-12FF1152) for the bearings-only tracking (BOT) problem, and the result shows that the input observations can be processed at 37.21 kHz with 8 K particles and a clock speed of 80 MHz.
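
To put the figures quoted in the abstract above in perspective, the short calculation below derives the clock-cycle budget per observation. It is back-of-the-envelope arithmetic only: it assumes that "8 K" means 8192 particles and that particles are split evenly across the 8 PEs, neither of which the abstract states, and the variable names are made up for this example.

```python
CLOCK_HZ = 80e6          # clock speed quoted in the abstract
OBS_RATE_HZ = 37.21e3    # observation processing rate quoted in the abstract
N_PE = 8                 # processing elements in the prototype
N_PARTICLES = 8 * 1024   # assumption: "8 K particles" read as 8192

cycles_per_obs = CLOCK_HZ / OBS_RATE_HZ               # clock cycles available per observation
particles_per_pe = N_PARTICLES // N_PE                # assumption: even split across PEs
cycles_per_particle = cycles_per_obs / particles_per_pe

if __name__ == "__main__":
    print(f"cycles per observation:     {cycles_per_obs:.0f}")
    print(f"particles per PE:           {particles_per_pe}")
    print(f"cycles per particle per PE: {cycles_per_particle:.2f}")
```

Under these assumptions each PE has roughly two clock cycles per particle for the whole filter iteration, which suggests why a pipelined resampling step without particle redistribution matters for the reported rate.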

14.

ARCHITECTURE MODEL AND RESOURCE GRAPH BUILDING ALGORITHM FOR DETAILED FPGA ARCHITECTURE DESIGN

Li Zhihua; Yang Haigang; Yang Liqun; Li Wei; Huang Juan 《电子科学学刊(英文版)》2014,(6):505-512

15.

An FPGA-based network intrusion detection system with on-chip network interfaces

C. R. Clark C. D. Ulmer D. E. Schimmel 《International Journal of Electronics》2013,100(6):403-420

Network intrusion detection systems (NIDS) are critical network security tools that help protect computer installations from malicious users. Traditional software-based NIDS architectures are becoming strained as network data rates increase and attacks intensify in volume and complexity. In recent years, researchers have proposed using FPGAs to perform the computationally-intensive components of intrusion detection analysis. In this work, we present a new NIDS architecture that integrates the network interface hardware and packet analysis hardware into a single FPGA chip. This integration enables a higher performance and more flexible NIDS platform. To demonstrate the benefits of this technique, we have implemented a complete and functional NIDS in a Xilinx Virtex II Pro FPGA that performs in-line packet analysis and filtering on multiple Gigabit Ethernet links using rules from the open-source Snort attack database.

16.

Flooding-based watershed algorithm and its prototype hardware architecture

Rambabu C. Chakrabarti I. Mahanta A. 《Vision, Image and Signal Processing, IEE Proceedings -》2004,151(3):224-234

Watershed transformation is a powerful image segmentation technique. The potential of its real-time application can be realised by a dedicated hardware architecture. However, little work has been reported so far on hardware realisation of watershed transformation. The authors propose an improved watershed algorithm derived from Meyer's simulated flooding-based algorithm by ordered queues and a prototype FPGA-based architecture for its effective implementation. The improvement in computational complexity results from use of a single queue and conditional neighbourhood comparisons while processing the 3 × 3 neighbouring pixels. Besides analysing the computational complexity of the principal steps of the proposed algorithm, the authors present simulation results of running the proposed algorithm and the conventional algorithm on different images for comparison. The proposed architecture has been modelled in VHDL and synthesised for Virtex FPGA. The implementation results show acceptable performance of the proposed architecture.

17.

Design and FPGA Verification of an Image Pipeline Based on Bubble Squeezing

汪克念 田毅 阎芳 常立博 《电子器件》2018,41(2)

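In an image processor, image processing involves a large number of pixels to be processed, a complex processing flow, and many data transfer channels, so image processors suffer from slow processing speed. To address this problem, this paper proposes an FPGA design of an image pipeline with a bubble-squeezing function, and verifies and synthesizes the FPGA implementation on a Xilinx Virtex XC6VLX550T FPGA. The results show that the design is correct and that, compared with a basic pipeline, the proposed pipeline can increase the processing speed of the image processor without a large increase in circuit resources.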

18.

Fast algorithm and efficient hardware architecture of half-pixel interpolation unit for H.264/AVC

Wei Wang Tao Lin Yuting Xie Mao Mu Jie Hu 《电子科学学刊(英文版)》2014,31(3):214-221

A fast half-pixel motion estimation algorithm and its corresponding hardware architecture are presented. Unlike typical half-pixel motion estimation algorithms, which need three steps, the presented algorithm needs only two steps to obtain all the interpolated pixels of an entire 8×8 block. The proposed architecture works in a parallel way and is simulated by Modelsim 6.5 SE, synthesized to the Xilinx Virtex4 XC4VLX15 Field Programmable Gate Array (FPGA) device, and verified on a hardware platform. The implementation results show that this architecture can achieve 190 MHz and that the entire interpolation process takes 11 fewer clock cycles than typical half-pixel interpolation, which meets the requirements of real-time applications for very high definition video.

19.

ATOMi: An Algorithm for Circuit Partitioning Into Multiple FPGAs Using Time-Multiplexed, Off-Chip, Multicasting Interconnection Architecture

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(7):861-864

Logic emulation is so far the fastest method to verify the system functionality at the gate level before chip fabrication. A field-programmable gate array (FPGA)-based logic emulator with large gate capacity generally comprises a large number of FPGAs or special processors connected in mesh or crossbar topology. However, gate utilization of FPGAs and speed of emulation are limited by the number of signal pins among FPGAs and the interconnection architecture of the logic emulator. This paper first describes a new interconnection architecture called TOMi (Time-multiplexed, Off-chip, Multicasting interconnection) and proposes a circuit partitioning algorithm called ATOMi (Algorithm for TOMi) for multi-FPGA systems incorporating four to eight FPGAs where FPGAs are interconnected through TOMi. ATOMi reduces the number of off-chip signal transfers to optimize the performance for multi-FPGA systems implemented by TOMi. Experimental results using Partitioning93 benchmarks show that, by adopting the proposed TOMi interconnection architecture along with ATOMi, the pin count is reduced to 14.4%–88.6% while the critical path delay is reduced to 66.1%–90.1% compared to traditional architectures including mesh, crossbar, and VirtualWire architecture.

20.

A novel and efficient routing architecture for multi-FPGA systems

Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39

Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture, the proportion of hard-wired connections versus programmable connections, to determine its best value.