首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
This paper presents a reconfigurable particle filter design methodology for a real-time bearings-only tracking application. The methodology provides the capability of selecting a single particle filter from multiple particle filter realizations with maximum resource sharing. The autonomous buffer controller mechanism for the architecture ensures correct operation of the particle filters. Parameter adaptation and algorithm reconfiguration can be accomplished with negligible reconfiguration overhead through buffer controllers and a set of switches for transforming dataflow structures such that any desired particle filter can be implemented. Two target particle filters, sample importance resample filter (SIRF) and Gaussian particle filter (GPF), are realized using field programmable gate array (FPGA) based on the proposed methodology. However, the architecture can be extended for a wide range of particle filters with different sets of dynamics. This paper successfully demonstrates that implementation of a domain specific processor for particle filters is feasible with performance that is much higher than that of commercially available digital signal processors (DSPs).  相似文献   

2.
Application of Reconfigurable CORDIC Architectures   总被引:1,自引:0,他引:1  
Reconfiguration enables the adaption of Coordinate Rotation DIgital Computer (CORDIC) units to the specific needs of sets of applications, hence creating application specific CORDIC-style implementations. Reconfiguration can be implemented at a high level, taking the entire CORDIC unit as a basic cell (CORDIC-cells) implemented in VLSI, or at a low level such as Field-Programmable Gate Arrays (FPGAs). We suggest a design methodology and analyze area/time results for coarse (VLSI) and fine-grain (FPGA) reconfigurable CORDIC units. For FPGAs we implement CORDIC units in Verilog HDL and our object-oriented design environment, PAM-Blox. For CORDIC-cells, multiple reconfigurable CORDIC modules are synthesized with state-of-the-art CAD tools. At the algorithm level we present a case study combining multiple CORDICs based on a geometrical interpretation of a normalized ladder algorithm for adaptive filtering to reduce latency and area of a fully pipelined CORDIC implementation. Ultimately, the goal is to create automatic tools to map applications directly to reconfigurable high-level arithmetic units such as CORDICs.  相似文献   

3.
In this paper, we propose a compact threshold-based resampling algorithm and architecture for efficient hardware implementation of particle filters (PFs). By using a simple threshold-based scheme, this resampling algorithm can reduce the complexity of hardware implementation and power consumption. Simulation results indicate that this algorithm has approximately equal performance with the traditional systematic resampling (SR) algorithm when the root-mean-square error (RMSE) and lost track are considered. Experimental comparison of the proposed hardware architecture with those based on the SR and the residual systematic resampling (RSR) algorithms was conducted on a Xilinx Virtex-II Pro field programmable gate array (FPGA) platform in the bearings-only tracking context, and the results establish the superiority of the proposed architecture in terms of high memory efficiency, low power consumption, and low latency.  相似文献   

4.
Wireless networks are very widespread nowadays, so secure and fast cryptographic algorithms are needed. The most widely used security technology in wireless computer networks is WPA2, which employs the AES algorithm, a powerful and robust cryptographic algorithm. In order not to degrade the Quality of Service (QoS) of these networks, the encryption speed is very important, for which reason we have implemented the AES algorithm in an FPGA, taking advantage of the hardware characteristics and the software-like flexibility of these devices. In this paper, we propose our own methodology for doing an FPGA-based AES implementation. This methodology combines the use of three hardware languages (Handel-C, VHDL and JBits) with partial and dynamic reconfiguration, and a pipelined and parallel implementation. The same design methodology could be extended to other cryptographic algorithms. Thanks to all these improvements our pipelined and parallel implementation reaches a very high throughput (24.922 Gb/s) and the best efficiency (throughput/area ratio) of all the related works found in the literature (6.97 Mb/s per slice).  相似文献   

5.
A new algorithm that synthesises multiplier blocks with low hardware requirement suitable for implementation as part of full-parallel finite impulse response (FIR) filters is presented. Although the techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. Fully pipelined, full-parallel transposed-form FIR filters with multiplier block were generated using the new and previous algorithms, implemented on an FPGA target and the results compared. Previous research in this field has concentrated on minimising multiplier block adder cost but the results presented here demonstrate that this optimisation goal does not minimise FPGA hardware. Minimising multiplier block logic depth and pipeline registers is shown to have the greatest influence in reducing FPGA area cost. In addition to providing lower area solutions than existing algorithms, comparisons with equivalent filters generated using the distributed arithmetic technique demonstrate further area advantages of the new algorithm  相似文献   

6.
Blind source separation of independent sources from their convolutive mixtures is a problem in many real-world multi-sensor applications. However, the existing BSS architectures are more often than not based upon software and thus not suitable for direct implementation on hardware. The existing software of feedback network algorithm is not suitable for real-time implementations. In this paper, we present a parallel algorithm and architecture for hardware implementation of blind source separation. The algorithm is based on feedback network and is highly suited for parallel processing. The implementation is designed to operate in real time for speech signal sequences. It is systolic and easily scalable by simple adding and connecting chips or modules. In order to verify the proposed architecture, we have also designed and implemented it in a hardware prototyping with Xilinx FPGAs running at 33 MHz.
H. JeongEmail: Email:
  相似文献   

7.
The elaborate design of folded finite-impulse response (FIR) filters based on pipelined multiplier arrays is presented in this paper. The design is considered at the bit-level and the internal delays of the pipelined multiplier array are fully exploited in order to reduce hardware complexity. Both direct and transposed FIR filter forms are considered. The carry-save and the carry-propagate multiplier arrays are studied for the filter implementations. Partially folded architectures are also proposed which are implemented by cascading a number of folded FIR filters. The proposed schemes are compared as to the aspect of hardware complexity with a straightforward implementation of a folded FIR filter based on the pipelined Wallace Tree multiplier. The comparison reveals that the proposed schemes require 20%-30% less hardware. Finally, efficient implementation of partially folded FIR filter circuits is presented when constraints in area, power consumption and clock frequency are given.  相似文献   

8.
一种基于FPGA的并行流水线FIR滤波器结构   总被引:5,自引:0,他引:5  
王黎明  刘贵忠  刘龙  刘洁瑜 《微电子学》2004,34(5):582-585,588
提出了一种在FPGA器件上实现的流水线并行FIR滤波器结构。首先比较了FIR滤波器三种硬件实现所用的资源,然后在理论上推出该流水线并行结构滤波器的实现方法及其可行性,给出了硬件实现模块。实验结果表明,这种改进滤波器结构实现的算法可以灵活地处理综合的面积和速度的约束关系,使设计达到最优。  相似文献   

9.
SMS4密码算法高速引擎实现   总被引:1,自引:0,他引:1       下载免费PDF全文
周洲  何一凡  沈海斌  赵旭鑫   《电子器件》2007,30(4):1469-1471,1480
SMS4是国内公布的无线局域网分组密码算法,文章在分析该算法特点的基础上,介绍了一种单轮内流水的高速引擎实现结构.通过配置多个引擎进行二次流水或并行处理,可以灵活调整输出性能和硬件开销,以适应不同应用的需求.在FPGA的对比测试中,相同输出性能下,具有比32轮流水线结构更小的硬件开销.而且在单引擎应用时,SMS4加/解密系统可以非常容易的在低端的Spartan Ⅱ XC2S50 FPGA上实现.  相似文献   

10.
There has been growing recent interest in configurable computing, which can be viewed as a hybrid between ASICs and programmable processors. Configurable computing machines are implemented with programmable logic: flexible hardware that can be structured to fit the natural organization and data flow of a computation. The enabling device for configurable computing is the field-programmable array (FPGA). For applications characterized by deeply pipelined, highly parallel, and integer arithmetic processing, configurable computing machines can outperform alternative solutions by up to an order of magnitude. The combination in a single device of dedicated hardware and rapid, submillisecond-scale reprogrammability constitutes an exciting and promising development whose implications are only just beginning to be exploited. We begin with a brief tutorial on FPGAs that describes the most common FPGA architectures and how these architectures are used to support computation, memory access, and data flow. We then present FPGAs as computing machines and focus on devices that are reconfigured during run time. Ongoing research involving FPGAs and future directions are also discussed  相似文献   

11.
提出了一种基于提升算法的二维离散5/3小波变换(DWT)高效并行VLSI结构设计方法。该方法使得行和列滤波器同时进行滤波,采用流水线设计方法处理,在保证同样的精度下,大大减少了运算量,提高了变换速度,节约了硬件资源。该方法已通过了VerilogHDL行为级仿真验证,可作为单独的IP核应用在JPEG2000图像编、解码芯片中。该结构可推广到9/7小波提升结构。  相似文献   

12.
In this paper, techniques for efficient implementation of field-programmable gate-array (FPGA)-based wave-pipelined (WP) multipliers, accumulators, and filters are presented. A comparison of the performance of WP and pipelined systems has been made. Major contributions of this paper are development of an on-chip clock generation scheme which permits finer tuning of the frequency, a synthesis technique that reduces the area and latency by 25%, a placement utility that results in 10%–40% increase in speed and proposal of an interleaving scheme for filters that reduces the number of multipliers required by 50%. WP multipliers of size 2$times$6 and the filters using them are found to be 11% faster and require lower power than those using pipelined multipliers. Filters with higher order WP multipliers also operate with lower power at the cost of speed. The delay-register products of such filters are found to be about 60% lower than those using the pipelined multipliers. The paper also outlines applications of these techniques for the Spartan II FPGAs and a self-tuning scheme for optimizing the speed.  相似文献   

13.
In this paper, we propose a parallel systematic resampling (PSR) algorithm for particle filters, which is a new form of systematic resampling (SR). The PSR algorithm makes iterations independent, thus allowing the resampling algorithm to perform loop iterations in parallel. A fixed-point version of the PSR algorithm is also proposed, with a modification to ensure that a correct number of particles is generated. Experiments show that the fixed-point implementation of the PSR algorithm can use as few as 22 bits for representing the weights, when processing 512 particles, while achieving results equivalent to a floating-point SR implementation. Four customized instructions were designed to accelerate the proposed PSR algorithm in Application-Specific Instruction-set Processors. These four custom instructions, when configured to support four weight inputs in parallel, lead to a 73.7 \(\times \) speedup over a floating-point SR implementation on a general-purpose processor at a cost of 47.3 K additional gates.  相似文献   

14.
有限冲激响应(FIR)滤波器设计遇到的难题是滤波要进行大量乘法运算,即使是在全定制的专用集成电路中也会导致过大的面积与功耗.对于用硬件实现系数是常量的专用滤波器,可以通过分解系数变为应用加、减和移位而实现乘法.FIR滤波器的复杂性主要由用于系数乘法的加法器/减法器的数量决定.而对于自适应FIR滤波器,大多数场合下可用数字信号处理器(DSP)或CPU通过软件编程的方法来实现,但是对于要求高速运算的场合,VLSI实现是很好的选择.基于这一考虑,可以用符号数的正则表示(CSD)码表示系数, 再利用可重构现场可编程门阵列(FPGA)技术实现.可重构结构的应用,能保证系统的其余部分同时处于运行状态时实现FIR滤波器系数的更新.文中利用CSD码和可重构思想,提出了用FPGA实现自适应FIR滤波器的一种方案.  相似文献   

15.
The technical analysis used in determining which of the potential Advanced Encryption Standard candidates was selected as the Advanced Encryption Algorithm includes efficiency testing of both hardware and software implementations of candidate algorithms. Reprogrammable devices such as field-programmable gate arrays (FPGAs) are highly attractive options for hardware implementations of encryption algorithms, as they provide cryptographic algorithm agility, physical security, and potentially much higher performance than software solutions. This contribution investigates the significance of FPGA implementations of the Advanced Encryption Standard candidate algorithms. Multiple architectural implementation options are explored for each algorithm. A strong focus is placed on high-throughput implementations, which are required to support security for current and future high bandwidth applications. Finally, the implementations of each algorithm will be compared in an effort to determine the most suitable candidate for hardware implementation within commercially available FPGAs  相似文献   

16.
This paper presents a simple implementation method of pipelined asynchronous circuits, suitable for commercial field programmable gate arrays (FPGAs). Contrary to other existing asynchronous design techniques, the presented method does not require the application of additional user actions such as constraining or building hard macros. As a design example, an architecture of the asynchronous PicoBlaze compatible microcontroller and 12-bit pipelined fast array multiplier have been considered. The developed synchronous and asynchronous versions of the microcontroller as well as fast array multiplier have been implemented and tested using Xilinx FPGAs, and then compared in terms of the area requirement, power consumption and performance.  相似文献   

17.
用于纯方位跟踪的简化粒子滤波算法及其硬件实现   总被引:2,自引:2,他引:0  
针对粒子滤波运算量大,硬件复杂性高的问题,该文提出了一种用于纯方位跟踪的简化粒子滤波算法,该算法引入了一种新的基于阈值的重采样方法,降低了硬件实现的复杂度。在算法研究的基础上,论文研究了基于FGPA的硬件电路实现方法,给出了系统的整体硬件结构及重采样/采样模块的实现方案,讨论了粒子滤波硬件实现的资源优化及时间优化问题。仿真结果表明,对于纯方位跟踪问题,该粒子滤波算法具有优于扩展Kalman滤波器(EKF)的性能;硬件电路实验表明,该滤波器可以实现对被动目标的纯方位跟踪,并具有比通用粒子滤波器较快的处理速度。  相似文献   

18.
A high performance digital architecture for the implementation of a nonlinear image enhancement technique is proposed in this paper. The image enhancement is based on an illuminance-reflectance model which improves the visual quality of digital images and video captured under insufficient or non-uniform lighting conditions. The algorithm shows robust performance with appropriate dynamic range compression, good contrast, accurate and consistent color rendition. The algorithm contains a large number of complex computations and thus it requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Approximation techniques are used in the hardware algorithmic design to achieve high throughput. The video enhancement system is implemented using Xilinx's multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 63 Mega-pixels (Mpixels) per second.  相似文献   

19.
Wave-pipelining enables a digital circuit to be operated at a higher frequency. In the literature, only trial-and-error and manual procedures are adopted for the choice of the optimum value of clock frequency and clock skew between the input and output registers of wave-pipelined circuits. One of the major contributions of this paper is the proposal for automating the above procedure. A second contribution is the study of how logic depths determine the superiority of wave-pipelining over pipelining with regard to power dissipation. For the study and implementation of wave-pipelined circuits, filters using the distributed arithmetic algorithm are considered. In this paper, two automation schemes are proposed for the implementation of the wave-pipelined filters on both Xilinx and Altera field programmable gate arrays (FPGAs). In the first scheme, a self-tuning finite state machine (FSM) is used to choose the clock skew and clock period for I/O registers between the wave-pipelined blocks. In the second approach, an on-chip soft-core processor is used to choose the clock skew and clock period. To test the efficacy of the schemes proposed, filters with different taps are implemented using three schemes: synchronous pipelining, sub-optimal wave-pipelining and no pipelining (i.e. using neither synchronous pipelining nor wave-pipelining). From the implementation results, it is observed that wave-pipelined distributed arithmetic (DA) filters are faster by a factor of 1.31–1.61 compared to non-pipelined DA filters. The synchronous pipelined DA filters are in turn faster by a factor of 1.73–3.27 compared to the wave-pipelined DA filters. The increased speeds are achieved in the pipelined filters at the cost of an increase in the number of slices by 15–33% and in the number of registers by 350–530%. To compare the power dissipation, both pipelined and wave-pipelined DA filters are tested by operating them at the same frequency. For medium logic depths, the wave-pipelined DA filters dissipate less power than pipelined filters. This work is carried out with the funding received from the Department of Information Technology, Ministry of information and Telecommunication, New Delhi, India.  相似文献   

20.
The stochastic gradient adaptive lattice filter is pipelined by the application of relaxed look-ahead. This form of look-ahead maintains the functional behavior of the algorithm instead of the input-output mapping and is suitable for pipelining adaptive filters. The sum and product relaxations are employed to pipeline the filter. The hardware complexity of the pipelined filters is the same as for the sequential filter and is independent of the level of pipelining or speedup. Two pipelined architectures along with their convergence analyses are presented to illustrate the tradeoff offered by relaxed look-ahead. Simulation results supporting the conclusions of the convergence analysis are provided. The proposed architectures are then employed to develop a pipelined adaptive differential pulse-code modulation (DPCM) codec for video compression applications. Speedup factors up to 20 are demonstrated via simulations with image data  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号