Found 20 similar documents; search took 46 ms
1.
Sangjin Hong Jinseok Lee Athalye A. Djuric P.M. We-Duke Cho 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(9):1987-2000
This paper presents a reconfigurable particle filter design methodology for a real-time bearings-only tracking application. The methodology provides the capability of selecting a single particle filter from multiple particle filter realizations with maximum resource sharing. The autonomous buffer controller mechanism for the architecture ensures correct operation of the particle filters. Parameter adaptation and algorithm reconfiguration can be accomplished with negligible reconfiguration overhead through buffer controllers and a set of switches for transforming dataflow structures such that any desired particle filter can be implemented. Two target particle filters, sample importance resample filter (SIRF) and Gaussian particle filter (GPF), are realized using field programmable gate array (FPGA) based on the proposed methodology. However, the architecture can be extended for a wide range of particle filters with different sets of dynamics. This paper successfully demonstrates that implementation of a domain specific processor for particle filters is feasible with performance that is much higher than that of commercially available digital signal processors (DSPs).
2.
Application of Reconfigurable CORDIC Architectures Total citations: 1, self-citations: 0, citations by others: 1
Oskar Mencer Luc Séméria Martin Morf Jean-Marc Delosme 《The Journal of VLSI Signal Processing》2000,24(2-3):211-221
Reconfiguration enables the adaptation of Coordinate Rotation DIgital Computer (CORDIC) units to the specific needs of sets of applications, hence creating application-specific CORDIC-style implementations. Reconfiguration can be implemented at a high level, taking the entire CORDIC unit as a basic cell (CORDIC-cells) implemented in VLSI, or at a low level such as Field-Programmable Gate Arrays (FPGAs). We suggest a design methodology and analyze area/time results for coarse (VLSI) and fine-grain (FPGA) reconfigurable CORDIC units. For FPGAs we implement CORDIC units in Verilog HDL and our object-oriented design environment, PAM-Blox. For CORDIC-cells, multiple reconfigurable CORDIC modules are synthesized with state-of-the-art CAD tools. At the algorithm level we present a case study combining multiple CORDICs based on a geometrical interpretation of a normalized ladder algorithm for adaptive filtering to reduce latency and area of a fully pipelined CORDIC implementation. Ultimately, the goal is to create automatic tools to map applications directly to reconfigurable high-level arithmetic units such as CORDICs.
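For readers unfamiliar with the basic operation that a CORDIC unit performs, the following is a minimal floating-point sketch of textbook rotation-mode CORDIC. It is not the reconfigurable architecture or the PAM-Blox design described in the abstract; the function name `cordic_rotate` and the default of 32 iterations are illustrative assumptions.

```python
import math

def cordic_rotate(angle, iterations=32):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Textbook rotation-mode CORDIC: each step rotates by +/- atan(2^-i),
    which in hardware needs only shifts and adds.
    """
    # Elementary rotation angles atan(2^-i), normally stored in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant that compensates the magnitude growth of the micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
```

A hardware version would replace the floating-point multiplications by `2.0 ** -i` with fixed-point right shifts; the sketch keeps floats only to stay short.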
3.
Shao-Hua Hong Zhi-Guo Shi Ji-Ming Chen Kang-Sheng Chen 《Circuits, Systems, and Signal Processing》2010,29(1):155-167
In this paper, we propose a compact threshold-based resampling algorithm and architecture for efficient hardware implementation
of particle filters (PFs). By using a simple threshold-based scheme, this resampling algorithm can reduce the complexity of
hardware implementation and power consumption. Simulation results indicate that this algorithm performs approximately as well
as the traditional systematic resampling (SR) algorithm in terms of root-mean-square error (RMSE) and lost tracks.
Experimental comparison of the proposed hardware architecture with those based on the SR and the residual systematic resampling
(RSR) algorithms was conducted on a Xilinx Virtex-II Pro field programmable gate array (FPGA) platform in the bearings-only
tracking context, and the results establish the superiority of the proposed architecture in terms of high memory efficiency,
low power consumption, and low latency.
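The abstract above uses systematic resampling (SR) as its baseline but does not spell it out. The following is a minimal sketch of textbook SR, not the threshold-based algorithm proposed in the paper, whose details are not given here. The function name `systematic_resample` and the optional `u0` parameter (a fixed offset for reproducible runs) are illustrative assumptions.

```python
import random

def systematic_resample(weights, u0=None):
    """Return the indices of the particles kept by systematic resampling.

    A single uniform offset u0 in [0, 1) places n evenly spaced pointers
    on the cumulative weight distribution; each pointer selects one particle.
    """
    n = len(weights)
    total = sum(weights)
    if u0 is None:
        u0 = random.random()
    indices = []
    j = 0
    cumulative = weights[0] / total      # running normalized cumulative weight
    for i in range(n):
        u = (u0 + i) / n                 # i-th evenly spaced pointer
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices
```

With weights `[1, 2, 3, 4]` and `u0=0.5`, the heaviest particle is selected twice and the lightest is dropped, which is the behavior resampling is meant to produce.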
4.
José M. Granado-Criado Miguel A. Vega-Rodríguez Juan A. Gómez-Pulido 《Integration, the VLSI Journal》2010,43(1):72-80
Wireless networks are very widespread nowadays, so secure and fast cryptographic algorithms are needed. The most widely used security technology in wireless computer networks is WPA2, which employs the AES algorithm, a powerful and robust cryptographic algorithm. In order not to degrade the Quality of Service (QoS) of these networks, encryption speed is very important; for this reason we have implemented the AES algorithm in an FPGA, taking advantage of the hardware characteristics and the software-like flexibility of these devices. In this paper, we propose our own methodology for an FPGA-based AES implementation. This methodology combines the use of three hardware languages (Handel-C, VHDL and JBits) with partial and dynamic reconfiguration, and a pipelined and parallel implementation. The same design methodology could be extended to other cryptographic algorithms. Thanks to all these improvements, our pipelined and parallel implementation reaches a very high throughput (24.922 Gb/s) and the best efficiency (throughput/area ratio) of all the related works found in the literature (6.97 Mb/s per slice).
5.
Macpherson K.N. Stewart R.W. 《Vision, Image and Signal Processing, IEE Proceedings -》2006,153(6):711-720
A new algorithm that synthesises multiplier blocks with low hardware requirement suitable for implementation as part of full-parallel finite impulse response (FIR) filters is presented. Although the techniques in use are applicable to implementation on application-specific integrated circuit (ASIC) and Structured ASIC technologies, analysis is performed using field programmable gate array (FPGA) hardware. Fully pipelined, full-parallel transposed-form FIR filters with multiplier blocks were generated using the new and previous algorithms, implemented on an FPGA target and the results compared. Previous research in this field has concentrated on minimising multiplier block adder cost but the results presented here demonstrate that this optimisation goal does not minimise FPGA hardware. Minimising multiplier block logic depth and pipeline registers is shown to have the greatest influence in reducing FPGA area cost. In addition to providing lower area solutions than existing algorithms, comparisons with equivalent filters generated using the distributed arithmetic technique demonstrate further area advantages of the new algorithm.
6.
Blind source separation of independent sources from their convolutive mixtures is a problem in many real-world multi-sensor
applications. However, the existing BSS architectures are more often than not based upon software and thus not suitable for
direct implementation on hardware. Existing software implementations of the feedback network algorithm are not suitable for real-time use.
In this paper, we present a parallel algorithm and architecture for hardware implementation of blind source separation. The
algorithm is based on feedback network and is highly suited for parallel processing. The implementation is designed to operate
in real time for speech signal sequences. It is systolic and easily scalable by simply adding and connecting chips or modules.
To verify the proposed architecture, we have also designed and implemented it as a hardware prototype with Xilinx
FPGAs running at 33 MHz.
H. Jeong
7.
Bougas P. Kalivas P. Tsirikos A. Pekmestzi K.Z. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(1):108-118
The elaborate design of folded finite-impulse response (FIR) filters based on pipelined multiplier arrays is presented in this paper. The design is considered at the bit-level and the internal delays of the pipelined multiplier array are fully exploited in order to reduce hardware complexity. Both direct and transposed FIR filter forms are considered. The carry-save and the carry-propagate multiplier arrays are studied for the filter implementations. Partially folded architectures are also proposed which are implemented by cascading a number of folded FIR filters. The proposed schemes are compared in terms of hardware complexity with a straightforward implementation of a folded FIR filter based on the pipelined Wallace Tree multiplier. The comparison reveals that the proposed schemes require 20%-30% less hardware. Finally, efficient implementation of partially folded FIR filter circuits is presented when constraints in area, power consumption and clock frequency are given.
8.
9.
10.
There has been growing recent interest in configurable computing, which can be viewed as a hybrid between ASICs and programmable processors. Configurable computing machines are implemented with programmable logic: flexible hardware that can be structured to fit the natural organization and data flow of a computation. The enabling device for configurable computing is the field-programmable gate array (FPGA). For applications characterized by deeply pipelined, highly parallel, and integer arithmetic processing, configurable computing machines can outperform alternative solutions by up to an order of magnitude. The combination in a single device of dedicated hardware and rapid, submillisecond-scale reprogrammability constitutes an exciting and promising development whose implications are only just beginning to be exploited. We begin with a brief tutorial on FPGAs that describes the most common FPGA architectures and how these architectures are used to support computation, memory access, and data flow. We then present FPGAs as computing machines and focus on devices that are reconfigured during run time. Ongoing research involving FPGAs and future directions are also discussed.
11.
12.
《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(7):783-793
In this paper, techniques for efficient implementation of field-programmable gate-array (FPGA)-based wave-pipelined (WP) multipliers, accumulators, and filters are presented. A comparison of the performance of WP and pipelined systems has been made. Major contributions of this paper are development of an on-chip clock generation scheme which permits finer tuning of the frequency, a synthesis technique that reduces the area and latency by 25%, a placement utility that results in 10%–40% increase in speed and proposal of an interleaving scheme for filters that reduces the number of multipliers required by 50%. WP multipliers of size 2 × 6 and the filters using them are found to be 11% faster and require lower power than those using pipelined multipliers. Filters with higher order WP multipliers also operate with lower power at the cost of speed. The delay-register products of such filters are found to be about 60% lower than those using the pipelined multipliers. The paper also outlines applications of these techniques for the Spartan II FPGAs and a self-tuning scheme for optimizing the speed.
13.
Qifeng Gan J. M. Pierre Langlois Yvon Savaria 《Circuits, Systems, and Signal Processing》2014,33(11):3591-3602
In this paper, we propose a parallel systematic resampling (PSR) algorithm for particle filters, which is a new form of systematic resampling (SR). The PSR algorithm makes iterations independent, thus allowing the resampling algorithm to perform loop iterations in parallel. A fixed-point version of the PSR algorithm is also proposed, with a modification to ensure that a correct number of particles is generated. Experiments show that the fixed-point implementation of the PSR algorithm can use as few as 22 bits for representing the weights, when processing 512 particles, while achieving results equivalent to a floating-point SR implementation. Four customized instructions were designed to accelerate the proposed PSR algorithm in Application-Specific Instruction-set Processors. These four custom instructions, when configured to support four weight inputs in parallel, lead to a 73.7× speedup over a floating-point SR implementation on a general-purpose processor at a cost of 47.3 K additional gates.
14.
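The main difficulty in finite impulse response (FIR) filter design is that filtering requires a large number of multiplications, which leads to excessive area and power consumption even in a fully custom application-specific integrated circuit. For a dedicated hardware filter with constant coefficients, each multiplication can be realized by decomposing the coefficient into additions, subtractions, and shifts. The complexity of an FIR filter is then determined mainly by the number of adders/subtractors used for coefficient multiplication. Adaptive FIR filters, in most cases, can be implemented in software on a digital signal processor (DSP) or CPU, but where high-speed operation is required, a VLSI implementation is a good choice. With this in mind, the coefficients can be represented in canonical signed digit (CSD) code and implemented using reconfigurable field-programmable gate array (FPGA) technology. A reconfigurable structure allows the FIR filter coefficients to be updated while the rest of the system keeps running. Using CSD code and the idea of reconfiguration, this paper proposes a scheme for implementing adaptive FIR filters on an FPGA.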
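The decomposition of a constant coefficient into shifts, additions, and subtractions mentioned above can be illustrated with a small sketch of canonical signed digit (CSD) recoding. This is the standard CSD recoding for non-negative integers, not the FPGA scheme proposed in the paper; the function names `to_csd` and `csd_multiply` are illustrative assumptions.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first.

    In CSD no two adjacent digits are nonzero, which minimizes the number
    of adders/subtractors needed for a constant multiplication.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 gives +1, value % 4 == 3 gives -1, so the
            # remainder is divisible by 4 and the next digit is zero.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def csd_multiply(x, coefficient):
    """Multiply x by a constant using only shifts, additions and subtractions."""
    acc = 0
    for shift, digit in enumerate(to_csd(coefficient)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, the coefficient 7 (binary 111, two adders) is recoded as 8 - 1, which needs a single subtractor: `to_csd(7)` yields `[-1, 0, 0, 1]`.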
15.
Elbirt A.J. Yip W. Chetwynd B. Paar C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(4):545-557
The technical analysis used in determining which of the potential Advanced Encryption Standard candidates was selected as the Advanced Encryption Algorithm includes efficiency testing of both hardware and software implementations of candidate algorithms. Reprogrammable devices such as field-programmable gate arrays (FPGAs) are highly attractive options for hardware implementations of encryption algorithms, as they provide cryptographic algorithm agility, physical security, and potentially much higher performance than software solutions. This contribution investigates the significance of FPGA implementations of the Advanced Encryption Standard candidate algorithms. Multiple architectural implementation options are explored for each algorithm. A strong focus is placed on high-throughput implementations, which are required to support security for current and future high bandwidth applications. Finally, the implementations of each algorithm will be compared in an effort to determine the most suitable candidate for hardware implementation within commercially available FPGAs.
16.
This paper presents a simple implementation method of pipelined asynchronous circuits, suitable for commercial field programmable gate arrays (FPGAs). Contrary to other existing asynchronous design techniques, the presented method does not require the application of additional user actions such as constraining or building hard macros. As design examples, an architecture of an asynchronous PicoBlaze-compatible microcontroller and a 12-bit pipelined fast array multiplier are considered. The developed synchronous and asynchronous versions of the microcontroller as well as the fast array multiplier have been implemented and tested using Xilinx FPGAs, and then compared in terms of the area requirement, power consumption and performance.
17.
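A simplified particle filter algorithm for bearings-only tracking and its hardware implementation Total citations: 2, self-citations: 2, citations by others: 0
To address the heavy computational load and high hardware complexity of particle filtering, this paper proposes a simplified particle filter algorithm for bearings-only tracking. The algorithm introduces a new threshold-based resampling method that reduces the complexity of hardware implementation. Building on the algorithm study, the paper investigates an FPGA-based hardware circuit implementation, presents the overall hardware architecture of the system and the implementation of the resampling/sampling modules, and discusses resource and timing optimization for particle filter hardware. Simulation results show that, for the bearings-only tracking problem, the proposed particle filter algorithm outperforms the extended Kalman filter (EKF). Hardware experiments show that the filter can perform bearings-only tracking of passive targets and is faster than a generic particle filter.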
18.
Hau T. Ngo Ming Z. Zhang 《Integration, the VLSI Journal》2008,41(4):474-488
A high performance digital architecture for the implementation of a nonlinear image enhancement technique is proposed in this paper. The image enhancement is based on an illuminance-reflectance model which improves the visual quality of digital images and video captured under insufficient or non-uniform lighting conditions. The algorithm shows robust performance with appropriate dynamic range compression, good contrast, accurate and consistent color rendition. The algorithm contains a large number of complex computations and thus it requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Approximation techniques are used in the hardware algorithmic design to achieve high throughput. The video enhancement system is implemented using Xilinx's multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 63 Mega-pixels (Mpixels) per second.
19.
G. Seetharaman B. Venkataramani G. Lakshminarayanan 《Circuits, Systems, and Signal Processing》2008,27(3):261-276
Wave-pipelining enables a digital circuit to be operated at a higher frequency. In the literature, only trial-and-error and
manual procedures are adopted for the choice of the optimum value of clock frequency and clock skew between the input and
output registers of wave-pipelined circuits. One of the major contributions of this paper is the proposal for automating the
above procedure. A second contribution is the study of how logic depths determine the superiority of wave-pipelining over
pipelining with regard to power dissipation. For the study and implementation of wave-pipelined circuits, filters using the
distributed arithmetic algorithm are considered. In this paper, two automation schemes are proposed for the implementation
of the wave-pipelined filters on both Xilinx and Altera field programmable gate arrays (FPGAs). In the first scheme, a self-tuning
finite state machine (FSM) is used to choose the clock skew and clock period for I/O registers between the wave-pipelined
blocks. In the second approach, an on-chip soft-core processor is used to choose the clock skew and clock period. To test
the efficacy of the schemes proposed, filters with different taps are implemented using three schemes: synchronous pipelining,
sub-optimal wave-pipelining and no pipelining (i.e. using neither synchronous pipelining nor wave-pipelining). From the implementation
results, it is observed that wave-pipelined distributed arithmetic (DA) filters are faster by a factor of 1.31–1.61 compared
to non-pipelined DA filters. The synchronous pipelined DA filters are in turn faster by a factor of 1.73–3.27 compared to
the wave-pipelined DA filters. The increased speeds are achieved in the pipelined filters at the cost of an increase in the
number of slices by 15–33% and in the number of registers by 350–530%. To compare the power dissipation, both pipelined and
wave-pipelined DA filters are tested by operating them at the same frequency. For medium logic depths, the wave-pipelined
DA filters dissipate less power than pipelined filters.
This work is carried out with the funding received from the Department of Information Technology, Ministry of information
and Telecommunication, New Delhi, India.
20.
The stochastic gradient adaptive lattice filter is pipelined by the application of relaxed look-ahead. This form of look-ahead maintains the functional behavior of the algorithm instead of the input-output mapping and is suitable for pipelining adaptive filters. The sum and product relaxations are employed to pipeline the filter. The hardware complexity of the pipelined filters is the same as for the sequential filter and is independent of the level of pipelining or speedup. Two pipelined architectures along with their convergence analyses are presented to illustrate the tradeoff offered by relaxed look-ahead. Simulation results supporting the conclusions of the convergence analysis are provided. The proposed architectures are then employed to develop a pipelined adaptive differential pulse-code modulation (DPCM) codec for video compression applications. Speedup factors up to 20 are demonstrated via simulations with image data.