Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
We propose a low-cost serial decoder architecture for low-density parity-check convolutional codes (LDPC-CCs). It has been shown that LDPC-CCs can achieve comparable performance to LDPC block codes with constraint length much less than the block length. The proposed serial decoder architecture for LDPC-CCs uses a single decoding processor. Terminated data frames are sent through the processor iteratively until correctly decoded or a maximum number of iterations is reached. This architecture saves memory consumption and uses a very small number of logic elements, making it especially suitable for strong LDPC-CCs with large code memory. The proposed architecture is realized for a (2048,3,6) regular LDPC-CC on an Altera Stratix FPGA. With a maximum of 100 iterations, the design achieves up to 9-Mb/s throughput using only a very small portion of the field-programmable gate array resources.

2.
Systolic Kalman filter (SKF) designs based on a triangular array (triarray) configuration are presented. A least squares formulation, which is an expanded matrix representation of the state space iteration, is adopted to develop an efficient iterative QR triangularization and consecutive data prewhitening formulations. This formulation has advantages in both numerical accuracy and processor utilization efficiency. Moreover, it leads naturally to pipelined architectures such as systolic or wavefront arrays. For an n state and m measurement dynamic system, the SKF triarray design uses n(n+3)/2 processors and requires only 4n+m timesteps to complete one iteration of the prewhitened Kalman filtering system. This means a speedup factor of approximately n^2/4 when compared with a sequential processor. Also proposed for the colored noise case are data prewhitening triarrays which offer compatible speedup performance for the preprocessing stage. Based on a comparison of several competing alternatives, the proposed array processor may be considered a most efficient systolic or wavefront design for Kalman filtering.
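
The resource figures quoted in this abstract can be tabulated for a concrete problem size. The following is a minimal sketch that only evaluates the three expressions stated above; the function name `skf_triarray_cost` and the example sizes are hypothetical and do not come from the paper.

```python
def skf_triarray_cost(n: int, m: int) -> dict:
    """Evaluate the figures quoted in the abstract for n states and m measurements."""
    processors = n * (n + 3) // 2   # n(n+3) is always even, so this is exact
    timesteps = 4 * n + m           # timesteps for one filter iteration
    approx_speedup = n * n / 4      # approximate speedup over a sequential processor
    return {
        "processors": processors,
        "timesteps": timesteps,
        "approx_speedup": approx_speedup,
    }


# Hypothetical example: 8 states, 2 measurements.
cost = skf_triarray_cost(n=8, m=2)
print(cost)  # {'processors': 44, 'timesteps': 34, 'approx_speedup': 16.0}
```

The speedup grows quadratically with the state dimension while the processor count also grows quadratically, so the per-processor benefit stays roughly constant as n increases.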

3.
A systolic block implementation is described of two-dimensional (2D) FIR and quarter-plane digital filters. Initially, a general 2D block realization model is presented, which does not assume any restricted relation with respect to the block lengths. A high degree of concurrency is achieved by exploiting the pipelining of the array processors in conjunction with the inherent parallelism of the block realization structures. The resulting systolic implementation is characterized by a high degree of modularity, regularity, repetitiveness and local communications and permits very high sampling rates. The increase of the block lengths of the implementation is analogous to the attained throughput rate, with respect to the cost of supporting hardware. The proposed systolic implementation is suitable for real-time image processing applications.

4.
The theory and design of systolic arrays for Viterbi processing in communication systems with a time-dispersive time-varying channel are discussed. The architecture, algorithms, and processor elements for a two-dimensional systolic array are described. The array supports the branch metric computations required for an adaptive Viterbi processor. The array is designed so that computations propagate along the rows of the array, while data symbols propagate along the columns. All interprocessor data flow and connections within the array are nearest-neighbor. The array illustrates how the Viterbi-processor algorithms can be structured to achieve a high degree of computational concurrency. Variations in the array design are described and evaluated in terms of computational resource requirements and utilization and computational throughput. A high-bandwidth memory interface is proposed, and system design considerations are discussed.

5.
A general structure is presented for the block realization of two-dimensional infinite impulse response digital filters, which is based on the two-dimensional matrix convolution equations and the decomposition of their associated transfer function matrices. The proposed decomposition may be considered as an extension of the scalar decomposition technique, which has already been used for the realization of two-dimensional digital filters associated with two-variable polynomials. The decomposition structure is considered in two different forms, which correspond to the direct forms I and II. It is shown that if a given two-dimensional single-input, single-output filter is realizable, then realizable block decomposition structures may always be selected. The proposed approach is general and applies without any restriction to the block implementation of any two-dimensional filter. The resulting structures are characterized by high inherent parallelism, modularity, regularity, reconfigurability, local interconnections, and very high sampling and throughput rates. Thus they are well suited for VLSI implementation and implementation via multiprocessor systems and array processors, such as systolic and wavefront arrays.

6.
The problem of mapping algorithms onto regular arrays has received great attention in the past. Results are available on the mapping of regular algorithms onto systolic or wavefront arrays. On the other hand, many algorithms that can be implemented on parallel architectures are not completely regular but are composed of a set of regular subalgorithms. Recently, a class of configurable processor arrays has been proposed that allows the efficient implementation of piecewise regular algorithms. In contrast to pure systolic or wavefront arrays, they are distinguished by a dynamic configuration structure. The known trajectories, however, cannot be applied to the design of configurable processor arrays because the functions of the processing elements and the interconnection structure are time- and space-dependent. In this paper, a systematic procedure is introduced that allows the efficient design of configurable processor arrays including the specification of the processing elements and the generation of control signals. Control signals are propagated through the processor array. The proposed design trajectory can be used for the design of regular arrays or configurable arrays.

7.
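Liu Zhaoqing, Li Li, Dong Bing, Jin Weiqi. 《红外技术》 (Infrared Technology), 2021, 43(8): 717-722
The Shack-Hartmann wavefront sensor is currently the most widely used real-time wavefront detector in adaptive optics systems. For Shack-Hartmann sensors with high resolution, a high frame rate, and a large number of subapertures, this paper proposes a real-time wavefront processor architecture and a wavefront slope computation method based on a field-programmable gate array (FPGA), designed to meet the computational load and real-time requirements of wavefront processing. The method computes the spot centroid within each subaperture by reusing a core processing module, and transfers the processed centroid data to a PC in real time over USB 3.0. The processor was designed around a single Xilinx Kintex-7 XC7K325T FPGA. The results show that, for a Hartmann sensor with 56×56 subapertures, the algorithm performs low-latency real-time spot centroid computation on 1020×1020 images at 560 frames/s (a data rate of 580 MB/s), which raises both the wavefront processing speed and the control speed of the whole adaptive optics system.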
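
The per-subaperture centroid step mentioned in this abstract can be illustrated in software. The sketch below assumes a plain intensity-weighted (center-of-gravity) centroid on a square grid of equally sized subapertures; the abstract does not state which centroid formula or subaperture layout the FPGA design uses, and the function name `subaperture_centroids` and its parameters are hypothetical.

```python
def subaperture_centroids(image, sub_size):
    """Intensity-weighted centroid of each sub_size x sub_size tile.

    Returns {(tile_row, tile_col): (cx, cy)} with coordinates measured in
    pixels from the tile's top-left corner, or None for an all-dark tile.
    This is an assumed center-of-gravity formula, not the paper's method.
    """
    rows, cols = len(image), len(image[0])
    centroids = {}
    for r0 in range(0, rows - sub_size + 1, sub_size):
        for c0 in range(0, cols - sub_size + 1, sub_size):
            total = sum_x = sum_y = 0
            for dy in range(sub_size):
                for dx in range(sub_size):
                    v = image[r0 + dy][c0 + dx]
                    total += v        # accumulated intensity
                    sum_x += v * dx   # first moment along columns
                    sum_y += v * dy   # first moment along rows
            key = (r0 // sub_size, c0 // sub_size)
            centroids[key] = (sum_x / total, sum_y / total) if total > 0 else None
    return centroids


# Hypothetical 4x4 frame split into four 2x2 subapertures.
frame = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 0, 0],
]
result = subaperture_centroids(frame, sub_size=2)
print(result[(0, 0)])  # (1.0, 1.0)
print(result[(1, 1)])  # (0.5, 0.0)
```

Each tile needs only three running sums and one division pair, which is why a single accumulation module can be reused across subapertures as pixels stream in.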

8.
In this paper, we present a 64/128/256/512-point inverse fast Fourier transform (IFFT)/FFT processor for a single-user and multi-user multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11ac wireless local area network transceiver. The multi-mode processor is developed by an eight-parallel mixed-radix architecture to efficiently produce full reconfigurability for all multi-user combinations. The proposed design not only supports the operation of IFFT/FFT for 1–8 different data streams operated by different users in case of downlink transmission, but also provides different throughput rates to meet IEEE 802.11ac requirements at the minimum possible clock frequency. Moreover, less power is needed in our design compared with the traditional software approach. The design is carefully optimized to operate with the minimum wordlengths that fulfill the performance and complexity specifications. The processor is designed and implemented on Xilinx Virtex-5 field programmable gate array technology. Although the maximum clock frequency is 377.84 MHz, the processor is clocked at the operating sampling rate to further reduce the power consumption. At the operation clock rate of 160 MHz, our proposed processor can calculate 512-point FFT with up to eight independent data sequences within 3.2 μs, meeting IEEE 802.11ac standard requirements. Copyright © 2015 John Wiley & Sons, Ltd.
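
The 3.2 μs figure is consistent with a simple throughput estimate. The sketch below is a consistency check only: it assumes the eight-parallel datapath consumes eight samples per clock cycle and ignores pipeline fill latency, neither of which is stated in the abstract. The function name `fft_batch_time_us` is hypothetical.

```python
def fft_batch_time_us(points, streams, parallel_lanes, clock_mhz):
    """Estimated time in microseconds to stream a batch of FFTs.

    Assumes parallel_lanes samples are consumed per clock cycle and that
    pipeline latency is negligible (both are assumptions).
    """
    cycles = points * streams / parallel_lanes
    return cycles / clock_mhz  # cycles divided by MHz gives microseconds


# Eight 512-point sequences, eight lanes, 160 MHz clock.
print(fft_batch_time_us(512, 8, 8, 160.0))  # 3.2
```

Under these assumptions the batch takes 512 cycles, which at 160 MHz matches the quoted 3.2 μs.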

9.
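For adaptive optics systems, a fast wavefront reconstruction method based on a linear bidirectional pipelined systolic array is proposed. The method combines the direct-slope wavefront reconstruction algorithm with the operating characteristics of a linear bidirectional systolic array. By applying a PCT data transformation to the reconstruction matrix and introducing resource sharing, it raises the utilization of the array cells, reduces resource usage, and preserves real-time computation, while offering a simple and regular array structure, strong modularity, and good scalability. Finally, wavefront reconstruction for a 61-element, 48-subaperture adaptive optics system is implemented on an FPGA, which verifies the feasibility of the method.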

10.
An efficient low-latency array architecture is presented for telescopic-search-based block-matching motion estimation. It consists of a novel ring-like systolic array and a comparing tree. The new architecture achieves higher processor utilisation, delivers a higher throughput rate, and requires minimal memory bandwidth.

11.
This paper demonstrates an optimal time, fully systolic algorithm for edge detection on a mesh connected processor array. It uses only inexpensive addition and comparison operations, which makes it ideal for fine-grained parallelism in VLSI. Given an N x N image in the form of a two-dimensional array of pixels, our algorithm computes the Sobel and Laplacian operators for skimming lines in the image and then generates the Hough array using thresholding. The Hough transforms for M different angles of projection are obtained in a fully systolic manner in O(M+N) time, which is asymptotically optimal. In comparison, a previously published multiplication free algorithm has a time complexity of O(NM). An implementation of our algorithm on a mesh connected fine-grained processor array is discussed, which computes at the rate of approximately 170,000 Hough transforms per second using a 50 MHz clock. This research was partially supported by National Science Foundation under Grant No. MIP 8902636.

12.
The space-time mapping of the dependency matrix of an algorithm may be used to study proposed systolic array implementations. In this paper we consider nested loop structures and use the space-time mapping approach to examine six objective functions: processor pipelining rate, computation time, throughput, number of processing elements, geometric area, and space utilization. An elementary expression for each of these objective functions is derived which depends only on the space-time transformation and the size of loops. Moreover, several necessary and sufficient conditions for optimizing an individual objective function are provided.

13.
In many scientific and signal processing applications, there are increasing demands for large-volume and/or high-speed computations which call for not only high-speed computing hardware, but also for novel approaches in computer architecture and software techniques in future supercomputers. Tremendous progress has been made on several promising parallel architectures for scientific computations, including a variety of digital filters, fast Fourier transform (FFT) processors, data-flow processors, systolic arrays, and wavefront arrays. This paper describes these computing networks in terms of signal-flow graphs (SFG) or data-flow graphs (DFG), and proposes a methodology of converting SFG computing networks into synchronous systolic arrays or data-driven wavefront arrays. Both one- and two-dimensional arrays are discussed theoretically, as well as with illustrative examples. A wavefront-oriented programming language, which describes the (parallel) data flow in systolic/wavefront-type arrays, is presented. The structural property of parallel recursive algorithms points to the feasibility of a Hierarchical Iterative Flow-Graph Design (HIFD) of VLSI Array Processors. The proposed array processor architectures, we believe, will have significant impact on the development of future supercomputers.

14.
The authors propose a systolic block Householder transformation (SBHT) approach to implement the HT on a systolic array and also propose its application to the RLS (recursive least squares) algorithm. Since the data are fetched in a block manner, vector operations are in general required for the vectorized array. However, a modified HT algorithm permits a two-level pipelined implementation of the SBHT systolic array at both the vector and word levels. The throughput rate can be as fast as that of the Givens rotation method. The present approach makes the HT amenable to VLSI implementation as well as applicable to real-time high-throughput applications of modern signal processing. The constrained RLS problem using the SBHT RLS systolic array is also considered.

15.
In this paper we solve the Stein equation X + AXB = C, with A and B upper triangular matrices, by means of a bidimensional systolic array processor, independent of problem size. The problem is decomposed into two basic subproblems: the solution of an upper triangular system and a GAXPY operation. Then we obtain a size-dependent systolic algorithm by means of an appropriate chaining of the solutions of these subproblems. This systolic algorithm is transformed into a size-independent systolic array processor by using the Dense-to-Banded Transformation.
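
Because A and B are upper triangular, each entry of X depends only on entries below it or to its left, so the equation can be solved entry by entry. The sketch below is a plain sequential substitution written for illustration; it is not the systolic algorithm of the paper, nor its decomposition into a triangular solve plus GAXPY. The function name `solve_stein_upper` is hypothetical, and the code assumes 1 + A[i][i] * B[j][j] is nonzero for every i, j.

```python
def solve_stein_upper(A, B, C):
    """Solve X + A X B = C for X, with A (n x n) and B (m x m) upper triangular.

    Entry (i, j) of A X B only involves X[k][l] with k >= i and l <= j, so
    rows are processed bottom-up and columns left-to-right.
    Assumes 1 + A[i][i] * B[j][j] != 0 for all i, j.
    """
    n, m = len(A), len(B)
    X = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m):
            acc = C[i][j]
            for k in range(i, n):
                for l in range(j + 1):
                    if k == i and l == j:
                        continue  # the unknown itself, handled by the divisor
                    acc -= A[i][k] * X[k][l] * B[l][j]
            X[i][j] = acc / (1.0 + A[i][i] * B[j][j])
    return X


# Hypothetical 2x2 example built from a known solution.
A = [[2.0, 1.0], [0.0, 3.0]]
B = [[1.0, 4.0], [0.0, 2.0]]
C = [[6.0, 38.0], [12.0, 64.0]]   # equals X + A X B for X = [[1, 2], [3, 4]]
X = solve_stein_upper(A, B, C)
print(X)  # [[1.0, 2.0], [3.0, 4.0]]
```

The dependency order (bottom row first, left column first) is what makes a pipelined array mapping plausible, since each new entry only needs results that were produced earlier.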

16.
The authors discuss several advances in the evolution of radar CFAR (constant false alarm rate) detectors, from the classical mean-level detector to more recent designs using order statistics, or sorted data values. These algorithms can be implemented by modifying the existing running window order statistic filtering techniques used in signal/image processing. Although the signal processing theory of CFAR detection is well advanced, practical applications lag because of the high throughput required in radar. This intensive computational requirement is unlikely to be met by further advances in VLSI technology alone; it must result from parallel processing techniques. Systolic array architectures are proposed for several important CFAR detectors. Techniques for improving the processor utilization efficiency of the proposed array architectures are also discussed.
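
An order-statistic CFAR detector of the kind mentioned above can be sketched with a running window. This is a generic textbook-style illustration, not one of the specific detectors or systolic architectures from the paper; the function name `os_cfar`, the guard-cell handling, and all parameter values are assumptions.

```python
def os_cfar(samples, half_window, guard, k, alpha):
    """Order-statistic CFAR over a 1-D sequence of power samples.

    For each cell under test, reference cells are taken from half_window
    cells on each side, skipping guard cells next to the test cell. The
    threshold is alpha times the k-th smallest reference value (k is
    1-based). Cells with fewer than k reference values are skipped.
    """
    detections = []
    n = len(samples)
    for i in range(n):
        reference = []
        for off in range(guard + 1, guard + half_window + 1):
            if i - off >= 0:
                reference.append(samples[i - off])
            if i + off < n:
                reference.append(samples[i + off])
        if len(reference) < k:
            continue  # not enough context near the edges
        reference.sort()
        threshold = alpha * reference[k - 1]
        if samples[i] > threshold:
            detections.append(i)
    return detections


# Hypothetical data: flat noise floor with one strong return at index 5.
samples = [1, 1, 1, 1, 1, 10, 1, 1, 1, 1, 1]
print(os_cfar(samples, half_window=2, guard=1, k=3, alpha=3.0))  # [5]
```

The sort inside the loop is the costly step; running-window order-statistic filters avoid repeating it by updating a sorted window incrementally, which is the kind of structure that maps onto an array of simple compare-and-swap cells.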

17.
In this paper, a fully pipelined linear systolic array based on an adjusted Montgomery's algorithm is presented to perform modular multiplication at extremely high speed. The processing element (PE) consists of only 4 full-adders and 14 flip-flops. A three-stage internally pipelined PE results in a very short critical path with only a one-bit full-adder delay. Thus, it can run at a very high cycle rate. The total execution time for an n-bit modular multiplication is 2n + 11 cycles with only (n/2 + 2) PEs. A modular exponentiation based on it takes (3n + 16.5)n cycles on average. Compared with most published VLSI modular multipliers, the hardware complexity is greatly reduced while keeping very high throughput. Therefore it is a good candidate for the arithmetic units used in many public-key cryptosystems, e.g. RSA, Elliptic Curve and so on, especially for embedded applications concerning information security.
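
The two cycle counts in this abstract are related by simple arithmetic. The sketch below assumes that an n-bit exponentiation uses 1.5n modular multiplications on average (n squarings plus about n/2 multiplies, as in binary square-and-multiply with a random exponent); the abstract does not state this, but the assumption reproduces its (3n + 16.5)n figure. All function names are hypothetical, and `pe_count` assumes n is even.

```python
def modmul_cycles(n: int) -> int:
    """Cycles for one n-bit modular multiplication, as quoted in the abstract."""
    return 2 * n + 11


def modexp_cycles_avg(n: int, mults_per_bit: float = 1.5) -> float:
    """Average cycles for an n-bit modular exponentiation.

    mults_per_bit = 1.5 is an assumption (square-and-multiply with half of
    the exponent bits set on average), not a value taken from the paper.
    """
    return mults_per_bit * n * modmul_cycles(n)


def pe_count(n: int) -> int:
    """Number of processing elements, n/2 + 2, assuming n is even."""
    return n // 2 + 2


n = 1024
print(modmul_cycles(n))      # 2059
print(modexp_cycles_avg(n))  # 3162624.0
print(pe_count(n))           # 514
```

Expanding 1.5n(2n + 11) gives 3n^2 + 16.5n, which is the abstract's (3n + 16.5)n.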

18.
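A wavefront processor performs the wavefront processing computations in an adaptive optics system, and its latency directly affects the control bandwidth of the system. Based on the characteristics of the output signal of a Hartmann wavefront sensor with a frame rate of 835 Hz, the processor uses five TMS320C50 chips and one TMS320C31 chip with pipelined and parallel processing techniques. Its peak computing speed reaches 300 million operations per second, with a computation latency of only 0.7 ms.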

19.
Kwan, H.K. 《Electronics Letters》, 1987, 23(9): 442-443
A new systolic array for the realisation of a second-order recursive digital filter is presented. The systolic array is derived from a transformed digital transfer function. In so doing, a high throughput and regular systolic array consisting of only one type of basic cell can be derived.

20.
In this paper we show how systolic/wavefront arrays can be automatically designed and partitioned to solve problems of arbitrary size. Buffer memory and control of a resulting array is regular and simple, and is generated automatically. Also, the throughput of the array is matched with the I/O speed of the host to which it is to be attached. The approach strongly relies upon classical concepts in signal processing, such as signal flow graphs and state transition functional behavior. Some illustrative examples are included. This research was supported in part by the Dutch National Applied Science Foundation under Grant STW DEL 44.0643 and by the Commission of the EEC under the ESPRIT 991 program.
