Similar literature
 20 similar documents found (search time: 15 ms)
1.
This paper proposes a novel Parallel-Based Lifting Algorithm (PBLA) for the Discrete Wavelet Transform (DWT) that exploits the parallelism of the arithmetic operations in all lifting steps. This reduces both the critical-path latency of the computation and the complexity of the hardware implementation. The derivation of the algorithm and the resulting Very Large Scale Integration (VLSI) architecture are presented using the 9/7 DWT as an example, without loss of generality. Compared with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical-path latency of the proposed architecture is reduced by more than half, from 4Tm + 8Ta to Tm + 4Ta, which is competitive with that of the Convolution-Based Implementation (CBI) while requiring significantly less hardware. Experimental results show that the proposed architecture performs well in both raising the working frequency and reducing area.
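For orientation, the conventional 9/7 lifting computation that this abstract takes as its baseline can be sketched as follows. This is not the PBLA itself, which the abstract does not spell out. The coefficient values are the commonly quoted CDF 9/7 lifting constants, and the periodic boundary extension, the scaling convention, and the function names are assumptions made for illustration. Each of the four steps chains one multiplication with two additions, which is consistent with the quoted 4Tm + 8Ta figure if Tm and Ta are read as multiplier and adder delays (also an assumption).

```python
import numpy as np

# Commonly quoted CDF 9/7 lifting constants (assumed, not taken from the abstract).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398  # scaling constant; the convention used here is one of several


def dwt97_forward(x):
    """One level of conventional 9/7 lifting on an even-length signal.

    Periodic extension (np.roll) is an assumption chosen to keep the sketch short.
    """
    x = np.asarray(x, dtype=float)
    s, d = x[0::2].copy(), x[1::2].copy()
    d += ALPHA * (s + np.roll(s, -1))  # predict step 1
    s += BETA * (d + np.roll(d, 1))    # update step 1
    d += GAMMA * (s + np.roll(s, -1))  # predict step 2
    s += DELTA * (d + np.roll(d, 1))   # update step 2
    return s * K, d / K                # low-pass, high-pass


def dwt97_inverse(low, high):
    """Undo dwt97_forward by running the steps backwards with flipped signs."""
    s = np.asarray(low, dtype=float) / K
    d = np.asarray(high, dtype=float) * K
    s = s - DELTA * (d + np.roll(d, 1))
    d = d - GAMMA * (s + np.roll(s, -1))
    s = s - BETA * (d + np.roll(d, 1))
    d = d - ALPHA * (s + np.roll(s, -1))
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d
    return x
```

Because every step only adds a function of the other channel, the inverse is exact regardless of the coefficient precision. That property is what makes lifting attractive for hardware.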

2.
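A parallel-array VLSI architecture design method is proposed for implementing the two-dimensional Discrete Wavelet Transform (DWT) of the JPEG2000 coding system using the lifting scheme. The architecture consists of one row processor and one column processor, which filter simultaneously through time-division multiplexing. Multiplications are replaced by optimized shift-add operations, and boundary extension is handled by an embedded data-extension algorithm. The whole architecture is pipelined, which reduces the amount of computation and improves hardware resource utilization. The architecture can be used in JPEG2000 image coding chips.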
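The idea of replacing a constant multiplication with shifts and adds can be illustrated with a minimal sketch. It uses a plain binary expansion of a fixed-point coefficient. The abstract's "optimized" shift-add form is not described and may differ, for example by using a signed-digit recoding. The function names, the 8-bit fractional precision, and the example coefficient are assumptions.

```python
def shift_add_terms(coeff, frac_bits):
    """Return (sign, shifts) with coeff ~= sign * sum(2**k for k in shifts) / 2**frac_bits."""
    q = round(abs(coeff) * (1 << frac_bits))  # fixed-point magnitude
    shifts = [k for k in range(q.bit_length()) if (q >> k) & 1]
    return (-1 if coeff < 0 else 1), shifts


def shift_add_multiply(x, coeff, frac_bits=8):
    """Multiply integer x by a constant using only shifts and additions."""
    sign, shifts = shift_add_terms(coeff, frac_bits)
    acc = sum(x << k for k in shifts)  # one shifted copy of x per set bit
    return sign * (acc >> frac_bits)   # drop the fractional bits (truncation)
```

With 8 fractional bits, a coefficient of -1.586134342 becomes 406/256, so the product needs five shifted copies of the input and four adders instead of a general multiplier.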

3.
辛勤  钟艳华  刘春风  潘利明 《现代电子技术》2010,33(18):124-126,130
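The introduction of the lifting algorithm made fast hardware implementation of the discrete wavelet transform possible, and the flipping structure further increases computation speed on top of the lifting architecture. This paper analyzes the rounding error of the flipping structure, merges lifting steps on the basis of the flipping structure, and proposes an efficient hardware implementation scheme for the DWT. Experimental results show that the proposed pipelined hardware structure makes full use of hardware resources under the critical-path constraint.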

4.
Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of the Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CBAC in H.264/AVC, AVS adopts simpler binarization methods and context selection schemes. To avoid slow multiplications, the traditional arithmetic calculation is transformed to the logarithm domain. Although these features give a better balance between compression gain and implementation cost, they still pose a major challenge for high-throughput implementation. Because decoding the current bin depends on the previous bin, latency is long and overall system performance is limited. In this paper, we present a software-hardware co-design that exploits the bin distribution feature. A novel pipeline-based architecture is proposed in which the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) controls the decoding procedure flexibly, and the context scheduling is organized carefully to minimize the number of accesses to the context RAMs. In addition, the critical path is optimized for timing. The proposed implementation works at 150 MHz and achieves real-time AVS CBAC decoding for 1080i HDTV video.

5.
Novel architectures for the 1-D and 2-D discrete wavelet transform (DWT) using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for the 1-D DWT, which is designed to receive one input and generate one output, with the low- and high-frequency components of the original data available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for the 2-D DWT is further proposed by employing parallel and pipeline techniques. It is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and in pipeline fashion with 100% hardware utilization. This 2-D architecture, called the fast architecture (FA), can perform J levels of decomposition for an N × N image in approximately 2N^2(1 - 4^(-J))/3 internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among the four subband transforms in the lifting-based 2-D DWT. It can perform J levels of decomposition for an N × N image in approximately N^2(1 - 4^(-J))/3 internal clock cycles and is therefore called the high-speed architecture. Its throughput rate is twice that of the former 2-D architecture, at only a small additional hardware cost. Compared with works reported in the previous literature, the proposed architectures for the 2-D DWT are efficient alternatives in the tradeoff among hardware cost, throughput rate, output latency, and control complexity.
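The two cycle-count expressions can be checked with a short calculation. The sketch below evaluates the closed forms exactly and compares them with a per-level sum in which level l processes an (N/2^l) × (N/2^l) image. Reading the fast architecture as consuming two samples per cycle and the high-speed architecture as consuming four is an interpretation that reproduces the formulas, not something the abstract states. The function names are hypothetical.

```python
from fractions import Fraction


def fa_cycles(n, levels):
    """Closed form quoted for the fast architecture: 2N^2(1 - 4^-J)/3."""
    return Fraction(2 * n * n, 3) * (1 - Fraction(1, 4 ** levels))


def hs_cycles(n, levels):
    """Closed form quoted for the high-speed architecture: N^2(1 - 4^-J)/3."""
    return Fraction(n * n, 3) * (1 - Fraction(1, 4 ** levels))


def per_level_sum(n, levels, samples_per_cycle):
    """Sum of (N / 2^l)^2 / samples_per_cycle over levels l = 0 .. J-1.

    Assumes n is divisible by 2**(levels - 1); samples_per_cycle is an assumed
    interpretation of each architecture's input rate.
    """
    return sum(Fraction((n >> lvl) ** 2, samples_per_cycle) for lvl in range(levels))
```

For N = 512 and J = 3 the two forms give 172032 and 86016 cycles respectively, which matches the factor-of-two throughput difference described in the abstract.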

6.
VLSI architecture design of a parallel-array wavelet filter for JPEG2000   Total citations: 2 (self-citations: 0, citations by others: 2)
兰旭光  郑南宁  梅魁志  刘跃虎 《电子学报》2004,32(11):1806-1809
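A parallel-array VLSI architecture design method is proposed for implementing the two-dimensional Discrete Wavelet Transform (DWT) of the JPEG2000 coding system using the lifting algorithm. The resulting architecture consists of two row processors, one column processor, and a small number of line buffers. The row and column processors are built internally from parallel arrays of processing elements, so that row and column filtering proceed simultaneously, and multiplications are replaced by optimized shift-add operations. The whole architecture is pipelined. At the same precision, it greatly reduces the amount of computation and raises hardware resource utilization to almost 100%, which speeds up the transform and also reduces circuit size. For an N×N image, the processing time is O(N^2/2) clock cycles. The two-dimensional discrete wavelet filter architecture has been verified on an FPGA and can be used as a stand-alone IP core in the JPEG2000 image codec chip under development.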

7.
Memory requirements and the critical path are essential concerns for the 2-D Discrete Wavelet Transform (DWT). In this paper, we address these concerns and develop a memory-efficient, high-speed architecture for the multi-level two-dimensional DWT. First, a dual data scanning technique is adopted in the 2-D 9/7 DWT processing unit to perform lifting operations, which doubles the throughput per cycle. Second, for the 2-D DWT architecture, the proposed Row Transform Unit and Column Transform Unit take advantage of input sample availability and provision computing resources accordingly to optimize processing speed, and the number of processors is further optimized to significantly reduce hardware cost. Third, to address the high memory cost of the intermediate results from each level and the growth of computation time as the resolution level increases, multiple proposed 2-D DWT units are combined into a parallel multi-level architecture, which can perform up to six levels of 2-D DWT in a resolution-level-parallel way on an arbitrary image size at competitive hardware cost. Experimental results demonstrate that the proposed scheme achieves improved hardware performance with significantly reduced on-chip memory and computation time, outperforming state-of-the-art schemes and making it attractive for memory-constrained real-time systems.

8.
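Turbo product codes (TPC), as high-rate codes, are widely used in band-limited communication systems, but most TPC decoders suffer from complex structures, high resource consumption, and long processing delays. To address this, a decoder with an interleaved parallel pipelined processing structure is proposed. Algorithm-level optimizations, including a sensible ordering of the test sequences during decoding and the use of correlation in place of the minimum Euclidean distance calculation, reduce the implementation complexity of the decoder. Field-programmable gate array (FPGA) resource consumption is reduced by 35% compared with the traditional design, and decoding speed is improved. The decoder was implemented in hardware on a Xilinx XC5VSX95T FPGA and reaches a decoding speed of 80 Mbit/s. The decoding throughput can be raised further by increasing the number of sub-decoders.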
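Why correlation can stand in for the minimum Euclidean distance is easy to see for antipodal (±1) candidate codewords: ||r - c||^2 = ||r||^2 - 2 r·c + ||c||^2, and ||c||^2 is the same for every candidate, so the smallest distance and the largest correlation select the same codeword. The sketch below illustrates this equivalence only. It is not the decoder described in the abstract, and the function names and sample values are made up for illustration.

```python
import numpy as np


def best_by_distance(r, candidates):
    """Index of the candidate with minimum squared Euclidean distance to r."""
    d = np.sum((candidates - r) ** 2, axis=1)
    return int(np.argmin(d))


def best_by_correlation(r, candidates):
    """Index of the candidate with maximum correlation with r.

    Needs only sign-controlled additions when the candidates are +/-1 vectors.
    """
    return int(np.argmax(candidates @ r))


# Hypothetical received soft values and +/-1 candidate codewords.
received = np.array([0.9, -0.2, 0.4, -1.1])
codewords = np.array([
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [-1, -1, 1, 1],
    [-1, 1, -1, 1],
])
```

The correlation form avoids the squaring operations entirely, which is a plausible source of the hardware saving the abstract reports.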

9.
The Universal Mobile Telecommunication System (UMTS) specifies security mechanisms with extra features compared with those of previous mobile communication systems (GSM, DECT). A hardware implementation of the UMTS security mechanism is presented in this paper. The proposed VLSI system supports the Authentication and Key Agreement (AKA) procedure, the data confidentiality procedure, and the integrity protection procedure. The AKA procedure is based on the RIJNDAEL block cipher. An efficient RIJNDAEL architecture is proposed to minimize the use of hardware resources. The proposed implementation performs the AKA procedure within 76 µs, compared with the 500 ms that UMTS specifies. Data confidentiality and integrity protection are based on the KASUMI block cipher. The proposed KASUMI architecture reduces hardware resources and power consumption. It uses feedback logic and a positive-negative edge-triggered pipeline to shorten the critical path without increasing the execution latency. The S-boxes used by the RIJNDAEL and KASUMI block ciphers have been implemented with combinational logic as well as with ROM blocks. Copyright © 2006 John Wiley & Sons, Ltd.

10.
In this brief, a low-complexity hardware architecture is proposed for multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) symbol detectors with two transmit and two receive antennas. The detectors support two MIMO-OFDM schemes, space-frequency block coded OFDM and space-division multiplexing OFDM, in order to achieve higher performance and throughput. However, the symbol detection processes for these two schemes have high computational complexity, which burdens the hardware implementation of MIMO-OFDM symbol detectors. To reduce complexity, the proposed symbol detector uses a shared architecture in which similar functional blocks are merged and share hardware resources, reducing the number of logic gates by 34% compared with a conventional architecture that employs two individual detectors.

11.
This article presents a parallel architecture for the 3-D discrete wavelet transform (3-D DWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-D DWT architecture has advantages such as no group-of-pictures restriction and reduced memory referencing. It offers low power consumption, low latency, and high throughput. The computing technique is based on the concept that the lifting scheme minimises the storage requirement. The application-specific integrated circuit implementation of the proposed architecture is synthesised with the 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.

12.
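FPGA implementation of the discrete wavelet transform based on the lifting algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)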
吴志林  王超  李杰  卜爱国   《电子器件》2007,30(1):290-293
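The discrete wavelet transform underlies many of today's image processing and compression techniques and is widely used. Taking the 4th-order Daubechies wavelet as an example, this paper explains the principle of the lifting-based discrete wavelet transform and presents its hardware architecture. The design is then simulated, and the simulation results are compared with those of a Matlab software implementation. The hardware and software results are essentially consistent. Compared with a traditional convolution-based implementation, the architecture reduces hardware area, and inserting pipeline registers shortens the critical path and increases computation speed.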

13.
A folded architecture and a digit-serial architecture are proposed for implementing one- and two-dimensional discrete wavelet transforms. In the one-dimensional folded architecture, the computations of all wavelet levels are folded onto the same low-pass and high-pass filters. The number of registers in the folded architecture is minimized by a generalized lifetime analysis. The converter units are synthesized with a minimum number of registers using forward-backward allocation. The advantage of the folded architecture is low latency. Its drawbacks are increased hardware area, less than 100% hardware utilization, and the complex routing and interconnection required by the converters. These drawbacks are eliminated in the alternative digit-serial architecture at the expense of increased system latency and some constraints on the wordlength. The folded architecture is suggested for latency-critical applications, and the digit-serial architecture where latency is less critical. A combined folded and digit-serial architecture is proposed for implementing two-dimensional discrete wavelet transforms.

14.
In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.

15.
In this paper, we present a multiple model particle filter (MMPF) for maneuvering target tracking that is easy to implement in hardware. In the proposed filter, the sampling importance resampling (SIR) filter typically used for nonlinear and/or non-Gaussian applications is extended to incorporate multiple models, namely a constant velocity (CV) model and a "current" statistical (CS) model, and the Independent Metropolis-Hastings (IMH) sampler is used for the resampling unit in each model. Compared with the bootstrap MMPF, the proposed MMPF requires no knowledge of the models and model transition probabilities for different maneuvering motions, and it keeps a constant number of particles per model at all times. This allows a regular pipelined structure that can be implemented in hardware easily. Furthermore, using the IMH sampler for the resampling unit avoids the bottleneck introduced by the traditional systematic resampler and reduces the latency of the whole implementation. Simulation results indicate that the proposed filter has approximately the same tracking performance as the bootstrap MMPF. The hardware architecture of the IMH sampler and its corresponding sample unit are presented, and a parallel architecture consisting of a CV model processing element (PE), a CS model PE, and a central unit (CU) is described. The proposed architecture is evaluated on a Xilinx Virtex-II Pro FPGA platform for a maneuvering target tracking application, and the results show advantages of the proposed MMPF over existing approaches in terms of efficiency, lower latency, and ease of hardware implementation.
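A generic Independent Metropolis-Hastings resampling loop can be sketched as follows. It is a minimal software illustration of the technique, not the hardware unit described in the abstract. The function name, the burn-in handling, and the use of Python's `random` module are assumptions. The point relevant to latency is that each candidate is compared only with the current chain state, so the loop does not need the sum of all weights before it can start.

```python
import random


def imh_resample(particles, weights, burn_in=0, rng=None):
    """Independent Metropolis-Hastings resampling (illustrative sketch).

    Candidate i replaces the current chain state with probability
    min(1, w_i / w_current). Weights may be unnormalized. The chain state
    after each step from index `burn_in` onward is emitted as a resampled
    particle.
    """
    rng = rng or random.Random()
    current, w_cur = particles[0], weights[0]
    out = []
    for i in range(len(particles)):
        if i > 0 and rng.random() * w_cur < weights[i]:
            current, w_cur = particles[i], weights[i]  # accept the candidate
        if i >= burn_in:
            out.append(current)
    return out
```

In this sketch the number of outputs is the number of candidates minus the burn-in length, so a filter that wants a fixed particle count would generate that many extra candidates.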

16.
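The SHA-256 secure hash algorithm is widely used in data integrity verification, digital signatures, and related fields. To meet the requirements of secure SoC systems for a high operating frequency and low hardware cost in SHA-256, a novel SHA-256 VLSI implementation method is proposed that decomposes the steps of the algorithm, thereby shortening the critical path and saving hardware resources. Synthesis in a SMIC 0.13 μm CMOS process shows a maximum operating frequency of 334.5 MHz and a 70% reduction in resource consumption.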

17.
A parallel algorithm and architecture for pruned bit-reversal interleaving (PBRI) are proposed. For a pruned interleaver of size $N$ with mother interleaver size $M=2^{n} \geq N$, the proposed algorithm interleaves any number $x \in [0,N-1]$ in at most $n-1$ steps, as opposed to $x$ steps using existing PBRI algorithms. A parallel architecture of the proposed algorithm employing simple logic gates and having a short critical path delay is presented. The proposed architecture is valuable in reducing (de-)interleaving latency in emerging wireless standards that employ PBRI channel (de-)interleaving in their PHY layer such as the 3GPP2 Ultra Mobile Broadband standard.
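For context, the sequential baseline that the abstract contrasts against can be sketched as below: walk through the mother interleaver in order, bit-reverse each index, and skip any result that falls outside the pruned range. This is a straightforward reference implementation of pruned bit-reversal interleaving, not the proposed parallel algorithm. The function names are hypothetical.

```python
def bit_reverse(i, n):
    """Reverse the n least-significant bits of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def pbri_sequential(x, N, n):
    """Interleaved address of x for pruned size N and mother size 2**n.

    Sequential baseline: the number of loop iterations grows with x.
    """
    assert 0 <= x < N <= (1 << n)
    valid_seen = -1  # index of the last valid address produced so far
    i = -1
    while valid_seen < x:
        i += 1
        y = bit_reverse(i, n)
        if y < N:  # addresses >= N are pruned
            valid_seen += 1
    return y
```

For N = 5 and n = 3, the mother sequence 0, 4, 2, 6, 1, 5, 3, 7 is pruned to 0, 4, 2, 1, 3. The loop must visit every earlier index to know how many were pruned, which is the dependence a parallel algorithm would have to break.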

18.
李萱  郭炜 《信息技术》2007,31(5):51-53,57
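A hardware architecture is proposed for a high-speed MQ coder for Embedded Block Coding with Optimized Truncation (EBCOT), suited to parallel pass coding in the JPEG2000 standard. First, procedures of the JPEG2000 standard flow such as coding-procedure selection and byte output are modified to make them better suited to hardware implementation, and a simpler method, with its circuit implementation, is proposed for determining the number of leading zeros during interval renormalization. This fully exploits hardware parallelism and increases coding speed. A four-stage pipelined MQ coder hardware architecture is then proposed, which effectively raises the MQ coding rate and fully meets the requirements of parallel pass coding.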

19.
Optimization of parallel BCH syndrome computation circuits   Total citations: 1 (self-citations: 0, citations by others: 1)
张亮  王志功  胡庆生 《信号处理》2010,26(3):458-461
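As the data rates of communication systems keep rising, the throughput required of BCH decoders rises as well. Because BCH codes process data serially, high-throughput applications generally need parallel processing, but this significantly increases circuit complexity. This paper studies the optimization of parallel syndrome computation circuits. An improved parallel syndrome structure is obtained by merging the constant multipliers at the input side. It overcomes the limitation of traditional methods, which can optimize only some of the multipliers locally, and instead optimizes all of the multipliers, effectively reducing logic resources. Experimental results show that for a BCH(2040,1952) decoder with a parallelism of 64, the optimized structure saves 67% of the logic resources, and good optimization results are still obtained when the parallelism, error-correcting capability, and code length change.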
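The relationship between serial and p-parallel syndrome computation can be shown on a toy example. The sketch below uses the BCH(15,7) code over GF(2^4) with primitive polynomial x^4 + x + 1, chosen only to keep the arithmetic small. It is unrelated to the BCH(2040,1952) decoder in the abstract and does not reproduce the paper's multiplier-merging optimization. All names are hypothetical. In the parallel form, each input bit is multiplied by a fixed constant α^(i(p-1-k)), and these are the input-side constant multipliers that an optimization of the kind described would target.

```python
PRIM_POLY = 0b10011  # x^4 + x + 1, assumed toy field GF(2^4)
ALPHA = 0b0010       # primitive element


def gf_mul(a, b):
    """Multiply two GF(2^4) elements (shift-and-xor with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM_POLY
    return r


def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def syndrome_serial(bits, i):
    """S_i = r(alpha^i) by Horner's rule, one bit per step (highest degree first)."""
    a_i = gf_pow(ALPHA, i)
    s = 0
    for b in bits:
        s = gf_mul(s, a_i) ^ b
    return s


def syndrome_parallel(bits, i, p):
    """Same S_i, consuming p bits per step."""
    assert len(bits) % p == 0
    consts = [gf_pow(ALPHA, i * (p - 1 - k)) for k in range(p)]  # input-side constants
    a_ip = gf_pow(ALPHA, i * p)                                  # feedback constant
    s = 0
    for j in range(0, len(bits), p):
        acc = 0
        for k in range(p):
            if bits[j + k]:
                acc ^= consts[k]
        s = gf_mul(s, a_ip) ^ acc
    return s


# Generator polynomial of BCH(15,7): x^8 + x^7 + x^6 + x^4 + 1, as a 15-bit codeword.
CODEWORD = [0] * 6 + [1, 1, 1, 0, 1, 0, 0, 0, 1]
```

Both forms give zero syndromes for a valid codeword and identical nonzero syndromes when a bit is flipped. The parallel form takes len(bits)/p steps instead of len(bits).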

20.
1000BASE-T Gigabit Ethernet employs eight-state 4-dimensional trellis-coded modulation to achieve robust 1-Gb/s transmission over four pairs of Category-5 copper cabling. This paper compares several postcursor equalization and trellis decoding algorithms with respect to performance, hardware complexity, and critical path. It is shown that parallel decision-feedback decoders (PDFD) offer the best tradeoff. The example of a 14-tap PDFD, however, shows that it is challenging to meet the required throughput of 1 Gb/s using current standard-cell CMOS technology. A modified approach is proposed which uses decision-feedback prefilters followed by a one-tap PDFD. This considerably reduces hardware complexity and improves the throughput while still meeting the bit-error-rate requirement. The critical path is further reduced by employing a look-ahead technique. The proposed joint equalizer and trellis decoder architecture has been implemented in a 3.3-V 0.25-μm standard-cell CMOS process. It achieves a throughput of 1 Gb/s with a 125 MHz clock. Compared with a 14-tap PDFD, the design improves both gate count and throughput by a factor of two, while suffering only a 1.3-dB performance degradation.
