1.

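Design of a Highly Efficient, Pipelined, Low-Memory JPEG2000 Encoder Chip

Mei Kuizhi Zheng Nanning Liu Yuehu Yao Ji Huang Yu Wang Yong 《Journal of Electronics & Information Technology》2006,28(4):741-746

This paper proposes a design for a highly efficient, pipelined, low-memory JPEG2000 encoder chip. To reduce on-chip memory, the design uses a double-buffered storage structure for wavelet coefficients, a pre-rate-control method, and a byte representation of the RD slope values in Tier-2. To raise the speed of the coding units, the discrete wavelet transform, arithmetic coding, and bit-plane coding use highly parallel, pipelined structures. Searching for RD slope values in a byte address space speeds up Tier-2 packetization. Clock distribution, color conversion, and frame memory control are also optimized at the system level. The complete encoder chip based on this design has been verified on an FPGA. Main performance parameters: 5/3 wavelet; maximum tile size 256×256; maximum image size 4096×4096; code block size 32×32; system sampling rate up to 45 Msamples/s with a 100 MHz Tier-1 clock; compressed image quality within 0.5 dB of JasPer at 20:1 compression in all cases; 109,000 equivalent gates when synthesized with the SMIC 0.25 library; and 862 kb of on-chip RAM.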

2.

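VLSI Architecture Design of a Wavelet Transformer for JPEG2000 (total citations: 3; self-citations: 1; citations by others: 2)

Liu Leibo Wang Xuejin Meng Hongying Wang Zhihua Chen Hongyi Xia Yuwen 《Acta Electronica Sinica》2002,30(11):1609-1612

The new-generation still image compression standard JPEG2000 adopts the discrete wavelet transform (DWT) as its core transform and recommends the lifting scheme as a fast algorithm for implementing it. The spatial combinative lifting algorithm (SCLA) greatly reduces the computation required by lifting: with the 9/7 wavelet filter, SCLA needs only 7/12 of the multiplications that lifting does. This paper proposes a VLSI architecture that implements SCLA, which reduces the computation relative to a lifting-based implementation, speeds up the transform, and shrinks the circuit. The 2-D forward and inverse wavelet transformer described here is already used as a standalone IP core in the JPEG2000 image codec chip we are currently developing.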

3.

Two-Symbol FPGA Architecture for Fast Arithmetic Encoding in JPEG 2000

Nandini Ramesh Kumar Wei Xiang Yafeng Wang 《Journal of Signal Processing Systems》2012,69(2):213-224

JPEG 2000 is one of the most popular image compression standards, offering significant performance advantages over previous image standards. The high computational complexity of the JPEG 2000 algorithms makes it necessary to employ methods that overcome the bottlenecks of the system, and hence an efficient solution is imperative. One such crucial algorithm in JPEG 2000 is arithmetic coding, which is based entirely on bit-level operations. In this paper, an efficient hardware implementation of arithmetic coding is proposed which uses efficient pipelining and parallel processing for intermediate blocks. The idea is to provide a two-symbol coding engine that is efficient in terms of performance, memory and hardware. This architecture is implemented in the Verilog hardware description language and synthesized using an Altera field programmable gate array. The only memory unit used in this design is a FIFO (first in, first out) of 256 bits to store the CX-D pairs at the input, which is negligible compared to the existing arithmetic coding hardware designs. The simulation and synthesis results show that the operating frequency of the proposed architecture is greater than 100 MHz and that it achieves a throughput of 212 Msymbols/sec, which is double the throughput of a conventional one-symbol implementation and at least 50% higher than that of existing two-symbol architectures.

4.

JAGUAR: a fully pipelined VLSI architecture for JPEG image compression standard

Kovac M. Ranganathan N. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(2):247-258

In this paper, we describe a fully pipelined single chip VLSI architecture for implementing the JPEG baseline image compression standard. The architecture exploits the principles of pipelining and parallelism to the maximum extent in order to obtain high speed and throughput. The architecture for discrete cosine transform and the entropy encoder are based on efficient algorithms designed for high speed VLSI implementation. The entire architecture can be implemented on a single VLSI chip to yield a clock rate of about 100 MHz which would allow an input rate of 30 frames per second for 1024×1024 color images.
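As a rough sanity check on the figures in this abstract, the sample rate needed for 30 frames per second can be compared with the quoted clock. The sketch below assumes three unsubsampled color components per pixel and one sample consumed per clock cycle; neither assumption is stated in the abstract, and the function name is illustrative only.

```python
def required_sample_rate(width, height, components, fps):
    """Samples per second needed to sustain the given frame rate."""
    return width * height * components * fps

# Assumption: 3 color components per pixel, no chroma subsampling.
rate = required_sample_rate(1024, 1024, 3, 30)
clock_hz = 100e6  # "about 100 MHz" from the abstract

print(f"required: {rate / 1e6:.2f} Msamples/s")
print(f"fits at one sample per cycle: {rate <= clock_hz}")
```

Under these assumptions the required rate sits just below the clock rate, which is consistent with the abstract's claim.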

5.

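Image Compression Based on the Integer Wavelet Transform and Improved Embedded Zerotree Coding (total citations: 3; self-citations: 0; citations by others: 3)

Ding Xuxing Zhu Rihong Li Jianxin 《Journal of Electronics & Information Technology》2004,26(7):1064-1069

The integer wavelet transform (IWT) has integer inputs and outputs, requires only in-place computation, and is easy to implement in hardware, which makes it especially suitable for lossless image compression. For lossy compression, however, it usually performs slightly worse than the conventional discrete wavelet transform (DWT). To improve the lossy compression performance of the IWT, this paper uses a lifting-based IWT combined with an improved embedded zerotree wavelet (EZW) coding method based on morphological dilation. Experimental results show that, without increasing computational complexity, the algorithm achieves a higher peak signal-to-noise ratio (PSNR) than the conventional DWT and gives good lossy compression results.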
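The integer-in, integer-out property of a lifting-based IWT can be illustrated with a minimal sketch. The abstract does not say which filter the authors use; the example below assumes the reversible 5/3 filter with mirrored boundaries, and the function names `fwd_53` and `inv_53` are hypothetical.

```python
def fwd_53(x):
    """One level of the reversible 5/3 integer lifting transform (even length)."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict: high-pass = odd - floor((left even + right even) / 2),
    # with the right neighbour mirrored at the boundary.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update: low-pass = even + floor((left d + right d + 2) / 4),
    # with the left neighbour mirrored at the boundary.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d

def inv_53(s, d):
    """Undo fwd_53 by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x

signal = [12, 15, 9, 200, 201, 0, 7, 7]
low, high = fwd_53(signal)
assert inv_53(low, high) == signal  # lossless round trip
```

Because each lifting step only adds or subtracts a rounded integer that the inverse can recompute exactly, the round trip is lossless regardless of the rounding used.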

6.

A high throughput pass parallel block decoder architecture for JPEG 2000 that prevents stalling in the decoding process

《Integration, the VLSI Journal》2020

The Block Decoder (BD), which is an indispensable component of the JPEG 2000 image compression standard, has the highest computational complexity and determines the speed of the overall decoder system. This paper proposes a high throughput pass parallel BD architecture, which can decode more than one bit per clock cycle. In the BD, the dependency between context generation and the arithmetic decoding unit introduces stalling and reduces the throughput of the decoding process. The proposed selective byte input and synchronous sample skipping techniques are used to prevent stalling in the decoding process. The proposed architecture achieves 86% more throughput with a 50% increase in hardware cost compared with the best available serial BD architecture. In comparison with the best available pass parallel architecture, throughput improves almost 8.2 times with a 61% increase in hardware cost. Incorporation of the speed-up techniques in the design is the main reason for the higher hardware consumption. The Figure of Merit of the proposed design, which is the ratio of throughput to hardware cost, is higher than that of the available BD architectures for a typical code block (CB) size of 32 × 32. The ASIC implementation of the proposed design consumes 66 mW of power at the maximum operating frequency.

7.

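A Lifting-Based VLSI Architecture for the 2-D Discrete Wavelet Transform

Wang Chao Cao Peng Li Jie Huang Weida 《Modern Electronics Technique》2007,30(14):114-118

The discrete wavelet transform (DWT) requires considerable computation and a large amount of memory. To make it suitable for real-time image processing applications, dedicated architectures and chips are needed to improve its performance. This paper proposes a new VLSI architecture for the lifting-based 2-D DWT, the LLSP architecture. It combines features of level-by-level and line-based architectures, reduces hardware cost and memory requirements, and can be extended to multiple lifting steps and to the multi-level 2-D DWT.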

8.

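An Efficient, Low-Memory VLSI Architecture Combining DWT and EBCOT in JPEG2000

Guo Jie Li Yunsong Wu Chengke Liu Kai Wang Keyan 《Journal of Electronics & Information Technology》2009,31(3):731-735

To address the large amount of memory required between the wavelet transform and the coder in JPEG2000 hardware implementations, this paper proposes a code-block-based storage scheme. By reusing code-block-sized on-chip memory as much as possible and scheduling it with simple, efficient control, the scheme reduces hardware cost in both area and power. The implementation uses a line-based lifting transform structure and bit-plane-parallel coding, which improves efficiency and ensures real-time processing throughout. Experimental results show that, under real-time coding requirements, for a four-level 9/7 or 5/3 wavelet decomposition of 512×512 image tiles with 32×32 code blocks, the proposed structure needs more than 80% less memory than directly using external memory. The whole structure has been verified on an FPGA, and the system clock can run at 100 MHz.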

9.

VLSI design of memory-efficient, high-speed baseline MQ coder for JPEG 2000

Kishor Sarawadekar Swapna Banerjee 《Integration, the VLSI Journal》2012,45(1):1-8

The embedded block coding with optimized truncation (EBCOT) algorithm is the heart of the JPEG 2000 image compression system. The MQ coder used in this algorithm restricts the throughput of EBCOT because there is very high correlation among all the procedures to be performed in it. To overcome this obstacle, a high throughput MQ coder architecture is presented in this paper. To accomplish this, we have studied the number of rotations performed and the rate of byte emission in an image. This study reveals that in an image, on average, one shift occurs 75.03% of the time and two shifts occur 22.72% of the time. Similarly, two bytes are emitted concurrently about 5.5% of the time. Based on these facts, a new MQ coder architecture is proposed which is capable of consuming one symbol per clock cycle. The throughput of this coder is improved by operating the renormalization and byte out stages concurrently. To reduce the hardware cost, synchronous shifters are used instead of hard shifters. The proposed architecture is implemented on a Stratix FPGA and is capable of operating at 145.9 MHz. The memory requirement of the proposed architecture is reduced by a minimum of 66% compared to those of the other existing architectures. A relative figure of merit is computed to compare the overall efficiency of all architectures, which shows that the proposed architecture provides a good balance between throughput and hardware cost.

10.

Efficient architectures for two-dimensional discrete wavelet transform using lifting scheme.

Chengyi Xiong Jinwen Tian Jian Liu 《IEEE transactions on image processing》2007,16(3):607-614

Novel architectures for the 1-D and 2-D discrete wavelet transform (DWT) using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for the 1-D DWT, which is designed to receive an input and generate an output with the low- and high-frequency components of the original data being available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for the 2-D DWT is further proposed by employing parallel and pipeline techniques. It is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and pipeline fashion with 100% hardware utilization. This 2-D architecture is called the fast architecture (FA) and can perform J levels of decomposition for an N × N image in approximately 2N²(1 − 4^(−J))/3 internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among the four subband transforms in the lifting-based 2-D DWT, which can perform J levels of decomposition for an N × N image in approximately N²(1 − 4^(−J))/3 internal clock cycles; hence, it is called the high-speed architecture. The throughput rate of the latter is twice that of the former 2-D architecture, at only a small additional hardware cost. Compared with the works reported in previous literature, the proposed architectures for the 2-D DWT are efficient alternatives in the tradeoff among hardware cost, throughput rate, output latency, and control complexity.
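The two cycle-count expressions quoted in this abstract, 2N²(1 − 4^(−J))/3 for the fast architecture and N²(1 − 4^(−J))/3 for the high-speed architecture, can be evaluated and related to the total number of samples entering all J decomposition levels. The sketch below is only an arithmetic illustration; the function names and the choice N = 512, J = 3 are assumptions, not values from the paper.

```python
import math

def cycles_fast(n, levels):
    """Approximate cycles for the fast architecture: 2N^2(1 - 4^-J)/3."""
    return 2 * n * n * (1 - 4 ** -levels) / 3

def cycles_high_speed(n, levels):
    """Approximate cycles for the high-speed architecture: N^2(1 - 4^-J)/3."""
    return n * n * (1 - 4 ** -levels) / 3

def samples_all_levels(n, levels):
    """Samples entering each level: N^2 + (N/2)^2 + ... for J levels."""
    return sum((n >> j) ** 2 for j in range(levels))

n, levels = 512, 3  # illustrative values only
total = samples_all_levels(n, levels)

# The closed forms equal the geometric sum divided by 2 and by 4.
assert math.isclose(cycles_fast(n, levels), total / 2)
assert math.isclose(cycles_high_speed(n, levels), total / 4)
print(cycles_fast(n, levels), cycles_high_speed(n, levels))
```

Numerically, the two expressions correspond to consuming two and four samples per internal clock cycle across all levels, which matches the stated factor of two in throughput between the architectures.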

11.

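VLSI Architecture Design of a Parallel-Array Wavelet Filter for JPEG2000 (total citations: 2; self-citations: 0; citations by others: 2)

Lan Xuguang Zheng Nanning Mei Kuizhi Liu Yuehu 《Acta Electronica Sinica》2004,32(11):1806-1809

This paper proposes a parallel-array VLSI architecture design method that uses the lifting algorithm to implement the 2-D discrete wavelet transform (DWT) in a JPEG2000 coding system. The resulting structure consists of two row processors, one column processor, and a small number of line buffers. The row and column processors are built from parallel arrays of processing elements, so row and column filtering proceed simultaneously, and optimized shift-and-add operations replace multiplications. The whole structure is pipelined. At the same precision, it greatly reduces computation and raises hardware utilization to nearly 100%, which speeds up the transform and reduces circuit size. For an N×N image, processing takes O(N²/2) clock cycles. The 2-D discrete wavelet filter structure has been verified on an FPGA and can be used as a standalone IP core in the JPEG2000 image codec chip under development.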
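Replacing a constant multiplication with shifts and adds can be sketched as follows. The abstract gives neither the filter coefficients nor the word lengths, so the fixed-point precision `frac_bits=8` and the function name are hypothetical, and the sketch uses a plain binary decomposition of the constant rather than whatever optimization the authors apply.

```python
def shift_add_mul(x, coeff, frac_bits=8):
    """Multiply integer x by a real coefficient using only shifts and adds."""
    # Fixed-point constant; in hardware this is chosen at design time.
    q = round(abs(coeff) * (1 << frac_bits))
    acc = 0
    bit = 0
    while q:
        if q & 1:
            acc += x << bit  # one adder input per set bit of the constant
        q >>= 1
        bit += 1
    acc >>= frac_bits        # drop the fractional bits (floor)
    return -acc if coeff < 0 else acc

print(shift_add_mul(100, 0.5))  # half of 100, computed without a multiplier
print(shift_add_mul(64, 1.5))
```

The number of adders grows with the number of set bits in the quantized constant, which is why such designs usually pick coefficient encodings with few nonzero digits.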

12.

A Programmable Parallel VLSI Architecture for 2-D Discrete Wavelet Transform

Chien-Yu Chen Zhong-Lan Yang Tu-Chih Wang Liang-Gee Chen 《The Journal of VLSI Signal Processing》2001,28(3):151-163

Many VLSI architectures for computing the discrete wavelet transform (DWT) were presented, but the parallel input data sequence and the programmability of the 2-D DWT were rarely mentioned. In this paper, we present a parallel-processing VLSI architecture to compute the programmable 2-D DWT, including various wavelet filter lengths and various wavelet transform levels. The proposed architecture is very regular and easy for extension. To eliminate high frequency components, the pixel values outside the boundary of the image are mirror-extended as the symmetric wavelet transform (SWT) and the mirror-extension is realized via the routing network. Owing to the property of the parallel processing, we adopt the row-based recursive pyramid algorithm (RPA), similar to 1-D RPA, as the data scheduling. This design has been implemented and fabricated in a 0.35 μm 1P4M CMOS technology and the working frequency is 50 MHz. The chip size is about 5200 μm × 2500 μm. For a 256 × 256 image, the chip can perform 30 frames per second with the filter length varying from 2 to 20 and with various levels. The proposed architecture is suitable for real-time applications such as JPEG 2000.
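The mirror extension mentioned in this abstract can be sketched in software. The abstract does not say whether the edge sample itself is repeated; the example below assumes whole-sample symmetry (edge sample not repeated), and `mirror_extend` is a hypothetical helper, not the chip's routing network.

```python
def mirror_extend(x, pad):
    """Whole-sample symmetric extension: reflect about the edge samples."""
    n = len(x)
    assert 0 <= pad < n
    left = [x[i] for i in range(pad, 0, -1)]           # x[pad], ..., x[1]
    right = [x[n - 1 - i] for i in range(1, pad + 1)]  # x[n-2], x[n-3], ...
    return left + list(x) + right

row = [1, 2, 3, 4]
print(mirror_extend(row, 2))
```

Reflecting the signal avoids the artificial jump that zero padding would create at the image border, so the filters see no spurious high-frequency content there.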

13.

Three-dimensional discrete wavelet transform architectures (total citations: 2; self-citations: 0; citations by others: 2)

Weeks M. Bayoumi M.A. 《Signal Processing, IEEE Transactions on》2002,50(8):2050-2063

The three-dimensional (3-D) discrete wavelet transform (DWT) suits compression applications well, allowing for better compression on 3-D data as compared with two-dimensional (2-D) methods. This paper describes two architectures for the 3-D DWT, called the 3DW-I and the 3DW-II. The first architecture (3DW-I) is based on folding, whereas the 3DW-II architecture is block-based. Potential applications for these architectures include high definition television (HDTV) and medical data compression, such as magnetic resonance imaging (MRI). The 3DW-I architecture is an implementation of the 3-D DWT similar to folded 1-D and 2-D designs. It allows even distribution of the processing load onto 3 sets of filters, with each set performing the calculations for one dimension. The control for this design is very simple, since the data are operated on in a row-column-slice fashion. Due to pipelining, all filters are utilized 100% of the time, except for the start up and wind-down times. The 3DW-II architecture uses block inputs to reduce the requirement of on-chip memory. It has a central control unit to select which coefficients to pass on to the lowpass and highpass filters. The memory on the chip will be small compared with the input size since it depends solely on the filter sizes. The 3DW-I and 3DW-II architectures are compared according to memory requirements, number of clock cycles, and processing of frames per second. The two architectures described are the first 3-D DWT architectures.

14.

An Efficient VLSI Architecture for the Computation of 1-D Discrete Wavelet Transform

A.B. Premkumar A.S. Madhukumar 《The Journal of VLSI Signal Processing》2002,31(3):231-241

This paper presents a new architecture for VLSI implementation of the one dimensional Discrete Wavelet Transform (DWT). The architecture uses single filter for generation of both the DWT coefficients and scaling function for orthogonal wavelets as opposed to the conventional two filter approach. For multilevel decomposition, the fold back architecture principle, which interleaves the decimated scaling function back into the filter for subsequent levels, is applied. Limited use of memory in the design enables efficient implementation of the DWT computation in VLSI.

15.

A Very Efficient Storage Structure for DWT and IDWT Filters

Robert M. Owens Mohan Vishwanath 《The Journal of VLSI Signal Processing》1998,19(3):215-225

In this paper, we present an area-efficient storage and routing structure to be used as part of either a DWT or an IDWT filter. Such efficient structures are necessary for the single chip implementation of multidimensional DWT and IDWT filters for processing images and video. While the storage structures described in previously published architectures were adequate for the 1D DWT/IDWT filter, they do not scale well to a multidimensional implementation. The storage structure design and implementation described in this paper utilizes a combination of well-known efficient RAM cells with simple control to achieve compact size and scalability. When compared to other alternatives, the structure uses less power. In this paper, we examine the problem of constructing, on a single chip, filters for both the multidimensional Discrete Wavelet Transform (DWT) and the multidimensional Inverse Discrete Wavelet Transform (IDWT). We will use the following example to illustrate where the difficulty lies in constructing such a chip. Consider a filter that executes transforms on 2D images at the rate of 30 images per second. Furthermore, the size N × N of the images is 1024 × 1024, the length L of the filter is 8, the number of octaves O to be generated is 4, and the arithmetic precision P is 24. In image compression, such a filter would be a good candidate for the replacement of the filters presently used to perform the block Discrete Cosine Transform (DCT).

16.

A VLSI architecture for lifting-based forward and inverse wavelet transform

Andra K. Chakrabarti C. Acharya T. 《Signal Processing, IEEE Transactions on》2002,50(4):966-977

We propose an architecture that performs the forward and inverse discrete wavelet transform (DWT) using a lifting-based scheme for the set of seven filters proposed in JPEG2000. The architecture consists of two row processors, two column processors, and two memory modules. Each processor contains two adders, one multiplier, and one shifter. The precision of the multipliers and adders has been determined using extensive simulation. Each memory module consists of four banks in order to support the high computational bandwidth. The architecture has been designed to generate an output every cycle for the JPEG2000 default filters. The schedules have been generated by hand and the corresponding timings listed. Finally, the architecture has been implemented in behavioral VHDL. The estimated area of the proposed architecture in 0.18-μm technology is 2.8 mm square, and the estimated frequency of operation is 200 MHz.

17.

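Hardware Implementation of Wavelet Image Coding

Qiao Shijie Wang Guoyu Zhi Guilian 《Modern Electronics Technique》2002,(2):91-96

Hardware structures for the 2-D discrete wavelet transform and fast zerotree coding were designed, and a wavelet image coding system was implemented. Verilog HDL models of each module were written, simulated, and synthesized. Finally, the whole coding system was verified on an Altera CPLD. The results show that the designed hardware structure is correct and can be used to implement a wavelet image coding system.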

18.

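Hardware Design of the JPEG2000 Wavelet Lifting Algorithm (total citations: 7; self-citations: 1; citations by others: 6)

Dong Wenhui Liu Mingye 《Acta Electronica Sinica》2003,31(11):1674-1677

The discrete wavelet transform underlies many of today's image processing and compression techniques and has been adopted by JPEG2000, the latest ISO/IEC still image compression standard. The lifting-based discrete wavelet transform requires less computation than the conventional convolution-based one. We propose a hardware structure for the wavelet lifting algorithm in JPEG2000 that offers high overall speed, low memory requirements, and low hardware resource usage. We propose performing boundary extension outside the datapath, which reduces datapath complexity and improves computational efficiency. Pipelining further improves the computational efficiency of the hardware design.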

19.

VLSI implementation of discrete wavelet transform

Grzeszczak A. Mandal M.K. Panchanathan S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1996,4(4):421-433

This paper presents a VLSI implementation of discrete wavelet transform (DWT). The architecture is simple, modular, and cascadable for computation of one or multidimensional DWT. It comprises four basic units: input delay, filter, register bank, and control unit. The proposed architecture is systolic in nature and performs both high- and low-pass coefficient calculations with only one set of multipliers. In addition, it requires a small on-chip interface circuitry for interconnection to a standard communication bus. A detailed analysis of the effect of finite precision of data and wavelet filter coefficients on the accuracy of the DWT coefficients is presented. The architecture has been simulated in VLSI and has a hardware utilization efficiency of 87.5%. Being systolic in nature, the architecture can compute DWT at a data rate of N×10⁶ samples/s corresponding to a clock speed of N MHz.

20.

A HIGH-PERFORMANCE VLSI ARCHITECTURE OF EBCOT BLOCK CODING IN JPEG2000

Liu Kai Wu Chengke Li Yunsong 《Journal of Electronics (China)》2006,23(1):89-93

The paper presents a new architecture composed of bit plane-parallel coders for the Embedded Block Coding with Optimized Truncation (EBCOT) entropy encoder used in JPEG2000. In the architecture, the coding information of each bit plane can be obtained simultaneously and processed in parallel. Compared with other architectures, it has the advantages of high parallelism and no wasted clock cycles for a single point. The experimental results show that it reduces the processing time by about 86% compared with the bit plane sequential scheme. A Field Programmable Gate Array (FPGA) prototype chip is designed, and simulation results show that it can process 512×512 gray-scale images at more than 30 frames per second at 52 MHz.
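The throughput figure in this abstract can be turned into a per-sample cycle budget. The sketch below assumes one component per pixel (gray-scale) and exactly 30 frames per second; since the abstract reports more than 30 frames per second, the result is an upper bound on the cycles actually spent per sample. The function name is illustrative only.

```python
def cycles_per_sample(clock_hz, width, height, fps):
    """Clock cycles available per input sample at the given frame rate."""
    return clock_hz / (width * height * fps)

# Figures from the abstract: 52 MHz clock, 512x512 images, 30 frames/s.
budget = cycles_per_sample(52e6, 512, 512, 30)
print(f"{budget:.2f} clock cycles available per sample")
```

A budget of only a few cycles per sample, shared across all bit planes and coding passes, gives a sense of why the bit planes have to be processed in parallel rather than one after another.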