1.

An Efficient Architecture for the In-Place Fast Cosine Transform

Manuel Sánchez Juan López Oscar Plata Maria A. Trenas Emilio L. Zapata 《The Journal of VLSI Signal Processing》1999,21(2):91-102

The two-dimensional discrete cosine transform (2D-DCT) is at the core of image encoding and compression applications. We present a new architecture for the 2D-DCT which is based on row-column decomposition. An efficient architecture to compute the one-dimensional fast direct (1D-DCT) and inverse cosine (1D-IDCT) transforms, which is based on reordering the butterflies after their computation, is also discussed. The architectures designed exploit locality, allowing pipelining between stages and saving memory (in-place). The result is an efficient architecture for high speed computation of the (1D, 2D)-DCT that significantly reduces the area required for VLSI implementation.
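
The row-column decomposition mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the paper's architecture: it assumes the orthonormal DCT-II and uses a direct O(N^2) 1D transform rather than a fast butterfly-based one. The function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence, computed directly (not a fast algorithm)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d_row_column(block):
    """2-D DCT by row-column decomposition: 1-D DCT on every row, then on every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]  # each column of the row-transformed data
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


def dct_2d_direct(block):
    """Reference 2-D DCT from the double-sum definition, for square blocks only."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    result = []
    for u in range(n):
        row = []
        for v in range(n):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    acc += (block[i][j]
                            * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                            * math.cos(math.pi * (2 * j + 1) * v / (2 * n)))
            row.append(c(u) * c(v) * acc)
        result.append(row)
    return result
```

Because the 2D kernel is separable, the row-column result should agree with the double-sum definition up to floating-point rounding, which is what makes the decomposition attractive for hardware.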

2.

Optimization and implementation of the integer wavelet transform for image coding

Grangetto M. Magli E. Martina M. Olmo G. 《IEEE transactions on image processing》2002,11(6):596-604

This paper deals with the design and implementation of an image transform coding algorithm based on the integer wavelet transform (IWT). First of all, criteria are proposed for the selection of optimal factorizations of the wavelet filter polyphase matrix to be employed within the lifting scheme. The obtained results lead to IWT implementations with very satisfactory lossless and lossy compression performance. Then, the effects of finite precision representation of the lifting coefficients on the compression performance are analyzed, showing that, in most cases, a very small number of bits can be employed for the mantissa while keeping the performance degradation very limited. Stemming from these results, a VLSI architecture is proposed for the IWT implementation, capable of achieving very high frame rates with moderate gate complexity.
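
To make the lifting scheme concrete, here is a minimal sketch of one level of an integer wavelet transform. It assumes the reversible 5/3 filter with symmetric boundary extension and an even-length input; the abstract does not say which filters or factorizations the paper selects, so this choice and the function names are illustrative only.

```python
def iwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    # Predict step: detail = odd sample minus the floored mean of its even neighbours.
    d = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]  # symmetric extension at the end
        d.append(odd[n] - ((even[n] + right) >> 1))
    # Update step: approximation = even sample plus a rounded quarter of neighbouring details.
    s = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]  # symmetric extension at the start
        s.append(even[n] + ((left + d[n] + 2) >> 2))
    return s, d


def iwt53_inverse(s, d):
    """Undo iwt53_forward exactly by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - ((left + d[n] + 2) >> 2))
    odd = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        odd.append(d[n] + ((even[n] + right) >> 1))
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Each lifting step only adds an integer function of the other channel, so the inverse subtracts the same quantity and reconstruction is exact in integer arithmetic. That property is what allows lossless coding.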

3.

2D DWT VLSI architecture for wavelet image processing

Seung-Kwon Pack Lee-Sup Kim 《Electronics letters》1998,34(6):537-538

A cost-effective VLSI architecture with separate data-paths and their corresponding filter structure is proposed for performing a two-dimensional discrete wavelet transform (2D DWT). Compared with the conventional 2D DWT VLSI architectures, the proposed semi-recursive 2D DWT VLSI architecture has minimum hardware cost, and optimised data-bus utilisation, scheduling control overhead and storage size.

4.

High performance VLSI architecture for block based visible image watermarking

V.E. Jayanthi V. Rajamani P. Karthikeyan 《International Journal of Electronics》2013,100(9):1191-1206

In this article, a novel block-based visible image watermarking VLSI architecture and its hardware implementation on a field-programmable gate array (FPGA) are proposed. In this watermarking process, the 1D-DCT is introduced to facilitate hardware implementation. A mathematical model is developed to reduce the computational complexity of calculating the embedding and scaling factors, which are used to give the resulting image the best quality with uniform watermark visibility. The proposed architecture has a 12-stage pipeline. Parallelism techniques are employed at the block level to achieve high performance. A single 8-point fast 1D-DCT is used to calculate the DCT coefficient values of both the host image and the watermark image, minimizing resource utilization and power consumption. The hardware implementation of this algorithm offers several advantages, including reduced power and area and higher pipeline throughput. The performance of the architecture is studied by implementing it on a Xilinx Virtex V FPGA with DSP 48E. The throughput achieved with this VLSI architecture is 5.21 Gbits/s, with a total resource utilization of 4058 BELs.

5.

Multiprocessor system for high-resolution image correlation in realtime

Cavadini M. Wosnitza M. Troster G. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(3):439-449

In this work, the correlation-based image localization problem is targeted toward implementation for real-time operation on high-resolution images. The main contribution resides in a global approach to the two-dimensional correlation problem which concurrently considers the algorithm, the system architecture, and its implementation. Analyzing and studying the alternatives concerning algorithm specification, VLSI architecture, and system architecture at different levels of the design phase, a scalable multiprocessor system, based on multiple instances of identical modules, is derived which achieves a real-time image correlation performance exceeding that of currently available solutions by an order of magnitude concerning frame-rate and input image resolution. As a result, the relevance of image correlation traditionally limited to the image processing domain is now expanded into applied machine vision applications. Addressing packaging and reusability aspects, an MCM implementation of the relevant system components is presented.

6.

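A survey of VLSI implementations of the continuous wavelet transform (total citations: 11; self-citations: 2; citations by others: 11)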

苏立 何怡刚 《电路与系统学报》2003,8(2):86-91

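The wavelet transform is a highly effective mathematical analysis tool in many fields, including signal processing, image compression, and pattern recognition. However, real-time wavelet transforms are computationally intensive and require dedicated hardware. VLSI implementations of the continuous wavelet transform offer clear advantages in processing speed, power consumption, and usable frequency range, and the implementation approaches are flexible. This paper reviews recent research in this area, discusses the open problems, and points out several directions for future work; in particular, instantaneous companding circuit techniques are one important route to low-voltage, low-power wavelet transform chips.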

7.

A content-addressable memory architecture for image coding using vector quantization

Panchanathan S. Goldberg M. 《Signal Processing, IEEE Transactions on》1991,39(9):2066-2078

An architecture suitable for real-time image coding using adaptive vector quantization (VQ) is presented. This architecture is based on the concept of content-addressable memory (CAM), where the data is accessed simultaneously and in parallel on the basis of its content. VQ essentially involves, for each input vector, a search operation to obtain the best match codeword. A speedup results if a CAM-based implementation is used. This speedup, coupled with the gains in execution time for the basic distortion operation, implies that even codebook generation is possible in real time (<32 ms). In using the CAM, the conventional mean square error measure is replaced by the absolute difference measure. This measure results in little degradation and in fact limits large errors. The regular and iterable architecture is particularly well suited for VLSI implementation.
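
The best-match codeword search described in this abstract can be sketched in software as follows. This is a sequential illustration only: a CAM would compare against all codewords in parallel, and the function names, codebook, and input vector are made up for the example.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def sse(a, b):
    """Sum of squared errors (proportional to the mean square error)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_codeword(vector, codebook, distortion=sad):
    """Index of the codeword with the smallest distortion; ties go to the lowest index."""
    return min(range(len(codebook)), key=lambda i: distortion(vector, codebook[i]))


# Hypothetical 4-dimensional codebook and input vector.
codebook = [
    [0, 0, 0, 0],
    [10, 10, 10, 10],
    [0, 10, 0, 10],
    [255, 255, 255, 255],
]
index = best_codeword([1, 9, 2, 11], codebook)  # uses the absolute difference measure
```

The absolute difference measure needs no multipliers, which is one reason it suits hardware. In this small example both measures select the same codeword, although they can disagree in general.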

8.

A VLSI architecture for full-search block-matching motion estimation

郑兆青 桑红石 付生猛 沈绪榜 《固体电子学研究与进展》2007,27(3):397-401

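A new VLSI architecture for two-dimensional full-search motion estimation is proposed. Based on a two-dimensional systolic array, the architecture achieves full two-dimensional data reuse, reduces the amount of data fetched from external memory, and delivers 100% hardware efficiency and high throughput. It can also be readily applied to full-search block-matching motion estimation with different block sizes and search ranges, making it general-purpose.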
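
For reference, the algorithm that such an architecture accelerates can be sketched in a few lines. The version below is a plain sequential illustration, assuming the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify. It says nothing about the systolic-array data reuse, and the function names are hypothetical.

```python
def sad_block(cur, ref, bx, by, dx, dy, n):
    """SAD between the n-by-n block of `cur` at (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that keeps the candidate block inside `ref`.

    Returns (best_cost, dx, dy); ties keep the first candidate in scan order.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad_block(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate costs n*n absolute differences and there are up to (2p+1)^2 candidates per block. Neighbouring candidates overlap heavily, and that overlap is the data reuse a two-dimensional array can exploit.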

9.

Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Packet Transform

Wang C. Gan W. S. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(5):422-426

This brief presents a novel very large-scale integration (VLSI) architecture for discrete wavelet packet transform (DWPT). By exploiting the in-place nature of the DWPT algorithm, this architecture has an efficient pipeline structure to implement high-throughput processing without any on-chip memory/first-in first-out access. A folded architecture for lifting-based wavelet filters is proposed to compute the wavelet butterflies in different groups simultaneously at each decomposition level. According to the comparison results, the proposed VLSI architecture is more efficient than previously proposed architectures in terms of memory access, hardware regularity and simplicity, and throughput. The folded architecture not only achieves a significant reduction in hardware cost but also maintains both the hardware utilization and high-throughput processing in comparison with the direct mapped tree-structured architecture.

10.

A hardware accelerator for two-dimensional image analysis

《Integration, the VLSI Journal》1988,6(3):329-344

This paper describes the architecture and operation of a new hardware accelerator called MultiRing for performing various geometrical operations on two-dimensional image space. This hardware architecture is shown to be applicable for design rule checking in VLSI layout and many image processing operations including noise suppression and contour extraction. It has both a fast execution speed and extremely high flexibility. Each row of data stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 20 basic instructions in each ring cycle, which gives MultiRing maximal flexibility in terms of design rule changes or instruction set enhancement. Correct functional behavior of MultiRing was confirmed by successfully running a software simulator having one-to-one structural correspondence to the MultiRing hardware.

11.

An area-efficient very large scale integration architecture for modified Euclidean algorithm with dynamic storage technique

Xiao-Chun Li Jun-Fa Mao 《International Journal of Electronics》2013,100(8):837-842

A Modified Euclidean (ME) algorithm has been used to solve the key equations in Reed-Solomon (RS) decoding. In this article, the degree properties of the ME algorithm are derived. On the basis of the degree properties, an area-efficient very large scale integration (VLSI) architecture with dynamic storage technique is proposed to perform the ME algorithm. The dynamic storage technique is used to avoid data exchange and save hardware resources. The proposed architecture with dynamic storage technique can reduce the computation hardware area by 50% and the memory hardware area by about 30%. VLSI implementation results of different RS codes show that the proposed architecture is significantly area-efficient, especially for RS codes with long code lengths.

12.

Fixed-point analysis and VLSI implementation of the wavelet filters in JPEG2000

朱珂 华林 周晓方 章倩苓 《固体电子学研究与进展》2004,24(4):466-471,504

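A finite-precision analysis, as required for hardware implementation, is carried out for the 5/3 integer filter and the 9/7 real filter recommended in JPEG2000. The optimal data width of each parameter in the wavelet transform is determined, as is the data width of the data path of the whole transform system. Based on the characteristics of the lifting-based wavelet transform, and combined with an embedded extension algorithm, two wavelet transform architectures are proposed: a folded architecture and a long-pipeline architecture. The two architectures are analyzed and compared. Finally, the folded architecture is compared with other related architectures in terms of the number of storage units required, the number of memory accesses, processing capability, and power consumption, showing that the proposed architecture has clear performance advantages.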

13.

A VLSI architecture for real-time image coding using a vector quantization based algorithm

Dezhgosha K. Jamali M.M. Kwatra S.C. 《Signal Processing, IEEE Transactions on》1992,40(1):181-189

Digital image coding using vector quantization (VQ) based techniques provides low bit rates and high quality coded images, at the expense of intensive computational demands. The computational requirement due to the encoding search process has hindered application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication free distortion measure and reduction of the required memory per code vector. Running at a clock rate of 25 MHz the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refreshing rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel.
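
The figures quoted in this abstract imply a pixel rate, a clock budget per pixel, and an output bit rate. The short calculation below derives them from the stated numbers only; the derived values do not appear in the abstract, and the variable names are illustrative.

```python
# Figures stated in the abstract.
pixels_per_frame = 480 * 768   # frame size in pixels
frames_per_second = 30         # refresh rate
clock_hz = 25_000_000          # 25 MHz clock
bits_per_pixel = 1.12          # fixed coding rate

# Derived quantities (not stated in the abstract).
pixels_per_second = pixels_per_frame * frames_per_second
cycles_per_pixel = clock_hz / pixels_per_second
bits_per_second = bits_per_pixel * pixels_per_second

print(f"{pixels_per_second} pixels/s")
print(f"{cycles_per_pixel:.2f} clock cycles per pixel")
print(f"{bits_per_second / 1e6:.2f} Mb/s")
```

At roughly 11 million pixels per second, the chip set has only a little over two clock cycles per pixel. That is consistent with the abstract's emphasis on partitioning the codebook across concurrent chips.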

14.

Line-based, reduced memory, wavelet image compression (total citations: 29; self-citations: 0; citations by others: 29)

Chrysafis C. Ortega A. 《IEEE transactions on image processing》2000,9(3):378-389

This paper addresses the problem of low memory wavelet image compression. While wavelet or subband coding of images has been shown to be superior to more traditional transform coding techniques, little attention has been paid until recently to the important issue of whether both the wavelet transforms and the subsequent coding can be implemented in low memory without significant loss in performance. We present a complete system to perform low memory wavelet image coding. Our approach is "line-based" in that the images are read line by line and only the minimum required number of lines is kept in memory. There are two main contributions of our work. First, we introduce a line-based approach for the implementation of the wavelet transform, which yields the same results as a "normal" implementation, but where, unlike prior work, we address memory issues arising from the need to synchronize encoder and decoder. Second, we propose a novel context-based encoder which requires no global information and stores only a local set of wavelet coefficients. This low memory coder achieves performance comparable to state of the art coders at a fraction of their memory utilization.

15.

VLSI systolic binary tree-searched vector quantizer for image compression

Wai-Chi Fang Chi-Yung Chang Sheu B.J. Chen O.T.-C. Curlander J.C. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(1):33-44

A high-speed image compression VLSI processor based on the systolic architecture of difference-codebook binary tree-searched vector quantization has been developed to meet the increasing demands on large-volume data communication and storage requirements. Simulation results show that this design is applicable to many types of image data and capable of producing good reconstructed data quality at high compression ratios. Various design aspects of the binary tree-searched vector quantizer including the algorithm, architecture, and detailed functional design are thoroughly investigated for VLSI implementation. An 8-level difference-codebook binary tree-searched vector quantizer can be implemented on a custom VLSI chip that includes a systolic array of eight identical processors and a hierarchical memory of eight subcodebook memory banks. The total transistor count is about 300000 and the die size is about 8.67×7.72 mm² in a 1.0 μm CMOS technology. The throughput rate of this high-speed VLSI compression system is approximately 25 Mpixels per second and its equivalent computation power is 600 million instructions per second.

16.

Design of a parallel VLSI architecture for the two-dimensional discrete 5/3 wavelet transform

杜会斌 周旭 张学庆 吴晓娟 《无线电通信技术》2006,32(6):39-41

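An efficient parallel VLSI architecture design method for the two-dimensional discrete 5/3 wavelet transform (DWT), based on the lifting algorithm, is proposed. The method lets the row and column filters operate simultaneously and uses a pipelined design; at the same precision, it greatly reduces the amount of computation, increases the transform speed, and saves hardware resources. The method has passed Verilog HDL behavioral-level simulation and can be used as a standalone IP core in JPEG2000 image encoder and decoder chips. The architecture can be extended to the 9/7 wavelet lifting structure.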

17.

VLSI design of a two-dimensional DWT/IDWT processor

陈旭昀 周汀 《电子学报》1997,25(2):29-32

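In this paper, we design real-time two-dimensional DWT and IDWT systems that are based on multiresolution analysis and suited to hardware implementation. A top-down VLSI design methodology is adopted; the design is described in the hardware description language VHDL and is verified and synthesized in the Synopsys system. The synthesis results show that the system occupies 7140 unit areas and that, for a four-level wavelet transform, the data processing speed can reach about 4 Mpixel/s.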

18.

A fast and low memory image coding algorithm based on lifting wavelet transform and modified SPIHT (total citations: 1; self-citations: 0; citations by others: 1)

Hong Pan Wan-Chi Siu Ngai-Fong Law 《Signal Processing: Image Communication》2008,23(3):146-161

Due to its excellent rate–distortion performance, set partitioning in hierarchical trees (SPIHT) has become the state-of-the-art algorithm for image compression. However, the algorithm does not fully provide the desired features of progressive transmission, spatial scalability and optimal visual quality, at very low bit rate coding. Furthermore, the use of three linked lists for recording the coordinates of wavelet coefficients and tree sets during the coding process becomes the bottleneck of a fast implementation of the SPIHT. In this paper, we propose a listless modified SPIHT (LMSPIHT) approach, which is a fast and low memory image coding algorithm based on the lifting wavelet transform. The LMSPIHT jointly considers the advantages of progressive transmission, spatial scalability, and incorporates human visual system (HVS) characteristics in the coding scheme; thus it outperforms the traditional SPIHT algorithm at low bit rate coding. Compared with the SPIHT algorithm, LMSPIHT provides a better compression performance and a superior perceptual performance with low coding complexity. The compression efficiency of LMSPIHT comes from three aspects. The lifting scheme lowers the number of arithmetic operations of the wavelet transform. Moreover, a significance reordering of the modified SPIHT ensures that it codes more significant information belonging to the lower frequency bands earlier in the bit stream than that of the SPIHT to better exploit the energy compaction of the wavelet coefficients. HVS characteristics are employed to improve the perceptual quality of the compressed image by placing more coding artifacts in the less visually significant regions of the image. Finally, a listless implementation structure further reduces the amount of memory and improves the speed of compression by more than 51% for a 512×512 image, as compared with that of the SPIHT algorithm.

19.

Application Specific Efficient VLSI Architectures for Orthogonal Single- and Multiwavelet Transforms

Peter Rieder Sven Simon Christian V. Schimpfle 《The Journal of VLSI Signal Processing》1999,21(2):77-90

In this paper, efficient VLSI architectures for orthogonal wavelet transforms with respect to common applications are presented. One class of orthogonal wavelet transforms is the singlewavelet transform which is based on one scaling and one wavelet function. An important application of this transform is signal denoising for which an efficient VLSI implementation and layout is derived in this paper. Contrary to singlewavelets, orthogonal multiwavelets are based on several scaling and wavelet functions. Since they allow properties like compact support, regularity, orthogonality and symmetry, simultaneously, being impossible in the singlewavelet case, multiwavelets are well suited bases for image compression applications. With respect to an efficient implementation of these orthogonal wavelet transforms, approximations of the exact rotation angles of the corresponding wavelet lattice filters are used. The approximations are realized by elementary CORDIC rotations. This method reduces the number of shift and add operations significantly with no influence on the good performance of the transforms. VLSI architectures for the computationally cheap transforms and related implementation aspects are discussed and design examples from architectural level down to layout are given.
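
The elementary CORDIC rotations mentioned in this abstract can be illustrated with a generic rotation-mode CORDIC loop. This is a sketch under stated assumptions, not the paper's method: the paper approximates specific lattice-filter angles with a small number of micro-rotations, whereas the code below runs a fixed 16 iterations and uses floating point to stand in for fixed-point shifts. The function name is hypothetical.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) counter-clockwise by `angle` radians using shift-and-add micro-rotations.

    Converges for |angle| up to roughly 1.74 rad; larger angles need a prior quadrant fold.
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated scaling introduced by the micro-rotations
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # stands in for a right shift by i bits in fixed-point hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)  # elementary angle atan(2^-i), a table constant in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    # Compensate the constant CORDIC gain so that the result is a pure rotation.
    return x / gain, y / gain


# Rotating the unit vector by pi/5 should approximate (cos(pi/5), sin(pi/5)).
cx, cy = cordic_rotate(1.0, 0.0, math.pi / 5)
```

Each micro-rotation needs only two shifts and two additions, so approximating a rotation angle with fewer micro-rotations lowers the operation count at the cost of angle accuracy.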

20.

System-Level Data-Flow Transformation Exploration and Power-Area Trade-offs Demonstrated on Video Codecs

Francky Catthoor Martin Janssen Lode Nachtergaele Hugo De Man 《The Journal of VLSI Signal Processing》1998,18(1):39-50

A VLSI architecture for block matching motion estimation is described in this paper. The proposed architecture achieves 100% PE utilization and alleviates the I/O bottleneck problem using a small amount of distributed on-chip image memory. The number of processing elements is scalable according to the degree of parallel processing and the throughput requirement. The overall computations are performed in a pipelined manner, and the data fill time for contiguous blocks is eliminated to increase throughput. The VLSI system implementation methodologies and the layouts are also described. Finally, the performance is evaluated and the advantages over other architectures are outlined.