1.

VLSI Implementation of Discrete Wavelet Transform for Lossless Compression of Medical Images

《Real》2001,7(2):203-217

This paper presents a VLSI architecture that implements the forward and inverse two-dimensional Discrete Wavelet Transform (DWT) to compress medical images for storage and retrieval. Lossless compression is usually required in medical imaging. The word length required for lossless compression makes the area cost of the architectures reported in the literature too high. Thus, there is a clear need for a cost-effective architecture for lossless compression of medical images using the DWT. The data path word length has been selected to satisfy the lossless accuracy criteria, leading to a high-speed implementation with small chip area. The pyramid algorithm is reorganized and its locality is improved to obtain an efficient hardware implementation. The result is a pipelined architecture that supports single-chip implementation in VLSI technology. The implementation employs only one multiplier and 352 memory elements to compute all scales, which results in a considerably smaller chip area (45 mm²) than former implementations. The hardware design has been captured in VHDL and simulated on data taken from random images. Implemented in a 0.7 μm technology, it can compute both the forward and inverse DWT at a rate of 3.5 512×512 12-bit images/s, corresponding to a clock speed of 33 MHz. This chip is the core of a PCI board that will speed up DWT computation on desktop computers.
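The throughput figures quoted in this abstract can be turned into a rough per-pixel clock budget. The following is only a back-of-the-envelope sketch: it assumes the whole 33 MHz clock is spent on one transform direction at a time, and the variable names are illustrative rather than taken from the paper.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 33e6            # clock speed
IMAGES_PER_SECOND = 3.5    # sustained image rate
WIDTH = HEIGHT = 512       # image size in pixels

# Assumption: every clock cycle contributes to transforming the current image.
pixels_per_second = IMAGES_PER_SECOND * WIDTH * HEIGHT
cycles_per_pixel = CLOCK_HZ / pixels_per_second

print(f"{pixels_per_second:.0f} pixels/s, {cycles_per_pixel:.2f} cycles/pixel")
```

Under that assumption the design has roughly 36 clock cycles per input pixel, which is consistent with a single multiplier being shared across all filter taps and all scales.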

2.

Research on a General Hardware Implementation Method for Lifting-Based Wavelet Transforms

Wang Fang, Zhu Ke, Zhou Xiaofang 《Computer Engineering and Applications》2006,42(8):87-91

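This paper proposes a complete flow for general-purpose hardware implementation of lifting-based wavelet transforms. The method systematically examines finite-precision analysis, the design of basic one- and two-dimensional wavelet transform structures, and the common optimization algorithms used when implementing lifting-based wavelets, and distills them into a general wavelet design method. Using the 6/10 wavelet transform as an example, the paper walks through the concrete implementation process, giving designers a complete flow to follow when designing a high-performance wavelet transform. The method applies to the hardware design of wavelet transforms and to the design of related wavelet-based signal processing systems.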

3.

A Low-Power Parallel VLSI Architecture for the Two-Dimensional Discrete Wavelet Transform

Liu Hongjin, He Xing, Zhang Tiejun, Wang Donghui, Yu Qiying, Hou Chaohuan 《Computer Engineering and Applications》2008,44(18):73-75

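A low-power, parallel VLSI architecture for the two-dimensional discrete wavelet transform based on the lifting algorithm is proposed. The architecture processes the row and column directions simultaneously and needs no extra buffer to store the intermediate coefficients used by the column transform. By time-multiplexing the key arithmetic modules, it processes two rows of data at once and reaches 100% hardware utilization. Symmetric boundary extension is implemented with embedded circuitry, which greatly reduces both the amount of on-chip memory required and the number of off-chip memory accesses, effectively lowering system power consumption.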

4.

An area-efficient VLSI implementation for programmable FIR filters based on a parameterized divide and conquer approach

Thomas Adly T. 《Journal of Systems Architecture》2008,54(12):1122-1128

In this paper, we propose an optimal VLSI implementation for a class of programmable FIR filters with binary coefficients, whose architecture is based on a parameterized divide and conquer approach. The proposed design is shown to be easily extendable to FIR filters with multibit coefficients of arbitrary sign. The area efficiency achieved in comparison to direct form realization is demonstrated by VLSI implementation examples, synthesized in a TSMC 0.18-μm single-poly, six-metal-layer CMOS process using state-of-the-art VLSI EDA tools. The possible saving in average power consumption is estimated using gate-level power analysis. Suggestions for applications and topics for further research conclude the paper.

5.

A hierarchical pipelining architecture and FPGA implementation for lifting-based 2-D DWT

Chunhui Zhang Yun Long Fadi Kurdahi 《Journal of Real-Time Image Processing》2007,2(4):281-291

Numerous VLSI architectures for the 2-D discrete wavelet transform (DWT) have been brought forward. While most of the designs displayed good performance through parallel processing, few of them thoroughly addressed how to sustain such high-throughput computing, which is crucial in real-time applications. Although the affordable data transfer bandwidth has increased tremendously during the past decade, stream-intensive applications still put pressure on data communication, and the design of the 2-D DWT is one such case. In this paper, we expose the performance gap between the computing core and the entire system, distinguishing them quantitatively with the metrics of peak performance and mean-time performance. To narrow the discrepancy without degrading either criterion, we introduce a software-pipelined, lifting-based computing kernel that removes data dependences for peak performance, and we apply loop fusion and a hierarchical pipelining method to enhance data locality and boost mean-time performance. The architecture has been implemented in a Xilinx Virtex-II FPGA, taking advantage of Virtex-II’s embedded multipliers and block RAMs. We use the Daubechies (9, 7) and LeGall (5, 3) filters (the default lossy and lossless filters in JPEG2000) for illustration, although the method is general and applies to other DWT filters. The post-place-and-route operating frequency for Daubechies (9, 7) is 138 MHz. Notably, the mean-time performance, parameterized by image size and decomposition level, comes close to peak performance.

Chunhui Zhang received his B.S. degree in Electronic Engineering and his M.S. degree in Microelectronics, both from Tsinghua University, Beijing, China, in 1998 and 2001 respectively. He completed his Ph.D. in Electrical and Computer Engineering at the University of California, Irvine. In 2005, he joined Intel in the Mobile Wireless Communication Group. His research interests include VLSI architectures and algorithms for signal processing, reconfigurable computing, and memory access optimization for multimedia systems. Yun Long received his B.S. in 1997 and M.S. in 2001, both in Electronic Engineering, from Tsinghua University, China. While pursuing a Ph.D. degree in the Dept. of EECS, UC-Irvine, he is working with nVidia Corp., Santa Clara, CA, on ASIC design and verification. His research interests include high-performance application-specific system design, reconfigurable architecture, and data scheduling optimization, especially for multimedia applications. Fadi Kurdahi received his PhD from the University of Southern California in 1987. Since then, he has been a faculty member in the Department of Electrical & Computer Engineering at UCI, where he conducts research in the areas of computer-aided design of VLSI circuits, high-level synthesis, and design methodology of large-scale systems. He was Associate Editor for IEEE Transactions on Circuits and Systems II 1993–1995, Area Editor in IEEE Design and Test for reconfigurable computing, and served as program chair, general chair or on program committees of several workshops, symposia and conferences in the area of CAD, VLSI, and system design. He received the best paper award for the IEEE Transactions on VLSI in 2002, the best paper award at ISQED in 2006, and three distinguished paper awards. He is a Fellow of the IEEE.
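The abstract for this entry names the LeGall (5, 3) filter as the lossless filter of JPEG2000. As a minimal sketch of what a lifting-based kernel computes, the code below applies the standard reversible (5, 3) predict and update steps to a one-dimensional integer signal. It is not the paper's software-pipelined kernel: the function names are hypothetical, the input length is assumed to be even, and the boundaries are assumed to use whole-sample symmetric extension.

```python
def lifting_53_forward(x):
    """One level of the reversible (5, 3) lifting transform on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    # The right neighbour of the last odd sample is mirrored (x[n] = x[n-2]).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step: smooth = even sample plus a rounded quarter of the neighbouring details.
    # The detail to the left of the first sample is mirrored (d[-1] = d[0]).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by running the two steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


signal = [10, 12, 14, 200, 7, 7, 0, 4095]
low, high = lifting_53_forward(signal)
restored = lifting_53_inverse(low, high)
```

Because each step only adds or subtracts an integer computed from samples the inverse also has, the round trip is exact, which is why this filter suits lossless coding. The shifts and additions also map directly onto multiplier-free hardware.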

6.

Low-Power Parallel VLSI Architecture Design for Wavelet Filters

Lan Xuguang, Zheng Nanning, Xue Jianru, Wang Fei, Liu Yuehu 《Journal of Computer Research and Development》2005,42(11):1889-1895

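A low-power, parallel VLSI architecture design method, based on line-based processing and the lifting algorithm, is proposed for implementing the forward and inverse discrete wavelet transform in a JPEG2000 coding system. The resulting architecture processes two rows of data at a time and time-multiplexes the row processor, so that processing is parallel both within the row processor and between the row and column processors, while the line buffer is minimized. Symmetric extension is implemented with embedded circuitry, and the whole architecture is optimized with pipelining, which speeds up the transform, raises hardware utilization, and lowers power consumption; efficiency approaches 100%. The forward and inverse wavelet filter architectures have been verified on an FPGA and can serve as a standalone IP core in a JPEG2000 image codec chip under development.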

7.

Efficient FPGA implementation of DWT and modified SPIHT for lossless image compression

《Journal of Systems Architecture》2007,53(7):369-378

In this paper, we present an implementation of the image compression technique set partitioning in hierarchical trees (SPIHT) in programmable hardware. The lifting-based Discrete Wavelet Transform (DWT) architecture has been selected to exploit the correlation among the image pixels. In addition, we provide a study of the storage elements required for the wavelet coefficients. A modified SPIHT algorithm is presented for encoding the wavelet coefficients. The modifications include a simplification of the coefficient scanning process, use of a 1-D addressing method instead of the original 2-D arrangement for wavelet coefficients, and a fixed memory allocation for the data lists instead of the dynamic allocation required in the original SPIHT. The proposed algorithm has been illustrated on both the 2-D Lena image and a 3-D MRI data set and is found to achieve appreciable compression with a high peak signal-to-noise ratio (PSNR).

8.

Multiplierless Adaptive Filtering

《Digital Signal Processing》2002,12(1):107-118

Bose, T., Venkatachalam, A., and Thamvichai, R., Multiplierless Adaptive Filtering, Digital Signal Processing 12 (2002) 107–118. When digital filters are designed with power-of-2 coefficients, the multiplications can be implemented by simple shifting operations. For VLSI implementations, multiplierless filters are faster and more compact than filters with multipliers. In this paper, an algorithm for finding and updating the power-of-2 coefficients of an adaptive filter is designed. The new method uses the well-known Genetic Algorithm (GA) for this purpose. The GA is used in a unique way in order to reduce computations. Small blocks of data are used for the GA and only one new generation is produced per sample of data. This, coupled with the fact that the coefficients are power-of-2, yields a computational complexity of O(N) additions and no multiplications. The algorithm is investigated for applications in adaptive linear prediction and system identification. The results are very promising and illustrate the performance of the new algorithm.
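The claim that power-of-2 coefficients replace multiplications with shifts can be illustrated with a small FIR filter. This sketch covers only the filtering arithmetic, not the Genetic Algorithm that selects the coefficients. The `(sign, exponent)` tap encoding and the function names are assumptions made for illustration, and samples are assumed to be integers.

```python
def shift_multiply(sample, sign, exponent):
    """Multiply an integer sample by sign * 2**exponent using only a shift and a negation."""
    if exponent >= 0:
        shifted = sample << exponent
    else:
        # Right shift floors the result, as a hardware arithmetic shifter would.
        shifted = sample >> -exponent
    return shifted if sign >= 0 else -shifted


def multiplierless_fir(samples, taps):
    """Compute y[n] = sum_k c_k * x[n-k], where each tap c_k is given as (sign, exponent)."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, (sign, exponent) in enumerate(taps):
            if n - k >= 0:  # samples before the start of the signal are treated as zero
                acc += shift_multiply(samples[n - k], sign, exponent)
        output.append(acc)
    return output


# Taps 2, -1, and 0.5 encoded as signed powers of two.
taps = [(1, 1), (-1, 0), (1, -1)]
filtered = multiplierless_fir([4, 8, 12], taps)
```

Each output sample costs one addition per tap and no multiplications, which matches the O(N) additions quoted in the abstract for an N-tap filter.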

9.

An associative processing module for a heterogeneous vision architecture

Storer R. Pout M.R. Thomson A.R. Dagless E.L. Duller A.W.G. Marriott A.P. Hicks P.J. 《Micro, IEEE》1992,12(3):42-55

A heterogeneous vision architecture that satisfies the computing demands of real-time computer vision by providing parallelism in three different forms is described. A pipeline of digital signal processing (DSP) chips initially processes signals. Then a SIMD associative processor array processes images and extracts features, and a MIMD network of transputers processes extracted objects in parallel. The array's VLSI implementation, the processing modes available due to the use of content-addressable memory, and the means of achieving efficient 2-D interprocessor communication in the linear array are described. An application as a vehicle number plate recognition system is presented.

10.

FPGA implementation of XOR-MUX full adder based DWT for signal processing applications

《Microprocessors and Microsystems》2020

Digital technology has developed rapidly in recent years, especially in signal processing and image processing applications. Excellent performance, high speed, compact size, low power, and low delay are essential for devices used in applications such as signal processing, audio processing, and software-defined radio. Digital devices in particular tend to have large critical logic, high power consumption, and large area in VLSI implementation because of the arithmetic operations in adder and multiplier designs. The architecture of the Discrete Wavelet Transform (DWT) is especially affected, because it comprises a number of filter banks at each level, and every filter bank contains a number of adders and multipliers arising from the coefficient decomposition of the low-pass and high-pass filters. This repeated n-size filter logic increases logic size and power consumption. The proposed work presents a novel DWT approach that replaces conventional adders and multipliers with XOR-MUX adders and truncation multipliers, thereby reducing the 2n logic size to n-size logic. Finally, the proposed DWT architecture is designed in VHDL and implemented on an XC6SLX9-2TQG144 FPGA, and its performance is evaluated in terms of delay, area, and power.

11.

A Design Method for a Parallel Lifting Wavelet Basis and Its VLSI Implementation

Tian Xin, Tan Yihua, Tian Jinwen 《Chinese Journal of Computers》2008,31(3):411-418

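A high-performance 9/7 wavelet basis with dyadic (binary) coefficients and four vanishing moments is designed, and a high-speed VLSI architecture for it is proposed. The lifting coefficients of this basis can all be expressed as rational numbers whose denominators are powers of 2, which simplifies the VLSI design. Experiments show that its compression performance is comparable to that of the CDF 9/7 wavelet, and under finite word length it even outperforms CDF 9/7. The new VLSI architecture needs only simple operations such as addition and shifting, which effectively reduces hardware resources and shortens the critical path. Through folding and rescheduling, the hardware is transformed into an embedded folded lifting structure in which every addition can execute in parallel and the critical path shrinks to roughly one adder delay, optimizing resource use. Simulation results show that the architecture reaches a maximum operating frequency of about 250 MHz and that the maximum system frequency is about four times the original. Compared with the conventional four-stage pipelined CDF 9/7 structure, the number of logic elements falls by about 66.7%, making the design particularly suitable for real-time high-speed compression.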

12.

A Novel VLSI Architecture for Real-Time Line-Based Wavelet Transform Using Lifting Scheme

Kai Liu Ke-Yan Wang Yun-Song Li and Cheng-Ke Wu 《Journal of Computer Science and Technology》2007,22(5):661-672

In this paper, we propose a VLSI architecture that performs the line-based discrete wavelet transform (DWT) using a lifting scheme. The architecture consists of row processors, column processors, an intermediate buffer and a control module. The row and column processors work as the horizontal and vertical filters respectively. The intermediate buffer is composed of five FIFOs that store temporary results of the horizontal filter. The control module schedules the output order to external memory. Compared with existing designs, the presented architecture parallelizes all levels of the wavelet transform to compute a multilevel DWT within one image transmission time, and it uses no external buffer, only one intermediate buffer that stores a few lines of horizontal filtering results, which significantly reduces the resources and memory required. This architecture is suitable for various real-time image/video applications.

13.

A Reduced-Complexity 2-D DWT Architecture Design

Cao Zhiyan, Ji Zhenzhou, Hu Mingzeng 《Computer Engineering》2007,33(23):228-229

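A low-power two-dimensional discrete wavelet transform (DWT) architecture is designed for image compression in wireless sensor networks. The architecture implements a reduced-complexity (5,3) integer DWT and uses pipelining and delay lines to achieve high throughput while letting the processing elements reuse data as much as possible, reducing accesses to on-chip and off-chip memory. The multilevel 2-D DWT is implemented by unfolding, which lets the next transform level start as early as possible and avoids large on-chip memories and on-chip memory accesses. Simulation and an FPGA implementation verify that the system achieves low complexity, low power, and small on-chip memory while meeting the required performance.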

14.

Efficient hardware architecture of 2D-scan-based wavelet watermarking for image and video

Sourour Karmani Ridha Djemal Rached Tourki 《Computer Standards & Interfaces》2009,31(4):801-811

This paper describes an efficient hardware architecture of 2D-Scan-based-Wavelet watermarking for image and video. Potential applications of this architecture include broadcast monitoring of video sequences for High Definition Television (HDTV), DVD protection, and access control. The proposed 2D design distributes the processing load evenly onto a set of filters, with each set performing the calculation for one dimension according to the scan-based process. Video protection is achieved by inserting a bank of watermarks into the middle-frequency wavelet coefficients of the video frames through selective quantization. The 2-D DWT is applied to both the video stream and the watermark in order to make the watermarking scheme robust and perceptually invisible. The proposed architecture has a very simple control part, since the data are operated on in a row-column-slice fashion. This organization reduces the on-chip memory requirement. In addition, the control unit selects which coefficient to pass to the low-pass and high-pass filters. The on-chip memory is small compared to the input size since it depends solely on the filter sizes. Due to the pipelining, all filters are utilized 100% of the time except during the start-up and wind-down times. The major contribution of this research is the selection of an appropriate real-time watermarking scheme and a trade-off between the algorithmic aspects of the proposed watermarking scheme and the hardware implementation technique. The hardware architecture is designed as a watermarking IP core with the Avalon interface of the NIOS embedded processor, and tested in order to evaluate the performance of the proposed watermarking algorithm. This architecture has been implemented on the Altera Stratix-II Field Programmable Gate Array (FPGA) prototyping board. Experimental results are presented to demonstrate the capability of the proposed watermarking system for real-time applications and its robustness against malicious attacks.

15.

A High-Performance VLSI Architecture for the 2-D 9/7 Discrete Wavelet Transform

Song Youcai, Han Bo, Wang Shibing, Tan Fuxiao, Zhao Zhengping 《Computer Engineering and Applications》2014,50(20):187-191

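To reduce the storage cost of the 2-D wavelet transform while increasing circuit speed, a 2-D parallel VLSI architecture is proposed. By fully exploiting the relationship between the row and column transforms, the parallel data-scanning input scheme of the row and column transform kernels is optimized, reducing the intermediate storage of the 9/7 wavelet transform to 4N. In addition, pipelining based on the flipping structure shortens the critical path to one multiplier delay, effectively increasing processing speed, and merging the scaling circuits reduces the number of multipliers to 10, effectively reducing hardware resource consumption.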

16.

Parallel 2-d convolution on a mesh connected array processor

Lee SY Aggarwal JK 《IEEE Transactions on Pattern Analysis and Machine Intelligence》1987,(4):590-594

In this correspondence, a parallel 2-D convolution scheme is presented. The processing structure is a mesh connected array processor consisting of the same number of simple processing elements as the number of pixels in the image. For most windows considered, the number of computation steps required is the same as that of the coefficients of a convolution window. The proposed scheme can be easily extended to convolution windows of arbitrary size and shape. The basic idea of the proposed scheme is to apply the 1-D systolic concept to 2-D convolution on a mesh structure. The computation is carried out along a path called a convolution path in a systolic manner. The efficiency of the scheme is analyzed for windows of various shapes. The ideal convolution path is a Hamiltonian path ending at the center of the window, the length of which is equal to the number of window coefficients. The simple architecture and control strategy make the proposed scheme suitable for VLSI implementation.
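The statement that the number of computation steps equals the number of window coefficients can be illustrated with a sequential simulation. In this sketch each pixel stands in for one processing element, and every outer step lets all elements accumulate the contribution of a single window coefficient. It visits the coefficients in plain raster order rather than along the paper's convolution path, assumes zero padding outside the image, and uses hypothetical names.

```python
def conv2d_stepwise(image, window):
    """2-D convolution where each step applies one window coefficient to every pixel."""
    rows, cols = len(image), len(image[0])
    win_rows, win_cols = len(window), len(window[0])
    centre_r, centre_c = win_rows // 2, win_cols // 2
    acc = [[0] * cols for _ in range(rows)]
    steps = 0
    for i in range(win_rows):
        for j in range(win_cols):
            # One step: on a mesh, all processing elements would do this in parallel.
            for r in range(rows):
                for c in range(cols):
                    src_r = r - (i - centre_r)
                    src_c = c - (j - centre_c)
                    if 0 <= src_r < rows and 0 <= src_c < cols:  # zero padding elsewhere
                        acc[r][c] += window[i][j] * image[src_r][src_c]
            steps += 1
    return acc, steps


impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
window = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result, steps = conv2d_stepwise(impulse, window)
```

Convolving a centred unit impulse returns the window itself, which gives a quick sanity check, and the step counter equals the number of window coefficients regardless of image size.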

17.

Digit pipelined processors

Mary Jane Irwin Robert Michael Owens 《The Journal of supercomputing》1987,1(1):61-86

Digit serial data transmission can be used to an advantage in the design of special purpose processors where communication issues dominate and where digit pipelining can be used to maintain high data rates. VLSI signal processing applications are one such problem domain. We have developed a family of VLSI components that have digit serial transmission and that can be pipelined at the digit level. These components can be used to construct VLSI processors that are especially suited to signal processing applications. One such particularly attractive processor is a structure we call the arithmetic cube. The arithmetic cube can be programmed to solve linear transformations such as convolutions and DFTs, and has nearest neighbor interconnects, regular layout, simple control, and a limited number of interconnections. Regular layout and simple control derive naturally from the algorithms on which the processor is based. Long wires are eliminated by the nearest neighbor interconnect. High throughput can be achieved by pipelining the processor at the digit level. The arithmetic cube is programmable in the problem size n; once implemented for a certain size N, smaller problems can be solved on the same implementation without a loss in performance. In addition, the architecture extends to larger N in a regular and automatic fashion. This work has been supported in part by the Army Research Office under Contract DAAG29-83-K-0126.

18.

The Systolic Pixel: A Visible Surface Algorithm for VLSI.

P L Mills 《Computer Graphics Forum》1984,3(1):47-59

The Systolic Pixel or Spixel is a novel architecture for an intelligent pixel-based graphics database for geometric-solid models. An algorithm is described which performs visible surface calculations for any complexity of coloured 3-dimensional (3-D) surface and which structures geometric-solid model data in a natural way. The algorithm/architecture of the spixel features a simple set of priority rules acting upon data in nearest neighbour locations and a simple set of movement rules of data to nearest neighbour locations. The spixel is constructed out of identical functional units. These features are attractive for an implementation of the algorithm in Very Large Scale Integration (VLSI).

19.

Parallel merged multiplier-accumulator coprocessor optimized for digital filters

H. Parandeh-Afshar O. Fatemi 《Computers & Electrical Engineering》2010,36(5):864-873

In an attempt to improve the speed of VLSI signal processing systems, a new architecture for a high-speed multiply-accumulate (MAC) unit optimized for digital filters is proposed. This unit is designed as a coprocessor for the LEON2 RISC processor [LEON2 Processor; 2005 [Online]. <http://www.gaisler.com/products/leon2/leon.html>]. In this work, four parallel MAC units with two dual-port coefficient register files, a three-port general register file and a control unit are included in the coprocessing block. With four parallel units available, several SIMD-format instructions have been added to the LEON2 instruction set. Each MAC unit has two 16-bit inputs, a 32-bit output register and a programmable round-saturate block. The MAC unit uses a new architecture which embeds the accumulate module within the partial-product summation tree of the multiplier with minimum overhead. A central control unit controls the inputs of the four MACs and the loading of the output registers. Our experimental results demonstrate high performance in the implementation of digital filters, at speeds of up to 33 million input samples per second in a 0.18 μm technology.
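The behaviour of one MAC unit with two 16-bit inputs and a 32-bit output register can be modelled in a few lines. This is a behavioural sketch only: it models saturation but omits the programmable rounding, it does not reflect the merged partial-product tree described in the abstract, and the function names are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(value):
    """Clamp a result to the signed 32-bit range instead of letting it wrap around."""
    return max(INT32_MIN, min(INT32_MAX, value))


def mac16(acc, a, b):
    """One multiply-accumulate step: acc + a * b, with signed 16-bit inputs."""
    assert -(2**15) <= a < 2**15 and -(2**15) <= b < 2**15, "inputs must fit in 16 bits"
    return saturate32(acc + a * b)


def fir_output(samples, coeffs):
    """Accumulate one FIR output sample as a chain of MAC steps."""
    acc = 0
    for sample, coeff in zip(samples, coeffs):
        acc = mac16(acc, sample, coeff)
    return acc


y = fir_output([1, 2, 3], [4, 5, 6])
```

Saturating keeps an overflowing filter output pinned at full scale, which is usually less audible or visible than the sign flip produced by wraparound.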

20.

A Video Interpolation Algorithm and Its VLSI Architecture Implementation

Liu Peng, Zhang Yan 《Computer Engineering》2006,32(11):29-31

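A video interpolation algorithm based on triangle analysis and high-frequency compensation is proposed, together with its VLSI implementation. Triangle analysis detects and preserves edge information in video images, and high-frequency compensation further improves the visual quality of the interpolated image. The regularity of the algorithm gives the corresponding VLSI architecture regularity, compactness, and high video-processing speed. The algorithm and its VLSI architecture are of practical value. The VLSI architecture is implemented in a CMOS process. Simulation results show that images interpolated with this algorithm are better than those of conventional interpolation algorithms in both subjective visual quality and objective metrics. With the VLSI chip operating at 100 MHz and 1.98 V, the architecture consumes 18.96 mW.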