Similar literature
20 similar documents found.
1.
《Real》2001,7(2):203-217
This paper presents a VLSI architecture that implements the forward and inverse two-dimensional Discrete Wavelet Transform (DWT) to compress medical images for storage and retrieval. Lossless compression is usually required in the medical imaging field, but the word length it demands makes the area cost of the architectures in the literature too high. Thus, there is a clear need for a cost-effective architecture for lossless DWT-based compression of medical images. The data path word length has been selected to meet the lossless accuracy criteria, leading to a high-speed implementation with small chip area. The pyramid algorithm is reorganized and the algorithm locality is improved in order to obtain an efficient hardware implementation. The result is a pipelined architecture that supports single-chip implementation in VLSI technology. The implementation employs only one multiplier and 352 memory elements to compute all scales, which results in a considerably smaller chip area (45 mm²) than former implementations. The hardware design has been captured in VHDL and simulated on data taken from random images. Implemented in a 0.7 μm technology, it can compute both the forward and inverse DWT at a rate of 3.5 512×512 12-bit images/s, corresponding to a clock speed of 33 MHz. This chip is the core of a PCI board that will speed up DWT computation on desktop computers.
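To make the "lossless" requirement concrete, here is a minimal sketch of a reversible integer wavelet step. The abstract does not state which filter bank the chip uses, so the integer Haar (S) transform below is a stand-in chosen for brevity, and the function names are hypothetical. It only illustrates that a forward/inverse pair built on integer arithmetic can reconstruct 12-bit samples exactly.

```python
def s_transform_forward(samples):
    """One level of the integer Haar (S) transform on an even-length list.

    Stand-in for the paper's DWT (its actual filters are not given in the abstract).
    """
    approx, detail = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        d = a - b            # detail (high-pass) coefficient
        s = b + (d >> 1)     # approximation, equal to floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def s_transform_inverse(approx, detail):
    """Exactly undo s_transform_forward using the same integer operations."""
    samples = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        a = b + d
        samples.extend([a, b])
    return samples


row = [100, 4095, 0, 7, 2048, 2047]   # 12-bit sample values
approx, detail = s_transform_forward(row)
restored = s_transform_inverse(approx, detail)
```

Because every step is an integer addition or shift that the inverse repeats in reverse order, `restored` equals `row` with no rounding loss; the detail coefficients need one extra bit of word length, which is the kind of data path sizing the abstract refers to.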

2.
Morphological granulometries constitute one of the most useful and versatile image analysis techniques, applied to a wide range of tasks, from size distribution of objects to feature extraction and texture characterization, in industrial and research applications where high-performance instrumentation and online signal processing are required. Since granulometries are based on sequences of openings with structuring elements (SEs) of increasing size, they are computationally demanding on non-specialized hardware. In this paper, a pipelined hardware architecture for fast computation of gray-level morphological granulometries is presented, centered around two systolic-like processing arrays able to process flat SEs of different shapes and sizes. To validate the proposed scheme, the architecture was modeled, simulated and implemented on a field-programmable gate array. Implementation results show that the architecture is able to compute the particle size distribution on 512 × 512 images with flat non-rectangular SEs of up to 51 × 51 in around 60 ms at a clock frequency of 260 MHz. A speed-up of over two orders of magnitude is obtained compared to a naive software implementation. The architecture's performance compares favorably to similar hardware architectural schemes and to optimized high-performance implementations based on graphics processing units.
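The definition used above (a sequence of openings with SEs of increasing size) can be sketched in a few lines of software. This is a naive reference version, assuming flat square SEs of side 2r + 1 and a window clipped at the image border; it is not the systolic architecture from the paper, and all function names are hypothetical.

```python
def _window_filter(img, r, pick):
    """Apply min or max over a flat (2r+1) x (2r+1) window clipped at the border."""
    h, w = len(img), len(img[0])
    return [[pick(img[y][x]
                  for y in range(max(0, i - r), min(h, i + r + 1))
                  for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]


def erode(img, r):
    return _window_filter(img, r, min)


def dilate(img, r):
    return _window_filter(img, r, max)


def opening(img, r):
    """Grey-level opening: erosion followed by dilation with the same SE."""
    return dilate(erode(img, r), r)


def granulometry(img, max_r):
    """Image volume (sum of grey levels) after openings of increasing size."""
    return [sum(map(sum, opening(img, r))) for r in range(max_r + 1)]


def pattern_spectrum(volumes):
    """Volume removed between consecutive openings (the size distribution)."""
    return [a - b for a, b in zip(volumes, volumes[1:])]


img = [[9, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 5, 5, 5, 0],
       [0, 0, 5, 5, 5, 0],
       [0, 0, 5, 5, 5, 0],
       [0, 0, 0, 0, 0, 0]]
volumes = granulometry(img, 2)        # [54, 45, 0]
spectrum = pattern_spectrum(volumes)  # [9, 45]
```

In this toy image the isolated pixel disappears at the 3 × 3 opening and the 3 × 3 block at the 5 × 5 opening, which is what the two spectrum entries record. Each opening costs on the order of (2r + 1)² operations per pixel here, which hints at why the abstract calls the computation demanding on general-purpose hardware.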

3.
In this paper we present a novel hardware architecture for real-time image compression implementing a fast, searchless iterated function system (SIFS) fractal coding method. In the proposed method and corresponding hardware architecture, domain blocks are fixed to a spatially neighboring area of range blocks in a manner similar to that given by Furao and Hasegawa. A quadtree structure, covering from 32 × 32 blocks down to 2 × 2 blocks, and even to single pixels, is used for partitioning. Coding of 2 × 2 blocks and single pixels is unique among current fractal coders. The hardware architecture contains units for domain construction, zig-zag transforms, range and domain mean computation, and a parallel domain-range match capable of concurrently generating a fractal code for all quadtree levels. With this efficient, parallel hardware architecture, the fractal encoding speed is improved dramatically. Additionally, attained compression performance remains comparable to traditional search-based and other searchless methods. Experimental results, with the proposed hardware architecture implemented on an Altera APEX20K FPGA, show that the fractal encoder can encode a 512 × 512 × 8 image in approximately 8.36 ms operating at 32.05 MHz. Therefore, this architecture is seen as a feasible solution to real-time fractal image compression.
David Jeff Jackson

4.
The Euclidean distance transform (EDT) is an operation that converts a binary image consisting of black and white pixels to a representation where each pixel holds the Euclidean distance to the nearest black pixel. The EDT has many applications in computer vision and image processing. In this paper, we present a constant-time algorithm for computing the EDT of an N×N image on a reconfigurable mesh. Our algorithm has two variants. (i) If the image is initially given in an N×N mesh, one pixel per processor, our algorithm requires an N×N×N mesh for computing the EDT. (ii) If the image is given in an N×N² mesh, each row of the image in the first row of a separate N×N mesh, we can compute the EDT in the same N×N² mesh. The AT² bounds for these two variants are O(N⁴) and O(N³) respectively. The best previously known algorithm (Y. Pan and K. Li, Inform. Sci. 120 (1999), 209–221) for this problem assumes input similar to the second variant of our algorithm and runs in constant time on an N²×N² reconfigurable mesh with an AT² bound of O(N⁴). Hence both variants of our algorithm improve upon the processor complexity of the algorithm in Pan and Li (1999) by a factor of N, and the second variant improves upon the AT² complexity by a factor of N.
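The definition of the EDT given above can be written directly as a brute-force sequential reference. This sketch is not the constant-time reconfigurable-mesh algorithm; it only pins down what is being computed. It assumes that a value of 1 marks a black pixel, and the function name is hypothetical.

```python
import math


def euclidean_distance_transform(image):
    """Return, for every pixel, the Euclidean distance to the nearest black pixel.

    Assumption: image[i][j] == 1 marks a black pixel, 0 a white one.
    """
    black = [(i, j)
             for i, row in enumerate(image)
             for j, value in enumerate(row) if value == 1]
    if not black:
        raise ValueError("image contains no black pixel")
    return [[min(math.hypot(i - bi, j - bj) for bi, bj in black)
             for j in range(len(row))]
            for i, row in enumerate(image)]


image = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
distances = euclidean_distance_transform(image)
```

For an N×N image with many black pixels this reference does on the order of N⁴ distance evaluations, which is a useful yardstick when reading the processor counts and AT² bounds quoted in the abstract.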

5.
Two-dimensional processor arrays, such as CLIP4, have become familiar and effective solutions to many image processing problems in recent years. However, some applications require pixel resolutions far greater than that of any presently feasible array. To deal with this problem, a system (CLIP4S) has been constructed at University College London which scans a 512×4 processor array over a 512×512 pixel data area. The principal elements of interest in the system are the provision of data storage for sixty-four 512×512 pixel images, the hardware circuits for automatic passing of neighbourhood signals between processed sub-arrays, and the control arrangements employed.

6.
《Real》2004,10(1):31-39
This paper presents a new hardware design for neural-network-based colour image compression. The compressed image consists of a colour palette containing a few best colours and the coded image. A Kohonen map neural network is applied to construct both the colour palette and the coded image, which together form the compressed image. The Kohonen-map-based compression has linear time complexity in the size of the image. It is advantageous over traditional JPEG in colour quantization applications and in the compression of images with limited colours. The architecture of the hardware unit is based on the single instruction, multiple data methodology. The architecture has been implemented in an application-specific integrated circuit, and results show that the proposed design achieves high speed, allowing inputs at video rate for compression of images up to 512×512 in size with a low area requirement.

7.
The watershed transformation is a popular image segmentation technique for gray scale images. This paper describes a real-time image segmentation based on a parallel and pipelined watershed algorithm which is designed for hardware implementation. In our algorithm: (1) pixels in a given image are repeatedly scanned from top-left to bottom-right, and then from bottom-right to top-left, in order to achieve high performance on a pipelined circuit by simplifying memory access sequences, (2) all steps in the algorithm are executed at the same time in the pipelined circuit, (3) the amount of data that are scanned is gradually reduced as the calculation progresses by memorizing which data are modified in the previous scan, and (4) N pixels can be processed in parallel. In our current implementation on an off-the-shelf field-programmable gate array board, up to four pixels can be processed in parallel. The performance for 512 × 512 pixel images is fast enough to be the first step in real-time applications.
Tsutomu Maruyama (Corresponding author)

8.
《Real》1998,4(3):171-180
Scene matching is the problem of matching regions of two images of the same scene taken by different sensors at different times or under different viewing conditions. In this paper, we describe an efficient architecture for scene matching called SMAC (Scene Matching ArChitecture). The architecture achieves a significant speedup by exploiting a large amount of parallelism and pipelining. Such an architecture can be used to compute the exhaustive search task of hierarchical scene matching, a technique used to reduce the amount of computation involved in scene matching applications. A prototype very large scale integration (VLSI) chip implementing a scaled-down version of the proposed architecture has been designed and built. The prototype chip has been tested to be fully functional at a frequency of 50 MHz with a clock cycle of 20 ns. Based on the prototype design, it is estimated that the proposed architecture can process a 512 × 512 image with a 128 × 128 template in about 15.36 μs, which corresponds to a rate of 65K frames per second.
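The "exhaustive search task" mentioned above can be illustrated with a short software sketch. The abstract does not name the similarity measure SMAC uses, so the sum of absolute differences (SAD) below is an assumption, and `best_match` is a hypothetical name. The sketch slides the template over every valid position and keeps the best score.

```python
def best_match(image, template):
    """Exhaustively search for the template position with the lowest SAD.

    Returns (sad, row, col) of the best position. SAD is an assumed metric;
    the abstract does not say which measure the hardware computes.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            sad = sum(abs(image[y + r][x + c] - template[r][c])
                      for r in range(th) for c in range(tw))
            if best is None or sad < best[0]:
                best = (sad, y, x)
    return best


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 1, 2, 3],
         [4, 5, 6, 7]]
template = [[7, 8],
            [2, 3]]
result = best_match(image, template)   # (0, 1, 2): exact match at row 1, column 2
```

For a 512 × 512 image and a 128 × 128 template there are 385 × 385 candidate positions, each needing 16,384 pixel comparisons, which is why the search is a natural target for parallel and pipelined hardware. The quoted frame rate is consistent with the quoted latency: 1 / 15.36 μs is roughly 65,104 frames per second.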

9.
A digitized plane Π of size M is a rectangular √M × √M array of integer lattice points called pixels. A √M × √M mesh-of-processors in which each processor P_ij represents pixel (i, j) is a natural architecture to store and manipulate images in Π; such a parallel architecture is called a systolic screen. In this paper we consider a variety of computational-geometry problems on images in a digitized plane, and present optimal algorithms for solving these problems on a systolic screen. In particular, we present O(√M)-time algorithms for determining all contours of an image; constructing all rectilinear convex hulls of an image (peeling); solving the parallel and perspective visibility problem for n disjoint digitized images; and constructing the Voronoi diagram of n planar objects represented by disjoint images, for a large class of object types (e.g., points, line segments, circles, ellipses, and polygons of constant size) and distance functions (e.g., all L_p metrics). These algorithms imply O(√M)-time solutions to a number of other geometric problems: e.g., rectangular visibility, separability, detection of pseudo-star-shapedness, and optical clustering. One of the proposed techniques also leads to a new parallel algorithm for determining all longest common subsequences of two words.

10.
This paper presents an FPGA-based architecture for local tone mapping of gray scale high dynamic range images. The architecture is described in VHDL and has been synthesized using Altera Quartus tools. It achieves an operating frequency consistent with a video rate of 60 frames per second for a frame of 1,024 × 768 pixels. The proposed architecture is a modification of the nine-scale Reinhard operator. Approximations to the original Reinhard operator ensure that the operator is amenable to implementation in hardware. A peak signal-to-noise ratio study shows that our fixed-point hardware approximation produces results similar to a floating-point original.
Joan E. Carletta
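For readers unfamiliar with the Reinhard operator, the sketch below shows its simpler global form in floating point. It is not the nine-scale local operator or the fixed-point approximation described in the abstract. The key value of 0.18, the small `eps` guard and the function name are assumptions made for illustration.

```python
import math


def reinhard_global(luminance, key=0.18, eps=1e-6):
    """Global Reinhard tone mapping of a 2-D list of luminance values.

    Simplified floating-point form; the paper's operator is local and fixed-point.
    """
    flat = [v for row in luminance for v in row]
    # Log-average luminance of the scene (eps avoids log(0)).
    log_avg = math.exp(sum(math.log(eps + v) for v in flat) / len(flat))
    mapped = []
    for row in luminance:
        scaled = [key * v / log_avg for v in row]
        # Compress the scaled luminance into the range [0, 1).
        mapped.append([s / (1.0 + s) for s in scaled])
    return mapped


hdr = [[0.01, 1.0],
       [100.0, 10000.0]]
ldr = reinhard_global(hdr)
```

The division `s / (1 + s)` is the step that makes a hardware version non-trivial, and the local variant additionally needs Gaussian-filtered neighbourhoods at several scales, which is presumably where the approximations mentioned in the abstract come in.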

11.

This paper presents novel hardware with a unified architecture that computes the 4 × 4, 8 × 8, 16 × 16 and 32 × 32 two-dimensional (2-D) integer DCT for the HEVC standard using a single 1-D DCT block, with lower complexity and hardware cost. As HEVC's large transforms require a huge number of computations, especially multiplications, this paper proposes a modified algorithm that reduces the computational complexity. The goal is to ensure maximum circuit reuse during the computation while keeping the same quality of encoded videos. The hardware architecture is described in VHDL and synthesized on an Altera FPGA. Its throughput reaches a processing rate of up to 52 million pixels per second at a clock frequency of 90 MHz. An IP core is presented using the embedded video system on a programmable chip (SoPC) for implementation and validation of the proposed design. Finally, the proposed architecture has significant advantages in terms of hardware cost and improved performance compared to related work in the literature. This architecture can be used in real-time ultra-high-definition (UHD) TV coding applications.
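The idea of computing a 2-D transform with a single 1-D block rests on row-column separability, sketched below for the 4 × 4 case. The coefficient matrix is the HEVC 4 × 4 core transform; the function names are hypothetical, and the intermediate scaling shifts of a real HEVC encoder are deliberately omitted, so the outputs are unnormalized. This is not the modified algorithm proposed in the paper.

```python
# HEVC 4x4 core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct_1d(vector):
    """The single reusable 1-D block: multiply a 4-vector by the core matrix."""
    return [sum(c * v for c, v in zip(row, vector)) for row in HEVC_DCT4]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dct_2d(block):
    """2-D transform C * X * C^T built from two passes through dct_1d.

    Scaling shifts between the passes are omitted in this sketch.
    """
    first_pass = [dct_1d(row) for row in block]                   # X * C^T
    second_pass = [dct_1d(col) for col in transpose(first_pass)]  # (C * X * C^T)^T
    return transpose(second_pass)


flat_block = [[1] * 4 for _ in range(4)]
coefficients = dct_2d(flat_block)   # only the DC term, 65536, is non-zero
```

The transposition between the two passes is what lets the same 1-D datapath serve both dimensions, at the cost of a transpose buffer; reusing one block this way trades throughput for area.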


12.
This note describes a cellular array algorithm for performing a general class of geometrical operations on an n × n digital image in O(n) time.

13.
The use of local features in images has become very popular due to its promising results. Local features have shown significant benefits in a variety of applications such as object recognition, image retrieval, robot navigation, and panorama stitching. SIFT is one of the local feature methods that has shown the best results, but one of its main disadvantages is its high computational cost. To speed up this algorithm, this work proposes the design and implementation of an efficient FPGA-based hardware architecture for SIFT interest point detection. To take full advantage of the parallelism in the algorithm and to minimize the device area occupied by its hardware implementation, part of the algorithm was reformulated. The main contribution of the proposed architecture, and its main difference from the other architectures reported in the literature, is that the occupied device area remains almost constant as the number of octaves to be processed increases. The evaluations and experiments on the architecture support this claim, as well as the accuracy, repeatability, and distinctiveness of the results. Experiments also report the device area occupation and timing constraints of the hardware implementation. The architecture is able to detect interest points in a 320 × 240 image in 11 ms, which represents a speedup of 250× with respect to a software implementation.

14.
In this study, we describe a GPU-based filter for image denoising whose principle rests on Matheron's level sets theory, first introduced in 1975 but rarely implemented because of its high computation cost. We use the fact that, within a natural image, significant contours of objects coincide with parts of the image level-lines. The presented algorithm assumes a priori knowledge of the corrupting noise type and uses the polygonal level-line modeling constraint to estimate the gray level of each pixel of the denoised image by local maximum likelihood optimization. Over the 512 × 512 pixel test images, the freely available implementation of the state-of-the-art BM3D algorithm achieves mean improvements of 9.56 dB in peak signal-to-noise ratio and 36% in mean structural similarity index, in 4.3 s. Over the same images, our implementation features a high quality/runtime ratio, with mean improvements of 7.14 dB and 30% in 9 ms, which is 470 times as fast and potentially allows processing high-definition video images at 19 fps.

15.
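High-performance EBCOT coding and its VLSI architecture   Cited by: 1 (self-citations: 0, citations by others: 1)
Liu Kai, Li Yunsong, Wu Chengke. 《软件学报》 (Journal of Software), 2006, 17(7): 1553-1560
This paper proposes an EBCOT (embedded block coding with optimized truncation) coding architecture in which bit planes and coding passes are processed fully in parallel. By analyzing JPEG2000 and the EBCOT coding architectures proposed in China and abroad, it points out that the coding information of not only every bit plane but also the corresponding coding passes can be obtained simultaneously. On this basis, a block coding method with fully parallel processing of bit planes and coding passes is given, and the VLSI architecture that implements it is described in detail. Theoretical analysis and experimental results show that fully parallel processing of bit planes and coding passes requires the fewest clock cycles. The FPGA prototype system reaches a maximum clock frequency of 65 MHz and processes 512×512 grayscale images at up to 30 fps, which is sufficient for real-time processing, and the image quality meets the published JPEG2000 standard.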
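As background for the bit-plane parallelism described in this abstract, the sketch below splits a list of quantized coefficients into magnitude bit planes. It is only an illustration under stated assumptions: it does not implement EBCOT context modelling, the coding passes, or the arithmetic coder, and the function names are hypothetical.

```python
def bit_planes(coefficients, num_planes):
    """Split coefficient magnitudes into bit planes, most significant first."""
    return [[(abs(c) >> p) & 1 for c in coefficients]
            for p in range(num_planes - 1, -1, -1)]


def magnitudes_from_planes(planes):
    """Rebuild the magnitudes from the planes, to show nothing is lost."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values


coefficients = [5, -3, 0, 6]
planes = bit_planes(coefficients, 3)
# planes == [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
```

Each plane depends only on the original coefficients and not on the other planes, so all planes can be extracted at the same time. The quoted figures also imply a cycle budget: 512 × 512 × 30 is about 7.9 million samples per second, or roughly 8 clock cycles per sample at 65 MHz.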

16.
17.
This paper presents a real-time hardware implementation of a gradient domain dynamic range compression algorithm for high dynamic range (HDR) images. This technique works by calculating the gradients of the HDR image, manipulating those gradients, and reconstructing an output low dynamic range image that corresponds to the manipulated gradients. Reconstruction involves solving the Poisson equation. We propose a Poisson solver that utilizes only local information around each pixel along with special boundary conditions, and requires a small and fixed amount of hardware for any image size, with no need to buffer the entire image. The hardware implementation is described in VHDL and synthesized for a field programmable gate array (FPGA) device. The maximum operating frequency achieved is fast enough to process high dynamic range videos with one megapixel per frame at a rate of about 100 frames per second. The hardware is tested on standard HDR images from the Debevec library. The output images produced have good visual quality.

18.
A neural architecture for texture classification running on the Graphics Processing Unit (GPU) under a stream processing model is presented in this paper. Textural feature extraction is performed at three different scales. It is based on the computations that take place in the mammalian primary visual pathway and incorporates both structural and color information. Feature vector classification is performed by a fuzzy neural network that introduces pattern analysis for orientation-invariant texture recognition. Performance tests are carried out over a varying number of textures and the entire VisTex database. The intrinsic parallelism of the neural system led us to implement the whole architecture on GPUs, providing a speed-up of between 16× and 25× for classifying textures of sizes 128 × 128 and 512 × 512 px with respect to a CPU implementation. A comparison with the classification rates obtained by other methods is included and shows the strong performance of the architecture. An average classification rate of 85.2% is obtained for 167 textures of size 512 × 512 px.

19.
《Parallel Computing》1988,7(1):111-130
The principal theme herein is the direct hardware implementation on a special-purpose network of processors, the Wavefront Array Processor (WAP), of an alternate matrix procedure for the solution of linear systems Ax = b, where A is a compact dense (n × n) matrix. The method is based on the factorization of the coefficient matrix into components which are of ‘butterfly’ form, i.e., interlocking matrix quadrants, and for its implementation the concept of computational ‘dewavefronts’ is investigated.

20.
The affine transform is widely used in high-speed image processing systems. It plays an important role in applications such as optical quadrature microscopy (OQM), image stabilisation in digital cameras, and image registration. In these applications, image transformation consumes most of the execution time. Hence, accelerating the affine transform is highly desirable for high-speed imaging systems. This paper presents a pipelined architecture implementing a proposed inherently parallel algorithm for the affine transform. Accelerating the image transformation helps reduce the processing time of high-speed imaging systems. The architecture is mapped to a field-programmable gate array (FPGA), and the results show that the proposed algorithm is almost 4 times faster than the conventional algorithm while retaining image quality. Using the proposed algorithm, an image of size 1,920 × 1,080 can be transformed at a rate of 540 frames per second, and multiplane image synthesis for image stabilisation on the same image can be performed at 65 fps.
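For reference, a conventional software affine warp can be sketched as follows. The abstract does not describe its "conventional algorithm" or the proposed parallel one, so this is an assumed baseline using inverse mapping with nearest-neighbour sampling; the function name and the parameter layout are hypothetical.

```python
def affine_warp(image, matrix, fill=0):
    """Warp a 2-D list with an affine map using inverse mapping.

    matrix = (a, b, tx, c, d, ty) sends source (x, y) to
    (a*x + b*y + tx, c*x + d*y + ty). Nearest-neighbour sampling is assumed.
    """
    a, b, tx, c, d, ty = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Map each destination pixel (u, v) back into the source image.
            x = (d * (u - tx) - b * (v - ty)) / det
            y = (-c * (u - tx) + a * (v - ty)) / det
            xs, ys = round(x), round(y)
            if 0 <= xs < width and 0 <= ys < height:
                out[v][u] = image[ys][xs]
    return out


image = [[1, 2, 3],
         [4, 5, 6]]
shifted = affine_warp(image, (1, 0, 1, 0, 1, 0))   # shift right by one pixel
# shifted == [[0, 1, 2], [0, 4, 5]]
```

Every destination pixel is computed independently of the others, which is the property that makes the transform amenable to pipelining and parallel hardware. At 1,920 × 1,080 and 540 frames per second, the quoted rate corresponds to about 1.12 billion output pixels per second.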
