Similar documents (20 found)
1.
A VLSI architecture for real-time edge linking   (cited by 1: self-citations 0, citations by others 1)
A real-time algorithm for edge linking and its VLSI implementation are presented. The linking process is based on the directions of the break points and on the weak-level points. The proposed VLSI architecture outputs one pixel of the linked edge map per clock cycle with a latency of 11n+12 clock cycles, where n is the number of pixel columns in the image.

2.
Corner detection is a low-level feature detection operator that is of great use in image processing applications, for example, optical flow and structure from motion by image correspondence. The detection of corners is a computationally intensive operation. Past implementations of corner detection techniques have been restricted to software. In this paper we propose an efficient very large-scale integration (VLSI) architecture for detection of corners in images. The corner detection technique is based on the half-edge concept and the first directional derivative of Gaussian. Apart from the location of the corner points, the algorithm also computes the corner orientation and the corner angle and outputs the edge map of the image. The symmetrical properties of the masks are utilized to reduce the number of convolutions effectively, from eight to two. Therefore, the number of multiplications required per pixel is reduced from 1800 to 392. Thus, the proposed architecture yields a speed-up factor of 4.6 over conventional convolution architectures. The architecture uses the principles of pipelining and parallelism and can be implemented in VLSI.

3.
The organization and operation of a semantic network array processor (SNAP) are described. The architecture consists of an array of identical cells, each containing a content-addressable memory, microprogram control, and a communication unit. Each cell is dedicated to one node of the semantic network and its associated relations. The array can perform global associative functions under the supervision of an outside controller. In addition, each cell is equipped with the necessary logic to perform individual functions. A set of primitive instructions was carefully chosen. Some of the applications discussed include pattern search operations, production systems, and inferences. A LISP simulator was developed for this architecture, and some simulation results are presented.

4.
The Applicative Programming System Architecture contains a novel Data Structure Memory (DSM) which supports fast access operations on compact linear data structures. Several problems that arise in implementations of applicative and functional programming languages can be solved efficiently using special data representations on the DSM. Each memory word in the DSM contains a very small local processor, and there is also a tree-structured communications network within the DSM. Therefore the DSM is a massively parallel SIMD machine. This paper describes a VLSI implementation of the DSM architecture and compares its performance with implementations on a conventional sequential computer and the NASA Massively Parallel Processor.

5.
The recent advances in integration technology for microelectronic circuitry will provide unprecedented systems capabilities in the upcoming decade. Among the most significant aspects of these systems will be their increasing “intelligence,” based on their manipulation of a variety of sensory data. Presently, the impressive advances in image understanding technology for visible, infrared, and synthetic array radar data provide added impetus to the development of truly autonomous systems. With the impending advent of these systems, it is crucially important to understand the impact that the new integration technologies will have on the necessary hardware. Furthermore, how the resulting systems may best be made to serve the requirements of each intended application must be understood. The computational requirements of autonomous systems based on image understanding are examined, and how those requirements might be satisfied by a cellular machine employing three-dimensional microelectronic technology is demonstrated.

6.
This paper presents the design of a VLSI fuzzy processor, which is capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a phase that detects the rules with a positive degree of activation, to reduce the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of pipelined stages. In each stage several inference processing phases are performed in parallel. Its performance is on the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output, at a clock frequency of 66 MHz.

7.
8.
The use of a special-purpose VLSI chip for relational operations is proposed. The chip, called TOP (Tree of Processors), is structured like a tree with processors at the nodes. Each node is capable of storing a data element and of performing elementary operations on elements. A table of n tuples of k elements each (e.g., a relation as defined in database theory) is stored in n subtrees of at least k nodes each, at the lowest levels of TOP. The upper portion of TOP is used for routing and bookkeeping purposes. A number of elementary operations are defined for the nodes, and high-level operations on tables are performed as combinations of these. In particular, some operations for data input/output and update are discussed, and the basic operations of UNION, DIFFERENCE, PROJECTION, PRODUCT, SELECTION, and JOIN, defined in relational algebra, are studied for TOP realization. Even the most complex operations are executed in O(kn) steps, that is, in time proportional to the size of the data. This result is optimal in our system, where we assume that data are transmitted to TOPs through channels of constant bandwidth. Dedicated to Professor S. Faedo on his 70th birthday. This research has been partially supported by Ministero della Pubblica Istruzione of Italy.

9.
An application-specific architecture for the parallel calculation of the decimation-in-time, radix-2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with N = 2^n data items is considered as input. The system calculates the FHT and the FFT in n and n+1 stages, respectively. The modular and regular parallel architecture is based on a constant geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm in VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory based on first-in, first-out (FIFO) queues facilitates a systolic data flow and permits a direct implementation of the complex data movements and address sequences of the transforms. This is accomplished by means of simple multiplexing operations, using hardwired control. The total calculation time is (N log2 N)/(4Q) cycles for the FHT and N(1 + log2 N)/(4Q) cycles for the FFT, where Q is the number of processors (Q = 2^q, Q ≤ N/4).
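
The permutation and the cycle counts quoted in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define the perfect unshuffle, so the usual definition is assumed (even-indexed items move to the first half, odd-indexed items to the second half), and the function names are hypothetical, not taken from the paper.

```python
def perfect_unshuffle(items):
    """Assumed standard definition: even-indexed items first, then odd-indexed."""
    return items[0::2] + items[1::2]


def transform_cycles(N, Q, fft=False):
    """Cycle count as stated in the abstract.

    FHT: N*log2(N)/(4Q) cycles; FFT: N*(1+log2(N))/(4Q) cycles,
    with N = 2^n data items and Q = 2^q processors, Q <= N/4.
    """
    n = N.bit_length() - 1
    if N < 4 or N != 1 << n:
        raise ValueError("N must be a power of two, at least 4")
    if Q < 1 or Q & (Q - 1) or Q > N // 4:
        raise ValueError("Q must be a power of two with Q <= N/4")
    stages = n + 1 if fft else n  # the FFT needs one extra stage
    return N * stages // (4 * Q)


if __name__ == "__main__":
    print(perfect_unshuffle(list(range(8))))
    print(transform_cycles(1024, 16), transform_cycles(1024, 16, fft=True))
```

With N = 1024 and Q = 16 the formulas give 160 cycles for the FHT and 176 for the FFT; the division is exact because N/(4Q) is an integer whenever Q ≤ N/4 and both are powers of two.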

10.
In this paper, we consider the circuit partitioning problem, which is a fundamental problem in computer-aided design of very large-scale-integrated circuits. We formulate the problem as an equivalent constrained integer programming problem by constructing an auxiliary function. A global search method, entitled the dynamic convexized method, is developed for the integer programming problem. We modify the Fiduccia–Mattheyses (FM) algorithm, which is a fundamental partitioning algorithm for the circuit partitioning problem, to minimize the auxiliary function. We show both computationally and theoretically that our method can escape successfully from previous discrete local minimizers by taking increasing values of a parameter. Experimental results on ACM/SIGDA and ISPD98 benchmarks show up to 58% improvement over the well-known FM algorithm in terms of the best cutsize. Furthermore, we integrate the algorithm with the state-of-the-art practical multilevel partitioner MLPart. Experiments on the same set of benchmarks show that the solutions obtained in this way are 3–7% better than those of MLPart.

11.
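To reduce memory consumption in the two-dimensional wavelet transform while increasing circuit processing speed, a two-dimensional parallel VLSI architecture is proposed. By fully exploiting the relationship between the row transform and the column transform in the two-dimensional transform, the parallel data-scan input scheme of the row-transform and column-transform kernels is optimized, reducing the intermediate storage of the 9/7 wavelet transform to 4N. In addition, a pipelining technique based on the flipping structure shortens the critical path of the circuit to one multiplier delay, effectively increasing processing speed, and an optimization that merges the scaling circuits reduces the number of multipliers to 10, effectively reducing hardware resource consumption.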

12.
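A hardware implementation scheme for a half-pixel motion estimation algorithm is proposed. The scheme effectively increases video encoding speed, consumes few hardware resources, and reduces the processor area.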

13.
The CORDIC algorithm, originally proposed using nonredundant radix-2 arithmetic, has been refined in terms of throughput and latency with the introduction of redundant arithmetic and higher-radix techniques. In this paper, we propose a pipelined architecture using signed-digit arithmetic for the VLSI-efficient implementation of the rotational radix-4 CORDIC algorithm, eliminating the z path completely. A detailed comparison of the proposed architecture with the available radix-2 architectures shows the latency and hardware improvement. The proposed architecture achieves a latency improvement over the previously proposed radix-4 architecture with a relatively small hardware overhead. The proposed architecture for 16-bit precision was implemented using VHDL, and extensive simulations have been performed to validate the results. The functionally simulated netlist has been synthesized for 16-bit precision with a 90 nm CMOS technology library, and the area-time measures are provided. This architecture was also implemented using Xilinx ISE 9.1 software and a Virtex device.
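
For orientation, the conventional nonredundant radix-2 rotation-mode CORDIC that this abstract takes as its starting point can be sketched as below. This is not the proposed radix-4 signed-digit architecture: it is the textbook floating-point form, it keeps the z (angle) path that the paper eliminates, and the function name and iteration count are illustrative assumptions.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) with radix-2 rotation-mode CORDIC.

    Converges for |angle| up to roughly 1.74 rad (the sum of the
    elementary angles atan(2^-i)).
    """
    # Elementary rotation angles and the accumulated CORDIC gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Pre-scale x by 1/gain so the result has unit magnitude.
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward z = 0
        shift = 2.0 ** -i            # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]           # the z path: residual angle
    return x, y


if __name__ == "__main__":
    c, s = cordic_rotate(0.5)
    print(c, math.cos(0.5))
    print(s, math.sin(0.5))
```

Each iteration uses only shifts and additions in a hardware realization, which is why throughput and latency per iteration are the quantities that redundant and higher-radix variants try to improve.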

14.
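A VLSI implementation based on the integer wavelet transform   (cited by 1: self-citations 1, citations by others 0)
The design of a VLSI architecture based on the integer wavelet transform is proposed. The design draws on two properties of the integer wavelet transform, namely the optimal factorization of the wavelet matrix and the effect of finite-precision representation: representing the mantissa with only a few bits causes very limited performance degradation. Based on these results, a VLSI implementation of the integer wavelet transform is proposed that achieves a fast frame rate with moderate gate complexity.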

15.
CARMEL-2 is a high-performance VLSI uniprocessor, tuned for Flat Concurrent Prolog (FCP). CARMEL-2 shows an almost 5-fold speedup over its predecessor, CARMEL-1, and it achieves 2,400 KLIPS executing append. This high execution rate was gained as a result of an optimized design, based on an extensive architecture-oriented execution analysis of FCP and the lessons learned with CARMEL-1. CARMEL-2 is a RISC processor in its character and performance. The instruction set includes only 29 carefully selected instructions. The 10 special instructions, the prudent implementation and pipeline scheme, as well as sophisticated mechanisms such as intelligent dereference, distinguish CARMEL-2 as a RISC processor for FCP.

16.
Despite more than 40 years of research, motion estimation is still considered an emerging field, one that is especially relevant today because of its vast utility for real-world applications. Currently, even the best bio-inspired algorithms lack certain characteristics that are readily found in nature, for example in mammals. Furthermore, the vast computational resources required are not usually affordable in real-time applications. We present here a useful framework for building bio-inspired systems in real-time environments while reducing computational complexity. A complete quantization study of a neuromorphic robust optical flow architecture is performed, using properties found in the cortical motion pathway. This architecture is designed for VLSI systems. An extensive analysis is performed to avoid compromising the viability and the robustness of the final system. A set of simulations and techniques that can be helpful for designing real-time artificial vision embedded systems and, specifically, gradient-based optical flow systems is shown. This work includes the final error results, resource usage, and performance data.

17.
In this paper we describe a VLSI system architecture for high-speed synthesis of 3D images composed of diffusely reflective surfaces. The system consists of two loosely coupled sub-systems. The first sub-system computes the form-factor matrix F. The form factors are computed by an efficient ray-tracing algorithm. The second sub-system, a multiprocessor Gauss-Seidel iterative system solver, solves the sparse system of radiosity equations (I - F)b = e. This work has been supported by the Dutch National Applied Science Foundation under grant STW DEL 47.0643. This article is the second part of the Special Feature on Graphics Hardware, guest-edited by T. Nakamura. (For the first part, see The Visual Computer 4:175–221.)
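
The Gauss-Seidel step named in this abstract can be illustrated with a small sequential sketch. It is a minimal single-processor version under stated assumptions, not the multiprocessor solver of the paper: F is taken as a dense list of lists that already includes any surface reflectance, its rows are assumed to sum to less than 1 so the iteration converges, and the function name and sweep count are hypothetical.

```python
def gauss_seidel_radiosity(F, e, sweeps=50):
    """Iteratively solve (I - F) b = e for the radiosity vector b.

    Row i of the system reads
        (1 - F[i][i]) * b[i] - sum_{j != i} F[i][j] * b[j] = e[i],
    so each update solves that row for b[i], using the newest values
    of the other unknowns (the Gauss-Seidel rule).
    """
    n = len(e)
    b = list(e)  # start from the emission vector
    for _ in range(sweeps):
        for i in range(n):
            gathered = sum(F[i][j] * b[j] for j in range(n) if j != i)
            b[i] = (e[i] + gathered) / (1.0 - F[i][i])
    return b


if __name__ == "__main__":
    # Two patches that each pass half of their radiosity to the other;
    # only the first patch emits light.
    F = [[0.0, 0.5],
         [0.5, 0.0]]
    e = [1.0, 0.0]
    print(gauss_seidel_radiosity(F, e))
```

For this two-patch example the exact solution is b = (4/3, 2/3), which the iteration approaches geometrically.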

18.
Images taken in low-light environments frequently suffer from weak luminosity and low contrast. One of the vital units of an image contrast enhancement accelerator is histogram computation. In this paper, a new VLSI structure is developed for contrast enhancement using optimized memory-based histogram analysis. Initially, a comparator-less parallel rank-ordering filter is proposed to remove noise in the pre-processing stage. Then, the pipelining and parallel processing of the memory-based histogram analysis is enhanced by introducing an enhanced transient search optimization (ETSO) approach that selects suitable data for the data comparison process in the processing elements (PEs). In addition, flexible digital comparators (FDCs) and pipelined structures for the bilinear interpolation and reciprocal units are introduced in the proposed contrast enhancement accelerator to save area and power. Also, the histogram memory unit is designed with a self-configurable dual data flow, so that the appropriate data flow can be selected in the contrast enhancement application. Finally, the stated VLSI structures for both the histogram computation and enhancement units are implemented in an FPGA to meet their speed and resource constraints. The key results of the proposed design are a delay of 5.256 ns and a power consumption of 0.289 W, both lower than those of existing accelerators.

19.
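An efficient VLSI architecture for fractional-pixel motion estimation is proposed. By interpolating the entire search window in parallel and using a data-routing structure to distribute the data stream for storage, the design saves storage space while effectively reducing the number of memory accesses and improving data utilization. This addresses the large data storage requirement, the frequent search-window memory accesses, and the long search time of fractional-pixel motion estimation. Logic synthesis was performed with Synopsys DC in a SMIC 0.13 μm process. At a clock frequency of 300 MHz, the design processes 1080p video at up to 65 frame/s.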

20.
This paper discusses the challenges of the design of real-time image and video processing systems and reviews some practical design approaches for these systems.
