20 similar documents found (search time: 15 ms)
1.
Philip Dang 《Journal of Real-Time Image Processing》2006,1(1):57-62
This paper discusses the challenges of the design of real-time image and video processing systems and reviews some practical design approaches for these systems.
2.
3.
Roussopoulos N. Mark L. Sellis T. Faloutsos C. 《IEEE transactions on pattern analysis and machine intelligence》1991,17(1):22-33
Commercially available database systems do not meet the information and processing needs of design and manufacturing environments. A new generation of systems (engineering information systems) must be built to meet these needs. The architectural and computational aspects of such systems are addressed, and solutions are proposed. The authors argue that a mainframe-workstation architecture is needed to provide distributed functionality while ensuring high availability and low communication overhead, that explicit control of metaknowledge is needed to support extendibility and evolution, that large rule bases are needed to make the knowledge of the systems active, and that incremental computation models are needed to achieve the required performance of such engineering information systems.
4.
The use of a special-purpose VLSI chip for relational operations is proposed. The chip is structured like a tree with processors at the nodes, called TOP (Tree of Processors). Each node is capable of storing a data element and of performing elementary operations on elements. A table of n tuples of k elements each (e.g., a relation as defined in database theory) is stored in n subtrees of at least k nodes each, at the lowest levels of TOP. The upper portion of TOP is used for routing and bookkeeping purposes. A number of elementary operations are defined for the nodes, and high-level operations on tables are performed as combinations of the former ones. In particular, some operations for data input/output and update are discussed, and the basic operations of UNION, DIFFERENCE, PROJECTION, PRODUCT, SELECTION, and JOIN, defined in relational algebra, are studied for TOP realization. Even the most complex operations are executed in O(kn) steps, that is, in time proportional to the size of the data. This result is optimal in our system, where we assume that data are transmitted to TOP through channels of constant bandwidth.
Dedicated to Professor S. Faedo on his 70th birthday.
This research has been partially supported by Ministero della Pubblica Istruzione of Italy.
5.
Lee J.-C. Sheu B.J. Fang W.-C. Chellappa R. 《Neural Networks, IEEE Transactions on》1993,4(2):178-191
The system design of a locally connected competitive neural network for video motion detection is presented. The motion information from a sequence of image data can be determined through a two-dimensional multiprocessor array in which each processing element consists of an analog neuroprocessor. Massively parallel neurocomputing is done by compact and efficient neuroprocessors. Local data transfer between the neuroprocessors is performed by using an analog point-to-point interconnection scheme. To maintain strong signal strength over the whole system, global data communication between the host computer and neuroprocessors is carried out in a digital common bus. A mixed-signal very large scale integration (VLSI) neural chip that includes multiple neuroprocessors for fast video motion detection has been developed. Measured results of the programmable synapse and winner-takes-all circuitry are presented. Based on the measurement data, system-level analysis on a sequence of real-world images was conducted.
6.
An architecture for interchip communication among analog VLSI neural networks is proposed. Activity is encoded in a neuron's pulse emission frequency. Information is transmitted through the non-arbitered, asynchronous access of pulses to a common bus. The impact of collisions when the bus is accessed by more than one user is investigated. The information-carrying capability is assessed and the trade-off between accuracy of the transmitted information and attainable dynamic range is brought out in terms of simple global parameters that characterize the application. It is found that the proposed architecture is well suited for the kind of communication requirements associated with neural computation systems. A coding scheme aimed at pushing the system towards its theoretical performance is also presented and evaluated.
7.
Graham R. Nudd R.David Etchells Jan Grinberg 《Journal of Parallel and Distributed Computing》1985,2(1):1-29
The recent advances in integration technology for microelectronic circuitry will provide unprecedented systems capabilities in the upcoming decade. Among the most significant aspects of these systems will be their increasing “intelligence,” based on their manipulation of a variety of sensory data. Presently, the impressive advances in image understanding technology for visible, infrared, and synthetic array radar data provide added impetus to the development of truly autonomous systems. With the impending advent of these systems, it is crucially important to understand the impact that the new integration technologies will have on the necessary hardware. Furthermore, how the resulting systems may best be made to serve the requirements of each intended application must be understood. The computational requirements of autonomous systems based on image understanding are examined, and how those requirements might be satisfied by a cellular machine employing three-dimensional microelectronic technology is demonstrated.
8.
Microsystem Technologies - This paper presents FPGA implementation of retimed high speed adaptive filter structures for speech enhancement. In this work, various high speed adaptive filtering...
9.
This paper evaluates the possibility of using a general purpose superscalar architecture as the main computational engine for high performance DSP algorithms. Real-time sample rate conversion (SRC) in a software defined radio (SDR) has been taken as an example representing a class of computationally demanding DSP tasks. This scenario corresponds to digital filters operating at a high sampling rate in the intermediate frequency (IF) stage of a multi-standard wireless transceiver. However, instead of a dedicated signal processing engine, a superscalar processor is designed for SRC implementation. An iterative, SimpleScalar based architectural modeling tool has been developed to analyze various parameters of superscalar processors. Both power and performance metrics have been taken into consideration to come up with an efficient design. It has been shown that the resulting superscalar architecture can provide a fully programmable solution capable of supporting future wireless communication standards in real-time. The design methodology explored in this work can be extended to obtain efficient processor architectures for a range of other applications.
10.
A high performance digital architecture for the implementation of a non-linear image enhancement technique is proposed in this paper. The image enhancement is based on a luminance dependent non-linear enhancement algorithm which achieves simultaneous dynamic range compression, colour consistency and lightness rendition. The algorithm provides better colour fidelity, enhances less noise, prevents the unwanted luminance drop at the uniform luminance areas, keeps the ‘bright’ background unaffected, and enhances the ‘dark’ objects in ‘bright’ background. The algorithm contains a large number of complex computations and thus it requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Estimation techniques are also utilized in the hardware algorithmic design to achieve faster, simpler and more efficient architecture. The video enhancement system is implemented using Xilinx’s multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 67 Mega-pixels (Mpixels) per second.
11.
This paper presents the design of a VLSI fuzzy processor, which is capable of dealing with complex fuzzy inference systems, i.e., fuzzy inferences that include rule chaining. The architecture of the processor is based on a computational model whose main features are: the capability to cope effectively with complex fuzzy inference systems; a detection phase of the rule with a positive degree of activation to reduce the number of rules to be processed per inference; parallel computation of the degree of activation of active rules; and representation of membership functions based on α-level sets. As the fuzzy inference can be divided into different processing phases, the processor is made up of a number of stages which are pipelined. In each stage several inference processing phases are performed in parallel. Its performance is in the order of 2 MFLIPS with 256 rules, eight inputs, two chained variables, and four outputs, and 5.2 MFLIPS with 32 rules, three inputs, and one output, with a clock frequency of 66 MHz.
12.
13.
A VLSI architecture for real-time edge linking (cited 1 time in total: 0 self-citations, 1 citation by others)
A real-time algorithm and its VLSI implementation for edge linking are presented. The linking process is based on the break points' directions and the weak level points. The proposed VLSI architecture is capable of outputting one pixel of the linked edge map per clock cycle with a latency of 11n+12 clock cycles, where n is the number of pixel columns in the image.
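To put the quoted latency in perspective, the following minimal sketch evaluates 11n+12 for a few image widths. The widths and the 50 MHz clock are illustrative assumptions and do not come from the abstract.

```python
def edge_link_latency_cycles(n_columns: int) -> int:
    """Latency in clock cycles as quoted in the abstract: 11n + 12."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    return 11 * n_columns + 12


# Hypothetical clock frequency, chosen only to convert cycles to time.
ASSUMED_CLOCK_HZ = 50_000_000

for n in (256, 512, 1024):  # assumed image widths in pixels
    cycles = edge_link_latency_cycles(n)
    microseconds = cycles / ASSUMED_CLOCK_HZ * 1e6
    print(f"n={n}: {cycles} cycles, about {microseconds:.1f} us at the assumed clock")
```

The latency grows linearly with the image width, which suggests a pipeline that buffers on the order of eleven image rows before the first output pixel appears; that reading is an inference, not a statement from the abstract.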
14.
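A hardware implementation scheme for a half-pixel motion estimation algorithm is proposed. The scheme effectively increases video encoding speed, consumes few hardware resources, and reduces processor area.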
15.
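The design of a VLSI architecture based on the integer wavelet transform is proposed. The design exploits two properties of the integer wavelet transform: the optimal factorization of the wavelet matrix and its behavior under finite precision. Representing the mantissa with only a few bits causes very limited performance degradation. Based on these results, a VLSI implementation of the integer wavelet transform is proposed that achieves a fast frame rate with moderate gate complexity.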
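The abstract does not say which wavelet filter is used. As a minimal sketch of what an integer-to-integer wavelet transform looks like, the example below uses the simplest case, the integer Haar (S-) transform built from two lifting steps. The function names are hypothetical and the filter choice is an assumption made only for illustration.

```python
def s_transform_forward(x):
    """One level of the integer Haar (S-) transform using lifting steps."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    # Predict step: detail is the difference between neighbours.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: approx equals floor((even + odd) / 2), using integers only.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def s_transform_inverse(approx, detail):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    x = [0] * (2 * len(even))
    x[0::2] = even
    x[1::2] = odd
    return x


samples = [5, 7, 3, 0]
approx, detail = s_transform_forward(samples)
print(approx, detail)                        # [6, 1] [2, -3]
print(s_transform_inverse(approx, detail))   # [5, 7, 3, 0]
```

Because each lifting step adds or subtracts a rounded integer quantity that the inverse can recompute exactly, the transform is lossless even though only shifts and additions are used, which is what makes such transforms attractive for hardware.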
16.
Corner detection is a low-level feature detection operator that is of great use in image processing applications, for example, optical flow and structure from motion by image correspondence. The detection of corners is a computationally intensive operation. Past implementations of corner detection techniques have been restricted to software. In this paper we propose an efficient very large-scale integration (VLSI) architecture for detection of corners in images. The corner detection technique is based on the half-edge concept and the first directional derivative of Gaussian. Apart from the location of the corner points, the algorithm also computes the corner orientation and the corner angle and outputs the edge map of the image. The symmetrical properties of the masks are utilized to reduce the number of convolutions effectively, from eight to two. Therefore, the number of multiplications required per pixel is reduced from 1800 to 392. Thus, the proposed architecture yields a speed-up factor of 4.6 over conventional convolution architectures. The architecture uses the principles of pipelining and parallelism and can be implemented in VLSI.
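The quoted speed-up can be checked directly from the multiplication counts. The short sketch below assumes, as the abstract implies, that run time scales with the number of multiplications per pixel; the function name is hypothetical.

```python
MULTS_PER_PIXEL_CONVENTIONAL = 1800  # from the abstract
MULTS_PER_PIXEL_PROPOSED = 392       # from the abstract


def speedup(before: int, after: int) -> float:
    """Ratio of work before and after the optimisation."""
    return before / after


factor = speedup(MULTS_PER_PIXEL_CONVENTIONAL, MULTS_PER_PIXEL_PROPOSED)
print(f"speed-up factor: {factor:.1f}")  # speed-up factor: 4.6
```

The ratio is slightly higher than the factor of four suggested by the reduction from eight convolutions to two.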
17.
Yong-Ju Lee Yoo-Hyun Park Song-Woo Sok Hag-Young Kim Cheol-Hoon Lee 《The Journal of supercomputing》2009,50(2):99-120
In a disk-network scenario where expensive data transfers are the norm, such as in multimedia streaming applications, a fast-path I/O architecture is generally considered to be “good practice.” Here, I/O performance can be improved by minimizing the number of in-memory data movements and context switches. In this paper, we report the results of the design and implementation of a high-performance streaming server using cheap hardware units assembled directly on a test card (i.e., NS card). The hardware part of our architecture is open to further reuse, extension, and integration with other applications even in the case of inexpensive and/or faster hardware. From the viewpoint of software-aided I/O, we offer the Stream Disk Array (SDA) for scatter/gather-style block I/O, the EXT3NS multimedia file system for large-scale file I/O, and an interoperable streaming server for stream I/O.
18.
Modern microprocessors achieve high application performance at an acceptable level of power dissipation. The reorder buffer allows instructions executed out of order to be committed in order. It plays a key role in modern microprocessors because performance improvement techniques rely heavily on aggressive speculation to feed wider-issue, out-of-order, and deep pipelines. The reorder buffer is also particularly important for the power-performance trade-off: enlarging it improves performance, but naive scaling of the conventional reorder buffer architecture can severely increase complexity and power consumption. In this paper, we propose low-power reorder buffer techniques for contemporary microprocessors. First, the separated reorder buffer (SROB) reduces power dissipation by deferred allocation and early release. The deferred allocation delays the SROB allocation of instructions until all their data dependencies are resolved. Then, the instructions are executed in program order and they are released faster from the SROB. The result of the instruction is written into rename buffers immediately after the execution completes. Then, the result values in the rename buffer are written into the architectural register file at the commit stage. The proposed approaches provide higher resource utilization and lower power consumption.
19.
Xiaofeng Huang Huizhu Jia Binbin Cai Chuang Zhu Jie Liu Mingyuan Yang Don Xie Wen Gao 《Journal of Real-Time Image Processing》2016,12(2):285-302
The emerging intra-coding tools of the High Efficiency Video Coding (HEVC) standard can achieve up to 36 % bit-rate reduction compared to H.264/AVC, but with a significant complexity increase. The design challenges, such as data dependency and computational complexity, make it difficult to implement a hardware encoder for real-time applications. In this paper, the data dependency in HEVC intra-mode decision is first fully analyzed; it is caused by the reconstruction loop, the Most Probable Mode, the context adaption during Context-based Adaptive Binary Arithmetic Coding based rate estimation, and the Chroma derived mode. Then, several fast algorithms are proposed to remove the data dependency and to reduce the computational complexity, which include source signal based Rough Mode Decision, coarse to fine rough mode search, Prediction Mode Interlaced RDO mode decision, parallelized context adaption and Chroma-free Coding Unit (CU)/Prediction Unit (PU) decision. Finally, the parallelized VLSI architecture with CU reordering and Chroma reordering scheduling is proposed to improve the throughput. The experimental results demonstrate that the proposed intra-mode decision achieves 41.6 % complexity reduction with 4.3 % Bjontegaard Delta Rate (BDR) increase on average compared to the reference software, HM-13.0. The intra-mode decision scheme is implemented with 1571.7K gate count in 55 nm CMOS technology. The implementation results show that our design can achieve 1080p@60fps real-time processing at 294 MHz operation frequency.
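The throughput claim implies a tight cycle budget, which the sketch below works out. It assumes that 1080p means 1920 × 1080 luma samples per frame and ignores chroma samples and any blanking; these assumptions and the function name are illustrative, not taken from the abstract.

```python
def cycles_per_pixel(width: int, height: int, fps: int, clock_hz: float) -> float:
    """Average clock cycles available per pixel at a given frame rate."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# 1080p@60fps at 294 MHz, as quoted; the 1920x1080 frame size is assumed.
budget = cycles_per_pixel(1920, 1080, 60, 294e6)
print(f"{budget:.2f} cycles per luma pixel")  # 2.36 cycles per luma pixel
```

With fewer than three cycles available per pixel on average, the whole mode decision has to be heavily parallelized and pipelined, which is consistent with the emphasis the abstract places on removing data dependencies.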
20.
Multimedia Tools and Applications - High efficiency video coding (HEVC) has been standardized as a means of meeting the coding requirements of 4K (3840 × 2160)...