1.
With current low-cost high-performance workstations, application-to-application throughput is limited more by host memory bandwidth than by the cost of protocol processing. Conventional network architectures are inefficient in their use of this memory bandwidth, because data is copied several times between the application and the network. As network speeds increase further, network architectures must be developed that reduce the demands on host memory bandwidth. The authors discuss the design of a single-copy network architecture, where data is copied directly between the application buffer and the network interface. Protocol processing is performed by the host, and transport layer buffering is provided on the network interface. They describe a prototype implementation for the HP Apollo Series 700 workstation family that consists of an FDDI network interface and a modified 4.3BSD TCP/IP protocol stack, and report some early results that demonstrate twice the throughput of a conventional network architecture and significantly lower latency.
2.
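CORBA is a mature distributed computing model, and many distributed systems are built on it; however, its tight coupling limits its further development on the Internet. Web services have emerged as a new way of implementing distributed computing. This paper describes the basic technical specifications of Web services and introduces a SOAP-based technique for migrating a CORBA architecture to a Web services architecture.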
3.
4.
Klaus Gaedke Hartwig Jeschke Peter Pirsch 《The Journal of VLSI Signal Processing》1993,5(2-3):159-169
A MIMD-based multiprocessor architecture for real-time video processing applications, consisting of identical bus-connected processing elements, has been developed. Each processing element contains a RISC processor for control and data-dependent tasks and a Low Level Coprocessor for fast processing of convolution-type video processing tasks. To achieve efficient parallel processing of video input signals, the architecture supports independent processing of overlapping image segments. Running at a clock rate of 40 MHz, a single processing element provides a peak performance of 640 Mega arithmetic operations per second (MOPS). For the real-time processing of basic video processing tasks such as 3×3 FIR filtering, 8×8 2D-DCT and motion estimation, a single processing element provides a sufficient computational rate for video signals in Common Intermediate Format (CIF) at a frame rate of up to 30 Hz. For hybrid source coding of CIF video signals at a frame rate of 30 Hz, a multiprocessor system consisting of six processing elements is required. A linear speedup of the multiprocessor system compared to a single processing element is achieved. A VLSI implementation of a processing element in 0.8 µm CMOS technology is under development.
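The performance figures in the abstract above can be turned into a rough per-cycle and per-pixel budget. The following is a minimal sketch of that arithmetic, not part of the paper. The CIF resolution of 352×288 luminance pixels is an assumption added here (the abstract gives no frame size), chrominance samples are ignored, and the function names are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 40e6          # clock rate of one processing element
PEAK_OPS_PER_S = 640e6   # peak arithmetic operations per second (640 MOPS)
FRAME_RATE_HZ = 30       # maximum frame rate quoted for CIF

# Assumption (not stated in the abstract): CIF luminance resolution.
CIF_WIDTH, CIF_HEIGHT = 352, 288


def ops_per_cycle(peak_ops_per_s, clock_hz):
    """Peak arithmetic operations completed in one clock cycle."""
    return peak_ops_per_s / clock_hz


def ops_budget_per_pixel(peak_ops_per_s, width, height, frame_rate_hz):
    """Peak operations available for each luminance pixel in real time."""
    pixels_per_s = width * height * frame_rate_hz
    return peak_ops_per_s / pixels_per_s


print(ops_per_cycle(PEAK_OPS_PER_S, CLOCK_HZ))  # 16.0
print(ops_budget_per_pixel(PEAK_OPS_PER_S, CIF_WIDTH, CIF_HEIGHT, FRAME_RATE_HZ))
```

Under these assumptions a single processing element has a peak budget of a little over 200 operations per luminance pixel, which is an upper bound rather than a sustained rate.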
5.
Techniques for enhancing real-time CORBA quality of service  Cited by: 5 (self-citations: 0, citations by others: 5)
Pyarali I. Schmidt D.C. Cytron R.K. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2003,91(7):1070-1085
End-to-end predictability of remote operations is essential for many fixed-priority distributed real-time and embedded (DRE) applications, such as command and control systems, manufacturing process control systems, large-scale distributed interactive simulations, and testbeam data acquisition systems. To enhance predictability, the Real-time CORBA specification defines standard middleware features that allow applications to allocate, schedule, and control key CPU, memory, and networking resources necessary to ensure end-to-end quality of service support. This paper provides two contributions to the study of Real-time CORBA middleware for DRE applications. First, we identify potential problems with ensuring predictable behavior in conventional middleware by examining the end-to-end critical code path of a remote invocation and identifying sources of unbounded priority inversions. Experimental results then illustrate how the problems we identify can yield unpredictable behavior in conventional middleware platforms. Second, we present design techniques for ensuring real-time quality of service in middleware. We show how middleware can be redesigned to use nonmultiplexed resources to eliminate sources of unbounded priority inversion. The empirical results in this paper were obtained using TAO, a widely used, open-source DRE middleware compliant with the Real-time CORBA specification.
6.
This paper contributes a distributed packet controller which reduces queueing to a single stage in two-stage packet switches. Software and neural network based controllers are described. Simulations under a range of traffic conditions for a 1024×1024 switch size show that the simplest architecture has the best performance.
7.
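A high-performance implementation architecture for the two-dimensional integer inverse transform in AVS  Cited by: 1 (self-citations: 0, citations by others: 1)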
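This paper proposes a high-performance hardware implementation of the integer inverse transform in the AVS video standard. The design uses two one-dimensional inverse transform cores and four 16×16 dual-port SRAMs. By suitably controlling how the SRAMs are read and written, it avoids pre-processing and post-processing of the data and reduces the pipeline depth. During the column transform, the order of the data operations is changed so that the four dual-port SRAMs do not limit the computation speed. The architecture needs only 37 clock cycles to process an 8×8 data block and, compared with conventional implementations at the same computation speed, saves 28% in area. Experiments show that the architecture is suitable for HDTV codecs that adopt the AVS standard.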
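The row-column structure that two one-dimensional transform cores imply can be sketched as follows. This is an illustration of the general separable technique only, under stated assumptions: the 2×2 matrix `PLACEHOLDER_MATRIX` is hypothetical and is not the AVS transform matrix, the scaling and rounding steps of a real codec are omitted, and the `transpose` helper merely stands in for the SRAM read/write ordering described in the abstract.

```python
def transform_1d(matrix, vector):
    """Multiply an N x N integer matrix by a length-N vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(block):
    """Swap rows and columns of a square block."""
    return [list(column) for column in zip(*block)]


def separable_transform_2d(matrix, block):
    """Compute matrix * block * matrix^T with two 1-D passes."""
    # Pass 1: apply the 1-D transform to every row of the block.
    rows_done = [transform_1d(matrix, row) for row in block]
    # Transposition between the passes (done by memory addressing in hardware).
    columns = transpose(rows_done)
    # Pass 2: apply the same 1-D transform to every column.
    columns_done = [transform_1d(matrix, column) for column in columns]
    return transpose(columns_done)


# Hypothetical 2x2 matrix, chosen only to keep the example small.
PLACEHOLDER_MATRIX = [[1, 1], [1, -1]]
example_block = [[1, 2], [3, 4]]
print(separable_transform_2d(PLACEHOLDER_MATRIX, example_block))  # [[10, -2], [-4, 0]]
```

Because the two passes use the same one-dimensional operation, a hardware design can reuse identical cores for the row and column stages; how the paper schedules them is not stated in the abstract.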
8.
In this paper, we have analyzed the register complexity of direct-form and transpose-form structures of FIR filter and explored the possibility of register reuse. We find that the direct-form structure involves significantly fewer registers than the transpose-form structure, and it allows register reuse in parallel implementation. We analyze further the LUT consumption and other resources of DA-based parallel FIR filter structures, and find that the input delay unit, coefficient storage unit and partial product generation unit are also shared besides LUT words when multiple filter outputs are computed in parallel. Based on these findings, we propose a design approach and use it to derive a DA-based architecture for reconfigurable block-based FIR filter, which is scalable for larger block-sizes and higher filter-lengths. Interestingly, the number of registers of the proposed structure does not increase proportionately with the block-size. This is a major advantage for area-delay and energy efficient high-throughput implementation of reconfigurable FIR filters of higher block-sizes. Theoretical comparison shows that the proposed structure for block-size 8 and filter-length 64 involves 60% more flip-flops, 6.2 times more adders, 3.5 times more AND-OR gates, and offers 8 times higher throughput. ASIC synthesis result shows that the proposed structure for block-size 8 and filter-length 64 involves 1.8 times less area-delay product (ADP) and energy per sample (EPS) than the existing design, and it can support 8 times higher throughput. The proposed structure for block sizes 4 and 8, respectively, consumes 38% and 50% less power than the existing structure for the same throughput rates on average for different supply voltages.
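The difference between the two structures named in the abstract above lies in what the registers hold. The sketch below is a plain software model, not the DA-based architecture of the paper: in the direct form the registers hold delayed input samples, while in the transpose form they hold partial sums. The function names and the test signal are illustrative.

```python
def fir_direct(coeffs, samples):
    """Direct form: the delay line stores past input samples."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        # Shift the newest input sample into the delay line.
        delay = [x] + delay[:-1]
        out.append(sum(h * d for h, d in zip(coeffs, delay)))
    return out


def fir_transpose(coeffs, samples):
    """Transpose form: the registers store partial sums."""
    n = len(coeffs)
    state = [0] * (n - 1)  # state[k] is the partial sum feeding tap k
    out = []
    for x in samples:
        products = [h * x for h in coeffs]
        out.append(products[0] + (state[0] if state else 0))
        # Each register takes its tap product plus the next register's value.
        state = [
            products[k + 1] + (state[k + 1] if k + 1 < n - 1 else 0)
            for k in range(n - 1)
        ]
    return out


coeffs = [1, 2, 3]
samples = [1, 0, 0, 2]
print(fir_direct(coeffs, samples))     # [1, 2, 3, 2]
print(fir_transpose(coeffs, samples))  # [1, 2, 3, 2]
```

Both forms produce the same output. In hardware, partial sums are generally wider than input samples, which is one reason the register cost of the two forms can differ.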
9.
In this paper, we propose a hardware (H/W) architecture to find disparities for stereo matching in real time. After analyzing the arithmetic characteristics of stereo matching, we propose a new calculation method that reuses the intermediate results to minimize the calculation load and memory access. From this, we propose a stereo matching calculation cell and a new H/W architecture. Finally, we propose a new stereo matching processor. The implemented H/W can operate at a clock frequency of at least 250 MHz in the FPGA (field programmable gate array) environment and produce about 120 disparity images per second for HD stereo images.
10.
This paper addresses how to support both real-time and non-real-time communication services in a wireless LAN with dynamic time-division duplexed (D-TDD) transmission. With D-TDD, a frequency channel is time-shared for both downlink and uplink transmissions under the dynamic access control of the base station. The base station (1) handles uplink transmissions by polling mobiles in a certain order determined on a per-connection (per-message) basis for transmitting real-time (non-real-time) traffic from mobiles and (2) schedules the transmission of downlink packets. To handle location-dependent, time-varying, and bursty errors, we adopt channel-state prediction, transmission deferment, and retransmission. We consider the problems of scheduling and multiplexing downlink packet transmissions, and polling mobiles for uplink transmissions depending on the channel state. We also establish conditions necessary to admit each new real-time connection by checking if the connection's delivery-delay bound can be guaranteed as long as the channel stays in good condition without compromising any of the existing guarantees. Last, the performance of the proposed protocol is evaluated to demonstrate how the protocol works and to study the effects of various parameters of the protocol.
11.
Hernandez O.J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2006,14(2):111-121
Image feature separation is a crucial step for image segmentation in computer vision systems. One efficient and powerful approach is the unsupervised clustering of the resulting data set; however, it is a very computationally intensive task. This paper presents a high-performance architecture for unsupervised data clustering. This architecture is suitable for VLSI implementations. It exploits paradigms of massive connectivity like those inspired by neural networks, and parallelism and functionality integration that can be afforded by emerging nanometer semiconductor technologies. By utilizing a "global-quasi-systolic, local-hyper-connected" architectural approach, the hardware can process real-time DVD-quality video at the highest rate allowed by the MPEG-2 standard. The architecture is a realization of the histogram peak-climbing clustering algorithm, and it is the first special-purpose architecture that has been proposed for this important problem. The architecture has also been prototyped using a Xilinx field programmable gate array (FPGA) development environment. Although this paper discusses a computer vision application, the architecture presented can be utilized in the acceleration of the clustering process of any type of high-dimensionality data.
12.
A case for end system multicast  Cited by: 11 (self-citations: 0, citations by others: 11)
Yang-hua Chu Rao S.G. Seshan S. Hui Zhang 《Selected Areas in Communications, IEEE Journal on》2002,20(8):1456-1471
The conventional wisdom has been that Internet protocol (IP) is the natural protocol layer for implementing multicast related functionality. However, more than a decade after its initial proposal, IP multicast is still plagued with concerns pertaining to scalability, network management, deployment, and support for higher layer functionality such as error, flow, and congestion control. We explore an alternative architecture that we term end system multicast, where end systems implement all multicast related functionality including membership management and packet replication. This shifting of multicast support from routers to end systems has the potential to address most problems associated with IP multicast. However, the key concern is the performance penalty associated with such a model. In particular, end system multicast introduces duplicate packets on physical links and incurs larger end-to-end delays than IP multicast. We study these performance concerns in the context of the Narada protocol. In Narada, end systems self-organize into an overlay structure using a fully distributed protocol. Further, end systems attempt to optimize the efficiency of the overlay by adapting to network dynamics and by considering application level performance. We present details of Narada and evaluate it using both simulation and Internet experiments. Our results indicate that the performance penalties are low both from the application and the network perspectives. We believe the potential benefits of transferring multicast functionality from routers to end systems significantly outweigh the performance penalty incurred.
13.
A multimedia communication system includes both the communication protocols used to transport the real-time data and the distributed computing system (DCS) within which any applications using the protocols must execute. The architecture presented attempts to integrate these communications protocols with the DCS in a smooth fashion in order to ease the writing of multimedia applications. Two issues are identified as being essential to the success of this integration: the synchronization of related real-time data streams, and the management of heterogeneous multimedia hardware. The synchronization problem is tackled by defining explicit synchronization properties at the presentation level and by providing control and synchronization operations within the DCS which operate in terms of these properties. The heterogeneity problems are addressed by separating the data transport semantics (protocols themselves) from the control semantics (protocol interfaces).
14.
D. Chaikalis N.P. Sgouros D. Maroulis 《Journal of Visual Communication and Image Representation》2010,21(1):9-16
In this paper, we present a hardware architecture for real-time three-dimensional (3D) surface model reconstruction from Integral Images (InIms). The proposed parallel digital system realizes a number of computationally heavy calculations in order to achieve real-time operation. The processing elements are deployed in a systolic architecture and operate on multiple image areas simultaneously. Moreover, memory organization allows random access to image data and copes with the increased processing throughput of the system. Operating results reveal that the proposed architecture is able to process 3D data at a real-time rate. The proposed system can handle large sized InIms in real time and outputs 3D scenes of enhanced depth and detailed texture, which apply to emerging 3D applications.
15.
16.
《Electron Devices, IEEE Transactions on》1985,32(11):2232-2237
This paper describes the VLSI for high-performance graphic control which utilizes two-level multiprocessor architecture. The VLSI chip is constructed of multiprocessor modules processing in parallel, and each processor module is constructed of multiexecutors using pipeline processing. This dedicated VLSI chip, designated as advanced CRT controller (ACRTC), has three processor modules, each independently controlling drawing, display, and timing. The graphic architecture of the drawing processor, which controls graphic drawing, is described. A high-level graphic language based on an X-Y coordinate system is adopted. High-speed drawing is realized (drawing rate is 500 ns/pixel for drawing a line) by pipeline processing with three executors, the logical address executor, physical address executor, and color data executor.
17.
Jung S. Thewes R. Scheiter T. Goser K.F. Weber W. 《Solid-State Circuits, IEEE Journal of》1999,34(7):978-984
A CMOS fingerprint sensor architecture with embedded cellular logic for image processing is presented. The system senses a fingerprint image with a capacitive technique and performs several image-processing algorithms, including thinning the ridges of the fingerprint structure and encoding it to its characteristic features. Image processing is achieved by application of hexagonal local operators implemented in pixel-parallel mixed neuron-MOS/CMOS logic circuits. The massive parallelism of the architecture leads to a very low power dissipation. Results of simulations and measurements on a demonstrator chip in 0.65-μm double-poly standard CMOS technology are shown. The approach is well suited for person-identification applications, especially in small and low-cost portable systems, such as smart cards.
18.
Digital image coding using vector quantization (VQ) based techniques provides low bit rates and high quality coded images, at the expense of intensive computational demands. The computational requirement due to the encoding search process has hindered application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication-free distortion measure and reduction of the required memory per code vector. Running at a clock rate of 25 MHz, the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refresh rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel.
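The frame size, frame rate, clock rate, and coding rate quoted in the abstract above imply a sustained pixel rate, an output bit rate, and a clock budget per pixel. The following sketch works through that arithmetic using only the numbers in the abstract; the function names are illustrative and the results are derived here, not reported by the paper.

```python
# Figures quoted in the abstract.
ROWS, COLUMNS = 480, 768   # pixels per frame
FRAMES_PER_S = 30          # refresh rate
BITS_PER_PIXEL = 1.12      # fixed coding rate
CLOCK_HZ = 25e6            # hardware clock rate


def pixel_rate(rows, columns, frames_per_s):
    """Pixels that must be encoded each second."""
    return rows * columns * frames_per_s


def output_bit_rate(pixels_per_s, bits_per_pixel):
    """Coded output rate in bits per second."""
    return pixels_per_s * bits_per_pixel


def clock_cycles_per_pixel(clock_hz, pixels_per_s):
    """Clock cycles available to encode one pixel in real time."""
    return clock_hz / pixels_per_s


rate = pixel_rate(ROWS, COLUMNS, FRAMES_PER_S)
print(rate)                                    # 11059200
print(output_bit_rate(rate, BITS_PER_PIXEL))   # about 12.4 Mb/s
print(clock_cycles_per_pixel(CLOCK_HZ, rate))  # a little over 2 cycles
```

With only about two clock cycles per pixel, the codebook search cannot be done serially, which is consistent with the abstract's emphasis on a concurrent chip set.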
19.
Recently, Siamese-based methods have made a breakthrough in the visual tracking field. However, the existing trackers still cannot take full advantage of the deep features. In this work, we improve the performance of Siamese trackers by complementary learning with different types of matching features. Specifically, a Matching Activation Network (MAN) is first designed to highlight the matching regions of the search image given a template. Since only sparse parts of feature maps contribute to the matching result, an important design choice is to emphasize the weak-matching features by erasing the strong-matching ones and learn complementary classifiers from both types of features. Then we propose a novel complementary region proposal network (CoRPN) that takes complementary features as inputs and whose outputs complement each other and are fused to improve the performance. Experiments show that our proposed tracker achieves leading performance on five tracking datasets while retaining real-time speed.