首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A high-performance network architecture for a PA-RISC workstation   总被引:1,自引:0,他引:1  
With current low-cost high-performance workstations, application-to-application throughput is limited more by host memory bandwidth than by the cost of protocol processing. Conventional network architectures are inefficient in their use of this memory bandwidth, because data is copied several times between the application and the network. As network speeds increase further, network architectures must be developed that reduce the demands on host memory bandwidth. The authors discuss the design of a single-copy network architecture, where data is copied directly between the application buffer and the network interface. Protocol processing is performed by the host, and transport layer buffering is provided on the network interface. They describe a prototype implementation for the HP Apollo Series 700 workstation family that consists of an FDDI network interface and a modified 4.3BSD TCP/IP protocol stack, and report some early results that demonstrate twice the throughput of a conventional network architecture and significantly lower latency  相似文献   

2.
CORBA是一种已经成熟的分布式计算模型,很多分布式系统均构建在此结构之上,但,其紧密耦合特性限制了其在Internet上的进一步发展,Web服务成为一种分布式计算的新的实现方法.现阐述了Web服务的基本技术规范,介绍了一种基于SOAP的CORBA架构向Web服务架构迁移技术。  相似文献   

3.
于绍娜  袁杰  胥桂仙 《电子测试》2008,(3):34-36,61
本文阐述了CORBA技术的基本原理,讨论了CORBA技术应用于分布式数据库的设计方法,在此基础上提出了一种基于CORBA的分布式考试系统的设计及实现方法.该系统采用三层结构,便于负载平衡及各层的独立升级.  相似文献   

4.
A MIMD based multiprocessor architecture for real-time video processing applications consisting of identical bus connected processing elements has been developed. Each processing element contains a RISC processor for controlling and data-dependent tasks and a Low Level Coprocessor for fast processing of convolution-type video processing tasks. To achieve efficient parallel processing of video input signals, the architecture supports independent processing of overlapping image segments. Running at a clock rate of 40 MHz, a single processing element provides a peak performance of 640 Mega arithmetic operations per second (MOPS). For the real-time processing of basic video processing tasks like 3×3 FIR-filter, 8×8 2D-DCT and motion estimation, a single processing element provides a sufficient computational rate for video signals with Common Intermediate Format (CIF) at a frame rate up to 30 Hz. For hybrid source coding of CIF video signals at a frame rate of 30 Hz a multiprocessor system consisting of six processing elements is required. A linear speedup of the multiprocessor system compared to a single processing element is achieved. A VLSI implementation of a processing element in 0.8 µm CMOS technology is under development.  相似文献   

5.
Techniques for enhancing real-time CORBA quality of service   总被引:5,自引:0,他引:5  
End-to-end predictability of remote operations is essential for many fixed-priority distributed real-time and embedded (DRE) applications, such as command and control systems, manufacturing process control systems, large-scale distributed interactive simulations, and testbeam data acquisition systems. To enhance predictability, the Real-time CORBA specification defines standard middleware features that allow applications to allocate, schedule, and control key CPU, memory, and networking resources necessary to ensure end-to-end quality of service support. This paper provides two contributions to the study of Real-time CORBA middleware for DRE applications. First, we identify potential problems with ensuring predictable behavior in conventional middleware by examining the end-to-end critical code path of a remote invocation and identifying sources of unbounded priority inversions. Experimental results then illustrate how the problems we identify can yield unpredictable behavior in conventional middleware platforms. Second, we present design techniques for ensuring real-time quality of service in middleware. We show how middleware can be redesigned to use nonmultiplexed resources to eliminate sources of unbounded priority inversion. The empirical results in this paper are conducted using TAO, which is widely used and open-source DRE middleware compliant with the Real-time CORBA specification.  相似文献   

6.
This paper contributes a distributed packet controller which reduces queueing to a single stage in two-stage packet switches. Software and neural network based controllers are described. Simulations under a range of traffic conditions for a 1024×1024 switch size shows the simplest architecture has the best performance  相似文献   

7.
一种高性能的适用于AVS的二维整数逆变换实现结构   总被引:1,自引:0,他引:1  
张丁  张明  郑伟  王匡 《电路与系统学报》2006,11(5):93-95,110
针对AVS视频标准中的整数逆变换,本文提出了一种高性能的硬件实现方案.本方案采用两个一维逆变换核和4个16(16的双口SRAM.通过合理控制SRAM的读写方式,避免了数据的预处理与后处理,流水线的深度也得到减少.在列变换时,改变数据运算次序,从而保证了4个双口SRAM不影响运算速度.处理8(8的数据块,本结构仅需要37个时钟,与传统的实现方案相比,在同等运算速度下,面积节约28%.实验表明该结构适用于采用AVS标准的HDTV编解码器.  相似文献   

8.
In this paper, we have analyzed the register complexity of direct-form and transpose-form structures of FIR filter and explored the possibility of register reuse. We find that direct-form structure involves significantly less registers than the transpose-form structure, and it allows register reuse in parallel implementation. We analyze further the LUT consumption and other resources of DA-based parallel FIR filter structures, and find that the input delay unit, coefficient storage unit and partial product generation unit are also shared besides LUT words when multiple filter outputs are computed in parallel. Based on these finding, we propose a design approach, and used that to derive a DA-based architecture for reconfigurable block-based FIR filter, which is scalable for larger block-sizes and higher filter-lengths. Interestingly, the number of registers of the proposed structure does not increase proportionately with the block-size. This is a major advantage for area-delay and energy efficient high-throughput implementation of reconfigurable FIR filters of higher block-sizes. Theoretical comparison shows that the proposed structure for block-size 8 and filter-length 64 involves 60% more flip-flops, 6.2 times more adders, 3.5 times more AND-OR gates, and offers 8 times higher throughput. ASIC synthesis result shows that the proposed structure for block-size 8 and filter-length 64 involves 1.8 times less area-delay product (ADP) and energy per sample (EPS) than the existing design, and it can support 8 times higher throughput. The proposed structure for block sizes 4 and 8, respectively, consumes 38% and 50% less power than the exiting structure for the same throughput rates on average for different supply voltages.  相似文献   

9.
In this paper, we propose a hardware (H/W) architecture to find disparities for stereo matching in real time. After analyzing the arithmetic characteristic of stereo matching, we propose a new calculating method that reuses the intermediate results to minimize the calculation load and memory access. From this, we propose a stereo matching calculation cell and a new H/W architecture. Finally, we propose a new stereo matching processor. The implemented H/W can operate at the clock frequency of 250 MHz at least in the FPGA (field programmable gate array) environment and produce about 120 disparity images per second for HD stereo images.  相似文献   

10.
This paper addresses how to support both real-time and non-real-time communication services in a wireless LAN with dynamic time-division duplexed (D-TDD) transmission. With D-TDD, a frequency channel is time-shared for both downlink and uplink transmissions under the dynamic access control of the base station. The base station (1) handles uplink transmissions by polling mobiles in a certain order determined on a per-connection (per-message) basis for transmitting real-time (non-real-time) traffic from mobiles and (2) schedules the transmission of downlink packets. To handle location-dependent, time-varying, and bursty errors, we adopt the channel-state prediction, transmission deferment, and retransmission. We consider the problems of scheduling and multiplexing downlink packet transmissions, and polling mobiles for uplink transmissions depending on the channel state. We also establish conditions necessary to admit each new real-time connection by checking if the connection's delivery-delay bound can be guaranteed as long as the channel stays in good condition without compromising any of the existing guarantees. Last, the performance of the proposed protocol is evaluated to demonstrate how the protocol works and to study the effects of various parameters of the protocol  相似文献   

11.
Image feature separation is a crucial step for image segmentation in computer vision systems. One efficient and powerful approach is the unsupervised clustering of the resulting data set; however, it is a very computationally intensive task. This paper presents a high-performance architecture for unsupervised data clustering. This architecture is suitable for VLSI implementations. It exploits paradigms of massive connectivity like those inspired by neural networks, and parallelism and functionality integration that can be afforded by emerging nanometer semiconductor technologies. By utilizing a "global-quasi-systolic, local-hyper-connected" architectural approach, the hardware can process real-time DVD-quality video at the highest rate allowed by the MPEG-2 standard. The architecture is a realization of the histogram peak-climbing clustering algorithm, and it is the first special-purpose architecture that has been proposed for this important problem. The architecture has also been prototyped using a Xilinx field programmable gate array (FPGA) development environment. Although this paper discusses a computer vision application, the architecture presented can be utilized in the acceleration of the clustering process of any type of high-dimensionality data.  相似文献   

12.
A case for end system multicast   总被引:11,自引:0,他引:11  
The conventional wisdom has been that Internet protocol (IP) is the natural protocol layer for implementing multicast related functionality. However, more than a decade after its initial proposal, IP multicast is still plagued with concerns pertaining to scalability, network management, deployment, and support for higher layer functionality such as error, flow, and congestion control. We explore an alternative architecture that we term end system multicast, where end systems implement all multicast related functionality including membership management and packet replication. This shifting of multicast support from routers to end systems has the potential to address most problems associated with IP multicast. However, the key concern is the performance penalty associated with such a model. In particular, end system multicast introduces duplicate packets on physical links and incurs larger end-to-end delays than IP multicast. We study these performance concerns in the context of the Narada protocol. In Narada, end systems self-organize into an overlay structure using a fully distributed protocol. Further, end systems attempt to optimize the efficiency of the overlay by adapting to network dynamics and by considering application level performance. We present details of Narada and evaluate it using both simulation and Internet experiments. Our results indicate that the performance penalties are low both from the application and the network perspectives. We believe the potential benefits of transferring multicast functionality from end systems to routers significantly outweigh the performance penalty incurred.  相似文献   

13.
A multimedia communication system includes both the communication protocols used to transport the real-time data and the distributed computing system (DCS) within which any applications using the protocols must execute. The architecture presented attempts to integrate these communications protocols with the DCS in a smooth fashion in order to ease the writing of multimedia applications. Two issues are identified as being essential to the success of this integration: the synchronization of related real-time data streams, and the management of heterogeneous multimedia hardware. The synchronization problem is tackled by defining explicit synchronization properties at the presentation level and by providing control and synchronization operations within the DCS which operate in terms of these properties. The heterogeneity problems are addressed by separating the data transport semantics (protocols themselves) from the control semantics (protocol interfaces)  相似文献   

14.
In this paper, we present a hardware architecture for real-time three-dimensional (3D) surface model reconstruction from Integral Images (InIms). The proposed parallel digital system realizes a number of computational-heavy calculations in order to achieve real-time operation. The processing elements are deployed in a systolic architecture and operate on multiple image areas simultaneously. Moreover, memory organization allows random access to image data and copes with the increased processing throughput of the system. Operating results reveal that the proposed architecture is able to process 3D data at a real-time rate. The proposed system can handle large sized InIms in real time and outputs 3D scenes of enhanced depth and detailed texture, which apply to emerging 3D applications.  相似文献   

15.
面向高性能虚拟网络转发架构及设备VPP,通过重构数据平面的流水线处理模块,设计了基于矢量数据包处理技术的带内网络遥测方案,并在此基础上通过引入源路由机制引导遥测报文走向,实现了一种带内全网遥测的扩展案例.最后,基于虚拟机组网方式,搭建网络拓扑并进行网络性能评测.实验结果表明,该遥测系统能够在0.13 ms精度的遥测时间...  相似文献   

16.
This paper describes the VLSI for high-performance graphic control which utilizes two-level multiprocessor architecture. The VLSI chip is constructed of multiprocessor modules processing in parallel, and each processor module is constructed of multiexecutors using pipeline processing. This dedicated VLSI chip, designated as advanced CRT controller (ACRTC), has three processor modules, each independently controlling drawing, display, and timing. The graphic architecture of the drawing processor, which controls graphic drawing, is described. A high-level graphic language based on anX-Ycoordinate system is adopted. High-speed drawing is realized (drawing rate is 500 ns/pixel for drawing a line) by pipeline processing with three executors, the logical address executor, physical address executor, and color data executor.  相似文献   

17.
A CMOS fingerprint sensor architecture with embedded cellular logic for image processing is presented. The system senses a fingerprint image with a capacitive technique and performs several image-processing algorithms, including thinning the ridges of the fingerprint structure and encoding it to its characteristic features. Image processing is achieved by application of hexagonal local operators implemented in pixel-parallel mixed neuron-MOS/CMOS logic circuits. The massive parallelism of the architecture leads to a very low power dissipation. Results of simulations and measurements on a demonstrator chip in 0.65-μm double-poly standard CMOS technology are shown. The approach is well suited for person-identification applications, especially in small and low-cost portable systems, such as smart cards  相似文献   

18.
Digital image coding using vector quantization (VQ) based techniques provides low-bit rates and high quality coded images, at the expense of intensive computational demands. The computational requirement due to the encoding search process, had hindered application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication free distortion measure and reduction of the required memory per code vector. Running at a clock rate of 25 MHz the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refreshing rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel  相似文献   

19.
Recently, Siamese based methods have made a breakthrough in the visual tracking field. However, the existing trackers still cannot take full advantage of the deep features. In this work, we improve the performances of Siamese trackers by complementary learning with different types of matching features. Specifically, a Matching Activation Network (MAN) is firstly designed to highlight the matching regions of the search image given a template. Since only sparse parts of feature maps contribute to the matching result, an important design choice is to emphasize the weak-matching features by erasing the strong-matching ones and learn complementary classifiers from both types of features. Then we propose a novel complementary region proposal network (CoRPN) to take complementary features as inputs and their outputs complement to each other, which are fused to improve the performance. Experiments show that our proposed tracker achieves leading performances on five tracking datasets while retaining real-time speed.  相似文献   

20.
提出一种基于行的实时、二维提升整数小波变换的VLSI结构。该结构包括行变换器、列变换器、中间缓存器以及输出控制单元。利用中间缓存器暂存行变换的中间结果,由输出控制单元按优先级从高到低的顺序依次输出各级小波系数。由于在硬件实现中采用基于行的提升变换结构,从而水平和垂直方向上的变换能并行处理。与现有结构相比,该结构具有并行度高、存储量低的特点,并且能够在一幅图像逐行扫描的时间间隔内完成整幅图像的多级小波变换。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号