Similar literature
 20 similar documents found (search time: 15 ms)
1.
Owing to its low latency, high bandwidth, and low CPU utilization, Remote Direct Memory Access (RDMA) has established itself as an effective data movement technology in many networking environments. However, the transport protocols of grid run-time systems, such as GridFTP in Globus, are not yet capable of utilizing RDMA. In this study, we examine the architecture of GridFTP to assess the feasibility of enabling RDMA. An RDMA-capable XIO (RXIO) framework is designed and implemented to extend GridFTP's XIO system and match the characteristics of RDMA. Our experimental results demonstrate that RDMA can significantly improve the performance of GridFTP, reducing latency by 32% and increasing bandwidth by more than three times. While achieving these improvements, RDMA also dramatically cuts the CPU utilization of GridFTP clients and servers. These results demonstrate that RXIO can effectively exploit the benefits of RDMA for GridFTP. It offers a good prototype for further leveraging GridFTP on wide-area RDMA networks.

2.
To model a layered video streaming system in super-peer overlay networks that faces heterogeneity and volatility of peers, we formulate a layer scheduling problem based on constraints such as layer dependency, the transmission rule, and bandwidth heterogeneity. To solve this problem, we propose a new layer scheduling algorithm that uses a real-coded messy genetic algorithm and provides a feasible solution with low decision complexity. We also propose a peer-utility-based promotion algorithm that selects the most qualified neighbor to guarantee sustained streaming quality despite intense churn. Simulation results show that, in terms of average streaming ratio, the proposed layer scheduling scheme comes closer to the optimal solution than four conventional scheduling heuristics. It also substantially outperforms schemes with different peer selection strategies in terms of average bandwidth (at least 6.9% higher) and variation of utilization (at least 11.3% lower).

3.
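Design of a Clustered Network-on-Chip for Multi-Core Processors   Cited by: 1 (self-citations: 1, other citations: 0)
A new clustered network-on-chip architecture is proposed. The architecture connects cluster units in a two-dimensional mesh topology; each cluster unit consists of three processors, one direct memory access unit, and one cluster-shared memory unit. Multi-core processors based on this architecture achieve higher communication efficiency and memory utilization. A 3,780-point fast Fourier transform was implemented on an experimental system, and the results show that memory utilization in the FFT application can be raised to 79.5%.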

4.
Fast Motion Estimation on Graphics Hardware for H.264 Video Encoding   Cited by: 1 (self-citations: 0, other citations: 1)
The video coding standard H.264 supports video compression with a higher coding efficiency than previous standards. However, this comes at the expense of increased encoding complexity, in particular for motion estimation, which becomes a very time-consuming task even for today's central processing units (CPUs). On the other hand, modern graphics hardware includes a powerful graphics processing unit (GPU) whose computing power remains idle most of the time. In this paper, we present a GPU-based approach to motion estimation for the purpose of H.264 video encoding. A small diamond search is adapted to the programming model of modern GPUs to exploit their available parallel computing power and memory bandwidth. Experimental results demonstrate a significant reduction in computation time and a competitive encoding quality compared to a CPU UMHexagonS implementation, while freeing the CPU to process other encoding tasks in parallel.
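For readers unfamiliar with the search pattern named in this abstract, the following is a minimal CPU-side sketch of a textbook small diamond search with a sum-of-absolute-differences (SAD) cost. It is not the paper's GPU implementation, whose mapping to the GPU programming model is not described here; the function names, the 8-pixel block size, and the step limit are all illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). Frames are lists of rows."""
    h, w = len(ref), len(ref[0])
    x0, y0 = bx + dx, by + dy
    # Candidates that fall outside the reference frame are never selected.
    if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
        return float("inf")
    return sum(
        abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
        for j in range(n)
        for i in range(n)
    )


def small_diamond_search(cur, ref, bx, by, n=8, max_steps=16):
    """Greedy small diamond search (hypothetical helper, not from the paper).

    Starting from zero displacement, repeatedly test the four neighbours at
    distance 1 and move to the best one until no neighbour improves the cost.
    Returns ((dx, dy), cost).
    """
    dx, dy = 0, 0
    best = sad(cur, ref, bx, by, dx, dy, n)
    for _ in range(max_steps):
        candidates = [(dx + 1, dy), (dx - 1, dy), (dx, dy + 1), (dx, dy - 1)]
        costs = [sad(cur, ref, bx, by, cx, cy, n) for cx, cy in candidates]
        i = min(range(4), key=costs.__getitem__)
        if costs[i] >= best:
            break  # the centre is a local minimum of the cost
        best = costs[i]
        dx, dy = candidates[i]
    return (dx, dy), best
```

Because each block's search is independent of every other block's, the per-block loop is the natural unit to parallelize; how the authors actually distribute it over GPU threads is not stated in the abstract.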

5.
Accessing pixels in memory is a well-known bottleneck of SIMD (single instruction multiple data) processors in video/imaging. To tackle it, we propose new block and row access modes for a parallel on-chip memory subsystem, which enable higher processing throughput and lower energy consumption than the access modes of state-of-the-art subsystems. The new access modes significantly reduce the number of on-chip memory accesses, and thereby accelerate one of the key video/imaging kernels: sub-pixel block-matching motion estimation. The main idea is to exploit spatial overlaps of the blocks/rows accessed for pixel interpolation, which are known at subsystem design time, and to merge multiple accesses into a single one by accessing somewhat more pixels at a time than other parallel memories do. To avoid the need for a wider, and therefore more costly, SIMD datapath, we propose new memory read operations that split all pixels accessed at a time into multiple SIMD-wide blocks/rows in a way that is convenient for further processing. As a proof of concept, we describe a parametric, scalable, and cost-efficient architecture that supports the new access modes. The architecture is based on a previously proposed set of memory banks with multiple pixels per bank word, and a previously proposed shifted scheme for arranging pixels in the banks. We demonstrate the advantages of this work analytically and experimentally in a case study of sub-pixel motion estimation for video frame-rate conversion. The implemented motion estimator processes 2160p video at 60 fps in real time when clocked at 600 MHz. Compared to implementations based on state-of-the-art subsystems, this work enables 40–70 % higher throughput, consumes 17–44 % less energy, and has similar silicon area and off-chip memory bandwidth costs. That is 1.8–2.9 times more efficient than the prior art, considering the throughput and all costs, i.e., energy consumption, area, and off-chip bandwidth. This higher efficiency results from the new access modes, which reduce the number of on-chip memory accesses by 1.6–2.1 times, and from the cost-efficient architecture.

6.
Current commercial live video streaming systems are based either on a typical client–server (cloud) architecture or on a peer-to-peer (P2P) architecture. The former is preferred for stability and QoS, provided that the system is not stretched beyond its bandwidth capacity, while the latter is scalable with small bandwidth and management costs. In this paper, we propose a P2P live streaming architecture that, by dynamically adapting the playback rate, guarantees that peers receive the stream even when the total upload bandwidth changes very abruptly. To achieve this, we develop a scalable mechanism that dynamically monitors the total available bandwidth resources by probing only a small subset of peers, and a playback rate control mechanism that dynamically adapts the playback rate to those resources. We model the relationship between the playback rate and the available bandwidth resources analytically using difference equations, which allows us to apply a control-theoretic approach. We also quantify monitoring inaccuracies and dynamic bandwidth changes and calculate dynamically, as a function of these, the maximum playback rate for which the proposed system is able to guarantee the uninterrupted and complete distribution of the stream. Finally, we evaluate the control strategy and the theoretical model in a packet-level simulator of a complete P2P live streaming system that we designed in OPNET Modeler. Our evaluation results show uninterrupted and complete stream delivery (every peer receives more than 99 % of video blocks in every scenario) even under very adverse bandwidth changes.

7.
IP-based design is used to tackle complexity and reduce time-to-market in systems-on-chip with high-performance requirements. Component integration, the main part of this process, is a complicated and time-consuming task, largely due to interfacing issues. Standard interfaces can help to reduce the integration effort. However, existing implementations use more resources than necessary and lack a formalism to capture and manipulate resource requirements and design constraints. In this paper, we propose a novel interface, the Component Interconnect and Data Access (CIDA), and its implementation, based on the interface automata formalism. CIDA can be used to capture system-on-chip architecture, with a primary focus on video processing applications, which are mostly based on the data streaming paradigm with occasional direct memory accesses. We introduce the notion of component-interface clustering for resource reduction and provide a method to automate this process. With real-life video processing applications implemented on FPGA, we show that our approach reduces resource usage (number of slices) by an average of 20 % and power consumption by 5 % compared to an implementation based on vendor interfaces.

8.
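Serial RapidIO supports two operating modes: Message and DirectIO. DirectIO is simple to use, but when multiple packets are transferred back to back, the CPU must wait for the LSU registers to become free. To solve this problem, a new chained RapidIO transfer scheme is proposed in which an EDMA channel, rather than the CPU, configures the SRIO LSU registers. Experiments show that the scheme effectively reduces CPU load.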

9.
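Hong Tu (洪途), Jing Naifeng (景乃锋). Computer Engineering (《计算机工程》), 2021, 47(2): 239-245
Coarse-grained reconfigurable array architectures combine flexibility with efficiency, but their high computational throughput also puts pressure on memory access. Given that off-chip dynamic memory bandwidth is relatively fixed, a memory access structure that decouples memory access from computation is designed. Control logic is integrated into a lightweight storage space, and the configurable storage space isolates the loop iterations of memory access from those of computation, thereby hiding memory latency. The structure is also used for concatenation and alignment operations, to adapt to different compute-to-memory-access frequency ratios and to optimize indirect memory...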

10.
As mobile devices such as tablet PCs and smartphones proliferate, online video consumption over wireless networks has accelerated. This trend raises several challenges for providing video streaming services efficiently and stably in a heterogeneous mobile environment. To guarantee the QoS of real-time HD video services, a steady and reliable wireless mesh is necessary. Furthermore, video service providers have to maintain QoS by provisioning streaming servers to respond to clients' requests for different video resolutions. In this paper, we propose a reliable cloud-based video delivery scheme with split-layer SVC encoding and real-time adaptive multi-interface selection over LTE and WiFi links. Split-layer video streaming can scale effectively to manage the channels required on each layer for various client connections. Moreover, the split-layer SVC model gives streaming service providers the opportunity to stream video over multiple interfaces (e.g. WiFi, LTE, etc.), controlling each separately based on its network status. Through adaptive interface selection, the proposed system aims to deliver the maximum video quality that the LTE/WiFi bandwidth can accommodate. In addition, the system offers cost-effective streaming to mobile clients by reducing LTE data consumption. In our system, adaptive interface selection is implemented with two different algorithms, the INSTANT and EWMA methods. We implemented a prototype mobile client on iOS, specifically on an iPhone 5S. We also employ split-layer SVC encoding on the streaming server side as an add-on module to the SVC reference encoding tool, in a virtualized environment under the KVM hypervisor. We evaluated the proposed system in both emulated and real-world heterogeneous wireless network environments. The results show that the proposed system not only guarantees the highest quality of video frames through simultaneous WiFi and LTE connections, but also efficiently reduces LTE bandwidth consumption, making it cost-effective on the client side. Our proposed method provides the highest video quality without deadline misses, while consuming 50.6% of the LTE bandwidth of the 'LTE-only' method and 72.8% of that of conventional (non-split) SVC streaming in a real-world mobile environment.
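The abstract names two estimators for interface selection, INSTANT and EWMA, without giving their details. The sketch below only illustrates the general difference between reacting to the latest throughput sample and smoothing with an exponentially weighted moving average; the smoothing factor, the layer bitrates, and the rule of filling WiFi first and sending the remaining layers over LTE are assumptions made for illustration, not the authors' algorithm.

```python
def instant_estimate(samples):
    """INSTANT-style estimate (assumed meaning): trust only the newest sample."""
    return samples[-1]


def ewma_estimate(samples, alpha=0.25):
    """Exponentially weighted moving average of throughput samples.

    `alpha` is an assumed smoothing factor; larger values react faster
    to change but follow short-lived dips more closely.
    """
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def layers_on_wifi(wifi_kbps, layer_rates_kbps):
    """Count how many SVC layers, base layer first, fit in the WiFi estimate.

    Hypothetical policy: any remaining layers would be sent over LTE.
    """
    used = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if used + rate > wifi_kbps:
            break
        used += rate
        count += 1
    return count


# One brief WiFi dip after three steady samples (all values in kbit/s).
wifi_samples = [4000, 4000, 4000, 1000]
layer_rates = [1500, 1000, 1000]  # base layer plus two enhancement layers

print(layers_on_wifi(instant_estimate(wifi_samples), layer_rates))
print(layers_on_wifi(ewma_estimate(wifi_samples), layer_rates))
```

With these made-up numbers the instantaneous estimate moves every layer off WiFi after a single bad sample, while the smoothed estimate keeps two layers on it, which is the usual trade-off between responsiveness and stability.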

11.
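Yang Yuhong (杨宇红), Zheng Shibao (郑世宝). Computer Engineering (《计算机工程》), 2006, 32(14): 258-260
A design for an HDTV decoder system-on-chip (SoC) platform is proposed. The platform can integrate a variety of IP cores, such as a MIPS CPU, an HDTV video decoder, a video processor, OSD, and peripheral IP devices, each of which connects to the platform through its own independent interface. By estimating bus and memory access bandwidth, data paths can be managed effectively. New functions can be added flexibly without changing the platform's system architecture, so the SoC architecture is well suited to a wide range of digital video media processing applications.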

12.
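Design of an Embedded Video Server Based on the GM8180   Cited by: 1 (self-citations: 0, other citations: 1)
This paper describes in detail the design of an embedded video server based on the GM8180 chip from Taiwan's Faraday Technology (智原科技). The structure and functions of the server are described, and the system's modules are analyzed and introduced, including the video capture module, the audio input and output modules, and the Ethernet module. On the software side, the architecture of the RTSP streaming media server software and the video capture and encoding flow are explained. The system uses H.264 video coding; a single chip provides real-time H.264 encoding of two D1 channels as well as real-time streaming service to multiple users.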

13.
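A Framework for Streaming Media Proxy Caching on the Internet   Cited by: 1 (self-citations: 0, other citations: 1)
Pan Hao (潘浩), Song Hantao (宋瀚涛). Computer Engineering (《计算机工程》), 2003, 29(20): 170-172
This paper uses proxy caching to address the impact of bottleneck bandwidth on transmission quality when streaming media over the Internet. It analyzes the shortcomings of existing Web caching techniques when applied to continuous media objects such as audio and video, proposes a new streaming media proxy caching framework, and discusses the key techniques for implementing each module in the framework.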

14.
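To increase the bandwidth of the communication network and couple it more tightly with the host, a direct memory communication network adapter (DMC NIC) was designed and implemented based on the idea of shared memory. First, the interface between the DMC NIC and the host was designed with reference to the memory module standard, so that the memory chips on the DMC NIC serve as shared memory for both the CPU and the DMC NIC. When sending, the CPU writes data directly into the shared memory region on the DMC NIC; when receiving, the CPU reads data directly from the shared memory region, which reduces the number of times communication data is copied. Finally, a driver was written using the operating system's memory management mechanism so that the system can manage and operate the DMC NIC. Tests on a proof-of-concept DMC NIC prototype show that DMC technology increases network bandwidth and improves system performance, confirming that the direct memory communication principle is sound.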

15.
Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory), is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the serious off-chip memory access overheads caused by transferring data between SPM and off-chip memory. In this paper, we propose a novel compiler-assisted technique, ISOS (Iteration-access-pattern-based Space Overlapping SPM management), for dynamic SPM management with DMA (Direct Memory Access). In ISOS, we combine SPM and DMA for performance optimization by exploiting opportunities to overlap SPM space, so as to make better use of the limited SPM space and reduce the number of DMA operations. We implement our technique based on IMPACT and conduct experiments using a set of benchmarks from DSPstone and Mediabench on the cycle-accurate VLIW simulator of Trimaran. The experimental results show that our technique improves run-time performance compared with previous work. The average improvements are 13.15, 19.05, and 25.52% when the SPM sizes are 1KB, 512 bytes, and 256 bytes, respectively. Copyright © 2010 John Wiley & Sons, Ltd.

16.
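This paper presents the hardware and software architecture of a video encoder based on the MPEG-4 fine granularity scalability (FGS) coding framework. MPEG-4 FGS is an effective video coding scheme developed recently to meet the needs of Internet video streaming applications. Taking the characteristics of the FGS algorithm into account, the most suitable data storage type for the algorithm is selected, which saves a large amount of external memory bandwidth.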

17.
Compared with the traditional client/server streaming model, peer-assisted video streaming has been shown to provide better scalability at lower infrastructure cost. In this paper, we describe how peer-assisted video streaming can be implemented through a real-time service-oriented architecture. The first part of the paper presents the overall design of the Peer-Assisted ContenT Service (PACTS). We discuss the motivation, principles, and service-oriented architecture of the PACTS modules and specify the workflow among them. By organizing elements of traditional video streaming and peer-to-peer computing into loosely coupled, composable middleware services and distributing them among participating entities, PACTS enables high-quality, low-cost video streaming at large scale and in real time. The second part of the paper describes an implementation of PACTS using existing off-the-shelf software, followed by a performance evaluation based on practical environment settings. We illustrate the challenges and our approaches in designing distributed and highly efficient algorithms. In particular, the algorithms for peering selection and incentive-driven pre-fetching are studied in detail. These designs are extensively evaluated by packet-level simulations. We show that our implementation of PACTS effectively offloads the server's bandwidth demand without sacrificing service quality. This benefit is further verified in dynamic settings with system churn. The simulation results show that the incentive mechanism from our service level agreement efficiently stabilizes server bandwidth utilization with less than 4.5% control traffic overhead.

18.
19.
Two crucial aspects of general-purpose embedded visual point tracking are addressed in this paper. First, the algorithm should reliably track as many points as possible. Second, the computation should achieve real-time video processing, which is challenging on low-power embedded platforms. We propose a new multi-scale semi-dense point tracker called Video Extruder, whose purpose is to fill the gap between short-term, dense motion estimation (optical flow) and long-term, sparse salient point tracking. This paper presents a new detector, including a new salience function with low computational complexity and a new selection strategy that yields a large number of keypoints. Its density and reliability in mobile video scenarios are compared with those of the FAST detector. Then, a multi-scale matching strategy is presented, based on hybrid regional coarse-to-fine and temporal prediction, which provides robustness to large camera and object accelerations. Filtering and merging strategies are then used to eliminate most of the wrong or useless trajectories. Thanks to its high degree of parallelism, the proposed algorithm extracts beams of trajectories from the video very efficiently. We compare it with the state-of-the-art pyramidal Lucas–Kanade point tracker and show that, in short-range mobile video scenarios, it yields results of similar quality while being up to one order of magnitude faster. Three different parallel implementations of this tracker are presented, on multi-core CPUs, GPUs, and ARM SoCs. On a commodity 2010 CPU, it can track 8,500 points in a 640 × 480 video at 150 Hz.

20.
In Internet multimedia streaming, the quality of the delivered media can be adapted to the Quality of Service provided by the underlying network, thanks to encoding algorithms that allow fine-grained enhancement of a low-quality base layer at streaming time. The main objective in such systems is to avoid starvation of the decoding process and the consequent playout interruptions. In this work, we tackle the problem using a control-theoretic approach. In particular, we design and implement a novel end-to-end Quality Adaptive Scheduler that distributes the available network bandwidth between the base and enhancement layers. The developed solution can be adopted in many contexts because it has been designed without assumptions about the delivered media or the protocol stack. To test its effectiveness, we have cast it in an H.264/AVC SVC-based video streaming architecture for unicast Internet applications. The performance of the scheduler has been experimentally evaluated both in a controlled testbed and in several "wild" Internet scenarios, including UMTS and satellite radio links. The results clearly demonstrate that our Quality Adaptive Scheduler significantly improves the performance of the video streaming system in all operating conditions.
