Similar Documents
20 similar documents found (search time: 31 ms)
1.
As semiconductor process technology advances, the speed gap between the CPU and memory keeps widening, and memory bandwidth has become a critical resource in computer systems. Exploiting the multi-bank parallel structure of the widely used SDRAM, this paper proposes a virtual-channel-based memory access scheduler together with a least-wait-time, read-request-first scheduling policy. The design avoids data dependences among memory requests, speeds up request scheduling, and improves the utilization of memory bandwidth.
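As a rough illustration of how such a least-wait-time, read-request-first policy can be expressed (a sketch only; the paper's scheduler is hardware, and the `Request` fields below are invented for the example):

```python
from dataclasses import dataclass

@dataclass
class Request:
    bank: int
    is_write: bool
    ready_time: int   # cycle at which the target bank finishes its current operation
    seq: int          # arrival order, used as the final tiebreaker

def pick_next(pending, now):
    """Least-wait-time, read-first selection: among all pending requests,
    choose the one whose bank is ready soonest; break ties in favor of
    reads (is_write=False sorts first), then by arrival order."""
    return min(pending, key=lambda r: (max(r.ready_time - now, 0), r.is_write, r.seq))

# Tiny demo: bank 1 is busy until cycle 20, bank 0 is already idle,
# so the read to bank 0 is scheduled first.
pending = [Request(bank=1, is_write=False, ready_time=20, seq=0),
           Request(bank=0, is_write=True,  ready_time=5,  seq=1),
           Request(bank=0, is_write=False, ready_time=5,  seq=2)]
print(pick_next(pending, now=10))  # -> the read to bank 0 (seq=2)
```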

2.
To address the problem that out-of-order memory accesses in multi-core multi-threaded processors compromise real-time computation, this paper studies typical memory access queues and proposes a new queue construction model and its hardware structure. The model uses a window-optimization algorithm to bound the worst-case memory access latency, guaranteeing real-time behavior, while an optimized out-of-order scheduling policy reduces average access latency. Experiments show that the proposed queue bounds the maximum access latency: compared with in-order access it delivers higher memory bandwidth, and compared with conventional out-of-order access it fully meets real-time computation requirements with essentially no loss of effective bandwidth, solving a fundamental problem for real-time stream computing on multi-core multi-threaded processors.

3.
While virtualization enables multiple virtual machines (VMs)—with multiple operating systems and applications—to run within a physical server, it also complicates resource allocation when trying to guarantee the Quality of Service (QoS) requirements of the diverse applications running within these VMs. As QoS is crucial in the cloud, considerable research effort has been directed towards CPU, memory and network allocation to provide effective QoS to VMs, but little attention has been devoted to disk resource allocation. This paper presents the design and implementation of Flubber, a two-level scheduling framework that decouples throughput and latency allocation to provide QoS guarantees to VMs while maintaining high disk utilization. The high-level throughput control regulates the pending requests from the VMs with an adaptive credit-rate controller, in order to meet the throughput requirements of different VMs and ensure performance isolation. Meanwhile, the low-level latency control, by virtue of the batch and delay earliest deadline first (BD-EDF) mechanism, re-orders all pending requests from VMs based on their deadlines, and batches them to disk devices taking into account the locality of accesses across VMs. We have implemented Flubber and performed extensive evaluations on a Xen-based host. The results show that Flubber can simultaneously meet the different service requirements of VMs while improving the efficiency of the physical disk. The results also show an improvement of up to 25% in VM performance over state-of-the-art approaches: for example, in contrast to the default Xen disk I/O scheduler—Completely Fair Queueing (CFQ)—besides achieving the desired QoS of each VM, Flubber speeds up sequential and random reads by 17% and 25%, respectively, due to efficient physical disk utilization.
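A minimal sketch of the batch-and-delay EDF idea described above, under the assumption that each request is a (deadline, vm_id, block_addr) tuple; the function name, parameters, and locality rule are illustrative, not Flubber's actual implementation:

```python
def bd_edf_dispatch(pending, batch_window=128, max_batch=4):
    """Pick the request with the earliest deadline, then batch other requests
    from the same VM that land nearby on disk, so the device sees a
    near-sequential run. Mutates `pending` by removing the dispatched batch."""
    if not pending:
        return []
    pending.sort()                      # EDF: earliest deadline first
    deadline, vm, addr = head = pending[0]
    batch = [head]
    for req in pending[1:]:
        _d, v, a = req
        if v == vm and abs(a - addr) <= batch_window and len(batch) < max_batch:
            batch.append(req)
    for req in batch:
        pending.remove(req)
    batch.sort(key=lambda r: r[2])      # issue the batch in address order
    return batch

queue = [(30, "vm1", 1000), (10, "vm0", 500), (40, "vm0", 520), (20, "vm1", 9000)]
print(bd_edf_dispatch(queue))           # vm0's two nearby blocks go out together
```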

4.
In a typical single-server, multiple-client distributed multimedia system, clients may send sporadic requests to the server for certain multimedia documents. The requests must be served with a fast response time and with the required quality-of-service guarantee. This requires the server to determine the transmission schedule of each multimedia stream while ensuring the necessary inter- and intrastream synchronizations. There are two major drawbacks in the existing scheduling algorithms. First, it is assumed that all channels are available at the beginning of the scheduling, but in reality, requests arrive while others are in service; second, the cost of the scheduling itself is usually ignored. In general, a feasible scheduling algorithm should have the following features: (1) the schedule must be generated in real time, (2) it should have small scheduling cost, and (3) it must be capable of handling multiple requests from multiple clients. In this paper, we propose two dynamic scheduling algorithms whose worst-case time complexity is O(n log nm + nm), where n is the total number of data units in a retrieved multimedia document and m denotes the number of available channels. The salient feature of the proposed algorithms is their inherently dynamic nature, which can adjust the scheduling time for each individual request according to the slack time between consecutive requests. If the slack time between two requests is large, the scheduler can run longer in an attempt to find a better solution. This reduces the response time while maintaining a good quality of presentation. Through both simulation and analysis, we evaluate our algorithms and demonstrate their applicability in a realistic environment.

5.
In a GPU, all threads within a warp execute the same instruction in lockstep. The memory requests of some threads are served quickly, while the rest take much longer; the warp cannot issue its next instruction until the slowest request completes, causing memory divergence. This work studies inter-warp heterogeneity in GPUs and implements and optimizes a cache management mechanism and memory scheduling policy based on it, to reduce the negative effects of memory divergence and cache queuing delay. Warps are classified by their cache hit rates, and the classification drives three components: (1) a warp-type-aware cache bypassing component, which lets warps with low cache utilization bypass the L2 cache; (2) a warp-type-aware cache insertion/promotion policy, which prevents data from high-utilization warps from being evicted prematurely; and (3) a warp-type-aware memory controller, which prioritizes requests received from high-utilization warps and prioritizes requests from the same warp. Across 8 different GPGPU applications, the inter-warp-heterogeneity-based cache management and memory scheduling mechanism achieves an average speedup of 18.0% over the baseline GPU.
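The three warp-type-driven components could be sketched roughly as follows; the hit-rate thresholds and priority ranks are made-up placeholders, not the values derived in the paper:

```python
def classify_warp(hits, accesses, lo=0.2, hi=0.6):
    """Classify a warp by its L2 hit rate (lo/hi are placeholder thresholds)."""
    rate = hits / accesses if accesses else 0.0
    return "low-util" if rate < lo else ("high-util" if rate > hi else "medium")

def should_bypass_l2(warp_type):
    # Component (1): warps with low cache utilization bypass the L2 entirely.
    return warp_type == "low-util"

def insertion_priority(warp_type):
    # Component (2): data from high-utilization warps is inserted/promoted with
    # high retention priority so it is not evicted prematurely (0 = evict first).
    return {"high-util": 2, "medium": 1, "low-util": 0}[warp_type]

def memctrl_key(request):
    # Component (3): serve high-utilization warps first, and keep requests
    # from the same warp together via the warp id.
    wtype, warp_id, arrival = request
    rank = {"high-util": 0, "medium": 1, "low-util": 2}[wtype]
    return (rank, warp_id, arrival)

print(classify_warp(hits=9, accesses=10))             # -> 'high-util'
reqs = [("low-util", 7, 0), ("high-util", 3, 2), ("high-util", 3, 1)]
print(sorted(reqs, key=memctrl_key))                  # warp 3 first, grouped
```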

6.
赵培, 李国徽. 《计算机科学》, 2012, 39(4): 287-292
Flash memory is widely used in storage systems for its small size, shock resistance, low power consumption, and fast reads. NOOP is the traditional scheduling method for flash, but its I/O performance is low and cannot satisfy many applications. Exploiting flash characteristics such as fast reads and the ability to operate multiple banks (chips) in parallel, this paper proposes MBS, a multi-bank flash scheduling method built on the YAFFS flash file system. MBS executes requests in parallel and gives read requests higher priority; it uses an AVL-tree-based mechanism to identify the attributes of write requests and dynamically assigns them to suitable banks. Experimental results show that, compared with NOOP, MBS achieves higher I/O throughput, shorter request response times, and even bank erase counts and utilization.

7.
Modern processors such as Tilera's Tile64, Intel's Nehalem, and AMD's Opteron are migrating memory controllers (MCs) on-chip, while maintaining a large, flat memory address space. This trend to utilize multiple MCs will likely continue and a core or socket will consequently need to route memory requests to the appropriate MC via an inter- or intra-socket interconnect fabric similar to AMD's HyperTransport™, or Intel's Quick-Path Interconnect™. Such systems are therefore subject to non-uniform memory access (NUMA) latencies because of the time spent traveling to remote MCs. Each MC will act as the gateway to a particular region of the physical memory. Data placement will therefore become increasingly critical in minimizing memory access latencies. Increased competition for memory resources will also increase the memory access latency variation in future systems. Proper allocation of workload data to the appropriate MC will be important in decreasing the variation and average latency when servicing memory requests. The allocation strategy will need to be aware of queuing delays, on-chip latencies, and row-buffer hit-rates for each MC. In this paper, we propose dynamic mechanisms that take these factors into account when placing data in appropriate slices of physical memory. We introduce adaptive first-touch page placement, and dynamic page-migration mechanisms to reduce DRAM access delays for multi-MC systems. We also introduce policies that can handle data placement in memory systems that have regions with heterogeneous properties. The proposed policies yield average performance improvements of 6.5% for adaptive first-touch page placement, and 8.9% for a dynamic page-migration policy for a system with homogeneous DRAM DIMMs. We also show improvements in systems that contain DIMMs with different performance characteristics.
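A hypothetical cost-based version of adaptive first-touch placement might look like the sketch below; the weight on row-buffer locality and the per-MC statistics are invented stand-ins for the paper's queuing-delay, on-chip-latency, and row-buffer measurements:

```python
def first_touch_mc(mcs, core):
    """On a page's first access, score every memory controller and map the
    page to the cheapest one for the requesting core."""
    def cost(mc):
        return (mc["queue_delay"]                   # pending-request backlog
                + mc["hop_latency"][core]           # on-chip distance from the core
                + (1.0 - mc["row_hit_rate"]) * 50)  # penalty for poor locality
    return min(mcs, key=cost)

mcs = [
    {"id": 0, "queue_delay": 40, "hop_latency": {0: 4, 1: 12}, "row_hit_rate": 0.8},
    {"id": 1, "queue_delay": 10, "hop_latency": {0: 10, 1: 2}, "row_hit_rate": 0.5},
]
print(first_touch_mc(mcs, core=0)["id"])  # -> 1: the short queue outweighs distance
```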

8.
The growing gap between microprocessor speed and DRAM speed is a major problem that computer designers are facing. In order to narrow the gap, it is necessary to improve DRAM's speed and throughput. To achieve this goal, this paper proposes techniques that take advantage of the 3-stage access of contemporary DRAM chips by grouping accesses to the same row together and interleaving the execution of memory accesses from different banks. A family of Bubble Filling Scheduling (BFS) algorithms is proposed to minimize memory access schedule length and improve memory access time for embedded systems. When the memory access trace is known, as in some application-specific embedded systems, this information can be fully utilized to generate efficient memory access schedules. The offline BFS algorithm generates schedules that are on average 47.49% shorter than in-order scheduling and 8.51% shorter than existing burst scheduling. When memory accesses are received by a single memory controller in real time, they have to be scheduled as they arrive. The online BFS algorithm serves this purpose and generates schedules that are on average 58.47% shorter than in-order scheduling and 4.73% shorter than burst scheduling. To improve memory throughput and further shorten the schedule, an architecture with dual memory controllers is proposed. According to the experimental results, the dual-controller algorithm generates schedules that are on average 62.89% shorter than in-order scheduling, 14.23% shorter than burst scheduling, and 10.07% shorter than the single-controller BFS algorithms.
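A toy rendition of the grouping and bank-interleaving idea (not the published BFS algorithm itself) could look like this, where each access is a (bank, row, col) tuple:

```python
from collections import defaultdict, deque

def bubble_fill_schedule(accesses):
    """Keep same-row accesses of a bank together to reuse the open row, and
    interleave groups from different banks so one bank's precharge/activate
    'bubble' overlaps another bank's data transfer."""
    groups = defaultdict(deque)
    for acc in accesses:
        groups[(acc[0], acc[1])].append(acc)    # group by (bank, row)
    per_bank = defaultdict(deque)
    for (bank, _row), grp in groups.items():
        per_bank[bank].append(grp)
    schedule, banks = [], deque(sorted(per_bank))
    while banks:                                # round-robin over banks
        bank = banks.popleft()
        schedule.extend(per_bank[bank].popleft())
        if per_bank[bank]:
            banks.append(bank)
    return schedule

trace = [(0, 1, 3), (1, 5, 0), (0, 1, 4), (1, 9, 2)]
print(bubble_fill_schedule(trace))  # bank 0's row-1 pair stays together
```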

9.
《Computer Networks》, 2002, 38(6): 765-777
Third-generation mobile communication systems are widely envisioned to be based on wideband code division multiple access (CDMA) technologies to support high data rate (HDR) packet data services. To effectively harness the precious bandwidth while satisfying the HDR requests from users, it is crucial to use a judicious burst admission control algorithm. In this paper, we propose and evaluate the performance of a novel jointly adaptive burst admission algorithm, called the synergistic burst admission control algorithm, to allocate valuable resources (i.e., channels) in wideband CDMA systems to burst HDR requests. We consider the spatial dimension only, meaning that the algorithm performs scheduling and admission control, for the current frame only, based solely on the selection diversity in the geographical and mobility aspects. The scheduler does not exploit the temporal dimension in that it does not make allocation decisions about future frames (i.e., requests that do not get an allocation are simply ignored and will be treated as new requests in future frames). In the physical layer, we use a variable-rate channel-adaptive modulation and coding system which offers variable throughput depending on the instantaneous channel condition. In the MAC layer, we use the proposed optimal multiple-burst admission algorithm, induced by our novel integer programming formulation of the admission control and scheduling problem. We demonstrate that synergy can be attained by interactions between the adaptive physical layer and the burst admission layer. Both forward-link and reverse-link burst requests are considered, and the system is evaluated by dynamic simulations which take into account user mobility, power control and soft handoff. We found that significant performance improvement, in terms of average packet delay, data user capacity and coverage, could be achieved by our scheme compared to the existing burst assignment algorithms.

10.
Hur, I., Lin, C. 《Micro, IEEE》, 2006, 26(1): 22-29
Careful memory scheduling can increase memory bandwidth and overall system performance. We present a new memory scheduler that makes decisions based on the history of recently scheduled operations, providing two advantages: it can better reason about the delays associated with complex DRAM structures, and it can adapt to different observed workloads.

11.
With the rapidly growing complexity of 3D applications, the memory subsystem has become the most bandwidth-exhausting bottleneck in a Graphics Processing Unit (GPU). To produce realistic images, tens to hundreds of thousands of primitives are used. Furthermore, each primitive generates thousands of pixels, and these pixels are computed by shaders with special effects, often blending multiple texture pixels fetched from external memory to obtain the final color. To hide long-latency texture operations, the shaders are usually highly multithreaded to increase their throughput. However, conventional memory scheduling mechanisms are unaware of the producer-consumer relationship between primitives and pixels: they either assume that all initiators are independent or rely on a fixed priority scheme. This paper proposes Demand Look-Ahead (DLA) memory access scheduling, which observes the status of each unit in the GPU and dynamically generates priorities for the memory request scheduler. By considering the producer-consumer relationship, the proposed mechanism reschedules the most urgent requests to be serviced first. Experimental results show that the proposed DLA improves FPS and IPC by 1.47% and 1.44%, respectively, over First-Ready First-Come-First-Served (FR-FCFS). By integrating DLA with Bank-level Parallelism Awareness (BPA), DLA-BPA improves FPS and IPC by 7.28% and 6.55%, respectively. Furthermore, with DLA-BPA, shader thread performance improves by 22.06% and the attainable bandwidth increases by 5.91%.
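The priority-generation step might be sketched as follows, assuming (purely for illustration) that each pipeline unit reports its downstream queue occupancy and that requests feeding a nearly starved consumer are the most urgent:

```python
def request_priority(req, occupancy):
    """Lower downstream-queue occupancy means the consumer is closer to
    starving, so its feeding request is more urgent (smaller key = first)."""
    return occupancy[req["consumer"]]

pending = [{"id": 0, "consumer": "shader"}, {"id": 1, "consumer": "rop"}]
occupancy = {"shader": 0.9, "rop": 0.05}     # the ROP stage is about to starve
ordered = sorted(pending, key=lambda r: request_priority(r, occupancy))
print([r["id"] for r in ordered])            # -> [1, 0]
```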

12.
In a video-on-demand (VOD) environment, batching requests for the same video to share a common video stream can lead to significant improvement in throughput. Using the wait-tolerance characteristic that is commonly observed in viewers' behavior, we introduce a new paradigm for scheduling in VOD systems. We propose and analyze two classes of scheduling schemes, Max_Batch and Min_Idle, that provide two alternative ways of using a given stream capacity for effective batching. In making a video selection, the proposed schemes take into consideration the next stream completion time as well as the viewer wait tolerance. We compared the proposed schemes with two previously studied schemes: (1) first-come-first-served (FCFS), which schedules the video with the longest-waiting request, and (2) the maximum queue length (MQL) scheme, which selects the video with the maximum number of waiting requests. We show through simulations that the proposed schemes substantially outperform FCFS and MQL in reducing the viewer turn-away probability, while maintaining a small average response time. In terms of system resources, we show that, by exploiting viewers' wait tolerance, the proposed schemes can significantly reduce the server capacity required for achieving a given level of throughput and turn-away probability as compared to FCFS and MQL. Furthermore, our study shows that an aggressive use of the viewer wait tolerance for batching may not yield the best strategy, and that other factors, such as the resulting response time, fairness, and loss of viewers, should be taken into account.
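An illustrative batching rule in the spirit of Max_Batch (simplified; not the authors' exact selection formula) is sketched below, where each video's queue holds the arrival times of its waiting requests:

```python
def max_batch_select(queues, now, stream_free_at, tolerance):
    """When a channel frees up, start the video with the most waiting
    requests whose oldest request would not exceed its wait tolerance by
    the time the stream actually starts."""
    start = max(now, stream_free_at)
    viable = {v: reqs for v, reqs in queues.items()
              if reqs and start - min(reqs) <= tolerance}
    if not viable:
        return None
    return max(viable, key=lambda v: len(viable[v]))  # batch the biggest queue

queues = {"videoA": [2.0, 5.0, 8.0], "videoB": [9.0]}  # request arrival times
print(max_batch_select(queues, now=10.0, stream_free_at=11.0, tolerance=10.0))
# -> 'videoA': three tolerant waiters can share one stream
```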

13.
We investigate the problem of scheduling broadcasts in data delivery systems, where a number of requests from several clients can be simultaneously satisfied by one broadcast of a server. Most prior work has focused on minimizing the total flow time of requests, assuming that once a request arrives, it is held until satisfied. In this paper, we are concerned with the situation in which clients may leave the system if their requests are still unsatisfied after waiting for some time; that is, each request has a deadline. We study the problem of maximizing throughput, i.e., the total number of satisfied requests, and give online algorithms that achieve constant competitive ratios.
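A simple greedy baseline for this setting (not one of the competitive algorithms from the paper) broadcasts, at each slot, the page that satisfies the most still-live requests:

```python
def next_broadcast(requests, now):
    """Pick the page with the most requests whose deadlines have not yet
    passed. `requests` maps page -> list of request deadlines."""
    live = {p: [d for d in ds if d >= now] for p, ds in requests.items()}
    live = {p: ds for p, ds in live.items() if ds}
    return max(live, key=lambda p: len(live[p]), default=None)

reqs = {"page1": [3, 9, 4], "page2": [1, 2]}
print(next_broadcast(reqs, now=2))   # -> 'page1': three live requests vs. one
```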

14.
黄景廉. 《计算机应用》, 2008, 28(11): 2759-2762
To overcome the drawback that the IEEE 802.11e WLAN reference scheduler allocates fixed-length transmission opportunities (TXOPs), an adaptive scheduling mechanism supporting real-time traffic is proposed. Stations feed back the amount of buffered data in each traffic stream's transmit queue, and the mechanism dynamically allocates variable-length TXOPs to match different loads and traffic types. When a new traffic stream requests admission, the scheduler proportionally shrinks the requested TXOP durations of existing streams, subject to their minimum delay guarantees, and uses linear-programming optimization to admit the new stream whenever time permits. Detailed simulations and comparison with the IEEE 802.11e reference scheduler show that the proposed mechanism increases system throughput and effectively reduces the delay of real-time traffic.

15.
In this paper we present a solution for the IEEE 802.11e HCCA (Hybrid coordination function Controlled Channel Access) mechanism which aims at supporting strict real-time traffic requirements while handling TCP applications efficiently. Our proposal combines a packet scheduler and a dynamic resource allocation algorithm. The scheduling discipline is based on the Monolithic Shaper-Scheduler (MSS), a modification of a General Processor Sharing (GPS) related scheduler. It supports minimum-bandwidth and delay guarantees and, at the same time, achieves the optimal latency among all GPS-related schedulers. In addition, our resource allocation procedure, called the territory method, aims at prioritizing real-time services and at improving the performance of TCP applications. For this purpose, it splits the wireless channel capacity (in terms of transmission opportunities) into different territories for the different types of traffic, taking into account the end-to-end network dynamics. To support the desired applications, we consider the following traffic classes: conversational, streaming, interactive and best-effort. The so-called territories shrink or expand depending on the current quality experienced by the corresponding traffic class. We evaluated the performance of our solution through extensive simulations in a heterogeneous wired-cum-wireless scenario under different traffic conditions. Additionally, we compare our proposal to other HCCA scheduling algorithms: the HCCA reference scheduler and the Fair Hybrid Coordination Function (FHCF). The results show that the combination of the MSS and the territory method obtains higher system capacity for VoIP traffic (up to 32 users) in the simulated scenario, compared to FHCF and the HCCA reference scheduler (13 users). In addition, the MSS with the territory method also improves the throughput of TCP sources (one FTP application achieves between 6.1 Mbps without VoIP traffic and 2.1 Mbps with 20 VoIP users) compared to the reference scheduler (at most 388 kbps) and FHCF (with a maximum FTP throughput of 4.8 Mbps).

16.
The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chip via emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. Existing research has devoted much work to improving workload performance using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvement to optimizing cache tags and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. The scheme employs (1) a cache monitor, which collects cache statistics; (2) a cache evaluator, which evaluates the cache information while programs execute; and (3) a cache switcher, which adaptively chooses between the SRAM and DRAM shared cache modules. A cache data migration policy is developed to guarantee that the scheduling scheme works correctly. Extensive experiments show that our method can improve multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).

17.
Object-based parallel file systems have emerged as promising storage solutions for high-performance computing (HPC) systems. Although object storage provides a flexible interface, scheduling highly concurrent I/O requests that access a large number of objects remains a challenging problem, especially when stragglers (storage servers that are significantly slower than others) exist in the system. An efficient I/O scheduler needs to avoid possible stragglers to achieve low latency and high throughput. In this paper, we introduce a log-assisted straggler-aware I/O scheduling approach to mitigate the impact of storage server stragglers. The contribution of this study is threefold. First, we introduce a client-side, log-assisted, straggler-aware I/O scheduler architecture to tackle the storage straggler issue in HPC systems. Second, we present three scheduling algorithms that make efficient scheduling decisions while avoiding stragglers based on this architecture. Third, we evaluate the proposed I/O scheduler using simulations, and the results confirm the promise of the newly introduced straggler-aware I/O scheduler.
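A client-side straggler-avoidance rule of the kind such a scheduler needs could be sketched as below; the slowdown threshold and server names are invented for the example:

```python
from statistics import median

def choose_server(candidates, latency_log, slowdown=2.0):
    """Treat a server as a straggler if its recent logged latency exceeds
    `slowdown` times the median across servers; send the I/O to the fastest
    non-straggler, falling back to the global fastest if all look slow."""
    med = median(latency_log[s] for s in candidates)
    ok = [s for s in candidates if latency_log[s] <= slowdown * med]
    pool = ok or candidates
    return min(pool, key=lambda s: latency_log[s])

log = {"oss0": 4.1, "oss1": 3.8, "oss2": 19.5}        # ms, from the client log
print(choose_server(["oss0", "oss1", "oss2"], log))   # -> 'oss1'
```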

18.
《Computer Networks》, 2008, 52(6): 1252-1271
In orthogonal frequency division multiple access systems there is an intimate relationship between the packet scheduler and the inter-cell interference coordination (ICIC) functionalities: together they determine the set of frequency channels (sub-carriers) that carry the packets of in-progress sessions. In this paper we build on previous work, in which we compared the so-called random and coordinated ICIC policies, and analyze three packet scheduling methods. The performance measures of interest are the session blocking probabilities and the overall throughput. We find that the performance of the so-called Fifty-Fifty and What-It-Wants scheduling policies is somewhat improved by coordinated sub-carrier allocation, especially in poor signal-to-noise-and-interference situations and at medium traffic load values. The performance of the All-Or-Nothing scheduler is practically insensitive to the choice of sub-carrier allocation policy.

19.
Building on a new Web cluster architecture, a resource-optimized dual-minimum-balance differentiated-service scheduling algorithm is proposed. The front-end dispatcher first distributes Web requests to back-end servers according to their resource balance; each back-end server then orders its Web requests by combining two characteristic parameters, request priority and resource balance. Extensive simulations were conducted to evaluate the algorithm. Compared with other well-known policies such as split scheduling, the dual-minimum-balance algorithm improves Web request efficiency by 11% while providing good service differentiation, confirming that resource-optimized scheduling has broad applicability.

20.
Emerging non-volatile memory technologies, especially flash-based solid state drives (SSDs), have increasingly been adopted in the storage stack. They provide numerous advantages over traditional mechanically rotating hard disk drives (HDDs) and tend to replace them. However, because HDDs have long served as the primary building blocks of storage systems, much of the system software has been specially designed for HDDs and may not be optimal for non-volatile memory media. Therefore, to fully leverage their superior raw performance, the existing upper-layer software has to be re-evaluated or re-designed. To this end, in this paper, we propose PASS, an optimized I/O scheduler at the Linux block layer that accommodates the shift of underlying storage devices toward flash-based SSDs. PASS takes the rich internal parallelism of SSDs into account when dispatching requests to the device driver in order to achieve high performance. Specifically, it partitions the logical storage space into fixed-size regions (preferably the component package sizes) as scheduling units. These scheduling units are serviced in round-robin fashion, and on each turn the chosen dispatching unit issues only a batch of either read or write requests, to suppress excessive mutual interference. Additionally, requests are sorted by their target addresses while waiting in the dispatching queues, to exploit the high sequential performance of SSDs. Experimental results with a variety of workloads show that PASS outperforms the four off-the-shelf Linux I/O schedulers by 3% up to 41%, while significantly improving device lifetime by reducing internal write amplification.
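A condensed sketch of the dispatching policy described above (region round-robin, one read-or-write batch per turn, address-sorted) is given below; it approximates PASS's behavior in Python and is not the kernel implementation:

```python
from collections import defaultdict, deque

class PassLikeScheduler:
    def __init__(self, region_size, batch_size):
        self.region_size, self.batch_size = region_size, batch_size
        self.regions = defaultdict(lambda: {"R": [], "W": []})
        self.rr = deque()                        # round-robin order of regions

    def add(self, lba, op):                      # op is "R" or "W"
        region = lba // self.region_size         # map LBA to its region
        if region not in self.rr:
            self.rr.append(region)
        self.regions[region][op].append(lba)

    def dispatch(self):
        """Serve the next region: issue one batch of only reads or only
        writes (whichever queue is longer), sorted by address."""
        if not self.rr:
            return []
        region = self.rr.popleft()
        q = self.regions[region]
        op = "R" if len(q["R"]) >= len(q["W"]) else "W"
        q[op].sort()
        batch, q[op] = q[op][:self.batch_size], q[op][self.batch_size:]
        if q["R"] or q["W"]:
            self.rr.append(region)               # region still has work
        return [(op, lba) for lba in batch]

s = PassLikeScheduler(region_size=1 << 20, batch_size=2)
for lba, op in [(5, "R"), (3, "R"), (1 << 21, "W"), (7, "W")]:
    s.add(lba, op)
print(s.dispatch()); print(s.dispatch()); print(s.dispatch())
# -> [('R', 3), ('R', 5)], [('W', 2097152)], [('W', 7)]
```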
