1.

CA-DAG: Modeling Communication-Aware Applications for Scheduling in Cloud Computing

Dzmitry Kliazovich Johnatan E. Pecero Andrei Tchernykh Pascal Bouvry Samee U. Khan Albert Y. Zomaya 《Journal of Grid Computing》2016,14(1):23-39

This paper addresses performance issues of resource allocation in cloud computing. We review the requirements of different cloud applications and identify the need to consider communication processes explicitly and on an equal footing with computing tasks. Following this observation, we propose a new communication-aware model of cloud computing applications, called CA-DAG. This model is based on directed acyclic graphs that, in addition to computing vertices, include separate vertices representing communications. Such a representation allows separate resource allocation decisions to be made: assigning processors to computing jobs and network resources to information transmissions. The proposed CA-DAG model opens room for optimizing a number of existing resource allocation solutions and for developing novel scheduling schemes of improved efficiency.
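To make the idea of separate communication vertices concrete, here is a minimal sketch, assuming a conventional task DAG whose edges carry data volumes. The function `to_ca_dag`, the `comm_<src>_<dst>` naming, and the toy workflow are hypothetical illustrations, not the authors' formulation: each edge is replaced by its own vertex, so compute work and data transfers end up in separate queues that could be handed to different allocators.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def to_ca_dag(tasks, edges):
    """Split every data transfer into its own communication vertex.

    tasks: {name: compute_cost}; edges: [(src, dst, data_volume)].
    Returns (kind, weight, preds), where preds maps vertex -> set of predecessors.
    """
    kind = {t: "compute" for t in tasks}
    weight = dict(tasks)
    preds = {t: set() for t in tasks}
    for src, dst, volume in edges:
        comm = f"comm_{src}_{dst}"  # hypothetical naming scheme for communication vertices
        kind[comm] = "communication"
        weight[comm] = volume
        preds[comm] = {src}   # the transfer starts after the producer finishes
        preds[dst].add(comm)  # the consumer waits for the transfer itself
    return kind, weight, preds

# Toy workflow: A produces data consumed by B and C.
tasks = {"A": 4, "B": 2, "C": 3}
edges = [("A", "B", 10), ("A", "C", 5)]
kind, weight, preds = to_ca_dag(tasks, edges)

order = list(TopologicalSorter(preds).static_order())
# The two vertex classes can now be allocated independently.
compute_queue = [v for v in order if kind[v] == "compute"]        # -> processors
network_queue = [v for v in order if kind[v] == "communication"]  # -> network links
```

In a plain DAG the transfer cost would be folded into an edge weight; giving it a vertex is what lets a scheduler reason about link capacity as a resource in its own right.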

2.

Register allocation and spilling using the expected distance heuristic

Neil Burroughs 《Software》2016,46(11):1499-1523

The primary goal of the register allocation phase in a compiler is to minimize register spills to memory. Spill decisions by the allocator are often based on the cost of spilling a virtual register and, therefore, on an assumed placement of spill instructions. However, because most allocators make these decisions incrementally, placement opportunities can change as allocation proceeds, calling into question the basis for the original spill decision. An alternative to placement costs for spill decisions is a heuristic that focuses on where program execution will lead. Spilling the virtual register with the Furthest Next Use is known to yield the minimum number of loads under certain conditions in straight‐line code. Although this heuristic has been implemented in register allocators in different forms, none of these implementations fully exploits profiling information. We present a register allocator that adapts to improved profiling information, using branch probabilities to compute an Expected Distance to Next Use for spill decisions and block frequency information to optimize post‐allocation spill instruction placement. Spill placement is optimized after allocation using a novel method for minimizing spill instruction costs on the control flow graph. Compared with LLVM, our allocator achieves average reductions of more than 36% and 50% in the number of dynamically executed store and load instructions, respectively, when using statically derived profiling information. With dynamically gathered profiling, the average reductions increase to 50% for stores and 60% for loads. Copyright © 2016 John Wiley & Sons, Ltd.

3.

A Computing Resource Allocation Scheme Based on Fog Computing

Tang Linyu Jiang Jiafu Gu Ke 《Computer Engineering and Applications》2019,55(19):96-104

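Fog computing can provide users with nearby data storage, computing, and other services, so task scheduling and resource allocation in fog computing have become a new research hotspot. Considering that end users and fog devices usually operate in a relatively open setting, this paper extends the fog computing architecture and proposes a stable-matching-based computing resource allocation scheme for open fog computing environments. Dynamic computing resources in the fog network cooperate to provide computing services to users and earn computing revenue, while end users submit task requests to fog servers and pay a fee. Following the idea of stable matching, the scheme uses a priority list of subtasks together with preference lists of subtasks and computing service devices to solve the problem of assigning subtasks to computing service devices, guaranteeing task completion time and the revenue of the computing service devices. The scheme's performance is analyzed experimentally. The results show that its resource allocation time is relatively stable and that it outperforms the SGA and ACOSA algorithms in fog task execution latency and task violation rate.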
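A standard way to compute a stable assignment of this kind is many-to-one deferred acceptance (Gale–Shapley). The sketch below assumes subtasks propose to devices, each device holds at most `capacity` subtasks, and the preference lists are already given; how the paper builds those lists (for example from completion time or revenue) is not reproduced here, and all names are hypothetical.

```python
from collections import deque

def stable_match(task_prefs, device_prefs, capacity):
    """Many-to-one deferred acceptance: subtasks propose, devices keep their best offers.

    task_prefs:   {task: [device, ...]} most preferred first
    device_prefs: {device: [task, ...]} most preferred first (must rank every proposer)
    capacity:     {device: max number of subtasks held}
    """
    rank = {d: {t: i for i, t in enumerate(prefs)} for d, prefs in device_prefs.items()}
    next_choice = {t: 0 for t in task_prefs}
    held = {d: [] for d in device_prefs}
    free = deque(task_prefs)  # proposal order, e.g. a subtask priority list
    while free:
        t = free.popleft()
        if next_choice[t] >= len(task_prefs[t]):
            continue  # rejected by every device on its list: stays unassigned
        d = task_prefs[t][next_choice[t]]
        next_choice[t] += 1
        held[d].append(t)
        held[d].sort(key=lambda x: rank[d][x])  # best-ranked subtasks first
        if len(held[d]) > capacity[d]:
            free.append(held[d].pop())  # reject the least-preferred held subtask
    return {t: d for d, ts in held.items() for t in ts}

# Toy instance: every subtask prefers fog node f1, which can take only one.
task_prefs = {"t1": ["f1", "f2"], "t2": ["f1", "f2"], "t3": ["f1", "f2"]}
device_prefs = {"f1": ["t3", "t1", "t2"], "f2": ["t1", "t2", "t3"]}
capacity = {"f1": 1, "f2": 2}
assignment = stable_match(task_prefs, device_prefs, capacity)
```

Here `f1` ends up holding its favourite `t3`, while `t1` and `t2` fall back to `f2`. No subtask and device would both rather be matched to each other, which is the stability property the scheme relies on.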

4.

Elimination of parallel copies using code motion on data dependence graphs

《Computer Languages, Systems and Structures》2013,39(1):25-47

Register allocation has regained much interest in recent years due to the development of decoupled strategies that split the problem into separate phases: spilling, register assignment, and copy elimination. Traditional approaches to copy elimination during register allocation are based on interference graphs and register coalescing. Variables are represented as nodes in a graph, and nodes are coalesced if they can be assigned the same register. However, decoupled approaches strive to avoid interference graphs and thus often resort to local recoloring. A common assumption of existing coalescing and recoloring approaches is that the original ordering of the instructions in the program is not changed. This work presents an extension of a local recoloring technique called Parallel Copy Motion. We perform code motion on data dependence graphs in order to eliminate useless copies and reorder instructions, while preserving a valid register assignment. Our results show that, even after traditional register allocation with coalescing, our technique eliminates an additional 3% (up to 9%) of the remaining copies and reduces the weighted costs of register copies by up to 25% on the SPECINT 2000 benchmarks. Compared with Parallel Copy Motion, our technique removes 11% (up to 20%) more copies and up to 39% more of the copy costs.

5.

A Game Theory-Based Pricing Strategy to Support Single/Multiclass Job Allocation Schemes for Bandwidth-Constrained Distributed Computing Systems

Ghosh P. Basu K. Das S.K. 《IEEE Transactions on Parallel and Distributed Systems》2007,18(3):289-306

Today's distributed computing systems incorporate different types of nodes with varied bandwidth constraints, which should be considered when designing cost-optimal job allocation schemes for better system performance. In this paper, we propose a fair pricing strategy for job allocation in bandwidth-constrained distributed systems. The strategy formulates an incomplete-information, alternating-offers bargaining game on two variables, namely the price per unit resource and the percentage of bandwidth allocated, for both single-class and multiclass jobs at each node. We present a cost-optimal job allocation scheme for single-class jobs that involve communication delay and, hence, link bandwidth. For fast and adaptive allocation of multiclass jobs, we describe three efficient heuristics and compare them under different network scenarios. The results show that the proposed algorithms are comparable to existing job allocation schemes in terms of the expected system response time over all jobs.
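For intuition about alternating-offers bargaining, the sketch below swaps in the classic complete-information, single-variable Rubinstein model rather than the paper's incomplete-information, two-variable game. In that model the first proposer's subgame-perfect share is (1 - δ₂) / (1 - δ₁δ₂), where δ₁ and δ₂ are the players' discount factors. The mapping from share to a unit price (`agreed_price`, with a node cost and a user valuation) is a hypothetical illustration.

```python
def rubinstein_share(delta_first, delta_second):
    """Subgame-perfect share of the first proposer in infinite-horizon
    alternating-offers bargaining with complete information."""
    if not (0 <= delta_first < 1 and 0 <= delta_second < 1):
        raise ValueError("discount factors must lie in [0, 1)")
    return (1 - delta_second) / (1 - delta_first * delta_second)

def agreed_price(cost, value, delta_node, delta_user):
    """Hypothetical mapping: the node proposes first and the surplus being
    split is (value - cost) per unit resource."""
    share = rubinstein_share(delta_node, delta_user)
    return cost + share * (value - cost)

# Equally patient players: the first mover gets 1 / (1 + delta), about 0.526 here.
share = rubinstein_share(0.9, 0.9)
price = agreed_price(2.0, 4.0, 0.9, 0.9)
```

The more patient a player is (δ closer to 1), the larger the share it can secure, which is why discount factors act as bargaining power in this family of models.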

6.

TTA Code Optimization Based on Integer Linear Programming

Hu Wei Zhu Yongxin Jiang Lei 《Computer Engineering》2008,34(21):219-221

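To address optimization problems in code generation for transport-triggered architectures (TTA), namely instruction scheduling, multiple register file allocation, global register allocation, and software bypassing, this paper presents an integer linear programming formulation and implements a software framework to verify the correctness of the model. Experimental results show that the method can be applied effectively to basic blocks of up to 40 transport instructions and generates high-quality code.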

7.

Research on Vector Register Allocation Strategies for Clustered Architectures

Wang Xiangqian Wang Hao 《Microcontrollers & Embedded Systems》2017,17(7)

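Implementing vectorized execution through a clustered architecture is an efficient and flexible architectural choice. In the compiler's intermediate representation, vector instructions are interleaved with scalar instructions. The particular way vectorization is implemented on clustered architectures challenges the traditional register allocation framework. To address this problem, this paper studies the representation of vector instructions, the partitioning of callee-saved and caller-saved registers, and vector register allocation, and presents methods for both global and local vector register allocation.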

8.

Parallel Recomputation: A New Fault-Tolerance Method for High-Performance Computing

Wang Panfeng Du Yunfei Fu Hongyi Yang Xuejun Zhou Haifang 《Computer Science》2009,36(3):21-25

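Checkpointing is the most widely used fault-tolerance technique in high-performance computing. However, its performance deteriorates rapidly as the number of processors grows. This paper proposes a new method for tolerating single-process failures in parallel computing: parallel recomputation. Its main feature is that it achieves low-overhead fault tolerance by exploiting the computing power of redundant processors rather than the storage capacity of redundant disks. The paper also proposes an optimization that combines parallel recomputation with checkpointing to further reduce fault-tolerance overhead, and uses an example to show how to develop a parallel program based on parallel recomputation and its optimization. Finally, the method is evaluated experimentally. The results show that, as the number of processors grows, parallel recomputation incurs lower overhead than checkpointing, and the optimized variant outperforms plain parallel recomputation.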

9.

Revisiting reorder buffer architecture for next generation high performance computing

Min Choi Jong Hyuk Park Young-Sik Jeong 《The Journal of supercomputing》2013,65(2):484-495

Modern microprocessors achieve high application performance at an acceptable level of power dissipation. The reorder buffer allows instructions that execute out of order to be committed in order. It plays a key role in modern microprocessors because performance-improvement techniques rely heavily on aggressive speculation to feed wide-issue, out-of-order, deep pipelines. The reorder buffer is particularly important for the power/performance trade-off: enlarging it improves performance, but naively scaling the conventional reorder buffer architecture can severely increase complexity and power consumption. In this paper, we propose low-power reorder buffer techniques for contemporary microprocessors. First, the separated reorder buffer (SROB) reduces power dissipation through deferred allocation and early release. Deferred allocation delays the SROB allocation of an instruction until all of its data dependences are resolved. The instructions are then executed in program order and released from the SROB sooner. The result of each instruction is written into the rename buffers immediately after execution completes, and the values in the rename buffers are then written into the architectural register file at the commit stage. The proposed approaches provide higher resource utilization and lower power consumption.
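The basic mechanism the abstract builds on, out-of-order completion with in-order commit, can be modelled as a circular buffer. This is a minimal sketch of a conventional reorder buffer only; the SROB's deferred allocation and early release are not modelled, and the class and method names are hypothetical. In the SROB scheme, `allocate` would be called only once an instruction's operands are ready.

```python
class ReorderBuffer:
    """Conventional circular ROB: allocate in program order, complete in any
    order, commit strictly in order from the head."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size  # each entry: [dest_reg, done_flag, result]
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next free slot
        self.count = 0

    def allocate(self, dest):
        if self.count == self.size:
            return None  # ROB full: dispatch must stall
        idx = self.tail
        self.entries[idx] = [dest, False, None]
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return idx

    def complete(self, idx, result):
        # Execution may finish in any order; only the entry is updated.
        self.entries[idx][1] = True
        self.entries[idx][2] = result

    def commit(self, regfile):
        """Retire finished instructions from the head; stop at the first unfinished one."""
        committed = []
        while self.count and self.entries[self.head][1]:
            dest, _, result = self.entries[self.head]
            regfile[dest] = result  # architectural state changes only at commit
            committed.append(dest)
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return committed

# Three instructions finish in reverse order but retire in program order.
rob, regs = ReorderBuffer(4), {}
i0, i1, i2 = rob.allocate("r1"), rob.allocate("r2"), rob.allocate("r3")
rob.complete(i2, 30)
rob.complete(i1, 20)
first = rob.commit(regs)   # head (r1) not finished yet, so nothing retires
rob.complete(i0, 10)
second = rob.commit(regs)  # all three retire together
```

Every in-flight instruction occupies an entry from dispatch to commit, which is why ROB size, and hence its power, grows with the instruction window the abstract describes.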

10.

Design and Implementation of a Compiler for a Clustered VLIW DSP

Hu Dinglei Chen Shuming Liu Chunlin 《Journal of Chinese Computer Systems》2006,27(2):348-353

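Very long instruction word (VLIW) is the architecture commonly adopted by high-end DSPs. VLIW DSPs have no hardware mechanisms for scheduling or conflict resolution, so their performance depends entirely on the quality of compiler optimization. Based on the retargetable compiler infrastructure IMPACT, an optimizing compiler was designed and implemented for the clustered VLIW DSP YHFT-D4. The paper focuses on the definition of retargeting information, code annotation, support for SIMD instructions, clustered register allocation, and the exploitation of instruction-level parallelism and resolution of resource conflicts. Experimental results show that the compiler achieves good optimization results.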