期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

王桂彬杜静唐滔《软件学报》2013,24(10):2460-2472

高功耗已成为制约高性能计算机发展的重要问题之一.近年来,大量研究关注于如何在满足系统功耗约束的条件下优化系统执行性能.然而,已有方法大都针对同构系统,未考虑异构处理器之间的功耗或速度差异,难以高效应用于基于加速器的异构系统.对当前异构并行系统执行模型进行了抽象,并提出了融合两级功耗控制机制的系统功耗管理框架,自顶向下依次为系统级功耗控制器和异构处理引擎功耗控制器.在异构处理引擎功耗控制中,针对类OpenMP 并行循环,首先分析了异构多处理器在满足功耗约束条件下达到性能最优的条件.基于该结果,给出了功耗受限的并行循环划分算法,该方法通过协调并行循环调度和动态电压频率调节技术以优化异构并行处理.在系统级功耗控制中,建立了异构处理引擎效能评估方法,以此作为功耗划分的依据,在兼顾并发应用公平性的同时,提高系统整体执行效能.最后,基于典型CPU-GPU 异构系统验证了方法的有效性. 相似文献

2.

异构并行系统能耗优化分析模型

王桂彬杨学军唐滔徐新海《软件学报》2012,23(6):1382-1396

随着处理器功耗不断增大,功耗问题逐渐成为高性能计算机系统设计与实现的首要问题.当前,异构系统已成为高性能计算机的发展趋势之一.与传统同构体系结构相比,异构体系结构具有更高的理论峰值性能和能效,但是如何在满足应用性能的条件下充分发掘异构系统的能效优势,仍是一个挑战性问题.通过将应用程序抽象为由串行段和并行段组成的一般程序模型,建立了异构并行系统能耗优化模型通过分析方法依次给出并行段以及全程序(多程序段)能耗最优时处理器间满足的关系,分别给出了时间约束下能耗最优的处理器频率选择算法.最后,以CPU-GPU异构系统为平台,通过8个典型应用程序验证了方法的有效性. 相似文献

3.

性能非对称多核处理器下异构感知调度技术

赵姗杨秋松李明树《软件学报》2019,30(4):1164-1190

为了满足应用程序的多样化需求,异构多核处理器出现并逐渐进入市场,其中的处理核心（core）具有不同的微架构或者指令集架构（ISA）,为应用提供多样化特性支持,比如指令级并行（ILP）、内存级并行（MLP）,这些核心协同工作满足整个计算系统的优化目标,比如高性能、低功耗或者良好的能效.然而,目前主流的调度技术主要是针对传统同构处理器架构设计,没有考虑异构硬件能力的差异性.在异构多核处理器环境下,调度技术如何感知硬件的异构特性,为不同类型的应用程序提供更加合适和匹配的硬件资源,这是值得探索的问题.对近年来在该研究领域的成果进行了综述研究,特别是在性能非对称多核处理器架构下,异构调度技术面临的优化目标、分析模型、调度决策和算法评估等主要问题进行了分析和描述,并依次对相关技术进行了系统的总结,最后从软硬件融合的角度对今后的研究工作进行了展望. 相似文献

4.

在异构系统上基于权重和复制的调度算法

邓德康张振荣《计算机工程与设计》2023,(10):3004-3011

动态电压和频率扩展技术（DVFS）的发展使异构系统可以实现低功耗，然而DVFS通过降低处理器的执行频率来降低功耗，大大增加了处理器临时故障风险，应用的可靠性受到极大威胁。针对先前算法在任务调度过程中容易出现调度失败的问题，提出一种基于权重和复制的调度算法（SAWR），以在异构系统上完成应用调度，满足并行应用的可靠性目标，同时降低系统功耗。仿真结果表明，与先前的算法相比，所提算法可以实现良好的性能。相似文献

5.

并行存储系统的功耗优化

董勇陈娟《计算机工程与科学》2009,31(11)

功耗问题已经成为高性能计算机系统设计的重要问题。并行存储系统是高性能计算机系统的重要组成部分,降低其功耗对于降低整个并行系统功耗具有重要意义。并行存储系统由存储结点组成,降低存储结点功耗是降低并行存储系统功耗的重要部分。本文针对存储结点的处理器提出了功耗优化方法,根据利用率信息调节处理器电压/频率,并通过元数据服务器指导的频率预调节算法缓解因调频所引发的响应时间滞后问题。分析表明,该方法可以有效降低存储结点功耗,实现并行存储系统的功耗优化。相似文献

6.

异构多处理器SoC 的应用算法性能优化方法 总被引：1，自引：0，他引：1

赵鹏严明李思昆《软件学报》2011,22(7):1475-1487

在嵌入式多媒体处理领域中,多处理器片上系统(multi-processor system-on-chip,简称MPSoC)的应用越来越广泛.多媒体处理MPSoC通常采用"主处理器核+多个异构协处理器核"的主流体系结构.该结构兼顾了MPSoC系统的通用性与灵活性、性能与功耗,但也向MPSoC的性能优化方法提出了更高的要求.针对异构MPSoC上的多媒体应用算法,提出了一种MPSoC多媒体处理性能优化方法.该方法经过应用特征分析、循环仿射划分、应用向MPSoC各处理器核的映射,实现了优化的数据局部性与多级并行性,从而提高了异构MPSoC上多媒体应用算法的性能.实验结果表明,该方法对于多媒体应用算法在异构MPSoC上的处理性能优化方面取得了明显效果. 相似文献

7.

一种面向异构众核处理器的并行编译框架 总被引：1，自引：0，他引：1

李雁冰赵荣彩韩林赵捷徐金龙李颖颖《软件学报》2019,30(4):981-1001

异构众核处理器是面向高性能计算领域处理器发展的重要趋势,但其更为复杂的体系结构使得编程难的问题更加突出.针对这一问题,基于开源编译器Open64,提出了一种面向异构众核处理器的并行编译框架,将程序自动转换为异构并行程序.该框架主要包括4个模块：任务划分模块用来识别适合进行加速计算的程序段,实现了嵌套循环的多维并行识别方法;数据布局模块完成数据在主存和SPM之间的布局,实现了数组边界分析和指针范围分析;传输优化模块实现了数据传输合并、传输外提、打包传输、数组转置等多种数据传输优化方法;收益评估模块在构建代价模型的基础上实现了一种动静结合的收益评估方法.并且,基于SW26010处理器,对该编译框架进行了实现,测试结果表明,该编译框架能够实现一些程序以面向异构众核结构的并行变换,且获得较好的加速效果. 相似文献

8.

GPU/CPU异构系统任务节能调度方法仿真

陈杰《计算机仿真》2013,30(7)

研究GPU/CPU异构系统任务调度的节能问题.与传统同构体系结构相比,异构系统任务调度呈现较大的随机性和不定性,GPU/CPU异构系统中时间间隙片段呈现了较大的随机性,导致传统调度方法很难建立规则的描述时间片段的模型,调度能耗较高.为解决上述问题,提出了一种改进功耗优化的GPU/CPU异构环境下的任务调度算法,将任务关系图按照依赖关系计算量拆分,并分配到计算节点.在计算节点内根据权重法的思想,统计所有计算节点的处理情况,进而将节点内的子任务调度到合适的处理器.实验结果表明,在不影响应用性能的前提下,降低了异构系统的能耗开销,优化效果明显. 相似文献

9.

S-Bridge:性能非对称多核处理器下负载均衡代理机制

赵姗郝春亮翟健李明树《软件学报》2020,31(9):2965-2979

近年来,在移动计算环境中,异构多核处理器已经逐渐成为主流.与传统同构的处理器设计相比,此类异构多核处理器以更低的功耗成本满足设备的计算需求.但是异构环境下CPU核之间的微架构差异,也为操作系统中的一些基本方法提出了新的挑战.面向性能非对称异构多核环境下调度的负载均衡问题,从系统层面提出了一种负载均衡机制S-Bridge,可以减少处理器微架构差异以及任务执行需求差异对传统负载均衡带来的影响.S-Bridge的主要贡献是从系统层提供了通用的、适配异构性的负载均衡相关接口,使任意调度器都能方便地与异构多核处理器系统进行适配.基于CFS和HMP调度器在ARM平台上进行实验,同时在X86平台上进行S-Bridge通用性的验证,结果表明：S-Bridge可以支持不同真实平台和内核版本的快速实现,平均性能提升超过15%,部分情况下可达65%. 相似文献

10.

基于通信感知任务划分的异构系统低功耗优化方法

王桂彬《小型微型计算机系统》2011,32(12)

针对由通用微处理器和专用加速部件构成的异构并行系统,提出结合通信感知的并行任务划分和动态电压频率调节技术的异构系统能耗优化方法,该方法旨在将并行任务图划分并映射在异构处理单元,在满足性能约束的条件下最小化系统能耗.在目前典型异构并行系统中,主处理器与加速部件大都通过系统总线连接,必然引入不可忽略的通信开销,因此通信感知的任务划分技术是该问题的关键.提出了基于整数线性规划的静态最优能耗优化方法和基于遗传算法的动态能耗优化方法.并通过一个典型科学计算应用验证了本文方法的有效性. 相似文献

11.

Distributed loop‐scheduling schemes for heterogeneous computer systems

Anthony T. Chronopoulos Satish Penmatsa Jianhua Xu Siraj Ali 《Concurrency and Computation》2006,18(7):771-785

Distributed computing systems are a viable and less expensive alternative to parallel computers. However, a serious difficulty in concurrent programming of a distributed system is how to deal with scheduling and load balancing of such a system which may consist of heterogeneous computers. Some distributed scheduling schemes suitable for parallel loops with independent iterations on heterogeneous computer clusters have been designed in the past. In this work we study self‐scheduling schemes for parallel loops with independent iterations which have been applied to multiprocessor systems in the past. We extend one important scheme of this type to a distributed version suitable for heterogeneous distributed systems. We implement our new scheme on a network of computers and make performance comparisons with other existing schemes. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

12.

Performance‐based parallel loop self‐scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters

Chao‐Tung Yang Chao‐Chin Wu Jen‐Hsiang Chang 《Concurrency and Computation》2011,23(8):721-744

Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous researches into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partition loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

13.

A performance-based parallel loop scheduling on grid environments

Wen-Chung Shih Chao-Tung Yang Shian-Shyong Tseng 《The Journal of supercomputing》2007,41(3):247-267

The effectiveness of loop self-scheduling schemes has been shown on traditional multiprocessors in the past and computing clusters in the recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, a performance-based approach, taking the two characteristics above into consideration, is proposed to schedule parallel loop iterations on grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload which can be scheduled statically, thus alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach can reduce the completion time for applications with regular or irregular workloads. Consequently, we claim that parallel loop scheduling can benefit applications on grid environments. 相似文献

14.

异构集群下的任务调度算法研究

刘莉姜明华《计算机应用研究》2014,31(1):80-84

针对异构集群下高效节能的任务调度算法进行了研究, 提出了一种基于复制的任务调度算法, 在任务初始分配的基础上, 分别从能源感知和性能—能源平衡两个角度考虑任务的复制。建立了由计算和通信造成的能源消耗的数学模型, 并进行了大量的实验。实验结果表明, 与已有的BEATA算法相比, 该算法能明显地减少异构集群处理并行应用的调度长度和能耗。分析结果发现, 任务复制的方法在减少调度长度的同时会增加相应的能耗, 能同比优化调度长度和能耗的任务调度方法是今后的研究方向。相似文献

15.

A dynamic and reliability-driven scheduling algorithm for parallel real-time jobs executing on heterogeneous clusters

《Journal of Parallel and Distributed Computing》2005,65(8):885-900

In this paper, a heuristic dynamic scheduling scheme for parallel real-time jobs executing on a heterogeneous cluster is presented. In our system model, parallel real-time jobs, which are modeled by directed acyclic graphs, arrive at a heterogeneous cluster following a Poisson process. A job is said to be feasible if all its tasks meet their respective deadlines. The scheduling algorithm proposed in this paper takes reliability measures into account, thereby enhancing the reliability of heterogeneous clusters without any additional hardware cost. To make scheduling results more realistic and precise, we incorporate scheduling and dispatching times into the proposed scheduling approach. An admission control mechanism is in place so that parallel real-time jobs whose deadlines cannot be guaranteed are rejected by the system. For experimental performance study, we have considered a real world application as well as synthetic workloads. Simulation results show that compared with existing scheduling algorithms in the literature, our scheduling algorithm reduces reliability cost by up to 71.4% (with an average of 63.7%) while improving schedulability over a spectrum of workload and system parameters. Furthermore, results suggest that shortening scheduling times leads to a higher guarantee ratio. Hence, if parallel scheduling algorithms are applied to shorten scheduling times, the performance of heterogeneous clusters will be further enhanced. 相似文献

16.

Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems

I. Riakiotakis F. M. Ciorba T. Andronikos G. Papakonstantinou A. T. Chronopoulos 《Concurrency and Computation》2012,24(18):2302-2327

Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross‐node communication. Synchronization requires fine‐tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self‐scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

17.

基于异构CMP的改进蚁群优化任务调度策略

下载免费PDF全文

李静梅张大虎吴艳霞《计算机工程与应用》2015,51(18):47-51

为提高异构CMP任务调度执行效率,充分发挥异构CMP的异构性和并行能力,提出一种基于异构CMP的改进蚁群优化任务调度算法--IACOTS。IACOTS算法首先建立任务调度模型、路径选择规则和信息素更新规则,使蚁群算法能够适用于异构CMP任务调度问题。同时通过采用动态信息素更新、相遇并行搜索策略和引入遗传算法中的变异因子对基本的蚁群算法进行优化,克服蚁群算法搜索时间过长和“早熟”现象。通过仿真实验获得的结果表明,IACOTS算法执行效率优于现有的遗传算法,完成相同的任务需要的迭代次数最少,能有效降低程序执行时间,适用于异构CMP等大规模并行环境的任务调度。相似文献

18.

Multi-objective energy-efficient workflow scheduling using list-based heuristics

《Future Generation Computer Systems》2014

Workflow applications are a popular paradigm used by scientists for modelling applications to be run on heterogeneous high-performance parallel and distributed computing systems. Today, the increase in the number and heterogeneity of multi-core parallel systems facilitates the access to high-performance computing to almost every scientist, yet entailing additional challenges to be addressed. One of the critical problems today is the power required for operating these systems for both environmental and financial reasons. To decrease the energy consumption in heterogeneous systems, different methods such as energy-efficient scheduling are receiving increasing attention. Current schedulers are, however, based on simplistic energy models not matching the reality, use techniques like DVFS not available on all types of systems, or do not approach the problem as a multi-objective optimisation considering both performance and energy as simultaneous objectives. In this paper, we present a new Pareto-based multi-objective workflow scheduling algorithm as an extension to an existing state-of-the-art heuristic capable of computing a set of tradeoff optimal solutions in terms of makespan and energy efficiency. Our approach is based on empirical models which capture the real behaviour of energy consumption in heterogeneous parallel systems. We compare our new approach with a classical mono-objective scheduling heuristic and state-of-the-art multi-objective optimisation algorithm and demonstrate that it computes better or similar results in different scenarios. We analyse the different tradeoff solutions computed by our algorithm under different experimental configurations and we observe that in some cases it finds solutions which reduce the energy consumption by up to 34.5% with a slight increase of 2% in the makespan. 相似文献

19.

Loop scheduling and bank type assignment for heterogeneous multi-bank memory

Meikang Qiu Minyi Guo Meiqin Liu Chun Jason Xue Laurence T. Yang Edwin H.-M. Sha 《Journal of Parallel and Distributed Computing》2009

Many high-performance DSP processors employ multi-bank on-chip memory to improve performance and energy consumption. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. However, making effective use of multi-bank memory remains difficult, considering the combined effect of performance and energy requirement. This paper studies the scheduling and assignment problem about how to minimize the total energy consumption while satisfying the timing constraint with heterogeneous multi-bank memory for applications with loop. An algorithm, TASL (Type Assignment and Scheduling for Loops), is proposed. The algorithm uses bank type assignment with the consideration of variable partition to find the best configuration for both memory and ALU. The experimental results show that the average improvement on energy-saving is significant by using TASL. 相似文献