
1.

Data-parallel programming on a network of heterogeneous workstations

Nenad Nedeljković Michael J. Quinn 《Concurrency and Computation》1993,5(4):257-268

We describe a compiler and run-time system that allow data-parallel programs to execute on a network of heterogeneous UNIX workstations. The programming language supported is Dataparallel C, a SIMD language with virtual processors and a global name space. This parallel programming environment allows the user to take advantage of the power of multiple workstations without adding any message-passing calls to the source program. Because the performance of individual workstations in a multi-user environment may change during the execution of a Dataparallel C program, the run-time system automatically performs dynamic load balancing. We present experimental results that demonstrate the usefulness of dynamic load balancing in a multi-user environment. These results suggest that initially allocating the same amount of work to each processor and letting the dynamic load-balancing algorithm adjust the load during program execution yields very good performance. Hence neither the compiler nor the run-time system needs a priori knowledge of the speeds of the machines that will participate in a program execution.
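The idea of starting from an equal split and then adjusting it from observed performance can be sketched as follows. This is a minimal illustration only, not the Dataparallel C run-time system: the function name `rebalance` and the proportional-to-throughput rule are assumptions made for the example, and every worker is assumed to hold at least one item and to report a positive elapsed time.

```python
def rebalance(counts, elapsed):
    """Redistribute work items in proportion to each worker's observed throughput.

    counts  -- items each worker processed in the last interval (hypothetical input)
    elapsed -- seconds each worker needed for those items (hypothetical input)
    """
    total = sum(counts)
    rates = [c / t for c, t in zip(counts, elapsed)]  # items per second
    rate_sum = sum(rates)
    ideal = [total * r / rate_sum for r in rates]     # fractional target shares
    new_counts = [int(x) for x in ideal]              # round down first
    # Give the items lost to rounding to the largest fractional remainders.
    leftover = total - sum(new_counts)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - new_counts[i],
                   reverse=True)
    for i in order[:leftover]:
        new_counts[i] += 1
    return new_counts


if __name__ == "__main__":
    # Equal initial allocation; the first workstation turned out to be the fastest.
    print(rebalance([100, 100, 100, 100], [1.0, 2.0, 2.0, 4.0]))
```

Calling this periodically would shift work toward machines that are currently fast, without any prior knowledge of machine speeds.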

2.

Compiling for distributed memory architectures

Rogers A. Pingali K. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):281-298

The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Programmers must address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution. Each process's role in the computation is determined by examining the data required for execution at run-time. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as close to its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
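Message vectorization, as described in this abstract, combines messages that share a source and a destination. A minimal sketch of that grouping step is shown below; the `(source, destination, value)` tuple layout and the function name `vectorize_messages` are assumptions made for illustration, not the compiler's actual representation.

```python
from collections import defaultdict


def vectorize_messages(messages):
    """Group single-value messages into one batch per (source, destination) pair.

    messages -- iterable of (source, destination, value) tuples (hypothetical layout)
    Returns a dict mapping (source, destination) to the list of values, in send order.
    """
    batches = defaultdict(list)
    for source, destination, value in messages:
        batches[(source, destination)].append(value)
    return dict(batches)


if __name__ == "__main__":
    pending = [(0, 1, "a"), (0, 2, "b"), (0, 1, "c")]
    # Three single-value messages become two batched messages.
    print(vectorize_messages(pending))
```

Each batch pays the fixed per-message overhead once instead of once per value.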

3.

Extremal Optimization applied to load balancing in execution of distributed programs

《Applied Soft Computing》2015

The paper describes methods for using Extremal Optimization (EO) for processor load balancing during execution of distributed applications. A load balancing algorithm for clusters of multicore processors is presented and discussed. In this algorithm the EO approach is used to periodically detect the best tasks as candidates for migration and for a guided selection of the best computing nodes to receive the migrating tasks. To decrease the complexity of selection for migration, the embedded EO algorithm assumes a two-step stochastic selection during the solution improvement based on two separate fitness functions. The functions are based on specific models which estimate relations between the programs and the executive hardware. The proposed load balancing algorithm is assessed by experiments with simulated load balancing of distributed program graphs. The algorithm is compared against a greedy fully deterministic approach, a genetic algorithm and an EO-based algorithm with random placement of migrated tasks.

4.

S-Bridge: A Load-Balancing Proxy Mechanism for Performance-Asymmetric Multicore Processors

赵姗郝春亮翟健李明树《软件学报》2020,31(9):2965-2979

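In recent years, heterogeneous multicore processors have gradually become mainstream in mobile computing environments. Compared with traditional homogeneous processor designs, such heterogeneous multicore processors meet a device's computing needs at a lower power cost. However, the microarchitectural differences among CPU cores in a heterogeneous environment also pose new challenges for some basic operating system methods. Addressing the load-balancing problem in scheduling on performance-asymmetric heterogeneous multicore systems, this paper proposes a system-level load-balancing mechanism, S-Bridge, which reduces the impact that processor microarchitectural differences and differences in task execution requirements have on traditional load balancing. The main contribution of S-Bridge is to provide, at the system level, general heterogeneity-aware load-balancing interfaces, so that any scheduler can easily be adapted to heterogeneous multicore processor systems. Experiments were conducted on ARM platforms with the CFS and HMP schedulers, and the generality of S-Bridge was also verified on an X86 platform. The results show that S-Bridge supports rapid implementation across different real platforms and kernel versions, with an average performance improvement of more than 15% and up to 65% in some cases.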

5.

Natural Load Indices (NLI) for scientific simulation

Stefan P. Muszala Gita Alaghband James Hack Daniel Connors 《The Journal of supercomputing》2012,59(1):392-413

We present Natural Load Indices (NLIs) as an alternative to measurement-based load indices. NLIs facilitate further performance improvement and better resource usage. Example NLIs are rainfall amounts in a climate simulation, mass of an atom in a Molecular Dynamics (MD) code and surface fluxes in an ocean model. The process of obtaining an NLI occurs during model development or as a preprocessing step, and implementing NLIs minimizes run-time costs associated with dynamic load balancing.

6.

Partitioning and Scheduling of Parallel Functional Programs for Larger Grain Execution

《Journal of Parallel and Distributed Computing》1995,26(2):151-165

This paper discusses how to exploit parallelism efficiently by improving the granularity of functional programs on a multiprocessor. The challenge is to partition a functional program (or a process) into appropriately sized subprocesses to make sure that the computation time of the local subprocess is at least greater than the communication overheads involved in sending other subprocesses for remote evaluation. Asymptotic complexity analyses of a function, to estimate the computation time and also the communication involved in sending the arguments and receiving the results from the remote processor, are found to be quite useful. It is shown how some parallel programs can be run more efficiently with prior information on time complexities (in big-O notation) and relative time complexities of its subexpressions with the help of analytical reasoning and some practical examples on the larger grain distributed multiprocessor machine LAGER. Ordered scheduling of the processes, determined by the priorities based on the relative time complexities, shows further improvement over run-time dynamic load balancing methods as well as better utilization of resources.

7.

A Dynamic Load Balancing Framework for Real-time Applications in Message Passing Systems

Ghada F. El Kabbany Nayer M. Wanas Nadia H. Hegazi Samir I. Shaheen 《International journal of parallel programming》2011,39(2):143-182

Load balancing algorithms are designed essentially to equally distribute the load on processors and maximize their utilities while minimizing the total task execution time. In order to achieve these goals, the load-balancing mechanism should be “fair” in distributing the load across the different processors. This implies that the difference between the heaviest-loaded and the lightest-loaded processors should be minimized. Therefore, the load information on each processor must be updated such that the load-balancing mechanism can be more effective. In this work, we present an application-independent dynamic algorithm for scheduling tasks and load balancing in message passing systems. We propose a DAG-based Dynamic Load Balancing algorithm for Real time applications (DAG-DLBR) that is designed to work dynamically to cope with possible changes in the load that might occur during runtime. This algorithm addresses the challenge of devising a load balancing scheme which judiciously deals with the hybrid execution of an existing real-time application (represented by a Directed Acyclic Graph (DAG)) together with newly arriving jobs. The main objective of this algorithm is to reduce response times of the newly arriving jobs while maintaining the time constraints of the existing DAG. To evaluate the performance of the DAG-DLBR algorithm, a comparison with the performance of two common dynamic load balancing algorithms is presented. This comparison is performed by evaluating, experimentally, the execution time of different load balancing algorithms on a homogeneous real parallel machine. In addition, the values of load imbalance, the execution time, and the communication overhead time are evaluated analytically using different benchmarks as test-bed workloads. These workloads cover a wide range of dynamic applications with different task types.
Experimental results illustrate the improved performance of the DAG-DLBR algorithm compared to both distributed and hierarchical-based algorithms by at least 12 and 19%, respectively. This improvement holds for all workloads, even highly dependent ones. The DAG-DLBR algorithm achieves lower computation time than both the distributed and the hierarchical-based algorithms for 4, 8, 12 and 16 processors.
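The fairness criterion stated in this abstract, minimizing the gap between the heaviest-loaded and lightest-loaded processors, can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the relative form (peak load over mean load, minus one) is a commonly used normalization added here, not a metric taken from the paper.

```python
def load_imbalance(loads):
    """Absolute gap between the heaviest-loaded and lightest-loaded processors."""
    return max(loads) - min(loads)


def relative_imbalance(loads):
    """Peak load relative to the mean load, minus one (0.0 means perfectly balanced).

    This normalization is an assumption added for illustration.
    """
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0 if mean else 0.0


if __name__ == "__main__":
    loads = [4, 9, 5, 6]  # hypothetical task counts on four processors
    print(load_imbalance(loads), relative_imbalance(loads))
```

A balancer would recompute such a value from fresh load reports and migrate work only when it exceeds a chosen threshold.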

8.

Design of a Dynamic Load Balancing Strategy for Network Intrusion Detection Systems

赵晓锋刘利军怀进鹏《计算机工程与应用》2003,39(34):146-150

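With the widespread use of the Internet, the volume of network information has grown rapidly, and the number and variety of network attacks have increased greatly. A network intrusion detection system needs to deploy multiple sensors to monitor and protect the network; adding sensors strengthens the system's analysis and detection capability. However, to make full use of the system's processing capacity, tasks must be assigned to processing nodes dynamically so as to achieve dynamic load balancing. Based on an analysis of network intrusion detection systems, this paper proposes a method for calculating load values and, combining it with general load-balancing strategies, proposes a dynamic load-balancing strategy for distributed intrusion detection systems, which achieves good dynamic load balancing and improves system performance.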

9.

Scalable Load Balancing Techniques for Parallel Computers

《Journal of Parallel and Distributed Computing》1994,22(1):60-79

In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh, and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristics. For each of these architectures, we are able to determine near optimal load balancing schemes. Results obtained from implementation of these schemes in the context of the Tautology Verification problem on the Ncube/2 (a trademark of the Ncube Corporation) multicomputer are used to validate our theoretical results for the hypercube architecture. These results also demonstrate the accuracy and viability of our framework for scalability analysis.

10.

A Distributed Dynamic Load Balancing Algorithm

须成忠张德富孙钟秀《软件学报》1993,4(1):22-28

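In multiprocessor systems, load balancing is an important way to improve the efficiency of parallel processing. This paper proposes a distributed dynamic load-balancing algorithm for the distributed-memory TRANSCUBE multiprocessor environment. The algorithm uses a receiver-initiated asynchronous scheduling policy, establishes contact between idle and heavily loaded processors through a "handshake" protocol, and automatically migrates tasks (or processes) from heavily loaded processors to idle ones. The algorithm is suited to the parallel solution of application problems with dynamic characteristics, and it performs well when the problem size is large and processor loads change slowly.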
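A receiver-initiated policy of the kind this abstract describes can be sketched in a few lines: an idle processor starts the exchange, and a task moves only if the chosen donor confirms that it is still heavily loaded. This is a simplified single-process simulation under stated assumptions, not the TRANSCUBE algorithm itself; the function name, the choice of the most loaded node as donor, and the `heavy_threshold` parameter are all hypothetical.

```python
def receiver_initiated_round(queues, heavy_threshold=3):
    """Run one balancing round over per-processor task queues (lists of tasks).

    Each idle processor (empty queue) asks the currently most loaded processor
    for work. The "handshake" is modeled as a check that the donor still holds
    at least heavy_threshold tasks at the moment of the request.
    Returns a list of (donor, receiver, task) migrations.
    """
    migrations = []
    for receiver, queue in enumerate(queues):
        if queue:
            continue  # only idle processors initiate a request
        donor = max(range(len(queues)), key=lambda i: len(queues[i]))
        if len(queues[donor]) >= heavy_threshold:  # donor accepts the handshake
            task = queues[donor].pop()
            queue.append(task)
            migrations.append((donor, receiver, task))
    return migrations


if __name__ == "__main__":
    queues = [[], ["a", "b", "c", "d"], ["e"], []]
    print(receiver_initiated_round(queues))
    print(queues)
```

Because requests originate from idle processors, busy processors spend no effort on balancing until someone actually needs work.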

11.

Dynamic Load Balancing Technique Based on the Time Warp Protocol

韦慧吴悦杨洪斌《计算机应用研究》2007,24(12):118-120

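After analyzing the particular characteristics of parallel VHDL simulation, this paper builds a dynamic load-balancing model for parallel VHDL simulation. Within this model, a dynamic load-balancing method that dynamically adjusts the optimal degree of parallelism is proposed to address the shortage of system resources, and a new measure of load in simulation, the simulation advance rate, is adopted. The model also includes a dynamic load-balancing algorithm based on standard deviation and minimum change in communication, as well as a run-time load migration mechanism. Finally, the feasibility of the model is analyzed.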

12.

Performance Evaluation of Cluster Dynamic Load Balancing Systems (total citations: 18, self-citations: 0, citations by others: 18)

唐丹金海张永坤《计算机学报》2004,27(6):803-811

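This paper uses stochastic Petri nets to build an abstract model of a cluster dynamic load-balancing system. By refining the local processing part of the nodes in the model, it analyzes the performance of five dynamic load-balancing algorithms and discusses how cluster load characteristics affect the performance of dynamic load-balancing systems. The main conclusions are: (1) dynamic load-balancing algorithms can achieve better performance than static ones; (2) compared with traditional load-balancing algorithms that consider only the CPU ready queue, algorithms that also consider the various I/O request queues can achieve better performance; (3) even under extreme cluster load characteristics, cluster dynamic load-balancing algorithms still achieve fairly good performance, so implementing even a very simple cluster dynamic load-balancing system is well worthwhile.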

13.

DistDLB: Improving cosmology SAMR simulations on distributed computing systems through hierarchical load balancing

《Journal of Parallel and Distributed Computing》2006,66(5):716-731

Cosmology SAMR simulations have played a prominent role in the field of astrophysics. The emerging distributed computing systems provide an economic alternative to the traditional parallel machines, and enable scientists to conduct cosmological simulations that require vast computing power. An important issue of conducting distributed cosmological simulations is about performance and efficiency. In this paper, we present a dynamic load balancing scheme called DistDLB that is designed to improve the performance of distributed cosmology simulations. Distributed systems, e.g. the Computation Grid, usually consist of heterogeneous resources connected with shared networks. By considering these features of distributed systems and unique characteristics of cosmology SAMR simulations, DistDLB focuses on reducing the redistribution cost through a hierarchical load balancing approach and a run-time decision making mechanism. Heuristic methods have been proposed to adaptively adjust load balancing strategies based on the observation of the current system and application state. Our experiments with real-world cosmology simulations on production systems indicate that the proposed DistDLB scheme can effectively improve the performance of cosmology simulations by 2.56–79.14% as compared to the scheme that does not consider the heterogeneous and dynamic features of distributed systems.

14.

A Dynamic Load Balancing Strategy for DRC Clusters Based on Nginx

倪雅婷杨文晖苗放黄安琪蒋媛《计算机与现代化》2022,(4):58-64

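Data-oriented architecture (DOA) offers a new and effective solution for the circulation and sharing of massive heterogeneous data. The data registration center (DRC) is a core component of DOA, so its access performance is particularly critical. To address the service overload that highly concurrent access causes in DRC clusters, Nginx reverse-proxy load balancing is used to handle highly concurrent access. The load-balancing policies of Nginx are analyzed and optimized, and a dynamic load-balancing strategy consisting of dynamic configuration, load collection, and algorithmic scheduling is proposed. In the load scheduling module, the Nginx weighted least-connection scheduling algorithm (WLC) is improved so that adaptive weights are used to keep dispatching requests to the best-performing node in the next period. High-concurrency performance tests verify that the proposed strategy handles heavy access traffic in DRC clusters more effectively, improving cluster resource utilization and shortening request response time.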
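Weighted least-connection selection with weights that are refreshed from collected load data can be sketched as below. This is a generic illustration and makes no claim about the paper's actual formula or about Nginx internals: the dictionary fields, the function names, and the rule "weight = base weight × (1 − utilization)" are assumptions chosen only to show the shape of the idea.

```python
def adapt_weight(base_weight, utilization, floor=0.1):
    """Shrink a node's weight as its measured utilization (0.0 to 1.0) rises.

    The linear rule and the floor value are hypothetical; a real system would
    combine several metrics (CPU, memory, I/O, response time).
    """
    return max(base_weight * (1.0 - utilization), floor)


def pick_backend(backends):
    """Weighted least-connection choice: smallest active/weight ratio wins."""
    return min(backends, key=lambda b: b["active"] / b["weight"])


if __name__ == "__main__":
    # Hypothetical nodes: static base weight, current connections, measured load.
    nodes = [
        {"name": "a", "base": 10.0, "active": 10, "util": 0.5},
        {"name": "b", "base": 10.0, "active": 4, "util": 0.9},
        {"name": "c", "base": 8.0, "active": 6, "util": 0.5},
    ]
    for node in nodes:  # refresh weights once per collection period
        node["weight"] = adapt_weight(node["base"], node["util"])
    print(pick_backend(nodes)["name"])
```

Recomputing the weights each period lets a node that is nominally strong but currently saturated receive fewer new requests.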

15.

Adaptive Task Pools: Efficiently Balancing Large Number of Tasks on Shared-address Spaces

Ralf Hoffmann Thomas Rauber 《International journal of parallel programming》2011,39(5):553-581

Task based approaches with dynamic load balancing are well suited to exploit parallelism in irregular applications. For such applications, the execution time of tasks can often not be predicted due to input dependencies. Therefore, a static task assignment to execution resources usually does not lead to the best performance. Moreover, a dynamic load balancing is also beneficial for heterogeneous execution environments. In this article a new adaptive data structure is proposed for storing and balancing a large number of tasks, allowing an efficient and flexible task management. Dynamically adjusted blocks of tasks can be moved between execution resources, enabling an efficient load balancing with low overhead, which is independent of the actual number of tasks stored. We have integrated the new approach into a runtime system for the execution of task-based applications for shared address spaces. Runtime experiments with several irregular applications with different execution schemes show that the new adaptive runtime system leads to good performance also in such situations where other approaches fail to achieve comparable results.

16.

An Adaptive Dynamic Load Balancing Algorithm (total citations: 6, self-citations: 0, citations by others: 6)

王玥蔡皖东段琪《计算机工程与应用》2006,42(21):121-123

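Load balancing is a classic hard combinatorial optimization problem. This paper builds a model of the load-balancing problem in a cluster and proposes a dynamic adaptive algorithm that aims to minimize load-balancing overhead. Because network latency exists in a cluster, a large part of the cost of load redistribution depends on the maximum number of messages sent and received between CPUs. The algorithm aims to minimize the number of messages sent and received between CPUs during load redistribution, and it dynamically invokes the D algorithm or the R algorithm according to changes in the numbers of overloaded and lightly loaded CPUs, so as to reduce load-balancing overhead.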

17.

Efficient task migration algorithm for distributed systems

Suen T.T.Y. Wong J.S.K. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):488-499

The objective of the study was to achieve balanced load among processors, reduce the communication overhead of the load balancing algorithm, and improve resource utilization, which results in better average response time. A communication protocol and a fully distributed algorithm for dynamic load balancing through task migration in a connected N-processor network are presented. Each processor communicates its load directly with only a subset (of size √N) of processors, reducing communication traffic and average response time. It is proved that the given algorithm will perform task migration even if there is only one lightly loaded processor and one heavily loaded processor in the system. Simulation results show that the proposed scheme can save up to 60% of the protocol messages used by the broadcast algorithms and can reduce the average response time.

18.

A parallel dynamic load-balancing algorithm for solution-adaptive finite element meshes on 2D tori

Yeh-Ching Chung Yaa-Jyun Yeh J.-S. Liu 《Concurrency and Computation》1995,7(7):615-631

To efficiently execute a finite element program on a 2D torus, we need to map nodes of the corresponding finite element graph to processors of a 2D torus such that each processor has approximately the same amount of computational load and the communication among processors is minimized. If nodes of a finite element graph do not increase during the execution of a program, the mapping only needs to be performed once. However, if a finite element graph is solution-adaptive, that is, nodes of a finite element graph increase discretely due to the refinement of some finite elements during the execution of a program, a dynamic load-balancing algorithm has to be performed many times in order to balance the computational load of processors while keeping the communication cost as low as possible. In the paper we propose a parallel dynamic load-balancing algorithm (LB) to deal with the load-imbalancing problem of a solution-adaptive finite element program on a 2D torus. The algorithm uses an iterative approach to achieve load-balancing. We have implemented the proposed algorithm along with two parallel mapping algorithms, parallel orthogonal recursive bisection (ORB) and parallel recursive mincut bipartitioning (MC), on a simulated 2D torus. Three criteria, the execution time of load-balancing algorithms, the computation time of an application program under different load balancing algorithms, and the total execution time of an application program (under several refinement phases) are used for performance evaluation. Simulation results show that (1) the execution of LB is faster than those of MC and ORB; (2) the mappings of LB are better than those of ORB and MC; and (3) the speedups of LB are better than those of ORB and MC.

19.

HeDPM: load balancing of linear pipeline applications on heterogeneous systems

Andreu Moreno Anna Sikora Eduardo César Joan Sorribes Tomàs Margalef 《The Journal of supercomputing》2017,73(9):3738-3760

This work presents a new algorithm, called Heterogeneous Dynamic Pipeline Mapping, that allows for dynamically improving the performance of pipeline applications running on heterogeneous systems. It is aimed at balancing the application load by determining the best replication (of slow stages) and gathering (of fast stages) combination taking into account processors computation and communication capacities. In addition, the algorithm has been designed with the requirement of keeping complexity low to allow its usage in a dynamic tuning tool. For this reason, it uses an analytical performance model of pipeline applications that addresses hardware heterogeneity and which depends on parameters that can be known in advance or measured at run-time. A wide experimentation is presented, including the comparison with the optimal brute force algorithm, a general comparison with the Binary Search Closest algorithm, and an application example with the Ferret pipeline included in the PARSEC benchmark suite. Results, matching those of the best existing algorithms, show significant performance improvements with lower complexity (\(O(N^3)\), where \(N\) is the number of pipeline stages).

20.

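Application of Multicore Parallel Technology in Molecular Dynamics Simulation (total citations: 1, self-citations: 0, citations by others: 1)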

刘青昆滕人达刘凤宫利东张建强《计算机工程与设计》2011,32(10):3395-3398

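To make full use of multicore processor resources, a multicore parallel technique for molecular dynamics simulation is studied. On a multicore processor, OpenMP is used to create and synchronize threads, to set the scheduling mode of child threads dynamically, and to balance load so as to reduce the time child threads spend waiting. Dynamics models of different molecular systems were tested to obtain the parallel computation time with different numbers of child threads, and good speedups were achieved. The experimental results show that OpenMP parallelization can effectively improve the time efficiency of the charge-solving process in molecular dynamics simulation, as well as the utilization of multicore computer resources.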