首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 394 毫秒
1.
In this paper, a new adaptive scheme is presented for dynamic load balancing in a message-passing multicomputer. The scheme is based on using easy-to-implement heuristics and adaptive threshold in balancing the system load among dispersed nodes. It uses a distributed control over all computer nodes as coordinated by an information collector. Four heuristic methods are presented here, which are distinguished by the ranges for location and threshold update policies and by the disciplines used for determining the load transfer destination. A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube multi-computer. The load balancing scheme is evaluated on the basis of the effects of system utilization, load imbalance, communication and migration overhead, and multicomputer size. Relative merits of the four methods are revealed under various physical configurations of the multicomputer. The application potentials are discussed for parallel execution of AI-oriented programs and distributed semantic network data bases.  相似文献   

2.
The two-way stripes partition mapping and the greedy assignment mapping are proposed to map finite element graphs composed of a number of rectilinear four-node elements on hypercubes. The two-way stripes partition mapping is a two-phase mapping approach. In the first phase a two-way stripes partition heuristic is used to lower the communication cost. In the second phase the load transfer heuristic is used to balance the computational load among processors. The greedy assignment mapping tries to minimize the communication cost and balance the computational load of processors simultaneously. Our simulation results show that the speedups for the two-way stripes partition mapping are better than those for the greedy assignment mapping when the load balancing criterion is achieved in both approaches (that is, the number of nodes in each processor is at most one more than the number of nodes in any other processor). However, the greedy approach performs well at a much lower cost.The work of this author was supported in part by NSF under contract CCR-9110812.  相似文献   

3.
Amnon Barak  Amnon Shiloh 《Software》1985,15(9):901-913
This paper deals with the organization of a distributed load-balancing policy for a multicomputer system which consists of a cluster of independent computers that are interconnected by a local area communication network. We introduce three algorithms necessary to maintain load balancing in this system: the local load algorithm, used by each processor to monitor its own load; the exchange algorithm, for exchanging load information between the processors, and the process migration algorithm that uses this information to dynamically migrate processes from overloaded to underloaded processors. The policy that we present is distributed, i.e. each processor uses the same policy. It is both dynamic, responding to load changes without using an a priori knowledge of the resources that each process requires; and stable, unnecessary overloading of a processor is minimized. We give the essential details of the implementation of the policy and initial results on its performance. Our results confirm the feasibility of building distributed systems that are based on network communication for uniform access, resource sharing and improved reliability, as well as the use of workstations without a secondary storage device.  相似文献   

4.
To solve the load imbalance problem of a solution-adaptive finite element application program on a distributed memory multicomputer, nodes of a refined finite element graph can be remapped to processors or load of a refined finite element graph can be redistributed based on the current load of each processor. For the former case, remapping can be performed by some fast mapping algorithms. For the latter case, a load-balancing algorithm can be applied to balance the computational load of each processor. In this paper, three tree-based parallel load-balancing methods, the MCSTLB method, the BTLB method, and the CBTLB method, were proposed to deal with the load imbalance problems of solution-adaptive finite element application programs. To evaluate the performance of the proposed methods, we have implemented those methods along with three mapping methods, the AE/ORB method, the AE/MC method, and the MLkP method, on an SP2 parallel machine. Three criteria, the execution time of mapping/load-balancing methods, the execution time of a solution-adaptive finite element application program under different mapping/load-balancing methods, and the speedups achieved by mapping/load-balancing methods for a solution-adaptive finite element application program, are used for the performance evaluation. The experimental results show that 1) if the initial mapping is performed by a mapping method and the same mapping method and load-balancing methods were used in each refinement to balance the load of processors, the execution time of an application program under a load-balancing method is always shorter than that of the mapping method, and 2) the execution time of an application program under the CBTLB method is shorter than that of the BTLB method and the MCSTLB method  相似文献   

5.
Multicomputer cache simulation results derived from address traces collected from an Intel iPSC/2 hypercube multicomponent are presented. The primary emphasis is on examining how increasing the number of processor nodes executing a parallel application affects the overall multicomputer cache performance. The effects on multicomputer direct-mapped cache performance of application-specific data partitioning, data access patterns, communication distribution, and communication frequency are illustrated. The effects of system accesses on total cache performance are explored, as well as the reasons for application-specific differences in cache behavior for system and user accesses. Comparing user code results with full user and system code analysis reveals the significant effect of system accesses, and this effect increases with multicomputer size. The time distribution of an application's message-passing operations is found to more strongly affect cache performance than the total amount of time spent in message-passing code  相似文献   

6.
We study the static load balancing problem in a distributed computer system that consists of a set of heterogeneous computer systems interconnected by a star network with two-way traffic. We formulate the static load balancing problem as a nonlinear optimization problem which minimizes the mean response time. We prove that in the optimal solution the satellite nodes in the star network can be divided into four different types: the idle source nodes, the active source nodes, the neutral nodes, and the sink nodes. The necessary and sufficient conditions for optimal solution are studied. An efficient algorithm of complexity O(n) is proposed for the static load balancing of an n-satellite system. The effects of link communication time on optimal load balancing in a star network are also studied by parametric analysis. By employing the proposed algorithm, a significant system performance improvement over that without load balancing is illustrated in a numerical example. The numerical example also shows that the effects of the link communication time in a star network are large.  相似文献   

7.
The growing importance of expert systems in real-time applications reveals the necessity of reducing response times. Since monoprocessor optimizations of production systems have widely been explored, only multiple processor architectures appear to provide further performance gain. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable algorithms for load balancing without simultaneously increasing organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer based architecture with an advanced interconnection structure.  相似文献   

8.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy prossess certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods-the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology-for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log2P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.  相似文献   

9.
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh, and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristics. For each of these architectures, we are able to determine near optimal load balancing schemes. Results obtained from implementation of these schemes in the context of the Tautology Verification problem on the Ncube/2 (a trademark of the Ncube Corporation) multicomputer are used to validate our theoretical results for the hypercube architecture. These results also demonstrate the accuracy and viability of our framework for scalability analysis.  相似文献   

10.
Multicomputer systems achieve high performance by utilizing a number of computing nodes. Recently, by achieving significant reductions in communication delay, the three-dimensional (3D) torus has emerged as a new candidate interconnection topology for message-passing multicomputer systems. In this paper, we propose an efficient processor allocation scheme-scan search scheme-for the 3D torus based on a first-fit approach. The scan search scheme minimizes the average allocation time for an incoming task by effectively manipulating the 3D information on a torus as 2D information using a data structure called the CST (Coverage Status Table). Comprehensive computer simulation reveals that the allocation time of the scan search scheme is always smaller than that of the earlier scheme based on a best-fit approach. The difference gets larger as the input load increases, and it is as much a factor of 3 for high load. To investigate the performance of the proposed scheme in different scheduling environments, we also consider a non-FCFS scheduling policy along with the typical FCFS policy. The allocation time complexity of the scan search scheme is O(LW2H2). This is significantly smaller than that of the existing scheme which is O(L4W4H4). Here, L, W, and H represent the length, width, and height of 3D torus, respectively  相似文献   

11.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree–generation process. This article proposes a method called RBDT‐1—rule‐based decision tree—for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ‐type rule induction programs, or extracted from a tree generated by another method, such as the ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which are methods that create decision trees from rules. RBDT‐1 also compares favorably with ID3 while it is as effective as C4.5 where both (ID3 and C4.5) are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.  相似文献   

12.
软件定义网络(SDN)为网络虚拟化提供了新的解决方案,通过网络虚拟化技术可以将一套基础设施虚拟化为多个逻辑网络从而满足不同的网络需求.本文研究了SDN网络虚拟化时多个物理交换机虚拟为一个大虚拟交换机的过程中,虚拟网络规则与物理网络规则的映射问题.综合考虑链路负载、规则分布以及节点负载,提出了三段式规则映射优化算法.首先根据虚拟网络的规则请求生成组播源节点和目的节点集,采用MPH算法生成规则映射树;然后采用入节点最近原则,将虚拟网络规则请求的指令序列部署到规则映射树中的中间节点和叶子节点中;最后考虑节点负载,对规则部署进行微调,最终生成虚拟规则映射策略.通过仿真实验,与直接边缘节点部署相比,平均降低了网络节点规则总数量40%以上.  相似文献   

13.
A method for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems to reduce communication overhead is presented. In the partitioning phase, the nested loop is divided into blocks that reduce the interblock communication, without regard to the machine topology. The execution ordering of the iterations is defined by a given time function based on L. Lamport's (1974) hyperplane method. The iterations are then partitioned into blocks so that the execution ordering is not disturbed, and the amount of interblock communication is minimized. In the mapping phase, the partitioned blocks are mapped onto a fixed-size multiprocessor system in such a manner that the blocks that have to exchange data frequently are allocated to the same processor or neighboring processors. A heuristic mapping algorithm for hypercube machines is proposed  相似文献   

14.
Diffusion Schemes for Load Balancing on Heterogeneous Networks   总被引:1,自引:0,他引:1  
Several different diffusion schemes have previously been developed for load balancing on homogeneous processor networks. We generalize existing schemes, in order to deal with heterogeneous networks. Generalized schemes may operate efficiently on networks where each processor can have arbitrary computing power, i.e., the load will be balanced proportionally to these powers. The balancing flow that is calculated by schemes for homogeneous networks is minimal with regard to the l 2 -norm and we prove this to hold true for generalized schemes, too. We demonstrate the usability of generalized schemes by a number of experiments on several heterogeneous networks.  相似文献   

15.
针对云渲染系统中渲染节点与任务不匹配调度而带来的时间负载不均衡和耗时长的问题,提出一种基于时间负载均衡的任务调度方式来优化系统耗时的策略.该算法采用Min-min与Max-min相结合的思想,建立时间负载均衡模型进行前期迭代,将迭代结果作为蚁群算法的初始序列,并按照适应度规则计算出相应的初始信息素,同时通过单一变量法确定合理的参数,蚁群算法采用已有的初始资源和参数值进行后期迭代,根据标准量度自定义函数进行高效寻优,进而求得最终的任务调度序列.仿真结果表明,本策略既具有较高的搜索效率和较强的全局寻优能力,又能有效降低任务完成时间,且在时间负载均衡和寻优速度方面均显著优于蚁群算法和蚁群退火算法.  相似文献   

16.
Strategies for dynamic load balancing on highly parallel computers   总被引:5,自引:0,他引:5  
Dynamic load balancing strategies for minimizing the execution time of single applications running in parallel on multicomputer systems are discussed. Dynamic load balancing (DLB) is essential for the efficient use of highly parallel systems when solving non-uniform problems with unpredictable load estimates. With the evolution of more highly parallel systems, centralized DLB approaches which make use of a high degree of knowledge become less feasible due to the load balancing communication overhead. Five DLB strategies are presented which illustrate the tradeoff between 1) knowledge - the accuracy of each balancing decision, and 2) overhead - the amount of added processing and communication incurred by the balancing process. All five strategies have been implemented on an Inter iPSC/2 hypercube  相似文献   

17.
Much prior work in AI on various attempts to speed up rule-based systems by parallel processing has been reported. Unfortunately, many of these results indicate that there is limited parallelism to be found when rules are applied to relatively small amounts of data. Thus, one can predict that much greater parallelism can be extracted when rules are applied to large amounts of data. However, traditional compile-time parallelization strategies as developed for main-memory based systems do not scale to large databases. We propose a scalable strategy for the efficient parallel implementation of rule-based systems operating upon large databases. We concentrate on load balancing techniques in a synchronous model of rule execution, where the variance in runtime of the distributed sites is minimized per cycle of rule processing, thus increasing utilization and speedup. We demonstrate that static load balancing techniques are insufficient, and thus low overhead dynamic load balancing is the key to successful scaling. We present a form of dynamic load balancing that is based upon predicting future system loads, rather than conventional demand-driven approaches that monitor current system state. We analyze a number of possible predictive dynamic load balancing protocols by isoefficiency analysis to guide the design of a parallel database rule processing system.  相似文献   

18.
In this paper, we study parallel branch and bound on fine grained hypercube multiprocessors. Each processor in a fine grained system has only a very small amount of memory available. Therefore, current parallel branch and bound methods for coarse grained systems ( 1000 nodes) cannot be applied, since all these methods assume that every processor stores the path from the node it is currently processing back to the node where the process was created (the back-up path). Furthermore, the much larger number of processors available in a fine grained system makes it imperative that global information (e.g. the current best solution) is continuously available at every processor; otherwise the amount of unnecessary search would become intolerable. We describe an efficient branch-and-bound algorithm for fine grained hypercube multiprocessors. Our method uses a global scheme where all processors collectively store all back-up paths such that each processor needs to store only a constant amount of information. At each iteration of the algorithm, all current nodes may decide whether they need to create new children, be pruned, or remain unchanged. We describe an algorithm that, based on these decisions, updates the current back-up paths and distributes global information in O(log m) steps, where m is the current number of nodes. This method also includes dynamic allocation of search processes to processors and provides optimal load balancing. Even if very drastic changes in the set of current nodes occur, our load balancing mechanism does not suffer any slow down.  相似文献   

19.
Dynamic load balancing schemes are significant for efficiently executing nonuniform problems in highly parallel multicomputer systems.The objective is to minimize the total exectuion time of single applications.This paper has proposed an ARID strategy for distributed dynamic load balancing.Its principle and control protocol are described,and te communication overhead,the effect on system stability and the performance efficiency are analyzed.Finally,simulation experiments are carried out to compare the adaptive strategy with other dynamic load balancing schemes.  相似文献   

20.
A parallel pipeline is shown to be a natural method of speeding up a typical computer vision application, face inspection using `eigenfaces'. Faces within a stream of video images are continuously surveyed in a manner akin to a `conveyor belt' inspection process. The parallelisation is a new exemplar of a scheme for the rapid prototyping of large-scale, multi-algorithm applications suitable for transfer to a message-passing multicomputer. A general solution, pipelined processor farms, is preferred to a customised solution. This paper gives details of the software tools and software engineering methods employed to tackle this class of problem.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号