首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
The two-way stripes partition mapping and the greedy assignment mapping are proposed to map finite element graphs composed of a number of rectilinear four-node elements on hypercubes. The two-way stripes partition mapping is a two-phase mapping approach. In the first phase a two-way stripes partition heuristic is used to lower the communication cost. In the second phase the load transfer heuristic is used to balance the computational load among processors. The greedy assignment mapping tries to minimize the communication cost and balance the computational load of processors simultaneously. Our simulation results show that the speedups for the two-way stripes partition mapping are better than those for the greedy assignment mapping when the load balancing criterion is achieved in both approaches (that is, the number of nodes in each processor is at most one more than the number of nodes in any other processor). However, the greedy approach performs well at a much lower cost.The work of this author was supported in part by NSF under contract CCR-9110812.  相似文献   

2.
Amnon Barak  Amnon Shiloh 《Software》1985,15(9):901-913
This paper deals with the organization of a distributed load-balancing policy for a multicomputer system which consists of a cluster of independent computers that are interconnected by a local area communication network. We introduce three algorithms necessary to maintain load balancing in this system: the local load algorithm, used by each processor to monitor its own load; the exchange algorithm, for exchanging load information between the processors, and the process migration algorithm that uses this information to dynamically migrate processes from overloaded to underloaded processors. The policy that we present is distributed, i.e. each processor uses the same policy. It is both dynamic, responding to load changes without using an a priori knowledge of the resources that each process requires; and stable, unnecessary overloading of a processor is minimized. We give the essential details of the implementation of the policy and initial results on its performance. Our results confirm the feasibility of building distributed systems that are based on network communication for uniform access, resource sharing and improved reliability, as well as the use of workstations without a secondary storage device.  相似文献   

3.
To solve the load imbalance problem of a solution-adaptive finite element application program on a distributed memory multicomputer, nodes of a refined finite element graph can be remapped to processors or load of a refined finite element graph can be redistributed based on the current load of each processor. For the former case, remapping can be performed by some fast mapping algorithms. For the latter case, a load-balancing algorithm can be applied to balance the computational load of each processor. In this paper, three tree-based parallel load-balancing methods, the MCSTLB method, the BTLB method, and the CBTLB method, were proposed to deal with the load imbalance problems of solution-adaptive finite element application programs. To evaluate the performance of the proposed methods, we have implemented those methods along with three mapping methods, the AE/ORB method, the AE/MC method, and the MLkP method, on an SP2 parallel machine. Three criteria, the execution time of mapping/load-balancing methods, the execution time of a solution-adaptive finite element application program under different mapping/load-balancing methods, and the speedups achieved by mapping/load-balancing methods for a solution-adaptive finite element application program, are used for the performance evaluation. The experimental results show that 1) if the initial mapping is performed by a mapping method and the same mapping method and load-balancing methods were used in each refinement to balance the load of processors, the execution time of an application program under a load-balancing method is always shorter than that of the mapping method, and 2) the execution time of an application program under the CBTLB method is shorter than that of the BTLB method and the MCSTLB method  相似文献   

4.
Multicomputer cache simulation results derived from address traces collected from an Intel iPSC/2 hypercube multicomponent are presented. The primary emphasis is on examining how increasing the number of processor nodes executing a parallel application affects the overall multicomputer cache performance. The effects on multicomputer direct-mapped cache performance of application-specific data partitioning, data access patterns, communication distribution, and communication frequency are illustrated. The effects of system accesses on total cache performance are explored, as well as the reasons for application-specific differences in cache behavior for system and user accesses. Comparing user code results with full user and system code analysis reveals the significant effect of system accesses, and this effect increases with multicomputer size. The time distribution of an application's message-passing operations is found to more strongly affect cache performance than the total amount of time spent in message-passing code  相似文献   

5.
The growing importance of expert systems in real-time applications reveals the necessity of reducing response times. Since monoprocessor optimizations of production systems have widely been explored, only multiple processor architectures appear to provide further performance gain. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable algorithms for load balancing without simultaneously increasing organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer based architecture with an advanced interconnection structure.  相似文献   

6.
We study the static load balancing problem in a distributed computer system that consists of a set of heterogeneous computer systems interconnected by a star network with two-way traffic. We formulate the static load balancing problem as a nonlinear optimization problem which minimizes the mean response time. We prove that in the optimal solution the satellite nodes in the star network can be divided into four different types: the idle source nodes, the active source nodes, the neutral nodes, and the sink nodes. The necessary and sufficient conditions for optimal solution are studied. An efficient algorithm of complexity O(n) is proposed for the static load balancing of an n-satellite system. The effects of link communication time on optimal load balancing in a star network are also studied by parametric analysis. By employing the proposed algorithm, a significant system performance improvement over that without load balancing is illustrated in a numerical example. The numerical example also shows that the effects of the link communication time in a star network are large.  相似文献   

7.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy prossess certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods-the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology-for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log2P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.  相似文献   

8.
Multicomputer systems achieve high performance by utilizing a number of computing nodes. Recently, by achieving significant reductions in communication delay, the three-dimensional (3D) torus has emerged as a new candidate interconnection topology for message-passing multicomputer systems. In this paper, we propose an efficient processor allocation scheme-scan search scheme-for the 3D torus based on a first-fit approach. The scan search scheme minimizes the average allocation time for an incoming task by effectively manipulating the 3D information on a torus as 2D information using a data structure called the CST (Coverage Status Table). Comprehensive computer simulation reveals that the allocation time of the scan search scheme is always smaller than that of the earlier scheme based on a best-fit approach. The difference gets larger as the input load increases, and it is as much a factor of 3 for high load. To investigate the performance of the proposed scheme in different scheduling environments, we also consider a non-FCFS scheduling policy along with the typical FCFS policy. The allocation time complexity of the scan search scheme is O(LW2H2). This is significantly smaller than that of the existing scheme which is O(L4W4H4). Here, L, W, and H represent the length, width, and height of 3D torus, respectively  相似文献   

9.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree–generation process. This article proposes a method called RBDT‐1—rule‐based decision tree—for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ‐type rule induction programs, or extracted from a tree generated by another method, such as the ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which are methods that create decision trees from rules. RBDT‐1 also compares favorably with ID3 while it is as effective as C4.5 where both (ID3 and C4.5) are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.  相似文献   

10.
软件定义网络(SDN)为网络虚拟化提供了新的解决方案,通过网络虚拟化技术可以将一套基础设施虚拟化为多个逻辑网络从而满足不同的网络需求.本文研究了SDN网络虚拟化时多个物理交换机虚拟为一个大虚拟交换机的过程中,虚拟网络规则与物理网络规则的映射问题.综合考虑链路负载、规则分布以及节点负载,提出了三段式规则映射优化算法.首先根据虚拟网络的规则请求生成组播源节点和目的节点集,采用MPH算法生成规则映射树;然后采用入节点最近原则,将虚拟网络规则请求的指令序列部署到规则映射树中的中间节点和叶子节点中;最后考虑节点负载,对规则部署进行微调,最终生成虚拟规则映射策略.通过仿真实验,与直接边缘节点部署相比,平均降低了网络节点规则总数量40%以上.  相似文献   

11.
A method for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems to reduce communication overhead is presented. In the partitioning phase, the nested loop is divided into blocks that reduce the interblock communication, without regard to the machine topology. The execution ordering of the iterations is defined by a given time function based on L. Lamport's (1974) hyperplane method. The iterations are then partitioned into blocks so that the execution ordering is not disturbed, and the amount of interblock communication is minimized. In the mapping phase, the partitioned blocks are mapped onto a fixed-size multiprocessor system in such a manner that the blocks that have to exchange data frequently are allocated to the same processor or neighboring processors. A heuristic mapping algorithm for hypercube machines is proposed  相似文献   

12.
Diffusion Schemes for Load Balancing on Heterogeneous Networks   总被引:1,自引:0,他引:1  
Several different diffusion schemes have previously been developed for load balancing on homogeneous processor networks. We generalize existing schemes, in order to deal with heterogeneous networks. Generalized schemes may operate efficiently on networks where each processor can have arbitrary computing power, i.e., the load will be balanced proportionally to these powers. The balancing flow that is calculated by schemes for homogeneous networks is minimal with regard to the l 2 -norm and we prove this to hold true for generalized schemes, too. We demonstrate the usability of generalized schemes by a number of experiments on several heterogeneous networks.  相似文献   

13.
针对云渲染系统中渲染节点与任务不匹配调度而带来的时间负载不均衡和耗时长的问题,提出一种基于时间负载均衡的任务调度方式来优化系统耗时的策略.该算法采用Min-min与Max-min相结合的思想,建立时间负载均衡模型进行前期迭代,将迭代结果作为蚁群算法的初始序列,并按照适应度规则计算出相应的初始信息素,同时通过单一变量法确定合理的参数,蚁群算法采用已有的初始资源和参数值进行后期迭代,根据标准量度自定义函数进行高效寻优,进而求得最终的任务调度序列.仿真结果表明,本策略既具有较高的搜索效率和较强的全局寻优能力,又能有效降低任务完成时间,且在时间负载均衡和寻优速度方面均显著优于蚁群算法和蚁群退火算法.  相似文献   

14.
Strategies for dynamic load balancing on highly parallel computers   总被引:5,自引:0,他引:5  
Dynamic load balancing strategies for minimizing the execution time of single applications running in parallel on multicomputer systems are discussed. Dynamic load balancing (DLB) is essential for the efficient use of highly parallel systems when solving non-uniform problems with unpredictable load estimates. With the evolution of more highly parallel systems, centralized DLB approaches which make use of a high degree of knowledge become less feasible due to the load balancing communication overhead. Five DLB strategies are presented which illustrate the tradeoff between 1) knowledge - the accuracy of each balancing decision, and 2) overhead - the amount of added processing and communication incurred by the balancing process. All five strategies have been implemented on an Inter iPSC/2 hypercube  相似文献   

15.
关健  刘大昕 《计算机工程》2004,30(5):129-130,149
为了解决基于专家系统的入侵检测系统匹配速度慢,不能适应网络高带宽要求的问题,提出了一种图形化的模型,采用分类树的方法构建规则分析机模型,根据属性在攻击描述的作用,决定节点的选择顺序,并且在搜索过程中采用树的遍历算法代替产生式规则的字符串比较方法,从而有效减少误用检测系统的属性匹配时间,满足了实时性要求。  相似文献   

16.
In this paper, we study parallel branch and bound on fine grained hypercube multiprocessors. Each processor in a fine grained system has only a very small amount of memory available. Therefore, current parallel branch and bound methods for coarse grained systems ( 1000 nodes) cannot be applied, since all these methods assume that every processor stores the path from the node it is currently processing back to the node where the process was created (the back-up path). Furthermore, the much larger number of processors available in a fine grained system makes it imperative that global information (e.g. the current best solution) is continuously available at every processor; otherwise the amount of unnecessary search would become intolerable. We describe an efficient branch-and-bound algorithm for fine grained hypercube multiprocessors. Our method uses a global scheme where all processors collectively store all back-up paths such that each processor needs to store only a constant amount of information. At each iteration of the algorithm, all current nodes may decide whether they need to create new children, be pruned, or remain unchanged. We describe an algorithm that, based on these decisions, updates the current back-up paths and distributes global information in O(log m) steps, where m is the current number of nodes. This method also includes dynamic allocation of search processes to processors and provides optimal load balancing. Even if very drastic changes in the set of current nodes occur, our load balancing mechanism does not suffer any slow down.  相似文献   

17.
A parallel pipeline is shown to be a natural method of speeding up a typical computer vision application, face inspection using `eigenfaces'. Faces within a stream of video images are continuously surveyed in a manner akin to a `conveyor belt' inspection process. The parallelisation is a new exemplar of a scheme for the rapid prototyping of large-scale, multi-algorithm applications suitable for transfer to a message-passing multicomputer. A general solution, pipelined processor farms, is preferred to a customised solution. This paper gives details of the software tools and software engineering methods employed to tackle this class of problem.  相似文献   

18.
Dynamic load balancing schemes are significant for efficiently executing nonuniform problems in highly parallel multicomputer systems.The objective is to minimize the total exectuion time of single applications.This paper has proposed an ARID strategy for distributed dynamic load balancing.Its principle and control protocol are described,and te communication overhead,the effect on system stability and the performance efficiency are analyzed.Finally,simulation experiments are carried out to compare the adaptive strategy with other dynamic load balancing schemes.  相似文献   

19.
Load balancing a distributed/parallel system consists in allocating work (load) to its processors so that they have to process approximately the same amount of work or amounts in relation with their computation power. In this paper, we present a new distributed algorithm that implements the Most to Least Loaded (M2LL) policy. This policy aims at indicating pairs of processors, that will exchange loads, taking into account actually broken edges as well as the current load distribution in the system. The M2LL policy fixes the pairs of neighboring processors by selecting in priority the most loaded and the least loaded processor of each neighborhood. Our first and main result is that the M2LL distributed implementation terminates after at most (n/2)⋅d t iterations where n and d t are respectively the number of nodes and the degree of the system at time t. We then present a performance comparison between Generalized Adaptive Exchange (GAE) that uses M2LL and Relaxed First Order Scheme (RFOS), two load balancing algorithms for dynamic networks in which only link failures are considered. The comparison is carried out on a dedicated test bed that we have designed and implemented to this end. Our second important result is that although generating more communications, the GAE algorithm with the M2LL policy is faster than RFOS in balancing the system load. In addition, GAE M2LL is able to achieve a more stable balanced state than RFOS and scales well.  相似文献   

20.
针对大型ERP系统的网络资源负载均衡问题展开研究。在理论上对问题的需求进行建模分析,设计了满足主机和网络性能约束的启发式目标函数,将模型转化为度约束最小生成树问题,设计了一种模拟退火算法对此问题进行处理,提出了基于该算法的资源负载均衡方案LABS。理论分析表明,整个网络执行该方案的时间复杂度为节点规模的平方阶,说明方案具有较强的可用性与可伸缩性。仿真实验结果显示,通过选择适当的启发因子,算法不仅可以吸纳大部分节点协同参与负载均衡操作,还能够显著减少系统中的瓶颈节点数,降低平均资源使用率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号