Similar Documents
 20 similar documents retrieved.
1.
The objective of this study was to achieve a balanced load among processors, reduce the communication overhead of the load-balancing algorithm, and improve resource utilization, which results in better average response time. A communication protocol and a fully distributed algorithm for dynamic load balancing through task migration in a connected N-processor network are presented. Each processor communicates its load directly with only a subset (of size √N) of the processors, reducing communication traffic and average response time. It is proved that the given algorithm will perform task migration even if there is only one lightly loaded processor and one heavily loaded processor in the system. Simulation results show that the proposed scheme can save up to 60% of the protocol messages used by broadcast algorithms and can reduce the average response time.
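The abstract does not give the protocol details, but the core idea — each processor exchanging load information with only about √N peers and pushing work from heavy to light nodes — can be illustrated with a small sketch. The thresholds, peer selection, and single-unit migration below are assumptions for illustration, not the paper's protocol.

```python
import math
import random

# Minimal sketch (not the paper's protocol): each processor reports its load
# only to a subset of about sqrt(N) peers, and a heavily loaded processor
# migrates one unit of work to the lightest peer it knows about.
N = 16
HEAVY, LIGHT = 8, 2                      # illustrative thresholds (assumed)
loads = [random.randint(0, 10) for _ in range(N)]

subset_size = int(math.isqrt(N))         # each node tracks ~sqrt(N) peers
peers = {p: random.sample([q for q in range(N) if q != p], subset_size)
         for p in range(N)}

def balance_step(loads, peers):
    """One round: every heavy node pushes one unit of load to its lightest known peer."""
    for p in range(N):
        if loads[p] >= HEAVY:
            target = min(peers[p], key=lambda q: loads[q])
            if loads[target] <= LIGHT:   # migrate only toward a lightly loaded peer
                loads[p] -= 1
                loads[target] += 1
    return loads

print("before:", loads)
print("after :", balance_step(loads, peers))
```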

2.
Load balancing a distributed/parallel system consists in allocating work (load) to its processors so that each has to process approximately the same amount of work, or an amount proportional to its computation power. In this paper, we present a new distributed algorithm that implements the Most to Least Loaded (M2LL) policy. This policy indicates pairs of processors that will exchange loads, taking into account currently broken edges as well as the current load distribution in the system. The M2LL policy fixes the pairs of neighboring processors by selecting, with priority, the most loaded and the least loaded processor of each neighborhood. Our first and main result is that the distributed M2LL implementation terminates after at most (n/2)·d_t iterations, where n and d_t are, respectively, the number of nodes and the degree of the system at time t. We then present a performance comparison between Generalized Adaptive Exchange (GAE) using M2LL and the Relaxed First Order Scheme (RFOS), two load balancing algorithms for dynamic networks in which only link failures are considered. The comparison is carried out on a dedicated test bed that we designed and implemented for this purpose. Our second important result is that, although it generates more communication, the GAE algorithm with the M2LL policy balances the system load faster than RFOS. In addition, GAE M2LL achieves a more stable balanced state than RFOS and scales well.
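A rough sketch of the M2LL pairing idea follows; it is not the authors' distributed implementation, only a centralized illustration of pairing the most loaded processor of a neighborhood with its least loaded free neighbor over edges that are not broken. The data layout and tie-breaking are assumptions.

```python
# Rough sketch of the M2LL pairing idea (not the authors' distributed protocol):
# over the edges that are currently alive, visit processors from most to least
# loaded and pair each one with the least loaded free processor in its neighborhood.
def m2ll_pairs(loads, alive_edges):
    """loads: dict node -> load; alive_edges: set of frozenset({u, v}) that are not broken."""
    neighbors = {u: set() for u in loads}
    for e in alive_edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)

    paired, pairs = set(), []
    for u in sorted(loads, key=loads.get, reverse=True):   # most loaded first
        if u in paired:
            continue
        candidates = [v for v in neighbors[u] if v not in paired]
        if not candidates:
            continue
        v = min(candidates, key=loads.get)                  # least loaded free neighbor
        if loads[v] < loads[u]:                             # only pair if it reduces imbalance
            pairs.append((u, v))
            paired.update({u, v})
    return pairs

loads = {0: 9, 1: 1, 2: 5, 3: 2}
alive = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
print(m2ll_pairs(loads, alive))   # [(0, 1), (2, 3)]
```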

3.
A recruitment algorithm for dynamic load balancing with support for distributed process migration
Load balancing is an issue that every distributed system must address. The recruitment algorithm introduced in this paper is independent of the network topology, so its ideas can be applied to distributed systems in general. Its design challenges traditional load-balancing algorithms: it not only overcomes the drawbacks of bidding algorithms, but also works to reduce communication overhead and improve processor utilization, making it an efficient strategy for distributed process migration and dynamic load balancing. We implemented the recruitment algorithm on a distributed UNIX system and verified its efficiency.

4.
Effective load distribution and resource management are of great importance in designing complex distributed systems such as grids. This presupposes the ability to partition arriving jobs into independent tasks that can be executed simultaneously, to assign the tasks to processors, and to schedule task execution on each processor. A simulation model consisting of two homogeneous clusters is considered to evaluate performance under various workloads. The Deferred policy is applied to collect global system information about processor queues. This paper proposes a special scheduling method referred to as the task clustering method. We examine the efficiency of two task routing policies – one static and one adaptive – and six task scheduling policies, which rearrange processor queues according to a criterion. Our simulation results indicate that the adaptive task routing policy in conjunction with the SGFS-ST scheduling algorithm, which makes the most efficient use of the task clustering method, leads to a significant performance improvement.

5.
We present a simple and efficient mutual exclusion algorithm whose optimal message passing complexity is O(N), where N is the number of processors in the network. The message complexity is measured by counting the number of communication hops in a network for a given topology. This algorithm reduces its message passing complexity by a token-chasing method, and enhances its effectiveness by dynamically adjusting state information stored in each processor. Moreover, this algorithm shortens the request delay by fully taking advantage of the network's dynamic status information. The performance of the algorithm is also modeled for analytical evaluation. We have conducted a group of experiments on a network of workstations to compare our algorithm with two other existing mutual exclusion algorithms. The experimental results show the effectiveness of our algorithm, especially when a large number of requests access the critical region in a distributed system. Finally, the token-chasing algorithm is further enhanced for fault tolerance under message loss and link crash conditions.
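For readers unfamiliar with token-based mutual exclusion, the toy simulation below shows the basic pattern of requests being queued and a token being forwarded to the next requester. It is a generic, heavily simplified illustration, not the paper's token-chasing algorithm with dynamically adjusted per-processor state.

```python
from collections import deque

# A toy, centralized simulation of token-based mutual exclusion (illustrative
# only; the paper's token-chasing algorithm is considerably more involved).
class TokenMutex:
    def __init__(self):
        self.holder = 0          # processor currently holding the token
        self.requests = deque()  # pending requests (FIFO for simplicity)
        self.messages = 0        # rough count of protocol messages

    def request(self, pid):
        self.messages += 1       # request message sent toward the token holder
        self.requests.append(pid)

    def release(self):
        """The current holder leaves the critical section and forwards the token."""
        if self.requests:
            self.holder = self.requests.popleft()
            self.messages += 1   # token-transfer message
        return self.holder

mutex = TokenMutex()
for pid in (2, 1, 3):
    mutex.request(pid)
for _ in range(3):
    print("token now at processor", mutex.release())
print("messages exchanged:", mutex.messages)
```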

6.
This paper presents a theoretical analysis of the Load Balancing Problem (LBP) in a network of processing units. The performance objective is to minimize the makespan, i.e., the time needed to finish all jobs in a network of processing units. Because of the communication delay imposed by the network topology, no strategy can obtain the exact optimum under all load distributions. Instead, we measure the information efficiency of a load balancing policy by the worst-case ratio of its solution (for each load distribution) to the optimal solution (for the same load distribution), assuming that processors have complete information about the load distribution over the network. This ratio is called the competitive ratio of the strategy [17, 24, 34]. In particular, a policy is called competitive if this ratio is bounded by a constant. As a first step, we discuss the centralized LBP, where all processors have complete information about the load distribution over the network. Its solution serves as a benchmark against which to compare realistic strategies, both in theoretical analysis and in experimental and simulation studies of distributed algorithms. We show that when jobs have different sizes, even with preemptive scheduling, LBP is NP-complete. When the jobs are of the same size, we give a polynomial algorithm, using network-flow techniques, which extends to approximate solutions for jobs of different sizes. We apply this benchmark solution to analyze the competitiveness of three network topologies: completely connected graphs, rings, and hierarchical complete k-ary trees. The constant competitive ratio results for complete networks and hierarchical complete k-ary trees are applied to a study of network designs suitable for the LBP. We further discuss the problem for general networks with jobs of different sizes, obtaining slightly weaker results than a constant competitive ratio. Finally, we comment on the related issues of job partitioning over parallel/distributed systems.
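The competitive-ratio notion the abstract relies on can be written out explicitly; this is the standard formulation, with the symbols c_A and L chosen here for illustration:

```latex
% Standard competitive-ratio definition matching the description above:
% the worst-case ratio, over all load distributions L, of the makespan
% achieved by policy A to the optimal makespan with full information.
\[
  c_A \;=\; \sup_{L}\,
  \frac{\mathrm{makespan}_{A}(L)}{\mathrm{makespan}_{\mathrm{OPT}}(L)},
  \qquad
  A \text{ is competitive} \iff c_A \le c \text{ for some constant } c.
\]
```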

7.
Consider n² processors arranged in an n × n torus network in which each processor is connected by direct communication channels with its four neighbours. This paper studies the following verification problem on anonymous n × n torus networks: verify whether the network is oriented; that is, verify whether there is an agreement, among all processors, on a consistent channel labelling. The problem is to be solved by a distributed algorithm executed by the processors themselves. If processors can label their channels arbitrarily, then there are network labellings that are not oriented but, to the processors, are indistinguishable from ones that are oriented. Hence there is no deterministic distributed verification algorithm. However, a verification algorithm does exist if the initial labellings are suitably restricted. We describe the restrictions placed on the initial labellings by subsets of the permutation group S4. We show that the existence of an algorithm for verification is equivalent to the existence of certain tilings of the torus with Wang tiles. Using this equivalence, we have determined the existence of a distributed algorithm for the verification problem for all n × n torus networks for an important class of restrictions, the subgroups of S4.

8.
The prevalence of dynamic-content web services, exemplified by search and online social networking, has motivated an increasingly wide web-facing front end. Horizontal scaling in the Cloud is favored for its elasticity, and a distributed design of load balancers is highly desirable. Existing algorithms with a centralized design, such as Join-the-Shortest-Queue (JSQ), incur high communication overhead for distributed dispatchers. We propose a novel class of algorithms called Join-Idle-Queue (JIQ) for distributed load balancing in large systems. Unlike algorithms such as Power-of-Two, the JIQ algorithm incurs no communication overhead between the dispatchers and processors at job arrivals. We analyze the JIQ algorithm in the large-system limit and find that it effectively results in a reduced system load, which produces a 30-fold reduction in queueing overhead compared to Power-of-Two at medium to high load. An extension of the basic JIQ algorithm handles very high loads using only local information about server load.
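The JIQ idea described above can be sketched in a few lines: dispatchers keep "I-queues" of processors that have reported themselves idle, so no dispatcher-processor messages are needed when a job arrives. The queue-length bookkeeping and random fallback below are simplifying assumptions for illustration.

```python
import random
from collections import deque

# Minimal sketch of the Join-Idle-Queue (JIQ) idea: each dispatcher keeps an
# "I-queue" of processors that reported themselves idle. An arriving job is
# sent to a known-idle processor when one is available, otherwise to a random
# processor, so no dispatcher-processor messages are exchanged at job arrival.
NUM_SERVERS, NUM_DISPATCHERS = 8, 2
server_queue_len = [0] * NUM_SERVERS
i_queues = [deque() for _ in range(NUM_DISPATCHERS)]

for s in range(NUM_SERVERS):                     # initially every server is idle
    i_queues[s % NUM_DISPATCHERS].append(s)

def dispatch(d):
    """Dispatcher d routes one incoming job."""
    if i_queues[d]:
        s = i_queues[d].popleft()                # known-idle server: no extra messages
    else:
        s = random.randrange(NUM_SERVERS)        # fall back to a random server
    server_queue_len[s] += 1
    return s

def job_finished(s):
    """Server s completes a job; once idle, it reports to one dispatcher."""
    server_queue_len[s] -= 1
    if server_queue_len[s] == 0:
        i_queues[random.randrange(NUM_DISPATCHERS)].append(s)

for _ in range(12):
    dispatch(random.randrange(NUM_DISPATCHERS))
print("queue lengths after 12 arrivals:", server_queue_len)
```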

9.
Decomposition of knowledge for concurrent processing
In some environments, it is more difficult for distributed systems to cooperate. In fact, some distributed systems are highly heterogeneous and might not readily cooperate. To alleviate these problems, we have developed an environment that preserves the autonomy of the local systems while enabling distributed processing. This is achieved by modeling the different application systems into a central knowledge base (called a Metadatabase), providing each application system with a local knowledge processor, and distributing the knowledge within these local shells. This paper describes the knowledge decomposition process used for this distribution. The decomposition process is used to minimize the needed cooperation among the local knowledge processors, and is accomplished by "serializing" the rule execution process. A rule is decomposed into an ordered set of subrules, each of which is executed in sequence and located in a specific local knowledge processor. The goals of the decomposition algorithm are to minimize the number of subrules produced, thereby reducing the time spent in communication, and to ensure that the sequential execution of the subrules is "equivalent" to the execution of the original rule.

10.
In distributed systems, an application program is divided into several software modules, which need to be allocated to processors connected by communication links. The distributed system reliability (DSR) can be defined as the probability of successfully completing the distributed program. Previous studies of optimal task allocation with respect to DSR focused on the effects of the interconnectivity of processors, the failure rates of the processors, and the failure rates of the communication links. We are the first to study the effects of module software reliabilities and module execution frequencies on optimal task allocation. By viewing each module as a state in a Markov process, we build a task allocation decision model to maximize DSR for distributed systems with a 100% reliable network. In this model, the DSR is derived from the module software reliabilities, the processor hardware reliabilities, the transition probabilities between modules, and the task allocation matrix. Resource constraints on each processor, namely limits on memory space and computation load, are considered. A constraint on total system cost, including the execution cost, the communication cost, and the failure cost, is also considered. We solve the problem by Constraint Programming using the ILOG SOLVER library. We then apply the proposed model to a case extended from previous studies. Finally, a sensitivity analysis is performed to verify the effects of module software reliabilities and processor hardware reliabilities on the DSR and on the task allocation decision.

11.
A load processor is a system with a buffer that can receive load and store it while it waits to be processed, together with a local decision-making policy for determining whether portions of its load should be sent to other load processors. A load balancing system is a set of such load processors connected in a network so that (i) they can sense the amount of load in the buffers of neighbouring processors and pass load to them, and (ii) via local information and decisions by the individual load processors, the overall load in the entire network can be balanced. Such balancing is important to ensure that certain processors are not overloaded while others are left idle (i.e. load balancing helps avoid underutilization of processing resources). The topology of the network, delays in transporting and sensing load, types of load, and types of local load passing policies all affect the performance of the load balancing system. In this paper, we show how a variety of load balancing systems can be modelled in a discrete event system (DES) theoretic framework, and how balancing properties and performance can be characterized and analysed in a general Lyapunov stability theoretic framework.

12.
The focus of this study is how to efficiently implement the neural network backpropagation algorithm on a network of computers (NOC) for concurrent execution. We assume a distributed system with heterogeneous computers and that the neural network is replicated on each computer. We propose an architecture model with efficient pattern allocation that takes into account the speed of processors and overlaps communication with computation. The training pattern set is distributed among the heterogeneous processors, with the mapping being fixed during the learning process. We provide a heuristic pattern allocation algorithm that minimizes the execution time of backpropagation learning while overlapping computation with communication. Under the condition that each processor has to perform a task directly proportional to its speed, this allocation algorithm has polynomial-time complexity. We have implemented our model on a dedicated network of heterogeneous computers, using Sejnowski's NetTalk benchmark for testing.
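The speed-proportional allocation idea can be illustrated with a short sketch; this is a generic proportional split with the remainder given to the fastest processors, not the paper's exact heuristic.

```python
# Sketch of speed-proportional pattern allocation (the general idea described
# above, not the paper's heuristic): each processor receives a share of the
# training patterns proportional to its relative speed, and the mapping stays
# fixed during learning.
def allocate_patterns(num_patterns, speeds):
    """speeds: list of relative processor speeds; returns patterns per processor."""
    total = sum(speeds)
    shares = [int(num_patterns * s / total) for s in speeds]
    # hand out any remainder, one pattern at a time, to the fastest processors
    leftover = num_patterns - sum(shares)
    for i in sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)[:leftover]:
        shares[i] += 1
    return shares

print(allocate_patterns(1000, [1.0, 2.0, 4.0, 1.0]))   # [125, 250, 500, 125]
```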

13.
Two techniques for performing an irregularly structured Gröbner basis computation (a basic method used in symbolic polynomial manipulation) on distributed memory machines are developed. The first technique, based on relaxation of dependencies present in the sequential computation, exploits coarse-grain parallelism. In this so-called relaxation approach, at every step, each processor reduces a local pair if one is available, exchanges the result and status information with other processors, and updates its local set of pairs and basis. The basis is replicated on each processor while the set of pairs is distributed across the processors. The computation terminates when no pairs are left to be reduced on any processor. A relaxation algorithm based on this approach, along with its experimental results, is provided. The other technique, named quasi-barrier, is developed to enhance the performance of the relaxation algorithm. Using this technique, load balance and performance can be improved by synchronizing the p processors when a fraction of the active tasks has completed. The performance enhancement is significant for large numbers of processors when the distribution of pair reduction times is close to exponential. The experimental results obtained on the Intel Paragon and IBM SP2 demonstrate the effectiveness of these techniques.

14.
In this paper, we study parallel branch and bound on fine-grained hypercube multiprocessors. Each processor in a fine-grained system has only a very small amount of memory available. Therefore, current parallel branch and bound methods for coarse-grained systems ( 1000 nodes) cannot be applied, since all these methods assume that every processor stores the path from the node it is currently processing back to the node where the process was created (the back-up path). Furthermore, the much larger number of processors available in a fine-grained system makes it imperative that global information (e.g. the current best solution) be continuously available at every processor; otherwise the amount of unnecessary search would become intolerable. We describe an efficient branch-and-bound algorithm for fine-grained hypercube multiprocessors. Our method uses a global scheme in which all processors collectively store all back-up paths, so that each processor needs to store only a constant amount of information. At each iteration of the algorithm, all current nodes may decide whether they need to create new children, be pruned, or remain unchanged. We describe an algorithm that, based on these decisions, updates the current back-up paths and distributes global information in O(log m) steps, where m is the current number of nodes. This method also includes dynamic allocation of search processes to processors and provides optimal load balancing. Even if very drastic changes in the set of current nodes occur, our load balancing mechanism does not suffer any slowdown.

15.
Diffusion algorithms are among the most popular algorithms for dynamic load balancing, in which loads move from heavily loaded processors to lightly loaded neighbor processors. To achieve a global load balance in a parallel computer, the algorithm is iterated until the load difference between any two processors is smaller than a specified value. Therefore, one fundamental property to be studied is algorithm convergence. Several analytical works on the convergence of different diffusion load balancing algorithms have been carried out, but they treat loads as non-negative real quantities. In this paper, we describe the Diffusion Algorithm Searching Unbalanced Domains (DASUD) algorithm, which treats loads as non-negative integer values and, unlike existing algorithms, reaches a local balance situation where the maximum load difference between any two processors in the set of neighbor processors of each processor is one load unit. The convergence property of an asynchronous implementation of DASUD using integer loads is proven theoretically.
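For context, a plain diffusion step with integer loads looks like the sketch below; DASUD builds on this setting by adding extra moves that resolve the unbalanced domains that simple diffusion, with integer truncation, can leave behind. The diffusion parameter and topology here are illustrative assumptions.

```python
# A generic synchronous diffusion step with integer loads (illustrative of the
# setting DASUD operates in; DASUD itself adds further moves to fix unbalanced
# domains that plain diffusion leaves behind).
def diffusion_step(loads, neighbors, alpha=0.25):
    """One synchronous diffusion round; loads are non-negative integers."""
    flows = [0] * len(loads)
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if loads[u] > loads[v]:
                # send an integer portion of the difference toward the lighter neighbor
                amount = int(alpha * (loads[u] - loads[v]))
                flows[u] -= amount
                flows[v] += amount
    return [l + f for l, f in zip(loads, flows)]

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}     # a small chain of processors
loads = [12, 0, 3, 1]
for _ in range(5):
    loads = diffusion_step(loads, neighbors)
print(loads)
```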

16.
The problem of distributing tasks to processors in a distributed computing system is addressed. A task should be assigned to a processor whose capabilities are most appropriate for the execution of that task, while excessive interprocessor communication is avoided. A simple algorithm for task allocation is presented. The execution costs and communication costs of the tasks are represented by arrays. A task is either assigned to a processor or fused with another task using a simple criterion. The execution and communication costs are then modified suitably. The process continues until all tasks are assigned to processors. This algorithm also facilitates the incorporation of various system constraints. It is applicable to random program structures and to a system containing any number of processors.
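A hedged sketch of the assign-or-fuse idea is shown below, using made-up cost tables and a deliberately simplified criterion (fuse with the most heavily communicating partner only if the avoided communication cost outweighs the extra execution cost); the paper's actual criterion and cost updates are not reproduced here.

```python
# Much simplified assign-or-fuse sketch with invented cost tables: a task is
# co-located with the task it communicates with most heavily whenever that
# saves more communication cost than it adds in execution cost; otherwise it
# simply goes to its cheapest processor.
exec_cost = {"t1": [4, 9], "t2": [8, 3], "t3": [6, 7]}      # exec_cost[task][processor]
comm_cost = {frozenset({"t1", "t2"}): 5,
             frozenset({"t1", "t3"}): 1,
             frozenset({"t2", "t3"}): 2}

assignment = {}
for task, costs in exec_cost.items():
    cheapest = min(range(len(costs)), key=lambda p: costs[p])
    # the partner this task talks to the most
    partner = max((t for t in exec_cost if t != task),
                  key=lambda t: comm_cost.get(frozenset({task, t}), 0))
    saved = comm_cost.get(frozenset({task, partner}), 0)
    if partner in assignment:
        extra = costs[assignment[partner]] - costs[cheapest]
        # fuse only if the avoided communication outweighs the extra execution cost
        assignment[task] = assignment[partner] if saved > extra else cheapest
    else:
        assignment[task] = cheapest

print(assignment)    # task -> processor
```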

17.
Observations on using genetic algorithms for dynamic load-balancing
Load-balancing problems arise in many applications, but, most importantly, they play a special role in the operation of parallel and distributed computing systems. Load-balancing deals with partitioning a program into smaller tasks that can be executed concurrently and mapping each of these tasks to a computational resource such as a processor (e.g., in a multiprocessor system) or a computer (e.g., in a computer network). Strategies that map these tasks to processors in a way that balances the load reduce the total processing time and improve processor utilization. Most research on load-balancing has focused on static scenarios that, in most cases, employ heuristic methods. However, genetic algorithms have gained immense popularity over the last few years as a robust and easily adaptable search technique. The work proposed here investigates how a genetic algorithm can be employed to solve the dynamic load-balancing problem. A dynamic load-balancing algorithm is developed whereby optimal or near-optimal task allocations can “evolve” during the operation of the parallel computing system. The algorithm considers other load-balancing issues such as threshold policies, information exchange criteria, and interprocessor communication. The effects of these and other issues on the success of the genetic-based load-balancing algorithm, as compared with the first-fit heuristic, are outlined.
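A minimal genetic-algorithm sketch for the task-to-processor mapping problem is given below. The chromosome encoding (one gene per task), the makespan-based fitness, and the crossover/mutation operators are generic illustrations, not the authors' algorithm, which also handles threshold policies and information exchange.

```python
import random

# Minimal GA sketch for task-to-processor mapping (generic, not the paper's
# algorithm): a chromosome maps each task to a processor, and fitness is the
# resulting makespan (lower is better).
TASKS = [3, 1, 4, 1, 5, 9, 2, 6]        # task execution times (made up)
NUM_PROCS, POP, GENS = 3, 30, 50

def makespan(chrom):
    finish = [0] * NUM_PROCS
    for task, proc in zip(TASKS, chrom):
        finish[proc] += task
    return max(finish)

def crossover(a, b):
    cut = random.randrange(1, len(a))   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.1):
    return [random.randrange(NUM_PROCS) if random.random() < rate else g for g in chrom]

population = [[random.randrange(NUM_PROCS) for _ in TASKS] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=makespan)                  # lower makespan = fitter
    survivors = population[:POP // 2]
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(POP - len(survivors))]
    population = survivors + children

best = min(population, key=makespan)
print("best mapping:", best, "makespan:", makespan(best))
```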

18.
A distributed counter allows each processor in an asynchronous message passing network to access the counter value and increment it. We study the problem of implementing a distributed counter so that no processor is a communication bottleneck. We prove a lower bound of Ω(log n / log log n) on the number of messages that some processor must exchange in a sequence of n counting operations spread over n processors. We propose a counter that achieves this bound when each processor increments the counter exactly once. Hence, the lower bound is tight. Because most algorithms and data structures count in some way, the lower bound holds for many distributed computations. We feel that the proposed concept of a communication bottleneck is a relevant measure of efficiency for a distributed algorithm and data structure, because it indicates the achievable degree of distribution.

19.
Two processors jointly provide a real-time service which can be completed by exactly one processor. Assuming each processor is allowed to announce only one bit of information, in a distributed way, to decide which one should process the job, some jobs will inevitably be lost if only classical resources are used. In this paper, we propose the distributed quantum entanglement sharing (DQES) model to share quantum entanglement among processors. Assisted by the DQES model, not only can system dependability be enhanced, but a faulty processor can also be identified. We also present some possible applications such as database consistency, job scheduling, system dependability, and reliable communication protocols.

20.
Given that network parallel environments offer strong computing power but relatively slow communication, a parallel Gauss-Seidel iterative algorithm for solving systems of linear equations on a local area network is presented. The algorithm partitions the coefficient matrix and right-hand side of the linear system into row blocks, which are then stored across the processors in a round-robin fashion. In each iteration, the solution components already computed are circulated among the processors to reduce inter-processor communication overhead and improve the efficiency of the parallel algorithm. Experimental results show that the algorithm achieves high parallel efficiency and speedup.
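The row-block scheme can be illustrated with a small sequential simulation; in the real implementation each processor would own its round-robin rows and circulate newly computed components to the other processors, which is only indicated by a comment here. The example matrix, processor count, and iteration count are assumptions.

```python
import numpy as np

# Sequential simulation of the row-block Gauss-Seidel scheme described above
# (the distributed version would circulate newly computed solution components
# among the processors instead of sharing the x array directly).
def parallel_style_gauss_seidel(A, b, num_procs=2, iters=50):
    n = len(b)
    owner = [i % num_procs for i in range(n)]  # round-robin assignment of rows
    x = np.zeros(n)
    for _ in range(iters):
        for p in range(num_procs):             # each processor updates its own rows
            for i in range(n):
                if owner[i] != p:
                    continue
                s = A[i] @ x - A[i, i] * x[i]
                x[i] = (b[i] - s) / A[i, i]
            # in the distributed version, p would now pass its updated
            # components on to the other processors
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([5.0, 6.0, 5.0])
print(parallel_style_gauss_seidel(A, b))       # approaches the solution [1, 1, 1]
```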
