Found 20 similar documents; search took 15 ms.
1.
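The next-generation Internet must be highly scalable and support dynamic service deployment. A growing number of delay- and jitter-sensitive services (such as IPTV and VoIP) place higher demands on the performance of BGP route computation. An effective way to meet these demands is for routers to adopt a distributed control plane and compute BGP routes in parallel, overcoming the performance bottleneck of a centralized control plane. Existing parallel BGP route computation schemes, however, balance load poorly, which limits the parallel performance of the system. This paper proposes a hashing-based adaptive load balancing model for parallel BGP route computation. Using online statistics of route updates, it designs an adaptive load balancing algorithm, P-AP (Prediction-based Adaptive Partition), which dynamically adjusts how route updates are distributed among processing nodes. Finally, a prototype system is designed and implemented, and experiments are run on BGP Update data collected by Route Views. The results show that P-AP balances load well, adjusts load infrequently, and achieves good route computation speedup, effectively improving parallel BGP route computation performance.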
2.
Parallel merge sort is useful for sorting a large quantity of data progressively. The merge sort should be parallelized carefully since the conventional algorithm has poor performance due to the successive reduction of the number of participating processors by half, and down to one in the last merging stage. The proposed load-balanced merge sort utilizes all processors throughout the computation. It evenly distributes data to all processors in each stage. Thus every processor is forced to work in all phases. Significant performance enhancement has been achieved, up to a speedup of (P–1)/log P where P is the number of processors. Experimental results demonstrate a speedup of 9.6 (upper bound of 10.7) on a 32-processor Cray T3E when sorting 4M 32-bit integers, and a speedup of 2.3 (upper bound of 2.8) on an 8-node PC cluster.
3.
《Journal of Parallel and Distributed Computing》2000,60(9):1047-1073
Parallel load balancing is studied for problems with certain bisection properties. A class of problems has α-bisectors if every problem p of weight w(p) in the class can be subdivided into two subproblems whose weight (load) is at least an α-fraction of the original problem. A problem p is to be split into N subproblems such that the maximum weight among them is as close to w(p)/N as possible. It was previously known that good load balancing can be achieved for such classes of problems using Algorithm HF, a sequential algorithm that repeatedly bisects the subproblem with maximum weight. Several parallel variants of Algorithm HF are introduced and analyzed with respect to worst-case load imbalance, running-time, and communication overhead. For fixed α, all variants have running-time O(log N) and provide constant upper bounds on the worst-case load imbalance. Results of simulation experiments regarding the load balance achieved in the average case are presented.
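The sequential idea behind Algorithm HF, as the abstract describes it (repeatedly bisect the subproblem with maximum weight), can be sketched as follows. This is only an illustration under stated assumptions: problems are reduced to their weights, and `fixed_ratio_bisect` is a hypothetical bisector that always splits a weight into an α and a (1 − α) share. Neither name comes from the paper, and the parallel variants are not shown.

```python
import heapq

def hf_partition(weight, n, bisect):
    """Split one problem of the given weight into n pieces,
    always bisecting the currently heaviest piece."""
    heap = [-weight]  # heapq is a min-heap, so weights are stored negated
    while len(heap) < n:
        heaviest = -heapq.heappop(heap)
        left, right = bisect(heaviest)
        heapq.heappush(heap, -left)
        heapq.heappush(heap, -right)
    return sorted((-w for w in heap), reverse=True)

def fixed_ratio_bisect(w, alpha=0.4):
    # Hypothetical alpha-bisector: both parts weigh at least alpha * w.
    return alpha * w, (1.0 - alpha) * w

pieces = hf_partition(100.0, 8, fixed_ratio_bisect)
imbalance = max(pieces) / (100.0 / 8)  # ratio of heaviest piece to the ideal w(p)/N
```

The heap keeps each "find the heaviest piece" step at O(log N), so the sequential loop costs O(N log N) overall for N pieces.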
4.
《Journal of Parallel and Distributed Computing》1994,22(1):60-79
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh, and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristics. For each of these architectures, we are able to determine near optimal load balancing schemes. Results obtained from implementation of these schemes in the context of the Tautology Verification problem on the Ncube/2 (a trademark of the Ncube Corporation) multicomputer are used to validate our theoretical results for the hypercube architecture. These results also demonstrate the accuracy and viability of our framework for scalability analysis.
5.
We present a new parallel semiconductor device simulation using the dynamic load balancing approach. This semiconductor device simulation based on the adaptive finite volume method with a posteriori error estimation has been developed and successfully implemented on a 16-PC Linux cluster with a message passing interface library. A constructive monotone iterative technique is also applied for solution of the system of nonlinear algebraic equations. Two different parallel versions of the algorithm to perform a complete device simulation are proposed. The first is a dynamic parallel domain decomposition approach, and the second is a parallel current-voltage characteristic points simulation. This implementation shows that a well-designed load balancing simulation can significantly reduce the execution time up to an order of magnitude. Compared with the measured data, numerical results on various submicron VLSI devices are presented, to show the accuracy and efficiency of the method.
6.
Communication networks pose difficult problems for the soft limit real-time control of calls and services. For such highly parallel distributed systems the system observation limits are rigorously treated. As a consequence a parallel processing node model and the load parameters and balancing potential are analysed by the use of a suitable simulation model. Based upon the simulation results, new load balancing algorithms are developed for the respective problem class.
Received: December 23, 1998
7.
8.
A Load Balancing Algorithm for Parallel Intrusion Detection Systems  Total citations: 9, self-citations: 0, citations by others: 9
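A load balancing algorithm for parallel intrusion detection systems (IDS) is proposed. The algorithm assigns each IDS sensor a packet acceptance interval and maps each packet into one sensor's interval by hashing the packet's destination IP address. The width of each sensor's interval is adjusted according to the sensor's processing capacity and load, so that network traffic is distributed sensibly across the sensors and the computing resources of all sensors are fully used. Theoretical analysis and experimental results show that the algorithm is highly efficient in high-bandwidth environments.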
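The interval scheme this abstract outlines can be illustrated with a short sketch. Everything concrete here is an assumption rather than the paper's design: the hash space size `HASH_SPACE`, the use of CRC32 as the hash, and the rule in `adjusted_weights` that turns capacity and load into an interval width.

```python
import bisect
import ipaddress
import zlib

HASH_SPACE = 1 << 16  # assumed size of the hash range [0, HASH_SPACE)

def interval_bounds(weights):
    """Upper bound of each sensor's acceptance interval, with widths
    proportional to the given weights."""
    total = float(sum(weights))
    bounds, acc = [], 0.0
    for w in weights:
        acc += w
        bounds.append(int(HASH_SPACE * acc / total))
    bounds[-1] = HASH_SPACE  # the last interval always closes the range
    return bounds

def pick_sensor(dst_ip, bounds):
    """Hash the destination IP and return the index of the sensor whose
    interval [bounds[i-1], bounds[i]) contains the hash value."""
    h = zlib.crc32(ipaddress.ip_address(dst_ip).packed) % HASH_SPACE
    return bisect.bisect_right(bounds, h)

def adjusted_weights(capacities, loads):
    # Assumed rule: more capacity widens an interval, more load narrows it.
    return [c / (1.0 + l) for c, l in zip(capacities, loads)]

bounds = interval_bounds([1, 1, 2])          # third sensor gets half the range
sensor = pick_sensor("192.0.2.17", bounds)   # documentation-range address
```

Because the hash depends only on the destination address, all packets for one destination reach the same sensor until the bounds are recomputed.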
9.
10.
Luc Bouganim Daniela Florescu Patrick Valduriez 《Distributed and Parallel Databases》1999,7(1):99-121
To scale up to high-end configurations, shared-memory multiprocessors are evolving towards Non Uniform Memory Access (NUMA) architectures. In this paper, we address the central problem of load balancing during parallel query execution in NUMA multiprocessors. We first show that an execution model for NUMA should not use data partitioning (as shared-nothing systems do) but should strive to exploit efficient shared-memory strategies like Synchronous Pipelining (SP). However, SP has problems in NUMA, especially with skewed data. Thus, we propose a new execution strategy which solves these problems. The basic idea is to allow partial materialization of intermediate results and to make them progressively public, i.e., able to be processed by any processor, as needed to avoid processor idle times. Hence, we call this strategy Progressive Sharing (PS). We conducted a performance comparison using an implementation of SP and PS on a 72-processor KSR1 computer, with many queries and large relations. With no skew, SP and PS both have linear speed-up. However, the impact of skew is very severe on SP performance while it is insignificant on PS. Finally, we show that, in NUMA, PS can also be beneficial in executing several pipeline chains concurrently.
11.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy possesses certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log2P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.
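The efficiency figure quoted in this abstract follows from the standard definition of parallel efficiency as speedup divided by processor count. The symbols $S$, $P$ and $E$ are introduced here for illustration and are not taken from the abstract:

```latex
E = \frac{S}{P} = \frac{985}{1024} \approx 0.962
```

This agrees with the reported efficiency of 0.96.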
12.
《International Journal of Computer Mathematics》2012,89(2):165-177
In this paper, the iterative Multilevel Averaging Weight (MAW) algorithm presented in paper [1] is modified to solve the dynamic load imbalance problems arising from two-dimensional short-range parallel molecular dynamics simulations. First, five types of load balancing models are given, which allow detailed study of the algorithm. In particular, it is shown that for strip decomposition, the number of iterations needed for the system to converge from an initially unbalanced state to a well-balanced state is bounded by 2 log P, where P is the number of processors. This result allows the algorithm to efficiently track fluctuations in the molecular density as the simulation progresses, and is much better than that of the Cellular Automaton Diffusion (CAD) scheme presented in paper [2]. Second, we apply the MAW algorithm to solve the load imbalance problem in the parallel molecular dynamics simulation of higher-speed wall collisions. Finally, numerical results and parallel computing performance are given for MPI-1.2 on a PC cluster consisting of 64 Pentium-III 500 MHz nodes connected by 100 Mbps switches.
13.
A Parallel Interval Computation Model for Global Optimization with Automatic Load Balancing
In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of computation so that no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different numbers of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.
14.
Shah R. Veeravalli B. Misra M. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(12):1675-1686
In this paper, we address several issues that are imperative to grid environments such as handling resource heterogeneity and sharing, communication latency, job migration from one site to another, and load balancing. We address these issues by proposing two job migration algorithms, which are MELISA (modified ELISA) and LBA (load balancing on arrival). The algorithms differ in the way load balancing is carried out and are shown to be efficient in minimizing the response time on large and small-scale heterogeneous grid environments, respectively. MELISA, which is applicable to large-scale systems (that is, interGrid), is a modified version of ELISA in which we consider the job migration cost, resource heterogeneity, and network heterogeneity when load balancing is considered. The LBA algorithm, which is applicable to small-scale systems (that is, intraGrid), performs load balancing by estimating the expected finish time of a job on buddy processors on each job arrival. Both algorithms estimate system parameters such as the job arrival rate, CPU processing rate, and load on the processor and balance the load by migrating jobs to buddy processors by taking into account the job transfer cost, resource heterogeneity, and network heterogeneity. We quantify the performance of our algorithms using several influencing parameters such as the job size, data transfer rate, status exchange period, and migration limit, and we discuss the implications of the performance and choice of our approaches.
15.
16.
The numerical method used to solve hyperbolic conservation laws is often an explicit scheme. As a commonly used technique to improve the quality of numerical simulation, the $h$-adaptive mesh method is adopted to resolve sharp structures in the solution. Since the computational costs of altering the mesh and solving the PDEs are comparable, mesh adaption that is triggered too often may bring down the overall efficiency of solving hyperbolic conservation laws using the $h$-adaptive mesh method. In this paper, we propose a so-called double tolerance adaptive strategy to optimize the overall numerical efficiency by reducing the number of mesh adaptions while preserving the quality of the numerical solution. Numerical results are presented to demonstrate the robustness and effectiveness of our $h$-adaptive algorithm using the double tolerance adaptive strategy.
17.
This paper examines the effectiveness of load balancing strategies for ray tracing on large parallel computer systems and cluster computers. Popular static load balancing strategies are shown to be inadequate for rendering complex images with contemporary ray tracing algorithms, and for rendering NTSC resolution images on 128 or more computers. Strategies based on image tiling are shown to be ineffective except on very small numbers of computers. A dynamic load balancing strategy, based on a diffusion model, is applied to a parallel Monte Carlo rendering system. The diffusive strategy is shown to remedy the defects of the static strategies. A hybrid strategy that combines static and dynamic approaches produces nearly optimal performance on a variety of images and computer systems. The theoretical results should be relevant to other rendering and image processing applications.
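For readers unfamiliar with diffusion-based balancing, a generic first-order diffusion step looks roughly like the sketch below. It is a textbook-style illustration, not the rendering system's actual scheme: the ring topology, the coefficient `alpha` and the function name are all assumptions.

```python
def diffusion_step(loads, neighbors, alpha):
    """One synchronous diffusion step: each node moves a fraction alpha of
    the load difference with every neighbor, toward the lighter side."""
    return [
        load + alpha * sum(loads[j] - load for j in neighbors[i])
        for i, load in enumerate(loads)
    ]

# Assumed topology: four nodes in a ring, each linked to two neighbors.
ring = [[3, 1], [0, 2], [1, 3], [2, 0]]
loads = [100.0, 0.0, 0.0, 0.0]  # all work starts on node 0
for _ in range(50):
    loads = diffusion_step(loads, ring, alpha=0.25)
```

With symmetric neighbor lists the total load is conserved, and with a sufficiently small `alpha` the loads settle toward the average using only neighbor-to-neighbor exchanges.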
18.
Ali Kemal Sinop Tolga Abaci Ümit Akku Attila Gürsoy Ugur Güdükbay 《The Journal of supercomputing》2005,31(3):249-263
In this paper, we present a parallel system called PHR for computing hierarchical radiosity solutions of complex scenes. The system is targeted for multi-processor architectures with distributed memory. The system evaluates and subdivides the interactions level by level in a breadth-first fashion, and the interactions are redistributed at the end of each level to keep the load balanced. In order to allow interactions to travel freely across processors, all the patch data is replicated on all the processors. Hence, the system favors load balancing at the expense of increased communication volume. However, the results show that the overhead of communication is negligible compared with total execution time. We obtained a speed-up of 25 for 32 processors in our test scenes.
19.
《Journal of Parallel and Distributed Computing》1994,22(3):506-522
Much prior work in AI on various attempts to speed up rule-based systems by parallel processing has been reported. Unfortunately, many of these results indicate that there is limited parallelism to be found when rules are applied to relatively small amounts of data. Thus, one can predict that much greater parallelism can be extracted when rules are applied to large amounts of data. However, traditional compile-time parallelization strategies as developed for main-memory based systems do not scale to large databases. We propose a scalable strategy for the efficient parallel implementation of rule-based systems operating upon large databases. We concentrate on load balancing techniques in a synchronous model of rule execution, where the variance in runtime of the distributed sites is minimized per cycle of rule processing, thus increasing utilization and speedup. We demonstrate that static load balancing techniques are insufficient, and thus low overhead dynamic load balancing is the key to successful scaling. We present a form of dynamic load balancing that is based upon predicting future system loads, rather than conventional demand-driven approaches that monitor current system state. We analyze a number of possible predictive dynamic load balancing protocols by isoefficiency analysis to guide the design of a parallel database rule processing system.
20.
A Dynamic Load Balancing Scheme for Parallel BP Neural Networks  Total citations: 2, self-citations: 0, citations by others: 2
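To speed up training when parallel techniques are applied to large-scale neural networks, this paper analyzes a large-scale row-partitioning method for the BP neural network algorithm based on the internal structure of the BP algorithm, and proposes a dynamic load balancing scheme. Experiments with the parallel algorithm on a PC cluster show that this parallel partitioning improves the speedup and is of practical value.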