首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 734 毫秒
1.
We present a framework that uses data dependency information to automate load balanced volume distribution and ray-task scheduling for parallel visualization of massive volumes. This dependency graph approach improves load balancing for both ray casting and ray tracing. The main bottlenecks in distributed volume rendering involve moving data across the network and loading memory into rendering hardware. Our load balancing solution combines static network distribution with dynamic ray-task scheduling. At the core of the dependency graph approach are the flex-block tree, introduced in this paper, and the cell-tree. The flex-block tree is similar to a kd-tree except that leaf nodes are cells containing a combination of empty space and tightly cropped subvolumes, or flex-blocks. A main contribution of this paper is the moving walls algorithm, which uses dynamic programming to create a flex-block partition. We show results for optimizing distributed ray cast rendering using a time cost function. We compare data distribution using the moving walls algorithm, with distribution using a recursive solution, and with a grid combined with a local kd-tree partition on each render-node.
Arie KaufmanEmail:
  相似文献   

2.
The use of arc consistency algorithms can greatly reduce the search space of a constraint satisfaction problem by preprocessing and eliminating variable assignments which can never participate in a solution. This paper examines two variations in the frequency of communications of three parallel arc consistency algorithms on distributed memory machines. This paper also examines the effect of simple load balancing versus the more robust mean field annealing graph partitioning heuristic on speedup. Through actual machine experimentation, we compute time required for the algorithms based on such parameters as distribution of work, frequency of message passing, and load balancing. Results indicate that the configuration with less frequent communications and the robust load balancing yields the best speedup.  相似文献   

3.
4.
Many parallel and distributed strategies were created to reduce the execution time of bioinformatics algorithms. One well-known bioinformatics algorithm is the Smith–Waterman, that may be parallelized using the wavefront method. When the wavefront is distributed across many heterogeneous nodes, it must be balanced to create a synchronous data flow. This is a very challenging problem if the nodes have variable computational power. This paper presents an agent-based solution for parallel biological sequence comparison applications that use the multi-node wavefront method. In our approach, autonomous agents are able to identify unbalanced computations and dynamically rebalance the load among the nodes. Two strategies were developed to the balancer agent in order to identify if the computations are balanced, one using global information and other using only local information. The global strategy demands a huge amount of data transfers, incurring in more communication, whereas the local strategy can decide about the balancing status using only local information. The results show that the balancing gains of strategies are very close. Thus, the local strategy is preferred, since it can be implemented in real wavefront balancers with almost the same benefits as the global strategy.  相似文献   

5.
非规则数据场并行体绘制算法   总被引:1,自引:0,他引:1  
并行算法是实现体绘制加速的重要途径,然而现有的并行体制绘制算法大部分是针对规则数据场的。  相似文献   

6.

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale for emerging large-scale data. Scalable parallel algorithms are necessary to process large graph datasets. In this work, we show a comparative analysis of our different parallel implementations of Louvain algorithm. We design parallel algorithms for Louvain method in shared memory and distributed memory settings. Developing distributed memory parallel algorithms is challenging because of inter-process communication and load balancing issues. We incorporate dynamic load balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and get better or comparable performance . We identify the challenges in developing distributed memory algorithm and provide an optimized solution DPLAL showing performance analysis of the algorithm on large-scale real-world networks from different domains.

  相似文献   

7.
We study the static load balancing problem in a distributed computer system that consists of a set of heterogeneous computer systems interconnected by a star network with two-way traffic. We formulate the static load balancing problem as a nonlinear optimization problem which minimizes the mean response time. We prove that in the optimal solution the satellite nodes in the star network can be divided into four different types: the idle source nodes, the active source nodes, the neutral nodes, and the sink nodes. The necessary and sufficient conditions for optimal solution are studied. An efficient algorithm of complexity O(n) is proposed for the static load balancing of an n-satellite system. The effects of link communication time on optimal load balancing in a star network are also studied by parametric analysis. By employing the proposed algorithm, a significant system performance improvement over that without load balancing is illustrated in a numerical example. The numerical example also shows that the effects of the link communication time in a star network are large.  相似文献   

8.
基于Hadoop的FP-Growth关联规则并行改进算法   总被引:1,自引:0,他引:1  
大数据环境下,传统的串行FP-Growth算法在处理海量数据时,占用内存过大、频繁项多,适用于大数据情况的PFP(Parallel FP-Growth)算法存在数据量增大无法处理的缺陷。针对这些问题,本文提出了基于Hadoop的负载均衡数据分割FP-Growth并行算法。在Hadoop平台下,本文使用负载均衡和数据分割相结合的方式对原始事务数据集分片实现并行化。实验证明基于Hadoop的负载均衡数据分割FP-Growth并行算法在处理数据量和效率上有所提高。  相似文献   

9.
In this paper, a new adaptive scheme is presented for dynamic load balancing in a message-passing multicomputer. The scheme is based on using easy-to-implement heuristics and adaptive threshold in balancing the system load among dispersed nodes. It uses a distributed control over all computer nodes as coordinated by an information collector. Four heuristic methods are presented here, which are distinguished by the ranges for location and threshold update policies and by the disciplines used for determining the load transfer destination. A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube multi-computer. The load balancing scheme is evaluated on the basis of the effects of system utilization, load imbalance, communication and migration overhead, and multicomputer size. Relative merits of the four methods are revealed under various physical configurations of the multicomputer. The application potentials are discussed for parallel execution of AI-oriented programs and distributed semantic network data bases.  相似文献   

10.
In this paper, we present a topology-aware load balancing algorithm for parallel multi-core machines and its proof of asymptotic convergence to an optimal solution. The algorithm, named HwTopoLB, aims to improve the application performance by reducing core idleness and communication delays. HwTopoLB was designed taking into account the properties of current parallel systems composed of multi-core compute nodes, namely their network interconnection, and their complex and hierarchical core topology. The latter comprises multiple levels of cache, and a memory subsystem with NUMA design. These systems provide high processing power at the expense of asymmetric communication costs, which can hamper the performance of parallel applications depending on their communication patterns if ignored. Our load balancing algorithm models asymmetries in terms of latencies and bandwidths, representing the distances and communication costs among hardware components. We have implemented HwTopoLB using the Charm++ Parallel Runtime System and evaluated its performance with two different benchmarks and one application. Our experimental results with HwTopoLB exhibit scalability over clustered multi-core compute nodes, and average performance improvements of 23% over execution without load balancers and 19% over the existing load balancing strategies on different multi-core systems.  相似文献   

11.
Scheduling large-scale application in heterogeneous grid systems is a fundamental NP-complete problem that is critical to obtain good performance and execution cost. To achieve high performance in a grid system it requires effective task partitioning, resource management and load balancing. The heterogeneous and dynamic nature of a grid, as well as the diverse demands of applications running on the grid, makes grid scheduling a major task. Existing schedulers in wide-area heterogeneous systems require a large amount of information about the application and the grid environment to produce reasonable schedules. However, this required information may not be available, may be too expensive to collect, or may increase the runtime overhead of the scheduler such that the scheduler is rendered ineffective. We believe that no one scheduler is appropriate for all grid systems and applications. This is because while data parallel applications in which further data partitioning is possible can be further improved by efficient management of resources, smart selection of resources and load balancing can be possible, in functional/not-dividable-task parallel applications such partitioning is either not possible or difficult or expensive in term of performance. In this paper, we propose a scheduler for data parallel applications (SDPA) which offers an efficient task partitioning and load balancing strategy for data parallel applications in grid environment. The proposed SDPA offers two major features: maintaining job priority even if insufficient number of free resources is available and pre-task assignment to cut the idle time of nodes. The SDPA selects nodes smartly according to the nature of task and the nodes’ resources availability. Simulation results conducted reveal that SDPA achieves performance improvement over reported strategies in the reviewed literature in terms of execution time, throughput and waiting time.  相似文献   

12.
并行问题和最短路径问题已成为一个热点研究课题,传统的最短路径算法已不能满足数据爆炸式增长的处理需求,尤其当网络规模很大时,所需的计算时间和存储空间也大大的增加;MapReduce模型的出现,带来了一种新的解决方法来解决最短路径;GPU具有强大的并行计算能力和存储带宽,与CPU相比具有明显的优势;通过研究MapReduce模型和GPU执行过程的分析,指出单独基于MapReduce模型的最短路径并行方法存在的问题,降低了系统的性能;论文的创新点是结合MapReduce和GPU形成双并行模型,并行预处理数据,针对最短路径中的数据传输和同步开销,增加数据动态处理器;最后实验从并行算法的性能评价指标平均加速比进行比较,结果表明,双重并行环境下的最短路径的计算,提高了加速比。  相似文献   

13.
In this paper we describe the design of a distributed animation system built using the Java language, a Parallel Virtual Machine platform, and the World-Wide Web. We focus on two aspects. One is the design of a platform to support distributed 3D animation, the other is the improvement of the efficiency of the parallel computing. Due to the collaborative and distributed nature of the Web, the Web browser is integrated with the distributed computing system like a Parallel Virtual Machine. The model emphasizes the separation of interface and function. It provides a very friendly and portable interface to manipulate the PVM console and the 3D animation system. To improve the efficiency of the parallel computing, we propose a new load balancing strategy, called global distributed control to balance the load in the network processors. The algorithm not only has the ability to dynamically adjust to the load imbalance, but also has the fault tolerance ability. It performs the best when it is compared with three traditional load balancing schemes.  相似文献   

14.
面向大型网格模型的简化问题,提出了一种基于顶点聚类方法采用多数据流策略的并行核外模型简化算法。算法首先将传统顶点聚类简化算法中的代表点计算方法改进为顶点筛选方法,进而设计了一种适用于分布式计算环境的数据外存储策略,最后采用多数据流的思想改进单元筛选与顶点筛选两个方法的执行过程,从而形成完整的并行核外模型简化算法。实验结果表明,该算法有效避免了基于区域分解的并行算法对模型结构的破坏,提高了模型简化的质量;相比于多种现有的并行算法,该算法极大程度优化了并行资源的负载分配问题,具备更为理想的加速比和并行效率。  相似文献   

15.
Multidimensional adaptive sampling technique is crucial for generating high quality images with effects such as motion blur, depth-of-field and soft shadows, but it costs a lot of memory and computation time. We propose a novel kd-tree based parallel adaptive rendering approach. First, a?two-level framework for adaptive sampling in parallel is introduced to reduce the computation time and control the memory cost: in the prepare stage, we coarsely sample the entire multidimensional space and use kd-tree structure to separate it into several multidimensional subspaces; in the main stage, each subspace is refined by a sub kd-tree and rendered in parallel. Second, novel kd-tree based strategies are introduced to measure space’s error value and generate anisotropic Poisson disk samples. The experimental results show that our algorithm produces better quality images than previous ones.  相似文献   

16.
Parallel joins have been widely studied during the past decade and a number of efficient algorithms were presented. While it is known that the performance of these algorithms may suffer greatly in the presence of skewed input data, the work on load balancing schemes for parallel join has been limited. The main contribution of this paper is the development and analysis of a new distributed data structure and an effective load balancing scheme for parallel main memory hash join on NUMA architecture. Multiprocessors based on this architecture are scalable in both size of main memory and number of processors, and provide very high memory bandwidth. The load balancing scheme is based on random probing to avoid the hot spot problems caused by probing sequentially. We have modeled this load balancing scheme both analytically and experimentally. The experiments were run on a BBN TC2000 multiprocessor system  相似文献   

17.
《Parallel Computing》2014,40(10):754-767
The processing of massive amounts of data on clusters with finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we present a runtime system for traditional MPI programs that enables the efficient and transparent out-of-core execution of distributed-memory parallel programs. This system, called BDMPI,1 leverages the semantics of MPI’s API to orchestrate the execution of a large number of MPI processes on much fewer compute nodes, so that the running processes maximize the amount of computation that they perform with the data fetched from the disk. BDMPI enables the development of efficient out-of-core parallel distributed memory codes without the high engineering and algorithmic complexities associated with multiple levels of blocking. BDMPI achieves significantly better performance than existing technologies on a single node as well as on a small cluster, and performs within 30% of optimized out-of-core implementations.  相似文献   

18.
On runtime parallel scheduling for processor load balancing   总被引:3,自引:0,他引:3  
Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality at runtime. Communication costs are significantly reduced compared to other existing algorithms  相似文献   

19.
谭鹤毅 《测控技术》2017,36(6):109-111
针对分布式多核节点系统的负载均衡难以取得最优解的问题,提出了一种基于改进极值优化的负载均衡方法.该方法通过节点的CPU占用率发现负载不均衡情况,然后用一个衡量模型估计计算与通信开销使改进的极值优化方法能够实现集群的负载均衡.仿真与实验结果表明该算法能够提高分布式集群的计算效率,是一种理想的负载均衡算法.  相似文献   

20.
基于遗传算法的动态负载平衡研究   总被引:1,自引:0,他引:1  
在很多应用中都出现负载平衡的问题,但是更重要的是,负载平衡在并行分布式计算系统中起到不同寻常的作用。以工作站机群为代表的网络计算环境是当前并行计算和分布式系统的研究重点之一,解决异构性问题和动态负载平衡是使用机群进行网络并行计算的关键。文章介绍如何使用遗传算法解决动态负载平衡的问题,以及在实现系统中所采用的一些关键性策略、方法和技术。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号