Similar Documents
20 similar documents found (search time: 31 ms)
1.
Dehne, Dittrich, Hutchinson. Algorithmica, 2003, 36(2): 97–122
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.

2.
Abstract. Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new improved parallel EM algorithms for a large number of problems including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.

3.
Abstract. We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, Ó Dúnlaing, and Yap, which runs in O(log^2 n) time using O(n) processors. We obtain this result by using a new "two-stage" random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size "blow-up" that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection).

4.
5.
Sanders, Egner, Korst. Algorithmica, 2003, 35(1): 21–55
Abstract. High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of single-disk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks. We show that a shared buffer of O(D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general. If two randomly allocated copies of each block exist, N arbitrary blocks can be read within I/O steps with high probability. The redundancy can be further reduced from 2 to 1+1/r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's "single-disk multi-head" model [1] that allows access to D arbitrary blocks in each I/O step. This powerful model can be emulated on the physically more realistic independent disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multi-head model first. The emulation result can then be applied directly or further refinements can be added.
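The core idea above — keep two randomly placed copies of every logical block and, when reading, fetch each requested block from whichever of its copies sits on a less loaded disk — can be sketched in a few lines. The toy Python below illustrates only the allocation and a greedy read-scheduling heuristic; the disk count D, the redundancy r, and the per-block pseudo-random placement are assumptions of this sketch, not the paper's exact scheme or its high-probability analysis.

```python
import random
from collections import defaultdict

D = 8   # number of independent disks (assumed for this sketch)
r = 2   # copies kept of each logical block, as in the redundancy-2 case above

def copies(block_id):
    """Pseudo-randomly place a logical block on r distinct disks (its copies)."""
    rng = random.Random(block_id)          # deterministic per block id
    return rng.sample(range(D), r)

def schedule_reads(block_ids):
    """Greedy read scheduling: fetch each requested block from whichever of its
    copies currently has the shortest queue, so a batch of N requests spreads
    nearly evenly over the D disks.  The paper proves high-probability bounds
    on the number of parallel I/O steps; this is only the intuition."""
    queue = defaultdict(int)               # pending reads per disk
    choice = {}
    for b in block_ids:
        disk = min(copies(b), key=lambda d: queue[d])
        choice[b] = disk
        queue[disk] += 1
    steps = max(queue.values(), default=0) # parallel I/O steps this batch needs
    return choice, steps
```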

6.
The block-cyclic data distribution is commonly used to organize array elements over the processors of a coarse-grained distributed memory parallel computer. In many scientific applications, the data layout must be reorganized at run-time in order to enhance locality and reduce remote memory access overheads. In this paper we present a general framework for developing array redistribution algorithms. Using this framework, we have developed efficient algorithms that redistribute an array from one block-cyclic layout to another. Block-cyclic redistribution consists of index set computation, wherein the destination locations for individual data blocks are calculated, and data communication, wherein these blocks are exchanged between processors. The framework treats both these operations in a uniform and integrated way. We have developed efficient and distributed algorithms for index set computation that do not require any interprocessor communication. To perform data communication in a conflict-free manner, we have developed direct, indirect, and hybrid algorithms. In the direct algorithm, a data block is transferred directly to its destination processor. In the indirect algorithm, data blocks are moved from source to destination processors through intermediate relay processors. The hybrid algorithm is a combination of the direct and indirect algorithms. Our framework is based on a generalized circulant matrix formalism of the redistribution problem and a general purpose distributed memory model of the parallel machine. Our algorithms sustain excellent performance over a wide range of problem and machine parameters. We have implemented our algorithms using MPI, to allow for easy portability across different HPC platforms. Experimental results on the IBM SP-2 and the Cray T3D show superior performance over previous approaches. When the block size of the cyclic data layout changes by a factor of K, the redistribution can be performed in O(log K) communication steps. This is true even when K is a prime number. In contrast, previous approaches take O(K) communication steps for redistribution. Our framework can be used for developing scalable redistribution libraries, for efficiently implementing parallelizing compiler directives, and for developing parallel algorithms for various applications. Redistribution algorithms are especially useful in signal processing applications, where the data access patterns change significantly between computational phases. They are also necessary in linear algebra programs, to perform matrix transpose operations. Received June 1, 1997; revised March 10, 1998.
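As a concrete illustration of the index-set computation step, the sketch below computes, without any communication, where each locally held block of a cyclic(b) layout lands when the block size grows by an integer factor K (cyclic(b) → cyclic(K·b)). The function names and the assumption that K is a whole number are choices of this sketch; the paper's framework additionally schedules the actual exchanges conflict-free.

```python
from collections import defaultdict

def owner(i, b, p):
    """Processor that holds global element i under a cyclic(b) layout on p processors."""
    return (i // b) % p

def destination_sets(me, p, b, K, local_blocks):
    """Index-set computation for a cyclic(b) -> cyclic(K*b) redistribution.
    Every source block of size b falls entirely inside one target block of
    size K*b, so its destination follows from purely local arithmetic."""
    dest = defaultdict(list)
    for lb in range(local_blocks):         # lb-th block stored on processor `me`
        gb = lb * p + me                   # its global block index in the source layout
        first_elem = gb * b                # first global element of that block
        dest[owner(first_elem, K * b, p)].append(gb)
    return dest                            # destination processor -> global block indices

# Example: 4 processors, source block size 2, target block size 6 (K = 3)
# print(destination_sets(me=1, p=4, b=2, K=3, local_blocks=5))
```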

7.
In this paper a parallel algorithm is given that, given a graph G=(V,E), decides whether G is a series-parallel graph, and, if so, builds a decomposition tree for G of series and parallel composition rules. The algorithm uses O(log|E| log*|E|) time and O(|E|) operations on an EREW PRAM, and O(log|E|) time and O(|E|) operations on a CRCW PRAM. The results hold for undirected as well as for directed graphs. Algorithms with the same resource bounds are described for the recognition of graphs of treewidth two, and for constructing tree decompositions of treewidth two. Hence efficient parallel algorithms can be found for a large number of graph problems on series-parallel graphs and graphs with treewidth two. These include many well-known problems like all problems that can be stated in monadic second-order logic. Received July 15, 1997; revised January 29, 1999, and June 23, 1999.
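For readers unfamiliar with the decomposition, a two-terminal graph is series-parallel exactly when it can be reduced to a single edge between its terminals by repeatedly (a) merging parallel edges and (b) bypassing internal degree-2 vertices. The sequential reference check below uses that characterization; it is a hedged sketch of the underlying reductions, not the paper's EREW/CRCW PRAM algorithm and not the full treewidth-two machinery.

```python
from collections import defaultdict, Counter

def is_two_terminal_sp(edges, s, t):
    """Reduce the multigraph by series and parallel reductions; it is
    two-terminal series-parallel iff a single edge s-t remains."""
    adj = defaultdict(Counter)
    for u, v in edges:
        if u == v:
            return False                   # self-loops never arise from SP compositions
        adj[u][v] += 1
        adj[v][u] += 1
    changed = True
    while changed:
        changed = False
        # parallel reduction: collapse multi-edges between the same vertex pair
        for u in list(adj):
            for v, mult in list(adj[u].items()):
                if mult > 1:
                    adj[u][v] = adj[v][u] = 1
                    changed = True
        # series reduction: bypass an internal vertex of degree exactly 2
        for v in list(adj):
            if v in (s, t) or v not in adj or sum(adj[v].values()) != 2:
                continue
            a, b = list(adj[v].elements())
            if a == b:
                continue                   # handled by the parallel reduction next pass
            del adj[a][v], adj[b][v], adj[v]
            adj[a][b] += 1
            adj[b][a] += 1
            changed = True
    return set(adj) == {s, t} and adj[s] == Counter({t: 1})
```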

8.
We consider the problem of dynamically allocating and deallocating local memory resources among multiple users in a parallel or distributed system. Given a group of independent users and a collection of interconnected local memory devices, we want to render the fragmentation of the memory resources irrelevant by allowing any user to allocate space for his or her purposes as long as there is space available anywhere in the system. In effect, we would like it to appear to the users as though they are allocating memory from a single central pool of memory, even though the space is distributed throughout the system. Our goal is to devise an on-line allocation algorithm that minimizes two cost measures: first, the fraction of unused space, which arises due to fragmentation of the memory; second, the slowdown needed by the system to service user requests, which arises due to the contention for access to the memory devices. We solve this distributed dynamic allocation problem in near-optimal fashion by devising an algorithm that allows the memory to be used to 100% of capacity despite the fragmentation and guarantees that service delays will always be within a constant factor of optimal. The algorithm is completely on-line (no foreknowledge of user activity is assumed) and can accommodate any sequence of allocations and deallocations by the users that does not violate global memory bounds. We also consider the distributed dynamic allocation problem in the more restrictive setting where the local memory devices are connected by a low-degree fixed-connection network, rather than being fully interconnected. In this case, communication costs must be more explicitly considered in our allocation algorithms. We give allocation algorithms for butterfly and hypercube networks, and prove necessary and sufficient conditions on the total amount of memory space needed for near-optimal algorithms to exist. Received November 5, 1996; revised December 10, 1997.

9.
In this paper we present deterministic parallel algorithms for the coarse-grained multicomputer (CGM) and bulk synchronous parallel (BSP) models for solving the following well-known graph problems: (1) list ranking, (2) Euler tour construction in a tree, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, and (7) 2-edge connectivity and biconnectivity (testing and component computation). The algorithms require O(log p) communication rounds with linear sequential work per round (p = number of processors, N = total input size). Each processor creates, during the entire algorithm, messages of total size O((N/p) log p). The algorithms assume that the local memory per processor (i.e., N/p) is larger than p^ε, for some fixed ε > 0. Our results imply BSP algorithms with O(log p) supersteps, O(g (N/p) log p) communication time, and O((N/p) log p) local computation time. It is important to observe that the number of communication rounds/supersteps obtained in this paper is independent of the problem size, and grows only logarithmically with respect to p. With growing problem size, only the sizes of the messages grow but the total number of messages remains unchanged. Due to the considerable protocol overhead associated with each message transmission, this is an important property. The result for Problem (1) is a considerable improvement over those previously reported. The algorithms for Problems (2)–(7) are the first practically relevant parallel algorithms for these standard graph problems. Received July 5, 2000; revised April 16, 2001.
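List ranking, Problem (1) above, is the simplest to picture, and its classical PRAM formulation is pointer jumping: every node repeatedly adds its successor's rank to its own and then jumps its pointer two hops ahead. The sequential simulation below shows only that doubling idea; it uses O(log n) rounds, whereas the point of the paper is to get by with O(log p) communication rounds on a CGM/BSP machine.

```python
def list_rank(succ):
    """succ[i] is the successor of node i in a linked list; the tail points to
    itself.  Returns rank[i] = number of nodes strictly after i.
    Pointer-jumping sketch: each round would be one parallel PRAM step."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    succ = list(succ)
    for _ in range(max(1, n).bit_length()):      # ceil(log2 n) doubling rounds suffice
        rank = [rank[i] + rank[succ[i]] for i in range(n)]   # read old values, write new
        succ = [succ[succ[i]] for i in range(n)]
    return rank

# Example: list 3 -> 1 -> 4 -> 0 -> 2 (node 2 is the tail)
# succ = [2, 4, 2, 1, 0]  ->  list_rank(succ) == [1, 3, 0, 4, 2]
```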

10.
Practical parallel algorithms, based on classical sequential Union-Find algorithms for computing transitive closures of binary relations, are described and implemented for both shared memory and distributed memory parallel computers. By practical algorithms, we mean algorithms that are efficient for parallel systems with bounded numbers of processors as opposed to algorithms where the number of processors grows with the problem size. Transitive closures are useful for decomposing many application problems into independent subproblems. The implementations were on an ENCORE Multimax shared memory machine and an NCUBE hypercube. Our implementations indicate that transitive closure computations are intrinsically difficult for distributed memory parallel machines because of the need for global information. By contrast, our results for shared memory machines exhibited excellent speedups. Supported in part by NSF Grant DCR-8619103, ONR contract N000-86-G-0202, and DOE Grant DE-FG02-85ER25001. Supported in part by RADC contract F30602-85-C-0303.
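The sequential union-find structure the paper starts from is worth recalling, since the parallel versions are built on it. Below is a standard implementation (union by rank plus path compression) and a helper that turns a list of related pairs into the equivalence classes of the generated transitive closure; the helper name is ours, for illustration only.

```python
class UnionFind:
    """Classical sequential union-find (union by rank + path compression)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def transitive_closure_classes(n, pairs):
    """Group 0..n-1 into the equivalence classes generated by the given pairs."""
    uf = UnionFind(n)
    for a, b in pairs:
        uf.union(a, b)
    classes = {}
    for x in range(n):
        classes.setdefault(uf.find(x), []).append(x)
    return list(classes.values())
```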

11.
The range tree is a fundamental data structure for multidimensional point sets, and, as such, is central in a wide range of geometric and database applications. In this paper we describe the first nontrivial adaptation of range trees to the parallel distributed memory setting (BSP-like models). Given a set of n points in d-dimensional Cartesian space, we show how to construct on a coarse-grained multicomputer a distributed range tree T in time O(s/p + T_c(s,p)), where s = n log^{d-1} n is the size of the sequential data structure and T_c(s,p) is the time to perform an h-relation with h = Θ(s/p). We then show how T can be used to answer a given set Q of m = O(n) range queries in time O((s log m)/p + T_c(s,p)) and O((s log m)/p + T_c(s,p) + k/p), where k is the number of results to be reported. These parallel construction and search algorithms are both highly efficient, in that their running times are the sequential time divided by the number of processors, plus a constant number of parallel communication rounds. Received June 1, 1997; revised March 10, 1998.
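To fix ideas about what the distributed structure stores, here is a compact sequential cousin of the 2-D range tree: a segment tree over the points sorted by x whose nodes keep their y-values sorted (a "merge-sort tree"), answering orthogonal range-counting queries in O(log^2 n) time with O(n log n) space. This is only a single-processor illustration of the s = n log^{d-1} n idea for d = 2, not the paper's BSP construction or its h-relation-based query batching; the class name and numeric-coordinate assumption are ours.

```python
import bisect

class MergeSortTree:
    """Static 2-D range counting over points (x, y) with numeric coordinates."""
    def __init__(self, points):
        self.pts = sorted(points)                 # primary order: by x (then y)
        self.n = len(self.pts)
        self.ys = [None] * (4 * self.n)           # per-node sorted y-values
        if self.n:
            self._build(1, 0, self.n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            self.ys[node] = [self.pts[lo][1]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.ys[node] = sorted(self.ys[2 * node] + self.ys[2 * node + 1])

    def count(self, x1, x2, y1, y2):
        """Number of points with x1 <= x <= x2 and y1 <= y <= y2."""
        lo = bisect.bisect_left(self.pts, (x1, float('-inf')))
        hi = bisect.bisect_right(self.pts, (x2, float('inf'))) - 1
        if lo > hi:
            return 0
        return self._query(1, 0, self.n - 1, lo, hi, y1, y2)

    def _query(self, node, lo, hi, ql, qr, y1, y2):
        if qr < lo or hi < ql:
            return 0
        if ql <= lo and hi <= qr:
            ys = self.ys[node]
            return bisect.bisect_right(ys, y2) - bisect.bisect_left(ys, y1)
        mid = (lo + hi) // 2
        return (self._query(2 * node, lo, mid, ql, qr, y1, y2) +
                self._query(2 * node + 1, mid + 1, hi, ql, qr, y1, y2))
```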

12.
This paper considers a variety of geometric pattern recognition problems on input sets of size n using a coarse-grained multicomputer model consisting of p processors with Ω(n/p) local memory each (i.e., Ω(n/p) memory cells of Θ(log n) bits apiece), where the processors are connected to an arbitrary interconnection network. It introduces efficient scalable parallel algorithms for a number of geometric problems including the rectangle finding problem, the maximal equally spaced collinear points problem, and the point set pattern matching problem. All of the algorithms presented are scalable in that they are applicable and efficient over a very wide range of ratios of problem size to number of processors. In addition to the practicality imparted by scalability, these algorithms are easy to implement in that all required communications can be achieved by a small number of calls to standard global routing operations.

13.
In this paper we describe a technique for finding efficient parallel algorithms for problems on directed graphs that involve checking the existence of certain kinds of paths in the graph. This technique provides efficient algorithms for finding dominators in flow graphs, performing interval and loop analysis on reducible flow graphs, and finding the feedback vertices of a digraph. Each of these algorithms takes O(log^2 n) time using the same number of processors needed for fast matrix multiplication. All of these bounds are for an EREW PRAM.
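The matrix-multiplication connection is easiest to see for plain reachability: adding the identity to the adjacency matrix and squaring it ⌈log2 n⌉ times yields the reflexive-transitive closure, and each squaring is one (fast) matrix product — which is exactly where the processor bound for fast matrix multiplication enters. The numpy sketch below shows only that sequential skeleton, not the paper's dominator, interval-analysis, or feedback-vertex algorithms.

```python
import numpy as np

def transitive_closure(adj):
    """adj is an n x n 0/1 adjacency matrix; returns the boolean reachability
    matrix.  ceil(log2 n) squarings over the boolean semiring suffice because
    each squaring doubles the path length accounted for."""
    A = (np.asarray(adj) != 0).astype(np.int64)
    n = A.shape[0]
    R = A | np.eye(n, dtype=np.int64)       # paths of length <= 1
    for _ in range(max(1, n).bit_length()):
        R = (R @ R > 0).astype(np.int64)    # one matrix product per doubling step
    return R.astype(bool)
```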

14.
This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or Level 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth parallel Cholesky algorithm that calls serial Level 3 BLAS provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13–44%, depending on the problem size and the number of processors used. This work was supported by grants from IMSL, Inc., and Hitachi Data Systems. The first version of this paper was presented as a poster session at Supercomputing '90, New York City, November 1990.
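The structural point — that casting most of the work as matrix-matrix (Level 3 BLAS) operations is what pays off — is easiest to see in a right-looking blocked Cholesky: each step does a small diagonal factorization, a triangular solve (TRSM-like) for the panel, and a large symmetric rank-k update (SYRK/GEMM-like) of the trailing matrix. The numpy sketch below is a serial illustration of that blocking, not any of the five parallel algorithms compared in the paper; the block size nb is an arbitrary choice.

```python
import numpy as np

def blocked_cholesky(A, nb=64):
    """Return lower-triangular L with A ≈ L @ L.T for symmetric positive
    definite A, using a right-looking blocked elimination (the structure
    behind Level-3-BLAS factorizations such as xPOTRF)."""
    A = np.array(A, dtype=float)            # work on a copy
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(0, n, nb):
        kend = min(k + nb, n)
        # factor the diagonal block (small, unblocked Cholesky)
        L[k:kend, k:kend] = np.linalg.cholesky(A[k:kend, k:kend])
        if kend < n:
            # panel solve: find L21 with L21 @ L11^T = A21   (TRSM-like)
            L[kend:, k:kend] = np.linalg.solve(
                L[k:kend, k:kend], A[kend:, k:kend].T).T
            # trailing-matrix update: A22 -= L21 @ L21^T     (SYRK/GEMM-like)
            A[kend:, kend:] -= L[kend:, k:kend] @ L[kend:, k:kend].T
    return L
```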

15.
We consider the following partition problem: Given a set S of n elements that is organized as k sorted subsets of size n/k each and given a parameter h with 1/k ≤ h ≤ n/k, partition S into g = O(n/(hk)) subsets D_1, D_2, ..., D_g of size Θ(hk) each, such that, for any two indices i and j with 1 ≤ i < j ≤ g, no element in D_i is bigger than any element in D_j. Note that with various combinations of the values of parameters h and k, several fundamental problems, such as merging, sorting, and finding an approximate median, can be formulated as or be reduced to this partition problem. The partition problem also finds many applications in solving problems of parallel computing and computational geometry. In this paper we present efficient parallel algorithms for solving the partition problem and a number of its applications. Our parallel partition algorithm runs in O(log n) time using processors in the EREW PRAM model. The complexity bounds of our parallel partition algorithm on the respective special cases match those of the optimal EREW PRAM algorithms for merging, sorting, and finding an approximate median. Using our parallel partition algorithm, we are also able to obtain better complexity bounds (even possibly on a weaker parallel model) than the previously best known parallel algorithms for several important problems, including parallel multiselection, parallel multiranking, and parallel sorting of k sorted subsets. Received May 5, 1996; revised July 30, 1998.
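A naive sequential routine makes the specification concrete: merge the k sorted subsets and cut the result into consecutive chunks of roughly hk elements, so that every element of D_i is at most every element of D_j for i < j. This shows only what has to be computed (and why merging and sorting arise as special cases); it is not the paper's O(log n)-time EREW PRAM algorithm, which avoids paying for a full merge.

```python
import heapq

def partition_sorted_lists(lists, h):
    """lists: k sorted lists; h: granularity parameter from the problem statement.
    Returns groups D_1, D_2, ... of size about h*k each, ordered so that all
    elements of an earlier group are <= all elements of a later group.
    Naive O(n log k) sequential reference, not the parallel algorithm."""
    k = len(lists)
    chunk = max(1, int(round(h * k)))
    merged = list(heapq.merge(*lists))
    return [merged[i:i + chunk] for i in range(0, len(merged), chunk)]

# Example: partition_sorted_lists([[1, 4, 7], [2, 5, 8], [3, 6, 9]], h=1)
# -> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```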

16.
Andrews, Bender, Zhang. Algorithmica, 2008, 32(2): 277–301
Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental and little attempt has been made to develop algorithms with provable performance guarantees. We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution. The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally we consider the on-line version of the problem in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ.

17.
Parallel updates of minimum spanning trees (MSTs) have been studied in the past. These updates allowed a single change in the underlying graph, such as a change in the cost of an edge or an insertion of a new vertex. Multiple update problems for MSTs are concerned with handling more than one such change. In the sequential case multiple update problems may be solved using repeated applications of an efficient algorithm for a single update. However, for efficiency reasons, parallel algorithms for multiple update problems must consider all changes to the underlying graph simultaneously. In this paper we describe parallel algorithms for updating an MST when k new vertices are inserted or deleted in the underlying graph, when the costs of k edges are changed, or when k edge insertions and deletions are performed. For the multiple vertex insertion update, our algorithm achieves time and processor bounds of O(log n · log k) and nk/(log n · log k), respectively, on a CREW parallel random access machine. These bounds are optimal for dense graphs. A novel feature of this algorithm is a transformation of the previous MST and k new vertices to a bipartite graph which enables us to obtain the above-mentioned bounds.
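For a single inserted vertex the sequential update is simple, and it shows why the paper's transformation is natural: the new MST uses only the old tree edges plus the new vertex's edges, so Kruskal on those n−1+deg(v) edges suffices. The sketch below is that single-update sequential reference (function name and edge formats are ours); the paper's contribution is handling k simultaneous insertions, deletions, or cost changes in parallel.

```python
def mst_after_vertex_insertion(n, tree_edges, new_edges):
    """n: number of old vertices (labelled 0..n-1); the new vertex is labelled n.
    tree_edges: (u, v, w) edges of the old MST; new_edges: (u, w) edges joining
    the new vertex to old vertex u with cost w.
    Returns the edge list of the updated MST (Kruskal + union-find)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    candidates = [(w, u, v) for u, v, w in tree_edges] + \
                 [(w, u, n) for u, w in new_edges]
    mst = []
    for w, u, v in sorted(candidates):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```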

18.
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in high performance parallel computation systems, for which blockwise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel computers. Our examples originate as algorithms for parallel machines, and we adapt them to the EM situation where the queries and data structure are considered to be much larger than the size of the available internal memory. This paper presents techniques to achieve blocking for I/O as well as for communication in multisearch on the BSP and EM-BSP models. We describe improvements to the 1-optimal BSP* multisearch algorithm of [8] which permit larger search trees to be handled. In the area of EM algorithms, new algorithms for multisearch in balanced trees are described. For search trees of size O(n log n), where n is the number of queries, we obtain a work-optimal, parallel, randomized EM multisearch algorithm whose I/O and communication time are smaller, asymptotically, than the computation time. We obtain a deterministic version via a similar technique. These algorithms are obtained via the simulation techniques of [12], [17], [13], [14], and [24]. For larger trees we describe a parallel EM algorithm which is 1-optimal considering computation, communication, and I/O (for suitable parameter constellations) plus I/O-optimal. We give a lower bound on the number of I/O operations required for filtering n queries through a binary or multiway search tree of size m when m ≥ n^{2+ε}, for a constant ε > 0. Online publication February 20, 2001.

19.
Abstract. In this paper we present two parallel versions of the bisection method to compute the spectrum of symmetric Toeplitz matrices. Both parallel algorithms have been implemented and analysed on a virtual shared memory multiprocessor using a portable message-passing environment. The algorithms very efficiently parallelize the sequential method, and the application of a dynamic strategy to distribute the computations produces better results than the use of a static method. We also improve the performance of the original sequential algorithm by applying Newton's method for the final approximation of the eigenvalues. However, the poor performance of the original sequential algorithm produces low speedups when we compare the parallel methods with the best available sequential algorithm.
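The serial kernel that both parallel versions distribute is eigenvalue-counting bisection: by Sylvester's law of inertia, the number of negative pivots in a symmetric elimination of T − xI equals the number of eigenvalues of T below x, so bisection on x isolates any eigenvalue. The sketch below shows that counting/bisection step for a general symmetric matrix; it does not exploit Toeplitz structure, does not pivot, and omits the Newton polishing step mentioned above, and the interval endpoints are assumed to come from, e.g., Gershgorin discs.

```python
import numpy as np

def eigs_below(T, x):
    """Number of eigenvalues of the symmetric matrix T that are < x, counted
    as negative pivots of an unpivoted symmetric elimination of T - x*I
    (Sylvester's law of inertia).  Assumes no pivot is exactly zero."""
    A = np.asarray(T, dtype=float) - x * np.eye(len(T))
    neg = 0
    for k in range(len(A)):
        piv = A[k, k]
        if piv < 0:
            neg += 1
        if k + 1 < len(A):
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k+1:, k]) / piv
    return neg

def kth_eigenvalue(T, j, lo, hi, tol=1e-10):
    """Bisection for the j-th smallest eigenvalue (0-based) of symmetric T,
    given an interval [lo, hi] known to contain the whole spectrum."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eigs_below(T, mid) > j:   # at least j+1 eigenvalues lie below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```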

20.
In this paper a systematic method for the design of efficient parallel algorithms for the dynamic evaluation of computation trees and/or expressions is presented. This method involves the use of uniform closure properties of certain classes of unary functions. Using this method, optimal parallel algorithms are given for many computation tree problems which are important in parallel algebraic and numerical computation, and parallel code generation, on exclusive-read exclusive-write parallel random access machines. Our algorithmic result is complemented by a P-complete tree problem. Received February 13, 1995; revised March 25, 1996.
