共查询到20条相似文献,搜索用时 15 毫秒
1.
Given m ordered segments that form a partition of some universe (e.g., a two-dimensional strip), the multisearch problem consists
of determining, for a set of n query points in the universe, the segments they belong to. We present the first nontrivial parallel deterministic scheme
for performing multisearch on a distributed-memory machine when m=ω(n) . The scheme is designed on the BSP* model of parallel computation, a variant of Valiant's BSP which rewards blockwise communication,
and relies on a suitable redundant representation of the segments. The time needed to answer the queries is analyzed as a
function of the redundancy and of the BSP* parameters. We show that optimal performance can be obtained using logarithmic
redundancy. We also prove a lower bound on the communication requirements of any deterministic multisearch scheme realized
on a distributed-memory machine. The lower bound exhibits a tradeoff between the redundancy used to represent the segments
and the performance of the scheme.
Received June 1, 1997; revised March 10, 1998. 相似文献
2.
一类Toeplitz三对角方程组的一种分布式并行算法 总被引:3,自引:0,他引:3
文中提出一类Toeplitz三对角方程组的一种分布式并行算法。该算法以系数矩阵的分解为基础,充分利用了系数矩阵结构的特殊性,算法因并行化而引入的冗余计算量非常少,算法的通信机制简单,通信量仅与处理 机台数p有关,与方程组规模n无关,算法具有很高的并行效率,理论分析和数值试验表明,其加速比Sp(n)→p(n→ ∞),此为线性加速比的理想情况。文中给出了算法在分布存储多计算机系统上的数值试验结果。 相似文献
3.
In this paper we present deterministic parallel algorithms for the coarse-grained multicomputer (CGM) and bulk synchronous parallel (BSP) models for solving the following well-known graph problems: (1) list ranking, (2) Euler tour construction in a tree,
(3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction
and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, and (7) 2-edge connectivity
and biconnectivity (testing and component computation). The algorithms require O(log p) communication rounds with linear sequential work per round (p = no. processors, N = total input size). Each processor creates, during the entire algorithm, messages of total size O(log (p) (N/p)) .
The algorithms assume that the local memory per processor (i.e., N/p ) is larger than p
ε
, for some fixed ε > 0 . Our results imply BSP algorithms with O(log p) supersteps, O(g log (p) (N/p)) communication time, and O(log (p) (N/p)) local computation time.
It is important to observe that the number of communication rounds/ supersteps obtained in this paper is independent of the problem size, and grows only logarithmically with respect to p . With growing problem size, only the sizes of the messages grow but the total number of messages remains unchanged. Due
to the considerable protocol overhead associated with each message transmission, this is an important property. The result
for Problem (1) is a considerable improvement over those previously reported. The algorithms for Problems (2)—(7) are the
first practically relevant parallel algorithms for these standard graph problems.
Received July 5, 2000; revised April 16, 2001. 相似文献
4.
We present a randomized parallel algorithm for constructing the three-dimensional convex hull on a generic p-processor coarse-grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p
2+ε
(for some arbitrarily small ε > 0). For any given set of n points in 3-space, the algorithm computes the three-dimensional convex hull, with high probability, in local computation time and O(1) communication phases with at most O(n/p) data sent/received by each processor. That is, with high probability, the algorithm computes the three-dimensional convex
hull of an arbitrary point set in time , where Γ
n,p
denotes the time complexity of one communication phase. The assumption n/p ≥ p
2+ε
implies a coarse-grained, limited parallelism, model which is applicable to most commercially available multiprocessors.
In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps, synchronization period , computation cost , and communication cost O((n/p) g).
Received October 30, 1995, and in revised form April 15, 1996, and in final form September 17, 1996. 相似文献
5.
Given a text string T[1,n] , the multistring
search
problem consists of determining which of k pattern strings {X
1
[1,m], X
2
[1,m], . . ., X
k
[1,m]} , provided on-line, occur in T . We study this problem in the BSP model [27], and then extend our analysis to other coarse-grained parallel computational models. We refer to the relevant
and difficult case of long patterns, that is m≥p , where p is the number of available processors, and provide an optimal result with respect to both computation and communication
times, attaining a constant number of supersteps. We then study single string search (i.e., k=1 ), and show how the multistring search algorithm can be employed to speed up the process and balance the communication cost.
The proposed solution takes a constant number of supersteps, and achieves optimal communication time if the string to be searched
is longer than p
2
. Our results are based on the distribution of a proper data structure among the p processors, to reduce and balance the communication cost. We also indicate how short patterns can be efficiently dealt with, through a completely different algorithmic approach.
Received June 1, 1997; revised March 10, 1998. 相似文献
6.
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory
of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation.
In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited
success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge
by the ACM Working Group on Storage I/ O for Large-Scale Computing.
In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors
on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine.
When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems. 相似文献
7.
为了改进蚁群优化算法的收敛速度,研究了一种基于粗粒度模型的并行蚁群优化算法,该算法将搜索任务划分给q个子群,由这些子群并行地完成搜索,可使搜索速度大幅度提高。实验结果表明,用该算法求解TSP问题,收敛速度比最新的改进算法快百倍以上。 相似文献
8.
The range tree is a fundamental data structure for multidimensional point sets, and, as such, is central in a wide range
of geometric and database applications. In this paper we describe the first nontrivial adaptation of range trees to the parallel
distributed memory setting (BSP-like models). Given a set of n points in d -dimensional Cartesian space, we show how to construct on a coarse-grained multicomputer a distributed range tree T in time O( s / p + T
c
(s,p)) , where s = n log
d-1
n is the size of the sequential data structure and T
c
(s,p) is the time to perform an h -relation with h=Θ (s/p) . We then show how T can be used to answer a given set Q of m=O(n) range queries in time O((s log m)/p + T
c
(s,p)) and O((s log m)/p + T
c
(s,p) + k/p) , where k is the number of results to be reported. These parallel construction and search algorithms are both highly efficient, in
that their running times are the sequential time divided by the number of processors, plus a constant number of parallel communication
rounds.
Received June 1, 1997; revised March 10, 1998. 相似文献
9.
We present a randomized algorithm for computing the Voronoi diagram of line segments using coarse-grained parallel machines.
Operating on P processors, for any input of n line segments, this algorithm performs O((n log n)/P) local operations per processor, O(n/P) messages per processor, and O(1) communication phases, with high probability for n=Ω(P
3+ε
) .
Received June 1, 1997; revised March 10, 1998. 相似文献
10.
一种新并行遗传算法及其应用 总被引:2,自引:0,他引:2
基于量子计算的概念和原理,本文提出一种新并行量子遗传算法,即粗粒度并行量子遗传算法(CGPQGA)。该算法的核心是引入层环粗粒度并行计算模型和一种新进化策略。由于CGPQGA只需迁移搜索到的最佳个体到各个子群体,因而算法的通信开销很小。通过用CGPQGA设计控制器的应用实例表明,CGPQGA优于常规并行遗传算法,能加速子群体中最佳个体的迁移,收敛速度快,全局寻优能力强,同时具有勘探和开采的能力。 相似文献
11.
This paper presents several deterministic algorithms for selecting the k th largest record from a set of n records on any n -node hypercubic network. All of the algorithms are based on the selection algorithm of Cole and Yap, as well as on various
sorting algorithms for hypercubic networks. Our fastest algorithm runs in O( lg n lg
*
n) time, very nearly matching the trivial lower bound. Previously, the best upper bound known for selection was O( lg n lg lg n) . A key subroutine in our O( lg n lg
*
n) time selection algorithm is a sparse version of the Sharesort algorithm that sorts n records using p processors, , in O( lg n ( lg lg p - lg lg (p/n) )
2
) time.
Received March 23, 1994; revised October 30, 1997. 相似文献
12.
利用近似三对角Toeplitz矩阵的特殊结构,提出了一种新的求解近似三对角Toeplitz方程组的快速算法.在三对角Toeplitz矩阵的近似LU分解的基础上,利用“分而治之”的思想,并结合秦九韶技术和特殊的数学技巧减少大量的冗余计算,提出了求解近似Toeplitz三对角方程组的快速分布式并行算法,并在理论上证明了算法具有近似于线性的加速比.最后通过数值实验证明,新的并行算法具有较高的并行效率,并且当矩阵阶数n足够大时,算法的加速比趋近于线性加速比. 相似文献
13.
This paper presents an improved analysis of a randomized parallel backtrack search algorithm (RPBS). Our analysis uses the
single-node-donation model that each donation contains a single tree node. It is shown that with high probability the total
number of messages generated by RPBS is O(phd) where p is the number of processors, and h and d are the height and degree of the backtrack search tree. Under the assumption of unit-time message delivery, it is shown
that with high probability the execution time of RPBS is n/p + O(hd) where n is the number of nodes of the backtrack search tree and the leading term n/p has no constant factor. As the result of limited communication requirement, RPBS can be efficiently implemented in message-passing
or shared-memory multiprocessor systems. A general analysis of network implementation of RPBS is presented. The concept of
total routing time, the sum of routing times of all messages, is introduced as a measure of communication cost. It is shown
that the overall effect of message delay to the execution time of RPBS is small if the total routing time is small. Some experimental
data on a shared-memory machine are reported.
Received November 23, 1996; revised February 15, 1998. 相似文献
14.
15.
遗传算法(GA)是一种基于自然群体遗传机制的有效搜索算法,由于它在搜索空间中同时考虑许多点,这样就减少了收敛于局部极小的可能,也增加了处理的并行性。因此可以利用并行遗传算法(PGA)研究典型的组合优化实例-TSP问题的求解问题。该文提出一种有效的并行算法求解旅行商(TSP)问题,实验结果表明,该方法在解的精度上优于以前的算法。 相似文献
16.
并行任务调度不论是从理论上还是应用上近年来都倍受关注。但是目前出现的大量算法很难应用于实际,基于此,论文探讨了典型的调度问题P3|fix|Cmax,这类问题是强NP-难的。论文在Goemans的研究基础上,给出了一个很简单的线性算法,构造出调度性能为9/8的半规则调度,改进了Goemans的7/6的结果。 相似文献
17.
3-D data visualization is very useful for medical imaging and computational fluid dynamics. Volume rendering can be used to exhibit the shape and volumetric properties of 3-D objects. However, volume rendering requires a considerable amount of time to process the large volume of data. To deliver the necessary rendering rates, parallel hardware architectures such as distributed memory multicomputers offer viable solutions. The challenge is to design efficient parallel algorithms that utilize the hardware parallelism effectively. In this paper, we present two efficient parallel volume rendering algorithms, the 1D-partition and 2D-partition methods, based on the shear-warp factorization for distributed memory multicomputers. The 1D-partition method has a performance bound on the size of the volume data. If the number of processors is less than a threshold, the 1D-partition method can deliver a good rendering rate. If the number of processors is over a threshold, the 2D-partition method can be used. To evaluate the performance of these two algorithms, we implemented the proposed methods along with the slice data partitioning, volume data partitioning, and sheared volume data partitioning methods on an IBM SP2 parallel machine. Six volume data sets were used as the test samples. The experimental results show that the proposed methods outperform other compatible algorithms for all test samples. When the number of processors is over a threshold, the experimental results also demonstrate that the 2D-partition method is better than the 1D-partition method. 相似文献
18.
基于对角划分的矩阵乘并行算法 总被引:5,自引:0,他引:5
提出了一种新的基于对角划分的矩阵乘并行算法,它在以往行列划分策略的基础上,采用基于对角划分的策略。数值试验表明该算法具有较高的加速比和并行效率。 相似文献
19.
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR
gates. A layered PMC is a PMC in which all input nodes are in the external face, and the gates can be assigned to layers in
such a way that every wire goes between gates in successive layers. Goldschlager, Cook and Dymond, and others have developed
NC
2
algorithms to evaluate a layered PMC when the output node is in the same face as the input nodes. These algorithms require
a large number of processors (Ω(n
6
), where n is the size of the input circuit).
In this paper we give an efficient parallel algorithm that evaluates a layered PMC of size n in time using only a linear number of processors on an EREW PRAM. Our parallel algorithm is the best possible to within a polylog
factor, and is a substantial improvement over the earlier algorithms for the problem.
Received April 18, 1994; revised April 7, 1995. 相似文献
20.
微机环境下基于PVM的网络并行程序开发方法 总被引:1,自引:0,他引:1
并行虚拟机PVM是一种通用的网络并行程序开发环境,它可以把连网的巨型机,大规模并行机,工作站以及微机作为一大型并行机使用,供人们开发并行算法或运行并行系统。此文对PVM的基本情况和最新进展进行介绍,讨论了基于PVM的网络并行程序开发方法,最后给出了具体的实例。 相似文献