共查询到20条相似文献,搜索用时 0 毫秒
1.
A. Chin 《Algorithmica》1994,12(2-3):170-181
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm.We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust. 相似文献
2.
D. B. Skillicorn 《International journal of parallel programming》1991,20(2):133-158
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.This work was supported by the Natural Sciences and Engineering Research Council of Canada. 相似文献
3.
Amitabha Das Louise E. Moser P. M. Melliar-Smith 《International journal of parallel programming》1991,20(5):403-419
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm isO(nlog2
n), comparable to that for specialized sorting networks and an improvement on theO(n
1.5) complexity of conventional mesh-connected array processors. 相似文献
4.
This paper introduces a model for parallel computation, called thedistributed randomaccess machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network.We introduce the notion of aconservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to shortcut pointers in a data structure so that remote processors can communicate without causing undue congestion. We giveO(lgn)-step, linear-processor, linear-space, conservative algorithms for a variety of problems onn-node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions in an expression tree. We giveO(lg2
n)-step, linear-processor, linear-space, conservative algorithms for problems on graphs of sizen, including finding a minimum-cost spanning forest, computing biconnected components, and constructing an Eulerian cycle. Most of these algorithms use as a subroutine a generalization of the prefix computation to trees. We show that any suchtreefix computation can be performed inO(lgn) steps using a conservative variant of Miller and Reif's tree-contraction technique.This research was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-80-C-0622 and by the Office of Naval Research under Contract N00014-86-K-0593. Charles Leiserson is supported in part by an NSF Presidential Young Investigator Award with matching funds provided by AT&T Bell Laboratories and Xerox Corporation. Bruce Maggs is supported in part by an NSF Fellowship. 相似文献
5.
An efficient implementation of parallel eigenvalue computation for massively parallel processing 总被引:4,自引:0,他引:4
This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when problem size is relatively small, e.g., the order of 10,000. This is very important for relevant practical applications, where many diagonalizations for such matrices are required so often. The routine was evaluated on the 1024 processors HITACHI SR2201, and giving speedup ratios of about 2–5 times as compared to the ScaLAPACK library on 1024 processors of the HITACHI SR2201. 相似文献
6.
Nick D. Dendris Iannis A. Kalafatis Lefteris M. Kirousis 《Journal of Mathematical Imaging and Vision》1994,4(4):375-387
Labelling the lines of a planar line drawing of a 3-D object in a way that reflects the geometric properties of the object is a much studied problem in computer vision, considered to be an important step towards understanding the object from its 2-D drawing. Combinatorially, the labellability problem is a Constraint Satisfaction Problem and has been shown to be NP-complete even for images of polyhedral scenes. In this paper, we examine scenes that consist of a set of objects each obtained by rotating a polygon around an arbitrary axis. The objects are allowed to arbitrarily intersect or overlay. We show that for these scenes, there is a sequential lineartime labelling algorithm. Moreover, we show that the algorithm has a fast parallel version that executes inO(log3
n) time on an Exclusive-Read-Exclusive-Write Parallel Random Access Machine withO(n
3/log3
n) processors. The algorithm not only answers the decision problem of labellability, but also produces a legal labelling, if there is one. This parallel algorithm should be contrasted with the techniques of dealing with special cases of the constraint satisfaction problem. These techniques employ an effective, but inherently sequential, relaxation procedure in order to restrict the domains of the variables.This research was partially supported by the European Community ESPRIT Basic Research Program under contracts 7141 (project ALCOM II) and 6019 (project Insight II). 相似文献
7.
8.
An increasing awareness of the need for high speed parallel processing systems for image analysis has stimulated a great deal of interest in the study of such systems. These studies have focussed primarily on specific algorithms and while they demonstrated the utility of such an approach, few general principles have evolved. As a result, it is still uncertain how one may go about addressing a given application. This paper first presents techniques for formulating parallel image processing tasks by focussing on one or more components of an image processing environment. Then a parallel processing model is proposed which specifies the interaction among tasks formulated in this manner. The techniques and model enable one to determine constraints on architectural features required to achieve predefined performance levels and compare and contrast different formulations. 相似文献
9.
An increasing awareness of the need for high speed parallel processing systems for image analysis has stimulated a great deal of interest in the design and development of such systems. Efficient processing schemes for several specific problems have been developed providing some insight into the general problems encountered in designing efficient image processing algorithms for parallel architectures. However it is still not clear what architecture or architectures are best suited for image processing in general, or how one may go about determining those which are. An approach that would allow application requirements to specify architectural features would be useful in this context. Working towards this goal, general principles are outlined for formulating parallel image processing tasks by exploiting parallelism in the algorithms and data structures employed. A synchronous parallel processing model is proposed which governs the communication and interaction between these tasks. This model presents a uniform framework for comparing and contrasting different formulation strategies. In addition, techniques are developed for analyzing instances of this model to determine a high level specification of a parallel architecture that best ‘matches’ the requirements of the corresponding application. It is also possible to derive initial estimates of the component capabilities that are required to achieve predefined performance levels. Such analysis tools are useful both in the design stage, in the selection of a specific parallel architecture, or in efficiently utilizing an existing one. In addition, the architecture independent specification of application requirements makes it a useful tool for benchmarking applications. 相似文献
10.
11.
12.
The vertex updating problem for a minimum spanning tree (MST) is defined as follows: Given a graphG=(V, E
G) and an MSTT forG, find a new MST forG to which a new vertexz has been added along with weighted edges that connectz with the vertices ofG. We present a set of rules that produce simple optimal parallel algorithms that run inO(lgn) time usingn/lgn EREW PRAM processors, wheren=¦V¦. These algorithms employ any valid tree-contraction schedule that can be produced within the stated resource bounds. These rules can also be used to derive simple linear-time sequential algorithms for the same problem. The previously best-known parallel result was a rather complicated algorithm that usedn processors in the more powerful CREW PRAM model. Furthermore, we show how our solution can be used to solve the multiple vertex updating problem: Update a given MST whenk new vertices are introduced simultaneously. This problem is solved inO(lgk·lgn) parallel time using (k·n)/(lgk·lgn) EREW PRAM processors. This is optimal for graphs having (kn) edges.Part of this work was done while P. Metaxas was with the Department of Mathematics and Computer Science, Dartmouth College. 相似文献
13.
《国际计算机数学杂志》2012,89(9):1796-1808
In this article, we provide a matrix method in order to compute orbits of parallel and sequential dynamical systems on Boolean functions. In this sense, we develop algorithms for systems defined over directed (and undirected) graphs when the evolution operator is a general minterm or maxterm and, likewise, when it is constituted by independent local Boolean functions, so providing a new tool for the study of orbits of these dynamical systems. 相似文献
14.
Object-oriented database management systems (OODBMSs) provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality and continuously evolving functionalities, efficient implementations are important for these OODBMSs to support the present and future applications, particularly when the databases are very large. In this paper, we present several parallel, multi-wavefront algorithms based on two processing approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. A heuristic method is presented for partitioning an object-oriented database (OODB). The main consideration for partitioning the database is load balancing. This method also tries to reduce the communication time by reducing the length of the path that wavefronts need to be propagated. Multiple wavefront algorithms based on the two approaches for tree-structured queries have been implemented on an nCUBE 2 parallel computer. The implementation of the query processor allows multiple queries to be executed simultaneously. This implementation provides an environment for evaluating the algorithms and the heuristic method for partitioning the database. The evaluation results are presented in this paper.Recommended by: Patrick Valduriez 相似文献
15.
中国科学院过程工程研究所多相反应实验室,建立了一个通用粒子模拟平台并已开始应用。目前类似的并行模拟系统采用的Shift并行通信模式往往有一些问题,需要一种新的通信模式来弥补它的不足。本文设计具有良好通用性的非结构化通信模式All2All,用来完成通用粒子方法模拟平台中计算节点问的通信。本文的算例证明这种通信模式可解决在粒子并行模拟Shift通信模式所不能处理的,具有复杂拓扑关系的相邻节点间的数据通信问题。本文设计的All2All通信模式方法只需稍加修改,就可以方便地应用于其它领域的并行计算系统。 相似文献
16.
R Ohbuchi 《Parallel Computing》1985,2(3):219-228
This paper gives an overview of Japanese research and development efforts on the parallel processing architectures. Projects are categorized by their application domains. Following an introduction, general trends and some examples of research projects for each of the application domains such as artificial intelligence, numerical processing, and others like database, image, graphics, etc. are presented. 相似文献
17.
Henk A. Van der Vorst 《Parallel Computing》1987,5(3):303-311
We analyse an alternative decomposition for a tridiagonal matrix which has the property that the decomposition as well as the subsequent solution process can be done in two parallel parts. This decomposition is equivalent to the two-sided Gaussian elimination algorithm that has been discussed by Babuska. In the context of parallel computing a similar approach has been suggested by Joubert and Cloete. The computational complexity of this alternative decomposition is the same as for the standard decomposition and a remarkable aspect is that it often leads to slightly more accurate solutions than the standard process does. The algorithm can be combined with recursive doubling or cyclic reduction in order to increase the degree of parallelism and vectorizability. 相似文献
18.
This paper presents architectures and implementation of a Sliding Memory Plane (SliM) Image Processor to build a SIMD parallel computer. The paper also proposes an enhanced multiplication algorithm to reduce the gate count and the number of cycles. The SliM chip consists of mesh-connected 5×5 PEs. Due to the idea ofsliding, that is, overlapping the inter-PE communication time with the computation time, SliM can greatly reduce the inter-PE communication overhead. In addition, four operations corresponding to ALU, shift, data I/O, and inter-PE communication can be grouped into an instruction to be executed in a cycle simultaneously. The implemented SliM chip operates at 25 MHz and gives 625 MIPS. Because of a mesh topology, a large number of chips can be easily connected to form a SIMD parallel computer. We have implemented the scalable SliM Array Processor and developed parallel algorithms for real-time image processing. 相似文献
19.
L. I. Timchenko 《Cybernetics and Systems Analysis》2000,36(2):251-267
A three-dimensional network and its application to the analysis of images are described. This multilevel architecture studies partial correlations between structural components of an image. An algorithm is proposed that formalizes a new approach to the decomposition of images. An image is transformed so that each pixel contains information on the spatial structure of its neighborhood. The most correlated information is first formed, which ensures the resistance of the algorithm to small structural changes. Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 114–133, March–April, 2000. 相似文献
20.
The problem of optimization of communications during the execution of a program on a parallel computer with distributed memory
is investigated. Statements are formulated that make it possible to determine the possibility of organization of data broadcast
and translation. The conditions proposed are represented in the form suitable for practical application and can be used for
automated parallelization of programs.
This work was done within the framework of the State Program of Fundamental Studies of the Republic of Belarus (under the
code name “Mathematical structures 21”) with the partial support of the Foundation for Fundamental Studies of the Republic
of Belarus (grant F03-062).
__________
Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 166–182, March–April 2006. 相似文献