期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An optimal speed-up parallel algorithm for triangulating simplicial point sets in space

Hossam ElGindy 《International journal of parallel programming》1986,15(5):389-398

Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets.O(log³ n) running time algorithms usingO(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich⁽³⁾ presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set inO(logn loglogn) time usingO(n) processors. Recently Merks⁽⁴⁾ described an algorithm for triangulating point sets which runs inO(logn) time usingO(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs inO(log² n) time usingO(n/logn) processors. The algorithm hasO(n logn) as the product of the running time and the number of processors; i.e., an optimal speed-up. 相似文献

2.

A randomized parallel branch-and-bound algorithm

Virendra K. Janakiram Edward F. Gehringer Dharma P. Agrawal Ravi Mehrotra 《International journal of parallel programming》1988,17(3):277-301

Randomized algorithms are algorithms that employ randomness in their solution method. We show that the performance of randomized algorithms is less affected by factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds. A major reason is that the mapping of randomized algorithms onto multiprocessors involves very little scheduling or communication overhead. Furthermore, reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm. We present results of an extensive simulation done on a multiprocessor simulator, running a randomized branch-and-bound algorithm. The particular case we consider is the knapsack problem, due to its ease of formulation. We observe the largest speedups in precisely those problems that take large amounts of time to solve. This work has been supported by the U.S. Army Research Office under Contract No. DAAG 29-85-K-0236. 相似文献

3.

An optimal parallel algorithm for triangulating a set of points in the plane 总被引：1，自引：0，他引：1

Merks Ed 《International journal of parallel programming》1986,15(5):399-411

This paper presents an optimal parallel algorithm for triangulating an arbitrary set ofn points in the plane. The algorithm runs inO(logn) time usingO(n) space andO(_n) processors on a Concurrent-Read, Exclusive-Write Parallel RAM model (CREW PRAM). The parallel lower bound on triangulation is (logn) time so the best possible linear speedup has been achieved. A parallel divide-and-conquer technique of subdividing a problem into subproblems is employed. 相似文献

4.

A multigrid algorithm for parallel computers: CPMG

S. N. Gupta M. Zubair C. E. Grosch 《Journal of scientific computing》1992,7(3):263-279

In this article, we present a multigrid algorithm for parallel computers, the chopped parallel multigrid (CPMG) algorithm. The CPMG algorithm improves the processor utilization by reducing the work load on coarse grids without affecting the convergence rate of the algorithm. This is in contrast to earlier approaches (Gannon and van Rosendale, 1986; Frederickson and McBryan, 1989), where unutilized processors are used to improve the convergence rate. The CPMG algorithm reduces the coarse grid work bychopping the alternate cycles of multigrid. Using analytical results and simulations on sequential machines we show that the CPMG can achieve almost the same convergence rate as standard MG for many cases. Analytically we show that the advantage gained by CPMG over standard MG on a mesh connected massively parallel machine is 33% in hardware utilization, 50% in communication overheads and 38% in overall execution time. We have also evaluated the performance of CPMG on an actual massively parallel machine, the DAP-510. The advantage gained by CPMG over standard MG is 35% in overall execution time. Moreover, the CPMG can be integrated with other parallel multigrid algorithms, such as the PSMG algorithm (Frederickson and McBryan, 1989) and Decker's algorithm (Decker, 1990). 相似文献

5.

A greedy Delaunay-based surface reconstruction algorithm 总被引：1，自引：0，他引：1

David?Cohen-Steiner Email author Frank?Da 《The Visual computer》2004,20(1):4-16

In this paper, we present a new greedy algorithm for surface reconstruction from unorganized point sets. Starting from a seed facet, a piecewise linear surface is grown by adding Delaunay triangles one by one. The most plausible triangles are added first and in such a way as to prevent the appearance of topological singularities. The output is thus guaranteed to be a piecewise linear orientable manifold, possibly with boundary. Experiments show that this method is very fast and achieves topologically correct reconstruction in most cases. Moreover, it can handle surfaces with complex topology, boundaries, and nonuniform sampling. 相似文献

6.

An efficient parallel algorithm for shortest paths in planar layered digraphs

S. Subramanian R. Tamassia J. S. Vitter 《Algorithmica》1995,14(4):322-339

Computing shortest paths in a directed graph has received considerable attention in the sequential RAM model of computation. However, developing a polylog-time parallel algorithm that is close to the sequential optimal in terms of the total work done remains an elusive goal. We present a first step in this direction by giving efficient parallel algorithms for shortest paths in planar layered digraphs.We show that these graphs admit special kinds of separators calledone- way separators which allow the paths in the graph to cross it only once. We use these separators to give divide- and -conquer solutions to the problem of finding the shortest paths between any two vertices. We first give a simple algorithm that works in the CREW model and computes the shortest path between any two vertices in ann-node planar layered digraph in timeO(log² n) usingn/logn processors. We then use results of Aggarwal and Park [1] and Atallah [4] to improve the time bound toO(log² n) in the CREW model andO(logn log logn) in the CREW model. The processor bounds still remain asn/logn for the CREW model andn/log logn for the CRCW model.Support for the first and third authors was provided in part by a National Science Foundation Presidential Young Investigator Award CCR-9047466 with matching funds from IBM, by NSF Research Grant CCR-9007851, by Army Research Office Grant DAAL03-91-G-0035, and by the Office of Naval Research and the Advanced Research Projects Agency under Contract N00014-91-J-4052, ARPA, Order 8225. Support for the second author was provided in part by NSF Research Grant CCR-9007851, by Army Research Office Grant DAAL03-91-G-0035, and by the Office of Naval Research and the Advanced Research Projects Agency under Contract N00014-91-J-4052 and ARPA Order 8225. 相似文献

7.

FDGE: A forward-difference grid engine and its application in rendering algebraic surfaces in parallel

Panos E. Livadas 《Journal of Mathematical Imaging and Vision》1993,3(4):377-391

The display of an implicitly defined surface is obtained by its projection onto the viewing plane. Ray casting is a technique that accomplishes this projection byfiring a mapping ray through each pixel of the screen into the world space. The intersection points of this ray with the surface are found; these points are further tested to determine which one is visible and is within the viewing volume. Finally, if a point that satisfies the above conditions is found, then the point is further processed for the determination of its shading value.Much progress in rendering algebraic surfaces has been made recently. However, most of the proposed solutions are based on subdivision methods. This paper focuses on a method that performsdirect rendering of such a surface that will minimize the number of operations; this method will also replace almost all required multiplications with additions. Furthermore, the VLSI-oriented algorithm lends itself to parallelism. In addition, how this method can be used in the calculation of the shading values is described and a VLSI architecture is briefly discussed. Finally, a by-product of this method is that it can be used to efficiently calculate the values of a bivariate polynomial in a rectangular grid and in parallel. 相似文献

8.

Associative parallel algorithm for dynamic update of a minimum spanning tree after addition of a new node to a graph

A. Sh. Nepomniaschaya 《Cybernetics and Systems Analysis》2006,42(1):18-27

This paper proposes a new associative parallel algorithm for dynamic update of a minimum spanning tree after addition of a new node with all its incident edges to a graph. This algorithm is represented as the InsertVert procedure implemented on a model of an associative parallel system of the SIMD type with vertical processing (a STAR machine). The correctness of the procedure is proved and its time complexity is estimated. This work was supported by the Russian Foundation for Basic Research under grant 03-01-00399. __________ Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 19–31, January–February 2006. 相似文献

9.

基于无组织结构数据集的三维表面重建算法 总被引：13，自引：0，他引：13

严京旗施鹏飞《计算机学报》2001,24(10):1051-1056

三维表面重建技术具有广泛的应用前景,通用高效的重建算法是迫切需要研究的课题之一。由于三维点集分布不均匀,目前通用的基于无组织结构数据集的重建算法欠缺稳定性,应用范围受到了限制。文中提出了分布不均匀补偿的近邻点确定算法,提高了算法的稳定性和可靠性,扩展了基于无组织结构数据集表面重建算法的应用范围。实验表明通过分布不均匀补偿的离散数据集三维表面重建算法具有很好的重建效果。相似文献

10.

一种基于CUDA的并行多目标进化算法

胡宾宾祁荣宾钱锋《计算机与应用化学》2015,32(1)

传统的多目标进化算法多是基于Pareto最优概念的类随机搜索算法,求解速度较慢,特别是当问题维度变高,需要群体规模较大时,上述问题更加凸显。这一问题已经获得越来越多研究人员以及从业人员的关注。实验仿真中可以发现,构造非支配集和保持群体多样性这两部分工作占用了算法99%以上的执行时间。解决上述问题的一个有效方法就是对这一部分算法进行并行化改造。本文提出了一种基于CUDA平台的并行化解决方案,采用小生境技术实现共享适应度来维持候选解集的多样性,将多目标进化算法的实现全部置于GPU端,区别于以往研究中非支配排序的部分工作以及群体多样性保持的全部工作仍在CPU上执行。通过对ZDT系列函数的仿真结果,可以看出本文算法性能远远优于NSGA-Ⅱ和NPGA。最后通过求解油品调和过程这一有约束多目标优化问题,可以看出在解决化工应用中的有约束多目标优化问题时,该算法依然表现出优异的加速效果。相似文献

11.

A massively parallel genetic algorithm for RNA secondary structure prediction 总被引：4，自引：0，他引：4

Bruce A. Shapiro Joseph Navetta 《The Journal of supercomputing》1994,8(3):195-207

We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure. 相似文献

12.

The parallel odd-even merge algorithm

《国际计算机数学杂志》2012,89(3-4):265-273

This paper briefly describes the implementation of the odd-even merge algorithm on a parallel MIMD computer and discusses its computational complexity. 相似文献

13.

Experiences with a parallel algorithm for data flow analysis

Yong-Fong Lee Barbara G. Ryder Thomas J. Marlowe 《The Journal of supercomputing》1991,5(2-3):163-188

We have designed a family of parallel data flow analysis algorithms for execution on distributed-memory MIMD machines, based on general-purpose, hybrid algorithms for data flow analysis [Marlowe and Ryder 1990]. We exploit a natural partitioning of the hybrid algorithms and explore a static mapping, dynamic scheduling strategy. Alternative mapping-scheduling choices and refinements of the flow graph condensation used are discussed. Our parallel hybrid algorithm family is illustrated on Reaching Definitions, although parallel algorithms also exist for many interprocedural (e.g., Aliasing) and intraprocedural (e.g., Available Expressions) problems [Marlowe 1989]. We have implemented the parallel hybrid algorithm for Reaching Definitions on an Intel iPSC/2. Our empirical results suggest the practicality of parallel hybrid algorithms.An earlier version of this paper was presented at Supercomputing '90.The research reported here was supported, in part, by the New Jersey Commission on Science and Technology and the CAIP Center's Industrial Members, by Siemens Research Corporation and by National Science Foundation grant CCR-8920078. 相似文献

14.

A parallel algorithm for the enumeration of the spanning trees of a graph

Shao-Wen Mai D.J. Evans 《Parallel Computing》1984,1(3-4):275-286

As is well known, the strategy of divide-and-conquer is widely used in problem solving. The method of partitioning is also a fundamental strategy for the design of a parallel algorithm. The problem of enumerating the spanning trees of a graph arises in several contexts such as computer-aided design and computer networks. A parallel algorithm for solving the problem is presented in this paper. It is based on the principle of the inclusion and exclusion of sets, and not directly based on the partitioning of the graph itself. The results of the preliminary experiments on a MIMD system appear promising. 相似文献

15.

A parallel algorithm for real-time object recognition

《Pattern recognition》2002,35(9):1917-1931

The aim of this paper is to present a new generalized Hough transform-based hardware algorithm in order to detect non-analytic objects in a two-dimensional (2D) image space. Our main idea consists to use, during voting process into the 5D parameter space, only meaningful set of edge points that belong to the boundary of the target object and that feature a similar geometric property. In this paper, a same line support property has been used. This has the merit to reduce the size of the 5D parameter space, while increasing the detection accuracy. The whole algorithm was implemented into a highly parallel architecture supported by a single PC board. It is composed of a mixture of digital signal processing and field programmable gate array technologies and uses the content addressable memory as a main processing unit. Complexity evaluation of the whole system indicated that a set of 46 different images of 256×256 pixels each can be classified in real-time (e.g. under frame rate). 相似文献

16.

A scalable,parallel algorithm for maximal clique enumeration

Matthew C. Schmidt Nagiza F. Samatova Kevin Thomas Byung-Hoon Park 《Journal of Parallel and Distributed Computing》2009

The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. The previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree that has a highly irregular and hard-or-impossible to predict structure; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes. Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements’ idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks. 相似文献

17.

Optimal parallel algorithm for findingst-ambitus of a planar biconnected graph

K. S. Easwarakumar S. V. Krishnan C. Pandu Rangan S. Seshadri 《Algorithmica》1996,15(3):242-255

A cycleC passing through two specific verticess andt of a biconnected graph is said to be anst-ambitus if its bridges do not interlace in some special way. We present algorithms forst-ambitus for planar biconnected graphs, which are much simpler than the one known for general graphs [MT]. Our algorithm runs inO(n) time on a sequential machine and (logn) parallel time usingO(n/logn) processors on an EREW PRAM. 相似文献

18.

A parallel alpha/beta tree searching algorithm

Robert M. Hyatt Bruce W. Suter Harry L. Nelson 《Parallel Computing》1989,10(3):299-308

This paper gives a theoretical foundation for using parallel processing to search an alpha/beta minimax game tree. A mathematical discussion predicts the behavior of a particular parallel algorithm, Principle Variation Splitting (PVS), along with an enhancement that improves performance for most test cares. After the theoretical discussion, some practical results obtained by running two implementations of the PVS algorithm on a 30-processor tightly coupled shared memory machine are given to show the speedup obtained by parallel processing. 相似文献

19.

An optimal parallel algorithm for computing cut vertices and blocks on interval graphs

《国际计算机数学杂志》2012,89(1):59-70

In this paper, a parallel algorithm is presented to find all cut-vertices and blocks of an interval graph. If the list of sorted end points of the intervals of an interval graph is given then the proposed algorithm takes O(log n) time and O(n/log n) processors on an EREW PRAM, if the sorted list is not given then the time and processors complexities are respectively O(log n) and O(n). 相似文献

20.

用单数码相机实现物体表面的三维重建

张勇金学波《计算机工程与设计》2008,29(11):2926-2929

根据由运动重建物体结构的原理,设计了一个简便易操作的三维重建系统,具体做法是:先用张氏标定法求得内参数矩阵,然后在两个不同的未知位置拍摄物体得到两幅图像,经立体匹配后,利用图像特征点的对应关系求解基本矩阵和本质矩阵,分解本质矩阵获得两个拍摄位置确定的摄像机运动参数(旋转矩阵和平移向量),进而求出相机在两个位置的投影矩阵,最后用三角法计算出物体表面特征点的三维坐标并在OpenGL中重建物体表面.和传统的立体视觉系统相比,本系统只需要一台数码相机和平面方格模板就可以实现三维重建,因此适用于普通相机用户. 相似文献