1.

A chained-matrices approach for parallel computation of continued fractions and its applications

Lin Shun-Shii 《Journal of scientific computing》1994,9(1):65-80

A chained-matrices approach for parallel computation of the nth convergent of continued fractions is presented. The resulting algorithm computes all prefix values of any continued fraction in O(log n) time on the EREW PRAM model or on a network with O(n/log n) processors connected as cube-connected cycles, a binary tree, a perfect shuffle, or a hypercube. It can be applied to approximate transcendental numbers, such as π and e, in O(log m) time using O(m/log m) processors for a result with m-digit precision. We also use it to cost-optimally solve the second-order linear recurrence, polynomial evaluation, the recurrence of the vector norm, the general class of recurrence equations defined by Kogge and Stone (1973), and the general mth-order linear recurrence. It is easy to implement because only some matrix multiplications and a division operation are involved. This work was supported in part by the National Science Council of the Republic of China under Contract NSC 77-0408-E002-09.
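The idea of expressing convergents as a chain of matrix products can be illustrated with a small sequential sketch. It assumes a simple continued fraction [b0; b1, b2, ...] with all partial numerators equal to 1 and positive terms after the first; the function names `matmul2` and `convergents` are illustrative and do not come from the paper. Because 2×2 matrix multiplication is associative, the running products computed here with `accumulate` are exactly the kind of prefix values a parallel scan could produce in logarithmic time.

```python
from fractions import Fraction
from itertools import accumulate


def matmul2(x, y):
    """Multiply two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def convergents(terms):
    """Return every convergent of the simple continued fraction [b0; b1, ...].

    The product M_0 M_1 ... M_k with M_i = [[b_i, 1], [1, 0]] equals
    [[p_k, p_(k-1)], [q_k, q_(k-1)]], so the first column of each prefix
    product holds the numerator and denominator of the kth convergent.
    """
    mats = [((b, 1), (1, 0)) for b in terms]
    prefixes = accumulate(mats, matmul2)  # sequential stand-in for a parallel scan
    return [Fraction(m[0][0], m[1][0]) for m in prefixes]


if __name__ == "__main__":
    # sqrt(2) = [1; 2, 2, 2, ...]
    print(convergents([1, 2, 2, 2]))
```

Only matrix multiplications and one final division per convergent are needed, which matches the abstract's remark about ease of implementation.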

2.

An optimal parallel algorithm for triangulating a set of points in the plane (cited by: 1; self-citations: 0; citations by others: 1)

Merks Ed 《International journal of parallel programming》1986,15(5):399-411

This paper presents an optimal parallel algorithm for triangulating an arbitrary set of n points in the plane. The algorithm runs in O(log n) time using O(n) space and O(n) processors on a Concurrent-Read, Exclusive-Write Parallel RAM (CREW PRAM) model. The parallel lower bound on triangulation is Ω(log n) time, so the best possible linear speedup has been achieved. A parallel divide-and-conquer technique of subdividing a problem into subproblems is employed.

3.

Parallel general prefix computations with geometric, algebraic, and other applications

Frederick Springsteel Ivan Stojmenović 《International journal of parallel programming》1989,18(6):485-503

We introduce a generic problem component that captures the most common, difficult kernel of many problems. This kernel involves general prefix computations (GPC). GPC's lower bound complexity of Ω(n log n) time is established, and we give optimal solutions on the sequential model in O(n log n) time, on the CREW PRAM model in O(log n) time, on the BSR (broadcasting with selective reduction) model in constant time, and on mesh-connected computers in O(n) time, all with n processors, plus an O(log² n) time solution on the hypercube model. We show that GPC techniques can be applied to a wide variety of geometric (point set and tree) problems, including triangulation of point sets, two-set dominance counting, ECDF searching, finding two- and three-dimensional maximal points, the reconstruction of trees from their traversals, counting inversions in a permutation, and matching parentheses. Work partially supported by NSF IRI/8709726 and by NSERC.

4.

An optimal parallel algorithm for planar cycle separators

Ming-Yang Kao Shang-Hua Teng K. Toyama 《Algorithmica》1995,14(5):398-408

We present an optimal parallel algorithm for computing a cycle separator of an n-vertex embedded planar undirected graph in O(log n) time on n/log n processors. As a consequence, we also obtain an improved parallel algorithm for constructing a depth-first search tree rooted at any given vertex in a connected planar undirected graph in O(log² n) time on n/log n processors. The best previous algorithms for computing depth-first search trees and cycle separators achieved the same time complexities, but with n processors. Our algorithms run on a parallel random access machine that permits concurrent reads and concurrent writes in its shared memory and allows an arbitrary processor to succeed in case of a write conflict. A preliminary version of this paper appeared as "Improved Parallel Depth-First Search in Undirected Planar Graphs" in the Proceedings of the Third Workshop on Algorithms and Data Structures, 1993, pp. 407–420. Supported in part by NSF Grant CCR-9101385.

5.

Parallel integer sorting using small operations (cited by: 1; self-citations: 0; citations by others: 1)

Ramachandran Vaidyanathan Carlos R. P. Hartmann Pramod K. Varshney 《Acta Informatica》1995,32(1):79-92

We consider the problem of sorting n integers in the range [0, n^c − 1], where c is a constant. It has been shown by Rajasekaran and Sen [14] that this problem can be solved optimally in O(log n) steps on an EREW PRAM with O(n) n^ε-bit operations, for any constant ε > 0. Though the number of operations is optimal, each operation is very large. In this paper, we show that n integers in the range [0, n^c − 1] can be sorted in O(log n) time with O(n log n) O(1)-bit operations and O(n) O(log n)-bit operations. The model used is a non-standard variant of an EREW PRAM that permits processors to have word sizes of O(1) bits and (log n) bits. Clearly, the speed of the proposed algorithm is optimal. Considering that the input to the problem consists of O(n log n) bits, the proposed algorithm performs an optimal amount of work, measured at the bit level. This work was partially supported by The Northeast Parallel Architectures Center (NPAC) at Syracuse University, Syracuse, NY 13244 and The Rome Air Development Center, under contract F30602-88-D-0027.
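For orientation, the same input class (n integers in the range [0, n^c − 1]) can be sorted sequentially with a textbook least-significant-digit radix sort in base n, which takes c stable counting-sort passes. The sketch below shows only that classical sequential technique, not the parallel bit-level algorithm of the paper; the function name `radix_sort_base_n` is an assumption made for illustration.

```python
def radix_sort_base_n(values, c):
    """Sort n non-negative integers below n**c using c counting-sort passes.

    Each pass is a stable counting sort on one base-n digit, so the total
    sequential cost is O(c * n) word operations.
    """
    n = len(values)
    if n <= 1:
        return list(values)
    base = n
    out = list(values)
    for p in range(c):
        div = base ** p
        # Count how many keys have each digit value in position p.
        counts = [0] * base
        for v in out:
            counts[(v // div) % base] += 1
        # Prefix sums turn the counts into starting positions.
        starts = [0] * base
        pos = 0
        for d in range(base):
            starts[d] = pos
            pos += counts[d]
        # Stable scatter into the next array.
        nxt = [0] * n
        for v in out:
            d = (v // div) % base
            nxt[starts[d]] = v
            starts[d] += 1
        out = nxt
    return out


if __name__ == "__main__":
    print(radix_sort_base_n([15, 3, 9, 0], 2))
```

The prefix-sum step in each pass is the part that parallel integer sorting algorithms typically replace with a parallel prefix computation.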

6.

An optimal speed-up parallel algorithm for triangulating simplicial point sets in space

Hossam ElGindy 《International journal of parallel programming》1986,15(5):389-398

Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets. O(log³ n) running time algorithms using O(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich⁽³⁾ presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set in O(log n log log n) time using O(n) processors. Recently Merks⁽⁴⁾ described an algorithm for triangulating point sets which runs in O(log n) time using O(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs in O(log² n) time using O(n/log n) processors. The algorithm has O(n log n) as the product of the running time and the number of processors; i.e., an optimal speed-up.

7.

Efficient parallel algorithms for r-dominating set and p-center problems on trees

Xin He Yaacov Yesha 《Algorithmica》1990,5(1-4):129-145

We develop efficient parallel algorithms for the r-dominating set and the p-center problems on trees. On a concurrent-read exclusive-write PRAM, our algorithm for the r-dominating set problem runs in O(log n log log n) time with n processors. The algorithm for the p-center problem runs in O(log² n log log n) time with n processors.

8.

An improved parallel algorithm for integer GCD

Benny Chor Oded Goldreich 《Algorithmica》1990,5(1):1-10

We present a simple parallel algorithm for computing the greatest common divisor (gcd) of two n-bit integers in the Common version of the CRCW model of computation. The run-time of the algorithm in terms of bit operations is O(n/log n), using n^(1+ε) processors, where ε is any positive constant. This improves on the algorithm of Kannan, Miller, and Rudolph, the only sublinear algorithm known previously, both in run time and in number of processors; they require O(n log log n/log n) time and n² log² n processors, respectively, in the same CRCW model. We give an alternative implementation of our algorithm in the CREW model. Its run-time is O(n log log n/log n), using n^(1+ε) processors. Both implementations can be modified to yield the extended gcd, within the same complexity bounds. Supported in part by an IBM Graduate Fellowship and a Bantrell Postdoctoral Fellowship. Supported in part by a Weizmann Postdoctoral Fellowship. All logarithms are to base 2.

9.

Parallel algorithms for shortest path problems in polygons

Hossam ElGindy Michael Goodrich 《The Visual computer》1988,3(6):371-378

Given an n-vertex simple polygon P, we address the following problems: (i) find the shortest path between two points s and d inside P, and (ii) compute the shortest path tree between a single point s and each vertex of P (which implicitly represents all the shortest paths). We show how to solve the first problem in O(log n) time using O(n) processors, and the more general second problem in O(log² n) time using O(n) processors for any simple polygon P. We assume the CREW RAM shared memory model of computation in which concurrent reads are allowed, but no two processors should attempt to simultaneously write in the same memory location. The algorithms are based on the divide-and-conquer paradigm and are quite different from the known sequential algorithms. Research supported by the Faculty of Graduate Studies and Research (McGill University) grant 276-07.

10.

Primality testing with fewer random bits

René Peralta Victor Shoup 《Computational Complexity》1993,3(4):355-367

In the usual formulations of the Miller-Rabin and Solovay-Strassen primality testing algorithms for a number n, the algorithm chooses candidates x₁, x₂, ..., x_k uniformly and independently at random from ℤ_n, and tests if any is a witness to the compositeness of n. For either algorithm, the probability that it errs is at most 2^(−k). In this paper, we study the error probabilities of these algorithms when the candidates are instead chosen as x, x+1, ..., x+k−1, where x is chosen uniformly at random from ℤ_n. We prove that for k = [1/2 log₂ n], the error probability of the Miller-Rabin test is no more than n^(−1/2+o(1)), which improves on the bound n^(−1/4+o(1)) previously obtained by Bach. We prove similar bounds for the Solovay-Strassen test, but they are not quite as strong; in particular, we only obtain a bound of n^(−1/2+o(1)) if the number of distinct prime factors of n is o(log n/log log n).
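The consecutive-candidate variant of the Miller-Rabin test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the names `is_witness` and `probably_prime` are hypothetical, the optional `x` argument exists only to make runs reproducible, and skipping candidates congruent to 0 modulo n is an assumption made here so that a prime is never rejected when the window x, ..., x+k−1 wraps around.

```python
import random


def is_witness(a, n):
    """Return True if a is a Miller-Rabin witness that odd n > 3 is composite."""
    a %= n
    if a == 0:
        return False  # assumption: a candidate congruent to 0 carries no information
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False
    return True


def probably_prime(n, k, x=None):
    """Test n with the k consecutive candidates x, x+1, ..., x+k-1.

    Only one random draw (the starting point x) is needed, instead of
    k independent draws as in the usual formulation.
    """
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    if x is None:
        x = random.randrange(n)
    return not any(is_witness(x + i, n) for i in range(k))


if __name__ == "__main__":
    print(probably_prime(7919, 6), probably_prime(9, 1, x=2))
```

The point of the variant is the reduced use of randomness: the number of random bits is that of a single element of ℤ_n, independent of k.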

11.

The complexity of on-line simulations between multidimensional Turing machines and random access machines

Michael C. Loui David R. Luginbuhl 《Theory of Computing Systems》1992,25(4):293-308

To study different implementations of arrays, we present four results on the time complexities of on-line simulations between multidimensional Turing machines and random access machines (RAMs). First, every d-dimensional Turing machine of time complexity t can be simulated by a log-cost RAM running in O(t (log t)^(1−(1/d)) (log log t)^(1/d)) time. Second, every d-dimensional Turing machine of time complexity t can be simulated by a unit-cost RAM running in O(t/(log t)^(1/d)) time, provided that the input length is o(t/(log t)^(1/d)). Third, there is a log-cost RAM R of time complexity O(n), where n is the input length, such that, for any d-dimensional Turing machine M that simulates R on-line, M requires Ω(n^(1+(1/d))/(log n (log log n)^(1+(1/d)))) time. Fourth, every unit-cost RAM of time complexity t can be simulated by a d-dimensional Turing machine in O(t² (log t)^(1/2)) time if d = 2, and in O(t²) time if d ≥ 3. This result uses the weight-balanced trees of Nievergelt and Reingold. This paper was prepared while M. C. Loui was visiting the National Science Foundation in Washington, DC, and the Institute for Advanced Computer Studies, University of Maryland, College Park, MD. The views, opinions, and conclusions in this paper are those of the authors and should not be construed as an official position of the National Science Foundation, Department of Defense, U.S. Air Force, or any other U.S. government agency. The research of M. C. Loui was supported by the National Science Foundation under Grant CCR-8922008.

12.

Pipelined search on coarse grained networks

Selim G. Akl Frank Dehne 《International journal of parallel programming》1989,18(5):359-364

The time complexity of searching a sorted list of n elements in parallel on a coarse-grained network of diameter D and consisting of N processors (where n may be much larger than N) is studied. The worst-case period and latency of a sequence of pipelined search operations are easily seen to be (log n − log N) and (D + log n − log N), respectively. Since for n = N¹⁺⁽¹⁾ the worst-case period is (log n) (which can be achieved by a single processor), coarse-grained networks appear to be unsuitable for the search problem. By contrast, it is demonstrated using standard queuing theory techniques that a constant expected period can be achieved provided that n = O(N2^N). This research was supported by the Natural Sciences and Engineering Research Council of Canada under Grants A3336 and A9173.

13.

Randomized range-maxima in nearly-constant parallel time

Omer Berkman Yossi Matias Uzi Vishkin 《Computational Complexity》1992,2(4):350-373

Given an array of n input numbers, the range-maxima problem is that of preprocessing the data so that queries of the type "what is the maximum value in subarray [i..j]" can be answered quickly using one processor. We present a randomized preprocessing algorithm that runs in O(log* n) time with high probability, using an optimal number of processors on a CRCW PRAM; each query can be processed in constant time by one processor. We also present a randomized algorithm for a parallel comparison model. Using an optimal number of processors, the preprocessing algorithm runs in O(α(n)) time with high probability; each query can be processed in O(α(n)) time by one processor. (As is standard, α(n) is the inverse of the Ackermann function.) A constant time query can be achieved by some slowdown in the performance of the preprocessing stage.
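To make the preprocess-then-query structure of the range-maxima problem concrete, here is a sketch of the classical sequential sparse-table method, which spends O(n log n) time and space on preprocessing and then answers each query in constant time. It is a different and much simpler technique than the randomized parallel algorithms of the paper; the names `build_sparse_table` and `range_max` are illustrative assumptions.

```python
def build_sparse_table(a):
    """Precompute maxima of all windows whose length is a power of two.

    table[k][i] holds max(a[i : i + 2**k]).
    """
    n = len(a)
    table = [list(a)]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        # A window of length 2**k is the union of two windows of length 2**(k-1).
        table.append([max(prev[i], prev[i + half])
                      for i in range(n - (1 << k) + 1)])
        k += 1
    return table


def range_max(table, i, j):
    """Return max(a[i..j]) (inclusive bounds, i <= j) in constant time."""
    k = (j - i + 1).bit_length() - 1  # largest k with 2**k <= window length
    # Two overlapping power-of-two windows cover [i..j] exactly.
    return max(table[k][i], table[k][j - (1 << k) + 1])


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    t = build_sparse_table(data)
    print(range_max(t, 0, 2), range_max(t, 2, 6))
```

Overlapping windows are harmless here because taking a maximum is idempotent, which is what allows the constant-time query.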

14.

The rectilinear Steiner arborescence problem

Sailesh K. Rao P. Sadayappan Frank K. Hwang Peter W. Shor 《Algorithmica》1992,7(1):277-288

The Rectilinear Steiner Arborescence (RSA) problem is: "Given a set N of n nodes lying in the first quadrant of E², find the shortest directed tree rooted at the origin, containing all nodes in N, and composed solely of horizontal and vertical arcs oriented only from left to right or from bottom to top." In this paper we investigate many fundamental properties of the RSA problem, propose an O(n log n)-time heuristic algorithm giving an RSA whose length has an upper bound of twice that of the minimum length RSA, and show that a polynomial-time algorithm that was earlier reported in the literature for this problem is incorrect.

15.

On the efficiency of effective Nullstellensätze

Marc Giusti Joos Heintz Juan Sabia 《Computational Complexity》1993,3(1):56-95

Let k be an infinite and perfect field, x₁, ..., x_n indeterminates over k, and let f₁, ..., f_s be polynomials in k[x₁, ..., x_n] of degree bounded by a given number d, which satisfies d ≥ n. We prove an effective affine Nullstellensatz of the following particular form: For arbitrary given parameters d, s, n there exists a probabilistic (randomized) arithmetic network over k of size s^O(1) d^O(n) and depth O(n⁴ log² sd) solving the following task:

16.

Scheduling tree dags on parallel architectures

K. Kalpakis Y. Yesha 《Algorithmica》1996,15(4):373-396

We provide explicit upper bounds, optimal within a constant factor, on the makespan of schedules for tree-structured programs on mesh arrays of processors, and provide polynomial-time algorithms to find schedules with makespan matching these bounds. In particular, we show how to find, in polynomial time, a (nonpreemptive) schedule for a binary tree dag with n unit execution time tasks and height h on a d-dimensional mesh array with m processors and links of unit bandwidth and unit propagation delay whose makespan is O(n/m + n^(1/(d+1)) + h), i.e., optimal within a constant factor. Further, we extend these schedules to bounded-degree forest dags with arbitrary positive integer execution time tasks and to meshes when the propagation delay of all the links is an arbitrary positive integer. Thus, we provide a polynomial-time approximation algorithm for an NP-hard problem, with a performance ratio that is a constant. We also show how to schedule tree dags on any parallel architecture that satisfies certain natural, not very restrictive, conditions that are satisfied by most parallel architectures used in practice. Let ε be a fixed positive real number. We provide polynomial-time computable schedules for binary tree dags with n unit execution time tasks and height h (g(n)n^(−ε), g(n) log n) on any parallel architecture satisfying those conditions, with unit bandwidth and unit propagation delay links, with makespan O(g(n) + h), optimal up to a constant, where g is a function that depends only on that architecture. The number of processors used is optimal within a constant factor if h g(n)n^(−ε), and is optimal within an O(log n) factor if h g(n) log n. As an example, for hypercube and complete binary tree architectures, we achieve makespan O(h), optimal within a constant, when h = (log² n), using a number of processors that is optimal within an O(log n) factor.
Further, we extend these schedules to the case of bounded-degree forest dags with tasks of arbitrary positive integer execution times and architectures when the propagation delay of all the links is a given arbitrary positive integer. The second author was supported in part by the National Science Foundation under Grant CCR-9106062, and in part by the University of Maryland at College Park, Institute for Advanced Computer Studies.

17.

Communicating processes, scheduling, and the complexity of nontermination

Hsu-Chun Yen 《Theory of Computing Systems》1990,23(1):33-59

In this paper we study the computational complexity of the nontermination problem for systems of communicating processes with respect to five types of scheduling schemes, namely, round-robin, random, priority, first-come-first-served, and equifair schedules. We show that the problem is undecidable (Π₁-complete) with respect to round-robin, first-come-first-served, and priority scheduling, whereas it is decidable with respect to random and equifair scheduling. (Here Π₁ denotes the set of languages whose complements are recursively enumerable.) For a restricted class of systems in which the communication channels between processes are of unit capacity, we show that the nontermination problem is solvable in O(k² log n) nondeterministic space for round-robin, random, priority, and first-come-first-served scheduling, and in n^o(k²) nondeterministic time for equifair scheduling, where k is the number of processes and n is the size of the maximal process. We are also able to establish a lower bound of Ω(((k−59)/20) log n) nondeterministic space for all five types of scheduling schemes.

18.

Optimal parallel detection of squares in strings

Alberto Apostolico 《Algorithmica》1992,8(1):285-319

A string w is primitive if it is not a power of another string (i.e., writing w = v^k implies k = 1). Conversely, w is a square if w = vv, with v a primitive string. A string x is square-free if it has no nonempty substring of the form ww. It is shown that the square-freedom of a string of n symbols over an arbitrary alphabet can be tested by a CRCW PRAM with n processors in O(log n) time and linear auxiliary space. If the cardinality of the input alphabet is bounded by a constant independent of the input size, then the number of processors can be reduced to n/log n without affecting the time complexity of this strategy. The fastest sequential algorithms solve this problem in O(n log n) or O(n) time, depending on whether the cardinality of the input alphabet is unbounded or bounded, and either performance is known to be optimal within its class. More elaborate constructions lead to a CRCW PRAM algorithm for detecting, within the same n-processor bounds, all positioned squares in x in time O(log n) and using linear auxiliary space. The fastest sequential algorithms solve this problem in O(n log n) time, and such a performance is known to be optimal. This research was supported, through the Leonardo Fibonacci Institute, by the Istituto Trentino di Cultura, Trento, Italy. Additional support was provided by the French and Italian Ministries of Education, by the National Research Council of Italy, by the British Research Council Grant SERC-E76797, by NSF Grant CCR-89-00305, by NIH Library of Medicine Grant ROI LM05118, by AFOSR Grant 90-0107, and by NATO Grant CRG900293.
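The definition of square-freedom can be checked directly with a brute-force sketch. It compares every pair of adjacent equal-length substrings, so it is far slower than the O(n log n) and O(n) sequential algorithms mentioned above and is meant only to pin down the definition; the function name `is_square_free` is an assumption.

```python
def is_square_free(x):
    """Return True if x contains no nonempty substring of the form ww.

    Brute force: try every half-length and every starting position.
    """
    n = len(x)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            if x[start:start + half] == x[start + half:start + 2 * half]:
                return False
    return True


if __name__ == "__main__":
    for s in ["abcacb", "abcabc", "abab", "aa", ""]:
        print(repr(s), is_square_free(s))
```

For example, "abcacb" has no adjacent repeated block, whereas "abcabc" is itself of the form ww with w = "abc".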

19.

Optimal Computing the Chessboard Distance Transform on Parallel Processing Systems

Yu-Hua Lee Shi-Jinn Horng 《Computer Vision and Image Understanding》1999,73(3):272

The distance transform (DT) is an image computation tool which can be used to extract information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity of computing the distance transform depends entirely on the distance metric used. In particular, the more exact the distance transform is, the longer its execution time will be. Nowadays, thousands of images often have to be processed in a limited time, and it seems nearly impossible for a sequential computer to compute the distance transform in real time. To provide efficient distance transform computation, it is highly desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N²) time sequential algorithm to compute the chessboard distance transform (CDT) of an N × N image, which is a DT using the chessboard distance metric. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of size N × N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and O(log N) time on the hypercube computer using O(N²/log N) processors. Following the mapping as proposed by Lee and Horng, the algorithm for the medial axis transform is also efficiently derived. The medial axis transform of a 2D binary image array of size N × N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and O(log N) time on the hypercube computer using O(N²/log N) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only the increment (add-one) operation and the minimum operation are employed, so the algorithms are especially efficient in practical applications.
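As a point of reference for the O(N²) sequential bound, the chessboard distance transform can be computed with the classical two-pass raster scan, sketched below. This is the well-known forward/backward scanning technique, not the diagonal propagation approach of the paper. It assumes that nonzero entries mark foreground pixels and that the chessboard distance between two pixels is max(|Δrow|, |Δcol|); the function name `chessboard_distance_transform` is illustrative. Like the abstract's description, it uses only add-one and minimum operations.

```python
import math


def chessboard_distance_transform(image):
    """Return, for every pixel, the chessboard distance to the nearest
    foreground (nonzero) pixel; math.inf everywhere if there is none."""
    rows, cols = len(image), len(image[0])
    dist = [[0 if image[r][c] else math.inf for c in range(cols)]
            for r in range(rows)]

    def relax(r, c, offsets):
        # Take the minimum over the already-scanned neighbours, plus one.
        best = dist[r][c]
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                best = min(best, dist[rr][cc] + 1)
        dist[r][c] = best

    forward = ((-1, -1), (-1, 0), (-1, 1), (0, -1))
    backward = ((1, 1), (1, 0), (1, -1), (0, 1))

    # Forward pass: top-left to bottom-right.
    for r in range(rows):
        for c in range(cols):
            relax(r, c, forward)
    # Backward pass: bottom-right to top-left.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            relax(r, c, backward)
    return dist


if __name__ == "__main__":
    img = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [1, 0, 0, 0]]
    for row in chessboard_distance_transform(img):
        print(row)
```

Each pixel is visited twice with a constant number of neighbour look-ups, which gives the O(N²) total for an N × N image.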

20.

Robust algorithms for packet routing in a mesh

P. Raghavan 《Theory of Computing Systems》1995,28(1):1-11

This paper considers the problem of permutation packet routing on an n × n mesh-connected array of processors. Each node in the array is assumed to be independently faulty with a probability bounded above by a value p. This paper gives a routing algorithm which, if p 0.29, will with very high probability route every packet that can be routed in O(n log n) steps with queue lengths that are O(log² n). Extensions to higher-dimensional meshes are given.