期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

《Journal of Parallel and Distributed Computing》1993,17(4):327-336

We present a decentralized, symmetric mutual exclusion algorithm that is tailored to the hypercube architecture. Our algorithm has better time complexity than a suitable hypercube adaptation of Maekawa′s O(√N) Mutual Exclusion algorithm. We show that, in the presence of contention, our algorithm has a shorter Blocking Delay than Maekawa′s algorithm. In the absence of contention, our algorithm achieves optimal Round-trip Delay under both the n-port and the one-port communication models. 相似文献

2.

A Parallel Algorithm for Computing Polygon Set Operations

《Journal of Parallel and Distributed Computing》1995,26(1):85-98

We present a parallel algorithm for performing boolean set operations on generalized polygons that have holes in them. The intersection algorithm has a processor complexity of O(m²n²) processors and a time complexity of O(max(2log m, log²n)), where m is the maximum number of vertices in any loop of a polygon, and n is the maximum number of loops per polygon. The union and difference algorithms have a processor complexity of O(m²n²) and time complexity of O(log m) and O(2log m, log n) respectively. The algorithm is based on the EREW PRAM model. The algorithm tries to minimize the intersection point computations by intersecting only a subset of loops of the polygons, taking advantage of the topological structure of the two polygons. We believe this will result in better performance on the average as compared to the worst case. Though all the algorithms presented here are deterministic, randomized algorithms such as sample sort can be used for the sorting subcomponent of the algorithms to obtain fast practical implementations. 相似文献

3.

Reconstructing a Binary Tree from Its Traversals in Doubly Logarithmic CREW Time

《Journal of Parallel and Distributed Computing》1995,27(1):100-105

We consider the following problem. For a binary tree T = (V, E) where V = {1, 2, ..., n}, given its inorder traversal and either its preorder or its postorder traversal, reconstruct the binary tree. We present a new parallel algorithm for this problem. Our algorithm requires O(n) space. The main idea of our algorithm is to reduce the reconstruction process to merging two sorted sequences. With the best parallel merging algorithms, our algorithm can be implemented in O(log log n) time using O(n/log log n) processors on the CREW PRAM (or in O(log n) time using O(n/log n) processors on the EREW PRAM). Our result provides one more example of a fundamental problem which can be solved by optimal parallel algorithms in O(log log n)time on the CREW PRAM. 相似文献

4.

Global Combine Algorithms for 2-D Meshes with Wormhole Routing

Barnett M. Littlefield R. Payne D. G. Vandegeijn R. 《Journal of Parallel and Distributed Computing》1995,24(2)

The problem of performing a global combine (summation) operation on a distributed memory computer using a two-dimensional mesh interconnect with wormhole routing is considered. We present algorithms that are asymptotically optimal for short vectors (O(log(p)) for p processing nodes) and for long vectors (O(n) for n data elements per node), as well as hybrid algorithms that are superior for intermediate n. The algorithms are analyzed using detailed performance models that include the effects of link conflicts and other characteristics of the underlying communication system. The models are validated using experimental data from the Intel Touchstone DELTA computer. We show that no one algorithm is optimal for all vector lengths; rather, each of the presented algorithms is superior under some circumstances. 相似文献

5.

Detecting Termination by Weight-Throwing in a Faulty Distributed System

《Journal of Parallel and Distributed Computing》1995,25(1):7-15

This paper presents a fault-tolerant termination detection algorithm for a distributed system in which processes tend to fail. Allowing an arbitrary number of processes to have fail-stop behavior, the algorithm can detect termination efficiently with O(M + kn + n) control messages and O(k + 1) detection delays, where M is the number of basic messages issued, n is the number of processes, and k is the actual number of processes that fail. This algorithm has fewer detection delays than existing algorithms in the literature and comparable performance in terms of message complexity. In particular, when no fault occurs, the algorithm has constant detection delay and it uses, in the worst case, an optimal number of messages. 相似文献

6.

New trie data structures which support very fast search operations

Dan E. Willard 《Journal of Computer and System Sciences》1984,28(3):379-394

A new data structure called the q-fast trie is introduced. Given a set of N records whose keys are distinct nonnegative integers less than some initially specified bound M, a q-fast trie uses space O(N) and time O(√log M) for insertions, deletions, and all the retrieval operations commonly associated with binary trees. A simpler but less space efficient data structure called the p-fast trie is defined. 相似文献

7.

Parallel Heap Operations on an EREW PRAM

《Journal of Parallel and Distributed Computing》1994,20(2):248-255

We consider parallel heap operations on the exclusive-read exclusive-write parallel random-access machine. We first present an O(n/p + log p) time parallel algorithm to construct a heap of n elements using p processors, which is optimal for p θ(n/log n). We then propose a parallel root deletion algorithm. In a preparatory step, a data structure for dynamic processor allocation is constructed using O((n/log n)^{1 − 1/k}) processors in O(log k) time for some constant k, 1 ≤ k ≤ ⌈log(n/log n)⌉. A sequence of root deletions can then be performed, each of which takes O((log n)/p + log p + log log n) time using p processors. Finally, we discuss a parallel algorithm running in O((log n)/p + log p) time for inserting an element into a heap, which is optimal for p = θ((log n)/log log n). Both deletion and insertion algorithms run in O(log log n) time when p = θ((log n)/log log n). 相似文献

8.

Closing the complexity gap between FCFS mutual exclusion and mutual exclusion

Robert Danek Wojciech Golab 《Distributed Computing》2010,23(2):87-111

First-Come-First-Served (FCFS) mutual exclusion (ME) is the problem of ensuring that processes attempting to concurrently access a shared resource do so one by one, in a fair order. In this paper, we close the complexity gap between FCFS ME and ME in the asynchronous shared memory model where processes communicate using atomic reads and writes only, and do not fail. Our main result is the first known FCFS ME algorithm that makes O(log N) remote memory references (RMRs) per passage and uses only atomic reads and writes. Our algorithm is also adaptive to point contention. More precisely, the number of RMRs a process makes per passage in our algorithm is Θ(min(k, log N)), where k is the point contention. Our algorithm matches known RMR complexity lower bounds for the class of ME algorithms that use reads and writes only, and beats the RMR complexity of prior algorithms in this class that have the FCFS property. 相似文献

9.

A Deterministic Construction of Normal Bases With Complexity O(n3 + n log n log(log n) log q)

《Journal of Symbolic Computation》1995,19(4):305-319

Constructing normal bases of GF(qⁿ) over GF (q) can be done by probabilistic methods as well as deterministic ones. In the following paper we consider only deterministic constructions. As far as we know, the best complexity for probabilistic algorithms is O(n² log⁴n log² (log n) + n log n log (log n) log q ) (see von zur Gathen and Shoup, 1992). For deterministic constructions, some prior ones, e.g. Lueneburg (1986), do not use the factorization of Xⁿ - 1 over GF(q). As analysed by Bach, Driscoll and Shallit (1993), the best complexity (from Lueneburg, 1986) is O(n³ log n log(log n) + n² log n log(log n) log q). For other deterministic constructions, which need such a factorization, the best complexities are O(n^3,376 + n² log n log(log n) log q) (von zur Gathen and Giesbrecht, 1990), or O(n³ log q); see Augot and Camion (1993). Here we propose a new deterministic construction that does not require the factorization of Xⁿ - 1. Its complexity is reduced to O (n³ + n log n log(log n) log q ). 相似文献

10.

An improved algorithm for finding the median distributively

Francis Chin H. F. Ting 《Algorithmica》1987,2(1-4):235-249

Given two processes, each having a total-ordered set ofn elements, we present a distributed algorithm for finding median of these 2n elements using no more than logn +O(√logn) messages, but if the elements are distinct, only logn +O(1) messages will be required. The communication complexity of our algorithm is better than the previously known result which takes 2 logn messages. 相似文献

11.

Multiple Addition and Prefix Sum on a Linear Array with a Reconfigurable Pipelined Bus System

Amitava Datta 《The Journal of supercomputing》2004,29(3):303-317

We present several fast algorithms for multiple addition and prefix sum on the Linear Array with a Reconfigurable Pipelined Bus System (LARPBS), a recently proposed architecture based on optical buses. Our algorithm for adding N integers runs on an N log M-processor LARPBS in O(log* N) time, where log* N is the number of times logarithm has to be taken to reduce N below 1 and M is the largest integer in the input. Our addition algorithm improves the time complexity of several matrix multiplication algorithms proposed by Li, Pan and Zheng (IEEE Trans. Parallel and Distributed Systems, 9(8):705–720, 1998). We also present several fast algorithms for computing prefix sums of N integers on the LARPBS. For integers with bounded magnitude, our first algorithm for prefix sum computation runs in O(log log N) time using N processors and in O(1) time using N ¹⁺ processors, for < 1. For integers with unbounded magnitude, the first algorithm for multiple addition runs in O(log log N log* N) time using N log M processors, when M is the largest integer in the input. Our second algorithm for multiple addition runs in O(log* N) time using N ¹⁺ log M processors, for < 1. We also show suitable extensions of our algorithm for real numbers. 相似文献

12.

Efficient adaptive collect algorithms

Yehuda Afek Yaron De Levie 《Distributed Computing》2007,20(3):221-238

Deterministic collect algorithms are presented that are adaptive to total contention and are efficient with respect to both the number of registers used and the step complexity. One of them has optimal O(k) step and O(n) space complexities, but assumes that processes’ identifiers are in O(n), where n is the total number of processes in the system and k is the total contention. The step complexity of an unrestricted name space variant of this algorithm remains O(k), but its space complexity increases to O(n ²). 相似文献

13.

An algorithm for merging meaps

Jörg -R. Sack Thomas Strothotte 《Acta Informatica》1985,22(2):171-186

Summary We present an algorithm to merge priority queues organized as heaps. The worst case number of comparisons required to merge two heaps of sizes k and n is O(log(n)*log(k)). The algorithm requires O(k) +log(n)*log (k)) data movements if heaps are implemented using arrays and O(log(n)*log(k)) for a pointer-based implementation. Previous algorithms require either O(n+k) data movements and comparisons, or O(k*log(log(n+k))) comparisons and O(k*log(n+k)) data movements. The algorithm presented in this paper improves on the previous algorithms for the case when k>log(n).This work was done while the authors were at McGill University, Montréal, Canada 相似文献

14.

Parallel integer sorting and simulation amongst CRCW models

Sanjeev Saxena 《Acta Informatica》1996,33(5):607-619

In this paper a general technique for reducing processors in simulation without any increase in time is described. This results in an O(√logn) time algorithm for simulating one step of PRIORITY on TOLERANT with processor-time product of O(n log logn); the same as that for simulating PRIORITY on ARBITRARY. This is used to obtain anO(logn/log logn + √logn (log logm ? log logn)) time algorithm for sortingn integers from the set {0,...,m ? 1},m ≧n, with a processor-time product ofO(n log logm log logn) on a TOLERANT CRCW PRAM. New upper and lower bounds for ordered chaining problem on an allocated COMMON CRCW model are also obtained. The algorithm for ordered chaining takesO(logn/log logn) time on an allocated PRAM of sizen. It is shown that this result is best possible (upto a constant multiplicative factor) by obtaining a lower bound of Ω(r logn/(logr + log logn)) for finding the first (leftmost one) live processor on an allocated-COMMON PRAM of sizen ofr-slow virtual processors (one processor simulatesr processors of allocated PRAM). As a result, for ordered chaining problem, “processor-time product” has to be at least Ω(n logn/log logn) for any poly-logarithmic time algorithm. Algorithm for ordered-chaining problem results in anO(logN/log logN) time algorithm for (stable) sorting ofn integers from the set {0,...,m ? 1} withn-processors on a COMMON CRCW PRAM; hereN = max(n, m). In particular if,m =n ^O(1), then sorting takes Θ(logn/log logn) time on both TOLERANT and COMMON CRCW PRAMs. Processor-time product for TOLERANT isO(n(log logn)²). Algorithm for COMMON usesn processors. 相似文献

15.

Optimal algorithms for rectangle problems on a mesh-connected computer

《Journal of Parallel and Distributed Computing》1988,5(2):154-171

In this paper, Mesh-Connected Computer (MCC) algorithms for computing several properties of a set of, possibly intersecting rectangles are presented. Given a set of n iso-oriented rectangles, we describe MCC algorithms for determining the following properties: (i) the area of the logic “OR” of these rectangles (i.e., the area of the region covered by at least one rectangle); (ii) the area of the union of pairwise “AND” of the rectangles (i.e., the area of the region covered by two or more rectangles); (iii) the largest number of rectangles that overlap (this solves the fixed-size rectangle placement problem, i.e., given a set of planar points and a rectangle, find a placement of the rectangle in the plane so that the number of points covered by the rectangle is maximal); (iv) the minimum separation between any pair of a set of nonoverlapping rectangles. All these algorithms can be implemented on a 2√n × 2√n MCC in O(√n) time which is optimal. The algorithms compare favorably with the known sequential algorithms that have O(n log n) time complexity. 相似文献

16.

On the Advice Complexity of the k-server Problem Under Sparse Metrics

Sushmita Gupta Shahin Kamali Alejandro López-Ortiz 《Theory of Computing Systems》2016,59(3):476-499

We consider the k-Server problem under the advice model of computation when the underlying metric space is sparse. On one side, we introduce Θ(1)-competitive algorithms for a wide range of sparse graphs. These algorithms require advice of (almost) linear size. We show that for graphs of size N and treewidth α, there is an online algorithm that receives O (n(log α + log log N))^* bits of advice and optimally serves any sequence of length n. We also prove that if a graph admits a system of μ collective tree (q, r)-spanners, then there is a (q + r)-competitive algorithm which requires O (n(log μ + log log N)) bits of advice. Among other results, this gives a 3-competitive algorithm for planar graphs, when provided with O (n log log N) bits of advice. On the other side, we prove that advice of size Ω(n) is required to obtain a 1-competitive algorithm for sequences of length n even for the 2-server problem on a path metric of size N ≥ 3. Through another lower bound argument, we show that at least \(\frac {n}{2}(\log \alpha - 1.22)\) bits of advice is required to obtain an optimal solution for metric spaces of treewidth α, where 4 ≤ α < 2k. 相似文献

17.

Window-based greedy contention management for transactional memory: theory and practice

Gokarna Sharma Costas Busch 《Distributed Computing》2012,25(3):225-248

We consider greedy contention managers for transactional memory for M?× N execution windows of transactions with M threads and N transactions per thread. We present, formally analyze, and experimentally evaluate three new randomized greedy contention management algorithms for transaction windows. Assuming that each transaction has duration τ and conflicts with at most C other transactions inside the window, the first algorithm Offline-Greedy produces a schedule of length O(τ· (C?+?N· log(MN))) with high probability. The offline algorithm depends on knowing the conflict graph which evolves while the execution of the transactions progresses. The second algorithm Online-Greedy produces a schedule of length that is only a logarithmic factor worse than Offline-Greedy, but does not require knowledge of the conflict graph. The third algorithm Adaptive-Greedy is the adaptive version of the previous algorithms which produces a schedule of length asymptotically the same as with online algorithm by adaptively guessing the value of C. All of the algorithms exhibit competitive ratio very close to O(s), where s is the number of shared resources, and at the same time, our algorithms provide new non-trivial tradeoffs for greedy transaction scheduling that parameterize window sizes and transaction conflicts within the execution window. We evaluate these window-based algorithms experimentally using the sorted link list, red-black tree, skip list, and vacation benchmarks. The evaluation results confirm their benefits in practical performance throughput and other metrics such as aborts per commit ratio and execution time overhead, along with the non-trivial provable properties of the algorithms. 相似文献

18.

Efficient Graph-Theoretic Algorithms on a Linear Array with a Reconfigurable Pipelined Bus System 总被引：1，自引：0，他引：1

Amitava Datta 《The Journal of supercomputing》2002,23(2):193-211

We present efficient algorithms for solving several fundamental graph-theoretic problems on a Linear Array with a Reconfigurable Pipelined Bus System (LARPBS), one of the recently proposed models of computation based on optical buses. Our algorithms include finding connected components, minimum spanning forest, biconnected components, bridges and articulation points for an undirected graph. We compute the connected components and minimum spanning forest of a graph in O(log n) time using O(m+n) processors where m and n are the number of edges and vertices in the graph and m=O(n ²) for a dense graph. Both the processor and time complexities of these two algorithms match the complexities of algorithms on the Arbitrary and Priority CRCW PRAM models which are two of the strongest PRAM models. The algorithms for these two problems published by Li et al. [7] have been considered to be the most efficient on the LARPBS model till now. Their algorithm [7] for these two problems require O(log n) time and O(n ³/log n) processors. Hence, our algorithms have the same time complexity but require less processors. Our algorithms for computing biconnected components, bridges and articulation points of a graph run in O(log n) time on an LARPBS with O(n ²) processors. No previous algorithm was known for these latter problems on the LARPBS. 相似文献

19.

Adaptive and efficient mutual exclusion

Hagit Attiya Vita Bortnikov 《Distributed Computing》2002,15(3):177-189

Summary. This paper presents adaptive algorithms for mutual exclusion using only read and write operations; the performance of the algorithms depends only on the point contention, i.e., the number of processes that are concurrently active during the algorithm execution (and not on n, the total number of processes). Our algorithm has O(k) remote step complexity and O(logk) system response time, wherek is the point contention. The remote step complexity is the maximal number of steps performed by a process where a wait is counted as one step. The system response time is the time interval between subsequent entries to the critical section, where one time unit is the minimal interval in which every active process performs at least one step. The space complexity of this algorithm is O(N logn), where N is the range of processes' names. We show how to make the space complexity of our algorithm depend solely on n, while preserving the other performance measures of the algorithm. Received: March 2001 / Accepted: November 2001 相似文献

20.

The Mailman algorithm: A note on matrix-vector multiplication

Edo Liberty Steven W. Zucker 《Information Processing Letters》2009,109(3):179-222

Given an m×n matrix A we are interested in applying it to a real vector x∈Rn in less than the straightforward O(mn) time. For an exact deterministic computation at the very least all entries in A must be accessed, requiring O(mn) operations and matching the running time of naively applying A to x. However, we claim that if the matrix contains only a constant number of distinct values, then reading the matrix once in O(mn) steps is sufficient to preprocess it such that any subsequent application to vectors requires only O(mn/log(max{m,n})) operations. Algorithms for matrix-vector multiplication over finite fields, which save a log factor, have been known for many years. Our contribution is unique in its simplicity and in the fact that it applies also to real valued vectors. Using our algorithm improves on recent results for dimensionality reduction. It gives the first known random projection process exhibiting asymptotically optimal running time. The mailman algorithm is also shown to be useful (faster than naïve) even for small matrices. 相似文献