Similar Articles
20 similar articles found (search time: 31 ms)
1.
Abstract. Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new improved parallel EM algorithms for a large number of problems, including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.

2.
Dehne, Dittrich, Hutchinson. Algorithmica, 2008, 36(2): 97-122
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
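To make the simulation idea concrete, here is a minimal, illustrative Python sketch (the names Disk, superstep and step_fn are ours, not the paper's): one physical machine keeps the contexts of many virtual BSP processors on disk, swaps each context in blockwise once per superstep, and routes the generated messages by sorting them by destination so that message delivery is blockwise as well.

```python
from collections import defaultdict

B = 4  # block size in items; a stand-in for the target machine's real B

class Disk:
    """Toy block-addressable store: whole blocks are read and written,
    and an I/O counter makes the blockwise traffic visible."""
    def __init__(self):
        self.blocks, self.ios = {}, 0
    def read(self, key):
        self.ios += 1
        return list(self.blocks.get(key, []))
    def write(self, key, block):
        self.ios += 1
        self.blocks[key] = list(block)

def superstep(disk, num_procs, step_fn):
    """One simulated BSP superstep: swap each virtual processor's context
    in from disk, run its local computation, then deliver all generated
    messages sorted by destination so delivery is also blockwise."""
    outbox = []
    for p in range(num_procs):
        ctx = disk.read(('ctx', p))            # blockwise context load
        ctx, msgs = step_fn(p, ctx)            # local computation phase
        disk.write(('ctx', p), ctx)
        outbox.extend(msgs)                    # (dest, payload) pairs
    outbox.sort(key=lambda m: m[0])            # stands in for an external sort
    inboxes = defaultdict(list)
    for dest, payload in outbox:
        inboxes[dest].append(payload)
    for p, msgs in inboxes.items():
        for i in range(0, len(msgs), B):       # deliver in full blocks
            disk.write(('in', p, i // B), msgs[i:i + B])

# Toy usage: eight virtual processors each send their id to processor 0.
disk = Disk()
for p in range(8):
    disk.write(('ctx', p), [p])
superstep(disk, 8, lambda p, ctx: (ctx, [(0, p)]))
print(disk.ios, disk.blocks[('in', 0, 0)])
```

A real simulation would stripe blocks over multiple disks and use a genuine external sort for the routing step; the counter merely exposes where the blockwise I/O happens.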

3.
Abstract. We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms, given the developed external data structures.

4.
Andrews, Bender, Zhang. Algorithmica, 2008, 32(2): 277-301
Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental, and little attempt has been made to develop algorithms with provable performance guarantees. We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution. The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally we consider the on-line version of the problem, in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ approach.
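The cost model is easy to state in code. The sketch below is our own illustration, not the paper's 3/2-approximation: it evaluates a schedule under a given seek-time curve and builds a naive nearest-request-first baseline. Here seek(d) is the time to cover d tracks, i.e., the inverse view of the paper's convex reachability function (which maps elapsed time to distance covered).

```python
import math

def schedule_time(start, order, tracks, seek):
    """Total time for the head to serve requests in the given order,
    where seek(d) is the time to travel d tracks."""
    t, pos = 0.0, start
    for i in order:
        t += seek(abs(tracks[i] - pos))
        pos = tracks[i]
    return t

# Two illustrative seek curves: a linear one, and an accelerate-then-cruise
# one (sqrt for short seeks, linear beyond; continuous at d = 100).
linear = lambda d: 0.1 * d
accel = lambda d: 2.0 * math.sqrt(d) if d <= 100 else 0.1 * d + 10.0

tracks = [80, 5, 40, 97, 12]
order, pos, rest = [], 0, set(range(len(tracks)))
while rest:                      # nearest-request-first baseline heuristic
    nxt = min(rest, key=lambda i: abs(tracks[i] - pos))
    order.append(nxt); rest.remove(nxt); pos = tracks[nxt]
print(order, round(schedule_time(0, order, tracks, accel), 2))
```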

5.
Ying Xu. Algorithmica, 2008, 36(1): 93-96
Abstract. We consider the problem of distributed gossiping in radio networks of unknown topology. For radio networks of size n and diameter D, we present an adaptive deterministic gossiping algorithm of time O(√D·n + n log^2 n), or O(n^{1.5}). This algorithm is a tuned version of the fastest previously known gossiping algorithm due to Gasieniec and Lingas [1], and improves the time complexity by a poly-logarithmic factor.

6.
Abstract. We consider a simple restriction of the PRAM model (called PPRAM), where the input is arbitrarily partitioned between a fixed set of p processors and the shared memory is restricted to m cells. This model allows for investigation of the tradeoffs/bottlenecks with respect to the communication bandwidth (modeled by the shared memory size m) and the number of processors p. The model is quite simple and allows the design of optimal algorithms without losing the effect of communication bottlenecks. We have focused on the PPRAM complexity of problems that have Θ(n) sequential solutions (where n is the input size), and where m ≤ p ≤ n. We show essentially tight time bounds (up to logarithmic factors) for several problems in this model, such as summing, Boolean threshold, routing, integer sorting, list reversal and k-selection. We typically get one of two complexity behaviors for these problems: one type is Θ(n/p + p/m), which means that the time scales with the number of processors and with the memory size (in appropriate ranges) but not with both. The other is Θ(n/m), which means that the running time does not scale with p and reflects a communication bottleneck (as long as m < p). We are not aware of any problem whose complexity scales with both p and m (e.g., Θ(n/p + n/m)). This might explain why in actual implementations one often fails to get p-scalability for p close to n.
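The shape of the Θ(n/p + p/m) bound can be seen in a toy two-phase summation: each of the p processors first sums its n/p private inputs, and the p partial sums are then funneled through the m shared cells round by round. The sketch below is a simplified simulation with illustrative constants, not a faithful PPRAM machine.

```python
def ppram_sum(xs, p, m):
    """Simplified PPRAM-style summation: p processors, m shared cells.
    Returns (total, parallel_steps), where steps has the n/p + p/m shape
    of the tight bound quoted above (constants are illustrative only)."""
    n = len(xs)
    # Phase 1: each processor sums its private share of n/p inputs.
    chunk = -(-n // p)  # ceil(n / p)
    partials = [sum(xs[i * chunk:(i + 1) * chunk]) for i in range(p)]
    steps = chunk
    # Phase 2: funnel the p partial sums through the m shared cells;
    # each round absorbs m of the remaining values via shared memory.
    while len(partials) > 1:
        absorbed = partials[:m]
        partials = [sum(absorbed)] + partials[m:]
        steps += 1
    return partials[0], steps

print(ppram_sum(list(range(1000)), p=32, m=4))
```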

7.
The Buffer Tree: A Technique for Designing Batched External Data Structures
Lars Arge. Algorithmica, 2008, 37(1): 1-24
We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms, given the developed external data structures.
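A toy sketch of the buffer-emptying idea follows (illustrative only: BufferNode and its parameters are ours, and a real buffer tree keeps large buffers on disk). Operations collect in a node's buffer and are pushed down one level in bulk only when the buffer overflows, which is what amortizes the I/O.

```python
class BufferNode:
    """Toy buffer-tree node: operations accumulate in a buffer and move
    down a level only in bulk, when the buffer overflows."""
    def __init__(self, keys, children=None, buf_cap=4):
        self.keys = keys                  # routing keys, fanout len(keys)+1
        self.children = children or []    # no children => leaf
        self.buf, self.buf_cap = [], buf_cap
        self.items = []                   # elements resident at a leaf

    def insert(self, x, stats):
        self.buf.append(x)
        if len(self.buf) >= self.buf_cap:
            self.flush(stats)

    def flush(self, stats):
        stats['flushes'] += 1             # one bulk transfer per emptied buffer
        batch, self.buf = self.buf, []
        if not self.children:
            self.items.extend(batch)      # leaf: apply the batched operations
            return
        for x in batch:                   # route one level down, in bulk
            i = sum(k <= x for k in self.keys)
            self.children[i].insert(x, stats)

    def flush_all(self, stats):           # empty every buffer, e.g. before a query
        self.flush(stats)
        for c in self.children:
            c.flush_all(stats)

stats = {'flushes': 0}
leaves = [BufferNode([]) for _ in range(3)]
root = BufferNode([10, 20], leaves)
for x in [5, 25, 12, 1, 17, 23, 9, 14, 28, 3]:
    root.insert(x, stats)
root.flush_all(stats)
print(stats, [sorted(lf.items) for lf in leaves])
```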

8.
Abello, Buchsbaum, Westbrook. Algorithmica, 2008, 32(3): 437-458
Abstract. We present a new approach for designing external graph algorithms and use it to design simple, deterministic and randomized external algorithms for computing connected components, minimum spanning forests, bottleneck minimum spanning forests, maximal independent sets (randomized only), and maximal matchings in undirected graphs. Our I/O bounds compete with those of previous approaches. We also introduce a semi-external model, in which the vertex set but not the edge set of a graph fits in main memory. In this model we give an improved connected components algorithm, using new results for external grouping and sorting with duplicates. Unlike previous approaches, ours is purely functional—without side effects—and is thus amenable to standard checkpointing and programming language optimization techniques. This is an important practical consideration for applications that may take hours to run.
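The semi-external model in particular admits a very short connected-components routine: since the vertex set fits in memory, a union-find structure can be maintained while the edge set is merely streamed. The sketch below illustrates the model's appeal; it is not the paper's improved algorithm.

```python
def semi_external_cc(num_vertices, edge_stream):
    """Connected components in the semi-external model: a union-find over
    the (in-memory) vertex set while the edge list is streamed once."""
    parent = list(range(num_vertices))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v in edge_stream:                # edges never held in memory
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return [find(v) for v in range(num_vertices)]

edges = iter([(0, 1), (2, 3), (1, 2), (5, 6)])
print(semi_external_cc(7, edges))   # -> [3, 3, 3, 3, 4, 6, 6]
```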

9.
External sorting—the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks—is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls "randomized striping") as the method of choice for sorting with parallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast-and-flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often significantly faster than with DSM (since SRM requires fewer passes). The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a file into several other files. Since both parallel disk merging and multimedia processing deal with streams that get consumed at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications. Received June 28, 2000, and in revised form June 5, 2001. Online publication April 8, 2002.
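The forecasting idea can be illustrated with a small multiway merge: each run is consumed block by block, and the run whose buffered block runs dry first is the one whose next block must be fetched. The Python sketch below is a toy serial version; SRM forecasts ahead of time, using the last key of each buffered block to schedule parallel, independent disk reads, whereas here we only record the fetch order that results.

```python
import heapq

def merge_with_forecasting(runs, block_size):
    """Toy multiway merge consuming each sorted run block by block and
    recording the order in which runs need their next block fetched."""
    blocks = [[r[i:i + block_size] for i in range(0, len(r), block_size)]
              for r in runs]
    buf = [b.pop(0) for b in blocks]           # one buffered block per run
    fetches, out = [], []
    heap = [(buf[i][0], i, 0) for i in range(len(runs))]
    heapq.heapify(heap)
    while heap:
        key, run, pos = heapq.heappop(heap)
        out.append(key)
        if pos + 1 < len(buf[run]):
            heapq.heappush(heap, (buf[run][pos + 1], run, pos + 1))
        elif blocks[run]:
            # This run's buffer ran dry; in SRM the last key of each
            # buffered block drives the prefetch order ahead of time.
            fetches.append(run)
            buf[run] = blocks[run].pop(0)
            heapq.heappush(heap, (buf[run][0], run, 0))
    return out, fetches

runs = [[1, 4, 9, 12], [2, 3, 5, 6], [7, 8, 10, 11]]
print(merge_with_forecasting(runs, block_size=2))
```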

10.
Mehlhorn, Sanders. Algorithmica, 2003, 35(1): 75-93
Abstract. We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained, and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory. In the external memory model of computation with block size B, a memory consisting of m blocks, and at most m sequences, the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct-mapped cache (a = 1), two sequences already suffice to cause this worst-case behavior. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O(N/B) provided that k = O(m/B^{1/a}), i.e., the number of sequences that can be supported decreases by a factor of B^{1/a}. We also show a corresponding lower bound. Our result leads to a general method for converting sequence-based algorithms designed for the external memory model of computation to cache memory, even for caches with small associativity.
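A toy simulation of a direct-mapped cache (a = 1) shows the effect: with aligned starting addresses, k concurrent sequences evict each other on every access, while random starting offsets make conflicts rare. All constants below are illustrative.

```python
import random

def cache_misses(starts, steps, num_sets, block):
    """Misses in a toy direct-mapped cache while k sequences, starting at
    the given addresses, are advanced round-robin one element at a time."""
    tag = [None] * num_sets
    misses = 0
    for step in range(steps):
        for s in starts:
            blk = (s + step) // block         # block holding the next element
            st = blk % num_sets               # its single candidate cache set
            if tag[st] != blk:
                tag[st] = blk
                misses += 1
    return misses

random.seed(0)
num_sets, block, k, steps = 256, 16, 4, 10_000
aligned = [i * num_sets * block for i in range(k)]   # all conflict, every step
randomized = [a + block * random.randrange(num_sets) for a in aligned]
print(cache_misses(aligned, steps, num_sets, block),      # ~ k * steps
      cache_misses(randomized, steps, num_sets, block))   # ~ k * steps / block
```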

11.
In this paper we present efficient, scalable, and portable parallel algorithms for the off-line clustering, on-line retrieval and update phases of the Text Retrieval (TR) problem, based on the vector space model and using clustering to organize and handle a dynamic document collection. The algorithms run on the Coarse-Grained Multicomputer (CGM) and/or the Bulk Synchronous Parallel (BSP) model, two models that capture the characteristics of a parallel machine within a few parameters. To the best of our knowledge, our parallel retrieval algorithms are the first ones analyzed under these specific parallel models. For all phases of the proposed algorithms, we analytically determine the relevant communication and computation costs, thereby formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval phase performs very well in comparison to other possible alternatives in the typical case of a multiuser information retrieval (IR) system, where a number of user queries are concurrently submitted. Finally, we discuss external memory issues and show how our techniques can be adapted to the case when processors have limited main memory but sufficient disk capacity for holding their local data.

12.
With recent advances in network technology, the number of high-speed networked homes is increasing rapidly, and enhanced services such as on-demand video become feasible in terms of market maturity. Another trend is that storage systems are becoming network-accessible. One of the leading network-attached storage systems is the Fibre Channel Arbitrated Loop (FC-AL). Acting as a residential service gateway between external service providers and local clients, FC-AL-based servers can stably provide high-quality video (e.g., DVD-quality MPEG-2 streams) to thousands of clients. In addition, in densely populated areas such as New York City, they can be much more cost efficient. Using end-to-end simulation experiments that combine all the components, we have observed that FC-AL-based streaming servers perform better than SCSI-based systems, but there is still room for performance improvement. We are motivated by the fact that, unlike in SCSI-based systems, all the disks in FC-AL-based servers utilize only a small portion of their caches, to a similar degree, due to the FC-AL fairness arbitration algorithm. Thus, we propose an effective prefetching scheme that improves performance by further utilizing the disk cache. We show how the proposed scheme can determine the maximum number of prefetched blocks depending on the disk block size and cache size. It is also shown how to find the optimal number of blocks transmitted to the FC-AL from the disk cache per FC-AL arbitration. In addition, we describe a cache replacement policy that takes full advantage of the sequential access pattern of video files, and explain how to support multiple loops. By analysis and simulation experiments, we show that our prefetching scheme is able not only to increase the total number of concurrent streams significantly by reducing the disk seek time, but also to further utilize the FC-AL by reducing the overhead of arbitration.

13.
Katz, Nielsen, Segal. Algorithmica, 2008, 36(1): 59-73
Abstract. We show how to maintain efficiently a minimum piercing set for a set S of intervals on the line, under insertions and deletions to/from S. A linear-size dynamic data structure is presented, which enables us to compute a new minimum piercing set following an insertion or deletion in time O(c(S) log |S|), where c(S) is the size of the new minimum piercing set. We also show how to maintain a piercing set for S of size at most (1+ε)c(S), for 0 < ε ≤ 1, in O((log |S|)/ε) amortized time per update. We then apply these results to obtain efficient solutions to the following three problems: (i) the shooter location problem, (ii) computing a minimum piercing set for arcs on a circle, and (iii) dynamically maintaining a box cover for a d-dimensional point set.
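For intuition, the static version of the problem has a classical greedy solution: sort the intervals by right endpoint and stab each uncovered interval at its right endpoint. The sketch below computes a minimum piercing set this way; the paper's contribution is maintaining such a set under insertions and deletions, which this sketch does not attempt.

```python
def min_piercing_set(intervals):
    """Greedy minimum piercing set for closed intervals on the line:
    repeatedly pierce at the smallest right endpoint not yet covered."""
    points = []
    last = float('-inf')
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if lo > last:            # interval not pierced by any chosen point
            last = hi            # its right endpoint is an optimal stab
            points.append(hi)
    return points

print(min_piercing_set([(1, 4), (2, 6), (5, 8), (7, 9), (10, 12)]))
# -> [4, 8, 12]
```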

14.
Data distribution in memory or on disks is an important factor influencing the performance of parallel applications. On the other hand, programs or systems, such as a parallel file system, frequently redistribute data between memory and disks. This paper presents a generalization of previous approaches to the redistribution problem. We introduce algorithms for mapping between two arbitrary distributions of a data set, optimized for multidimensional array partitions. We motivate our approach and present potential applications. The paper also presents a case study: the employment of mapping functions and redistribution algorithms in a parallel file system.
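The heart of any such redistribution is computing which index ranges must move between which processor pairs. A minimal sketch for 1-D block distributions follows (the paper handles arbitrary multidimensional partitions; all function names are ours):

```python
def block_ranges(n, p, b):
    """Index ranges [lo, hi) owned by each of p processors under a 1-D
    block distribution with block size b over n elements."""
    return [(min(i * b, n), min((i + 1) * b, n)) for i in range(p)]

def redistribution_plan(n, p_old, b_old, p_new, b_new):
    """Messages needed to move from one block distribution to another:
    what processor i sends to processor j is simply the intersection of
    their index ranges under the old and new distributions."""
    old = block_ranges(n, p_old, b_old)
    new = block_ranges(n, p_new, b_new)
    plan = []
    for i, (alo, ahi) in enumerate(old):
        for j, (blo, bhi) in enumerate(new):
            lo, hi = max(alo, blo), min(ahi, bhi)
            if lo < hi and i != j:          # i == j is a local copy
                plan.append((i, j, (lo, hi)))
    return plan

print(redistribution_plan(12, p_old=3, b_old=4, p_new=4, b_new=3))
# -> [(0, 1, (3, 4)), (1, 2, (6, 8)), (2, 3, (9, 12))]
```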

15.
Abstract. The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [32] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency and the powerful/fast search operations it supports. In this paper we analyze, both theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are state-of-the-art, the other three are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pay particular attention to differentiating between "random" and "contiguous" disk accesses, in order to reasonably explain some practical I/O phenomena which are related to the experimental behavior of these algorithms and which would otherwise be meaningless in the light of other simpler external-memory models. We also address two other issues. The first concerns the problem of building word indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming, thus achieving a uniform, simple and efficient approach to both indexing models. The second issue is related to the intriguing and apparently counterintuitive "contradiction" between the effective practical performance of the well-known Baeza-Yates-Gonnet-Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, thus resulting in a novel approach which combines good worst-case bounds with efficient practical performance.
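For orientation, here is what is being computed, in the small: a prefix-doubling suffix array construction that sorts all suffixes in memory. The algorithms analyzed in the paper build the same array when the text is far too large for RAM, which is exactly where the random-versus-contiguous access distinction starts to matter.

```python
def suffix_array(s):
    """Prefix-doubling suffix array construction, O(n log^2 n) in memory.
    Returns the starting positions of the suffixes of s in sorted order."""
    n = len(s)
    rank, k = list(map(ord, s)), 1
    sa = list(range(n))
    while True:
        # Sort by (rank of suffix, rank of suffix shifted by k).
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new = [0] * n
        for a, b in zip(sa, sa[1:]):
            new[b] = new[a] + (key(a) < key(b))
        rank, k = new, k * 2
        if new[sa[-1]] == n - 1:      # all ranks distinct: done
            return sa

print(suffix_array("banana"))   # -> [5, 3, 1, 0, 4, 2]
```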

16.
Hayes, Kutin, van Melkebeek. Algorithmica, 2008, 34(4): 480-501
Abstract. We describe a quantum black-box network computing the majority of N bits with zero-sided error ε using only 2N/3 + o(N) queries: the algorithm returns the correct answer with probability at least 1 - ε, and "I don't know" otherwise. Our algorithm is given as a randomized "XOR decision tree" for which the number of queries on any input is strongly concentrated around a value of at most 2N/3. We provide a nearly matching lower bound of 2N/3 - o(N) on the expected number of queries on a worst-case input in the randomized XOR decision tree model with zero-sided error o(1). Any classical randomized decision tree computing the majority on N bits with zero-sided error 1/2 has cost N.

17.
Iwata. Algorithmica, 2008, 36(4): 331-341
Abstract. This paper presents a new algorithm for computing the maximum degree δ_k(A) of a minor of order k in a matrix pencil A(s). The problem is of practical significance in the fields of numerical analysis and systems control. The algorithm adopts the general framework of "combinatorial relaxation" due to Murota. It first solves a weighted bipartite matching problem to obtain an estimate δ̂_k(A) on δ_k(A), and then checks whether the estimate is correct, exploiting the optimal dual solution. In case of incorrectness, it modifies the matrix pencil A(s) to improve the estimate δ̂_k(A) without changing δ_k(A). The present algorithm performs this matrix modification by an equivalence transformation with constant matrices, whereas the previous one uses biproper rational function matrices. Thus the present approach saves memory space and reduces the running time bound by a factor of rank A.

18.
Random Duplicated Assignment (RDA) is an approach in which video data is stored by assigning a number of copies of each data block to different, randomly chosen disks. It has been shown that this approach results in shorter response times and lower disk and RAM costs compared with well-known disk striping techniques. Under this storage approach, one has to determine, for each given batch of data blocks, the disk from which each data block is to be retrieved, in such a way that the maximum load over the disks is minimized. This problem is called the Retrieval Selection Problem (RSP). In this paper, we propose a new efficient algorithm for RSP. This algorithm is based on the breadth-first-search approach and is able to guarantee optimal solutions for RSP in O(n^2 + mn) time, where m and n denote the number of data blocks and the number of disks, respectively. We show that our proposed algorithm has a lower time complexity than an existing algorithm, called the MFS algorithm.
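RSP is a bottleneck assignment problem, and a simple exact method — not the paper's BFS algorithm, but the same flavor — is to binary search on the maximum load and test feasibility with augmenting paths:

```python
def feasible(copies, n_disks, cap):
    """Can every block be assigned to a disk holding one of its copies,
    with at most `cap` blocks per disk? Augmenting-path feasibility test."""
    load = [0] * n_disks
    holder = [[] for _ in range(n_disks)]

    def place(b, seen):
        for d in copies[b]:
            if d in seen:
                continue
            seen.add(d)
            if load[d] < cap:
                holder[d].append(b)
                load[d] += 1
                return True
            for b2 in list(holder[d]):     # try relocating one of d's blocks
                holder[d].remove(b2)
                if place(b2, seen):
                    holder[d].append(b)
                    return True
                holder[d].append(b2)
        return False

    return all(place(b, set()) for b in range(len(copies)))

def min_max_load(copies, n_disks):
    """Smallest achievable maximum disk load, via binary search on the cap.
    (Illustrative only; the paper's BFS algorithm runs in O(n^2 + mn).)"""
    lo, hi = 1, len(copies)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(copies, n_disks, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Five blocks on three disks; copies[b] = disks storing a copy of block b.
copies = [[0, 1], [0, 1], [0, 2], [1, 2], [0, 2]]
print(min_max_load(copies, 3))   # -> 2
```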

19.
Answering linear optimization queries with an approximate stream index
We propose a SAO index to approximately answer arbitrary linear optimization queries over a sliding window of a data stream. It uses limited memory to maintain the most "important" tuples. At any time, for any linear optimization query, we can retrieve the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating a larger portion of the memory to the outer layers of the SAO index.
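The role of the outer layers has a clean geometric explanation: the best tuple under any linear objective lies on the convex hull, and the i-th best lies on one of the first i convex layers. The 2-D sketch below illustrates this exactly; the SAO index itself maintains approximate layers under sliding-window updates, which this sketch does not model.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices of a 2-D point set."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])
    def chain(ps):
        out = []
        for p in ps:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out
    return chain(pts)[:-1] + chain(pts[::-1])[:-1]

def top_k_linear(points, w, K):
    """Exact top-K for the linear objective w.x by peeling convex layers:
    only the K outermost layers ever need to be examined."""
    remaining, candidates = list(points), []
    for _ in range(K):
        if not remaining:
            break
        layer = convex_hull(remaining)
        candidates += layer
        layer_set = set(layer)
        remaining = [p for p in remaining if p not in layer_set]
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    return sorted(candidates, key=score, reverse=True)[:K]

pts = [(0, 0), (4, 1), (1, 4), (2, 2), (3, 3), (5, 0), (0, 5)]
print(top_k_linear(pts, w=(1.0, 0.5), K=3))   # -> [(5, 0), (3, 3), (4, 1)]
```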

20.
Punnen, Margot, Kabadi. Algorithmica, 2008, 35(2): 111-127
Abstract. We show that the 2-Opt and 3-Opt heuristics for the traveling salesman problem (TSP) on the complete graph K_n produce a solution no worse than the average cost of a tour in K_n in a polynomial number of iterations. As a consequence, we get that the domination numbers of the 2-Opt, 3-Opt, Carlier-Villon, Shortest Path Ejection Chain, and Lin-Kernighan heuristics are all at least (n-2)!/2. The domination number of the Christofides heuristic is shown to be no more than …, and for the Double Tree heuristic and a variation of the Christofides heuristic the domination numbers are shown to be one (even if the edge costs satisfy the triangle inequality). Further, unless P = NP, no polynomial-time approximation algorithm exists for the TSP on the complete digraph K*_n with domination number at least (n-1)! - k for any constant k, or with domination number at least (n-1)! - ((k/(k+1))(n+r))! - 1 for any non-negative constants r and k such that (n+r) ≡ 0 mod (k+1). The complexities of finding the median value of the costs of all tours in K*_n and of similar problems are also studied.

