1.
Abstract. Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms.
It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new improved parallel EM algorithms for a large number of problems including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well-known I/O complexity lower bounds for various problems, including sorting.
2.
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
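To make the simulation idea concrete, here is a minimal sketch of one superstep's message delivery done blockwise: traffic is bucketed per destination virtual processor, cut into size-B blocks, and striped round-robin across D disks so the subsequent read pass is fully parallel. The constants and names (B, D, deliver_superstep) are illustrative assumptions, not the papers' notation.

```python
# A minimal sketch (not the full construction from the papers above):
# deliver one BSP superstep's messages with blockwise, parallel-disk I/O.
from collections import defaultdict

B = 4   # disk block size, in records (made up)
D = 3   # number of disks (made up)

def deliver_superstep(messages):
    """messages: list of (dest_vproc, payload) pairs.
    Returns disks[d] = list of blocks, each tagged with its destination
    virtual processor and striped round-robin over the D disks."""
    buckets = defaultdict(list)
    for dest, payload in messages:          # group traffic per destination
        buckets[dest].append(payload)
    disks = [[] for _ in range(D)]
    blk_no = 0
    for dest in sorted(buckets):
        data = buckets[dest]
        for i in range(0, len(data), B):    # cut each bucket into blocks
            block = data[i:i + B] + [None] * (B - len(data[i:i + B]))
            disks[blk_no % D].append((dest, block))   # round-robin striping
            blk_no += 1
    return disks

msgs = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f"), (1, "g")]
for d, q in enumerate(deliver_superstep(msgs)):
    print("disk", d, q)
```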
3.
Abstract. We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
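As a rough illustration of why batching saves I/O, the toy priority queue below buffers inserts in memory and flushes them to sorted runs, so each simulated disk write moves a whole batch. It is a deliberately simplified stand-in, not the buffer-tree-style structures developed in the paper; all names are made up.

```python
# A toy batched "external" priority queue, only to illustrate batching:
# inserts accumulate in memory and are flushed to sorted runs (standing
# in for disk files), amortizing one write over a whole batch.
class BatchedExternalPQ:
    def __init__(self, buffer_capacity=4):
        self.buffer = []          # in-memory insert buffer
        self.runs = []            # each run = sorted list, i.e. a "disk file"
        self.cap = buffer_capacity

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.cap:   # one blockwise "write" per batch
            self.runs.append(sorted(self.buffer))
            self.buffer = []

    def delete_min(self):
        candidates = []
        if self.buffer:
            candidates.append((min(self.buffer), 'buf', None))
        for i, run in enumerate(self.runs):
            if run:
                candidates.append((run[0], 'run', i))
        if not candidates:
            raise IndexError("empty")
        key, kind, i = min(candidates)     # smallest head wins
        if kind == 'buf':
            self.buffer.remove(key)
        else:
            self.runs[i].pop(0)
        return key

pq = BatchedExternalPQ()
for x in [5, 1, 9, 3, 7, 2, 8]:
    pq.insert(x)
print([pq.delete_min() for _ in range(7)])   # ascending order
```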
4.
Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental, and little attempt has been made to develop algorithms with provable performance guarantees.
We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution.
The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally we consider the on-line version of the problem in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ.
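To pin down the objective, the sketch below evaluates the service time of a candidate head schedule under a given reachability function and brute-forces the best order on a tiny instance. It only illustrates the problem statement; the reachability functions and names here are invented, and the paper's 3/2-approximation is a different algorithm.

```python
# Evaluate a head schedule: reach(d) = time to travel d tracks.
from itertools import permutations

def schedule_time(order, start, reach):
    pos, total = start, 0.0
    for track in order:
        total += reach(abs(track - pos))   # travel time between tracks
        pos = track
    return total

linear = lambda d: d                    # the linear special case
convex = lambda d: d + 0.05 * d * d     # an (invented) convex alternative

requests = [50, 10, 30, 80]
best = min(permutations(requests),
           key=lambda o: schedule_time(o, 0, linear))
print(best, schedule_time(best, 0, linear))   # (10, 30, 50, 80) 80.0
```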
5.
Ying Xu. Algorithmica 2008, 36(1):93-96
Abstract. We consider the problem of distributed gossiping in radio networks of unknown topology. For radio networks of size n and diameter D, we present an adaptive deterministic gossiping algorithm of time O(n√D + n log^2 n), or O(n^1.5). This algorithm is a tuned version of the fastest previously known gossiping algorithm due to Gasieniec and Lingas [1], and improves the time complexity by a poly-logarithmic factor.
6.
Abstract. We consider a simple restriction of the PRAM model (called PPRAM), where the input is arbitrarily partitioned between a fixed set of p processors and the shared memory is restricted to m cells. This model allows for investigation of the tradeoffs/bottlenecks with respect to the communication bandwidth (modeled by the shared memory size m) and the number of processors p. The model is quite simple and allows the design of optimal algorithms without losing the effect of communication bottlenecks.
We have focused on the PPRAM complexity of problems that have Θ(n) sequential solutions (where n is the input size), and where m ≤ p ≤ n. We show essentially tight time bounds (up to logarithmic factors) for several problems in this model such as summing, Boolean threshold, routing, integer sorting, list reversal and k-selection. We get typically two sorts of complexity behaviors for these problems: one type is Θ(n/p + p/m), which means that the time scales with the number of processors and with the memory size (in appropriate ranges) but not with both. The other is Θ(n/m), which means that the running time does not scale with p and reflects a communication bottleneck (as long as m < p). We are not aware of any problem whose complexity scales with both p and m. This might explain why in actual implementations one often fails to get p-scalability for p close to n.
7.
Lars Arge. Algorithmica 2008, 37(1):1-24
We present a technique for designing external memory data structures that support batched operations I/O efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
8.
Abstract. We present a new approach for designing external graph algorithms and use it to design simple, deterministic and randomized external algorithms for computing connected components, minimum spanning forests, bottleneck minimum spanning forests, maximal independent sets (randomized only), and maximal matchings in undirected graphs. Our I/O bounds compete with those of previous approaches. We also introduce a semi-external model, in which the vertex set but not the edge set of a graph fits in main memory. In this model we give an improved connected components algorithm, using new results for external grouping and sorting with duplicates. Unlike previous approaches, ours is purely functional—without side effects—and is thus amenable to standard checkpointing and programming language optimization techniques. This is an important practical consideration for applications that may take hours to run.
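The semi-external model is easy to make concrete: if one machine word per vertex fits in memory, connected components need only an in-memory union-find array and a sequential pass over the disk-resident edges. A minimal sketch follows (this simple scan is the textbook baseline, not the paper's improved algorithm):

```python
# Semi-external connected components: vertices fit in RAM, edges are
# streamed. The edge "stream" here is a list; on a real system it would
# be read blockwise from disk.
def connected_components(num_vertices, edge_stream):
    parent = list(range(num_vertices))      # one word per vertex in RAM

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edge_stream:                # one sequential pass over edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return [find(v) for v in range(num_vertices)]

print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
# vertices 0,1,2 share a label; 3,4 share another; 5 is isolated
```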
9.
External sorting—the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks—is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls "randomized striping") as the method of choice for sorting with parallel disks.
In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel.
We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often significantly faster than with DSM (since SRM requires fewer passes).
The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a file into several other files. Since both parallel disk merging and multimedia processing deal with streams that get "consumed" at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications.
Received June 28, 2000, and in revised form June 5, 2001. Online publication April 8, 2002.
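The forecasting idea is simple enough to simulate: while merging runs whose blocks are striped over D disks, each disk prefetches, among the runs whose next block it holds, the block with the smallest leading key, i.e., the block the merge will need soonest. The sketch below is a single-threaded toy under simplifying assumptions (per-run round-robin striping, one-block buffers), not the paper's implementation.

```python
def merge_with_forecasting(runs, num_disks):
    """runs: list of runs, each a list of sorted blocks (lists of keys);
    block b of run r is assumed to reside on disk b % num_disks."""
    nxt = [0] * len(runs)               # index of each run's next block
    buf = [[] for _ in runs]            # one in-memory block buffer per run
    out, reads = [], 0
    while True:
        # refill until every unfinished run is buffered; per round, each
        # disk fetches the block with the smallest leading key it holds
        while True:
            empty = [r for r in range(len(runs))
                     if not buf[r] and nxt[r] < len(runs[r])]
            if not empty:
                break
            for d in range(num_disks):
                cands = [r for r in empty if nxt[r] % num_disks == d]
                if cands:                # the forecasting step
                    r = min(cands, key=lambda r: runs[r][nxt[r]][0])
                    buf[r] = list(runs[r][nxt[r]])
                    nxt[r] += 1
                    reads += 1
        live = [r for r in range(len(runs)) if buf[r]]
        if not live:
            return out, reads
        r = min(live, key=lambda r: buf[r][0])   # ordinary merge step
        out.append(buf[r].pop(0))

runs = [[[1, 4], [9, 12]], [[2, 3], [10, 11]], [[5, 6], [7, 8]]]
print(merge_with_forecasting(runs, num_disks=2))
# ([1, 2, ..., 12], 6): six block reads produce the twelve keys in order
```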
10.
Abstract. We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory.
In the external memory model of computation with block size B, a memory consisting of m blocks, and at most m sequences, the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct mapped cache (a = 1) two sequences suffice. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O(N/B) provided that k = O(m/B^{1/a}), i.e., the number of sequences that can be supported is decreased by a factor B^{1/a}. We also show a corresponding lower bound.
Our result leads to a general method for converting sequence-based algorithms designed for the external memory model of computation to cache memory, even for caches with small associativity.
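The effect is easy to reproduce in a few lines: with a direct-mapped cache (a = 1), sequences whose starting addresses are congruent modulo the cache size evict each other on almost every access, while random starting offsets spread them over the sets. The cache geometry below is made up for the demonstration.

```python
import random

BLOCK, NUM_SETS = 8, 64          # invented cache geometry, a = 1

def count_misses(starts, steps):
    cache = {}                   # set index -> cached block tag
    misses = 0
    for step in range(steps):
        for base in starts:      # adversary advances every pointer in turn
            tag = (base + step) // BLOCK
            s = tag % NUM_SETS   # set index in a direct-mapped cache
            if cache.get(s) != tag:
                cache[s] = tag   # cold or conflict miss
                misses += 1
    return misses

k, steps = 8, 4096
aligned = [i * BLOCK * NUM_SETS for i in range(k)]  # all land in one set
random.seed(1)
randomized = [random.randrange(1 << 20) for _ in range(k)]
print(count_misses(aligned, steps))      # close to k * steps: thrashing
print(count_misses(randomized, steps))   # close to k * steps / BLOCK
```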
11.
Charalampos Konstantopoulos, Basilis Mamalis, Grammati Pantziou, Damianos Gavalas. The Journal of Supercomputing 2009, 48(3):286-318
In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, the on-line retrieval, and the update phases of the Text Retrieval (TR) problem, based on the vector space model and using clustering to organize and handle a dynamic document collection. The algorithms run on the Coarse-Grained Multicomputer (CGM) and/or the Bulk Synchronous Parallel (BSP) model, two models that capture within a few parameters the characteristics of the parallel machine. To the best of our knowledge, our parallel retrieval algorithms are the first ones analyzed under these specific parallel models. For all the phases of the proposed algorithms, we analytically determine the relevant communication and computation cost, thereby formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval phase performs very well in comparison to other possible alternatives in the typical case of a multiuser information retrieval (IR) system, where a number of user queries are concurrently submitted. Finally, we discuss external memory issues and show how our techniques can be adapted to the case when processors have limited main memory but sufficient disk capacity for holding their local data.
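A compact sketch of cluster-based vector-space retrieval as described above (sequential, not the CGM/BSP-parallel version): documents are grouped into clusters, a query is compared against cluster centroids first, and only the best cluster's documents are ranked. All names and data are illustrative.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    return [sum(c) / len(vectors) for c in zip(*vectors)]

def retrieve(clusters, query, top_k=2):
    # coarse step: pick the cluster whose centroid best matches the query
    best = max(clusters, key=lambda docs: cosine(centroid(docs), query))
    # fine step: rank only that cluster's documents
    return sorted(best, key=lambda d: cosine(d, query), reverse=True)[:top_k]

clusters = [
    [[3, 0, 1], [2, 0, 0]],     # cluster dominated by term 0
    [[0, 2, 3], [0, 1, 2]],     # cluster dominated by terms 1 and 2
]
print(retrieve(clusters, [0, 1, 1]))
```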
相似文献
Damianos GavalasEmail: |
12.
With the recent advances in network technology, the number of high-speed networked homes is increasing rapidly, and enhanced services such as on-demand video are becoming feasible in terms of market maturity. Another trend is that storage systems are becoming network-accessible. One of the leading network-attached storage systems is the Fibre Channel Arbitrated Loop (FC-AL). As a residential service gateway between external service providers and local clients, FC-AL-based servers can stably provide high-quality video (e.g., DVD-quality MPEG-2 streams) to thousands of clients. In addition, in densely populated areas such as New York City, they can be much more cost-efficient. Using our end-to-end simulation experiments to combine all the components, we have observed that FC-AL-based streaming servers perform better than SCSI-based systems, but there is still room for performance improvement. We are motivated by the fact that, unlike in SCSI-based systems, all the disks in FC-AL-based servers utilize only a small portion of their caches, to a similar degree, due to the FC-AL fairness arbitration algorithm. Thus, we propose an effective prefetching scheme that improves performance by further utilizing the disk cache. We show how the proposed scheme can determine the maximum number of prefetched blocks depending on the disk block and cache sizes. It is also shown how to find the optimal number of blocks transmitted to the FC-AL from the disk cache per FC-AL arbitration. In addition, we describe a cache replacement policy that takes full advantage of the sequential access pattern of video files, and explain how to support multiple loops. By analysis and simulation experiments, we show that our prefetching scheme not only increases the total number of concurrent streams significantly by reducing disk seek time, but also further utilizes the FC-AL by reducing the arbitration overhead.
13.
Abstract. We show how to maintain efficiently a minimum piercing set for a set S of intervals on the line, under insertions and deletions to/from S. A linear-size dynamic data structure is presented, which enables us to compute a new minimum piercing set following an insertion or deletion in time O(c(S) log |S|), where c(S) is the size of the new minimum piercing set. We also show how to maintain a piercing set for S of size at most (1+ɛ)c(S), for 0 < ɛ ≤ 1, in O((log |S|)/ɛ) amortized time per update. We then apply these results to obtain efficient solutions to the following three problems: (i) the shooter location problem, (ii) computing a minimum piercing set for arcs on a circle, and (iii) dynamically maintaining a box cover for a d-dimensional point set.
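For readers unfamiliar with the object being maintained, the classic static greedy computes a minimum piercing set in one pass over the intervals sorted by right endpoint; the paper's contribution is maintaining this set under updates, which the sketch below does not attempt.

```python
# Static minimum piercing set for intervals: sort by right endpoint and
# pierce at the right endpoint of the first interval not yet stabbed.
def min_piercing_set(intervals):
    points = []
    last = None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or left > last:     # not stabbed by previous point
            last = right
            points.append(right)
    return points

print(min_piercing_set([(1, 4), (2, 5), (6, 9), (8, 10)]))  # [4, 9]
```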
14.
Data distribution in memory or on disks is an important factor influencing the performance of parallel applications. On the other hand, programs or systems, such as a parallel file system, frequently redistribute data between memory and disks.
This paper presents a generalization of previous approaches to the redistribution problem. We introduce algorithms for mapping between two arbitrary distributions of a data set. The algorithms are optimized for multidimensional array partitions. We motivate our approach and present potential uses. The paper also presents a case study: the use of mapping functions and redistribution algorithms in a parallel file system. (A sketch of the core mapping computation follows.)
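A redistribution plan between two distributions reduces to computing, for every (source, destination) pair, the index ranges they exchange. The sketch below does this for 1-D block-cyclic layouts; the paper's algorithms handle general multidimensional partitions, and all names here are invented.

```python
from collections import defaultdict

def owner(index, block, nprocs):
    return (index // block) % nprocs        # block-cyclic owner

def redistribution_plan(n, src_block, src_p, dst_block, dst_p):
    """For each (source, destination) processor pair, list the maximal
    contiguous index chunks [lo, hi) that must be transferred."""
    plan = defaultdict(list)
    start = 0
    while start < n:
        s = owner(start, src_block, src_p)
        d = owner(start, dst_block, dst_p)
        end = start
        while (end < n and owner(end, src_block, src_p) == s
                       and owner(end, dst_block, dst_p) == d):
            end += 1                        # grow the common chunk
        plan[(s, d)].append((start, end))
        start = end
    return dict(plan)

for (s, d), chunks in redistribution_plan(16, 4, 2, 2, 4).items():
    print(f"P{s} -> Q{d}: {chunks}")
```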
15.
Abstract. The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [32] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency, and the powerful/fast search operations it supports. In this paper we analyze, both theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are state-of-the-art, the other three are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences, and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pay particular attention to differentiating between "random" and "contiguous" disk accesses, in order to reasonably explain some practical I/O phenomena related to the experimental behavior of these algorithms that would otherwise be meaningless in the light of other simpler external-memory models.
We also address two other issues. The first concerns the problem of building word indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming, thus achieving a uniform, simple, and efficient approach to both indexing models. The second is related to the intriguing and apparently counterintuitive "contradiction" between the effective practical performance of the well-known Baeza-Yates-Gonnet-Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, resulting in a novel approach which combines good worst-case bounds with efficient practical performance.
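For reference, the data structure itself is tiny to state: the suffix array of a text is the list of suffix start positions in lexicographic order of the suffixes. The naive internal-memory construction below defines it; the algorithms compared in the paper build the same array with blockwise disk access for texts far larger than RAM.

```python
def suffix_array(text):
    # positions of all suffixes, sorted by the suffix starting there
    return sorted(range(len(text)), key=lambda i: text[i:])

text = "banana"
for i in suffix_array(text):
    print(i, text[i:])
# 5 a | 3 ana | 1 anana | 0 banana | 4 na | 2 nana
```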
16.
Abstract. We describe a quantum black-box network computing the majority of N bits with zero-sided error ɛ using only 2N/3 + o(N) queries: the algorithm returns the correct answer with probability at least 1 - ɛ, and "I don't know" otherwise. Our algorithm is given as a randomized "XOR decision tree" for which the number of queries on any input is strongly concentrated around a value of at most 2N/3. We provide a nearly matching lower bound of 2N/3 - o(N) on the expected number of queries on a worst-case input in the randomized XOR decision tree model with zero-sided error o(1). Any classical randomized decision tree computing the majority on N bits with zero-sided error 1/2 has cost N.
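What an XOR decision tree buys is easiest to see classically: a query x_i ⊕ x_j reveals whether two bits agree, equal bits merge into weighted groups, and unequal groups cancel without affecting the majority. The sketch below implements that classical pairing strategy; it is not the paper's quantum procedure, and it achieves zero error with at most N queries rather than about 2N/3.

```python
def majority_xor(bits):
    queries = 0
    groups = [(i, 1) for i in range(len(bits))]   # (representative, weight)
    while len(groups) > 1:
        (i, wi), (j, wj) = groups.pop(), groups.pop()
        queries += 1
        if bits[i] ^ bits[j] == 0:                # XOR query: reps agree
            groups.append((i, wi + wj))           # merge equal-valued groups
        else:                                     # reps differ: cancel
            if wi > wj:
                groups.append((i, wi - wj))
            elif wj > wi:
                groups.append((j, wj - wi))
    if not groups:
        return None, queries                      # exact tie, no majority
    rep, _ = groups[0]
    queries += 1                                  # read the survivor's bit
    return bits[rep], queries

print(majority_xor([1, 0, 1, 1, 0, 1, 1]))        # (1, 7)
```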
17.
Iwata. Algorithmica 2008, 36(4):331-341
Abstract. This paper presents a new algorithm for computing the maximum degree δ_k(A) of a minor of order k in a matrix pencil A(s). The problem is of practical significance in the field of numerical analysis and systems control.
The algorithm adopts a general framework of "combinatorial relaxation" due to Murota. It first solves the weighted bipartite matching problem to obtain an estimate δ̂_k(A) on δ_k(A), and then checks if the estimate is correct, exploiting the optimal dual solution. In case of incorrectness, it modifies the matrix pencil A(s) to improve the estimate δ̂_k(A) without changing δ_k(A).
The present algorithm performs this matrix modification by an equivalence transformation with constant matrices, whereas the previous one uses biproper rational function matrices. Thus the present approach saves memory space and reduces the running time bound by a factor of rank A.
18.
Chor Ping Low. Information Processing Letters 2002, 83(6):315-321
Random Duplicated Assignment (RDA) is an approach in which video data is stored by assigning a number of copies of each data block to different, randomly chosen disks. It has been shown that this approach results in smaller response times and lower disk and RAM costs compared with the well-known disk striping techniques. Based on this storage approach, one has to determine, for each given batch of data blocks, from which disk each of the data blocks is to be retrieved. This is to be done in such a way that the maximum load of the disks is minimized. The problem is called the Retrieval Selection Problem (RSP). In this paper, we propose a new efficient algorithm for RSP. This algorithm is based on the breadth-first search approach and is able to guarantee optimal solutions for RSP in O(n^2 + mn) time, where m and n correspond to the number of data blocks and the number of disks, respectively. We will show that our proposed algorithm has a lower time complexity than an existing algorithm, called the MFS algorithm.
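To make the Retrieval Selection Problem concrete: each requested block has copies on several disks and must be fetched from exactly one of them so that the maximum per-disk load stays small. Below is a simple greedy baseline (assign each block to its currently least-loaded candidate disk, most-constrained blocks first); the paper's BFS-based algorithm, unlike this heuristic, guarantees an optimal assignment.

```python
def greedy_rsp(blocks, num_disks):
    """blocks: block id -> list of disks holding a copy of that block."""
    load = [0] * num_disks
    assignment = {}
    # handle the most constrained blocks (fewest replicas) first
    for blk, disks in sorted(blocks.items(), key=lambda kv: len(kv[1])):
        d = min(disks, key=lambda d: load[d])   # least-loaded replica
        assignment[blk] = d
        load[d] += 1
    return assignment, max(load)

blocks = {"b1": [0, 1], "b2": [0], "b3": [0, 2], "b4": [1, 2], "b5": [2]}
print(greedy_rsp(blocks, 3))   # max load 2 here, which is also optimal
```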
19.
We propose a SAO index to approximately answer arbitrary linear optimization queries in a sliding window of a data stream. It uses limited memory to maintain the most “important” tuples. At any time, for any linear optimization query, we can retrieve the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating a larger portion of the memory to the outer layers of the SAO index.
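A linear optimization query in this sense is simply: given a weight vector w, return the K tuples maximizing the dot product w·x. The sketch below answers one exactly over a plain list, which stands in for the small candidate set an SAO-style index would maintain; all names are illustrative.

```python
import heapq

def top_k_linear(tuples, w, k):
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return heapq.nlargest(k, tuples, key=score)   # K best under w . x

window = [(1, 5), (4, 1), (2, 3), (0, 6), (5, 0)]
print(top_k_linear(window, (2, 1), k=2))          # [(5, 0), (4, 1)]
```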
20.
Abstract. We show that the 2-Opt and 3-Opt heuristics for the traveling salesman problem (TSP) on the complete graph K_n produce a solution no worse than the average cost of a tour in K_n in a polynomial number of iterations. As a consequence, we get that the domination numbers of the 2-Opt, 3-Opt, Carlier-Villon, Shortest Path Ejection Chain, and Lin-Kernighan heuristics are all at least (n-2)!/2. An upper bound is shown on the domination number of the Christofides heuristic, and for the Double Tree heuristic and a variation of the Christofides heuristic the domination numbers are shown to be one (even if the edge costs satisfy the triangle inequality). Further, unless P = NP, no polynomial-time approximation algorithm exists for the TSP on the complete digraph with domination number at least (n-1)! - k for any constant k, or with domination number at least (n-1)! - ((k/(k+1))(n+r))! - 1 for any non-negative constants r and k such that (n+r) ≡ 0 mod (k+1). The complexities of finding the median value of the costs of all tours in the complete digraph, and of similar problems, are also studied.
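The 2-Opt heuristic the result applies to is the textbook local search: keep reversing a segment of the tour while doing so shortens it. A minimal version follows (the paper analyzes the quality such local optima are guaranteed to have; the code only makes the move explicit).

```python
def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # reverse the segment tour[i:j] and keep it if shorter
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(two_opt([0, 2, 1, 3], dist))   # a 2-Opt local optimum: [0, 1, 3, 2]
```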