1.
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory
of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation.
In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited
success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors
on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine.
When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
2.
Hash tables on external memory are commonly used for indexing in database management systems. In this paper we present an
algorithm that, in an asymptotic sense, achieves the best possible I/O and space complexities. Let B denote the number of records that fit in a block, and let N denote the total number of records. Our hash table uses
I/Os, expected, for looking up a record (no matter if it is present or not). To insert, delete or change a record that has
just been looked up requires
I/Os, amortized expected, including I/Os for reorganizing the hash table when the size of the database changes. The expected
external space usage is
times the optimum of N/B blocks, and just O(1) blocks of internal memory are needed.
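The blocked access model behind such bounds can be illustrated with a toy hash table that charges one I/O per block of B records touched. This is only a sketch of the setting, not the paper's construction; the class, the bucketing scheme, and all parameter names are our own.

```python
# A toy model of a blocked (external-memory) hash table: records are grouped
# into buckets of at most B records per block, so a lookup normally costs one
# block read. Illustrative only -- not the paper's data structure.

class ExternalHashTable:
    def __init__(self, num_buckets, block_size):
        self.B = block_size          # records that fit in one block
        self.buckets = [[] for _ in range(num_buckets)]
        self.io_count = 0            # simulated block reads/writes

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def lookup(self, key):
        bucket = self.buckets[self._bucket(key)]
        # Each group of B records in a bucket is one block: reading the whole
        # bucket costs ceil(len/B) I/Os, i.e. ~1 I/O when overflow is rare.
        self.io_count += max(1, -(-len(bucket) // self.B))
        for k, v in bucket:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        bucket = self.buckets[self._bucket(key)]
        self.io_count += max(1, -(-len(bucket) // self.B)) + 1  # read + write back
        bucket.append((key, value))

t = ExternalHashTable(num_buckets=64, block_size=8)
for i in range(100):
    t.insert(i, i * i)
assert t.lookup(17) == 289
assert t.lookup(1000) is None
```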
3.
We analyze the performance of simple algorithms for matching two planar point sets under rigid transformations so as to minimize
the directed Hausdorff distance between the sets. This is a well-studied problem in computational geometry. Goodrich, Mitchell,
and Orletsky presented a very simple approximation algorithm for this problem, which computes transformations based on aligning
pairs of points. They showed that their algorithm achieves an approximation ratio of 4. We introduce a modification to their
algorithm, which is based on aligning midpoints rather than endpoints. This modification has the same simplicity and running
time as theirs, and we show that it achieves a better approximation ratio of roughly 3.14. We also analyze the approximation ratio in terms of an instance-specific parameter that is based on the ratio of the diameter of the pattern set to the optimum Hausdorff distance. We show that as this ratio increases (as is common in practical applications) the approximation ratio
approaches 3 in the limit. We also investigate the performance of the algorithm by Goodrich et al. as a function of this ratio,
and present nearly matching lower bounds on the approximation ratios of both algorithms.
This work was supported by the National Science Foundation under grants CCR-0098151 and CCF-0635099.
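The alignment idea can be sketched in a few lines. To stay short, the sketch below restricts rigid motions to translations, and all function names are our own: it aligns each pattern point with each target point and keeps the transformation with the smallest directed Hausdorff distance.

```python
# A brute-force sketch of alignment-based matching under translations only
# (the paper treats rigid motions; translations keep the illustration short).
import math

def directed_hausdorff(A, B):
    """Max over a in A of the distance to the nearest b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def align_and_match(pattern, target):
    """Translate each pattern point onto each target point and keep the
    translation achieving the smallest directed Hausdorff distance."""
    best = math.inf
    for px, py in pattern:
        for qx, qy in target:
            dx, dy = qx - px, qy - py
            shifted = [(x + dx, y + dy) for x, y in pattern]
            best = min(best, directed_hausdorff(shifted, target))
    return best

pattern = [(0, 0), (1, 0), (0, 1)]
target = [(5, 5), (6, 5), (5, 6), (9, 9)]
# The pattern is an exact translate of the first three target points.
assert align_and_match(pattern, target) == 0.0
```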
4.
Point location is an extremely well-studied problem both in internal memory models and recently also in the external memory model. In this paper, we present an I/O-efficient dynamic data structure for point location in general planar subdivisions. Our structure uses linear space to store a subdivision with N segments. Insertions and deletions of segments can be performed in amortized $O(\log_{B} N)$ I/Os and queries can be answered in $O(\log_{B}^{2} N)$ I/Os in the worst case. The previous best known linear space dynamic structure also answers queries in $O(\log_{B}^{2} N)$ I/Os, but only supports insertions in amortized $O(\log_{B}^{2} N)$ I/Os. Our structure is also considerably simpler than previous structures.
5.
The segment tree is a well-known internal data structure with numerous applications in computational geometry. It allows the dynamic maintenance of a set of intervals such that the intervals enclosing a query point can be found efficiently (point enclosure search). In this paper we transfer the underlying principle of the segment tree in a nontrivial way to secondary storage and arrive at the EST, an external file structure with the same functionality and the following properties: (1) Point enclosure searches are very efficient: only very few pages are accessed that are not filled to more than 50% with result intervals. (2) A page filling of 50% is guaranteed; on average it will be around 70%. Although the segment tree represents, in the worst case, each interval by a logarithmic number of fragments, in practical cases fragmentation remains low and the storage requirements are about linear. (3) The EST is balanced and the update algorithms are efficient. (4) Unlike many other file structures for spatial objects, the EST has no problems with an arbitrary density, that is, an arbitrarily large number of intervals covering any point of the line.
Furthermore, the EST can be used as a file structure constructor in the following sense: Let there be a file structure X supporting searches for objects with property x, and suppose it is necessary to maintain a collection of objects with associated (e.g., time) intervals. Then an EST-X structure that supports searches for objects with property x present at time t can be built. This suggests using the EST as a building block in the implementation of temporal database systems. Other applications include the one-dimensional indexing of collections of spatial objects in two or more dimensions. More generally, this paper shows techniques for mapping internal tree structures with node lists (other examples: range tree, interval tree) to secondary memory.
In this context an intriguing theoretical problem, the cover-balancing problem, is solved: Given a tree whose nodes have associated weights, partitioned into subtrees whose weights must lie in a certain range, maintain this partition under weight changes at arbitrary nodes. This is in contrast to classical balancing problems where updates occur only at the leaves. This work was supported by the DFG (Deutsche Forschungsgemeinschaft) under Grant Cr 65/2-5.
6.
L-OPT, an optimal prefetching and I/O scheduling algorithm for parallel I/O systems under a read-once model of block references, is presented. The algorithm uses knowledge of the next $L$ references, $L$-block lookahead, to create a minimal-length I/O
schedule. For a system with $D$ disks and a buffer of capacity $m$ blocks, we show that the competitive ratio of
L-OPT is $\Theta(\sqrt{mD/L})$ when $L \geq m$, which matches the lower bound of any prefetching algorithm with $L$-block
lookahead. Tight bounds for the remaining ranges of lookahead are also presented. In addition we show that L-OPT is the optimal
offline algorithm: when the lookahead consists of the entire reference string, it performs the absolute minimum possible number of I/Os. Finally, we show that L-OPT is comparable to the best online algorithm with the same amount of lookahead; the ratio of the length of its schedule to the length of the optimal schedule is always within a constant factor.
7.
Abstract. We present a new scheme for storing a planar graph in external memory so that any online path can be traversed in an I/O-efficient way. Our storage scheme significantly improves the previous results for planar graphs with bounded face size. We also prove an upper bound on the I/O efficiency of any storage scheme for well-shaped triangulated meshes. For these meshes, our storage scheme achieves optimal performance.
8.
In this paper, we study the packet classification problem and the filter conflict resolution problem, both motivated by applications
in the routing of packets in an IP network. For the first problem, we focus on static 2-dimensional conflict-free (i.e., nested) filters. We design a linear space data structure with O(T_w(n) + (log log n)^2) query time on a RAM with word size O(w) bits, where n is the number of filters in the router and w is the number of bits in an IP address. This is the first optimal space data structure with poly-logarithmic query time for this problem.
In practice, network filters often contain very few conflicts but are not completely conflict-free. Fortunately, conflicts
can be resolved by adding conflict-resolving filters. Moreover, practical filters often possess another slightly different
nesting property which we call 1-nestedness. We present an algorithm to resolve conflicts in a set of 1-nested filters in O(nT_w(n) + k) time and linear space, where k is the number of filter pairs in conflict. Furthermore, we show that our data structure for the first problem can be adapted to apply to conflict-resolved 1-nested filters with the same query and space complexities.
This research was fully supported by a grant from the Research Grants Council of the Hong Kong SAR, China (City U 1164/04E).
A preliminary version appeared in ISPAN’04.
9.
Abstract. We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory. In the external memory model of computation with block size B, a memory consisting of m blocks, and at most m sequences, the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct-mapped cache (a = 1), two sequences suffice. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O(N/B) provided that k = O(m/B^{1/a}), i.e., the number of sequences that can be supported is decreased by a factor B^{1/a}. We also show a corresponding lower bound.
Our result leads to a general method for converting sequence-based algorithms designed for the external memory model of computation to cache memory, even for caches with small associativity.
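The effect described above can be reproduced with a toy direct-mapped cache simulation. Everything below (the parameters, and choosing distinct block-aligned random offsets so the demonstration is deterministic) is our own illustration, not the paper's analysis.

```python
# A toy direct-mapped cache (associativity a = 1): k sequences whose starting
# addresses map to the same cache set thrash under a round-robin scan, while
# randomized starting offsets avoid the conflicts almost entirely.
import random

def scan_misses(starts, steps, num_sets, block=16):
    cache = [None] * num_sets              # one cached block tag per set
    pos = list(starts)
    misses = 0
    for _ in range(steps):
        for i in range(len(pos)):          # the adversary: plain round-robin
            blk = pos[i] // block
            s = blk % num_sets             # direct-mapped placement
            if cache[s] != blk:
                misses += 1
                cache[s] = blk
            pos[i] += 1
    return misses

random.seed(1)
k, num_sets, block, stride = 4, 64, 16, 4096
aligned = [i * stride for i in range(k)]   # starts collide in the same set
# Distinct random block-aligned offsets: each sequence gets its own set.
offsets = [s * block for s in random.sample(range(num_sets), k)]
randomized = [i * stride + off for i, off in zip(range(k), offsets)]
assert scan_misses(aligned, 100, num_sets) > scan_misses(randomized, 100, num_sets)
```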
10.
Abstract. We present a new approach for designing external graph algorithms and use it to design simple, deterministic and randomized external algorithms for computing connected components, minimum spanning forests, bottleneck minimum spanning forests, maximal independent sets (randomized only), and maximal matchings in undirected graphs. Our I/O bounds compete with those of previous approaches. We also introduce a semi-external model, in which the vertex set but not the edge set of a graph fits in main memory. In this model we give an improved connected components algorithm, using new results for external grouping and sorting with duplicates. Unlike previous approaches, ours is purely functional—without side effects—and is thus amenable to standard checkpointing and programming language optimization techniques. This is an important practical consideration for applications that may take hours to run.
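The semi-external model can be illustrated with the textbook union-find approach to connected components: a per-vertex array fits in main memory while edges are merely streamed past it. This is a generic sketch of the model, not the paper's improved algorithm.

```python
# Semi-external connected components: O(|V|) internal memory (the union-find
# parent array), edges consumed one at a time from an arbitrary stream.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def connected_components(n, edge_stream):
    uf = UnionFind(n)            # the "semi-external" in-memory budget
    for u, v in edge_stream:     # edges may come from disk, one at a time
        uf.union(u, v)
    return len({uf.find(v) for v in range(n)})

edges = iter([(0, 1), (1, 2), (3, 4)])
assert connected_components(6, edges) == 3   # {0,1,2}, {3,4}, {5}
```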
11.
Jop F. Sibeyn 《Information Processing Letters》2004,91(2):99-106
Algorithms are presented for external matrix multiplication and for all-pairs shortest path computation. In comparison with earlier algorithms, the amount of I/O is reduced by a constant factor. The all-pairs shortest path algorithm even performs fewer internal operations, making the algorithm practically interesting.
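The standard device behind I/O-efficient matrix multiplication is blocking: operate on b-by-b tiles that fit in fast memory, so each transferred block is reused many times. The sketch below illustrates only this blocking idea, not Sibeyn's improved constant factors.

```python
# Blocked (tiled) matrix multiplication: the three outer loops walk tiles,
# so each tile of A and B is held "in core" while b^3 work is done on it.

def blocked_matmul(A, B, b):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):            # tile row of C
        for j0 in range(0, n, b):        # tile column of C
            for k0 in range(0, n, b):    # tiles of A and B brought in
                for i in range(i0, min(i0 + b, n)):
                    for k in range(k0, min(k0 + b, n)):
                        aik = A[i][k]
                        for j in range(j0, min(j0 + b, n)):
                            C[i][j] += aik * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert blocked_matmul(A, B, b=1) == [[19.0, 22.0], [43.0, 50.0]]
```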
12.
Evaluation of Point-in-Polygon Determination Algorithms for Planar Polygons
Traditional algorithm evaluation has mainly been based on analyses of time complexity and space complexity, and the result is typically an algebraic expression containing several parameters. As computer hardware and software technology develops, the criteria used to evaluate algorithms should evolve accordingly; moreover, as evaluation techniques mature, an evaluation should where possible yield an explicit quantitative value. This paper proposes a set of evaluation criteria comprising six factors: convenience, practicality, speed, applicability, complexity, and correctness, and explains the meaning of each criterion together with a method for quantifying it. Taking the problem of determining whether a point lies inside or outside a planar polygon as the background, seven representative algorithms are evaluated quantitatively according to the proposed criteria. Experimental data show that the proposed method is reasonable, correct, and feasible.
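One of the classic point-in-polygon tests such a study would cover is ray casting (the crossing-number test). A minimal sketch, with no special handling of points lying exactly on an edge:

```python
# Ray casting: shoot a horizontal ray to the right of the query point and
# count edge crossings; an odd count means the point is inside.

def point_in_polygon(pt, poly):
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt, and is the
        # crossing strictly to the right of pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
assert point_in_polygon((2, 2), square)
assert not point_in_polygon((5, 2), square)
```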
13.
On the false-positive rate of Bloom filters
Prosenjit Bose, Hua Guo, Anil Maheshwari, Jason Morrison, Yihui Tang 《Information Processing Letters》2008,108(4):210-213
Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.
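For reference, here is a minimal Bloom filter together with Bloom's classical estimate (1 − (1 − 1/m)^(kn))^k, the quantity whose analysis the paper revisits. The double-hashing scheme below is our own illustrative choice, not from the paper.

```python
# A small Bloom filter (m bits, k hash functions) plus Bloom's 1970
# false-positive estimate. Bloom filters never produce false negatives.
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def bloom_estimate(m, k, n):
    """Bloom's classical false-positive estimate after n insertions."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k

bf = BloomFilter(m=1024, k=4)
keys = [f"key{i}" for i in range(100)]
for key in keys:
    bf.add(key)
assert all(key in bf for key in keys)   # no false negatives, ever
```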
14.
Alon Itai 《Information Processing Letters》2007,104(6):200-204
The Sparse Table is a data structure for controlling density in an array which was first proposed in 1981 and has recently reappeared as a component of cache-oblivious data structures. All existing variants of the Sparse Table divide the array into blocks that have a calibrator tree above them. We show that the same amortized complexity can be achieved without this auxiliary structure, obtaining a canonical data structure that can be updated by conceptually simpler algorithms.
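The density-control idea can be sketched as follows. This toy keeps sorted keys in an array with gaps and, for brevity, re-spreads the whole array when it exceeds its density bound, rather than rebalancing per-region as real Sparse Table variants do; all names and parameters are our own.

```python
# A toy "sparse table": sorted keys in an array with gaps (None), kept below
# a density bound so single insertions usually only shift a short run of keys.

class SparseTable:
    def __init__(self, capacity=8, max_density=0.5):
        self.slots = [None] * capacity
        self.count = 0
        self.max_density = max_density

    def _rebuild(self, extra=None):
        ks = sorted(k for k in self.slots if k is not None)
        if extra is not None:
            ks = sorted(ks + [extra])
        size = max(8, int(2 * len(ks) / self.max_density))  # leave slack
        self.slots = [None] * size
        step = size / max(1, len(ks))
        for i, k in enumerate(ks):          # spread keys out evenly
            self.slots[int(i * step)] = k
        self.count = len(ks)

    def insert(self, key):
        if self.count + 1 > self.max_density * len(self.slots):
            self._rebuild(extra=key)
            return
        # find the first occupied slot holding a key greater than ours
        i = 0
        while i < len(self.slots) and not (
            self.slots[i] is not None and self.slots[i] > key
        ):
            i += 1
        # shift the contiguous occupied run starting at i into the next gap
        j = i
        while j < len(self.slots) and self.slots[j] is not None:
            j += 1
        if j == len(self.slots):            # no gap to the right: re-spread
            self._rebuild(extra=key)
            return
        self.slots[i + 1 : j + 1] = self.slots[i:j]
        self.slots[i] = key
        self.count += 1

    def keys(self):
        return [k for k in self.slots if k is not None]

st = SparseTable()
for x in [5, 3, 8, 1, 9, 2, 7]:
    st.insert(x)
assert st.keys() == [1, 2, 3, 5, 7, 8, 9]
```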
15.
We present an algorithm that takes O(sort(N)) I/Os (sort(N) = Θ((N/(DB)) log_{M/B}(N/B)) is the number of I/Os it takes to sort N data items) to compute a tree decomposition of width at most k, for any graph G of treewidth at most k and size N, where k is a constant. Given such a tree decomposition, we use a dynamic programming framework to solve a wide variety of problems on G in O(sort(N)) I/Os, including the single-source shortest path problem and a number of problems that are NP-hard on general graphs. The tree decomposition can also be used to obtain an optimal separator decomposition of G. We use such a decomposition to perform depth-first search in G in O(sort(N)) I/Os.
As important tools that are used in the tree decomposition algorithm, we introduce flippable DAGs and present an algorithm that computes a perfect elimination ordering of a k-tree in O(sort(N)) I/Os.
The second contribution of our paper, which is of independent interest, is a general and simple framework for obtaining I/O-efficient algorithms for a number of graph problems that can be solved using greedy algorithms in internal memory. We apply this framework in order to obtain an improved algorithm for finding a maximal matching and the first deterministic I/O-efficient algorithm for finding a maximal independent set of an arbitrary graph. Both algorithms take O(sort(N)) I/Os. The maximal matching algorithm is used in the tree decomposition algorithm.
An abstract of this paper was presented at the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, Proceedings, pp. 89–90,
2001.
Research of A. Maheshwari supported by NSERC.
Part of this work was done while the second author was a Ph.D. student at the School of Computer Science of Carleton University.
16.
Lars Arge 《Algorithmica》2008,37(1):1-24
We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms—given the developed external data structures.
17.
The high performance required of a real-time database system (RTDBS) calls for a main-memory database (MMDB) as its underlying support. To guarantee that the MMDB incurs no I/O between internal and external memory during transaction execution, a sensible storage structure must be designed, appropriate data placement policies provided, and a sound algorithm given for swapping data between internal and external memory. This paper presents a four-level storage architecture suited to real-time applications, discusses in detail the characteristics of data and transactions in real-time environments, and, based on these characteristics, gives an algorithm for swapping data between internal and external memory.
18.
Adjacency matrix algorithms have extremely important applications in scientific computing and information processing, and are part of the foundations of graph theory. Existing adjacency matrix algorithms are mostly serial, or are based on parallel SIMD models that cannot avoid memory-access conflicts. This paper proposes a parallel adjacency matrix algorithm based on the SIMD-EREW shared-memory model; using O(p) parallel processing units, the algorithm computes the adjacency matrix of n data points in O(n^2/p) time. A performance comparison with existing algorithms shows that the proposed algorithm clearly improves on previous results and is a parallel adjacency matrix algorithm free of memory-access conflicts.
19.
Flash memory efficient LTL model checking
S. Edelkamp, D. Sulewski, J. Barnat, L. Brim, P. Šimeček 《Science of Computer Programming》2011,76(2):136-157
As the capacity and speed of flash memories in the form of solid state disks grow, they are becoming a practical alternative to standard magnetic drives. Currently, most solid-state disks are based on NAND technology and are much faster than magnetic disks in random reads, while in random writes they are generally not.
So far, large-scale LTL model checking algorithms have been designed to employ external memory optimized for magnetic disks. We propose algorithms optimized for flash memory access. In contrast to approaches relying on the delayed detection of duplicate states, in this work we design and exploit appropriate hash functions to re-invent immediate duplicate detection.
For flash memory efficient on-the-fly LTL model checking, which aims at finding any counterexample to the specified LTL property, we study hash functions adapted to the two-level hierarchy of RAM and flash memory. For flash memory efficient off-line LTL model checking, which aims at generating a minimal counterexample and scans the entire state space at least once, we analyze the effect of outsourcing a memory-based perfect hash function from RAM to flash memory.
Since the characteristics of flash memories differ from those of magnetic hard disks, the existing I/O complexity model is no longer sufficient. Therefore, we provide an extended model for the computation of the I/O complexity adapted to flash memories that better fits the observed behavior of our algorithms.
20.
Taehyung Lee 《Information Processing Letters》2011,111(5):201-207
We consider on-line construction of the suffix tree for a parameterized string, where we always have the suffix tree of the input string read so far. This situation often arises from source code management systems where, for example, a source code repository is gradually increasing in its size as users commit new codes into the repository day by day. We present an on-line algorithm which constructs a parameterized suffix tree in randomized O(n) time, where n is the length of the input string. Our algorithm is the first randomized linear time algorithm for the on-line construction problem.