20 similar documents found; search time: 78 ms
3.
Lars Arge 《Algorithmica》2003,37(1):1-24
We present a technique for designing external memory data structures that support batched operations I/O-efficiently. We show how the technique can be used to develop external versions of a search tree, a priority queue, and a segment tree, and give examples of how these structures can be used to develop I/O-efficient algorithms. The developed algorithms are either extremely simple or straightforward generalizations of known internal memory algorithms, given the developed external data structures.
4.
Abstract. High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of single-disk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks.
We show that a shared buffer of O(D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general.
If two randomly allocated copies of each block exist, N arbitrary blocks can be read within a near-optimal number of I/O steps with high probability. The redundancy can be further reduced from 2 to 1 + 1/r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's "single-disk multi-head" model [1] that allows access to D arbitrary blocks in each I/O step. This powerful model can be emulated on the physically more realistic independent disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multi-head model first. The emulation result can then be applied directly or further refinements can be added.
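The random-duplicate idea in this abstract can be illustrated with a small sketch (a hypothetical `read_schedule` helper, not the paper's actual matching-based scheduler): each block gets copies on two random disks, and a greedy reader fetches each block from the currently less-loaded copy.

```python
import random
from collections import defaultdict

def read_schedule(blocks, D, rng):
    """Greedy sketch of reading with two randomly allocated copies per block.

    Each disk can serve one block per I/O step, so the number of steps
    equals the maximum number of blocks assigned to any single disk.
    Choosing the less-loaded of the two copies (the "power of two
    choices" rule) keeps that maximum close to the optimum ceil(N/D).
    """
    placement = {b: rng.sample(range(D), 2) for b in blocks}
    load = defaultdict(int)
    for b in blocks:
        d1, d2 = placement[b]
        chosen = d1 if load[d1] <= load[d2] else d2
        load[chosen] += 1
    return max(load.values(), default=0)  # parallel I/O steps needed
```

The two-choice rule keeps the per-disk load nearly balanced; the paper's optimal scheduler and the 1 + 1/r redundancy reduction go beyond this sketch.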
5.
Abstract. Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new improved parallel EM algorithms for a large number of problems including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well known I/O complexity lower bounds for various problems, including sorting.
6.
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer
is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms.
A similar requirement arises in the transmission of data between processors in high performance parallel computation systems,
for which blockwise communication is a crucial issue.
We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel computers. Our examples originate as algorithms for parallel machines, and we adapt them to the EM situation where the queries and data structure are considered to be much larger than the size of the available internal memory.
This paper presents techniques to achieve blocking for I/O as well as for communication in multisearch on the BSP and EM-BSP models. We describe improvements to the 1-optimal BSP* multisearch algorithm of [8] which permit larger search trees to be handled. In the area of EM algorithms, new algorithms for multisearch in balanced trees are described. For search trees of size O(n log n), where n is the number of queries, we obtain a work-optimal, parallel, randomized EM multisearch algorithm whose I/O and communication time are smaller, asymptotically, than the computation time. We obtain a deterministic version via a similar technique. These algorithms are obtained via the simulation techniques of [12], [17], [13], [14], and [24].
For larger trees we describe a parallel EM algorithm which is 1-optimal with respect to computation, communication, and I/O (for suitable parameter constellations) and is also I/O-optimal. We give a lower bound on the number of I/O operations required for filtering n queries through a binary or multiway search tree of size m when m ≥ n^{2+ε}, for a constant ε > 0.
Online publication February 20, 2001.
7.
In this paper, we study the problem of implementing standard data structures on a hypercube multiprocessor. We present a technique for efficiently executing multiple independent search processes on a class of graphs called ordered h-level graphs. We show how this technique can be utilized to implement a segment tree on a hypercube, thereby obtaining O(log^2 n) time algorithms for solving the next element search problem, the trapezoidal decomposition problem, and the triangulation problem.
8.
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory
of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation.
In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited
success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors
on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine.
When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
9.
The block-cyclic data distribution is commonly used to organize array elements over the processors of a coarse-grained distributed
memory parallel computer. In many scientific applications, the data layout must be reorganized at run-time in order to enhance
locality and reduce remote memory access overheads. In this paper we present a general framework for developing array redistribution
algorithms. Using this framework, we have developed efficient algorithms that redistribute an array from one block-cyclic
layout to another.
Block-cyclic redistribution consists of index set computation, wherein the destination locations for individual data blocks are calculated, and data communication, wherein these blocks are exchanged between processors. The framework treats both these operations in a uniform and integrated way. We have developed efficient and distributed algorithms for index set computation that do not require any interprocessor communication. To perform data communication in a conflict-free manner, we have developed direct, indirect, and hybrid algorithms. In the direct algorithm, a data block is transferred directly to its destination processor. In an indirect algorithm, data blocks are moved from source to destination processors through intermediate relay processors. The hybrid algorithm is a combination of the direct and indirect algorithms.
Our framework is based on a generalized circulant matrix formalism of the redistribution problem and a general purpose distributed
memory model of the parallel machine. Our algorithms sustain excellent performance over a wide range of problem and machine
parameters. We have implemented our algorithms using MPI, to allow for easy portability across different HPC platforms. Experimental
results on the IBM SP-2 and the Cray T3D show superior performance over previous approaches. When the block size of the cyclic
data layout changes by a factor of K, the redistribution can be performed in O(log K) communication steps. This is true even when K is a prime number. In contrast, previous approaches take O(K) communication steps for redistribution.
Our framework can be used for developing scalable redistribution libraries, for efficiently implementing parallelizing compiler
directives, and for developing parallel algorithms for various applications. Redistribution algorithms are especially useful
in signal processing applications, where the data access patterns change significantly between computational phases. They
are also necessary in linear algebra programs, to perform matrix transpose operations.
Received June 1, 1997; revised March 10, 1998.
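The communication-free index set computation can be made concrete for the simple special case where the block size grows by a factor K (a hypothetical `redistribute_indices` helper; the paper's circulant-matrix framework handles general cyclic-to-cyclic cases):

```python
def redistribute_indices(n_blocks, P, K):
    """Destination mapping for a cyclic(x) -> cyclic(K*x) redistribution.

    In a block-cyclic layout over P processors, source block i lives on
    processor i % P.  After coarsening the block size by a factor K,
    block i belongs to superblock i // K, which lives on (i // K) % P.
    The mapping is purely local arithmetic: no interprocessor
    communication is needed to compute it.
    """
    return [((i % P), ((i // K) % P)) for i in range(n_blocks)]
```

Each processor can evaluate this mapping for its own blocks independently, which is what allows the index computation phase to proceed without communication.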
10.
Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of
disk scheduling algorithms has largely been experimental and little attempt has been made to develop algorithms with provable
performance guarantees.
We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function
that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all
the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear
we present an optimal polynomial-time solution.
The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally we consider the on-line version of the problem, in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ.
11.
Efficient Bulk Operations on Dynamic R-Trees
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently
massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating) efficiently.
In this paper we present a simple, yet efficient, technique for performing bulk update and query operations on multidimensional
indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient
indexing methods for spatial data. Our method uses ideas from the buffer-tree lazy buffering technique and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient in terms of I/O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments
showing that in practice our approach performs better than the previously best known bulk update methods with respect to update
time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique
is that in most cases it allows us to perform a batch of updates and queries simultaneously. To be able to do so is essential
in environments where queries have to be answered even while the index is being updated and reorganized.
Received November 15, 1998; revised May 3, 2001.
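As a rough illustration of the buffer-tree lazy buffering idea the abstract builds on (a toy in-memory model with a made-up `BufferedNode` class, not the authors' R-tree implementation): updates accumulate in a node's buffer and are pushed one level down only in batches, turning many random I/Os into a few blockwise ones.

```python
class BufferedNode:
    """Toy sketch of buffer-tree-style lazy buffering.

    Operations are appended to a node's in-memory buffer; only when the
    buffer overflows is the whole batch flushed one level down.  Each
    flush stands in for one batched, sequential disk write, replacing
    one random I/O per individual operation.
    """
    def __init__(self, capacity, children=None):
        self.capacity = capacity
        self.buffer = []
        self.children = children or []
        self.flushes = 0  # counts batched writes in this toy model

    def insert(self, op):
        self.buffer.append(op)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        self.flushes += 1
        batch, self.buffer = self.buffer, []
        if self.children:
            # Route the entire batch to children in one pass.
            for op in batch:
                child = self.children[hash(op) % len(self.children)]
                child.insert(op)
```

Note how ten inserts into a node of capacity four cause only two flushes; this amortization is what makes batched updates and queries I/O-efficient.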
12.
Ahmed Bouajjani Tayssir Touili 《International Journal on Software Tools for Technology Transfer (STTT)》2012,14(2):145-165
We consider the framework of regular tree model checking where sets of configurations of a system are represented by regular tree languages and its dynamics is modeled by a term
rewriting system (or a regular tree transducer). We focus on the computation of the reachability set R*(L), where R is a regular tree transducer and L is a regular tree language. The construction of this set is not possible in general. Therefore, we present a general acceleration technique, called regular tree widening, which allows speeding up the convergence of iterative fixpoint computations in regular tree model checking. This technique
can be applied uniformly to various kinds of transformations. We show the application of our framework to different analysis
contexts: verification of parameterized tree networks and data-flow analysis of multithreaded programs. Parametrized networks
are modeled by relabeling tree transducers, and multithreaded programs are modeled by term rewriting rules encoding transformations
on control structures. We prove that our widening technique can emulate many existing algorithms for special classes of transformations
and we show that it can deal with transformations beyond the scope of these algorithms.
13.
Abstract. We present a new approach for designing external graph algorithms and use it to design simple, deterministic and randomized
external algorithms for computing connected components, minimum spanning forests, bottleneck minimum spanning forests, maximal
independent sets (randomized only), and maximal matchings in undirected graphs. Our I/O bounds compete with those of previous approaches. We also introduce a semi-external model, in which the vertex set but
not the edge set of a graph fits in main memory. In this model we give an improved connected components algorithm, using new
results for external grouping and sorting with duplicates. Unlike previous approaches, ours is purely functional—without side
effects—and is thus amenable to standard checkpointing and programming language optimization techniques. This is an important
practical consideration for applications that may take hours to run.
14.
Photorealistic image synthesis is a computationally demanding task that relies on ray tracing for the evaluation of integrals. Rendering time is dominated by tracing long paths that are very incoherent by construction. We therefore investigate the use of SIMD instructions to accelerate incoherent rays. SIMD is used in the hierarchy construction, the tree traversal and the leaf intersection. This is achieved by increasing the arity of acceleration structures, which also reduces memory requirements. We show that the resulting hierarchies can be built quickly and are smaller than acceleration structures known so far while at the same time outperforming them for incoherent rays. Our new acceleration structure speeds up ray tracing by a factor of 1.6 to 2.0 compared to a highly optimized bounding interval hierarchy implementation, and 1.3 to 1.6 compared to an efficient kd-tree. At the same time, the memory requirements are reduced by 10-50%. Additionally we show how a caching mechanism in conjunction with this memory efficient hierarchy can be used to speed up shadow rays in a global illumination algorithm without increasing the memory footprint. This optimization decreased the number of traversal steps up to 50%.
15.
As trees are used in a wide variety of application areas, the comparison of trees arises in many guises. Here we consider
two generalizations of classical tree pattern matching, which consists of determining if one tree is isomorphic to a subgraph
of another. For the embedding problems of subgraph isomorphism and topological embedding, we present algorithms for determining
a largest tree embeddable in two trees T and T' (or a largest subtree) and a smallest tree in which each of T and T' can be embedded (or a smallest supertree). Both subtrees and supertrees can be used in a variety of different applications. For example, when each of the two trees
contains partial information about a data set, such as the evolution of a set of species, the subtree or supertree corresponds
to a structuring of the data in a manner consistent with both original trees. The size of a subtree or supertree of two trees
can also be used to measure the similarity between two arrangements of data, whether images, documents, or RNA secondary structures.
In this paper we present a general paradigm for sequential and parallel subtree and supertree algorithms for subgraph isomorphism and topological embedding. Our sequential algorithms run in time O(n^{2.5} log n) and our parallel algorithms in time O(log^3 n) on a randomized CREW PRAM using a polynomial number of processors. In addition, we produce better algorithms for these problems when the underlying trees are ordered, that is, when the children of each node have a left-to-right ordering associated with them. In particular, we obtain O(n^2)-time sequential algorithms and O(log^3 n)-time deterministic parallel algorithms on CREW PRAMs for both embeddings.
Received July 17, 1995; revised May 25, 1996, and December 10, 1996.
16.
We systematically study data structures used to implement algorithms for association rule mining, including the hash tree, the itemset tree, and the FP-tree (frequent pattern tree). Further, we present a generalized FP-tree in an applied context. This assists in better understanding existing association-rule-mining strategies. In addition, we discuss and analyze experimentally the generalized k-FP-tree, and demonstrate that the generalized FP-tree reduces computation costs significantly. This study will be useful for many association analysis tasks where one must provide truly interesting rules and develop efficient algorithms for identifying association rules.
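A minimal FP-tree construction sketch, showing only the shared-prefix compression that the generalized FP-tree builds on (no header-table links or mining step; `build_fp_tree` is an illustrative helper, not the paper's generalized variant):

```python
from collections import Counter

class FPNode:
    __slots__ = ("item", "count", "children")
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(transactions, min_support):
    """Build a basic FP-tree: count items, drop infrequent ones, then
    insert each transaction along a prefix path ordered by descending
    frequency.  Shared prefixes collapse into shared paths, which is
    what lets the tree compress the transaction database.
    """
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_support}
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for i in items:
            node = node.children.setdefault(i, FPNode(i))
            node.count += 1
    return root
```

For the transactions {a,b}, {a,c}, {a,b,c} with minimum support 2, all three insertions share the single "a" path, so the root has one child of count 3 rather than three separate branches.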
17.
Abstract. We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory. In the external memory model of computation with block size B, a memory consisting of m blocks, and at most m sequences, the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct-mapped cache (a = 1), two sequences suffice to cause this behavior. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O(N/B) provided that k = O(m/B^{1/a}), i.e., the number of sequences that can be supported is decreased by a factor of B^{1/a}. We also show a corresponding lower bound.
Our result leads to a general method for converting sequence-based algorithms designed for the external memory model of computation to cache memory, even for caches with small associativity.
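The effect of randomized starting addresses can be seen in a toy direct-mapped cache simulation (a hypothetical `cache_misses` helper; the cache geometry of 64 lines of 64 bytes is an assumption for illustration): sequences whose addresses collide modulo the cache size evict each other on every access, while a random offset breaks the collision.

```python
def cache_misses(starts, accesses, cache_blocks, block_size):
    """Count misses in a toy direct-mapped cache during round-robin scans.

    Each of the k sequences begins at starts[i]; pointers are advanced
    round-robin, one byte per access.  A block maps to cache line
    (block_number % cache_blocks), so aligned sequences thrash the same
    line and miss on every access.
    """
    lines = {}   # cache line -> resident block tag
    misses = 0
    pos = list(starts)
    for j in range(accesses):
        i = j % len(starts)
        block = pos[i] // block_size
        line = block % cache_blocks
        if lines.get(line) != block:
            lines[line] = block
            misses += 1
        pos[i] += 1
    return misses
```

Two sequences starting at 0 and 4096 (exactly one 4 KB cache apart) miss on all 256 accesses; shifting the second start by a few blocks drops the count to the four compulsory block misses, which is the O(N/B) behavior the randomization achieves.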
18.
Jeffrey Xu Yu Zhiheng Li Guimei Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(4):947-970
Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the
efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs
and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods
for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper,
we present a data mining proxy approach that can reduce the I/O costs of constructing an initial tree by utilizing trees that are already resident in memory. The tree we construct
is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns,
because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently.
We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from
on-disk trees or in-memory trees. Our performance study indicates that the data mining proxy significantly reduces the I/O cost to construct trees and the CPU cost to mine patterns over the trees constructed.
19.
In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present a time- and space-efficient periodicity detection method that uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses a suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out-of-core. First, we optimize the storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases whose data structures require considerably more space than the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept partially on external memory when a memory overflow occurs. Various pruning strategies are also proposed to allow the approach to process large sequences within a reasonable amount of time. Additionally, we introduce the notion of "emulated tree traversal" for fast suffix tree traversal. Thanks to these special considerations, we are able to mine much larger sequences than other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection on up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest sequence ever reported as mined for periodicity detection.
20.
Parallelizing the Data Cube
Frank Dehne Todd Eavis Susanne Hambrusch Andrew Rau-Chaplin 《Distributed and Parallel Databases》2002,11(2):181-201
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.
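The load-balancing step common to both partitioning strategies can be sketched with a standard greedy (LPT-style) heuristic over estimated subcube costs (a hypothetical `assign_subcubes` helper; the paper's top-down strategy actually partitions a weighted tree rather than a flat task list):

```python
import heapq

def assign_subcubes(weights, p):
    """Greedy load balancing of coarse subcube tasks onto p processors.

    weights[i] is an algorithm-specific cost estimate, e.g. an estimated
    group-by size.  Tasks are considered in decreasing cost order and
    each goes to the currently least-loaded processor (longest
    processing time first), a classic balancing heuristic.
    """
    loads = [(0, proc) for proc in range(p)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(p)]
    for task, w in sorted(enumerate(weights), key=lambda t: -t[1]):
        load, proc = heapq.heappop(loads)
        assignment[proc].append(task)
        heapq.heappush(loads, (load + w, proc))
    return assignment
```

Because each processor then runs an existing sequential (external memory) cube algorithm on its assigned subcubes, balancing these coarse tasks is what drives the near-optimal speedup reported in the abstract.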