20 similar documents found; search time: 15 ms
1.
Speeding up construction of PMR quadtree-based spatial indexes (total citations: 5, self-citations: 0, other citations: 5)
Gisli R. Hjaltason Hanan Samet 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):109-137
Spatial indexes, such as those based on the quadtree, are important in spatial databases for efficient execution of queries
involving spatial constraints, especially when the queries involve spatial joins. In this paper we present a number of techniques
for speeding up the construction of quadtree-based spatial indexes, specifically the PMR quadtree, which can index arbitrary
spatial data. We assume a quadtree implementation using the “linear quadtree”, a disk-resident representation that stores
objects contained in the leaf nodes of the quadtree in a linear index (e.g., a B-tree) ordered based on a space-filling curve.
We present two complementary techniques: an improved insertion algorithm and a bulk-loading method. The bulk-loading method
can be extended to handle bulk-insertions into an existing PMR quadtree. We make some analytical observations about the I/O
cost and CPU cost of our PMR quadtree bulk-loading algorithm, and conduct an extensive empirical study of the techniques presented
in the paper. Our techniques are found to yield significant speedup compared to traditional quadtree building methods, even
when the size of a main memory buffer is very small compared to the size of the resulting quadtrees.
Edited by R. Sacks-Davis. Received: July 10, 2001 / Accepted: March 25, 2002 / Published online: September 25, 2002
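The "linear quadtree" representation above orders leaf nodes by a space-filling curve. As an illustrative sketch only (not the paper's implementation), a Z-order (Morton) key can be computed by interleaving the bits of a cell's coordinates; the function name and bit width here are assumptions for the example:

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y to form a Z-order (Morton) key.

    In a linear quadtree, leaf nodes are stored in a B-tree ordered by
    such keys, so spatially nearby cells tend to be nearby on disk.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits at odd positions
    return key
```

For instance, the four cells of a 2x2 grid get the consecutive keys 0, 1, 2, 3 in Z order.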
2.
Simonas Šaltenis Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):1-16
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases
that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by
the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are
needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions
to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On
the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the
first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of
an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion
of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive
queries. Performance experiments are reported that evaluate pertinent aspects of the index.
Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 / Published online: 18 December 2001
3.
Peter Muth Patrick O'Neil Achim Pick Gerhard Weikum 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):199-221
Numerous applications such as stock market or medical information systems require that both historical and current data be
logically integrated into a temporal database. The underlying access method must support different forms of “time-travel”
queries, the migration of old record versions onto inexpensive archive media, and high insertion and update rates. This paper
presents an access method for transaction-time temporal data, called the log-structured history data access method (LHAM)
that meets these demands. The basic principle of LHAM is to partition the data into successive components based on the timestamps
of the record versions. Components are assigned to different levels of a storage hierarchy, and incoming data is continuously
migrated through the hierarchy. The paper discusses the LHAM concepts, including concurrency control and recovery, our full-fledged
LHAM implementation, and experimental performance results based on this implementation. A detailed comparison with the TSB-tree,
both analytically and based on experiments with real implementations, shows that LHAM is highly superior in terms of insert
performance, while query performance is in almost all cases at least as good as for the TSB-tree; in many cases it is much
better.
Received March 4, 1999 / Accepted September 28, 1999
4.
Two methods for cloud visualisation from weather simulation data (total citations: 2, self-citations: 0, other citations: 2)
Andrzej Trembilski 《The Visual computer》2001,17(3):179-184
For the local TV presentation of weather forecast data, it is important to have high-quality and fast visualisation of clouds. In this paper, I present methods for the visualisation of clouds from data produced by a meteorological weather simulation. Isosurfaces, which are originally too coarse because of the data grid resolution, are refined and deformed. The resulting geometry is used for cloud visualisation.
5.
Searching in metric spaces by spatial approximation (total citations: 5, self-citations: 0, other citations: 5)
Gonzalo Navarro 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):28-46
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those
objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data
structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer
to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that
the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most
important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not
need to tune parameters, which makes it appealing for use by non-experts.
Edited by R. Sacks-Davis. Received: 17 April 2001 / Accepted: 24 January 2002 / Published online: 14 May 2002
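The sa-tree itself is more involved than can be shown here, but the metric-space setting of the abstract, where the complexity measure is the number of distance computations, can be illustrated with a simpler pivot-based filter that uses the triangle inequality to discard candidates. This is a generic sketch of the setting, not the sa-tree algorithm:

```python
def range_search(objects, dist, query, radius, pivot):
    """Pivot-based range search in a metric space.

    Precomputed distances to a pivot let the triangle inequality
    discard candidates without evaluating dist(query, o). As in the
    paper, the cost measure counts distance evaluations at query time.
    """
    pivot_to_obj = {o: dist(pivot, o) for o in objects}  # preprocessing
    dq = dist(query, pivot)
    evaluations = 1
    results = []
    for o in objects:
        # |d(q,p) - d(p,o)| <= d(q,o): prune when this lower bound
        # already exceeds the search radius
        if abs(dq - pivot_to_obj[o]) > radius:
            continue
        evaluations += 1
        if dist(query, o) <= radius:
            results.append(o)
    return results, evaluations
```

With five 1-D points and absolute difference as the metric, a radius-1.5 query around 1 needs only 4 distance evaluations instead of 5.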
6.
Integrated spatial and feature image query (total citations: 3, self-citations: 0, other citations: 3)
We present a new system for querying for images by regions and their spatial and feature attributes. The system enables the
user to find the images that contain arrangements of regions similar to those diagrammed in a query image. By indexing the
attributes of regions, such as sizes, locations and visual features, a wide variety of complex joint spatial and feature queries
are efficiently computed. In order to demonstrate the utility of the system, we develop a process for extracting color
regions from photographic images. We demonstrate that integrated spatial and feature querying using color regions improves
image search capabilities over non-spatial content-based image retrieval methods.
7.
8.
Stefan Manegold Peter A. Boncz Martin L. Kersten 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):231-246
In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access
is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article,
we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines
for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures
optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and
introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical
model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system.
We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters
found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms
makes them perform well, which is confirmed by experimental results.
Received April 20, 2000 / Accepted June 23, 2000
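The idea behind the radix algorithms for partitioned hash-join described above can be sketched as follows. This simplified single-pass version is an assumption-laden illustration (the paper's algorithm uses multiple passes with a small fan-out per pass to limit TLB misses); all names and parameters here are invented for the example:

```python
from collections import defaultdict

def radix_hash_join(r, s, key_r, key_s, radix_bits=4):
    """One-pass radix partitioning followed by per-partition hash join.

    Partitioning both inputs by the low bits of the join key keeps each
    partition's hash table small enough to stay cache-resident, turning
    the random accesses of a plain hash join into cache-friendly ones.
    """
    fanout = 1 << radix_bits
    parts_r = [[] for _ in range(fanout)]
    parts_s = [[] for _ in range(fanout)]
    for t in r:
        parts_r[key_r(t) & (fanout - 1)].append(t)
    for t in s:
        parts_s[key_s(t) & (fanout - 1)].append(t)
    out = []
    for pr, ps in zip(parts_r, parts_s):
        table = defaultdict(list)          # small, cache-sized hash table
        for t in pr:
            table[key_r(t)].append(t)
        for t in ps:
            out.extend((m, t) for m in table.get(key_s(t), []))
    return out
```

Only tuples whose keys share the low radix bits ever meet, so each partition pair is joined independently.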
9.
Efficient similarity search for market basket data (total citations: 2, self-citations: 0, other citations: 2)
Alexandros Nanopoulos Yannis Manolopoulos 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):138-152
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New
applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket
databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation
method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the
processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of
the proposed scheme over the state-of-the-art method.
Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 / Published online: September 25, 2002
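The abstract does not detail the paper's representation method, but as context for what a similarity query over basket data means, a common baseline is Jaccard similarity between baskets viewed as item sets. The following is that baseline only, not the paper's scheme:

```python
def jaccard(basket_a: set, basket_b: set) -> float:
    """Jaccard similarity between two market baskets (sets of item ids):
    |intersection| / |union|, ranging from 0 (disjoint) to 1 (equal)."""
    if not basket_a and not basket_b:
        return 1.0
    return len(basket_a & basket_b) / len(basket_a | basket_b)

def most_similar(query: set, baskets: list) -> set:
    """Return the stored basket most similar to the query basket."""
    return max(baskets, key=lambda b: jaccard(query, b))
```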
10.
UnQL: a query language and algebra for semistructured data based on structural recursion (total citations: 5, self-citations: 0, other citations: 5)
Peter Buneman Mary Fernandez Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):76-110
Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data
and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using
structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees.
On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data
top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel
using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational
queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed
as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal
properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its
data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying
data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with
respect to value equality.
Received: July 9, 1999 / Accepted: December 24, 1999
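The first evaluation strategy mentioned above, a top-down recursive function that remembers its calls so that cyclic data terminates, can be sketched on an edge-labelled graph. This toy query (counting edges with a given label) and its names are assumptions for illustration, not UnQL itself:

```python
def count_labels(graph, root, label):
    """Structural recursion over edge-labelled graph data, evaluated
    top-down while remembering visited nodes so cycles terminate.

    graph maps node -> list of (edge_label, child) pairs.
    """
    seen = set()

    def rec(node):
        if node in seen:          # remembered call: avoid infinite loop
            return 0
        seen.add(node)
        total = 0
        for lbl, child in graph.get(node, []):
            total += (lbl == label) + rec(child)
        return total

    return rec(root)
```

On a graph with the cycle a -> b -> a, the recursion still terminates because the repeated call on `a` is remembered and returns immediately.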
11.
Chiang Lee Chi-Sheng Shih Yaw-Huei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):327-343
Traditional algorithms for optimizing the execution order of joins are no longer valid when selections and projections involve
methods and become very expensive operations. Selections and projections may even be more costly than joins, so that they
are pulled above joins, rather than pushed down in a query tree. In this paper, we take a fundamental look at how to approach
query optimization from a top-down design perspective, rather than trying to force one model to fit into another. We present
a graph model which is designed to characterize execution plans. Each edge and each vertex of the graph is assigned a weight
to model execution plans. We also design algorithms that use these weights to optimize the execution order of operations.
A cost model of these algorithms is developed. Experiments are conducted on the basis of this cost model. The results show
that our algorithms are superior to similar work proposed in the literature.
Received 20 April 1999 / Accepted 9 August 2000 / Published online 20 April 2001
12.
We present a high-performance solution to the I/O retrieval problem in a distributed multimedia system. Parallelism of data
retrieval is achieved by striping the data across multiple disks. We identify the components that contribute to media data-retrieval
delay. The variable delays among these have a great bearing on the server throughput under varying load conditions. We present
a buffering scheme to minimize these variations. We have implemented our model on the Intel Paragon parallel computer. The
results of component-wise instrumentation of the server operation are presented and analyzed. Experimental results that demonstrate
the efficacy of the buffering scheme are presented. Based on our experiments, a dynamic admission-control policy that takes
server workloads into account is proposed.
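The data striping mentioned above, which achieves parallel retrieval by spreading a media object's blocks across disks, can be sketched with simple round-robin placement. This is a generic illustration with invented names, not the paper's Paragon implementation:

```python
def stripe_blocks(data: bytes, n_disks: int, block_size: int):
    """Round-robin striping of a media object across disks, so that
    consecutive blocks can be retrieved from different disks in parallel."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), block_size):
        disks[(i // block_size) % n_disks].append(data[i:i + block_size])
    return disks

def read_striped(disks):
    """Reassemble the object by reading the disks round-robin."""
    out = b""
    for b in range(max(len(d) for d in disks)):
        for d in disks:
            if b < len(d):
                out += d[b]
    return out
```

Reading round-robin across the disk lists recovers the original byte sequence, while each disk holds only roughly 1/n of the blocks.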
13.
Aya Soffer Hanan Samet 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(4):253-274
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
14.
Leo Ojala Nisse Husberg Teemu Tynjälä 《International Journal on Software Tools for Technology Transfer (STTT)》2001,3(4):382-393
A distributed dynamic channel allocation algorithm has been proposed in [11]. In this paper the algorithm is modelled using predicate/transition nets. The same model can be used for any number of cell and channel configurations. The Maria reachability analyser has been used to analyse the protocol for some configurations. These are deadlock-free and are shown to satisfy the requirement that the same channel is never allocated to two neighbouring cells. The suitability of high-level nets for the modelling and analysis of distributed algorithms is discussed. Published online: 24 August 2001
15.
Yasushi Sakurai Masatoshi Yoshikawa Shunsuke Uemura Haruhiko Kojima 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):93-108
We propose a novel index structure, the A-tree (approximation tree), for similarity searches in high-dimensional data. The
basic idea of the A-tree is the introduction of virtual bounding rectangles (VBRs) which contain and approximate MBRs or data
objects. VBRs can be represented quite compactly and thus affect the tree configuration both quantitatively and qualitatively.
First, since tree nodes can contain a large number of VBR entries, fanout becomes large, which increases search speed. More
importantly, we have a free hand in arranging MBRs and VBRs in the tree nodes. Each A-tree node contains an MBR and its children
VBRs. Therefore, by fetching an A-tree node, we can obtain information on the exact position of a parent MBR and the approximate
position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the
A-tree outperforms the SR-tree and the VA-file in all dimensionalities up to 64 dimensions, which is the highest dimension
in our experiments. Additionally, we propose a cost model for the A-tree. We verify the validity of the cost model for synthetic
and real data sets.
Edited by T. Sellis. Received: December 8, 2000 / Accepted: March 20, 2002 / Published online: September 25, 2002
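The compact virtual bounding rectangles (VBRs) described above can be illustrated by quantizing a child MBR's coordinates relative to its parent MBR onto a small grid, rounding lower bounds down and upper bounds up so the VBR always contains the exact MBR. This sketch only conveys the idea; the A-tree's actual encoding and parameters are not specified in the abstract:

```python
import math

def to_vbr(child, parent, bits=4):
    """Quantize child MBR ((xlo, ylo), (xhi, yhi)) relative to parent.

    Lower corners round down and upper corners round up, so the decoded
    VBR is guaranteed to contain the exact child MBR while each
    coordinate needs only `bits` bits.
    """
    levels = (1 << bits) - 1
    (pxl, pyl), (pxh, pyh) = parent
    (cxl, cyl), (cxh, cyh) = child
    qx = lambda v: (v - pxl) / (pxh - pxl) * levels
    qy = lambda v: (v - pyl) / (pyh - pyl) * levels
    return ((math.floor(qx(cxl)), math.floor(qy(cyl))),
            (math.ceil(qx(cxh)), math.ceil(qy(cyh))))

def vbr_contains(vbr, child, parent, bits=4):
    """Check the containment guarantee: decoded VBR covers the exact MBR."""
    levels = (1 << bits) - 1
    (pxl, pyl), (pxh, pyh) = parent
    (ql, qh) = vbr
    dxl = pxl + ql[0] / levels * (pxh - pxl)
    dyl = pyl + ql[1] / levels * (pyh - pyl)
    dxh = pxl + qh[0] / levels * (pxh - pxl)
    dyh = pyl + qh[1] / levels * (pyh - pyl)
    (cxl, cyl), (cxh, cyh) = child
    eps = 1e-9  # tolerance for floating-point rounding
    return (dxl <= cxl + eps and dyl <= cyl + eps and
            dxh >= cxh - eps and dyh >= cyh - eps)
```

Because a VBR costs a few bits per coordinate instead of full floats, many more entries fit per node, which is the source of the increased fanout the abstract mentions.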
16.
David Gibson Jon Kleinberg Prabhakar Raghavan 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):222-236
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical
data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of
producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method
for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure
arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types
of non-linear dynamical systems.
Received February 15, 1999 / Accepted August 15, 1999
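The iterative method for assigning and propagating weights on categorical values can be sketched as follows. This is a simplified version with assumed update and normalization rules, intended only to convey co-occurrence-based weight propagation, not the paper's exact dynamical system:

```python
import math

def propagate(rows, weights, iterations=10):
    """Iteratively propagate weights among categorical values via
    co-occurrence in table rows, renormalizing each round.

    rows: list of tuples of categorical values
    weights: dict mapping each value to an initial weight
    """
    for _ in range(iterations):
        new = {v: 0.0 for v in weights}
        for row in rows:
            s = sum(weights[v] for v in row)
            for v in row:
                new[v] += s - weights[v]   # weight flows in from co-occurring values
        norm = math.sqrt(sum(w * w for w in new.values()))
        weights = {v: w / norm for v, w in new.items()}
    return weights
```

Values that co-occur with many other values accumulate weight, which yields the co-occurrence-based similarity the abstract describes.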
17.
In video processing, a common first step is to segment the videos into physical units, generally called shots. A shot is a video segment that consists of one continuous action. In general, these physical units need to be clustered
to form more semantically significant units, such as scenes, sequences, programs, etc. This is the so-called story-based video
structuring. Automatic video structuring is of great importance for video browsing and retrieval. The shots or scenes are
usually described by one or several representative frames, called key-frames. Viewed from a higher level, key frames of some shots might be redundant in terms of semantics. In this paper, we propose
automatic solutions to the problems of: (i) video partitioning, (ii) key frame computing, (iii) key frame pruning. For the
first problem, an algorithm called “net comparison” is devised. It is accurate and fast because it uses both statistical and
spatial information in an image and does not have to process the entire image. For the last two problems, we develop an original
image similarity criterion, which considers both spatial layout and detail content in an image. For this purpose, coefficients
of wavelet decomposition are used to derive parameter vectors accounting for the above two aspects. The parameters exhibit
(quasi-) invariant properties, thus making the algorithm robust for many types of object/camera motions and scaling variances.
The novel “seek and spread” strategy used in key frame computing allows us to obtain a large representative range for the
key frames. Inter-shot redundancy of the key-frames is suppressed using the same image similarity measure. Experimental results
demonstrate the effectiveness and efficiency of our techniques.
18.
Abstract. The paper proposes a new method for efficient triangulation of large, unordered sets of 3D points using a CAD model comprising NURBS entities. It is primarily aimed at engineering applications involving analysis and visualisation of measured data, such as inspection, where a model of the object in question is available. Registration of the data to the model is the necessary first step, enabling the triangulation to be efficiently performed in 2D, on the projections of the measured points onto the model entities. The derived connectivity is then applied to the original 3D data. Improvement of the generated 3D mesh is often necessary, involving mesh smoothing, constraint-based elimination of redundant triangles and merging of mesh patches. Examples involving random measurements on aerospace and automotive free-form components are presented. Received: 30 August 1999 / Accepted: 10 January 2000
19.
The GMAP: a versatile tool for physical data independence (total citations: 1, self-citations: 0, other citations: 1)
Odysseas G. Tsatalos Marvin H. Solomon Yannis E. Ioannidis 《The VLDB Journal The International Journal on Very Large Data Bases》1996,5(2):101-118
Physical data independence is touted as a central feature of modern
database systems. It allows users to frame queries in terms of the logical
structure of the data, letting a query processor automatically translate
them into optimal plans that access physical storage structures. Both
relational and object-oriented systems, however, force users to frame their
queries in terms of a logical schema that is directly tied to physical
structures. We present an approach that eliminates this dependence. All
storage structures are defined in a declarative language based on
relational algebra as functions of a logical schema. We present an
algorithm, integrated with a conventional query optimizer, that translates
queries over this logical schema into plans that access the storage
structures. We also show how to compile update requests into plans that
update all relevant storage structures consistently and optimally.
Finally, we report on experiments with a prototype implementation of our
approach that demonstrate how it allows storage structures to be tuned to
the expected or observed workload to achieve significantly better
performance than is possible with conventional techniques.
Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995
20.
Y. Nakajima S. Mori S. Takegami S. Sato 《International Journal on Document Analysis and Recognition》1999,2(1):19-23
Two methods for stroke segmentation from a global point of view are presented and compared. One is based on thinning methods
and the other is based on contour curve fitting. For both cases an input image is binarized. For the former, Hilditch's method
is used, then crossing points are sought, around which a domain is constructed. Outside the domain, a set of line segments
are identified. These lines are connected and approximated by cubic B-spline curves. Smoothly connected lines are selected
as segmented curves. This method works well for a limited class of crossing lines, as is shown experimentally. In the
latter, a contour line is approximated by cubic B-spline curve, along which curvature is measured. According to the extreme
points of the curvature graph, the contour line is segmented, based on which the line segment is obtained. Experimental results
are shown for some difficult cases.
Received October 31, 1998 / Revised January 12, 1999