1.
Stefan Manegold, Peter A. Boncz, Martin L. Kersten. The VLDB Journal: The International Journal on Very Large Data Bases, 2000, 9(3): 231-246
In the past decade, advances in the speed of commodity CPUs have far outpaced advances in memory latency. Main-memory access
is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article,
we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines
for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures
optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and
introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical
model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system.
We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters
found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms
makes them perform well, which is confirmed by experimental results.
Received April 20, 2000 / Accepted June 23, 2000
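To make the partitioning idea concrete, here is a minimal sketch of a radix-partitioned hash join. It illustrates the general technique only, not Monet's tuned implementation; the relation layout, key names, and bit width are invented.

```python
# Minimal sketch of a radix-partitioned hash join (illustrative only,
# not the Monet implementation). Tuples are first partitioned on the
# low `bits` bits of their key hash, so each build table stays small
# enough to be cache-resident during the per-cluster join.
def radix_partition(rows, key, bits):
    clusters = [[] for _ in range(1 << bits)]
    for row in rows:
        clusters[hash(row[key]) & ((1 << bits) - 1)].append(row)
    return clusters

def radix_hash_join(r, s, r_key, s_key, bits=6):
    out = []
    for rc, sc in zip(radix_partition(r, r_key, bits),
                      radix_partition(s, s_key, bits)):
        table = {}
        for row in rc:                       # build phase, one cluster
            table.setdefault(row[r_key], []).append(row)
        for row in sc:                       # probe phase stays in-cluster
            for match in table.get(row[s_key], []):
                out.append({**match, **row})
    return out

R = [{"rid": i, "k": i % 97} for i in range(1000)]
S = [{"sid": i, "k2": i % 97} for i in range(1000)]
print(len(radix_hash_join(R, S, "k", "k2")))
```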
2.
Navin Kabra, David J. DeWitt. The VLDB Journal: The International Journal on Very Large Data Bases, 1999, 8(1): 55-78
In this paper we describe the design and implementation of OPT++, a tool for extensible database query optimization that
uses an object-oriented design to simplify the task of implementing, extending, and modifying an optimizer. Building an optimizer
using OPT++ makes it easy to extend the query algebra (to add new query algebra operators and physical implementation algorithms
to the system), easy to change the search space, and also to change the search strategy. Furthermore, OPT++ comes equipped
with a number of search strategies that are available for use by an optimizer-implementor. OPT++ considerably simplifies both
the task of implementing an optimizer for a new database system and the task of evaluating alternative optimization techniques
and strategies to decide what techniques are best suited for that database system. We present the results of a series of performance
studies. These results validate our design and show that, in spite of its flexibility, OPT++ can be used to build efficient
optimizers.
Received October 1996 / Accepted January 1998
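The decoupling the abstract describes can be pictured as a strategy pattern: the search strategy is an object behind one interface, so it can be swapped without touching the algebra or the cost model. The sketch below is hypothetical and does not reflect the actual OPT++ (C++) API.

```python
import random

# Hypothetical sketch of the design idea, not the OPT++ API: search
# strategies share one interface, so an optimizer-implementor can swap
# them without changing the query algebra or the cost model.
class SearchStrategy:
    def optimize(self, plans, cost):
        raise NotImplementedError

class ExhaustiveSearch(SearchStrategy):
    def optimize(self, plans, cost):
        return min(plans, key=cost)          # enumerate the whole space

class RandomSampling(SearchStrategy):
    def __init__(self, budget=100):
        self.budget = budget                 # bounded effort, approximate
    def optimize(self, plans, cost):
        sample = random.sample(plans, min(self.budget, len(plans)))
        return min(sample, key=cost)

class Optimizer:
    def __init__(self, strategy):
        self.strategy = strategy             # pluggable at construction time
    def best_plan(self, plans, cost):
        return self.strategy.optimize(plans, cost)

plans = ["(RS)T", "(RT)S", "(ST)R"]
cost = {"(RS)T": 12.0, "(RT)S": 7.5, "(ST)R": 30.0}.get
print(Optimizer(ExhaustiveSearch()).best_plan(plans, cost))   # (RT)S
```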
3.
Scene change detection techniques for video database systems
Haitao Jiang, Abdelsalam Helal, Ahmed K. Elmagarmid, Anupam Joshi. Multimedia Systems, 1998, 6(3): 186-195
Scene change detection (SCD) is one of several fundamental problems in the design of a video database management system (VDBMS).
It is the first step towards the automatic segmentation, annotation, and indexing of video data. SCD is also used in other
aspects of VDBMS, e.g., hierarchical representation and efficient browsing of the video data. In this paper, we provide a
taxonomy that classifies existing SCD algorithms into three categories: full-video-image-based, compressed-video-based, and
model-based algorithms. The capabilities and limitations of the SCD algorithms are discussed in detail. The paper also proposes
a set of criteria for measuring and comparing the performance of various SCD algorithms. We conclude by discussing some important
research directions.
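For concreteness, one representative technique from the full-video-image-based category is pairwise histogram differencing; the sketch below is illustrative, with an invented bin count and threshold.

```python
# One representative "full-video-image" SCD technique: declare a cut
# when consecutive gray-level histograms differ by more than a
# threshold. Bin count and threshold here are illustrative choices.
def histogram(frame, bins=64):
    h = [0] * bins
    for pixel in frame:                     # frame: iterable of 0..255 values
        h[pixel * bins // 256] += 1
    return h

def scene_cuts(frames, threshold=0.4):
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            n = sum(h)
            diff = sum(abs(a - b) for a, b in zip(h, prev)) / (2 * n)
            if diff > threshold:            # large histogram change: a cut
                cuts.append(i)
        prev = h
    return cuts

dark, light = [20] * 1024, [220] * 1024
print(scene_cuts([dark, dark, light, light]))   # [2]
```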
4.
Algebraic query optimisation for database programming languages
Alexandra Poulovassilis, Carol Small. The VLDB Journal: The International Journal on Very Large Data Bases, 1996, 5(2): 119-132
A major challenge still facing the designers and implementors of database
programming languages (DBPLs) is that of query optimisation. We investigate
algebraic query optimisation techniques for DBPLs in the context of a purely
declarative functional language that supports sets as first-class objects.
Since the language is computationally complete, issues such as
non-termination of expressions and construction of infinite data structures
can be investigated, whilst its declarative nature allows the issue of side
effects to be avoided and a richer set of equivalences to be developed.
The language has a well-defined semantics which permits us to reason
formally about the properties of expressions, such as their equivalence with
other expressions and their termination. The support of a set bulk data
type enables much prior work on the optimisation of relational languages to
be utilised.
In the paper we first give the syntax of our archetypal DBPL and briefly
discuss its semantics. We then define a small but powerful algebra of
operators over the set data type, provide some key equivalences for
expressions in these operators, and list transformation principles for
optimising expressions. Along the way, we identify some caveats to
well-known equivalences for non-deductive database languages. We next
extend our language with two higher level constructs commonly found in
functional DBPLs: set comprehensions and functions with known inverses. Some
key equivalences for these constructs are provided, as are transformation
principles for expressions in them. Finally, we investigate extending our
equivalences for the set operators to the analogous operators over bags.
Although developed and formally proved in the context of a functional
language, our findings are directly applicable to other DBPLs of similar
expressiveness.
Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995
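One family of such caveats is easy to state: equivalences that rely on the idempotence of union hold for sets but fail for bags, so rewrites that merge duplicate subexpressions are sound only under set semantics. A small illustrative check (Python stands in for the paper's functional DBPL):

```python
from collections import Counter

# Illustrative only: the kind of caveat that surfaces when moving from
# set to bag semantics. Set union is idempotent; bag union is not, so
# rewrites that merge duplicate subexpressions are set-only.
A = {1, 2, 3}
assert A | A == A                           # sets: e UNION e = e

bag = Counter([1, 2, 2])
assert bag + bag != bag                     # bags: duplicates accumulate

# Selection still distributes over union under both semantics:
lhs = {x for x in A | A if x > 1}
rhs = {x for x in A if x > 1} | {x for x in A if x > 1}
assert lhs == rhs
```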
5.
Wen-Syan Li, K. Selçuk Candan, Kyoji Hirata, Yoshinori Hara. The VLDB Journal: The International Journal on Very Large Data Bases, 2001, 9(4): 312-326
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration.
It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution.
Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not
overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information
of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics, and provides users with feedback, including the strictness and reformulation alternatives
of each query condition, as well as estimated numbers of matches. Query verification thus increases the efficiency
of multimedia database exploration for both the user and the system. Such statistical information is also utilized to support
progressive query processing and query relaxation.
Received: 9 June 1998 / Accepted: 21 July 2000 Published online: 4 May 2001
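The flavor of query verification can be sketched as selectivity estimation over precomputed statistics: the system predicts how many images each condition (and their conjunction) would match before any expensive media matching runs. All names and numbers below are invented for illustration.

```python
# Hedged sketch of "query verification": estimate match counts from
# precomputed frequency statistics, assuming condition independence.
# The statistics, catalog size, and default selectivity are invented.
stats = {                        # condition -> fraction of images matching
    ("semantics", "beach"): 0.12,
    ("color", "blue"):      0.30,
}
TOTAL_IMAGES = 50_000

def estimate_matches(conditions):
    selectivity = 1.0
    for cond in conditions:
        selectivity *= stats.get(cond, 0.01)    # default for rare values
    return round(TOTAL_IMAGES * selectivity)

# Feedback before any media matching: roughly 1800 expected matches.
print(estimate_matches([("semantics", "beach"), ("color", "blue")]))
```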
6.
Stefan Deßloch, Theo Härder, Nelson Mattos, Bernhard Mitschang, Joachim Thomas. The VLDB Journal: The International Journal on Very Large Data Bases, 1998, 7(2): 79-95
The increasing power of modern computers is steadily opening up new application domains for advanced data processing such
as engineering and knowledge-based applications. To meet their requirements, concepts for advanced data management have been
investigated during the last decade, especially in the field of object orientation. Over the last couple of years, the database
group at the University of Kaiserslautern has been developing such an advanced database system, the KRISYS prototype. In this
article, we report on the results and experiences obtained in the course of this project. The primary objective for the first
version of KRISYS was to provide semantic features, such as an expressive data model, a set-oriented query language, deductive
as well as active capabilities. The first KRISYS prototype became completely operational in 1989. To evaluate its features
and to stabilize its functionality, we started to develop several applications with the system. These experiences marked the
starting point for an overall redesign of KRISYS. Major goals were to tune KRISYS and its query-processing facilities to a
suitable client/server environment, as well as to provide elaborate mechanisms for consistency control comprising semantic
integrity constraints, multi-user synchronization, and failure recovery. The essential aspects of the resulting client/server
architecture are embodied by the client-side data management needed to effectively support advanced applications and to gain
the required system performance for interactive work. The project stages of KRISYS closely mirror the major developments
in research on advanced database systems over recent years. Hence, the subsequent discussions raise a number of aspects
of advanced data processing that are of general importance and applicability to database systems.
Received June 18, 1996 / Accepted November 11, 1997
7.
Peter A. Boncz, Martin L. Kersten. The VLDB Journal: The International Journal on Very Large Data Bases, 1999, 8(2): 101-119
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation
have a significant performance advantage. In order to support relational or object-oriented applications on top of such a
fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled
in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter
Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies
the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation
of MIL.
Received November 10, 1998 / Accepted March 22, 1999
8.
Aya Soffer, Hanan Samet. The VLDB Journal: The International Journal on Very Large Data Bases, 1998, 7(4): 253-274
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
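The difference between the two retrieval approaches can be shown on toy data: classification matches stored labels (with a certainty cutoff), while abstraction ranks by feature-vector distance. All data, feature values, and thresholds below are invented.

```python
import math

# Toy contrast of the two approaches; objects, features, and thresholds
# are invented. Classification retrieves by a precomputed semantic
# label; abstraction ranks by distance between feature vectors.
objects = [
    {"label": "hotel", "certainty": 0.9, "features": (4.0, 1.0)},
    {"label": "beach", "certainty": 0.8, "features": (1.2, 3.5)},
]

def by_classification(query_label, min_certainty=0.7):
    return [o for o in objects
            if o["label"] == query_label and o["certainty"] >= min_certainty]

def by_abstraction(query_features, k=1):
    dist = lambda o: math.dist(o["features"], query_features)
    return sorted(objects, key=dist)[:k]    # nearest feature vectors first

print(by_classification("beach"))
print(by_abstraction((4.1, 0.9)))
```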
9.
Arunprasad P. Marathe, Kenneth Salem. The VLDB Journal: The International Journal on Very Large Data Bases, 2002, 11(1): 68-91
Arrays are a common and important class of data. At present, database systems do not provide adequate array support: arrays
can neither be easily defined nor conveniently manipulated. Further, array manipulations are not optimized. This paper describes
a language called the Array Manipulation Language (AML) for expressing array manipulations, and a collection of optimization techniques for AML expressions.
In the AML framework for array manipulation, arbitrary externally-defined functions can be applied to arrays in a structured
manner. AML can be adapted to different application domains by choosing appropriate external function definitions. This paper
concentrates on arrays occurring in databases of digital images such as satellite or medical images.
AML queries can be treated declaratively and subjected to rewrite optimizations. Rewriting minimizes the number of applications
of potentially costly external functions required to compute a query result. AML queries can also be optimized for space.
Query results are generated a piece at a time by pipelined execution plans, and the amount of memory required by a plan depends
on the order in which pieces are generated. An optimizer can consider generating the pieces of the query result in a variety
of orders, and can efficiently choose orders that require less space. An AML-based prototype array database system called
ArrayDB has been built, and it is used to show the effectiveness of these optimization techniques.
Edited by M. Carey. Received: 10 August 2001 / Accepted: 11 December 2001 Published online: 24 May 2002
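The space-optimization point, that results are generated a piece at a time by a pipelined plan, can be sketched with generators over array tiles. This shows only the execution style; it is not AML syntax, and the tiling scheme is invented.

```python
# Sketch of the pipelining idea (not AML itself): each operator
# consumes and yields one tile at a time, so peak memory is a few
# tiles rather than the whole array, and the order in which pieces
# are generated is exactly what a space optimizer can vary.
def tiles(image, tile_rows):
    for r in range(0, len(image), tile_rows):
        yield image[r:r + tile_rows]

def apply_fn(tile_stream, fn):
    for tile in tile_stream:                # one external call per tile
        yield [[fn(x) for x in row] for row in tile]

image = [[i * 10 + j for j in range(4)] for i in range(6)]
plan = apply_fn(tiles(image, 2), lambda x: x + 1)
for piece in plan:                          # pieces materialize lazily
    print(piece)
```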
10.
Chiang Lee, Chi-Sheng Shih, Yaw-Huei Chen. The VLDB Journal: The International Journal on Very Large Data Bases, 2001, 9(4): 327-343
Traditional algorithms for optimizing the execution order of joins are no longer valid when selections and projections involve
methods and become very expensive operations. Selections and projections can be even more costly than joins, so that they
are pulled above joins rather than pushed down in a query tree. In this paper, we take a fundamental look at how to approach
query optimization from a top-down design perspective, rather than trying to force one model to fit into another. We present
a graph model which is designed to characterize execution plans. Each edge and each vertex of the graph is assigned a weight
to model execution plans. We also design algorithms that use these weights to optimize the execution order of operations.
A cost model of these algorithms is developed. Experiments are conducted on the basis of this cost model. The results show
that our algorithms are superior to similar work proposed in the literature.
Received 20 April 1999 / Accepted 9 August 2000 Published online 20 April 2001
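The trade-off such a cost model captures can be put in back-of-envelope form: when a predicate is an expensive method call, joining first may shrink its input more cheaply than filtering first. All cardinalities, selectivities, and costs below are invented, and the linear cost formulas are a deliberate simplification.

```python
# Back-of-envelope version of the trade-off: push an expensive
# selection below the join, or pull it above? All numbers invented.
N = 100_000          # input cardinality
SEL_JOIN = 0.01      # join selectivity
SEL_PRED = 0.5       # predicate selectivity
C_PRED = 50.0        # cost of one method invocation
C_JOIN = 1.0         # per-tuple join cost

push_down = N * C_PRED + (N * SEL_PRED) * C_JOIN    # filter first
pull_up   = N * C_JOIN + (N * SEL_JOIN) * C_PRED    # join first

print(f"push-down: {push_down:,.0f}  pull-up: {pull_up:,.0f}")
# Here the join shrinks the input far more cheaply than the method
# call does, so pulling the selection above the join wins.
```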
11.
MiniCon: A scalable algorithm for answering queries using views
Rachel Pottinger, Alon Halevy. The VLDB Journal: The International Journal on Very Large Data Bases, 2001, 10(2-3): 182-198
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
materialized views over the database, rather than accessing the database relations. The problem has received significant attention
because of its relevance to a wide variety of data management problems, such as data integration, query optimization, and
the maintenance of physical data independence. To date, the performance of proposed algorithms has received very little attention,
and, in particular, their scale-up in the presence of a large number of views is unknown. We first analyze two previous algorithms,
the bucket algorithm and the inverse-rules algorithm, and show their deficiencies. We then describe MiniCon, a novel algorithm for
finding the maximally-contained rewriting of a conjunctive query using a set of conjunctive views. We present the first experimental
study of algorithms for answering queries using views. The study shows that MiniCon scales up well and significantly outperforms
the previous algorithms. We describe an extension of MiniCon to handle comparison predicates, and show its performance
experimentally. Finally, we describe how MiniCon can be extended to the context of query optimization.
Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001
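For intuition, the sketch below shows the bucket-style first phase the paper analyzes (MiniCon's rewriting procedure refines it considerably); it is not MiniCon itself, and the query, views, and representation are toy choices.

```python
# Hedged sketch of the *bucket-style* first phase analyzed in the
# paper, not MiniCon itself: for each query subgoal, collect the views
# whose definitions use the same predicate and so could supply it.
query = [("cites", "x", "y"), ("year", "x", "2000")]

views = {
    "v1": [("cites", "a", "b")],
    "v2": [("cites", "a", "b"), ("year", "a", "2000")],
    "v3": [("year", "a", "2000")],
}

def buckets(query, views):
    out = {}
    for subgoal in query:
        pred = subgoal[0]
        out[subgoal] = [v for v, body in views.items()
                        if any(g[0] == pred for g in body)]
    return out

# Candidate rewritings pick one view per bucket and are then checked
# for containment; MiniCon prunes this space far more aggressively.
print(buckets(query, views))
```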
12.
A database model for object dynamics
M. P. Papazoglou, B. J. Krämer. The VLDB Journal: The International Journal on Very Large Data Bases, 1997, 6(2): 73-96
To effectively model complex applications in which constantly changing situations can be represented, a database system must
be able to support the runtime specification of structural and behavioral nuances for objects on an individual or group basis.
This paper introduces the role mechanism as an extension of object-oriented databases to support unanticipated behavioral
oscillations for objects that may attain many types and share a single object identity. A role refers to the ability to represent
object dynamics by seamlessly integrating idiosyncratic behavior, possibly in response to external events, with pre-existing
object behavior specified at instance creation time. In this manner, the same object can simultaneously be an instance of
different classes which symbolize the different roles that this object assumes. The role concept and its underlying linguistic
scheme simplify the design requirements of complex applications that need to create and manipulate dynamic objects.
Edited by D. McLeod / Received March 1994 / Accepted January 1996
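The role idea, one object identity acquiring and shedding behavior at runtime, can be rendered in toy form. This is only an illustration of the mechanism's flavor, not the paper's linguistic scheme.

```python
# Toy rendering of the role mechanism (not the paper's scheme): one
# object identity acquires and sheds roles at runtime, and attribute
# lookups fall through to whatever roles it currently plays.
class RoleObject:
    def __init__(self, **core):
        self.__dict__["_core"] = dict(core)
        self.__dict__["_roles"] = {}

    def add_role(self, name, **attrs):
        self._roles[name] = attrs           # e.g. acquire "Employee"

    def drop_role(self, name):
        self._roles.pop(name, None)         # identity survives the change

    def __getattr__(self, attr):
        for role in self._roles.values():
            if attr in role:
                return role[attr]
        if attr in self._core:
            return self._core[attr]
        raise AttributeError(attr)

ann = RoleObject(name="Ann")
ann.add_role("Employee", salary=50_000)
print(ann.name, ann.salary)                 # same identity, role state
```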
13.
Steve Jones, Shona McInnes, Mark S. Staveley. International Journal on Digital Libraries, 1999, 2(2-3): 207-223
Online information repositories commonly provide keyword search facilities through textual query languages based on Boolean
logic. However, there is evidence to suggest that the syntactic demands of such languages can lead to user errors and adversely
affect the time that it takes users to form queries. Users also face difficulties because of the conflict in semantics between
AND and OR when used in Boolean logic and in the English language. Analysis of usage logs for the New Zealand Digital Library (NZDL)
shows that few Boolean queries contain more than three terms, that use of the intersection operator dominates, and that query refinement
is common. We suggest that graphical query languages, in particular Venn-like diagrams, can alleviate the problems that users
experience when forming Boolean expressions with textual languages. A study of the utility of Venn diagrams for query specification
indicates that with little or no training users can interpret and form Venn-like diagrams in a consistent manner which accurately
correspond to Boolean expressions. We describe VQuery, a Venn-diagram-based user interface to the NZDL. In a study which
compared VQuery with a standard textual Boolean interface, users took significantly longer to form
queries and produced more erroneous queries when using VQuery. We discuss the implications of these results and suggest directions
for future work.
Received: 15 December 1997 / Revised: June 1999
14.
Heuristic and randomized optimization for the join ordering problem
Michael Steinbrunn, Guido Moerkotte, Alfons Kemper. The VLDB Journal: The International Journal on Very Large Data Bases, 1997, 6(3): 191-208
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective
optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for
optimizing join orders are studied since traditional dynamic programming techniques are not appropriate for complex problems.
Two possible solution spaces, the space of left-deep and bushy processing trees, are evaluated from a statistical point of
view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types.
Optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively
scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized
and genetic algorithms are well suited for optimizing join expressions. They generate solutions of high quality within a reasonable
running time. The benefits of heuristic optimizers, namely the short running time, are often outweighed by merely moderate
optimization performance.
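A minimal member of the randomized class analysed here is iterative improvement with random swap moves over left-deep orders. The cost function below is a stand-in for illustration, not the paper's cost model.

```python
import random

# Minimal member of the "randomized" class: iterative improvement over
# left-deep join orders with random swap moves. The cost function is a
# toy stand-in (sum of intermediate result sizes with a fixed join
# selectivity), not the paper's model.
def cost(order, card):
    total, inter = 0, card[order[0]]
    for rel in order[1:]:
        inter = inter * card[rel] * 0.01    # toy join selectivity
        total += inter
    return total

def iterative_improvement(rels, card, moves=1000):
    best = rels[:]
    random.shuffle(best)                    # random starting order
    for _ in range(moves):
        cand = best[:]
        i, j = random.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]     # random swap move
        if cost(cand, card) < cost(best, card):
            best = cand                     # keep only improvements
    return best

card = {"R": 1000, "S": 200, "T": 50_000, "U": 30}
print(iterative_improvement(list(card), card))
```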
15.
In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen
(LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066
forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words
occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting
recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic
knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus.
The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation
of the text into lines and words.
Received September 28, 2001 / Revised October 10, 2001
16.
Serge Abiteboul, Sophie Cluet, Tova Milo. The VLDB Journal: The International Journal on Very Large Data Bases, 1998, 7(2): 96-114
Structured data stored in files can benefit from standard database technology. In particular, we show here how such data
can be queried and updated using declarative database languages. We introduce the notion of structuring schema, which consists of a grammar annotated with database programs. Based on a structuring schema, a file can be viewed as a database
structure, queried and updated as such. For queries, we show that almost standard database optimization techniques can be used to answer queries without having to construct
the entire database. For updates, we study in depth the propagation to the file of an update specified on the database view of this file. The problem is not
feasible in general, and we present a number of negative results. The positive results consist of techniques that allow updates
to be propagated efficiently under some reasonable locality conditions on the structuring schemas.
Received November 1, 1995 / Accepted June 20, 1997
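The query-side result, answering a query without materializing the whole database, can be sketched with a lazily scanned file. The one-line-per-record CSV format below merely stands in for a structuring schema, which in the paper is a grammar annotated with database programs.

```python
import csv, io

# Sketch of "query without building the whole database": the file is
# parsed lazily and the selection is applied as records stream by.
# The CSV format stands in for a real structuring schema.
data = io.StringIO("smith,1995,db\njones,1997,ir\nlee,1995,db\n")

def scan(f, schema=("author", "year", "area")):
    for row in csv.reader(f):
        yield dict(zip(schema, row))        # record built on demand

hits = (r for r in scan(data) if r["year"] == "1995")
for r in hits:
    print(r["author"])                      # only matching records surface
```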
17.
Active rules for XML: A new paradigm for E-services
Angela Bonifati, Stefano Ceri, Stefano Paraboschi. The VLDB Journal: The International Journal on Very Large Data Bases, 2001, 10(1): 39-47
XML is rapidly becoming one of the most widely adopted technologies for information exchange and representation. As the use
of XML becomes more widespread, we foresee the development of active XML rules, i.e., rules explicitly designed for the management
of XML information. In particular, we argue that active rules for XML offer a natural paradigm for the rapid development of
innovative e-services. In the paper, we show how active rules can be specified in the context of XSLT, a pattern-based language
for publishing XML documents (promoted by the W3C) which is receiving strong commercial support, and Lorel, a query language
for XML documents that is quite popular in the research world. We demonstrate, through simple examples of active rules for
XSLT and Lorel, that active rules can be effective for the implementation of e-commerce services. We also discuss the various
issues that need to be considered in adapting the notion of relational triggers to the XML context.
Received: 30 October 2000 / Accepted: 19 December 2000 Published online: 27 April 2001
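The event-condition-action style being advocated can be sketched against a plain XML tree using only the standard library; the element names, attributes, and threshold are invented, and the sketch uses neither XSLT nor Lorel.

```python
import xml.etree.ElementTree as ET

# Toy event-condition-action rule in the spirit the paper describes
# (the actual proposals piggyback on XSLT patterns and Lorel queries):
# on insertion of an <order>, if its amount exceeds a limit, append a
# <notify> element. Names and the threshold are invented.
doc = ET.fromstring("<orders/>")

def on_insert(order):                        # event: a new <order>
    doc.append(order)
    if float(order.get("amount")) > 100:     # condition
        ET.SubElement(doc, "notify", ref=order.get("id"))   # action

on_insert(ET.fromstring('<order id="o1" amount="250"/>'))
print(ET.tostring(doc, encoding="unicode"))
```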
18.
Praveen Seshadri. The VLDB Journal: The International Journal on Very Large Data Bases, 1998, 7(3): 130-140
The explosion in complex multimedia content makes it crucial for database systems to support such data efficiently. This
paper argues that the “blackbox” ADTs used in current object-relational systems inhibit their performance, thereby limiting
their use in emerging applications. Instead, the next generation of object-relational database systems should be based on
enhanced abstract data type (E-ADT) technology. An E-ADT can expose the semantics of its methods to the database system, thereby permitting advanced query optimizations. Fundamental architectural changes
are required to build a database system with E-ADTs; the added functionality should not compromise the modularity of data
types and the extensibility of the type system. The implementation issues have been explored through the development of E-ADTs
in Predator. Initial performance results demonstrate order-of-magnitude performance improvements.
Received January 1, 1998 / Accepted May 27, 1998
19.
F. S. Brundick, Ann E. M. Brodeen, Malcolm S. Taylor. International Journal on Document Analysis and Recognition, 2002, 4(3): 170-176
In this paper we consider a statistical approach to augment a limited database of groundtruth documents for use in evaluation
of optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents
for this purpose which prove to serve effectively and, in some regards, indistinguishably from groundtruth. The proposed method
is validated through a rigorous statistical procedure.
Received: March 30, 2000 / Revised: September 14, 2001
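The base procedure being modified is the standard moving-blocks bootstrap, which is short enough to state directly; the block length and data below are illustrative, and the paper's modifications for document structure are not shown.

```python
import random

# Standard moving-blocks bootstrap (the paper uses a *modified*
# variant; only the base procedure is shown): resample overlapping
# blocks of consecutive elements, with replacement, and concatenate
# them until a surrogate sequence of the original length emerges.
def moving_blocks_bootstrap(seq, block_len, rng=random.Random(0)):
    blocks = [seq[i:i + block_len]              # all overlapping blocks
              for i in range(len(seq) - block_len + 1)]
    out = []
    while len(out) < len(seq):
        out.extend(rng.choice(blocks))          # sample with replacement
    return out[:len(seq)]

words = "the quick brown fox jumps over the lazy dog".split()
print(" ".join(moving_blocks_bootstrap(words, block_len=3)))
```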
20.
Zengping Tian, Hongjun Lu, Wenyun Ji, Aoying Zhou, Zhong Tian. International Journal on Digital Libraries, 2002, 3(4): 325-331
Detecting and eliminating duplicate records is one of the major tasks for improving data quality. The task, however, is not
as trivial as it seems since various errors, such as character insertion, deletion, transposition, substitution, and word
switching, are often present in real-world databases. This paper presents an n-gram-based approach for detecting duplicate
records in large databases. Using the approach, records are first mapped to numbers based on the n-grams of their field values.
The obtained numbers are then clustered, and records within a cluster are taken as potential duplicate records. Finally, record
comparisons are performed within clusters to identify true duplicate records. The unique feature of this method is that it
does not require preprocessing to correct syntactic or typographical errors in the source data in order to achieve high accuracy.
Moreover, sorting the source data file is unnecessary. Only a fixed number of database scans is required. Therefore, compared
with previous methods, the algorithm is more time efficient.
Published online: 22 August 2001
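The pipeline can be sketched end to end on toy data: n-gram fingerprinting, clustering by fingerprint proximity, then detailed comparison only within clusters. The numeric mapping, cluster width, and similarity threshold below are all invented stand-ins for the paper's choices.

```python
from itertools import combinations

# Hedged sketch of the described pipeline on toy data: map each record
# to a number derived from its n-grams, cluster records with nearby
# numbers, and run detailed comparisons only within clusters.
def ngrams(s, n=3):
    s = " " + s.lower() + " "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def key(record):                            # crude numeric fingerprint
    gs = ngrams(record)
    return sum(sum(map(ord, g)) for g in gs) / len(gs)

def check(cluster, threshold):              # n-gram Jaccard similarity
    return [(a, b) for a, b in combinations(cluster, 2)
            if len(ngrams(a) & ngrams(b)) / len(ngrams(a) | ngrams(b))
            >= threshold]

def find_duplicates(records, width=5.0, threshold=0.5):
    recs = sorted(records, key=key)
    dups, cluster = [], [recs[0]]
    for r in recs[1:]:
        if key(r) - key(cluster[-1]) <= width:
            cluster.append(r)               # nearby keys: same cluster
        else:
            dups += check(cluster, threshold)
            cluster = [r]
    return dups + check(cluster, threshold)

print(find_duplicates(["John Smith", "Jon Smith", "Mary Jones"]))
```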