20 similar documents found (search time: 31 ms)
1.
Query processing over object views of relational data (cited 2 times: 0 self-citations, 2 by others)
Gustav Fahl, Tore Risch. The VLDB Journal: The International Journal on Very Large Data Bases 1997, 6(4):261-281
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in
a relational database as if it were stored in an object-oriented (OO) database. A query against the object view is translated
to one or several queries against the relational database. The results of these queries are then processed to form an answer
to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the
object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data
residing in the object view with data retrieved from the relational database. We discuss the key issues when object views
of relational databases are developed, namely: how to map relational structures to sub-type/supertype hierarchies in the view,
how to represent relational database access in OO query plans, how to provide the concept of object identity in the view,
how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to
process and optimize queries against the object view. The results are based on experiences from a running prototype implementation.
Edited by: M.T. Özsu. Received April 12, 1995 / Accepted April 22, 1996
2.
An effective method for detecting similar duplicate records in Chinese (cited 7 times: 0 self-citations, 7 by others)
Eliminating duplicate records improves data quality. This paper proposes selecting sort fields according to the number of distinct values in each field. To detect similar duplicate records, a two-dimensional linked list holding similar duplicate records is built using the first sort field; the records in the list are then sorted and compared on the second and third sort fields to improve detection effectiveness. To match Chinese strings correctly, two error sources are studied: mismatches caused by abbreviations, and input errors caused by similar pronunciation or similar character shapes. Some input errors are resolved by looking up a "similar Chinese character table", and a similarity function is computed to decide whether the compared records are duplicates. Experiments show that the proposed method detects similar duplicate Chinese records effectively.
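A minimal sketch of the detection scheme this abstract describes: sort fields are chosen by their number of distinct values, visually similar characters are normalized through a lookup table, and a weighted similarity function decides whether two records are duplicates. The toy character table, the weights, and the threshold below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of field-based duplicate detection for Chinese
# records. SIMILAR_CHARS, the field weights, and the 0.8 threshold are
# illustrative assumptions, not values from the paper.

SIMILAR_CHARS = {"已": "己"}  # toy "similar Chinese character table"

def choose_sort_fields(records):
    """Order field indices by number of distinct values, most first."""
    n = len(records[0])
    return sorted(range(n), key=lambda i: -len({r[i] for r in records}))

def normalize(s):
    """Map visually similar characters to a canonical form."""
    return "".join(SIMILAR_CHARS.get(ch, ch) for ch in s)

def field_similarity(a, b):
    """Character-overlap similarity after normalization, in [0, 1]."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    common = sum(min(a.count(ch), b.count(ch)) for ch in set(a))
    return 2.0 * common / (len(a) + len(b))

def is_duplicate(r1, r2, weights, threshold=0.8):
    """Weighted average of per-field similarities against a threshold."""
    score = sum(w * field_similarity(f1, f2)
                for w, f1, f2 in zip(weights, r1, r2)) / sum(weights)
    return score >= threshold
```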
3.
D. Laurent, J. Lechtenbörger, N. Spyratos, G. Vossen. The VLDB Journal: The International Journal on Very Large Data Bases 2001, 10(4):295-315
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources
can be guaranteed. To this end we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage
of these complements, an algorithmic approach is proposed for the specification of independent warehouses with respect to
given sets of queries and updates.
Received: 21 November 2000 / Accepted: 1 May 2001 Published online: 6 September 2001
4.
In control systems, the interfaces between software and its embedding environment are a major source of costly errors. For
example, Lutz reported that 20–35% of the safety-related errors discovered during integration and system testing of two spacecraft
were related to the interfaces between the software and the embedding hardware. Also, the software’s operating environment
is likely to change over time, further complicating the issues related to system-level inter-component communication. In this
paper we discuss a formal approach to the specification and analysis of inter-component communication using a revised version
of RSML (Requirements State Machine Language). The formalism allows rigorous specification of the physical aspects of the
inter-component communication and forces encapsulation of communication-related properties in well-defined and easy-to-read
interface specifications. This enables us both to analyse a system design to detect incompatibilities between connected components
and to use the interface specifications as safety kernels to enforce safety constraints.
5.
Aya Soffer, Hanan Samet. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(4):253-274
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
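The abstraction approach described above retrieves images by similarity of object feature vectors. A minimal sketch, where Euclidean distance and the toy feature vectors are illustrative assumptions rather than the paper's choices:

```python
# Hypothetical sketch of feature-vector retrieval for symbolic images:
# each image is described by the feature vectors of its objects, and
# images are ranked by their closest matching object vector. Euclidean
# distance is an assumed similarity measure, not taken from the paper.
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_images(query_vec, image_db):
    """Return image ids ordered by their closest object vector."""
    scored = sorted(
        (min(euclidean(query_vec, v) for v in vectors), image_id)
        for image_id, vectors in image_db.items()
    )
    return [image_id for _, image_id in scored]
```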
6.
In a video-on-demand (VOD) environment, disk arrays are often used to support the disk bandwidth requirement. This can pose
serious problems on available disk bandwidth upon disk failure. In this paper, we explore the approach of replicating frequently
accessed movies to provide high data bandwidth and fault tolerance required in a disk-array-based video server. An isochronous
continuous video stream imposes requirements different from those of random access patterns on databases or files. Specifically, we
propose a new replica placement method, called rotational mirrored declustering (RMD), to support high data availability for disk arrays in a VOD environment. In essence, RMD is similar to the conventional
mirrored declustering in that replicas are stored in different disk arrays. However, it is different from the latter in that
the replica placements in different disk arrays under RMD are properly rotated. Combining the merits of prior chained and
mirrored declustering methods, RMD is particularly suitable for storing multiple movie copies to support VOD applications.
To assess the performance of RMD, we conduct a series of experiments by emulating the storage and delivery of movies in a
VOD system. Our results show that RMD consistently outperforms the conventional methods in terms of load-balancing and fault-tolerance
capability after disk failure, and is deemed a viable approach to supporting replica placement in a disk-array-based video
server.
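The rotated placement that distinguishes RMD from plain mirrored declustering can be sketched as follows. The exact layout formula below is an illustrative assumption, not the paper's scheme: the point is only that rotating the replica layout spreads a failed primary disk's load across all disks of the mirror array.

```python
# Hypothetical sketch of rotational mirrored declustering (RMD):
# replicas live on a different disk array than primaries, and the
# replica layout is rotated stripe by stripe so that blocks sharing a
# primary disk land on different mirror disks. The formulas are
# illustrative assumptions, not the paper's exact placement.

def primary_disk(block, disks_per_array):
    """Round-robin placement of blocks on the primary array."""
    return block % disks_per_array

def replica_disk(block, disks_per_array):
    """Rotated placement on the mirror array: shift by the stripe row,
    so consecutive stripes map one primary disk to different mirror
    disks."""
    row = block // disks_per_array
    return (block + row) % disks_per_array
```

With 4 disks per array, the blocks stored on primary disk 0 (blocks 0, 4, 8, 12) have replicas on mirror disks 0, 1, 2, and 3, so the load of a failed disk is balanced over the whole mirror array.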
7.
The most common way of designing databases is by means of a conceptual model, such as E/R, without taking into account other
views of the system. New object-oriented design languages, such as UML (Unified Modelling Language), allow the whole system,
including the database schema, to be modelled in a uniform way. Moreover, as UML is an extendable language, it allows for
any necessary introduction of new stereotypes for specific applications. Proposals exist to extend UML with stereotypes for
database design but, unfortunately, they are focused on relational databases. However, new applications require complex objects
to be represented in complex relationships, object-relational databases being more appropriate for these requirements. The
framework of this paper is an Object-Relational Database Design Methodology, which defines new UML stereotypes for Object-Relational
Database Design and proposes some guidelines to translate a UML conceptual schema into an object-relational schema. The guidelines
are based on the SQL:1999 object-relational model and on Oracle8i as a product example.
Initial submission: 22 January 2002 / Revised submission: 10 June 2002
Published online: 7 January 2003
This paper is a revised and extended version of Extending UML for Object-Relational Database Design, presented in the UML’2001
conference [17].
8.
Wee Teck Ng, Peter M. Chen. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(3):194-204
Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory
that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by
the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus
instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases:
non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection.
Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases.
However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface
to reliable memory by mapping it into the database address space. This places reliable memory under complete database control
and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure
by write protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space
does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous
transactions. As a result, we believe that databases should use a memory interface to reliable memory.
Received January 1, 1998 / Accepted June 20, 1998
9.
Duplicate Record Detection: A Survey (cited 20 times: 0 self-citations, 20 by others)
A.K. Elmagarmid, P.G. Ipeirotis, V.S. Verykios. IEEE Transactions on Knowledge and Data Engineering 2007, 19(1):1-16
10.
Effective timestamping in databases (cited 3 times: 0 self-citations, 3 by others)
Kristian Torp, Christian S. Jensen, Richard T. Snodgrass. The VLDB Journal: The International Journal on Very Large Data Bases 2000, 8(3-4):267-288
Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times
prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps
being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely
transaction time, when the facts are current in the database, and valid time, when the facts are true in the modeled reality.
However, with few exceptions, the assignment of timestamp values has been considered only in the context of individual modification
statements.
This paper takes the next logical step: It considers the use of timestamping for capturing transaction and valid time in the
context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then
proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the results of a transaction with
the commit time of the transaction is a promising approach. The paper studies how this timestamping may be done using a spectrum
of techniques. While many database facts are valid until now, i.e., the current time, this value is absent from existing temporal types. Techniques that address this problem using different
substitute values are presented. Using a stratum architecture, the performance of the different proposed techniques is studied.
Although querying and modifying time-varying data is accompanied by a number of subtle problems, we present a comprehensive
approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases
in the context of user transactions.
Received: March 11, 1998 / Accepted: July 27, 1999
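The commit-time timestamping the abstract proposes can be sketched as follows: rows written inside a transaction carry a placeholder timestamp and are revisited at commit, so that every row touched by one transaction receives the same transaction-time start. Class and field names are illustrative assumptions, not the paper's notation.

```python
# Hedged sketch of timestamping transaction results with the commit
# time. Rows inserted during a transaction get a placeholder
# transaction-time start; at commit, all pending rows receive the
# single commit timestamp. Names are illustrative assumptions.

UNTIL_CHANGED = None  # placeholder for the open-ended "now" value

class Transaction:
    def __init__(self):
        self.pending = []  # rows written but not yet timestamped

    def insert(self, table, row):
        record = {**row, "tt_start": None, "tt_end": UNTIL_CHANGED}
        table.append(record)
        self.pending.append(record)

    def commit(self, commit_time):
        # Every modification of this transaction gets the same
        # transaction-time start: the commit time.
        for record in self.pending:
            record["tt_start"] = commit_time
        self.pending.clear()
```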
11.
We present a system for classifying the color aspect of textured surfaces having a nearly constant hue (such as wooden boards,
textiles, wallpaper, etc.). The system is designed to compensate for small fluctuations (over time) of the light source and
for inhomogeneous illumination conditions (shading correction). This is an important feature because even in industrial environments
where the lighting conditions are controlled, a constant and homogeneous illumination cannot be guaranteed. Together with
an appropriate camera calibration (which includes a periodic update), our approach offers a robust system which is able to
“distinguish” (i.e., classify correctly) between surface classes which exhibit visually barely perceptible color variations.
In particular, our approach is based on relative (not absolute) color measurements. In this paper, we outline the classification
algorithm while focusing in detail on the camera calibration and a method for compensating for fluctuations of the light source.
Received: 1 September 1998 / Accepted: 16 March 2000
12.
Oktie Hassanzadeh, Renée J. Miller. The VLDB Journal: The International Journal on Very Large Data Bases 2009, 18(5):1141-1166
A major source of uncertainty in databases is the presence of duplicate items, i.e., records that refer to the same real-world entity. However, accurate deduplication is a difficult task and imperfect data cleaning may result in loss of valuable information. A reasonable alternative approach is to keep duplicates when the correct cleaning strategy is not certain, and utilize an efficient probabilistic query-answering technique to return query results along with probabilities of each answer being correct. In this paper, we present a flexible modular framework for scalably creating a probabilistic database out of a dirty relation of duplicated data and overview the challenges raised in utilizing this framework for large relations of string data. We study the problem of associating probabilities with duplicates that are detected using state-of-the-art scalable approximate join methods. We argue that standard thresholding techniques are not sufficiently robust for this task, and propose new clustering algorithms suitable for inferring duplicates and their associated probabilities. We show that the inferred probabilities accurately reflect the error in duplicate records.
13.
Bing Wang. International Journal on Digital Libraries 1999, 2(2-3):91-110
A digital library (DL) consists of a database which contains library information and a user interface which provides a visual
window for users to search relevant information stored in the database. Thus, an abstract structure of a digital library can
be defined as a combination of a special purpose database and a user-friendly interface. This paper addresses one of the fundamental aspects of such
a combination, namely the formal data structure for linking an object-oriented database with hypermedia to support digital
libraries. It is important to establish a formal structure for a digital library in order to efficiently maintain different
types of library information. This article discusses how to build an object-oriented hybrid system to support digital libraries.
In particular, we focus on the discussion of a general purpose data model for digital libraries and the design of the corresponding
hypermedia interface. The significant features of this research are, first, a formalized data model to define a digital library
system structure; second, a practical approach to manage the global schema of a library system; and finally, a design strategy
to integrate hypermedia with databases to support a wide range of application areas.
Received: 15 December 1997 / Revised: June 1999
14.
Andrew Fano. Personal and Ubiquitous Computing 2001, 5(1):12-15
The promise of mobile devices lies not in their capacity to duplicate the capabilities of desktop machines, but rather in
their promise of enabling location-specific tasks. One of the challenges that must be addressed if they are to be used in
this way is how intuitive interfaces for mobile devices can be designed that enable access to location-specific services usable
across locations. We are developing a prototype mobile valet application that presents location-specific services organised
around the tasks associated with a location. The basic elements of the interface exploit commonalities in the way we address
tasks at various locations just as the familiar “file” and “edit” menus in various software applications exploit regularities
in software tasks.
15.
The paper investigates efficient bandwidth allocation schemes for the transmission of MPEG-2 video traffic on high-speed
networks. To this end we performed an extensive analysis of the traffic generated by an MPEG-2 encoder. Specifically, we encoded
“The Sheltering Sky” movie according to the MPEG-2 standard. Analysis of the generated traffic shows that a constant-quality
transmission can only be achieved with poor bandwidth utilization. We found that the low bandwidth utilization
is caused by rare high-rate periods in the codec bitstream. Hence, we identified source scalability as a promising approach
to achieving “quasi-constant” quality transmission with efficient bandwidth utilization. The effectiveness of this approach
is evaluated in the paper via simulation. Specifically, by defining a Markovian model for an MPEG-2 scalable source we performed
a set of simulation experiments which indicate that the source scalability approach significantly increases the utilization,
while maintaining the quality of the video signal at the highest value for most of the time, e.g., 50% network utilization
with the highest quality for 99.7% of the time.
16.
Wen-Syan Li, K. Selçuk Candan, Kyoji Hirata, Yoshinori Hara. The VLDB Journal: The International Journal on Very Large Data Bases 2001, 9(4):312-326
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration.
It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution.
Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not
overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information
of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics, and provides users with feedback, including the strictness and reformulation alternatives
of each query condition as well as estimated numbers of matches. With query verification, the system increases the efficiency
of the multimedia database exploration for both users and the system. Such statistical information is also utilized to support
progressive query processing and query relaxation.
Received: 9 June 1998 / Accepted: 21 July 2000 Published online: 4 May 2001
17.
Simonas Šaltenis Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):1-16
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases
that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by
the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are
needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions
to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On
the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the
first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of
an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion
of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive
queries. Performance experiments are reported that evaluate pertinent aspects of the index.
Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 Published online: 18 December 2001
18.
The performance of several methods for estimating local surface geometry (the principal frame plus the principal quadric)
is examined by applying them to a suite of synthetic and real test data that have been corrupted by various amounts of additive
Gaussian noise. Methods considered include finite differences, a facet-based approach, and quadric surface fitting. The nonlinear
quadric fitting method considered was found to perform best but has the greatest computational cost. The facet-based approach
works as well as the other quadric fitting methods and has a much smaller computational cost. Hence, it is the recommended
method to use in practice.
19.
J. Hu, R.S. Kashi, D. Lopresti, G.T. Wilfong. International Journal on Document Analysis and Recognition 2002, 4(3):140-153
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition
have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a
fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity.
In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and
table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies
work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield
various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely
(deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined
to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results
of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed
acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by
the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could
be applied to other document recognition tasks as well.
Received July 18, 2000 / Accepted October 4, 2001
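The edit-distance idea for scoring table detection can be sketched with a standard dynamic program in which deletions model missed tables and insertions model spurious ones; split and merge errors decompose into combinations of these. Unit costs and the matching predicate are illustrative assumptions, not the paper's exact metric.

```python
# Hedged sketch of edit-distance scoring for table detection results.
# `match` decides whether a detected region corresponds to a ground-
# truth table; unit costs are an illustrative assumption.

def edit_distance(truth, detected, match):
    """Minimum number of insert/delete/substitute steps turning the
    detected table sequence into the ground-truth sequence."""
    m, n = len(truth), len(detected)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # deletions: tables missed completely
    for j in range(n + 1):
        d[0][j] = j          # insertions: non-tables labeled as tables
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if match(truth[i - 1], detected[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion error
                          d[i][j - 1] + 1,      # insertion error
                          d[i - 1][j - 1] + cost)
    return d[m][n]
```

A perfect detection result has distance 0; each missed or spurious table adds one to the score.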
20.
Integrated document caching and prefetching in storage hierarchies based on Markov-chain predictions
Achim Kraiss, Gerhard Weikum. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(3):141-162
Large multimedia document archives may hold a major fraction of their data in tertiary storage libraries for cost reasons.
This paper develops an integrated approach to the vertical data migration between the tertiary, secondary, and primary storage
in that it reconciles speculative prefetching, to mask the high latency of the tertiary storage, with the replacement policy
of the document caches at the secondary and primary storage level, and also considers the interaction of these policies with
the tertiary and secondary storage request scheduling.
The integrated migration policy is based on a continuous-time Markov chain model for predicting the expected number of accesses
to a document within a specified time horizon. Prefetching is initiated only if that expectation is higher than those of the
documents that need to be dropped from secondary storage to free up the necessary space. In addition, the possible resource
contention at the tertiary and secondary storage is taken into account by dynamically assessing the response-time benefit
of prefetching a document versus the penalty that it would incur on the response time of the pending document requests.
The parameters of the continuous-time Markov chain model, the probabilities of co-accessing certain documents and the interaction
times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online
statistics. The integrated policy for vertical data migration has been implemented in a prototype system. The system makes
profitable use of the Markov chain model also for the scheduling of volume exchanges in the tertiary storage library. Detailed
simulation experiments with Web-server-like synthetic workloads indicate significant gains in terms of client response time.
The experiments also show that the overhead of the statistical bookkeeping and the computations for the access predictions
is affordable.
Received January 1, 1998 / Accepted May 27, 1998