20 similar documents found (search time: 31 ms)
1.
Query processing over object views of relational data (cited 2 times: 0 self-citations, 2 by others)
Gustav Fahl, Tore Risch. The VLDB Journal: The International Journal on Very Large Data Bases 1997, 6(4):261-281
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in
a relational database as if it were stored in an object-oriented (OO) database. A query against the object view is translated
to one or several queries against the relational database. The results of these queries are then processed to form an answer
to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the
object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data
residing in the object view with data retrieved from the relational database. We discuss the key issues when object views
of relational databases are developed, namely: how to map relational structures to sub-type/supertype hierarchies in the view,
how to represent relational database access in OO query plans, how to provide the concept of object identity in the view,
how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to
process and optimize queries against the object view. The results are based on experiences from a running prototype implementation.
Edited by: M.T. Özsu. Received April 12, 1995 / Accepted April 22, 1996
2.
An effective method for detecting similar duplicate records in Chinese (cited 7 times: 0 self-citations, 7 by others)
Eliminating duplicate records improves data quality. This paper proposes selecting sort fields according to the number of distinct values in each field. To detect similar duplicate records, a two-dimensional linked list holding similar duplicate records is built using the first sort field; the records in the list are then sorted and compared on the second and third sort fields to improve detection effectiveness. To match Chinese strings correctly, two error sources are studied: mismatches caused by abbreviations, and input errors caused by similar pronunciation or similar character shapes. Some input errors are resolved by looking up a "similar Chinese character table", and a similarity function is computed to decide whether the compared records are duplicates. Experiments show that the proposed method detects similar duplicate Chinese records effectively.
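A minimal sketch of the detection scheme this abstract describes: sort fields are chosen by their number of distinct values, visually similar characters are normalized through a lookup table, and a weighted similarity function decides whether two records are duplicates. The toy character table, the weights, and the threshold below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of field-based duplicate detection for Chinese
# records. SIMILAR_CHARS, the field weights, and the 0.8 threshold are
# illustrative assumptions, not values from the paper.

SIMILAR_CHARS = {"已": "己"}  # toy "similar Chinese character table"

def choose_sort_fields(records):
    """Order field indices by number of distinct values, most first."""
    n = len(records[0])
    return sorted(range(n), key=lambda i: -len({r[i] for r in records}))

def normalize(s):
    """Map visually similar characters to a canonical form."""
    return "".join(SIMILAR_CHARS.get(ch, ch) for ch in s)

def field_similarity(a, b):
    """Character-overlap similarity after normalization, in [0, 1]."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    common = sum(min(a.count(ch), b.count(ch)) for ch in set(a))
    return 2.0 * common / (len(a) + len(b))

def is_duplicate(r1, r2, weights, threshold=0.8):
    """Weighted average of per-field similarities against a threshold."""
    score = sum(w * field_similarity(f1, f2)
                for w, f1, f2 in zip(weights, r1, r2)) / sum(weights)
    return score >= threshold
```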
3.
D. Laurent, J. Lechtenbörger, N. Spyratos, G. Vossen. The VLDB Journal: The International Journal on Very Large Data Bases 2001, 10(4):295-315
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources
can be guaranteed. To this end we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage
of these complements, an algorithmic approach is proposed for the specification of independent warehouses with respect to
given sets of queries and updates.
Received: 21 November 2000 / Accepted: 1 May 2001 Published online: 6 September 2001
4.
In control systems, the interfaces between software and its embedding environment are a major source of costly errors. For
example, Lutz reported that 20–35% of the safety-related errors discovered during integration and system testing of two spacecraft
were related to the interfaces between the software and the embedding hardware. Also, the software’s operating environment
is likely to change over time, further complicating the issues related to system-level inter-component communication. In this
paper we discuss a formal approach to the specification and analysis of inter-component communication using a revised version
of RSML (Requirements State Machine Language). The formalism allows rigorous specification of the physical aspects of the
inter-component communication and forces encapsulation of communication-related properties in well-defined and easy-to-read
interface specifications. This enables us both to analyse a system design to detect incompatibilities between connected components
and to use the interface specifications as safety kernels to enforce safety constraints.
5.
Aya Soffer, Hanan Samet. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(4):253-274
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include
maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches
for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach
preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object
that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting
of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries
are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that
have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on
the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational
multimedia database management system so that symbolic images can be stored and retrieved based on their content are described.
Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented.
Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of).
Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs
are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion
time, storage space, retrieval accuracy, and retrieval time.
Received June 12, 1998 / Accepted October 13, 1998
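The abstraction approach described above retrieves images by similarity of object feature vectors. A minimal sketch, where Euclidean distance and the toy feature vectors are illustrative assumptions rather than the paper's choices:

```python
# Hypothetical sketch of feature-vector retrieval for symbolic images:
# each image is described by the feature vectors of its objects, and
# images are ranked by their closest matching object vector. Euclidean
# distance is an assumed similarity measure, not taken from the paper.
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_images(query_vec, image_db):
    """Return image ids ordered by their closest object vector."""
    scored = sorted(
        (min(euclidean(query_vec, v) for v in vectors), image_id)
        for image_id, vectors in image_db.items()
    )
    return [image_id for _, image_id in scored]
```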
6.
In a video-on-demand (VOD) environment, disk arrays are often used to support the disk bandwidth requirement. This can pose
serious problems on available disk bandwidth upon disk failure. In this paper, we explore the approach of replicating frequently
accessed movies to provide high data bandwidth and fault tolerance required in a disk-array-based video server. An isochronous
continuous video stream imposes requirements different from those of random access patterns on databases or files. Specifically, we
propose a new replica placement method, called rotational mirrored declustering (RMD), to support high data availability for disk arrays in a VOD environment. In essence, RMD is similar to the conventional
mirrored declustering in that replicas are stored in different disk arrays. However, it is different from the latter in that
the replica placements in different disk arrays under RMD are properly rotated. Combining the merits of prior chained and
mirrored declustering methods, RMD is particularly suitable for storing multiple movie copies to support VOD applications.
To assess the performance of RMD, we conduct a series of experiments by emulating the storage and delivery of movies in a
VOD system. Our results show that RMD consistently outperforms the conventional methods in terms of load-balancing and fault-tolerance
capability after disk failure, and is deemed a viable approach to supporting replica placement in a disk-array-based video
server.
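The rotated placement that distinguishes RMD from plain mirrored declustering can be sketched as follows. The exact layout formula below is an illustrative assumption, not the paper's scheme: the point is only that rotating the replica layout spreads a failed primary disk's load across all disks of the mirror array.

```python
# Hypothetical sketch of rotational mirrored declustering (RMD):
# replicas live on a different disk array than primaries, and the
# replica layout is rotated stripe by stripe so that blocks sharing a
# primary disk land on different mirror disks. The formulas are
# illustrative assumptions, not the paper's exact placement.

def primary_disk(block, disks_per_array):
    """Round-robin placement of blocks on the primary array."""
    return block % disks_per_array

def replica_disk(block, disks_per_array):
    """Rotated placement on the mirror array: shift by the stripe row,
    so consecutive stripes map one primary disk to different mirror
    disks."""
    row = block // disks_per_array
    return (block + row) % disks_per_array
```

With 4 disks per array, the blocks stored on primary disk 0 (blocks 0, 4, 8, 12) have replicas on mirror disks 0, 1, 2, and 3, so the load of a failed disk is balanced over the whole mirror array.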
7.
The most common way of designing databases is by means of a conceptual model, such as E/R, without taking into account other
views of the system. New object-oriented design languages, such as UML (Unified Modelling Language), allow the whole system,
including the database schema, to be modelled in a uniform way. Moreover, as UML is an extendable language, it allows for
any necessary introduction of new stereotypes for specific applications. Proposals exist to extend UML with stereotypes for
database design but, unfortunately, they are focused on relational databases. However, new applications require complex objects
to be represented in complex relationships, object-relational databases being more appropriate for these requirements. The
framework of this paper is an Object-Relational Database Design Methodology, which defines new UML stereotypes for Object-Relational
Database Design and proposes some guidelines to translate a UML conceptual schema into an object-relational schema. The guidelines
are based on the SQL:1999 object-relational model and on Oracle8i as a product example.
Initial submission: 22 January 2002 / Revised submission: 10 June 2002
Published online: 7 January 2003
This paper is a revised and extended version of Extending UML for Object-Relational Database Design, presented in the UML’2001
conference [17].
8.
Wee Teck Ng, Peter M. Chen. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(3):194-204
Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory
that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by
the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus
instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases:
non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection.
Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases.
However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface
to reliable memory by mapping it into the database address space. This places reliable memory under complete database control
and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure
by write protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space
does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous
transactions. As a result, we believe that databases should use a memory interface to reliable memory.
Received January 1, 1998 / Accepted June 20, 1998
9.
Duplicate Record Detection: A Survey (cited 20 times: 0 self-citations, 20 by others)
A.K. Elmagarmid, P.G. Ipeirotis, V.S. Verykios. IEEE Transactions on Knowledge and Data Engineering 2007, 19(1):1-16
10.
Effective timestamping in databases (cited 3 times: 0 self-citations, 3 by others)
Kristian Torp, Christian S. Jensen, Richard T. Snodgrass. The VLDB Journal: The International Journal on Very Large Data Bases 2000, 8(3-4):267-288
Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times
prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps
being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely
transaction time, when the facts are current in the database, and valid time, when the facts are true in the modeled reality.
However, with few exceptions, the assignment of timestamp values has been considered only in the context of individual modification
statements.
This paper takes the next logical step: It considers the use of timestamping for capturing transaction and valid time in the
context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then
proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the results of a transaction with
the commit time of the transaction is a promising approach. The paper studies how this timestamping may be done using a spectrum
of techniques. While many database facts are valid until now, i.e., the current time, this value is absent from existing temporal types. Techniques that address this problem using different
substitute values are presented. Using a stratum architecture, the performance of the different proposed techniques is studied.
Although querying and modifying time-varying data is accompanied by a number of subtle problems, we present a comprehensive
approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases
in the context of user transactions.
Received: March 11, 1998 / Accepted: July 27, 1999
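The commit-time timestamping the abstract proposes can be sketched as follows: rows written inside a transaction carry a placeholder timestamp and are revisited at commit, so that every row touched by one transaction receives the same transaction-time start. Class and field names are illustrative assumptions, not the paper's notation.

```python
# Hedged sketch of timestamping transaction results with the commit
# time. Rows inserted during a transaction get a placeholder
# transaction-time start; at commit, all pending rows receive the
# single commit timestamp. Names are illustrative assumptions.

UNTIL_CHANGED = None  # placeholder for the open-ended "now" value

class Transaction:
    def __init__(self):
        self.pending = []  # rows written but not yet timestamped

    def insert(self, table, row):
        record = {**row, "tt_start": None, "tt_end": UNTIL_CHANGED}
        table.append(record)
        self.pending.append(record)

    def commit(self, commit_time):
        # Every modification of this transaction gets the same
        # transaction-time start: the commit time.
        for record in self.pending:
            record["tt_start"] = commit_time
        self.pending.clear()
```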
11.
We present a system for classifying the color aspect of textured surfaces having a nearly constant hue (such as wooden boards,
textiles, wallpaper, etc.). The system is designed to compensate for small fluctuations (over time) of the light source and
for inhomogeneous illumination conditions (shading correction). This is an important feature because even in industrial environments
where the lighting conditions are controlled, a constant and homogeneous illumination cannot be guaranteed. Together with
an appropriate camera calibration (which includes a periodic update), our approach offers a robust system which is able to
“distinguish” (i.e., classify correctly) between surface classes which exhibit visually barely perceptible color variations.
In particular, our approach is based on relative (not absolute) color measurements. In this paper, we outline the classification
algorithm while focusing in detail on the camera calibration and a method for compensating for fluctuations of the light source.
Received: 1 September 1998 / Accepted: 16 March 2000
12.
Oktie Hassanzadeh, Renée J. Miller. The VLDB Journal: The International Journal on Very Large Data Bases 2009, 18(5):1141-1166
A major source of uncertainty in databases is the presence of duplicate items, i.e., records that refer to the same real-world entity. However, accurate deduplication is a difficult task and imperfect data cleaning may result in loss of valuable information. A reasonable alternative approach is to keep duplicates when the correct cleaning strategy is not certain, and utilize an efficient probabilistic query-answering technique to return query results along with probabilities of each answer being correct. In this paper, we present a flexible modular framework for scalably creating a probabilistic database out of a dirty relation of duplicated data and overview the challenges raised in utilizing this framework for large relations of string data. We study the problem of associating probabilities with duplicates that are detected using state-of-the-art scalable approximate join methods. We argue that standard thresholding techniques are not sufficiently robust for this task, and propose new clustering algorithms suitable for inferring duplicates and their associated probabilities. We show that the inferred probabilities accurately reflect the error in duplicate records.
13.
Bing Wang. International Journal on Digital Libraries 1999, 2(2-3):91-110
A digital library (DL) consists of a database which contains library information and a user interface which provides a visual
window for users to search relevant information stored in the database. Thus, an abstract structure of a digital library can
be defined as a combination of a special purpose database and a user-friendly interface. This paper addresses one of the fundamental aspects of such
a combination, namely the formal data structure for linking an object-oriented database with hypermedia to support digital
libraries. It is important to establish a formal structure for a digital library in order to efficiently maintain different
types of library information. This article discusses how to build an object-oriented hybrid system to support digital libraries.
In particular, we focus on the discussion of a general purpose data model for digital libraries and the design of the corresponding
hypermedia interface. The significant features of this research are, first, a formalized data model to define a digital library
system structure; second, a practical approach to manage the global schema of a library system; and finally, a design strategy
to integrate hypermedia with databases to support a wide range of application areas.
Received: 15 December 1997 / Revised: June 1999
14.
Andrew Fano. Personal and Ubiquitous Computing 2001, 5(1):12-15
The promise of mobile devices lies not in their capacity to duplicate the capabilities of desktop machines, but rather in
their promise of enabling location-specific tasks. One of the challenges that must be addressed if they are to be used in
this way is how intuitive interfaces for mobile devices can be designed that enable access to location-specific services usable
across locations. We are developing a prototype mobile valet application that presents location-specific services organised
around the tasks associated with a location. The basic elements of the interface exploit commonalities in the way we address
tasks at various locations just as the familiar “file” and “edit” menus in various software applications exploit regularities
in software tasks.
15.
The paper investigates efficient bandwidth allocation schemes for the transmission of MPEG-2 video traffic on high-speed
networks. To this end we performed an extensive analysis of the traffic generated by an MPEG-2 encoder. Specifically, we encoded
“The Sheltering Sky” movie according to the MPEG-2 standard. Analysis of the generated traffic shows that a constant-quality
transmission can only be achieved with poor bandwidth utilization. We found that the low bandwidth utilization
is caused by rare high-rate periods in the codec bitstream. Hence, we identified source scalability as a promising approach
to achieving “quasi-constant” quality transmission with efficient bandwidth utilization. The effectiveness of this approach
is evaluated in the paper via simulation. Specifically, by defining a Markovian model for an MPEG-2 scalable source we performed
a set of simulation experiments which indicate that the source scalability approach significantly increases the utilization,
while maintaining the quality of the video signal at the highest value for most of the time, e.g., 50% network utilization
with the highest quality for 99.7% of the time.
16.
Wen-Syan Li, K. Selçuk Candan, Kyoji Hirata, Yoshinori Hara. The VLDB Journal: The International Journal on Very Large Data Bases 2001, 9(4):312-326
Due to the fuzziness of query specification and media matching, multimedia retrieval is conducted by way of exploration.
It is essential to provide feedback so that users can visualize query reformulation alternatives and database content distribution.
Since media matching is an expensive task, another issue is how to efficiently support exploration so that the system is not
overloaded by perpetual query reformulation. In this paper, we present a uniform framework to represent statistical information
of both semantics and visual metadata for images in the databases. We propose the concept of query verification, which evaluates queries using statistics, and provides users with feedback, including the strictness and reformulation alternatives
of each query condition as well as estimated numbers of matches. With query verification, the system increases the efficiency
of the multimedia database exploration for both users and the system. Such statistical information is also utilized to support
progressive query processing and query relaxation.
Received: 9 June 1998 / Accepted: 21 July 2000 Published online: 4 May 2001
17.
Simonas Šaltenis Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):1-16
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases
that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by
the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are
needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions
to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On
the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the
first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of
an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion
of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive
queries. Performance experiments are reported that evaluate pertinent aspects of the index.
Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 Published online: 18 December 2001
18.
The performance of several methods for estimating local surface geometry (the principal frame plus the principal quadric)
is examined by applying them to a suite of synthetic and real test data that have been corrupted by various amounts of additive
Gaussian noise. Methods considered include finite differences, a facet-based approach, and quadric surface fitting. The nonlinear
quadric fitting method considered was found to perform best but has the greatest computational cost. The facet-based approach
works as well as the other quadric fitting methods and has a much smaller computational cost. Hence, it is the recommended
method to use in practice.
19.
J. Hu, R.S. Kashi, D. Lopresti, G.T. Wilfong. International Journal on Document Analysis and Recognition 2002, 4(3):140-153
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition
have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a
fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity.
In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and
table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies
work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield
various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely
(deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined
to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results
of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed
acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by
the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could
be applied to other document recognition tasks as well.
Received July 18, 2000 / Accepted October 4, 2001
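The edit-distance idea for scoring table detection can be sketched with a standard dynamic program in which deletions model missed tables and insertions model spurious ones; split and merge errors decompose into combinations of these. Unit costs and the matching predicate are illustrative assumptions, not the paper's exact metric.

```python
# Hedged sketch of edit-distance scoring for table detection results.
# `match` decides whether a detected region corresponds to a ground-
# truth table; unit costs are an illustrative assumption.

def edit_distance(truth, detected, match):
    """Minimum number of insert/delete/substitute steps turning the
    detected table sequence into the ground-truth sequence."""
    m, n = len(truth), len(detected)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # deletions: tables missed completely
    for j in range(n + 1):
        d[0][j] = j          # insertions: non-tables labeled as tables
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if match(truth[i - 1], detected[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion error
                          d[i][j - 1] + 1,      # insertion error
                          d[i - 1][j - 1] + cost)
    return d[m][n]
```

A perfect detection result has distance 0; each missed or spurious table adds one to the score.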
20.
Integrated document caching and prefetching in storage hierarchies based on Markov-chain predictions
Achim Kraiss, Gerhard Weikum. The VLDB Journal: The International Journal on Very Large Data Bases 1998, 7(3):141-162
Large multimedia document archives may hold a major fraction of their data in tertiary storage libraries for cost reasons.
This paper develops an integrated approach to the vertical data migration between the tertiary, secondary, and primary storage
in that it reconciles speculative prefetching, to mask the high latency of the tertiary storage, with the replacement policy
of the document caches at the secondary and primary storage level, and also considers the interaction of these policies with
the tertiary and secondary storage request scheduling.
The integrated migration policy is based on a continuous-time Markov chain model for predicting the expected number of accesses
to a document within a specified time horizon. Prefetching is initiated only if that expectation is higher than those of the
documents that need to be dropped from secondary storage to free up the necessary space. In addition, the possible resource
contention at the tertiary and secondary storage is taken into account by dynamically assessing the response-time benefit
of prefetching a document versus the penalty that it would incur on the response time of the pending document requests.
The parameters of the continuous-time Markov chain model, the probabilities of co-accessing certain documents and the interaction
times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online
statistics. The integrated policy for vertical data migration has been implemented in a prototype system. The system makes
profitable use of the Markov chain model also for the scheduling of volume exchanges in the tertiary storage library. Detailed
simulation experiments with Web-server-like synthetic workloads indicate significant gains in terms of client response time.
The experiments also show that the overhead of the statistical bookkeeping and the computations for the access predictions
is affordable.
Received January 1, 1998 / Accepted May 27, 1998