20 similar documents found (search time: 0 ms)
1.
Yoonkyong Lee Mayssam Sayyadian AnHai Doan Arnon S. Rosenthal 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(1):97-122
Most recent schema matching systems assemble multiple components, each employing a particular matching technique. The domain user must then tune the system: select the right component to be executed and correctly adjust their numerous “knobs” (e.g., thresholds, formula coefficients). Tuning is skill and time intensive, but (as we show) without it the matching accuracy is significantly inferior. We describe eTuner, an approach to automatically tune schema matching systems. Given a schema S, we match S against synthetic schemas, for which the ground truth mapping is known, and find a tuning that demonstrably improves the performance of matching S against real schemas. To efficiently search the huge space of tuning configurations, eTuner works sequentially, starting by tuning the lowest-level components. To increase the applicability of eTuner, we develop methods to tune a broad range of matching components. While the tuning process is completely automatic, eTuner can also exploit user assistance (whenever available) to further improve the tuning quality. We employed eTuner to tune four recently developed matching systems on several real-world domains. The results show that the tuned matching systems achieve higher accuracy than the same systems tuned with currently available methods.
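The threshold-tuning step described above can be sketched as follows. This is a minimal illustration, not eTuner's actual staged search over components; the knob is a single similarity threshold, and all names and scores are made up:

```python
# Hypothetical eTuner-style knob tuning: sweep a similarity threshold
# against a synthetic match task whose ground truth is known, and keep
# the setting with the best F1 score.

def f1(predicted, truth):
    """F1 of a predicted set of correspondences vs. the ground truth."""
    if not predicted:
        return 0.0
    tp = len(predicted & truth)
    precision = tp / len(predicted)
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def tune_threshold(similarities, truth, candidates=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """similarities: {(source_col, target_col): score}; truth: set of pairs."""
    return max(
        candidates,
        key=lambda t: f1({p for p, s in similarities.items() if s >= t}, truth),
    )

sims = {("cust_name", "name"): 0.92, ("cust_id", "id"): 0.85,
        ("addr", "name"): 0.65}
truth = {("cust_name", "name"), ("cust_id", "id")}
print(tune_threshold(sims, truth))  # → 0.7
```

At 0.7 the spurious ("addr", "name") candidate is excluded while both true correspondences survive, so F1 reaches 1.0; lower thresholds admit the false match and higher ones drop a true one.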
2.
Doe-Wan Kim Tapas Kanungo 《International Journal on Document Analysis and Recognition》2002,5(1):47-66
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition
(OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned
document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically
registered the original image to the rescanned one using four corner points and then transformed the original groundtruth
using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing
the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance
between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout
complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using
the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth
for microfilmed and FAXed versions of the University of Washington dataset documents.
Received: July 24, 2001 / Accepted: May 20, 2002
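The centroid-based error metric can be illustrated with a toy greedy matcher. The paper itself uses an attributed branch-and-bound search over all data points; this sketch shows only the Euclidean-distance criterion between character centroids, with invented coordinates:

```python
import math

# Greedy nearest-neighbour correspondence between character centroids,
# using Euclidean distance as the error metric (an illustration, not the
# paper's branch-and-bound algorithm).

def match_centroids(original, rescanned):
    """Pair each original centroid with its nearest unused rescanned one."""
    pairs, used = [], set()
    for p in original:
        best, best_d = None, math.inf
        for j, q in enumerate(rescanned):
            if j in used:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((p, rescanned[best], best_d))
    return pairs

orig = [(10.0, 10.0), (50.0, 12.0)]
scan = [(11.0, 10.5), (49.0, 13.0)]
for p, q, d in match_centroids(orig, scan):
    print(p, "->", q, round(d, 3))
```

A real registration pipeline would estimate a transformation from such correspondences and then carry the original groundtruth boxes through it.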
3.
Eric Amiel Marie-Jo Bellosta Eric Dujardin Eric Simon 《The VLDB Journal The International Journal on Very Large Data Bases》1996,5(2):133-150
Object-oriented databases enforce behavioral schema consistency rules
to guarantee type safety, i.e., that no run-time type error can occur. When
the schema must evolve, some schema updates may violate these rules. In
order to maintain behavioral schema consistency, traditional solutions require
significant changes to the types, the type hierarchy and the code of existing
methods. Such operations are very expensive in a database context. To ease
schema evolution, we propose to support exceptions to the behavioral
consistency rules without sacrificing type safety. The basic idea is to detect
unsafe statements in a method code at compile-time and check them at run-time.
The run-time check is performed by a specific clause that is automatically
inserted around unsafe statements. This check clause warns the programmer of
the safety problem and lets him provide exception-handling code. Schema
updates can therefore be performed with only minor changes to the code of
methods.
Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995
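A rough sketch of the run-time check clause idea, transposed to Python: a statement the compiler cannot prove type-safe is wrapped in a guard that tests the receiver at run time and falls back to user-supplied exception-handling code. The paper targets compiled object-oriented database methods; the classes and names here are invented:

```python
# A "check clause" around an unsafe statement: run the method if the
# receiver actually supports it, otherwise run the programmer's
# exception-handling code instead of raising a run-time type error.

class Document:
    def print_summary(self):
        return "summary"

class RawBlob:          # added later by a schema update; no print_summary
    pass

def checked_call(obj, method_name, on_unsafe):
    """Guarded dispatch: obj.method_name() if safe, else the handler."""
    if hasattr(obj, method_name):
        return getattr(obj, method_name)()
    return on_unsafe(obj)

for item in (Document(), RawBlob()):
    print(checked_call(item, "print_summary",
                       on_unsafe=lambda o: f"no summary for {type(o).__name__}"))
```

The point mirrors the abstract: the schema update that introduced `RawBlob` did not force a rewrite of the method; only the unsafe call site gained a check.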
4.
Answering queries using views: A survey (cited 25 times: 0 self-citations, 25 by others)
Alon Y. Halevy 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):270-294
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
defined materialized views over the database, rather than accessing the database relations. The problem has recently received
significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding
a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation
of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result,
finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views.
Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated
schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate
works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it
and the relevant theoretical results.
Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001
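The core rewriting idea can be shown with a toy example, assuming made-up relations and a hand-materialized view: a query over two base relations is answered by scanning a precomputed view instead of redoing the join.

```python
# Answering a query using a view: the query asks for Paris customers and
# their orders; a materialized view already joins customer to orders, so
# the query becomes a scan of the view rather than a join over base data.

customer = [("c1", "Ann", "Paris"), ("c2", "Bo", "Rome")]
orders = [("o1", "c1"), ("o2", "c1"), ("o3", "c2")]

# Materialized view V(cust_id, name, city, order_id) = customer joined to orders
view = [(c, n, city, o) for (o, cid) in orders
        for (c, n, city) in customer if c == cid]

# Query: names and order ids of Paris customers, answered from the view only
answer = [(n, o) for (c, n, city, o) in view if city == "Paris"]
print(answer)  # → [('Ann', 'o1'), ('Ann', 'o2')]
```

In the data-integration setting the survey describes, `view` would be a source's precomputed contents and the rewriting would be found by an algorithm rather than by hand.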
5.
Fast template matching using bounded partial correlation (cited 8 times: 0 self-citations, 8 by others)
This paper describes a novel, fast template-matching technique, referred to as bounded partial correlation (BPC), based on
the normalised cross-correlation (NCC) function. The technique consists in checking at each search position a suitable elimination
condition relying on the evaluation of an upper-bound for the NCC function. The check allows for rapidly skipping the positions
that cannot provide a better degree of match with respect to the current best-matching one. The upper-bounding function incorporates
partial information from the actual cross-correlation function and can be calculated very efficiently using a recursive scheme.
We show also a simple improvement to the basic BPC formulation that provides additional computational benefits and renders
the technique more robust with respect to the choice of parameters.
Received: 2 November 2000 / Accepted: 25 July 2001
Correspondence to: L. Di Stefano
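The elimination condition can be sketched in a simplified 1-D form. This is not the paper's recursive NCC scheme: it uses unnormalised correlation and a single Cauchy-Schwarz bound on the template's tail, purely to show how a partial score plus an upper bound lets search positions be skipped:

```python
import math

# Bounded-partial-correlation sketch: compute the correlation over the
# first part of the template, bound the remainder by Cauchy-Schwarz, and
# skip any position whose upper bound cannot beat the best score so far.

def bpc_search(signal, template, split):
    t_head, t_tail = template[:split], template[split:]
    tail_norm = math.sqrt(sum(t * t for t in t_tail))
    best_score, best_pos, skipped = -math.inf, -1, 0
    for pos in range(len(signal) - len(template) + 1):
        w = signal[pos:pos + len(template)]
        partial = sum(a * b for a, b in zip(w[:split], t_head))
        w_tail_norm = math.sqrt(sum(x * x for x in w[split:]))
        bound = partial + tail_norm * w_tail_norm   # upper bound on full score
        if bound <= best_score:
            skipped += 1                            # position eliminated early
            continue
        score = partial + sum(a * b for a, b in zip(w[split:], t_tail))
        if score > best_score:
            best_score, best_pos = score, pos
    return best_pos, best_score, skipped

signal = [0, 1, 2, 9, 8, 7, 1, 0]
template = [9, 8, 7]
print(bpc_search(signal, template, split=1))  # → (3, 194, 2)
```

Once the exact match at position 3 is found, the two remaining positions are rejected from their bounds alone, without completing the correlation.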
6.
7.
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized as one of the basic operations required by the process of data and schema integration, and its outcome serves many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that was largely left untouched is the automatic selection of schema matchers into an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offer such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB offers a new promise for schema matcher designers: instead of trying to design a perfect schema matcher, a designer can focus on finding schema matchers that are better than random. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers are, by and large, applications of existing work from other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other, state-of-the-art ensemble matchers.
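The better-than-random property can be illustrated with an AdaBoost-style weighting of matchers, which is one plausible reading of a boosting heuristic over matchers; the actual SMB algorithm, matchers, and labels below are not from the paper:

```python
import math

# AdaBoost-style weight for a matcher treated as a weak classifier over
# candidate correspondences: any matcher better than random gets a
# positive weight, one exactly at chance gets weight zero.

def matcher_weight(predictions, labels):
    """0.5 * ln((1 - error) / error), with error clamped away from 0 and 1."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    error = max(min(wrong / len(labels), 1 - 1e-9), 1e-9)
    return 0.5 * math.log((1 - error) / error)

labels = [1, 1, 0, 0, 1, 0]          # ground truth: match / non-match
name_matcher = [1, 1, 0, 1, 1, 0]    # better than random: one mistake
coin_matcher = [1, 0, 1, 1, 1, 0]    # exactly at chance: three mistakes

print(round(matcher_weight(name_matcher, labels), 3))  # positive weight
print(round(matcher_weight(coin_matcher, labels), 3))  # zero weight
```

This is the sense in which a designer need not build a perfect matcher: a weighted combination can amplify any matcher whose weight comes out positive.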
8.
Presently, man-machine interface development is a widespread research activity. A system to understand hand drawn architectural
drawings in a CAD environment is presented in this paper. To understand a document, we have to identify its building elements
and their structural properties. An attributed graph structure is chosen as a symbolic representation of the input document
and the patterns to recognize in it. An inexact subgraph isomorphism procedure using relaxation labeling techniques is performed.
In this paper we focus on how to speed up the matching. There is a building element, the walls, characterized by a hatching
pattern. Using a straight line Hough transform (SLHT)-based method, we recognize this pattern, characterized by parallel straight
lines, and remove from the input graph the edges belonging to this pattern. The isomorphism is then applied to the remainder
of the input graph. When all the building elements have been recognized, the document is redrawn, correcting the inaccurate
strokes obtained from a hand-drawn input.
Received 6 June 1996 / Accepted 4 February 1997
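A minimal straight-line Hough transform sketch shows how parallel lines, such as a toy hatching pattern, produce vote peaks that share one angle bin; the quantisation and the tiny point set are arbitrary choices, not the paper's parameters:

```python
import math
from collections import Counter

# SLHT sketch: each point votes in a quantised (theta, rho) space.
# Parallel lines show up as separate rho peaks under the same theta bin.

def hough_peaks(points, theta_steps=4, rho_step=1.0):
    votes = Counter()
    for x, y in points:
        for k in range(theta_steps):
            theta = k * math.pi / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return votes

# Two parallel horizontal lines y = 0 and y = 3 (a toy hatching pattern)
pts = [(x, 0) for x in range(5)] + [(x, 3) for x in range(5)]
peaks = [cell for cell, v in hough_peaks(pts).items() if v == 5]
print(peaks)  # → [(2, 0), (2, 3)]  -- same theta bin, two rho values
```

Having located the hatching this way, the paper's pipeline removes those edges from the input graph before running the (more expensive) inexact subgraph isomorphism.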
9.
Henrik Reif Andersen Jorn Lind-Nielsen 《International Journal on Software Tools for Technology Transfer (STTT)》1999,2(3):242-259
Partial model checking is a technique for verifying concurrent systems. It gradually reduces the verification problem to the
final answer by removing concurrent components one-by-one, transforming and minimizing the specifications as it proceeds.
This paper gives a survey of the theory behind partial model checking and the results obtained with it.
10.
Paolo Atzeni Luigi Bellomarini Francesca Bugiotti Fabrizio Celli Giorgio Gianforme 《Information Systems》2012,37(3):269-287
To support heterogeneity is a major requirement in current approaches to integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In that approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. Then, the translation is applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again both schema and data) is exported into the operational system. Here we illustrate a new, lightweight approach in which the database is not imported. MIDST-RT needs to know only the schema of the source database and the model of the target one, and it generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases.
11.
Samia Boukir Patrick Bouthemy François Chaumette Didier Juvin 《Machine Vision and Applications》1998,10(5-6):321-330
This paper presents a local approach for matching contour segments in an image sequence. This study has been primarily motivated
by work concerned with the recovery of 3D structure using active vision. The method to recover the 3D structure of the scene
requires tracking contour segments in an image sequence in real time. Here, we propose an original and robust approach that
is ideally suited for this problem. It is also of more general interest and can be used in any context requiring matching
of line boundaries over time. This method only involves local modeling and computation of moving edges dealing “virtually”
with a contour segment primitive representation. Such an approach brings robustness to contour segmentation instability and to occlusion, and is easy to implement. Parallelism has also been investigated using an SIMD-based real-time image-processing
system. This method has been validated with experiments on several real-image sequences. Our results show quite satisfactory
performance and the algorithm runs in a few milliseconds.
Received: 11 December 1996 / Accepted: 8 August 1997
12.
Matching large schemas: Approaches and evaluation (cited 1 time: 0 self-citations, 1 by others)
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
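The matcher-library-plus-combination architecture can be sketched as follows; the two matchers are deliberately simple stand-ins (string similarity and a prefix test), not COMA++'s actual library, and the schemas are made up:

```python
from difflib import SequenceMatcher

# Each matcher returns a similarity matrix over (source, target) element
# pairs; the combiner averages the matrices, and a threshold turns the
# combined matrix into correspondences.

def name_matcher(src, tgt):
    return {(s, t): SequenceMatcher(None, s.lower(), t.lower()).ratio()
            for s in src for t in tgt}

def prefix_matcher(src, tgt):
    return {(s, t): 1.0 if t.lower().startswith(s.lower()[:3]) else 0.0
            for s in src for t in tgt}

def combine(matrices):
    return {k: sum(m[k] for m in matrices) / len(matrices)
            for k in matrices[0]}

src, tgt = ["CustomerName", "Zip"], ["cust_name", "zipcode"]
combined = combine([name_matcher(src, tgt), prefix_matcher(src, tgt)])
matches = sorted(k for k, v in combined.items() if v >= 0.6)
print(matches)
```

COMA++'s refinements (context-dependent and fragment-based strategies) would replace the flat thresholding step here with decomposition of the match task.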
13.
In this paper, we discuss an appearance-matching approach to the difficult problem of interpreting color scenes containing
occluded objects. We have explored the use of an iterative, coarse-to-fine sum-squared-error method that uses information
from hypothesized occlusion events to perform run-time modification of scene-to-template similarity measures. These adjustments
are performed by using a binary mask to adaptively exclude regions of the template image from the squared-error computation.
At each iteration higher resolution scene data as well as information derived from the occluding interactions between multiple
object hypotheses are used to adjust these masks. We present results which demonstrate that such a technique is reasonably
robust over a large database of color test scenes containing objects at a variety of scales, and tolerates minor 3D object
rotations and global illumination variations.
Received: 21 November 1996 / Accepted: 14 October 1997
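The mask-based exclusion of occluded regions from the squared-error computation can be sketched on toy grey-level grids. In the paper the masks are derived from occlusion hypotheses at run time; here one is fixed by hand:

```python
# Masked sum-squared error: a binary mask excludes hypothesised occluded
# pixels, so an occluder does not penalise a correct object hypothesis.

def masked_sse(scene, template, mask):
    """Mean squared error over pixels where mask == 1."""
    total, count = 0, 0
    for r in range(len(template)):
        for c in range(len(template[0])):
            if mask[r][c]:
                total += (scene[r][c] - template[r][c]) ** 2
                count += 1
    return total / count if count else float("inf")

template = [[10, 10], [10, 10]]
scene = [[10, 10], [90, 10]]        # bottom-left pixel hidden by an occluder
full_mask = [[1, 1], [1, 1]]
occl_mask = [[1, 1], [0, 1]]        # exclude the occluded pixel

print(masked_sse(scene, template, full_mask))   # → 1600.0 (large error)
print(masked_sse(scene, template, occl_mask))   # → 0.0 (occlusion excluded)
```

The iterative part of the method then re-estimates such masks at each resolution level from the interacting object hypotheses.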
14.
15.
We present an efficient and accurate method for retrieving images based on color similarity with a given query image or histogram.
The method matches the query against parts of the image using histogram intersection. Efficient searching for the best matching
subimage is done by pruning the set of subimages using upper bound estimates. The method is fast, has high precision and recall
and also allows queries based on the positions of one or more objects in the database image. Experimental results showing
the efficiency of the proposed search method, and high precision and recall of retrieval are presented.
Received: 20 January 1997 / Accepted: 5 January 1998
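Histogram intersection itself is compact enough to show directly. A toy 4-bin example with made-up histograms; real systems use 3-D colour histograms, and the paper's contribution is the bounded subimage search built on top of this measure:

```python
# Histogram intersection: sum of bin-wise minima, normalised by the
# query histogram's total mass; 1.0 means the image fully covers the
# query's colours.

def histogram_intersection(query, image_hist):
    return sum(min(q, h) for q, h in zip(query, image_hist)) / sum(query)

query = [4, 0, 6, 2]          # query colour histogram
database = {
    "img_a": [4, 1, 5, 2],    # very similar colours
    "img_b": [0, 9, 1, 0],    # mostly a different colour
}
ranked = sorted(database,
                key=lambda k: histogram_intersection(query, database[k]),
                reverse=True)
print(ranked)                                                    # → ['img_a', 'img_b']
print(round(histogram_intersection(query, database["img_a"]), 3))  # → 0.917
```

The pruning idea in the abstract follows from this form: an upper bound on the intersection over a set of subimages lets whole subsets be discarded before exact evaluation.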
16.
The idea of the information society is pervasive and varied and, in this context, universal access is itself a multi-faceted
concept. However, the notion of universality presupposes an analysis and understanding of what both unifies and discriminates
among different individual members of a community of technology users. This paper addresses these ideas and, in particular,
seeks to illustrate some techniques which can support such an analysis in a variety of task domains. Of special interest here
is a specific case study which examines the use of biometric processing as a means of managing access in the broadest sense.
It is argued that not only is the field of biometric measurement one where understanding similarities and differences is the
essence of what is required, but also that this offers the opportunity to establish and explore a variety of practical techniques
of very wide significance in the context of universal access.
Published online: 18 May 2001
17.
Yi-Ping Hung Chu-Song Chen Kuan-Chung Hung Yong-Sheng Chen Chiou-Shann Fuh 《Machine Vision and Applications》1998,10(5-6):280-291
This paper presents a new multi-pass hierarchical stereo-matching approach for generation of digital terrain models (DTMs)
from two overlapping aerial images. Our method consists of multiple passes which compute stereo matches with a coarse-to-fine
and sparse-to-dense paradigm. An image pyramid is generated and used in the hierarchical stereo matching. Within each pass,
the DTM is refined by using the image pyramid from the coarse to the fine level. At the coarsest level of the first pass,
a global stereo-matching technique, the intra-/inter-scanline matching method, is used to generate a good initial DTM for
the subsequent stereo matching. Thereafter, hierarchical block matching is applied to image locations where features are detected
to refine the DTM incrementally. In the first pass, only the feature points near salient edge segments are considered in block
matching. In the second pass, all the feature points are considered, and the DTM obtained from the first pass is used as the
initial condition for local searching. For the passes after the second pass, 3D interactive manual editing can be incorporated
into the automatic DTM refinement process whenever necessary. Experimental results have shown that our method can successfully
provide accurate DTMs from aerial images. The success of our approach and system has also been demonstrated with flight simulation software.
Received: 4 November 1996 / Accepted: 20 October 1997
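The coarse-to-fine idea can be reduced to a 1-D sketch: match on a downsampled signal first, then refine at full resolution only in a small window around the coarse answer. This stands in for the paper's image pyramids and DTM refinement; the data and the SAD cost are invented:

```python
# Two-level coarse-to-fine matching: a cheap search at half resolution
# localises the match, then a short full-resolution search refines it.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def downsample(seq):
    return [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq) - 1, 2)]

def best_match(signal, template, positions):
    return min(positions, key=lambda p: sad(signal[p:p + len(template)], template))

signal = [0, 0, 1, 2, 5, 9, 9, 5, 2, 1, 0, 0]
template = [5, 9, 9, 5]

coarse_sig, coarse_tmp = downsample(signal), downsample(template)
coarse_pos = best_match(coarse_sig, coarse_tmp,
                        range(len(coarse_sig) - len(coarse_tmp) + 1))
window = range(max(0, 2 * coarse_pos - 1),
               min(len(signal) - len(template), 2 * coarse_pos + 1) + 1)
fine_pos = best_match(signal, template, window)
print(coarse_pos, fine_pos)  # → 2 4
```

A full pyramid repeats this over several levels, and the multi-pass scheme in the paper additionally moves from sparse feature points to denser matches.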
18.
Marius Bozga Jean-Claude Fernandez Lucian Ghirvu 《International Journal on Software Tools for Technology Transfer (STTT)》2003,4(2):142-152
Conformance testing is still the main industrial validation technique for telecommunication protocols. In practice, the automatic
construction of test cases based on finite-state models is hindered by the state explosion problem. We try to reduce its magnitude
by using static analysis techniques in order to obtain smaller but equivalent models.
Published online: 24 January 2003
19.
Bernd Amann Irini Fundulaki Michel Scholl 《International Journal on Digital Libraries》2000,3(3):221-236
In this paper we present a new approach for building metadata schemas by integrating existing ontologies and structured vocabularies
(thesauri). This integration is based on the specification of inclusion relationships between thesaurus terms and ontology
concepts and results in application-specific metadata schemas incorporating the structural views of ontologies and the deep
classification schemes provided by thesauri. We will also show how the result of this integration can be used for RDF schema
creation and metadata querying. In our context, (metadata) queries exploit the inclusion semantics of term relationships,
which introduces some recursion. We will present a fairly simple database-oriented solution for querying such metadata which
avoids a (recursive) tree traversal and is based on a linear encoding of thesaurus hierarchies.
Published online: 22 September 2000
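The linear encoding that avoids a recursive tree traversal at query time can be sketched with an interval (nested-set style) numbering, which is one standard way to realise such an encoding; the paper's exact scheme may differ, and the thesaurus below is made up:

```python
# Linear encoding of a thesaurus hierarchy: one depth-first pass assigns
# each term an interval; "t is narrower than b" then becomes interval
# containment, with no traversal needed when answering queries.

def encode(tree, root):
    """tree: {term: [children]}.  Returns {term: (left, right)}."""
    intervals, counter = {}, [0]

    def visit(term):
        left = counter[0]; counter[0] += 1
        for child in tree.get(term, []):
            visit(child)
        intervals[term] = (left, counter[0]); counter[0] += 1

    visit(root)
    return intervals

thesaurus = {"artefact": ["vessel", "tool"], "vessel": ["amphora", "bowl"]}
enc = encode(thesaurus, "artefact")

def is_narrower(term, broader):
    l, r = enc[term]
    bl, br = enc[broader]
    return bl < l and r < br      # containment test replaces recursion

print(is_narrower("amphora", "artefact"), is_narrower("tool", "vessel"))  # → True False
```

Stored as two columns in a table, the containment test becomes a non-recursive range predicate, which is the database-oriented benefit the abstract describes.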
20.
Schema matching is an important step in database integration. It identifies elements in two or more databases that have the same meaning. A multitude of schema matching methods have been proposed, but little is known about how humans assign meaning to database elements or assess the similarity of meaning of database elements. This paper presents an initial experimental study based on five theories of meaning that compares the effects of seven factors on the perceived similarity of database elements. Implications for schema matching research are discussed and guidance for future research is offered.