Similar Documents
20 similar documents retrieved.
1.
Most recent schema matching systems assemble multiple components, each employing a particular matching technique. The domain user must then tune the system: select the right components to be executed and correctly adjust their numerous “knobs” (e.g., thresholds, formula coefficients). Tuning is skill- and time-intensive, but (as we show) without it the matching accuracy is significantly inferior. We describe eTuner, an approach to automatically tune schema matching systems. Given a schema S, we match S against synthetic schemas, for which the ground-truth mapping is known, and find a tuning that demonstrably improves the performance of matching S against real schemas. To efficiently search the huge space of tuning configurations, eTuner works sequentially, starting by tuning the lowest-level components. To increase the applicability of eTuner, we develop methods to tune a broad range of matching components. While the tuning process is completely automatic, eTuner can also exploit user assistance (whenever available) to further improve the tuning quality. We employed eTuner to tune four recently developed matching systems on several real-world domains. The results show that the matching systems tuned by eTuner achieve higher accuracy than the same systems tuned with currently available methods.
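The staged tuning idea lends itself to a small illustration. The sketch below is a hypothetical, simplified take on the approach rather than eTuner's actual search: each component exposes a single threshold, thresholds are tuned one component at a time (lowest level first) against a synthetic workload whose ground-truth correspondences are known, and average F1 is the objective. The `components`/`workload` data layout is an assumption of the example.

```python
import numpy as np

def f1(predicted, truth):
    """F1 of a predicted correspondence set against the known ground truth."""
    predicted, truth = set(predicted), set(truth)
    if not predicted or not truth:
        return 0.0
    p = len(predicted & truth) / len(predicted)
    r = len(predicted & truth) / len(truth)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def tune_stagewise(components, workload, grid=np.linspace(0.1, 0.9, 9)):
    """
    components: ordered list of (name, matcher) pairs, lowest level first, where
                matcher(schema_pair, config) returns a set of correspondences.
    workload:   list of (schema_pair, ground_truth) pairs built from synthetic schemas.
    Tunes one threshold per component while holding earlier choices fixed.
    """
    config = {}
    for name, matcher in components:
        best_t, best_score = grid[0], -1.0
        for t in grid:
            trial = dict(config, **{name: t})
            score = np.mean([f1(matcher(pair, trial), truth)
                             for pair, truth in workload])
            if score > best_score:
                best_t, best_score = t, score
        config[name] = best_t
    return config
```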

2.
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically registered the original image to the rescanned one using four corner points and then transformed the original groundtruth using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using the University of Washington dataset. Finally, we show examples of applying this methodology to generate groundtruth for microfilmed and FAXed versions of the University of Washington dataset documents. Received: July 24, 2001 / Accepted: May 20, 2002
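As a rough illustration of the correspondence step, the sketch below pairs original and rescanned centroids by minimising total Euclidean distance. Note the substitution: the paper uses a branch-and-bound search over blob corners, whereas this sketch optimises the same centroid-distance metric with the Hungarian algorithm from SciPy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def correspond_centroids(orig_centroids, scanned_centroids):
    """
    Pair ground-truth character centroids with centroids extracted from the
    rescanned image by minimising the total Euclidean distance between pairs.
    Returns the list of (original index, scanned index) pairs and the total cost.
    """
    a = np.asarray(orig_centroids, dtype=float)      # shape (n, 2)
    b = np.asarray(scanned_centroids, dtype=float)   # shape (m, 2)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())
```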

3.
Object-oriented databases enforce behavioral schema consistency rules to guarantee type safety, i.e., that no run-time type error can occur. When the schema must evolve, some schema updates may violate these rules. In order to maintain behavioral schema consistency, traditional solutions require significant changes to the types, the type hierarchy and the code of existing methods. Such operations are very expensive in a database context. To ease schema evolution, we propose to support exceptions to the behavioral consistency rules without sacrificing type safety. The basic idea is to detect unsafe statements in a method code at compile-time and check them at run-time. The run-time check is performed by a specific clause that is automatically inserted around unsafe statements. This check clause warns the programmer of the safety problem and lets them provide exception-handling code. Schema updates can therefore be performed with only minor changes to the code of methods. Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995
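The run-time check clause can be pictured with a toy guard like the one below. This is only a hedged Python analogue of the idea (the actual mechanism is compiler-inserted code in the OODB's method bodies); the names `check_clause` and `LegacyRecord` are invented for the illustration.

```python
def check_clause(unsafe_statement, on_type_error):
    """Guard wrapped around a statement flagged as unsafe at compile time:
    a run-time type error is routed to programmer-supplied handling code."""
    try:
        return unsafe_statement()
    except (TypeError, AttributeError) as err:
        return on_type_error(err)

class LegacyRecord:
    """Toy object whose class predates a schema update that renamed a field."""
    pass

# The attribute access became unsafe after the schema update; instead of a
# run-time type error, the handler supplies a fallback value.
value = check_clause(lambda: LegacyRecord().renamed_field,
                     on_type_error=lambda err: None)
print(value)  # None
```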

4.
Answering queries using views: A survey (cited 25 times: 0 self-citations, 25 by others)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results. Received: 1 August 1999 / Accepted: 23 March 2001 / Published online: 6 September 2001

5.
Fast template matching using bounded partial correlation (cited 8 times: 0 self-citations, 8 by others)
This paper describes a novel, fast template-matching technique, referred to as bounded partial correlation (BPC), based on the normalised cross-correlation (NCC) function. The technique consists in checking, at each search position, a suitable elimination condition relying on the evaluation of an upper bound on the NCC function. The check allows for rapidly skipping the positions that cannot provide a better degree of match with respect to the current best-matching one. The upper-bounding function incorporates partial information from the actual cross-correlation function and can be calculated very efficiently using a recursive scheme. We also show a simple improvement to the basic BPC formulation that provides additional computational benefits and renders the technique more robust with respect to the choice of parameters. Received: 2 November 2000 / Accepted: 25 July 2001 / Correspondence to: L. Di Stefano
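A minimal sketch of the elimination idea, under simplifying assumptions (plain, non-zero-mean NCC; no recursive/box-filter computation of the partial sums), is given below. The bound follows from the Cauchy-Schwarz inequality applied to the template rows not yet correlated; positions whose bound cannot beat the current best match are skipped.

```python
import numpy as np

def bpc_match(image, template, split_row=None):
    """Exhaustive template matching with a bounded-partial-correlation style skip test."""
    H, W = image.shape
    h, w = template.shape
    r = split_row if split_row is not None else h // 4       # rows used for the partial sum
    T_top, T_bot = template[:r], template[r:]
    norm_T, norm_T_bot = np.linalg.norm(template), np.linalg.norm(T_bot)
    best_score, best_pos = -1.0, None
    for u in range(H - h + 1):
        for v in range(W - w + 1):
            win = image[u:u + h, v:v + w]
            norm_win = np.linalg.norm(win)
            if norm_win == 0:
                continue
            s_top = float(np.sum(win[:r] * T_top))            # partial cross-correlation
            # Cauchy-Schwarz upper bound on the contribution of the remaining rows
            bound = (s_top + np.linalg.norm(win[r:]) * norm_T_bot) / (norm_win * norm_T)
            if bound <= best_score:
                continue                                      # cannot beat the current best
            score = float(np.sum(win * template)) / (norm_win * norm_T)
            if score > best_score:
                best_score, best_pos = score, (u, v)
    return best_pos, best_score
```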

6.
7.
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized to be one of the basic operations required by the process of data and schema integration and its outcome serves in many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that has largely been left untouched involves the automatic selection of schema matchers for an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offer such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB introduces a new promise for schema matcher designers. Instead of trying to design a perfect schema matcher, a designer can instead focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed by and large as applications of existing work in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other state-of-the-art ensemble matchers.
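A hedged sketch of the boosting-style selection follows: it is plain AdaBoost over a pool of matchers, each reduced to a thresholded weak classifier on candidate correspondences, stopping as soon as no remaining matcher is better than random. SMB's actual scheme selects first-line/second-line matcher pairs and tunes their importance; the `matcher_scores`/`labels` data layout is an assumption of the illustration.

```python
import numpy as np

def smb_select(matcher_scores, labels, rounds=5, threshold=0.5):
    """
    matcher_scores: dict of matcher name -> array of similarity scores in [0, 1],
                    one score per candidate attribute correspondence.
    labels:         array of {0, 1}; 1 means the correspondence is truly correct.
    Returns the boosted ensemble as a list of (matcher name, weight) pairs.
    """
    y = np.where(np.asarray(labels) == 1, 1, -1)
    n = y.size
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for name, scores in matcher_scores.items():
            pred = np.where(np.asarray(scores) >= threshold, 1, -1)
            err = float(w[pred != y].sum())
            if best is None or err < best[1]:
                best = (name, err, pred)
        name, err, pred = best
        if err >= 0.5:                      # no better-than-random matcher left
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        ensemble.append((name, alpha))
        w *= np.exp(-alpha * y * pred)      # standard AdaBoost re-weighting
        w /= w.sum()
    return ensemble
```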

8.
Presently, man-machine interface development is a widespread research activity. A system to understand hand-drawn architectural drawings in a CAD environment is presented in this paper. To understand a document, we have to identify its building elements and their structural properties. An attributed graph structure is chosen as a symbolic representation of the input document and the patterns to recognize in it. An inexact subgraph isomorphism procedure using relaxation labeling techniques is performed. In this paper we focus on how to speed up the matching. One building element, the walls, is characterized by a hatching pattern. Using a straight line Hough transform (SLHT)-based method, we recognize this pattern, characterized by parallel straight lines, and remove from the input graph the edges belonging to this pattern. The isomorphism is then applied to the remainder of the input graph. When all the building elements have been recognized, the document is redrawn, correcting the inaccurate strokes obtained from a hand-drawn input. Received 6 June 1996 / Accepted 4 February 1997
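The hatching detection rests on the observation that many parallel strokes vote into a single angle column of a Hough accumulator. The fragment below is a bare-bones sketch of that step only (a plain straight-line Hough accumulator over edge points); the real system works on an attributed graph and removes the matched edges afterwards.

```python
import numpy as np

def hatching_angle(edge_points, img_diag, n_theta=180, top_k=5):
    """
    edge_points: (N, 2) array of (x, y) edge-pixel coordinates.
    Returns the angle (degrees) whose accumulator column collects the strongest
    set of parallel lines, i.e. the likely hatching direction.
    """
    thetas = np.deg2rad(np.arange(n_theta))
    xs, ys = edge_points[:, 0].astype(float), edge_points[:, 1].astype(float)
    # rho = x*cos(theta) + y*sin(theta) for every (point, angle) pair
    rho = np.outer(xs, np.cos(thetas)) + np.outer(ys, np.sin(thetas))
    offset = int(np.ceil(img_diag))
    acc = np.zeros((2 * offset + 1, n_theta), dtype=int)
    rho_idx = np.round(rho).astype(int) + offset
    for j in range(n_theta):
        np.add.at(acc[:, j], rho_idx[:, j], 1)
    # Hatching = several strong peaks sharing one angle; score each angle by the
    # sum of its top_k accumulator cells rather than by its single maximum.
    column_score = np.sort(acc, axis=0)[-top_k:].sum(axis=0)
    best_j = int(np.argmax(column_score))
    return float(np.rad2deg(thetas[best_j])), acc[:, best_j]
```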

9.
Partial model checking is a technique for verifying concurrent systems. It gradually reduces the verification problem to the final answer by removing concurrent components one-by-one, transforming and minimizing the specifications as it proceeds. This paper gives a survey of the theory behind partial model checking and the results obtained with it.

10.
To support heterogeneity is a major requirement in current approaches to integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In such an approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. Then, the translation is applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again both schema and data) is exported into the operational system. Here we illustrate a new, lightweight approach where the database is not imported. MIDST-RT needs to know only the schema of the source database and the model of the target one, and generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases.
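The end product of the translation is a set of view definitions over the operational database. The snippet below is a much-simplified stand-in for that generation step: it templates `CREATE VIEW` statements from an explicit column mapping rather than deriving them from Datalog rules as MIDST-RT does; the mapping format and all names are assumptions of the example.

```python
def generate_views(mapping):
    """
    mapping: target view name -> (source table, {target column: source expression}).
    Emits one CREATE VIEW per target relation, so the operational database can be
    read through the target model without copying any data.
    """
    statements = []
    for view, (source_table, columns) in mapping.items():
        select_list = ", ".join(f"{expr} AS {col}" for col, expr in columns.items())
        statements.append(
            f"CREATE VIEW {view} AS SELECT {select_list} FROM {source_table};")
    return statements

# Example: exposing an object-layer table under a flat relational target schema.
print(generate_views({
    "person_v": ("person_obj", {"id": "oid", "full_name": "name || ' ' || surname"}),
})[0])
```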

11.
This paper presents a local approach for matching contour segments in an image sequence. This study has been primarily motivated by work concerned with the recovery of 3D structure using active vision. The method used to recover the 3D structure of the scene requires tracking contour segments in an image sequence in real time. Here, we propose an original and robust approach that is ideally suited for this problem. It is also of more general interest and can be used in any context requiring matching of line boundaries over time. This method involves only local modeling and computation of moving edges, dealing “virtually” with a contour-segment primitive representation. Such an approach brings robustness to contour segmentation instability and to occlusion, and ease of implementation. Parallelism has also been investigated using an SIMD-based real-time image-processing system. This method has been validated with experiments on several real-image sequences. Our results show quite satisfactory performance and the algorithm runs in a few milliseconds. Received: 11 December 1996 / Accepted: 8 August 1997

12.
Matching large schemas: Approaches and evaluation (cited 1 time: 0 self-citations, 1 by others)
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
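To make the matcher-library-plus-combination idea concrete, here is a deliberately tiny sketch: two individual matchers (string similarity on element names and exact data-type comparison) are averaged and thresholded per source element. It is not COMA++'s strategy set or its fragment decomposition, only an assumed two-matcher combination for illustration.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Character-level string similarity between two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def type_similarity(a, b):
    """Crude data-type matcher: 1.0 on equal type names, else 0.0."""
    return 1.0 if a == b else 0.0

def combine_matchers(source, target, threshold=0.5):
    """source, target: lists of (element name, data type) pairs."""
    correspondences = []
    for s_name, s_type in source:
        scored = [(0.5 * name_similarity(s_name, t_name) +
                   0.5 * type_similarity(s_type, t_type), t_name)
                  for t_name, t_type in target]
        best_score, best_target = max(scored)
        if best_score >= threshold:
            correspondences.append((s_name, best_target, round(best_score, 2)))
    return correspondences

print(combine_matchers([("CustName", "string"), ("Zip", "int")],
                       [("CustomerName", "string"), ("PostalCode", "int")]))
```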

13.
In this paper, we discuss an appearance-matching approach to the difficult problem of interpreting color scenes containing occluded objects. We have explored the use of an iterative, coarse-to-fine sum-squared-error method that uses information from hypothesized occlusion events to perform run-time modification of scene-to-template similarity measures. These adjustments are performed by using a binary mask to adaptively exclude regions of the template image from the squared-error computation. At each iteration higher resolution scene data as well as information derived from the occluding interactions between multiple object hypotheses are used to adjust these masks. We present results which demonstrate that such a technique is reasonably robust over a large database of color test scenes containing objects at a variety of scales, and tolerates minor 3D object rotations and global illumination variations. Received: 21 November 1996 / Accepted: 14 October 1997
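The core of the measure is a sum-squared error restricted by a binary mask, with the mask tightened where large residuals suggest occlusion. Below is a hedged numpy sketch of those two pieces for single-channel patches of equal size (single resolution, fixed threshold), not the paper's full iterative coarse-to-fine procedure.

```python
import numpy as np

def masked_ssd(scene_patch, template, mask):
    """Sum-squared error over the template pixels the binary mask marks as visible."""
    diff = (scene_patch.astype(float) - template.astype(float)) * mask
    visible = int(mask.sum())
    return float((diff ** 2).sum()) / max(visible, 1)

def update_mask(scene_patch, template, mask, occlusion_thresh):
    """Exclude pixels whose residual is large enough to suggest an occluding object."""
    residual = np.abs(scene_patch.astype(float) - template.astype(float))
    return mask & (residual <= occlusion_thresh)
```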

14.
15.
We present an efficient and accurate method for retrieving images based on color similarity with a given query image or histogram. The method matches the query against parts of the image using histogram intersection. Efficient searching for the best matching subimage is done by pruning the set of subimages using upper-bound estimates. The method is fast, has high precision and recall, and also allows queries based on the positions of one or more objects in the database image. Experimental results are presented, showing the efficiency of the proposed search method and the high precision and recall of retrieval. Received: 20 January 1997 / Accepted: 5 January 1998
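Histogram intersection itself is a one-liner, and the pruning idea rests on the fact that a region's histogram dominates, bin by bin, the histogram of any subimage it contains, so the region's intersection score is an upper bound usable for skipping. A minimal sketch (normalised by the query mass, one common convention) follows.

```python
import numpy as np

def histogram_intersection(h_query, h_image):
    """Match value: mass shared bin-by-bin, normalised by the query histogram's mass."""
    return float(np.minimum(h_query, h_image).sum()) / float(h_query.sum())

def region_upper_bound(h_query, h_region):
    """Upper bound on the intersection of the query with ANY subimage inside the
    region, because the region's histogram is bin-wise >= each subimage's histogram.
    If this bound is below the current best match, the whole region can be pruned."""
    return histogram_intersection(h_query, h_region)
```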

16.
The idea of the information society is pervasive and varied and, in this context, universal access is itself a multi-faceted concept. However, the notion of universality presupposes an analysis and understanding of what both unifies and discriminates among different individual members of a community of technology users. This paper addresses these ideas and, in particular, seeks to illustrate some techniques which can support such an analysis in a variety of task domains. Of special interest here is a specific case study which examines the use of biometric processing as a means of managing access in the broadest sense. It is argued that not only is the field of biometric measurement one where understanding similarities and differences is the essence of what is required, but also that this offers the opportunity to establish and explore a variety of practical techniques of very wide significance in the context of universal access. Published online: 18 May 2001

17.
This paper presents a new multi-pass hierarchical stereo-matching approach for generation of digital terrain models (DTMs) from two overlapping aerial images. Our method consists of multiple passes which compute stereo matches with a coarse-to-fine and sparse-to-dense paradigm. An image pyramid is generated and used in the hierarchical stereo matching. Within each pass, the DTM is refined by using the image pyramid from the coarse to the fine level. At the coarsest level of the first pass, a global stereo-matching technique, the intra-/inter-scanline matching method, is used to generate a good initial DTM for the subsequent stereo matching. Thereafter, hierarchical block matching is applied to image locations where features are detected to refine the DTM incrementally. In the first pass, only the feature points near salient edge segments are considered in block matching. In the second pass, all the feature points are considered, and the DTM obtained from the first pass is used as the initial condition for local searching. For the passes after the second pass, 3D interactive manual editing can be incorporated into the automatic DTM refinement process whenever necessary. Experimental results have shown that our method can successfully provide accurate DTMs from aerial images. The success of our approach and system has also been demonstrated with flight simulation software. Received: 4 November 1996 / Accepted: 20 October 1997
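The coarse-to-fine refinement can be illustrated with a stripped-down two-level example: estimate a disparity with block matching at half resolution, then refine it over a small window at full resolution. The sketch below uses simple SAD block matching and ignores the paper's intra-/inter-scanline initialisation, feature selection and interactive editing.

```python
import numpy as np

def downsample(img):
    """One pyramid level: 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def block_disparity(left, right, row, col, block=5, search=16, center=0):
    """SAD block matching along the scanline, searching disparities around `center`."""
    h, w = left.shape
    b = block // 2
    if row - b < 0 or row + b + 1 > h or col - b < 0 or col + b + 1 > w:
        return center
    ref = left[row - b:row + b + 1, col - b:col + b + 1].astype(float)
    best_d, best_cost = center, np.inf
    for d in range(center - search, center + search + 1):
        c = col - d
        if c - b < 0 or c + b + 1 > w:
            continue
        cand = right[row - b:row + b + 1, c - b:c + b + 1].astype(float)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

def hierarchical_disparity(left, right, row, col):
    """Coarse-to-fine: estimate at half resolution, then refine at full resolution."""
    d_coarse = block_disparity(downsample(left), downsample(right),
                               row // 2, col // 2, search=8)
    return block_disparity(left, right, row, col, search=2, center=2 * d_coarse)
```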

18.
Conformance testing is still the main industrial validation technique for telecommunication protocols. In practice, the automatic construction of test cases based on finite-state models is hindered by the state explosion problem. We try to reduce its magnitude by using static analysis techniques in order to obtain smaller but equivalent models. Published online: 24 January 2003

19.
In this paper we present a new approach for building metadata schemas by integrating existing ontologies and structured vocabularies (thesauri). This integration is based on the specification of inclusion relationships between thesaurus terms and ontology concepts and results in application-specific metadata schemas incorporating the structural views of ontologies and the deep classification schemes provided by thesauri. We will also show how the result of this integration can be used for RDF schema creation and metadata querying. In our context, (metadata) queries exploit the inclusion semantics of term relationships, which introduces some recursion. We will present a fairly simple database-oriented solution for querying such metadata which avoids a (recursive) tree traversal and is based on a linear encoding of thesaurus hierarchies. Published online: 22 September 2000
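The linear-encoding trick can be shown in a few lines: number the hierarchy in pre-order so that every term gets an interval, after which "all narrower terms of X" becomes a plain range predicate that a relational backend can answer without recursion. The Python below is a toy, in-memory version of that encoding; the thesaurus data is invented for the example.

```python
def interval_encode(children, root):
    """Pre-order interval labelling: each term gets (left, right) such that
    'u is narrower than v' holds iff v.left < u.left and u.right <= v.right."""
    intervals, counter = {}, [0]
    def visit(term):
        counter[0] += 1
        left = counter[0]
        for child in children.get(term, []):
            visit(child)
        intervals[term] = (left, counter[0])
    visit(root)
    return intervals

def narrower_terms(intervals, term):
    """All terms below `term`, found with a plain range test -- the kind of
    predicate a relational backend can evaluate without recursive traversal."""
    lo, hi = intervals[term]
    return [t for t, (l, r) in intervals.items() if l > lo and r <= hi]

# Toy thesaurus fragment (invented for the example).
thesaurus = {"science": ["physics", "biology"], "physics": ["optics", "acoustics"]}
enc = interval_encode(thesaurus, "science")
print(narrower_terms(enc, "physics"))   # ['optics', 'acoustics']
```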

20.
Schema matching is an important step in database integration. It identifies elements in two or more databases that have the same meaning. A multitude of schema matching methods have been proposed, but little is known about how humans assign meaning to database elements or assess the similarity of their meaning. This paper presents an initial experimental study, based on five theories of meaning, that compares the effects of seven factors on the perceived similarity of database elements. Implications for schema matching research are discussed and guidance for future research is offered.
