20 similar documents found (search time: 15 ms)
1.
Matching large schemas: Approaches and evaluation (total citations: 1, self-citations: 0, citations by others: 1)
Current schema matching approaches still need improvement for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
2.
Schema matching plays a central role in a myriad of XML-based applications. There has been a growing need for developing high-performance matching systems in order to identify and discover semantic correspondences across XML data. XML schema matching methods face several challenges in the form of definition, adoption, utilization, and combination of element similarity measures. In this paper, we classify, review, and experimentally compare major methods of element similarity measures and their combinations. We aim at presenting a unified view which is useful when developing a new element similarity measure, when implementing an XML schema matching component, when using an XML schema matching system, and when comparing XML schema matching systems.
3.
Automating schema mapping is challenging. Previous approaches to automating schema mapping focus mainly on computing direct matches between two schemas. Schemas, however, rarely match directly. Thus, to complete the task of schema mapping, we must also compute indirect matches. In this paper, we present a composite approach for generating a source-to-target mapping that contains both direct and many indirect matches between a source schema and a target schema. Recognizing expected-data values associated with schema elements and applying schema-structure heuristics are the key ideas needed to compute indirect matches. Experiments we have conducted over several real-world application domains show encouraging results, yielding about 90% precision and recall measures for both direct and indirect matches.
4.
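Schema matching is a key and difficult problem in fields such as schema integration, the Semantic Web, and e-commerce. To make effective use of expert knowledge to improve matching quality, a schema matching model based on partially verified correspondences is proposed. In this model, a small number of correspondences between elements of the schemas to be matched are first verified manually, and from these the partially known correspondences for the current task and the default weights of the individual matchers are inferred. Then, based on the prior knowledge collected in this way, the similarity matrices produced by multiple matchers are merged and adjusted, and optimized globally. Finally, the selectivity of the optimized matrix is evaluated so as to recommend the most reasonable candidate-match generation scheme for different matching tasks. Experimental results show that the use of partially verified correspondences helps improve schema matching quality.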
5.
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized to be one of the basic operations required by the process of data and schema integration, and its outcome serves in many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that has largely been left untouched involves the automatic selection of schema matchers for an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offers such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB introduces a new promise for schema matcher designers. Instead of trying to design a perfect schema matcher, a designer can focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed by and large as applications of existing works in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other, state-of-the-art ensemble matchers.
6.
7.
8.
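In schema matching, many semi-automatic matching methods for specific application domains have emerged; these methods combine multiple matching techniques in order to obtain a high matching rate in large-scale, diverse matching environments. An element matching method based on schemas is proposed. It fuses linguistic and constraint matchers, uses a compound element name matcher and a neural network matcher, and, combining language-based matching algorithms with a max-first strategy, classifies schema elements using the results of the compound name matcher under multiple criteria as constraints. By using the compound name matcher and the neural network matcher in combination, the method can be applied to more complex matching environments.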
9.
Peter Coschurba, Joachim Baumann, Uwe Kubach, Alexander Leonhardi. Personal and Ubiquitous Computing, 2001, 5(1): 16-19
Metaphors are often used to provide the user with a mental model to ease the use of computers. An example of such a metaphor
is the commonly used “Desktop Metaphor”. Metaphors also can be used to ease context-aware information access for the users
of mobile information systems. In this paper we present a taxonomy that allows the categorisation of such metaphors. Furthermore,
we give an overview of existing metaphors and their implementations. After introducing some new metaphors we conclude our
considerations with a classification of new and existing metaphors using our taxonomy.
10.
Mark D. Aagaard, Byron Cook, Nancy A. Day, Robert B. Jones. International Journal on Software Tools for Technology Transfer (STTT), 2003, 4(3): 298-312
Most verifications of superscalar, out-of-order microprocessors compare state-machine-based implementations and specifications,
where the specification is based on the instruction-set architecture. The different efforts use a variety of correctness statements,
implementations, and verification approaches. We present a framework for classifying correctness statements about safety properties
of superscalar microprocessors. Our framework is independent of the implementation representation and verification approach,
and is parameterized by the width of the processor. We characterize the relationships between the correctness statements of
many different efforts and also illustrate how classical approaches to microprocessor verification fit within our framework.
Published online: 17 December 2002
11.
12.
Doe-Wan Kim, Tapas Kanungo. International Journal on Document Analysis and Recognition, 2002, 5(1): 47-66
Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition
(OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned
document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically
registered the original image to the rescanned one using four corner points and then transformed the original groundtruth
using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing
the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance
between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout
complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using
the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth
for microfilmed and FAXed versions of the University of Washington dataset documents.
Received: July 24, 2001 / Accepted: May 20, 2002
13.
Massimo Mecella, Barbara Pernici. The VLDB Journal The International Journal on Very Large Data Bases, 2001, 10(1): 2-15
Component-based approaches are becoming more and more popular to support Internet-based application development. Different
component modeling approaches, however, can be adopted, obtaining different abstraction levels (either conceptual or operational).
In this paper we present a component-based architecture for the design of e-applications, and discuss the concept of wrapper
components as building blocks for the development of e-services, where these services are based on legacy systems. We discuss
their characteristics and their applicability in Internet-based application development.
Received: 30 October 2000 / Accepted: 9 January 2001 Published online: 28 June 2001
14.
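Schema matching is the technique of determining semantic correspondences between schemas. It plays an important role in many applications, such as integrating heterogeneous schema information in data integration, ontology knowledge mapping, and message mapping in e-commerce. To address the limitations of existing schema matching methods, and following the principle of minimizing manual intervention so that schema matching can be automated, this paper proposes SMGM, a schema matching model that exploits schema structure information and existing matching knowledge. It borrows the process of mutual influence among neurons in a neural network to perform semantic matching inference; by reusing existing matching knowledge, it supplements and refines that knowledge and automatically narrows the uncertain threshold interval; and it gives a self-learning schema matching model that adaptively and iteratively mines and refines existing matching knowledge. Experiments show that the SMGM model is feasible.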
15.
Current microarray databases use different terminologies and structures and thereby limit the sharing of data and collating of results between laboratories. Consequently, an effective integrated microarray data model is required. One important process in developing such an integrated database is schema matching. In this paper, we propose an effective schema matching approach called MDSM, to syntactically and semantically map attributes of different microarray schemas. The contribution from this work will be used later to create microarray global schemas. Since microarray data is complex, we use a microarray ontology to improve the accuracy of the similarity measured between attributes. The similarity relations can be represented as weighted bipartite graphs. We determine the best schema matching by computing the optimal matching in a bipartite graph using the Hungarian optimisation method. Experimental results show that our schema matching approach is effective and flexible to use in different kinds of database models, such as database schemas, XML schemas, and web site maps. Finally, a case study on an existing public microarray schema is carried out using the proposed method.
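As an illustration of the optimal matching step described in the abstract above, the following is a minimal sketch, not the MDSM implementation. It assumes a small square similarity matrix with made-up attribute names and scores, and it finds the maximum-weight assignment by exhaustive search rather than by the Hungarian method; the Hungarian method would return an assignment with the same optimal total in polynomial time.

```python
from itertools import permutations


def best_matching(similarity):
    """Return (assignment, total) maximizing the summed similarity.

    similarity[i][j] is the score between source attribute i and target
    attribute j. The matrix is assumed to be square. Exhaustive search is
    used here for clarity only; it is not the Hungarian method.
    """
    n = len(similarity)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(similarity[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Hypothetical attribute names and similarity scores, for illustration only.
source_attrs = ["gene_id", "sample", "intensity"]
target_attrs = ["specimen", "geneIdentifier", "signal"]
similarity = [
    [0.2, 0.9, 0.1],
    [0.8, 0.3, 0.2],
    [0.1, 0.4, 0.7],
]

assignment, total = best_matching(similarity)
for i, j in enumerate(assignment):
    print(source_attrs[i], "->", target_attrs[j], similarity[i][j])
```

Exhaustive search grows factorially with the number of attributes, so it is only usable for very small schemas; that cost is the reason a polynomial-time assignment algorithm is preferred in practice.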
16.
Yihong Gong. Multimedia Systems, 1999, 7(6): 449-457
In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using seamless combination
of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture color properties
of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain
many errors, we make use of the available erroneous, ill-segmented image regions to accomplish the object-region-based image
retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval.
The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses other methods under
comparison in terms of not only quantitative measures, but also image retrieval capabilities.
17.
Ela Hunt, Malcolm P. Atkinson, Robert W. Irving. The VLDB Journal The International Journal on Very Large Data Bases, 2002, 11(3): 256-271
Our aim is to develop new database technologies for the approximate matching of unstructured string data using indexes. We
explore the potential of the suffix tree data structure in this context. We present a new method of building suffix trees,
allowing us to build trees in excess of RAM size, which has hitherto not been possible. We show that this method performs
in practice as well as the O(n) method of Ukkonen [70]. Using this method we build indexes for 200 Mb of protein and 300 Mbp of DNA, whose disk-image exceeds
the available RAM. We show experimentally that suffix trees can be effectively used in approximate string matching with biological
data. For a range of query lengths and error bounds the suffix tree reduces the size of the unoptimised O(mn) dynamic programming calculation required in the evaluation of string similarity, and the gain from indexing increases with
index size. In the indexes we built this reduction is significant, and less than 0.3% of the expected matrix is evaluated.
We detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes
in biological applications.
Received: November 1, 2001 / Accepted: March 2, 2002 Published online: September 25, 2002
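The unoptimised O(mn) dynamic programming calculation mentioned in the abstract above can be illustrated with the standard edit-distance recurrence. This is a generic sketch under the assumption of unit costs for insertion, deletion, and substitution; it is not the authors' suffix-tree method, and the function name is hypothetical.

```python
def edit_distance(query, text):
    """Levenshtein distance between two strings.

    Fills an (m+1) x (n+1) table row by row, so the running time is
    O(mn); only the previous row is kept in memory.
    """
    m, n = len(query), len(text)
    prev = list(range(n + 1))  # distance from the empty prefix of query
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a character of query
                curr[j - 1] + 1,     # insert a character of text
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[n]


print(edit_distance("kitten", "sitting"))
```

Every cell of the table is evaluated here. An index such as the one the abstract describes aims to avoid evaluating most of those cells.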
18.
Answering queries using views: A survey (total citations: 25, self-citations: 0, citations by others: 25)
Alon Y. Halevy. The VLDB Journal The International Journal on Very Large Data Bases, 2001, 10(4): 270-294
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
defined materialized views over the database, rather than accessing the database relations. The problem has recently received
significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding
a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation
of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result,
finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views.
Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated
schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate
works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it
and the relevant theoretical results.
Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001
19.
The adoption of standards for exchanging information across the Web presents both new opportunities and important challenges for data integration and aggregation. Although Web Services simplify the discovery and access of information sources, the problem of semantic heterogeneity remains: how to find semantic correspondences across the data being integrated. In this paper, we explore these issues in the context of Web Services, and propose OATS, a novel algorithm for schema matching that is specifically suited to Web Service data aggregation. We show how probing Web Services with a small set of related queries results in semantically correlated data instances, which greatly simplifies the matching process, and demonstrate that the use of an ensemble of string distance metrics in matching data instances performs better than individual metrics. We also show how the choice of probe queries has a dramatic effect on matching accuracy. Motivated by this observation, we describe and evaluate a machine learning approach to selecting probes to maximise accuracy while minimising cost.
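The idea of combining several string metrics into an ensemble, as mentioned in the abstract above, might be sketched as follows. This is not the OATS algorithm: the two metrics (character-bigram Jaccard similarity and the `difflib.SequenceMatcher` ratio from the Python standard library), the unweighted average, and all function names are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def bigrams(s):
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_bigram(a, b):
    """Jaccard similarity of the character-bigram sets of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings too short to have bigrams: fall back to equality.
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def sequence_ratio(a, b):
    """Similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()


def ensemble_similarity(a, b, metrics=(jaccard_bigram, sequence_ratio)):
    """Unweighted mean of the individual metric scores (an assumption)."""
    return sum(metric(a, b) for metric in metrics) / len(metrics)


print(ensemble_similarity("New York", "New York City"))
print(ensemble_similarity("New York", "Boston"))
```

A weighted combination, with weights learned from verified matches, would be a natural variation on the unweighted mean used here.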
20.
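Schema matching is a key technology in the field of data integration. To complete schema matching quickly and accurately, a large number of schema matching methods based on various schema types have been proposed. This paper introduces existing schema matching techniques and two multi-source schema matching techniques, and proposes an improved multi-source schema matching algorithm to meet the needs of large-scale matching.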