Similar literature
 Found 20 similar documents; search took 31 ms
1.
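To address low efficiency, insufficient accuracy, and the difficulty of obtaining complete schema information in multi-schema matching during the integration of multi-source heterogeneous civil aviation passenger service data, a multi-schema matching method based on SimHash and hybrid similarity is proposed. The method first computes feature-unit weights based on PMI and uses the SimHash algorithm to construct a signature for each attribute column to represent attribute features, which reduces feature dimensionality; it then applies the K-means++ algorithm to cluster the attributes and generate a candidate matching set. Finally, an attribute mapping graph is built from the hybrid similarity of the attributes, presenting the matching relationships between attributes in an intuitive way while improving multi-schema matching efficiency. Experimental results show that the method is feasible and offers a new solution for efficiently resolving schema conflicts in the integration of multi-source heterogeneous civil aviation passenger service data.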
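The signature step in entry 1 can be illustrated with a minimal SimHash sketch. This is not the paper's implementation: the feature units (character bigrams), the raw-frequency weights (the abstract uses PMI-based weights), the SHA-256 hash, and the names `column_features`, `simhash`, and `hamming` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

BITS = 64  # signature length; an illustrative choice


def column_features(values):
    """Hypothetical feature units: character bigrams weighted by raw frequency.
    (The abstract weights feature units with PMI; frequency is a stand-in.)"""
    counts = Counter()
    for v in values:
        for i in range(len(v) - 1):
            counts[v[i:i + 2]] += 1
    return dict(counts)


def simhash(weighted_features):
    """Build a 64-bit SimHash signature from a {feature: weight} mapping."""
    totals = [0.0] * BITS
    for feature, weight in weighted_features.items():
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each feature votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    # A bit is set in the signature when the weighted vote is positive.
    return sum(1 << i for i in range(BITS) if totals[i] > 0)


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


sig_a = simhash(column_features(["PEK", "SHA", "CAN"]))
sig_b = simhash(column_features(["PEK", "SHA", "CTU"]))
print(hamming(sig_a, sig_b))
```

Columns with similar content tend to produce signatures at a small Hamming distance, so the fixed-length signatures can serve as low-dimensional inputs to a clustering step such as K-means++.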

2.
In the study of data exchange one usually assumes an open-world semantics, making it possible to extend instances of target schemas. An alternative closed-world semantics only moves ‘as much data as needed’ from the source to the target to satisfy constraints of a schema mapping. It avoids some of the problems exhibited by the open-world semantics, but limits the expressivity of schema mappings. Here we propose a mixed approach: one can designate different attributes of target schemas as open or closed, to combine the additional expressivity of the open-world semantics with the better behavior of query answering in closed worlds. We define such schema mappings, and show that they cover a large space of data exchange solutions with two extremes being the known open and closed-world semantics. We investigate the problems of query answering and schema mapping composition, and prove two trichotomy theorems, classifying their complexity based on the number of open attributes. We find conditions under which schema mappings compose, extending known results to a wide range of closed-world mappings. We also provide results for restricted classes of queries and mappings guaranteeing lower complexity.

3.
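赵智超, 赵政. 《计算机工程》 (Computer Engineering), 2009, 35(13): 58-60
To address the time cost of repeated rewriting when bypassing departed nodes in a peer data management system, a peer schema mapping composition method based on XSLT template expansion is proposed. The method takes two mappings expressed in a template-matching-style subset of XSLT and, by re-matching and expanding the template contents in order, forms a composed mapping equivalent to the multiple mappings. When adjacent nodes on a mapping path leave together, a proxy node uses the composed mapping to bypass them directly. Simulation results show that the composition method correctly generates equivalent mappings, shortens the bypass time, and makes the network topology more robust.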

4.
This paper presents a genetic algorithm (GA)-based optimization procedure for structural pattern recognition in a model-based recognition system using attributed relational graph (ARG) matching technique. The objective of our work is to improve the GA-based ARG matching procedures leading to a faster convergence rate and better quality mapping between a scene ARG and a set of given model ARGs. In this study, potential solutions are represented by integer strings indicating the mapping between scene and model vertices. The fitness of each solution string is computed by accumulating the similarity between the unary and binary attributes of the matched vertex pairs. We propose novel crossover and mutation operators, specifically for this problem. With these specialized genetic operators, the proposed algorithm converges to better quality solutions at a faster rate than the standard genetic algorithm (SGA). In addition, the proposed algorithm is also capable of recognizing multiple instances of any model object. An efficient pose-clustering algorithm is used to eliminate occasional wrong mappings and to determine the presence/pose of the model in the scene. We demonstrate the superior performance of our proposed algorithm using extensive experimental results.

5.
One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.

6.
Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most of the existing solutions mainly rely upon textual similarity of the data to be matched. However, there exist instances of the schema matching problem for which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work [36] we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within tables using an information-theoretic measure and construct a dependency graph for each table capturing the dependencies among attributes. In the second step, we find matching node pairs across the dependency graphs by running a graph matching algorithm. In our previous work, we experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph matching problem in the second step. In this paper we extend the previous work by improving the second phase of the algorithm incorporating efficient approximation algorithms into the framework.
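The first step described in entry 6 (building a per-table dependency graph from an information-theoretic measure) can be sketched as follows. The abstract does not name the measure; mutual information is assumed here, and the function names and the dict-based table layout are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long value lists."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def dependency_graph(table):
    """table: {column_name: list of values}.
    Returns {(col_a, col_b): mutual information} for every column pair."""
    return {
        (a, b): mutual_information(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


# Opaque column names and values, as in the setting the abstract describes.
table = {
    "c1": [0, 0, 1, 1],
    "c2": [0, 0, 1, 1],  # fully determined by c1
    "c3": [0, 1, 0, 1],  # independent of c1
}
graph = dependency_graph(table)
```

Because the edge weights depend only on value co-occurrence, not on column names or readable content, two tables with opaque schemas can still be compared by matching their dependency graphs.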

7.
One aspect of the vision of dataspaces has been articulated as providing various benefits of classical data integration with reduced up-front costs. In this paper, we present techniques that aim to support schema mapping specification through interaction with end users in a pay-as-you-go fashion. In particular, we show how schema mappings that are obtained automatically using existing matching and mapping generation techniques can be annotated with metrics estimating their fitness to user requirements, using feedback on query results obtained from end users.

8.
Storing and querying XML documents using an RDBMS is a challenging problem since one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings: schema mapping, data mapping and query mapping. In this paper, we propose: (i) a lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, (ii) two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data. To the best of our knowledge, there is no published linear schema-based data mapping algorithm for mapping ordered XML data to relational data. Experimental results are presented to show that our algorithms are efficient and scalable.

9.
Despite advances in machine learning technologies a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of “possible mappings” between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.

10.
Xyleme is a huge warehouse integrating XML data of the Web. Xyleme considers a simple data model with data trees and tree types for describing the data sources, and a simple query language based on tree queries with boolean conditions. The main components of the data model are a mediated schema modeled by an abstract tree type, as a view of a set of tree types associated with actual data trees, called concrete tree types, and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and handle the potentially huge size of the mapping relation which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing some specific meta-information about the mapping into map translation tables. These map translation tables summarize the set of all the branch queries that can be generated from the mediated schema and the set of all the mappings. Then, we propose different operators and strategies for relaxing queries which, having an empty map translation table, will have no answer if they are evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.

11.
Current microarray databases use different terminologies and structures and thereby limit the sharing of data and collating of results between laboratories. Consequently, an effective integrated microarray data model is required. One important process to develop such an integrated database is schema matching. In this paper, we propose an effective schema matching approach called MDSM, to syntactically and semantically map attributes of different microarray schemas. The contribution from this work will be used later to create microarray global schemas. Since microarray data is complex, we use microarray ontology to improve the measuring accuracy of the similarity between attributes. The similarity relations can be represented as weighted bipartite graphs. We determine the best schema matching by computing the optimal matching in a bipartite graph using the Hungarian optimisation method. Experimental results show that our schema matching approach is effective and flexible to use in different kinds of database models, such as database schemas, XML schemas, and website maps. Finally, a case study on an existing public microarray schema is carried out using the proposed method.
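The optimal-matching step in entry 11 can be illustrated on a tiny similarity matrix. Note the substitution: the sketch below uses exhaustive search over permutations rather than the Hungarian method named in the abstract. It finds the same optimum but takes factorial time, so it is only suitable for a handful of attributes. The matrix values and the name `best_matching` are hypothetical.

```python
from itertools import permutations


def best_matching(sim):
    """sim[i][j] is the similarity between source attribute i and target
    attribute j (square matrix). Returns the one-to-one assignment with the
    highest total similarity as a list of (source, target) index pairs.
    Exhaustive search stands in for the Hungarian method here."""
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))


# A case where picking the single best pair first (0 -> 0) is not optimal.
sim = [
    [0.9, 0.8],
    [0.8, 0.1],
]
matching = best_matching(sim)
```

In this example a greedy choice of the 0.9 pair forces the 0.1 pair and totals 1.0, whereas the optimal assignment pairs 0 with 1 and 1 with 0 for a total of 1.6. This is the reason an optimal bipartite matching is preferred over picking the top-scoring pairs one at a time.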

12.
Autonomous mapping of HL7 RIM and relational database schema   Total citations: 1 (self-citations: 0; citations by others: 1)
Healthcare systems need to share information within and across their boundaries in order to provide better care to patients. For this purpose, they take advantage of the current state of the art in healthcare standards that provide interoperable solutions. The HL7 V3 specification is an international message exchange and interoperability standard. HL7 V3 messages exchanged between healthcare applications are ultimately recorded in local healthcare databases, mostly relational databases. In order to bring these relational databases into compliance with HL7, mappings between the HL7 RIM (Reference Information Model) and the relational database schema are required. Currently, RIM and database mapping is largely performed manually, so it is a tedious, time-consuming, error-prone, and expensive process. It is a challenging task to determine all correspondences between the RIM and a schema automatically because of extreme heterogeneity issues in healthcare databases. To reduce the amount of manual effort as much as possible, autonomous mapping approaches are required. This paper proposes a technique that addresses the aforementioned mapping issue and aligns healthcare databases to HL7 V3 RIM specifications. Furthermore, the proposed technique has been implemented as a working application and tested on real-world healthcare systems. The application loads the target healthcare schema and then identifies the most appropriate match for tables and the associated fields in the schema by using domain knowledge and the matching rules defined in the Mapping Knowledge Repository. These rules are designed to handle the complexity of semantics found in healthcare databases. The GUI allows users to view and edit/re-map the correspondences. Once all the mappings are defined, the application generates a Mapping Specification, which contains all the mapping information, i.e. database tables and fields with associated RIM classes and attributes.
In order to enable transactions, the application supports autonomous code generation from the Mapping Specification. The Code Generator component focuses primarily on generating custom classes and Hibernate mapping files against the runtime system to retrieve and parse the data from the data source, thus allowing bi-directional HL7-to-database communication with minimum programming required. Our experimental results show 35–65% accuracy on real laboratory systems, thus demonstrating the promise of the approach. The proposed scheme is an effective step in bringing clinical databases into compliance with the RIM, providing ease and flexibility.

13.
Rank Aggregation for Automatic Schema Matching   Total citations: 2 (self-citations: 0; citations by others: 2)
Schema matching is a basic operation of data integration, and several tools for automating it have been proposed and evaluated in the database community. Research in this area reveals that there is no single schema matcher that is guaranteed to succeed in finding a good mapping for all possible domains and, thus, an ensemble of schema matchers should be considered. In this paper, we introduce schema metamatching, a general framework for composing an arbitrary ensemble of schema matchers and generating a list of best ranked schema mappings. Informally, schema metamatching stands for computing a "consensus" ranking of alternative mappings between two schemata, given the "individual" graded rankings provided by several schema matchers. We introduce several algorithms for this problem, varying from adaptations of some standard techniques for general quantitative rank aggregation to novel techniques specific to the problem of schema matching, and to combinations of both. We provide a formal analysis of the applicability and relative performance of these algorithms and evaluate them empirically on a set of real-world schemata.
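The idea of a "consensus" ranking in entry 13 can be illustrated with the Borda count, a standard rank aggregation technique. The abstract does not say which standard techniques it adapts, so the Borda count is an assumption here, and the mapping labels `m1` to `m3` and the name `borda_aggregate` are hypothetical.

```python
def borda_aggregate(rankings):
    """rankings: list of lists, each ordering the same candidate mappings
    from best to worst (one list per schema matcher).
    Returns the candidates sorted by total Borda score, best first."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Top position earns n-1 points, last position earns 0.
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - position)
    # Ties are broken alphabetically to keep the result deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


matcher_rankings = [
    ["m1", "m2", "m3"],  # ranking from matcher A
    ["m2", "m1", "m3"],  # ranking from matcher B
    ["m2", "m3", "m1"],  # ranking from matcher C
]
consensus = borda_aggregate(matcher_rankings)
```

Here `m2` collects 5 points, `m1` collects 3, and `m3` collects 1, so the consensus order is `m2`, `m1`, `m3` even though matcher A ranked `m1` first.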

14.
Learning object identification rules for information integration   Total citations: 2 (self-citations: 0; citations by others: 2)
When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for determining the mappings between objects. This manual process is time consuming and error-prone. In our approach, Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain. The experimental results demonstrate that we achieve higher accuracy and require less user involvement than previous methods across various application domains.

15.
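Schema matching is the technique of determining semantic matching relationships between schemas. It plays an important role in many applications, such as integrating heterogeneous schema information in data integration, ontology knowledge mapping, and message mapping in e-commerce. In view of the limitations of existing schema matching methods, and following the principle of minimizing manual intervention so as to automate schema matching, this paper proposes SMGM, a schema matching model that uses schema structure information and existing matching knowledge. It draws on the way neurons in a neural network influence one another to perform semantic matching inference; by reusing existing matching knowledge, it supplements and refines that knowledge and automatically narrows the uncertain threshold interval; and it gives a self-learning schema matching model that adaptively and iteratively mines and refines existing matching knowledge. Experiments show that the SMGM model is feasible.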

16.
Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries.

17.
18.
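利业鞑, 庞雄文. 《计算机应用》 (Journal of Computer Applications), 2009, 29(7): 1981-1984
Ontology mapping is the key to semantic data integration. Setting mapping relationships manually is time-consuming and inaccurate, so ontology mapping tools are needed to discover these mappings automatically. Building on existing ontology mapping methods, a mapping method based on domain learning is proposed. It can discover mappings between concepts in ontologies, discover rules for complex mappings from domain knowledge, and add instance data at mapping time, which improves the recall and precision of mapping discovery. Experimental results verify the effectiveness of the algorithm.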

19.
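Schema matching produces a mapping between elements of the input schemas that correspond semantically. To improve the efficiency of schema matching, a new schema matching method is proposed: the source-schema-splitting schema matching algorithm. It can solve problems that standard schema matching finds difficult: 1) establishing matches between one attribute of the source schema and multiple attributes of multiple target schemas; 2) matching cases in which different tuples of one table correspond to different attribute values of the same tuple in another table. During matching, the method first searches for categorical attributes, then builds selection conditions from those attributes, and finally splits the source schema into views and regenerates the candidate match set, thereby improving the quality of schema matching.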

20.
Data integration with uncertainty   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange.
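The by-table semantics in entry 20 can be sketched for the simplest possible query, a projection onto one target attribute. The sketch assumes each possible mapping is a plain attribute renaming and that an answer's probability is the sum of the probabilities of the mappings under which it is returned; the attribute names, the data, and the function `by_table_answers` are hypothetical, and the paper's algorithms cover far more general queries.

```python
def by_table_answers(rows, mappings, target_attr):
    """rows: source tuples as dicts.
    mappings: list of (mapping, probability), where mapping is a
    {target_attribute: source_attribute} dict and probabilities sum to 1.
    Returns {answer_value: probability} for SELECT target_attr under
    by-table semantics (one mapping is correct for the whole table)."""
    answers = {}
    for mapping, prob in mappings:
        source_attr = mapping[target_attr]
        # Under this mapping, the answers are the distinct source values.
        for value in {row[source_attr] for row in rows}:
            answers[value] = answers.get(value, 0.0) + prob
    return answers


rows = [
    {"office_phone": "111", "home_phone": "222"},
    {"office_phone": "333", "home_phone": "111"},
]
mappings = [
    ({"phone": "office_phone"}, 0.6),
    ({"phone": "home_phone"}, 0.4),
]
result = by_table_answers(rows, mappings, "phone")
```

The value `"111"` is returned under both mappings, so its probability is 0.6 + 0.4; `"333"` appears only under the first mapping and `"222"` only under the second. Under by-tuple semantics the mapping could differ per source tuple, so the computation would have to range over combinations of mappings rather than over single mappings.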
