Similar literature
 20 similar documents found (search time: 640 ms)
1.
Rank Aggregation for Automatic Schema Matching   Total citations: 2, self-citations: 0, citations by others: 2
Schema matching is a basic operation of data integration, and several tools for automating it have been proposed and evaluated in the database community. Research in this area reveals that there is no single schema matcher that is guaranteed to succeed in finding a good mapping for all possible domains and, thus, an ensemble of schema matchers should be considered. In this paper, we introduce schema metamatching, a general framework for composing an arbitrary ensemble of schema matchers and generating a list of best ranked schema mappings. Informally, schema metamatching stands for computing a "consensus" ranking of alternative mappings between two schemata, given the "individual" graded rankings provided by several schema matchers. We introduce several algorithms for this problem, varying from adaptations of some standard techniques for general quantitative rank aggregation to novel techniques specific to the problem of schema matching, and to combinations of both. We provide a formal analysis of the applicability and relative performance of these algorithms and evaluate them empirically on a set of real-world schemata.
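
The idea of a "consensus" ranking can be illustrated with a small sketch. It assumes each matcher reports a score in [0, 1] for every candidate mapping it considers, and it aggregates by plain averaging; this is one simple quantitative aggregation rule chosen for illustration, not necessarily any of the algorithms from the paper. The function name `aggregate_rankings` and the mapping labels are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate_rankings(rankings: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average each candidate mapping's score over all matchers, then rank.

    Assumption: a mapping that a matcher did not score counts as 0.0 for it.
    """
    # Iterating a dict yields its keys, so this collects every candidate label.
    candidates = set().union(*rankings)
    averaged = {
        c: sum(r.get(c, 0.0) for r in rankings) / len(rankings)
        for c in candidates
    }
    # Highest average first; ties are broken by label for a stable order.
    return sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))


# Three hypothetical matchers scoring three hypothetical candidate mappings.
matcher_scores = [
    {"m1": 0.9, "m2": 0.4},
    {"m1": 0.5, "m2": 0.8},
    {"m1": 0.7, "m3": 0.6},
]
consensus = aggregate_rankings(matcher_scores)
```

Averaging rewards mappings that several matchers agree on, which is why `m3`, scored by only one matcher, ends up last here.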

2.
3.
A survey of approaches to automatic schema matching   Total citations: 76, self-citations: 1, citations by others: 75
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component. Received: 5 February 2001 / Accepted: 6 September 2001 / Published online: 21 November 2001

4.
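Schema matching is a key and difficult problem in fields such as schema integration, the Semantic Web, and e-commerce. To make effective use of expert knowledge and improve matching quality, a schema matching model based on partially verified correspondences is proposed. In this model, a small number of correspondences between the elements of the schemas to be matched are first verified manually, from which some known correspondences for the current task and the default weights of the individual matchers are inferred. Then, based on this prior knowledge, the similarity matrices produced by multiple matchers are merged and adjusted, and optimized globally. Finally, the selectivity of the optimized matrix is evaluated in order to recommend the most reasonable candidate-match generation scheme for each matching task. Experimental results show that using partially verified correspondences helps improve schema matching quality.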
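
The step of deriving matcher weights from a few verified correspondences and then merging similarity matrices might look roughly like the sketch below. The weighting rule (each matcher's mean similarity on the verified pairs, normalized to sum to 1) is an assumption made for illustration; the abstract does not state the actual inference rule, and the names `matcher_weights` and `merge_matrices` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matcher_weights(matrices: Dict[str, Matrix],
                    verified: List[Tuple[int, int]]) -> Dict[str, float]:
    """Weight each matcher by how highly it scores the verified pairs (assumed rule)."""
    if not verified:
        raise ValueError("at least one verified correspondence is required")
    raw = {
        name: sum(m[i][j] for i, j in verified) / len(verified)
        for name, m in matrices.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No matcher supports the verified pairs: fall back to equal weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: r / total for name, r in raw.items()}


def merge_matrices(matrices: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted sum of same-shaped similarity matrices."""
    names = list(matrices)
    rows, cols = len(matrices[names[0]]), len(matrices[names[0]][0])
    return [
        [sum(weights[n] * matrices[n][i][j] for n in names) for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical matchers over a 2x2 source/target element grid;
# the pair (source 0, target 0) has been verified by hand.
sims = {
    "name": [[0.8, 0.2], [0.1, 0.6]],
    "type": [[0.4, 0.4], [0.5, 0.2]],
}
weights = matcher_weights(sims, verified=[(0, 0)])
merged = merge_matrices(sims, weights)
```

In this toy input the "name" matcher scores the verified pair twice as highly as the "type" matcher, so it receives twice the weight in the merged matrix.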

5.
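Many partially automatic matching methods for specific application domains have emerged in schema matching; such methods combine several matching techniques in order to achieve a high match rate in large-scale, diverse matching environments. A schema-based element matching method is proposed that integrates linguistic and constraint matchers, using a compound element name matcher and a neural network matcher. It combines a language-based matching algorithm with a maximum-first strategy, and uses the results of the compound name matcher under multiple criteria as constraints to classify schema elements. By combining the compound name matcher and the neural network matcher, the method can be applied to more complex matching environments.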

6.
Matching large schemas: Approaches and evaluation   Total citations: 1, self-citations: 0, citations by others: 1
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.

7.
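A survey of uncertain schema matching   Total citations: 1, self-citations: 1, citations by others: 1
Schema matching is an important research topic in fields such as data integration and the Semantic Web; it requires discovering correspondences between schema elements on the basis of heuristic information. Schema matching methods are classified according to how they process heuristic information, and composite schema matching methods are introduced from the perspective of how match results are aggregated. Uncertainty is an inherent characteristic of the schema matching process; data models for modeling this uncertainty are introduced, followed by schema matching methods that handle it. Finally, prospects for schema matching research are discussed.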

8.
Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching that typically exploits different features of schemas. Relying on a particular feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster–Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require the use of weighting parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally, it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.
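
Combining two sources of evidence into one mass function can be sketched with the standard Dempster rule of combination. This is the textbook rule, shown under the assumption that each matcher's output has already been turned into a mass function over sets of candidate target attributes; how the paper builds those mass functions is not given in the abstract. The name `dempster_combine` and the attribute labels are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Standard Dempster rule: multiply masses, intersect focal sets, renormalize."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass that falls on the empty set measures disagreement.
            conflict += wa * wb
    if not combined:
        raise ValueError("the two sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical candidates "x" and "y" for one source attribute.
X, Y = frozenset({"x"}), frozenset({"y"})
ANY = X | Y  # mass on the whole frame expresses ignorance
matcher_a: Mass = {X: 0.6, ANY: 0.4}
matcher_b: Mass = {Y: 0.5, ANY: 0.5}
fused = dempster_combine(matcher_a, matcher_b)
```

No weighting parameters appear in the rule itself, which matches the abstract's point that nothing has to be tuned when sources are combined.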

9.
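Schema mapping techniques in heterogeneous data source integration   Total citations: 4, self-citations: 0, citations by others: 4
Schema mapping is the key technique for query reformulation in the integration of heterogeneous data sources. This paper first introduces centralized and decentralized integration architectures for schema mapping, summarizes the three basic forms of defining schema mappings (GAV, LAV, and GLAV), focuses on the core techniques of schema mapping, namely schema matching and mapping generation, and finally discusses new research topics in schema mapping.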

10.
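Schema matching produces a mapping between elements of the input schemas that are semantically related. To improve the efficiency of schema matching, a new schema matching method, the source-schema splitting schema matching algorithm, is proposed. It can solve problems that are difficult for standard schema matching: 1) establishing matches between one attribute of the source schema and multiple attributes of multiple target schemas; 2) matching cases in which different tuples of one table correspond to different attribute values of the same tuple in another table. During matching, the method first searches for categorical attributes, then builds selection conditions from them, and finally splits the source schema into views and regenerates the candidate match set, thereby improving the quality of schema matching.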

11.
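In database research, schema matching and entity resolution are two widely studied directions. With the growing demand for Web data integration, studying both problems, at the schema level as well as the entity level, is of practical significance. Most current research addresses only one of the two tasks. Based on the new idea that schema matching can promote entity resolution, this paper proposes a method that solves both tasks simultaneously and realizes a mechanism by which they reinforce each other. Applied to real-world heterogeneous Web data sources, the method achieves high precision and recall, which demonstrates its correctness and effectiveness.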

12.
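A structure matching method based on functional dependencies   Total citations: 2, self-citations: 0, citations by others: 2
Li Guohui, Du Xiaokun, Hu Fangxiao, Yang Bing, Tang Xianghong. Journal of Software (《软件学报》), 2009, 20(10): 2667-2678
Schema matching is a fundamental problem in areas such as schema integration, data warehousing, e-commerce, and semantic query processing; it has recently become a research focus and has produced substantial results. These results mainly exploit the information of the elements themselves (typically the attributes of a relational schema) to mine element semantics, and research in this direction is now quite mature. Structural information, an important kind of information in a schema, can provide useful support for improving the precision of schema matching, yet there is still little research on how to use it for that purpose. This paper divides the similarity between schema elements into semantic similarity (derived from the elements' own information) and structural similarity (derived from the associations between elements), computes the structural similarity between elements with a new statistical method, and then combines it with the semantic similarity to obtain the similarity probability between elements. Finally, the mapping between schema elements (the correspondences between them) is obtained from the similarity probabilities. Experimental results show that the algorithm outperforms other existing algorithms in precision, recall, and the overall measure.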

13.
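An automatic schema matching method based on the distribution characteristics of data instances   Total citations: 7, self-citations: 0, citations by others: 7
Schema matching has become a basic problem in many application areas such as information integration, data warehousing, and e-commerce. Existing schema matching work is still largely manual, which is time-consuming, error-prone, and costly. This paper proposes SMDD, a neural-network-based schema matching method that performs schema matching automatically by analyzing the distribution patterns of the data instances contained in schema elements. SMDD can be used on its own or combined with other schema matching methods to improve matching quality from the perspective of data content.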

14.
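This paper proposes CSM, a new system architecture for discovering complex matches between database schemas. CSM first preprocesses the candidates, filtering out some unreasonable candidate matches by data type, and then uses several special-purpose searchers, each exploring a specific part of the candidate space, to discover 1:1 and complex matches. To address opaque columns in the schemas being matched, a supplementary matcher can additionally be applied to find matches between opaque columns. Experiments show that, compared with other schema matching methods, CSM not only discovers matches between schemas comprehensively but also achieves high efficiency, recall, and precision.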

15.
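Because the data schemas of data sources are autonomous and heterogeneous, uncertainty is an inherent property of the schema matching process. An uncertain matching method based on evidence theory is proposed. First, the schema space is partitioned into several schema subspaces according to attribute type. Then the results of different matchers are treated as different sources of evidence: several basic probability assignment functions are generated from the matcher results, and an improved Dempster combination rule combines the results automatically, which reduces manual intervention and resolves conflicts between pieces of evidence when matcher results are combined. Finally, the Kuhn-Munkres algorithm is used to obtain the schema mapping. Experimental results demonstrate the feasibility and effectiveness of the method.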
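
The final step, extracting a one-to-one schema mapping from a combined similarity matrix, is an assignment problem. The sketch below solves it by exhaustive search over permutations, which is a deliberately simple stand-in and not the Kuhn-Munkres algorithm itself: both return a maximum-total-similarity assignment, but the brute-force version is only practical for very small matrices. The function name `best_one_to_one` and the sample matrix are hypothetical, and the matrix is assumed to be square.

```python
from itertools import permutations
from typing import List, Tuple


def best_one_to_one(sim: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the 1:1 assignment with the highest total similarity.

    Brute force over all permutations (O(n!)); a stand-in for Kuhn-Munkres,
    which finds the same optimum in polynomial time.
    """
    n = len(sim)
    best_score, best_perm = float("-inf"), ()
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, list(enumerate(best_perm))


# Rows are source attributes, columns are target attributes (made-up values).
similarity = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
    [0.10, 0.40, 0.70],
]
total, pairs = best_one_to_one(similarity)
```

A greedy choice would pair source 0 with target 0 because 0.90 is the largest entry, but the optimal assignment gives that target to source 1 instead, which is the kind of global trade-off an assignment algorithm is used for.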

16.
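Schema matching is a key technique in data integration. To perform schema matching quickly and accurately, a large number of matching methods for various schema types have been proposed. This paper introduces existing schema matching techniques and two multi-source schema matching techniques, and proposes an improved multi-source schema matching algorithm to meet the needs of large-scale matching.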

17.
In this work, we propose a local approach for 2D ear authentication based on an ensemble of matchers trained on different color spaces. This is the first work that proposes to exploit the powerful properties of color analysis for improving the performance of an ear matcher. The method described is based on the selection of color spaces from which a set of Gabor features are extracted. The selection is performed using the sequential forward floating selection where the fitness function is related to the optimization of the ear recognition performance. Finally, the matching step is performed by means of the combination by the sum rule of several 1-nearest neighbor classifiers constructed on different color components. The effectiveness of the proposed method is demonstrated using the Notre-Dame EAR data set. Particularly interesting are the results obtained by the new approach in terms of rank-1 (∼84%), rank-5 (∼93%) and area under the ROC curve (∼98.5%), which are better than those obtained by other state-of-the-art 2D ear matchers.

18.
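Research on schema matching methods   Total citations: 3, self-citations: 0, citations by others: 3
Starting from the definition of schema matching, this paper classifies existing schema matching methods and discusses the domains to which they apply and the information they can exploit. It distinguishes instance-level from schema-level, element-level from structure-level, and natural-language-based from constraint-based matchers, with the aim of helping when comparing different schema matching methods, studying new matching algorithms, or implementing schema matching components.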

19.
The emergence of an increasing number of collaborating organizations has made clear the need for supporting interoperability infrastructures, enabling sharing and exchange of data among organizations. Schema matching and schema integration are the crucial components of the interoperability infrastructures, and their semi-automation to interrelate or integrate heterogeneous and autonomous databases in collaborative networks is desired. The Semi-Automatic Schema Matching and INTegration (SASMINT) System introduced in this paper identifies and resolves several important syntactic, semantic, and structural conflicts among schemas of relational databases to find their likely matches automatically. Furthermore, after getting the user validation on the matched results, it proposes an integrated schema. SASMINT uses a combination of a variety of metrics and algorithms from the Natural Language Processing and Graph Theory domains for its schema matching. For the schema integration, it utilizes a number of derivation rules defined in the scope of the research work explained in this paper. Furthermore, a derivation language called SASMINT Derivation Markup Language (SDML) is defined for capturing and formulating both the results of matching and the integration that can be further used, for example for federated query processing from independent databases. In summary, the paper focuses on addressing: (1) conflicts among schemas that make automatic schema matching and integration difficult, (2) the main components of the SASMINT approach and system, (3) in-depth exploration of SDML, (4) heuristic rules designed and implemented as part of the schema integration component of the SASMINT system, and (5) experimental evaluation of SASMINT.

20.
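A data-mining-based schema matching method for the Deep Web   Total citations: 1, self-citations: 0, citations by others: 1
Schema matching is a key problem in the integration of heterogeneous Deep Web information. This paper presents a holistic matching approach, which discovers a large number of schemas at the same time and matches them in one pass. By analyzing and comparing two existing prototype systems for large-scale schema matching, MGS and DCM, and combining the strengths of their core algorithms, a new algorithm based on data mining techniques (Correlated-clustering) is proposed. The algorithm first uses positive correlation to discover group matches, then clusters synonymous attributes by computing concept similarity, and finally performs match selection. Experimental results show that the algorithm is comprehensive and efficient, and fully reflects the idea of the holistic approach.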
