首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
一种基于二分图最优匹配的重复记录检测算法   总被引:1,自引:0,他引:1  
信息集成系统中存在重复记录,重复记录的存在为数据处理和分析带来了困难.重复记录检测已经成为当前数据库研究中的热点问题之一.目前的方法主要集中在计算具有同样数据类型属性的相似性上,而现实系统中存在大量具有不同数据类型、不同模式的记录.针对具有多种类型不同模式数据的重复记录检测问题,提出了一种基于二分图的最优匹配的记录相似度计算方法,并基于这种记录相似性提出了重复记录检测算法.理论分析和实验结果都表明了方法的正确性和有效性.  相似文献   

2.
This paper addresses the problem of handling semantic heterogeneity during database schema integration. We focus on the semantics of terms used as identifiers in schema definitions. Our solution does not rely on the names of the schema elements or the structure of the schemas. Instead, we utilize formal ontologies consisting of intensional definitions of terms represented in a logical language. The approach is based on similarity relations between intensional definitions in different ontologies. We present the definitions of similarity relations based on intensional definitions in formal ontologies. The extensional consequences of intensional relations are addressed. The paper shows how similarity relations are discovered by a reasoning system using a higher-level ontology. These similarity relations are then used to derive an integrated schema in two steps. First, we show how to use similarity relations to generate the class hierarchy of the global schema. Second, we explain how to enhance the class definitions with attributes. This approach reduces the cost of generating or re-generating global schemas for tightly-coupled federated databases.  相似文献   

3.
一种有效的贪婪模式匹配算法   总被引:2,自引:0,他引:2  
模式匹配问题是意图获得两个模式中所包含个体对象之间的语义匹配和映射,其结果表示源模式的个体对象与目标模式的个体对象之间存在特定的语义关联.它在数据库应用领域起到关键性的作用,例如数据集成、电子商务、数据仓库、XML消息交换等,特别地,它已成为元数据管理的基本问题.然而,模式匹配很大程度上依赖人工的操作,是一个费时费力的过程.模式匹配问题可以归约为一个组合优化问题:多标记图匹配问题.首先,将模式表示为多标记图,将模式匹配转换为多标记图匹配问题.其次,提出多标记图的相似性度量方法,进而提出基于多标记图相似性的模式匹配目标优化函数.最后,在这个目标函数基础上设计实现了一个贪婪匹配算法,其最显著的特点是综合多种可用的标记信息,灵活准确地获得最优的匹配结果.  相似文献   

4.
模式匹配是模式集成、语义WEB及电子商务等领域的重点及难点问题. 为了有效利用专家知识提高匹配质量, 提出了一种基于部分已验证匹配关系的模式匹配模型. 在该模型中, 首先,人工验证待匹配模式元素间的少量对应关系, 进而推理出当前任务下部分已知的匹配关系及单独匹配器的缺省权重; 然后,基于上述已收集到的先验知识对多种匹配器所生成的相似度矩阵进行合并及调整, 并在全局范围内进行优化; 最后,对优化矩阵的选择性进行评估, 从而为不同匹配任务推荐最合理的候选匹配生成方案. 实验结果表明, 部分已验证匹配关系的使用有助于模式匹配质量的提高.  相似文献   

5.
模式匹配就是在作为输入的模式中有对应语义关系的元素间产生一个映射.为了提高模式匹配的效率,提出了一种新型的模式匹配方法--源模式分裂模式匹配算法.它可以解决标准模式匹配难以解决的问题:1)源模式的某一个属性和多个目标模式的多个属性之间建立匹配关系;2)表格中的不同元组对应其他表格同一元组的不同属性值的匹配.在匹配过程中,该方法先搜索种类型属性,然后根据种类型属性建立选择条件,最后把源模式进行分裂形成视图,再重新生成候选匹配集合,从而提高模式匹配的质量.  相似文献   

6.
《Information Sciences》2005,169(1-2):27-46
Information integration for distributed and heterogeneous data sources is still an open challenging, and schema matching is critical in this process. This paper presents an approach to automatic elements matching between XML application schemas using similarity measure and relaxation labeling. The semantic modeling of XML application schema has also been presented. The similarity measure method considers element categories and their properties. In an effort to achieve an optimal matching, contextual constraints are used in the relaxation labeling method. Based on the semantic modeling of XML application schemas, the compatible constraint coefficients are devised in terms of the structures and semantic relationships as defined in the semantic model. To examine the effectiveness of the proposed methods, an algorithm for XML schema matching has been developed, and corresponding computational experiments show that the proposed approach has a high degree of accuracy.  相似文献   

7.
Schema matching and value mapping across two heterogenous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus, should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity.  相似文献   

8.
李蓉蓉  王晖  陈冉 《计算机科学》2011,38(12):151-155
近年来,模式匹配作为Web信息集成管理与应用中的重要问题,得到了广泛关注和研究。已有模式匹配方法大多是基于模式信息的,对数据实例信息利用则较少。针对数据集成环境下模式信息不全或存在冲突的模式信息导致模式匹配结果不正确的问题,给出了计算属性间语义相似性的方法以提高模式匹配的性能,分析了模式内语义相近多属性间的语义差别,进一步给出了基于带权二分图最大化算法的模式匹配方法。通过实验,说明基于实例集合语义相似的模式匹配方法能在模式信息不全面或存在冲突的情况下,得到更完整、更准确的模式匹配。  相似文献   

9.
10.
Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching that typically exploits different features of schemas. Relying on a particular feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster–Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require the use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.  相似文献   

11.
This research investigates and approach to query processing in a multidatabase system that uses an objectoriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schema. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily-stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.  相似文献   

12.
View integration: a step forward in solving structural conflicts   总被引:1,自引:0,他引:1  
Thanks to the development of the federated systems approach on the one hand and the emphasis on user involvement in database design on the other, the interest in schema integration techniques is significantly increasing. Theories, methods and tools have been proposed. Conflict resolution is the key issue. Different perceptions by schema designers may lead to different representations. A way must be found to support these different representations within a single system. Most current integration methodologies rely on modification of initial schemas to solve the conflicts. This approach needs a strong interaction with the database administrator, who has authority to modify the initial schemas. This paper presents an approach to view integration specifically intended to support the coexistence of different representations of the same real-world objects. The main characteristics of this approach are the following: automatic resolution of structural conflicts, conflict resolution performed without modification of initial views, use of a formal declarative approach for user definition of inter-view correspondences, applicability to a variety of data models, and automatic generation of structural and operational mappings between the views and the integrated schema. Allowing users' views to be kept unchanged should result in improved user satisfaction. Each user is able to define his own view of the database, without having to conform to some other user's view. Moreover, such a feature is essential in database integration if existing programs are to be preserved  相似文献   

13.
基于GLAV集成的系统具体更好的伸缩性.本文研究了这种集成系统中的模式匹配方法,并基于模式树进行匹配,实现了全局模式与源模式之间的映射.  相似文献   

14.
Model independent assertions for integration of heterogeneous schemas   总被引:3,自引:0,他引:3  
Due to the proliferation of database applications, the integration of existing databases into a distributed or federated system is one of the major challenges in responding to enterprises' information requirements. Some proposed integration techniques aim at providing database administrators (DBAs) with a view definition language they can use to build the desired integrated schema. These techniques leave to the DBA the responsibility of appropriately restructuring schema elements from existing local schemas and of solving inter-schema conflicts. This paper investigates theassertion-based approach, in which the DBA's action is limited to pointing out corresponding elements in the schemas and to defining the nature of the correspondence in between. This methodology is capable of: ensuring better integration by taking into account additional semantic information (assertions about links); automatically solving structural conflicts; building the integrated schema without requiring conforming of initial schemas; applying integration rules to a variety of data models; and performing view as well as database integration. This paper presents the basic ideas underlying our approach and focuses on resolution of structural conflicts.  相似文献   

15.
Towards Deeper Understanding of the Search Interfaces of the Deep Web   总被引:2,自引:0,他引:2  
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many Web applications, such as Web database integration and deep Web crawling, require the construction of the schemas. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression based approach to automatically extract the logical attributes from search interfaces. We also rephrase the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema from any Web search interfaces. Our experimental results on real search interfaces indicate that this system is highly effective.  相似文献   

16.
Deep Web集成服务的不确定模式匹配   总被引:5,自引:0,他引:5  
随着Deep Web的迅猛发展,从高度自治、异构及动态变化的Web数据库中,为用户提供高质量的数据逐渐成为当前Deep Web集成服务的一个研究热点.在大部分Web数据库只能通过查询接口为用户提供服务的前提下,如何建立用户请求与集成查询接口模式之间以及集成查询接口模式与Web数据库查询接口模式之间的匹配关系,是Deep Web集成服务中进行合理的用户请求转换的关键.之前的相关工作都是寻找最佳的匹配结果,回避匹配的不确定性,丢弃了可能有价值的其他匹配结果.文中首先剖析了请求转换中模式匹配的不确定性,提出了数字类型的相似度计算方法,给出了进行数字类型的模式匹配的有效的剪枝方法以及数据类型驱动的模式匹配优化方法,并在此基础上提出了一种基于相似度计算的不确定性模式匹配方法,最后通过大量的实验证明了该方法的有效性.  相似文献   

17.
基于函数依赖的结构匹配方法   总被引:2,自引:0,他引:2  
李国徽  杜小坤  胡方晓  杨兵  唐向红 《软件学报》2009,20(10):2667-2678
模式匹配是模式集成、数据仓库、电子商务以及语义查询等领域中的一个基础问题,近来已经成为研究的热点,并取得了丰硕的成果.这些成果主要利用元素(典型的为关系模式中的属性)自身的信息来挖掘元素语义,目前,这方面的研究已经相当成熟.结构信息作为模式中一种重要的信息,能够为提高模式匹配的精确性提供有用的支持,但是目前关于如何利用结构信息提高模式匹配的精确性的研究还很少.将模式元素之间的相似度分为语义相似度(根据元素自身信息得到的相似度)和结构相似度(根据元素之间的关联关系得到的相似度),并采用新的统计方法计算元素间的结构相似度,然后再综合考虑语义相似度得到元素间的相似概率;最后根据相似概率得到模式元素间的映射关系(模式元素之间的对应关系).实验结果表明,该算法在查准率、查全率及全面性等方面都优于已有的其他算法.  相似文献   

18.
冯永  张洋 《计算机应用》2012,32(6):1688-1691
查询接口模式匹配是Deep Web信息集成中的关键部分,双重相关性挖掘方法(DCM)能有效利用关联挖掘方法解决复杂接口模式匹配问题。针对DCM方法在匹配效率、匹配准确性方面的不足,提出了一种基于匹配度和语义相似度的新模式匹配方法。该方法首先使用矩阵存储属性间的关联关系,然后采用匹配度计算属性间的相关度,最后利用语义相似度计算候选匹配的相似性。通过在美国伊利诺斯大学的BAMM数据集上进行实验,所提方法与DCM及其改进方法比较有更高的匹配效率和准确性,表明该方法能更好地处理接口之间模式匹配问题。  相似文献   

19.
为了解决多源异构民航旅客服务数据集成过程中存在多模式匹配的效率不高、精确性不足、完整模式信息获取难度较大等问题,提出了一种基于SimHash和混合相似度的多模式匹配方法。该方法首先基于PMI计算特征单元权重,并通过SimHash算法构造属性列的签名来表示属性特征,以降低特征维度,进而引入K-means++算法对属性聚类并生成候选匹配集。最后基于属性的混合相似度构建属性映射图,以直观的方式展示属性间的匹配关系,同时提高多模式匹配效率。实验结果表明该方法具有可行性,为高效地解决多源异构民航旅客服务数据集成中的模式冲突问题提供新的解决方案。  相似文献   

20.
基于《知网》的中文Deep Web模式匹配算法研究   总被引:1,自引:1,他引:0  
金玉  范学峰 《计算机应用研究》2009,26(10):3750-3753
随着数据库在Internet中的应用日益广泛,Deep Web集成(即Web数据库集成)成为当前信息领域的研究热点,模式匹配是Deep Web查询接口集成中的一个关键问题。目前大多数这方面的研究都是基于英文的,针对这种情况,探讨了中文Deep Web查询接口的模式匹配方法,并提出了一种基于《知网》、面向中文语义的模式匹配算法,并利用属性在查询接口上的相对位置信息解决语义冲突。手工收集查询表单对算法进行验证,实验表明该方法能使得接口之间属性匹配的正确率达到90 %以上。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号