1.

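许峰 满振梅 王志坚 《计算机工程》2006,32(6):40-41

With the development of network technology, future information processing will need a uniform means of accessing large numbers of heterogeneous data sources; multi-source data integration studies this problem. Schema matching is a fundamental problem in data integration and is chiefly concerned with matching between global and local schemas. This article proposes a method for resolving complex semantic conflicts among data schemas in schema matching and for integrating the schemas into a unified schema.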

2.

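A survey of research on uncertain schema matching  Total citations: 2, self-citations: 1, citations by others: 1

翁年凤 刁兴春 曹建军 冯径 《计算机科学》2011,38(12):1-5

Schema matching is an important research topic in data integration, the Semantic Web, and related fields; it relies on heuristic information to discover correspondences between schema elements. The paper classifies schema matching methods by how they process heuristic information and, from the perspective of how match results are aggregated, introduces combined schema matching methods. Uncertainty is inherent in the schema matching process; the paper introduces data models for that uncertainty and, on that basis, the matching methods that handle it. Finally, it gives an outlook on schema matching research.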

3.

Modeling and manipulating the structure of hierarchical schemas for the web

Theodore Dalamagas Alexandra Meliou 《Information Sciences》2008,178(4):985-1010

The Semantic Web is the next step of the current Web where information will become more machine-understandable to support effective data discovery and integration. Hierarchical schemas, either in the form of tree-like structures (e.g., DTDs, XML schemas), or in the form of hierarchies on a category/subcategory basis (e.g., thematic hierarchies of portal catalogs), play an important role in this task. They are used to semantically enrich the available information. Up to now, hierarchical schemas have been treated rather as sets of individual elements, acting as semantic guides for browsing or querying data. Under that view, queries like “find the part of a portal catalog which is not present in another catalog” can be answered only in a procedural way, specifying which nodes to select and how to get them. For this reason, we argue that hierarchical schemas should be treated as full-fledged objects so as to allow for their manipulation. This work proposes models and operators to manipulate the structural information of hierarchies, considering them as first-class citizens. First, we explore the algebraic properties of trees representing hierarchies, and define a lattice algebraic structure on them. Then, turning this structure into a boolean algebra, we present the operators S-union, S-intersection and S-difference to support structural manipulation of hierarchies. These operators have certain algebraic properties to provide clear semantics and assist the transformation, simplification and optimization of sequences of operations using laws similar to those of set theory. Also, we identify the conditions under which this framework is applicable. Finally, we demonstrate an application of our framework for manipulating hierarchical schemas on tree-like hierarchies encoded as RDF/s files.

4.

A runtime approach to model-generic translation of schema and data

Paolo Atzeni Luigi Bellomarini Francesca Bugiotti Fabrizio Celli Giorgio Gianforme 《Information Systems》2012,37(3):269-287

Supporting heterogeneity is a major requirement in current approaches to integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In such an approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. Then, the translation is applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again both schema and data) is exported into the operational system. Here we illustrate a new, lightweight approach where the database is not imported. MIDST-RT needs only to know the schema of the source database and the model of the target one, and generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, which include data and application migration, data interchange, and object-to-relational mapping between applications and databases.

5.

Comparing the understandability of alternative data warehouse schemas: An empirical study

David Schuff Karen Corral Ozgur Turetken 《Decision Support Systems》2011,52(1):9-20

An easily understood data warehouse model enables users to better identify and retrieve its data. It also makes it easier for users to suggest changes to its structure and content. Through an exploratory, empirical study, we compared the understandability of the star and traditional relational schemas. The results of our experiment contradict previous findings and show schema type did not lead to significant performance differences for a content identification task. Further, the relational schema actually led to slightly better results for a schema augmentation task. We discuss the implications of these findings for data warehouse design and future research.

6.

Model-independent schema translation

Paolo Atzeni Paolo Cappellari Riccardo Torlone Philip A. Bernstein Giorgio Gianforme 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(6):1347-1370

7.

Tuning the ensemble selection process of schema matchers

Avigdor Gal Tomer Sagi 《Information Systems》2010

Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized to be one of the basic operations required by the process of data and schema integration and its outcome serves in many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that has largely been left untouched involves the automatic selection of schema matchers to an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offers such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB introduces a new promise for schema matcher designers. Instead of trying to design a perfect schema matcher, a designer can instead focus on finding better than random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed by and large as applications of existing works in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other, state-of-the-art ensemble matchers.

8.

Web Service aggregation with string distance ensembles and active probe selection  Total citations: 1, self-citations: 0, citations by others: 1

Eddie Johnston Nicholas Kushmerick 《Information Fusion》2008,9(4):481-500

The adoption of standards for exchanging information across the Web presents both new opportunities and important challenges for data integration and aggregation. Although Web Services simplify the discovery and access of information sources, the problem of semantic heterogeneity remains: how to find semantic correspondences across the data being integrated. In this paper, we explore these issues in the context of Web Services, and propose OATS, a novel algorithm for schema matching that is specifically suited to Web Service data aggregation. We show how probing Web Services with a small set of related queries results in semantically correlated data instances which greatly simplifies the matching process, and demonstrate that the use of an ensemble of string distance metrics in matching data instances performs better than individual metrics. We also show how the choice of probe queries has a dramatic effect on matching accuracy. Motivated by this observation, we describe and evaluate a machine learning approach to selecting probes to maximise accuracy while minimising cost.
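
The idea of combining several string distance metrics when comparing data instances can be illustrated with a minimal sketch. This is not the OATS algorithm: the three metrics, the unweighted average, and all function names below are assumptions chosen for illustration, using only the Python standard library.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequence_ratio(a: str, b: str) -> float:
    """difflib's matching-block ratio."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical ensemble: an unweighted average of the three metrics.
METRICS = (edit_similarity, token_jaccard, sequence_ratio)


def ensemble_similarity(a: str, b: str) -> float:
    return sum(metric(a, b) for metric in METRICS) / len(METRICS)


def best_match(value: str, candidates: list[str]) -> str:
    """Return the candidate instance most similar to `value`."""
    return max(candidates, key=lambda c: ensemble_similarity(value, c))


if __name__ == "__main__":
    print(best_match("San Francisco, CA",
                     ["San Francisco", "Los Angeles", "Santa Fe"]))
```

An unweighted average is the simplest way to combine metrics; a real system would more likely learn or tune the weights per metric.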

9.

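A survey of data integration  Total citations: 65, self-citations: 2, citations by others: 63

陈跃国 王京春 《计算机科学》2004,31(5):48-51

The paper introduces the basic concepts and difficulties of data integration, discusses and compares the schema integration approach, the data replication approach, and the combined data integration approach, elaborates on the main difficulty of data integration, namely the heterogeneity of data sources, and finally gives an outlook on data integration research.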

10.

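A query plan generation algorithm based on schema mapping

李由 刘东波 张维明 《计算机科学》2006,33(3):125-128

The rapid growth of the Internet has made integrating multiple data sources increasingly important. However, heterogeneity in data structure and semantics across sources makes data integration quite difficult. This paper proposes a query plan generation algorithm based on schema mapping. Provided the mapping rules are correctly defined, the algorithm automatically constructs query plans according to the query conditions and the data source schemas, and guarantees that the result data satisfy the structure and referential integrity requirements of the target schema.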

11.

An ontology based approach to the integration of entity–relationship schemas

Qi Tok Wang 《Data & Knowledge Engineering》2006,58(3):299-326

In schema integration, schematic discrepancies occur when data in one database correspond to metadata in another. We explicitly declare the context, that is, the meta information relating to the source, classification, property, etc. of entities, relationships or attribute values in entity–relationship (ER) schemas. We present algorithms to resolve schematic discrepancies by transforming metadata into the attribute values of entity types, keeping the information and constraints of original schemas. Although focusing on the resolution of schematic discrepancies, our technique works seamlessly with the existing techniques resolving other semantic heterogeneities in schema integration.

12.

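Schema mapping techniques in heterogeneous data source integration  Total citations: 4, self-citations: 0, citations by others: 4

杨雪梅 董逸生 王永利 钱江波 钱刚 《计算机科学》2006,33(7):87-91

Schema mapping is the key technique for query reformulation in heterogeneous data source integration. This paper first introduces centralized and decentralized integration architectures for schema mapping, summarizes the three basic forms for defining schema mappings (GAV, LAV, and GLAV), focuses on the core techniques of schema mapping (schema matching and mapping generation), and finally discusses new research topics in schema mapping.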

13.

Double-layered schema integration of heterogeneous XML sources

Hong-Quang Nguyen David Taniar 《Journal of Systems and Software》2011,84(1):63-76

Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be a part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Experiments on both real and synthetic datasets show that XINTOR outperforms existing methods with respect to (i) the mediated-schema quality using precision, recall, F-measure, and schema minimality; and (ii) the execution performance based on execution time and scale-up performance.

14.

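Dependency conflicts in schema matching

杜小坤 李艳红 涂韬 《计算机科学》2015,42(4):235-239

By analyzing the shortcomings of existing matching methods, the paper proposes a new method that uses dependency conflicts to select match relationships. It first selects candidate matches for each element of the target schema, then computes a conflict value for each global matching scheme, and finally selects the scheme with the smallest conflict value as the result. Experiments show that the method significantly improves the accuracy of match results and makes the subsequent optimization of data mapping results less time-consuming.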
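
The three steps in this abstract (candidate matches per target element, a conflict value per global matching scheme, selection of the minimum) can be sketched as follows. The abstract does not define how a dependency conflict is computed, so `conflict_value` is a hypothetical stand-in: it counts target dependencies whose mapped image is not a known source dependency. The exhaustive enumeration and all element names are likewise assumptions for illustration.

```python
from itertools import product


def conflict_value(assignment, target_deps, source_deps):
    """Hypothetical conflict measure: number of target dependencies
    (lhs -> rhs) whose mapped pair is not a dependency in the source."""
    return sum(
        1
        for lhs, rhs in target_deps
        if (assignment[lhs], assignment[rhs]) not in source_deps
    )


def select_matching(candidates, target_deps, source_deps):
    """Enumerate every global matching scheme (one candidate per target
    element) and keep the one with the smallest conflict value."""
    targets = list(candidates)
    best, best_cost = None, None
    for combo in product(*(candidates[t] for t in targets)):
        assignment = dict(zip(targets, combo))
        cost = conflict_value(assignment, target_deps, source_deps)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    # Illustrative data only: two target elements, two candidates each.
    candidates = {
        "emp_id": ["dept_id", "id"],
        "emp_name": ["name", "dept_name"],
    }
    target_deps = {("emp_id", "emp_name")}  # emp_id determines emp_name
    source_deps = {("id", "name")}          # id determines name
    print(select_matching(candidates, target_deps, source_deps))
```

Exhaustive enumeration grows exponentially with the number of target elements, so it is only workable for small schemas; the sketch makes no claim about how the paper searches the space.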

15.

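Research on some key technologies supporting a deep Web database grid

申德荣 聂铁铮 余恩运 寇月 于戈 《计算机科学》2007,34(8):123-125

The number of deep Web databases is growing rapidly, yet the knowledge they hold is not being used effectively. This paper proposes applying specific deep Web databases in a grid environment and studies some of the key technologies supporting a deep Web database grid, mainly: (1) a meta-information definition model and a schema extraction model for deep Web databases; (2) a multi-level schema matching model and a self-coordination model; (3) a Web database query and integration model based on attribute relaxation; (4) a multi-objective-function cost model and a locality-oriented adaptive optimized scheduling model. The results will provide good support for building a deep Web database grid which, in keeping with the grid concept, offers users a uniform interface and delivers integrated deep Web data knowledge to consumers on demand. The approach has broad application prospects.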

16.

Improving the usability of standard schemas

Jiemin Zhang April Webster Michael Lawrence Madhav Nepal Rachel Pottinger Sheryl Staub-French Melanie Tory 《Information Systems》2011

Due to the development of XML and other data models such as OWL and RDF, sharing data is an increasingly common task since these data models allow simple syntactic translation of data between applications. However, in order for data to be shared semantically, there must be a way to ensure that concepts are the same. One approach is to employ commonly used schemas, called standard schemas, which help guarantee that syntactically identical objects have semantically similar meanings. As a result of the spread of data sharing, there has been widespread adoption of standard schemas in a broad range of disciplines and for a wide variety of applications within a very short period of time. However, standard schemas are still in their infancy and have not yet matured or been thoroughly evaluated. It is imperative that the data management research community takes a closer look at how well these standard schemas have fared in real-world applications to identify not only their advantages, but also the operational challenges that real users face.

17.

A survey of approaches to automatic schema matching  Total citations: 75, self-citations: 1, citations by others: 75

Erhard Rahm Philip A. Bernstein 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):334-350

Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component. Received: 5 February 2001 / Accepted: 6 September 2001 / Published online: 21 November 2001

18.

Schema mediation for large-scale semantic data sharing

Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(1):68-83

Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS. Received: 16 December 2002 / Accepted: 14 April 2003 / Published online: 12 December 2003. Edited by: V. Atluri

19.

A schema matching model based on partially verified match relationships

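黄少滨 刘国峰 万庆生 程媛 申林山 《自动化学报》2013,39(10):1642-1652

Schema matching is a key and difficult problem in schema integration, the Semantic Web, e-commerce, and related fields. To make effective use of expert knowledge to improve matching quality, the paper proposes a schema matching model based on partially verified match relationships. In this model, a small number of correspondences between the schema elements to be matched are first verified manually, from which the partially known match relationships for the current task and the default weights of the individual matchers are inferred. Then, based on this prior knowledge, the similarity matrices produced by the various matchers are merged and adjusted, and optimized globally. Finally, the selectivity of the optimized matrix is evaluated in order to recommend the most reasonable candidate-match generation scheme for each matching task. Experimental results show that using partially verified match relationships helps improve schema matching quality.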
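
The step of deriving matcher weights from a few verified correspondences and then merging the matchers' similarity matrices can be sketched as below. The abstract does not give the weighting or adjustment formulas, so both are assumptions here: each matcher's weight is taken to be proportional to its mean score on the verified pairs, and verified cells are pinned to 1.0 after merging. Function names and the sample matrices are hypothetical.

```python
def matcher_weights(matrices, verified):
    """Assumed rule: weight each matcher by its mean similarity on the
    manually verified (row, col) pairs, normalised to sum to 1."""
    if not verified:
        raise ValueError("at least one verified correspondence is required")
    scores = [sum(m[i][j] for i, j in verified) / len(verified) for m in matrices]
    total = sum(scores)
    if total == 0:
        # No matcher supports the verified pairs: fall back to equal weights.
        return [1.0 / len(matrices)] * len(matrices)
    return [s / total for s in scores]


def merge_matrices(matrices, verified):
    """Weighted sum of the similarity matrices, with verified pairs
    pinned to the maximum similarity."""
    weights = matcher_weights(matrices, verified)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    merged = [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]
    for i, j in verified:
        merged[i][j] = 1.0
    return merged, weights


if __name__ == "__main__":
    # Illustrative 2x2 matrices from two hypothetical matchers.
    name_matcher = [[0.9, 0.1], [0.2, 0.6]]
    type_matcher = [[0.3, 0.7], [0.5, 0.5]]
    merged, weights = merge_matrices([name_matcher, type_matcher], [(0, 0)])
    print(weights)
    print(merged)
```

The global optimization and selectivity evaluation described in the abstract are not covered by this sketch.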

20.

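Research on schema integration in heterogeneous information source integration systems

张桂香 《微计算机信息》2007,23(15):233-234

As society develops, more and more enterprises are shifting their focus to integrated architectures. The demand for integrating heterogeneous information sources is therefore urgent and will persist for a long time. This paper proposes a Java-based relational data model (JIDM) as the common data model of the integration system. On the basis of this model, it describes the mappings among the global schema, export schemas, and local schemas, and resolves the mapping between the JIDM model and the relational model, XML files, and the object-oriented model.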