Found 20 similar documents; search took 15 ms.
1.
2.
3.
The Semantic Web is the next step of the current Web, where information will become more machine-understandable to support effective data discovery and integration. Hierarchical schemas, either in the form of tree-like structures (e.g., DTDs, XML schemas) or in the form of hierarchies on a category/subcategory basis (e.g., thematic hierarchies of portal catalogs), play an important role in this task: they are used to semantically enrich the available information. Up to now, hierarchical schemas have been treated as sets of individual elements, acting as semantic guides for browsing or querying data. Under that view, queries like “find the part of a portal catalog which is not present in another catalog” can be answered only in a procedural way, by specifying which nodes to select and how to get them. For this reason, we argue that hierarchical schemas should be treated as full-fledged objects so as to allow for their manipulation. This work proposes models and operators to manipulate the structural information of hierarchies, considering them as first-class citizens. First, we explore the algebraic properties of trees representing hierarchies and define a lattice algebraic structure on them. Then, turning this structure into a Boolean algebra, we present the operators S-union, S-intersection, and S-difference to support structural manipulation of hierarchies. These operators have algebraic properties that provide clear semantics and assist the transformation, simplification, and optimization of sequences of operations using laws similar to those of set theory. We also identify the conditions under which this framework is applicable. Finally, we demonstrate an application of our framework for manipulating hierarchical schemas on tree-like hierarchies encoded as RDF/S files.
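A minimal sketch (in Python, not taken from the paper) of how such structural operators can behave when a tree-like hierarchy is encoded as the set of its root-to-node paths; the function names, the nested-dict encoding, and the sample catalogs are all hypothetical:

```python
# Illustrative only: encode a hierarchy as the set of its root-to-node
# paths, so structural union/intersection/difference reduce to set
# operations, mirroring the set-theory-like laws described above.

def paths(tree, prefix=()):
    """Flatten a nested-dict hierarchy into a set of root-to-node paths."""
    result = set()
    for node, children in tree.items():
        p = prefix + (node,)
        result.add(p)
        result |= paths(children, p)
    return result

def s_union(a, b):
    return paths(a) | paths(b)

def s_intersection(a, b):
    return paths(a) & paths(b)

def s_difference(a, b):
    # "the part of one catalog which is not present in another"
    return paths(a) - paths(b)

cat1 = {"Arts": {"Music": {}, "Film": {}}, "Science": {"Physics": {}}}
cat2 = {"Arts": {"Music": {}}, "Sports": {}}
print(sorted(s_difference(cat1, cat2)))
```

Note that the result of `s_difference` here is a path set, not necessarily a well-formed tree; the paper's conditions of applicability address exactly such closure questions.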
4.
Paolo Atzeni Luigi Bellomarini Francesca Bugiotti Fabrizio Celli Giorgio Gianforme 《Information Systems》2012,37(3):269-287
To support heterogeneity is a major requirement in current approaches to the integration and transformation of data. This paper proposes a new approach to the translation of schema and data from one data model to another, and we illustrate its implementation in the tool MIDST-RT. We build on our previous work on MIDST, a platform conceived to perform translations in an off-line fashion. In that approach, the source database (both schema and data) is imported into a repository, where it is stored in a universal model. The translation is then applied within the tool as a composition of elementary transformation steps, specified as Datalog programs. Finally, the result (again, both schema and data) is exported into the operational system. Here we illustrate a new, lightweight approach in which the database is not imported. MIDST-RT needs to know only the schema of the source database and the model of the target one, and it generates views on the operational system that expose the underlying data according to the corresponding schema in the target model. Views are generated in an almost automatic way, on the basis of the Datalog rules for schema translation. The proposed solution can be applied to different scenarios, including data and application migration, data interchange, and object-to-relational mapping between applications and databases.
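The key idea, exposing source data through generated views rather than copying it, can be sketched as follows (in Python; MIDST-RT's actual Datalog-driven generation is not reproduced, and the table and column names are invented):

```python
# Hypothetical sketch: given a mapping from target attribute names to
# source column names, emit a SQL view that presents the source table
# under the target schema without materializing any data.

def view_for(target_name, source_table, column_map):
    """Build a CREATE VIEW statement renaming source columns to target ones."""
    cols = ", ".join(f"{src} AS {dst}" for dst, src in column_map.items())
    return f"CREATE VIEW {target_name} AS SELECT {cols} FROM {source_table};"

print(view_for("Person", "EMP", {"name": "ename", "dept": "dname"}))
```

Because the view is just a query, updates to the operational system are visible immediately, which is the main advantage over the off-line import/export cycle described above.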
5.
David Schuff, Karen Corral, Ozgur Turetken 《Decision Support Systems》2011,52(1):9-20
An easily understood data warehouse model enables users to better identify and retrieve its data. It also makes it easier for users to suggest changes to its structure and content. Through an exploratory, empirical study, we compared the understandability of the star and traditional relational schemas. The results of our experiment contradict previous findings and show schema type did not lead to significant performance differences for a content identification task. Further, the relational schema actually led to slightly better results for a schema augmentation task. We discuss the implications of these findings for data warehouse design and future research.
6.
7.
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized to be one of the basic operations required by the process of data and schema integration, and its outcome serves in many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that has been largely left untouched involves the automatic selection of schema matchers into an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offer such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher, a designer can focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed by and large as applications of existing work in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers, and show that SMB performs better than other state-of-the-art ensemble matchers.
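The boosting intuition behind SMB, that a weak matcher only needs to beat random guessing to contribute, can be illustrated with an AdaBoost-style weighting scheme (a generic sketch in Python; SMB's actual training procedure and the function names here are not from the paper):

```python
import math

# Illustrative sketch: each matcher votes on a candidate correspondence,
# and its weight grows with its accuracy on labelled pairs. A matcher
# whose error rate is near 0.5 (random) gets a weight near zero.

def matcher_weight(error):
    """AdaBoost-style weight from a matcher's labelled error rate."""
    error = min(max(error, 1e-9), 1 - 1e-9)  # avoid log(0)
    return 0.5 * math.log((1 - error) / error)

def ensemble_score(votes, weights):
    """Weighted vote over {matcher: +1/-1} decisions for one pair."""
    return sum(weights[m] * v for m, v in votes.items())

weights = {"name": matcher_weight(0.2), "type": matcher_weight(0.45)}
votes = {"name": +1, "type": -1}
print(ensemble_score(votes, weights) > 0)  # the accurate matcher dominates
```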
8.
The adoption of standards for exchanging information across the Web presents both new opportunities and important challenges for data integration and aggregation. Although Web Services simplify the discovery and access of information sources, the problem of semantic heterogeneity remains: how to find semantic correspondences across the data being integrated. In this paper, we explore these issues in the context of Web Services, and propose OATS, a novel algorithm for schema matching that is specifically suited to Web Service data aggregation. We show how probing Web Services with a small set of related queries results in semantically correlated data instances, which greatly simplifies the matching process, and demonstrate that the use of an ensemble of string distance metrics in matching data instances performs better than individual metrics. We also show how the choice of probe queries has a dramatic effect on matching accuracy. Motivated by this observation, we describe and evaluate a machine learning approach to selecting probes to maximise accuracy while minimising cost.
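The value of combining several string metrics can be sketched quickly (in Python; the two metrics below are generic stand-ins, not the specific ensemble OATS uses):

```python
from difflib import SequenceMatcher

# Illustrative sketch: average two simple string similarities. Combining
# metrics with different failure modes is typically more robust than
# relying on any single one when matching data instances.

def ratio(a, b):
    """Edit-based similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def trigram_jaccard(a, b):
    """Character-trigram overlap in [0, 1]; robust to reordering."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)

def ensemble_similarity(a, b):
    return (ratio(a, b) + trigram_jaccard(a, b)) / 2

print(ensemble_similarity("unit_price", "unitprice") >
      ensemble_similarity("unit_price", "quantity"))
```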
9.
10.
11.
In schema integration, schematic discrepancies occur when data in one database correspond to metadata in another. We explicitly declare the context, that is, the meta-information relating to the source, classification, property, etc., of entities, relationships, or attribute values in entity–relationship (ER) schemas. We present algorithms to resolve schematic discrepancies by transforming metadata into the attribute values of entity types, preserving the information and constraints of the original schemas. Although focused on the resolution of schematic discrepancies, our technique works seamlessly with existing techniques for resolving other semantic heterogeneities in schema integration.
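The core transformation, turning metadata such as column names into attribute values, resembles a relational unpivot. A minimal sketch in Python (the table layout, attribute names, and helper name are invented for illustration and are not the paper's algorithm):

```python
# Illustrative sketch of resolving a schematic discrepancy: one source
# encodes the currency as column names (metadata), while the integrated
# schema wants it as values of a "currency" attribute.

def unpivot(rows, fixed, meta_attr):
    """Turn non-fixed column names into values of a new attribute."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col in fixed:
                continue
            rec = {k: row[k] for k in fixed}
            rec[meta_attr] = col   # metadata becomes an attribute value
            rec["amount"] = val
            out.append(rec)
    return out

src = [{"stock": "ACME", "USD": 10.0, "EUR": 9.2}]
print(unpivot(src, fixed={"stock"}, meta_attr="currency"))
```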
12.
13.
Hong-Quang Nguyen, David Taniar 《Journal of Systems and Software》2011,84(1):63-76
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. In experiments on both real and synthetic datasets, XINTOR outperforms existing methods with respect to (i) mediated-schema quality, measured by precision, recall, F-measure, and schema minimality; and (ii) execution performance, measured by execution time and scale-up.
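The abstract does not give path cohesion's exact formula, but a score that rewards frequent, short paths can be rendered as a toy sketch (in Python; the formula, corpus encoding, and sample paths below are hypothetical):

```python
# Hypothetical "path cohesion"-style score: frequency of a candidate path
# across the source schemas, discounted by its length, so that short,
# widely shared paths are preferred for the mediated schema.

def path_cohesion(path, corpus):
    """corpus: list of per-schema path sets; path: tuple of element names."""
    freq = sum(1 for schema_paths in corpus if path in schema_paths) / len(corpus)
    return freq / len(path)

corpus = [
    {("order", "item"), ("order", "item", "price")},
    {("order", "item"), ("invoice",)},
]
print(path_cohesion(("order", "item"), corpus))
```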
14.
15.
The number of deep Web databases is currently growing rapidly, yet the knowledge they contain is not being used effectively. This paper proposes applying selected deep Web databases in a grid environment, and investigates several key technologies for supporting a deep Web database grid, including: (1) a metadata definition model and a schema extraction model for deep Web databases; (2) a multi-level schema matching model and a self-coordination model; (3) a query and integration model for Web databases based on attribute relaxation; and (4) a multi-objective cost model and a locality-oriented adaptive optimization and scheduling model. The results will provide solid support for building a deep Web database grid that, as the grid concept envisions, offers users a unified interface and delivers integrated deep Web data knowledge on demand. The approach has broad application prospects.
16.
Jiemin Zhang April Webster Michael Lawrence Madhav Nepal Rachel Pottinger Sheryl Staub-French Melanie Tory 《Information Systems》2011
Due to the development of XML and other data models such as OWL and RDF, sharing data is an increasingly common task, since these data models allow simple syntactic translation of data between applications. However, for data to be shared semantically, there must be a way to ensure that concepts are the same. One approach is to employ commonly used schemas, called standard schemas, which help guarantee that syntactically identical objects have semantically similar meanings. As a result of the spread of data sharing, standard schemas have been widely adopted in a broad range of disciplines and for a wide variety of applications within a very short period of time. However, standard schemas are still in their infancy and have not yet matured or been thoroughly evaluated. It is imperative that the data management research community take a closer look at how well these standard schemas have fared in real-world applications, to identify not only their advantages but also the operational challenges that real users face.
17.
A survey of approaches to automatic schema matching (total citations: 75; self-citations: 1; other citations: 75)
Erhard Rahm Philip A. Bernstein 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):334-350
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations, thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Received: 5 February 2001 / Accepted: 6 September 2001 / Published online: 21 November 2001
18.
Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(1):68-83
Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.
Received: 16 December 2002, Accepted: 14 April 2003, Published online: 12 December 2003. Edited by: V. Atluri
19.
Schema matching is a key and difficult problem in schema integration, the Semantic Web, e-commerce, and related fields. To exploit expert knowledge effectively and improve matching quality, we propose a schema matching model based on partially verified correspondences. In this model, a small number of correspondences between the elements of the schemas to be matched are first verified manually; from these, the partially known correspondences for the current task and the default weights of the individual matchers are inferred. The similarity matrices produced by multiple matchers are then merged and adjusted on the basis of this collected prior knowledge, and optimized globally. Finally, the selectivity of the optimized matrix is evaluated in order to recommend the most suitable candidate-match generation strategy for each matching task. Experimental results show that using partially verified correspondences helps improve the quality of schema matching.
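The merge step can be sketched as a weighted combination of similarity matrices, with each matcher's weight estimated from the few manually verified correspondences (a Python sketch loosely following the abstract; the weighting scheme and all names here are invented):

```python
# Illustrative sketch: weight each matcher by its accuracy on the verified
# correspondences, then merge the matchers' similarity matrices by a
# weighted average. Matrices are plain lists of lists of floats.

def matcher_accuracy(sim, verified, threshold=0.5):
    """Fraction of verified (i, j, is_match) decisions a matcher gets right."""
    ok = sum((sim[i][j] >= threshold) == is_match for i, j, is_match in verified)
    return ok / len(verified)

def merge(matrices, verified):
    weights = [matcher_accuracy(s, verified) for s in matrices]
    total = sum(weights)
    n, m = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * s[i][j] for w, s in zip(weights, matrices)) / total
             for j in range(m)] for i in range(n)]

name_sim = [[0.9, 0.1], [0.2, 0.8]]   # a discriminating matcher
type_sim = [[0.6, 0.6], [0.6, 0.6]]   # an uninformative matcher
verified = [(0, 0, True), (0, 1, False)]
print(merge([name_sim, type_sim], verified))
```

The verified pairs downweight the uninformative matcher, so the merged matrix stays close to the discriminating one.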
20.
As society develops, more and more enterprises are shifting their focus toward integrated architectures, so the need to integrate heterogeneous information sources is both urgent and long-lasting. This paper proposes a Java-based relational data model (JIDM) as the common data model of an integration system. On top of this model, it describes the mappings between the global schema, export schemas, and local schemas, and solves the mapping problems between the JIDM model and relational models, XML documents, and object-oriented models.