Similar literature
20 similar documents found
1.
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be a part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Experiments on both real and synthetic datasets show that XINTOR outperforms existing methods with respect to (i) the mediated-schema quality using precision, recall, F-measure, and schema minimality; and (ii) the execution performance based on execution time and scale-up performance.

2.
Indexing and querying XML using extended Dewey labeling scheme   Cited by: 1 (self-citations: 0, citations by others: 1)
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms is superior to existing approaches in terms of the number of elements scanned, the size of intermediate results, and query performance.
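
As an illustration of the labeling idea mentioned in the abstract above, here is a minimal Python sketch of the plain Dewey scheme, not the extended Dewey scheme the paper proposes (which additionally encodes element types into the label). The function names `dewey_labels` and `is_ancestor` and the sample document are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Assign a Dewey label (tuple of 1-based child positions) to every element."""
    labels = []

    def visit(elem, label):
        labels.append((label, elem.tag))
        for position, child in enumerate(elem, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a, b):
    """a is a proper ancestor of b exactly when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


# Hypothetical sample document, used only to show the labels.
doc = ET.fromstring("<bib><book><title/><author/></book><book><title/></book></bib>")
labels = dewey_labels(doc)
for label, tag in labels:
    print(".".join(map(str, label)), tag)
```

Because a label records the whole root-to-node path, ancestor tests reduce to a prefix comparison and need no access to the document itself.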

3.
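XML is a markup language standard released by the W3C in February 1998. It is easy to extend, strongly structured, interactive, semantically rich, content-based in how it identifies data, formattable, easy to process, and platform-independent, which allows the data layer to be unified with the support of XML technology. Based on a structural analysis of ocean temperature-salinity-depth data, this paper designs an XML Schema for such data and defines its XML data structure.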

4.
XML instances are not necessarily self-contained but may have connections to remote XML data residing on other servers. In this paper, we show that, despite its limited support and use in the XML world, the XLink language provides a powerful mechanism for expressing such links both from the modeling point of view and for actually querying interlinked XML data: in our dbxlink approach, the links are not seen as explicit links (where the users must be aware of the links and traverse them explicitly in their queries), but define views that combine into a logical, transparent XML model which serves as an external schema and can be queried by XPath/XQuery. We motivate the underlying modeling and give a concise and declarative specification as an XML-to-XML mapping. We also describe the implementation of the model as an extension of the eXist [eXist: an Open Source Native XML Database, http://exist-db.org/] XML database system. The approach can be applied both for distribution of data and for integration of data from autonomous sources.

5.
This paper studies how to enable effective ranked retrieval over data with categorical attributes, in particular by supporting personalized ranked retrieval of highly relevant data. While ranked retrieval has been actively studied lately, existing efforts have focused only on supporting ranking over numerical or text data. However, much real-life data contains a large number of categorical attributes, in combination with numerical and text attributes, which cannot be efficiently supported: unlike numerical attributes, where a natural ordering is inherent, categorical attributes have no such ordering, which complicates both the formulation and the processing of ranking. This paper studies the efficient and effective support of ranking over categorical data, as well as uniform support with other types of attributes.

6.
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore, some “user-interesting” documents whose structures are similar to, but do not exactly match, a user query are often missed. In this paper, we propose RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents, supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi- and oRRSi-based query processing significantly outperforms previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. Ng

7.
Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted), we are interested in studying their progress toward identification of the correct result at any point during the algorithms’ execution. We adopt a probabilistic approach where we seek to report, at any point of operation of the algorithm, the confidence that the top-k result has been identified. Such functionality can be a valuable asset when one is interested in reducing the runtime cost of top-k computations. We present a thorough experimental evaluation to validate our techniques using both synthetic and real data sets.
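
For readers unfamiliar with TA (the Threshold Algorithm), the following is a minimal Python sketch of its classic form with a sum aggregate. It does not implement the anytime confidence estimation studied in the article. The function name `threshold_algorithm`, the in-memory dictionary layout, and the sample scores are assumptions for illustration.

```python
import heapq


def threshold_algorithm(scores, k):
    """scores: dict mapping object id -> tuple of per-attribute scores (higher is better).

    Returns the top-k (total, object id) pairs and the depth at which TA stopped.
    """
    num_attrs = len(next(iter(scores.values())))
    # One list of object ids per attribute, sorted by that attribute, descending.
    sorted_lists = [
        sorted(scores, key=lambda obj, i=i: scores[obj][i], reverse=True)
        for i in range(num_attrs)
    ]
    seen = set()
    top = []  # min-heap of (total score, object id), at most k entries
    for depth in range(len(scores)):
        for i in range(num_attrs):
            obj = sorted_lists[i][depth]
            if obj not in seen:
                seen.add(obj)
                total = sum(scores[obj])  # "random access" to all attributes
                heapq.heappush(top, (total, obj))
                if len(top) > k:
                    heapq.heappop(top)
        # Threshold: the best total any still-unseen object could reach.
        threshold = sum(scores[sorted_lists[i][depth]][i] for i in range(num_attrs))
        if len(top) == k and top[0][0] >= threshold:
            return sorted(top, reverse=True), depth + 1
    return sorted(top, reverse=True), len(scores)


# Hypothetical data: five objects with two integer-scored attributes.
example = {"a": (9, 8), "b": (8, 9), "c": (3, 4), "d": (2, 1), "e": (1, 2)}
result, depth = threshold_algorithm(example, k=2)
print(result, depth)
```

On this sample data the scan stops after two rows of sorted access, because the threshold (16) has dropped below the k-th best total (17), so the remaining three objects are never read.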

8.
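Based on the characteristics of discrete patterns and the notion of graph containment, this paper introduces a distinctive notion of pattern containment. The study concludes that if pattern 2 is contained in pattern 1, then, given the query results of pattern 1, the search space for pattern 2 can be greatly reduced; conversely, given the query results of pattern 2, at least some of the query results of pattern 1 are known immediately. This result is applied directly to improve query efficiency.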

9.
Gae-won You. Information Sciences, 2008, 178(20): 3925-3942
As data of an unprecedented scale are becoming accessible on the Web, personalization, that is, narrowing down the retrieval to meet user-specific information needs, is becoming more and more critical. For instance, while web search engines traditionally retrieve the same results for all users, they have begun to offer beta services that personalize the results to adapt to user-specific contexts such as prior search history or other application contexts. In clear contrast to search engines dealing with unstructured text data, this paper studies how to enable such personalization in the context of structured data retrieval. In particular, we adopt a contextual ranking model to formalize personalization as a cost-based optimization over collected contextual rankings. With this formalism, personalization can be abstracted as a cost-optimal retrieval of the contextual ranking that most closely matches the user-specific retrieval context. With the retrieved matching context, we adopt a machine learning approach to effectively and efficiently identify the ideal personalized ranked results for this specific user. Our empirical evaluations over synthetic and real-life data validate both the efficiency and effectiveness of our framework.

10.
XML documents are becoming popular for business process integration. To achieve interoperability between applications, XML documents must also conform to various commonly used document type definitions (DTDs). However, most business data are not maintained as XML documents. They are stored in various native formats, such as database tables or LDAP directories. Hence, middleware is needed to dynamically generate XML documents conforming to predefined DTDs from various data sources. As industrial consortia and large corporations have created various DTDs, it is both challenging and time-consuming to design the necessary middleware to conform to so many different DTDs. This problem is particularly acute for a small- or medium-sized enterprise because it lacks the IT skills to quickly develop such middleware. In this paper, we present XLE, an XML Lightweight Extractor, as a practical approach to dynamically extracting DTD-conforming XML documents from heterogeneous data sources. XLE is based on a framework called DTD source annotation (DTDSA). It treats a DTD as the control structure of a program. The annotations become the program statements, such as functions and assignments. DTD-conforming XML documents are generated by parsing annotated DTDs. Basically, DTD annotations describe declaratively the mappings between target XML documents and the source data. The XLE engine implements a few basic annotations, providing a practical solution for many small- and medium-sized enterprises. However, XLE is designed to be versatile. It allows sophisticated users to plug in their own implementations to access new types of data or to achieve better performance. Heterogeneous data sources can be simply specified in the annotations. A GUI tool is provided to highlight the places where annotations are needed.

11.
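Processing path expressions is both a difficult and an actively studied problem in XML query technology. Building on the structural mapping index for XML path expressions proposed by our laboratory, and in order to reduce the space overhead of building indexes, this paper proposes a cost model for constructing path indexes and designs a corresponding algorithm that selectively builds path indexes for a given query workload, automatically choosing a near-optimal index schema (NOIS) for that workload. The paper also proposes a strategy by which the system adaptively adjusts the index schema when query efficiency changes. Experiments show that, with this method, the system can greatly reduce the space overhead of path indexes without degrading the efficiency of path expression processing, achieving a good trade-off between query benefit and space overhead.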

12.
Top-k monitoring queries are useful in many wireless sensor network applications. A query of this type continuously returns a list of k ordered nodes with the highest (or lowest) sensor readings. To process these queries, a well-known approach is to install a filter at each sensor node to avoid unnecessary transmissions of sensor readings. In this paper, we propose a new top-k monitoring method, named Distributed Adaptive Filter-based Monitoring. In this method, we first propose a new query reevaluation algorithm that works in a distributed manner in the network to reduce the communication cost of sending probe messages. Then, we present an adaptive filter updating algorithm, based on predicted benefits, that lowers the transmission cost of sending updated filters to the sensor nodes. Experimental results on real data traces show that our proposed method performs much better than existing methods in terms of both network lifetime and average energy consumption.
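
The general filter idea referred to above (a node stays silent while its reading remains inside an assigned interval) can be sketched as follows. This is a simplified, centralized illustration that tracks only membership of the top-k set, not the ordered list, and it is not the Distributed Adaptive Filter-based Monitoring method itself. The midpoint boundary rule, the names `assign_filters` and `must_report`, and the readings are assumptions.

```python
def assign_filters(readings, k):
    """Give each node an interval; requires 1 <= k < len(readings).

    The value axis is split at the midpoint between the k-th and (k+1)-th
    readings: top-k nodes get [boundary, +inf), the rest get (-inf, boundary).
    """
    ranked = sorted(readings, key=readings.get, reverse=True)
    boundary = (readings[ranked[k - 1]] + readings[ranked[k]]) / 2
    top = set(ranked[:k])
    filters = {
        node: (boundary, float("inf")) if node in top else (float("-inf"), boundary)
        for node in readings
    }
    return filters, top


def must_report(filter_interval, new_reading):
    """A node transmits only when its new reading leaves its filter interval."""
    low, high = filter_interval
    return not (low <= new_reading < high)


# Hypothetical readings from four sensor nodes.
readings = {"n1": 30.0, "n2": 25.0, "n3": 20.0, "n4": 10.0}
filters, top = assign_filters(readings, k=2)
print(top, filters["n3"])
print(must_report(filters["n3"], 21.0))  # stays below the boundary: silent
print(must_report(filters["n3"], 24.0))  # crosses the boundary: must report
```

As long as no reading crosses the boundary, the top-k set at the base station stays valid without any transmission; only a crossing triggers a message and a reevaluation.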

13.
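Storing XML in relational databases is an important problem in XML research. After summarizing several mapping methods, this paper proposes a method that parses multiple similar XML documents, generates a relational schema for each according to the mapping, and derives an integrated relational schema by analysis and induction. It then creates a relational database and, based on the mapping, extracts the XML document data and stores it in the database. The method preserves the data of the XML documents in a fairly compact structure, and its most notable feature is that it does not need to consider the documents' schema information (DTD, XML Schema). The results of a concrete experiment illustrate the effectiveness of the method.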
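
To make the idea of storing XML in a relational database without a DTD or XML Schema concrete, here is a minimal Python sketch of a generic "edge table" mapping using the standard `sqlite3` module. This is a well-known schema-oblivious technique swapped in for illustration, not the integrated-schema method of the paper. The table name `node`, its columns, and the sample document are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_xml(conn, xml_text):
    """Shred an XML document into one row per element (parent link + position)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, position INTEGER, "
        "tag TEXT, text TEXT)"
    )

    def insert(elem, parent_id, position):
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for pos, child in enumerate(elem, start=1):
            insert(child, node_id, pos)

    insert(ET.fromstring(xml_text), None, 1)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_xml(conn, "<lib><book><title>XML</title></book><book><title>SQL</title></book></lib>")

# The path /lib/book/title becomes a self-join on the parent link.
titles = conn.execute(
    "SELECT c.text FROM node p JOIN node c ON c.parent_id = p.id "
    "WHERE p.tag = 'book' AND c.tag = 'title' ORDER BY c.id"
).fetchall()
print(titles)
```

The trade-off of this mapping is that every path step costs one join, which is one reason methods that derive a dedicated relational schema from the documents are attractive.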

14.
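XML has become the de facto standard for data representation and data exchange, and XQuery is the W3C Candidate Recommendation for querying data in XML documents. In light of the latest developments in the XQuery specification, this paper introduces the main features of the XQuery language and uses examples to discuss its application to data querying, transformation, and related tasks. It also compares SQL/XML with XQuery and analyzes the implementation and application of XQuery.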

15.
XML and other semi-structured data can be represented by a graph model. The paths in a data graph are used as a basic building block of a query. In particular, by using patterns on paths, a user can formulate more expressive queries. Patterns in a path enlarge the search space of a data graph, and current research on indexing semi-structured data focuses on reducing the search space. However, the existing indexes cannot reduce the search space when a data graph has some references.

In this paper, we introduce a partitioning technique for all paths in a data graph and an index graph which can effectively find appropriate path partitions for a path query with patterns.


16.
This paper analyzes the execution behavior of “No Random Accesses” (NRA) and determines the depths to which each sorted file is scanned in the growing phase and the shrinking phase of NRA, respectively. The analysis shows that NRA needs to maintain a large quantity of candidate tuples in the growing phase on massive data. Based on the analysis, this paper proposes a novel top-k algorithm, Top-K with Early Pruning (TKEP), which performs early pruning in the growing phase. A general rule and a mathematical analysis for early pruning are presented in this paper. The theoretical analysis shows that early pruning can prune most of the candidate tuples. Although TKEP is an approximate method to obtain the top-k result, the probability of correctness is extremely high. Extensive experiments show that TKEP has a significant advantage over NRA.
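
As background for the abstract above, the following is a minimal Python sketch of the baseline NRA algorithm, which uses sorted access only and keeps lower and upper score bounds for every candidate. It does not include the early pruning of TKEP. It assumes non-negative scores, a sum aggregate, and that every object appears exactly once in every list. The function name `nra_top_k` and the data are hypothetical.

```python
def nra_top_k(sorted_lists, k):
    """sorted_lists: one list per attribute of (object id, score), sorted descending.

    Returns the top-k object ids and the scan depth at which NRA stopped.
    """
    m = len(sorted_lists)
    n = len(sorted_lists[0])  # assumption: every object occurs once in every list
    known = {}  # candidate object id -> {attribute index: score seen so far}
    ranked = []
    for depth in range(n):
        for i, lst in enumerate(sorted_lists):
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
        # The last score read from a list bounds everything not yet read from it.
        last = [lst[depth][1] for lst in sorted_lists]
        lower = {o: sum(s.values()) for o, s in known.items()}
        upper = {
            o: lower[o] + sum(last[i] for i in range(m) if i not in s)
            for o, s in known.items()
        }
        ranked = sorted(known, key=lower.get, reverse=True)
        if len(ranked) >= k:
            kth_lower = lower[ranked[k - 1]]
            rest_upper = max((upper[o] for o in ranked[k:]), default=0)
            unseen_upper = sum(last) if len(known) < n else 0
            # Stop once no other candidate or unseen object can beat the k-th.
            if kth_lower >= max(rest_upper, unseen_upper):
                return ranked[:k], depth + 1
    return ranked[:k], n


# Hypothetical data: totals are a=16, b=17, c=7, d=3.
lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 2)],
    [("b", 9), ("a", 7), ("c", 4), ("d", 1)],
]
print(nra_top_k(lists, 1))
print(nra_top_k(lists, 2))
```

The `known` dictionary is the candidate set whose growth on massive data motivates pruning: every object read before the stopping condition holds has to be retained with its partial scores.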

17.
Efficient extraction of schemas for XML documents   Cited by: 3 (self-citations: 0, citations by others: 3)
In this paper, we present a technique for efficient extraction of concise and accurate schemas for XML documents. By restricting the schema form and applying some heuristic rules, we achieve both efficiency and conciseness. The result of an experiment with real-life DTDs shows that our approach attains high accuracy and is 20 to 200 times faster than existing approaches.
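
To give a feel for what schema extraction from XML documents involves, here is a deliberately naive Python sketch that infers, for each element name, which child elements occur and with which DTD-style quantifier. It ignores child order and is not the restricted schema form or the heuristic rules of the paper. The function name `infer_content_models` and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Maps (sometimes absent, sometimes repeated) to a DTD-style quantifier.
QUANTIFIER = {
    (False, False): "",
    (True, False): "?",
    (False, True): "+",
    (True, True): "*",
}


def infer_content_models(documents):
    """Return {element name: {child name: quantifier}} for a list of XML strings."""
    occurrences = defaultdict(list)  # element name -> one child-tag Counter per instance
    for text in documents:
        for elem in ET.fromstring(text).iter():
            occurrences[elem.tag].append(Counter(child.tag for child in elem))
    models = {}
    for tag, counters in occurrences.items():
        child_names = sorted({name for counter in counters for name in counter})
        model = {}
        for name in child_names:
            optional = any(counter[name] == 0 for counter in counters)
            repeated = any(counter[name] > 1 for counter in counters)
            model[name] = QUANTIFIER[(optional, repeated)]
        models[tag] = model
    return models


# Hypothetical sample documents.
docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><author/><year/></book>",
]
models = infer_content_models(docs)
print(models["book"])
```

Even this crude version shows the tension the abstract mentions: a schema that accepts every observed instance is easy to build, while keeping it concise and accurate requires restricting the allowed schema forms.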

18.
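A general XML-based heterogeneous data exchange model   Cited by: 1 (self-citations: 1, citations by others: 1)
To address the lack of generality and extensibility in traditional data exchange and sharing platforms, and to enable secure exchange of business-flow data between enterprises, an extensible general-purpose data exchange platform based on a Web services architecture was designed. The platform makes full use of the advantages of the Extensible Markup Language (XML), the Simple Object Access Protocol (SOAP), Universal Description, Discovery and Integration (UDDI), and the Web Services Description Language (WSDL). It encrypts enterprise business data with symmetric and asymmetric cryptography, establishes a Web server architecture and a data exchange model based on an enterprise B2B (business-to-business e-commerce) integration solution, and is implemented in .NET and C#. The platform enables platform-independent exchange of heterogeneous data between enterprises, with a high level of security during data exchange.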

19.
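This paper proposes a formal data model for XML and its query algebra. The work mainly comprises the following: constructing a precedence ordering relation, introducing rooted connected directed graphs, and establishing the XML formal data model (XFDM) and the XML query algebra (XFQA). Together these form a fairly complete theoretical foundation for XML database management systems, and can serve as a formal basis for query storage, query decomposition, query optimization, and query implementation in database management systems for XML and other semi-structured data.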

20.
Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both discrete and continuous cases. In this paper, we first develop an efficient exact algorithm for computing the top-k skyline objects in discrete cases. To address applications where each object may have a massive set of instances or a continuous probability density function, we also develop an efficient randomized algorithm with an ε-approximation guarantee. Moreover, our algorithms can be immediately extended to efficiently compute the p-skyline, that is, to retrieve the uncertain objects with skyline probabilities above a given threshold. Our extensive experiments on synthetic and real data demonstrate the efficiency of both algorithms and the high accuracy of the randomized algorithm. They also show that our techniques significantly outperform the existing techniques for computing the p-skyline.
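
The notions of skyline probability, p-skyline, and top-k skyline objects in the discrete case can be illustrated with a brute-force Python sketch. It uses the commonly used definition in which objects are independent and smaller coordinate values are better; it is not the efficient exact or randomized algorithm of the paper. All function names and the sample objects are assumptions.

```python
from math import prod


def dominates(u, v):
    """u dominates v if u is no worse in every dimension and better in at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probability(name, objects):
    """objects: dict name -> list of (point, probability) instances.

    For each instance, multiply over every other object the probability that
    none of that object's instances dominates it, then weight by the instance.
    """
    total = 0.0
    for point, p in objects[name]:
        not_dominated = prod(
            1.0 - sum(q for other_point, q in instances if dominates(other_point, point))
            for other, instances in objects.items()
            if other != name
        )
        total += p * not_dominated
    return total


def p_skyline(objects, threshold):
    """Names of objects whose skyline probability is at least the threshold."""
    return sorted(n for n in objects if skyline_probability(n, objects) >= threshold)


def top_k_skyline(objects, k):
    """Names of the k objects with the highest skyline probabilities."""
    return sorted(objects, key=lambda n: skyline_probability(n, objects), reverse=True)[:k]


# Hypothetical uncertain objects, each with two equally likely or certain instances.
objects = {
    "A": [((1, 1), 0.5), ((4, 4), 0.5)],
    "B": [((2, 2), 1.0)],
    "C": [((3, 1), 0.5), ((5, 5), 0.5)],
}
for name in objects:
    print(name, skyline_probability(name, objects))
print(p_skyline(objects, 0.4))
```

This brute-force version compares every instance against every instance of every other object, which is the quadratic cost that makes large instance sets or continuous distributions call for the faster algorithms described above.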
