Similar literature
 20 similar documents found; search took 15 ms
1.
We investigate the typechecking problem for XML transformations: statically verifying that every answer to a transformation conforms to a given output schema, for inputs satisfying a given input schema. As typechecking quickly turns undecidable for query languages capable of testing equality of data values, we return to the limited framework where we abstract XML documents as labeled ordered trees. We focus on simple top-down recursive transformations motivated by XSLT and structural recursion on trees. We parameterize the problem by several restrictions on the transformations (deleting, non-deleting, bounded width) and consider both tree automata and DTDs as input and output schemas. The complexity of the typechecking problems in this scenario ranges from PTIME to EXPTIME.

2.
Typechecking consists of statically verifying whether the output of an XML transformation always conforms to an output type for documents satisfying a given input type. We focus on complete algorithms, which always produce the correct answer. We consider top-down XML transformations incorporating XPath expressions and abstract document types by grammars and tree automata. By restricting schema languages and transformations, we identify several practical settings for which typechecking can be done in polynomial time. Moreover, the resulting framework provides a rather complete picture, as we show that most scenarios cannot be enlarged without rendering the typechecking problem intractable. The present research thus sheds light on when to use fast complete algorithms and when to resort to sound but incomplete ones.

3.
Relevance feedback (RF) is a technique that enriches an initial query according to user feedback. The goal is to express the user’s needs more precisely. Some open issues arise when considering semi-structured documents such as XML documents. They are mainly related to the form of XML documents, which mix content and structure information, and to the new granularity of information. Indeed, the main objective of XML retrieval is to select relevant elements in XML documents instead of whole documents. Most of the RF approaches proposed in XML retrieval are simple adaptations of traditional RF to the new granularity of information. They usually enrich queries by adding terms extracted from relevant elements instead of terms extracted from whole documents. In this article, we describe a new approach to RF that takes advantage of two sources of evidence: the content and the structure. We propose to use query term proximity to select terms to be added to the initial query and to use generic structures to express structural constraints. Both sources of evidence are used in different combined forms. Experiments were carried out within the INEX evaluation campaign, and the results show the effectiveness of our approaches.

4.
XML data stores: emerging practices   Total citations: 1, self-citations: 0, citations by others: 1
We survey emerging native XML storage approaches and identify and highlight popular implementations tailored to XML's "nature" and syntax. By understanding the storage practices of emerging native XML environments, programmers and software designers can better exploit the technology's scalability and reliability benefits. Because XML is rapidly becoming the Internet standard for data representation and exchange, efficient XML document storage has become a core data management issue. Most early XML storage practices rely on conventional database management systems. However, such systems involve mappings and transformations between XML and the underlying database structure. More recent efforts are based on specific XML-tailored systems that provide ad hoc functionalities. This overview of emerging XML storage approaches highlights current practices along with prospective research and implementation trends.

5.
6.
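XML is a markup language standard released by the W3C in February 1998. It is easy to extend, strongly structured, highly interactive, semantically rich, content-based in its data tagging, formattable, easy to process, and platform-independent, so that the data layer can be unified with the support of XML technology. Based on a structural analysis of ocean temperature, salinity, and depth data, this paper designs an XML Schema for such data and defines its XML data structure.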

7.
XML is rapidly becoming a standard for data representation and exchange. It provides a common format for expressing both data structures and contents. As such, it can help in integrating structured, semistructured, and unstructured data over the Web. Still, it is well recognized that XML alone cannot provide a comprehensive solution to the articulated problem of data integration. There are still several challenges to face, including: developing a formal foundation for Web metadata standards; developing techniques and tools for the creation, extraction, and storage of metadata; investigating the area of semantic interoperability frameworks; and developing semantic-based tools for knowledge discovery.

8.
Information imprecision and uncertainty exist in many real-world applications, and for this reason fuzzy data modeling has been extensively investigated in various data models. Currently, huge amounts of electronic data are available on the Internet, and XML has become the de facto standard of information representation and exchange over the Web. This paper focuses on fuzzy XML data modeling, which mainly involves the representation model of fuzzy XML, its conceptual design, and its storage in databases. Based on possibility distribution theory, we develop a fuzzy XML data model, and we develop a fuzzy UML data model to design the fuzzy XML model conceptually. We investigate the formal conversions from the fuzzy UML model to the fuzzy XML model and the formal mapping from the fuzzy XML model to fuzzy relational databases.

9.
The volume of XML data has become enormous and still grows very quickly, as much data is encoded in XML by virtue of its simplicity and extensibility. While tree labeling algorithms play a crucial role in XML query processing, conventional algorithms are all sequential, so they fail to label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data. Specifically, we focus on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning to solve performance issues caused by data skew and MapReduce’s inherent limitations. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms and are robust against data skew.
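For readers unfamiliar with tree labeling, the following is a minimal sequential sketch of one widely used scheme, interval (containment) labeling. The abstract does not name its two schemes, so treating this one as representative is an assumption, and the names `label_tree` and `is_ancestor` are hypothetical. It illustrates only the sequential baseline, not the MapReduce algorithms.

```python
import xml.etree.ElementTree as ET


def label_tree(xml_text):
    """Return (tag, start, end, level) interval labels in document order.

    Illustrative sequential labeling only; not the paper's parallel method.
    """
    root = ET.fromstring(xml_text)
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1              # number assigned when entering the node
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1              # number assigned when leaving the node
        labels.append((node.tag, start, counter, level))

    visit(root, 0)
    labels.sort(key=lambda item: item[1])  # order by start = document order
    return labels


def is_ancestor(a, b):
    """True if label a strictly contains label b (a is an ancestor of b)."""
    return a[1] < b[1] and b[2] < a[2]
```

With such labels, an ancestor-descendant test reduces to two integer comparisons, which is why labeling matters for query processing.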

10.
11.
Boundaries occur naturally in everyday life. This paper introduces numerical constraints into the framework of XML to take advantage of the benefits that result from the explicit specification of such boundaries. Roughly speaking, numerical constraints restrict the number of elements in an XML data fragment based on the data values of selected subelements. Efficient reasoning about numerical constraints provides effective means for predicting the number of answers to XQuery and XPath queries, the number of updates when using the XQuery update facility, and the number of encryptions or decryptions when using XML encryption. Moreover, numerical constraints can help to optimise XQuery and XPath queries, to exclude certain choices of indices from the index selection problem, and to generate views for efficient processing of common queries and updates. We investigate decision problems associated with numerical constraints in order to capitalise on the range of applications in XML data processing. To begin with we demonstrate that the implication problem is strongly coNP-hard for several classes of numerical constraints. These sources of potential intractability direct our attention towards the class of numerical keys that permit the specification of positive upper bounds. Numerical keys are of interest as they are reminiscent of cardinality constraints that are widely used in conceptual data modelling. At the same time, they form a natural generalisation of XML keys that are popular in XML theory and practice. We show that numerical keys are finitely satisfiable and establish a finite axiomatisation for their implication problem. Finally, we propose an algorithm that decides numerical key implication in quadratic time using shortest path methods.

12.
Scholars in the humanities often have to account exhaustively for the structure of large masses of data. Tree-diagrams implemented by means of suitable computer programs can be of considerable assistance in achieving a cohesive representation of the data. This paper discusses the respective merits of the two main approaches to tree representation and introduces a new method based on the use of unrooted trees. After a detailed examination of the topological properties of such trees, two algorithms are described. The second part of the paper consists in practical applications of the method of tree representation to a corpus of contemporary English poetry. Several sets of data made up of both lexical and grammatical items (adjectives, modals, auxiliaries and personal pronouns) have been submitted to the method. The findings are assessed in terms of their heuristic value in the light of modern linguistic theory and compared with the results obtained by means of more traditional statistical procedures.
N. X. Luong is a doctor of Sciences and a lecturer at the University of Nice. He is conducting research on algorithms in the field of discrete mathematics. He has, among other things, created several algorithms for the representation of data in the form of non-hierarchic trees.
Michel Juillard teaches English at Nice University and is a member of the Unité de Recherche 9 of the CNRS. He obtained a Doctorat d'Etat in English linguistics from Paris VII University. His publications include L'Expression poétique chez Cecil Day Lewis, vocabulaire, syntaxe, métaphore, étude stylostatistique (Geneva and Paris: Slatkine, 1983), and many more articles on linguistics and stylistics. His present research centers on language variation, text linguistics and comparative stylistics.

13.
14.
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. Ng

15.
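Relational storage of XML and data conversion between XML and relational databases   Total citations: 8, self-citations: 1, citations by others: 8
With the growing number of XML-based applications, reliable and efficient storage of XML documents in databases and techniques for converting data between XML and databases are becoming increasingly important. This paper discusses schema-driven data mapping between XML and relational databases and gives its basic principles together with examples. It also analyzes an application system built with this technique that is independent of the database platform and supports multiple schemas and multiple conversion modes, and presents the design and implementation of that system. Test results for the prototype show that the principles and design of the system are feasible and effective.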

16.
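Storing XML in relational databases is an important problem in XML research. Building on a survey of several mapping methods, this paper proposes a method that parses multiple similar XML documents, generates a relational schema for each according to the mapping rules, and derives one integrated relational schema by analysis. It then creates a relational database and, based on the mapping, extracts the XML document data and stores it in that database. The method preserves the data of the XML documents in a fairly compact structure, and its most notable feature is that the documents' schema information (DTD, XML Schema) does not need to be considered. A concrete experimental result is given to demonstrate the effectiveness of the method.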
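As a rough illustration of schema-oblivious storage, the sketch below shreds a document into a single generic edge table without consulting any DTD or XML Schema. This is the well-known edge-table idea, swapped in for illustration; it is not the integrated-schema method of the abstract. The table layout and the function name `shred` are assumptions, and attributes are ignored for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, doc_id, xml_text):
    """Store every element of one document as a row of a generic edge table.

    Hypothetical layout: (doc_id, node_id, parent_id, tag, text).
    Returns the number of elements stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge ("
        "doc_id TEXT, node_id INTEGER, parent_id INTEGER, "
        "tag TEXT, text TEXT)"
    )
    root = ET.fromstring(xml_text)
    next_id = 0
    stack = [(root, None)]        # (element, id of its parent row)
    while stack:
        node, parent_id = stack.pop()
        next_id += 1
        node_id = next_id
        text = (node.text or "").strip() or None  # NULL when no text content
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (doc_id, node_id, parent_id, node.tag, text),
        )
        # Push children in reverse so they are numbered in document order.
        for child in reversed(list(node)):
            stack.append((child, node_id))
    conn.commit()
    return next_id
```

Because the table has the same shape for every document, structurally different documents can share one database, at the cost of more joins when reconstructing nested elements.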

17.
An XML document may contain inconsistencies that violate predefined integrity constraints, which causes the data inconsistency problem. In this paper, we consider how to obtain the consistent data from an inconsistent XML document. There are two basic concepts for this problem: a repair is data that is consistent with the integrity constraints and minimally differs from the original; consistent data is the data common to every possible repair. First we give a general constraint model for XML, which can express the commonly discussed integrity constraints, including functional dependencies, keys and multivalued dependencies. Next we provide a repair framework for inconsistent XML documents with three basic update operations: node insertion, node deletion and node value modification. Following this approach, we introduce the concept of repair for inconsistent XML documents, discuss the chase method to generate repairs, and prove some important properties of the chase. Finally we give a method to obtain the greatest lower bound of all possible repairs, which is sufficient for consistent data. We also implement prototypes of our method and evaluate our framework and algorithms experimentally.

18.
Transformation of XML data is an important task in data exchange, data publishing and data integration. Specifically in data integration, data in XML sources is transformed to match the target schema. Some of these sources have XML keys defined. When the data is transformed, the keys also need to be transformed for constraint comparisons, consistency checking and unification in the target schema. Thus, how the keys are transformed, and whether the transformed keys are valid and preserved in the target schema, are important problems in XML data transformation and integration. To address these problems, we first define XML keys and their satisfaction. We then study how keys are transformed and whether transformed keys are valid when a source schema is transformed to a target schema. Finally we show whether the transformed keys are satisfied by the transformed document.

19.
In the area of cost-sensitive learning, inductive learning algorithms have been extended to handle different types of costs to better represent misclassification errors. Most previous work has focused only on how to deal with misclassification costs. In this paper, we address the equally important issue of how to handle the test costs associated with querying the missing values in a test case. When an attribute contains a missing value in a test case, it may or may not be worthwhile to take the extra effort to obtain a value for that attribute, or attributes, depending on how much benefit the new value brings in increasing the accuracy. We consider how to integrate test-cost-sensitive learning with the handling of missing values in a unified framework that includes model building and a testing strategy. The testing strategies determine which attributes to perform the test on in order to minimize the sum of the classification costs and test costs. We show how to instantiate this framework in two popular machine learning algorithms: decision trees and the naive Bayes method. We empirically evaluate the test-cost-sensitive methods for handling missing values on several data sets.

20.
Recently, there has been growing interest in streaming XML data. Much of the work on streaming XML data has been focused on efficient filtering. Filtering systems deliver XML documents to interested users. The burden of extracting the XML fragments of interest from XML documents is placed on users. In this paper, we propose XTREAM which evaluates multiple queries in conjunction with the read-once nature of streaming data. In contrast to the previous work, XTREAM supports a wide class of XPath queries including tree shaped expressions, order based predicates, and nested predicates. In addition, to improve the efficiency and scalability of XTREAM, we devise an optimization technique called Query Compaction. Experimental results with real-life and synthetic XML data demonstrate the efficiency and scalability of XTREAM.
