Similar literature
 20 similar documents found (search time: 15 ms)
1.
XSLT processing is widely used to transform XML documents into target XML documents. These target XML documents conform to the output schemas of the XSLT stylesheet used. Output schemas of XSLT stylesheets can be used for a static analysis of the stylesheet, to automatically detect the XSLT stylesheet of target XML documents, or to reason about the output schema without access to the target XML documents. In this paper, we develop an approach for automatically determining the output schema of an XSLT stylesheet. We also describe several application scenarios of output schemas. The experimental evaluation shows that our prototype can determine the output schemas of nearly all typical XSLT stylesheets, and that using output schemas improves precision in several application scenarios compared with not using them.

2.
Due to the increase in XML-based applications, XML schema design has become an important task. One approach is to use conceptual schemas as a basis for generating XML documents compliant with the consensual information of specific domains. However, converting conceptual schemas to XML schemas is not a straightforward process, and poor design decisions can lead to poor query processing on the generated XML documents. This paper presents a conversion approach that considers the data and query workload estimated for XML applications in order to generate an XML schema from a conceptual schema. Load information is used to produce XML schemas that respond well to the main queries of an XML application. We evaluate our approach through a case study carried out on a native XML database. The experimental results demonstrate that the XML schemas generated by our methodology yield better query performance than related approaches.
Ronaldo dos Santos Mello

3.
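The editing capabilities of current XML tools are insufficient to guarantee that the XML documents they produce are valid, and these tools make little use of document schemas. A document schema defined in XML Schema, however, not only provides the standard for validity checking but also implies the rules for generating valid XML documents. This paper proposes a method for generating valid XML documents. Based on the XML document schema graph, the method defines a set of operations for the different kinds of nodes, together with the computations over that operation set and the corresponding operational semantics, and it analyzes and demonstrates the validity of the method itself.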

4.
Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with the k highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.

5.
6.
XML documents are extensively used in several applications and evolve over time. Identifying the semantics of these changes is fundamental to understanding their evolution. Existing approaches to understanding changes (diff) in XML documents focus only on syntactic changes. These approaches compare XML documents based on their structure, without considering the associated semantics. However, for large XML documents that have undergone many changes from one version to the next, a large number of syntactic changes may correspond to fewer semantic changes, which are then easier to analyze and understand. For instance, increasing the annual salary and the gross pay, and changing the job title of an employee (three syntactic changes) may mean that this employee was promoted (one semantic change). In this paper, we explore this idea and present the XChange approach. XChange considers the semantics of the changes to calculate the diff of different versions of XML documents. To this end, our approach analyzes the granular syntactic changes in XML attributes and elements, using inference rules to combine them into semantic changes. Thus, unlike existing approaches, XChange uses the syntactic changes between versions of an XML document to infer the real reason for the change and to support the process of semantic diff. Results of an experimental study indicate that XChange can provide higher effectiveness and efficiency in understanding changes between versions of XML documents than the (syntactic) state-of-the-art approaches.
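The promotion example in this abstract can be illustrated with a minimal sketch. It is not XChange's actual rule language or implementation, which the abstract does not specify; the `SyntacticChange` record, the `RULES` table, the element names, and the `infer_semantic_changes` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntacticChange:
    """One granular change to an XML element (hypothetical record type)."""
    element: str  # name of the changed element, e.g. "salary"
    old: str      # text value in the earlier version
    new: str      # text value in the later version


def increased(change):
    """True if the numeric value grew between the two versions."""
    return float(change.new) > float(change.old)


def modified(change):
    """True if the value differs between the two versions."""
    return change.new != change.old


# Assumed rule format: a semantic label maps to the conditions that must
# all hold on the syntactic changes of one employee record.
RULES = {
    "promoted": {
        "salary": increased,
        "gross_pay": increased,
        "job_title": modified,
    },
}


def infer_semantic_changes(changes, rules=RULES):
    """Return the labels of every rule whose conditions are all satisfied."""
    by_element = {c.element: c for c in changes}
    inferred = []
    for label, conditions in rules.items():
        if all(
            element in by_element and predicate(by_element[element])
            for element, predicate in conditions.items()
        ):
            inferred.append(label)
    return inferred


changes = [
    SyntacticChange("salary", "50000", "60000"),
    SyntacticChange("gross_pay", "4200", "5000"),
    SyntacticChange("job_title", "Engineer", "Senior Engineer"),
]
print(infer_semantic_changes(changes))  # prints ['promoted']
```

Here three syntactic changes collapse into one semantic change; if any of the three conditions is missing, the rule does not fire and no label is reported.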

7.
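Comparison and Application of XML Schema and DTD   Cited by: 3 (self-citations: 0, citations by others: 3)
XML is a widely used data exchange standard, and schemas are one of the mechanisms that guarantee the correctness of data exchanged in XML. A schema describes the structure of a document in detail and ensures the correctness of its elements, attributes, and so on. XML Schema and DTD are the most widely used schema languages. This article compares the two in detail and points out their respective limitations and best uses: DTD is well suited to text-intensive XML documents, whereas XML Schema is better suited to data-intensive XML documents.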

8.
IT Professional, 2001, 3(2): 37-40
Schemas add data typing and inheritance features, giving XML the sophistication required to create enterprise-class business applications. The XML schema language describes the legal structure, content, and constraints of XML documents. It provides the necessary framework for creating XML documents by specifying the valid structure, constraints, and data types for the various elements and attributes of an XML document. The schema language provides more comprehensive and powerful features than a document type definition (DTD), including the rich data typing associated with ordinary programming languages. The W3C XML schema specification defines several built-in data types, such as string, integer, Boolean, date, and time, among others. The specification also provides the capability for defining new types. Developers can use these built-in and user-defined data types to effectively define and constrain XML document attributes and element values.

9.

Context

UML and XML are two of the most commonly used languages in software engineering processes. One of the most critical of these processes is that of model evolution and maintenance. More specifically, when an XML schema is modified, the changes should be propagated to the corresponding XML documents, which must conform with the new, modified schema.

Objective

The goal of this paper is to provide an evolution framework by which the XML schema and documents are incrementally updated according to the changes in the conceptual model (expressed as a UML class model). In this framework, we include the transformation and evolution of UML profiles specified in UML class models because they are widely used to capture domain specific semantics.

Method

We have followed a metamodeling approach, which allowed us to achieve a language-independent framework that is not tied to the specific case of UML-XML. In addition, our proposal treats a traceability setting as a key aspect of the transformation process, one that allows changes to be propagated from UML class models to both XML schemas and documents.

Results

As a general framework, we propose a Generic Evolution Architecture (GEA) for the model-driven engineering context. Within this architecture and for the particular case of the UML-to-XML setting, our contribution is a UML-to-XML framework that, to our knowledge, is the only approach that incorporates the following four characteristics. Firstly, the evolution tasks are carried out in a conceptual model. Secondly, our approach includes the transformation of UML profiles to XML. Thirdly, the proposal allows stereotyped UML class models to be evolved, propagating changes to XML schemas and documents in such a way that the different elements are kept in sync. Finally, we propose a traceability setting that enables evolution tasks to be performed seamlessly.

Conclusions

Generic frameworks such as that proposed in this paper help to reduce the work overload experienced by software engineers in keeping different software artifacts synchronized.

10.
We describe a method for generating queries for retrieving data from distributed heterogeneous semistructured documents, and its implementation in the metadata interface DDXMI (distributed document XML metadata interchange). The proposed system generates local queries appropriate to local schemas from a user query over the global schema. The system constructs mappings between the global schema and the local schemas (extracted from local documents if not given), and applies path substitution, together with node identification to resolve the heterogeneity among nodes with the same label that often exist in semistructured data. The system uses Quilt as its XML query language. An experiment is reported over three local semistructured documents, 'thesis', 'reports', and 'journal', with 'article' as the global schema. The prototype was developed on Windows with Java and JavaCC.

11.
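The consistency of DTDs (or XML Schemas) is an important topic in XML research. A DTD is consistent if and only if there exists a valid XML document that conforms to it. However, a consistent DTD may still contain unreasonable substructures that are themselves inconsistent; like inconsistent DTDs, such inconsistent substructures should be avoided wherever possible. To address this problem, the concepts of "consistency of an element in a DTD" and "full consistency of DTDs" are defined and analyzed, and a new algorithm for deciding the full consistency of DTDs is given. The algorithm has a worst-case time complexity of O(n) and is therefore efficient.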

12.
We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the in-lining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.

13.
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be a part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Performed on both real and synthetic datasets, our experimental results show that XINTOR outperforms existing methods with respect to (i) the mediated-schema quality using precision, recall, F-measure, and schema minimality; and (ii) the execution performance based on execution time and scale-up performance.

14.
The flexibility of the XML data model allows a more natural representation of uncertain data than the relational model. Matching a twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probability value because of the uncertainty of the data. Twig answers with small probability values are useless to users, who usually want only the answers with the k largest probability values. Existing algorithms for ordinary XML documents are not directly applicable here, because probabilistic XML requires handling probability distributional nodes and efficiently calculating the top-k probabilities of answers. In this paper, we address the problem of finding twig answers with the top-k probability values directly against probabilistic XML documents. We propose a new encoding scheme for probabilistic XML called PEDewey. Based on this encoding scheme, we then design two algorithms for finding the answers with the top-k probabilities for twig queries. The first, ProTJFast, processes probabilistic XML data based on element streams in document order; the second, PTopKTwig, is based on element streams ordered by path probability values. Experiments have been conducted to study the performance of these algorithms.

15.
In this work, we focus on XML data integration by studying rewritings of XML target schemas in terms of source schemas. Rewriting is very important in data integration systems where the system is asked to find and assemble XML documents from the data sources and produce documents that satisfy a target schema. As schema representation, we consider Visibly Pushdown Automata (VPAs), which accept Visibly Pushdown Languages (VPLs). The latter have been shown to coincide with the family of (word-encoded) regular tree languages, which are the basis of formalisms for specifying XML schemas. Furthermore, practical semi-formal XML schema specifications (defined by simple pattern conditions on XML) compile into VPAs that are exponentially more concise than other representations based on tree automata. Notably, VPLs enjoy a "well-behavedness" that helps us address rewriting problems for XML data integration. Based on VPAs, we positively solve these problems, and present detailed complexity analyses.

16.
Recently, there has been increasing research effort in XML data mining. This work largely assumes that XML documents are static. In reality, however, the documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing the structural evolution patterns (also called structural deltas) in the history of XML documents. Unlike existing approaches, XML structural delta mining focuses on the dynamic and temporal features of XML data. Furthermore, the data source for this novel mining technique is a sequence of historical versions of an XML document rather than a set of snapshot XML documents. Such a mining technique can be useful in many applications, such as change detection for very large XML documents, efficient XML indexing, and XML search engines. Our aim in this paper is not to provide a specific solution to a particular mining problem. Rather, we present the vision of the mining framework and the issues and challenges for three types of XML structural delta mining: identifying various interesting structures, discovering association rules from structural deltas, and structural change pattern-based classification.

17.
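Research on Normalization of XML DTDs with Multivalued Dependencies   Cited by: 1 (self-citations: 0, citations by others: 1)
丘威 (Qiu Wei), 张立臣 (Zhang Lichen). 《计算机科学》 (Computer Science), 2007, 34(2): 149-151
XML DTD documents may contain data redundancy and update anomalies caused by non-functional dependencies. Starting from the goal of eliminating data redundancy within DTD documents, the paper first studies the problem of document normalization, discusses how to normalize XML documents when multivalued dependencies are present in the DTD, and proposes a notion of multivalued dependency for XML documents whose schema is a DTD. Then, based on this notion, it proposes a multivalued-dependency normal form for XML documents, MXNF. Finally, on this basis, it proposes a normalization algorithm that decomposes the DTD of an XML document, with lossless join, into one that conforms to MXNF, so as to normalize XML DTD documents with multivalued dependencies, and it gives an analysis of the algorithm.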

18.
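Converting Relational Databases to XML Documents While Preserving Data Constraints   Cited by: 2 (self-citations: 0, citations by others: 2)
XML has become the technology trend on the Internet, and developing XML documents while retaining existing relational databases is currently the best choice. This requires converting between relational databases and XML documents while preserving data dependency constraints. In this process, schema conversion must precede data conversion: because existing relational databases are usually normalized, the XML document tree structure has to be reconstructed before the conversion can be carried out. To this end, the normalized relations are first joined into a set of tables according to the existing data dependency constraints, which denormalizes them. These joined tables are then mapped to a set of DOMs and merged into an XML document tree. A root node selected by the user, together with the nodes connected to it, forms the desired partial document tree, and the selected XML document tree is in turn mapped to an XML schema in DTD format. In this way, the joined tables can be mapped to a set of DOMs, merged into a single DOM, and finally converted into an XML document.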

19.
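Normalized Storage of XML Documents in Relational Databases   Cited by: 8 (self-citations: 0, citations by others: 8)
A storage method is proposed. The XML document is first mapped to a universal relation schema. The algorithm DeriveFDs is then used to derive a canonical cover of the set of functional dependencies on the universal relation schema that are implied by the XML keys. Finally, according to this canonical cover, the universal relation schema is decomposed, preserving functional dependencies, into a set of 3NF schemas. The result is a normalized storage schema that preserves the XML key constraints and achieves normalized storage of XML documents in a relational database. Experimental results show that the proposed method is effective.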

20.
This paper presents the use of XML technology in modelling library documents, i.e. the catalogue cards (and types of reports) found in a library information system. The method of forming schemas for the content of various types of library catalogue cards is also described. Catalogue cards are displayed based on the described schemas, in two steps. The first step extracts the content from a bibliographical record based on the schema describing catalogue card concepts. The result is an XML document containing all catalogue card concepts filled in with the corresponding content from the bibliographical record. The second step displays this document, resulting in an HTML document. Along with adequate representation of data obtained by searching bibliographical material databases, reporting is of prime importance in a library information system. Copyright © 2006 John Wiley & Sons, Ltd.
