首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 671 毫秒
1.
Generating the fewest redundancy-free scheme trees from conceptual-model hypergraphs is NP-hard [17]. We show, however, that the problem has a polynomial-time solution if the conceptual-model hypergraph is acyclic. We define conceptual-model hypergraphs, cycles, and scheme trees, and then present a polynomial-time algorithm and show that it generates the fewest redundancy-free scheme trees. As a practical application for the algorithm, we comment on its use for the design of “good” XML schemas for data storage.  相似文献   

2.
As XML data becomes more and more prevalent and as larger quantities of data find their way into XML documents, the need for quality XML data organization only increase. One standard way of structuring data well is to reduce and, if possible, eliminate redundancy, while at the same time making the storage structures as compact as possible. In this paper, we present a methodology to generate XML storage structures where conforming XML documents are redundancy-free, and for most practical cases, are also fully compact. Our methodology assumes the input is a conceptual-model hypergraph. For the special case that every edge in the hypergraph is binary, we present a simple algorithm, guaranteed to always generate redundancy-free storage structures. We show, however, that generating a minimum number of redundancy-free storage structures is NP-hard. We therefore provide heuristics to guide the process and observe that these heuristics result in satisfactory solutions, which are often optimal. We then present a general algorithm for n-ary edges and show that it generates redundancy-free storage structures. The general algorithm must overcome several problems that do not arise in the special case.  相似文献   

3.
XML tree structures can conveniently be represented using ordered unranked trees. Due to the repetitiveness of XML markup these trees can be compressed effectively using dictionary-based methods, such as minimal directed acyclic graphs (DAGs) or straight-line context-free (SLCF) tree grammars. While minimal SLCF tree grammars are in general smaller than minimal DAGs, they cannot be computed in polynomial time unless P=NPP=NP. Here, we present a new linear time algorithm for computing small SLCF tree grammars, called TreeRePair, and show that it greatly outperforms the best known previous algorithm BPLEX. TreeRePair is a generalization to trees of Larsson and Moffat's RePair string compression algorithm. SLCF tree grammars can be used as efficient memory representations of trees. Using TreeRePair, we are able to produce the smallest queryable memory representation of ordered trees that we are aware of. Our investigations over a large corpus of commonly used XML documents show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6, respectively. With respect to file compression we are able to show that a Huffman-based coding of TreeRePair grammars gives compression ratios comparable to the best known XML file compressors.  相似文献   

4.
该文以逆向超图为工具,讨论了基于超图的BCNF判定和无损联结,给出了基于逆向超图的系模式到改进的BCNF的分解算法。  相似文献   

5.
如何对依赖任务进行高效合理的调度是云计算急需解决的关键问题之一。对云计算环境下的依赖任务调度系统进行了形式化描述。采用赋权有向无环超图来构造依赖任务调度问题的数学模型,结点对应于依赖任务,有向超边对应于任务之间的执行先后依赖关系。将云计算依赖任务调度问题转换为赋权有向超图的优化划分问题,提出了基于多水平方法和赋权有向超图的依赖任务划分优化算法。设计并实现了基于多水平方法的云计算依赖任务调度原型系统。在CloudSim云计算仿真实验平台下,与Min-Min算法、Max-Min算法进行了对比实验,实验数据对比表明该算法在减少依赖任务执行时间的同时,优化了资源负载均衡性能。  相似文献   

6.
We study the problem of repairing XML functional dependency violations by making the smallest modifications in terms of repair cost. Our cost model assigns a weight to each leaf node in the XML document, and the cost of a repair is measured by the total weight of the modified nodes. We define an optimum repair as the repair with the minimum cost among all of the repairs. We prove lower and upper bounds for the optimum XML repair problem. We show that, in practice, it is beyond reach to find the optimum repairs; this problem is already NP-complete for a setting with a fixed DTD, a fixed set of functional dependencies, and equal weights for all of the nodes in the XML document. Instead, we provide an efficient two-step heuristic method to repair XML functional dependency violations. First, the initial violations are captured and fixed by leveraging the conflict hypergraph. Second, the remaining conflicts are resolved by modifying the violating nodes and their related nodes called determinants in a way that guarantees no new violations. We implement our method and evaluate it on synthetic and real-life data. The experimental results demonstrate that our algorithm scales well and is effective at improving data quality.  相似文献   

7.
The following problem is called the everywhere-cover problem:“Given a set of dependencies over a database scheme,is he set of dependencies explicitly given for each relation scheme equivalent to the dependencies implied for that relation sheme?”It is shown that when the everywhere-cover problem has a yes answer,examining only the dependencies explicitly given will suffice to test 3NF, BCNF and 4NF of a database scheme.But this does not hold for 2NF.Consequently,in such cases,tests of BCNF and 4NF all take polynomial time.Then a proof is given that test of 3NF of a database scheme is Co-NP-complete,and from this result it is shown that everywhere-cover is also Co-NP-complete when only functional dependencies are allowed.These results lead to doubt the truth of the well believed conjecture that no polynomial time algorithm for designing a lossless BCNF database scheme is likely to exist.  相似文献   

8.
This paper investigates the view update problem for XML views published from relational data.We consider XML views defined in terms of mappings directed by possibly reeursive DTDs compressed into DAGs and stored in relations. We provide new techniques to efficiently support XML view updates specified in terms of XPath expressions with recursion and complex filters.The interaction between XPath recursion and DAG compression of XML views makes the analysis of the XML view update problem rather intriguing.Furthermore,many issues are still open even for relational view updates, and need to be explored.In response to these,on the XML side,we revise the notion of side effects and update semantics based on the semantics of XML views,and present efficient algorithms to translate XML updates to relational view updates. On the relational side,we propose a mild condition on SPJ views,and show that under this condition the analysis of deletions on relational views becomes PTIME while the insertion analysis is NP-complete.We develop an efficient algorithm to process relational view deletions,and a heuristic algorithm to handle view insertions.Finally,we present an experimental study to verify the effectiveness of our techniques.  相似文献   

9.
基于特征路径的XML文档变化检测算法   总被引:2,自引:0,他引:2  
由于在线信息变化频繁,XML文档变化快速检测成为Internet查询系统、搜索引擎以及连续查询系统的关键技术。目前国际上的研究主要集中于有序模式的XML文档比较,针对有序模式最好的算法复杂度为O(nkgn),其中n为文档的长度,而针对无序模式为多项式时间复杂度,为提高处理效率,提出一种基于特征路径的变化检测算法,将传统标号树匹配问题转换为基于特征路径的无重复路径标号树的匹配问题,同时适于有序和无序两种模式,复杂度为O(n),其中n为文档结点的个数.实验证明KF-Diff 能够非常高效地比较XML文档。  相似文献   

10.
一种挖掘XML文档频繁子树的方法   总被引:1,自引:0,他引:1       下载免费PDF全文
本文主要研究从由带标签有序树构成的森林中挖掘嵌入式频繁子树,具体做法是:首先对XML文档进行预处理,生成最简结构树SST,然后从SST中挖掘出频繁子树。本文提出了SSTMiner算法,该算法针对TreeMiner算法存在的瓶颈问题,结合当前所处理的SST的结构特点进行改进,进一步提高了算法执行的效率。实验证明,本文提出的方法能够准确高效地
地挖掘出XML文档中的频繁子树。  相似文献   

11.
基于代价模型的不一致XML 数据修复启发式计算   总被引:1,自引:1,他引:0  
在实际应用中,为不一致的XML 文档计算最优修复意义重大.但求解最优修复是一个NP 完全问题,特别是在XML 文档同时违反函数依赖约束和主键约束时.提出一个基于代价模型的、可以在多项式时间内完成的启发式修复求解算法.该算法首先借助索引表,在一遍扫描原始XML 文档的情况下寻找不一致数据集,然后为每一类约束的不一致数据集构造候选修复,同时计算其修复代价,最后启发式地求解一个代价最小的修复方案.实验结果表明,该算法的时间复杂度不超过冲突类的3 次方,即便是在不一致数据量很大、噪声比例很大以及涉及多类语义约束时,也能较快地完成修复.  相似文献   

12.
后缀树的重要性可以为多年来学术界对它总是有新的发现而印证.它的结构简单,但可以在线性的时间里解决许多复杂的问题,被大量的使用在字符串及树的模式匹配中,对于XML标准,有很多基于关系库和对象库的索引技术和查询方案被提出来,我们试图给出一种基于后缀树进行路径导航的查询机制:用后缀树构造XML路径字典加速路径查询评价速度,我们提出可以在线地建立一个trie树的后缀树,讨论了XML路径字典中的后缀树建树算法,阐述了整个索引方案和查询机制,并探讨了包括RPE在内的它所支持的各种查询操作,XML路径字典被用于加快路径查询的评价速度.  相似文献   

13.
Frequent subgraph mining in outerplanar graphs   总被引:1,自引:1,他引:0  
In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for outerplanar graphs, and show that it works in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., which have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.  相似文献   

14.
文中指出了文献「1」中对BCNF判定问题证明过程的错之处,通过分析属于BCNF的关系模式的结构特点,给出了一个判定关系模式否属于BNCF的多项式时间算法。  相似文献   

15.
基于XML Schema的XML存储   总被引:6,自引:0,他引:6  
郝春辉  邹静 《计算机工程与应用》2006,42(11):173-175,204
文章介绍了一个在关系数据库中,基于XMLSchema的XML存储方案。描述了一个以树模型为模型,XMLSchema为模式的XML数据库的存储系统。首先给出了在关系数据库中存储XMLSchema的方法,在此基础上,又给出了存储XML文档的方法。与通常的XML分解存储方案不同之处在于,在该方案中,XMLSchema被保存到数据库中,未作模式映射,避免了模式映射通常会带来的数据丢失和数据要分散到多个关系表中的问题;对XML文档的存储和查询都是基于XMLSchema的;并且由于所有基于同一个模式的XML文档共享该模式的结构,不必对结构信息进行重复存储,减少了存储空间;最后,由于我们为每一个元素赋予了一个唯一ID值,在进行查询的时候,可以利用该ID值进行定位,具有和XPath表达式相同的作用,但是更为简便。  相似文献   

16.
XML正在迅速成为WWW上采用的信息交换、表示和存储手段之一。本文首先基于OEM数据模型提出了离散的XML数据模式概念,并以形式化的方式表达了这一思想,以此为出发点给出了带冗余的可拆分XML数据树存储方法,定义了基于模式匹配的数据查询概念,最后给出了以本文方法与传统方法所存储数据查询效率的比较。  相似文献   

17.
基于树自动机理论,研究了Active xML(简记为AXML)模式重写问题,提出了一种多项式时间的AXML模式重写判定算法,并对算法进行了实现.实验结果证明了所提算法用于判定AXML模式重写的优越性.  相似文献   

18.
As XML becomes increasingly popular, XML schema design has become an increasingly important issue. One of the central objectives of good schema design is to avoid data redundancies: redundantly stored information can lead not just only to a higher data storage cost but also to increased costs for data transfer and data manipulation. Furthermore, such data redundancies can lead to potential update anomalies, rendering the database inconsistent. One strategy to avoid data redundancies is to design redundancy-free schema from the start on the basis of known functional dependencies. We observe that XML databases are often “casually designed” and XML FDs may not be determined in advance. Under such circumstances, discovering XML data redundancies from the data itself becomes necessary and is an integral part of the schema refinement (or re-design) process. We present the design and implementation of the first system, DiscoverXFD, for efficient discovery of XML data redundancies. It employs a novel XML data structure and introduces a new class of partition-based algorithms. The XML data redundancies are defined on the basis of a new notion of XML functional dependency (XML FD) that (1) extends previous notions by incorporating set elements into the XML FD specification, and (2) maintains tuple-based semantics through the novel concept of Generalized Tree Tuple (GTT). Using this comprehensive XML FD notion, we introduce a new normal form (GTT-XNF) for XML documents, and provide comprehensive comparisons with previous studies. Given the set of data redundancies (in the form of redundancy-indicating XML FDs) discovered by DiscoverXFD, we describe a normalization algorithm for converting any original XML schema into one in GTT-XNF.  相似文献   

19.
目前现有的前缀编码、区间编码等编码方案均不能很好地支持XML文档的更新计算。为此,提出一种新的前缀编码方案TDE。将实数映射为二维元组,利用任意2个实数间存在无限个实数的特点,对XML文档进行插入节点操作而无需对其他节点进行二次编码,并采用压缩存储减小编码的存储空间。实验结果表明,该方案能有效支持XML文档的更新计算。  相似文献   

20.
We present a technique for refining the design of relational storage for XML data. The technique is based on XML key propagation: given a set of keys on XML data and a mapping (transformation) from the XML data to relations, what functional dependencies must hold on the relations produced by the mapping? With the functional dependencies one can then convert the relational design into, e.g. 3NF, BCNF, and thus develop efficient relational storage for XML data. We provide several algorithms for computing XML key propagation. One algorithm is to check whether a functional dependency is propagated from a set of XML keys via a predefined mapping; this allows one to determine whether or not the relational design is in a normal form. The others are to compute a minimum cover for all functional dependencies that are propagated from a set of XML keys and hold on a universal relation; these provide guidance for how to design a relational schema for storing XML data. These algorithms show that XML key propagation and its associated minimum cover can be computed in polynomial time. Our experimental results verify that these algorithms are efficient in practice. We also investigate the complexity of propagating other XML constraints to relations. The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号