Similar Literature
20 similar documents found
1.
范书义  李岩  孟晨 《微型电脑应用》2011,27(12):42-44,70,71
Considering the respective characteristics of SAX and DOM, the two main approaches to parsing XML documents, this paper discusses the situations in which it is appropriate to combine the two approaches, gives a general method for using SAX and DOM together, and finally compares the performance of parsing XML documents with DOM alone against the combined approach. Experimental results show that combining SAX and DOM greatly improves parser performance on large XML documents.
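A minimal Java sketch of one way such a combination can work (an illustration, not necessarily the paper's scheme): SAX streams the whole file, and a small DOM subtree is materialized only for elements of interest. The element name "record" and the process() step are assumptions, and the sketch assumes no mixed content.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Stream with SAX, but build a detached DOM subtree for each "record" element
// so that only one small tree is ever held in memory at a time.
public class SaxDomHybrid extends DefaultHandler {
    private final Document doc;                    // factory for detached DOM nodes
    private Element current;                       // subtree under construction, null outside a record
    private final StringBuilder text = new StringBuilder();

    public SaxDomHybrid() throws Exception {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (current == null && !"record".equals(qName)) {
            return;                                // outside an interesting subtree: pure streaming
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(e);
        }
        current = e;
        text.setLength(0);                         // assumes element-only or simple text content
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        if (text.length() > 0) {
            current.appendChild(doc.createTextNode(text.toString()));
            text.setLength(0);
        }
        Element parent = (Element) current.getParentNode();
        if (parent == null) {                      // closed a complete "record" subtree
            process(current);                      // random-access DOM navigation on a small tree
            current = null;
        } else {
            current = parent;
        }
    }

    private void process(Element record) {
        System.out.println("record with " + record.getChildNodes().getLength() + " children");
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(new File(args[0]), new SaxDomHybrid());
    }
}
```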

2.
DOM-based parsing is the technique adopted by most XML document processing systems, yet its handling of DTDs, XML comments, and XML nodes contains vulnerabilities that leave a system open to attack when it parses malicious XML documents crafted to exploit them. This paper analyzes the causes of these vulnerabilities and proposes corresponding defenses.
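The abstract does not spell out the specific defenses; a common hardening of a JAXP DOM parser against DTD-related attacks (entity expansion, XXE) looks like the following sketch, using standard JAXP/Xerces feature URIs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// A minimal hardening sketch for a DOM parser; the paper's concrete defenses may differ.
public class HardenedDomParser {
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Reject any DOCTYPE: blocks entity-expansion ("billion laughs") and XXE payloads.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces: even if a DTD slips through, never fetch external entities.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```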

3.
This paper introduces StAX, a newer and more efficient parsing technique, compares it with the two more commonly used techniques, DOM and SAX, and demonstrates through examples how to parse XML documents more efficiently with StAX.
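A minimal StAX (pull-parsing) sketch, assuming a document containing "name" elements whose text we want: the application drives the parser by pulling events, instead of receiving callbacks (SAX) or loading the whole tree (DOM).

```java
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Pull-parse an XML file and print the text of every <name> element.
public class StaxExample {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (FileInputStream in = new FileInputStream(args[0])) {
            XMLStreamReader r = factory.createXMLStreamReader(in);
            while (r.hasNext()) {                       // the application pulls the next event
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(r.getLocalName())) {
                    System.out.println(r.getElementText()); // text content of the current element
                }
            }
            r.close();
        }
    }
}
```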

4.
Shredding Large XML Documents into Relational Storage Based on an Interval Encoding Scheme (total citations: 6; self-citations: 0; citations by others: 6)
To shred an XML document into a relational database, the usual approach is to parse the document with DOM and use the tree information exposed by the DOM interface to drive the shredding. DOM, however, is extremely inefficient on large XML documents and may fail outright. This paper surveys strategies for converting XML documents into relational storage and querying them, together with interval encoding schemes; it then examines the principles of shredding a large XML document into relational storage based on interval encoding and presents the corresponding algorithm. Experimental results show that the method is both general and efficient.
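As an illustration of the general idea (not necessarily the paper's algorithm), interval/region codes can be assigned in a single SAX pass: each element gets a (start, end, level) triple, and node d is a descendant of node a iff a.start < d.start and d.end < a.end. The node(name, start, end, level) relational schema mentioned in the comment is an assumption.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Assign interval codes to every element during one streaming pass.
public class RegionEncoder extends DefaultHandler {
    private long counter = 0;
    private int level = 0;
    private final Deque<long[]> open = new ArrayDeque<>();   // [start, level] per open element
    private final Deque<String> names = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        open.push(new long[]{++counter, ++level});
        names.push(qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        long[] s = open.pop();
        long end = ++counter;
        // In a real system this row would be inserted into the relational store, e.g.
        // INSERT INTO node(name, start, end, level) VALUES (?, ?, ?, ?)
        System.out.printf("%s\t%d\t%d\t%d%n", names.pop(), s[0], end, s[1]);
        level--;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(new File(args[0]), new RegionEncoder());
    }
}
```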

5.
Scannerless generalized parsing techniques allow parsers to be derived directly from unified, declarative specifications. Unfortunately, in order to uniquely parse existing programming languages at the character level, disambiguation extensions beyond the usual context-free formalism are required. This paper explains how scannerless parsers for boolean grammars (context-free grammars extended with intersection and negation) can specify such languages unambiguously, and can also describe other interesting constructs such as indentation-based block structure. The sbp package implements this parsing technique and is publicly available as Java source code.

6.
Based on the differences in data characteristics between XML documents and GML documents, this paper analyzes the principle of a hybrid parsing approach that combines pull-DOM and StAX, examines the feasibility of such a hybrid scheme in light of GML's data characteristics and application requirements, and proposes a hybrid parsing method that offers pull-DOM functionality together with the advantages of StAX. Experimental results show that the method is clearly effective for parsing large GML documents and can support the complex spatial operations needed while parsing GML documents.
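One way to realize such a pull/DOM hybrid with standard Java APIs (a sketch under assumptions, not the paper's implementation) is to stream with StAX and let an identity transform copy just one feature subtree into a DOM fragment for random-access spatial processing; the local name "featureMember" is an assumption.

```java
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Node;

// Stream a GML file with StAX; copy each feature subtree into a small DOM tree.
public class StaxDomHybrid {
    public static void main(String[] args) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(args[0]));
        TransformerFactory tf = TransformerFactory.newInstance();
        while (r.hasNext()) {
            if (r.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "featureMember".equals(r.getLocalName())) {
                DOMResult fragment = new DOMResult();
                // The identity transform consumes the subtree up to its matching END_ELEMENT.
                tf.newTransformer().transform(new StAXSource(r), fragment);
                Node feature = fragment.getNode().getFirstChild();
                System.out.println("parsed feature: " + feature.getNodeName());
            } else {
                r.next();                              // keep streaming past everything else
            }
        }
        r.close();
    }
}
```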

7.
John Tobin  Carl Vogel 《Knowledge》2009,22(7):516-522
Some parsers need to be very precise and strict when parsing, yet must allow users to easily adapt or extend the parser to parse new inputs, without requiring that the user have an in-depth knowledge and understanding of the parser’s internal workings. This paper presents a novel parsing architecture, designed for parsing Postfix log files, that aims to make the process of parsing new inputs as simple as possible, enabling users to trivially add new rules (to parse variants of existing inputs) and relatively easily add new actions (to process a previously unknown category of input). The architecture scales linearly or better as the number of rules and size of input increases, making it suitable for parsing large corpora or months of accumulated data.
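A toy Java sketch of the rule/action idea described above (not the authors' Postfix parser): rules are regular expressions tried in order, each bound to an action that receives the capture groups, so adding a new input variant is just one more rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Dispatch each log line to the first rule whose regex matches it.
public class RuleBasedLogParser {
    private final Map<Pattern, Consumer<Matcher>> rules = new LinkedHashMap<>();

    /** Adding a new input variant is just one more rule. */
    public void addRule(String regex, Consumer<Matcher> action) {
        rules.put(Pattern.compile(regex), action);
    }

    public boolean parse(String line) {
        for (Map.Entry<Pattern, Consumer<Matcher>> r : rules.entrySet()) {
            Matcher m = r.getKey().matcher(line);
            if (m.matches()) {
                r.getValue().accept(m);
                return true;
            }
        }
        return false;                                  // previously unknown category of input
    }

    public static void main(String[] args) {
        RuleBasedLogParser p = new RuleBasedLogParser();
        p.addRule(".*postfix/smtpd\\[\\d+\\]: connect from (\\S+).*",
                  m -> System.out.println("connect: " + m.group(1)));
        p.parse("May  1 00:00:00 mx postfix/smtpd[123]: connect from mail.example.com[192.0.2.1]");
    }
}
```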

8.
Analyzing the syntactic structure of natural languages by parsing is an important task in artificial intelligence. Due to the complexity of natural languages, individual parsers tend to make different yet complementary errors. We propose a neural network based approach to combine parses from different parsers to yield a more accurate parse than individual ones. Unlike conventional approaches, our method directly transforms linearized candidate parses into the ground-truth parse. Experiments on the Penn English Treebank show that the proposed method improves over a state-of-the-art parser combination approach significantly.

9.
How to design a connectionist holistic parser (total citations: 1; self-citations: 0; citations by others: 1)
Ho EK  Chan LW 《Neural computation》1999,11(8):1995-2016
Connectionist holistic parsing offers a viable and attractive alternative to traditional algorithmic parsers. With exposure to a limited subset of grammatical sentences and their corresponding parse trees only, a holistic parser is capable of learning inductively the grammatical regularity underlying the training examples that affects the parsing process. In the past, various connectionist parsers have been proposed. Each approach had its own unique characteristics, and yet some techniques were shared in common. In this article, various dimensions underlying the design of a holistic parser are explored, including the methods to encode sentences and parse trees, whether a sentence and its corresponding parse tree share the same representation, the use of confluent inference, and the inclusion of phrases in the training set. Different combinations of these design factors give rise to different holistic parsers. In succeeding discussions, we scrutinize these design techniques and compare the performances of a few parsers on language parsing, including the confluent preorder parser, the backpropagation parsing network, the XERIC parser of Berg (1992), the modular connectionist parser of Sharkey and Sharkey (1992), Reilly's (1992) model, and their derivatives. Experiments are performed to evaluate their generalization capability and robustness. The results reveal a number of issues essential for building an effective holistic parser.

10.
Building a Tree Model for XML Document Parsing Using Relational Tables (total citations: 2; self-citations: 1; citations by others: 1)
祝青  阳王东 《计算机应用》2009,29(6):1719-1721
In studying data parsing and query operations on XML documents, it is observed that a tree reflects the hierarchical structure of an XML document well but supports queries inefficiently, whereas a relational table is a data structure suited to storing large amounts of data with good query efficiency and rich operations. This paper presents a data model for storing XML documents that combines a tree with relational tables; during parsing under this model, a callback-event-based, segmented parsing method is used to reduce parsing time and storage space. The model thus preserves the structural characteristics of the XML document while improving query efficiency and ease of manipulation. Experiments on parsing and manipulating large XML documents show that this data model has clear advantages when handling large documents.

11.
Traditional AJAX engines incur excessive time overhead when parsing large XML documents returned from the server. To address this, an improved AJAX model is proposed and illustrated with an application example. The improved model stores the returned data in structured parallel arrays, avoiding the need to parse semi-structured XML documents and making more efficient use of the data. Experimental results show that the improved model meets the processing requirements of relatively large data volumes and noticeably shortens user waiting time. When applied to data tables of up to 3,000 records, client-server interaction remains smooth and real-time requirements are well satisfied.

12.
This paper presents a syntactic method for sophisticated logical structure analysis that transforms document images with multiple pages and hierarchical structure into an electronic document based on SGML/XML. To produce a logical structure more accurately and quickly than previous works, whose basic units are text lines, the proposed parsing method takes text regions with hierarchical structure as input. Furthermore, we define a document model that can efficiently describe the geometric characteristics and logical structure information of documents, and we present a method for creating it automatically. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) show that the method performs logical structure analysis successfully and generates the document model automatically. In particular, the method produces SGML/XML documents as the result of structure analysis, enhancing document reusability and platform independence.

13.
Ho EK  Chan LW 《Neural computation》2001,13(5):1137-1170
Holistic parsers offer a viable alternative to traditional algorithmic parsers. They have good generalization performance and are inherently robust. In a holistic parser, parsing is achieved by mapping the connectionist representation of the input sentence to the connectionist representation of the target parse tree directly. Little prior knowledge of the underlying parsing mechanism thus needs to be assumed. However, it also makes holistic parsing difficult to understand. In this article, an analysis is presented for studying the operations of the confluent preorder parser (CPP). In the analysis, the CPP is viewed as a dynamical system, and holistic parsing is perceived as a sequence of state transitions through its state-space. The seemingly one-shot parsing mechanism can thus be elucidated as a step-by-step inference process, with the intermediate parsing decisions being reflected by the states visited during parsing. The study serves two purposes. First, it improves our understanding of how grammatical errors are corrected by the CPP. The occurrence of an error in a sentence will cause the CPP to deviate from the normal track that is followed when the original sentence is parsed. But as the remaining terminals are read, the two trajectories will gradually converge until finally the correct parse tree is produced. Second, it reveals that having systematic parse tree representations alone cannot guarantee good generalization performance in holistic parsing. More important, they need to be distributed in certain useful locations of the representational space. Sentences with similar trailing terminals should have their corresponding parse tree representations mapped to nearby locations in the representational space. The study provides concrete evidence that encoding the linearized parse trees as obtained via preorder traversal can satisfy such a requirement.

14.
Foundations of Fast Communication via XML (total citations: 3; self-citations: 0; citations by others: 3)
Communication with XML often involves pre-agreed document types. In this paper, we propose an offline parser generation approach to enhance online processing performance for documents conforming to a given DTD. Our examination of DTDs and the languages they define demonstrates the existence of ambiguities. We present an algorithm that maps DTDs to deterministic context-free grammars defining the same languages. We prove the grammars to be LL(1) and LALR(1), making them suitable for standard parser generators. Our experiments show the superior performance of generated optimized parsers. Our results generalize from DTDs to XML schema specifications with certain restrictions, most notably the absence of namespaces, which exceed the scope of context-free grammars.

15.
We present the design philosophy, implementation, and various applications of an XML-based genetic programming (GP) framework (XGP). The key feature of XGP is the distinctive representation of genetic programs as DOM parse trees with corresponding flat XML text. XGP provides: (i) fast prototyping of GP, using the standard built-in API of DOM parsers to manipulate the genetic programs; (ii) human readability and modifiability of the genetic representations; (iii) generic support for representing the grammar of a strongly typed GP using W3C-standardized XML Schema; and (iv) inherent inter-machine migratability of the text-based genetic representation (i.e., the XML text) in distributed implementations of GP.
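The flavor of such a representation can be illustrated with a small sketch (not the XGP API): a program is kept as flat XML text, parsed into a DOM tree with a standard parser, and evaluated by walking the tree. The tag names add, mul, const, and x are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Parse an XML-encoded expression tree with DOM and evaluate it recursively.
public class XmlExpressionDemo {
    static double eval(Element e, double x) {
        switch (e.getTagName()) {
            case "x":     return x;
            case "const": return Double.parseDouble(e.getTextContent());
            case "add":   return eval(child(e, 0), x) + eval(child(e, 1), x);
            case "mul":   return eval(child(e, 0), x) * eval(child(e, 1), x);
            default:      throw new IllegalArgumentException(e.getTagName());
        }
    }

    // Return the i-th child *element* (skipping text nodes).
    static Element child(Element e, int i) {
        int seen = 0;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && seen++ == i) return (Element) n;
        }
        throw new IllegalArgumentException("missing child " + i);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mul><add><x/><const>2</const></add><const>3</const></mul>"; // (x + 2) * 3
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        System.out.println(eval(root, 5.0));   // prints 21.0
    }
}
```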

16.
徐明  庄毅 《计算机科学》2006,33(2):205-207
As a mainstream paradigm for building open, distributed application systems, multi-agent systems offer broad research prospects and practical value. With the support of the Unified Modeling Language (UML), research on agent-oriented software engineering has begun to mature, and several agent-oriented methodologies provide tools, methods, or techniques for developing multi-agent systems. With the growth of Web service technology, XML has become the standard for organizing and exchanging data on the Internet, yet the multi-agent systems proposed in existing work provide little support for XML documents. To address this, an XML-based multi-agent system, XMAS, is designed. It uses a rooted connected directed graph to model XML document data and provides a corresponding document-schema extraction algorithm, XML document parsing, and related support for Web services. Index optimization during data storage gives XMAS good query performance.

17.
In XML application development, parsing XML data that contains Chinese characters has long been a difficulty, mainly because many general-purpose XML parsers do not support Chinese encodings. This paper proposes a pseudo-UTF-16 encoding/decoding algorithm that provides a simple, general way to parse Chinese XML data.
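The pseudo-UTF-16 algorithm itself is not reproduced here; for comparison, the usual JAXP-level handling of Chinese text is to hand the parser raw bytes so the declared encoding is honored, or to force a known charset explicitly, as in this sketch.

```java
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Two standard ways to keep Chinese text intact when parsing XML with JAXP.
public class ChineseXmlParsing {
    // Preferred: give the parser raw bytes and let it obey <?xml ... encoding="..."?>.
    static Document parseBytes(String path) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new FileInputStream(path));
    }

    // If the declaration is wrong or missing, force the charset explicitly (e.g. "GB2312").
    static Document parseWithCharset(String path, String charset) throws Exception {
        InputSource src = new InputSource(
                new InputStreamReader(new FileInputStream(path), Charset.forName(charset)));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);
    }
}
```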

18.
As a subset of the SGML markup language that represents structured and semi-structured data well, XML has gradually become the standard for data exchange and information representation on the Internet and between applications. XML documents are analyzed and processed in more and more settings, and many methods and tools exist for doing so; for very large documents, however, traditional approaches have significant shortcomings. This paper proposes a new way to analyze and process XML documents, namely using a Native XML Database (NXD), to improve processing performance.

19.
The concept of incremental parsing is briefly introduced, and an algorithm that augments an LR parser with the capability of reanalyzing a limited part of a modified program is illustrated. The algorithm operates on a sequence of configurations representing the parse of the old input and finds the smallest part of the sequence which must be recomputed to obtain the parse of the new input. The implementation is discussed: a suitable data structure and a version of the algorithm which operates upon it are introduced; finally, the problem of realizing efficient incremental parsers is addressed, showing a modification to the basic algorithm which enables the reanalysis to be performed in linear time. (Work supported by C.N.R., Italy.)

20.
XML grammars are defined with a DTD or Schema, and an XML parser validates a document against the predefined grammar. When an application must process multiple XML documents that reference one another, however, validating those cross-references inside the application is very difficult. Targeting cross-document validation, this paper proposes a general mechanism based on Xerces2-java: by extending existing Xerces2-java components, cross-reference validation logic is added to the normal parsing process, and a convenient, flexible interface is exposed to applications. Experiments show that the mechanism performs cross-document reference validation and solves the multi-document validation problem well.
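The Xerces2-java component extension is specific to the paper; the underlying check can be sketched with plain DOM as below, collecting id attributes across all documents and then verifying that every ref attribute resolves. Both attribute names are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Two-pass cross-document reference check: gather ids, then verify refs.
public class CrossDocumentRefChecker {
    public static void check(String... paths) throws Exception {
        DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Set<String> ids = new HashSet<>();
        Document[] docs = new Document[paths.length];
        for (int i = 0; i < paths.length; i++) {          // pass 1: gather ids from all documents
            docs[i] = b.parse(paths[i]);
            NodeList all = docs[i].getElementsByTagName("*");
            for (int j = 0; j < all.getLength(); j++) {
                String id = ((Element) all.item(j)).getAttribute("id");
                if (!id.isEmpty()) ids.add(id);
            }
        }
        for (int i = 0; i < paths.length; i++) {          // pass 2: verify every ref resolves
            NodeList all = docs[i].getElementsByTagName("*");
            for (int j = 0; j < all.getLength(); j++) {
                String ref = ((Element) all.item(j)).getAttribute("ref");
                if (!ref.isEmpty() && !ids.contains(ref)) {
                    System.out.println(paths[i] + ": unresolved reference " + ref);
                }
            }
        }
    }
}
```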
