Similar Documents
20 similar documents found (search time: 671 ms).
1.
In their recogniser forms, the Earley and RIGLR algorithms for testing whether a string can be derived from a grammar are worst-case cubic on general context-free grammars (CFGs). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita's GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst-case unbounded polynomial order. The parser version of the RIGLR algorithm constructs Tomita-style SPPFs and thus is also worst-case unbounded polynomial order. We have given a modified worst-case cubic GLR algorithm that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of the string. In this paper we apply similar techniques to develop worst-case cubic Earley and RIGLR parsing algorithms.

2.
In its recogniser form, Earley's algorithm for testing whether a string can be derived from a grammar is worst-case cubic on general context-free grammars (CFGs). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita's GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst-case unbounded polynomial order. We have given a modified worst-case cubic version, the BRNGLR algorithm, that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of the string. In this paper we apply similar techniques to develop two versions of an Earley parsing algorithm that, in worst-case cubic time, return an SPPF representation of all derivations of a given string from a given CFG.
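For orientation, here is a minimal sketch of the recogniser form of Earley's algorithm that both abstracts above start from. The grammar encoding (a dict mapping nonterminals to tuples of alternatives) and the function name are our own illustrative choices, and no SPPF construction — the papers' actual contribution — is attempted.

```python
def earley_recognise(grammar, start, tokens):
    """grammar: dict mapping each nonterminal to a list of alternatives,
    each alternative a tuple of symbols (terminals or nonterminals).
    Returns True if tokens can be derived from start."""
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        changed = True
        while changed:                  # fixed point handles nullable rules
            changed = False
            for (lhs, rhs, dot, origin) in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:       # predict
                    for alt in grammar[rhs[dot]]:
                        item = (rhs[dot], alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); changed = True
                elif dot == len(rhs):                            # complete
                    for (l2, r2, d2, o2) in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); changed = True
        if i < n:                                                # scan
            for (lhs, rhs, dot, origin) in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])
```

For example, with `g = {"S": [("S", "+", "S"), ("a",)]}`, the call `earley_recognise(g, "S", ["a", "+", "a"])` returns `True`.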

3.
We present a novel algorithm using new hypothesis representations for learning context-free grammars from a finite set of positive and negative examples. We propose an efficient hypothesis representation method consisting of a table-like data structure similar to the parse table used in efficient parsing algorithms for context-free grammars, such as the Cocke–Younger–Kasami algorithm. By employing this representation method, the problem of learning context-free grammars from examples can be reduced to the problem of partitioning the set of nonterminals. We use genetic algorithms to solve this partitioning problem. Further, we incorporate partially structured examples to improve the efficiency of our learning algorithm, where a structured example is represented by a string with some parentheses inserted to indicate the shape of the derivation tree of the unknown grammar. We demonstrate some experimental results using these algorithms and theoretically analyse the completeness of the search space using the tabular method for context-free grammars.
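The table-like hypothesis representation referred to above is modelled on the parse table of the Cocke–Younger–Kasami (CYK) algorithm. For reference, a minimal CYK recogniser for a grammar in Chomsky normal form is sketched below; the grammar encoding is an illustrative assumption, not the paper's tabular representation itself.

```python
def cyk_recognise(unary, binary, start, tokens):
    """unary: dict terminal -> set of nonterminals (rules A -> a);
    binary: dict (B, C) -> set of nonterminals (rules A -> B C).
    Returns True if tokens can be derived from start."""
    n = len(tokens)
    if n == 0:
        return False
    # table[(i, l)] = set of nonterminals deriving tokens[i:i+l]
    table = {}
    for i, tok in enumerate(tokens):
        table[(i, 1)] = set(unary.get(tok, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):      # try every split point
                for B in table[(i, split)]:
                    for C in table[(i + split, length - split)]:
                        cell |= binary.get((B, C), set())
            table[(i, length)] = cell
    return start in table[(0, n)]

# e.g. S -> A B, A -> 'a', B -> 'b':
# cyk_recognise({'a': {'A'}, 'b': {'B'}}, {('A', 'B'): {'S'}}, 'S',
#               ['a', 'b'])  ->  True
```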

4.
Ho EK, Chan LW. Neural Computation, 2001, 13(5):1137-1170.
Holistic parsers offer a viable alternative to traditional algorithmic parsers. They have good generalization performance and are inherently robust. In a holistic parser, parsing is achieved by mapping the connectionist representation of the input sentence directly to the connectionist representation of the target parse tree. Little prior knowledge of the underlying parsing mechanism thus needs to be assumed, but this also makes holistic parsing difficult to understand. In this article, an analysis is presented for studying the operations of the confluent preorder parser (CPP). In the analysis, the CPP is viewed as a dynamical system, and holistic parsing is perceived as a sequence of state transitions through its state space. The seemingly one-shot parsing mechanism can thus be elucidated as a step-by-step inference process, with the intermediate parsing decisions reflected in the states visited during parsing. The study serves two purposes. First, it improves our understanding of how grammatical errors are corrected by the CPP. The occurrence of an error in a sentence will cause the CPP to deviate from the normal track followed when the original sentence is parsed, but as the remaining terminals are read, the two trajectories gradually converge until finally the correct parse tree is produced. Second, it reveals that having systematic parse tree representations alone cannot guarantee good generalization performance in holistic parsing. More importantly, they need to be distributed in certain useful locations of the representational space: sentences with similar trailing terminals should have their corresponding parse tree representations mapped to nearby locations in that space. The study provides concrete evidence that encoding the linearized parse trees obtained via preorder traversal satisfies this requirement.

5.
How to design a connectionist holistic parser
Ho EK, Chan LW. Neural Computation, 1999, 11(8):1995-2016.
Connectionist holistic parsing offers a viable and attractive alternative to traditional algorithmic parsers. With exposure only to a limited subset of grammatical sentences and their corresponding parse trees, a holistic parser is capable of inductively learning the grammatical regularity underlying the training examples that affects the parsing process. In the past, various connectionist parsers have been proposed; each approach had its own unique characteristics, yet some techniques were shared. In this article, various dimensions underlying the design of a holistic parser are explored, including the methods used to encode sentences and parse trees, whether a sentence and its corresponding parse tree share the same representation, the use of confluent inference, and the inclusion of phrases in the training set. Different combinations of these design factors give rise to different holistic parsers. In succeeding discussions, we scrutinize these design techniques and compare the performances of a few parsers on language parsing, including the confluent preorder parser, the backpropagation parsing network, the XERIC parser of Berg (1992), the modular connectionist parser of Sharkey and Sharkey (1992), Reilly's (1992) model, and their derivatives. Experiments are performed to evaluate their generalization capability and robustness. The results reveal a number of issues essential for building an effective holistic parser.

6.
The paper is the second in a series of three papers devoted to a detailed study of LR(k) parsing with error recovery and correction. Error recovery in LR(k) parsing of a context-free grammar is formalized by extending an LR(k) parser of the grammar such that it accepts all strings over the terminal vocabulary. The parse produced by this extension for a terminal string is a right parse if the string is in the language. In the case of a string not in the language the parse produced by the extension contains so-called error productions which represent the error recovery actions performed by the extension. The treatment is based on the formalization of LR(k) parsing presented in the first paper in the series and it covers practically all error recovery methods designed for LR(k) parsing.

7.
Understanding circuits is a prerequisite for circuit design and troubleshooting. Circuit understanding by engineers is described as a process that starts with a structural analysis and then proceeds to a causal analysis. As a step toward automatic circuit understanding, a method for analyzing circuit structures is presented. In this method, a circuit is viewed as a sentence and its elements as words. Circuit structures are defined by rules written in a logic grammar called definite clause set grammar (DCSG). Given circuits are decomposed into parse trees by the DCSG top-down parsing mechanism. These parse trees represent hierarchical structures of functional blocks. This representation is presented as one step in the process of automatic understanding of circuit structures.

8.
International Journal of Computer Mathematics, 2012, 89(9):1097-1106.
The problem of converting a regular expression to a nondeterministic finite automaton (NFA) is fundamental and has been well studied. However, the two basic construction algorithms, (1) Thompson's and (2) McNaughton–Yamada and Glushkov's, both have disadvantages. In this article: first, a 'smart' parsing algorithm is developed which constructs a parse tree with at most (3l − 1) nodes from a regular expression with l literals; second, we propose an algorithm that works on the NFA resulting from Thompson's construction, eliminating as many auxiliary states as possible while maintaining Thompson's properties. It is shown that the resulting NFA is minimized, meaning that no auxiliary states can be eliminated without violating the defining properties of a Thompson NFA. The time and space requirements of the above algorithms are linear in the length of the regular expression.
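For context, the sketch below shows Thompson's construction over an already-parsed regular-expression AST. The tuple-based AST encoding ('lit', 'cat', 'alt', 'star') is an illustrative assumption, and the article's actual contributions (the parse-tree size bound and the state-elimination step) are not reproduced.

```python
def thompson(ast, counter=None):
    """Return (start, accept, edges) where edges is a list of
    (state, label, state); a label of None is an epsilon edge."""
    if counter is None:
        counter = [0]
    def new():
        counter[0] += 1
        return counter[0]
    kind = ast[0]
    if kind == 'lit':                        # single character
        s, t = new(), new()
        return s, t, [(s, ast[1], t)]
    if kind == 'cat':                        # concatenation r1 r2
        s1, t1, e1 = thompson(ast[1], counter)
        s2, t2, e2 = thompson(ast[2], counter)
        return s1, t2, e1 + e2 + [(t1, None, s2)]
    if kind == 'alt':                        # alternation r1 | r2
        s, t = new(), new()
        s1, t1, e1 = thompson(ast[1], counter)
        s2, t2, e2 = thompson(ast[2], counter)
        eps = [(s, None, s1), (s, None, s2), (t1, None, t), (t2, None, t)]
        return s, t, e1 + e2 + eps
    if kind == 'star':                       # Kleene star r*
        s, t = new(), new()
        s1, t1, e1 = thompson(ast[1], counter)
        eps = [(s, None, s1), (s, None, t), (t1, None, s1), (t1, None, t)]
        return s, t, e1 + eps
    raise ValueError(kind)

# e.g. (a|b)*c:
# thompson(('cat', ('star', ('alt', ('lit', 'a'), ('lit', 'b'))),
#           ('lit', 'c')))
```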

9.
Syntactic parsing is an important component of natural language processing. This paper proposes and implements a parse-tree generation algorithm based on binary relations. By introducing the notion of priority levels for binary relations and cleverly borrowing the evaluation method for arithmetic expressions, the algorithm fundamentally resolves the hierarchy problem that arises during parse-tree generation. Experiments on a large volume of real text show that the algorithm is correct and parses with high efficiency.
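The paper's concrete relation inventory and priority table are not given in the abstract, but the borrowed arithmetic-expression technique it refers to is the classic two-stack, precedence-driven tree construction sketched below; the operators and priorities shown are illustrative assumptions, not the paper's.

```python
PRIORITY = {'+': 1, '-': 1, '*': 2, '/': 2}   # illustrative priorities

def build_tree(tokens):
    """tokens: operands and operators in infix order.
    Returns a nested tuple (op, left, right) or a leaf operand."""
    operands, operators = [], []
    def reduce_top():
        # Pop one operator and two operands, push the combined subtree.
        op = operators.pop()
        right, left = operands.pop(), operands.pop()
        operands.append((op, left, right))
    for tok in tokens:
        if tok in PRIORITY:
            # Reduce while the stacked operator binds at least as tightly.
            while operators and PRIORITY[operators[-1]] >= PRIORITY[tok]:
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce_top()
    return operands[0]

# build_tree(['a', '+', 'b', '*', 'c'])  ->  ('+', 'a', ('*', 'b', 'c'))
```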

10.
Schmeiser and Barnard described a method for producing the left parse at the end of the bottom-up parsing process. We improve their method in the sense that the left parse is actually produced during the bottom-up parsing process (i.e., with considerably less delay).

11.
This article describes an algorithm for incremental parsing of expressions in the context of syntax-directed editors for programming languages. Since a syntax-directed editor represents programs as trees, and statements and expressions as nodes in trees, making minor modifications to an expression can be difficult. Consider, for example, changing a "+" operator to a "*" operator, or adding a short subexpression at a syntactically but not structurally correct position, such as inserting ") * (d" at the # mark in "(a + b # + c)". To make these changes in a typical syntax-directed editor, the user must understand the tree structure and type a number of tree-oriented construction and manipulation commands. This article describes an algorithm that allows the user to think in terms of the syntax of the expression as it is displayed on the screen (in infix notation) rather than in terms of its internal representation (which is effectively prefix), while maintaining the benefits of syntax-directed editing. This algorithm is significantly different from other incremental parsing algorithms in that it does not involve modifications to a traditional parsing algorithm or the overhead of maintaining a parser stack or any data structure other than the syntax tree. Instead, the algorithm applies tree transformations, in real time as each token is inserted or deleted, to maintain a correct syntax tree.

12.
In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. In cases where exact templates or grammar fragments are not known, it is possible to use machine learning approaches, based on words or n-grams, to identify the structured data. This is generally a two-stage (train/use) process that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This enables a set of grammar fragments to evolve incrementally, each time a new example is given, while guaranteeing that it can parse previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure of overlap allows us to compare the range of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application shows the method's validity.

13.
Visual YACC is a tool that automatically creates visualizations of the YACC LR parsing process and synthesized attribute computation. The Visual YACC tool works by instrumenting a standard YACC grammar with graphics calls that draw the appropriate data structures given the current actions by the parser. The new grammar is processed by the YACC tools and the resulting parser displays the parse stack and parse tree for every step of the parsing process of a given input string. Visual YACC was initially designed to be used in compiler construction courses to supplement the teaching of parsing and syntax directed evaluation. We have also found it to be useful in the difficult task of debugging YACC grammars. In this paper, we describe this tool and how it is used in both contexts. We also detail two different implementations of this tool: one that produces a parser written in C with calls to Motif; and a second implementation that generates Java source code. Copyright © 1999 John Wiley & Sons, Ltd.

14.
This paper describes an original approach to semantics representation based on the use of a non-strict functional programming language with polymorphic typing. This approach provides a unified formalism needing no preprocessing or postprocessing to the functional language itself: parsing and semantics are declared naturally using function definition, and evaluation is done by lambda application along the lines of Montague. We show that by changing only the model we can, after parsing, compute either the truth value of a sentence or its parse tree.
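As a toy illustration of the Montague-style evaluation the abstract describes (word meanings as functions, sentence meaning by lambda application), consider the sketch below. It is written in Python rather than the paper's lazy functional language, and the lexicon and model are invented for the example.

```python
# The model: the set of entities that walk.
walkers = {"john"}

# The lexicon maps words to meanings: entities are strings,
# intransitive verbs are predicates (functions from entities to bools).
lexicon = {
    "john":  "john",
    "mary":  "mary",
    "walks": lambda x: x in walkers,
}

def meaning(subject, verb):
    # Sentence meaning: apply the verb's meaning to the subject's meaning.
    return lexicon[verb](lexicon[subject])

print(meaning("john", "walks"))   # True
print(meaning("mary", "walks"))   # False
```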

15.
Steven M. Kearns. Software, 1991, 21(8):787-804.
Regular expressions are used in many applications to specify patterns, because any regular expression can be compiled into a very efficient one-pass pattern matcher called a finite automaton. Finding matches is useful, but even more useful is parse extraction, which describes in detail how a pattern matches some input. After matching an address, for example, parse extraction makes it easy to find out the ZIP-code part of the address. We present an elegant, efficient algorithm for extracting a parse after matching with a finite automaton. In addition, we extend the regular expression language to include new operators for matching arbitrary left context and single-character right context. The extended language can be matched as efficiently as the usual regular expression language, but is more expressive. Finally, we suggest how to apply the matching algorithms to match regular expressions containing arbitrary right context and single-character left context. In effect, this allows one to specify patterns that seem to require an unlimited amount of look-ahead to match.
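Kearns's one-pass automaton algorithm is not reproduced here, but the sketch below illustrates what parse extraction delivers, using Python's re module with named groups to recover the ZIP-code part of a matched address (a backtracking engine, so only the end result, not the algorithm, is comparable; the pattern is an invented example).

```python
import re

# Named groups record which part of the input each subpattern matched.
address = re.compile(r"(?P<city>[A-Za-z ]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5})")

m = address.match("New York, NY 10027")
if m:
    print(m.group("zip"))    # -> 10027
```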

16.
David R. Hanson. Software, 1985, 15(12):1205-1212.
Compiler writing tools, such as parser generators, are commonly used to build new compilers. Nevertheless, constructing compilers by hand remains common. Such compilers are often built using recursive-descent parsing for most of the language and operator-precedence parsing for expressions. This paper describes a simple technique for parsing expressions with recursive descent that avoids the usual proliferation of procedures. By taking advantage of the similarity of the productions describing expressions in most languages, the n + 1 procedures usually required to parse expressions with n precedence levels can be replaced with a table and two procedures.
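A minimal sketch of the table-plus-two-procedures idea in Python (Hanson's original targets hand-written recursive-descent compilers, and this precedence-climbing rendering, with its token format and simplified primaries, is our own assumption):

```python
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}       # the precedence table

def parse_expression(tokens, pos=0, min_prec=1):
    """One procedure handles all binary-operator precedence levels,
    guided by the table rather than by one procedure per level."""
    node, pos = parse_primary(tokens, pos)
    while (pos < len(tokens) and tokens[pos] in PREC
           and PREC[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Left association: the recursive call requires strictly
        # higher precedence.
        right, pos = parse_expression(tokens, pos + 1, PREC[op] + 1)
        node = (op, node, right)
    return node, pos

def parse_primary(tokens, pos):
    # Operands only in this sketch; parentheses etc. are omitted.
    return tokens[pos], pos + 1

# parse_expression(['a', '+', 'b', '*', 'c'])[0]
#   -> ('+', 'a', ('*', 'b', 'c'))
```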

17.
A two-phase annotation method for semantic labeling in natural language processing is proposed. The dynamic-programming approach emphasizes non-exact string matching, which takes full advantage of the underlying grammatical structure of the parse trees in a treebank. The first phase of the labeling is a coarse-grained syntactic parse, complemented by a semantic-dissimilarity analysis in the second phase. The approach goes beyond shallow parsing to a deeper level of case-role identification, while preserving robustness, without being bogged down in a complete linguistic analysis. The paper presents experimental results for recognizing more than 50 different semantic labels in 10,000 sentences. Results show that the approach improves the labeling even with incomplete information. Detailed evaluations are discussed to justify its significance.

18.
A chunk parsing scheme for Chinese sentences
Zhou Qiang, Sun Maosong, Huang Changning. 《计算机学报》 (Chinese Journal of Computers), 1999, 22(11):1158-1165.
This paper introduces a shallow syntactic description scheme, the chunk parsing scheme, whose descriptive power lies between a linear word sequence and a complete parse-tree representation. Its two major components are discussed in detail: the basic content of word-boundary chunks and constituent groups, together with algorithms for recognizing them automatically. On this basis, a new staged approach to building a Chinese treebank is proposed: first construct a chunk bank, then construct the treebank. A series of parsing and knowledge-acquisition experiments was carried out, including (1) automatic recognition of maximal Chinese noun phrases and (2) automatic acquisition of Chinese syntactic knowledge. All of this work demonstrates the practicality and effectiveness of the proposed description scheme.

19.
Data extraction commonly uses regular expressions (REs) to describe data sources. To support visual specification, an RE must be converted into a parse tree, but existing rewriting-based methods for constructing RE parse trees destroy the internal structure of the data objects and therefore cannot be used for data extraction. This paper proposes a rewriting-free algorithm for constructing RE parse trees. Experiments show that the algorithm outperforms existing RE parse-tree construction algorithms in time and space performance as well as practicality.
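As a point of comparison for parse-tree construction from an RE (the paper's rewriting-free algorithm itself is not reproduced), below is a plain recursive-descent builder for a tiny RE syntax with literals, alternation, concatenation, Kleene star, and parentheses; the tuple encoding of tree nodes is an illustrative assumption.

```python
def parse_re(s):
    """Build a parse tree for a tiny RE syntax: literals, '|',
    implicit concatenation, '*', and parentheses."""
    pos = [0]
    def peek():
        return s[pos[0]] if pos[0] < len(s) else None
    def eat(c):
        assert peek() == c, "expected %r" % c
        pos[0] += 1
    def alt():                                # r1 | r2 | ...
        node = cat()
        while peek() == '|':
            eat('|'); node = ('alt', node, cat())
        return node
    def cat():                                # r1 r2 ...
        node = star()
        while peek() not in (None, '|', ')'):
            node = ('cat', node, star())
        return node
    def star():                               # r*
        node = atom()
        while peek() == '*':
            eat('*'); node = ('star', node)
        return node
    def atom():                               # literal or (r)
        if peek() == '(':
            eat('('); node = alt(); eat(')')
            return node
        c = peek()
        if c is None or c in '|)*':
            raise ValueError("unexpected token: %r" % c)
        pos[0] += 1
        return ('lit', c)
    return alt()

# parse_re("(a|b)*c")
#   -> ('cat', ('star', ('alt', ('lit','a'), ('lit','b'))), ('lit','c'))
```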

20.
A representation form for parse trees of context-free languages
This paper introduces a parse-tree representation for sentences of context-free languages (CFLs). It is suited to a class of CFL applications different from previous ones, namely those that demand high space efficiency for parse trees and do not require labelled parse trees; a typical case is using CFL sentences as the objects processed by an algorithm. Compared with traditional parse trees, this representation not only takes less space but also supports faster structural matching. Implementation techniques for this parse-tree representation are also described.

