首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 80 毫秒
1.
In its recogniser form, Earley's algorithm for testing whether a string can be derived from a grammar is worst case cubic on general context free grammars (CFG). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita's GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst case unbounded polynomial order. We have given a modified worst-case cubic version, the BRNGLR algorithm, that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of a given string. In this paper we apply similar techniques to develop two versions of an Earley parsing algorithm that, in worst-case cubic time, return an SPPF representation of all derivations of a given string from a given CFG.  相似文献   

2.
In their recogniser forms, the Earley and RIGLR algorithms for testing whether a string can be derived from a grammar are worst-case cubic on general context free grammars (CFG). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita’s GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst-case unbounded polynomial order. The parser version of the RIGLR algorithm constructs Tomita-style SPPFs and thus is also worst-case unbounded polynomial order. We have given a modified worst-case cubic GLR algorithm, that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of a given string. In this paper we apply similar techniques to develop worst-case cubic Earley and RIGLR parsing algorithms.  相似文献   

3.
Tomita-style generalised LR (GLR) algorithms extend the standard LR algorithm to non-deterministic grammars by performing all possible choices of action. Cubic complexity is achieved if all rules are of length at most two. In this paper we shall show how to achieve cubic time bounds for all grammars by binarising the search performed whilst executing reduce actions in a GLR-style parser. We call the resulting algorithm Binary Right Nulled GLR (BRNGLR) parsing. The binarisation process generates run-time behaviour that is related to that shown by a parser which pre-processes its grammar or parse table into a binary form, but without the increase in table size and with a reduced run-time space overhead. BRNGLR parsers have worst-case cubic run time on all grammars, linear behaviour on LR(1) grammars and produce, in worst-case cubic time, a cubic size binary SPPF representation of all the derivations of a given sentence.  相似文献   

4.
This article explores how the effectiveness of learning to parse with neural networks can be improved by including two architectural features relevant to language: generalisations across syntactic constituents and bounded resource effects. A number of neural network parsers have recently been proposed, each with a different approach to the representational problem of outputting parse trees. In addition, some of the parsers have explicitly attempted to capture an important regularity within language, which is to generalise information across syntactic constituents. A further property of language is that natural bounds exist for the number of constituents which a parser need retain for later processing. Both the generalisations and the resource bounds may be captured in architectural features which enhance the effectiveness and efficiency of learning to parse with neural networks. We describe a number of different types of neural network parser, and compare them with respect to these two features. These features are both explicitly present in the Simple Synchrony Network parser, and we explore and illustrate their impact on the process of learning to parse in some experiments with a recursive grammar.  相似文献   

5.
基于动作建模的中文依存句法分析   总被引:1,自引:0,他引:1  
决策式依存句法分析,也就是基于分析动作的句法分析方法,常常被认为是一种高效的分析算法,但是它的性能稍低于一些更复杂的句法分析模型。本文将决策式句法分析同产生式、判别式句法分析这些复杂模型做了比较,试验数据采用宾州中文树库。结果显示,对于中文依存句法分析,决策式句法分析在性能上好于产生式和判别式句法分析。更进一步,我们观察到决策式句法分析是一种贪婪的算法,它在每个分析步骤只挑选最有可能的分析动作而丢失了对整句话依存分析的全局视角。基于此,我们提出了两种模型用来对句法分析动作进行建模以避免原决策式依存分析方法的贪婪性。试验结果显示,基于动作建模的依存分析模型在性能上好于原决策式依存分析方法,同时保持了较低的时间复杂度。  相似文献   

6.
Dependency parsers, which are widely used in natural language processing tasks, employ a representation of syntax in which the structure of sentences is expressed in the form of directed links (dependencies) between their words. In this article, we introduce a new approach to transition‐based dependency parsing in which the parsing algorithm does not directly construct dependencies, but rather undirected links, which are then assigned a direction in a postprocessing step. We show that this alleviates error propagation, because undirected parsers do not need to observe the single‐head constraint, resulting in better accuracy. Undirected parsers can be obtained by transforming existing directed transition‐based parsers as long as they satisfy certain conditions. We apply this approach to obtain undirected variants of three different parsers (the Planar, 2‐Planar, and Covington algorithms) and perform experiments on several data sets from the CoNLL‐X shared tasks and on the Wall Street Journal portion of the Penn Treebank, showing that our approach is successful in reducing error propagation and produces improvements in parsing accuracy in most of the cases and achieving results competitive with state‐of‐the‐art transition‐based parsers.  相似文献   

7.
H. Mssenbck 《Software》1988,18(7):691-700
We present a simple method for connecting semantic actions to parsers. Although applicable to any kind of parser it is especially suited for LR parsers. The method is based on the idea of separating syntax analysis and semantic processing and executing semantic actions by procedures, similar to those of a recursive descent compiler. The procedures are driven by structural information about the source program, which is collected during parsing. The method is applicable to L-attributed grammars. It can be incorporated easily into any existing parser.  相似文献   

8.
汉语术语定义的结构分析和提取   总被引:13,自引:2,他引:13  
本文介绍的工作是在汉语句法分析研究基础上的一种应用研究,对术语如何下定义问题进行了理论上的探讨。术语的定义形式在汉语语法结构方面提供了模板结构和构成方式,可以作为知识发现研究的数据基础,也可以作为特定领域的语法知识系统。本文针对电子学和计算机领域的语料进行了分词和词性标注处理,然后应用句法分析工具分析出句子中的短语成分,并根据汉语句子的句型结构,总结出术语定义的结构特点,自动提取定义的模板。最后根据已建立的数据和概念描述,给出了术语发现的算法。  相似文献   

9.
This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people do), and can be used to induce a grammar capable of analysing such sentences. This article demonstrates these two applications using the Penn Treebank. In a robustness evaluation experiment, two state-of-the-art statistical parsers are evaluated on an ungrammatical version of Sect. 23 of the Wall Street Journal (WSJ) portion of the Penn treebank. This experiment shows that the performance of both parsers degrades with grammatical noise. A breakdown by error type is provided for both parsers. A second experiment retrains both parsers using an ungrammatical version of WSJ Sections 2–21. This experiment indicates that an ungrammatical treebank is a useful resource in improving parser robustness to grammatical errors, but that the correct combination of grammatical and ungrammatical training data has yet to be determined.  相似文献   

10.
Summary An extended LR(k) (ELR(k)) grammar is a context free grammar in which the right sides of the productions are regular expressions and which can be parsed from left to right with k symbol look-ahead. We present a practical algorithm for producing small fast parsers directly from certain ELR(k) grammars, and an algorithm for converting the remaining ELR(k) grammars into a form that can be processed by the first algorithm. This method, when combined with previously developed methods for improving the efficiency of LR(k) parsers, usually produces parsers that are significantly smaller and faster than those produced by previous LR(k) and ELR(k) algorithms.  相似文献   

11.
A new approach to describing communication protocols is introduced. In the style of a formal language, the protocol is considered as the set of all legal sequences of symbols that can be exchanged by the communicating processes. Although context free grammars cannot adequately describe such sequences, it is shown that attribute grammars may be used. Examples are given which show that common protocol features such as interleaving, windowing and flow control can be described by attribute grammars.It is shown how deadlock-proneness of a protocol can be formalised as a property of its attribute grammar specification, and the undecidability of deadlock-proneness for arbitrary grammars is proved. An algorithm is given for determining whether a protocol is deadlock-prone in the decidable case.A method of automatically implementing protocols from their specifications is described. The implementation takes the form of a pair of communicating attributed pushdown automata. These are based on LR(0) parsers, with attribute evaluation being performed in parallel with the parse; attribute values are used to help direct the parse. Consideration is also given to the handling of errors.  相似文献   

12.
The work presented here attempts to bring out some fundamental concepts that underlie some known parsing algorithms, usually called chart or dynamic programming parsers, in the hope of guiding the design of similar algorithms for other formalisms that could be considered for describing the "surface" syntax of languages. The key idea is that chart parsing is essentially equivalent to a simple construction of the intersection of the language (represented by its grammar) with a regular set containing only the input sentence to be parsed (represented by a finite state machine). The resulting grammar for that intersection is precisely what is usually called a shared forest: it represents all parses of a syntactically ambiguous sentence. Since most techniques for processing ill-formed input can be modeled by considering a nonsingleton regular set of input sentences, we can expect to generalize these ill-formed input processing techniques to all parsers describable with our approach.  相似文献   

13.
We present a compiler that can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, and to ensure that the generated implementations are efficient. The system described is general enough to be able to handle all kinds of schemata for different grammar formalisms, such as context‐free grammars and tree‐adjoining grammars, and it provides an extensibility mechanism allowing the user to define custom notational elements. This compiler has proven very useful for analyzing, prototyping and comparing natural‐language parsers in real domains, as can be seen in the empirical examples provided at the end of the paper. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

14.
Recurrent neural networks processing symbolic strings can be regarded as adaptive neural parsers. Given a set of positive and negative examples, picked up from a given language, adaptive neural parsers can effectively be trained to infer the language grammar. In this paper we use adaptive neural parsers to face the problem of inferring grammars from examples that are corrupted by a kind of noise that simply changes their membership. We propose a training algorithm, referred to as hybrid finite state filter, which is based on a parsimony principle that penalizes the development of complex rules. We report very promising experimental results showing that the proposed inductive inference scheme is indeed capable of capturing rules, while removing noise.  相似文献   

15.
A wide range of parser generators are used to generate parsers for programming languages. The grammar formalisms that come with parser generators provide different approaches for defining operator precedence. Some generators (e.g. YACC) support precedence declarations, others require the grammar to be unambiguous, thus encoding the precedence rules. Even if the grammar formalism provides precedence rules, a particular grammar might not use it. The result is grammar variants implementing the same language. For the C language, the GNU Compiler uses YACC with precedence rules, the C-Transformers uses SDF without priorities, while the SDF library does use priorities. For PHP, Zend uses YACC with precedence rules, whereas PHP-front uses SDF with priority and associativity declarations.The variance between grammars raises the question if the precedence rules of one grammar are compatible with those of another. This is usually not obvious, since some languages have complex precedence rules. Also, for some parser generators the semantics of precedence rules is defined operationally, which makes it hard to reason about their effect on the defined language. We present a method and tool for comparing the precedence rules of different grammars and parser generators. Although it is undecidable whether two grammars define the same language, this tool provides support for comparing and recovering precedence rules, which is especially useful for reliable migration of a grammar from one grammar formalism to another. We evaluate our method by the application to non-trivial mainstream programming languages, such as PHP and C.  相似文献   

16.
The LR(0) goto-graph is the basis for the construction of parsers for several interesting grammar classes such as LALR and GLR. Early work has shown that even when a grammar is an extension to another, the goto-graph of the first is not necessarily a subgraph of the second. Some authors presented algorithms to grow and shrink these graphs incrementally, but the formal proof of the existence of a particular relation between a given goto-graph and a grown or shrunk counterpart seems to be still missing in literature as of today. In this paper we use the recursive projection of paths of limited length to prove the existence of one such relation, when the sets of productions are in a subset relation. We also use this relation to present two algorithms (Grow and Shrink) that transform the goto-graph of a given grammar into the goto-graph of an extension or a restriction to that grammar. We implemented these algorithms in a dynamically updatable LALR parser generator called DEXTER (the Dynamically EXTEnsible Recognizer) that we are now shipping with our current implementation of the Neverlang framework for programming language development.  相似文献   

17.
Haruno  Masahiko  Shirai  Satoshi  Ooyama  Yoshifumi 《Machine Learning》1999,34(1-3):131-149
This paper describes a novel and practical Japanese parser that uses decision trees. First, we construct a single decision tree to estimate modification probabilities; how one phrase tends to modify another. Next, we introduce a boosting algorithm in which several decision trees are constructed and then combined for probability estimation. The constructed parsers are evaluated using the EDR Japanese annotated corpus. The single-tree method significantly outperforms the conventional Japanese stochastic methods. Moreover, the boosted version of the parser is shown to have great advantages; (1) a better parsing accuracy than its single-tree counterpart for any amount of training data and (2) no over-fitting to data for various iterations. The presented parser, the first non-English stochastic parser with practical performance, should tighten the coupling between natural language processing and machine learning.  相似文献   

18.
The importance of the parsing task for NLP applications is well understood. However developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time consuming. In this paper we present our method for building an Arabic parser based on an induced grammar, PCFG grammar. We first induce the PCFG grammar from an Arabic Treebank. Then, we implement the parser that assigns syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1650 sentences).We calculate the precision, recall and f-measure. Our experimental results showed the efficiency of the proposed parser for parsing modern standard Arabic sentences (Precision: 83.59 %, Recall: 82.98 % and F-measure: 83.23 %).  相似文献   

19.
The use of a single grammar in natural language parsing and generation is most desirable for a variety of reasons, including efficiency, perspicuity, integrity, robustness, and a certain amount of elegance. These characteristics have been noted before by several researchers, but it was only recently that more serious attention started to be paid to the problem of creating a bidirectional system for natural language processing. In this paper we discuss a somewhat more radical version of the problem: given a parser for a language, can we reverse it so that it becomes an efficient generator for the same language? Furthermore, since both the parser and the generator are based upon the same grammar, are there any normalization conditions upon the form of the grammar that must be met in order to assure the maximum efficiency of the reversed program? Can other grammars be transformed into the normal form? We describe the results of an experiment with PROLOG-based logic grammar which has been derived from a substantial-coverage string grammar for English. We present an alogorithm for automated inversion of a unification parser into an efficient unification generator, using the collections of minimal sets of essential arguments for predicates. We discuss the scope of the present version of the algorithm and then point out several possible avenues for extension. We also outline a preliminary solution to the question of grammar's “normal form” and suggest a handful of normalizing transformations that can be used to enhance the efficiency of the generator. This research interacts closely with a Japanese-English machine translation project at New York University, for which the first implementation of the inversion algorithm has been prepared.  相似文献   

20.
It is shown that if the basic method for eliminating single productions from canonical LR parsers developed by Pager is applied to an SLR parser and the resulting parser is free of conflicts, then the resulting parser is a valid parser which accepts exactly the strings in the language. However, if the elimination process is performed during the construction of an SLR parser, then the resulting parser may be invalid even if it were free of conflicts.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号