
1.

Reversible logic grammars for natural language parsing and generation

Tomek Strzalkowski 《Computational Intelligence》1990,6(3):145-171

The use of a single grammar in natural language parsing and generation is most desirable for a variety of reasons, including efficiency, perspicuity, integrity, robustness, and a certain amount of elegance. These characteristics have been noted before by several researchers, but it was only recently that more serious attention started to be paid to the problem of creating a bidirectional system for natural language processing. In this paper we discuss a somewhat more radical version of the problem: given a parser for a language, can we reverse it so that it becomes an efficient generator for the same language? Furthermore, since both the parser and the generator are based upon the same grammar, are there any normalization conditions upon the form of the grammar that must be met in order to assure the maximum efficiency of the reversed program? Can other grammars be transformed into the normal form? We describe the results of an experiment with a PROLOG-based logic grammar which has been derived from a substantial-coverage string grammar for English. We present an algorithm for automated inversion of a unification parser into an efficient unification generator, using the collections of minimal sets of essential arguments for predicates. We discuss the scope of the present version of the algorithm and then point out several possible avenues for extension. We also outline a preliminary solution to the question of grammar's “normal form” and suggest a handful of normalizing transformations that can be used to enhance the efficiency of the generator. This research interacts closely with a Japanese-English machine translation project at New York University, for which the first implementation of the inversion algorithm has been prepared.

2.

Grammar Engineering Support for Precedence Rule Recovery and Compatibility Checking

Eric Bouwers Martin Bravenboer Eelco Visser 《Electronic Notes in Theoretical Computer Science》2008,203(2):85

A wide range of parser generators are used to generate parsers for programming languages. The grammar formalisms that come with parser generators provide different approaches for defining operator precedence. Some generators (e.g. YACC) support precedence declarations, others require the grammar to be unambiguous, thus encoding the precedence rules. Even if the grammar formalism provides precedence rules, a particular grammar might not use them. The result is grammar variants implementing the same language. For the C language, the GNU Compiler uses YACC with precedence rules, the C-Transformers uses SDF without priorities, while the SDF library does use priorities. For PHP, Zend uses YACC with precedence rules, whereas PHP-front uses SDF with priority and associativity declarations. The variance between grammars raises the question of whether the precedence rules of one grammar are compatible with those of another. This is usually not obvious, since some languages have complex precedence rules. Also, for some parser generators the semantics of precedence rules is defined operationally, which makes it hard to reason about their effect on the defined language. We present a method and tool for comparing the precedence rules of different grammars and parser generators. Although it is undecidable whether two grammars define the same language, this tool provides support for comparing and recovering precedence rules, which is especially useful for reliable migration of a grammar from one grammar formalism to another. We evaluate our method by the application to non-trivial mainstream programming languages, such as PHP and C.

3.

Parsing Arabic using induced probabilistic context free grammar

Nabil Khoufi Chafik Aloulou Lamia Hadrich Belguith 《International Journal of Speech Technology》2016,19(2):313-323

The importance of the parsing task for NLP applications is well understood. However, developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time consuming. In this paper we present our method for building an Arabic parser based on an induced grammar, PCFG grammar. We first induce the PCFG grammar from an Arabic Treebank. Then, we implement the parser that assigns syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1650 sentences). We calculate the precision, recall and f-measure. Our experimental results showed the efficiency of the proposed parser for parsing modern standard Arabic sentences (Precision: 83.59 %, Recall: 82.98 % and F-measure: 83.23 %).
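
The abstract does not say how the PCFG is induced. A common baseline is relative-frequency estimation: count every production in the treebank and divide by the count of its left-hand side. The sketch below illustrates that baseline only; it is not the authors' implementation. The nested-tuple tree encoding, the function names `rules` and `induce_pcfg`, and the toy English treebank are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def rules(tree):
    """Yield (lhs, rhs) productions from a tree given as nested tuples.

    A tree is (label, child, child, ...); a leaf child is a plain string.
    """
    label, *children = tree
    # The right-hand side lists child labels, or the word itself for a leaf.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def induce_pcfg(treebank):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in treebank for rule in rules(tree))
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in counts.items()}


# Hypothetical two-tree treebank, used only to exercise the code.
toy_treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "reads"))),
    ("S", ("NP", "she"), ("VP", ("V", "reads"), ("NP", "books"))),
]
pcfg = induce_pcfg(toy_treebank)
```

In the toy treebank, VP expands once as V and once as V NP, so each of those rules receives probability 0.5, and the probabilities of all rules sharing a left-hand side sum to 1.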

4.

Semi‐automatic grammar recovery

R. Lämmel C. Verhoef 《Software》2001,31(15):1395-1438

We propose an approach to the construction of grammars for existing languages. The main characteristic of the approach is that the grammars are not constructed from scratch but they are rather recovered by extracting them from language references, compilers and other artifacts. We provide a structured process to recover grammars including the adaptation of raw extracted grammars and the derivation of parsers. The process is applicable to possibly all existing languages for which business critical applications exist. We illustrate the approach with a non‐trivial case study. Using our process and some basic tools, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. In addition, we constructed a parser for VS COBOL II, and were the first to publish a (Web‐enabled) grammar specification so that others can use this result to construct their own grammar‐based tools for VS COBOL II or derivatives. Copyright © 2001 John Wiley & Sons, Ltd.

5.

On the semantics of parsing actions

《Science of Computer Programming》2014

Parsers, whether constructed by hand or automatically via a parser generator tool, typically need to compute some useful semantic information in addition to the purely syntactic analysis of their input. Semantic actions may be added to parsing code by hand, or the parser generator may have its own syntax for annotating grammar rules with semantic actions. In this paper, we take a functional programming view of such actions. We use concepts from the semantics of mostly functional programming languages and adapt them to give meaning to the actions of the parser. Specifically, the semantics is inspired by the categorical semantics of lambda calculi and the use of premonoidal categories for the semantics of effects in programming languages. This framework is then applied to our leading example, the transformation of grammars to eliminate left recursion. The syntactic transformation of left-recursion elimination leads to a corresponding semantic transformation of the actions for the grammar. We prove the semantic transformation correct and relate it to continuation passing style, a widely studied transformation in lambda calculi and functional programming. As an idealization of the input language of parser generators, we define a call-by-value calculus with first-order functions and a type-and-effect system where the effects are given by sequences of grammar symbols. The account of left-recursion elimination is then extended to this calculus.
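
For readers unfamiliar with the leading example, the textbook transformation for immediate left recursion rewrites A -> A a | b as A -> b A', A' -> a A' | epsilon. The sketch below shows that syntactic step and then a hand-written parser in which the action of the original rule is applied to a value threaded through the new tail nonterminal. It is an informal illustration under assumptions, not the paper's calculus: the function names, the tuple encoding of right-hand sides, and the subtraction grammar are all hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    `alternatives` is a list of right-hand sides, each a tuple of symbols.
    The empty tuple () stands for epsilon.
    """
    recursive = [alt[1:] for alt in alternatives if alt[:1] == (nonterminal,)]
    base = [alt for alt in alternatives if alt[:1] != (nonterminal,)]
    if not recursive:
        return {nonterminal: list(alternatives)}
    tail = nonterminal + "'"  # assumed-fresh name for the new nonterminal
    return {
        nonterminal: [alt + (tail,) for alt in base],
        tail: [alt + (tail,) for alt in recursive] + [()],
    }


def parse_expr(tokens):
    """Evaluate E -> T E', E' -> '-' T E' | epsilon over a token list."""
    pos = 0

    def term():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def expr_tail(left):
        # The action of the original rule E -> E '-' T (left - right) is
        # applied to the value accumulated so far, which is passed in.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            right = term()
            return expr_tail(left - right)
        return left

    return expr_tail(term())


grammar = eliminate_immediate_left_recursion("E", [("E", "-", "T"), ("T",)])
result = parse_expr(["1", "-", "2", "-", "3"])
```

Passing the accumulated value into `expr_tail` keeps subtraction left-associative, so the token list above evaluates as (1 - 2) - 3 even though the transformed grammar is right-recursive.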

6.

Component-based LR parsing

Xiaoqing Wu Barrett R. Bryant Jeff Gray Marjan Mernik 《Computer Languages, Systems and Structures》2010,36(1):16-33

A language implementation with proper compositionality enables a compiler developer to divide-and-conquer the complexity of building a large language by constructing a set of smaller languages. Ideally, these small language implementations should be independent of each other such that they can be designed, implemented and debugged individually, and later be reused in different applications (e.g., building domain-specific languages). However, the language composition offered by several existing parser generators resides at the grammar level, which means all the grammar modules need to be composed together and all corresponding ambiguities have to be resolved before generating a single parser for the language. This produces tight coupling between grammar modules, which harms information hiding and affects independent development of language features. To address this problem, we have developed a novel parsing algorithm that we call Component-based LR (CLR) parsing, which provides code-level compositionality for language development by producing a separate parser for each grammar component. In addition to shift and reduce actions, the algorithm extends general LR parsing by introducing switch and return actions to empower the parsing action to jump from one parser to another. Our experimental evaluation demonstrates that CLR increases the comprehensibility, reusability, changeability and independent development ability of the language implementation. Moreover, the loose coupling among parser components enables CLR to describe grammars that contain LR parsing conflicts or require ambiguous token definitions, such as island grammars and embedded languages.

7.

From functional specification to syntactic structures: systemic grammar and tree adjoining grammar

Gijoo Yang Kathleen F. McCoy K. Vijay-Shanker 《Computational Intelligence》1991,7(4):207-219

In this paper we provide an implementation strategy to map a functional specification of an utterance into a syntactically well-formed sentence. We do this by integrating the functional and the syntactic perspectives on language, which we take to be exemplified by systemic grammars and tree adjoining grammars (TAGs) respectively. From systemic grammars we borrow the use of networks of choices to classify the set of possible constructions. The choices expressed in an input are mapped by our generator to a syntactic structure as defined by a TAG. We argue that the TAG structures can be appropriate structural units of realization in an implementation of a generator based on systemic grammar and also that a systemic grammar provides an effective means of deciding between various syntactic possibilities expressed in a TAG grammar. We have developed a generation strategy which takes advantage of what both paradigms offer to generation, without compromising either.

8.

Incremental generation of parsers

Heering J. Klint P. Rekers J. 《IEEE transactions on pattern analysis and machine intelligence》1990,16(12):1344-1351

An LR-based parser generator for arbitrary context-free grammars that generates parsers by need and handles modifications to its input grammar by updating the parser it has generated so far is described. The need for these techniques is discussed in the context of interactive language definition environments. All required algorithms are presented. Measurements are given comparing their performance with that of conventional techniques.

9.

Analyzing ambiguity of context-free grammars

Claus Brabrand Anders Møller 《Science of Computer Programming》2010,75(3):176-191

It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures. We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this by presenting an ambiguity analysis framework based on conservative language approximations. As a concrete example, we propose a technique based on local regular approximations and grammar unfoldings. We evaluate the analysis using grammars that occur in RNA analysis in bioinformatics, and we demonstrate that it is sufficiently precise and efficient to be practically useful.

10.

A pure embedding of attribute grammars

Anthony M. Sloane Lennart C.L. Kats Eelco Visser 《Science of Computer Programming》2013

Attribute grammars are a powerful specification paradigm for many language processing tasks, particularly semantic analysis of programming languages. Recent attribute grammar systems use dynamic scheduling algorithms to evaluate attributes on demand. In this paper, we show how to remove the need for a generator, by embedding a dynamic approach in a modern, object-oriented and functional programming language. The result is a small, lightweight attribute grammar library that is part of our larger Kiama language processing library. Kiama’s attribute grammar library supports a range of advanced features including cached, uncached, higher order, parameterised and circular attributes. Forwarding is available to modularise higher order attributes and decorators abstract away from the details of attribute value propagation. Kiama also implements new techniques for dynamic extension and variation of attribute equations. We use the Scala programming language because of its support for domain-specific notations and emphasis on scalability. Unlike generators with specialised notation, Kiama attribute grammars use standard Scala notations such as pattern-matching functions for equations, traits and mixins for composition and implicit parameters for forwarding. A benchmarking exercise shows that our approach is practical for realistic language processing.

11.

Recovering grammar relationships for the Java Language Specification

Ralf Lämmel Vadim Zaytsev 《Software Quality Journal》2011,19(2):333-378

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.

12.

A survey of grammatical inference in software engineering

《Science of Computer Programming》2014

Grammatical inference – used successfully in a variety of fields such as pattern recognition, computational biology and natural language processing – is the process of automatically inferring a grammar by examining the sentences of an unknown language. Software engineering can also benefit from grammatical inference. Unlike these other fields, which use grammars as a convenient tool to model naturally occurring patterns, software engineering treats grammars as first-class objects typically created and maintained for a specific purpose by human designers. We introduce the theory of grammatical inference and review the state of the art as it relates to software engineering.

13.

LR parsing of probabilistic grammars with input uncertainty for speech recognition

J. H. Wright 《Computer Speech and Language》1990,4(4)

A shift-reduce parser for probabilistic context-free grammars is described, based on the LR algorithm. Each of the standard types of LR parser generator has a probabilistic version and a Bayesian interpretation is advanced. A graph-structured stack permits action conflicts and allows the parser to be used with uncertain input, typical of speech recognition applications. The sentence uncertainty is measured using entropy and is found to be significantly lower for the grammar than for a derived first-order Markov model.

14.

The metafront System: Extensible Parsing and Transformation

Claus Brabrand Michael I. Schwartzbach Mads Vanggaard 《Electronic Notes in Theoretical Computer Science》2003,82(3):592-611

We present the metafront tool for specifying flexible, safe, and efficient syntactic transformations between languages defined by context-free grammars. The transformations are guaranteed to terminate and to map grammatically legal input to grammatically legal output. We rely on a novel parser algorithm that is designed to support gradual extensions of a grammar by allowing productions to remain in a natural style and by statically reporting ambiguities and errors in terms of individual productions as they are being added. Our tool may be used as a parser generator in which the resulting parser automatically supports a flexible, safe, and efficient macro processor, or as an extensible lightweight compiler generator for domain-specific languages. We show substantial examples of both kinds.

15.

An interactive method for extracting grammar from programs

Rahul Jain Sanjeev Kumar Aggarwal Pankaj Jalote Shiladitya Biswas 《Software》2004,34(5):433-447

The grammar of the language in which some given code is written is essential for developing automated tools for maintenance, reengineering, and program analysis. Frequently a grammar is available for a language but not for its variants that are implemented by various vendors and in which the given code may be written. In this work we address the problem of obtaining the grammar from source code, which can then be used for generating tools for the programs. We propose an incremental method for obtaining a grammar for a particular language variant, from a set of programs written in the language variant and an approximate grammar (presumably of the standard language) with some user interaction. We also present the design of a tool for implementing this approach and our experience in working with grammars of C, C++ and COBOL. Copyright © 2004 John Wiley & Sons, Ltd.

16.

Even faster generalized LR parsing

John Aycock Nigel Horspool Jan Janoušek Bořivoj Melichar 《Acta Informatica》2001,37(9):633-651

We prove a property of generalized LR (GLR) parsing – if the grammar is without right and hidden left recursions, then the number of consecutive reductions between the shifts of two adjacent symbols cannot be greater than a constant. Further, we show that this property can be used for constructing an optimized version of our GLR parser. Compared with a standard GLR parser, our optimized parser reads one symbol on every transition and performs significantly fewer stack operations. Our timings show that, especially for highly ambiguous grammars, our parser is significantly faster than a standard GLR parser. Received: 9 May 2000 / 5 March 2001

17.

Rie, a compiler generator based on a one-pass-type attribute grammar

Masataka Sassa Harushi Ishizuka Ikuo Nakata 《Software》1995,25(3):229-250


18.

Decorating tokens to facilitate recognition of ambiguous language constructs

Brian A. Malloy Tanton H. Gibbs James F. Power 《Software》2003,33(1):19-39

Software tools are fundamental to the comprehension, analysis, testing and debugging of application systems. A necessary first step in the development of many tools is the construction of a parser front‐end that can recognize the implementation language of the system under development. In this paper, we describe our use of token decoration to facilitate recognition of ambiguous language constructs. We apply our approach to the C++ language since its grammar is replete with ambiguous derivations such as the declaration/expression and template‐declaration/expression ambiguity. We describe our implementation of a parser front‐end for C++, keystone, and we describe our results in decorating tokens for our test suite including the examples from Clause Three of the C++ standard. We are currently exploiting the keystone front‐end to develop a taxonomy for implementation‐based class testing and to reverse‐engineer Unified Modeling Language (UML) class diagrams. Copyright © 2002 John Wiley & Sons, Ltd.

19.

The IELR(1) algorithm for generating minimal LR(1) parser tables for non-LR(1) grammars with conflict resolution

Joel E. Denny Brian A. Malloy 《Science of Computer Programming》2010,75(11):943-979

There has been a recent effort in the literature to reconsider grammar-dependent software development from an engineering point of view. As part of that effort, we examine a deficiency in the state of the art of practical LR parser table generation. Specifically, LALR sometimes generates parser tables that do not accept the full language that the grammar developer expects, but canonical LR is too inefficient to be practical particularly during grammar development. In response, many researchers have attempted to develop minimal LR parser table generation algorithms. In this paper, we demonstrate that a well known algorithm described by David Pager and implemented in Menhir, the most robust minimal LR(1) implementation we have discovered, does not always achieve the full power of canonical LR(1) when the given grammar is non-LR(1) coupled with a specification for resolving conflicts. We also detail an original minimal LR(1) algorithm, IELR(1) (Inadequacy Elimination LR(1)), which we have implemented as an extension of GNU Bison and which does not exhibit this deficiency. Using our implementation, we demonstrate the relevance of this deficiency for several real-world parser specifications, and we demonstrate the feasibility of IELR(1). Finally, we demonstrate that, if canonical LR(1) were employed instead, grammar development would be severely impeded regardless of the power of the computer hardware.

20.

An unsupervised incremental learning algorithm for domain-specific language development

Faizan Javed Marjan Mernik Alan Sprague 《Applied Artificial Intelligence》2013,27(7-8):707-729

While grammar inference (or grammar induction) has found extensive application in the areas of robotics, computational biology, and speech recognition, its application to problems in programming language and software engineering domains has been limited. We have found a new application area for grammar inference which intends to make domain-specific language development easier for domain experts not well versed in programming language design, and finds a second application in construction of renovation tools for legacy software systems. As a continuation of our previous efforts to infer context-free grammars (CFGs) for domain-specific languages which previously involved a genetic-programming based CFG inference system, we discuss extensions to the inference capabilities of GenInc, an incremental learning algorithm for inferring CFGs. We show that these extensions enable GenInc to infer more comprehensive grammars, discuss the results of applying GenInc to various domain-specific languages and evaluate the results using a comprehensive suite of grammar metrics.