首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
2.
R. Lmmel  C. Verhoef 《Software》2001,31(15):1395-1438
We propose an approach to the construction of grammars for existing languages. The main characteristic of the approach is that the grammars are not constructed from scratch but they are rather recovered by extracting them from language references, compilers and other artifacts. We provide a structured process to recover grammars including the adaptation of raw extracted grammars and the derivation of parsers. The process is applicable to possibly all existing languages for which business critical applications exist. We illustrate the approach with a non‐trivial case study. Using our process and some basic tools, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. In addition, we constructed a parser for VS COBOL II, and were the first to publish a (Web‐enabled) grammar specification so that others can use this result to construct their own grammar‐based tools for VS COBOL II or derivatives. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

3.
The importance of the parsing task for NLP applications is well understood. However developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time consuming. In this paper we present our method for building an Arabic parser based on an induced grammar, PCFG grammar. We first induce the PCFG grammar from an Arabic Treebank. Then, we implement the parser that assigns syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1650 sentences).We calculate the precision, recall and f-measure. Our experimental results showed the efficiency of the proposed parser for parsing modern standard Arabic sentences (Precision: 83.59 %, Recall: 82.98 % and F-measure: 83.23 %).  相似文献   

4.
In this paper, we describe our experience in grammar engineering to construct multiple parsers and front ends for the Python language. We present a metrics-based study of the evolution of the Python grammars through the multiple versions of the language in an effort to distinguish and measure grammar evolution and to provide a basis of comparison with related research in grammar engineering. To conduct this research, we have built a toolkit, pygrat , which builds on tools developed in other research. We use pygrat to build a system that automates much of the process needed to translate the Python grammars from EBNF to a formalism acceptable to the bison parser generator. We exploit the suite of Python test cases, used by the Python developers, to validate our parser generation. Finally, we describe our use of the menhir parser generator to facilitate the parser and front-end construction, eliminating some of the transformations and providing practical support for grammar modularisation.  相似文献   

5.
We establish a semantics for building grammars from a modularised specification in which modules are able to delete productions from imported nonterminals. Modules have import lists of nonterminals; some or all of an imported nonterminal's productions may be suppressed at import time. There are two basic import mechanisms which (a) reference or (b) clone an imported nonterminal's productions. One of our goals is to allow a precise answer to the question: ‘what character level language does this grammar generate’ in the face of difficult issues such as the mutual embedding of languages that have different whitespace and commenting conventions. Our technique is to automatically generate a character level grammar from grammars written at token level in the conventional way; the grammar is constructed from modules each of which may have its own whitespace convention.  相似文献   

6.
Berkeley FrameNet is a lexico-semantic resource for English based on the theory of frame semantics. It has been exploited in a range of natural language processing applications and has inspired the development of framenets for many languages. We present a methodological approach to the extraction and generation of a computational multilingual FrameNet-based grammar and lexicon. The approach leverages FrameNet-annotated corpora to automatically extract a set of cross-lingual semantico-syntactic valence patterns. Based on data from Berkeley FrameNet and Swedish FrameNet, the proposed approach has been implemented in Grammatical Framework (GF), a categorial grammar formalism specialized for multilingual grammars. The implementation of the grammar and lexicon is supported by the design of FrameNet, providing a frame semantic abstraction layer, an interlingual semantic application programming interface (API), over the interlingual syntactic API already provided by GF Resource Grammar Library. The evaluation of the acquired grammar and lexicon shows the feasibility of the approach. Additionally, we illustrate how the FrameNet-based grammar and lexicon are exploited in two distinct multilingual controlled natural language applications. The produced resources are available under an open source license.  相似文献   

7.
In text generation, various kinds of choices need to be decided. In conventional frameworks, which we callone-path generation frameworks, choices are made in an order carefully designed in advance. In general, however, since choices depend on one another, it is difficult to make optimal decisions in such frameworks. Our approach to this issue is to introduce the revision process into the overall generation process. In our framework, revision of output texts is realized as dependency-directed backtracking (DDB). As well as Justification-based Truth Maintenance System (JTMS), we maintain dependencies among choices in a dependency network. In this paper, we propose an efficient implementation of DDB for text generation using functional unification grammar (FUG). We use bindings of logical variables in Prolog and destructive argument substitutions to decrease the overhead of handling a dependency network. This paper describes the algorithm in detail and shows the results of preliminary experiments to demonstrate the performance of our implementation.  相似文献   

8.
We callsyntactic coding a technique which converts syntactic principles or constraints on representations into grammatical rules which can be implemented in any given rule grammar. In this paper we show that any principle or constraint on output trees formalizable in a certain fragment of dynamic logic over trees can be coded in this sense. This allows to reduce in a mechanical fashion most of the current theories of government and binding into GPSG-style grammars. This will be exemplified with Rizzi'sRelativized Minimality.  相似文献   

9.
The grammar of the language in which some given code is written is essential for developing automated tools for maintenance, reengineering, and program analysis. Frequently grammar is available for a language but not for its variants that are implemented by various vendors and in which the given code may be written. In this work we address the problem of obtaining the grammar from source code, which can then be used for generating tools for the programs. We propose an incremental method for obtaining grammar for a particular language variant, from a set of programs written in the language variant and an approximate grammar (presumably of the standard language) with some user interaction. We also present the design of a tool for implementing this approach and our experience in working with grammars of C, C++ and COBOL. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

10.
The use of a single grammar in natural language parsing and generation is most desirable for a variety of reasons, including efficiency, perspicuity, integrity, robustness, and a certain amount of elegance. These characteristics have been noted before by several researchers, but it was only recently that more serious attention started to be paid to the problem of creating a bidirectional system for natural language processing. In this paper we discuss a somewhat more radical version of the problem: given a parser for a language, can we reverse it so that it becomes an efficient generator for the same language? Furthermore, since both the parser and the generator are based upon the same grammar, are there any normalization conditions upon the form of the grammar that must be met in order to assure the maximum efficiency of the reversed program? Can other grammars be transformed into the normal form? We describe the results of an experiment with PROLOG-based logic grammar which has been derived from a substantial-coverage string grammar for English. We present an alogorithm for automated inversion of a unification parser into an efficient unification generator, using the collections of minimal sets of essential arguments for predicates. We discuss the scope of the present version of the algorithm and then point out several possible avenues for extension. We also outline a preliminary solution to the question of grammar's “normal form” and suggest a handful of normalizing transformations that can be used to enhance the efficiency of the generator. This research interacts closely with a Japanese-English machine translation project at New York University, for which the first implementation of the inversion algorithm has been prepared.  相似文献   

11.
12.
This paper gives a simple method for providing categorial brands of feature-based unification grammars with a model-theoretic semantics. The key idea is to apply the paradigm of fibred semantics (or layered logics, see Gabbay (1990)) in order to combine the two components of a feature-based grammar logic. We demonstrate the method for the augmentation of Lambek categorial grammar with Kasper/Rounds-style feature logic. These are combined by replacing (or annotating) atomic formulas of the first logic, i.e. the basic syntactic types, by formulas of the second. Modelling such a combined logic is less trivial than one might expect. The direct application of the fibred semantics method where a combined atomic formula like np (num: sg & pers: 3rd) denotes those strings which have the indicated property and the categorial operators denote the usual left- and right-residuals of these string sets, does not match the intuitive, unification-based proof theory. Unification implements a global bookkeeping with respect to a proof whereas the direct fibring method restricts its view to the atoms of the grammar logic. The solution is to interpret the (embedded) feature terms as global feature constraints while maintaining the same kind of fibred structures. For this adjusted semantics, the anticipated proof system is sound and complete.  相似文献   

13.
A graph grammar programming style for recognition of music notation   总被引:1,自引:0,他引:1  
Graph grammars are a promising tool for solving picture processing problems. However, the application of graph grammars to diagram recognition has been limited to rather simple analysis of local symbol configurations. This paper introduces the Build-Weed-Incorporate programming style for graph grammars and shows its application in determining the meaning of complex diagrams, where the interaction among physically distant symbols is semantically important. Diagram recognition can be divided into two stages: symbol recognition and high-level recognition. Symbol recognition has been studied extensively in the literature. In this work we assume the existence of a symbol recognizer and use a graph grammar to assemble the diagram's information content from the symbols and their spatial relationships. The Build-Weed-Incorporate approach is demonstrated by a detailed discussion of a graph grammar for high-level recognition of music notation. See Appendix A for an illustration of the terms for musical symbols used in this paper.  相似文献   

14.
Toward a lexicalized grammar for interlinguas   总被引:1,自引:0,他引:1  
In this paper we present one aspect of our research on machine translation (MT): capturing the grammatical and computational relation between (i) the interlingua (IL) as defined declaratively in the lexicon and (ii) the IL as defined procedurally by way of algorithms that compose and decompose pivot IL forms. We begin by examining the interlinguas in the lexicons of a variety of current IL-based approaches to MT. This brief survey makes it clear that no consensus exists among MT researchers on the level of representation for defining the IL. In the section that follows, we explore the consequences of this missing formal framework for MT system builders who develop their own lexical-IL entries. The lack of software tools to support rapid IL respecification and testing greatly hampers their ability to modify representations to handle new data and new domains. Our view is that IL-based MT research needs both (a) the formal framework to specify possible IL grammars and (b) the software support tools to implement and test these grammars. With respect to (a), we propose adopting a lexicalized grammar approach, tapping research results from the study oftree grammars for natural language syntax. With respect to (b), we sketch the design and functional specifications for parts of ILustrate, the set of software tools that we need to implement and test the various IL formalisms that meet the requirements of a lexicalized grammar. In this way, we begin to address a basic issue in MT research, how to define and test an interlingua as a computational language — without building a full MT system for each possible IL formalism that might be proposed.  相似文献   

15.
Tree-adjoining grammars (TAG) have been proposed as a formalism for generation based on the intuition that the extended domain of syntactic locality that TAGs provide should aid in localizing semantic dependencies as well, in turn serving as an aid to generation from semantic representations. We demonstrate that this intuition can be made concrete by using the formalism of synchronous tree-adjoining grammars. The use of synchronous TAGs for generation provides solutions to several problems with previous approaches to TAG generation. Furthermore, the semantic monotonicity requirement previously advocated for generation grammars as a computational aid is seen to be an inherent property of synchronous TAGs.  相似文献   

16.
Parsers, whether constructed by hand or automatically via a parser generator tool, typically need to compute some useful semantic information in addition to the purely syntactic analysis of their input. Semantic actions may be added to parsing code by hand, or the parser generator may have its own syntax for annotating grammar rules with semantic actions. In this paper, we take a functional programming view of such actions. We use concepts from the semantics of mostly functional programming languages and adapt them to give meaning to the actions of the parser. Specifically, the semantics is inspired by the categorical semantics of lambda calculi and the use of premonoidal categories for the semantics of effects in programming languages. This framework is then applied to our leading example, the transformation of grammars to eliminate left recursion. The syntactic transformation of left-recursion elimination leads to a corresponding semantic transformation of the actions for the grammar. We prove the semantic transformation correct and relate it to continuation passing style, a widely studied transformation in lambda calculi and functional programming. As an idealization of the input language of parser generators, we define a call-by-value calculus with first-order functions and a type-and-effect system where the effects are given by sequences of grammar symbols. The account of left-recursion elimination is then extended to this calculus.  相似文献   

17.
This paper describes a parsing algorithm for Tree Adjoining Grammar (TAG) and its parallel implementation on the Connection Machine. TAG is a formalism for natural language that employs trees as the basic grammar structures. Parsing involves the application of two operations, called adjunction and substitution, to produce derived tree structures. Sequential parsing algorithms for TAGs run in time quadratic in the grammar size, which is impractical for the very large grammars currently being developed for natural language. This paper presents two parallel algorithms, one running in time nearly linear in the grammar size, and the other running in time logarithmic in the grammar size. Both parallel algorithms were implemented on a Connection Machine CM-2 and performance measurements were obtained for varying grammar sizes.This research was supported in part by NSF Grant BNS-9022010, by the ARO Center for Excellence in Artificial Intelligence, University of Pennsylvania, and by the Army High Performance Computing Research Center (AHPCRC), University of Minnesota.  相似文献   

18.
A recognizing syntactic groups (SG) grammar, which corresponds to the formalism of systems of syntactic groups, is considered. The grammar was developed to describe the syntax of sentences in the natural language. The general criteria of separating out syntactic groups from a sequence of words, composing a text, and partial rules of the SG-grammar for the Russian and Ukrainian languages are described. Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 166–176, November–December, 1999.  相似文献   

19.
《Pattern recognition》1988,21(6):623-629
An edNLC-graph grammar, introduced by Janssens,(4) is a strong formalism for generating scene representations. This grammar generates directed node- and edge-labelled graphs, EDG-graphs. A method of construction of unambiguous string EDG-graph representation is briefly described. The characteristics of edNLC-graph grammar for syntactic pattern recognition allows us to construct the parsing algorithm. The deterministic top-down syntax analyzer is constructed for the subfamily of an edNLC-graph grammar, called an ETL/1-graph grammar. An ETL/1-graph grammar is parallel to a finite state string grammar. The notions introduced in the paper are useful for researches in less restricted edNLC-graph grammars, for example grammars analogical to context-free string grammars.  相似文献   

20.
This paper describes an improved version of TBL algorithm [Y. Sakakibara, Learning context-free grammars using tabular representations, Pattern Recognition 38(2005) 1372–1383; Y. Sakakibara, M. Kondo, GA-based learning of context-free grammars using tabular representations, in: Proceedings of 16th International Conference in Machine Learning (ICML-99), Morgan-Kaufmann, Los Altos, CA, 1999] for inference of context-free grammars in Chomsky Normal Form. The TBL algorithm is a novel approach to overcome the hardness of learning context-free grammars from examples without structural information available. The algorithm represents the grammars by parsing tables and thanks to this tabular representation the problem of grammar learning is reduced to the problem of partitioning the set of nonterminals. Genetic algorithm is used to solve NP-hard partitioning problem. In the improved version modified fitness function and new delete specialized operator is applied. Computer simulations have been performed to determine improved a tabular representation efficiency. The set of experiments has been divided into 2 groups: in the first one learning the unknown context-free grammar proceeds without any extra information about grammatical structure, in the second one learning is supported by a partial knowledge of the structure. In each of the performed experiments the influence of partition block size in an initial population and the size of population at grammar induction has been tested. The new version of TBL algorithm has been experimentally proved to be not so much vulnerable to block size and population size, and is able to find the solutions faster than standard one.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号