首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 828 毫秒
1.
逐步求精法获取上下文无关文法   总被引:3,自引:0,他引:3  
文法推断研究如何从语言的有限实例,通过归纳推断获取语言的文法定义。文中提出一个基于逐步求精的上下文无关文法推断方法,以尝试将文法推断用于替代或帮助传统手工的文法构造工作。文中的推断方法以Angluinh的交互式学习模型为框架,以逐步求精和复用为主要策略,具有增量式获取结构自然的文法的特点。  相似文献   

2.
This paper describes an improved version of TBL algorithm [Y. Sakakibara, Learning context-free grammars using tabular representations, Pattern Recognition 38(2005) 1372–1383; Y. Sakakibara, M. Kondo, GA-based learning of context-free grammars using tabular representations, in: Proceedings of 16th International Conference in Machine Learning (ICML-99), Morgan-Kaufmann, Los Altos, CA, 1999] for inference of context-free grammars in Chomsky Normal Form. The TBL algorithm is a novel approach to overcome the hardness of learning context-free grammars from examples without structural information available. The algorithm represents the grammars by parsing tables and thanks to this tabular representation the problem of grammar learning is reduced to the problem of partitioning the set of nonterminals. Genetic algorithm is used to solve NP-hard partitioning problem. In the improved version modified fitness function and new delete specialized operator is applied. Computer simulations have been performed to determine improved a tabular representation efficiency. The set of experiments has been divided into 2 groups: in the first one learning the unknown context-free grammar proceeds without any extra information about grammatical structure, in the second one learning is supported by a partial knowledge of the structure. In each of the performed experiments the influence of partition block size in an initial population and the size of population at grammar induction has been tested. The new version of TBL algorithm has been experimentally proved to be not so much vulnerable to block size and population size, and is able to find the solutions faster than standard one.  相似文献   

3.
The grammar of the language in which some given code is written is essential for developing automated tools for maintenance, reengineering, and program analysis. Frequently grammar is available for a language but not for its variants that are implemented by various vendors and in which the given code may be written. In this work we address the problem of obtaining the grammar from source code, which can then be used for generating tools for the programs. We propose an incremental method for obtaining grammar for a particular language variant, from a set of programs written in the language variant and an approximate grammar (presumably of the standard language) with some user interaction. We also present the design of a tool for implementing this approach and our experience in working with grammars of C, C++ and COBOL. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

4.
An unsupervised incremental algorithm for grammar inference and its application to domain-specific language development are described. Grammatical inference is the process of learning a grammar from the set of positive and optionally negative sentences. Learning general context-free grammars is still considered a hard problem in machine learning and is not completely solved yet. The main contribution of the paper is a newly developed memetic algorithm, which is a population-based evolutionary algorithm enhanced with local search and a generalization process. The learning process is incremental since a new grammar is obtained from the current grammar and false negative samples, which are not parsed by the current grammar. Despite being incremental, the learning process is not sensitive to the order of samples. All important parts of this algorithm are explained and discussed. Finally, a case study of a domain specific language for rendering graphical objects is used to show the applicability of this approach.  相似文献   

5.
Inference of high-dimensional grammars is discussed. Specifically, techniques for inferring tree grammars are briefly presented. The problem of inferring a stochastic grammar to model the behavior of an information source is also introduced and techniques for carrying out the inference process are presented for a class of stochastic finite-state and context-free grammars. The possible practical application of these methods is illustrated by examples.  相似文献   

6.
R. Lmmel  C. Verhoef 《Software》2001,31(15):1395-1438
We propose an approach to the construction of grammars for existing languages. The main characteristic of the approach is that the grammars are not constructed from scratch but they are rather recovered by extracting them from language references, compilers and other artifacts. We provide a structured process to recover grammars including the adaptation of raw extracted grammars and the derivation of parsers. The process is applicable to possibly all existing languages for which business critical applications exist. We illustrate the approach with a non‐trivial case study. Using our process and some basic tools, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. In addition, we constructed a parser for VS COBOL II, and were the first to publish a (Web‐enabled) grammar specification so that others can use this result to construct their own grammar‐based tools for VS COBOL II or derivatives. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

7.
The high complexity of natural language and the huge amount of human and temporal resources necessary for producing the grammars lead several researchers in the area of Natural Language Processing to investigate various solutions for automating grammar generation and updating processes. Many algorithms for Context-Free Grammar inference have been developed in the literature. This paper provides a survey of the methodologies for inferring context-free grammars from examples, developed by researchers in the last decade. After introducing some preliminary definitions and notations concerning learning and inductive inference, some of the most relevant existing grammatical inference methods for Natural Language are described and classified according to the kind of presentation (if text or informant) and the type of information (if supervised, unsupervised, or semi-supervised). Moreover, the state of the art of the strategies for evaluation and comparison of different grammar inference methods is presented. The goal of the paper is to provide a reader with introduction to major concepts and current approaches in Natural Language Learning research.  相似文献   

8.
Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.  相似文献   

9.
While grammar inference (or grammar induction) has found extensive application in the areas of robotics, computational biology, and speech recognition, its application to problems in programming language and software engineering domains has been limited. We have found a new application area for grammar inference which intends to make domain-specific language development easier for domain experts not well versed in programming language design, and finds a second application in construction of renovation tools for legacy software systems. As a continuation of our previous efforts to infer context-free grammars (CFGs) for domain-specific languages which previously involved a genetic-programming based CFG inference system, we discuss extensions to the inference capabilities of GenInc, an incremental learning algorithm for inferring CFGs. We show that these extensions enable GenInc to infer more comprehensive grammars, discuss the results of applying GenInc to various domain-specific languages and evaluate the results using a comprehensive suite of grammar metrics.  相似文献   

10.
Summary The present categorial approach to bottom-up parsing context free grammars treats two aspects of determinism. One is an abstraction of grammatical determinism from actual parsing strategies. The other is the transfer of determinism under grammar transformations. The approach is based on the characterization of a parse step as categorial limit, which on the one hand yields a convenient pattern for grammar type definition, and leads on the other hand in a transparent way to invariance results on deterministic grammars under homomorphic transformations.  相似文献   

11.
This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside-outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well.  相似文献   

12.
Recurrent neural networks processing symbolic strings can be regarded as adaptive neural parsers. Given a set of positive and negative examples, picked up from a given language, adaptive neural parsers can effectively be trained to infer the language grammar. In this paper we use adaptive neural parsers to face the problem of inferring grammars from examples that are corrupted by a kind of noise that simply changes their membership. We propose a training algorithm, referred to as hybrid finite state filter, which is based on a parsimony principle that penalizes the development of complex rules. We report very promising experimental results showing that the proposed inductive inference scheme is indeed capable of capturing rules, while removing noise.  相似文献   

13.
Grammars are traditionally used to recognize or parse sentences in a language, but they can also be used to generate sentences. In grammar‐based test generation (GBTG), context‐free grammars are used to generate sentences that are interpreted as test cases. A generator reads a grammar G and generates L(G), the language accepted by the grammar. Often L(G) is so large that it is not practical to execute all of the generated cases. Therefore, GBTG tools support ‘tags’: extra‐grammatical annotations which restrict the generation. Since its introduction in the early 1970s, GBTG has become well established: proven on industrial projects and widely published in academic venues. Despite the demonstrated effectiveness, the tool support is uneven; some tools target specific domains, e.g. compiler testing, while others are proprietary. The tools can be difficult to use and the precise meaning of the tags are sometimes unclear. As a result, while many testing practitioners and researchers are aware of GBTG, few have detailed knowledge or experience. We present YouGen, a new GBTG tool supporting many of the tags provided by previous tools. In addition, YouGen incorporates covering‐array tags, which support a generalized form of pairwise testing. These tags add considerable power to GBTG tools and have been available only in limited form in previous GBTG tools. We provide semantics for the YouGen tags using parse trees and a new construct, generation trees. We illustrate YouGen with both simple examples and a number of industrial case studies. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

14.
Attribute grammars are a powerful specification paradigm for many language processing tasks, particularly semantic analysis of programming languages. Recent attribute grammar systems use dynamic scheduling algorithms to evaluate attributes on demand. In this paper, we show how to remove the need for a generator, by embedding a dynamic approach in a modern, object-oriented and functional programming language. The result is a small, lightweight attribute grammar library that is part of our larger Kiama language processing library. Kiama’s attribute grammar library supports a range of advanced features including cached, uncached, higher order, parameterised and circular attributes. Forwarding is available to modularise higher order attributes and decorators abstract away from the details of attribute value propagation. Kiama also implements new techniques for dynamic extension and variation of attribute equations. We use the Scala programming language because of its support for domain-specific notations and emphasis on scalability. Unlike generators with specialised notation, Kiama attribute grammars use standard Scala notations such as pattern-matching functions for equations, traits and mixins for composition and implicit parameters for forwarding. A benchmarking exercise shows that our approach is practical for realistic language processing.  相似文献   

15.
递归概念可以在句子中重复派生、循环出现。对这样的句子推断时,若为递归概念的每一个派生部分引进一个递归概念来描述,将推断出多个与之有相似的产生式结构的递归概念,同时也构造出一个新文法。本文先给出新文法的形式化构造方法,证明了新文法与原文法的等价性。在文章的后部,通过实例,介绍该定理在简化复杂文法推断中的应用。  相似文献   

16.
Fuzzy context-free max- grammar (or FCFG, for short), as a straightforward extension of context-free grammar, has been introduced to express uncertainty, imprecision, and vagueness in natural language fragments. Li recently proposed the approximation of fuzzy finite automata, which may effectively deal with the practical problems of fuzziness, impreciseness and vagueness. In this paper, we further develop the approximation of fuzzy context-free grammars. In particular, we show that a fuzzy context-free grammar under max- compositional inference can be approximated by some fuzzy context-free grammar under max-min compositional inference with any given accuracy. In addition, some related properties of fuzzy context-free grammars and fuzzy languages generated by them are studied. Finally, the sensitivity of fuzzy context-free grammars is also discussed.  相似文献   

17.
One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts — newspaper reports — illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish.  相似文献   

18.
One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts—newspaper reports—illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish.  相似文献   

19.
Syntactic analysis forms a foundation of many source analysis and reverse engineering tools. However, a single standard grammar is not always appropriate for all source analysis and manipulation tasks. Small custom modifications to the grammar can make the programs used to implement these tasks simpler, clearer and more efficient. This leads to a new paradigm for programming these tools: agile parsing. In agile parsing the effective grammar used by a particular tool is a combination of two parts: the standard base grammar for the input language, and a set of explicit grammar overrides that modify the parse to support the task at hand. This paper introduces the basic techniques of agile parsing in TXL and discusses several industry proven techniques for exploiting agile parsing in software source analysis and transformation.  相似文献   

20.
正则文法是研究自动机的重要工具。引入取值于赋值幺半群的加权正则文法、加权类正则文法的定义,讨论了赋值幺半群上加权正则文法、加权类正则文法和加权有限自动机(WFA)的关系。证明了在赋值幺半群上,已知一个加权正则文法或加权类正则文法,分别存在一个WFA与之等价。定义了可分配的赋值幺半群,证明了在可分配的赋值幺半群上已知一个WFA,存在一个加权正则文法和加权类正则文法与之等价,即证明了可分配的赋值幺半群上加权正则文法、加权类正则文法和WFA在生成语言上等价,并举例说明了赋值幺半群的可分配性不是已知WFA存在与之等价的加权正则文法或加权类正则文法的必要条件。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号