期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

张瑞岭董韫美《计算机研究与发展》2000,37(1):30-36

文法推断研究如何从语言的有限实例,通过归纳推断获取语言的文法定义。文中提出一个基于逐步求精的上下文无关文法推断方法,以尝试将文法推断用于替代或帮助传统手工的文法构造工作。文中的推断方法以Ａｎｇｌｕｉｎｈ的交互式学习模型为框架,以逐步求精和复用为主要策略,具有增量式获取结构自然的文法的特点。相似文献

2.

Learning context-free grammar using improved tabular representation

Olgierd Unold Marcin Jaworski 《Applied Soft Computing》2010,10(1):44-52

This paper describes an improved version of TBL algorithm [Y. Sakakibara, Learning context-free grammars using tabular representations, Pattern Recognition 38(2005) 1372–1383; Y. Sakakibara, M. Kondo, GA-based learning of context-free grammars using tabular representations, in: Proceedings of 16th International Conference in Machine Learning (ICML-99), Morgan-Kaufmann, Los Altos, CA, 1999] for inference of context-free grammars in Chomsky Normal Form. The TBL algorithm is a novel approach to overcome the hardness of learning context-free grammars from examples without structural information available. The algorithm represents the grammars by parsing tables and thanks to this tabular representation the problem of grammar learning is reduced to the problem of partitioning the set of nonterminals. Genetic algorithm is used to solve NP-hard partitioning problem. In the improved version modified fitness function and new delete specialized operator is applied. Computer simulations have been performed to determine improved a tabular representation efficiency. The set of experiments has been divided into 2 groups: in the first one learning the unknown context-free grammar proceeds without any extra information about grammatical structure, in the second one learning is supported by a partial knowledge of the structure. In each of the performed experiments the influence of partition block size in an initial population and the size of population at grammar induction has been tested. The new version of TBL algorithm has been experimentally proved to be not so much vulnerable to block size and population size, and is able to find the solutions faster than standard one. 相似文献

3.

An interactive method for extracting grammar from programs

Rahul Jain Sanjeev Kumar Aggarwal Pankaj Jalote Shiladitya Biswas 《Software》2004,34(5):433-447

The grammar of the language in which some given code is written is essential for developing automated tools for maintenance, reengineering, and program analysis. Frequently grammar is available for a language but not for its variants that are implemented by various vendors and in which the given code may be written. In this work we address the problem of obtaining the grammar from source code, which can then be used for generating tools for the programs. We propose an incremental method for obtaining grammar for a particular language variant, from a set of programs written in the language variant and an approximate grammar (presumably of the standard language) with some user interaction. We also present the design of a tool for implementing this approach and our experience in working with grammars of C, C++ and COBOL. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

4.

A memetic grammar inference algorithm for language learning

Dejan Hrn?i? Marjan Mernik Barrett R. BryantFaizan Javed 《Applied Soft Computing》2012,12(3):1006-1020

An unsupervised incremental algorithm for grammar inference and its application to domain-specific language development are described. Grammatical inference is the process of learning a grammar from the set of positive and optionally negative sentences. Learning general context-free grammars is still considered a hard problem in machine learning and is not completely solved yet. The main contribution of the paper is a newly developed memetic algorithm, which is a population-based evolutionary algorithm enhanced with local search and a generalization process. The learning process is incremental since a new grammar is obtained from the current grammar and false negative samples, which are not parsed by the current grammar. Despite being incremental, the learning process is not sensitive to the order of samples. All important parts of this algorithm are explained and discussed. Finally, a case study of a domain specific language for rendering graphical objects is used to show the applicability of this approach. 相似文献

5.

Grammatical Inference: Introduction and Survey-Part II

Fu KS Booth TL 《IEEE transactions on pattern analysis and machine intelligence》1986,(3):360-375

Inference of high-dimensional grammars is discussed. Specifically, techniques for inferring tree grammars are briefly presented. The problem of inferring a stochastic grammar to model the behavior of an information source is also introduced and techniques for carrying out the inference process are presented for a class of stochastic finite-state and context-free grammars. The possible practical application of these methods is illustrated by examples. 相似文献

6.

Semi‐automatic grammar recovery

R. Lmmel C. Verhoef 《Software》2001,31(15):1395-1438

We propose an approach to the construction of grammars for existing languages. The main characteristic of the approach is that the grammars are not constructed from scratch but they are rather recovered by extracting them from language references, compilers and other artifacts. We provide a structured process to recover grammars including the adaptation of raw extracted grammars and the derivation of parsers. The process is applicable to possibly all existing languages for which business critical applications exist. We illustrate the approach with a non‐trivial case study. Using our process and some basic tools, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. In addition, we constructed a parser for VS COBOL II, and were the first to publish a (Web‐enabled) grammar specification so that others can use this result to construct their own grammar‐based tools for VS COBOL II or derivatives. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

7.

A survey of grammatical inference methods for natural language learning

Arianna D’Ulizia Fernando Ferri Patrizia Grifoni 《Artificial Intelligence Review》2011,36(1):1-27

The high complexity of natural language and the huge amount of human and temporal resources necessary for producing the grammars lead several researchers in the area of Natural Language Processing to investigate various solutions for automating grammar generation and updating processes. Many algorithms for Context-Free Grammar inference have been developed in the literature. This paper provides a survey of the methodologies for inferring context-free grammars from examples, developed by researchers in the last decade. After introducing some preliminary definitions and notations concerning learning and inductive inference, some of the most relevant existing grammatical inference methods for Natural Language are described and classified according to the kind of presentation (if text or informant) and the type of information (if supervised, unsupervised, or semi-supervised). Moreover, the state of the art of the strategies for evaluation and comparison of different grammar inference methods is presented. The goal of the paper is to provide a reader with introduction to major concepts and current approaches in Natural Language Learning research. 相似文献

8.

Recovering grammar relationships for the Java Language Specification

Ralf Lämmel Vadim Zaytsev 《Software Quality Journal》2011,19(2):333-378

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent. 相似文献

9.

AN UNSUPERVISED INCREMENTAL LEARNING ALGORITHM FOR DOMAIN-SPECIFIC LANGUAGE DEVELOPMENT

Faizan Javed Marjan Mernik Alan Sprague 《Applied Artificial Intelligence》2013,27(7-8):707-729

While grammar inference (or grammar induction) has found extensive application in the areas of robotics, computational biology, and speech recognition, its application to problems in programming language and software engineering domains has been limited. We have found a new application area for grammar inference which intends to make domain-specific language development easier for domain experts not well versed in programming language design, and finds a second application in construction of renovation tools for legacy software systems. As a continuation of our previous efforts to infer context-free grammars (CFGs) for domain-specific languages which previously involved a genetic-programming based CFG inference system, we discuss extensions to the inference capabilities of GenInc, an incremental learning algorithm for inferring CFGs. We show that these extensions enable GenInc to infer more comprehensive grammars, discuss the results of applying GenInc to various domain-specific languages and evaluate the results using a comprehensive suite of grammar metrics. 相似文献

10.

Deterministic grammars and grammar morphisms

Dr. Johannes Reichardt 《Acta Informatica》1986,23(5):567-583

Summary The present categorial approach to bottom-up parsing context free grammars treats two aspects of determinism. One is an abstraction of grammatical determinism from actual parsing strategies. The other is the transfer of determinism under grammar transformations. The approach is based on the characterization of a parse step as categorial limit, which on the one hand yields a convenient pattern for grammar type definition, and leads on the other hand in a transparent way to invariance results on deterministic grammars under homomorphic transformations. 相似文献

11.

Evolutionary induction of stochastic context free grammars

Bill Keller Author Vitae Author Vitae 《Pattern recognition》2005,38(9):1393-1406

This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside-outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well. 相似文献

12.

Inductive inference from noisy examples using the hybrid finitestate filter

Gori M. Maggini M. Martinelli E. Soda G. 《Neural Networks, IEEE Transactions on》1998,9(3):571-575

Recurrent neural networks processing symbolic strings can be regarded as adaptive neural parsers. Given a set of positive and negative examples, picked up from a given language, adaptive neural parsers can effectively be trained to infer the language grammar. In this paper we use adaptive neural parsers to face the problem of inferring grammars from examples that are corrupted by a kind of noise that simply changes their membership. We propose a training algorithm, referred to as hybrid finite state filter, which is based on a parsimony principle that penalizes the development of complex rules. We report very promising experimental results showing that the proposed inductive inference scheme is indeed capable of capturing rules, while removing noise. 相似文献

13.

Grammar‐based test generation with YouGen

Daniel Malcolm Hoffman David Ly‐Gagnon Paul Strooper Hong‐Yi Wang 《Software》2011,41(4):427-447

Grammars are traditionally used to recognize or parse sentences in a language, but they can also be used to generate sentences. In grammar‐based test generation (GBTG), context‐free grammars are used to generate sentences that are interpreted as test cases. A generator reads a grammar G and generates L(G), the language accepted by the grammar. Often L(G) is so large that it is not practical to execute all of the generated cases. Therefore, GBTG tools support ‘tags’: extra‐grammatical annotations which restrict the generation. Since its introduction in the early 1970s, GBTG has become well established: proven on industrial projects and widely published in academic venues. Despite the demonstrated effectiveness, the tool support is uneven; some tools target specific domains, e.g. compiler testing, while others are proprietary. The tools can be difficult to use and the precise meaning of the tags are sometimes unclear. As a result, while many testing practitioners and researchers are aware of GBTG, few have detailed knowledge or experience. We present YouGen, a new GBTG tool supporting many of the tags provided by previous tools. In addition, YouGen incorporates covering‐array tags, which support a generalized form of pairwise testing. These tags add considerable power to GBTG tools and have been available only in limited form in previous GBTG tools. We provide semantics for the YouGen tags using parse trees and a new construct, generation trees. We illustrate YouGen with both simple examples and a number of industrial case studies. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

14.

A pure embedding of attribute grammars

Anthony M. Sloane Lennart C.L. Kats Eelco Visser 《Science of Computer Programming》2013

Attribute grammars are a powerful specification paradigm for many language processing tasks, particularly semantic analysis of programming languages. Recent attribute grammar systems use dynamic scheduling algorithms to evaluate attributes on demand. In this paper, we show how to remove the need for a generator, by embedding a dynamic approach in a modern, object-oriented and functional programming language. The result is a small, lightweight attribute grammar library that is part of our larger Kiama language processing library. Kiama’s attribute grammar library supports a range of advanced features including cached, uncached, higher order, parameterised and circular attributes. Forwarding is available to modularise higher order attributes and decorators abstract away from the details of attribute value propagation. Kiama also implements new techniques for dynamic extension and variation of attribute equations. We use the Scala programming language because of its support for domain-specific notations and emphasis on scalability. Unlike generators with specialised notation, Kiama attribute grammars use standard Scala notations such as pattern-matching functions for equations, traits and mixins for composition and implicit parameters for forwarding. A benchmarking exercise shows that our approach is practical for realistic language processing. 相似文献

15.

一个上下文无关文法的构造定理及其在推断中的应用

张雪峰许丽娟罗洪霞党德玉《计算机与现代化》2008,(6):4-7

递归概念可以在句子中重复派生、循环出现。对这样的句子推断时,若为递归概念的每一个派生部分引进一个递归概念来描述,将推断出多个与之有相似的产生式结构的递归概念,同时也构造出一个新文法。本文先给出新文法的形式化构造方法,证明了新文法与原文法的等价性。在文章的后部,通过实例,介绍该定理在简化复杂文法推断中的应用。相似文献

16.

Approximation of fuzzy context-free grammars

Yongbing Wang Yongming Li 《Information Sciences》2009,179(22):3920-3929

Fuzzy context-free max- grammar (or FCFG, for short), as a straightforward extension of context-free grammar, has been introduced to express uncertainty, imprecision, and vagueness in natural language fragments. Li recently proposed the approximation of fuzzy finite automata, which may effectively deal with the practical problems of fuzziness, impreciseness and vagueness. In this paper, we further develop the approximation of fuzzy context-free grammars. In particular, we show that a fuzzy context-free grammar under max- compositional inference can be approximated by some fuzzy context-free grammar under max-min compositional inference with any given accuracy. In addition, some related properties of fuzzy context-free grammars and fuzzy languages generated by them are studied. Finally, the sensitivity of fuzzy context-free grammars is also discussed. 相似文献

17.

The potential of Swetra — A multilanguage MT system

Bengt Sigurd Barbara Gawronska-Werngren 《Machine Translation》1988,3(2):237-250

One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts — newspaper reports — illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish. 相似文献

18.

The potential of swetra—A multilanguage MT system

Bengt Sigurd Barbara Gawronska-Werngren 《Machine Translation》1988,3(3-4):237-250

One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts—newspaper reports—illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish. 相似文献

19.

Agile Parsing in TXL

Thomas R. Dean James R. Cordy Andrew J. Malton Kevin A. Schneider 《Automated Software Engineering》2003,10(4):311-336

Syntactic analysis forms a foundation of many source analysis and reverse engineering tools. However, a single standard grammar is not always appropriate for all source analysis and manipulation tasks. Small custom modifications to the grammar can make the programs used to implement these tasks simpler, clearer and more efficient. This leads to a new paradigm for programming these tools: agile parsing. In agile parsing the effective grammar used by a particular tool is a combination of two parts: the standard base grammar for the input language, and a set of explicit grammar overrides that modify the parse to support the task at hand. This paper introduces the basic techniques of agile parsing in TXL and discusses several industry proven techniques for exploiting agile parsing in software source analysis and transformation. 相似文献

20.

取值于赋值幺半群的加权正则文法语言

赵菲李永明《计算机工程与科学》2016,38(7):1405-1412

正则文法是研究自动机的重要工具。引入取值于赋值幺半群的加权正则文法、加权类正则文法的定义,讨论了赋值幺半群上加权正则文法、加权类正则文法和加权有限自动机(WFA)的关系。证明了在赋值幺半群上,已知一个加权正则文法或加权类正则文法,分别存在一个WFA与之等价。定义了可分配的赋值幺半群,证明了在可分配的赋值幺半群上已知一个WFA,存在一个加权正则文法和加权类正则文法与之等价,即证明了可分配的赋值幺半群上加权正则文法、加权类正则文法和WFA在生成语言上等价,并举例说明了赋值幺半群的可分配性不是已知WFA存在与之等价的加权正则文法或加权类正则文法的必要条件。相似文献