Similar literature
1.
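Sentence generation for context-free languages based on partitioning the production set   Total citations: 2, self-citations: 0, citations by others: 2
王泓皓, 董韫美. Journal of Software (《软件学报》), 2000, 11(8): 1030-1034
A method is presented for partitioning the production set of a context-free grammar (CFG) into two classes. When derivations use one class of productions, the derivation process continues indefinitely; when they use the other class, it terminates quickly. It is proved that generating a CFG sentence always first uses one class of productions to make the sentential form grow longer and more complex, and then uses the other class to turn the sentential form into a sentence. On this basis, a controllable, general-purpose sentence generation method is proposed. Its time and space complexity for generating one sentence is O(r+n), where n is the length or depth limit of the generated sentence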
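
The two-phase idea in this abstract (grow the sentential form, then close it) can be illustrated with a minimal Python sketch. It is not the authors' partition or algorithm: the toy grammar `GRAMMAR`, the helper names, and the choice of "the production with the smallest derivation height" as the closing production are all assumptions made for illustration, and every nonterminal is assumed to derive at least one terminal string.

```python
import random

# Hypothetical toy grammar (balanced parentheses); keys are nonterminals.
GRAMMAR = {
    "S": [["(", "S", ")", "S"], []],
}

def min_heights(grammar):
    """Fixpoint computation of the smallest derivation-tree height per nonterminal."""
    height = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                h = 1 + max((height[s] for s in prod if s in grammar), default=0)
                if h < height[nt]:
                    height[nt] = h
                    changed = True
    return height

def closing_production(grammar, height, nt):
    """Assumed 'terminating' choice: the production whose subtrees are shallowest."""
    return min(
        grammar[nt],
        key=lambda p: max((height[s] for s in p if s in grammar), default=0),
    )

def generate(grammar, symbol, budget, rng, height=None):
    """Expand randomly while the depth budget lasts, then close as fast as possible."""
    height = height or min_heights(grammar)
    if symbol not in grammar:          # terminal symbol
        return [symbol]
    if budget > 0:
        prod = rng.choice(grammar[symbol])                  # growth phase
    else:
        prod = closing_production(grammar, height, symbol)  # closing phase
    out = []
    for s in prod:
        out.extend(generate(grammar, s, budget - 1, rng, height))
    return out
```

Calling `generate(GRAMMAR, "S", 5, random.Random(0))` returns a list of terminal symbols; once the budget reaches zero, each closing step strictly reduces the remaining derivation height, so generation always terminates under the stated assumption.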

2.
A cost function is developed, based on information-theoretic concepts, that measures the complexity of a stochastic context-free grammar, as well as the discrepancy between its language and a given stochastic language sample. This function is used to guide a search procedure that finds simple grammars whose languages are good fits to a sample. Reasonable results have been obtained in a variety of cases, including parenthesis and addition strings, Basic English (the first 25 sentences in English Through Pictures) and chain-encoded chromosome boundaries.

3.
This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside-outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well.

5.
Use of executable declarative metalanguages has simplified programming language syntax specification and implementation, whereas existing formalisms for static semantics are still relatively procedural. A working hypothesis is that the context-sensitivity of languages (under static semantic rules) is derived in significant part from the interleaved presence therein of sentences in implicitly-defined and effectively invisible context-free languages. Procedures by which these sentences and context-free grammars for their languages can be respectively derived from the original sentence and the combination of the original language's grammar and semantic rules, lead to the possibility of automatic generation of static semantic analysers from the purely context-free specifications of “Facet Grammars” (FG).

We show that the utility of FG for static semantic analysis has a non-trivial lower bound, by specifying the relatively complicated identifier scope and accessibility rules for Dijkstra's Guarded Commands Language.


6.
Summary: Simple LR(1) and lookahead LR(1) phrase structure grammars are defined and corresponding deterministic two-pushdown automata which parse all sentences are given. These grammars include a wide variety of grammars for non-context-free languages. A given phrase structure grammar is one of these types if the parse table for the associated automaton has no multiple entries. A technique for construction of this parse table is given which in the lookahead case involves elimination of inverses in a grammar for lookahead strings for LR(0) items and computation of first sets for strings of symbols in the given grammar.

7.
Kaleidoscope's approach is presented in the context of seeking improvement in the usability of interactive structured query language (SQL) interfaces. The system's cooperation is summarized as proposing valid query constituents step-by-step and providing lexical and semantic feedback immediately to users. To implement this intraquery guidance, the context-free grammar (CFG) is extended to capture the constraints useful for intraquery guidance, and the knowledge useful for pruning nonsensical queries and providing semantic feedback is articulated. For the SQL interface, this knowledge includes a strong domain concept, functional dependency, and integrity constraint rules, which can be acquired once in the database design step. The same types of knowledge are useful both for postquery cooperation and intraquery guidance. As SQL is supported by virtually all database management system (DBMS) vendors, the approach presents a practical solution for casual database access.

8.
Abstract: This paper presents a simple connectionist approach to parsing of a subset of sentences in the Hindi language, using Rule based Connectionist Networks (RBCN) as suggested by Fu in 1993. The basic grammar rules representing Kernel Hindi sentences have been used to determine the initial topology of the RBCN. The RBCN is based on a multilayer perceptron, trained using the backpropagation algorithm. The terminal symbols defined in the language structure are mapped onto the input nodes, the non-terminals onto hidden nodes and the start symbol onto the single output node of the network structure. The training instances are sentences of arbitrary, but fixed maximum length and fixed word order. A neural network based recognizer is used to perform grammaticality determination and parse tree generation of a given sentence. The network is exposed to both positive and negative training instances, derived from a simple context-free-grammar (CFG), during the training phase. The trained network recognizes seen sentences (sentences present in the training set) with 98–100% accuracy. Since a neural net based recognizer is trainable in nature, it can be trained to recognize any other CFG, simply by changing the training set. This results in reducing programming effort involved in parser development, as compared to that of the conventional AI approach. The parsing time is also reduced to a great extent as compared to that of a conventional parser, as a result of the inherent parallelism exhibited by neural net architecture.

9.
This study reports on the requirements for developing computer-interpretable rules for checking the compliance of a building design in a request for proposal (RFP), especially in the building information modeling (BIM) environment. It focuses on RFPs for large public buildings (over 5 million dollars) in South Korea, which generally entail complex designs. A total of 27 RFPs for housing, office, exhibition, hospital, sports center, and courthouse projects were analyzed to develop computer-interpreted RFP rules. Each RFP was composed of over 1800 sentences. Of these, only three to 366 sentences could be translated into a computer-interpretable sentence. For further analysis, this study deployed context-free grammar (CFG) in natural language processing, and classified morphemes into four categories: i.e., object (noun), method (verb), strictness (modal), and others. The subcategorized morphemes included three types of objects, twenty-nine types of methods, and five levels of strictness. The coverage applicability of the derived objects and methods was checked and validated against three additional RFP cases and then through a test case using a newly developed model checker system. The findings are expected to be useful as a guideline and basic data for system developers in the development of a generalized automated design checking system for South Korea.

10.
While grammar inference (or grammar induction) has found extensive application in the areas of robotics, computational biology, and speech recognition, its application to problems in programming language and software engineering domains has been limited. We have found a new application area for grammar inference which intends to make domain-specific language development easier for domain experts not well versed in programming language design, and finds a second application in construction of renovation tools for legacy software systems. As a continuation of our previous efforts to infer context-free grammars (CFGs) for domain-specific languages which previously involved a genetic-programming based CFG inference system, we discuss extensions to the inference capabilities of GenInc, an incremental learning algorithm for inferring CFGs. We show that these extensions enable GenInc to infer more comprehensive grammars, discuss the results of applying GenInc to various domain-specific languages and evaluate the results using a comprehensive suite of grammar metrics.

11.
International Journal of Computer Mathematics, 2012, 89(3-4): 159-180
We investigate context-free grammars the rules of which can be used in a productive and in a reductive fashion, while the application of these rules is controlled by a regular language. We distinguish several modes of derivation for this kind of grammar. The resulting language families (properly) extend the family of context-free languages. We establish some closure properties of these language families and some grammatical transformations which yield a few normal forms for this type of grammar. Finally, we consider some special cases (viz. the context-free grammar is linear or left-linear), and generalizations, in particular, the use of arbitrary rather than regular control languages.

12.
Long-range word order differences are a well-known problem for machine translation. Unlike the standard phrase-based models which work with sequential and local phrase reordering, the hierarchical phrase-based model (Hiero) embeds the reordering of phrases within pairs of lexicalized context-free rules. This allows the model to handle long range reordering recursively. However, the Hiero grammar works with a single nonterminal label, which means that the rules are combined together into derivations independently and without reference to context outside the rules themselves. Follow-up work explored remedies involving nonterminal labels obtained from monolingual parsers and taggers. As of yet, no labeling mechanisms exist for the many languages for which there are no good quality parsers or taggers. In this paper we contribute a novel approach for acquiring reordering labels for Hiero grammars directly from the word-aligned parallel training corpus, without use of any taggers or parsers. The new labels represent types of alignment patterns in which a phrase pair is embedded within larger phrase pairs. In order to obtain alignment patterns that generalize well, we propose to decompose word alignments into trees over phrase pairs. Besides this labeling approach, we contribute coarse and sparse features for learning soft, weighted label-substitution as opposed to standard substitution. We report extensive experiments comparing our model to two baselines: Hiero and the known syntax augmented machine translation (SAMT) variant, which labels Hiero rules with nonterminals extracted from monolingual syntactic parses. We also test a simplified labeling scheme based on inversion transduction grammar (ITG). For the Chinese–English task we obtain performance improvement up to 1 BLEU point, whereas for the German–English task, where morphology is an issue, a minor (but statistically significant) improvement of 0.2 BLEU points is reported over SAMT. While ITG labeling does give a performance improvement, it remains sometimes suboptimal relative to our proposed labeling scheme.

14.
In this paper we present an investigation into whether and how decision procedures can be learnt and built automatically. Our approach consists of two stages. First, a refined brute-force search procedure applies exhaustively a set of given elementary methods to try to solve a corpus of conjectures generated by a stochastic context-free grammar. The successful proof traces are saved. In the second stage, a learning algorithm (by Jamnik et al.) tries to extract a required supermethod (i.e., decision procedure) from the given traces. In the paper, this technique is applied to elementary methods that encode the operations of the Fourier-Motzkin's decision procedure for Presburger arithmetic on rational numbers. The results of our experiment are encouraging.

17.
An increasing number of structural homology search tools, mostly based on profile stochastic context-free grammars (SCFGs), have been recently developed for non-coding RNA gene identification. SCFGs can include statistical biases that often occur in RNA sequences, necessary to profile specific RNA structures for structural homology search. In this paper, a succinct stochastic grammar model is introduced for RNA that has competitive search effectiveness. More importantly, the profiling model can be easily extended to include pseudoknots, structures that are beyond the capability of profile SCFGs. In addition, the model allows heuristics to be exploited, resulting in a significant speed-up for the CYK algorithm-based search.
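
For readers unfamiliar with the CYK algorithm mentioned in this abstract, here is a minimal Python sketch of plain (non-stochastic) CYK recognition for a grammar in Chomsky normal form. It is not the paper's profiling model: the function name `cyk_accepts`, the rule encoding, and the toy grammar for a^n b^n (whose nesting loosely resembles nested base pairs) are assumptions chosen for illustration.

```python
def cyk_accepts(word, terminal_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives `word` (classic O(n^3) dynamic programme)."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar without an epsilon rule derives no empty string
    # table[length][i] holds the nonterminals that derive word[i:i + length]
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        table[1][i] = {lhs for lhs, t in terminal_rules if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split][i]
                right = table[length - split][i + split]
                for lhs, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[length][i].add(lhs)
    return start in table[n][0]

# Hypothetical CNF grammar for { a^n b^n : n >= 1 }
TERMINAL_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]
```

A stochastic variant would store the best probability per nonterminal in each cell instead of a set; the heuristics the abstract refers to would then prune cells of this table.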

18.
The usual approach to numerical training in syntactic pattern recognition involves estimation of the production probabilities of stochastic context-free grammars. Here we describe a different approach, namely that of finding an LMSE discriminant hyperplane between sets of class samples in a space of "structural indices" determined by a context-free grammar.

19.
A context-free grammar is said to be NTS if the set of sentential forms it generates is unchanged when the rules are used both ways. We prove that this class of grammars has a decidable equivalence problem. Then we show that one can decide whether a given context-free grammar is NTS or not. We prove that the class of NTS grammars has an undecidable inclusion problem.

20.
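Writing SQL statements is an important part of testing a database management system. Generating SQL statements automatically can substantially reduce testers' workload, yet there is currently no automated tool that generates SQL statements directly. By simulating the direct derivation process of productions and following the SQL grammar, a method is given for generating SQL statements that conform to the grammar for use as test cases, and the automated process of generating a set of SQL test cases from a BNF file describing the grammar is studied. The process has several stages: each nonterminal of the SQL grammar is converted into a corresponding parsing function, and the set of all parsing functions forms the rule base; the grammar's productions are traversed to generate SQL test cases automatically; a weight array combined with random numbers increases the flexibility of test-case generation; and a maximum call count per nonterminal is used to terminate generation. The prototype tool described can produce SQL test cases that conform to SQL syntax.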
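
The stages listed in this abstract (weighted random choice among productions, plus a per-nonterminal call limit to force termination) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool described: the miniature grammar `GRAMMAR`, the limit `MAX_CALLS`, the function names, and the convention that the first alternative of every nonterminal is non-recursive are all hypothetical.

```python
import random

# Hypothetical miniature SQL grammar: nonterminal -> list of (weight, production).
# Assumed convention: the FIRST alternative of each nonterminal is non-recursive.
GRAMMAR = {
    "query": [(1, ["SELECT", "cols", "FROM", "table", "where"])],
    "cols": [(2, ["col"]), (1, ["col", ",", "cols"])],
    "col": [(1, ["id"]), (1, ["name"])],
    "table": [(1, ["users"])],
    "where": [(1, []), (1, ["WHERE", "cond"])],
    "cond": [(2, ["col", "=", "1"]), (1, ["cond", "AND", "cond"])],
}
MAX_CALLS = 8  # assumed per-nonterminal expansion limit

def expand(symbol, rng, calls):
    """Expand one symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:          # terminal token
        return [symbol]
    calls[symbol] = calls.get(symbol, 0) + 1
    alternatives = GRAMMAR[symbol]
    if calls[symbol] > MAX_CALLS:
        # Limit reached: take the non-recursive first alternative so generation stops.
        production = alternatives[0][1]
    else:
        weights = [w for w, _ in alternatives]
        productions = [p for _, p in alternatives]
        production = rng.choices(productions, weights=weights, k=1)[0]
    out = []
    for s in production:
        out.extend(expand(s, rng, calls))
    return out

def generate_sql(seed):
    """Generate one SQL test case; the same seed reproduces the same statement."""
    rng = random.Random(seed)
    return " ".join(expand("query", rng, {}))
```

Calling `generate_sql(seed)` for a range of seeds yields a reproducible set of statements; raising a weight makes the corresponding production more frequent, and lowering `MAX_CALLS` shortens the generated statements.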
