期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Probabilistic context-free grammars estimated from infinite distributions

Corazza A Satta G 《IEEE transactions on pattern analysis and machine intelligence》2007,29(8):1379-1393

In this paper, we consider probabilistic context-free grammars, a class of generative devices that has been successfully exploited in several applications of syntactic pattern matching, especially in statistical natural language parsing. We investigate the problem of training probabilistic context-free grammars on the basis of distributions defined over an infinite set of trees or an infinite set of sentences by minimizing the cross-entropy. This problem has applications in cases of context-free approximation of distributions generated by more expressive statistical models. We show several interesting theoretical properties of probabilistic context-free grammars that are estimated in this way, including the previously unknown equivalence between the grammar cross-entropy with the input distribution and the so-called derivational entropy of the grammar itself. We discuss important consequences of these results involving the standard application of the maximum-likelihood estimator on finite tree and sentence samples, as well as other finite-state models such as hidden Markov models and probabilistic finite automata. 相似文献

2.

On parsing arithmetic expressions in a multiprocessing environment

François Baccelli Thierry Fleury 《Acta Informatica》1982,17(3):287-310

Summary Some decomposition of the parsing of the sentences of context-free grammars into sequences of independant sub-tasks is proposed. An example of grammar is presented, for which this decomposition provides an efficient parser for a multiprocessing environment. The average speed-up resulting from the parallelization of the parser of an arithmetic infix grammar is evaluated by means of probabilistic models and real world measurements. 相似文献

3.

基于概率上下文无关文法的句法分析歧义消解新模式 总被引：2，自引：1，他引：2

陈宣孔骏陈玉泉陆汝占《计算机工程》2002,28(2):126-128

基于自然语言句法歧义消解常用的一种概率模型-概率上下文无关文法，融入上下文相关的概率信息，提出一种新的歧义消解计算模式，该模式经测试可以有效地提高句法分析中歧义消解的正确率。相似文献

4.

Latent Variable Model for Estimation of Distribution Algorithm Based on a Probabilistic Context-Free Grammar

《Evolutionary Computation, IEEE Transactions on》2009,13(4):858-878

Estimation of distribution algorithms are evolutionary algorithms using probabilistic techniques instead of traditional genetic operators. Recently, the application of probabilistic techniques to program and function evolution has received increasing attention, and this approach promises to provide a strong alternative to the traditional genetic programming techniques. Although a probabilistic context-free grammar (PCFG) is a widely used model for probabilistic program evolution, a conventional PCFG is not suitable for estimating interactions among nodes because of the context freedom assumption. In this paper, we have proposed a new evolutionary algorithm named programming with annotated grammar estimation based on a PCFG with latent annotations, which allows this context freedom assumption to be weakened. By applying the proposed algorithm to several computational problems, it is demonstrated that our approach is markedly more effective at estimating building blocks than prior approaches. 相似文献

5.

Approximation of fuzzy context-free grammars

Yongbing Wang Yongming Li 《Information Sciences》2009,179(22):3920-3929

Fuzzy context-free max- grammar (or FCFG, for short), as a straightforward extension of context-free grammar, has been introduced to express uncertainty, imprecision, and vagueness in natural language fragments. Li recently proposed the approximation of fuzzy finite automata, which may effectively deal with the practical problems of fuzziness, impreciseness and vagueness. In this paper, we further develop the approximation of fuzzy context-free grammars. In particular, we show that a fuzzy context-free grammar under max- compositional inference can be approximated by some fuzzy context-free grammar under max-min compositional inference with any given accuracy. In addition, some related properties of fuzzy context-free grammars and fuzzy languages generated by them are studied. Finally, the sensitivity of fuzzy context-free grammars is also discussed. 相似文献

6.

层级分类概率句法分析 总被引：3，自引：0，他引：3

代印唐吴承荣马胜祥钟亦平《软件学报》2011,22(2):245-257

对已有的句法分析中引入知识的方法进行了归纳分析,认为多种句法分析方法都可被看作是基于特征标记的分类,然后分析了其中的欠分类和过分类问题.在此基础上,提出一种层级分类短语结构文法和一种层级分类概率句法分析方法(hierarchically classified probabilistic context-free grammar),并设计了一种通过对实例进行聚类来消除句法规则的分类歧义方法.还进一步将层级分类扩展到概率上下文相关句法分析方法,利用上下文相关性的层级分类来解决引入上下文相关时的数据稀疏性问题.通过上述一系列方法有效地克服了过分类与前分类之间的矛盾. 相似文献

7.

YACC技术在网络入侵检测中的应用

黄金钟朱淼良《计算机应用与软件》2011,(10)

介绍一种利用YACC(Yet Another Compiler-Compiler)技术实现检测网络服务器程序异常行为的新方法。该方法用一种携带语义标注的上下文无关文法描述服务器程序正常行为模式,利用YACC自动生成的语法分析器构成异常检测引擎,并利用YACC提供的错误处理和语义处理接口对异常现场进行分析。实验结果表明,该方法不仅能有效检测各种利用服务器程序漏洞进行的缓冲区溢出、堆内存破环等入侵方式,而且能实时地对异常行为进行分析追踪并向安全管理人员提供入侵相关详细信息,而这种能力正是目前同类方法所缺乏的。相似文献

8.

A Two-Stage BNF Specification of Natural Language

Naomi Sager 《控制论与系统》2013,44(3):39-50

Abstract

This paper presents a method of using BNF to specify natural language in such a way that a relatively small grammar of English can express the major grammatical constraints of the language and can be refined without undue proliferation of the rules. The results show that the departures of natural language from a context-free language are of a very restricted kind. The analysis obtained for sentences of the scientific literature is relevant for information processing 相似文献

9.

Real-word error correction with trigrams: correcting multiple errors in a sentence

Seyed MohammadSadegh Dashti 《Language Resources and Evaluation》2018,52(2):485-502

Spelling correction is a fundamental task in text mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several drawbacks of the model. We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence, by manipulating a probabilistic context-free grammar to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky’s WordNet-based method and Wilcox-O’Hearn, Hirst, and Budanitsky’s fixed windows size method. 相似文献

10.

Learning context-free grammars using tabular representations

Yasubumi Sakakibara^{Author Vitae} 《Pattern recognition》2005,38(9):1372-1383

We present a novel algorithm using new hypothesis representations for learning context-free grammars from a finite set of positive and negative examples. We propose an efficient hypothesis representation method which consists of a table-like data structure similar to the parse table used in efficient parsing algorithms for context-free grammars such as Cocke-Younger-Kasami algorithm. By employing this representation method, the problem of learning context-free grammars from examples can be reduced to the problem of partitioning the set of nonterminals. We use genetic algorithms for solving this partitioning problem. Further, we incorporate partially structured examples to improve the efficiency of our learning algorithm, where a structured example is represented by a string with some parentheses inserted to indicate the shape of the derivation tree of the unknown grammar. We demonstrate some experimental results using these algorithms and theoretically analyse the completeness of the search space using the tabular method for context-free grammars. 相似文献

11.

基于二元组合文法的歧义消解模型

下载免费PDF全文

张燕万建成杨潇《计算机工程与科学》2008,30(9):73-76

本文论述了一种基于二元组合文法的汉语句法结构分析的消除歧义方法。首先给出了二元组合文法的基本概念以及基本思想,然后研究了概率上下文无关文法独立性假设的限性,并针对局限性引入了基于二元组合文法的上下文相关的概率信息,同时提出了一种新的计算分值模式。实验结果证明,这种方法对句法分析过程中的歧义消解是有效的。相似文献

12.

Context-free like restrictions on selective rewriting

H.C.M. Kleijn G. Rozenberg 《Theoretical computer science》1981,16(3):237-269

Selective substitution grammars based on ‘context-free’ productions form a possible framework for the study of ‘grammatically oriented’ formal language theory. Such grammars (with no control governing the composition of derivation steps) are studied in this paper. In particular we study the effect of various conditions on selectors (which define the way that rewriting is performed); those conditions are aimed to formalize the notion of ‘using information about the context’ during the rewriting process. Each of them captures a particular feature of a rewriting according to a context-free grammar or an EOS system (essentially a context-free grammar that can also rewrite terminal symbols). Some of those conditions yield characterizations of the class of context-free languages for other conditions the lower and upper bound on the language generating power are given. Also a natural notion of a class of ‘simple’ rewriting systems is introduced (pattern grammars) and it is demonstrated that they possess surprisingly high language generating power. 相似文献

13.

Analyzing ambiguity of context-free grammars

Claus Brabrand Anders Møller 《Science of Computer Programming》2010,75(3):176-191

It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures.We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this by presenting an ambiguity analysis framework based on conservative language approximations. As a concrete example, we propose a technique based on local regular approximations and grammar unfoldings. We evaluate the analysis using grammars that occur in RNA analysis in bioinformatics, and we demonstrate that it is sufficiently precise and efficient to be practically useful. 相似文献

14.

Variable-Length Unit Selection in TTS Using Structural Syntactic Cost

Wu C.-H. Hsia C.-C. Chen J.-F. Wang J.-F. 《IEEE transactions on audio, speech, and language processing》2007,15(4):1227-1235

This paper presents a variable-length unit selection scheme based on syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG), and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure with the inside probability of PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. The dynamic programming algorithm is adopted to obtain a concatenated unit sequence with minimum cost. A syntactic property-rich speech database is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms the synthesizer without considering syntactic property. The structural syntax estimates the substitution cost better than the acoustic features alone 相似文献

15.

An LR parsing technique for extended context-free grammars

Augusto Celentano 《Computer Languages, Systems and Structures》1981,6(2):95-107

Extended context-free grammars allow regular expressions to appear in productions right hand sides, and are a clear and natural way to describe the syntax of programming languages.In this paper an LR parsing technique for extended context-free grammars is presented, which is based on an underlying transformation of the grammar into an equivalent context-free one.The technique is suitable for inclusion in one-pass compilers: the implementation requires little extensions to the algorithms working for normal LR grammars. Besides describing the parsing method, the paper shows also the algorithms for deriving the parsing tables; tables optimization is also discussed. Finally, this technique is compared with other proposals appeared in the literature. 相似文献

16.

THE COMPUTATIONAL COMPLEXITY OF THE UNIVERSAL RECOGNITION PROBLEM FOR PARALLEL MULTIPLE CONTEXT-FREE GRAMMARS

Yuichi Kaji Ryuichi Nakanishi Hiroyuki Seki Tadao Kasami 《Computational Intelligence》1994,10(4):440-452

A number of grammatical formalisms have been proposed to describe the syntax of natural languages, and the universal recognition problems for some of those classes of grammars have been studied. A universal recognition problem for a class Q of grammars is the one to decide, taking a grammar G ∈ G and a string ui as an input, whether G can generate w or not. In this paper, the computational complexities of the universal recognition problems for parallel multiple context-free grammars, multiple context-free grammars, and their subclasses are discussed. 相似文献

17.

Context-free languages over infinite alphabets

Edward Y.C. Cheng Michael Kaminski 《Acta Informatica》1998,35(3):245-267

In this paper we introduce context-free grammars and pushdown automata over infinite alphabets. It is shown that a language is generated by a context-free grammar over an infinite alphabet if and only if it is accepted by a pushdown automaton over an infinite alphabet. Also the generated (accepted) languages possess many of the properties of the ordinary context-free languages: decidability, closure properties, etc.. This provides a substantial evidence for considering context-free grammars and pushdown automata over infinite alphabets as a natural extension of the classical ones. Received November 27, 1995 / March 4, 1997 相似文献

18.

Characteristic Sets for Polynomial Grammatical Inference

de la Higuera Colin 《Machine Learning》1997,27(2):125-138

When concerned about efficient grammatical inference two issues are relevant: the first one is to determine the quality of the result, and the second is to try to use polynomial time and space. A typical idea to deal with the first point is to say that an algorithm performs well if it infers in the limit the correct language. The second point has led to debate about how to define polynomial time: the main definitions of polynomial inference have been proposed by Pitt and Angluin. We return in this paper to a definition proposed by Gold that requires a characteristic set of strings to exist for each grammar, and this set to be polynomial in the size of the grammar or automaton that is to be learned, where the size of the sample is the sum of the lengths of all strings it includes. The learning algorithm must also infer correctly as soon as the characteristic set is included in the data. We first show that this definition corresponds to a notion of teachability as defined by Goldman and Mathias. By adapting their teacher/learner model to grammatical inference we prove that languages given by context-free grammars, simple deterministic grammars, linear grammars and nondeterministic finite automata are not identifiable in the limit from polynomial time and data. 相似文献

19.

LR parsing of probabilistic grammars with input uncertainty for speech recognition 总被引：1，自引：0，他引：1

J. H. Wright 《Computer Speech and Language》1990,4(4)

A shift-reduce parser for probabilistic context-free grammars is described, based on the LR algorithm. Each of the standard types of LR parser generator has a probabilistic version and a Bayesian interpretation is advanced. A graph-structured stack permits action conflicts and allows the parser to be used with uncertain input, typical of speech recognition applications. The sentence uncertainty is measured using entropy and is found to be significantly lower for the grammar than for a derived first-order Markov model. 相似文献

20.

Boundelly LR(k)-conflictable grammars

Myung-Joon Lee Kwang-Moo Choe 《Acta Informatica》1994,31(3):261-283

相似文献