首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
In this paper, we consider probabilistic context-free grammars, a class of generative devices that has been successfully exploited in several applications of syntactic pattern matching, especially in statistical natural language parsing. We investigate the problem of training probabilistic context-free grammars on the basis of distributions defined over an infinite set of trees or an infinite set of sentences by minimizing the cross-entropy. This problem has applications in cases of context-free approximation of distributions generated by more expressive statistical models. We show several interesting theoretical properties of probabilistic context-free grammars that are estimated in this way, including the previously unknown equivalence between the grammar cross-entropy with the input distribution and the so-called derivational entropy of the grammar itself. We discuss important consequences of these results involving the standard application of the maximum-likelihood estimator on finite tree and sentence samples, as well as other finite-state models such as hidden Markov models and probabilistic finite automata.  相似文献   

2.
Summary Some decomposition of the parsing of the sentences of context-free grammars into sequences of independant sub-tasks is proposed. An example of grammar is presented, for which this decomposition provides an efficient parser for a multiprocessing environment. The average speed-up resulting from the parallelization of the parser of an arithmetic infix grammar is evaluated by means of probabilistic models and real world measurements.  相似文献   

3.
基于概率上下文无关文法的句法分析歧义消解新模式   总被引:2,自引:1,他引:2  
基于自然语言句法歧义消解常用的一种概率模型-概率上下文无关文法,融入上下文相关的概率信息,提出一种新的歧义消解计算模式,该模式经测试可以有效地提高句法分析中歧义消解的正确率。  相似文献   

4.
Estimation of distribution algorithms are evolutionary algorithms using probabilistic techniques instead of traditional genetic operators. Recently, the application of probabilistic techniques to program and function evolution has received increasing attention, and this approach promises to provide a strong alternative to the traditional genetic programming techniques. Although a probabilistic context-free grammar (PCFG) is a widely used model for probabilistic program evolution, a conventional PCFG is not suitable for estimating interactions among nodes because of the context freedom assumption. In this paper, we have proposed a new evolutionary algorithm named programming with annotated grammar estimation based on a PCFG with latent annotations, which allows this context freedom assumption to be weakened. By applying the proposed algorithm to several computational problems, it is demonstrated that our approach is markedly more effective at estimating building blocks than prior approaches.   相似文献   

5.
Fuzzy context-free max- grammar (or FCFG, for short), as a straightforward extension of context-free grammar, has been introduced to express uncertainty, imprecision, and vagueness in natural language fragments. Li recently proposed the approximation of fuzzy finite automata, which may effectively deal with the practical problems of fuzziness, impreciseness and vagueness. In this paper, we further develop the approximation of fuzzy context-free grammars. In particular, we show that a fuzzy context-free grammar under max- compositional inference can be approximated by some fuzzy context-free grammar under max-min compositional inference with any given accuracy. In addition, some related properties of fuzzy context-free grammars and fuzzy languages generated by them are studied. Finally, the sensitivity of fuzzy context-free grammars is also discussed.  相似文献   

6.
层级分类概率句法分析   总被引:3,自引:0,他引:3  
对已有的句法分析中引入知识的方法进行了归纳分析,认为多种句法分析方法都可被看作是基于特征标记的分类,然后分析了其中的欠分类和过分类问题.在此基础上,提出一种层级分类短语结构文法和一种层级分类概率句法分析方法(hierarchically classified probabilistic context-free grammar),并设计了一种通过对实例进行聚类来消除句法规则的分类歧义方法.还进一步将层级分类扩展到概率上下文相关句法分析方法,利用上下文相关性的层级分类来解决引入上下文相关时的数据稀疏性问题.通过上述一系列方法有效地克服了过分类与前分类之间的矛盾.  相似文献   

7.
介绍一种利用YACC(Yet Another Compiler-Compiler)技术实现检测网络服务器程序异常行为的新方法。该方法用一种携带语义标注的上下文无关文法描述服务器程序正常行为模式,利用YACC自动生成的语法分析器构成异常检测引擎,并利用YACC提供的错误处理和语义处理接口对异常现场进行分析。实验结果表明,该方法不仅能有效检测各种利用服务器程序漏洞进行的缓冲区溢出、堆内存破环等入侵方式,而且能实时地对异常行为进行分析追踪并向安全管理人员提供入侵相关详细信息,而这种能力正是目前同类方法所缺乏的。  相似文献   

8.
Abstract

This paper presents a method of using BNF to specify natural language in such a way that a relatively small grammar of English can express the major grammatical constraints of the language and can be refined without undue proliferation of the rules. The results show that the departures of natural language from a context-free language are of a very restricted kind. The analysis obtained for sentences of the scientific literature is relevant for information processing  相似文献   

9.
Spelling correction is a fundamental task in text mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several drawbacks of the model. We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence, by manipulating a probabilistic context-free grammar to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky’s WordNet-based method and Wilcox-O’Hearn, Hirst, and Budanitsky’s fixed windows size method.  相似文献   

10.
We present a novel algorithm using new hypothesis representations for learning context-free grammars from a finite set of positive and negative examples. We propose an efficient hypothesis representation method which consists of a table-like data structure similar to the parse table used in efficient parsing algorithms for context-free grammars such as Cocke-Younger-Kasami algorithm. By employing this representation method, the problem of learning context-free grammars from examples can be reduced to the problem of partitioning the set of nonterminals. We use genetic algorithms for solving this partitioning problem. Further, we incorporate partially structured examples to improve the efficiency of our learning algorithm, where a structured example is represented by a string with some parentheses inserted to indicate the shape of the derivation tree of the unknown grammar. We demonstrate some experimental results using these algorithms and theoretically analyse the completeness of the search space using the tabular method for context-free grammars.  相似文献   

11.
本文论述了一种基于二元组合文法的汉语句法结构分析的消除歧义方法。首先给出了二元组合文法的基本概念以及基本思想,然后研究了概率上下文无关文法独立性假设的限性,并针对局限性引入了基于二元组合文法的上下文相关的概率信息,同时提出了一种新的计算分值模式。实验结果证明,这种方法对句法分析过程中的歧义消解是有效的。  相似文献   

12.
Selective substitution grammars based on ‘context-free’ productions form a possible framework for the study of ‘grammatically oriented’ formal language theory. Such grammars (with no control governing the composition of derivation steps) are studied in this paper. In particular we study the effect of various conditions on selectors (which define the way that rewriting is performed); those conditions are aimed to formalize the notion of ‘using information about the context’ during the rewriting process. Each of them captures a particular feature of a rewriting according to a context-free grammar or an EOS system (essentially a context-free grammar that can also rewrite terminal symbols). Some of those conditions yield characterizations of the class of context-free languages for other conditions the lower and upper bound on the language generating power are given. Also a natural notion of a class of ‘simple’ rewriting systems is introduced (pattern grammars) and it is demonstrated that they possess surprisingly high language generating power.  相似文献   

13.
It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures.We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this by presenting an ambiguity analysis framework based on conservative language approximations. As a concrete example, we propose a technique based on local regular approximations and grammar unfoldings. We evaluate the analysis using grammars that occur in RNA analysis in bioinformatics, and we demonstrate that it is sufficiently precise and efficient to be practically useful.  相似文献   

14.
This paper presents a variable-length unit selection scheme based on syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG), and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure with the inside probability of PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. The dynamic programming algorithm is adopted to obtain a concatenated unit sequence with minimum cost. A syntactic property-rich speech database is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms the synthesizer without considering syntactic property. The structural syntax estimates the substitution cost better than the acoustic features alone  相似文献   

15.
Extended context-free grammars allow regular expressions to appear in productions right hand sides, and are a clear and natural way to describe the syntax of programming languages.In this paper an LR parsing technique for extended context-free grammars is presented, which is based on an underlying transformation of the grammar into an equivalent context-free one.The technique is suitable for inclusion in one-pass compilers: the implementation requires little extensions to the algorithms working for normal LR grammars. Besides describing the parsing method, the paper shows also the algorithms for deriving the parsing tables; tables optimization is also discussed. Finally, this technique is compared with other proposals appeared in the literature.  相似文献   

16.
A number of grammatical formalisms have been proposed to describe the syntax of natural languages, and the universal recognition problems for some of those classes of grammars have been studied. A universal recognition problem for a class Q of grammars is the one to decide, taking a grammar G ∈ G and a string ui as an input, whether G can generate w or not. In this paper, the computational complexities of the universal recognition problems for parallel multiple context-free grammars, multiple context-free grammars, and their subclasses are discussed.  相似文献   

17.
In this paper we introduce context-free grammars and pushdown automata over infinite alphabets. It is shown that a language is generated by a context-free grammar over an infinite alphabet if and only if it is accepted by a pushdown automaton over an infinite alphabet. Also the generated (accepted) languages possess many of the properties of the ordinary context-free languages: decidability, closure properties, etc.. This provides a substantial evidence for considering context-free grammars and pushdown automata over infinite alphabets as a natural extension of the classical ones. Received November 27, 1995 / March 4, 1997  相似文献   

18.
When concerned about efficient grammatical inference two issues are relevant: the first one is to determine the quality of the result, and the second is to try to use polynomial time and space. A typical idea to deal with the first point is to say that an algorithm performs well if it infers in the limit the correct language. The second point has led to debate about how to define polynomial time: the main definitions of polynomial inference have been proposed by Pitt and Angluin. We return in this paper to a definition proposed by Gold that requires a characteristic set of strings to exist for each grammar, and this set to be polynomial in the size of the grammar or automaton that is to be learned, where the size of the sample is the sum of the lengths of all strings it includes. The learning algorithm must also infer correctly as soon as the characteristic set is included in the data. We first show that this definition corresponds to a notion of teachability as defined by Goldman and Mathias. By adapting their teacher/learner model to grammatical inference we prove that languages given by context-free grammars, simple deterministic grammars, linear grammars and nondeterministic finite automata are not identifiable in the limit from polynomial time and data.  相似文献   

19.
A shift-reduce parser for probabilistic context-free grammars is described, based on the LR algorithm. Each of the standard types of LR parser generator has a probabilistic version and a Bayesian interpretation is advanced. A graph-structured stack permits action conflicts and allows the parser to be used with uncertain input, typical of speech recognition applications. The sentence uncertainty is measured using entropy and is found to be significantly lower for the grammar than for a derived first-order Markov model.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号