首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 140 毫秒
1.
概率Earley句法分析算法采用Viterbi路径构造输入序列的句法树,由于文法限制,存在空树问题。提出了扩展启始状态、省略未覆盖句首和补充未覆盖子树等方法来对Viterbi路径进行扩展,解决了绝大多数空树问题,并有效提高了Earley算法的整体性能。  相似文献   

2.
In their recogniser forms, the Earley and RIGLR algorithms for testing whether a string can be derived from a grammar are worst-case cubic on general context free grammars (CFG). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita’s GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst-case unbounded polynomial order. The parser version of the RIGLR algorithm constructs Tomita-style SPPFs and thus is also worst-case unbounded polynomial order. We have given a modified worst-case cubic GLR algorithm, that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of a given string. In this paper we apply similar techniques to develop worst-case cubic Earley and RIGLR parsing algorithms.  相似文献   

3.
We study CFG parse tree enumeration in this paper. By dividing the set of all parse trees into infinite hierarchies according to height of parse tree, the hierarchical lexicographic order on the set of parse trees is established. Then grammar-based algorithms for counting and enumerating CFG parse trees in this order are presented. To generate a parse tree of height n, the time complexity is O(n). If τ is a lowest parse tree for its yield, then O(n) =O(∥ gt ∥+ 1), where ∥τ ∥ is the length of the sentence (yield) generated by τ. The sentence can be obtained as a by-product of the parse tree. To compute sentence from its parse tree (needn’t be lowest one), the time complexity is O(node)+O(∥τ ∥ + 1), where node is the number of non-leaf nodes of parse tree τ. To generate both a complete lowest parse tree and its yield at the same time, the time complexity is O(∥τ ∥ + 1). Supported by the National Natural Science Foundation of China (Grant Nos. 60273023, 60721061)  相似文献   

4.
This paper considers the problem of finding relevant answers to multi-sentence questions, which is urgent for many applied fields. In particular, this problem arises in industrial systems that are aimed at providing goods and services. One of the major approaches to this problem is that a set of potential answers that were obtained using a keyword search is repeatedly ordered by comparing syntactic answer parse trees with a question parse tree. This work modifies the approach based on using parse trees and improves it by passing to a more exact representation of semantic and syntactic text structure: the consideration of text paragraphs as a unit of analyzed information. The software implementation of the approach was performed and the results of the implementation were placed in open access as an adjustment for the Apache SOLR search engine, by which the suggested technology can be easily integrated with industrial search systems.  相似文献   

5.
谷波  李茹  刘开瑛 《计算机科学》2010,37(1):229-232
在自然语言处理中,句法分析主要有基于统计的方法和基于规则的方法。Earley算法是一种基于规则的方法,可以分析任意上下文无关文法(CFG),而不需要对文法进行修改。详细分析了Earley算法的特点。在通常的Earley算法中增加了多种预测机制,这些预测机制借鉴了LL,LR以及SLR等确定性分析算法的一些思想,并对这几种不同的预测机制及其组合在相同条件下进行了中文句法分析实验。结果显示,引入这些预测机制通常可以减少产生项目的数量,从而节省存储空间,减少运行时间。  相似文献   

6.
In its recogniser form, Earley's algorithm for testing whether a string can be derived from a grammar is worst case cubic on general context free grammars (CFG). Earley gave an outline of a method for turning his recognisers into parsers, but it turns out that this method is incorrect. Tomita's GLR parser returns a shared packed parse forest (SPPF) representation of all derivations of a given string from a given CFG but is worst case unbounded polynomial order. We have given a modified worst-case cubic version, the BRNGLR algorithm, that, for any string and any CFG, returns a binarised SPPF representation of all possible derivations of a given string. In this paper we apply similar techniques to develop two versions of an Earley parsing algorithm that, in worst-case cubic time, return an SPPF representation of all derivations of a given string from a given CFG.  相似文献   

7.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grain semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlapping with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium size corpora which resist to statistical clustering methods but seems more appropriate for specialized texts. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

8.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant for the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The groupings which are obtained help identify coarse-grain semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows a satisfactory overlapping with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium size corpora which resist to statistical clustering methods but seems more appropriate for specialized texts. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

9.
INTERACTIVE SEMANTIC ANALYSIS OF TECHNICAL TEXTS   总被引:4,自引:0,他引:4  
Sentence syntax is the basis for organizing semantic relations in TANKA, a project that aims to acquire knowledge from technical text. Other hallmarks include an absence of precoded domain-specific knowledge; significant use of public-domain generic linguistic information sources; involvement of the user as a judge and source of expertise; and learning from the meaning representations produced during processing. These elements shape the realization of the TANKA project: implementing a trainable text processing system to propose correct semantic interpretations to the user. A three-level model of sentence semantics, including a comprehensive Case system, provides the framework for TANKA's representations. Text is first processed by the DIPETT parser, which can handle a wide variety of unedited sentences. The semantic analysis module HAIKU then semi-automatically extracts semantic patterns from the parse trees and composes them into domain knowledge representations. HAIKU's dictionaries and main algorithm are described with the aid of examples and traces of user interaction. Encouraging experimental results are described and evaluated.  相似文献   

10.
We build an open-source toolkit which implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization towards syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse tree as a syntactic similarity measure is defined as the set of maximum common sub-trees and performed at a level of paragraphs, sentences, phrases and individual words. We analyze semantic features of such similarity measure and compare it with semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class. Using syntactic parse tree-based similarity measure instead of bag-of-words and keyword frequency approach, we expect to detect a weak semantic signal otherwise unobservable. The proposed approach is evaluated in a four distinct domains where a lack of semantic information makes classification of sentences rather difficult. We describe a toolkit which is a part of Apache Software Foun-dation project OpenNLP, designed to aid search engineers in tasks requiring text relevance assessment.  相似文献   

11.
We propose a layered-grammar model to represent actions. Using this model, an action is represented by a set of grammar rules. The bottom layer of an action instance’s parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and “super rules” that account for the high-order compositional feature structures. The grammar rules are categorized into three classes according to three different ST-relations of their action components, namely the strong relation, weak relation and stochastic relation. These ST-relations characterize different action styles (degree of stiffness), and they are pursued in terms of grammar rules for the purpose of action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step and a top-down ambiguous rule pruning step. An action instance is recognized based on the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that by incorporating the high-order feature statistics, the proposed method largely improves the recognition performance over the bag-of-words models.  相似文献   

12.
We show in this paper that parsing with regular expressions instead of context-free grammars, when it is possible, is desirable. We present efficient algorithms for performing different tasks that concern parsing: producing the external representation and the internal representation of parse trees; producing all possible parse trees or a single one. Each of our algorithms to produce a parse tree from an input string has an optimal time complexity, linear with the length of the string. Moreover, ambiguous regular expressions can be used. Received: 21 October 1997 / 30 May 2000  相似文献   

13.
提出一种基于短语和依存句法结构的中文语义角色标注(SRL)方法。联合短语句法特征和依存句法特征,对句法树进行剪枝,过滤句法树上不可能担当语义角色的组块短语单元和关系结点,对担当语义角色的组块或节点进行角色类别标注。基于正确句法树和正确谓词的识别结果表明,该方法的SRL性能F1值为73.53%,优于目前国内外的同类系统。  相似文献   

14.
One of the difficult problems that faces a compiler writer is to devise a grammar that is suitable for both efficient parsing and semantic attribution. This paper describes a system that resolves conflicts in LR(1) parsing by taking advantage of information in the parse tree. The system, which functions as part of a compiler generator, rewrites the user's grammar to remove parsing conflicts. It then places code into the generated compiler that rewrites the parse tree during parsing so as to produce the tree of the original grammar. The compiler writer can then write the semantic attribution to fit his or her original grammar without any knowledge of the changes made. The method is expected to be efficient in most cases, even in parsing systems that do not explicitly build the entire parse tree. The method complements previous work in its capabilities and advantages. The system has been implemented and integrated into a compiler generator system.  相似文献   

15.
The problem of outputting all parse trees of a string accepted by a context-free grammar is considered. A systolic algorithms is presented that operates inO(m·n) time, wherem is the number of distinct parse trees andn is the length of the input. The systolic array usesn 2 processors, each of which requires at mostO(logn) bits of storage. This is much more space-efficient that a previously reported systolic algorithm for the same problem, which requiredO(n logn) space per processor. The algorithm also extends previous algorithms that only output a single parse tree of the input.Research squpported in part by NSF Grant DCR-8420935 and DCR-8604603.  相似文献   

16.
We present a novel language model, suitable for large-vocabulary continuous speech recognition, based on parsing with a probabilistic left corner grammar (PLCG). The PLCG probabilities are conditioned on local and non-local features of the partial parse tree, and some of these features are lexical. They are not derived from another stochastic grammar, but directly induced from a treebank, a corpus of text sentences, annotated with parse trees. A context-enriched constituent represents all partial parse trees that are equivalent with respect to the probability of the next parse move. For computational efficiency the parsing problem is represented as a traversal through a compact stochastic network of constituents connected by PLCG moves. The efficiency of the algorithm is due to the fact that the network consists of recursively nested, shared subnetworks. The PLCG-based language model results from accumulating the probabilities of all (partial) paths through this network. Next word probabilities can be computed synchronously with the probabilistic left corner parsing algorithm in one single pass from left to right. They are guaranteed to be normalized, even when pruning less likely paths. Finally, it is shown experimentally that the PLCG-based language model is a competitive alternative to other syntax-based language models, both in efficiency and accuracy.  相似文献   

17.
This paper explores a tree kernel based method for semantic role labeling (SRL) of Chinese nominal predicates via a convolution tree kernel. In particular, a new parse tree representation structure, called dependency-driven constituent parse tree (D-CPT), is proposed to combine the advantages of both constituent and dependence parse trees. This is achieved by directly representing various kinds of dependency relations in a CPT-style structure, which employs dependency relation types instead of phrase labels in CPT (Constituent Parse Tree). In this way, D-CPT not only keeps the dependency relationship information in the dependency parse tree (DPT) structure but also retains the basic hierarchical structure of CPT style. Moreover, several schemes are designed to extract various kinds of necessary information, such as the shortest path between the nominal predicate and the argument candidate, the support verb of the nominal predicate and the head argument modified by the argument candidate, from D-CPT. This largely reduces the noisy information inherent in D-CPT. Finally, a convolution tree kernel is employed to compute the similarity between two parse trees. Besides, we also implement a feature-based method based on D-CPT. Evaluation on Chinese NomBank corpus shows that our tree kernel based method on D-CPT performs significantly better than other tree kernel-based ones and achieves comparable performance with the state-of-the-art feature-based ones. This indicates the effectiveness of the novel D-CPT structure in representing various kinds of dependency relations in a CPT-style structure and our tree kernel based method in exploring the novel D-CPT structure. This also illustrates that the kernel-based methods are competitive and they are complementary with the feature- based methods on SRL.  相似文献   

18.
19.
Intrusion Detection System (IDS) is a security technology that attempts to identify intrusions. Defending against multi-step intrusions which prepare for each other is a challenging task. In this paper, we propose a novel approach to alert post-processing and correlation, the Alerts Parser. Different from most other alert correlation methods, our approach treats the alerts as tokens and uses modified version of the LR parser to generate parse trees representing the scenarii in the alerts. An Attribute Context-Free Grammar (ACF-grammar) is used for representing the multi-step attacks. Attack scenarii information and prerequisites/consequences knowledge are included together in the ACF-grammar enhancing the correlation results. The modified LR parser depends on these ACF-grammars to generate parse trees. The experiments were performed on two different sets of network traffic traces, using different open-source and commercial IDS sensors. The discovered scenarii are represented by Correlation Graphs (CGs). The experimental results show that Alerts Parser can work in parallel, effectively correlate related alerts with low false correlation rate, uncover the attack strategies, and generate concise CGs.  相似文献   

20.
When a reinforcement learning agent executes actions that can cause frequent damage to itself, it can learn, by using Q-learning, that these actions must not be executed again. However, there are other actions that do not cause damage frequently but only once in a while, for example, risky actions such as parachuting. These actions may imply punishment to the agent and, depending on its personality, it would be better to avoid them. Nevertheless, using the standard Q-learning algorithm, the agent is not able to learn to avoid them, because the result of these actions can be positive on average. In this article, an additional mechanism of Q-learning, inspired by the emotion of fear, is introduced in order to deal with those risky actions by considering the worst results. Moreover, there is a daring factor for adjusting the consideration of the risk. This mechanism is implemented on an autonomous agent living in a virtual environment. The results present the performance of the agent with different daring degrees.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号