Similar Documents
20 similar documents retrieved.
1.
Elementary dependency relationships between words within parse trees produced by robust analyzers on a corpus help automate the discovery of semantic classes relevant to the underlying domain. We introduce two methods for extracting elementary syntactic dependencies from normalized parse trees. The resulting groupings help identify coarse-grained semantic categories and isolate lexical idiosyncrasies belonging to a specific sublanguage. A comparison shows satisfactory overlap with an existing nomenclature for medical language processing. This symbolic approach is efficient on medium-size corpora, which resist statistical clustering methods, and seems particularly appropriate for specialized texts. This revised version was published online in July 2006 with corrections to the Cover Date.
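The core idea of the abstract above — grouping words into candidate semantic classes by the elementary dependency contexts they share — can be illustrated with a toy sketch. This is not the paper's method; the triple format, the exact-context-match criterion, and all names are illustrative assumptions.

```python
# Toy sketch: group words into candidate semantic classes by shared
# (head, relation) dependency contexts. Illustrative only; the paper's
# actual extraction and grouping procedure is more elaborate.
from collections import defaultdict

def group_by_contexts(triples):
    """triples: (head, relation, dependent). Group dependents that share
    exactly the same set of dependency contexts."""
    contexts = defaultdict(set)
    for head, rel, dep in triples:
        contexts[dep].add((head, rel))
    groups = defaultdict(set)
    for word, ctx in contexts.items():
        groups[frozenset(ctx)].add(word)
    # Only groups with at least two members suggest a semantic class.
    return [words for words in groups.values() if len(words) > 1]
```

For example, "infection" and "disease" both appearing as the object of "treat" would land in one candidate class, while "book" (object of "read") would not.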

2.
This paper explores a tree kernel-based method for semantic role labeling (SRL) of Chinese nominal predicates via a convolution tree kernel. In particular, a new parse tree representation, called the dependency-driven constituent parse tree (D-CPT), is proposed to combine the advantages of both constituent and dependency parse trees. This is achieved by directly representing various kinds of dependency relations in a CPT-style structure, which employs dependency relation types instead of phrase labels in the constituent parse tree (CPT). In this way, D-CPT not only keeps the dependency relationship information of the dependency parse tree (DPT) structure but also retains the basic hierarchical structure of the CPT style. Moreover, several schemes are designed to extract various kinds of necessary information from D-CPT, such as the shortest path between the nominal predicate and the argument candidate, the support verb of the nominal predicate, and the head argument modified by the argument candidate. This largely reduces the noisy information inherent in D-CPT. Finally, a convolution tree kernel is employed to compute the similarity between two parse trees. In addition, we implement a feature-based method on top of D-CPT. Evaluation on the Chinese NomBank corpus shows that our tree kernel-based method on D-CPT performs significantly better than other tree kernel-based methods and achieves performance comparable to state-of-the-art feature-based ones. This indicates the effectiveness of the novel D-CPT structure in representing various kinds of dependency relations in a CPT-style structure, and of our tree kernel-based method in exploiting it. It also shows that kernel-based methods are competitive with, and complementary to, feature-based methods for SRL.
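A convolution tree kernel of the kind the abstract above relies on counts common sub-trees between two parse trees. The following is a minimal Collins-Duffy-style sketch, not the paper's D-CPT implementation; the `(label, children)` tuple encoding and the decay parameter `lam` are assumptions for illustration.

```python
# Minimal sketch of a convolution tree kernel (Collins-Duffy style).
# Trees are (label, [children]) tuples; in a D-CPT-like setting the
# labels would be dependency relation types. Illustrative only.

def collect(t):
    """Flatten a tree into a list of all its nodes."""
    out = [t]
    for c in t[1]:
        out.extend(collect(c))
    return out

def tree_kernel(t1, t2, lam=1.0):
    """Sum, over all node pairs, the (decayed) count of common sub-trees."""
    memo = {}

    def delta(a, b):
        key = (id(a), id(b))
        if key in memo:
            return memo[key]
        la, ca = a
        lb, cb = b
        # Nodes match only if they use the same "production":
        # same label and the same sequence of child labels.
        if la != lb or len(ca) != len(cb) or [c[0] for c in ca] != [c[0] for c in cb]:
            memo[key] = 0.0
            return 0.0
        if not ca:  # matching leaves
            memo[key] = lam
            return lam
        prod = lam
        for x, y in zip(ca, cb):
            prod *= 1.0 + delta(x, y)
        memo[key] = prod
        return prod

    return sum(delta(a, b) for a in collect(t1) for b in collect(t2))
```

With `lam=1.0`, a tree `("S", [("A", []), ("B", [])])` compared against itself yields 6.0: four common sub-trees rooted at S (counted as the product form) plus one each at A and B.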

3.
This paper considers the problem of finding relevant answers to multi-sentence questions, which is pressing in many applied fields; in particular, it arises in industrial systems aimed at providing goods and services. One major approach is to take a set of candidate answers obtained by keyword search and re-rank them by comparing the syntactic parse trees of the answers with the parse tree of the question. This work modifies and improves the parse-tree-based approach by moving to a more exact representation of the semantic and syntactic structure of text: it treats the text paragraph as the unit of analyzed information. The approach was implemented in software and released as open source as an extension for the Apache Solr search engine, through which the suggested technology can be easily integrated with industrial search systems.

4.
Traditional Earley parsers operate in two phases: first recognizing the input, then constructing the forest of parse trees. Practically speaking, this quirk makes it awkward to use in a compiler-compiler, because semantic actions attached to rules are only executed after the fact. We address this problem by identifying safe Earley sets, points during the recognition phase at which partial parse trees can be constructed; this means that semantic actions may be executed on the fly. A secondary benefit is that Earley sets can be deleted during recognition, resulting in a substantial savings of both space and time.
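The Earley sets that the abstract above analyzes are built during the recognition phase. The following is a minimal Earley recognizer sketch showing only that phase, with a toy ambiguous grammar; it is not the authors' system, and the item and grammar encodings are assumptions.

```python
# Minimal Earley recognizer sketch. grammar: dict mapping a nonterminal
# to a list of right-hand sides (tuples of symbols); a symbol is a
# nonterminal iff it is a key of the grammar. Items are
# (lhs, rhs, dot_position, origin_set). Illustrative only.

def earley_recognize(grammar, start, tokens):
    sets = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        sets[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        changed = True
        while changed:
            changed = False
            for item in list(sets[i]):
                lhs, rhs, dot, origin = item
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:  # predict
                        for r in grammar[sym]:
                            new = (sym, r, 0, i)
                            if new not in sets[i]:
                                sets[i].add(new)
                                changed = True
                    elif i < len(tokens) and tokens[i] == sym:  # scan
                        sets[i + 1].add((lhs, rhs, dot + 1, origin))
                else:  # complete: advance items waiting on lhs
                    for l2, r2, d2, o2 in list(sets[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in sets[i]:
                                sets[i].add(new)
                                changed = True
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in sets[-1])
```

In the paper's terms, a "safe" set would be one from which partial parse trees can already be committed; this sketch only builds the sets themselves.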

5.
We build an open-source toolkit that implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization to syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse trees, as a syntactic similarity measure, is defined as the set of maximal common sub-trees and is performed at the level of paragraphs, sentences, phrases, and individual words. We analyze the semantic features of this similarity measure and compare it with the semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class. By using the syntactic parse tree-based similarity measure instead of bag-of-words and keyword frequency approaches, we expect to detect a weak semantic signal that would otherwise be unobservable. The proposed approach is evaluated in four distinct domains where a lack of semantic information makes the classification of sentences rather difficult. We describe a toolkit, part of the Apache Software Foundation project OpenNLP, designed to aid search engineers in tasks requiring text relevance assessment.
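The generalization operation described above — keeping what two linguistic structures have in common, in the spirit of anti-unification — can be shown in miniature. The real system operates on parse trees; this toy works on POS-tagged lemma sequences, and the `"*"` wildcard and pairing strategy are illustrative assumptions.

```python
# Toy sketch of syntactic generalization as anti-unification: align two
# (lemma, pos) sequences, keep the shared POS, and generalize differing
# lemmas to "*". Illustrative only; the toolkit generalizes full parse
# trees via maximal common sub-trees.

def generalize(phrase1, phrase2):
    """Return a common generalization of two (lemma, pos) sequences."""
    common = []
    for (w1, p1) in phrase1:
        for (w2, p2) in phrase2:
            if p1 == p2:
                # Same POS: keep the lemma if it matches, else generalize.
                common.append((w1 if w1 == w2 else "*", p1))
                break
    return common
```

For instance, "cat runs" and "dog runs" generalize to `[("*", "NN"), ("runs", "VB")]`: the shared verb survives, the differing nouns collapse to a wildcard.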

6.
A Chinese semantic role labeling (SRL) method based on both phrase-structure and dependency syntax is proposed. Combining phrase-structure and dependency features, it prunes the syntactic tree, filtering out chunk units and relation nodes that cannot fill a semantic role, and then assigns role labels to the chunks or nodes that can. Results based on gold-standard parse trees and gold predicates show an SRL F1 score of 73.53%, outperforming comparable systems at home and abroad.

7.
Chinese Semantic Role Labeling Based on Dependency Parsing
Dependency syntax is one form of syntactic analysis; compared with phrase-structure parsing, it offers a more concise representation. Adopting methods from English semantic role labeling research, this paper implements a semantic role labeling system based on Chinese dependency parsing. Working on Chinese dependency trees, the system applies an effective pruning algorithm and feature set, and uses a maximum-entropy classifier to identify and classify semantic roles. Two corpora were used: one converted from gold-standard phrase-structure parses (CTB 5.0), the other the Chinese corpus released for CoNLL-2009. With gold predicates, the system achieves F1 scores of 84.30% and 81.68% on the two corpora respectively; with automatically identified predicates, the F1 scores are 81.02% and 81.33%.

8.
A two-phase annotation method for semantic labeling in natural language processing is proposed. The dynamic programming approach stresses non-exact string matching, which takes full advantage of the underlying grammatical structure of the parse trees in a treebank. The first phase of the labeling is a coarse-grained syntactic parsing, complemented by a semantic dissimilarity analysis in the latter phase. The approach goes beyond shallow parsing to a deeper level of case role identification, while preserving robustness, without getting bogged down in a complete linguistic analysis. The paper presents experimental results for recognizing more than 50 different semantic labels in 10,000 sentences. Results show that the approach improves the labeling even with incomplete information. Detailed evaluations are discussed to justify its significance.

9.
This paper proposes a novel tree kernel-based method with rich syntactic and semantic information for extracting semantic relations between named entities. Given a parse tree and an entity pair, we first construct a rich semantic relation tree structure that integrates both syntactic and semantic information. We then propose a context-sensitive convolution tree kernel, which enumerates both context-free and context-sensitive sub-trees by considering the paths of their ancestor nodes as their contexts, thereby capturing structural information in the tree. An evaluation on the Automatic Content Extraction/Relation Detection and Characterization (ACE RDC) corpora shows that the proposed tree kernel-based method outperforms other state-of-the-art methods.

10.
This paper explores tree kernel-based Chinese semantic role classification, focusing on how to obtain effective structured features. Building on a minimal syntactic tree structure, three further syntactic structures are defined according to the characteristics of semantic role classification, and a composite kernel is used to combine the tree kernel-based and feature-based methods. Results on the Chinese PropBank corpus show that the tree kernel-based method performs well on Chinese semantic role classification, reaching a precision of 91.79%. Combined with the feature-based method, it further improves that method's performance to a precision of 94.28%, outperforming comparable systems.

11.
Data extraction commonly uses regular expressions (REs) to describe data sources. To support visual specification, an RE must be converted into a parse tree, but existing rewriting-based RE parse-tree construction methods destroy the internal structure of data objects and thus cannot be used for data extraction. This paper proposes a rewriting-free algorithm for constructing RE parse trees. Experiments show that the algorithm outperforms existing RE parse-tree construction algorithms in time and space performance as well as practicality.
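The abstract above concerns building a parse tree for a regular expression without rewriting it. The paper's algorithm is not given in this listing, so the following is a generic recursive-descent sketch over a small RE subset (alternation, concatenation, Kleene star, grouping, literals); the node encoding is an assumption.

```python
# Minimal recursive-descent sketch that builds a parse tree for a
# regular expression as written, without rewriting it. Nodes are
# ("alt"|"cat"|"star", children) or ("lit", char). Illustrative only.

def parse_re(s):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def alt():  # alternation has the lowest precedence
        nonlocal pos
        node = concat()
        while peek() == "|":
            pos += 1
            node = ("alt", [node, concat()])
        return node

    def concat():  # sequence of starred atoms
        parts = []
        while peek() not in (None, "|", ")"):
            parts.append(star())
        return parts[0] if len(parts) == 1 else ("cat", parts)

    def star():  # postfix * binds tightest
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", [node])
        return node

    def atom():
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = alt()
            assert peek() == ")", "unbalanced parenthesis"
            pos += 1
            return node
        ch = s[pos]
        pos += 1
        return ("lit", ch)

    return alt()
```

Because the tree mirrors the expression exactly (grouping is preserved, nothing is normalized away), the structure of the described data object survives, which is the property the abstract emphasizes.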

12.
This paper proposes a tree kernel method for semantic relation detection and classification (RDC) between named entities. It resolves two critical problems in previous tree kernel methods for RDC. First, a new tree kernel is presented to better capture the inherent structural information in a parse tree by enabling the standard convolution tree kernel with context-sensitivity and approximate matching of sub-trees. Second, an enriched parse tree structure is proposed to derive the necessary structural information...

13.
INTERACTIVE SEMANTIC ANALYSIS OF TECHNICAL TEXTS
Sentence syntax is the basis for organizing semantic relations in TANKA, a project that aims to acquire knowledge from technical text. Other hallmarks include an absence of precoded domain-specific knowledge; significant use of public-domain generic linguistic information sources; involvement of the user as a judge and source of expertise; and learning from the meaning representations produced during processing. These elements shape the realization of the TANKA project: implementing a trainable text processing system to propose correct semantic interpretations to the user. A three-level model of sentence semantics, including a comprehensive Case system, provides the framework for TANKA's representations. Text is first processed by the DIPETT parser, which can handle a wide variety of unedited sentences. The semantic analysis module HAIKU then semi-automatically extracts semantic patterns from the parse trees and composes them into domain knowledge representations. HAIKU's dictionaries and main algorithm are described with the aid of examples and traces of user interaction. Encouraging experimental results are described and evaluated.

14.
Reverse-engineering unknown network protocols is of great significance for network security applications. Most existing protocol reverse-engineering methods can neither handle encrypted protocols nor recover the semantics of protocol fields. To address this, a network protocol parsing technique based on data-flow analysis is proposed and implemented. Built on a data-flow recording plugin written for the dynamic binary instrumentation platform Pin, and on data-flow tracking based on data-dependency analysis, the technique parses the communication protocol used by a program, recovering the protocol's format as well as the semantics of each protocol field. Experimental results show that the technique correctly recovers the protocol formats of program communication and extracts the program-behavior semantics corresponding to each field, performing notably well even on encrypted protocols.

15.
International Journal of Computer Mathematics, 2012, 89(9): 1051-1067
Semantic trees have often been used as a theoretical tool for showing the unsatisfiability of clauses in first-order predicate logic. Their practicality has been overshadowed, however, by other strategies. In this paper, we introduce unit clauses derived from resolutions when necessary to construct a semantic tree, leading to a strategy that combines the construction of semantic trees with resolution–refutation. The parallel semantic tree theorem prover, called PrHERBY, combines semantic trees and resolution–refutation methods. The parallel system is scalable by strategically selecting atoms with the help of dedicated resolutions. In addition, a parallel grounding scheme allows each system to have its own instance of generated atoms, thereby increasing the possibility of success. The PrHERBY system presented performs significantly better and generally finds proof using fewer atoms than the semantic tree prover, HERBY and its parallel version, PHERBY.  相似文献   

16.
Web Page Body-Text Extraction Based on Logical Lines and Maximum Acceptance Distance
Body-text extraction from web pages is a fundamental task that many Internet applications must solve. The mainstream approach is based on the DOM tree and requires parsing each page into a DOM structure. Given the diversity of page sources and structures on today's Web, DOM-based methods suffer not only from insufficient performance but also from limited extraction accuracy. To address these problems, this paper proposes a new body-text extraction method that does not rely on the DOM tree. Instead, it derives heuristic rules from the way people compose web pages, combines them with relevant statistical regularities, takes the logical line as the basic processing unit, and extracts body text using a maximum acceptance distance. Experiments show that the method extracts body text efficiently and with high accuracy.
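The logical-line and maximum-acceptance-distance idea described above can be approximated in a few lines: accept lines with enough text, then keep the longest run of accepted lines whose mutual gaps stay within a threshold. This is a toy sketch, not the paper's algorithm; the `min_len` and `max_gap` thresholds are illustrative assumptions.

```python
# Toy sketch of body-text extraction over logical lines: accept text-dense
# lines, then merge accepted lines whose index gap does not exceed a
# maximum acceptance distance. Thresholds are illustrative only.

def extract_body(lines, min_len=20, max_gap=2):
    accepted = [i for i, ln in enumerate(lines) if len(ln.strip()) >= min_len]
    if not accepted:
        return ""
    # Split accepted indices into runs where consecutive indices are
    # within max_gap of each other, and keep the largest run.
    runs, cur = [], [accepted[0]]
    for i in accepted[1:]:
        if i - cur[-1] <= max_gap:
            cur.append(i)
        else:
            runs.append(cur)
            cur = [i]
    runs.append(cur)
    best = max(runs, key=len)
    return "\n".join(lines[i] for i in range(best[0], best[-1] + 1)
                     if lines[i].strip())
```

Short navigation and footer lines fall below `min_len` and are rejected, while the dense paragraph run in the middle of the page survives as the extracted body.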

17.
Because Java Web applications involve complex business scenarios and demand structurally valid input data, existing testing methods and tools suffer from a low effective rate of test cases when testing Java Web applications. To address this problem, this paper proposes a parse-tree-based grey-box fuzzing method for Java Web applications. It first builds a parse tree by modeling the grammar of the application's input packets, distinguishing delimiters from data blocks, and attaches a seed pool to every leaf node of the parse tree, isolating the individual data blocks of a test case; inputs conforming to the application's business format are then generated by packet splicing, raising the effective rate of test cases. To retain high-quality data blocks, each data-block seed is individually weighted according to execution feedback from the program under test; to penetrate deep paths, data-block seed features are extracted from the corresponding seed pools via conditional-probability learning. The resulting parse-tree-based grey-box fuzzing system for Java Web applications, PTreeFuzz, achieves better testing accuracy than existing tools.
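The per-leaf seed-pool splicing idea described above can be shown as a toy: split a packet into delimiter and data-block leaves, give each data-block leaf its own seed pool, and assemble a new input by rejoining the leaves after swapping one data block for a pooled seed. This is not PTreeFuzz; the leaf encoding and pool layout are illustrative assumptions.

```python
# Toy sketch of parse-tree-driven mutation: leaves are
# ("delim"|"data", text); pools maps a leaf index to its seed pool.
# One data-block leaf is replaced by a pooled seed, and the packet is
# reassembled so delimiters (and thus the format) stay intact.
import random

def mutate_packet(leaves, pools, rng):
    data_idx = [i for i, (kind, _) in enumerate(leaves) if kind == "data"]
    target = rng.choice(data_idx)
    out = []
    for i, (kind, text) in enumerate(leaves):
        if i == target and pools.get(i):
            out.append(rng.choice(pools[i]))
        else:
            out.append(text)
    return "".join(out)
```

Because delimiters are never mutated, every generated input still parses as a well-formed packet, which is what keeps the effective rate of test cases high.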

18.
This article explores how the effectiveness of learning to parse with neural networks can be improved by including two architectural features relevant to language: generalisations across syntactic constituents and bounded resource effects. A number of neural network parsers have recently been proposed, each with a different approach to the representational problem of outputting parse trees. In addition, some of the parsers have explicitly attempted to capture an important regularity within language, which is to generalise information across syntactic constituents. A further property of language is that natural bounds exist for the number of constituents which a parser need retain for later processing. Both the generalisations and the resource bounds may be captured in architectural features which enhance the effectiveness and efficiency of learning to parse with neural networks. We describe a number of different types of neural network parser, and compare them with respect to these two features. These features are both explicitly present in the Simple Synchrony Network parser, and we explore and illustrate their impact on the process of learning to parse in some experiments with a recursive grammar.

19.
One of the difficult problems that faces a compiler writer is to devise a grammar that is suitable for both efficient parsing and semantic attribution. This paper describes a system that resolves conflicts in LR(1) parsing by taking advantage of information in the parse tree. The system, which functions as part of a compiler generator, rewrites the user's grammar to remove parsing conflicts. It then places code into the generated compiler that rewrites the parse tree during parsing so as to produce the tree of the original grammar. The compiler writer can then write the semantic attribution to fit his or her original grammar without any knowledge of the changes made. The method is expected to be efficient in most cases, even in parsing systems that do not explicitly build the entire parse tree. The method complements previous work in its capabilities and advantages. The system has been implemented and integrated into a compiler generator system.

20.
We investigate the distribution of fitness of programs, concentrating on those represented as parse trees and, particularly, on how such distributions scale with respect to changes in the size of the programs. By using a combination of enumeration and Monte Carlo sampling on a large number of problems from three very different areas, we suggest that, in general, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. We prove this for both linear programs and simple side-effect-free parse trees. We give the density of solutions to the parity problems in program trees composed of XOR building blocks. Limited experiments with programs including side effects and iteration suggest that a similar result may also hold for this wider class of programs.
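The XOR-parity density measurement mentioned above can be reproduced in miniature: sample random program trees built only from XOR nodes over two boolean inputs and measure how often they compute 2-bit parity. This is a small Monte Carlo sketch in the spirit of the abstract, not the paper's experiment; the tree-growth probability and sample sizes are illustrative assumptions.

```python
# Toy Monte Carlo estimate of the density of parity solutions among
# random XOR program trees over inputs x0, x1. Illustrative only.
import random
from itertools import product

def random_tree(depth, rng):
    """Grow a random tree of XOR nodes over the terminals x0, x1."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x0", "x1"])
    return ("xor", random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, env):
    if isinstance(tree, str):
        return env[tree]
    _, a, b = tree
    return evaluate(a, env) ^ evaluate(b, env)

def solves_parity(tree):
    """True iff the tree computes x0 XOR x1 on all four input pairs."""
    return all(evaluate(tree, {"x0": a, "x1": b}) == (a ^ b)
               for a, b in product([0, 1], repeat=2))

def solution_density(n=2000, depth=4, seed=0):
    rng = random.Random(seed)
    return sum(solves_parity(random_tree(depth, rng)) for _ in range(n)) / n
```

Repeating the measurement at several depths above the minimum-size threshold would let one check the abstract's claim that the density becomes approximately independent of program length.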
