Similar literature
 20 similar documents found (search time: 15 ms)
1.
The importance of the parsing task for NLP applications is well understood. However, developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time-consuming. In this paper we present our method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG from an Arabic Treebank. Then, we implement the parser that assigns syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1650 sentences). We calculate the precision, recall and F-measure. Our experimental results showed the efficiency of the proposed parser for parsing Modern Standard Arabic sentences (Precision: 83.59%, Recall: 82.98% and F-measure: 83.23%).
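Inducing a PCFG from a treebank is commonly done by relative-frequency estimation: count each production observed in the trees and divide by the count of its left-hand side. The following is a minimal sketch of that general idea, not the authors' implementation; the tuple-based tree encoding, the function names `productions` and `induce_pcfg`, and the toy English treebank are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical tree encoding: (label, children), where a leaf child is a plain string.
TOY_TREEBANK = [
    ("S", [("NP", ["she"]), ("VP", [("V", ["reads"]), ("NP", ["books"])])]),
    ("S", [("NP", ["he"]), ("VP", [("V", ["sleeps"])])]),
]


def productions(tree):
    """Yield one (lhs, rhs) pair for every internal node of the tree."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def induce_pcfg(treebank):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


grammar = induce_pcfg(TOY_TREEBANK)
```

With this toy data, `S -> NP VP` is the only expansion of `S`, while `VP` splits evenly between `V NP` and `V`. A real treebank would also need preprocessing (for example, removing empty elements and function tags), which the sketch leaves out.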

2.
Controlled natural languages (CNL) with a direct mapping to formal logic have been proposed to improve the usability of knowledge representation systems, query interfaces, and formal specifications. Predictive editors are a popular approach to solve the problem that CNLs are easy to read but hard to write. Such predictive editors need to be able to “look ahead” in order to show all possible continuations of a given unfinished sentence. Such lookahead features, however, are difficult to implement in a satisfying way with existing grammar frameworks, especially if the CNL supports complex nonlocal structures such as anaphoric references. Here, methods and algorithms are presented for a new grammar notation called Codeco, which is specifically designed for controlled natural languages and predictive editors. A parsing approach for Codeco based on an extended chart parsing algorithm is presented. A large subset of Attempto Controlled English has been represented in Codeco. Evaluation of this grammar and the parser implementation shows that the approach is practical, adequate and efficient.

3.
Practical natural language understanding systems used to be concerned with very small miniature domains only: They knew exactly what potential text might be about, and what kind of sentence structures to expect. This optimistic assumption is no longer feasible if NLU is to scale up to deal with text that naturally occurs in the "real world". The key issue is robustness: The system needs to be prepared for cases where the input data does not correspond to the expectations encoded in the grammar. In this paper, we survey the approaches towards the robustness problem that have been developed throughout the last decade. We inspect techniques to overcome both syntactically and semantically ill-formed input in sentence parsing and then look briefly into more recent ideas concerning the extraction of information from texts, and the related question of the role that linguistic research plays in this game. Finally, the robust sentence parsing schemes are classified on a more abstract level of analysis. Dept. of Computer Science, University of Toronto. For helpful comments on earlier drafts of this paper, I thank Judy Dick, Graeme Hirst, Diane Horton, Kem Luther, and Jan Wiebe. Financial support by the University of Toronto is acknowledged. Communication and requests for reprints should be directed to the author at Department of Computer Science, University of Toronto, Toronto, Canada M5S 1A4.

4.
This paper presents a robust parsing approach which is designed to address the issue of syntactic errors in text. The approach is based on the concept of an error grammar, which is a grammar of ungrammatical sentences. An error grammar is derived from a conventional grammar on the basis of an analysis of a corpus of observed ill-formed sentences. A robust parsing algorithm is presented which is applied after a conventional bottom-up parsing algorithm has failed. This algorithm combines a rule from the error grammar with rules from the normal grammar to arrive at a parse for an ungrammatical sentence. This algorithm is applied to 50 test sentences, with encouraging results.

5.
GTB (the Grammar Tool Box) is the tool that underpins our investigations into generalised parsing. Our goal is to produce a system that supports systematic investigation of various styles of generalised parsing in a way that allows meaningful comparisons between them in a repeatable and easily accessible fashion whilst also allowing: (i) new theoretical ideas to be generated and explored; (ii) production-quality parsers to be generated; and (iii) humane pedagogy. GTB comprises a language (LC) with various kinds of built-in grammar- and automata-related objects, and a set of black-box methods written in C++ that provide implementations of grammar transforms, automata construction algorithms, parsing and recognition algorithms, and a variety of visualisation aids. In this paper we focus on the overall rationale for the GTB framework; the GTB design goals; and some detailed operational flows that are supported by GTB.

6.
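A survey of frontier developments in syntactic parsing   Cited by: 3 in total (self-citations: 2, citations by others: 1)
The goal of syntactic parsing is to analyze an input sentence and obtain its syntactic structure; it is one of the classic tasks in natural language processing. Current research on this task focuses mainly on improving parser accuracy through automatic learning from data. This paper surveys frontier developments in syntactic parsing, reviewing and introducing new methods and findings published in 2018–2019 along three sub-directions: supervised parsing, unsupervised parsing, and cross-domain and cross-lingual parsing. It also analyzes and looks ahead to the research prospects of these sub-directions.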

7.
Jean Bovet  Terence Parr 《Software》2008,38(12):1305-1332
Programmers tend to avoid using language tools, resorting to ad hoc methods, because tools can be hard to use, their parsing strategies can be difficult to understand and debug, and their generated parsers can be opaque black-boxes. In particular, there are two very common difficulties encountered by grammar developers: understanding why a grammar fragment results in a parser non-determinism and determining why a generated parser incorrectly interprets an input sentence. This paper describes ANTLRWorks, a complete development environment for ANTLR grammars that attempts to resolve these difficulties and, in general, make grammar development more accessible to the average programmer. The main components are a grammar editor with refactoring and navigation features, a grammar interpreter, and a domain-specific grammar debugger. ANTLRWorks' primary contributions are a parser non-determinism visualizer based on syntax diagrams and a time-traveling debugger that pays special attention to parser decision-making by visualizing lookahead usage and speculative parsing during backtracking. Copyright © 2008 John Wiley & Sons, Ltd.

8.
Summary. Most applications of parsing require that the parser call semantic action routines while processing the input. For LR(k) parsers it is well known that a semantic action routine can be called when the end of a production is recognized. Often, however, it is desirable to call routines at other times. This paper presents fast algorithms that determine, for an LR(k) (or SLR(k)) grammar, which positions are suitable for calling routines. The algorithms are practical for use with LR(1) (SLR(1)) parser building programs, because the worst case running time is dominated by the time required to build the LR(1) (SLR(1)) parser. Applications of the algorithms to attribute grammars and automatic indentation are discussed.

9.
A new class of context-free grammars, called dynamic context-free grammars, is introduced. These grammars have the ability to change the set of production rules dynamically during the derivation of some terminal string. The notion of LL() parsing is adapted to this grammar model. We show that dynamic LL() parsers are as powerful as LR() parsers, i.e. that they are capable of analyzing every deterministic context-free language while using only one symbol of lookahead. Received: 24 August 1994 / 5 January 1996

10.
Three classes of parsable indexed grammars are defined. The new parsing mechanisms are derived by extending those that have been most successful for context-free grammars, the LR and LL algorithms. Design criteria for the new grammar classes include membership decidability and unambiguity. We show that all three parsers operate in linear time for at least some grammars in our new classes. One of our new classes generates all the deterministic context-free languages, along with some non-context-free languages. We also show that the flag strings generated by indexed grammars are regular sets. This is done by showing that they can be generated by regular canonical systems.

11.
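Modern Chinese syntax differs from English syntax and is markedly more complex: first, a complete rule set is hard to obtain; second, the results of full-sentence parsing contain a large number of ambiguous structures that are hard to eliminate. Using a divide-and-conquer strategy to split the parsing task into smaller tasks at different levels, and then parsing level by level, is a feasible and effective approach. The basic idea is as follows: first, a multi-layer Markov model is used to perform phrase chunking on the sentence, segmenting the whole sentence into phrase chunks such as noun chunks and verb chunks; then the CYK parsing algorithm is run on this basis to analyze the dependency relations between chunks, finally yielding a syntactic analysis of the complete sentence. Shallow parsing simplifies the rule set of the CYK algorithm and reduces the difficulty of syntactic parsing to some extent.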
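The second stage described above runs CYK over chunk labels rather than individual words. The following is a minimal sketch of a textbook CYK recognizer for a grammar in Chomsky normal form, assuming the chunker has already produced a sequence of chunk tags; the tag names, the toy rules, and the function name `cyk_recognize` are hypothetical, and the paper's dependency analysis and multi-layer Markov chunker are not reproduced.

```python
def cyk_recognize(tokens, lexical, binary, start="S"):
    """Return True if the CNF grammar derives `tokens` from `start`.

    lexical: dict mapping a token to the set of nonterminals that produce it.
    binary:  dict mapping a pair (B, C) to the set of A with a rule A -> B C.
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that span tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(lexical.get(tok, ()))
    for span in range(2, n + 1):          # length of the span
        for i in range(n - span + 1):     # start of the span
            j = i + span - 1              # end of the span
            for k in range(i, j):         # split point: [i..k] and [k+1..j]
                for b in table[i][k]:
                    for c in table[k + 1][j]:
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Toy grammar over chunk tags (assumed output of a chunker).
LEXICAL = {"np_chunk": {"NP"}, "vp_chunk": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
```

Because the input is a short sequence of chunks rather than every word in the sentence, both the table and the rule set stay small, which is the simplification the abstract attributes to shallow parsing.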

12.
How to design a connectionist holistic parser   Cited by: 1 in total (self-citations: 0, citations by others: 1)
Ho EK  Chan LW 《Neural computation》1999,11(8):1995-2016
Connectionist holistic parsing offers a viable and attractive alternative to traditional algorithmic parsers. With exposure to a limited subset of grammatical sentences and their corresponding parse trees only, a holistic parser is capable of learning inductively the grammatical regularity underlying the training examples that affects the parsing process. In the past, various connectionist parsers have been proposed. Each approach had its own unique characteristics, and yet some techniques were shared. In this article, various dimensions underlying the design of a holistic parser are explored, including the methods to encode sentences and parse trees, whether a sentence and its corresponding parse tree share the same representation, the use of confluent inference, and the inclusion of phrases in the training set. Different combinations of these design factors give rise to different holistic parsers. In succeeding discussions, we scrutinize these design techniques and compare the performances of a few parsers on language parsing, including the confluent preorder parser, the backpropagation parsing network, the XERIC parser of Berg (1992), the modular connectionist parser of Sharkey and Sharkey (1992), Reilly's (1992) model, and their derivatives. Experiments are performed to evaluate their generalization capability and robustness. The results reveal a number of issues essential for building an effective holistic parser.

13.
Grammar learning has been a bottleneck problem for a long time. In this paper, we propose a method of semantic separator learning, a special case of grammar learning. The method is based on the hypothesis that some classes of words, called semantic separators, split a sentence into several constituents. The semantic separators are represented by words together with their part-of-speech tags and other information so that rich semantic information can be involved. In the method, we first identify the semantic separators with the help of noun phrase boundaries, called subseparators. Next, the argument classes of the separators are learned from a corpus by generalizing argument instances in a hypernym space. Finally, in order to evaluate the learned semantic separators, we use them in unsupervised Chinese text parsing. The experiments on a manually labeled test set show that the proposed method outperforms previous methods of unsupervised text parsing.

14.
Summary. A parser model is presented whose structure is a generalization of the well known LR(k) parsers. Various classes of this parser that would be both practical and efficient to use in a compiler are examined. Associated with these classes of parsers is a hierarchy of type-0 grammars, each grammatical class being defined in terms of the form and structure of derivations. In particular, parsers based on a class called deterministic regular parsable (DRP) grammars will detect any errors as soon as possible during a left to right scan of the input. LR(k) grammars are also DRP. Much research related to LR(k) grammars and parsing is also applicable to DRP grammars and their associated parsers.

15.
Robustness, the ability to analyze any input regardless of its grammaticality, is a desirable property for any system dealing with unrestricted natural language text. Error-repair parsing approaches achieve robustness by considering ungrammatical sentences as corrupted versions of valid sentences. In this article we present a deductive formalism, based on Sikkel’s parsing schemata, that can be used to define and relate error-repair parsers and study their formal properties, such as correctness. This formalism allows us to define a general transformation technique to automatically obtain robust, error-repair parsers from standard non-robust parsers. If our method is applied to a correct parsing schema verifying certain conditions, the resulting error-repair parsing schema is guaranteed to be correct. The required conditions are weak enough to be fulfilled by a wide variety of popular parsers used in natural language processing, such as CYK, Earley and Left-Corner.

16.
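Implementation of XML syntax checking   Cited by: 3 in total (self-citations: 0, citations by others: 3)
XML is the Extensible Markup Language; developers can define suitable tags as needed. Owing to its flexibility, it has been widely applied in many fields. This paper discusses two methods of XML syntax checking, analyzes one of them in depth, and gives a concrete implementation algorithm. Checking XML syntax comprises two parts: validity checking and well-formedness checking. The local tree grammar that represents the XML document type definition (DTD) is first improved; then, on the basis of the grammar constructed from the DTD, XML validity is checked. Checking algorithms are constructed for each of the two parts of syntax checking. Experimental results show that the syntax checking algorithm is practical and feasible.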
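The distinction between the two checks can be illustrated with a small sketch of the well-formedness half only. It does not implement the paper's algorithm: it simply delegates to the parser in Python's standard `xml.etree.ElementTree` module, which raises `ParseError` on input that is not well-formed. The function name `is_well_formed` is an assumption. Validity checking against a DTD is not covered here, since the standard-library parser does not validate.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, missing root element, bad nesting, etc.
        return False
```

For example, `is_well_formed("<a><b/></a>")` holds, whereas `"<a><b></a>"` is rejected because the `b` element is never closed. A document can pass this check and still be invalid with respect to its DTD.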

17.
Ho EK  Chan LW 《Neural computation》2001,13(5):1137-1170
Holistic parsers offer a viable alternative to traditional algorithmic parsers. They have good generalization performance and are inherently robust. In a holistic parser, parsing is achieved by mapping the connectionist representation of the input sentence to the connectionist representation of the target parse tree directly. Little prior knowledge of the underlying parsing mechanism thus needs to be assumed. However, it also makes holistic parsing difficult to understand. In this article, an analysis is presented for studying the operations of the confluent preorder parser (CPP). In the analysis, the CPP is viewed as a dynamical system, and holistic parsing is perceived as a sequence of state transitions through its state-space. The seemingly one-shot parsing mechanism can thus be elucidated as a step-by-step inference process, with the intermediate parsing decisions being reflected by the states visited during parsing. The study serves two purposes. First, it improves our understanding of how grammatical errors are corrected by the CPP. The occurrence of an error in a sentence will cause the CPP to deviate from the normal track that is followed when the original sentence is parsed. But as the remaining terminals are read, the two trajectories will gradually converge until finally the correct parse tree is produced. Second, it reveals that having systematic parse tree representations alone cannot guarantee good generalization performance in holistic parsing. More important, they need to be distributed in certain useful locations of the representational space. Sentences with similar trailing terminals should have their corresponding parse tree representations mapped to nearby locations in the representational space. The study provides concrete evidence that encoding the linearized parse trees as obtained via preorder traversal can satisfy such a requirement.

18.
We present a compiler that can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, and to ensure that the generated implementations are efficient. The system described is general enough to be able to handle all kinds of schemata for different grammar formalisms, such as context-free grammars and tree-adjoining grammars, and it provides an extensibility mechanism allowing the user to define custom notational elements. This compiler has proven very useful for analyzing, prototyping and comparing natural-language parsers in real domains, as can be seen in the empirical examples provided at the end of the paper. Copyright © 2008 John Wiley & Sons, Ltd.

19.
A technique that represents derivations of a context-free grammar G over a semiring and that obtains for a word w in L(G) the set of all canonical parses for w has previously been described. A state grammar is one of a collection of grammars that place restrictions on the manner of application of context-free-like productions and that generate a non-context-free language. The context-free properties of a state grammar have been used to extend the algebraic parsing technique for languages generated by state grammars, viz., context-sensitive languages. The extension for state grammars is not unlike that required for other types of grammars in whose collection state grammars are representative.

20.
This paper describes a parsing algorithm for Tree Adjoining Grammar (TAG) and its parallel implementation on the Connection Machine. TAG is a formalism for natural language that employs trees as the basic grammar structures. Parsing involves the application of two operations, called adjunction and substitution, to produce derived tree structures. Sequential parsing algorithms for TAGs run in time quadratic in the grammar size, which is impractical for the very large grammars currently being developed for natural language. This paper presents two parallel algorithms, one running in time nearly linear in the grammar size, and the other running in time logarithmic in the grammar size. Both parallel algorithms were implemented on a Connection Machine CM-2 and performance measurements were obtained for varying grammar sizes. This research was supported in part by NSF Grant BNS-9022010, by the ARO Center for Excellence in Artificial Intelligence, University of Pennsylvania, and by the Army High Performance Computing Research Center (AHPCRC), University of Minnesota.
