Similar Documents
A total of 20 similar documents were found (search time: 156 ms).
1.
To address the drop in accuracy of automatic hierarchy recognition caused by long complex sentences and connective collocations, this paper analyzes punctuation usage patterns in complex sentences and proposes an SVM-based clause boundary detection method. Based on collocation rules for the relation words (connectives) of complex sentences, it builds a context-free grammar formalization of the complex sentence, and on top of this model proposes an improved shift-reduce algorithm, with the aim of improving the accuracy of hierarchical relation recognition.
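As an editorial illustration only, here is a minimal Python sketch of how a collocation-rule-driven shift-reduce pass over pre-segmented clauses might look. The rule table, the clause representation, and the relation labels are invented assumptions, not the paper's grammar or code.

```python
# Collocation rules: (connective of left part, connective of right part) -> relation.
# This toy table and the clause format are illustrative assumptions.
COLLOCATIONS = {
    ("因为", "所以"): "causal",       # because ... therefore ...
    ("虽然", "但是"): "adversative",  # although ... but ...
    ("不但", "而且"): "progressive",  # not only ... but also ...
}

def connective(node):
    # A leaf clause exposes its own connective; a built subtree exposes
    # the connective of its leftmost clause.
    return node["conn"] if "conn" in node else connective(node["left"])

def shift_reduce(clauses):
    """clauses: [{"conn": "虽然", "text": "..."}, ...] in surface order.
    Returns a list holding the parse tree (a singleton on full success)."""
    stack, buffer = [], list(clauses)
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            rel = COLLOCATIONS.get((connective(stack[-2]), connective(stack[-1])))
            if rel:  # a collocation rule licenses a reduction
                right, left = stack.pop(), stack.pop()
                stack.append({"rel": rel, "left": left, "right": right})
                continue
        if not buffer:
            break  # no rule fired and nothing to shift: partial parse
        stack.append(buffer.pop(0))
    return stack

print(shift_reduce([
    {"conn": "虽然", "text": "天气不好"},
    {"conn": "但是", "text": "他还是来了"},
]))
```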

2.
Hierarchical relation analysis of Chinese complex sentences is one of the most challenging problems in Chinese information processing. To counter the loss of recognition accuracy caused by insufficient relation-word marking, this work mines formalized semantic knowledge that affects clause linkage, builds a clause-linkage recognition algorithm on that basis, and applies it within the corresponding hierarchy decision rules to assist the analysis of hierarchical relations. For the remaining single-level and multi-level marked complex sentences, a shift-reduce algorithm based on collocation rules is used. Finally, a complex-sentence hierarchy analysis model combining semantics and rules is proposed. Experimental results show that the method improves the accuracy of hierarchical relation recognition to a certain extent.

3.
Research on VML-based visualization of the relation hierarchy tree of complex sentences
In teaching and research on modern Chinese complex sentences, diagrams of relation hierarchies are frequently drawn in order to analyze logical-semantic relations. Traditionally these diagrams are drawn by hand and stored as images, which is both storage-heavy and labor-intensive. Using VML technology, and building on annotated complex sentences, this work studies the automatic generation of relation hierarchy trees and their visualization in web pages. Experiments on different types of complex sentences show that all types are displayed accurately, indicating that the visualization method can be effectively applied to research in complex-sentence information engineering.
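To make the idea concrete, here is a small Python sketch that emits VML markup for a one-relation tree. The <v:rect>, <v:textbox> and <v:line> element names are standard VML (rendered by legacy Internet Explorer), but the layout constants and the node format are invented for illustration; this is not the paper's generator.

```python
# Emit VML markup for a tiny relation hierarchy tree: one relation node on
# top, two clause leaves below, connected by lines. Coordinates are toy values.

def vml_node(text, x, y, w=110, h=24):
    return (f'<v:rect style="position:absolute;left:{x}px;top:{y}px;'
            f'width:{w}px;height:{h}px"><v:textbox>{text}</v:textbox></v:rect>')

def vml_edge(x1, y1, x2, y2):
    return f'<v:line style="position:absolute" from="{x1},{y1}" to="{x2},{y2}"/>'

def render_tree(relation, left_clause, right_clause):
    parts = [
        vml_node(relation, 130, 10),        # root: the relation label
        vml_node(left_clause, 40, 80),      # left clause leaf
        vml_node(right_clause, 220, 80),    # right clause leaf
        vml_edge(185, 34, 95, 80),          # root -> left
        vml_edge(185, 34, 275, 80),         # root -> right
    ]
    return "\n".join(parts)

print(render_tree("因果 (causal)", "因为天气不好", "所以比赛取消"))
```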

4.
Complex sentences make up the majority of sentences in Chinese text. Recognizing the relation category of a complex sentence means discriminating the semantic relation between its clauses, and is the key to analyzing its meaning. In non-saturated complex sentences some relation words are omitted, so collocation rules over relation words cannot classify them, and manually analyzing clause features for classification is time-consuming and laborious. Taking two-clause non-saturated complex sentences as the research object, this paper adopts an FCNN model that fuses relation-word features into a convolutional neural network, minimizing the dependence on linguistic knowledge and rules; the model learns to analyze the syntactic and semantic features between the two clauses automatically and thereby identifies the relation category. The proposed method reaches 97% accuracy on relation-category recognition, and the experimental results demonstrate its effectiveness.
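The following PyTorch sketch shows one plausible reading of "fusing relation-word features into a CNN": each token's embedding is concatenated with a relation-word indicator before convolution. All sizes, the fusion scheme, and the class count are assumptions for illustration, not the published FCNN.

```python
import torch
import torch.nn as nn

class FCNNSketch(nn.Module):
    """Illustrative stand-in for an FCNN-style model: a text CNN whose
    per-token input concatenates a word embedding with a relation-word
    indicator feature."""

    def __init__(self, vocab_size, emb_dim=100, n_classes=12, n_filters=64, k=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # +1 input channel: does this token belong to a relation word?
        self.conv = nn.Conv1d(emb_dim + 1, n_filters, kernel_size=k, padding=k // 2)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids, rel_word_mask):
        # token_ids: (batch, seq_len); rel_word_mask: (batch, seq_len) in {0, 1}
        x = self.emb(token_ids)                                   # (B, L, E)
        x = torch.cat([x, rel_word_mask.unsqueeze(-1).float()], dim=-1)
        x = self.conv(x.transpose(1, 2)).relu()                   # (B, F, L)
        x = x.max(dim=2).values                                   # global max pooling
        return self.fc(x)                                         # class logits

model = FCNNSketch(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 40)), torch.zeros(2, 40))
```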

5.
Research on a relation-word extraction algorithm for complex sentences in Chinese information processing
Relation words play an important role in marking the relations of a complex sentence. When using computers to analyze the relation hierarchy of Chinese multi-level complex sentences, extracting and indexing relation words is the primary task. Addressing the needs of computer processing of Chinese complex sentences, this paper combines part-of-speech tagging with relation-word collocation theory and proposes a relation-word extraction algorithm, the forward selection algorithm. Tests show an extraction accuracy of 89.88%, demonstrating the algorithm's effectiveness and the feasibility of using it for computer processing of Chinese complex sentences.
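A heavily simplified Python sketch of the general idea (scan the tagged sentence left to right and accept a candidate only when both the lexicon and the POS tag license it) follows. The lexicon, the tagset, and the acceptance test are illustrative assumptions; the real forward selection algorithm is more elaborate.

```python
# Accept a candidate as a relation word only if it is in the relation-word
# lexicon AND carries a licensing part-of-speech tag.
RELATION_WORDS = {"因为", "所以", "虽然", "但是", "如果", "就"}
LICENSING_POS = {"c", "d"}  # conjunctions and adverbs (ICTCLAS-style tags)

def extract_relation_words(tagged):
    """tagged: [(word, pos), ...] left to right.
    Returns indices of tokens accepted as relation words."""
    accepted = []
    for i, (word, pos) in enumerate(tagged):
        if word in RELATION_WORDS and pos in LICENSING_POS:
            accepted.append(i)
    return accepted

sent = [("因为", "c"), ("天气", "n"), ("不", "d"), ("好", "a"),
        ("所以", "c"), ("比赛", "n"), ("取消", "v")]
print(extract_relation_words(sent))  # [0, 4]
```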

6.
Complex-sentence relation recognition discriminates the semantic relation between clauses and is the key to semantic analysis of complex sentences. Non-saturated Chinese complex sentences carry implicit relations, which makes semantic relation recognition difficult. To deeply mine the implicit semantic information in complex sentences and achieve correct relation classification, this paper proposes BCCNN, a network structure combining a word-clustering-based CNN with a Bi-LSTM. A word clustering algorithm models the word vectors to extract semantic-similarity features between words, on top of which a CNN models the complex sentence deeply to obtain its local features. In addition, the pooling layer of the CNN is replaced with a Bi-LSTM layer, which reduces the semantic information loss caused by pooling while capturing global long-distance semantic dependency features. Compared with other results on the Chinese Complex Sentence Corpus (CCCS) and the Tsinghua Chinese Treebank (TCT), the method achieves good recognition performance, with accuracies of 92.4% and 90.7% respectively.
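A minimal PyTorch sketch of the architectural idea (convolution for local features, a Bi-LSTM in place of pooling for long-distance dependencies) is below. The word-clustering preprocessing step is omitted, and all dimensions and the classifier head are assumptions, not the published BCCNN.

```python
import torch
import torch.nn as nn

class BCCNNSketch(nn.Module):
    """Convolution extracts local features; a Bi-LSTM replaces the usual
    pooling layer so long-distance dependencies survive."""

    def __init__(self, vocab_size, emb_dim=100, n_filters=64, hidden=64, n_classes=12):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        # Bi-LSTM in place of pooling: consumes the convolved sequence whole.
        self.bilstm = nn.LSTM(n_filters, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        x = self.emb(token_ids)                  # (B, L, E)
        x = self.conv(x.transpose(1, 2)).relu()  # (B, F, L) local features
        x = x.transpose(1, 2)                    # back to (B, L, F)
        _, (h, _) = self.bilstm(x)               # final states, both directions
        x = torch.cat([h[0], h[1]], dim=-1)      # (B, 2 * hidden)
        return self.fc(x)

model = BCCNNSketch(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 40)))
```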

7.
A complex-sentence relation is the logical-semantic relation between the clauses of a complex sentence; recognizing it means discriminating the semantic relation between clauses, a difficult problem in natural language processing. Taking marked complex sentences as the research object, this paper proposes a BERT-FHAN model, which uses BERT to obtain word vectors and fuses relation-word ontology knowledge together with part-of-speech, syntactic dependency, and semantic dependency features into a HAN model. Experimental validation shows that BERT-FHAN achieves a best macro-averaged F1 of 95.47% and an accuracy of 96.97%, demonstrating the method's effectiveness.

8.
杨进才, 曹元, 胡泉, 沈显君. 《计算机科学》, 2021, 48(z1): 295-298, 305
The semantic relations of Chinese complex sentences are rich and intricate; automatic relation recognition judges these semantic relations and is an important step in analyzing the meaning a complex sentence expresses. Causal complex sentences are the most frequently used kind of Chinese complex sentence. Taking two-clause marked causal complex sentences as the research object, this paper uses deep learning to mine the implicit features of complex sentences automatically, while also fusing in relation words, a salient form of knowledge from linguistic research. word2vec word vectors are combined with one-hot-encoded relation-word features as the model input, and a transformer model whose feed-forward layer is a convolutional neural network is used to recognize causal relations. The method reaches an F1 of 92.13% on causal relation-category recognition, outperforming existing baseline models and demonstrating its effectiveness.
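The PyTorch sketch below shows one way to realize "a transformer whose feed-forward layer is a CNN": a standard self-attention sublayer followed by a pair of 1-D convolutions instead of the usual position-wise MLP. The input is assumed to be word2vec vectors concatenated with a one-hot relation-word feature and projected to the model width; all sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvFFNEncoderLayer(nn.Module):
    """One transformer encoder layer with a convolutional feed-forward sublayer."""

    def __init__(self, d_model=128, n_heads=4, d_conv=256, k=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv1 = nn.Conv1d(d_model, d_conv, k, padding=k // 2)
        self.conv2 = nn.Conv1d(d_conv, d_model, k, padding=k // 2)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (B, L, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)                # self-attention + residual
        c = self.conv2(self.conv1(x.transpose(1, 2)).relu()).transpose(1, 2)
        return self.norm2(x + c)             # conv feed-forward + residual

layer = ConvFFNEncoderLayer()
out = layer(torch.randn(2, 40, 128))         # e.g. word2vec + one-hot, projected
```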

9.
Relation words are the connecting elements of multi-level complex sentences; their function is to link clauses and to mark the semantic relations between them, and they are of great significance to the study of such sentences. However, in rule-based automatic identification of relation words in modern Chinese complex sentences, many of the relation markers identified on a first pass turn out to be pseudo relation words. One must therefore decide whether each marker is a genuine relation word, and the basis of that decision is determining the collocation relations among the markers, which is a difficult point. To solve this problem, the paper proposes two algorithms: (1) use a solution-space tree to obtain all collocation sets of the relation markers; (2) prune the solution-space tree to remove useless collocation sets. Experiments show that the two algorithms are highly general and reach a decision accuracy of 98.9%, with approximate solutions obtainable for the remaining 1.1%, indicating the algorithms' feasibility for handling multi-level complex sentences.
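An illustrative Python take on the enumerate-then-prune idea follows: walk a solution-space tree whose levels are the relation markers of the sentence, branching on the role each marker plays (closes an open pattern, opens a new one, or is a pseudo relation word), and prune branches left with an unmatched opener. The pattern table is an invented stand-in for the paper's collocation knowledge.

```python
PATTERNS = [("因为", "所以"), ("虽然", "但是"), ("不但", "而且")]
PATTERN_SET = set(PATTERNS)
OPENERS = {a for a, _ in PATTERNS}

def collocation_sets(markers):
    """markers: relation markers in surface order.
    Yields candidate collocation sets as lists of (opener_idx, closer_idx)."""
    def walk(i, open_stack, pairs):
        if i == len(markers):
            if not open_stack:          # prune: an unmatched opener kills the branch
                yield pairs
            return
        w = markers[i]
        if open_stack and (markers[open_stack[-1]], w) in PATTERN_SET:
            # branch 1: w closes the most recently opened pattern
            yield from walk(i + 1, open_stack[:-1], pairs + [(open_stack[-1], i)])
        if w in OPENERS:
            # branch 2: w opens a new pattern
            yield from walk(i + 1, open_stack + [i], pairs)
        # branch 3: w is treated as a pseudo relation word and skipped
        yield from walk(i + 1, open_stack, pairs)
    yield from walk(0, [], [])

for sol in collocation_sets(["虽然", "因为", "所以", "但是"]):
    print(sol)  # includes [(1, 2), (0, 3)]: causal nested inside adversative
```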

10.
Complex-sentence relation analysis is usually based on a classification scheme, which, lacking a unified logic, faces considerable disagreement. This paper proposes describing complex-sentence relations with feature structures. The feature structure of a relation consists of [feature: value] tuples; the paper makes an initial construction of a feature-structure system for Chinese complex-sentence relations and applies it to concrete analysis. Compared with the classification scheme, feature structures describe relations more deeply, and analysis and judgment become accurate and easy to carry out. The feature-structure system is currently open; as features are adjusted, it can be refined without massively revising the existing feature descriptions. Feature structures can be used for resource construction and computational research on deep semantic analysis of complex-sentence relations.
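A toy Python rendering of the [feature: value] idea: each relation is a bundle of feature-value pairs rather than a single class label, so relations can be compared, refined, and unified feature by feature. The feature names and values below are invented for illustration, not the paper's actual system.

```python
# Two relation descriptions as feature structures (invented features).
CAUSAL = {"temporal_order": "forward", "polarity": "positive", "factual": True}
CONCESSIVE = {"expectation": "violated", "polarity": "contrast", "factual": True}

def compatible(fs1, fs2):
    """Two feature structures unify iff they agree on every shared feature."""
    return all(fs1[f] == fs2[f] for f in fs1.keys() & fs2.keys())

def unify(fs1, fs2):
    """Merge compatible structures; None signals a feature clash."""
    return {**fs1, **fs2} if compatible(fs1, fs2) else None

# An underspecified analysis can be refined monotonically:
observed = {"factual": True}
print(compatible(observed, CAUSAL))   # True: causal is one consistent refinement
print(unify(observed, CONCESSIVE))    # merged structure (no clash here)
```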

11.
This paper presents an algorithm (a parser) for analyzing sentences according to grammatical constraints expressed in the framework of lexicalized tree-adjoining grammar. For the current grammars of English, the algorithm behaves much better and requires much less time than its worst-case complexity. The main objective of this work is to design a practical parser whose average-case complexity is much superior to its worst case. Most of the previous methods always required the worst-case complexity. The algorithm can be used in two modes. As a recognizer it outputs whether the input sentence is grammatically correct or not. As a parser it outputs a detailed analysis of the grammatically correct sentences. As sentences are read from left to right, information about possible continuations of the sentence is computed. In this sense, the algorithm is called a predictive left to right parser. This feature reduces the average time required to process a given sentence. In the worst case, the parser requires an amount of time proportional to G²n⁶ for a sentence of n words and for a lexicalized tree-adjoining grammar of size G. The worst-case complexity is only reached with pathological (not naturally occurring) grammars and inputs.

12.
Real-world natural language sentences are often long and complex, and contain unexpected grammatical constructions. They even include noise and ungrammaticality. This paper describes the Controlled Skip Parser, a program that parses such real-world sentences by skipping some of the words in the sentence. The new feature of this parser is that it controls its behavior by finding out which words to skip, without using domain-specific knowledge. The parser is a priority-based chart parser. By assigning appropriate priority levels to the constituents in the chart, the parser's behavior is controlled. Statistical information is used for assigning priority levels. The statistical information (n-grams) can be thought of as a generalized approximation of the grammar learned from past successful experiences. The control mechanism gives a great speed-up and reduction in memory usage. Experiments on real newspaper articles are shown, and our experience with this parser in a machine translation system is described.
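A Python sketch of the control idea only: agenda items are ordered by priorities derived from simple n-gram statistics, so words with no statistical support sink in the agenda and are effectively skipped. The bigram table, the scoring, and the processing budget are illustrative assumptions, not the Controlled Skip Parser's actual machinery.

```python
import heapq

BIGRAM_LOGPROB = {("<s>", "the"): -0.5, ("the", "cat"): -1.0,
                  ("cat", "sat"): -1.2, ("sat", "</s>"): -0.8}  # toy statistics

def priority(words, i):
    """A word is plausible if a known bigram links it to either neighbor."""
    padded = ["<s>"] + list(words) + ["</s>"]
    left = BIGRAM_LOGPROB.get((padded[i], padded[i + 1]), -10.0)
    right = BIGRAM_LOGPROB.get((padded[i + 1], padded[i + 2]), -10.0)
    return max(left, right)

def agenda_order(words, budget):
    """Pop word positions best-first; positions never popped within the
    budget are the ones the parser would skip."""
    heap = [(-priority(words, i), i) for i in range(len(words))]
    heapq.heapify(heap)
    processed = [heapq.heappop(heap)[1] for _ in range(min(budget, len(heap)))]
    skipped = sorted(i for _, i in heap)
    return processed, skipped

words = "the cat xyzzy sat".split()
print(agenda_order(words, budget=3))  # ([0, 3, 1], [2]): 'xyzzy' is skipped
```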

13.
This paper describes the implementation of a constraint-based parser, PARSEC (Parallel ARchitecture SEntence Constrainer), which has the required flexibility that a user may easily construct a custom grammar and test it. Once the user designs grammar parameters, constraints, and a lexicon, our system checks them for consistency and creates a parser for the grammar. The parser has an X-windows interface that allows a user to view the state of a parse of a sentence, test new constraints, and dump the constraint network to a file. The parser has an option to perform the computationally expensive constraint propagation steps on the MasPar MP-1. Stream and socket communication was used to interface the MasPar constraint parser with a standard X-windows interface on our Sun Sparcstation. The design of our heterogeneous parser has benefited from the use of object-oriented techniques. Without these techniques, it would have been more difficult to combine the processing power of the MasPar with a Sun Sparcstation. Also, these techniques allowed the parser to gracefully evolve from a system that operated on single sentences, to one capable of processing word graphs containing multiple sentences, consistent with speech processing. This system should provide an important component of a real-time speech understanding system.

14.
This paper describes our work on parsing Turkish using the lexical-functional grammar formalism [11]. This work represents the first effort for wide-coverage syntactic parsing of Turkish. Our implementation is based on Tomita's parser developed at Carnegie Mellon University Center for Machine Translation. The grammar covers a substantial subset of Turkish including structurally simple and complex sentences, and deals with a reasonable amount of word order freeness. The complex agglutinative morphology of Turkish lexical structures is handled using a separate two-level morphological analyzer, which has been incorporated into the syntactic parser. After a discussion of the key relevant issues regarding Turkish grammar, we discuss aspects of our system and present results from our implementation. Our initial results suggest that our system can parse about 82% of the sentences directly and almost all the remaining with very minor pre-editing. This work was done as part of the first author's M.Sc. degree work at the Department of Computer Engineering and Information Science, Bilkent University, Ankara, 06533, Turkey.

15.
This paper presents a simple connectionist approach to parsing of a subset of sentences in the Hindi language, using Rule-based Connectionist Networks (RBCN) as suggested by Fu in 1993. The basic grammar rules representing kernel Hindi sentences have been used to determine the initial topology of the RBCN. The RBCN is based on a multilayer perceptron, trained using the backpropagation algorithm. The terminal symbols defined in the language structure are mapped onto the input nodes, the non-terminals onto hidden nodes and the start symbol onto the single output node of the network structure. The training instances are sentences of arbitrary, but fixed maximum length and fixed word order. A neural network based recognizer is used to perform grammaticality determination and parse tree generation of a given sentence. The network is exposed to both positive and negative training instances, derived from a simple context-free grammar (CFG), during the training phase. The trained network recognizes seen sentences (sentences present in the training set) with 98–100% accuracy. Since a neural net based recognizer is trainable in nature, it can be trained to recognize any other CFG, simply by changing the training set. This results in reducing the programming effort involved in parser development, as compared to that of the conventional AI approach. The parsing time is also reduced to a great extent as compared to that of a conventional parser, as a result of the inherent parallelism exhibited by neural net architecture.
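A toy PyTorch version of the grammar-to-topology mapping described above: terminals of a small CFG become input units, the nonterminals become hidden units, and the start symbol becomes the single output unit, trained with backpropagation on positive and negative instances. The grammar, the fixed-length encoding, and the training setup are all illustrative assumptions.

```python
import torch
import torch.nn as nn

TERMINALS = ["det", "noun", "verb"]   # from a toy CFG: S -> NP VP, NP -> det noun, VP -> verb
MAX_LEN = 4                           # fixed maximum sentence length

def encode(pos_tags):
    """One-hot terminal per position, zero-padded to MAX_LEN, flattened."""
    x = torch.zeros(MAX_LEN, len(TERMINALS))
    for i, t in enumerate(pos_tags):
        x[i, TERMINALS.index(t)] = 1.0
    return x.flatten()

net = nn.Sequential(                  # 2 hidden units ~ the 2 nonterminals NP, VP
    nn.Linear(MAX_LEN * len(TERMINALS), 2), nn.Sigmoid(),
    nn.Linear(2, 1), nn.Sigmoid(),    # single output unit ~ the start symbol S
)

# One positive instance ("det noun verb") and one scrambled negative instance:
x_pos, x_neg = encode(["det", "noun", "verb"]), encode(["verb", "det", "noun"])
opt = torch.optim.SGD(net.parameters(), lr=1.0)
for _ in range(500):                  # backpropagation training loop
    loss = nn.functional.binary_cross_entropy(net(x_pos), torch.ones(1)) + \
           nn.functional.binary_cross_entropy(net(x_neg), torch.zeros(1))
    opt.zero_grad(); loss.backward(); opt.step()
print(net(x_pos).item(), net(x_neg).item())  # should head toward 1 and 0
```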

16.
The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence structure. We show that the concept of linear segments, linguistically motivated units which may be easily detected automatically, serves as a good basis for the identification of clauses in Czech. The segment annotation captures such relationships as subordination, coordination, apposition and parenthesis; based on segmentation charts, individual clauses forming a complex sentence are identified. The annotation of a sentence structure enriches a dependency-based framework with explicit syntactic information on relations among complex units like clauses. We have gathered a collection of 3,444 sentences from the Prague Dependency Treebank, which were annotated with respect to their sentence structure (these sentences comprise 10,746 segments forming 6,341 clauses). The main purpose of the project is to obtain development data; promising results have already been reported for Czech NLP tools (such as a dependency parser or a machine translation system for related languages) that adopt the idea of clause segmentation. The collection of sentences with annotated sentence structure provides the possibility of further improvement of such tools.

17.
The importance of the parsing task for NLP applications is well understood. However, developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time consuming. In this paper we present our method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG grammar from an Arabic Treebank. Then, we implement the parser that assigns syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1,650 sentences). We calculate the precision, recall and F-measure. Our experimental results showed the efficiency of the proposed parser for parsing modern standard Arabic sentences (precision: 83.59%, recall: 82.98%, F-measure: 83.23%).
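The induce-then-parse pipeline can be sketched with NLTK. The Penn Treebank sample shipped with NLTK stands in for the Arabic Treebank here, since that corpus is not freely bundled; the pipeline outline (read trees, collect productions, induce a PCFG, Viterbi-parse) is the same, but this is not the authors' implementation.

```python
import nltk
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

# Collect productions from a slice of the treebank, lightly normalized.
productions = []
for tree in treebank.parsed_sents()[:500]:
    tree.collapse_unary(collapsePOS=False)
    tree.chomsky_normal_form()
    productions += tree.productions()

# Induce a PCFG and build a Viterbi (most-probable-parse) parser.
grammar = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
parser = nltk.ViterbiParser(grammar)

# Parse a sentence drawn from the same corpus (this step can take a while).
sent = treebank.sents()[0]
for parse in parser.parse(sent):
    print(parse)
    break
```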

18.
S. Glass, D. Ince, E. Fergus. Software, 2001, 31(10): 983-1001
Parser generators such as yacc have been used in a large number of applications, not just those that involve compiler writing. This has meant that these tools are being used increasingly by nonspecialist developers. A consequence of this is that good support is required for debugging a grammar and its generated parser(s). This paper describes Llun, a debugging tool that visualizes the operation of a generated parser at both a high level and a low level. Llun is superior to other parser visualization products by virtue of the high-level facilities it offers. The paper describes some of the problems encountered using parser generators, outlines a visualization system which addresses a number of the problems and uses a taxonomy developed by Price to categorize the system.

19.
This paper proposes a chunk dependency grammar for Chinese that takes predicates as the core and chunks as the object of study, locating the chunks governed by each predicate within and across sentences so as to build a syntactic analysis framework at the sentence-group level. This raises the linguistic unit at the leaf nodes, and introduces innovations in analysis methods and rules tailored to the semantic characteristics of Chinese; it handles logical-structure knowledge at the micro level well and lays a foundation for meso-level argument knowledge and macro-level discourse knowledge. The paper introduces the ideas, representation, analysis methods, and characteristics of chunk dependency grammar, and briefly describes the construction of a chunk dependency treebank. As of August 2020 the treebank contained 1.87 million characters (40,000 complex sentences, 100,000 clauses), of which 67% is news text and 32% encyclopedia text.

20.
This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people do), and can be used to induce a grammar capable of analysing such sentences. This article demonstrates these two applications using the Penn Treebank. In a robustness evaluation experiment, two state-of-the-art statistical parsers are evaluated on an ungrammatical version of Sect. 23 of the Wall Street Journal (WSJ) portion of the Penn Treebank. This experiment shows that the performance of both parsers degrades with grammatical noise. A breakdown by error type is provided for both parsers. A second experiment retrains both parsers using an ungrammatical version of WSJ Sections 2–21. This experiment indicates that an ungrammatical treebank is a useful resource in improving parser robustness to grammatical errors, but that the correct combination of grammatical and ungrammatical training data has yet to be determined.
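A minimal Python sketch of the error-introduction idea follows: derive an ungrammatical variant of a sentence by applying one of a few frequent error types. The error inventory here (word deletion, doubling, adjacent swap) is a simplified illustration of the kinds of edits the article describes, not its exact set, and real treebank transformation would also adjust the tree analyses.

```python
import random

def introduce_error(tokens, rng=None):
    """Return a copy of the token list with one grammatical error injected."""
    rng = rng or random.Random(0)
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    kind = rng.choice(["delete", "double", "swap"])
    i = rng.randrange(len(tokens) - 1)
    if kind == "delete":
        del tokens[i]                       # missing-word error
    elif kind == "double":
        tokens.insert(i, tokens[i])         # repeated-word error
    else:
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]  # transposition
    return tokens

print(introduce_error("the parser ignores grammatical errors".split()))
```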
