Chunking Based on a Divide-and-Conquer Strategy
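Cite this article: ZHOU Qiaoli, LIU Xin, LANG Wenjing, CAI Dongfeng. Chunking based on a divide-and-conquer strategy[J]. Journal of Chinese Information Processing, 2012, 26(5): 120-129.
Authors: ZHOU Qiaoli, LIU Xin, LANG Wenjing, CAI Dongfeng
Affiliation: Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Funding: Supported by the National Natural Science Foundation of China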
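Abstract: The main task of chunking is to identify and delimit chunks, which simplifies syntactic parsing to some extent. To address the difficulties that chunking faces on long sentences, this paper proposes a chunking method based on a divide-and-conquer strategy. The method first recognizes the maximal noun phrases (MNPs) in a sentence and, based on the result, decomposes the sentence into MNP parts and a sentence-frame part. It then analyzes each kind of unit with a different model and combines the results to complete the chunking process. Because the whole sentence is decomposed into smaller chunking units, sentence complexity is reduced. Experiments on the Penn Chinese Treebank CTB4 data set show an average F1 score of 91.79% across chunk types, which is better than other current chunking methods.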
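The decomposition described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses trained models (its keywords mention conditional random fields and support vector machines), whereas the hypothetical functions `find_mnps`, `chunk_mnp` and `chunk_frame` below are toy rule-based stand-ins over CTB-style POS tags. Collapsing each MNP into a single placeholder token to form the sentence frame is also an assumed design choice.

```python
from typing import List, Tuple

Token = Tuple[str, str]       # (word, POS tag)
Chunk = Tuple[int, int, str]  # (start, end, label), end exclusive

NOUNISH = {"NN", "NR", "JJ"}  # toy assumption: tags allowed inside an MNP
PLACEHOLDER = ("<MNP>", "<MNP>")  # hypothetical stand-in for a collapsed MNP


def find_mnps(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Hypothetical MNP recognizer: maximal runs (length >= 2) of NOUNISH tags."""
    spans, i = [], 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tokens[j][1] in NOUNISH:
            j += 1
        if j - i >= 2:
            spans.append((i, j))
        i = max(j, i + 1)
    return spans


def chunk_mnp(mnp: List[Token]) -> List[Chunk]:
    """Hypothetical inner-MNP chunker: adjacent JJ -> ADJP, adjacent NN/NR -> NP."""
    chunks, i = [], 0
    while i < len(mnp):
        is_adj = mnp[i][1] == "JJ"
        j = i + 1
        while j < len(mnp) and (mnp[j][1] == "JJ") == is_adj:
            j += 1
        chunks.append((i, j, "ADJP" if is_adj else "NP"))
        i = j
    return chunks


def chunk_frame(frame: List[Token]) -> List[Chunk]:
    """Hypothetical frame chunker: VV runs -> VP, AD -> ADVP; other tags get no chunk."""
    chunks, i = [], 0
    while i < len(frame):
        tag = frame[i][1]
        if tag == "VV":
            j = i + 1
            while j < len(frame) and frame[j][1] == "VV":
                j += 1
            chunks.append((i, j, "VP"))
            i = j
            continue
        if tag == "AD":
            chunks.append((i, i + 1, "ADVP"))
        i += 1
    return chunks


def divide_and_conquer_chunk(tokens: List[Token]) -> List[Chunk]:
    chunks: List[Chunk] = []
    frame: List[Token] = []
    frame_to_orig: List[int] = []  # sentence offset of each frame token
    i = 0
    # Step 1: split into MNPs and a sentence frame (the sentinel span flushes the tail)
    for start, end in find_mnps(tokens) + [(len(tokens), len(tokens))]:
        while i < start:
            frame.append(tokens[i])
            frame_to_orig.append(i)
            i += 1
        if start < end:
            frame.append(PLACEHOLDER)
            frame_to_orig.append(start)
            # Step 2a: chunk inside the MNP, shifting offsets back to the sentence
            chunks += [(s + start, e + start, lab) for s, e, lab in chunk_mnp(tokens[start:end])]
            i = end
    # Step 2b: chunk the frame; the toy chunker never covers a placeholder,
    # so each frame chunk maps back to a contiguous sentence span
    for s, e, lab in chunk_frame(frame):
        chunks.append((frame_to_orig[s], frame_to_orig[e - 1] + 1, lab))
    # Step 3: combine both result sets in sentence order
    return sorted(chunks)


sentence = [("他", "PN"), ("很快", "AD"), ("完成", "VV"), ("了", "AS"),
            ("重要", "JJ"), ("研究", "NN"), ("项目", "NN")]
print(divide_and_conquer_chunk(sentence))
# [(1, 2, 'ADVP'), (2, 3, 'VP'), (4, 5, 'ADJP'), (5, 7, 'NP')]
```

Replacing the toy functions with trained sequence labelers would leave the split-and-merge logic unchanged; the `frame_to_orig` offset table is what lets the MNP-internal chunks and the frame chunks be merged back onto the original sentence.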

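Keywords: Chinese chunking; divide-and-conquer strategy; syntactic parsing; maximal noun phrase; conditional random fields; support vector machines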

A Divide-and-Conquer Strategy for Chunking
ZHOU Qiaoli, LIU Xin, LANG Wenjing, CAI Dongfeng. A Divide-and-Conquer Strategy for Chunking[J]. Journal of Chinese Information Processing, 2012, 26(5): 120-129.
Authors: ZHOU Qiaoli, LIU Xin, LANG Wenjing, CAI Dongfeng
Affiliation: Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Abstract: Chunking consists of identifying and labeling chunks; by segmenting a sentence into small chunks, it reduces the difficulty of full syntactic parsing. To reduce the complexity of chunking long sentences, this paper describes a divide-and-conquer strategy. The method first recognizes the maximal noun phrases (MNPs) in a full sentence, which splits the sentence into MNP parts and a sentence frame (the sentence without its MNPs). It then identifies the chunks within the MNPs and within the sentence frame, using a different model for each kind of unit, and finally combines the results. Experiments on the UPenn Chinese Treebank-4 (CTB4) data set show that the best overall F1 score for Chinese chunking is 91.79%, which is higher than the performance of state-of-the-art machine learning models.
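The abstract reports its result as an F1 score. As a hedged illustration, the sketch below computes exact-match chunk precision, recall and F1 (the convention used in the CoNLL-2000 chunking evaluation): a predicted chunk counts as correct only if both its span and its label match a gold chunk. The abstract does not state whether 91.79% is micro-averaged over all chunks or averaged over chunk types, so both are computed; the chunk tuples are invented toy data, not results from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Chunk = Tuple[int, int, str]  # (start, end, label), end exclusive


def _f1(correct: int, n_pred: int, n_gold: int) -> float:
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def chunk_f1(gold: Iterable[Chunk], pred: Iterable[Chunk]) -> Tuple[float, Dict[str, float]]:
    """Exact-match chunk F1: span and label must both match."""
    gold_set, pred_set = set(gold), set(pred)
    correct = gold_set & pred_set
    overall = _f1(len(correct), len(pred_set), len(gold_set))  # micro-averaged over all chunks
    n_correct = Counter(lab for _, _, lab in correct)
    n_pred = Counter(lab for _, _, lab in pred_set)
    n_gold = Counter(lab for _, _, lab in gold_set)
    labels = set(n_pred) | set(n_gold)
    per_type = {lab: _f1(n_correct[lab], n_pred[lab], n_gold[lab]) for lab in labels}
    return overall, per_type


gold = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
pred = [(0, 2, "NP"), (2, 3, "VP"), (3, 4, "ADJP"), (4, 5, "NP")]
overall, per_type = chunk_f1(gold, pred)
macro = sum(per_type.values()) / len(per_type)  # unweighted mean over chunk types
print(round(overall, 4), sorted(per_type.items()), macro)
```

Here 2 of 4 predicted chunks are correct (precision 1/2) and 2 of 3 gold chunks are found (recall 2/3), so the micro-averaged F1 is 4/7. The split NP in the prediction is counted as wrong even though it covers the right tokens, which is why exact-match F1 is a strict measure.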
Keywords: Chinese chunking; divide-and-conquer; complete syntactic parsing; maximal noun phrase; conditional random fields; support vector machines
This article is indexed in Wanfang Data and other databases.