
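Character-based Joint Model for Chinese Word Segmentation, POS Tagging, and Dependency Parsing
Cite this article: GUO Zhen, ZHANG Yujie, SU Chen, XU Jinan. Character-based Joint Model for Chinese Word Segmentation, POS Tagging, and Dependency Parsing[J]. Journal of Chinese Information Processing, 2014, 28(6): 1-8.
Authors: GUO Zhen  ZHANG Yujie  SU Chen  XU Jinan
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
Funding: International Science and Technology Cooperation Program of China (2014DFA11350); National Natural Science Foundation of China (61370130); Beijing Jiaotong University Talent Fund (KKRC11001532)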
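Abstract: Current transition-based joint models for Chinese word segmentation, POS tagging, and dependency parsing have two major problems: first, the way the tasks are combined needs improvement; second, model performance is limited by the size of the fully annotated corpus. To address the first problem, this paper uses the internal structure of words to extend the word-based dependency tree into a character-based dependency tree and, adopting a transition strategy, implements a character-based joint model for Chinese word segmentation, POS tagging, and dependency parsing. Following the sequence-labeling approach to Chinese word segmentation, the transition-based segmentation scheme is redesigned as four transition actions: Shift_S, Shift_B, Shift_M, and Shift_E, which also allows results from earlier word segmentation research to be incorporated into the joint model. To address the second problem, this paper uses a partially annotated corpus, extracting string-level n-gram features and structure-level dependency subtree features from it and integrating them into the joint model, which yields a semi-supervised joint model for the three tasks. Experimental results on the Penn Chinese Treebank show that the model achieves F1 scores of 98.31%, 94.84%, and 81.71% on Chinese word segmentation, POS tagging, and dependency parsing, respectively, improvements of 0.92%, 1.77%, and 3.95% over the single-task models. Of these, the word segmentation and POS tagging results are the best among published results to date.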

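Keywords: joint model  Chinese word segmentation and POS tagging  dependency parsing  word-internal dependency structure  semi-supervised learning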
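
The four segmentation actions named in the abstract can be illustrated with a minimal sketch. It assumes the actions mirror the conventional S/B/M/E sequence-labeling tags (S for a single-character word, B, M, and E for the first, middle, and last character of a longer word); the abstract does not spell out this mapping, and the function names `words_to_actions` and `actions_to_words` are hypothetical. The sketch covers segmentation only, not the POS tagging or dependency arc actions of the full joint model.

```python
from typing import List


def words_to_actions(words: List[str]) -> List[str]:
    """Map a segmented sentence to one shift action per character.

    Assumed mapping (not confirmed by the abstract): Shift_S for a
    single-character word; Shift_B / Shift_M / Shift_E for the first,
    middle, and last character of a multi-character word.
    """
    actions: List[str] = []
    for word in words:
        if len(word) == 1:
            actions.append("Shift_S")
        else:
            actions.append("Shift_B")
            actions.extend(["Shift_M"] * (len(word) - 2))
            actions.append("Shift_E")
    return actions


def actions_to_words(chars: str, actions: List[str]) -> List[str]:
    """Recover the segmentation from characters and their shift actions."""
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per character")
    words: List[str] = []
    current = ""
    for ch, action in zip(chars, actions):
        current += ch
        # Shift_S and Shift_E both close the word being built.
        if action in ("Shift_S", "Shift_E"):
            words.append(current)
            current = ""
    if current:
        raise ValueError("action sequence ends inside a word")
    return words


if __name__ == "__main__":
    sentence = ["北京", "交通大学", "的", "模型"]
    acts = words_to_actions(sentence)
    print(acts)
    print(actions_to_words("".join(sentence), acts))
```

Because each character receives exactly one action, a character-level feature defined for a sequence-labeling segmenter can be attached to the corresponding action, which is one plausible reading of how earlier segmentation work carries over to the joint model.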

Character-level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
GUO Zhen, ZHANG Yujie, SU Chen, XU Jinan. Character-level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese[J]. Journal of Chinese Information Processing, 2014, 28(6): 1-8.
Authors: GUO Zhen  ZHANG Yujie  SU Chen  XU Jinan
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Abstract: Recent work on joint word segmentation, POS tagging, and dependency parsing in Chinese has two key problems: first, character-based word segmentation and word-based dependency parsing are not well combined in the transition-based framework; second, current joint models are limited by the shortage of annotated corpora. To resolve the first problem, we transform the conventional word-based dependency tree into a character-based dependency tree using the internal structure of words, and propose a novel character-level joint model for the three tasks. For Chinese word segmentation, we design four transition actions, Shift_S, Shift_B, Shift_M, and Shift_E, through which features used in previous research can also be integrated into the model. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features extracted from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves F1 scores of 98.31%, 94.84%, and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model on the three tasks by 0.92%, 1.77%, and 3.95%. In particular, the F1 scores for word segmentation and POS tagging are the best among published results so far.
Keywords: joint model  Chinese word segmentation and POS tagging  dependency parsing  word internal dependency structure  semi-supervised learning
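
For readers unfamiliar with how a word segmentation F1 score such as the one reported above is typically obtained, here is a minimal sketch. It assumes the common convention that a predicted word counts as correct only when both of its character boundaries match a gold word; the abstract does not state the exact evaluation protocol, and the names `word_spans` and `segmentation_f1` are hypothetical.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character spans."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over word spans; assumes gold and pred cover the same characters."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["中文", "分词", "模型"]
    pred = ["中文", "分", "词", "模型"]
    # Two of four predicted words match two of three gold words.
    print(round(segmentation_f1(gold, pred), 4))
```

POS tagging F1 in a joint setting is usually computed the same way with the tag added to each span, and dependency parsing F1 with the head added; both are conventions assumed here rather than details taken from the abstract.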
This article is indexed in CNKI and other databases.