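A Unified Character-Based Tagging Approach to Chinese Lexical Analysis (三位一体字标注的汉语词法分析)
Cite this article:于江德, 胡顺义, 余正涛. 三位一体字标注的汉语词法分析[J]. 中文信息学报 (Journal of Chinese Information Processing), 2015, 29(6): 1-7
Authors:于江德 (YU Jiangde)  胡顺义 (HU Shunyi)  余正涛 (YU Zhengtao)
Affiliation:1. School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan 455000, China;
2. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650051, China
Funding:National Natural Science Foundation of China (60863011); Henan Province Basic and Frontier Technology Research Program (112300410182); Key Science and Technology Research Project of the Henan Provincial Department of Education (14A520077)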
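Abstract:When word segmentation, part-of-speech (POS) tagging and named entity recognition are handled as separate steps in Chinese lexical analysis, the different kinds of information are hard to integrate and errors are passed upward and amplified. To address these shortcomings, this paper proposes a trinity (three-in-one) character tagging method for Chinese lexical analysis. The method treats lexical analysis as the labeling of a character sequence: the word-position, part-of-speech and named-entity information of each character is fused into that character's tag, and a maximum entropy model accomplishes all three tasks in a single tagging pass. A closed test was run on the PKU corpus of Bakeoff2007, with extensive experiments comparing the method against conventional step-by-step word segmentation, POS tagging and named entity recognition. The results show that trinity character tagging improves all three subtasks to varying degrees: the F-score of word segmentation reaches 96.4%, the accuracy of POS tagging reaches 95.3%, and the F-score of named entity recognition reaches 90.3%. This indicates that Chinese lexical analysis based on trinity character tagging performs better.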

Keywords:Chinese lexical analysis   maximum entropy model   trinity   character-based tagging
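
The fusion of word position, part of speech and named entity into one per-character tag can be illustrated with a small sketch. The abstract does not give the tag inventory, so everything concrete here is an assumption made for illustration: the four-way B/M/E/S word-position set, the `position-POS-entity` tag format, the POS and entity labels in the sample sentence, and the helper names `word_positions`, `encode` and `decode`. The sketch also assumes one entity label per word and tags that contain no hyphen themselves; it does not cover the maximum entropy model that predicts the tags.

```python
from typing import List, Tuple

# One annotated word: (surface form, POS tag, named-entity label or "O").
Word = Tuple[str, str, str]


def word_positions(length: int) -> List[str]:
    """Word-position tags for a word of the given length (assumed B/M/E/S set)."""
    if length == 1:
        return ["S"]
    return ["B"] + ["M"] * (length - 2) + ["E"]


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Turn annotated words into (character, joint tag) pairs."""
    pairs = []
    for surface, pos, entity in words:
        for char, position in zip(surface, word_positions(len(surface))):
            # Hypothetical joint tag format: position-POS-entity.
            pairs.append((char, f"{position}-{pos}-{entity}"))
    return pairs


def decode(pairs: List[Tuple[str, str]]) -> List[Word]:
    """Recover words, POS tags and entity labels from a joint-tag sequence."""
    words, buffer = [], ""
    for char, tag in pairs:
        position, pos, entity = tag.split("-")
        buffer += char
        if position in ("E", "S"):  # a word ends at this character
            words.append((buffer, pos, entity))
            buffer = ""
    return words


sentence = [("于江德", "nr", "PER"), ("在", "p", "O"), ("安阳", "ns", "LOC"), ("工作", "v", "O")]
tagged = encode(sentence)
print(tagged[:3])  # [('于', 'B-nr-PER'), ('江', 'M-nr-PER'), ('德', 'E-nr-PER')]
assert decode(tagged) == sentence
```

Because a single tag sequence carries all three kinds of information, decoding one predicted sequence yields the segmentation, the POS tags and the entity labels together, with no intermediate output passed from one step to the next.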
  

A Unified Character-Based Tagging Approach to Chinese Lexical Analysis
YU Jiangde,HU Shunyi,YU Zhengtao. A Unified Character-Based Tagging Approach to Chinese Lexical Analysis[J]. Journal of Chinese Information Processing, 2015, 29(6): 1-7
Authors:YU Jiangde  HU Shunyi  YU Zhengtao
Affiliation:1. School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan 455000, China;
2. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650051, China
Abstract:In Chinese lexical analysis, a pipeline of word segmentation, part-of-speech (POS) tagging and named entity recognition makes it hard to integrate the different kinds of information and lets errors accumulate from one step to the next. To address this, a unified character-based tagging approach is proposed. Chinese lexical analysis is treated as a character sequence tagging problem in which the tag of each character combines three kinds of information: word position, part of speech and named entity. A maximum entropy model then completes the three subtasks in a single tagging pass. A closed evaluation on the PKU corpus from Bakeoff2007 shows an F-score of 96.4% on word segmentation, an accuracy of 95.3% on POS tagging and an F-score of 90.3% on named entity recognition.
Keywords:Chinese lexical analysis   maximum entropy model   trinity   character-based tagging  
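
The F-scores quoted in the abstract combine precision and recall. As a rough illustration of the usual word-level definition for segmentation, the sketch below counts a predicted word as correct only when both of its boundaries match the gold standard. The function names `spans` and `prf` and the toy sentence are assumptions for illustration, and this is not claimed to be the scoring script used in the paper.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the (start, end) character offsets of its words."""
    result, start = set(), 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result


def prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F-score of a predicted segmentation."""
    gold_spans, pred_spans = spans(gold), spans(predicted)
    correct = len(gold_spans & pred_spans)  # words with both boundaries right
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["汉语", "词法", "分析"]
predicted = ["汉语", "词法分析"]
p, r, f = prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.333 F=0.400
```

Here one of the two predicted words matches a gold word, so precision is 1/2, recall is 1/3, and the F-score is their harmonic mean, 0.4.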