
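A Conditional Random Fields Based POS Tagging Model
Citation: Jiang Wei, Guan Yi, Wang Xiaolong. A Conditional Random Fields Based POS Tagging Model [J]. Computer Engineering and Applications, 2006, 42(21): 13-16, 42.
Authors: Jiang Wei, Guan Yi, Wang Xiaolong
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)
Abstract: The main difficulties in part-of-speech (POS) tagging are disambiguating multi-category words (words that can take more than one POS) and tagging unknown words. The traditional hidden Markov model (HMM) approach does not easily incorporate new features, and the maximum entropy Markov model (MEMM) suffers from the label bias problem. This paper builds a POS tagging model on conditional random fields (CRFs), which readily incorporate new features and avoid label bias. In addition, long-distance features are introduced to tag complex multi-category words effectively, and methods such as suffix-based tagging and named entity recognition are applied to improve accuracy on unknown words. Within the CRF framework, the paper further examines methods for combining models and their performance. In open-test POS tagging experiments, the CRF model achieved a tagging accuracy of 96.10%.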
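To make the contrast with HMMs and MEMMs concrete, here is a minimal sketch (not the authors' implementation) of how a linear-chain CRF scores and decodes a tag sequence. It assumes the weighted feature sums have already been collapsed into hypothetical score tables: `emit[t][y]` for tag `y` at position `t`, `trans[p][y]` for the tag bigram `p -> y`, and `start[y]` for the first tag. The essential point is that the probability of a whole tag sequence is normalized once, over all possible sequences, by the partition function log Z(x) computed with the forward algorithm, rather than per position as in an MEMM; this global normalization is what removes label bias. Decoding uses the Viterbi algorithm.

```python
import math
from typing import List, Sequence

# Hypothetical score tables (assumption, not from the paper):
#   emit[t][y]  - summed weighted feature values for tag y at position t
#   trans[p][y] - score of the tag bigram p -> y
#   start[y]    - score of tag y at the first position
Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sequence_score(emit: Matrix, trans: Matrix, start: List[float],
                   tags: Sequence[int]) -> float:
    """Unnormalized score of one complete tag sequence."""
    s = start[tags[0]] + emit[0][tags[0]]
    for t in range(1, len(tags)):
        s += trans[tags[t - 1]][tags[t]] + emit[t][tags[t]]
    return s


def log_partition(emit: Matrix, trans: Matrix, start: List[float]) -> float:
    """log Z(x) by the forward algorithm: one normalizer for ALL sequences."""
    k = len(start)
    alpha = [start[y] + emit[0][y] for y in range(k)]
    for t in range(1, len(emit)):
        alpha = [logsumexp([alpha[p] + trans[p][y] for p in range(k)]) + emit[t][y]
                 for y in range(k)]
    return logsumexp(alpha)


def log_prob(emit: Matrix, trans: Matrix, start: List[float],
             tags: Sequence[int]) -> float:
    """Globally normalized log P(tags | sentence)."""
    return sequence_score(emit, trans, start, tags) - log_partition(emit, trans, start)


def viterbi(emit: Matrix, trans: Matrix, start: List[float]) -> List[int]:
    """Highest-scoring tag sequence (tag indices)."""
    k = len(start)
    delta = [start[y] + emit[0][y] for y in range(k)]
    back: List[List[int]] = []
    for t in range(1, len(emit)):
        ptr, new = [], []
        for y in range(k):
            best = max(range(k), key=lambda p: delta[p] + trans[p][y])
            ptr.append(best)
            new.append(delta[best] + trans[best][y] + emit[t][y])
        back.append(ptr)
        delta = new
    y = max(range(k), key=lambda j: delta[j])
    path = [y]
    for ptr in reversed(back):  # follow back-pointers from the last position
        y = ptr[y]
        path.append(y)
    return path[::-1]
```

Because `log_partition` sums over every possible tag sequence, `exp(log_prob(...))` over all sequences adds up to 1. Training (not shown) would adjust the feature weights behind `emit` and `trans` to maximize the log probability of the gold tag sequences.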

Keywords: POS tagging; conditional random fields; trigger pairs
Article ID: 1002-8331-(2006)21-0013-04
Received: 2006-05-01
Revised: 2006-05-01

Conditional Random Fields Based POS Tagging
Jiang Wei,Guan Yi,Wang Xiaolong.Conditional Random Fields Based POS Tagging[J].Computer Engineering and Applications,2006,42(21):13-16,42.
Authors:Jiang Wei  Guan Yi  Wang Xiaolong
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: The main difficulties in POS tagging are disambiguating multi-category words (words that can take more than one POS) and tagging unknown words. However, the Hidden Markov Model (HMM) cannot easily incorporate additional features, and the Maximum Entropy Markov Model (MEMM) suffers from the label bias problem. This paper therefore introduces Conditional Random Fields (CRFs) to build a POS tagging model that overcomes these problems. In addition, long-distance features are extracted and used to label complicated multi-category words. For unknown words, named entity recognition, suffix-based methods, and similar techniques are adopted to improve tagging performance. Moreover, we explore methods for combining models within the CRF framework and evaluate their performance. In open-test experiments, our model achieves a tagging accuracy of 96.10%.
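The feature types named in the abstract (local context, suffixes for unknown words, long-distance triggers) can be illustrated with a hypothetical feature extractor. This is a sketch under stated assumptions: the template names, the window size of 5, and the 3-character suffix limit are illustrative choices, not taken from the paper. The abstract does not say how trigger pairs are selected, so the code simply emits every non-adjacent word within the window as a candidate trigger; a real system would presumably filter these candidates. In a CRF, each binary feature would additionally be conjoined with the candidate tag.

```python
from typing import Dict, List, Set


def token_features(words: List[str], i: int, lexicon: Set[str],
                   window: int = 5, max_suffix: int = 3) -> Dict[str, float]:
    """Binary features for position i in the style of CRF feature templates.

    All template names and parameters here are hypothetical illustrations.
    """
    w = words[i]
    feats: Dict[str, float] = {"bias": 1.0, f"w0={w}": 1.0}

    # Local context: the immediately adjacent words (with boundary markers).
    feats[f"w-1={words[i - 1] if i > 0 else '<S>'}"] = 1.0
    feats[f"w+1={words[i + 1] if i + 1 < len(words) else '</S>'}"] = 1.0

    # Unknown words: back off to suffix features (the last 1..max_suffix
    # characters, which for Chinese are the final characters of the word).
    if w not in lexicon:
        feats["unknown"] = 1.0
        for k in range(1, min(max_suffix, len(w)) + 1):
            feats[f"suf{k}={w[-k:]}"] = 1.0

    # Long-distance "trigger" candidates: words inside the window but not
    # adjacent to position i, each paired with the current word.
    for j in range(max(0, i - window), min(len(words), i + window + 1)):
        if abs(j - i) > 1:
            feats[f"trig:{words[j]}->{w}"] = 1.0
    return feats
```

For example, `token_features(["他", "在", "北京", "研究", "生物学"], 4, {"他", "在", "北京", "研究"})` marks "生物学" as unknown, adds the suffix features `suf1=学`, `suf2=物学`, and `suf3=生物学`, and produces triggers from "他", "在", and "北京" but not from the adjacent "研究".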
Keywords: POS tagging; Conditional Random Fields; trigger pair
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.