
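Mongolian part-of-speech tagging approach based on conditional random fields
Cite this article: YING Yu-long, LI Miao, Wudabala, ZHU Hai. Mongolian part-of-speech tagging approach based on conditional random fields[J]. Journal of Computer Applications, 2010, 30(8): 2038-2041
Authors: YING Yu-long (应玉龙)  LI Miao (李淼)  Wudabala (乌达巴拉)  ZHU Hai (朱海)
Affiliations: 1. Hefei Institute of Intelligent Machines, Chinese Academy of Sciences 2.
Funding: Knowledge Innovation Program of the Chinese Academy of Sciences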
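Abstract: To preserve the large amount of grammatical and semantic information carried by Mongolian affixes and to reduce the size of the Mongolian dictionary, Mongolian part-of-speech tagging needs to tag both stems and affixes. To address this, a Mongolian part-of-speech tagging method based on conditional random fields (CRF) is proposed. Exploiting the CRF model's ability to incorporate arbitrary features, the method makes full use of Mongolian contextual information and adds new statistical features for the mutual influence between morphemes. In a closed test on a 38,000-sentence Mongolian part-of-speech tagging corpus, the method achieves a tagging accuracy of 96.65%, outperforming a tagger based on the hidden Markov model (HMM).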

Keywords: stem   affix   conditional random field   part-of-speech tagging   morpheme
Received: 2010-02-03
Revised: 2010-04-18

Mongolian part-of-speech tagging approach based on conditional random fields
YING Yu-long, LI Miao, Wudabala, ZHU Hai. Mongolian part-of-speech tagging approach based on conditional random fields[J]. Journal of Computer Applications, 2010, 30(8): 2038-2041
Authors:YING Yu-long  LI Miao  Wudabala  ZHU Hai
Abstract: To preserve the rich syntactic and semantic information carried by affixes and to reduce the size of the Mongolian dictionary, Mongolian part-of-speech (POS) tagging must tag both stems and affixes. This paper presents a Mongolian POS tagging approach based on conditional random fields (CRF). Taking advantage of the CRF's ability to accept arbitrary features as input, the system exploits not only the context of words but also new statistical features that capture the mutual influence between morphemes. The system was tested on a 38,000-sentence POS-tagged dataset provided by Inner Mongolia University. In a closed test, POS tagging accuracy reaches 96.65%, outperforming an HMM-based model.
Keywords:Stem   Affix   Conditional Random Field (CRF)   Part-of-speech tagging   Morpheme
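
The abstract describes tagging stems and affixes as separate units and adding features for the influence between morphemes, but it does not list the actual feature templates. The sketch below is therefore only an illustration of what morpheme-level feature extraction for a linear-chain CRF might look like. The feature names, the segmentation format, and the transliterated example morphemes are all assumptions made here, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each word is (stem, [affix, ...]).
Word = Tuple[str, List[str]]


def flatten(sentence: List[Word]) -> List[Dict[str, object]]:
    """Turn segmented words into one morpheme sequence, keeping word membership."""
    morphemes: List[Dict[str, object]] = []
    for word_idx, (stem, affixes) in enumerate(sentence):
        morphemes.append({"form": stem, "kind": "stem", "word": word_idx, "stem": stem})
        for affix in affixes:
            morphemes.append({"form": affix, "kind": "affix", "word": word_idx, "stem": stem})
    return morphemes


def morpheme_features(seq: List[Dict[str, object]], i: int) -> Dict[str, str]:
    """Build an illustrative feature dictionary for the morpheme at position i."""
    cur = seq[i]
    feats = {"bias": "1", "form": str(cur["form"]), "kind": str(cur["kind"])}

    # Context features: neighbouring morphemes in the sequence.
    if i > 0:
        prev = seq[i - 1]
        feats["prev_form"] = str(prev["form"])
        feats["prev_kind"] = str(prev["kind"])
        feats["prev_same_word"] = str(prev["word"] == cur["word"])
    else:
        feats["BOS"] = "1"
    if i < len(seq) - 1:
        nxt = seq[i + 1]
        feats["next_form"] = str(nxt["form"])
        feats["next_kind"] = str(nxt["kind"])
    else:
        feats["EOS"] = "1"

    # Assumed morpheme-interaction features: tie an affix to its host stem.
    if cur["kind"] == "affix":
        feats["host_stem"] = str(cur["stem"])
        feats["stem+affix"] = f'{cur["stem"]}+{cur["form"]}'
    return feats


def sentence_features(sentence: List[Word]) -> List[Dict[str, str]]:
    """One feature dictionary per morpheme, in sequence order."""
    seq = flatten(sentence)
    return [morpheme_features(seq, i) for i in range(len(seq))]


if __name__ == "__main__":
    # Illustrative transliterated morphemes only; not drawn from the paper's corpus.
    example = [("ger", ["t"]), ("ir", ["sen"])]
    for feats in sentence_features(example):
        print(feats)
```

Each morpheme receives its own feature dictionary, so a sequence model trained on such input would assign one tag per stem and one per affix. The stem-affix conjunction is one simple way to let an affix's tag depend on its host stem; the paper's statistical features may differ.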
This article is indexed in Wanfang Data and other databases.