
基于条件随机场的蒙古语词切分研究 (Research on Conditional Random Fields Based Mongolian Word Segmentation)
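Cite this article: 赵伟, 侯宏旭, 从伟, 宋美娜. 基于条件随机场的蒙古语词切分研究[J]. 中文信息学报 (Journal of Chinese Information Processing), 2010, 24(5): 31-36.
Authors: 赵伟 (ZHAO Wei), 侯宏旭 (HOU Hongxu), 从伟 (CONG Wei), 宋美娜 (SONG Meina)
Affiliation: College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021
Funding: Supported by a pre-research project of the 973 Program (National Basic Research Program of China)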
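Abstract: A Mongolian word is made up of a stem and inflectional affixes, and the inflectional affixes carry a large amount of grammatical information, such as number, case, aspect, and tense. Exploiting this information helps computers process Mongolian effectively. Because a Mongolian word is written as a single unit, the stem and each inflectional affix must first be identified before this grammatical information can be used. By analyzing how Mongolian word forms are constructed, this paper proposes an effective labeling method for Mongolian words and builds a practical Mongolian word segmentation system based on a conditional random field (CRF) model. Experiments show that the system's segmentation accuracy is substantially higher than that of existing Mongolian word segmentation systems, reaching 0.992.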

Keywords: Mongolian; word segmentation; stem; inflectional affix; conditional random fields; statistical language model
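The abstract does not spell out the labeling method itself. As a minimal sketch, assuming a character-level BIO-style tag set (the tags `B-STEM`, `I-STEM`, `B-AFFIX`, `I-AFFIX` are hypothetical and not necessarily the authors' scheme), the following Python shows how a gold stem-plus-affixes segmentation can be turned into one label per character for CRF training, and how a predicted tag sequence can be turned back into a stem and its affixes. The helper names `encode` and `decode` and the transliterated example word are illustrative only.

```python
from typing import List, Tuple

def encode(stem: str, affixes: List[str]) -> List[Tuple[str, str]]:
    """Turn a gold segmentation (stem + affixes) into (character, tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    segments = [("STEM", stem)] + [("AFFIX", a) for a in affixes]
    for seg_type, segment in segments:
        for i, ch in enumerate(segment):
            # "B" marks the first character of a segment, "I" every later one.
            pairs.append((ch, ("B-" if i == 0 else "I-") + seg_type))
    return pairs

def decode(word: str, tags: List[str]) -> Tuple[str, List[str]]:
    """Rebuild (stem, affixes) from one predicted tag per character of word."""
    segments: List[List[str]] = []  # each entry: [segment type, text so far]
    for ch, tag in zip(word, tags):
        prefix, seg_type = tag.split("-", 1)
        # Open a new segment on "B", or when an "I" tag does not continue the previous type.
        if prefix == "B" or not segments or segments[-1][0] != seg_type:
            segments.append([seg_type, ch])
        else:
            segments[-1][1] += ch
    stem = "".join(text for kind, text in segments if kind == "STEM")
    affixes = [text for kind, text in segments if kind == "AFFIX"]
    return stem, affixes

if __name__ == "__main__":
    # Hypothetical transliterated word, segmented as stem "nom" + affixes "ud", "un".
    pairs = encode("nom", ["ud", "un"])
    word = "".join(ch for ch, _ in pairs)
    tags = [tag for _, tag in pairs]
    print(word, tags)
    print(decode(word, tags))  # -> ('nom', ['ud', 'un'])
```

Treating each character as one CRF position lets boundary decisions depend on the surrounding characters; in practice one would typically attach character n-gram or position features to each position before training.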

Research on Conditional Random Fields Based Mongolian Word Segmentation
ZHAO Wei, HOU Hongxu, CONG Wei, SONG Meina. Research on Conditional Random Fields Based Mongolian Word Segmentation[J]. Journal of Chinese Information Processing, 2010, 24(5): 31-36.
Authors: ZHAO Wei, HOU Hongxu, CONG Wei, SONG Meina
Affiliation: College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China
Abstract: Stems and inflectional affixes are the components of Mongolian words, and the affixes carry a large amount of grammatical information, such as number, case, aspect, and tense. Using this grammatical information helps computers process Mongolian effectively. Because a Mongolian word is written as a single unit, the stem and each inflectional affix must be identified before this information can be exploited. By analyzing how Mongolian words are morphologically constructed, this paper proposes an effective labeling method for Mongolian words and builds a practical Mongolian word segmentation system based on a conditional random fields (CRF) model. Experiments show that the system's segmentation accuracy is significantly higher than that of existing Mongolian word segmentation systems, reaching 0.992.
Keywords: Mongolian; word segmentation; stem; inflectional affix; conditional random fields; statistical language model
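Once every character is assigned a tag, segmentation reduces to finding the highest-scoring tag sequence under a linear-chain CRF, which the Viterbi algorithm computes exactly. The sketch below assumes that per-position emission scores and tag-transition scores have already been produced by a trained model (the numbers here are made up, not results from the paper). Structurally invalid transitions, such as an affix followed by the stem, are simply left out and treated as forbidden. The function name `viterbi` and the four-tag set are hypothetical.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][y]    -- score of tag y at character position t (must cover every tag)
    transitions[(a,b)] -- score of tag b following tag a; absent pairs are forbidden
    Assumes len(emissions) >= 1.
    """
    # best[t][y]: best total score of any tag path that ends in y at position t
    best: List[Dict[str, float]] = [dict(emissions[0])]
    # back[t][y]: predecessor tag on that best path (back pointer)
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in tags:
            score, prev = max(
                (best[t - 1][p] + transitions.get((p, y), NEG_INF), p) for p in tags
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Start from the best final tag and follow back pointers to position 0.
    path = [max(tags, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical tag set: beginning/inside of the stem and of each affix.
TAGS = ["B-STEM", "I-STEM", "B-AFFIX", "I-AFFIX"]
# Allowed transitions (score 0.0): the stem comes first, then zero or more affixes.
TRANSITIONS = {
    pair: 0.0
    for pair in [
        ("B-STEM", "I-STEM"), ("I-STEM", "I-STEM"),
        ("B-STEM", "B-AFFIX"), ("I-STEM", "B-AFFIX"),
        ("B-AFFIX", "I-AFFIX"), ("I-AFFIX", "I-AFFIX"),
        ("B-AFFIX", "B-AFFIX"), ("I-AFFIX", "B-AFFIX"),
    ]
}
# Made-up emission scores for a five-character transliterated word such as "nomud".
# Position 0 allows only B-STEM, so every path starts with the stem.
EMISSIONS = [
    {"B-STEM": 2.0, "I-STEM": NEG_INF, "B-AFFIX": NEG_INF, "I-AFFIX": NEG_INF},
    {"B-STEM": 0.0, "I-STEM": 1.0, "B-AFFIX": 0.0, "I-AFFIX": 0.0},
    {"B-STEM": 0.0, "I-STEM": 1.0, "B-AFFIX": 0.2, "I-AFFIX": 0.0},
    {"B-STEM": 0.0, "I-STEM": 0.3, "B-AFFIX": 1.5, "I-AFFIX": 0.0},
    {"B-STEM": 0.0, "I-STEM": 0.5, "B-AFFIX": 0.0, "I-AFFIX": 1.0},
]

if __name__ == "__main__":
    print(viterbi(EMISSIONS, TRANSITIONS, TAGS))
    # -> ['B-STEM', 'I-STEM', 'I-STEM', 'B-AFFIX', 'I-AFFIX']
```

The decoded path splits the example word into the stem "nom" and the affix "ud". Encoding the stem-then-affixes constraint as forbidden transitions, rather than relying on learned weights alone, keeps every output well-formed as long as at least one valid path has a finite score.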