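Research on HMM-based part-of-speech tagging for the Kirghiz language
Cite this article:CHEN Li, Gulila·ALTENBEK. Research on HMM-based part-of-speech tagging for the Kirghiz language[J]. Computer Engineering and Applications, 2014, 50(15): 120-124
Authors:CHEN Li  Gulila·ALTENBEK
Affiliation:College of Information Science and Engineering, Xinjiang University, Urumqi 830046
Abstract:Research on language information processing for Kirghiz plays a vital role in whether the Kirghiz people of Xinjiang can enter the information age and pass on their national culture. Using a two-level tagging method built on traditional HMM theory, this work improves the computation of HMM model parameters, data smoothing, and the handling of unknown (out-of-vocabulary) words, so that contextual dependencies are captured better. In addition, a stemming algorithm based on an automatic word segmentation dictionary is combined with a hybrid rule-based and statistical method and applied to a Kirghiz part-of-speech tagging system. Compared with the traditional HMM, the improved method effectively raises accuracy.

Keywords:Kirghiz  automatic word segmentation dictionary  Hidden Markov Model (HMM)  part-of-speech tagging  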

Research on Kirgiz language part of speech tagging based on HMM
CHEN Li,Gulila·ALTENBEK. Research on Kirgiz language part of speech tagging based on HMM[J]. Computer Engineering and Applications, 2014, 50(15): 120-124
Authors:CHEN Li  Gulila·ALTENBEK
Affiliation:College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract:Research on Kirghiz language information processing plays an important role in whether the Kirghiz people of Xinjiang can enter the information age and pass on their national culture. Based on traditional HMM theory, this paper adopts a two-level tagging method and improves the calculation of HMM parameters, data smoothing, and the handling of unknown words, so that context dependence is reflected better. In addition, a stem extraction algorithm based on an automatic word segmentation dictionary is combined with rule-based and statistical methods and applied to a Kirghiz part-of-speech tagging system. Compared with the traditional HMM, the improved method effectively improves accuracy.
Keywords:Kirghiz  automatic word segmentation dictionary  Hidden Markov Model (HMM)  part-of-speech tagging  
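
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a plain first-order HMM tagger with Viterbi decoding. It is not the authors' method: the abstract does not describe their improved parameter estimation, smoothing, or unknown-word handling, so add-one (Laplace) smoothing is swapped in as a stand-in, and unseen words simply receive the smoothed floor probability. The function names (`train_hmm`, `log_prob`, `viterbi`) and the tiny English corpus are hypothetical and chosen only for illustration; no Kirghiz data is used.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"              # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption, not the paper's scheme)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    n_words = len(vocab) + 1      # one extra slot reserved for unseen words
    n_tags = len(tags)
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy corpus, illustrative only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
print(viterbi(["the", "bird", "sleeps"], *model))  # "bird" is an unseen word
```

In this sketch an unseen word carries almost no emission evidence, so its tag is decided mainly by the transition probabilities of its neighbours. That weakness is presumably what motivates dedicated unknown-word handling and stem extraction in a morphologically rich language such as Kirghiz.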
This article is indexed in CNKI and other databases.