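Experimental study of N-gram based Uyghur part of speech tagging
Cite this article: NIJAT Najmidin, MAHMUD Mamat, TURGUN Ibrahim. Experimental study of N-gram based Uyghur part of speech tagging[J]. Computer Engineering and Applications, 2012, 48(25): 137-140, 173.
Authors: NIJAT Najmidin  MAHMUD Mamat  TURGUN Ibrahim
Affiliations: 1. North China Electric Power University, Beijing 102206; Xinjiang Electric Power Information Communications Co., Ltd., Urumqi 830026
2. Xinjiang Information Industry Co., Ltd., Urumqi 830026
3. School of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funding: National Electronic Information Industry Development Fund (document numbers 财建[2009]537 and 工信部财[2009]453); National Natural Science Foundation of China (No. 60963018, No. 61063026); Ministry of Education project (No. MZ115-75); Xinjiang Uyghur Autonomous Region High-Tech Project (No. 200712109); Xinjiang Uyghur Autonomous Region Higher Education Project (No. XJEDU2008I08); Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology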
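Abstract: Part-of-speech tagging can be approached in many different ways. Current Uyghur part-of-speech tagging methods are predominantly rule-based, and their accuracy is not yet fully satisfactory. Building on a large-scale manually annotated corpus, this paper studies automatic Uyghur part-of-speech tagging based on N-gram language models, analyzes the choice of N-gram model parameters and data smoothing, and compares the efficiency of bigram and trigram models for Uyghur part-of-speech tagging. It also examines how the tag set and the size of the training corpus affect tagging accuracy. The experimental results show that this method works well for Uyghur part-of-speech tagging.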

Keywords: part-of-speech tagging  N-gram model  Uyghur part-of-speech tagging

Experimental study of N-gram based Uyghur part of speech tagging
NIJAT Najmidin, MAHMUD Mamat, TURGUN Ibrahim. Experimental study of N-gram based Uyghur part of speech tagging[J]. Computer Engineering and Applications, 2012, 48(25): 137-140, 173.
Authors: NIJAT Najmidin (1, 2)  MAHMUD Mamat (3)  TURGUN Ibrahim (4)
Affiliation: 1. North China Electric Power University, Beijing 102206, China
2. Xinjiang Electric Power Information Communications Co., Ltd., Urumqi 830026, China
3. Xinjiang Information Industry Co., Ltd., Urumqi 830026, China
4. Information Science and Engineering Technology Institute, Xinjiang University, Urumqi 830046, China
Abstract: There are many approaches to part-of-speech tagging. Current Uyghur part-of-speech tagging relies mainly on rule-based methods and does not achieve state-of-the-art accuracy. A large-scale manually annotated Uyghur corpus and a series of experiments are used to evaluate the efficiency of an N-gram based part-of-speech tagging scheme for Uyghur texts. The N-gram language model parameters and data smoothing are analyzed, and the efficiency of bigram and trigram models is compared. The impact of the tag set and the size of the training data on tagging accuracy is also studied. The experiments show that N-gram based part-of-speech tagging achieves good results on Uyghur texts.
Keywords:part-of-speech tagging  N-gram model  Uyghur part-of-speech tagging
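
The abstract names the ingredients of the approach (N-gram tag models, data smoothing, bigram versus trigram comparison) but does not give the paper's actual parameters, smoothing method, or tag set. As a rough illustration only, the following is a minimal sketch of a bigram tagger, assuming add-one (Laplace) smoothing and Viterbi decoding; the function names, the three-tag tag set, and the toy English corpus are hypothetical stand-ins and are not taken from the paper or its Uyghur corpus.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol


def train_bigram_tagger(tagged_sentences):
    """Collect tag-bigram and word-emission counts from (word, tag) sentences."""
    transitions = Counter()     # (previous_tag, tag) -> count
    contexts = Counter()        # previous_tag -> times it preceded some tag
    emissions = Counter()       # (tag, word) -> count
    tag_counts = Counter()      # tag -> count
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[(previous, tag)] += 1
            contexts[previous] += 1
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocabulary.add(word)
            previous = tag
    return {
        "transitions": transitions,
        "contexts": contexts,
        "emissions": emissions,
        "tag_counts": tag_counts,
        "tags": sorted(tag_counts),
        "vocab_size": len(vocabulary),
    }


def log_transition(model, previous, tag):
    """Add-one smoothed log P(tag | previous)."""
    count = model["transitions"][(previous, tag)] + 1
    total = model["contexts"][previous] + len(model["tags"])
    return math.log(count / total)


def log_emission(model, tag, word):
    """Add-one smoothed log P(word | tag); one extra slot covers unseen words."""
    count = model["emissions"][(tag, word)] + 1
    total = model["tag_counts"][tag] + model["vocab_size"] + 1
    return math.log(count / total)


def viterbi_tag(model, words):
    """Return the most probable tag sequence for words under the bigram model."""
    if not words:
        return []
    tags = model["tags"]
    # Each column maps tag -> (best log-probability, backpointer to previous tag).
    columns = [{
        t: (log_transition(model, START, t) + log_emission(model, t, words[0]), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, back = max(
                (columns[-1][p][0] + log_transition(model, p, t), p) for p in tags
            )
            column[t] = (score + log_emission(model, t, word), back)
        columns.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy corpus (hypothetical, English for readability).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
toy_model = train_bigram_tagger(toy_corpus)
print(viterbi_tag(toy_model, ["the", "cat", "barks"]))
```

A trigram variant would condition each tag on the two preceding tags instead of one, which sharpens the context but makes counts sparser, so the choice of smoothing matters more; that trade-off is the kind of comparison the abstract describes.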
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.