
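Recognition method of Tibetan abbreviated case-auxiliary words
Cite this article: La mazhaxi (拉玛扎西), Cai zhijie (才智杰), Zha xiji (扎西吉). Recognition method of Tibetan abbreviated case-auxiliary words [J]. Application Research of Computers (计算机应用研究), 2019, 36(4).
Authors: La mazhaxi (拉玛扎西), Cai zhijie (才智杰), Zha xiji (扎西吉)
Affiliation: School of Computer Science, Qinghai Normal University, Xining 810008 (all three authors)
Funding: National Natural Science Foundation of China (61866032, 61163018, 61262051); National Social Science Fund of China (13BYY141, 16BYY167, 15BYY167); Ministry of Education "Chunhui Plan" Cooperative Research Project (Z2012093, Z2016077); Qinghai Province Basic Research Program (2017-ZJ-767, 2019-SF-129, 2015-SF-520); Program for Changjiang Scholars and Innovative Research Team in University, innovative team grant (IRT1068); Qinghai Provincial Key Laboratory Project (2013-Z-Y17, 2014-Z-Y32, 2015-Z-Y03); Key Laboratory of Tibetan Information Processing and Machine Translation (2013-Y-17).
Abstract: Word segmentation is a foundational task in natural language processing and has a considerable influence on the processing steps that follow it. Recognizing abbreviated case-auxiliary words is one of the most difficult and most important problems in Tibetan word segmentation. After examining existing methods for recognizing Tibetan abbreviated words and analyzing the characteristics of Tibetan syllables and words, the paper proposes a rule-based algorithm for recognizing Tibetan abbreviated case-auxiliary words, an add-restore algorithm, and a feature template for a maximum entropy model. Combining these yields a recognition method based on rules, add-restore, and the maximum entropy model. Experimental data show that the method recognizes Tibetan abbreviated case-auxiliary words with an accuracy, recall, and F1 value of 99.26%, 96.47%, and 97.85%, respectively, a marked improvement over the highest accuracy previously reported.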

Keywords: Tibetan; natural language processing; word segmentation; abbreviated case-auxiliary words
Received: 2017/11/22
Revised: 2019/2/28

Recognition method of Tibetan abbreviated case-auxiliary words
La mazhaxi, Cai zhijie and Zha xiji. Recognition method of Tibetan abbreviated case-auxiliary words [J]. Application Research of Computers, 2019, 36(4).
Authors: La mazhaxi, Cai zhijie and Zha xiji
Affiliation: Qinghai Normal University
Abstract: Word segmentation is a foundational task in natural language processing and strongly influences the processing steps that follow it. Recognizing abbreviated case-auxiliary words is one of the most difficult and important problems in Tibetan word segmentation. After examining existing recognition methods for abbreviated case-auxiliary words and analyzing the characteristics of Tibetan words, this paper proposes a rule-based recognition algorithm for Tibetan abbreviated case-auxiliary words, an add-restore algorithm, and a feature template for a maximum entropy model. Combining the three yields a recognition method based on rules, add-restore, and the maximum entropy model. Experimental data show that the accuracy, recall, and F value of the method are 99.26%, 96.47%, and 97.85%, respectively, a clear improvement over existing methods.
Keywords: Tibetan; NLP; segmentation; abbreviated case-auxiliary words
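
The three figures reported in the abstract can be checked against one another. The sketch below assumes that the reported "accuracy" denotes precision, as is common in segmentation evaluations, and that the F value is the standard F1 (the harmonic mean of precision and recall). Neither assumption is stated in the abstract itself, and the function name `f1_score` is chosen here only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract, converted from percentages.
# Assumption: the reported "accuracy" (99.26%) is precision.
precision = 0.9926
recall = 0.9647

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")
```

Under these assumptions the computed value agrees with the reported 97.85% to two decimal places, which suggests the three numbers are internally consistent.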
This article is indexed in Wanfang Data (万方数据) and other databases.