
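Directed Graph Model for Mongolian Lexical Analysis
Cite this article: JIANG Wenbin, WU Jinxing, CHANG Qing, Nasanurtu, LIU Qun, ZHAO Lili. Directed Graph Model for Mongolian Lexical Analysis [J]. Journal of Chinese Information Processing (中文信息学报), 2011, 25(5): 94-101.
Authors: JIANG Wenbin (姜文斌)  WU Jinxing (吴金星)  CHANG Qing (长青)  Nasanurtu (那顺乌日图)  LIU Qun (刘群)  ZHAO Lili (赵理莉)
Affiliations: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
2. School of Mongolian Studies, Inner Mongolia University, Hohhot, Inner Mongolia 010021;
3. College of Computer and Information Technology, Henan Normal University, Xinxiang, Henan 453007
Funding: National Natural Science Foundation of China (Contract 60736014); 863 Program key project (2006AA010108); Ministry of Education and State Language Commission project on standardization and informatization of ethnic minority languages and scripts (MZ115-038)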
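Abstract: We build a generative probabilistic model for Mongolian lexical analysis. The model describes the lexical analysis of a Mongolian sentence as a directed graph: nodes represent the stems and affixes in the analysis together with their tags, and edges represent transition or generation relationships between nodes. Specifically, this work models stem-to-stem transition probabilities, affix-to-affix transition probabilities, and stem-to-affix generation probabilities; the three corresponding transition or generation probabilities between tags; and the mutual generation probabilities between stems or affixes and their tags. Trained on a manually built, 3rd-level annotated corpus of about 200,000 words developed by Inner Mongolia University, the model achieves a word-level segmentation accuracy of 95.1% and a word-level joint segmentation and tagging accuracy of 93%.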

Keywords: Mongolian  lexical analysis  word segmentation  POS tagging  stemming  directed graph

Directed Graph Model for Mongolian Lexical Analysis
JIANG Wenbin,WU Jinxing,CHANG Qing,Nasanurtu,LIU Qun,ZHAO Lili.Directed Graph Model for Mongolian Lexical Analysis[J].Journal of Chinese Information Processing,2011,25(5):94-101.
Authors:JIANG Wenbin  WU Jinxing  CHANG Qing  Nasanurtu  LIU Qun  ZHAO Lili
Affiliation:1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. Inner Mongolian University, Huhhot, Inner Mongolia 010021, China;
3. Henan Normal University, Xinxiang, Henan 453007, China
Abstract:We propose a generative statistical model for Mongolian lexical analysis. The model describes the lexical analysis result as a directed graph, where the nodes represent the stems, affixes, and their tags, and the edges represent the transition or generation relationships between nodes. Specifically, we adopt three kinds of transition or generation probabilities: a) the probabilities of stem-stem transition, affix-affix transition, and stem-affix generation; b) the transition or generation probabilities between the corresponding tags; and c) the generation probabilities between stems or affixes and their tags. Using the 3rd-level annotated corpus of about 200,000 words as training data, the model achieves a word-level segmentation accuracy of 95.1% and a word-level joint segmentation and tagging accuracy of 93%.
Keywords:Mongolian  lexical analysis  segmentation  POS tagging  stemming  directed graph  
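
The factor structure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the factors are combined, so the code below simply sums log-probabilities over the edges of one candidate analysis. The table names (`stem_stem`, `tag_morph`, and so on), the floor value for unseen events, the start symbol `<s>`, the assumption that affix transitions stay within a word, and the placeholder morphemes and tags (`stemA`, `sfx1`, `N`, `CASE`) are all hypothetical, and every number is a made-up toy value.

```python
import math

FLOOR = 1e-6  # assumed floor probability for unseen events (hypothetical)

def logp(table, context, outcome):
    """Log-probability of `outcome` given `context`, floored for unseen pairs."""
    return math.log(table.get((context, outcome), FLOOR))

def score_analysis(words, model):
    """Sum log-scores over the edges of one candidate analysis.

    `words` is a list of (stem, stem_tag, affixes); `affixes` is a list of
    (affix, affix_tag) pairs in surface order.
    """
    total = 0.0
    prev_stem, prev_stag = "<s>", "<s>"
    for stem, stag, affixes in words:
        # stem-to-stem transition and its tag-level counterpart
        total += logp(model["stem_stem"], prev_stem, stem)
        total += logp(model["stag_stag"], prev_stag, stag)
        # generation between the stem and its tag
        total += logp(model["tag_morph"], stag, stem)
        prev_morph, prev_tag = stem, stag
        for i, (affix, atag) in enumerate(affixes):
            # the first affix is generated from the stem; later affixes
            # follow the previous affix (assumed to stay within the word)
            kind = "stem_affix" if i == 0 else "affix_affix"
            tag_kind = "stag_atag" if i == 0 else "atag_atag"
            total += logp(model[kind], prev_morph, affix)
            total += logp(model[tag_kind], prev_tag, atag)
            total += logp(model["tag_morph"], atag, affix)
            prev_morph, prev_tag = affix, atag
        prev_stem, prev_stag = stem, stag
    return total

def best_analysis(candidates, model):
    """Pick the highest-scoring candidate analysis."""
    return max(candidates, key=lambda words: score_analysis(words, model))

# Toy factor tables: (context, outcome) -> probability. All values are invented.
TOY_MODEL = {
    "stem_stem": {("<s>", "stemA"): 0.5},
    "stag_stag": {("<s>", "N"): 0.4},
    "stem_affix": {("stemA", "sfx1"): 0.2},
    "stag_atag": {("N", "CASE"): 0.5},
    "affix_affix": {},
    "atag_atag": {},
    "tag_morph": {("N", "stemA"): 0.1, ("CASE", "sfx1"): 0.25},
}

# Two competing analyses of the same one-word input.
segmented = [("stemA", "N", [("sfx1", "CASE")])]
unsegmented = [("stemAsfx1", "N", [])]
chosen = best_analysis([unsegmented, segmented], TOY_MODEL)
```

With these toy tables the segmented candidate wins, because the unsegmented one relies only on unseen (floored) stem events. A real system would also need a way to enumerate candidate analyses and to estimate the tables from the annotated corpus, neither of which is specified in the abstract.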