
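Kazakh organization name recognition based on an N-gram language model (基于N-gram语言模型的哈萨克文机构名识别)
Cite this article: FENG Jing-hua, Gulila·Altenbek, Mayra·Hapar. Kazakh organization name recognition based on an N-gram language model [J]. Computer Engineering and Applications (计算机工程与应用), 2010, 46(31): 135-138.
Authors: FENG Jing-hua (冯鲸华)  Gulila·Altenbek (古丽拉·阿东别克)  Mayra·Hapar (玛依来·哈帕尔)
Affiliation: School of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funding: National Natural Science Foundation of China; research project of the Ministry of Education and the State Language Commission on standardization and informatization of minority languages and scripts
Abstract: Based on the compositional characteristics of organization names in Kazakh text, a method for computing the confidence of a Kazakh organization name with an N-gram language model is proposed, and a Kazakh organization name recognition system is built that uses the tail word of an organization name as the trigger word. The system consists of a training module and a recognition module. Features are first extracted from the training corpus and trained to obtain a feature model; the trained model, together with a small number of additional rules, is then used to recognize organization names in the test text. Experimental results show that the method is feasible.

Keywords: N-gram language model  Kazakh organization name recognition  entity name recognition
Received: 2010-01-07
Revised: 2010-04-20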

Kazakh organization name recognition based on N-gram model
FENG Jing-hua, Gulila·Altenbek, Mayra·Hapar. Kazakh organization name recognition based on N-gram model [J]. Computer Engineering and Applications, 2010, 46(31): 135-138.
Authors: FENG Jing-hua  Gulila·Altenbek  Mayra·Hapar
Affiliation: Information Science and Engineering College of Xinjiang University, Urumqi 830046, China
Abstract: Based on the compositional characteristics of organization names in Kazakh text, a method based on an N-gram model for computing the confidence of Kazakh organization names is proposed. Using the tail words of Kazakh organization names as trigger words, this paper constructs a recognition system for Kazakh organization names. The system consists of a training module and a recognition module. The process is as follows: first, features are extracted from the training corpus and used to train a feature model; then the trained model and a small set of additional rules are used to recognize Kazakh organization names in the test corpus. The experimental results show that the method is feasible.
Keywords: N-gram model  recognition of Kazakh organization names  named entity recognition
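
The train-then-recognize process described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption, since the abstract gives no formulas: the model is taken to be a bigram (N = 2) with add-one smoothing, the confidence is taken to be the geometric mean of the bigram probabilities, the threshold of 0.2 is arbitrary, and the function names `train`, `confidence`, and `recognize` are hypothetical. English glosses stand in for Kazakh tokens, and the paper's additional rules are not modeled.

```python
import math
from collections import Counter

BOS = "<s>"  # marks the start of an organization name


def train(org_names):
    """Count bigrams over tokenized organization names and collect tail words."""
    history, bigrams, tails, vocab = Counter(), Counter(), set(), set()
    for name in org_names:
        tokens = [BOS] + list(name)
        tails.add(tokens[-1])  # the last word acts as the trigger word
        vocab.update(name)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    # +1 reserves probability mass for words unseen in training
    return {"history": history, "bigrams": bigrams,
            "tails": tails, "vocab_size": len(vocab) + 1}


def confidence(model, tokens):
    """Geometric mean of add-one-smoothed bigram probabilities (assumed score)."""
    seq = [BOS] + list(tokens)
    log_p = 0.0
    for prev, cur in zip(seq, seq[1:]):
        num = model["bigrams"][(prev, cur)] + 1
        den = model["history"][prev] + model["vocab_size"]
        log_p += math.log(num / den)
    return math.exp(log_p / len(tokens))


def recognize(model, sentence, threshold=0.2, max_len=5):
    """Return (start, end, confidence) for accepted spans ending at a trigger word."""
    found = []
    for i, word in enumerate(sentence):
        if word not in model["tails"]:
            continue
        # Try every left boundary within max_len and keep the best-scoring span.
        best = max(
            ((j, i + 1, confidence(model, sentence[j:i + 1]))
             for j in range(max(0, i - max_len + 1), i + 1)),
            key=lambda span: span[2],
        )
        if best[2] >= threshold:
            found.append(best)
    return found


# Toy training data: English glosses in place of Kazakh words (illustrative only).
training_names = [
    ["xinjiang", "university"],
    ["beijing", "university"],
    ["education", "ministry"],
    ["xinjiang", "education", "bureau"],
]
model = train(training_names)
sentence = ["he", "visited", "xinjiang", "university", "yesterday"]
for start, end, conf in recognize(model, sentence):
    print(sentence[start:end], round(conf, 3))
```

Scanning leftward from the tail word reflects the abstract's use of tail words as triggers; normalizing by span length keeps longer candidates from being penalized merely for having more factors in the product.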
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).