
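A Uyghur Person Name Recognition Method Combining Statistics and Rules
Cite this article: 塔什甫拉提·尼扎木丁, 汪昆, 艾斯卡尔·艾木都拉, 帕力旦·吐尔逊. 统计与规则相结合的维吾尔语人名识别方法[J]. 自动化学报, 2017, 43(4): 653-664. DOI: 10.16383/j.aas.2017.c150769
Authors: 塔什甫拉提·尼扎木丁  汪昆  艾斯卡尔·艾木都拉  帕力旦·吐尔逊
Affiliation: 1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funding: Supported by the National Natural Science Foundation of China (61562081) and the Xinjiang High-Tech Research and Development Program (201312103)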
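Abstract: Named entity recognition (NER) is an important task in natural language processing (NLP), and person names are among the main entity types to be recognized. Starting from the agglutinative nature of Uyghur, this paper splits Uyghur words from three perspectives (stem, syllable, and character string) to obtain smaller linguistic units, and adds these units as features to a conditional random field (CRF). This markedly alleviates data sparsity and yields better performance than person name recognition that uses the whole word as the basic unit. In addition, based on the characteristics of Han Chinese person names as they appear in Uyghur text, a rule-based method for recognizing such names is proposed. Combining the statistical and rule-based methods further improves precision. Experimental results show that the method achieves a precision, recall, and F1 score of 87.47%, 89.12%, and 88.29%, respectively.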

Keywords: Uyghur language   person name recognition   conditional random field   syllable bank
Received: 2015-11-15
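
The abstract describes feeding sub-word units into a CRF instead of relying on whole words only. The sketch below illustrates the general idea with the simplest of the three views, character strings: it emits fixed-length prefixes and suffixes of each token as features. It is only a rough approximation under stated assumptions, not the authors' feature set. The function names, the feature keys, the affix length limit, and the Latin-script example tokens are all hypothetical, and the stem and syllable segmentation used in the paper is not reproduced here.

```python
def subword_features(word, max_affix=3):
    """Build a feature dict for one token from its character prefixes/suffixes.

    Assumption: plain character affixes stand in for the stem/syllable/
    character-string units described in the abstract.
    """
    w = word.lower()
    feats = {"word": w}
    for n in range(1, max_affix + 1):
        if len(w) > n:  # skip affixes that would cover the whole word
            feats[f"prefix{n}"] = w[:n]
            feats[f"suffix{n}"] = w[-n:]
    return feats


def sentence_features(tokens):
    """Feature dicts for a token sequence, with one word of context each side."""
    rows = []
    for i, tok in enumerate(tokens):
        f = subword_features(tok)
        f["prev_word"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
        f["next_word"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
        rows.append(f)
    return rows


# Illustrative Latin-script tokens (hypothetical input, not taken from the paper).
rows = sentence_features(["Adil", "mektepke", "bardi"])
print(rows[1])
```

Because rare inflected word forms share affixes with frequent ones, features of this kind let a sequence model generalize to words it has not seen as whole tokens, which is the data-sparsity effect the abstract refers to.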

Combination of Statistical and Rule-based Approaches for Uyghur Person Name Recognition
TASHPOLAT Nizamidin, WANG Kun, ASKAR Hamdulla, PALIDAN Tuerxun. Combination of Statistical and Rule-based Approaches for Uyghur Person Name Recognition[J]. Acta Automatica Sinica, 2017, 43(4): 653-664. DOI: 10.16383/j.aas.2017.c150769
Authors:TASHPOLAT Nizamidin  WANG Kun  ASKAR Hamdulla  PALIDAN Tuerxun
Affiliation: 1. Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046; 2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 3. School of Software, Xinjiang University, Urumqi 830046
Abstract: Named entity recognition (NER) is an important subtask of natural language processing, and person names are one of its major targets. Exploiting the agglutinative nature of the Uyghur language, we split each Uyghur word into units at different levels, such as syllables, suffixes, and stems, which significantly alleviates the data sparsity problem. Since Han Chinese person names account for most of the remaining errors of the conditional random field (CRF)-based approach, we also propose a rule-based post-processing approach for recognizing Han person names in Uyghur text. Experimental results show that this cascaded approach achieves a precision, recall, and F1 score of 87.47%, 89.12%, and 88.29%, respectively.
Keywords: Uyghur language processing  person name recognition  conditional random field (CRF)  syllable bank
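
As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the two values given in the abstract; the function name `f1_score` is just an illustrative helper, not something taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
f1 = f1_score(87.47, 89.12)
print(round(f1, 2))  # prints 88.29, matching the reported F1
```

The recomputed value agrees with the reported 88.29%, which also confirms that the first figure is a precision rather than a token-level accuracy.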