
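Research on Uyghur Stemming Based on the Bi-LSTM-CRF Model
Cite this article: GULINIGEER Abudouwaili, TUERGEN Yibulayin, KAHAERJIANG Abiderexiti, WANG Lulu. Research on Uyghur Stemming Based on Bi-LSTM-CRF Model[J]. Journal of Chinese Information Processing, 2019, 33(8): 60-66.
Authors: GULINIGEER Abudouwaili  TUERGEN Yibulayin  KAHAERJIANG Abiderexiti  WANG Lulu
Affiliations: 1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China;
2. Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi, Xinjiang 830046, China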
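Funding: National Natural Science Foundation of China (61762084, 61662077, 61462083); Research Project of the State Language Commission (ZDI 135-54); National Key Research and Development Program of China (2017YFB1002103)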
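Abstract: Stemming is a foundational task in Uyghur natural language processing, and its quality directly affects the performance of other tasks. Current Uyghur stemming research, however, suffers from over-segmentation, non-segmentation, and ambiguous segmentation, which lower stemming quality and noticeably degrade downstream tasks. This paper therefore proposes a Uyghur stemming model based on Bi-LSTM-CRF that takes the character as the minimum segmentation unit and selects Uyghur character features, phonological features, and phonetic features as candidate features for experiments with the model. The experiments show that the proposed Bi-LSTM-CRF model reaches an F1 score of 88% on the Uyghur stemming task. After the manually extracted candidate features are incorporated, the F1 score rises by a further 1.8 points, which improves stemming accuracy and mitigates the problems described above.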

Keywords: Uyghur  stemming  Bi-LSTM-CRF

Research on Uyghur Stemming Based on Bi-LSTM-CRF Model
GULINIGEER Abudouwaili,TUERGEN Yibulayin,KAHAERJIANG Abiderexiti,WANG Lulu.Research on Uyghur Stemming Based on Bi-LSTM-CRF Model[J].Journal of Chinese Information Processing,2019,33(8):60-66.
Authors:GULINIGEER Abudouwaili  TUERGEN Yibulayin  KAHAERJIANG Abiderexiti  WANG Lulu
Affiliation:1.College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China;
2.Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi, Xinjiang 830046, China
Abstract:Stemming is a fundamental task in Uyghur natural language processing (NLP), but it still suffers from over-segmentation, non-segmentation, and ambiguous segmentation. This paper proposes a Bi-LSTM-CRF neural network model, which combines bidirectional long short-term memory (Bi-LSTM) networks with conditional random fields (CRFs). The model treats the Uyghur character as the minimum unit and uses Uyghur character features, phonological features, and phonetic features as candidate features. Experiments show that the Bi-LSTM-CRF model achieves an F-score of 88% on Uyghur stemming, with a further increase of 1.8 percentage points after the manually extracted features are incorporated.
Keywords:Uyghur language  stemming  Bi-LSTM-CRF  
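
The abstract describes stemming as character-level sequence labeling, where a CRF layer picks the best tag sequence on top of Bi-LSTM scores. The following is a minimal sketch of that decoding step only, not the authors' implementation. The two-tag scheme (`S` for stem character, `A` for affix character), the hand-set emission and transition scores, and the Latin-script toy word `kitablar` are all assumptions made for illustration; in a trained model the emissions would come from the Bi-LSTM and the transitions would be learned.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set (not taken from the paper):
# S = character belongs to the stem, A = character belongs to an affix.
TAGS = ["S", "A"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence for per-character scores."""
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda x: best[-1][x])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def extract_stem(word: str, tags: List[str]) -> str:
    """Keep only the characters tagged as stem."""
    return "".join(ch for ch, tag in zip(word, tags) if tag == "S")


# Toy input: hand-set scores standing in for Bi-LSTM outputs (assumed values).
word = "kitablar"
emissions = (
    [{"S": 0.0, "A": -3.0}] * 5      # k, i, t, a, b: clearly stem
    + [{"S": -2.0, "A": -0.2},       # l: clearly affix
       {"S": -0.5, "A": -1.0},       # a: noisy, locally prefers S
       {"S": -2.0, "A": -0.2}]       # r: clearly affix
)
# Assumed constraint: once an affix starts, the stem cannot resume.
transitions = {("S", "S"): 0.0, ("S", "A"): 0.0,
               ("A", "A"): 0.0, ("A", "S"): -1e9}

tags = viterbi(emissions, transitions)
stem = extract_stem(word, tags)
print(tags, stem)
```

Picking the best tag for each character independently would label the second `a` as stem and split the word inconsistently. The transition penalty lets the sequence-level decoder override that local preference, which is the kind of segmentation error a CRF layer is meant to reduce.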
This article is indexed in VIP (维普) and other databases.