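Research on Methods for Extracting Multi-Word Expressions in Uyghur
Cite this article: Mairehaba Aili, Aziguli Xialifu, Tuergen Yibulayin. Research on Methods for Extracting Multi-Word Expressions in Uyghur[J]. Computer Engineering and Applications, 2014, 50(8): 26-30
Authors: Mairehaba Aili  Aziguli Xialifu  Tuergen Yibulayin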
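Affiliations: 1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046; 2. Xinjiang Key Laboratory of Multi-Language Information Technology, Urumqi 830046; 3. School of Humanities, Xinjiang University, Urumqi 830046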
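Funding: National Natural Science Foundation of China (No. 61262061); Open Project of the Xinjiang Key Laboratory of Multi-Language Information Technology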
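Abstract: Multi-word expressions are a special linguistic phenomenon: they generally consist of several words that together express a single meaning and that frequently co-occur in corpora. Because they form special units, their extraction plays a very important role in many areas of natural language processing. This paper discusses how three common statistical methods, namely mutual information, the log-likelihood ratio, and the chi-square test, perform in extracting multi-word expressions from Uyghur. In light of the characteristics of Uyghur, word stems are added to the extraction methods as a feature. Corpus selection takes coverage and domain into account, and the effect of these factors on the extraction methods is examined.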

Keywords: multi-word expression  mutual information  log-likelihood ratio  chi-square  Uyghur

Research on extracting methods of multi word expression in Uyghur texts
Mairehaba Aili,Aziguli Xialifu,Tuergen Yibulayin. Research on extracting methods of multi word expression in Uyghur texts[J]. Computer Engineering and Applications, 2014, 50(8): 26-30
Authors:Mairehaba Aili  Aziguli Xialifu  Tuergen Yibulayin
Affiliation: 1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; 2. Xinjiang Laboratory of Multi-Language Information Technology, Urumqi 830046, China; 3. School of Humanity, Xinjiang University, Urumqi 830046, China
Abstract: A multi-word expression is a special linguistic phenomenon: a combination of words that functions as a single unit of meaning and whose parts co-occur more often than chance would predict. Multi-word expressions play an important role in many natural language processing applications. This study explores how three widely used statistical methods, namely mutual information, log-likelihood, and chi-square, perform in extracting multi-word expressions from Uyghur texts. In view of the characteristics of Uyghur, the stemmed forms of words are added as a feature of the extraction methods. In choosing the corpus, the study considers coverage and domain, and explores their effect on the extraction methods.
Keywords:collocation  mutual information  log-likelihood  chi-square  Uyghur
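
The three association measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, candidate patterns, or stemmer used, so the sketch assumes adjacent word pairs (bigrams) as candidates, the standard 2x2 contingency-table definitions of pointwise mutual information, the log-likelihood ratio (G²), and Pearson's chi-square, and a hypothetical `stem` callback standing in for a Uyghur stemmer. The function name `bigram_scores` is likewise illustrative.

```python
import math
from collections import Counter


def bigram_scores(tokens, stem=lambda w: w):
    """Score adjacent word pairs with PMI, log-likelihood ratio (G^2), and chi-square.

    `stem` is a hypothetical hook: pass a real stemmer to count over stems
    instead of surface forms. The default leaves tokens unchanged.
    """
    units = [stem(t) for t in tokens]
    pairs = list(zip(units, units[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)    # how often a word starts a pair
    right_counts = Counter(b for _, b in pairs)   # how often a word ends a pair

    scores = {}
    for (a, b), o11 in pair_counts.items():
        # Observed 2x2 contingency table for the pair (a, b).
        o12 = left_counts[a] - o11    # a followed by something other than b
        o21 = right_counts[b] - o11   # b preceded by something other than a
        o22 = n - o11 - o12 - o21     # pairs involving neither a first nor b second
        row1, row2 = o11 + o12, o21 + o22
        col1, col2 = o11 + o21, o12 + o22
        observed = (o11, o12, o21, o22)
        # Expected counts under independence of the two positions.
        expected = (row1 * col1 / n, row1 * col2 / n,
                    row2 * col1 / n, row2 * col2 / n)

        pmi = math.log2(o11 / expected[0])
        llr = 2 * sum(o * math.log(o / e)
                      for o, e in zip(observed, expected) if o > 0)
        chi2 = sum((o - e) ** 2 / e
                   for o, e in zip(observed, expected) if e > 0)
        scores[(a, b)] = {"pmi": pmi, "llr": llr, "chi2": chi2}
    return scores


if __name__ == "__main__":
    # Placeholder tokens, not Uyghur data.
    toy = "w1 w2 w1 w2 w3 w4".split()
    for pair, s in sorted(bigram_scores(toy).items(),
                          key=lambda kv: kv[1]["llr"], reverse=True):
        print(pair, {k: round(v, 3) for k, v in s.items()})
```

Ranking candidates by each score and comparing the top of the lists is one simple way to contrast the measures; passing a stemmer as `stem` merges inflected variants of the same pair before counting, which is the kind of stem feature the abstract describes.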
This article is indexed in CNKI, VIP (维普), and other databases.