
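Corpus-Based Study of Kazakh Word-Frequency Statistics
Cite this article: WANG Hua, GULILA Altenbek. Study on Frequency Statistic of Kazak Word Based on Corpus[J]. Computer Engineering, 2010, 36(24): 59-61.
Authors: Wang Hua, Gulila Altenbek
Affiliation: College of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funding: National Natural Science Foundation of China; Research Fund for the Standardization and Informatization of Ethnic Minority Languages and Scripts, Ministry of Education and State Language Commission of China
Abstract: Kazakh is one of the minority languages of Xinjiang, and word-frequency statistics, a foundational task in natural language processing, is a problem that urgently needs to be solved for it. This paper introduces the relationship between Zipf's law and Kazakh word-frequency statistics. A continuous input string of Kazakh characters is segmented into words, and the segmented word string is then used to build a Kazakh dictionary that stores each distinct word form together with its frequency of occurrence. Statistical experiments on Kazakh text show that Kazakh word frequencies are intrinsically related to one another, and confirm that the Kazakh word-frequency distribution follows Zipf's power law.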

Keywords: Kazakh word-frequency statistics; power law; Zipf; frequency
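The dictionary-building step described in the abstract (segment the input, then store each distinct word form with its frequency) can be sketched as follows. This is a minimal illustration, not the authors' method: the paper's segmentation rules are not given here, so the sketch assumes words are separated by whitespace and only strips leading and trailing punctuation (any Unicode category starting with "P", which covers the Arabic comma used in Xinjiang's Arabic-based Kazakh script as well as Cyrillic-text punctuation). Case-folding with `lower()` is also an assumption; it only affects Cyrillic text. The function names `segment` and `build_frequency_dictionary` are hypothetical.

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def is_punct(ch: str) -> bool:
    # Unicode punctuation categories all start with "P" (Po, Pi, Pf, Ps, Pe, ...).
    return unicodedata.category(ch).startswith("P")


def segment(text: str) -> list[str]:
    """Hypothetical segmenter: split on whitespace, strip edge punctuation."""
    words = []
    for raw in text.split():
        start, end = 0, len(raw)
        # Drop punctuation at the start and end of each whitespace-delimited chunk.
        while start < end and is_punct(raw[start]):
            start += 1
        while end > start and is_punct(raw[end - 1]):
            end -= 1
        token = raw[start:end]
        if token:
            # Assumption: fold case so sentence-initial capitals (Cyrillic)
            # are not counted as separate word forms; Arabic script is unaffected.
            words.append(token.lower())
    return words


def build_frequency_dictionary(text: str) -> Counter:
    """Map each distinct word form to the number of times it occurs."""
    return Counter(segment(text))
```

Calling `build_frequency_dictionary(corpus_text).most_common()` then yields the word forms sorted by descending frequency, which is the ordering a rank-frequency (Zipf) analysis needs. A real Kazakh pipeline may also need normalization of alternative code points for the same letter, which this sketch does not attempt.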

Study on Frequency Statistic of Kazak Word Based on Corpus
WANG Hua,GULILA·Altenbek.Study on Frequency Statistic of Kazak Word Based on Corpus[J].Computer Engineering,2010,36(24):59-61.
Authors:WANG Hua  GULILA·Altenbek
Affiliation: College of Information Science & Engineering, Xinjiang University, Urumqi 830046, China
Abstract: Kazak is one of the minority languages and scripts in general use in Xinjiang, and word-frequency statistics for Kazak natural language processing has become a problem that urgently needs to be solved. This paper introduces the relationship between Zipf's law and Kazak word segmentation based on word-frequency statistics. The system segments a continuous input string of Kazak characters and outputs the segmented word strings, which are usually two-word Kazak strings, and a dictionary is built from them. The dictionary stores Kazak words together with the frequency with which each word appears in the processed test texts, and a statistical experiment on Kazak articles is carried out on this basis. The experimental results reveal the relationship among the frequencies of Kazak words, and the resulting Kazak word-frequency distribution follows Zipf's power law.
Keywords: frequency statistic of Kazak word; power law; Zipf; frequency
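The claim that the word-frequency distribution follows Zipf's power law means that, if words are ranked by frequency, the frequency f(r) of the word at rank r behaves roughly like f(r) ≈ C · r^(−α), with α close to 1 in the classic form of the law. Taking logarithms gives log f = log C − α · log r, a straight line on a log-log plot. The sketch below checks this with an ordinary least-squares fit of log frequency against log rank. This is an assumption about how such a check can be done, not the procedure used in the paper (which is not described here), and the function name `zipf_fit` is hypothetical. Least squares on log-log data is simple but known to be biased for power-law tails; maximum-likelihood estimators are a more robust alternative.

```python
from __future__ import annotations

import math
from typing import Mapping


def zipf_fit(freqs: Mapping[str, float]) -> tuple[float, float, float]:
    """Fit f(r) ~ C * r**(-alpha) by least squares on log f vs log r.

    Returns (alpha, C, r_squared). Hypothetical helper for illustration.
    """
    # Rank words by descending frequency: rank 1 is the most frequent word.
    counts = sorted(freqs.values(), reverse=True)
    if len(counts) < 2 or counts[-1] <= 0:
        raise ValueError("need at least two words with positive frequency")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept of y = a + b * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination: how close the log-log points are to a line.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    # The Zipf exponent is the negated slope; C is the fitted rank-1 frequency.
    return -slope, math.exp(intercept), r_squared
```

Any word-to-frequency mapping, such as a `collections.Counter` built from a segmented corpus, can be passed in; an exponent near 1 together with an R² close to 1 is consistent with Zipf-like behaviour, though real corpora typically deviate at the highest and lowest ranks.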