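A Study of OOV in Uyghur Speech Recognition Corpora
Cite this article: ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei. A study of OOV in Uyghur speech recognition corpora [J]. Computer Engineering and Design, 2012, 33(2): 772-776.
Authors: ZHANG Xiao-yan  SU Jian-jun  XUE Hua-jian  WANG Lei
Affiliations: 1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011; Graduate University of the Chinese Academy of Sciences, Beijing 100049
2. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011
Funding: High-Tech Fund Project of the Chinese Academy of Sciences Western Action Plan
Abstract: The rich morphological variation of Uyghur produces a very large number of word forms, which causes an out-of-vocabulary (OOV) problem. To study quantitatively how OOV words affect Uyghur speech recognition, experiments are run on test sets with different OOV rates, using an algorithm that controls the OOV rate of the corpus test set together with an optimal text selection algorithm. Both algorithms are implemented in Python. The algorithms are applied to the text transcription of a telephone speech database, and a Uyghur telephone speech database is built. The experimental results show that controlling the OOV rate of the test set in this way effectively improves the Uyghur speech recognition rate.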

Keywords: Uyghur  out-of-vocabulary words (OOV)  corpus  text selection  speech recognition
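
The OOV rate that the abstract refers to can be made concrete with a small calculation. The following is a minimal sketch, not the authors' implementation: it assumes whitespace-tokenized text and a vocabulary taken directly from the training sentences, and the function names `build_vocabulary` and `oov_rate` and the toy tokens are hypothetical, chosen only for illustration.

```python
from collections import Counter


def build_vocabulary(training_sentences, min_count=1):
    """Collect every whitespace-separated token seen at least min_count times."""
    counts = Counter(word for s in training_sentences for word in s.split())
    return {word for word, count in counts.items() if count >= min_count}


def oov_rate(test_sentences, vocabulary):
    """Fraction of test tokens that do not appear in the vocabulary."""
    tokens = [word for s in test_sentences for word in s.split()]
    if not tokens:
        return 0.0  # an empty test set has no OOV tokens by convention here
    oov_tokens = sum(1 for word in tokens if word not in vocabulary)
    return oov_tokens / len(tokens)


# Toy data (invented): suffixed forms stand in for the many surface forms
# that a morphologically rich language derives from one stem.
train = ["I book read-PAST", "he book take-PAST"]
test = ["I book take-PAST-1SG", "he book-ACC read-PAST-3SG"]

vocab = build_vocabulary(train)
rate = oov_rate(test, vocab)
print(f"OOV rate: {rate:.2f}")
```

In this toy example every unseen token is a new suffixed form of a stem that does occur in training, which is the effect the abstract attributes to rich morphology.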

Research on OOV problem in constructing Uyghur speech corpus
ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei. Research on OOV problem in constructing Uyghur speech corpus [J]. Computer Engineering and Design, 2012, 33(2): 772-776.
Authors:ZHANG Xiao-yan  SU Jian-jun  XUE Hua-jian  WANG Lei
Affiliation: 1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Graduate University, Chinese Academy of Sciences, Beijing 100049, China
Abstract: The rich morphology of Uyghur produces a very large number of word forms, which leads to a serious out-of-vocabulary (OOV) problem. To quantify the effect of OOV words on speech recognition, an algorithm that controls the OOV rate of test sets in a Uyghur speech corpus and an algorithm that selects the optimal text are proposed, both implemented in Python. Using these algorithms, a Uyghur telephone speech database is constructed. The experimental results demonstrate that controlling the OOV rate of the test sets can increase the Uyghur speech recognition rate.
Keywords:Uyghur  OOV  corpus  text selection  speech recognition
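
The abstract does not describe how the test-set OOV rate is controlled, so the following is only one possible sketch under stated assumptions: a greedy selection that repeatedly adds the candidate sentence bringing the running OOV rate closest to a target. The function name `select_test_set`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
def count_oov(sentence, vocabulary):
    """Return (number of OOV tokens, total tokens) for one sentence."""
    words = sentence.split()
    return sum(1 for w in words if w not in vocabulary), len(words)


def select_test_set(candidates, vocabulary, target_rate, num_sentences):
    """Greedily pick sentences so the overall OOV rate stays near target_rate.

    Assumption: this greedy rule is an illustrative stand-in, not the
    selection algorithm used by the authors.
    """
    remaining = [s for s in candidates if s.split()]  # skip empty sentences
    selected = []
    oov_total = 0
    token_total = 0

    while remaining and len(selected) < num_sentences:

        def rate_gap(sentence):
            # Distance from the target if this sentence were added next.
            oov, n = count_oov(sentence, vocabulary)
            return abs((oov_total + oov) / (token_total + n) - target_rate)

        best = min(remaining, key=rate_gap)  # first candidate wins ties
        oov, n = count_oov(best, vocabulary)
        oov_total += oov
        token_total += n
        selected.append(best)
        remaining.remove(best)

    achieved = oov_total / token_total if token_total else 0.0
    return selected, achieved


# Toy data (invented): single letters stand in for words.
vocabulary = {"a", "b", "c"}
candidates = ["a b", "a x", "x y", "b c", "c z"]

selected, achieved = select_test_set(candidates, vocabulary,
                                     target_rate=0.25, num_sentences=2)
print(selected, achieved)
```

A greedy rule like this is simple and deterministic, but it does not guarantee the closest achievable rate for a fixed number of sentences; an exhaustive or dynamic-programming search would be needed for that.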
This article is indexed in CNKI, Wanfang Data, and other databases.