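Research on OOV in Uyghur Speech Recognition Corpora

Cite this article:	ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei. Research on OOV in Uyghur speech recognition corpora [J]. Computer Engineering and Design, 2012, 33(2): 772-776.

Authors:	ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei

Affiliations:	1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, Xinjiang; Graduate University of the Chinese Academy of Sciences, Beijing 100049 2. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, Xinjiang

Funding:	High-technology fund project of the Western Action Plan of the Chinese Academy of Sciences

Abstract:	The rich morphology of Uyghur generates a very large number of word forms, which gives rise to an out-of-vocabulary (OOV) problem. To quantify the effect of OOV on Uyghur speech recognition, experiments are run on test sets with different OOV rates, using an algorithm that controls the OOV rate of the corpus test set together with an optimal text selection algorithm; the algorithms are implemented in Python. The algorithms are applied to the text transcription of a telephone speech database, and a Uyghur telephone speech database is constructed. The experimental results show that this method of controlling the test-set OOV rate can effectively improve the Uyghur speech recognition rate.
Keywords:	Uyghur; out-of-vocabulary (OOV) words; corpus; text selection; speech recognition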
Research on OOV problem in constructing Uyghur speech corpus

ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei. Research on OOV problem in constructing Uyghur speech corpus [J]. Computer Engineering and Design, 2012, 33(2): 772-776.

Authors:	ZHANG Xiao-yan, SU Jian-jun, XUE Hua-jian, WANG Lei

Affiliation:	1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 2. Graduate University, Chinese Academy of Sciences, Beijing 100049, China

Abstract:	The rich morphology of Uyghur produces a very large number of word forms, which causes a serious out-of-vocabulary (OOV) problem. To quantify the effect of OOV words on speech recognition, two algorithms implemented in Python are proposed: one that controls the OOV rate of the test sets in a Uyghur speech corpus, and one that selects the optimal text. Using these algorithms, a Uyghur telephone speech database is constructed. The experimental results demonstrate that controlling the OOV rate of the test sets can increase the Uyghur speech recognition rate.

Keywords:	Uyghur; OOV; corpus; text selection; speech recognition
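
The abstract does not give the details of the two algorithms, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that the OOV rate is measured as the fraction of running word tokens in the test text that are missing from the recognizer vocabulary, and that test sentences are chosen greedily so the rate stays under a target. The names `oov_rate`, `select_test_set`, `max_oov_rate` and `size` are hypothetical, and the toy tokens stand in for real Uyghur text.

```python
def oov_rate(sentences, vocabulary):
    """Return the fraction of running word tokens that are not in `vocabulary`."""
    tokens = [word for sentence in sentences for word in sentence.split()]
    if not tokens:
        return 0.0
    oov_tokens = sum(1 for word in tokens if word not in vocabulary)
    return oov_tokens / len(tokens)


def select_test_set(candidates, vocabulary, max_oov_rate, size):
    """Greedily pick up to `size` sentences whose combined OOV rate stays
    at or below `max_oov_rate`.

    Hypothetical helper (an assumption, not the published algorithm):
    sentences with the fewest OOV tokens are tried first.
    """
    def oov_count(sentence):
        return sum(1 for word in sentence.split() if word not in vocabulary)

    selected, total_tokens, total_oov = [], 0, 0
    for sentence in sorted(candidates, key=oov_count):
        n_tokens = len(sentence.split())
        if n_tokens == 0:
            continue  # skip empty lines
        n_oov = oov_count(sentence)
        # OOV rate of the test set if this sentence were added
        new_rate = (total_oov + n_oov) / (total_tokens + n_tokens)
        if new_rate <= max_oov_rate:
            selected.append(sentence)
            total_tokens += n_tokens
            total_oov += n_oov
        if len(selected) == size:
            break
    return selected


# Toy usage with placeholder tokens (not real corpus data)
vocabulary = {"a", "b", "c"}
candidates = ["a b x", "c y", "a b c", "x y"]
test_set = select_test_set(candidates, vocabulary, max_oov_rate=0.2, size=2)
print(test_set, oov_rate(test_set, vocabulary))
```

Building several test sets with different `max_oov_rate` values in this way would allow recognition accuracy to be compared across OOV rates, which is the kind of experiment the abstract describes.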
This article is indexed in databases including CNKI and Wanfang Data.