Speech Recognition Based on Semi-supervised Data Selection via Decoding Multiple Candidate Results
Citation: WANG Xilou, GUO Wu, XIE Chuandong. Speech Recognition Based on Semi-supervised Data Selection via Decoding Multiple Candidate Results[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(7): 662-667. DOI: 10.16451/j.cnki.issn1003-6059.201807009
Authors: WANG Xilou, GUO Wu, XIE Chuandong
Affiliation: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027
Abstract: For low-resource speech recognition, a data selection strategy for semi-supervised learning on a large amount of unlabeled data is proposed and applied to both acoustic modeling and language modeling. A seed model is first trained on a small amount of labeled data and then used to decode the unlabeled data. From the decoded best (1-best) candidate results, high-confidence sentences are selected by combining a confidence measure with perplexity, and these sentences are used to train both the acoustic model and the language model. Furthermore, the decoded lattices are transformed into multiple candidate texts, which are also used for language model training. On a Japanese recognition task, the proposed method achieves a considerably higher recognition rate than the baseline that selects data by confidence measure alone.
Keywords: confidence measure; semi-supervised learning; multiple candidates (N-best); low resource
Received: 2017-11-08
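The abstract does not state the exact scoring rule, thresholds, or how the lattice candidates are weighted, so the following Python sketch only illustrates the two selection ideas under explicit assumptions. All names (`Hyp`, `BigramLM`, `select_confident`, `nbest_bigram_counts`) and parameters (`conf_min`, `ppl_max`, `scale`) are hypothetical. An add-k smoothed bigram model stands in for the seed language model. Japanese text is assumed to be already segmented into words (romanized here for readability). Turning an N-best list into posterior-weighted fractional n-gram counts is one common way to use multiple candidates for LM training, not necessarily the authors' exact procedure.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Hyp:
    """One decoded hypothesis (hypothetical container, not from the paper)."""
    words: list[str]
    confidences: list[float] = field(default_factory=list)  # per-word, in [0, 1]
    log_score: float = 0.0  # total decoder log-score, used for N-best weighting


class BigramLM:
    """Add-k smoothed bigram LM standing in for the seed language model (assumption)."""

    def __init__(self, sentences, k=0.5):
        self.k = k
        self.hist = Counter()  # counts of each token used as a history
        self.bi = Counter()    # counts of (history, word) pairs
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            self.hist.update(toks[:-1])
            self.bi.update(zip(toks[:-1], toks[1:]))
        self.V = len({w for s in sentences for w in s} | {"</s>"})

    def perplexity(self, words):
        # Per-token perplexity, counting the </s> prediction as a token.
        toks = ["<s>"] + words + ["</s>"]
        logp = 0.0
        for h, w in zip(toks[:-1], toks[1:]):
            p = (self.bi[(h, w)] + self.k) / (self.hist[h] + self.k * self.V)
            logp += math.log(p)
        return math.exp(-logp / (len(toks) - 1))


def select_confident(hyps, lm, conf_min=0.8, ppl_max=50.0):
    """Keep 1-best hypotheses with high mean word confidence AND low seed-LM
    perplexity; both thresholds are illustrative assumptions."""
    kept = []
    for h in hyps:
        if not h.words or not h.confidences:
            continue
        mean_conf = sum(h.confidences) / len(h.confidences)
        if mean_conf >= conf_min and lm.perplexity(h.words) <= ppl_max:
            kept.append(h)
    return kept


def nbest_bigram_counts(nbest, scale=1.0):
    """Turn one utterance's N-best list into posterior-weighted (fractional)
    bigram counts; `scale` flattens (<1) or sharpens (>1) the posteriors."""
    best = max(h.log_score for h in nbest)
    weights = [math.exp(scale * (h.log_score - best)) for h in nbest]
    total = sum(weights)
    counts = Counter()
    for h, w in zip(nbest, weights):
        toks = ["<s>"] + h.words + ["</s>"]
        for bigram in zip(toks[:-1], toks[1:]):
            counts[bigram] += w / total
    return counts


if __name__ == "__main__":
    seed = [["kyou", "wa", "hare"], ["ashita", "wa", "ame"]]
    lm = BigramLM(seed)
    decoded = [
        Hyp(["kyou", "wa", "ame"], [0.9, 0.95, 0.85]),
        Hyp(["kyou", "ga", "xx"], [0.4, 0.5, 0.3]),
    ]
    print([h.words for h in select_confident(decoded, lm)])
    nbest = [Hyp(["kyou", "wa", "ame"], log_score=-10.0),
             Hyp(["kyou", "wa", "hare"], log_score=-10.0)]
    print(nbest_bigram_counts(nbest)[("wa", "ame")])
```

In the demo, only the first decoded sentence passes the filter: its mean confidence is 0.9 and its perplexity under the toy seed LM is about 3. The second sentence is rejected on confidence. With two equally scored candidates, each contributes half a count, so the bigram `("wa", "ame")` receives 0.5. In this sketch the confidence and perplexity checks are combined as a simple AND. A weighted combined score would be an equally plausible reading of "combining" the two measures.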