Survey of Acoustic Feature Extraction in Speech Tasks
Cite this article: ZHENG Chun-jun, WANG Chun-li, JIA Ning. Survey of Acoustic Feature Extraction in Speech Tasks[J]. Computer Science, 2020, 47(5): 110-119.
Authors: ZHENG Chun-jun  WANG Chun-li  JIA Ning
Affiliation: College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116023, China; School of Computer & Software, Dalian Neusoft University of Information, Dalian, Liaoning 116023, China
Abstract: Speech is an important means of transmitting and exchanging information, and people often use speech as a medium of communication. The acoustic signal of speech carries a large amount of speaker information, semantic information and rich emotional information, which has given rise to three different directions for speech tasks: speaker recognition (SR), automatic speech recognition (ASR) and speech emotion recognition (SER). Each of the three tasks uses its own techniques and specific methods for information extraction and model design. This paper first reviews the early development of the three tasks at home and abroad, dividing the evolution of speech tasks into four stages, and summarizes the common acoustic features used for feature extraction in the three tasks, explaining the emphasis of each type of feature. Then, since deep learning has been widely applied in many fields in recent years and has also advanced speech tasks considerably, the paper analyzes the application of currently popular deep learning models to acoustic modeling, summarizes the acoustic feature extraction approaches and technical routes for the three tasks under both supervised and unsupervised settings, and also reviews multi-channel models that fuse features through an attention mechanism. To complete speech recognition, speaker recognition and emotion recognition at the same time, a multi-task Tandem model is proposed for the personalized features of the acoustic signal; in addition, a multi-channel cooperative network model is proposed, a design that can improve the accuracy of multi-task feature extraction.
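The common acoustic features referred to above span spectral, prosodic and energy descriptors such as MFCCs, fundamental frequency (F0) and short-time energy. The following is a minimal sketch of how such frame-level features might be extracted; the use of librosa, the 16 kHz sampling rate and the 25 ms / 10 ms framing are illustrative assumptions and are not prescribed by the surveyed paper.

```python
# Minimal sketch of frame-level acoustic features shared by SR, ASR and SER:
# 13 MFCCs with first-order deltas (spectral), short-time RMS energy, and F0.
# librosa and the specific frame parameters are illustrative assumptions only.
import numpy as np
import librosa

def extract_common_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Return a (frames, 28) matrix of MFCC + delta + energy + F0 features."""
    y, sr = librosa.load(wav_path, sr=sr)

    # Spectral features: 13 MFCCs plus their first-order deltas.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
    delta = librosa.feature.delta(mfcc)

    # Energy feature: short-time root-mean-square energy.
    rms = librosa.feature.rms(y=y, frame_length=400, hop_length=160)

    # Prosodic feature: fundamental frequency (F0) estimated with the YIN algorithm.
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, frame_length=1024, hop_length=160)

    # Frame counts can differ by a frame or two across extractors; align them.
    n = min(mfcc.shape[1], rms.shape[1], f0.shape[0])
    feats = np.vstack([mfcc[:, :n], delta[:, :n], rms[:, :n], f0[None, :n]])
    return feats.T
```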

Keywords: Acoustic feature extraction  Speaker recognition  Speech recognition  Emotion recognition  Deep learning  Multi-channel fusion

Survey of Acoustic Feature Extraction in Speech Tasks
ZHENG Chun-jun, WANG Chun-li, JIA Ning. Survey of Acoustic Feature Extraction in Speech Tasks[J]. Computer Science, 2020, 47(5): 110-119.
Authors: ZHENG Chun-jun  WANG Chun-li  JIA Ning
Affiliation: (College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116023, China; School of Computer & Software, Dalian Neusoft University of Information, Dalian, Liaoning 116023, China)
Abstract: Speech is an important means of information transmission and communication, and people often use speech as a medium for exchanging information. The acoustic signal of speech contains a large amount of speaker information, semantic information and rich emotional information. Therefore, three different directions of speech tasks are formed: speaker recognition (SR), automatic speech recognition (ASR) and speech emotion recognition (SER). Each of the three tasks uses different techniques and specific methods for information extraction and model design in its respective field. Firstly, the early development of the three tasks at home and abroad was reviewed, and the development of speech tasks was divided into four stages. At the same time, the common acoustic features used by the three speech tasks were summarized, and the focus of each type of feature was explained. Then, with the wide application of deep learning technology in various fields in recent years, speech tasks have developed rapidly. The application of currently popular deep learning models to acoustic modeling was analyzed, and the acoustic feature extraction methods and technical routes for the three tasks were summarized in both supervised and unsupervised settings. Multi-channel models that fuse features with an attention mechanism were also summarized. In order to complete speech recognition, speaker recognition and emotion recognition at the same time, a multi-task Tandem model was proposed for the personalized features of the acoustic signal. In addition, a multi-channel cooperative network model was proposed; this design can improve the accuracy of multi-task feature extraction.
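To make the multi-channel, attention-based fusion and the multi-task (Tandem-style) design discussed above concrete, the sketch below fuses several acoustic feature channels with soft attention weights and feeds the fused representation to one head per task. The layer sizes, the scalar-score attention and the specific task heads are assumptions for illustration, not the architecture actually proposed in the paper.

```python
# Minimal sketch: attention-weighted fusion of several acoustic feature channels,
# with one output head per speech task (ASR frame-level, SR/SER utterance-level).
# Layer sizes and the soft-attention formulation are illustrative assumptions.
import torch
import torch.nn as nn

class MultiChannelFusion(nn.Module):
    def __init__(self, channel_dims, hidden=128,
                 n_phones=40, n_speakers=100, n_emotions=6):
        super().__init__()
        # One small encoder per feature channel (e.g. MFCC, filter bank, prosody).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in channel_dims
        )
        # Scalar score per channel; softmax over channels gives fusion weights.
        self.score = nn.Linear(hidden, 1)
        # Task-specific heads sharing the fused representation (Tandem-style).
        self.asr_head = nn.Linear(hidden, n_phones)
        self.sr_head = nn.Linear(hidden, n_speakers)
        self.ser_head = nn.Linear(hidden, n_emotions)

    def forward(self, channels):
        # channels: list of tensors, each of shape (batch, frames, channel_dims[i])
        encoded = torch.stack(
            [enc(x) for enc, x in zip(self.encoders, channels)], dim=2
        )                                                    # (B, T, C, H)
        weights = torch.softmax(self.score(encoded), dim=2)  # (B, T, C, 1)
        fused = (weights * encoded).sum(dim=2)               # (B, T, H)
        utter = fused.mean(dim=1)                            # utterance-level pooling
        return self.asr_head(fused), self.sr_head(utter), self.ser_head(utter)

# Example: fuse a 28-dim hand-crafted stream with an 80-dim filter-bank stream.
model = MultiChannelFusion(channel_dims=[28, 80])
asr_logits, spk_logits, emo_logits = model(
    [torch.randn(2, 100, 28), torch.randn(2, 100, 80)]
)
```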
Keywords: Acoustic feature extraction  Speaker recognition  Auto speech recognition  Speech emotion recognition  Deep learning  Multi-channel fusion