Speaker adaptive voice source modeling with applications to speech coding and processing
Affiliation: 1. Department of Mathematics and Computer Science, University of Udine, Udine, Italy; 2. Department of Computer Science, University of Verona, Verona, Italy
Abstract: We discuss the use of low-dimensional physical models of the voice source for speech coding and processing applications. We illustrate a class of waveform-adaptive dynamic glottal models together with the corresponding parameter identification procedures. The model and the identification procedures are assessed through signal transformations of recorded speech, obtained by fitting the model to the data and then acting on the physically oriented parameters of the voice source. The proposed class of models provides, in principle, a tool both for estimating glottal source signals and for encoding the speech signal for transformation purposes. The application of this model to time stretching and to fundamental frequency control (pitch shifting) is also illustrated. The experiments show that copy synthesis is perceptually very similar to the target, and that time stretching and “pitch extrapolation” effects can be obtained with simple control strategies.
Keywords: Glottal modeling; Model inversion; Model-based transformations; Speech synthesis and processing
This article is indexed in ScienceDirect and other databases.