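Driven by deep learning, speech information processing has advanced rapidly. Combining speech synthesis with voice conversion enables real-time, high-fidelity speech output for a specified speaker and specified content, with broad application prospects in human-computer interaction, pan-entertainment, and other fields. This paper surveys deep-learning-based speech synthesis and voice conversion. First, it briefly reviews the development history of speech synthesis and conversion; next, it lists common public datasets in these fields to help researchers carry out related work; then, it discusses text-to-speech models, including classic and state-of-the-art models and algorithms that improve style, prosody, speed, and other aspects, and compares and evaluates their effectiveness and development potential; it then surveys voice conversion, summarizing conversion methods and optimization approaches; finally, it summarizes the applications and challenges of speech synthesis and conversion and, based on the problems they face in models, applications, and regulation, looks ahead to future directions in model compression, few-shot learning, and forgery detection.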

9.

Development, Key Technologies, and Applications of Speech Synthesis (total citations: 2; self-citations: 0; citations by others: 2)

陶建华 《CTI世界》 2001,(3):26-32

10.

Research on Deep Learning Speech Synthesis Technology

张小峰, 谢钧, 罗健欣, 俞璐 《计算机时代》 2020,(9):24-28

11.

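A Survey of Deep Learning Speech Synthesis Technology

张小峰, 谢钧, 罗健欣, 杨涛 《计算机工程与应用》 2021,57(9):50-59

Speech synthesis plays an important role in human-computer interaction, and advances in deep learning have driven its rapid development. Deep-learning-based speech synthesis surpasses traditional speech synthesis in both the quality and the speed of the synthesized speech. Starting from deep-learning-based vocoders and acoustic models, this paper surveys speech synthesis technology and discusses the working principles, strengths, and weaknesses of the various vocoders and acoustic models. On this basis it surveys speech synthesis systems, systematically reviewing classic deep-learning-based...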

12.

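Implementation of a Speech Recognition Engine Based on the Java Speech API Specification

倪素萍, 董滨, 赵庆卫, 颜永红 《微计算机应用》 2005,26(2):168-172

This paper introduces the system framework of a speech recognition engine that conforms to the Java Speech API (JSAPI) specification, describes the approach and strategy for implementing a JSAPI speech recognition engine on top of an existing C/C++ recognition engine, proposes and analyzes a concrete method for implementing the JSAPI specification centered on event handling and state handling, and completes the implementation of a JSAPI-based speech recognition software system.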

13.

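Developing Speech Synthesis Software with Embedded TTS Technology

马强 《电脑开发与应用》 2004,17(4):18-19,22

Analyzes the basic structure of speech synthesis technology and TTS engine technology and, drawing on a voice system for criminal-record data files, presents a concrete method for embedding TTS to develop 32-bit speech synthesis software on the VB platform.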

14.

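A Survey of Speech Synthesis, Forgery, and Forgery Detection Technologies

杨帅, 乔凯, 陈健, 王林元, 闫镔 《计算机系统应用》 2022,31(7):12-22

In recent years, with the rise of mobile smart devices, people encounter and use speech information ever more frequently, and speech forgery and forgery detection have become increasingly important technologies in speech processing. This paper first outlines the general pipeline of a speech synthesis system and systematically summarizes the two main techniques in speech forgery, text-to-speech (TTS) and voice conversion (VC); next, it introduces and classifies common algorithms for speech forgery detection; finally, addressing the current problems in speech forgery and detection, it proposes possible future directions from the perspectives of data, models, training methods, and application scenarios.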

15.

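A Speech Corpus Pruning Method Based on Virtual Variable-Length Units

张巍, 吴晓如, 赵志伟, 王仁华 《软件学报》 2006,17(5):983-990

Speech corpus pruning, or redundancy removal, is an important problem in large-corpus speech synthesis. The concept of virtual variable-length replacement is proposed to compensate for the loss of variable-length units. Combined with the frequency with which variants are used in synthesis, a corpus pruning algorithm, StaRp-VPA, is constructed. The algorithm can prune a speech corpus by an arbitrary ratio. Experiments show that when the pruning ratio is below 50%, the naturalness of the synthesized speech barely declines; when the pruning ratio is above 50%, naturalness still does not degrade severely.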

16.

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Thanh X. Le, An T. Le, Quang H. Nguyen 《计算机系统科学与工程》 2023,44(2):1263-1278

In recent years, speech synthesis systems have allowed for the production of very high-quality voices. Therefore, research in this domain is now turning to the problem of integrating emotions into speech. However, the method of constructing a speech synthesizer for each emotion has some limitations. First, this method often requires an emotional-speech data set with many sentences. Such data sets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning. In addition, each model for each emotion failed to take advantage of data sets of other emotions. In this paper, we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flowtron model. In addition, we provide a new method to build a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this data set to train the Tacotron 2 model. We used it as a pre-trained model to train the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessment results show that MOS is 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves to be more effective for a high degree of automation and fast emotional sentence generation, using a small emotional-speech data set.
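
The abstract above reports mean opinion scores without showing how such a score is derived. As a minimal sketch, assuming the usual convention that each listener rates a clip on a 1-to-5 scale and the MOS is the arithmetic mean of those ratings, the calculation might look like the following. The ratings are hypothetical illustration data, not values from the paper, and the normal-approximation confidence interval is an added assumption rather than something the authors describe.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `ratings` is a list of listener scores on a 1-5 scale.
    The interval uses a normal approximation (1.96 * standard error),
    which is an assumption made for this sketch.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1..5")
    mos = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings for one synthesized utterance.
ratings = [4, 4, 3, 5, 4, 3, 4, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a gap such as 3.61 versus 3.95 is meaningful for a given number of listeners.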

17.

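Speech Coding Technology in Communications

张刚, 陈衍翊 《计算机与网络》 1995,(1)

Building on an analysis and review of existing speech coding schemes, proposes a five-layer structural model for speech coding systems and the concept of "using side information at the receiver to obtain the excitation code".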

18.

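An Emotional Speech Synthesis Method Based on the Tacotron Model and Prosody Correction

张昕, 胡航烨, 曹欣怡, 王蔚 《数据采集与处理》 2022,37(4):909-916

Speech synthesis technology is maturing. To improve the quality of synthesized emotional speech, a method is proposed that combines end-to-end emotional speech synthesis with prosody correction. Starting from emotional speech synthesized by the Tacotron model, prosodic parameters are modified to improve the emotional expressiveness of the synthesis system. First, the Tacotron model is trained on a large neutral corpus and then on a small emotional corpus to synthesize emotional speech. Then, the Praat acoustic analysis tool is used to analyze the prosodic features of the emotional speech in the corpus and to summarize the parameter patterns under different emotional states. Finally, using these patterns, the fundamental frequency, duration, and energy of the corresponding Tacotron-synthesized emotional speech are corrected to make the emotional expression more precise. Results of an objective emotion recognition experiment and a subjective evaluation show that the method can synthesize fairly natural and more expressive emotional speech.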

19.

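A Speech Synthesis and Analysis Method Based on the FD-PSOLA Algorithm (total citations: 3; self-citations: 0; citations by others: 3)

郑新春, 柴佩琪 《微型电脑应用》 2001,17(7):26-29

Presents a way to modify Mandarin prosodic features based on the FD-PSOLA algorithm. During frequency-domain modification of the short-time signal, homomorphic filtering separates the spectral envelope from the excitation spectrum, and the excitation spectrum is compressed or stretched by modifying the frequency-axis coordinates. Experimental results show that FD-PSOLA is better suited than TD-PSOLA to speech synthesis and analysis over larger frequency adjustment ranges.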
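
The separation step named in this abstract can be illustrated with a small sketch. It assumes that "homomorphic filtering" here means the common cepstral approach (take the log-magnitude spectrum, transform to the cepstrum, keep only the low-quefrency part as the envelope) and that the frequency-axis change is a plain linear-interpolation resampling. This is not the authors' FD-PSOLA implementation; the function names, the `cutoff` parameter, and the synthetic test frame are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def homomorphic_split(frame, cutoff):
    """Split a frame's log-magnitude spectrum into envelope + excitation.

    Low-quefrency cepstral coefficients (index < cutoff, plus their mirror
    images) are kept as the smooth envelope; the remainder is treated as
    the excitation part.
    """
    n = len(frame)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]
    cepstrum = [c.real for c in idft(log_mag)]
    low = [c if (q < cutoff or q > n - cutoff) else 0.0
           for q, c in enumerate(cepstrum)]
    envelope = [v.real for v in dft(low)]
    excitation = [lm - e for lm, e in zip(log_mag, envelope)]
    return log_mag, envelope, excitation


def scale_frequency_axis(half_spectrum, factor):
    """Resample along the frequency axis by linear interpolation.

    factor > 1 stretches the spectrum toward higher bins, factor < 1
    compresses it; positions past the last bin reuse the last value.
    """
    last = len(half_spectrum) - 1
    out = []
    for k in range(len(half_spectrum)):
        pos = min(k / factor, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * half_spectrum[lo] + frac * half_spectrum[hi])
    return out


# Hypothetical 64-sample frame: five harmonics of bin 4 with decaying amplitude.
n = 64
frame = [sum(math.sin(2 * math.pi * 4 * h * t / n) / h for h in range(1, 6))
         for t in range(n)]
log_mag, envelope, excitation = homomorphic_split(frame, cutoff=6)
stretched = scale_frequency_axis(excitation[: n // 2 + 1], 1.25)
print(len(envelope), len(stretched))
```

A full system would recombine the modified excitation with the unchanged envelope and resynthesize pitch-synchronously; that part is omitted here because the abstract gives no detail on it.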

20.

Challenges and Rewards in Using Parametric or Concatenative Speech Synthesis

Caroline Henton 《International Journal of Speech Technology》2002,5(2):117-131

Highest quality synthetic voices remain scarce in both parametric synthesis systems and in concatenative ones. Much synthetic speech lacks naturalness, pleasantness and flexibility. While great strides have been made over the past few years in the quality of synthetic speech, there is still much work that needs to be done. Now the major challenges facing developers are how to provide optimal size, performance, extensibility, and flexibility, together with developing improved signal processing techniques. This paper focuses on issues of performance and flexibility against a background containing a brief evolution of speech synthesis; some acoustic, phonetic and linguistic issues; and the merits and demerits of two commonly used synthesis techniques: parametric and concatenative. Shortcomings of both techniques are reviewed. Methodological developments in the variable size, selection and specification of the speech units used in concatenative systems are explored and shown to provide a more positive outlook for more natural, bearable synthetic speech. Differentiating considerations in making and improving concatenative systems are explored and evaluated. Acoustic and sociophonetic criteria are reviewed for the improvement of variable synthetic voices, and a ranking of their relative importance is suggested. Future rewards are weighed against current technical and developmental challenges. The conclusion indicates some of the current and future applications of TTS.