首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 187 毫秒
1.
提出一种将STRAIGHT模型和深度信念网络DBN相结合实现语音转换的方式。首先,通过STRAIGHT模型提取出源说话人和目标说话人的语音频谱参数,用提取的频谱参数分别训练两个DBN得到语音高阶空间的个性特征信息;然后,用人工神经网络ANN将两个具有高阶特征的空间连接并进行特征转换;最后,用基于目标说话人数据训练出的DBN来对转换后的特征信息进行逆处理得到语音频谱参数,并用STRAIGHT模型合成具有目标说话人个性化特征的语音。实验结果表明,采用此种方式获得的语音转换效果要比传统的采用GMM实现语音转换更好,转换后的语音音质和相似度与目标语音更接近。  相似文献   

2.
基于粒子群优化神经网络的语音情感识别   总被引:1,自引:0,他引:1  
提出了一种基于粒子群优化算法的人工神经网络,并把它应用到语音情感识别系统中。依据情感的维度空间模型,分别提取了韵律特征与音质特征,研究了谐波噪声比特征随情感类别的变化。利用粒子群优化算法(PSO)训练随机产生的初始数据,优化神经网络的连接权值和阈值,快速地实现网络的收敛。在实验中比较了BP神经网络、RBF神经网络与PSO神经网络分别用于语音情感识别的识别率,PSO神经网络的平均识别率高于BP神经网络6.7%,高于RBF神经网络5.4%。结果显示,粒子群优化神经网络用于语音情感识别提高了识别性能。  相似文献   

3.
对说话人语音个性特征信息的表征和提取进行了深入研究,提出了一种基于深度信念网络(Deep Belief Nets,DBN)的语音转换方法。分别用提取出的源说话人和目标说话人语音频谱参数来训练DBN,分别得到其在高阶空间的语音个性特征表征;通过人工神经网络(Artificial Neural Networks,ANN)来连接这两个高阶空间并进行特征转换;使用基于目标说话人数据训练出的DBN来对转换后的特征信息进行逆处理得到转换后语音频谱参数,合成转换语音。实验结果表明,与传统的基于GMM方法相比,该方法效果更好,转换语音音质和相似度同目标语音更接近。  相似文献   

4.
提出了一种基于PAD三维情绪模型的情感语音韵律转换方法。选取了11种典型情感,设计了文本语料,录制了语音语料,利用心理学的方法标注了语音语料的PAD值,利用五度字调模型对情感语音音节的基频曲线建模。在此基础上,利用广义回归神经网络(Generalized Regression Neural Network,GRNN)构建了一个情感语音韵律转换模型,根据情感的PAD值和语句的语境参数预测情感语音的韵律特征,并采用STRAIGHT算法实现了情感语音的转换。主观评测结果表明,提出的方法转换得到的11种情感语音,其平均EMOS(Emotional Mean Opinion Score)得分为3.6,能够表现出相应的情感。  相似文献   

5.
语音转换是指在保持源说话人语义内容不变的前提下,通过改变源说话人的个性特征,使其听起来像目标说话人的语音。本文提出一种自适应粒子群优化算法训练径向基函数神经网络进行语音特征建模,以获取说话人谱包络的映射关系;此外,考虑到说话人谱包络参数与基频有着密切的联系,利用基于径向基函数神经网络的联合谱包络基频变换方法,将谱包络参数与基频联合进行建模和转换,使得转换后的基频含有更多的说话人个性特征。最后,运用主、客观方法对获得的转换语音进行性能测试。实验表明,与主流的基于高斯混合模型的语音转换相比,使用自适应粒子群优化的径向基函数神经网络方法能够获得更好的转换性能,且更加适用于男声到女声的转换。  相似文献   

6.
扩展T-S模糊模型的PSO神经网络优化算法   总被引:2,自引:1,他引:1       下载免费PDF全文
针对机械设备具有模糊性和非线性的特点,提出了一种利用扩展T-S模糊模型的,自适应PSO算法和BP神经网络相结合的新型智能结构优化算法。通过自适应的高斯函数来更改基本T-S模糊模型中的隶属度函数,进而使用扩展的T-S模糊模型来调整PSO算法的参数。以BP 神经网络隐含层神经元数目为设计变量,提取训练后的均方误差作为评价函数,用改进后的粒子群算法进行寻优。把优化后的网络模型应用于轮盘结构优化中,实验表明,该方法在保证轮盘性能的同时,对其结构进行了重新优化,是一种可行的结构优化方法。  相似文献   

7.
为提高对模拟电路故障模式的准确分类和减少网络模型的训练时间,提出基于小波包变换(WPT)和果蝇算法(FOA)优化广义回归神经网络(GRNN)的模拟电路故障诊断方法。首先采用小波包变换提取电路优质故障特征,以减少网络训练时间,然后建立GRNN网络模型,选择FOA算法优化GRNN网络参数,构建最优模型对电路故障特征进行训练测试,最后采用仿真测试其性能。实验结果表明,FOA算法有效提高诊断模型训练效率,相比于其它电路故障诊断模型,FOAGRNN模型具有更高的诊断率和优越性。  相似文献   

8.
秦楚雄  张连海 《计算机应用》2016,36(9):2609-2615
针对卷积神经网络(CNN)声学建模参数在低资源训练数据条件下的语音识别任务中存在训练不充分的问题,提出一种利用多流特征提升低资源卷积神经网络声学模型性能的方法。首先,为了在低资源声学建模过程中充分利用有限训练数据中更多数量的声学特征,先对训练数据提取几类不同的特征;其次,对每一类类特征分别构建卷积子网络,形成一个并行结构,使得多特征数据在概率分布上得以规整;然后通过在并行卷积子网络之上加入全连接层进行融合,从而得到一种新的卷积神经网络声学模型;最后,基于该声学模型搭建低资源语音识别系统。实验结果表明,并行卷积层子网络可以将不同特征空间规整得更为相似,且该方法相对传统多特征拼接方法和单特征CNN建模方法分别提升了3.27%和2.08%的识别率;当引入多语言训练时,该方法依然适用,且识别率分别相对提升了5.73%和4.57%。  相似文献   

9.
非平行语料下的语音转换(Voice Conversion,VC)是指在非平行语音数据集的情况下改变源语音特征到目标语音特征的映射技术.由于非平行数据的缺陷,所以当前研究多集中于平行语料下的语音转换,而有关非平行语料的研究提出的模型架构存在局限性,在特定说话人下进行训练得到的模型无法适用于任意说话人下的语音转换,且转化效果有待提高.对此,借鉴两种生成式对抗网络(Generative Adversarial Network,GAN)的变体StyleGAN和CycleGAN的结构特点,对生成器网络的层重新设计,添加辅助特征提取神经网络,提出一种称为Style-CycleGAN-VC的新模型,实现了非平行语料下任意说话人之间的任意语音转换.实验表明,与CycleGAN-VC模型相比,该模型对训练的特定说话人的语音转换效果有所提高,对任意说话人的语音转换效果与其相近.  相似文献   

10.
为了在语音转换过程中充分考虑语音的帧间相关性,提出了一种基于卷积非负矩阵分解的语音转换方法.卷积非负矩阵分解得到的时频基可较好地保存语音信号中的个人特征信息及帧间相关性.利用这一特性,在训练阶段,通过卷积非负矩阵分解从训练数据中提取源说话人和目标说话人相匹配的时频基.在转换阶段,通过时频基替换实现对源说话人语音的转换.相对于传统方法,本方法能够更好地保存和转换语音帧间相关性.实验仿真及主、客观评价结果表明,与基于高斯混合模型、状态空间模型的语音转换方法相比,该方法具有更好的转换语音质量和转换相似度.  相似文献   

11.
The objective of voice conversion system is to formulate the mapping function which can transform the source speaker characteristics to that of the target speaker. In this paper, we propose the General Regression Neural Network (GRNN) based model for voice conversion. It is a single pass learning network that makes the training procedure fast and comparatively less time consuming. The proposed system uses the shape of the vocal tract, the shape of the glottal pulse (excitation signal) and long term prosodic features to carry out the voice conversion task. In this paper, the shape of the vocal tract and the shape of source excitation of a particular speaker are represented using Line Spectral Frequencies (LSFs) and Linear Prediction (LP) residual respectively. GRNN is used to obtain the mapping function between the source and target speakers. The direct transformation of the time domain residual using Artificial Neural Network (ANN) causes phase change and generates artifacts in consecutive frames. In order to alleviate it, wavelet packet decomposed coefficients are used to characterize the excitation of the speech signal. The long term prosodic parameters namely, pitch contour (intonation) and the energy profile of the test signal are also modified in relation to that of the target (desired) speaker using the baseline method. The relative performances of the proposed model are compared to voice conversion system based on the state of the art RBF and GMM models using objective and subjective evaluation measures. The evaluation measures show that the proposed GRNN based voice conversion system performs slightly better than the state of the art models.  相似文献   

12.
针对业余歌手模仿专业歌手唱歌过程中音色不变的问题,提出一种基于高斯混合模型(GMM)的中文歌曲Morphing算法,采用GMM对语音频谱建模,并通过混合业余歌手和专业歌手的语音频谱,实现歌曲的音色转换。结果显示,混合比例因子k=0或1时,ABX测试正确率均为100%,0相似文献   

13.
Multi-layer perceptron artificial neural networks are used extensively in hydrological and water resources modelling. However, a significant limitation with their application is that it is difficult to determine the optimal model structure. General regression neural networks (GRNNs) overcome this limitation, as their model structure is fixed. However, there has been limited investigation into the best way to estimate the parameters of GRNNs within water resources applications. In order to address this shortcoming, the performance of nine different estimation methods for the GRNN smoothing parameter is assessed in terms of accuracy and computational efficiency for a number of synthetic and measured data sets with distinct properties. Of these methods, five are based on bandwidth estimators used in kernel density estimation, and four are based on single and multivariable calibration strategies. In total, 5674 GRNN models are developed and preliminary guidelines for the selection of GRNN parameter estimation methods are provided and tested.  相似文献   

14.
Voice conversion (VC) approach, which morphs the voice of a source speaker to be perceived as spoken by a specified target speaker, can be intentionally used to deceive the speaker identification (SID) and speaker verification (SV) systems that use speech biometric. Voice conversion spoofing attacks to imitate a particular speaker pose potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. We use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems and GMM supervector with support vector machine (GMM-SVM) based SV systems for this. Voice conversion is conducted by using three different techniques: GMM based VC technique, weighted frequency warping (WFW) based conversion method and its variation, where energy correction is disabled (WFW). Evaluation is done by using intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from TIMIT database. The result is indicated by degradation in the percentage of correct identification (POC) score in SID systems and degradation in equal error rate (EER) in all SV systems. Experimental results show that the GMM-SVM SV systems are more resilient against voice conversion spoofing attacks than GMM-UBM SV systems and all SID and SV systems are most vulnerable towards GMM based conversion than WFW and WFW based conversion. From the results, it can also be said that, in general terms, all SID and SV systems are slightly more robust to voices converted through cross-gender conversion than intra-gender conversion. This work extended the study to find out the relationship between VC objective score and SV system performance in CMU ARCTIC database, which is a parallel corpus. The results of this experiment show an approach on quantifying objective score of voice conversion that can be related to the ability to spoof an SV system.  相似文献   

15.
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.  相似文献   

16.
为了克服利用高斯混合模型(GMM)进行语音转换的过程中出现的过平滑现象,考虑到GMM模型参数的均值能够表征转换特征的频谱包络形状,本文提出一种基于GMM与ANN混合模型的语音转换,利用ANN对GMM模型参数的均值进行转换;为了获取连续的转换频谱,采用静态和动态频谱特征相结合来逼近转换频谱序列;鉴于基频对语音转换的重要性,在频谱转换的基础上,对基频也进行了分析和转换。最后,通过主观和客观实验对提出的混合模型的语音转换方法的性能进行测试,实验结果表明,与传统的基于GMM模型的语音转换方法相比,本文提出的方法能够获得更好的转换语音。  相似文献   

17.
This paper presents a technique to transform high-effort voices into breathy voices using adaptive pre-emphasis linear prediction (APLP). The primary benefit of this technique is that it estimates a spectral emphasis filter that can be used to manipulate the perceived vocal effort. The other benefit of APLP is that it estimates a formant filter that is more consistent across varying voice qualities. This paper describes how constant pre-emphasis linear prediction (LP) estimates a voice source with a constant spectral envelope even though the spectral envelope of the true voice source varies over time. A listening experiment demonstrates how differences in vocal effort and breathiness are audible in the formant filter estimated by constant pre-emphasis LP. APLP is presented as a technique to estimate a spectral emphasis filter that captures the combined influence of the glottal source and the vocal tract upon the spectral envelope of the voice. A final listening experiment demonstrates how APLP can be used to effectively transform high-effort voices into breathy voices. The techniques presented here are relevant to researchers in voice conversion, voice quality, singing, and emotion.  相似文献   

18.
在正弦激励模型的线性预测(LP)残差转换的基础上,提出了一种改进语音特征转换性能的语音转换方法.基于线性预测分析和综合的构架,该方法一方面通过谱包络估计声码器提取源说话人的线性预测编码(LPC)倒谱包络,并使用双线性变换函数实现倒谱包络的转换;另一方面由谐波正弦模型对线性预测残差信号建模和分解,采用基音频率变换将源说话人的残差信号转换为近似目标说话人的残差信号.最后由修正后的残差信号激励时变滤波器得到转换语音,滤波器参数通过转换得到的LPC倒谱包络实时更新.实验结果表明,该方法在主观和客观测试中都具有良好的结果,能有效地转换说话人声音特征,获得高相似度的转换语音.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号