Similar literature
20 similar documents found.
1.
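Speaker voice conversion based on phoneme-tied codebook mapping   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper introduces the framework of a speaker voice conversion system and discusses the traditional codebook-mapping approach to voice conversion. Because the traditional method converts the spectrum as a weighted sum over all codebook entries, the converted spectrum is over-smoothed and the converted speech has poor quality. To overcome this problem, the paper proposes a phoneme-tied weighted codebook summation for spectral conversion and uses decision trees for prosody conversion. Experiments show that the method performs voice conversion well and yields high speech quality even when little data is available.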
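The weighted codebook summation discussed in this abstract can be illustrated with a minimal sketch. It assumes paired source and target codebooks and exponential distance weights; the function name `convert_frame`, the `gamma` parameter, and the `allowed` index set are hypothetical and are not taken from the paper. Passing `allowed` restricts the sum to a subset of codewords, which is one plausible reading of tying codewords to a phoneme.

```python
import math

def convert_frame(frame, src_codebook, tgt_codebook, allowed=None, gamma=1.0):
    """Map one source spectral vector to the target space.

    Each weight is exp(-gamma * distance) to a source codeword, normalised so
    the weights sum to 1; the output is the weighted sum of the paired target
    codewords. `allowed` (hypothetical) limits the sum to a subset of codeword
    indices, for example those tied to the current phoneme.
    """
    indices = list(range(len(src_codebook)) if allowed is None else allowed)
    weights = [math.exp(-gamma * math.dist(frame, src_codebook[i])) for i in indices]
    total = sum(weights)
    dim = len(tgt_codebook[0])
    return [
        sum(w * tgt_codebook[i][d] for w, i in zip(weights, indices)) / total
        for d in range(dim)
    ]

# Toy codebooks (made-up values, two codewords of dimension 2).
src = [[0.0, 0.0], [10.0, 0.0]]
tgt = [[1.0, 1.0], [5.0, 5.0]]
all_entries = convert_frame([0.0, 0.0], src, tgt)             # every codeword contributes
tied_only = convert_frame([0.0, 0.0], src, tgt, allowed=[0])  # only codeword 0 contributes
```

Summing over all entries pulls the result toward the codebook average, which is the smoothing effect the abstract describes; restricting the index set removes contributions from unrelated codewords.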

2.
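Voice conversion modifies the individual characteristics of a source speaker so that the speech sounds like that of a target speaker, while keeping the semantic content unchanged. This paper proposes training a radial basis function (RBF) neural network with an adaptive particle swarm optimization algorithm to model speech features and obtain the mapping between speakers' spectral envelopes. In addition, because a speaker's spectral envelope parameters are closely related to the fundamental frequency, a joint spectral envelope and fundamental frequency transformation based on the RBF neural network models and converts the two together, so that the converted fundamental frequency carries more speaker-specific characteristics. Finally, the converted speech is evaluated with subjective and objective methods. Experiments show that, compared with the mainstream Gaussian mixture model based voice conversion, the RBF neural network trained with adaptive particle swarm optimization achieves better conversion performance and is better suited to male-to-female conversion.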

3.
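Voice conversion based on a genetic radial basis function neural network   Cited by: 4 (self-citations: 1, citations by others: 4)
Voice conversion technology transforms one person's speech pattern into that of another person with different characteristics, so that the converted speech keeps the source speaker's linguistic content but has the target speaker's voice characteristics. This paper studies an RBF neural network trained by a genetic algorithm to capture the mapping between speakers' spectral envelopes and thereby convert voice characteristics between speakers. The converted speech quality for six Mandarin monophthong phonemes was evaluated objectively and subjectively, and the results show that the neural network method achieves the desired conversion performance. The results also show that, compared with the K-means method, training the network with a genetic algorithm strengthens its global optimization ability and reduces the average spectral distortion between converted and target speech by about 10%.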

4.
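Based on a study of voice conversion, a method is proposed for converting source speaker features into target speaker features. The conversion feature parameters fall into two classes: (1) spectral feature parameters; (2) pitch and tone patterns. The signal model and the conversion method are described for each. Spectral features are modeled with phoneme-based two-dimensional HMMs, and the F0 contour represents pitch and tone. Pitch-synchronous overlap-add is used to transform the pitch period, tone, and speaking rate.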

5.
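A speaker conversion method based on a speaker-independent model is proposed. Since phoneme information is shared by the speech of all speakers, the method assumes there is a speaker-independent space that can be described by a Gaussian mixture model, and that the mapping from this space to each speaker-dependent space can be described by a piecewise linear transform. The model is trained on a multi-speaker database with the speaker adaptive training algorithm. In the conversion stage, the transforms from the source and target speaker spaces to the speaker-independent space are used to construct the feature transform between source and target, so that a speaker conversion system can be built quickly and flexibly. Subjective listening tests verify the advantages of the algorithm over traditional methods based on speaker-dependent models.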

6.
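Based on a study of voice conversion, a method is proposed for converting source speaker features into target speaker features. The conversion feature parameters fall into two classes: (1) spectral feature parameters; (2) pitch and tone patterns. The signal model and the conversion method are described for each. Spectral features are modeled with phoneme-based two-dimensional HMMs, and the F0 contour represents pitch and tone. Pitch-synchronous overlap-add is used to transform the pitch period, tone, and speaking rate.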

7.
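Building on linear prediction (LP) residual conversion with a sinusoidal excitation model, a voice conversion method is proposed that improves speech feature conversion performance. Within a linear prediction analysis and synthesis framework, the method on the one hand extracts the source speaker's linear predictive coding (LPC) cepstral envelope with a spectral envelope estimation vocoder and converts the cepstral envelope with a bilinear transform function; on the other hand it models and decomposes the LP residual signal with a harmonic sinusoidal model and uses a pitch frequency transform to convert the source speaker's residual into an approximation of the target speaker's residual. Finally, the modified residual excites a time-varying filter to produce the converted speech, with the filter parameters updated in real time from the converted LPC cepstral envelope. Experimental results show that the method performs well in both subjective and objective tests, converts speaker voice characteristics effectively, and yields converted speech with high similarity.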

8.
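A method based on Legendre orthogonal decomposition is proposed for extracting glottal flow derivative waveform parameters, for use in converting glottal flow derivative parameters between source and target speakers. The projection of the glottal flow derivative waveform onto a 6-dimensional orthogonal Legendre coordinate system forms a feature vector describing its shape. A GMM-based probabilistic classification weighted conversion algorithm is used, so that the conversion rule for each feature vector is a linear weighted combination of the rules of several classes, which considerably improves conversion performance. On this basis, a GMM-based codebook correction algorithm for the glottal flow derivative waveform is also given to compensate for the high-frequency aspiration and ripple components, which carry speaker-specific characteristics and are lost when the waveform is parameterized. Experimental results show that the proposed method clearly outperforms the codebook mapping algorithm based on vector quantization (VQ).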

9.
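The representation and extraction of speaker-specific characteristics in speech are studied in depth, and a voice conversion method based on deep belief nets (DBN) is proposed. Spectral parameters extracted from the source and target speakers are used to train separate DBNs, yielding representations of each speaker's individual characteristics in a high-order space. An artificial neural network (ANN) connects the two high-order spaces and performs the feature conversion. The DBN trained on target speaker data then inverts the converted features to obtain the converted spectral parameters, from which the converted speech is synthesized. Experimental results show that, compared with the traditional GMM-based method, the proposed method performs better, and the quality and similarity of the converted speech are closer to the target speech.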

10.
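An approach to voice conversion that combines the STRAIGHT model with a deep belief network (DBN) is proposed. First, the STRAIGHT model extracts the spectral parameters of the source and target speakers, and these parameters are used to train two separate DBNs that capture speaker-specific characteristics in a high-order space. Next, an artificial neural network (ANN) connects the two high-order feature spaces and performs the feature conversion. Finally, the DBN trained on target speaker data inverts the converted features to obtain spectral parameters, and the STRAIGHT model synthesizes speech with the target speaker's individual characteristics. Experimental results show that this approach converts speech better than the traditional GMM-based approach, and the quality and similarity of the converted speech are closer to the target speech.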

11.
Robust processing techniques for voice conversion   Cited by: 3 (self-citations: 0, citations by others: 3)
Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook mapping based voice conversion algorithm. Three different methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Analysis is performed for each method and the implementation details are discussed. The first method employs confidence measures in the training stage to eliminate problematic pairs of source and target speech units that might result from possible misalignments, speaking style differences or pronunciation variations. Four confidence measures are developed based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, is aimed at reducing the differences in the source and target long-term spectra when the source and target recording conditions are significantly different. The voice conversion algorithm that employs the proposed techniques is compared with the baseline voice conversion algorithm with objective tests as well as three subjective listening tests. First, similarity to the target voice is evaluated in a subjective listening test and it is shown that the proposed algorithm improves similarity to the target voice by 23.0%. An ABX test is performed and the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output. The proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
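Pre-emphasis, the second technique named in this abstract, is conventionally a first-order high-pass filter. The sketch below shows that generic filter only; the coefficient `alpha=0.97` is a commonly used default assumed here, since the abstract does not state the value or the exact filter the authors used.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply the first-order filter y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged. `alpha` is an assumed
    default, not a value taken from the paper.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out

# A constant (purely low-frequency) input is strongly attenuated after the first sample.
flat = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
```

The filter boosts high frequencies relative to low ones, which flattens the spectral tilt of voiced speech before vocal tract parameters such as LSFs are estimated.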

12.
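Current mainstream voice conversion algorithms are computationally heavy and complex, and are difficult to run on embedded systems with small cores. To reduce the computational complexity of voice conversion and shorten training time, an efficient voice conversion method based on hybrid codebook mapping is proposed. In the training stage, different codebook mappings are built according to the amount of speech data used for training, which saves training time and improves accuracy. In the conversion stage, the system converts the vocal tract parameters of voiced frames according to the codebook mappings built during training. In addition, to improve the subjective quality of the converted speech, the system also converts the feature parameters of unvoiced frames and corrects the formant frequencies of the converted speech to overcome inter-frame formant jitter. Subjective and objective test results show that the proposed method reduces computational complexity and markedly shortens training time while preserving conversion quality.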

13.
We propose a pitch synchronous approach to design the voice conversion system taking into account the correlation between the excitation signal and vocal tract system characteristics of speech production mechanism. The glottal closure instants (GCIs) also known as epochs are used as anchor points for analysis and synthesis of the speech signal. The Gaussian mixture model (GMM) is considered to be the state-of-the-art method for vocal tract modification in a voice conversion framework. However, the GMM based models generate overly-smooth utterances and need to be tuned according to the amount of available training data. In this paper, we propose the support vector machine multi-regressor (M-SVR) based model that requires fewer tuning parameters to capture a mapping function between the vocal tract characteristics of the source and the target speaker. The prosodic features are modified using an epoch based method and compared with the baseline pitch synchronous overlap and add (PSOLA) based method for pitch and time scale modification. The linear prediction residual (LP residual) signal corresponding to each frame of the converted vocal tract transfer function is selected from the target residual codebook using a modified cost function. The cost function is calculated based on mapped vocal tract transfer function and its dynamics along with minimum residual phase, pitch period and energy differences with the codebook entries. The LP residual signal corresponding to the target speaker is generated by concatenating the selected frame and its previous frame so as to retain the maximum information around the GCIs. The proposed system is also tested using GMM based model for vocal tract modification. The average mean opinion score (MOS) and ABX test results are 3.95 and 85 for GMM based system and 3.98 and 86 for the M-SVR based system respectively. The subjective and objective evaluation results suggest that the proposed M-SVR based model for vocal tract modification combined with modified residual selection and epoch based model for prosody modification can provide a good quality synthesized target output. The results also suggest that the proposed integrated system performs slightly better than the GMM based baseline system designed using either epoch based or PSOLA based model for prosody modification.

14.
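Zhu Tingshao, Gao Wen. Chinese Journal of Computers (《计算机学报》), 2000, 23(11): 1179-1183
Prosodic rules of Mandarin are important for speech synthesis and phonetics research. To learn prosodic rules more effectively, this paper uses data mining techniques to extract rules from a corpus. Fundamental frequency patterns are extracted by cluster analysis and then used to discretize the fundamental frequency sequences; the parameters of each syllable in the training sentences are derived from the results of linguistic analysis, and decision trees and neural networks are used to learn the rules of prosodic variation of syllables. Tests show that prosodic rule learning based on data mining achieves good results, confirming the effectiveness of the method.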

15.
Tone study is very important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, main representative of tone pattern, is described as a mixed stochastic trajectory. The mean trajectory is represented by a polynomial function of normalized time while the variance is time varying. Effective training and tone recognition algorithms were developed. The experimental results based on the proposed MSPTM showed 40.7% tone recognition error rate reduction relative to the traditional Hidden Markov Model (HMM) tone model. We also present a decision tree based approach to learning the tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect the tone patterns were taken into consideration while constructing the tree. After the tree was established, 28 different tone patterns were obtained. We found that in addition to the tone of the neighboring syllable, Consonant/Vowel type of the syllable and the position of the syllable in the utterance also made important contributions to tone pattern variations in continuous speech. Finally, a new approach of integrating tone information into the search process at word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and tone information integration method were efficient, achieving a 16.2% relative character error rate reduction.
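The idea of a polynomial mean trajectory over normalized time with a time-varying variance can be sketched for a single mixture component. This is an illustration only: the Gaussian form of the per-frame density, the function names, and all numbers are assumptions, and the abstract does not give the MSPTM equations or its mixture weights.

```python
import math

def poly_mean(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + ... at normalized time t (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result

def contour_log_likelihood(pitch, coeffs, variances):
    """Log-likelihood of a pitch contour under one trajectory component.

    Frame k sits at normalized time k / (n - 1). Each frame is scored by an
    assumed Gaussian whose mean is the polynomial value at that time and
    whose variance is variances[k] (one value per frame, so it can vary
    over time).
    """
    n = len(pitch)
    total = 0.0
    for k, f0 in enumerate(pitch):
        t = k / (n - 1) if n > 1 else 0.0
        diff = f0 - poly_mean(coeffs, t)
        var = variances[k]
        total += -0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
    return total

# Made-up example: a rising trajectory with mean(t) = t and unit variances.
rising = [0.0, 1.0]
on_track = contour_log_likelihood([0.0, 0.5, 1.0], rising, [1.0, 1.0, 1.0])
off_track = contour_log_likelihood([1.0, 0.5, 0.0], rising, [1.0, 1.0, 1.0])
```

A tone classifier built this way would score a contour against one trajectory model per tone and pick the highest score; here the rising contour scores higher than the falling one under the rising model.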

16.
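With the aim of lowering the bit rate, the G.728 algorithm is improved and an 8 kbit/s speech coding algorithm with a delay of 2.5 ms is proposed. The algorithm introduces a dual-codebook structure consisting of an adaptive codebook built from the most recent past excitation and a normalized fixed codebook. The true gain values are computed and quantized; in gain quantization, the adaptive codebook uses fixed quantization and the fixed codebook uses adaptive quantization. During codebook search, backward pitch detection is performed first, and the adaptive codebook is then searched finely around the pitch period T. The best excitation is obtained by searching 64 adaptive codevectors, 256 fixed codevectors, and 8 gain values for each, at a cost of 20 bits per frame. In tests with average segmental signal-to-noise ratio and Perceptual Evaluation of Speech Quality (PESQ), the coding quality of the improved algorithm is close to that of G.728.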

17.
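Because the ITU-T G.723.1 speech coding algorithm has high computational complexity, its application and implementation are subject to many restrictions. This paper proposes a low-complexity closed-loop pitch search algorithm. The algorithm is still based on a 5th-order pitch predictor, but instead of searching a 20-dimensional vector codebook to obtain the 5 pitch prediction gains as in the original algorithm, it uses the 20-dimensional vector to form a Wiener-Hopf equation and exploits the short-time stationarity of speech to simplify that equation into a Toeplitz system of linear algebraic equations, whose solution gives the required pitch prediction gains. The gains are then vector-quantized with a 5-dimensional codebook, so that a 5-dimensional codebook search replaces the original 20-dimensional one. This halves the computation of the closed-loop pitch search with only a slight drop in speech quality, while remaining bitstream-compatible with G.723.1.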
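A Toeplitz system of the kind this abstract mentions can be solved with the Levinson recursion in O(n^2) operations instead of general elimination. The sketch below assumes the matrix is symmetric, with entry (i, j) equal to `r[|i - j|]`; the abstract says only "Toeplitz", so the symmetry, the function name `solve_toeplitz`, and the example numbers are all assumptions and not the paper's actual equations.

```python
def solve_toeplitz(r, b):
    """Solve T x = b where T[i][j] = r[abs(i - j)] (symmetric Toeplitz).

    Levinson recursion: `f` is kept as the solution of T_n f = e_1 for the
    leading n-by-n block, its reverse solves T_n g = e_n, and `x` is grown
    one unknown at a time.
    """
    f = [1.0 / r[0]]
    x = [b[0] / r[0]]
    for n in range(1, len(b)):
        # Residual in the new last row when f is padded with a zero.
        eps = sum(r[n - i] * f[i] for i in range(n))
        denom = 1.0 - eps * eps
        padded = f + [0.0]
        reversed_padded = [0.0] + f[::-1]
        f = [(p - eps * q) / denom for p, q in zip(padded, reversed_padded)]
        backward = f[::-1]
        # Residual of the current solution in the new last row.
        eps_x = sum(r[n - i] * x[i] for i in range(n))
        delta = b[n] - eps_x
        x = [xi + delta * gi for xi, gi in zip(x + [0.0], backward)]
    return x

# Made-up 3-by-3 example whose exact solution is [1, 2, 3].
gains = solve_toeplitz([4.0, 1.0, 0.5], [7.5, 12.0, 14.5])
```

For a 5th-order predictor the same routine would be called with five correlation values and a five-element right-hand side.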

18.
19.
In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for design of voice conversion system using line spectral frequencies (LSFs) as feature vectors. Both the ANN and GMM based models are explored to capture nonlinear mapping functions for modifying the vocal tract characteristics of a source speaker according to a desired target speaker. The LSFs are used to represent the vocal tract transfer function of a particular speaker. Mapping of the intonation patterns (pitch contour) is carried out using a codebook based model at segmental level. The energy profile of the signal is modified using a fixed scaling factor defined between the source and target speakers at the segmental level. Two different methods for residual modification such as residual copying and residual selection methods are used to generate the target residual signal. The performance of the ANN and GMM based voice conversion (VC) systems is evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using LSFs feature set may be used as an alternative to state-of-the-art GMM-based models used to design a voice conversion system.

20.
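Chinese is a tonal language, and tone information is very important in Chinese speech recognition. A method combining an embedded tone model with an explicit tone model is proposed for recognizing tones in continuous Chinese speech. The method combines frame-by-frame fundamental frequency information with fundamental frequency information over longer time spans to recognize tones. Experiments on the "863-Test" and "TestCorpus98" test sets show tone recognition accuracies of 96.12% and 93.78%, respectively.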
