Similar Documents
20 similar documents retrieved (search time: 578 ms)
1.
The objective of a voice conversion system is to formulate a mapping function that can transform the source speaker's characteristics to those of the target speaker. In this paper, we propose a General Regression Neural Network (GRNN) based model for voice conversion. It is a single-pass learning network that makes the training procedure fast and comparatively less time consuming. The proposed system uses the shape of the vocal tract, the shape of the glottal pulse (excitation signal) and long-term prosodic features to carry out the voice conversion task. In this paper, the shape of the vocal tract and the shape of the source excitation of a particular speaker are represented using Line Spectral Frequencies (LSFs) and the Linear Prediction (LP) residual, respectively. A GRNN is used to obtain the mapping function between the source and target speakers. Direct transformation of the time-domain residual using an Artificial Neural Network (ANN) causes phase changes and generates artifacts in consecutive frames. To alleviate this, wavelet packet decomposed coefficients are used to characterize the excitation of the speech signal. The long-term prosodic parameters, namely the pitch contour (intonation) and the energy profile of the test signal, are also modified in relation to those of the target (desired) speaker using the baseline method. The relative performance of the proposed model is compared to voice conversion systems based on the state-of-the-art RBF and GMM models using objective and subjective evaluation measures. The evaluation measures show that the proposed GRNN based voice conversion system performs slightly better than the state-of-the-art models.
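As a rough illustration of the GRNN mapping step described above, the sketch below implements a single-pass general regression (Nadaraya-Watson) predictor in NumPy and applies it to toy LSF-like frames; the data, dimensions and the bandwidth value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def grnn_predict(train_x, train_y, query_x, sigma=0.2):
    """GRNN / general regression: a single-pass kernel regressor.

    Training pairs are simply stored; each prediction is a Gaussian-weighted
    average of the stored target vectors, so no iterative training is needed.
    """
    # Squared Euclidean distance from every query frame to every training frame
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True) + 1e-12       # normalize per query frame
    return w @ train_y                              # weighted average of target frames

# Toy usage with random "LSF-like" 10-dimensional frames (purely illustrative)
rng = np.random.default_rng(0)
src_lsf = rng.random((200, 10))                     # aligned source frames
tgt_lsf = 0.9 * src_lsf + 0.05                      # pretend aligned target frames
converted = grnn_predict(src_lsf, tgt_lsf, src_lsf[:5])
print(converted.shape)                              # (5, 10)
```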

2.
We propose a pitch synchronous approach to designing a voice conversion system that takes into account the correlation between the excitation signal and the vocal tract system characteristics of the speech production mechanism. The glottal closure instants (GCIs), also known as epochs, are used as anchor points for analysis and synthesis of the speech signal. The Gaussian mixture model (GMM) is considered the state-of-the-art method for vocal tract modification in a voice conversion framework. However, GMM based models generate overly smooth utterances and need to be tuned according to the amount of available training data. In this paper, we propose a support vector machine multi-regressor (M-SVR) based model that requires fewer tuning parameters to capture the mapping function between the vocal tract characteristics of the source and the target speaker. The prosodic features are modified using an epoch based method and compared with the baseline pitch synchronous overlap and add (PSOLA) based method for pitch and time scale modification. The linear prediction residual (LP residual) signal corresponding to each frame of the converted vocal tract transfer function is selected from the target residual codebook using a modified cost function. The cost function is calculated based on the mapped vocal tract transfer function and its dynamics, along with the minimum residual phase, pitch period and energy differences with the codebook entries. The LP residual signal corresponding to the target speaker is generated by concatenating the selected frame and its previous frame so as to retain the maximum information around the GCIs. The proposed system is also tested using a GMM based model for vocal tract modification. The average mean opinion score (MOS) and ABX test results are 3.95 and 85 for the GMM based system and 3.98 and 86 for the M-SVR based system, respectively. The subjective and objective evaluation results suggest that the proposed M-SVR based model for vocal tract modification, combined with modified residual selection and an epoch based model for prosody modification, can provide good quality synthesized target output. The results also suggest that the proposed integrated system performs slightly better than the GMM based baseline system designed using either the epoch based or the PSOLA based model for prosody modification.
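The vocal tract mapping step above could be prototyped roughly as below with scikit-learn, wrapping independent support vector regressors in a MultiOutputRegressor. This is only a stand-in for the paper's M-SVR, which couples the output dimensions jointly; the data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(1)
src = rng.random((300, 12))                                   # pitch-synchronous source features
tgt = np.tanh(src) + 0.01 * rng.standard_normal((300, 12))    # fake aligned target features

# One RBF-kernel SVR per output dimension (a simplification of M-SVR)
mapper = MultiOutputRegressor(SVR(kernel="rbf", C=1.0, epsilon=0.01))
mapper.fit(src, tgt)                                          # learn the source -> target mapping
converted = mapper.predict(src[:3])                           # convert a few frames
print(converted.shape)                                        # (3, 12)
```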

3.
A voice conversion method combining the STRAIGHT model with deep belief networks (DBN) is proposed. First, the STRAIGHT model is used to extract spectral parameters of the source and target speakers, and these parameters are used to train two DBNs that capture speaker-specific information in a high-order feature space. An artificial neural network (ANN) then connects the two high-order spaces and performs the feature conversion. Finally, a DBN trained on the target speaker's data inverts the converted features back to spectral parameters, and the STRAIGHT model synthesizes speech carrying the target speaker's individual characteristics. Experimental results show that this approach achieves better conversion than the conventional GMM-based method, with converted speech closer to the target in both quality and similarity.

4.
Voice conversion (VC), which morphs the voice of a source speaker so that it is perceived as spoken by a specified target speaker, can be intentionally used to deceive speaker identification (SID) and speaker verification (SV) systems that use speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. We use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems and GMM supervector with support vector machine (GMM-SVM) based SV systems for this purpose. Voice conversion is conducted using three different techniques: the GMM based VC technique, the weighted frequency warping (WFW) based conversion method, and its variant with energy correction disabled (WFW′). Evaluation is done using intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from the TIMIT database. The result is indicated by the degradation in the percentage of correct identification (POC) score in SID systems and the degradation in equal error rate (EER) in all SV systems. Experimental results show that the GMM-SVM SV systems are more resilient against voice conversion spoofing attacks than GMM-UBM SV systems, and that all SID and SV systems are more vulnerable to GMM based conversion than to the WFW and WFW′ based conversions. From the results it can also be said that, in general terms, all SID and SV systems are slightly more robust to voices converted through cross-gender conversion than intra-gender conversion. The work is extended to study the relationship between the VC objective score and SV system performance on the CMU ARCTIC database, which is a parallel corpus. The results of this experiment suggest an approach to quantifying the objective score of voice conversion that can be related to the ability to spoof an SV system.
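For readers unfamiliar with GMM-UBM scoring, the sketch below shows the verification score these attacks degrade: the average log-likelihood ratio between a target-speaker model and the UBM. Real systems MAP-adapt the speaker model from the UBM; here it is simply fitted on synthetic enrollment data, so everything in the snippet is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
background = rng.standard_normal((2000, 13))       # pooled background MFCC-like features
enroll = rng.standard_normal((400, 13)) + 0.5      # target-speaker enrollment features
test = rng.standard_normal((300, 13)) + 0.5        # test utterance claiming the target identity

ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(background)
spk = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(enroll)

llr = spk.score(test) - ubm.score(test)            # mean per-frame log-likelihood ratio
print(f"LLR = {llr:.3f}, accept = {llr > 0.0}")    # threshold is set at the EER point in practice
```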

5.
To overcome the over-smoothing that arises in GMM-based voice conversion, and noting that the means of the GMM parameters characterize the spectral envelope shape of the converted features, this paper proposes a voice conversion method based on a hybrid GMM-ANN model in which an ANN transforms the means of the GMM parameters. To obtain a continuous converted spectrum, static and dynamic spectral features are combined to approximate the converted spectral sequence. Given the importance of the fundamental frequency for voice conversion, F0 is also analyzed and converted on top of the spectral conversion. Finally, the performance of the proposed hybrid method is evaluated with subjective and objective experiments; the results show that, compared with the conventional GMM-based method, the proposed method produces better converted speech.

6.
A voice conversion method based on a general regression neural network (GRNN) optimized with particle swarm optimization (PSO) is proposed. First, two GRNNs are trained on the speaker-specific vocal tract and excitation features of the training speech to obtain the GRNN structure parameters. PSO is then used to optimize these structure parameters, reducing the influence of manual tuning on the conversion result. Finally, the prosodic features, pitch contour and energy are each converted with linear transformations so that the converted speech carries more speaker-specific characteristic information. Subjective and objective experiments show that, compared with the radial basis function (RBF) network and the plain GRNN, the proposed conversion model greatly improves the naturalness and likelihood of the converted speech, with clearly lower spectral distortion and output closer to the target speech.
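A compressed sketch of the PSO-tuned GRNN idea is given below: particles search over the GRNN spread parameter to minimize validation error instead of fixing it by hand. The single tuned parameter, swarm settings and toy data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def grnn_predict(tx, ty, qx, sigma):
    d2 = ((qx[:, None, :] - tx[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ ty) / (w.sum(1, keepdims=True) + 1e-12)

# Toy aligned source/target features, split into training and validation parts
src = rng.random((300, 8))
tgt = np.sin(src)
tr_x, va_x, tr_y, va_y = src[:200], src[200:], tgt[:200], tgt[200:]

def fitness(sigma):
    return ((grnn_predict(tr_x, tr_y, va_x, sigma) - va_y) ** 2).mean()  # validation MSE

# Plain particle swarm over sigma in [0.01, 1.0]
n_particles, iters = 10, 30
pos = rng.uniform(0.01, 1.0, n_particles)
vel = np.zeros(n_particles)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()]
for _ in range(iters):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.01, 1.0)
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()]
print(f"best GRNN spread found by PSO: {gbest:.3f}")
```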

7.
This article puts forward a new algorithm for voice conversion that not only removes the need for a parallel corpus in the training phase but also addresses the insufficiency of the target speaker's corpus. The proposed approach builds on one of the newer voice conversion models, combining the classical LPC analysis-synthesis model with a GMM. Through this algorithm, the conversion functions among vowels and demi-syllables are derived. We assume that these functions are largely the same for different speakers if their genders, accents, and languages are alike. Therefore, we are able to produce the demi-syllables with access to only a few sentences from the target speaker, forming the GMM for one of his/her vowels. Evaluation of the proposed method shows that it can efficiently capture the speech characteristics of the target speaker and provides results comparable to those obtained with parallel-corpus-based approaches.

8.
A new feature extraction method based on the linear prediction (LP) residual signal is proposed; the feature is closely related to the vocal tract of an individual speaker. A new feature (HOCOR) is obtained by applying the Haar wavelet transform to the LP residual. To further improve robustness and identification rate, on top of hierarchical speaker identification, the likelihood of the GMM classifier is weighted by the Gaussian probability density of the pitch period to form a new likelihood for speaker identification. Experimental results show that both the robustness and the identification rate of the proposed system are improved.
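The Haar-transform step can be illustrated with a one-level decomposition in NumPy applied to a synthetic residual frame; in the paper the input would be the actual LP residual, so the frame below is only a placeholder.

```python
import numpy as np

def haar_level1(frame):
    """One-level Haar wavelet transform: pairwise averages and differences."""
    frame = frame[: len(frame) // 2 * 2]                    # keep an even number of samples
    pairs = frame.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)     # low-pass (approximation) coefficients
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)     # high-pass (detail) coefficients
    return approx, detail

residual_frame = np.random.default_rng(7).standard_normal(160)   # e.g. a 10 ms frame at 16 kHz
a, d = haar_level1(residual_frame)
print(a.shape, d.shape)                                     # (80,) (80,)
```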

9.
To address the performance degradation caused by nonlinear channel distortion in telephone-speech speaker verification systems, a feature mapping method that removes channel effects is proposed. A Gaussian mixture model is used to build the speech model, and a channel-specific model is obtained by maximum a posteriori (MAP) adaptation; the differences between corresponding Gaussian components of the two models describe the effect of that channel on different speech. Channel mapping rules are derived from these differences to compensate the parameters and remove the mismatch between training and test speech. Experiments on the male-speaker data of the NIST 1999 and 2004 evaluations show that the method improves the system's equal error rate by 14.7% and 15.18%, respectively.
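A rough sketch of the mean-shift form of such feature mapping is shown below: each frame is moved by the difference between the nearest Gaussian of the channel-dependent model and the corresponding Gaussian of the channel-independent root model. Variance scaling and the MAP adaptation details are omitted, and the data and model sizes are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
root_data = rng.standard_normal((3000, 12))        # channel-independent training pool
channel_data = root_data[:1000] + 0.3              # the same kind of speech seen through one channel

root = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(root_data)
chan = GaussianMixture(n_components=8, covariance_type="diag", random_state=0,
                       weights_init=root.weights_, means_init=root.means_).fit(channel_data)

def feature_map(frames):
    m = chan.predict(frames)                        # most likely channel-model component per frame
    return frames - chan.means_[m] + root.means_[m] # shift toward the channel-independent model

test = rng.standard_normal((5, 12)) + 0.3
print(feature_map(test).shape)                      # (5, 12)
```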

10.
In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a vocal tract information based system. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the LP residual. The speaker-specific information from each level is modeled independently using Gaussian mixture model-universal background model (GMM-UBM) modeling and then combined at the score level. The significance of the proposed speaker recognition system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two different tests, namely a Clean test and a Noisy test, are conducted. In the Clean test, the test speech signal is used as it is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. Although in the Clean test the proposed source based speaker recognition system still performs worse than the vocal tract information based system, its performance is better in the Noisy test. Finally, for both clean and noisy cases, by providing different and robust speaker-specific evidence, the proposed system helps the vocal tract system to further improve the overall performance.
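The score-level combination mentioned above can be as simple as a weighted sum of normalized subsystem scores; the sketch below shows that step only, with made-up scores and an arbitrary weight.

```python
import numpy as np

def fuse(source_scores, vocal_scores, alpha=0.4):
    """Weighted-sum fusion of two score streams after z-normalization."""
    z1 = (source_scores - source_scores.mean()) / (source_scores.std() + 1e-12)
    z2 = (vocal_scores - vocal_scores.mean()) / (vocal_scores.std() + 1e-12)
    return alpha * z1 + (1.0 - alpha) * z2

rng = np.random.default_rng(4)
source_scores = rng.standard_normal(5)   # e.g. LP-residual (excitation) subsystem scores
vocal_scores = rng.standard_normal(5)    # e.g. vocal tract (spectral) subsystem scores
print(fuse(source_scores, vocal_scores))
```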

11.
肖星星  冯瑞 《计算机工程》2012,38(24):171-174
Existing speaker recognition methods degrade noticeably on short utterances. To address this, a short-utterance speaker recognition method based on common-feature selection is proposed. Gaussian mixture models are built from the speakers' speech data, the common overlapping part shared among speakers is extracted, and a common-overlap model and non-overlap models are constructed. Test speech features are selected according to these two kinds of models, their similarity to every speaker's non-overlap model is computed, and the decision is made by the maximum-similarity rule. Experimental results show that the method is robust and yields a low recognition error rate.

12.
The representation and extraction of speaker-specific information in speech is studied in depth, and a voice conversion method based on deep belief nets (DBN) is proposed. The spectral parameters extracted from the source and target speakers are used to train separate DBNs, each yielding a high-order representation of the speaker's individual characteristics. An artificial neural network (ANN) connects the two high-order spaces and performs the feature conversion, and a DBN trained on the target speaker's data inverts the converted features back to spectral parameters, from which the converted speech is synthesized. Experimental results show that, compared with the conventional GMM-based method, this method performs better, with converted speech closer to the target in both quality and similarity.

13.
Voice conversion (VC) consists in modifying the source speaker's voice toward the voice of the target speaker. In this paper, we are interested in evaluating the performance of a GMM based conversion system applied to the Arabic language, exploiting both the pitch dynamics and the spectrum. We study three approaches to obtaining the global conversion function for the pitch and the overall spectrum using the joint probability model. In the first approach, we perform joint conversion of pitch and spectrum. In the second approach, the pitch is converted by a linear transformation. In the third approach, we exploit the relationship between the pitch and the spectrum. For the conversion of noise we use a new technique that consists in modeling the noise of the voiced or unvoiced frames with GMMs. We use the HNM for analysis/synthesis and a regularized discrete cepstrum to estimate the spectrum of the speech signal.
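The joint probability model underlying these approaches can be sketched as below: a GMM is fitted on stacked source/target frames and each converted frame is the posterior-weighted conditional expectation E[y | x, m]. Feature sizes and data are synthetic placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
D = 6
x = rng.standard_normal((500, D))                    # aligned source spectral features
y = 0.8 * x + 0.1 * rng.standard_normal((500, D))    # aligned target spectral features

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x, y]))                           # model the joint density p(x, y)

def convert(x_frames):
    # Responsibilities P(m | x) from the source marginal of each component
    logp = np.stack([multivariate_normal(gmm.means_[m][:D],
                                         gmm.covariances_[m][:D, :D]).logpdf(x_frames)
                     for m in range(gmm.n_components)], axis=1) + np.log(gmm.weights_)
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    out = np.zeros_like(x_frames)
    for m in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[m][:D], gmm.means_[m][D:]
        cxx, cyx = gmm.covariances_[m][:D, :D], gmm.covariances_[m][D:, :D]
        # Conditional mean E[y | x, m] = mu_y + C_yx C_xx^{-1} (x - mu_x)
        out += post[:, m:m + 1] * (mu_y + (x_frames - mu_x) @ np.linalg.solve(cxx, cyx.T))
    return out

print(convert(x[:3]).shape)                          # (3, 6)
```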

14.
A speaker conversion method based on a speaker-independent model is proposed. Since phonetic information is shared by the speech of all speakers, a speaker-independent space described by a Gaussian mixture model is assumed, and piecewise linear transformations are used to describe the mapping from this space to each speaker-dependent space. The model is trained on a multi-speaker database with a speaker-adaptive training algorithm, and in the conversion stage the transformations between the source/target spaces and the speaker-independent space are used to construct the feature transformation between source and target, so that a speaker conversion system can be built quickly and flexibly. Subjective listening tests verify the advantages of this algorithm over the conventional speaker-dependent model approach.

15.
By analyzing the performance of GMM (Gaussian mixture model) based speaker identification systems, an artificial neural network (ANN) method that captures the interaction information between different speakers is proposed, forming a hybrid GMM/ANN speaker identification system. Experiments show that the hybrid GMM/ANN system achieves a higher identification rate than systems based on GMM or on MLP (multilayer perceptron) alone.
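A toy stand-in for the hybrid system described above is sketched below: per-speaker GMMs produce a vector of average log-likelihoods for each utterance, and an MLP makes the final identification decision from those score vectors. The synthetic data, model sizes and split are assumptions, and the MLP only approximates the ANN used in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(9)
n_speakers, dim = 4, 13
# Synthetic utterances: 30 per speaker, each a (frames x dim) feature matrix
utts = {s: [rng.standard_normal((100, dim)) + s for _ in range(30)] for s in range(n_speakers)}

# One GMM per speaker, trained on that speaker's first 20 utterances
gmms = [GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
          .fit(np.vstack(utts[s][:20])) for s in range(n_speakers)]

def score_vector(frames):
    return np.array([g.score(frames) for g in gmms])   # average log-likelihood per speaker model

X_tr = np.array([score_vector(u) for s in range(n_speakers) for u in utts[s][:20]])
y_tr = np.array([s for s in range(n_speakers) for _ in range(20)])
X_te = np.array([score_vector(u) for s in range(n_speakers) for u in utts[s][20:]])
y_te = np.array([s for s in range(n_speakers) for _ in range(10)])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
print("toy identification accuracy:", clf.score(X_te, y_te))
```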

16.
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed to perform spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, speech quality deteriorates because of two problems: 1) appropriate spectral movements are not always produced by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. To address these problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used to realize an appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in terms of both speech quality and conversion accuracy for speaker individuality.

17.
The Gaussian mixture model-universal background model (GMM-UBM) system is one of the predominant approaches to text-independent speaker verification, because both the target speaker model and the impostor model (UBM) have the generalization ability to handle "unseen" acoustic patterns. However, since GMM-UBM uses a common anti-model, namely the UBM, for all target speakers, it tends to be weak at rejecting impostors' voices that are similar to the target speaker's voice. To overcome this limitation, we propose a discriminative feedback adaptation (DFA) framework that reinforces the discriminability between the target speaker model and the anti-model, while preserving the generalization ability of the GMM-UBM approach. This is achieved by adapting the UBM to a target-speaker-dependent anti-model based on a minimum verification squared-error criterion, rather than estimating the model from scratch with conventional discriminative training schemes. The results of experiments conducted on the NIST2001-SRE database show that DFA substantially improves the performance of the conventional GMM-UBM approach.
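As background for the adaptation step DFA builds on, the sketch below performs plain relevance-MAP adaptation of the UBM means toward a speaker's data; the discriminative minimum verification squared-error update of the paper is not shown, and the relevance factor and data are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
background = rng.standard_normal((2000, 10))        # UBM training pool
speaker = rng.standard_normal((300, 10)) + 0.4      # target speaker enrollment data

ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(background)

def map_adapt_means(ubm, frames, relevance=16.0):
    gamma = ubm.predict_proba(frames)                              # (T, M) responsibilities
    n = gamma.sum(axis=0)                                          # soft frame counts per component
    ex = (gamma.T @ frames) / np.maximum(n[:, None], 1e-12)        # data mean per component
    alpha = (n / (n + relevance))[:, None]                         # adaptation coefficients
    return alpha * ex + (1.0 - alpha) * ubm.means_                 # interpolate with the UBM means

print(map_adapt_means(ubm, speaker).shape)                         # (8, 10)
```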

18.
Robust processing techniques for voice conversion
Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook mapping based voice conversion algorithm. Three different methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Analysis is performed for each method and the implementation details are discussed. The first method employs confidence measures in the training stage to eliminate problematic pairs of source and target speech units that might result from possible misalignments, speaking style differences or pronunciation variations. Four confidence measures are developed based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, is aimed at reducing the differences in the source and target long-term spectra when the source and target recording conditions are significantly different. The voice conversion algorithm that employs the proposed techniques is compared with the baseline voice conversion algorithm with objective tests as well as three subjective listening tests. First, similarity to the target voice is evaluated in a subjective listening test and it is shown that the proposed algorithm improves similarity to the target voice by 23.0%. An ABX test is performed and the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output. The proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
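Of the three techniques, pre-emphasis is simple enough to show directly; the sketch below applies the usual first-order filter y[n] = x[n] - a·x[n-1] before LP/LSF analysis. The coefficient 0.97 is a common default and not necessarily the value used in the study.

```python
import numpy as np

def pre_emphasis(signal, a=0.97):
    """First-order high-pass: y[n] = x[n] - a * x[n-1]."""
    return np.append(signal[0], signal[1:] - a * signal[:-1])

x = np.sin(2 * np.pi * 120 * np.arange(0, 0.02, 1 / 16000))   # 20 ms toy tone at 16 kHz
print(pre_emphasis(x)[:5])
```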

19.
Building on linear prediction (LP) residual conversion with a sinusoidal excitation model, a voice conversion method that improves the conversion of speech features is proposed. Within an LP analysis-synthesis framework, the method on the one hand extracts the source speaker's linear predictive coding (LPC) cepstral envelope with a spectral-envelope-estimation vocoder and converts the cepstral envelope with a bilinear transformation function; on the other hand, the LP residual signal is modeled and decomposed with a harmonic sinusoidal model, and a pitch frequency transformation converts the source speaker's residual into an approximation of the target speaker's residual. The converted speech is finally obtained by exciting a time-varying filter with the modified residual, with the filter parameters updated in real time from the converted LPC cepstral envelope. Experimental results show good performance in both subjective and objective tests: the method effectively converts speaker voice characteristics and produces converted speech with high similarity to the target.

20.
As speech products become increasingly widespread in modern society, voice conversion technology will find ever broader applications. Compared with other speech models, the STRAIGHT model achieves higher speech quality in analysis and synthesis. This paper discusses the STRAIGHT model used for voice conversion, the parameters extracted from it, and the Gaussian mixture model used for training.

