Similar literature
 19 similar documents found; search took 453 ms
1.
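Based on a study of voice conversion, a method is proposed for converting the characteristics of a source speaker into those of a target speaker. The feature parameters for voice conversion fall into two classes: (1) spectral feature parameters; (2) pitch and tone patterns. The signal model and the conversion method are described for each class. Spectral features are modeled with phoneme-based two-dimensional HMMs, and the F0 trajectory is used to represent pitch and tone. Pitch period, tone, and speaking rate are transformed using pitch-synchronous overlap-add.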

2.
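A method is proposed that combines the STRAIGHT model with deep belief networks (DBNs) for voice conversion. First, the STRAIGHT model extracts the speech spectral parameters of the source and target speakers, and the extracted parameters are used to train two DBNs, yielding speaker-specific feature information in a high-order space. Next, an artificial neural network (ANN) connects the two high-order feature spaces and performs the feature conversion. Finally, the DBN trained on the target speaker's data inverts the converted features back into speech spectral parameters, and the STRAIGHT model synthesizes speech carrying the target speaker's individual characteristics. Experimental results show that this approach converts speech better than the conventional GMM-based approach: the quality and similarity of the converted speech are closer to the target speech.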

3.
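The representation and extraction of speaker-specific speech characteristics are studied in depth, and a voice conversion method based on deep belief nets (DBNs) is proposed. DBNs are trained separately on the spectral parameters extracted from the source and target speakers, giving a high-order-space representation of each speaker's individual speech characteristics. An artificial neural network (ANN) connects the two high-order spaces and performs the feature conversion. The DBN trained on the target speaker's data then inverts the converted features into converted speech spectral parameters, from which the converted speech is synthesized. Experimental results show that the method outperforms the conventional GMM-based method: the quality and similarity of the converted speech are closer to the target speech.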

4.
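A Mandarin Voice Conversion Method Using a Tone Mapping Codebook   Total citations: 3, self-citations: 0, citations by others: 3
While a Gaussian mixture model is used to transform the speaker's spectral envelope, a Mandarin tone codebook mapping technique is proposed to further strengthen the target-speaker character of the converted speech. The F0 contours of Mandarin monosyllables are extracted from the source and target speech as F0 conversion units. After preprocessing and clustering they form the source and target tone codebooks, and a tone-pattern mapping codebook from the source feature space to the target feature space is built according to a time-alignment principle. Voice conversion experiments evaluated the tone codebook mapping algorithm. The results show that the algorithm captures the mapping between the source and target speakers' F0 contours well and improves voice conversion performance.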
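
The codebook lookup described in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy codebooks, the squared Euclidean distance, and the assumption that every F0 contour has already been time-normalized to the same length are all hypothetical choices made for illustration.

```python
def nearest_codeword(contour, codebook):
    """Return the index of the codeword closest to `contour`.

    Assumption: contours and codewords are time-normalized to equal length,
    and closeness is measured by squared Euclidean distance.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(contour, codebook[i]))


def map_tone(contour, src_codebook, tgt_codebook, mapping):
    """Replace a source F0 contour with the target codeword it maps to."""
    src_index = nearest_codeword(contour, src_codebook)
    return tgt_codebook[mapping[src_index]]


# Hypothetical toy codebooks (F0 in Hz): one rising and one falling pattern.
SRC_CODEBOOK = [[100.0, 110.0, 120.0], [150.0, 140.0, 130.0]]
TGT_CODEBOOK = [[200.0, 220.0, 240.0], [300.0, 280.0, 260.0]]
# Hypothetical mapping from source codeword index to target codeword index.
MAPPING = {0: 0, 1: 1}

converted = map_tone([102.0, 111.0, 119.0], SRC_CODEBOOK, TGT_CODEBOOK, MAPPING)
```

In a real system the codebooks would come from clustering the extracted syllable contours and the mapping from time-aligned source and target utterances; here both are hard-coded.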

5.
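In a text-dependent speaker recognition system, both the speaker's identity and the content of the spoken text must be recognized. The choice of speech feature parameters is critical to such a system. At present, research on conventional speech recognition systems mainly uses MFCC parameters as the recognition features. The author analyzes speech feature parameters and runs experiments on different combinations of them. The experimental results show that, in this system, combining MFCC parameters with pitch parameters improves the recognition rate.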

6.
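Building on linear prediction (LP) residual conversion with a sinusoidal excitation model, a voice conversion method is proposed that improves speech feature conversion performance. Within an LP analysis and synthesis framework, the method on the one hand extracts the source speaker's linear predictive coding (LPC) cepstral envelope with a spectral envelope estimation vocoder and converts that envelope with a bilinear transform function. On the other hand, it models and decomposes the LP residual signal with a harmonic sinusoidal model and applies a pitch frequency transformation to convert the source speaker's residual into an approximation of the target speaker's residual. Finally, the modified residual excites a time-varying filter to produce the converted speech, with the filter parameters updated in real time from the converted LPC cepstral envelope. Experimental results show that the method performs well in both subjective and objective tests, converts speaker voice characteristics effectively, and yields converted speech with high similarity.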

7.
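Drawing on the properties of the wavelet transform and of speech signals, the paper combines the wavelet transform with morphological filtering to extract the pitch period of speech signals, and on that basis combines the wavelet transform with the speaker's vocal tract feature parameters to extract vocal tract features. Finally, building on this work, a speech monitoring system for police investigation and forensic identification is designed.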

8.
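Voice conversion is a relatively new research direction in speech processing and has been a research hotspot in the speech field in recent years. It is a technique that modifies the speech characteristics of a source speaker so that they take on the characteristics of a target speaker. This paper defines voice conversion, introduces the individual characteristics of speech, and lists the main conversion algorithms for the spectral envelope as well as the main algorithms for prosody conversion. It concludes with future research directions for voice conversion.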

9.
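A Curve-Fitting-Based Tone Recognition Method for Disyllabic Mandarin   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes and implements a curve-fitting method for tone recognition in continuous disyllabic Mandarin speech. It uses cepstral analysis to extract the pitch period of the speech, and uses cepstral parameters and short-time power to locate syllable boundaries. Experiments show that the curve-fitting-based Mandarin tone recognition method is algorithmically simple, applies to different speakers, and achieves high recognition accuracy, making it an effective method.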
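
The abstract above does not say which curve family or decision rule is fitted, so the following is only a hedged illustration of the general idea: a least-squares straight line is fitted to a syllable's F0 contour, and the sign of its slope separates a rising contour (as in Mandarin tone 2) from a falling one (as in tone 4). The function name and the sample contours are hypothetical.

```python
def fit_line(f0_contour):
    """Least-squares line through an F0 contour sampled at equal time steps.

    Time is normalized to [0, 1] so that the slope is the total F0 change
    (in Hz) across the syllable, independent of syllable duration.
    Returns (slope, intercept).
    """
    n = len(f0_contour)
    times = [i / (n - 1) for i in range(n)]
    mean_t = sum(times) / n
    mean_f = sum(f0_contour) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tf = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_contour))
    slope = s_tf / s_tt
    return slope, mean_f - slope * mean_t


# Hypothetical F0 contours in Hz.
rising_slope, rising_start = fit_line([200.0, 210.0, 220.0, 230.0, 240.0])
falling_slope, _ = fit_line([260.0, 240.0, 225.0, 200.0, 180.0])
```

A practical recognizer would need more than a slope, for example a higher-order fit to capture the dipping contour of tone 3, and speaker normalization of the F0 range.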

10.
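Voice conversion is a relatively new research direction in speech processing and has been a research hotspot in the speech field in recent years. It is a technique that modifies the speech characteristics of a source speaker so that they take on the characteristics of a target speaker. This paper defines voice conversion, introduces the individual characteristics of speech, and lists the main conversion algorithms for the spectral envelope as well as the main algorithms for prosody conversion. It concludes with future research directions for voice conversion.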

11.
This paper provides an introduction to the acoustic–phonetic structure of English regional accents and presents a signal processing method for modeling and transforming the acoustic correlates of English accents, for example from British English to American English. The focus of this paper is on modeling the intonation and duration correlates of accents, as the modeling of formants is described in previous papers (Yan et al., 2007, Vaseghi et al., 2009). The intonation correlates of accents are modeled with the statistics of a set of broad features of the pitch contour. The statistical models of phoneme durations and word speaking rates are obtained from automatic segmentation of word/phoneme boundaries in speech databases. A contribution of this paper is the use of accent synthesis for comparative evaluation of the causal effects of the acoustic correlates of accent. The differences between the acoustic–phonetic realizations of British Received Pronunciation (RP), Broad Australian (BAU) and General American (GenAm) English accents are modeled and used in an accent transformation and synthesis method to evaluate the influence of formants, pitch and duration on conveying accents.

12.
This paper presents a method for the estimation and mapping of parametric models of speech resonance at formants for voice conversion. The spectral features at formants that contribute to voice characteristics are the trajectories of the frequencies, the bandwidths and the intensities of the resonance at formants. The formant features are extracted from the poles of a linear prediction (LP) model of speech. The statistical distributions of formants are modelled by a two-dimensional hidden Markov model (HMM) spanning the time and frequency dimensions. Experimental results are presented which show a close match between HMM-based formant models and the histograms of formants. For voice conversion, two alternative methods are explored for mapping the formants of a source speaker to those of a target speaker. The first method is based on an adaptive formant-tracking warping of the frequency response of the LP model, and the second method is based on the rotation of the poles of the LP model of speech. Both methods transform all spectral parameters of the resonance at formants of the source speaker towards those of the target speaker. In addition, the issues affecting the selection of the warping ratios for the mapping functions are investigated. Experimental results of formant estimation and perceptual evaluation of voice morphing based on parametric formant models are presented.

13.
In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for the design of a voice conversion system using line spectral frequencies (LSFs) as feature vectors. Both the ANN- and GMM-based models are explored to capture nonlinear mapping functions for modifying the vocal tract characteristics of a source speaker according to a desired target speaker. The LSFs are used to represent the vocal tract transfer function of a particular speaker. Mapping of the intonation patterns (pitch contour) is carried out using a codebook-based model at the segmental level. The energy profile of the signal is modified using a fixed scaling factor defined between the source and target speakers at the segmental level. Two residual modification methods, residual copying and residual selection, are used to generate the target residual signal. The performance of the ANN- and GMM-based voice conversion (VC) systems is evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using the LSF feature set may be used as an alternative to state-of-the-art GMM-based models for designing a voice conversion system.

14.
Statistical Approach for Voice Personality Transformation   Total citations: 1, self-citations: 0, citations by others: 1
A voice transformation method which changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, average pitch period and average speaking rate. The main objective of the work is to build a nonlinear relationship between the acoustic feature parameters of two speakers, based on a probabilistic model. The conversion rules involve probabilistic classification and a cross-correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximizing the likelihood of the training data. To obtain transformed speech signals which are perceptually closer to the target speaker's voice, prosody modification is also applied. Prosody modification is achieved by scaling the excitation spectrum and by time-scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to conventional vector quantization (VQ)-based methods.

15.
The objective of a voice conversion system is to formulate a mapping function that transforms the characteristics of a source speaker into those of a target speaker. In this paper, we propose a General Regression Neural Network (GRNN) based model for voice conversion. It is a single-pass learning network, which makes the training procedure fast. The proposed system uses the shape of the vocal tract, the shape of the glottal pulse (excitation signal) and long-term prosodic features to carry out the voice conversion task. The shape of the vocal tract and the shape of the source excitation of a particular speaker are represented using Line Spectral Frequencies (LSFs) and the Linear Prediction (LP) residual, respectively. The GRNN is used to obtain the mapping function between the source and target speakers. Direct transformation of the time-domain residual using an Artificial Neural Network (ANN) causes phase changes and generates artifacts in consecutive frames. To alleviate this, wavelet packet decomposition coefficients are used to characterize the excitation of the speech signal. The long-term prosodic parameters, namely the pitch contour (intonation) and the energy profile of the test signal, are also modified in relation to those of the target (desired) speaker using the baseline method. The performance of the proposed model is compared with voice conversion systems based on state-of-the-art RBF and GMM models using objective and subjective evaluation measures. The evaluation measures show that the proposed GRNN-based voice conversion system performs slightly better than the state-of-the-art models.

16.
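A voice conversion method is proposed that uses particle swarm optimization (PSO) to optimize a general regression neural network (GRNN) model. First, the speaker-specific feature parameters of the vocal tract and of the excitation source in the training speech are used to train two GRNNs, yielding the GRNN structural parameters. Next, PSO optimizes these structural parameters, reducing the influence of manual choices on the conversion result. Finally, the prosodic features of the speech, namely the pitch contour and the energy, are each converted linearly, so that the converted speech contains more of the source speech's individual feature information. Subjective and objective experiments show that, compared with the radial basis function (RBF) neural network and the GRNN, the proposed conversion model greatly improves the naturalness and similarity of the converted speech, and that the spectral distortion is markedly lower and the result closer to the target speech.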
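
The abstract above says only that the pitch contour is converted linearly. One common form of linear pitch conversion is mean and standard deviation matching of log-F0, sketched below. Working in the log domain, the function names, and the sample F0 values are assumptions for illustration and are not taken from the paper.

```python
import math


def log_f0_stats(f0_values):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(variance)


def convert_f0(f0, src_stats, tgt_stats):
    """Linearly map one source F0 value onto the target speaker's log-F0 range.

    Unvoiced frames, marked here by F0 <= 0 (an assumption), pass through as 0.
    """
    if f0 <= 0:
        return 0.0
    mu_src, sd_src = src_stats
    mu_tgt, sd_tgt = tgt_stats
    return math.exp(mu_tgt + (sd_tgt / sd_src) * (math.log(f0) - mu_src))


# Hypothetical training F0 values in Hz; 0.0 marks an unvoiced frame.
src_stats = log_f0_stats([100.0, 120.0, 0.0, 140.0])
tgt_stats = log_f0_stats([200.0, 240.0, 280.0])
converted = [convert_f0(f, src_stats, tgt_stats) for f in [120.0, 0.0, 100.0]]
```

In this toy example the target values are exactly twice the source values, so the mapping reduces to doubling every voiced F0 value.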

17.
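A wideband speech coding algorithm based on adaptive weighted spectral interpolation (STRAIGHT) is proposed. The input speech signal is first analyzed with STRAIGHT to obtain accurate F0 and spectral parameters, and effective compression is then achieved through time-domain decimation and frequency-domain modeling. For time-domain decimation, an adaptive variable frame length replaces the fixed frame length of conventional coding algorithms, so that coded storage is allocated more sensibly according to how the speech actually varies. Subjective listening results show that, for speech sampled at 16 kHz, the algorithm at 6 kbps achieves quality comparable to AMR-WB (G.722.2) at 8.85 kbps. In addition, the algorithm provides strong control over the duration, F0, and spectral parameters of the reconstructed speech.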

18.
In this paper we examined the human voice of 20 adults (20 smokers and 20 non-smokers) to determine the effects of cigarette smoking on formant frequencies, pitch, shimmer and jitter, based on 3 Amazigh language vowels (A, I, U). The statistical data were collected from male Moroccan speakers aged between 26 and 50 years. Our results show that the pitch values of smokers are lower than those of non-smokers. The formant frequencies F1 and F2 of smokers are close to those of non-smokers for the three vowels considered, whereas F3 and F4 are lower for smokers. Shimmer and jitter analysis showed higher values of these parameters among smokers.
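
For readers unfamiliar with the two perturbation measures named above, here is a minimal sketch of their usual "local" definitions: the mean absolute difference between consecutive cycles divided by the mean over all cycles, applied to pitch periods for jitter and to peak amplitudes for shimmer. The abstract does not state which variants the authors used, and the function names and sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values divided by their mean."""
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle variation of the pitch period (dimensionless ratio)."""
    return relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return relative_perturbation(amplitudes)


# Hypothetical measurements: pitch periods in ms, peak amplitudes in linear units.
jitter = local_jitter([10.0, 10.2, 9.8, 10.0])
shimmer = local_shimmer([1.0, 0.9, 1.1, 1.0])
```

Both functions return a ratio; multiplying by 100 gives the percentage form in which jitter and shimmer are often reported.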

19.
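A Pitch Period Detection Method for Noisy Speech Based on Pre-filtering and the Wavelet Transform   Total citations: 10, self-citations: 0, citations by others: 10
Exploiting the limited range of the pitch period and the abrupt change in the speech signal at the instant of glottal closure, a pitch period detection method based on pre-filtering and the wavelet transform is proposed. The noisy speech signal is first filtered by a third-order elliptic low-pass filter. A single-level wavelet transform with a quadratic spline wavelet then detects the abrupt-change points of the speech signal, from which the pitch period is computed. Experiments show that, compared with the average magnitude difference function (AMDF) and autocorrelation function (ACF) methods, the proposed method extracts the pitch period more accurately. Compared with pitch period detection based on the multi-scale wavelet transform, it reduces the computational load and weakens the influence of noise and of speech formants on pitch period detection.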
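
As background for the comparison above, the ACF baseline can be sketched in a few lines: the pitch period is taken as the lag, within a plausible range, at which the frame's autocorrelation is largest. This illustrates only the baseline, not the proposed wavelet method. The search range of 50 to 400 Hz, the function name, and the synthetic test frame are assumptions.

```python
import math


def acf_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate F0 in Hz from the autocorrelation peak of one frame.

    Only lags corresponding to F0 between f_min and f_max are searched.
    The autocorrelation is not normalized by the number of overlapping
    samples, which slightly favors shorter lags and so helps avoid picking
    a multiple of the true period.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic, noise-free 100 Hz sine sampled at 8 kHz (100 ms of signal).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 100.0 * n / SAMPLE_RATE) for n in range(800)]
estimate = acf_pitch(frame, SAMPLE_RATE)
```

On real noisy speech the autocorrelation peak is easily shifted by formants and noise, which is the weakness the abstract says the pre-filtering and wavelet approach reduces.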
