Similar literature
 Found 20 similar documents; search took 70 ms
1.
A method based on ensemble empirical mode decomposition (EEMD) is proposed for accurately detecting the time-varying pitch of speech in tonal languages. Unlike frame-, event-, or subspace-based pitch detectors, it can accurately extract the time-varying pitch information within short durations, which is of crucial importance in speech processing for tonal languages. The Chinese Linguistic Data Consortium (CLDC) database for Mandarin Chinese was used as standard speech data to evaluate the effectiveness of the method. The proposed method is shown to give more accurate and reliable results, particularly in estimating tones with non-monotonically varying pitch, such as the third tone in Mandarin Chinese. It is also shown to be strongly resistant to noise.

2.
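A pitch detection algorithm based on the linear prediction residual cepstrum   Total citations: 2 (self-citations: 1, citations by others: 1)
Pitch detection has long been an active research topic in audio processing, but the vocal-tract characteristics of speech strongly affect the pitch and its harmonic structure, which makes detection harder. Exploiting the fact that the LP residual retains only the glottal excitation signal, the method applies cepstral analysis to the residual and so avoids the influence of vocal-tract characteristics and noise. To address the half-frequency and double-frequency errors and the low-frequency truncation problem that often arise in cepstral analysis, a harmonic product spectrum (HPS) scheme is introduced, improving detection accuracy. Experiments show that the method largely avoids half- and double-frequency errors, gives satisfactory results even for telephone-channel speech whose low and high frequencies have been cut off, and, being a frame-based technique, meets the needs of real-time applications.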
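The harmonic product spectrum idea mentioned in this abstract can be sketched as follows. This is a minimal illustration applied to a plain magnitude spectrum, not to the LP residual cepstrum the abstract describes; the function names, the naive DFT, and the synthetic test frame are all assumptions made for the example.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Naive O(N^2) DFT magnitude for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def hps_pitch(x, fs, n_harmonics=3):
    """Return the frequency of the bin k maximising |X[k]| * |X[2k]| * ... * |X[n_harmonics*k]|."""
    mag = magnitude_spectrum(x)
    k_max = (len(mag) - 1) // n_harmonics  # keep n_harmonics * k inside the spectrum

    def product(k):
        p = 1.0
        for h in range(1, n_harmonics + 1):
            p *= mag[h * k]
        return p

    best = max(range(1, k_max + 1), key=product)
    return best * fs / len(x)

fs, n = 8000, 256
# Hypothetical frame: weak 250 Hz fundamental with stronger 2nd, 3rd and 4th harmonics.
amps = {250: 0.3, 500: 1.0, 750: 1.0, 1000: 1.0}
frame = [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in amps.items())
         for t in range(n)]
f0 = hps_pitch(frame, fs)
```

In this frame the strongest spectral peaks sit at the harmonics rather than at the fundamental, so picking the largest peak alone would report a multiple of the pitch. The product over harmonically related bins is largest only at the 250 Hz bin.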

3.
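The autocorrelation function (ACF) method, the average magnitude difference function (AMDF) method, and the wavelet transform method are classical pitch detection methods. This paper briefly analyzes the shortcomings of using each of them alone and proposes a wavelet-transform-based weighted autocorrelation detection method. The approximation components of a multi-level wavelet transform are weighted and summed to emphasize pitch information, and the autocorrelation function is weighted by an improved AMDF to emphasize the peak at the true pitch period, raising detection accuracy. Experiments show that, compared with the conventional ACF and AMDF methods, the proposed method reduces double- and half-frequency errors, improves pitch detection accuracy, and still gives fairly accurate results at a signal-to-noise ratio of -5 dB.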
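A rough sketch of AMDF-weighted autocorrelation is given below. It assumes the common form ACF(lag) / (AMDF(lag) + eps) and omits the wavelet stage entirely; the abstract does not give the paper's "improved AMDF", so the weighting, the parameter names, and the test frame here are illustrative assumptions.

```python
import math

def weighted_autocorr_pitch(x, fs, fmin=60.0, fmax=400.0, eps=1e-3):
    """Estimate F0 (Hz) as fs / lag, where lag maximises ACF(lag) / (AMDF(lag) + eps)."""
    n = len(x)
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        m = n - lag
        # Raw (un-normalised) sums: shorter lags use more terms and are mildly favoured.
        acf = sum(x[i] * x[i + lag] for i in range(m))
        amdf = sum(abs(x[i] - x[i + lag]) for i in range(m))
        score = acf / (amdf + eps)  # AMDF is near zero at the true period, boosting the peak
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag

fs = 8000
# Hypothetical voiced frame: 200 Hz fundamental plus a strong second harmonic.
frame = [math.sin(2 * math.pi * 200 * t / fs) + 0.8 * math.sin(2 * math.pi * 400 * t / fs)
         for t in range(400)]
f0 = weighted_autocorr_pitch(frame, fs)
```

Because the AMDF dips towards zero only where the waveform repeats, dividing by it sharpens the autocorrelation peak at the true period relative to peaks at other lags.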

4.
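To address voiced-speech separation in monaural speech separation, a method for accurately estimating the pitch period is proposed. First, using cues such as the short-time stationarity of speech and the continuity of the pitch period, the cepstral peaks of the speech signal are used to build a pitch-period spectrogram, from which the pitch-period track is extracted automatically. Then, the spectrum of each harmonic is picked out using the property that harmonic frequencies are integer multiples of the pitch frequency. Finally, the voiced speech is reconstructed by an inverse Fourier transform. Experimental results show that the method extracts the pitch-period track accurately and separates voiced signals effectively.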
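The harmonic-picking step relies only on harmonics lying at integer multiples of the pitch frequency. A minimal sketch of mapping those multiples to FFT bins follows; the function name and the parameter values are assumptions for illustration, not details taken from the paper.

```python
def harmonic_bins(f0, fs, n_fft):
    """Nearest FFT bin for each harmonic k * f0 that lies below the Nyquist frequency."""
    bins = []
    k = 1
    while k * f0 < fs / 2:
        bins.append(round(k * f0 * n_fft / fs))
        k += 1
    return bins

# Hypothetical values: 200 Hz pitch, 8 kHz sampling rate, 512-point FFT.
bins = harmonic_bins(f0=200.0, fs=8000, n_fft=512)
```

A separation system would keep the spectral values at (and typically near) these bins and zero the rest before applying the inverse transform.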

5.
To study the relationship between emotion and intonation, a new technique is introduced for extracting the dominant pitches within speech utterances and for quasi-musical analysis of the multipitch structure. After the distribution of fundamental frequencies over the entire utterance has been obtained, the underlying pitch structure is determined using an unsupervised clustering (Gaussian mixture) algorithm. The technique normally yields 3-6 pitch clusters per utterance, which can then be evaluated in terms of their inherent dissonance, harmonic "tension", and "major or minor modality". Stronger dissonance and tension were found in utterances with negative affect than in utterances with positive affect. Most importantly, utterances evaluated as having positive or negative affect had significantly different modality values. Factor analysis showed that the measures involving multiple pitches were distinct from other acoustical measures, indicating that the pitch substructure is an independent factor contributing to the affective valence of speech prosody.

6.
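This paper studies the periodic variation of the energy of speech signals and proposes a pitch detection method based on periodic energy variation. The average energy of a long frame of speech serves as the baseline reference energy, and the average energy of a short frame represents the instantaneous energy. From the two, an Instant Energy Measure (IEM) sequence is computed. The pitch period is obtained by computing the autocorrelation of the IEM sequence. Experimental analysis shows that the method gives satisfactory pitch-period extraction and voiced/unvoiced decisions in both clean and noisy conditions.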
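The IEM pipeline described here can be sketched roughly as below. The abstract does not give frame lengths or the exact IEM formula, so this sketch assumes the IEM is the ratio of short-frame to long-frame average energy, uses the whole analysis frame as the long frame, and picks hypothetical parameter values.

```python
import math

def iem_pitch_period(x, fs, short_len=5, fmin=60.0, fmax=400.0):
    """Return the pitch period in samples from the autocorrelation of an IEM sequence."""
    # Long-frame reference: average energy of the whole analysis frame (an assumption).
    ref = sum(v * v for v in x) / len(x)
    # Instant energy measure: short-frame average energy divided by the reference.
    iem = [sum(v * v for v in x[i:i + short_len]) / short_len / ref
           for i in range(len(x) - short_len + 1)]
    mean = sum(iem) / len(iem)
    d = [v - mean for v in iem]  # remove the mean before autocorrelating
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(d) - 1)

    def acf(lag):
        return sum(d[i] * d[i + lag] for i in range(len(d) - lag))

    return max(range(lag_min, lag_max + 1), key=acf)

fs = 8000
# Hypothetical voiced frame: a decaying oscillation restarted every 50 samples (160 Hz).
frame = [math.exp(-(t % 50) / 10.0) * (1 if t % 2 == 0 else -1) for t in range(400)]
period = iem_pitch_period(frame, fs)
```

The energy envelope rises at each glottal pulse and decays until the next one, so the IEM sequence repeats at the pitch period even when the waveform's fine structure does not.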

7.
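A speech hiding algorithm based on a parametric speech model   Total citations: 13 (self-citations: 0, citations by others: 13)
Chen Liang, Zhang Xiongwei. 《计算机学报》 (Chinese Journal of Computers), 2003, 26(8): 974-981
Based on a parametric speech model, this paper proposes an information hiding algorithm that hides secret speech inside public speech. The secret speech is first encoded with mixed excitation linear prediction (MELP) coding and error-correction coding to form the hidden information. The frequency-domain embedding points are then determined by a transient cross-correlation pitch-period detection algorithm, and the information is hidden by modifying the corresponding DFT coefficients. For extraction, the embedding points are determined in the same way to recover the hidden information, which is then MELP-decoded to restore the secret speech. Experimental results show that, after embedding, the segmental average signal-to-noise ratio of the intermediate speech is close to 60 dB and that the scheme is fairly robust to attacks such as compression and filtering. The algorithm opens a new avenue for research in information security and digital watermarking.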

8.
Higher-quality synthesized speech is required for widespread use of text-to-speech (TTS) technology. The prosodic pattern, which mainly describes the variation of pitch, is the key feature behind synthetic speech sounding unnatural and monotonous. The rules used in most Chinese TTS systems are constructed by experts, with weak quality control and low precision. In this paper, we propose a combination of clustering and machine learning techniques to extract prosodic patterns from large databases of real Mandarin speech, in order to improve the naturalness and intelligibility of synthesized speech. Typical prosody models are found by clustering analysis. Several machine learning techniques, including Rough Set, Artificial Neural Network (ANN), and Decision Tree, are trained on fundamental frequency and energy contours, and their output can be used directly in a pitch-synchronous-overlap-add-based (PSOLA-based) TTS system. The experimental results showed that the synthesized prosodic features closely resembled their original counterparts for most syllables.

9.
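To address the half-period and double-period errors that readily occur in pitch-period detection, a detection algorithm combining the wavelet transform and short-time autocorrelation is proposed. It takes into account the strengths and weaknesses of both commonly used methods as well as the gradual change in length between adjacent pitch periods. The speech signal first undergoes voiced/unvoiced detection and band-pass pre-filtering; the wavelet transform method then gives a preliminary detection, and the autocorrelation method is used to verify cases where the pitch period changes too much. Experimental results show that, at different signal-to-noise ratios, the pitch-period detection accuracy of the method is clearly higher than that of the ordinary wavelet-transform method. The method also helps with fast manual correction of pitch periods.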

10.
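The fundamental frequency (also called pitch or F0) and the way it varies are an important feature of speech signals. Because speech is an approximately periodic signal, accurately extracting the fundamental-frequency parameters of audio matters for later processing such as speech recognition. Many researchers have done extensive work in this area and proposed algorithms with good results. This article studies fundamental-frequency extraction algorithms for speech signals and gives a systematic survey and brief introduction.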

11.
In this paper, a novel method for voiced-unvoiced decision within a pitch tracking algorithm is presented. Voiced-unvoiced decision is required for many applications, including modeling for analysis/synthesis, detection of model changes for segmentation purposes, and signal characterization for indexing and recognition applications. The proposed method is based on the generalized likelihood ratio test (GLRT) and assumes colored Gaussian noise with unknown covariance. Under the voiced hypothesis, a harmonic-plus-noise model is assumed. The derived method is combined with a maximum a posteriori probability (MAP) scheme to obtain a pitch and voicing tracking algorithm. The performance of the proposed method is tested on several speech databases at different levels of additive noise and under telephone speech conditions. Results show that the GLRT is robust to speaker and environmental conditions and performs better than existing algorithms.

12.
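Voice conversion converts speaker A's speech into speech with speaker B's vocal characteristics while keeping the content unchanged. The pitch period varies during phonation, so in the stage where the two speakers' feature parameters are matched, extracting speech parameters with a fixed window causes some matching error, because the signal periods inside the window differ. The proposed variable sliding window chooses window lengths according to the changing pitch period of the speech signal when segmenting, so that every window contains the same number of signal periods, which eliminates the parameter differences caused by differing signal periods. Experiments show that the method improves voice conversion.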
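One way to picture a pitch-synchronous variable window is sketched below. It assumes pitch marks (period boundaries, in samples) are already available and that each window should span a fixed number of periods; both the helper name and the mark values are hypothetical, and the paper's actual segmentation rule may differ.

```python
def pitch_synchronous_windows(marks, periods_per_window=2):
    """Return (start, end) sample ranges that each span the same number of pitch periods."""
    return [(marks[i], marks[i + periods_per_window])
            for i in range(len(marks) - periods_per_window)]

# Hypothetical pitch marks for a voice whose period drifts from 40 to 44 samples.
marks = [0, 40, 81, 123, 166, 210]
windows = pitch_synchronous_windows(marks)
lengths = [end - start for start, end in windows]
```

The window length grows with the period (81, 83, 85, 87 samples here), whereas a fixed 80-sample window would cover slightly fewer than two periods by the end of the segment.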

13.
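Based on a simulation of how a person extracts the pitch period visually, a time-domain pitch-period extraction algorithm based on the visual shape of the speech waveform is proposed. The algorithm uses several attributes of the waveform, namely the amplitudes and positions of the primary and secondary peaks and the distance from each peak to the preceding peak, to decide the pitch-period value. It is simple, computationally light, and can accurately locate each pitch period.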

14.
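Pitch detection in noisy environments plays an important role in speech signal processing. To extract the pitch period of speech effectively at low signal-to-noise ratios, a detection method based on wavelet-packet-transform weighted linear-prediction autocorrelation is proposed. The method first removes noise with an adaptive wavelet-packet threshold, sums the approximation components of a multi-level wavelet packet transform to emphasize pitch information, and weights the autocorrelation of the linear prediction error by the wavelet packet coefficients to emphasize the peak at the pitch period, improving detection accuracy. Experimental results show that, compared with the conventional autocorrelation method and the wavelet-weighted autocorrelation method, the proposed method is more robust, gives smoother pitch tracks and higher accuracy, and still achieves fairly good results at a signal-to-noise ratio of -5 dB.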

15.
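After years of research and development, a considerable number of algorithms exist for extracting an accurate fundamental frequency, or pitch, from sound, yet many of them still have high error rates and fall short of industrial requirements. This work applies the main current fundamental-frequency extraction methods to the pitch of pop music, identifies the pitch extraction algorithm with the lowest error rate, and uses it to develop the multimedia software 麦客风.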

16.
A new method for predicting the pitch contour of a speech signal from a small number of pitch values is presented for very-low-rate speech coding. It relies on the correlation between phonetic evolution and pitch variations during voiced speech segments. To track the phonetic evolution and specify perceptually significant time points, Temporal Decomposition (TD) is used. TD provides the information required both to determine critical pitch values and to estimate the pitch contour, by detecting event functions, which serve as interpolation paths, and their centroids, which are the steadiest points in the spectral parameter space. It is shown that the proposed method reduces the amount of pitch information to about one-tenth of that in conventional frame-by-frame techniques, with less than 5% error in pitch approximation.

17.
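A Chinese voice conversion method using a tone mapping codebook   Total citations: 3 (self-citations: 0, citations by others: 3)
While a Gaussian mixture model is used to transform the speaker's spectral envelope, a Chinese tone codebook mapping technique is proposed to bring the converted speech closer to the target speaker's characteristics. The F0 contours of Chinese monosyllables are extracted from the source and target speech as the basic units of F0 conversion. After preprocessing and clustering, they form the source and target tone codebooks, and a tone-pattern mapping codebook from the source feature space to the target feature space is built according to a time-alignment principle. Voice conversion experiments evaluate the performance of the tone codebook mapping algorithm. The results show that the algorithm captures the mapping between the F0 contours of the source and target speakers well and improves voice conversion performance.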

18.
A novel and robust pitch estimation method is presented in this paper. The basic idea is to reshape the speech signal using a combination of dominant harmonic modification (DHM) and data-adaptive time-domain filtering techniques. The noisy speech signal is filtered within the range of fundamental frequencies to obtain the pre-filtered signal (PFS). The dominant harmonic (DH) of the PFS is determined and its amplitude is enhanced. The normalized autocorrelation function (NACF) is applied to the modified signal. Empirical mode decomposition (EMD) based data-adaptive time-domain filtering is then applied to the NACF signal, and partial reconstruction is performed in the EMD domain. The pitch period is determined from the partially reconstructed signal. The experimental results show that the proposed method performs better than other recently developed methods for noisy and clean speech signals in terms of gross and fine pitch errors.

19.
The use of microphone arrays enhances speech signals recorded in meeting rooms and office spaces. A common solution for speech enhancement in realistic environments with ambient noise and multi-path propagation is the application of so-called beamforming techniques. Such beamforming algorithms enhance signals at the desired angle using constructive interference while attenuating signals from other directions by destructive interference. However, these techniques require a priori time-difference-of-arrival information for the source. Source localization and tracking algorithms are therefore an integral part of such a system. Conventional localization algorithms deteriorate in realistic scenarios with multiple concurrent speakers. In contrast to conventional methods, the techniques presented in this paper use pitch information of speech signals in addition to location information. This “position–pitch”-based algorithm pre-processes the speech signals with a multiband gammatone filterbank inspired by the auditory model of the human inner ear. The role of this gammatone filterbank is analyzed and discussed in detail. For robust localization of multiple concurrent speakers, a frequency-selective criterion is explored that is based on a study of the human neural system's use of correlations between adjacent sub-band frequencies. This frequency-selective criterion leads to improved localization performance. To further improve localization accuracy, an algorithm based on grouping spectro-temporal regions formed by pitch cues is presented. All proposed speaker localization algorithms are tested using a multichannel database in which multiple concurrent speakers are active. The real-world recordings were made with a 24-channel uniform circular microphone array using loudspeakers and human speakers in various acoustic environments, including scenarios with moving concurrent speakers. The proposed techniques produced a localization performance that was significantly better than the state-of-the-art baseline in the scenarios tested.

20.
Pitch detection methods are widely used for extracting musical data from digital signals. A review of those methods is presented in the paper. Since musical signals may contain noise and distortion, detection results can be erroneous. In this paper, a new method that employs music prediction to support pitch determination is introduced. This method was developed to overcome the disadvantages of standard pitch detection algorithms. The new approach uses signal segmentation and pitch prediction based on musical knowledge extraction with artificial neural networks. Signal segmentation allows the pitch to be estimated for a single note as a whole, thereby suppressing errors in the transient and decay phases. Pitch prediction helps correct pitch estimation errors by tracking the musical context of the analyzed signal. As the experimental results show, pitch estimation errors may be reduced by using both signal segmentation and music prediction techniques.
