Similar Literature
1.
A hybrid pitch detector characterised by parallel analysis of the speech signal in the temporal, spectral and cepstral domains is proposed. The voiced/unvoiced decision and the pitch period are obtained by a logical analysis of the results from the three domains. Experimental analysis shows that the detector is robust for noisy and telephone speech.
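The abstract does not spell out the "logical analysis" used to fuse the three domains. As a hypothetical illustration only, the sketch below applies a 2-of-3 agreement rule: each domain supplies a pitch-period estimate (or `None` for unvoiced), the frame is declared voiced when at least two estimates agree within a relative tolerance, and the median of the agreeing group is returned. The function name `combine_estimates` and the 10% tolerance are assumptions, not taken from the paper.

```python
from statistics import median
from typing import List, Optional, Sequence


def combine_estimates(periods: Sequence[Optional[float]],
                      rel_tol: float = 0.1) -> Optional[float]:
    """Fuse per-domain pitch-period estimates with a hypothetical 2-of-3 vote.

    `periods` holds one estimate per domain (e.g. temporal, spectral,
    cepstral), in samples; None means that domain judged the frame unvoiced.
    Returns the median of the largest agreeing group, or None (unvoiced)
    if fewer than two estimates agree within rel_tol.
    """
    voiced = [p for p in periods if p is not None]
    best: List[float] = []
    for ref in voiced:
        # All estimates within rel_tol of this reference estimate.
        group = [p for p in voiced if abs(p - ref) <= rel_tol * ref]
        if len(group) > len(best):
            best = group
    return median(best) if len(best) >= 2 else None


print(combine_estimates([80, 82, 160]))    # → 81.0 (temporal and spectral agree)
print(combine_estimates([80, None, 160]))  # → None (no agreement: unvoiced)
```

A voting rule like this makes a single domain's octave error (the 160-sample estimate above) harmless as long as the other two domains agree.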

2.
The rate of oscillation of the vocal cords, known as the pitch, is an important sound feature that is useful in many speech applications. A novel approach for the automatic detection and estimation of the rate of oscillation of the vocal cords is described. Its importance stems from the fact that pitch determination is conducted in three independent stages: segmentation, voiced/unvoiced classification, and pitch estimation. Segmentation and the detection of voiced segments are performed before pitch estimation in order to: prevent unvoiced sounds and silence from biasing the pitch estimate; employ a simple segmentation procedure with low computational complexity and time delay; improve the accuracy of voiced/unvoiced classification by including additional features in voicing detection; aid pitch tracking by testing similarities across successive segments; and use a different analysis domain that enables high-resolution pitch estimation. A frequency-domain maximum likelihood procedure estimates the pitch frequency of voiced segments by maximizing a log-likelihood function over the range of possible pitch frequencies in conversational speech. An efficient, simplified realization of the generalized likelihood ratio segmentation method is also presented. Computer simulations on a number of utterances show that this approach gives accurate, reliable and robust pitch estimates for voiced sounds.
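The paper maximizes a log-likelihood function over candidate pitch frequencies; that function is not given here. A simpler frequency-domain criterion in the same spirit is harmonic summation: score each candidate F0 by the spectral magnitude at its first few harmonics and keep the best. The sketch below is that substitute, not the authors' estimator; the candidate grid, the three-harmonic limit and the rectangular window are illustrative assumptions.

```python
import math
from typing import Sequence


def dtft_magnitude(x: Sequence[float], freq_hz: float, fs: float) -> float:
    """Magnitude of the DTFT of frame x at an arbitrary frequency (rectangular window)."""
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = -sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.hypot(re, im)


def harmonic_sum_pitch(x: Sequence[float], fs: float, f_min: float = 80.0,
                       f_max: float = 400.0, step: float = 1.0,
                       n_harmonics: int = 3) -> float:
    """Return the candidate F0 (Hz) whose first n_harmonics carry the most spectral magnitude."""
    best_f, best_score = f_min, -1.0
    f = f_min
    while f <= f_max:
        # Sum magnitudes at f, 2f, ..., skipping harmonics at or above Nyquist.
        score = sum(dtft_magnitude(x, k * f, fs)
                    for k in range(1, n_harmonics + 1) if k * f < fs / 2)
        if score > best_score:
            best_f, best_score = f, score
        f += step
    return best_f


# Synthetic "voiced" frame: 200 Hz fundamental plus two weaker harmonics.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 600 * n / fs) for n in range(400)]
print(harmonic_sum_pitch(frame, fs, step=2.0))  # close to 200 Hz
```

Fixing the number of harmonics per candidate keeps subharmonic candidates (e.g. 100 Hz) from winning simply by collecting more spectral peaks.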

3.
李晔  樊燕红  郝秋赟  郭强 《电声技术》(Audio Engineering) 2010,34(12):51-53
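Based on the enhanced mixed-excitation linear prediction (MELP) model, a high-quality 300 bit/s vocoder algorithm is proposed. Only a few parameters are extracted from each speech frame; to improve quantization efficiency, every 8 speech frames are grouped into a superframe, and the superframe parameters are vector-quantized. The algorithm estimates the bandpass voicing parameters by mode-transition-based codebook mapping, improving their quantization accuracy. The sizes of the pitch quantization codebooks under different bandpass voicing modes are jointly optimized to improve quantization efficiency. In addition, the line spectral frequency (LSF) parameters are quantized with multistage vector quantization with inter-stage prediction to reduce spectral distortion. Subjective listening tests show that the vocoder has high intelligibility and a degree of naturalness, with a Diagnostic Rhyme Test (DRT) score of 84.2%.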
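Multistage vector quantization, mentioned above for the LSF parameters, can be sketched as a chain of residual quantizers: each stage picks the codeword nearest to what the earlier stages left unexplained, and the decoder sums the chosen codewords. This minimal version omits the inter-stage prediction and the superframe packing described in the abstract; the two-dimensional toy codebooks are invented for illustration.

```python
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]


def nearest_index(codebook: Codebook, v: Sequence[float]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def msvq_encode(v: Sequence[float], stages: Sequence[Codebook]) -> List[int]:
    """Multistage VQ: each stage quantizes the residual left by the previous stages."""
    residual = list(v)
    indices = []
    for cb in stages:
        i = nearest_index(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices


def msvq_decode(indices: Sequence[int], stages: Sequence[Codebook]) -> List[float]:
    """Reconstruct the vector by summing the selected codeword of every stage."""
    out = [0.0] * len(stages[0][0])
    for i, cb in zip(indices, stages):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


stages = [
    [[0.0, 0.0], [1.0, 1.0]],    # coarse stage (toy values)
    [[0.0, 0.0], [0.1, -0.1]],   # refinement stage (toy values)
]
idx = msvq_encode([1.1, 0.9], stages)
print(idx, msvq_decode(idx, stages))
```

Splitting one large codebook into stages keeps the search cost proportional to the sum, rather than the product, of the stage codebook sizes.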

4.
In this work, six voiced/unvoiced speech classifiers based on the autocorrelation function (ACF), the average magnitude difference function (AMDF), the cepstrum, the weighted ACF (WACF), the zero-crossing rate and energy of the signal (ZCR-E), and neural networks (NNs) have been simulated and implemented in real time on the TMS320C6713 DSP starter kit. These classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system, and their performance has been compared in terms of voiced/unvoiced classification accuracy (in percent), speech quality, and computation time. In both classification accuracy and speech quality, the NN-based classifier outperforms the ACF-, AMDF-, cepstrum-, WACF- and ZCR-E-based classifiers in both clean and noisy environments. In computation time, the AMDF-based classifier is the simplest and therefore the fastest of the six, while the NN-based classifier is the slowest.
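Of the six classifiers compared, ZCR-E is the easiest to state in a few lines: voiced frames tend to have high energy and a low zero-crossing rate, unvoiced frames a high zero-crossing rate, and silence low energy. The sketch below encodes that rule; the thresholds (`energy_thresh`, `zcr_thresh`) are hypothetical and would need tuning on real data, and the paper's exact decision logic is not reproduced.

```python
import math
import random
from typing import Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float], energy_thresh: float = 0.01,
                   zcr_thresh: float = 0.25) -> str:
    """Rule-based ZCR-E decision: 'silence', 'unvoiced' or 'voiced' (illustrative thresholds)."""
    if short_time_energy(frame) < energy_thresh:
        return "silence"
    return "voiced" if zero_crossing_rate(frame) < zcr_thresh else "unvoiced"


fs = 8000
vowel_like = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(256)]
rng = random.Random(0)
fricative_like = [rng.uniform(-0.5, 0.5) for _ in range(256)]
print(classify_frame(vowel_like), classify_frame(fricative_like),
      classify_frame([0.0] * 256))
# → voiced unvoiced silence
```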

5.
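To address the tendency of conventional algorithms to make voiced/unvoiced misclassifications and pitch estimation errors in real environments (with different noise types and signal-to-noise ratios), this paper proposes a voiced/unvoiced classification and pitch estimation method based on the Pitch Estimation Filter with Amplitude Compression (PEFAC). First, PEFAC attenuates low-frequency noise in the speech and extracts the pitch harmonics; next, a pulse-train weighting algorithm based on the symmetric average magnitude sum function (SIM) determines the number of harmonics; finally, the pitch is estimated by dynamic programming, and voiced/unvoiced classification is performed by a Gaussian mixture model on a three-element feature vector. Simulation results show that in real environments the proposed method effectively suppresses voiced/unvoiced misclassification and pitch estimation errors and outperforms conventional methods.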
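The dynamic-programming step can be illustrated with a generic Viterbi search over per-frame pitch candidates, trading each candidate's local score against a penalty on log-frequency jumps between frames. The cost function, the `jump_weight` parameter and the candidate format below are assumptions for illustration, not the cost used in the PEFAC paper.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (f0 in Hz, local score; higher is better)


def dp_pitch_track(frames: Sequence[Sequence[Candidate]],
                   jump_weight: float = 1.0) -> List[float]:
    """Viterbi search minimizing sum(-score) + jump_weight * sum(|log f_t - log f_{t-1}|)."""
    cost = [-s for _, s in frames[0]]
    backptrs: List[List[int]] = []
    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_cost, ptrs = [], []
        for f, s in frames[t]:
            # Best predecessor for this candidate, including the jump penalty.
            j = min(range(len(prev)),
                    key=lambda i: cost[i] + jump_weight * abs(math.log(f / prev[i][0])))
            new_cost.append(cost[j] + jump_weight * abs(math.log(f / prev[j][0])) - s)
            ptrs.append(j)
        cost = new_cost
        backptrs.append(ptrs)
    # Backtrack from the cheapest candidate in the last frame.
    i = min(range(len(cost)), key=cost.__getitem__)
    path = [i]
    for ptrs in reversed(backptrs):
        i = ptrs[i]
        path.append(i)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]


frames = [
    [(200.0, 1.0), (100.0, 0.9)],
    [(400.0, 1.0), (202.0, 0.95)],  # the per-frame best here is an octave error
    [(204.0, 1.0), (102.0, 0.5)],
]
print(dp_pitch_track(frames))  # → [200.0, 202.0, 204.0]
```

Picking the best candidate independently in each frame would return 400 Hz for the middle frame; the jump penalty lets the smoother path win.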

6.
Unvoiced/voiced classification of speech is a challenging problem, especially at low signal-to-noise ratios or in non-white, non-stationary noise. To solve this problem, an algorithm for speech classification and a technique for estimating the magnitude-frequency pairs of voiced speech are proposed. The algorithm uses the third-order spectrum of the speech signal to remove noise, and uses the minimum spectral difference to obtain a refined pitch estimate and the maximum harmonic number. It also uses the spectral envelope to estimate the signal-to-noise ratio of the speech harmonics. From these, the speech class, the voicing probability, and the harmonic parameters of voiced frames are obtained. Simulation results indicate that, under complicated background noise, especially Gaussian noise, the proposed algorithm classifies speech accurately and estimates the voicing probability and voiced parameters with high accuracy.

7.
Combined use of the different laryngeal frequency information furnished simultaneously by the short-time Fourier transform and by the instantaneous frequency distribution function leads to higher precision in frequency estimation. Two parameters are defined: the spectral dissymmetry coefficient and the "inter-correlation" function. These parameters give a clear insight into the nature of the analysed speech windows, leading to exact detection of the voicing feature. Exploiting their sensitivity to the nature of the speech windows enables voiced/mixed/unvoiced decisions and also the characterisation of the mixed source.

8.
李碧洲  姚峰英  张敏 《电子学报》(Acta Electronica Sinica) 1999,27(5):136-138
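The vocoder proposed in this paper classifies speech into four categories: silence, unvoiced, voiced, and mixed. Adaptive methods are used for the subband voiced/unvoiced decisions and the speech/silence decisions, which improves the stability, accuracy, and flexibility of the classification algorithm, preserves the quality of mixed speech, and removes the need to encode the voiced/unvoiced decisions. The spectra of unvoiced and voiced speech are encoded with different LSP quantization tables, so a scalar quantizer can replace the sub-vector quantizer, reducing complexity. The vocoder's bit rate is at most 2.4 kbps and at least 100 bps, with an average of 1.4 kbps. Real-time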

9.
Following a brief portrayal of activities in 2.4-kbps speech coding, a wavelet-based pitch detector is introduced that reduces the complexity of conventional autocorrelation-based pitch detectors while ensuring a smooth evolution of the pitch trajectory. This scheme is incorporated into a waveform-interpolated codec that uses voiced/unvoiced (V/U) classification and, instead of simple Dirac pulses, employs an unconventional zinc basis function excitation to model the voiced excitation. The required zinc-function parameters are determined in an analysis-by-synthesis loop; for smooth waveform evolution and reduced complexity, a focused search strategy and a few further suboptimal restrictions are imposed without seriously affecting speech quality. This baseline codec operates at 1.9 kbps but suffers from slight buzziness during periods of excessive voicing. This impediment is then mitigated by a mixed V/U multiband excitation, which slightly increases the bit rate to 2.35 kbps owing to the transmission of a 3-bit voicing strength code in each of the three excitation bands.

10.
The perceptual quality of VoIP conversations depends strongly on the pattern of packet losses, i.e., the distribution and duration of packet loss runs: the longer the inter-loss gaps and the shorter the loss gaps, the smaller the quality degradation. Moreover, speech sequences impaired with an identical packet loss pattern show different degrees of perceptual degradation, because dropped voice packets have unequal impact on perceived quality. We therefore consider the voicing feature of the speech contained in lost packets, in addition to the packet loss pattern, to estimate speech quality scores, distinguishing between voiced, unvoiced, and silence packets. This achieves better correlation and accuracy between human subjective scores and machine-calculated objective scores.

11.
A complete algorithm for a 1200-bit/s digital formant vocoder system is described. The vocoder algorithm draws heavily on recent research in linear predictive coding. The transmitted parameters are the frequencies and amplitudes of the first three formants, the pitch period, the voiced/unvoiced decision, and the gain. Formant bandwidths are estimated at the synthesizer from the amplitude information. The synthesizer has a parallel structure. The synthetic speech quality at 1200 bits/s is reasonably good; most of the speech is intelligible and speaker-recognizable.

12.
胡瑛  陈宁 《电声技术》(Audio Engineering) 2006,(11):63-66
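A robust pitch period detection method based on the wavelet transform is proposed. First, voiced/unvoiced decisions are made by combining two features, the distribution of average energy across frequency bands and the short-time zero-crossing rate; the pitch period of voiced segments is then extracted with a spatial-domain correlation function. Experiments show that, compared with conventional wavelet-transform and autocorrelation algorithms, the method is more robust and detects pitch more accurately.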

13.
An Improved Autocorrelation Pitch Detection Algorithm   (Cited: 3; self-citations: 0; by others: 3)
胡瑛  陈宁  夏旭 《电子科技》(Electronic Science and Technology) 2007,(2):25-28
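An improved ACF pitch detection algorithm is proposed. Before detection, voiced/unvoiced decisions are made in the wavelet domain using the Teager energy operator, and effective preprocessing and postprocessing are added at the front and back ends of the pitch detection process. Experimental results show that the algorithm is more accurate than the conventional autocorrelation algorithm and gives satisfactory pitch period extraction and voiced/unvoiced decisions at low signal-to-noise ratios.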
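The Teager energy operator used above for the voiced/unvoiced decision has a simple discrete form, psi[n] = x[n]^2 - x[n-1] x[n+1]. The sketch below applies it directly to time-domain samples; the paper applies it in the wavelet domain, which is not reproduced here, and the test tone parameters are arbitrary.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values (the first and last samples lack a neighbour).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For x[n] = A*cos(omega*n + phi), psi[n] = A^2 * sin(omega)^2 exactly, so the
# operator responds to both the amplitude and the frequency of a resonance.
fs, f0, amp = 8000, 200.0, 0.8
omega = 2 * math.pi * f0 / fs
x = [amp * math.cos(omega * n + 0.3) for n in range(64)]
psi = teager_energy(x)
print(min(psi), max(psi), (amp * math.sin(omega)) ** 2)  # all three nearly equal
```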

14.
A Pitch Detection Algorithm Based on the Normalized Cross-Correlation Function   (Cited: 34; self-citations: 2; by others: 32)
鲍长春  樊昌信 《通信学报》(Journal on Communications) 1998,19(10):27-31
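This paper proposes a normalized cross-correlation function pitch detection algorithm (NCCFPDA, Normalized Cross-Correlation Function Pitch Detection Algorithm) that adds effective preprocessing and postprocessing at the front and back ends of the main pitch detection process. Experimental analysis shows that, in typical noise environments, the method gives satisfactory pitch period extraction and voiced/unvoiced decisions.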
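The NCCF at lag k compares a window of N samples with the same window shifted by k, normalized by both windows' energies so that values near 1 indicate strong periodicity. Below is a sketch of a plain NCCF pitch picker; the window length, lag range and the "shortest lag within `tol` of the peak" rule are illustrative assumptions, and the paper's pre- and postprocessing are not reproduced.

```python
import math
from typing import Sequence, Tuple


def nccf(x: Sequence[float], n_win: int, lag: int) -> float:
    """Normalized cross-correlation between x[0:n_win] and x[lag:lag+n_win]."""
    num = sum(x[n] * x[n + lag] for n in range(n_win))
    e0 = sum(v * v for v in x[:n_win])
    ek = sum(v * v for v in x[lag:lag + n_win])
    return num / math.sqrt(e0 * ek) if e0 > 0 and ek > 0 else 0.0


def nccf_pitch(x: Sequence[float], fs: float, n_win: int, f_min: float = 60.0,
               f_max: float = 400.0, tol: float = 0.01) -> Tuple[int, float]:
    """Return (pitch lag in samples, its NCCF value).

    Among lags whose NCCF is within `tol` of the global peak, the shortest is
    chosen: a simple guard against picking a multiple of the true period.
    """
    k_min, k_max = int(fs / f_max), int(fs / f_min)
    if len(x) < n_win + k_max:
        raise ValueError("frame too short for the requested lag range")
    scores = {k: nccf(x, n_win, k) for k in range(k_min, k_max + 1)}
    peak = max(scores.values())
    best = min(k for k, s in scores.items() if s >= peak - tol)
    return best, scores[best]


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
     + 0.25 * math.sin(2 * math.pi * 600 * n / fs) for n in range(400)]
lag, value = nccf_pitch(x, fs, n_win=240)
print(lag, fs / lag, round(value, 3))
```

Because the normalization removes the energy difference between the two windows, the NCCF stays comparable across loud and quiet frames, which is what makes a fixed voicing threshold on its peak value plausible.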

15.
An Improved Autocorrelation Function Pitch Detection Algorithm   (Cited: 3; self-citations: 0; by others: 3)
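An improved autocorrelation function pitch detection algorithm is proposed. First, voiced/unvoiced decisions are made from the differing behaviour of the autocorrelation amplitudes of voiced and unvoiced speech, and the pitch period is then detected only in voiced segments. Before pitch detection, bandpass filtering, center clipping, and numerical filtering are applied as preprocessing to remove the influence of formants and high-frequency noise; after pitch detection, a search-and-smoothing postprocessing step removes half-pitch and double-pitch points and random error points. Experimental results show that the improved algorithm outperforms the conventional autocorrelation algorithm and still gives good voiced/unvoiced decisions and pitch detection at signal-to-noise ratios as low as 5 dB.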
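Two of the steps named above, center clipping before the autocorrelation and smoothing of the pitch track afterwards, can be sketched as follows. The 30% clipping level, the 3-frame running median (one simple way to remove isolated halving/doubling errors) and the function names are assumptions; the bandpass and numerical filtering stages are omitted.

```python
import math
from statistics import median
from typing import List, Sequence


def center_clip(frame: Sequence[float], clip_ratio: float = 0.3) -> List[float]:
    """Zero out samples below clip_ratio * max|x| and shift the rest toward zero."""
    c = clip_ratio * max(abs(v) for v in frame)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in frame]


def acf_pitch(frame: Sequence[float], fs: float, f_min: float = 60.0,
              f_max: float = 400.0, clip_ratio: float = 0.3) -> float:
    """Autocorrelation pitch estimate (Hz) of a center-clipped frame."""
    y = center_clip(frame, clip_ratio)
    k_min, k_max = int(fs / f_max), min(int(fs / f_min), len(y) - 1)

    def r(k: int) -> float:
        # Biased (un-normalized) autocorrelation at lag k.
        return sum(y[n] * y[n + k] for n in range(len(y) - k))

    return fs / max(range(k_min, k_max + 1), key=r)


def median_smooth(track: Sequence[float], width: int = 3) -> List[float]:
    """Running median over a frame-level pitch track; edge frames are kept as-is."""
    half = width // 2
    out = list(track)
    for i in range(half, len(track) - half):
        out[i] = median(track[i - half:i + half + 1])
    return out


fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 600 * n / fs) for n in range(400)]
print(acf_pitch(frame, fs))
print(median_smooth([200.0, 200.0, 400.0, 200.0, 100.0, 200.0, 200.0]))
```

Center clipping flattens the low-amplitude detail that formants contribute between pitch pulses, so the autocorrelation peak at the true period stands out more clearly.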

16.
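This paper proposes a high-quality variable-rate speech coding algorithm operating at 0.75-5.4 kb/s. The algorithm improves the CELP excitation: speech is divided into four classes according to its characteristics, and each class uses a different excitation codebook. In particular, for voiced speech a pitch-synchronous embedded split excitation codebook is proposed; by exploiting the quasi-periodicity of voiced speech, it lets the algorithm reconstruct voiced signals well at very low rates, overcoming the poor synthetic speech quality of CELP below 4 kb/s caused by small codebook sizes. Informal listening tests show that its subjective quality exceeds that of the 1-8 kb/s variable-rate QCELP system, while its average rate of about 2 kb/s is much lower than QCELP's 5 kb/s average, making it well suited to CDMA mobile communication systems.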

17.
A Waveform Interpolation Speech Coding Algorithm Based on the Discrete Cosine Transform   (Cited: 2; self-citations: 0; by others: 2)
刘靖宇  鲍长春  李如玮 《电子学报》(Acta Electronica Sinica) 2009,37(7):1599-1605
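To address characteristic waveform decomposition in waveform interpolation (WI) speech coding, this paper first proposes a characteristic waveform decomposition method based on the discrete cosine transform (DCT), which avoids the complex characteristic waveform alignment computation. Second, to address phase reconstruction in WI, it proposes voiced/unvoiced phase decision and voiced phase classification methods that improve the quality of the reconstructed speech. Finally, DCT-WI vocoders at 2.0 kbps and 1.6 kbps are built. Subjective MOS scores show that the 2.0 kbps DCT-WI vocoder outperforms the 2.4 kbps MELP vocoder, and the 1.6 kbps DCT-WI vocoder also achieves good perceptual quality.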

18.
A New Speech Synthesis Method Based on the MBE Algorithm   (Cited: 1; self-citations: 0; by others: 1)
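The key to the MBE speech coding algorithm is the continuity of the synthesized speech. Building on a study of the MBE speech coding model and methods, this paper proposes a new algorithm that uses a slowly frequency-varying sinusoid to generate a narrowband signal with an equal-power spectrum as the excitation for unvoiced sounds, and uses rise/decay factors to unify the synthesis of unvoiced and voiced sounds in the time domain, further ensuring the continuity of the synthesized speech. Informal listening to 4.8 kbps computer simulation results shows that the new algorithm gives more natural sound quality than conventional MBE.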

19.
In speech processing, estimating the pitch period of speech is important. Real-time pitch detection is possible only with an efficient algorithm suitable for implementation on a programmable processor or in special-purpose hardware. The periodogram algorithm (p.a.) is proposed for detecting the pitch period of voiced speech. This algorithm is attractive for the following reasons: (a) it requires no multiplications; (b) when implemented on a 16-bit computer (e.g. a microprocessor), the computation can be done in integer arithmetic without exceeding the microprocessor's dynamic range; (c) it is a simple technique for estimating the pitch period with reasonable accuracy. Results of analysing speech signals and sinusoids with the periodogram algorithm are presented and compared with the average magnitude difference function (a.m.d.f.), an alternative method for estimating the pitch period of voiced speech.
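For comparison with the periodogram algorithm, the AMDF replaces the products of the autocorrelation with absolute differences, so its inner loop needs only subtractions, absolute values and additions. The sketch below picks the lag of the AMDF minimum; the lag range and the tie-breaking tolerance (which prefers the shortest of several near-minimal lags, i.e. avoids period multiples) are assumptions. The periodogram algorithm itself is not described in enough detail here to reproduce.

```python
import math
from typing import Sequence


def amdf(x: Sequence[float], lag: int) -> float:
    """Average magnitude difference: mean of |x[n] - x[n + lag]| (no products in the sum)."""
    n_terms = len(x) - lag
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms)) / n_terms


def amdf_pitch(x: Sequence[float], fs: float, f_min: float = 60.0,
               f_max: float = 400.0, tol: float = 0.05) -> float:
    """Pitch (Hz) at the shortest lag whose AMDF lies within tol * max of the global minimum."""
    k_min, k_max = int(fs / f_max), min(int(fs / f_min), len(x) - 1)
    d = {k: amdf(x, k) for k in range(k_min, k_max + 1)}
    d_min, d_max = min(d.values()), max(d.values())
    best = min(k for k, v in d.items() if v <= d_min + tol * d_max)
    return fs / best


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
     + 0.25 * math.sin(2 * math.pi * 600 * n / fs) for n in range(400)]
print(amdf_pitch(x, fs))
```

Without the tolerance rule, lags 40, 80 and 120 all give an AMDF of essentially zero for this exactly periodic test frame, and rounding alone would decide between them.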

20.
In this article, we concentrate on spectral estimation techniques that are useful for extracting the features used by automatic speech recognition (ASR) systems. As an aid to understanding the spectral estimation process for speech signals, we adopt the source-filter model of speech production as presented in X. Huang et al. (2001), in which speech is divided into two broad classes: voiced and unvoiced. Voiced speech is quasi-periodic, consisting of a fundamental frequency corresponding to the speaker's pitch, together with its harmonics. Unvoiced speech is stochastic in nature and is best modeled as white noise convolved with an infinite impulse response filter.
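The source-filter model described above can be sketched by driving the same all-pole (IIR) filter with two excitations: a periodic impulse train for voiced speech and white Gaussian noise for unvoiced speech. The single 500 Hz resonance and the 100 Hz pitch are arbitrary illustrative choices, not values from the article.

```python
import math
import random
from typing import List, Sequence


def all_pole_filter(excitation: Sequence[float], a: Sequence[float],
                    gain: float = 1.0) -> List[float]:
    """IIR synthesis filter H(z) = gain / (1 + a1 z^-1 + ... + ap z^-p)."""
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def impulse_train(n: int, period: int) -> List[float]:
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def white_noise(n: int, seed: int = 0) -> List[float]:
    """Unvoiced excitation: zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# One resonance at 500 Hz with pole radius 0.95: an illustrative stand-in
# for a vocal-tract filter.
fs, f_res, radius = 8000, 500.0, 0.95
a = [-2 * radius * math.cos(2 * math.pi * f_res / fs), radius ** 2]
voiced = all_pole_filter(impulse_train(800, period=80), a)  # 100 Hz pitch
unvoiced = all_pole_filter(white_noise(800), a)
```

Keeping the filter fixed and swapping only the excitation mirrors the model's separation of the glottal source from the vocal-tract shaping.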

