Similar literature
1.
李碧洲  姚峰英  张敏 《电子学报》1999,27(5):136-138
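The vocoder proposed in this paper classifies speech into four classes: silence, unvoiced, voiced, and mixed. Adaptive methods are used for the band-wise voiced/unvoiced decision and the speech/silence decision, which improves the stability, accuracy, and flexibility of the classification algorithm, preserves the quality of mixed speech, and removes the need to encode the voiced/unvoiced decisions. The spectra of unvoiced and voiced speech are encoded with separate LSP quantization tables, so that scalar quantizers can replace sub-vector quantizers, reducing complexity. The vocoder's bit rate is at most 2.4 kbps and at least 100 bps, with an average of 1.4 kbps. Real-time...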

2.
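Conventional algorithms are prone to voiced/unvoiced misclassification and pitch estimation errors in real environments (different noise types and signal-to-noise ratios). To address this, the paper proposes a voiced/unvoiced classification and pitch estimation method based on the pitch estimation filter with amplitude compression (PEFAC). First, PEFAC attenuates the low-frequency noise in the speech and extracts the pitch harmonics. Next, an impulse-train weighting algorithm (SIM) based on the symmetric average magnitude sum function determines the number of harmonics. Finally, dynamic programming is used to estimate the pitch, and a Gaussian mixture model over a 3-element feature vector classifies frames as voiced or unvoiced. Simulation results show that, in real environments, the proposed method effectively suppresses voiced/unvoiced misclassification and pitch estimation errors and outperforms conventional methods.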

3.
An improved autocorrelation pitch detection algorithm   Total citations: 3, self-citations: 0, citations by others: 3
胡瑛  陈宁  夏旭 《电子科技》2007,(2):25-28
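An improved ACF pitch detection algorithm is proposed. Before detection, the Teager energy operator is applied in the wavelet domain to make the voiced/unvoiced decision on the speech signal, and effective pre-processing and post-processing steps are added at the front and back ends of the pitch detection process. Experimental results show that the algorithm is more accurate than the conventional autocorrelation algorithm, and that pitch period extraction and the voiced/unvoiced decision remain satisfactory at low signal-to-noise ratios.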
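For orientation, the conventional autocorrelation baseline that this abstract compares against can be sketched as follows. This is a minimal illustration, not the authors' improved algorithm: the function name `acf_pitch`, the 60-400 Hz search range, and the synthetic test tone are all assumptions, and the wavelet-domain Teager energy step and the pre/post-processing are not modeled.

```python
import math

def acf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its autocorrelation.

    fmin/fmax are assumed search bounds, not values taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    lag_min = int(fs / fmax)               # shortest period considered
    lag_max = min(int(fs / fmin), n - 1)   # longest period considered
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Synthetic 200 Hz tone sampled at 8 kHz (period of exactly 40 samples)
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
print(acf_pitch(frame, fs))
```

On real speech a plain peak search like this tends to produce octave errors and fails on unvoiced frames, which is the kind of weakness that a prior voiced/unvoiced decision and post-processing are meant to reduce.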

4.
The authors describe an integrated speech feature extraction method consisting of: (1) a pitch detector; (2) a voicing decision to correctly partition speech into voiced and unvoiced intervals; (3) a confidence measure which reflects the probabilistic accuracy of the voicing decision; (4) a confidence measure which reflects the expected deviation of the pitch estimate from the true pitch and the probabilistic accuracy of this deviation; and (5) smoothing techniques for the pitch detector, the voicing decision, and the two confidence measures. The focus of their research is on voiced and unvoiced speech corrupted by high levels of white noise. The voicing decision and the confidence measures are developed by observing the behavior of three features derived from the autocorrelation function and experimentally fitting curves to the data. This integrated set of algorithms is statistically analyzed for speech at seven signal-to-noise ratios.

5.
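This paper proposes a high-quality variable-rate speech coding algorithm with bit rates of 0.75-5.4 kb/s. The algorithm improves the CELP excitation: speech is divided into four classes according to its characteristics, and each class uses a different excitation codebook. For voiced speech in particular, a pitch-synchronous embedded split excitation codebook is proposed. This codebook exploits the quasi-periodicity of voiced speech, so the algorithm can reconstruct voiced signals well even at very low bit rates, overcoming the poor synthesized speech quality that CELP suffers below 4 kb/s because of its small codebook size. In informal listening tests, its subjective quality exceeded that of the 1-8 kb/s variable-rate QCELP system, while its average rate of about 2 kb/s is well below QCELP's 5 kb/s average, making it well suited to CDMA mobile communication systems.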

6.
齐峰岩  鲍长春 《电子学报》2006,34(4):605-611
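This paper applies the support vector machine (SVM) method to unvoiced/voiced/silence detection in speech signals, and proposes and validates a new classification algorithm that effectively separates speech into unvoiced, voiced, and silence classes across a range of signal-to-noise ratio (SNR) levels. First, at high SNR, the four difference parameters from the G.729B VAD are used as input features to the SVM classifier in comparative silence-classification experiments. The results are better than those of the G.729B VAD and of a conventional BP neural network, which shows that this machine learning method is feasible for speech classification; the performance of the SVM under different kernel functions is also analyzed and discussed. Second, for low-SNR conditions, the paper discusses how to build a statistical model of noisy speech and extract noise-robust statistical features, and gives an estimation method that adapts to time-varying background noise. Finally, comparative experiments in different noise environments verify that the proposed algorithm classifies better than other conventional algorithms at medium and low SNR.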

7.
胡瑛  陈宁 《电声技术》2006,(11):63-66
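A robust pitch period detection method based on the wavelet transform is proposed. First, two features, the average energy distribution across frequency bands and the short-time zero-crossing rate, are combined to make the voiced/unvoiced decision on the speech signal; the pitch period is then extracted from voiced segments using the spatial correlation function. Experiments show that, compared with the conventional wavelet transform and autocorrelation algorithms, the method is robust and detects pitch more accurately.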
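A voiced/unvoiced decision driven by energy and zero-crossing rate can be sketched as below. This is a simplified illustration under stated assumptions: it uses full-band short-time energy rather than the paper's average energy distribution across wavelet sub-bands, and the function names and both thresholds (`energy_floor`, `zcr_threshold`) are hypothetical values chosen for the example.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label a frame as 'silence', 'voiced', or 'unvoiced'.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if short_time_energy(frame) < energy_floor:
        return "silence"
    # Voiced speech is dominated by low frequencies, so it crosses zero rarely;
    # noise-like unvoiced speech crosses zero often.
    return "unvoiced" if zero_crossing_rate(frame) > zcr_threshold else "voiced"

fs = 8000
rng = random.Random(0)
voiced_like = [math.sin(2 * math.pi * 150 * t / fs) for t in range(240)]
unvoiced_like = [rng.uniform(-0.3, 0.3) for _ in range(240)]
silent = [0.0] * 240
print(classify_frame(voiced_like), classify_frame(unvoiced_like), classify_frame(silent))
```

Fixed thresholds like these are fragile in noise; a practical system would normally adapt them to the recording conditions.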

8.
Sandeep Kumar 《ETRI Journal》2016,38(3):425-434
A novel average magnitude difference function (AMDF)‐based pitch detection scheme (PDS) is proposed to achieve better performance in speech quality. A performance evaluation of the proposed PDS is carried out through both a simulation and a real‐time implementation of a speech analysis‐synthesis system. The proposed PDS is compared with PDSs based on the cepstrum, autocorrelation function (ACF), AMDF, and circular AMDF (CAMDF) methods using the following criteria: percentage gross pitch error (%GPE); a subjective listening test; an objective speech quality assessment; a speech intelligibility test; a synthesized speech waveform; computation time; and memory consumption. The proposed PDS results in lower %GPE and better synthesized speech quality and intelligibility for different speech signals as compared to the cepstrum‐, ACF‐, AMDF‐, and CAMDF‐based PDSs. The computational time of the proposed PDS is also less than that for the cepstrum‐, ACF‐, and CAMDF‐based PDSs. Moreover, the total memory consumed by the proposed PDS is less than that for the ACF‐ and cepstrum‐based PDSs.
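As background for this entry, the basic AMDF that the proposed scheme builds on can be sketched as follows. This shows only the textbook AMDF with a plain minimum search, not the paper's novel scheme; the function names, the 60-400 Hz search range, and the 210 Hz test tone are assumptions made for the example.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) as fs divided by the lag that minimizes the AMDF.

    fmin/fmax are assumed search bounds. A practical detector also needs
    protection against picking a multiple of the true period.
    """
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return fs / best_lag

# Synthetic 210 Hz tone at 8 kHz: the true period is about 38.1 samples,
# so the estimate is quantized to the nearest integer lag.
fs = 8000
frame = [math.sin(2 * math.pi * 210 * t / fs) for t in range(400)]
print(amdf_pitch(frame, fs))
```

The AMDF needs only subtractions and absolute values, which is why it is often preferred over the autocorrelation function on hardware where multiplications are expensive.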

9.
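Speech emotion recognition based on a voting combination of multiple classifiers   Total citations: 2, self-citations: 0, citations by others: 2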
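To improve the recognition accuracy of speech emotion, a new speech emotion recognition method based on a voting combination of multiple classifiers is proposed. After prosodic and voice-quality features are extracted from emotional speech, a voting scheme combines three classifiers (support vector machine, K-nearest neighbor, and artificial neural network) into an ensemble classifier that recognizes four main emotion types in Chinese speech: anger, happiness, sadness, and surprise. Experimental results show that the ensemble classifier achieves an average recognition accuracy of 87.4%, outperforming any single classifier.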
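The voting step can be illustrated with a plain majority vote over the labels returned by the individual classifiers. The abstract does not say which voting rule or tie-breaking policy the authors use, so the sketch below is an assumption: `majority_vote` is a hypothetical helper, and ties go to the label that appears first in the input.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions.

    On a tie, the label encountered first wins (an assumed policy).
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of the SVM, KNN, and ANN classifiers for one utterance
print(majority_vote(["anger", "happiness", "anger"]))
```

With three classifiers and four emotion classes, a three-way disagreement is possible, so the tie-breaking rule matters in practice; ordering the inputs by classifier reliability is one simple option.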

10.
陈红红  刘加 《电声技术》2011,35(10):47-50
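This paper studies an important pre-processing task in audio information processing: speech/music classification. To address practical problems encountered in speech signal processing, suitable audio features and classifiers are selected to classify audio data as speech or music. A two-stage system is adopted, using the Modified Low Energy Ratio (MLER) and Mel Frequency Cepstral Coef...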

11.
A new speech synthesis method based on the MBE algorithm   Total citations: 1, self-citations: 0, citations by others: 1
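The key to the MBE speech coding and decoding algorithm is the continuity of the synthesized speech. Building on a study of the MBE speech codec model and methods, this paper proposes a new algorithm that uses a slowly frequency-varying sinusoid to generate a narrowband signal with a flat power spectrum as the excitation for unvoiced speech, and applies growth/decay factors so that unvoiced and voiced synthesis are unified under a single time-domain method. This further ensures the naturalness and continuity of the synthesized speech. Informal listening to computer simulation results at 4.8 kbps indicates that the new algorithm sounds more natural than conventional MBE.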

12.
In speech processing an estimation of the speech pitch period is important. Real time pitch detection is only possible by the selection of an efficient algorithm suitable for implementation on a programmable processor or in special-purpose hardware. The use of the periodogram algorithm (p.a.) is proposed to detect the pitch period of voiced speech. This algorithm is attractive for the following reasons: (a) it has no multiply operation; (b) when implemented on a 16-bit computer (e.g. microprocessor) the computation can be done in integer arithmetic without exceeding the microprocessor's dynamic range; (c) it is a simple technique for estimating the pitch period with reasonable accuracy. Results of the analysis of speech signals and sinusoids using the periodogram algorithm are presented and comparisons are made with the average magnitude difference function (a.m.d.f.) which is an alternative method of estimating the pitch period of the voiced speech.

13.
董恩清  刘贵忠  周亚同  顿玉洁 《电子学报》2001,29(10):1364-1367
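This paper improves on the problems in the flexible segmentation algorithm proposed by 王永忠 (Wang Yongzhong) et al., provides a comparative analysis, and then applies the improved segmentation algorithm to automatic unvoiced/voiced segmentation of speech signals. Extensive tests on theoretical models and real speech signals confirm that the improved algorithm resolves the problems of both the dyadic segmentation algorithm and Wang Yongzhong's method, achieving effective adaptive segmentation of the signal. Using the unvoiced/voiced recognition criterion proposed by Wesfreid et al., the new segmentation method is applied to automatic unvoiced/voiced segmentation of real speech signals; it not only yields equally good partitions, but also avoids excessive redundant segments in time.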

14.
This paper presents several strategies to improve the performance of very low bit rate speech coders and describes a speech codec that incorporates these strategies and operates at an average bit rate of 1.2 kb/s. The encoding algorithm is based on several improvements in a mixed multiband excitation (MMBE) linear predictive coding (LPC) structure. A switched-predictive vector quantiser technique that outperforms previously reported schemes is adopted to encode the LSF parameters. Spectral and sound specific low rate models are used in order to achieve high quality speech at low rates. An MMBE approach with three sub-bands is employed to encode voiced frames, while fricatives and stops modelling and synthesis techniques are used for unvoiced frames. This strategy is shown to provide good quality synthesised speech, at a bit rate of only 0.4 kb/s for unvoiced frames. To reduce coding noise and improve decoded speech, a spectral envelope restoration combined with noise reduction (SERNR) postfilter is used. The contributions of the techniques described in this paper are separately assessed and then combined in the design of a low bit rate codec that is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder. The performance assessment is carried out in terms of the spectral distortion of LSF quantisation, mean opinion score (MOS), A/B comparison tests and the ITU-T P.862 perceptual evaluation of speech quality (PESQ) standard. Assessment results show that the improved methods for LSF quantisation, sound specific modelling and synthesis and the new postfiltering approach can significantly outperform previously reported techniques. Further results also indicate that a system combining the proposed improvements and operating at 1.2 kb/s is comparable to (and slightly outperforms) a MELP coder operating at 2.4 kb/s. For tandem connection situations, the proposed system is clearly superior to the MELP coder.

15.
The rate of oscillation of the vocal cords known as the pitch is an important sound feature that is useful in many speech applications. A novel approach for the automatic detection and estimation of the rate of oscillation of the vocal cords is described. The importance of this approach stems from the fact that pitch determination is conducted using three independent stages: a segmentation stage; a voiced-unvoiced classification stage; and a pitch estimation stage. Segmentation and the detection of voiced segments are implemented prior to pitch estimation in order to: exclude unvoiced sounds and silence from biasing the result of pitch estimation; employ a simple segmentation procedure with low computational complexity and time-delay; enhance the accuracy of voiced-unvoiced classification by including additional features in voicing detection; help pitch tracking by testing similarities over successive segments and to make use of a different analysis domain that enables a high resolution pitch estimation. A frequency-domain maximum likelihood procedure is used for the estimation of the pitch frequency of voiced segments by maximizing a log-likelihood function over the range of possible pitch frequencies in conversational speech. An efficient simplified realization of the generalized likelihood ratio segmentation method is also presented. Computer simulations on a number of utterances show that this approach gives an accurate, reliable and robust estimation of the pitch of voiced sounds.

16.
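Spoofed speech detection based on the local binary pattern (LBP) has low accuracy on synthesized speech. To address this, a spoofed speech detection method based on the center-symmetric local binary pattern (CS-LBP) is proposed. The method obtains the spectrogram of the speech signal via the short-time Fourier transform, extracts texture features from the spectrogram with CS-LBP, and trains a random forest classifier on these features to distinguish genuine from spoofed speech. The method takes into account both the values and the positional relationships of spectrogram pixels, captures more complete texture information, and reduces the feature dimension to 16, which helps reduce computation. Experimental results on the ASVspoof 2019 dataset show that, compared with the conventional LBP-based detection method, the proposed method lowers the tandem detection cost function (t-DCF) for synthesized spoofed speech by 16.98% and raises detection speed by 89.73%.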
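The 16-dimensional feature follows from how CS-LBP is usually defined: each pixel compares the 4 center-symmetric pairs among its 8 neighbours, giving a 4-bit code and therefore a 16-bin histogram. The sketch below shows that standard formulation on a small 2-D array standing in for a spectrogram; the pair ordering, the `threshold` parameter, the normalization, and the function name are assumptions, and the authors' exact variant may differ.

```python
def cs_lbp_histogram(image, threshold=0.0):
    """Return the normalized 16-bin CS-LBP histogram of a 2-D list of numbers.

    Each interior pixel gets a 4-bit code: bit k is set when the k-th neighbour
    exceeds its diametrically opposite neighbour by more than `threshold`.
    """
    rows, cols = len(image), len(image[0])
    # Four center-symmetric neighbour pairs of the 3x3 neighbourhood (row, col offsets)
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)), ((-1, 1), (1, -1)), ((0, 1), (0, -1))]
    hist = [0] * 16
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, ((dr1, dc1), (dr2, dc2)) in enumerate(pairs):
                if image[r + dr1][c + dc1] - image[r + dr2][c + dc2] > threshold:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Tiny stand-in for a spectrogram patch (values are arbitrary)
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(cs_lbp_histogram(patch))
```

Classic LBP compares each of the 8 neighbours with the center pixel and yields 256 possible codes, so the center-symmetric variant shrinks the histogram by a factor of 16, consistent with the reduced computation the abstract reports.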

17.
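Waveform interpolation speech coding algorithm based on the discrete cosine transform   Total citations: 2, self-citations: 0, citations by others: 2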
刘靖宇  鲍长春  李如玮 《电子学报》2009,37(7):1599-1605
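To address characteristic waveform decomposition in waveform interpolation (WI) speech coding, this paper first proposes a decomposition method based on the discrete cosine transform (DCT), which avoids the complex characteristic waveform alignment operation. Second, for phase reconstruction in WI, it proposes an unvoiced/voiced phase decision and a voiced phase classification method, which improve the quality of the reconstructed speech. Finally, DCT-WI vocoders are built at 2.0 kbps and 1.6 kbps. Subjective MOS scores show that the 2.0 kbps DCT-WI vocoder outperforms the 2.4 kbps MELP vocoder, and that the 1.6 kbps DCT-WI vocoder also achieves good perceptual quality.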

18.
Acoustical measures of vocal function are routinely used in the assessments of disordered voice, and for monitoring the patient's progress over the course of voice therapy. Typically, acoustic measures are extracted from sustained vowel stimuli where short-term and long-term perturbations in fundamental frequency and intensity, and the level of "glottal noise" are used to characterize the vocal function. However, acoustic measures extracted from continuous speech samples may well be required for accurate prediction of abnormal voice quality that is relevant to the client's "real world" experience. In contrast with sustained vowel research, there is relatively sparse literature on the effectiveness of acoustic measures extracted from continuous speech samples. This is partially due to the challenge of segmenting the speech signal into voiced, unvoiced, and silence periods before features can be extracted for vocal function characterization. In this paper we propose a joint time-frequency approach for classifying pathological voices using continuous speech signals that obviates the need for such segmentation. The speech signals were decomposed using an adaptive time-frequency transform algorithm, and several features such as the octave max, octave mean, energy ratio, length ratio, and frequency ratio were extracted from the decomposition parameters and analyzed using statistical pattern classification techniques. Experiments with a database consisting of continuous speech samples from 51 normal and 161 pathological talkers yielded a classification accuracy of 93.4%.

19.
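Research on a low-bit-rate variable-rate speech coding algorithm based on the local cosine transform   Total citations: 1, self-citations: 0, citations by others: 1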
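This paper proposes applying the local cosine transform (LCT) to speech coding and designs a low-bit-rate variable-rate speech coder with an average bit rate of about 1.6 kbit/s. In the variable-rate coder, an SVM algorithm is used for voice activity detection (VAD). The speech modes of active frames follow the classification used in GSM half-rate coding, except that the strongly voiced and moderately voiced modes are merged into a single moderately-to-strongly voiced mode. The LCT coefficients of each speech mode and of silent frames (background noise) are quantized with split vector quantization, with codebooks designed using the LBG algorithm. Codebook search during encoding uses a fast tree-structured search. Informal subjective listening tests indicate that speech reconstructed by the coder has a MOS of about 3.15, comparable to that of the 2.4 kbit/s U.S. federal vocoder standard MELP, and that the coder is robust and suitable for coding speech in the presence of various environmental noises.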

20.
A new wavelet-domain speech enhancement method   Total citations: 7, self-citations: 0, citations by others: 7
王振力  张雄伟  郑翔  杨剑 《信号处理》2006,22(3):325-328
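A new wavelet-domain speech enhancement method is proposed. A one-level discrete wavelet transform (DWT) is first applied to the noisy speech; the extracted low-frequency and high-frequency signals are then decomposed with a 3-level DWT and a 3-level wavelet packet decomposition (WPD), respectively; finally, the denoised speech is reconstructed. To reduce the loss of unvoiced information during denoising, a voiced/unvoiced decision is made on the speech signal and each class is processed with its own set of thresholds. Computer simulation results show that the unvoiced components of speech enhanced by this method are well preserved, and that both the subjective and objective quality of the enhanced speech are better than with DWT denoising or WPD denoising.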
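The basic building block of such methods, a one-level wavelet transform followed by thresholding of the detail coefficients and reconstruction, can be sketched as below. The abstract does not name the wavelet or the threshold rule, so the Haar wavelet, soft thresholding, the threshold value, and all function names here are assumptions; the 3-level DWT/WPD cascade and the class-dependent thresholds are not modeled.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, t):
    """Threshold only the detail (high-frequency) band, then reconstruct."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, t))

noisy = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
print(denoise(noisy, 0.2))
```

Unvoiced sounds carry much of their energy in the high-frequency band, so a single aggressive threshold on the detail coefficients tends to erase them along with the noise; this is the motivation the abstract gives for choosing thresholds per voiced/unvoiced class.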
