
1.

Xu Jingyun, Zhao Xiaoqun, Wang Qiao, Wang Digang 《电子与信息学报》 (Journal of Electronics & Information Technology) 2016,38(3):586-593

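To address the voicing decision errors and pitch estimation errors that conventional algorithms are prone to in real environments (different noise types and signal-to-noise ratios), this paper proposes a voiced/unvoiced classification and pitch estimation method based on the pitch estimation filter with amplitude compression (PEFAC). First, PEFAC attenuates the low-frequency noise in the speech and extracts the pitch harmonics. Next, an impulse-train weighting algorithm based on the symmetric average magnitude sum function (SIM) determines the number of harmonics. Finally, dynamic programming is used to estimate the pitch, and a Gaussian mixture model over a three-element feature vector classifies frames as voiced or unvoiced. Simulation results show that, in real environments, the proposed method effectively suppresses voicing decision errors and pitch estimation errors and outperforms conventional methods.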

2.

A wavelet-transform-based pitch detection method with voiced/unvoiced classification

Hu Ying, Chen Ning 《电声技术》 (Audio Engineering) 2006,(11):63-66

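A robust pitch period detection method based on the wavelet transform is proposed. The speech signal is first classified as voiced or unvoiced by combining two features, the distribution of average energy across frequency bands and the short-time zero-crossing rate; the pitch period of the voiced segments is then extracted using a spatial correlation function. Experiments show that, compared with conventional wavelet-transform and autocorrelation algorithms, the method is more robust and detects pitch more accurately.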
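
The voiced/unvoiced decision step described above can be illustrated with a minimal sketch. It is not the paper's method: total short-time energy stands in for the band-wise energy distribution, and the function names, thresholds, sampling rate, and synthetic frames are all assumptions made for illustration.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, energy_floor=0.01, zcr_ceiling=0.25):
    """Hypothetical rule: voiced means enough energy and few zero crossings."""
    return (short_time_energy(frame) > energy_floor
            and zero_crossing_rate(frame) < zcr_ceiling)

# Synthetic frames at an assumed 8 kHz sampling rate, 240 samples each.
fs = 8000
voiced_like = [0.5 * math.sin(2 * math.pi * 150 * n / fs) for n in range(240)]
# Sign-alternating samples: a crude high-frequency stand-in for frication noise.
unvoiced_like = [0.2 * (-1) ** n for n in range(240)]

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```

Both synthetic frames pass the energy test, so the decision here rests on the zero-crossing rate; on real speech the two thresholds would have to be tuned to the recording conditions.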

3.

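A high-precision pitch detection algorithm based on DCT subband spectral entropy and signal decomposition  Cited by: 2 (self-citations: 0; citations by others: 2)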

Luo Yafei, Bao Changchun 《电子学报》 (Acta Electronica Sinica) 2007,35(1):13-22

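This paper studies pitch detection for low-rate waveform interpolation (WI) speech coding. To address the voicing decision errors that pitch detection is prone to under different noise types and signal-to-noise ratios, a speech detection algorithm based on DCT subband spectral entropy is introduced at the front end of the pitch detector to separate speech segments from non-speech segments. To supply the pitch detector with input speech that more accurately reflects the actual variation of the pitch period, an improved DCT-domain speech decomposition algorithm is proposed based on the harmonic-plus-noise model. Then, exploiting the peaks shared by a modified MCAMDF (Modified Circular Average Magnitude Difference Function) and the NCCF (Normalized Cross-Correlation Function), and combining the two front-end processing techniques above, a combined MCAMDF-NCCF pitch detection algorithm is proposed. To meet the WI coder's demand for high-precision pitch detection in different environments, and to recover the phase track more accurately at the synthesis end, a high-precision MCAMDF-NCCF-FRAC algorithm that computes fractional pitch is further developed from the MCAMDF-NCCF algorithm. The algorithm was applied to a 2 kb/s WI coder; subjective A/B listening tests show that the proposed pitch detection algorithm markedly suppresses pitch doubling, pitch halving, and voicing decision errors at low signal-to-noise ratios, yields excellent pitch detection results, and delivers synthesized speech quality that fully meets the requirements a low-rate WI coder places on pitch detection.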

4.

A multiband excited waveform-interpolated 2.35-kbps speech codec for bandlimited channels

Brooks F.C.A. Hanzo L. 《Vehicular Technology, IEEE Transactions on》2000,49(3):766-777

Following a brief portrayal of the activities in 2.4-kbps speech coding, a wavelet-based pitch detector is invoked, which reduces the complexity of conventional autocorrelation-based pitch detectors, while ensuring smooth pitch trajectory evolution. This scheme is incorporated in a waveform-interpolated codec, which uses voiced-unvoiced (V/U) classification, and instead of simple Dirac pulses, an unconventional zinc basis function excitation is employed for modeling the voiced excitation. The required zinc-function parameters are determined in an analysis-by-synthesis loop, and for the sake of smooth waveform evolution and reduced complexity, a focused search strategy and a few further suboptimum restrictions are imposed without seriously affecting the speech quality. This baseline codec operates at a rate of 1.9 kbps, but it suffers from slight buzziness during the periods of excessive voicing. This impediment is then mitigated by invoking a mixed V/U multiband excitation, which slightly increases the bit rate to 2.35 kbps due to the transmission of the 3-b voicing strength code in each of the three excitation bands.

5.

A speech endpoint detection algorithm based on the spectrogram

Chen Xiangmin, Zhang Jun, Wei Gang 《电声技术》 (Audio Engineering) 2006,(4):46-49

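Exploiting the distinct features that speech exhibits in the spectrogram, a spectrogram-based speech endpoint detection algorithm is proposed. Voiced segments are first located in the spectrogram matrix using the principle of pitch frequency detection; the signal-to-noise ratio of the voiced segments is then computed, and complete endpoint detection is carried out based on that signal-to-noise ratio and the peak values of the voiced segments in the spectrogram matrix. Because most burst noise either has no stable frequency or falls outside the range of human pitch frequencies, the algorithm suppresses burst-noise interference well. Experimental results show that the algorithm accurately locates speech endpoints at signal-to-noise ratios of 10 dB and above.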

6.

Unvoiced/voiced classification and voiced harmonic parameters estimation using the third-order statistics

YING Na (Communication Engineering College of Hangzhou Dianzi University, Hangzhou, China), ZHAO Xiao-hui, DONG Jing (Communication Engineering College of Jilin University, Changchun, China) 《中国邮电高校学报(英文版)》 (The Journal of China Universities of Posts and Telecommunications) 2007,14(1):85-89

Unvoiced/voiced classification of speech is a challenging problem, especially at low signal-to-noise ratios or in non-white, non-stationary noise. To solve this problem, an algorithm for speech classification and a technique for estimating the pairwise magnitude frequency in voiced speech are proposed. The algorithm uses the third-order spectrum of the speech signal to remove noise, and applies a least-spectrum-difference criterion to obtain the refined pitch and the maximum harmonic number. It also uses the spectral envelope to estimate the signal-to-noise ratio of the speech harmonics. The speech classification, the voicing probability, and the harmonic parameters of the voiced frame can thus be obtained. Simulation results indicate that, under complicated background noise and especially Gaussian noise, the proposed algorithm classifies speech effectively and estimates the voicing probability and the voiced parameters with high accuracy.

7.

High resolution pole-zero analysis of Parkinsonian speech

Yair E. Gath I. 《IEEE transactions on bio-medical engineering》1991,38(2):161-167

High resolution analysis of voiced speech signals in Parkinsonian patients using a pitch synchronous pole-zero model is introduced. A modified estimation error is defined leading to an accurate and consistent determination of the excitation instants of the model. An infinite resolution in determining these instants, despite the finite sampling interval, is achieved by mapping the discrete (digitized) problem into a continuous one. The proposed analysis was found to be useful in analyzing Parkinsonian speech where the goal was to detect and quantify the Parkinsonian tremor and rigidity from sustained voiced sounds.

8.

Vector quantization of pitch information in Mandarin speech

Sin-Horng Chen Yih-Ru Wang 《Communications, IEEE Transactions on》1990,38(9):1317-1320

A method of quantizing the shape of pitch contour segments of Mandarin speech by using orthogonal polynomial representation and vector quantization techniques is proposed. Only a very limited number of representative pitch contour patterns of words can be found in Mandarin conversation; therefore, pitch information can be represented by the shape and the length of the pitch contour segment word by word instead of frame by frame. An average bit rate of 0.78 b/frame (34.67 b/s) for voiced sounds was achieved. The method is a variable-rate coding scheme with an average delay of 317 ms.
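
The two rate figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that 0.78 b/frame and 34.67 b/s describe the same stream of frames; the frame period it derives is an inference from those two numbers, not a value stated in the abstract.

```python
bits_per_frame = 0.78      # reported average for voiced sounds
bits_per_second = 34.67    # reported equivalent rate

# If both figures describe the same frames, their ratio is the frame rate.
frames_per_second = bits_per_second / bits_per_frame
frame_period_ms = 1000.0 / frames_per_second

print(f"{frames_per_second:.2f} frames/s, {frame_period_ms:.1f} ms per frame")
```

Under that assumption the frame period comes out at about 22.5 ms, so the reported 317 ms average delay spans roughly 14 frames.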

9.

Glottal source estimation using a sum-of-exponentials model  Cited by: 1 (self-citations: 0; citations by others: 1)

Krishnamurthy A.K. 《Signal Processing, IEEE Transactions on》1992,40(3):682-686

An algorithm for estimating the glottal source waveform in voiced speech is described. The glottal source waveform is described using the LF model proposed by Fant et al. (1985). The vocal tract filter is modeled as a pole-zero system. The analysis of vowel sounds from several talkers shows that the analysis procedure leads to an accurate estimate of the glottal source.

10.

Estimating the pitch period of voiced speech

Ambikairajah E. Carey M.J. Tattersall G. 《Electronics letters》1980,16(12):464-466

In speech processing an estimation of the speech pitch period is important. Real time pitch detection is only possible by the selection of an efficient algorithm suitable for implementation on a programmable processor or in special-purpose hardware. The use of the periodogram algorithm (p.a.) is proposed to detect the pitch period of voiced speech. This algorithm is attractive for the following reasons: (a) it has no multiply operation; (b) when implemented on a 16-bit computer (e.g. microprocessor) the computation can be done in integer arithmetic without exceeding the microprocessor's dynamic range; (c) it is a simple technique for estimating the pitch period with reasonable accuracy. Results of the analysis of speech signals and sinusoids using the periodogram algorithm are presented and comparisons are made with the average magnitude difference function (a.m.d.f.) which is an alternative method of estimating the pitch period of the voiced speech.
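
The average magnitude difference function used as the comparison baseline above can be sketched in a few lines. This illustrates the a.m.d.f. only, not the periodogram algorithm the letter proposes; the function names, lag range, sampling rate, and integer test tone are assumptions. The division by the number of terms is omitted because it is the same at every lag and does not change where the minimum falls.

```python
import math

def amdf_sum(x, lag, n_terms):
    """Sum of |x[n] - x[n + lag]|; needs only subtraction, abs, and addition."""
    return sum(abs(x[n] - x[n + lag]) for n in range(n_terms))

def amdf_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] at which the AMDF is smallest."""
    n_terms = len(x) - max_lag  # same number of terms at every lag
    if n_terms <= 0:
        raise ValueError("frame too short for max_lag")
    return min(range(min_lag, max_lag + 1),
               key=lambda lag: amdf_sum(x, lag, n_terms))

# Integer test tone: 100 Hz at an assumed 8 kHz sampling rate (80-sample period),
# scaled so every sample fits comfortably in a 16-bit integer.
fs = 8000
frame = [round(10000 * math.sin(2 * math.pi * 100 * n / fs)) for n in range(400)]

period = amdf_pitch_period(frame, min_lag=20, max_lag=150)
print(period, fs / period)
```

The lag range is chosen so that it contains the true period but not its double; on real speech the AMDF also dips at multiples of the period, which is one source of the pitch halving errors discussed elsewhere in this list.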

11.

A new approach to speech segmentation based on the maximum likelihood

Z. M. Šarić S. R. Turajlić 《Circuits, Systems, and Signal Processing》1995,14(5):615-632

Successful speech recognition is highly dependent on appropriate speech segmentation. The poor efficiency of the sequential detection of abrupt changes in the signals with relatively short stationary intervals, as is the case with speech signals, can be improved by the off-line maximum likelihood segmentation algorithm. In this paper a new segmentation algorithm is presented. For an a priori known number of segments, the algorithm determines the signal partition for which the sum of the segment distortions is minimal. The generalized maximum likelihood distortion measure has been introduced, and has proven to be particularly efficient on short signal segments. In the case of an unknown number of segments, its estimate is obtained by comparing the reduction of the distortion. The asymptotic properties of the distortion sequence have been analyzed, which led to the definition of the presented segmentation algorithm. The introduced measure can be applied both to the AR and ARMA models. The segmentation algorithm is verified on test signals as well as on the natural speech signal, for which the pitch synchronous framing scheme is applied. The experimental results also include a comparison of the AR and ARMA model-based segmentations. The first results show that ARMA model-based segmentation gives somewhat better results than the AR model algorithm. Research supported in part by the Mathematical Institute of the Serbian Science Academy and Serbian Science Foundation.

12.

Measuring and modeling vocal source-tract interaction

Childers D.G. Chun-Fan Wong 《IEEE transactions on bio-medical engineering》1994,41(7):663-671

The quality of synthetic speech is affected by two factors: intelligibility and naturalness. At present, synthesized speech may be highly intelligible, but often sounds unnatural. Speech intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and formant transitions, whereas speech naturalness is thought to depend on the excitation waveform characteristics for voiced and unvoiced sounds. Voiced sounds may be generated by a quasiperiodic train of glottal pulses of specified shape exciting the vocal tract filter. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. However, this assumption is often not valid, since it has been observed that appreciable source-tract interaction can occur in natural speech. Previous experiments in speech synthesis have demonstrated that the naturalness of synthetic speech does improve when source-tract interaction is simulated in the synthesis process. The purpose of this paper is two-fold: (1) to present an algorithm for automatically measuring source-tract interaction for voiced speech, and (2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls: (1) the skewness of the glottal pulse, and (2) the amount of the first formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.

13.

Discrimination of pathological voices using a time-frequency approach

Umapathy K Krishnan S Parsa V Jamieson DG 《IEEE transactions on bio-medical engineering》2005,52(3):421-430

Acoustical measures of vocal function are routinely used in the assessments of disordered voice, and for monitoring the patient's progress over the course of voice therapy. Typically, acoustic measures are extracted from sustained vowel stimuli where short-term and long-term perturbations in fundamental frequency and intensity, and the level of "glottal noise" are used to characterize the vocal function. However, acoustic measures extracted from continuous speech samples may well be required for accurate prediction of abnormal voice quality that is relevant to the client's "real world" experience. In contrast with sustained vowel research, there is relatively sparse literature on the effectiveness of acoustic measures extracted from continuous speech samples. This is partially due to the challenge of segmenting the speech signal into voiced, unvoiced, and silence periods before features can be extracted for vocal function characterization. In this paper we propose a joint time-frequency approach for classifying pathological voices using continuous speech signals that obviates the need for such segmentation. The speech signals were decomposed using an adaptive time-frequency transform algorithm, and several features such as the octave max, octave mean, energy ratio, length ratio, and frequency ratio were extracted from the decomposition parameters and analyzed using statistical pattern classification techniques. Experiments with a database consisting of continuous speech samples from 51 normal and 161 pathological talkers yielded a classification accuracy of 93.4%.

14.

A modified pitch detection algorithm

Jianling Hu Sheng Xu Jian Chen 《Communications Letters, IEEE》2001,5(2):64-66

Sinusoidal speech coders have been widely studied for low bit rate coding due to their capability of producing high quality speech. However, the estimation error of the sinusoidal model parameters, pitch in particular, would seriously degrade the speech quality. We provide a modified pitch detection algorithm (MPDA). The experimental results show that the proposed method can provide more accurate and smoother pitch tracking than that of IMBE, and the reconstructed speech sounds more continuous.

15.

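An improved flexible signal segmentation algorithm and automatic unvoiced-voiced segmentation of speech signals  Cited by: 1 (self-citations: 0; citations by others: 1)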

Dong Enqing, Liu Guizhong, Zhou Yatong, Dun Yujie 《电子学报》 (Acta Electronica Sinica) 2001,29(10):1364-1367

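This paper improves on the flexible segmentation algorithm proposed by Wang Yongzhong et al., addressing its shortcomings and providing a comparative analysis, and then applies the improved algorithm to automatic unvoiced-voiced segmentation of speech signals. Extensive tests on theoretical models and real speech signals confirm that the improved algorithm resolves the problems of both the dyadic segmentation algorithm and Wang Yongzhong's method, achieving effective adaptive segmentation of the signal. Using the same unvoiced-voiced identification criterion proposed by Wesfreid et al., the new segmentation method, when applied to automatic unvoiced-voiced segmentation of real speech signals, likewise produces good partitions and avoids excessive redundant segmentation in time.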

16.

Automatic extraction of the vocal melody

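Lu Xiong, Xia Xiuyu, Cai Liang, Sun Wenhui 《太赫兹科学与电子信息学报》 (Journal of Terahertz Science and Electronic Information Technology) 2019,17(3):482-488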

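A vocal melody extraction algorithm based on multi-candidate fundamental frequency (F0) extraction and singing-voice F0 discrimination is proposed. The algorithm effectively lowers the false-alarm rate of melody localization and raises overall accuracy. The music is segmented into notes using a distance measure (DIS) algorithm, and voiced segments are detected by a variance method. Multiple candidate F0s are extracted for each voiced frame by computing pitch salience with a multi-F0 extraction technique based on the pitch estimation filter with amplitude compression (PEFAC). Finally, the Viterbi algorithm tracks the dominant F0 trajectory within each voiced segment, and an F0 discrimination model decides whether it belongs to the sung melody. Experiments on the MIR-1K dataset show that at signal-to-interference ratios of 5 dB and 0 dB the overall accuracy of the extracted vocal melody reaches 86.22% and 77.4%, respectively, at least 3.79% and 2.01% higher than other algorithms.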
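
The Viterbi tracking step over per-frame F0 candidates can be sketched as a small dynamic program. This is a generic illustration, not the paper's formulation: the cost (salience minus a penalty on the log-frequency jump between consecutive frames), the name `track_f0`, the penalty weight, and the candidate values are all assumptions.

```python
import math

def track_f0(candidates, jump_penalty=2.0):
    """candidates[t] is a list of (f0_hz, salience) pairs for frame t.
    Returns one F0 per frame, maximising total salience minus
    jump_penalty * |log2(f_t / f_{t-1})| summed over consecutive frames."""
    # score[t][i]: best total score of any path ending at candidate i of frame t
    score = [[s for _, s in candidates[0]]]
    back = []  # back[t-1][i]: best predecessor index for candidate i of frame t
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f, s in candidates[t]:
            best_j, best = 0, -math.inf
            for j, (pf, _) in enumerate(candidates[t - 1]):
                v = score[-1][j] - jump_penalty * abs(math.log2(f / pf))
                if v > best:
                    best_j, best = j, v
            row.append(best + s)
            ptr.append(best_j)
        score.append(row)
        back.append(ptr)
    # Backtrack from the best-scoring candidate of the last frame.
    i = max(range(len(score[-1])), key=lambda k: score[-1][k])
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]

# Made-up candidates: in the middle frame the octave-up candidate is locally stronger.
frames = [
    [(220.0, 0.9), (440.0, 0.5)],
    [(222.0, 0.6), (444.0, 0.7)],
    [(224.0, 0.9), (448.0, 0.4)],
]
melody = track_f0(frames)
print(melody)
```

Picking the most salient candidate frame by frame would jump up an octave in the middle frame; the jump penalty keeps the track on the smooth lower contour instead.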

17.

An improved autocorrelation pitch detection algorithm  Cited by: 3 (self-citations: 0; citations by others: 3)

Hu Ying, Chen Ning, Xia Xu 《电子科技》 (Electronic Science and Technology) 2007,(2):25-28

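An improved ACF (autocorrelation function) pitch detection algorithm is proposed. Before detection, the Teager energy operator is applied in the wavelet domain to make the voiced/unvoiced decision on the speech signal, and effective pre-processing and post-processing techniques are added at the front and back ends of the pitch detection process. Experimental results show that the algorithm is more accurate than the conventional autocorrelation algorithm, and that pitch period extraction and voicing decisions are satisfactory at low signal-to-noise ratios.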
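
The two building blocks named in this abstract, the Teager energy operator and autocorrelation pitch detection, can be sketched as follows. The sketch applies the operator directly to time-domain samples rather than in the wavelet domain, and it omits the paper's pre-processing and post-processing; the function names, lag range, and test tone are assumptions.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def acf_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    n_terms = len(x) - max_lag  # same number of products at every lag

    def acf(lag):
        return sum(x[n] * x[n + lag] for n in range(n_terms))

    return max(range(min_lag, max_lag + 1), key=acf)

# Test tone: 160 Hz at an assumed 8 kHz sampling rate, i.e. a 50-sample period.
fs = 8000
f0 = 160
amplitude = 0.5
frame = [amplitude * math.sin(2 * math.pi * f0 * n / fs) for n in range(380)]

teo = teager_energy(frame)
mean_teo = sum(teo) / len(teo)
period = acf_pitch_period(frame, min_lag=20, max_lag=80)
print(mean_teo, period)
```

For a pure sinusoid of amplitude A and digital frequency ω the operator returns the constant A² sin²(ω), so its output grows with both amplitude and frequency; that dependence is what makes it usable as a feature for voicing decisions.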

18.

4-6 bit pitch quantization algorithms for low-rate WI coders  Cited by: 1 (self-citations: 0; citations by others: 1)

Luo Yafei, Bao Changchun 《电子与信息学报》 (Journal of Electronics & Information Technology) 2007,29(11):2669-2671

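In speech coding, pitch is usually quantized with 7-bit lossless uniform quantization. Because the pitch of voiced speech generally changes slowly and gradually, and in order to remove the correlation between the pitch values of adjacent frames more effectively, this paper builds on the 4-bit pitch quantization algorithm proposed by Eriksson and Kang, studies Chinese speech, and implements a set of 4-6 bit pitch quantization algorithms. The algorithms are computationally simple and require no codebook storage. The quantization scheme was applied to the WI model and a WI coder; subjective A/B listening tests show that it quantizes pitch efficiently with almost no loss in synthesized speech quality, fully meeting the low-rate WI coder's requirements for quantized pitch.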
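
The 7-bit uniform baseline, and the reason inter-frame correlation leaves room for fewer bits, can be illustrated with a small sketch. It is neither the Eriksson-Kang algorithm nor the paper's 4-6 bit scheme: the lag range of 20 to 147 samples, the helper names, and the pitch contour are assumptions chosen only to make the arithmetic concrete.

```python
MIN_LAG, MAX_LAG = 20, 147   # assumed range: exactly 2**7 integer lags

def encode_7bit(lag):
    """Map an integer pitch lag to a 7-bit index (lossless for integer lags)."""
    if not MIN_LAG <= lag <= MAX_LAG:
        raise ValueError("lag outside the assumed range")
    return lag - MIN_LAG

def decode_7bit(index):
    """Inverse of encode_7bit."""
    return index + MIN_LAG

def fits_in_4bit_delta(prev_lag, lag):
    """True if the frame-to-frame change fits a signed 4-bit value (-8..7)."""
    return -8 <= lag - prev_lag <= 7

# Made-up, slowly varying pitch lags for ten consecutive voiced frames.
contour = [60, 61, 63, 64, 64, 62, 59, 57, 58, 60]

codes = [encode_7bit(lag) for lag in contour]
small_steps = sum(fits_in_4bit_delta(a, b) for a, b in zip(contour, contour[1:]))
print(codes, small_steps, len(contour) - 1)
```

On this contour every frame-to-frame change fits in four bits, which is the kind of redundancy a differential scheme exploits; a real coder also needs a rule for the first voiced frame and for jumps larger than the delta range, which this sketch does not attempt.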

19.

An open-loop pitch search method for speech codecs in mobile applications

Jiang Lin, Wang Xiaochen, Zhang Maosheng, Wen Bin 《智能计算机与应用》 (Intelligent Computer and Applications) 2014,(1):75-77,82

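The accuracy of the pitch period search directly affects the coding quality and efficiency of a speech coder. Observing that the pitch period search algorithm in the AMR-WB+ standard is prone to pitch doubling and halving errors, this paper proposes an open-loop pitch search algorithm. Built on the autocorrelation function, the algorithm exploits the smoothness of the pitch period and introduces a global pitch period reference as an auxiliary condition for pitch period decisions. This effectively resolves the pitch period doubling problem and reflects the smoothness of the pitch period in pitch period prediction. Experimental results show that the proposed algorithm outperforms the one in AMR-WB+. The algorithm has been applied in the AVS-P10 mobile audio codec framework.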

20.

On the use of pitch power spectrum in the evaluation of vocal tremor

Yair E. Gath I. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1988,76(9):1166-1175

A point process model of the speech excitation wave for sustained vowels has been formulated, where the pitch is modulated by a physiological disturbance (vocal tremor). It has been demonstrated that the power spectrum of an impulse train, representing the glottal pulses, is composed of a periodic replication of the spectrum of the modulating signal, accompanied by an impulse train at the carrier frequency. Hence, the power spectrum of the vocal tremor can be estimated from the pitch. A trend location algorithm has been developed to locate and remove local trends from the signal prior to the evaluation of the pitch power spectrum. Thus, spectral representation of the speech excitation enables estimation of vocal tremor.
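
The idea of reading tremor from the pitch power spectrum can be illustrated on a synthetic pitch contour. The sketch below is a simplification and not the paper's point-process analysis: it takes an already-extracted pitch contour, substitutes plain mean removal for the trend location algorithm, and uses a naive DFT. The contour rate, tremor frequency, and modulation depth are assumed values.

```python
import math

def power_spectrum(x, sample_rate):
    """Naive DFT power spectrum of a real sequence; returns (freqs_hz, power)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        freqs.append(k * sample_rate / n)
        power.append((re * re + im * im) / n)
    return freqs, power

contour_rate = 100.0   # assumed: one pitch value every 10 ms
tremor_hz = 5.0        # assumed modulation frequency of the synthetic tremor
pitch = [120.0 + 4.0 * math.sin(2 * math.pi * tremor_hz * i / contour_rate)
         for i in range(200)]

# Mean removal stands in for the trend removal step described in the abstract.
mean_pitch = sum(pitch) / len(pitch)
detrended = [p - mean_pitch for p in pitch]

freqs, power = power_spectrum(detrended, contour_rate)
peak_hz = freqs[max(range(len(power)), key=lambda k: power[k])]
print(peak_hz)
```

With 200 contour values at 100 values per second the spectrum has 0.5 Hz resolution, and the peak lands on the 5 Hz bin where the synthetic tremor was placed.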