1.
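Endpoint detection at high signal-to-noise ratios (SNR) is commonly achieved by combining the short-time zero-crossing rate, which detects unvoiced sounds, with short-time energy, which detects voiced sounds. At low SNR, however, both measures become ineffective. To detect speech endpoints accurately in noisy environments, a speech endpoint detection method based on the matching pursuits time-frequency decomposition algorithm is proposed, building on a study of noisy speech in the time-frequency domain. The method decomposes the noisy signal with the matching pursuits algorithm and then applies the Wigner transform, which completely removes the Wigner cross-term interference, so that speech and noise show clearly distinguishable Wigner energy distributions on the time-frequency plane. Endpoint detection is then performed using this property. Experimental results show that the method accurately detects speech endpoints at low SNR.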
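The conventional two-feature detector that this abstract starts from can be sketched as follows. This is a minimal illustration, not the authors' code: the frame size, hop, both thresholds, and the function names are assumptions chosen for the example.

```python
def frames(signal, size, hop):
    """Split a list of samples into overlapping frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(signal, size=200, hop=100, energy_thr=1.0, zcr_thr=0.3):
    """Return (start, end) sample indices spanning all frames flagged as speech.

    A frame counts as speech if its energy is high (voiced) or its
    zero-crossing rate is high (unvoiced). Thresholds are illustrative only.
    """
    active = []
    for k, frame in enumerate(frames(signal, size, hop)):
        voiced = short_time_energy(frame) > energy_thr
        unvoiced = zero_crossing_rate(frame) > zcr_thr
        if voiced or unvoiced:
            active.append(k)
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + size
```

Because broadband noise also has a high zero-crossing rate and non-negligible energy, fixed thresholds like these stop separating speech from background as the SNR drops, which is the limitation the abstract points to.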
2.
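Pitch estimation algorithm based on modified cepstrum and dynamic programming  Total citations: 1, self-citations: 0, citations by others: 1
The fundamental frequency (pitch) is an important parameter in speech signal processing. Pitch doubling and halving errors and the reliability of voiced/unvoiced decisions have long been difficult problems in pitch estimation. Based on a suitably modified cepstrum of the speech signal, a high-accuracy pitch estimation algorithm is proposed. Exploiting the different behavior of the cepstrum, short-time energy, and short-time zero-crossing rate in unvoiced and voiced segments, the algorithm constructs a voiced/unvoiced decision function that greatly improves decision accuracy, and then tracks the pitch with dynamic programming. The cost function takes full account of pitch continuity, so that the algorithm effectively avoids doubling and halving errors while still capturing natural pitch doubling and halving. Comparative experiments against several well-performing existing methods show that the algorithm is highly accurate and yields smooth pitch contours that require essentially no post-smoothing.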
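The dynamic-programming tracking step can be illustrated with a small Viterbi-style search over per-frame pitch candidates. The abstract does not give the actual cost function, so the one below (negative peak score plus a penalty on the octave jump between consecutive frames) and the names `track_pitch` and `jump_weight` are assumptions made for this sketch.

```python
import math

def track_pitch(candidates, jump_weight=1.0):
    """Pick one pitch candidate per frame by dynamic programming.

    candidates: list of frames, each a list of (f0_hz, score) pairs,
    where a higher score means a stronger (e.g. cepstral) peak.
    Minimises sum(-score) + jump_weight * |log2(f_t / f_{t-1})|.
    """
    cost = [-score for _, score in candidates[0]]
    back = []  # back[t-1][i] = best predecessor of candidate i in frame t
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, ptr = [], []
        for f0, score in cur:
            trans = [cost[j] + jump_weight * abs(math.log2(f0 / pf))
                     for j, (pf, _) in enumerate(prev)]
            j = min(range(len(trans)), key=trans.__getitem__)
            new_cost.append(trans[j] - score)
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path backwards from the last frame.
    i = min(range(len(cost)), key=cost.__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]
```

With a continuity penalty of this kind, an isolated frame whose strongest peak sits an octave away from its neighbours is pulled back onto the smooth contour, while a sustained octave shift can still win if its peak scores are high enough.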
3.
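Endpoint detection method based on the matching pursuit time-frequency decomposition algorithm  Total citations: 2, self-citations: 0, citations by others: 2
To detect the endpoints of speech signals accurately in a noise-free environment, the traditional approach uses the zero-crossing method to detect unvoiced sounds and the short-time energy method to detect voiced sounds, and combines the two. Based on a study of how speech signals are distributed on the time-frequency plane, an endpoint detection method based on the matching pursuit time-frequency atomic decomposition algorithm is proposed. The method decomposes the signal with the matching pursuit algorithm, giving it a clearly visible Wigner energy distribution on the time-frequency plane; a threshold is then set on this distribution to detect the endpoints accurately. Experimental results show that, compared with the traditional approach, the method has poorer real-time performance because of the signal decomposition, and the choice of threshold needs further study; nevertheless, it detects speech endpoints more accurately and offers a new way of approaching the endpoint detection problem.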
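The decomposition step can be sketched with a generic greedy matching pursuit over a small dictionary of Gaussian-windowed cosine atoms. The dictionary design, the helper names, and the iteration count are assumptions for illustration; the abstract does not specify them, and the Wigner-distribution and thresholding stages are not shown.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sample lists."""
    return sum(x * y for x, y in zip(a, b))

def gabor_atom(n, center, width, freq):
    """Unit-norm Gaussian-windowed cosine of length n (freq in cycles/sample)."""
    g = [math.exp(-0.5 * ((t - center) / width) ** 2)
         * math.cos(2 * math.pi * freq * (t - center)) for t in range(n)]
    norm = math.sqrt(dot(g, g))
    return [x / norm for x in g]

def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition onto unit-norm atoms.

    Returns ([(atom_index, coefficient), ...], residual).
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        picks.append((k, corr[k]))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return picks, residual
```

Each selected atom has a known time centre and frequency, so its energy can be placed on the time-frequency plane individually; summing per-atom distributions rather than transforming the whole signal is what avoids cross terms between components.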
5.
6.
7.
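The LPC (Linear Predictive Coding) spectrum of the speech signal is obtained by linear prediction, and the short-time spectrum of the speech is then reconstructed by convolving candidate glottal excitations with the LPC spectrum. When the distortion between the reconstructed spectrum and the original speech spectrum is minimal, the spacing between glottal excitations is taken as the pitch period. To improve computational efficiency, candidate pitch periods are searched with a frequency-domain dynamic search. Numerical experiments show that the pitch detection algorithm combining linear prediction and maximum likelihood (ML) estimation retains more pitch information, effectively reduces pitch detection errors, and is more robust than the traditional ML method.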
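The first step, obtaining an LPC spectrum, can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This covers only the spectral envelope; the excitation search and distortion measure from the abstract are not reproduced, and the function names and prediction order are assumptions of this example.

```python
import cmath
import math

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, err) with a = [a_1, ..., a_order] and err the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_envelope(a, err, n_bins):
    """Magnitude of sqrt(err) / (1 - sum_k a_k e^{-jwk}) at n_bins points in [0, pi)."""
    out = []
    for m in range(n_bins):
        w = math.pi * m / n_bins
        denom = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        out.append(math.sqrt(err) / abs(denom))
    return out
```

A typical call would be `a, err = levinson_durbin(autocorr(frame, 10), 10)` on a windowed frame, followed by `lpc_envelope(a, err, 256)` to sample the envelope.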
8.
9.
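Endpoint detection of speech signals generally relies on two parameters, the short-time average zero-crossing rate and the short-time average energy; a single parameter is usually not enough to separate noise, unvoiced sounds, and voiced sounds. Through theoretical analysis and experiments, this paper shows that the short-time zero-crossing rate alone can separate unvoiced from voiced sounds but cannot effectively separate unvoiced sounds from noise.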
10.
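A pitch detection method based on the energy symmetry (ES) parameter is proposed. The pitch is first estimated coarsely by peak detection and symmetry detection, and the best pitch estimate is then obtained from the ES parameter. Experiments show that the method runs in real time, is highly accurate, and introduces no delay, making it a speech signal processing method suitable for implementation on a single-chip microcontroller.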
11.
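Endpoint detection is a key technique in speech signal processing, and its accuracy directly affects the computational complexity and recognition performance of speech recognition systems. Building on the theory of human auditory characteristics and exploiting the difference between the critical-band power spectra of speech segments and background-noise segments, an endpoint detection method based on the variance of the critical-band power spectrum is proposed. With an adaptively selected threshold, the method tracks background noise well. Endpoint detection experiments were carried out at different signal-to-noise ratios. The results show that, compared with the traditional short-time energy and short-time average zero-crossing rate method and with the spectral entropy method, the proposed method effectively reduces the influence of background noise and achieves better robustness and accuracy.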
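A per-frame feature of the kind described above can be sketched as the variance of power across critical bands. The band edges below are the commonly tabulated Bark-scale edges up to 3.7 kHz. Summing the power of the bins in each band, using a naive DFT, and the function names are assumptions of this example rather than details from the paper, and the adaptive threshold is not shown.

```python
import cmath
import math

# Commonly tabulated critical-band (Bark) edges in Hz, truncated at 3.7 kHz.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700]

def power_spectrum(frame):
    """One-sided power spectrum |X[k]|^2 / N via a naive DFT (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2 / n
            for k in range(n // 2 + 1)]

def band_power_variance(frame, fs):
    """Variance across critical bands of the summed power in each band."""
    spec = power_spectrum(frame)
    n = len(frame)
    powers = []
    for low, high in zip(BARK_EDGES_HZ, BARK_EDGES_HZ[1:]):
        bins = [p for k, p in enumerate(spec) if low <= k * fs / n < high]
        if bins:  # skip bands that contain no DFT bin at this resolution
            powers.append(sum(bins))
    mean = sum(powers) / len(powers)
    return sum((p - mean) ** 2 for p in powers) / len(powers)
```

A frame whose energy is concentrated in a few bands, as in voiced speech, gives a much larger variance than a frame with a flat spectrum of the same energy, which is why thresholding this quantity can separate speech from broadband background noise.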
12.
Lijing Ding Radwan A. El-Hennawey M.S. Goubran R.A. 《IEEE transactions on instrumentation and measurement》2006,55(4):1197-1203
This paper investigates the effects of temporal clipping on perceived speech quality. Temporal clipping usually results from voice activity detection (VAD), or line echo canceller's nonlinear processor, and the clipped speech portions are replaced by comfort noise. A nonintrusive algorithm is proposed to predict speech quality based on the clipping statistics. Mean opinion score (MOS) is used as a metric for speech quality and is measured by perceptual evaluation of speech quality (PESQ). The impacts of speech frame size and noise spectrum on the algorithm are also investigated. The results show that the proposed algorithm can efficiently predict the speech quality. The correlation coefficient between the prediction and the measurement is about 0.975, and the root mean square error for the prediction is 0.20 MOS. The algorithm can be used as an integral part of a general speech quality assessment scheme in voice over Internet protocol (VoIP).
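The two figures of merit quoted above, the correlation coefficient and the root mean square error between predicted and measured MOS, can be computed as in the sketch below. The function names are chosen for this example, and any scores passed in would be the reader's own data; none are taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rmse(predicted, measured):
    """Root mean square error, in the same units as the inputs (here MOS)."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured))
                     / len(predicted))
```

The two measures are complementary: correlation shows whether predictions rank conditions in the same order as the measurements, while RMSE shows how far off they are in absolute MOS units.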
13.
Speech analysis is traditionally performed using short-time analysis to extract features in the time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time-varying vocal tract system during production. However, speech in its primary mode of excitation is produced by an impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits. Epoch-based analysis not only helps to segment speech signals according to speech production characteristics, but also enables more accurate analysis of speech. It allows the extraction of important acoustic-phonetic features such as glottal vibrations, formants, and instantaneous fundamental frequency. The epoch sequence is useful for manipulating prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. The use of epoch extraction in several speech applications is demonstrated.
14.
Embedding a secret message into a cover medium without attracting attention, known as steganography, is one of the methods used for hidden communication. Speech is one of the cover media that can be used for steganography. In this study, the authors propose a new steganography method for speech signals. In this method, the silence intervals of speech are found and the length (number of samples) of these intervals is changed to hide information. The main feature of the method is robustness to MPEG-1 layer III (MP3) compression. The method can hide information in a speech stream with very low processing time, which makes it a real-time steganography method. Its hiding capacity is comparable with that of other MP3-resistant methods, and listening tests show that the degradation in speech quality is not annoying. Additionally, the effect of the method on chaotic features is negligible, so it is difficult to detect with chaotic-based steganalysis methods.
15.
16.
Content-aware image resizing (CAIR) is a widely used technique for image retargeting. It can also be used to tamper with images, undermining public trust in image content. Once an image has been processed by CAIR, the correlation among local neighborhood pixels is disrupted. Although local binary patterns (LBP) describe local texture effectively, they cannot describe the magnitude information of local neighborhood pixels and are vulnerable to noise. Therefore, to detect CAIR, this paper proposes a novel forensic method based on an improved local ternary patterns (ILTP) feature and a gradient energy feature (GEF). First, the adaptive threshold of the original local ternary patterns (LTP) operator is improved, and the ILTP operator is used to describe the change in correlation among local neighborhood pixels caused by CAIR. Second, the histogram features of ILTP and the gradient energy features are extracted from the candidate image for CAIR forgery detection. The ILTP features and the gradient energy features are then concatenated into combined features. Finally, a support vector machine (SVM) classifier is trained and tested on these features to decide whether or not an image has been subjected to CAIR. The candidate images are taken from the uncompressed color image database (UCID), from which the training and testing sets are created. Experimental results on many test images show that the proposed method detects CAIR tampering effectively and outperforms other state-of-the-art approaches.
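For orientation, the basic local ternary pattern (LTP) operator that the paper builds on can be sketched as below. This is the standard fixed-threshold form, split into an "upper" and a "lower" binary code. The paper's improved adaptive threshold (ILTP) is not specified in the abstract and is not reproduced here; the neighbour ordering and function names are assumptions of this example.

```python
# Clockwise 8-neighbourhood offsets (row, column), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ltp_codes(img, r, c, t):
    """Return (upper, lower) 8-bit codes for pixel (r, c) with threshold t.

    A neighbour sets a bit in `upper` if it is at least t above the centre,
    a bit in `lower` if it is at least t below, and no bit otherwise.
    """
    center = img[r][c]
    upper = lower = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        p = img[r + dr][c + dc]
        if p >= center + t:
            upper |= 1 << bit
        elif p <= center - t:
            lower |= 1 << bit
    return upper, lower

def ltp_histogram(img, t):
    """Concatenated 256-bin histograms of upper and lower codes over interior pixels."""
    hist_upper, hist_lower = [0] * 256, [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            u, l = ltp_codes(img, r, c, t)
            hist_upper[u] += 1
            hist_lower[l] += 1
    return hist_upper + hist_lower
```

The dead zone of width 2t around the centre value is what makes LTP less sensitive to small noise than LBP; a 512-bin histogram like this one would then be concatenated with other features and passed to a classifier.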
17.
18.
Time-scale representation of voiced speech is applied to voice quality analysis, by introducing the Line of Maximum Amplitude (LoMA) method. This representation takes advantage of the tree patterns observed for voiced speech periods in the time-scale domain. For each period, the optimal LoMA is computed by linking amplitude maxima at each scale of a wavelet transform, using a dynamic programming algorithm. A time-scale analysis of the linear acoustic model of speech production shows several interesting properties. The LoMA points to the glottal closure instants. The LoMA phase delay is linked to the voice open quotient. The cumulated amplitude along the LoMA is related to voicing amplitude. The LoMA spectral centre of gravity is an indication of voice spectral tilt. Following these theoretical considerations, experimental results are reported. Comparative evaluation demonstrates that the LoMA is an effective method for the detection of Glottal Closure Instants (GCI). The effectiveness of LoMA analysis for open quotient, amplitude and spectral tilt estimations is also discussed with the help of some examples.