Similar literature
1.
This paper addresses the problem of parameterization for speech/music discrimination. The currently successful parameterization based on cepstral coefficients uses the Fourier transform (FT), which is well adapted to stationary signals. To take into account the non-stationarity of music and speech signals, this work studies wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments were evaluated. Different types of energy, calculated for each frequency band obtained from the wavelet decomposition, are studied. Static, dynamic, and long-term parameters were evaluated. The proposed parameterizations are integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Experiments on realistic corpora covering different styles of speech and music (Broadcast News, Entertainment, Scheirer) illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. The parameterization yielded a significant reduction of the error rate: more than 30% relative improvement was obtained on the envisaged tasks compared to MFCC parameterization.

2.
杨松, 于凤芹. 《计算机工程与应用》 (Computer Engineering and Applications), 2012, 48(23): 125-127, 154
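Traditional MFCC and short-time energy features capture only the static characteristics of a signal sequence, and speech/music recognition rates based on these features currently range from 79% to 86%. Sample entropy reflects the amount of new information in a signal sequence and how strongly that amount varies. Using sample entropy as the feature for speech/music classification, the method extracts the sample entropy of the mixed signal, computes the mean and variance of the sample entropy of each signal segment, and performs recognition with k-means clustering. Simulation results show that sample-entropy-based speech/music recognition raises the recognition rate to 88.073%.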
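The feature extraction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the embedding dimension, tolerance, or frame length, so `m`, `r`, and `frame_len` below are assumed values, and the function names are hypothetical. The sketch uses the common definition of sample entropy (Chebyshev distance, no self-matches) and omits the k-means step.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Return SampEn(m, r) of the sequence x, with r as an absolute tolerance."""
    n = len(x)

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is at most r.
        # The same n - m starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # strictly undefined; reported as infinity here
    return -math.log(a / b)


def entropy_mean_and_variance(signal, frame_len, m=2, r=0.2):
    """Mean and population variance of per-frame sample entropy over one segment."""
    values = [sample_entropy(signal[i:i + frame_len], m, r)
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("no frame produced a finite sample entropy")
    mu = sum(finite) / len(finite)
    var = sum((v - mu) ** 2 for v in finite) / len(finite)
    return mu, var
```

A perfectly regular sequence yields a sample entropy of zero, while an irregular one yields a positive value. The resulting (mean, variance) pair per segment is the kind of two-dimensional feature that a clustering step could then separate.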

3.
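Speech endpoint detection based on critical bands and energy entropy   Cited by: 1 (self-citations: 0, citations by others: 1)
张婷, 何凌, 黄华, 刘肖珩. 《计算机应用》 (Journal of Computer Applications), 2013, 33(1): 175-178
The accuracy of speech endpoint detection directly affects the accuracy of speech recognition, synthesis, enhancement, and other speech tasks. To improve the effectiveness of endpoint detection, a speech endpoint detection algorithm based on critical bands and energy entropy is proposed. The algorithm exploits the frequency distribution of human auditory perception: it divides the noisy speech signal into critical bands and uses the different distributions of the per-band energy entropy in speech and noise segments to detect speech endpoints under different background noises. Experimental results show that the detection accuracy of the proposed algorithm is on average 1.6 percentage points higher than that of the traditional short-time energy method. The method achieves endpoint detection under various noise types at low signal-to-noise ratio (SNR).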
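One plausible reading of "energy entropy over frequency bands" is the Shannon entropy of the normalized band energies of a frame, sketched below. This is an assumption, not the paper's definition: the abstract does not specify the formula, the band edges here are arbitrary bin indices rather than true critical (Bark) bands, and the function names are hypothetical. A naive DFT keeps the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. n/2 (O(n^2), fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def band_energy_entropy(frame, band_edges):
    """Shannon entropy (in nats) of the energy distribution across bands.

    band_edges lists DFT bin indices; band i covers bins
    band_edges[i] .. band_edges[i + 1] - 1.
    """
    spec = power_spectrum(frame)
    energies = [sum(spec[lo:hi]) for lo, hi in zip(band_edges, band_edges[1:])]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With this definition, a frame whose energy sits in a single band (a steady tone, for example) has entropy near zero, whereas a frame with a flat spectrum approaches the logarithm of the number of bands. A detector could threshold this value frame by frame; the threshold would have to be tuned per noise condition.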

4.
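The audio stream in video data carries rich semantic information, and audio analysis is an integral part of content-based video retrieval. This paper discusses content-based audio scene segmentation, analyzes various audio features and their extraction methods, and on that basis proposes a new audio stream segmentation method that divides the audio stream in video data into audio scenes according to the audio features of six audio types (speech, music, silence, environmental sound, pure speech, speech over background music, and speech over background environmental sound). Experiments show that the method is effective: while maintaining a certain segmentation accuracy, both precision and recall improve considerably.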

5.
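Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval, and automatic speech recognition. This paper proposes a novel speech/music segmentation and classification method. It first detects audio change points from the difference in mean-square energy between adjacent frames, which performs the segmentation. It then extracts eight-dimensional features from each audio segment, including the low-band energy variance ratio, cepstral energy modulation, and entropy modulation, and classifies them with an artificial neural network. Experimental results show that the algorithm and features achieve high segmentation accuracy and high classification accuracy.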
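The segmentation step in this abstract, detecting change points from the mean-square energy difference between adjacent frames, can be sketched as follows. This is an illustrative reading only: the abstract gives no frame length or decision rule, so `frame_len`, the fixed `threshold`, and the function names are assumptions.

```python
def frame_energies(signal, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def change_points(signal, frame_len=100, threshold=0.5):
    """Sample indices where adjacent-frame energy differs by more than threshold."""
    energies = frame_energies(signal, frame_len)
    points = []
    for k in range(1, len(energies)):
        if abs(energies[k] - energies[k - 1]) > threshold:
            points.append(k * frame_len)  # boundary between frame k-1 and frame k
    return points
```

For a signal that switches from a quiet passage to a loud one, the function reports the frame boundary where the jump occurs. A practical system would likely replace the fixed threshold with an adaptive one, since absolute energy levels vary between recordings.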

6.
This paper reports a spectrographic analysis of infant cries. Ten different cry modes are used to analyze differences among pathological cries, and spectrograms of adult speech and infant cry signals are compared. Based on differences in the distribution of energy in the spectrograms, energy-based features are calculated from the short-time Fourier transform (STFT) of the adult speech and infant cry signals. The classification performance of these features is evaluated with a support vector machine (SVM) classifier. The energy distribution in the 0–1 kHz range is found to be a promising feature for classifying adult speech and infant cries, achieving a classification accuracy of 98.22%. In contrast, it is very difficult to classify adult speech and infant cries using the energy distribution in the 1–3 kHz range.

7.
Pitch is a crucial parameter in speech and music signals. However, severe noise, missing harmonics, and unsuitable physical vibration make accurate pitch determination a great challenge. In this paper, we propose a method for pitch estimation of speech and music sounds. The method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) provided by a feature auditory model of the sound signals. The auditory model simulates the spectral behaviour of the cochlea with a gammachirp filter bank and the outer/middle ear filtering with a low-pass filter. For the two output channels, the FFT of the MP is computed over frames. The MP is formed as the product of the wavelet transform coefficients of the speech or music signal at three scales. Experimental results show that the method estimates pitch with high accuracy and outperforms several other pitch detection algorithms in both clean and noisy environments.

8.
This paper addresses model-based audio content analysis for classifying speech-music mixed audio signals into speech and music. A set of new features based on sinusoidal modeling of audio signals is presented and evaluated. The new feature set, which includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model as measures of harmony and signal continuity, is introduced and discussed in detail. These features are used as inputs to an audio classifier and compared with typical features. Their performance is evaluated through classification of audio into speech and music using both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. Using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in audio classification. Experimental comparisons also confirm the superiority of the sinusoidal model features over the popular time-domain and frequency-domain features in audio classification.

9.
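范影乐, 俞祁焰, 李轶, 庞全. 《传感技术学报》 (Chinese Journal of Sensors and Actuators), 2007, 20(10): 2288-2293
The Hilbert-Huang transform (HHT) is applied to analyze speech features and improve the accuracy of speech endpoint detection at low SNR. The speech signal is Hilbert-Huang transformed to obtain its energy distribution in the time and frequency domains, and a three-dimensional time-frequency-amplitude Hilbert spectrum and a marginal spectrum are built for feature analysis. Finally, speech endpoint detection is used to verify the effectiveness of HHT for analyzing the features of noisy speech and for denoising. The endpoint detection results show that detection accuracy improves significantly after noisy speech is analyzed and denoised with HHT. HHT faithfully describes the nonlinear and non-stationary characteristics of speech signals and has broad application prospects.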

10.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an adaptive network-based fuzzy inference system (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called warped LPC-based spectral centroid (WLPC-SC), is also proposed. Comparison between WLPC-SC and the classical features proposed in the literature for audio classification is performed, aiming to assess the good discriminatory power of the proposed feature. The vector length used to describe the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). With the aim of increasing the classification accuracy percentage, the feature space is then transformed to a new feature space by LDA. The classification task is performed applying ANFIS to the features in the transformed space. To evaluate the performance of the ANFIS system for speech/music discrimination, comparison to other commonly used classifiers is reported. The classification results for different types of music and speech signals show the good discriminating power of the proposed approach.

11.
12.
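Time-frequency distributions play an important role in the analysis and processing of non-stationary signals because they describe a signal's energy distribution in the time-frequency domain intuitively and reasonably. Speech signal classification is an important foundation for speech recognition, speaker recognition, language identification, and speech synthesis, and the choice of signal representation and distance measure strongly affects classification performance. This paper exploits the properties of time-frequency distributions, optimizes their kernel parameters, and combines them with a distance measure to perform speaker identification on isolated phonetic symbols, achieving high accuracy with a misclassification rate of only 0.99%.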

13.
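Research on computer music processing is increasingly important for letting computers and humans communicate freely through music. The characteristics of music signals and speech signals are compared, MFCC (Mel-Frequency Cepstral Coefficients) are chosen as the features of single-note signals, the choice of feature vector dimension is discussed, and an RBF neural network is used to recognize the 88 single notes of the piano. Experimental results show that the chosen features are effective for recognizing single-note signals.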

14.
This paper proposes the use of speech-specific features for speech/music classification. Features representing the excitation source, vocal tract system, and syllabic rate of speech are explored. The normalized autocorrelation peak strength of the zero-frequency filtered signal and the peak-to-sidelobe ratio of the Hilbert envelope of the linear prediction residual are the two source features. The log mel energy feature represents the vocal tract information. The modulation spectrum represents the slowly varying temporal envelope corresponding to the speech syllabic rate. The novelty of the present work lies in analyzing the behavior of these features for discriminating speech and music regions. The features are non-linearly mapped and combined to perform the classification task using a threshold-based approach. Further, the performance of the speech-specific features is evaluated using classifiers such as Gaussian mixture models and support vector machines. The speech-specific features are observed to perform better than existing features. Additional improvement in speech/music classification is achieved when the speech-specific features are combined with the existing ones, indicating that the former exploit different aspects of information.

15.
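Research on speech endpoint detection algorithms under strong background noise   Cited by: 1 (self-citations: 0, citations by others: 1)
The multi-band spectral entropy method splits the speech frequency range into bands to form a new band-wise spectral entropy function. At low SNR it detects speech more reliably and also reflects the energy distribution, so it is widely used. Multitaper spectral analysis computes direct spectra of the same data sequence with several orthogonal data windows. It is a low-variance, high-resolution spectral analysis method that is particularly suited to analyzing weak signals and time-frequency-evolving signals under high background noise in nonlinear systems. A speech detection method combining multitaper spectra with multi-band spectral entropy is proposed. Simulation results show that the improved algorithm has a clear advantage over other algorithms and performs stably.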

16.
17.
Content-based audio signal classification into broad categories such as speech, music, or speech with noise is the first step before any further processing such as speech recognition, content-based indexing, or surveillance. In this paper, we propose an efficient content-based audio classification approach that classifies audio signals into broad genres using a fuzzy c-means (FCM) algorithm. We analyze different characteristic features of audio signals in the time, frequency, and coefficient domains and select the optimal feature vector by applying a novel analytical scoring method to each feature. We then apply an FCM-based classification scheme to the extracted, normalized optimal feature vector to achieve an efficient classification result. Experimental results demonstrate that the proposed approach outperforms existing state-of-the-art audio classification systems by more than 11% in classification performance.

18.
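A fast narrowband audio classification method based on linear discriminant analysis and Gaussian mixture models is proposed. The method effectively distinguishes speech, music, and noise in white-noise, street-noise, and in-car-noise environments. Experimental results show that it reaches a classification accuracy above 95% while keeping the classification time no greater than 1 s.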

19.
A method is described for estimating the fundamental frequencies of several concurrent sounds in polyphonic music and multiple-speaker speech signals. The method consists of a computational model of the human auditory periphery, followed by a periodicity analysis mechanism where fundamental frequencies are iteratively detected and canceled from the mixture signal. The auditory model needs to be computed only once, and a computationally efficient strategy is proposed for implementing it. Simulation experiments were made using mixtures of musical sounds and mixed speech utterances. The proposed method outperformed two reference methods in the evaluations and showed a high level of robustness in processing signals where important parts of the audible spectrum were deleted to simulate bandlimited interference. Different system configurations were studied to identify the conditions where pitch analysis using an auditory model is advantageous over conventional time or frequency domain approaches.

20.
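Single-channel speech signal enhancement based on blind source separation   Cited by: 1 (self-citations: 0, citations by others: 1)
When blind source separation based on independent component analysis (ICA) is used for speech enhancement, the number of observed signals (noisy speech) must be no less than the number of source signals (clean speech and noise). Because noisy speech is usually single-channel, the key is to generate a second, virtual observation signal in a reasonable way so that clean speech and noise can be separated. A single-channel speech enhancement method based on blind source separation and spectral subtraction is presented. Spectral subtraction is first applied to partially denoise the speech, which yields one of the ICA observation signals as well as an estimate of the noise. The frame-average energies of the speech and noise estimates form a weighting function, with which the noise estimate and the original noisy speech are combined to generate the second, virtual observation signal. Because the virtual observation closely reproduces a real observation, ICA separates noise and speech well. Combining blind source separation with spectral subtraction also improves enhancement performance. Experiments show that the algorithm removes noise even at very low SNR and outperforms traditional denoising algorithms.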
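The two preprocessing steps named in this abstract, spectral subtraction and the energy-weighted virtual observation, can be sketched roughly as below. The abstract does not give the actual weighting function, so the weight used here (the noise share of the total frame-average energy) is purely an assumed placeholder, as are the spectral floor and all function names. The ICA step itself is omitted.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Magnitude spectral subtraction per bin, with a small floor to avoid negatives."""
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


def mean_energy(x):
    """Frame-average energy of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def virtual_observation(noisy, speech_est, noise_est):
    """Blend noisy speech with the noise estimate to form a second observation.

    The weight is an assumption for illustration: the fraction of total
    energy attributed to the noise estimate.
    """
    e_speech = mean_energy(speech_est)
    e_noise = mean_energy(noise_est)
    total = e_speech + e_noise
    w = e_noise / total if total > 0 else 0.0
    return [(1 - w) * y + w * n for y, n in zip(noisy, noise_est)]
```

The partially denoised signal and the virtual observation would then be passed together to a two-channel ICA routine. Whether separation succeeds depends on how independent the two mixtures are, which is exactly what the paper's weighting function is designed to control.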
