首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 109 毫秒
1.
研究一种应用小波特征向量和多类支持向量机进行病态语音识别的方法,该方法基于连续小波变换提取语音特征向量,利用多类支持向量机进行病态语音分类。为了简化二分类支持向量机进行多类分类时所带来的计算复杂性,根据一类支持向量机分类思想提出一种多类分类算法。该算法能够使每一类样本都独立地获得一个决策函数,通过决策函数的最大值来判断样本所属的类。实验表明,在病态语音识别系统中,多类支持向量机与小波特征向量相结合具有良好的识别效果和应用价值。  相似文献   

2.
研究一种用支持向量机(SVM)进行多类音频分类的方法,其中引入增广两类分类法(AB法)设计多类分类器。该算法把音频分为四类:音乐、纯语音、带背景音的语音和典型的环境音,并分析了这几类音频的八个区别性特征,包括修正低能量成分比率(MLER)和修正基频(MPF)两个新特征以及频域总能量、子带能量、频率中心等其它六个基本特征,综合考察了不同特征集在基于SVM分类器中的分类精度。实验结果表明,提取的音频特征有效,基于SVM的多类音频分类效果良好。  相似文献   

3.
基于小波变换和支持向量机的音频分类   总被引:2,自引:0,他引:2       下载免费PDF全文
音频特征提取是音频分类的基础,而音频分类又是内容的音频检索的关键。综合分析了语音和音乐的区别性特征,提出一种基于小波变换和支持向量机的音频特征提取和分类的方法,用于纯语音、音乐、带背景音乐的语音以及环境音的分类,并且评估了新特征集合在SVM分类器上的分类效果。实验结果表明,提出的音频特征有效、合理,分类性能较好。  相似文献   

4.
孪生支持向量机TWSVMs分类过程的计算量和样本的数量成正比,当样本个数较多时,其分类过程将会比较耗时。为了提高样本集的稀疏性,从而提高TWSVMs的分类速度,提出了一种基于AP聚类的约简孪生支持向量机快速分类算法FCTSVMs-AP。首先对原始数据集进行AP聚类操作。聚类的中心为约简后新的样本集,按照分类误差最小的原则构建优化模型,用二次规划方法求解新的决策函数的系数,并证明了当样本集压缩时,收紧新的快速决策函数和原始决策函数之间的误差等价于在样本空间对原始数据集进行AP聚类操作。在人工数据集和UCI数据集上的实验表明,保持分类精度的损失在统计意义上不明显的前提下,FCTSVMs-AP可以通过有效压缩样本数量的方式提高分类速度。  相似文献   

5.
基于支持向量机的音频分类与分割   总被引:8,自引:0,他引:8  
音频分类与分割是提取音频结构和内容语义的重要手段,是基于内容的音频、视频检索和分析的基础。支持向量机(SVM)是一种有效的统计学习方法。本文提出了一种基于SVM的音频分类算法。将音频分为5类:静音、噪音、音乐、纯语音和带背景音的语音。在分类的基础上,采用3个平滑规则对分类结果进行平滑。分析了SVM分类嚣的分类性能,同时也评估了本文提出的新的音频特征在SVM分类嚣上的分类效果。实验结果显示,基于SVM的音频分类算法分类效果良好,平滑处理后的音频分割结果比较准确。  相似文献   

6.
针对多维时间序列的多类分类问题,本文提出基于时点分割思想的核Fisher判别分析-顺序回归机(KFDA-ORM)多类分类建模方法.该方法利用核Fisher判别分析(KFDA)与顺序回归机(ORM)的互补性得到分类决策函数;对分类样本的多维时间序列进行时点分割处理,使用决策函数得到各时点的分类级别;通过指数平滑分析得到采样周期内样本的最终分类结果.通过实例验证,该方法对多维时间序列的分类具有较好效果,是一种有效的多类分类方法.  相似文献   

7.
提出一种用于说话人识别中说话人语音特征向量聚类的方法--新颖检测法.通过提取出的特征参数(MFCC和LPCC),建立系统模型,实验结果表明,将新颖检测法结合VQ用于特征向量的分类,较之于单纯的VQ分类,取得了识别率高、稳健型强、确认可靠的效果.  相似文献   

8.
音频自动分类中的特征分析和抽取   总被引:8,自引:1,他引:8  
音频特征分析和抽取是音频自动分类的基础,本文将音频对象分为静音,噪音,纯语音,带背景音语音,音乐等5类,从帧层次和段层次上深入分析了不同类音频之间的区别性特征,包括帧层次上的MFCC,频域能量,子带能量,过零率,频谱中心等特征,在此基础上计算了段层次上的基本音频特征,包括静音比率,子带能量比均值等,提出了3个音频”流”特征-High-ZCR比率,Low-Frequency-Energy比率,频谱流量.设计并实现了一种基于支持向量机(support vector machine)的自动分类器,考察了上述特征组成的特征集合在该分类器中的分类性能.实验表明,本文提出的特征有效,分类性能良好.  相似文献   

9.
吕佳 《计算机应用》2012,32(3):643-645
针对在半监督分类问题中单独使用全局学习容易出现的在整个输入空间中较难获得一个优良的决策函数的问题,以及单独使用局部学习可在特定的局部区域内习得较好的决策函数的特点,提出了一种结合全局和局部正则化的半监督二分类算法。该算法综合全局正则项和局部正则项的优点,基于先验知识构建的全局正则项能平滑样本的类标号以避免局部正则项学习不充分的问题,通过基于局部邻域内样本信息构建的局部正则项使得每个样本的类标号具有理想的特性,从而构造出半监督二分类问题的目标函数。通过在标准二类数据集上的实验,结果表明所提出的算法其平均分类正确率和标准误差均优于基于拉普拉斯正则项方法、基于正则化拉普拉斯正则项方法和基于局部学习正则项方法。  相似文献   

10.
环境音分类是当前语音识别领域的研究热点。主动学习是利用未标记数据,在少量标记数据代价下提高监督学习算法的分类性能的方法。文中提出了熵优先采样(Entropy Priority Sampling,EPS)方法和简单不一致采样(Simple Disagreement Sampling,SDS)方法作为主动学习选择样本的策略。针对环境音数据,提取11维的CELP音频特征,采用单一分类器与EPS,SDS方法对不同标记训练样本比例下的分类实验结果进行了比较分析。结果表明,主动学习方法在标记样本数较少的情况下,能取得较好的分类效果,并且EPS方法的性能优于SDS方法。  相似文献   

11.
This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification.  相似文献   

12.
Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non- pure speech which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVM on different audio type-pairs classification with testing unit of different- length and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM). We also evaluated the effectiveness of some new proposed features. Experiments on a database composed of about 4- hour audio data show that the proposed classifier is very efficient on audio classification and segmentation. It also shows the accuracy of the SVM-based method is much better than the method based on KNN and GMM.  相似文献   

13.
Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we can achieve a high average accuracy of 94.2% in the five-class audio classification task.  相似文献   

14.
基于内容的音频检索:概念和方法   总被引:38,自引:1,他引:37  
F过去对视觉媒体的检索,如图象和视频,进行了大量的研究。但是我们注意到音频也是多媒体中的一种典型媒体,是信息的一种常用载体。常规的自理是把数字音频当成非结构化流媒体。然而音频是语音的载体、包含丰富的听觉特征,并且具有结构信息。因此需要并且可以基于这些内容对音频进行存取。本文根据当前相关研究的进展,综述基于内容的音频检索方法,包括面向语音、音乐和音频分析的检索、音频分割等;分析并总结出音频内容及其检  相似文献   

15.
This paper proposes a hierarchical time-efficient method for audio classification and also presents an automatic procedure to select the best set of features for audio classification using Kolmogorov-Smirnov test (KS-test). The main motivation for our study is to propose a framework of general genre (e.g., action, comedy, drama, documentary, musical, etc...) movie video abstraction scheme for embedded devices-based only on the audio component. Accordingly simple audio features are extracted to ensure the feasibility of real-time processing. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise and silence. Audio classification is processed in three stages, (i) silence or environmental noise detection, (ii) speech and non-speech classification and (iii) pure music or songs and speech with background music classification. The proposed system has been tested on various real time audio sources extracted from movies and TV programs. Our experiments in the context of real time processing have shown the algorithms produce very satisfactory results.  相似文献   

16.
语音/音乐自动分类中的特征分析   总被引:16,自引:0,他引:16  
综合分析了语音和音乐的区别性特征,包括音调,亮度,谐度等感觉特征与MFCC(Mel-Frequency Cepstral Coefficients)系数等,提出一种left-right DHMM(Discrete Hidden Markov Model)的分类器,以极大似然作为判别规则,用于语音,音乐以及它们的混合声音的分类,并且考察了上述特征集合在该分类器中的分类性能,实验结果表明,文中提出的音频特征有效,合理,分类性能较好。  相似文献   

17.
Automated audio segmentation and classification play important roles in multimedia content analysis. In this paper, we propose an enhanced approach, called the correlation intensive fuzzy c-means (CIFCM) algorithm, to audio segmentation and classification that is based on audio content analysis. While conventional methods work by considering the attributes of only the current frame or segment, the proposed CIFCM algorithm efficiently incorporates the influence of neighboring frames or segments in the audio stream. With this method, audio-cuts can be detected efficiently even when the signal contains audio effects such as fade-in, fade-out, and cross-fade. A number of audio features are analyzed in this paper to explore the differences between various types of audio data. The proposed CIFCM algorithm works by detecting the boundaries between different kinds of sounds and classifying them into clusters such as silence, speech, music, speech with music, and speech with noise. Our experimental results indicate that the proposed method outperforms the state-of-the-art FCM approach in terms of audio segmentation and classification.  相似文献   

18.
基于分形布朗运动和Ada Boosting的多类音频例子识别   总被引:2,自引:0,他引:2  
提出了一种基于分形布朗运动的音频特征提取和识别方法.这种方法使用分形布朗运动模型计算出音频例子的分形维数,并作为其分形特征.针对音频分形特征符合高斯分布的特点,使用Ada Boosting算法进行特征约减.然后分别使用Ada-加权高斯分类器和支持向量机对约减特征后的音频分类,并在两类分类的基础上构造多类分类的模型.实验表明,经过特征约减后的音频分形特征在音乐和语音的分类中都优于其他音频特征.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号