1.

郑继明魏国华吴渝《计算机工程与应用》2009,45(12):131-133

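Audio feature extraction is the basis of audio classification, and good features effectively improve classification accuracy. Alongside the frequency-domain Mel-frequency cepstral coefficients (MFCC), a discrete wavelet transform is applied to each frame to extract wavelet-domain features; the frequency-domain and wavelet-domain features are then combined and their statistical features computed. Audio templates are built with an SVM model and used to classify pure speech, music, and speech with background music, achieving high recognition accuracy.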

2.

Multi-class audio example recognition based on fractional Brownian motion and AdaBoost  Cited by: 2 (self-citations: 0, by others: 2)

吴飞庄永真潘红《计算机研究与发展》2003,40(7):941-949

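Proposes an audio feature extraction and recognition method based on fractional Brownian motion. The method uses the fractional Brownian motion model to compute the fractal dimension of an audio example and takes it as the fractal feature. Because audio fractal features follow a Gaussian distribution, the AdaBoost algorithm is used for feature reduction. An Ada-weighted Gaussian classifier and a support vector machine are then each used to classify the audio on the reduced features, and a multi-class model is built on top of the two-class classifiers. Experiments show that, after feature reduction, the audio fractal features outperform other audio features in classifying both music and speech.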

3.

A novel speech/music segmentation and classification method

孟永辉蒋冬梅付中华谢磊《计算机工程与科学》2009,31(4)

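Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval, and automatic speech recognition. This paper proposes a novel speech/music segmentation and classification method. It first detects change points in the audio from the difference in mean-square energy between adjacent frames, which yields the segmentation; it then extracts eight-dimensional features from each audio segment, including the low-band energy variance ratio, cepstral energy modulation, and entropy modulation, and classifies them with an artificial neural network. Experimental results show that the algorithm and features achieve high segmentation accuracy and classification accuracy.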
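
The change-point step described above can be illustrated with a minimal sketch. It assumes non-overlapping frames and a fixed, hand-picked threshold; the abstract specifies neither, and the function names `frame_energies` and `change_points` are hypothetical, not the authors' implementation.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def change_points(energies, threshold):
    """Frame indices whose energy differs from the previous frame by more than threshold."""
    return [i for i in range(1, len(energies))
            if abs(energies[i] - energies[i - 1]) > threshold]


# Toy signal (an assumption for illustration): four quiet frames, then four loud frames.
signal = [0.01] * 400 + [0.5] * 400
energies = frame_energies(signal, frame_len=100)
print(change_points(energies, threshold=0.1))
```

On this toy signal the only large energy jump is at the boundary between the quiet and loud halves, so a single change point is reported. A practical system would likely need overlapping frames and an adaptive threshold.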

4.

Audio classification and segmentation based on support vector machines  Cited by: 8 (self-citations: 0, by others: 8)

白亮老松杨陈剑赟吴玲达《计算机科学》2005,32(4):87-90

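Audio classification and segmentation are important means of extracting the structure and content semantics of audio, and are the basis of content-based audio and video retrieval and analysis. The support vector machine (SVM) is an effective statistical learning method. This paper proposes an SVM-based audio classification algorithm that divides audio into five classes: silence, noise, music, pure speech, and speech with background sound. On top of the classification, three smoothing rules are applied to smooth the classification results. The classification performance of the SVM classifier is analyzed, and the effectiveness of the new audio features proposed in the paper is also evaluated on the SVM classifier. Experimental results show that the SVM-based audio classification algorithm classifies well and that the audio segmentation obtained after smoothing is fairly accurate.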
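
The abstract does not state its three smoothing rules. As a hedged illustration of what such post-processing can look like, the sketch below applies one commonly used rule, chosen here as an assumption: a single frame label that disagrees with two identical neighbours is replaced by the neighbours' label. The function name `smooth_labels` is hypothetical.

```python
def smooth_labels(labels):
    """Replace an isolated label flanked by two identical neighbours.

    Comparisons use the original sequence, so one correction never
    cascades into the next position.
    """
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            smoothed[i] = labels[i - 1]
    return smoothed


# Per-frame classifier output with one spurious "music" decision.
raw = ["speech", "speech", "music", "speech", "speech", "silence", "silence"]
print(smooth_labels(raw))
```

Segment boundaries are then read off wherever the smoothed label changes, which is how a classifier's per-frame output becomes a segmentation.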

5.

A method for generating action instruction groups by fusing video and audio features

林大润陈俊洪王思涵钟经谋刘文印《计算机应用与软件》2023,(7):132-138+144

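To improve voice interaction between humans and robots, a neural network framework is proposed for classifying action triplets based on the fusion of video and audio features; in essence, it extracts from audio and video an instruction group that summarizes the action at a high level. The framework contains three modules: a video feature extraction network, an audio feature extraction network, and a feature fusion module. The video feature extraction network uses the I3D architecture to extract video features; the audio feature extraction network uses a convolutional neural network and a bidirectional long short-term memory network to extract audio features; the feature fusion module fuses the video and audio features and outputs the action-triplet classification. Experiments on a purpose-built action audio-video dataset show that the proposed audio-video feature fusion network reaches an accuracy of 74.92% and is fairly robust.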

6.

Automatic audio classification based on hidden Markov models  Cited by: 27 (self-citations: 0, by others: 27)

卢坚陈毅松孙正兴张福炎《软件学报》2002,13(8):1593-1597

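Automatic audio classification, especially the classification of speech and music, is one of the important means of extracting the structure and content semantics of audio, and is of great value in content-based audio retrieval, video retrieval and summarization, and spoken document retrieval. Because hidden Markov models capture the temporal statistical characteristics of audio signals well, an audio classification algorithm based on hidden Markov models is proposed for classifying speech, music, and their mixtures. Experimental results show that the hidden Markov model classifies audio well, with a best classification accuracy of 90.28%.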

7.

Audio classification method based on APR-SVM

王晓峰蒋先涛《微机发展》2012,(10):59-61,65

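Audio classification is widely used in multimedia applications, mainly through time-domain and frequency-domain analysis. This paper proposes an audio classification method based on the adaptive spacing ratio (APR) algorithm and the support vector machine (SVM) algorithm: the APR algorithm first separates speech from non-speech, and the non-speech audio is then classified with an SVM. The APR algorithm distinguishes speech from non-speech by comparing the PR parameter with a threshold, and it is closely related to the signal-to-noise ratio. Non-speech is divided into four groups (music, car, meeting, and rain sounds), from which feature factors are extracted. Experimental results show that the classifier designed in the paper reaches an accuracy of over 93.75% and separates the audio types well.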

8.

Hierarchical audio classification algorithm for news video content analysis

冀中苏育挺宋星光安欣《计算机应用研究》2009,26(5):1673-1675

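Proposes a hierarchical audio classification algorithm that combines rules with hidden Markov models. Rules are first used to divide the audio in news programs into three classes: silence, speech, and music. Hidden Markov models then subdivide speech and music into six classes: male anchor speech, female anchor speech, alternating reports, monologue speech, on-site speech, and music. Experimental results show that male anchor speech, female anchor speech, and music are classified best, with precision and recall both above 90%; alternating reports are classified worst, with a precision of 57.5% and a recall of 79.3%; the other classes fall in between, at roughly 70% to 90%. Compared with similar algorithms, the proposed algorithm has higher classification performance.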

9.

Grey relational analysis and speech/music signal recognition  Cited by: 1 (self-citations: 0, by others: 1)

陈功张雄伟《电子技术应用》2005,31(10):21-23

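Grey relational analysis is applied to the classification and recognition of speech/music signals, and the method and steps for performing grey relational analysis on audio signals are given. Reference data and comparison data for the targets are built from the probability statistics of the short-time root-mean-square energy of speech and music signals; grey relational analysis of the speech and music signals is then carried out, the decision criterion for target recognition and classification is determined, and the two kinds of signals are recognized. Simulation results show that applying grey relational analysis to audio signal classification and recognition is feasible to a certain degree.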
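
As a rough illustration of grey relational analysis, the sketch below computes Deng's grey relational grade between one reference sequence and several comparison sequences, using the customary distinguishing coefficient ρ = 0.5. Treating an unknown clip's feature histogram as the reference and class templates as the comparison sequences is an assumption made for this example; the feature values are invented and the abstract does not give the paper's exact setup.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence against the reference.

    coefficient(k) = (d_min + rho * d_max) / (|x0(k) - xi(k)| + rho * d_max),
    where d_min and d_max are taken over all candidates and all positions k.
    The grade is the mean coefficient over k.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Invented 3-bin energy histograms: unknown clip vs. speech and music templates.
unknown = [0.30, 0.50, 0.20]
templates = {"speech": [0.28, 0.52, 0.20], "music": [0.10, 0.30, 0.60]}
grades = grey_relational_grades(unknown, list(templates.values()))
best = max(zip(templates, grades), key=lambda pair: pair[1])[0]
print(best)
```

The class whose template has the highest grade is taken as the decision, which is one simple way to turn relational grades into a recognition criterion.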

10.

Audio classification method based on APR-SVM

王晓峰蒋先涛《计算机技术与发展》2012,(10)

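Audio classification is widely used in multimedia applications, mainly through time-domain and frequency-domain analysis. This paper proposes an audio classification method based on the adaptive spacing ratio (APR) algorithm and the support vector machine (SVM) algorithm: the APR algorithm first separates speech from non-speech, and the non-speech audio is then classified with an SVM. The APR algorithm distinguishes speech from non-speech by comparing the PR parameter with a threshold, and it is closely related to the signal-to-noise ratio. Non-speech is divided into four groups (music, car, meeting, and rain sounds), from which feature factors are extracted. Experimental results show that the classifier designed in the paper reaches an accuracy of over 93.75% and separates the audio types well.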

11.

Music emotion classification method for Chinese lyrics

王洁朱贝贝《计算机系统应用》2019,28(8):24-29

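Emotion is the most important semantic information in music, and music emotion classification is widely used in music retrieval, music recommendation, and music therapy. Traditional music emotion classification is mostly audio-based, but with current technology it is hard to extract semantically relevant features from audio. Lyric text carries emotional information, so combining lyrics with music emotion classification can further improve performance. This paper studies Chinese lyrics. A sound music emotion lexicon is the prerequisite and foundation of lyric sentiment analysis, so a Chinese emotion lexicon for the music domain is built with Word2Vec, and Chinese music emotion analysis is carried out using emotion-word weighting and part of speech. The paper first builds an emotion word list based on the VA emotion model, then extends it using Word2Vec word similarity to construct a Chinese music emotion lexicon that records the emotion category and emotion weight of each word. Emotion-word weights are then taken from the lexicon to build lyric feature vectors based on TF-IDF (Term Frequency-Inverse Document Frequency) and part of speech, and music emotion classification is finally performed. Experimental results show that the constructed lexicon is better suited to the music domain and that taking part of speech into account when building the feature vectors also improves accuracy.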
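
A minimal sketch of lexicon-weighted TF-IDF features follows. It uses one common TF-IDF variant (term frequency normalised by document length, idf = ln(N / df)); the abstract does not say which variant the authors use, and the optional `weights` argument standing in for emotion-lexicon weights, the toy tokens, and the function name `tfidf_vectors` are all assumptions for illustration. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tfidf_vectors(docs, weights=None):
    """Return (vocab, vectors) with tf * idf * optional per-term weight."""
    weights = weights or {}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[t] / len(doc)) * math.log(n / df[t]) * weights.get(t, 1.0)
            for t in vocab
        ])
    return vocab, vectors


# Toy tokenised "lyrics" and a hypothetical emotion weight for one word.
lyrics = [["happy", "sun", "sun"], ["sad", "rain"], ["happy", "rain"]]
vocab, vectors = tfidf_vectors(lyrics, weights={"happy": 2.0})
print(vocab)
```

The resulting vectors could be passed to any standard classifier; part-of-speech information, which the paper also uses, is left out of this sketch.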

12.

Improvement to speech-music discrimination using sinusoidal model based features

Jalil Shirazi Shahrokh Ghaemmaghami 《Multimedia Tools and Applications》2010,50(2):415-435

This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification.

13.

Adaptive network-based fuzzy inference system vs. other classification algorithms for warped LPC-based speech/music discrimination

《Engineering Applications of Artificial Intelligence》2007,20(6):783-793

Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an adaptive network-based fuzzy inference system (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called warped LPC-based spectral centroid (WLPC-SC), is also proposed. Comparison between WLPC-SC and the classical features proposed in the literature for audio classification is performed, aiming to assess the good discriminatory power of the proposed feature. The vector length used to describe the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). With the aim of increasing the classification accuracy percentage, the feature space is then transformed to a new feature space by LDA. The classification task is performed applying ANFIS to the features in the transformed space. To evaluate the performance of the ANFIS system for speech/music discrimination, comparison to other commonly used classifiers is reported. The classification results for different types of music and speech signals show the good discriminating power of the proposed approach.

14.

An analysis of content-based classification of audio signals using a fuzzy c-means algorithm

Mohammad A. Haque Jong-Myon Kim 《Multimedia Tools and Applications》2013,63(1):77-92

Content-based audio signal classification into broad categories such as speech, music, or speech with noise is the first step before any further processing such as speech recognition, content-based indexing, or surveillance systems. In this paper, we propose an efficient content-based audio classification approach to classify audio signals into broad genres using a fuzzy c-means (FCM) algorithm. We analyze different characteristic features of audio signals in time, frequency, and coefficient domains and select the optimal feature vector by employing a noble analytical scoring method to each feature. We utilize an FCM-based classification scheme and apply it on the extracted normalized optimal feature vector to achieve an efficient classification result. Experimental results demonstrate that the proposed approach outperforms the existing state-of-the-art audio classification systems by more than 11% in classification performance.
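
For readers unfamiliar with fuzzy c-means, here is a minimal one-dimensional sketch of the standard alternating update (memberships from inverse distances, centers from membership-weighted means). It is not the paper's classification scheme: the scalar feature values, the initial centers, the fuzzifier m = 2, and the fixed iteration count are all assumptions for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """1-D fuzzy c-means. Returns (centers, memberships[point][cluster])."""
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # Point coincides with a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if k == hit else 0.0 for k in range(len(centers))]
            else:
                # u_k is proportional to d_k ** (-2 / (m - 1)), normalised to sum to 1.
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        for k in range(len(centers)):
            w = [row[k] ** m for row in memberships]
            centers[k] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, memberships


# Two well-separated groups of a made-up scalar audio feature.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, memberships = fuzzy_c_means(values, centers=[1.0, 4.0])
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Each point receives a soft membership in every cluster; taking the largest membership gives a hard label when one is needed.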

15.

Speech and music classification using spectrogram based statistical descriptors and extreme learning machine

Birajdar Gajanan K. Patil Mukesh D. 《Multimedia Tools and Applications》2019,78(11):15141-15168


16.

Content-based audio classification and segmentation by using support vector machines  Cited by: 9 (self-citations: 0, by others: 9)

Lie Lu Hong-Jiang Zhang Stan Z. Li 《Multimedia Systems》2003,8(6):482-492

Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non-pure speech which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVM on different audio type-pairs classification with testing units of different lengths and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM). We also evaluated the effectiveness of some new proposed features. Experiments on a database composed of about 4-hour audio data show that the proposed classifier is very efficient on audio classification and segmentation. It also shows the accuracy of the SVM-based method is much better than the method based on KNN and GMM.

17.

A flexible and scalable audio information retrieval system for mixed‐type audio signals

Ebru Doğan Mustafa Sert Adnan Yazıcı 《International Journal of Intelligent Systems》2011,26(10):952-970

The content‐based classification and retrieval of real‐world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied in the last two decades, most of the current retrieval systems cannot provide flexible querying of audio clips due to the mixed‐type form (e.g., speech over music and speech over environmental sound) of audio information in real world. We present here a complete, scalable, and extensible content‐based classification and retrieval system for mixed‐type audio clips. The system gives users an opportunity for flexible querying of audio data semantically by providing four alternative ways, namely, querying by mixed‐type audio classes, querying by domain‐based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). In order to reduce the retrieval time, a hash‐based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results obtained from our experiments demonstrate that the Audio Spectrum Flatness feature in MPEG‐7 standard performs better in music audio samples compared to other kinds of audio samples and the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.

18.

Content-based audio retrieval: concepts and methods  Cited by: 38 (self-citations: 1, by others: 37)

李国辉李恒峰《小型微型计算机系统》2000,21(11):1173-1177

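In the past, a great deal of research was devoted to the retrieval of visual media such as images and video. Audio, however, is also a typical medium in multimedia and a common carrier of information. Conventional processing treats digital audio as an unstructured stream. Yet audio is the carrier of speech, contains rich auditory features, and has structural information, so it both needs to be and can be accessed on the basis of this content. Drawing on the progress of current related research, this paper surveys content-based audio retrieval methods, including retrieval oriented to speech, music, and audio analysis, as well as audio segmentation; it analyzes and summarizes audio content and its …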

19.

Hierarchical audio content classification system using an optimal feature selection algorithm

P. Krishnamoorthy Sarvesh Kumar 《Multimedia Tools and Applications》2011,54(2):415-444

This paper proposes a hierarchical time-efficient method for audio classification and also presents an automatic procedure to select the best set of features for audio classification using the Kolmogorov-Smirnov test (KS-test). The main motivation for our study is to propose a framework of general genre (e.g., action, comedy, drama, documentary, musical, etc.) movie video abstraction scheme for embedded devices, based only on the audio component. Accordingly simple audio features are extracted to ensure the feasibility of real-time processing. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise and silence. Audio classification is processed in three stages, (i) silence or environmental noise detection, (ii) speech and non-speech classification and (iii) pure music or songs and speech with background music classification. The proposed system has been tested on various real time audio sources extracted from movies and TV programs. Our experiments in the context of real time processing have shown the algorithms produce very satisfactory results.
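
One plausible way to use the KS-test for feature selection is to rank each feature by the two-sample Kolmogorov-Smirnov statistic between its values in two classes: the larger the gap between the empirical distributions, the better the feature separates the classes. The abstract does not describe the authors' exact procedure, so the ranking scheme, the feature names, and the sample values below are assumptions made for illustration.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical per-clip feature values for two classes: (speech, music).
features = {
    "zero_crossing_rate": ([0.10, 0.12, 0.15, 0.11], [0.30, 0.28, 0.35, 0.33]),
    "short_time_energy": ([0.50, 0.70, 0.60, 0.80], [0.55, 0.75, 0.65, 0.90]),
}
ranking = sorted(features, key=lambda name: ks_statistic(*features[name]), reverse=True)
print(ranking)
```

In this toy data the zero-crossing-rate values of the two classes do not overlap at all, so that feature ranks first. With SciPy available, `scipy.stats.ks_2samp` would also supply a p-value for the same statistic.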