期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

陈莎莎《自动化与仪器仪表》2022,(12):25-30

针对采集的三角钢琴踏板机械信号故障检测准确率低的问题,提出基于小波分析的三角钢琴踏板机械信号故障检测方法。采用频谱感知算法构建一个三角钢琴音频信号的多源信息采集模型,获取三角钢琴踏板机械音频信号;为提高钢琴踏板机械信号获取的准确率,通过频谱特征分离方法对采集信号进行降噪处理,采用小波包变换中的Mel倒谱系数MFCC对处理后的信号进行频谱特征分解;最后基于分解频谱特征进行三角钢琴音频信号的特征识别优化。实验表明,在相同的数据量下,对比于12维的MFCC频谱特征分解方法,提出的小波包分解的MFCC方法可分解出更多的钢琴频谱特征峰值,为后续三角钢琴踏板机械信号故障检测提供了有效的数据。实验发现,对比于传统卷积神经网络的音频特征识别方法,本方法的识别精度最高可达95.6%,且均保持在90及以上。综合分析可知,本方法可实现三角钢琴踏板机械信号故障准确识别。相似文献

2.

基于加权MFCC的音频检索 总被引：1，自引：0，他引：1

华斌张丽超赵富强《计算机工程与应用》2015,51(8):200-204

通过研究音频特征值提取和特征匹配算法,给出了一个完整的音频数据检索系统框架。该系统框架主要分析了音频特征提取和特征匹配。在音频特征提取部分对经典的MFCC系数进行了分析,提出了基于熵值法加权的MFCC系数,提高了检索的识别率。音频匹配部分根据特征参数矩阵表征音频信息的性质,引入了矩阵相似度的匹配方法,提高了检索效率。实验结果表明系统识别效率提高1.2%,用时降低22%,系统的性能得到明显改善。相似文献

3.

基于MFCC和神经网络的西瓜音频熟度识别

钱玲龙俞东芝杨义静《电子制作．电脑维护与应用》2018,(9)

本文研究了MFCC特征参数提取的算法原理和提取西瓜特征的方法,采用MFCC作为特征参数提取算法,通过PCA降维处理得到西瓜音频特征,并使用多种神经网络模型进行西瓜熟度的识别。实验结果表明MFCC提取的特征通过神经网络训练可以用于根据西瓜音频识别西瓜熟度。相似文献

4.

基于多核学习特征融合的语音情感识别方法

王忠民刘戈宋辉《计算机工程》2019,45(8):248-254

在语音情感识别中提取梅尔频率倒谱系数(MFCC)会丢失谱特征信息,导致情感识别准确率较低。为此,提出一种结合MFCC和语谱图特征的语音情感识别方法。从音频信号中提取MFCC特征,将信号转换为语谱图,利用卷积神经网络提取图像特征。在此基础上,使用多核学习算法融合音频特征,并将生成的核函数应用于支持向量机进行情感分类。在2种语音情感数据集上的实验结果表明,与单一特征的分类器相比,该方法的语音情感识别准确率高达96%。相似文献

5.

基于稀疏表示权重张量的音频特征提取算法

林静杨继臣张雪源李新超《计算机应用》2016,36(5):1426-1429

为了更好地描述非平稳音频信号的特征,提出了一种基于Gabor字典和稀疏表示权重张量的时-频音频特征提取方法。该方法基于Gabor字典将音频信号编码为稀疏的权重向量,并进一步将权重向量中的元素重新排列为张量形式,该张量各阶分别刻画了信号的时间、频率以及时长特性,为信号的联合时-频-长表示。通过对该张量进行因子分解,将分解后得到的频率因子和时长因子拼接为音频特征。针对稀疏张量分解时容易产生过拟合的问题,提出一种自调整惩罚参数分解算法并进行了改进。实验结果显示,所提出的特征相对于传统梅尔倒谱系数(MFCC)特征、MFCC特征及匹配追踪算法(MP)求解的特征联合拼接得到的MFCC+MP特征和非均匀尺度-频率图特征对15类音效分类效果分别提升了28.0%、19.8%和6.7%。相似文献

6.

基于SVM的流行音乐中人声的识别

石自强李海峰孙佳音《计算机工程与应用》2008,44(25):126-128

针对流行音乐中人声的发现问题,使用SVM分类器针对MFCC特征进行训练和分类。依据音频特征的连续性,后期对分类结果进行低通滤波。实验结果表明,该方法在帧层面上的识别率可以达到85.76%。实验中也发现不同语种的演唱者在发音上,特别是在MFCC特征上存在很大的统计差异性。实验中对歌曲分类的结果可以作为近一步实现音乐相似性度量的依据之一。相似文献

7.

一种面向基于内容视频检索的音频场景分割方法

朱映映明仲周景洲《小型微型计算机系统》2008,29(3):557-562

视频数据中的音频流包含了丰富的语义信息.在基于内容的视频检索中,对音频信息的分析是不可分割的一部分.本文主要讨论基于内容的音频场景分割,分析各种音频特征及提取方法,并在此基础上提出一种新的音频流分割方法,根据六种音频类型(语音、音乐、静音、环境音、纯语音、音乐背景下的语音和环境音背景下的语音)的音频特征对视频数据中的音频流分割音频场景.实验证明该方法是有效的,在保证一定的分割精度的同时,准确率和查全率都得到了较大的提高. 相似文献

8.

一种环境音频数据分类方法

魏丹芳李应《计算机与数字工程》2009,37(11):7-9,133

为了实现对环境音频信号及其相关数据的自动分析,提出了一种环境音频数据分类方法。分类过程中,首先用短时平均幅度对音频数据进行有效分段;然后,计算分段音频数据的长度和平均过零率;最后,计算并生成一个分段音频数据的Mel频率倒谱系数（MFCC）和一阶差分Mel频率倒谱系数（△MFCC）特征参数。分类操作上,根据有效分段的长度和平均过零率确定分类搜索的范围,并在局部范围内采用DTW（Dynamic Time Warping,动态时间规整）分类算法。实验结果验证了该方法对各种环境音频数据分类的有效性。相似文献

9.

有效的基于内容的音频特征提取方法 总被引：1，自引：1，他引：0

下载免费PDF全文

郑继明魏国华吴渝《计算机工程与应用》2009,45(12):131-133

音频特征提取是音频分类的基础,好的特征将会有效提高分类精度。在提取频域特征Mel频率倒谱系数（MFCC）的同时,对每一帧信号做离散小波变换,提取小波域特征,把频域和小波域特征相结合计算其统计特征。通过SVM模型建立音频模板,对纯语音、音乐及带背景音乐的语音进行分类识别,取得了较高的识别精度。相似文献

10.

基于分形特征的音频检索

李坚毛先领文贵华《计算机工程》2008,34(11):211-213

提出利用分形几何抽取音频特征的全局化音频检索,将其学习阶段计算音频数据库中每个音频的分维作为特征向量,保存在音频特征数据库中,并建立索引。其检索阶段则首先计算查询音频的分维,然后从音频数据库中快速找出分维最相似的若干音频对象。分维刻画了音频的内在属性如自相似性,使其具有片段检索对匹配的起点不敏感、抗噪音、检索速度快等优点。用FRACTAL, MFCC和SOLAR 3种方法对数据集分别检索,实验结果表明基于分维的音频检索在性能和时间复杂度上有显著优势。相似文献

11.

足球比赛视频中的精彩镜头分析方法

文军谢毓湘老松杨《计算机工程》2004,30(6):159-161

针对足球视频应用的实际需要，文章提出了一种通过视频、音频信息等多种特征的提取和融合来进行足球比赛视频分析的方法。该方法首先在视频中进行镜头探测，然后在单独的镜头中提取颜色特征，进行镜头分类；在所需要的镜头类型中对球场特征、运动对象特征和音频特征进行提取、融合，建立有效的分析方法，从而实现精彩镜头分析。相似文献

12.

Co-clustering for Auditory Scene Categorization

Rui Cai Lie Lu Hanjalic A. 《Multimedia, IEEE Transactions on》2008,10(4):596-606

Auditory scenes are temporal audio segments with coherent semantic content. Automatically classifying and grouping auditory scenes with similar semantics into categories is beneficial for many multimedia applications, such as semantic event detection and indexing. For such semantic categorization, auditory scenes are first characterized with either low-level acoustic features or some mid-level representations like audio effects, and then supervised classifiers or unsupervised clustering algorithms are employed to group scene segments into various semantic categories. In this paper, we focus on the problem of automatically categorizing audio scenes in unsupervised manner. To achieve more reasonable clustering results, we introduce the co-clustering scheme to exploit potential grouping trends among different dimensions of feature spaces (either low-level or mid-level feature spaces), and provide more accurate similarity measure for comparing auditory scenes. Moreover, we also extend the co-clustering scheme with a strategy based on the Bayesian information criterion (BIC) to automatically estimate the numbers of clusters. Evaluation performed on 272 auditory scenes extracted from 12-h audio data shows very encouraging categorization results. Co-clustering achieved a better performance compared to some traditional one-way clustering algorithms, both based on the low-level acoustic features and on the mid-level audio effect representations. Finally, we present our vision regarding the applicability of this approach on general multimedia data, and also show some preliminary results on content-based image clustering. 相似文献

13.

基于听觉外围模型的音频基频估计方法

刘鑫鲍长春《计算机工程与应用》2014,(17):29-33,67

针对音频信号中的暂态成分对基频检测可靠性的影响,提出了一种基于听觉外围模型的基频估计方法。该方法根据听觉外围模型来模拟声音在人耳内耳听觉神经上传导过程,并利用循环平均幅度差函数判断每个神经传导信号上呈现的时域周期性,进而提取音频信号的基频。实验结果表明,在纯净音频条件下,该方法能准确地估计出音频信号的基频,并且在不同音量打击乐信号的干扰下,所提方法的平均错误率低于三种参考方法。相似文献

14.

Film segmentation and indexing using autoassociative neural networks

K. Sreenivasa Rao Dipanjan Nandi Shashidhar G. Koolagudi 《International Journal of Speech Technology》2014,17(1):65-74

In this paper, Autoassociative Neural Network (AANN) models are explored for segmentation and indexing the films (movies) using audio features. A two-stage method is proposed for segmenting the film into sequence of scenes, and then indexing them appropriately. In the first stage, music and speech plus music segments of the film are separated, and music segments are labelled as title and fighting scenes based on their position. At the second stage, speech plus music segments are classified into normal, emotional, comedy and song scenes. In this work, Mel frequency cepstral coefficients (MFCCs), zero crossing rate and intensity are used as audio features for segmentation and indexing the films. The proposed segmentation and indexing method is evaluated on manual segmented Hindi films. From the evaluation results, it is observed that title, fighting and song scenes are segmented and indexed without any errors, and most of the errors are observed in discriminating the comedy and normal scenes. Performance of the proposed AANN models used for segmentation and indexing of the films, is also compared with hidden Markov models, Gaussian mixture models and support vector machines. 相似文献

15.

基于音频特征的多小波域水印算法 总被引：3，自引：0，他引：3

彭宏王珣王卫星王军胡德宇《计算机研究与发展》2010,47(2)

基于对音频特征的分析,提出了一种多小波域的水印算法.结合人类听觉系统的时频掩蔽特性,该算法分析音频帧的过零率及时域能量,确定用于嵌入水印的帧.利用音频的分抽样特征和多小波变换在信号处理中的优势,将每一个音频帧进行分抽样为两个子音频帧并分别将其变换到多小波域.利用两个子音频帧在多小波域的能量来估计所嵌入水印的容量,并根据它们的能量大小关系完成水印的嵌入.水印的提取过程转为一个使用支持向量机进行处理的二分类问题.实验结果验证了所提出的水印算法能根椐音频自身的特点寻找到适合用于嵌入水印的音频帧,且能动态调整水印的嵌入强度,在保证听觉质量的同时提高了水印的鲁棒性. 相似文献

16.

Audio-based description and structuring of videos

Hadi Harb Liming Chen 《International Journal on Digital Libraries》2006,6(1):70-81

相似文献

17.

基于提升小波与DCT的自适应音频水印算法

陈新龙李晓艳胡国庆沈嘉皓《计算机应用》2011,31(2):520-522

为了达到版权保护的目的,利用提升小波变换计算速度快、离散余弦变换（DCT）后直流系数的听觉容差性强的特点,提出了一种在提升小波域进行DCT的自适应音频水印算法。原始音频信号经提升小波变换后分解为低频子带和高频子带,对其低频子带进行DCT,将水印序列嵌入到DC系数上。考虑到水印音频信号的不可感知性和鲁棒性之间的平衡问题,采用了水印序列自适应调整嵌入。实验结果表明,该水印算法计算复杂度低,且对噪声、低通滤波等常见信号攻击及恶意替换操作均表现出很强的鲁棒性。相似文献

18.

自适应小波域数字音频水印嵌入算法研究

王向阳杨红颖赵红付斌《小型微型计算机系统》2006,27(7):1353-1357

以整型提升小波变换、人类听觉掩蔽特性、数字音频局部邻域特性为基础，提出了一种自适应小波域数字音频水印嵌入算法，该算法具有以下特点：（1）结合人类听觉系统的掩蔽特性，实现了水印嵌入位置的自适应确定；（2）引入了高效的整型提升小波变换；（3）利用数字音频的局部邻域特性，实现了数字水印嵌入深度的智能调节；（4）数字水印信息的提取不需要原始音频信号．对比实验表明：该自适应数字音频水印算法不仅具有较好的透明性，而且对诸如叠加噪声、有损压缩、低通滤波、重新采样、重新量化等攻击均具有较好的鲁棒性（特别是叠加噪声与低通滤波）。相似文献

19.

A Noise-Robust FFT-Based Auditory Spectrum With Application in Audio Classification 总被引：2，自引：0，他引：2

Wei Chu Champagne B. 《IEEE transactions on audio, speech, and language processing》2008,16(1):137-150

In this paper, we investigate the noise robustness of Wang and Shamma's early auditory (EA) model for the calculation of an auditory spectrum in audio classification applications. First, a stochastic analysis is conducted wherein an approximate expression of the auditory spectrum is derived to justify the noise-suppression property of the EA model. Second, we present an efficient fast Fourier transform (FFT)-based implementation for the calculation of a noise-robust auditory spectrum, which allows flexibility in the extraction of audio features. To evaluate the performance of the proposed FFT-based auditory spectrum, a set of speech/music/noise classification tasks is carried out wherein a support vector machine (SVM) algorithm and a decision tree learning algorithm (C4.5) are used as the classifiers. Features used for classification include conventional Mel-frequency cepstral coefficients (MFCCs), MFCC-like features obtained from the original auditory spectrum (i.e., based on the EA model) and the proposed FFT-based auditory spectrum, as well as spectral features (spectral centroid, bandwidth, etc.) computed from the latter. Compared to the conventional MFCC features, both the MFCC-like and spectral features derived from the proposed FFT-based auditory spectrum show more robust performance in noisy test cases. Test results also indicate that, using the new MFCC-like features, the performance of the proposed FFT-based auditory spectrum is slightly better than that of the original auditory spectrum, while its computational complexity is reduced by an order of magnitude. 相似文献

20.

Text-Like Segmentation of General Audio for Content-Based Retrieval

《Multimedia, IEEE Transactions on》2009,11(4):658-669

Automatic detection of (semantically) meaningful audio segments, or audio scenes, is an important step in high-level semantic inference from general audio signals, and can benefit various content-based applications involving both audio and multimodal (multimedia) data sets. Motivated by the known limitations of traditional low-level feature-based approaches, we propose in this paper a novel approach to discover audio scenes, based on an analysis of audio elements and key audio elements, which can be seen as equivalents to the words and keywords in a text document, respectively. In the proposed approach, an audio track is seen as a sequence of audio elements, and the presence of an audio scene boundary at a given time stamp is checked based on pair-wise measuring the semantic affinity between different parts of the analyzed audio stream surrounding that time stamp. Our proposed model for semantic affinity exploits the proven concepts from text document analysis, and is introduced here as a function of the distance between the audio parts considered, and the co-occurrence statistics and the importance weights of the audio elements contained therein. Experimental evaluation performed on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than the traditional low-level feature-based approaches in solving the posed audio scene segmentation problem. 相似文献