Similar Documents
19 similar documents found.
1.
王静宇  张纯  许枫 《计算机应用》2022,(S1):310-315
To detect birdsong signals in complex outdoor noise environments, an endpoint detection method based on auditory subband energy features, modeled on the human auditory system, is proposed. Using the Mel frequency scale, which reflects human auditory characteristics, the birdsong signal is divided into 24 frequency subbands (Mel subbands); the Mel subband energy distribution of the birdsong signal is analyzed, and the energy of the highest-energy Mel subband is taken as the feature for endpoint detection. Endpoint detection performance was compared against the short-time energy method on both simulated and field-recorded data. The results show that the Mel subband energy method can still detect birdsong at a signal-to-noise ratio (SNR) of -10 dB, is strongly resistant to island environmental noise such as wind and waves, and outperforms the short-time energy method.
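The per-band computation behind this kind of detector can be sketched as follows (a minimal illustration, assuming 24 equal-width bands on the Mel scale over 0–8 kHz and a precomputed one-frame power spectrum; function names are illustrative, not the paper's):

```python
import math

def hz_to_mel(f):
    # Standard HTK-style Hz -> Mel mapping.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(fmin, fmax, n_bands):
    # n_bands equal-width bands on the Mel scale, returned as Hz edge pairs.
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    edges = [mel_to_hz(lo + (hi - lo) * i / n_bands) for i in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))

def max_subband_energy(power_spectrum, freqs, bands):
    # Energy per Mel band; the paper tracks the highest-energy band per frame.
    energies = []
    for f_lo, f_hi in bands:
        e = sum(p for p, f in zip(power_spectrum, freqs) if f_lo <= f < f_hi)
        energies.append(e)
    return max(energies)

bands = mel_band_edges(0.0, 8000.0, 24)
freqs = [i * 100.0 for i in range(80)]              # toy spectrum bins, 0-7.9 kHz
power = [1.0 if f == 2000.0 else 0.0 for f in freqs]  # single tone at 2 kHz
peak_energy = max_subband_energy(power, freqs, bands)
```

Endpoint detection would then threshold `peak_energy` frame by frame.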

2.
For environmental sound classification (ESC), a convolutional neural network method based on multi-resolution features and time-frequency attention is proposed. First, compared with a single-resolution spectrogram, multi-channel multi-resolution features enrich the available information, let different feature resolutions complement one another, and strengthen feature expressiveness. Second, a time-frequency attention module is proposed for acoustic signals: one-dimensional convolutions of different sizes first attend separately to salient information in the time and frequency domains, and a two-dimensional convolution then fuses the two, suppressing background noise in environmental sounds and removing the redundancy introduced by the multi-channel multi-resolution features. Experiments show classification accuracies of 98.50% and 88.46% on the ESC-10 and ESC-50 benchmark datasets, improvements of 2.70% and 0.76% over the best existing methods.

3.
To improve how well deep convolutional neural networks extract genre features from music spectrograms, a music genre classification model based on spatial-domain feature attention, DCNN-SSA, is proposed. DCNN-SSA effectively annotates the genre features of different Mel spectrograms in the spatial domain and modifies the network structure, improving feature extraction while keeping the model effective, and thereby raising genre classification accuracy. First, the raw audio signal is Mel-filtered, emulating the filtering of the human ear to capture variations in loudness and rhythm, and the resulting Mel spectrograms are sliced and fed into the network. Then, the model's genre feature extraction is strengthened by deepening the network, changing the convolution structure, and adding a spatial attention mechanism. Finally, multiple rounds of training and validation on the dataset extract and learn genre features, yielding a model that classifies music genres effectively. Experiments on the GTZAN dataset show that, compared with other deep learning models, the spatial-attention genre classification algorithm improves both classification accuracy and model convergence, with accuracy gains of 5.36 to 10.44 percentage points.

4.
To address the weak feature-mining ability of spectrograms for music and the complexity and long training times of deep learning classifiers, a music genre classification model based on spectrogram augmentation and a convolutional broad learning system (CNNBLS) is designed. The model first augments Mel spectrograms by randomly masking some frequency channels, as in SpecAugment, then feeds the sliced Mel spectrograms into CNNBLS, whose convolutional layers incorporate the exponential linear unit (ELU) activation to improve classification accuracy. Compared with other machine learning frameworks, CNNBLS achieves high accuracy with little training time, and it can learn quickly from incremental data. Experiments show that the non-incremental CNNBLS model reaches 90.06% classification accuracy when trained on 400 music tracks, and the incremental model (Incremental-CNNBLS) reaches 91.53% after adding 400 more training tracks.
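The frequency-channel masking borrowed from SpecAugment can be sketched roughly as follows (a simplified illustration on a plain 2-D list standing in for a Mel spectrogram; names and defaults are assumptions, not the paper's code):

```python
import random

def freq_mask(spec, max_width, n_masks=1, rng=None):
    # spec: one row per Mel frequency channel, frames as columns.
    # Zero out up to `max_width` consecutive frequency channels,
    # as in SpecAugment's frequency masking.
    rng = rng or random.Random()
    n_mels = len(spec)
    out = [row[:] for row in spec]          # leave the input untouched
    for _ in range(n_masks):
        width = rng.randint(0, max_width)
        start = rng.randint(0, max(0, n_mels - width))
        for ch in range(start, start + width):
            out[ch] = [0.0] * len(out[ch])
    return out

spec = [[1.0] * 8 for _ in range(16)]       # toy 16-channel, 8-frame spectrogram
masked = freq_mask(spec, max_width=4, rng=random.Random(0))
```

Time masking is the same operation applied along the frame axis.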

5.
To handle the non-stationary nature of birdsong signals, a bird species recognition method based on the adaptive optimal kernel (AOK) time-frequency distribution is proposed. The recorded birdsong is first preprocessed, and AOK time-frequency analysis produces a spectrogram revealing how the energy of different species' calls is distributed over time and frequency. The spectrogram is then converted to a grayscale image, its gray-level co-occurrence matrix is computed, and image features derived from the co-occurrence matrix at different angles are extracted as recognition features. Finally, texture features of known species form training templates and those of the bird to be identified form a test template; dynamic time warping (DTW) matches the templates, and the template with the smallest matching score identifies the species. Experiments on 40 common bird species show an overall recognition rate of 96%.
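The template-matching step can be illustrated with a plain DTW implementation (a sketch with illustrative names; the paper matches co-occurrence texture features, here reduced to 1-D sequences for brevity):

```python
def dtw_distance(a, b):
    # Classic dynamic time warping between two 1-D feature sequences.
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def classify(test_feats, templates):
    # Pick the template with the smallest DTW distance, as in the paper.
    return min(templates, key=lambda name: dtw_distance(test_feats, templates[name]))

templates = {"species_a": [1.0, 2.0, 3.0], "species_b": [5.0, 5.0, 5.0]}
best = classify([1.0, 2.1, 3.0], templates)
```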

6.
Existing snore classification models generalize poorly and achieve low accuracy because they ignore the other sounds present during real sleep. To address this, an attention-based NewVGG16 bidirectional gated recurrent unit algorithm (NVGG16-BiGRU-Att) is proposed for snore recognition. First, a spectrogram is generated for each sound segment, and an NVGG16 network extracts a feature matrix composed of the spectrogram, the Mel time-frequency representation, and the constant-Q transform (CQT) time-frequency representation. Next, the extracted feature vectors are fed into a BiGRU combined with an attention mechanism, which increases the weight of important features during classification and improves the result. Finally, a fully connected layer outputs snore versus non-snore. Experiments on a collected snore dataset show good classification performance, with the Mel representation performing best at 96.18% recognition accuracy. Under the same feature inputs, the proposed algorithm improves accuracy by 0.31% to 2.39% over convolutional neural network (CNN) + long short-term memory (LSTM) and CNNs-LSTMs-deep neural network (DNN) models, confirming its robustness and improved classification performance.

7.
Arrhythmia detection is an important tool in diagnosing cardiovascular disease, and automatic arrhythmia classification has significant clinical value. To improve classification accuracy, a deep learning model combining a one-dimensional convolutional neural network (CNN) and an attention mechanism (CNN+Attention) is proposed. The CNN extracts one-dimensional time-domain features from the ECG signal. Because the time-domain features of a one-dimensional ECG sequence have limited representational power, the short-time Fourier transform (STFT) maps the signal to the time-frequency domain, attention captures global dependencies in the time-frequency representation, and the time-domain and time-frequency features are fused to classify five types of ECG signals. The model was validated on the MIT-BIH dataset, achieving average classification accuracy, precision, recall, sensitivity, and F1 score of 99.72%, 98.55%, 99.46%, 99.90%, and 99.00%, respectively, outperforming existing state-of-the-art methods.

8.
To address imbalanced sample distributions in speech emotion datasets, a speech emotion recognition method combining data balancing with an attention-based convolutional neural network (CNN) and long short-term memory (LSTM) network is proposed. The method first extracts log-Mel spectrograms from the speech samples and segments them according to the sample distribution to balance the data; a pretrained CNN is then fine-tuned on the segmented Mel spectrogram dataset to learn high-level segment-level speech features. Because different segments of an utterance contribute differently to emotion recognition, the learned segment-level CNN features are fed into an attention-equipped LSTM to learn discriminative features, and the LSTM and Softmax layers together classify the emotion. Experiments on the BAUM-1s and CHEAVD2.0 datasets show that the proposed method effectively improves speech emotion recognition performance.

9.
Bird species recognition based on MFCC and dual GMMs
Bird vocalizations are highly varied and complex. A bird species recognition method is therefore proposed that combines MFCC features with separate GMMs for calls and songs: vocalizations are split into calls and songs, MFCC features are extracted from each, and a dual GMM model is trained for recognition. Experiments on 1077 call and song samples from 8 bird species show that the dual GMM model achieves a recognition rate above 90%, higher than a single-vocalization model.

10.
To accurately extract the dominant frequency components of a signal, a time-frequency analysis of multi-component non-stationary acoustic signals was performed. The short-time Fourier transform maps the signal from the time domain to the time-frequency domain, and the dominant frequency components are read from the spectrogram. The analysis shows that each dominant frequency component of a multi-component non-stationary signal, along with its time-frequency characteristics, can be extracted accurately; the short-time Fourier transform is an effective tool for this task.
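Reading a dominant frequency off one spectrogram frame can be sketched with a naive DFT (illustrative only; a full STFT repeats this over overlapping windowed frames):

```python
import math

def dft_magnitudes(frame):
    # Magnitude spectrum of one frame (naive O(n^2) DFT, fine for a sketch).
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def dominant_frequency(frame, sample_rate):
    # Frequency bin with the largest magnitude, excluding the DC bin.
    mags = dft_magnitudes(frame)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(frame)

sr = 64
frame = [math.sin(2 * math.pi * 8 * t / sr) for t in range(sr)]  # pure 8 Hz tone
peak = dominant_frequency(frame, sr)
```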

11.
Bird sound recognition based on the Radon transform and translation-invariant wavelet transform
周晓敏  李应 《计算机应用》2014,34(5):1391-1396
To improve bird call recognition rates in low signal-to-noise ratio (SNR) environments, a noise-robust recognition technique is proposed that applies the Radon transform (RT) and the translation-invariant discrete wavelet transform (TIDWT) to spectrograms. First, an improved multi-band spectral subtraction denoises the bird calls; second, short-time energy detects and removes silent segments from the denoised signal; next, the remaining signal is converted to a spectrogram, RT and TIDWT are applied, and feature values are extracted; finally, a support vector machine (SVM) classifies the extracted features. Experiments show the method still recognizes calls well at SNRs of 10 dB and below.
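The short-time-energy silence removal in the second step can be sketched as follows (a minimal illustration; frame length, hop, and threshold are assumed values, not the paper's):

```python
def short_time_energy(signal, frame_len, hop):
    # Energy of each frame; silence detection thresholds these values.
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]

def drop_silence(signal, frame_len, hop, threshold):
    # Keep only frames whose short-time energy exceeds the threshold.
    kept = []
    for i in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[i:i + frame_len]
        if sum(x * x for x in frame) > threshold:
            kept.extend(frame)
    return kept

# Toy signal: silence, a short burst, silence.
sig = [0.0] * 8 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 8
voiced = drop_silence(sig, frame_len=4, hop=4, threshold=0.5)
```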

12.
Recent work has achieved good results on single-device acoustic scene classification, but progress on multi-device classification has been slow. To address the large differences in sample counts across devices, a paired feature fusion algorithm is proposed: the spectrogram differences of each paired sample are computed, accumulated, and averaged to obtain each device's mean spectral signature, which is then used to convert samples between devices. The algorithm increases the number of device samples while improving the model's generalization. To capture global information, a lightweight attention module is also proposed: input features are compressed along the frequency axis before self-attention, letting the model focus on the whole sound sequence at reduced computational cost. Experiments show the proposed algorithm compares favorably with other methods in both model size and classification accuracy.
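The paired averaging of spectrogram differences might look roughly like this (a simplified sketch on 2-D lists; the pairing scheme and all names are assumptions, not the paper's code):

```python
def mean_device_offset(pairs):
    # pairs: list of (spec_ref, spec_dev) spectrograms of the same recording
    # captured by the reference and target devices (same shape, 2-D lists).
    # Accumulate per-bin differences and average them over all pairs.
    rows, cols = len(pairs[0][0]), len(pairs[0][0][0])
    offset = [[0.0] * cols for _ in range(rows)]
    for ref, dev in pairs:
        for r in range(rows):
            for c in range(cols):
                offset[r][c] += dev[r][c] - ref[r][c]
    n = len(pairs)
    return [[v / n for v in row] for row in offset]

def convert(spec_ref, offset):
    # Shift a reference-device spectrogram toward the target device.
    return [[v + o for v, o in zip(row, orow)]
            for row, orow in zip(spec_ref, offset)]

pairs = [
    ([[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]),
    ([[0.0, 0.0], [0.0, 0.0]], [[3.0, 3.0], [3.0, 3.0]]),
]
offset = mean_device_offset(pairs)
```

Converted samples augment the under-represented device's training set.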

13.
In speech emotion recognition, most existing deep learning methods do not model features in both the time and frequency domains, and they suffer from long training times and limited accuracy. A spectrogram is a special image, derived from the speech signal, that carries both time- and frequency-domain information. To fully exploit its emotional content in both domains, a speech emotion recognition model based on parameter transfer and a convolutional recurrent neural network is proposed. The model takes the spectrogram as input, adopts the AlexNet architecture with its pretrained convolutional-layer weights transferred, reshapes the CNN's output feature maps, and feeds them into an LSTM (Long Short-Term Memory) network for training. Experiments show the method speeds up network training and improves emotion recognition accuracy.

14.
The key to studying birds in their natural habitat is continuous survey using wireless sensor networks (WSN). The final objective of this study is to build a system for monitoring threatened bird species using audio sensor nodes; the principal feature for their recognition is their sound. The main limitations of this process are environmental noise and energy consumption in the sensor nodes. Over the years, a variety of birdsong classification methods have been introduced, but very few have sought one adequate for WSN. In this paper, a tonal region detector (TRD) using a sigmoid function is proposed. This approach to noise power estimation offers flexibility, since the slope and mean of the sigmoid can be adapted autonomously for a better trade-off between noise overestimation and underestimation. Once the tonal regions in the noisy bird sound are detected, gammatone Teager energy cepstral coefficients (GTECC), post-processed by quantile-based cepstral normalization, are extracted from these signals and classified with a deep neural network. Experimental results on the identification of 36 bird species from Tonga lake (northeast of Algeria) demonstrate that the proposed TRD-GTECC feature is highly effective and performs satisfactorily compared with the popular front-ends considered in this study. Moreover, recognition performance, noise immunity, and energy consumption improve considerably after tonal region detection, indicating that the approach is well suited to acoustic bird recognition on wireless sensor nodes in complex environments.
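The adjustable sigmoid at the heart of the TRD can be written directly (a sketch; the parameter names are mine, and how the detector adapts them is described only qualitatively in the abstract):

```python
import math

def sigmoid(x, mean=0.0, slope=1.0):
    # Sigmoid with a tunable midpoint (`mean`) and steepness (`slope`);
    # the TRD adapts both to trade off noise over- vs under-estimation.
    return 1.0 / (1.0 + math.exp(-slope * (x - mean)))
```

A larger `slope` sharpens the tonal/non-tonal decision boundary, while `mean` shifts where that boundary sits.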

15.
陈晓  曾昭优 《测控技术》2024,43(6):21-25
To raise birdsong recognition accuracy with a small number of parameters, a new bird sound recognition method is proposed, comprising feature optimization and crow search-support vector machine (SVM) classification. The method first applies principal component analysis to select among the Mel Frequency Cepstrum Coefficients (MFCC) and inverted MFCC extracted from the bird sounds; the optimized feature parameters become the input of the recognition algorithm. The crow search algorithm then tunes the SVM's kernel parameter and penalty value, yielding an improved SVM for birdsong classification. Experiments show the method recognizes 5 bird species with 92.2% accuracy, with the best results at a feature dimension of 16. The method offers a feasible route to automatic birdsong recognition in the field.
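The PCA-based feature reduction step can be sketched with NumPy (an illustration under assumed dimensions, e.g. 26 combined MFCC/inverted-MFCC features reduced to 16; this is generic PCA, not the authors' exact pipeline):

```python
import numpy as np

def pca_reduce(X, k):
    # Project feature vectors (rows of X) onto the top-k principal
    # components of their covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues, ascending
    top = vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 26))   # 40 toy samples, 26 raw feature dims
Z = pca_reduce(X, 16)           # reduced to the 16 dims the paper found best
```

The reduced vectors `Z` would then feed the crow-search-tuned SVM.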

16.
An intelligent sound-based early fault detection system is proposed for vehicles using machine learning. The system detects faults at an early stage by analyzing the sound emitted by the car; early detection and correction of defects can improve the efficiency and life of the engine and other mechanical parts. A microphone captures the sound emitted by the vehicle, and a machine-learning algorithm analyzes it to detect faults. Binary classification is done first to differentiate between faulty and healthy cars. We collected noisy and normal sound samples of the car engine under normal and various abnormal conditions from multiple workshops and had the data verified by experts. Time-domain, frequency-domain, and time-frequency-domain features were used to correctly detect the normal and abnormal conditions of the vehicle, and the abnormal car data were further classified into fifteen classical vehicle problems. We experimented with various signal processing techniques and present the comparative results. In detection and subsequent problem classification, random forest achieved the highest results, 97% and 92%, with time-frequency features.

17.
Diagnosing early pathological states of the heart by analyzing heart sound signals is of great importance. A heart sound classification method based on a deep convolutional neural network is proposed. Heart sound signals are converted into Mel Frequency Spectral Coefficient (MFSC) feature maps, which carry time-frequency information, and serve as the network's input; the network is trained on these maps with a center loss function to obtain the optimal deep learning model. At test time, a heart sound signal is converted into multiple two-dimensional MFSC feature maps, each is classified by the trained model, and majority voting decides the signal's class. Because manually labeled training samples are limited, constraining training accuracy, the training set is augmented by randomly masking the two-dimensional MFSC feature maps along both the time and frequency axes. Experiments on the PASCAL heart sound dataset show that the method clearly outperforms the best existing methods in classifying normal, murmur, and extrasystole heart sounds.

18.
Over the past decade, frog biodiversity has rapidly declined due to many problems, including habitat loss and degradation, introduced invasive species, and environmental pollution. Frogs matter greatly to the global ecosystem, and monitoring their biodiversity is ever more necessary. One way to monitor frog biodiversity is to record audio of frog calls, and various methods have been developed to classify these calls. However, to the best of our knowledge, no paper yet reviews and summarizes these methods. This survey gives a quantitative and detailed analysis of frog call classification. Specifically, a frog call classification system consists of signal pre-processing, feature extraction, and classification. Signal pre-processing comprises signal processing, noise reduction, and syllable segmentation. Feature extraction follows and is the most crucial step for improving classification performance. Features used for frog call classification fall into four types: (1) time-domain and frequency-domain features (grouped together because they are often combined for higher classification accuracy), (2) time-frequency features, (3) cepstral features, and (4) other features. For the classification step, the different classifiers and evaluation criteria used for frog call classification are investigated. In conclusion, we discuss future work for frog call classification.

19.
Zhao  XiaoMing  Wang  Xinxin  Cheng  De 《Multimedia Tools and Applications》2020,79(31-32):23045-23069

Inspired by the perceptual characteristics of the human auditory system and the mechanisms of saliency detection, we study the constraint relating the time-frequency characteristics of sound signals to multiple spectrograms, and propose a co-saliency detection method for multiple sound signals. According to the auditory characteristics of the human ear, distinctive saliency features from the acoustic channel and the image channel are fused, and an auditory saliency map is obtained to detect significant sounds. The saliency features of the acoustic channel are computed in the temporal and spectral domains of the signal: the temporal saliency features are represented by local maxima of the Power Spectral Density (PSD) curve, and the spectral features by local maxima of the signal's Mel Frequency Cepstrum Coefficient (MFCC) curve. The acoustic-channel saliency features are fused across scales with the contrast cue of the spectrogram, which agrees more closely with the human auditory attention mechanism. Finally, combined with the corresponding cue reflecting the distribution across multiple spectrograms, the method captures globally repeated, frequently occurring characteristics. Experiments with the auditory co-saliency map verify the accuracy and robustness of the proposed method, showing it is superior to other traditional auditory saliency detection methods and can perform intelligent automatic detection of sound signals.
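The temporal-saliency step, which takes local maxima of the PSD curve, reduces to a simple peak picker (a sketch on toy values; the names are illustrative):

```python
def local_maxima(curve):
    # Indices of strict local maxima; the paper takes these on the PSD
    # (and MFCC) curves as saliency points of the acoustic channel.
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]

psd = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4, 0.4]   # toy PSD curve
peaks = local_maxima(psd)
```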


