Similar literature
1.
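To improve how well deep convolutional neural networks extract genre features from music spectrograms, a music genre classification model based on spectral spatial-domain feature attention, DCNN-SSA, is proposed. DCNN-SSA marks the genre features of different music mel spectrograms in the spatial domain and modifies the network structure, improving feature extraction while keeping the model effective, and thereby raising genre classification accuracy. First, the raw audio signal is mel-filtered, simulating the filtering of the human ear, to filter the intensity and rhythm variations of the music; the resulting mel spectrogram is cut into slices and fed to the network. Next, the model's genre feature extraction is strengthened by deepening the network, changing the convolution structure, and adding a spatial attention mechanism. Finally, the model is trained and validated on the dataset over multiple batches to extract and learn genre features, yielding a model that classifies music genres effectively. Experiments on the GTZAN dataset show that, compared with other deep learning models, the spatial-attention-based algorithm improves both classification accuracy and convergence, with accuracy gains of 5.36 to 10.44 percentage points.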
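The mel-filtering and slicing steps in this abstract can be illustrated with a small sketch. It is not the DCNN-SSA code: the HTK-style mel formula, the function names, and the non-overlapping slicing policy are all assumptions chosen for illustration, and the spectrogram is represented as a plain list of frames.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to mel using the HTK-style formula (an assumed choice)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Return the n_bands + 2 edge frequencies (Hz) of triangular filters
    spaced evenly on the mel scale between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

def slice_frames(spectrogram, slice_len):
    """Cut a list of frames into non-overlapping slices of slice_len frames,
    dropping any incomplete slice at the end (hypothetical slicing policy)."""
    n_slices = len(spectrogram) // slice_len
    return [spectrogram[i * slice_len:(i + 1) * slice_len] for i in range(n_slices)]
```

Band edges spaced evenly in mel are dense at low frequencies and sparse at high ones, which is the sense in which mel filtering approximates the frequency resolution of the human ear.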

2.

To improve the efficiency and accuracy of collecting theater music data and recognizing genres, and to address the problem that a single feature in traditional algorithms cannot classify music genres efficiently, data collection based on the Internet of Things (IoT) combined with a Deep Belief Network (DBN) is proposed for music data collection and genre recognition. The study first introduces the IoT-based scheme for collecting music data and the theoretical basis of genre recognition and classification with a DBN. Second, it focuses on constructing and improving the DBN-based genre recognition algorithm: the algorithm is optimized by adding Dropout and momentum to the network, and the best network model after training is retained. Finally, the effectiveness of the algorithm is confirmed experimentally. The results show that the optimized algorithm reaches an efficiency of 75.8% in identifying music genres in the music library, which the authors report as far better than traditional classic algorithms. The study concludes that IoT- and DBN-based music data collection and genre recognition has strong advantages: it reduces the workload of manually collecting and identifying classification features and improves efficiency. The optimized model can also be extended to other fields.
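As a rough illustration of the two optimizations named here, the sketch below shows a classical momentum update and inverted dropout on plain Python lists. It is not the paper's DBN; the learning rate, momentum value, and function names are assumptions.

```python
import random

def momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One classical momentum update: v <- momentum * v - lr * g, w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop and scale
    the survivors by 1 / (1 - p_drop) so the expected value is unchanged."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Example: two momentum steps on a single weight with a constant gradient.
w, v = momentum_step([1.0], [2.0], [0.0], lr=0.1)
w, v = momentum_step(w, [2.0], v, lr=0.1)
```

Momentum accumulates past gradients so that consistent directions speed up, while dropout randomly silences units during training to reduce overfitting.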


3.
4.
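In music genre classification, when the local genre features of a piece are inconsistent with its overall features, the commonly used method of taking the majority vote over local features (MaxVote) yields low accuracy on audio segments, and when the genre feature distribution is fairly balanced its classification results are unreasonable. To address these problems, this paper proposes a neural network voting mechanism based on the genre distribution features of music segments (NNVote) and a RhythmNNVote voting method that incorporates high-level rhythm features. Experiments show that NNVote reaches an overall classification accuracy of 68.9% on 7 genres, nearly 10% higher than MaxVote.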
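A minimal sketch of the contrast drawn here: MaxVote keeps only the most frequent segment label, whereas a distribution-based voter works from the full per-genre proportions. The neural network that NNVote trains on such distributions is not shown, and the function names are hypothetical.

```python
from collections import Counter

def max_vote(segment_labels):
    """MaxVote baseline: the genre predicted for the most segments wins."""
    return Counter(segment_labels).most_common(1)[0][0]

def genre_distribution(segment_labels, genres):
    """Fraction of segments assigned to each genre, in the order of `genres`.
    A learned voter could take this vector as input instead of only its argmax."""
    counts = Counter(segment_labels)
    total = len(segment_labels)
    return [counts[g] / total for g in genres]

labels = ["rock", "rock", "pop", "jazz", "pop", "rock"]
winner = max_vote(labels)
distribution = genre_distribution(labels, ["rock", "pop", "jazz"])
```

When the distribution is nearly flat, the argmax discards most of the evidence, which is the failure case the abstract attributes to MaxVote.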

5.
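The strength, tempo, and duration of musical beats are important semantic features that reflect the style of different genres, and beats mostly lie in the low-frequency part produced by percussion instruments. The music signal is therefore decomposed with a 6-level wavelet transform to extract low-frequency beat features. For genres whose beat features differ little, the method combines MFCC acoustic features, which describe the frequency-domain energy envelope, with the beat features, and replaces the commonly used 12th-order MFCC with an 8th-order MFCC chosen through an analysis of the mechanisms of music genres. Simulation experiments on 8 genres show that combining semantic and acoustic features achieves an overall classification accuracy of up to 68.37%, while the increased feature dimensionality has very little effect on classification time.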
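The multi-level decomposition can be sketched as follows. The abstract does not name the wavelet, so the Haar wavelet is used here purely as an assumption because it is the simplest to write out; the function names are also hypothetical.

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform.
    Returns (approximation, detail); a trailing odd sample is dropped."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail

def wavelet_decompose(signal, levels):
    """Repeatedly split the low-frequency approximation `levels` times.
    Returns the final approximation and the detail bands, finest first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# A 6-level decomposition of a 64-sample signal leaves one low-frequency coefficient.
low_band, detail_bands = wavelet_decompose([1.0] * 64, 6)
```

Each level halves the bandwidth of the approximation, so after six levels the remaining coefficients describe only the lowest band, where the abstract locates the beat information.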

6.
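To address the weak ability of machine learning models to recognize music genre features, a music genre recognition model based on a deep convolutional neural network (DCNN-MGR) is proposed. The model first extracts audio information with the fast Fourier transform, generates a spectrogram that can be fed to the DCNN, and cuts it into spectrogram slices. AlexNet is then enhanced by combining the leaky rectified linear (Leaky ReLU) function, the hyperbolic tangent (Tanh) function, and a Softplus classifier. Next, the spectrogram slices are fed into the enhanced AlexNet for multiple batches of training and validation to extract and learn music features, producing a network model that distinguishes music features effectively. Finally, the output model is tested on music genre recognition. Experiments show that the enhanced AlexNet clearly outperforms AlexNet and other commonly used DCNNs in music feature recognition accuracy and network convergence, and that DCNN-MGR improves genre recognition accuracy by 4% to 20% over other machine learning models.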
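For reference, the three functions named in this abstract have the standard scalar definitions sketched below. How they are combined inside the enhanced AlexNet is not described in the abstract, and the leak slope `alpha` is an assumed default.

```python
import math

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, small slope alpha otherwise."""
    return x if x > 0 else alpha * x

def tanh(x):
    """Hyperbolic tangent, squashing inputs into (-1, 1)."""
    return math.tanh(x)

def softplus(x):
    """Softplus log(1 + e^x), written in a form that does not overflow:
    max(x, 0) + log1p(e^{-|x|})."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Softplus is a smooth approximation of ReLU, and Leaky ReLU keeps a non-zero gradient for negative inputs, which is a common motivation for swapping them into an older architecture.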

7.
孙辉, 许洁萍, 刘彬彬. 《计算机应用》 (Journal of Computer Applications), 2015, 35(6): 1753-1756
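To address the problem of choosing the best kernel function for different feature vectors, multiple-kernel learning support vector machines (MKL-SVM) are applied to automatic music genre classification, and a method is proposed that combines the optimal kernel functions with weights into a composite kernel for genre classification. Multiple-kernel learning can use a different optimal kernel for each acoustic feature and learns the weight of each kernel in the classification, which makes explicit the weight of each acoustic feature in genre classification and gives a clear result for analyzing and selecting feature vectors. The proposed MKL-SVM method was validated on the ISMIR 2011 competition dataset and compared with the traditional single-kernel SVM approach. Experiments show that MKL-SVM improves automatic genre classification accuracy by 6.58% over the traditional single-kernel SVM. Compared with traditional feature selection results, the method also explains more clearly how much each selected feature vector influences genre classification, and classifying with the combination of the most influential features improves results markedly.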
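The composite kernel described here is a weighted sum of per-feature kernels, K(x, y) = Σ_i w_i K_i(x_i, y_i). The sketch below evaluates such a sum with fixed weights. In multiple-kernel learning the weights are learned jointly with the SVM, which is not shown; the choice of linear and RBF kernels and all names are assumptions.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(x_groups, y_groups, kernels, weights):
    """Weighted sum of kernels, one kernel per feature group.
    x_groups and y_groups hold one feature vector per group (for example
    one per acoustic feature type); weights are fixed here, not learned."""
    return sum(
        w * k(xg, yg)
        for w, k, xg, yg in zip(weights, kernels, x_groups, y_groups)
    )

# Two feature groups: a 2-D group scored linearly and a 1-D group scored with RBF.
value = combined_kernel(
    [[1.0, 2.0], [0.0]],
    [[3.0, 4.0], [0.0]],
    [linear_kernel, rbf_kernel],
    [0.25, 0.75],
)
```

Because each weight multiplies the kernel of one feature group, the learned weights can be read as the relative importance of that group, which is the interpretability benefit the abstract emphasizes.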

8.
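Action recognition is an important research topic in video understanding within computer vision. Accurately extracting features of human actions from video and recognizing the actions can provide important information for fields such as healthcare and security, making it a very promising direction. From a data-driven perspective, this paper comprehensively reviews the development of action recognition research and systematically describes representative methods and models. Action recognition data is divided into RGB, depth, skeleton, and fused modalities. The paper first introduces the main pipeline of action recognition and the public datasets for the different data modalities in human action recognition. Then, organized by data modality, it reviews methods based on traditional handcrafted features and on deep learning for the RGB, depth, and skeleton modalities, as well as multimodal fusion methods that combine RGB with depth and methods that fuse other modalities. Traditional handcrafted-feature methods include those based on spatiotemporal volumes and spatiotemporal interest points (RGB modality), those based on motion change and appearance (depth modality), and those based on skeleton features (skeleton modality). Deep learning methods mainly involve convolutional networks, graph convolutional networks, and hybrid networks, with emphasis on their improvements, characteristics, and innovations. Different action recognition techniques are compared according to the datasets of each modality. After comparison both within and across categories, the paper derives the strengths, weaknesses, and applicable scenarios of the different modalities, the differences between handcrafted-feature and deep learning methods, and, for multimodal fusion, the ...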

9.
The aim of this article is to investigate whether separating music tracks at the pre-processing phase and extending the feature vector with parameters related to the specific musical instruments characteristic of a given musical genre allows for efficient automatic musical genre classification on a database containing thousands of music excerpts and a dozen genres. Results of extensive experiments show that the proposed approach to music genre classification is promising. Overall, combining parameters derived from both the original audio and a mixture of separated tracks improves classification effectiveness measures, demonstrating that the proposed feature vector and the Support Vector Machine (SVM) with a Co-training mechanism are applicable to a large dataset.

10.
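Objective: Food images have variable structure, strong background interference, small inter-class differences, and large intra-class differences, which makes them harder to recognize than ordinary fine-grained images. Food image recognition and classification still suffer from low accuracy and poor generalization. To improve accuracy and make full use of the global and local detail in food images, this paper proposes a fine-grained food image recognition model with a multi-level convolutional feature pyramid. Method: The model extracts features level by level from the whole to the local, discarding heavily interfering background information and extracting features only from the food target region. It consists of three parts, a food feature extraction network, an attention region localization network, and a feature fusion network, and uses a cascade of three levels of food feature extraction networks to move features from global to local. In addition, because food images vary widely in scale, a feature pyramid structure is added to each level of the food feature extraction network, improving robustness to target size. Results: On the mainstream public food image datasets Food-101, ChineseFoodNet, and Food-172, the model achieves Top-1 accuracies of 91.4%, 82.8%, and 90.3% respectively, 1% to 8% higher than existing methods. Conclusion: This paper proposes a multi-level convolutional neural network model for food image recognition that automatically locates the most discriminative regions of a food image and fuses its global and local features, achieving fine-grained recognition and effectively improving recognition accuracy. Experiments show that the model achieves the best results on current mainstream food image datasets.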

11.
Composer classification based on deep learning   Total citations: 1; self-citations: 0; citations by others: 1
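In music information retrieval, composer classification is an important problem whose goal is to identify the composer from audio data. Traditional classification algorithms rely on extracting complex features, whereas deep neural networks are comparatively strong at feature learning, so deep neural networks are proposed for this problem. To combine the strengths of different deep neural network models, a hybrid model is designed based on a deep belief network (DBN) and a stacked denoising autoencoder (SDA), which handles composer classification well. Experiments show that the model achieves an accuracy of 76.26%, better than deep neural networks built from a single model type and better than support vector machines. As with image data, the human brain also extracts music features hierarchically, with each layer processing the signal differently, so the hybrid model has certain advantages for composer classification.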

12.
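Music is an important carrier of emotion, and music emotion recognition is widely applied across fields. Current music emotion research faces many problems, including scarce music emotion datasets, difficulty quantifying emotion, and limited recognition accuracy. How to use artificial intelligence methods to recognize the emotional tendency of music effectively and with high quality has become a current research hotspot and challenge. This paper summarizes the state of research on music emotion recognition from three aspects: music emotion datasets, music emotion models, and music emotion classification methods...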

13.
Temporal feature integration is the process of combining all the feature vectors in a time window into a single feature vector in order to capture the relevant temporal information in the window. The mean and variance along the temporal dimension are often used for temporal feature integration, but they capture neither the temporal dynamics nor the dependencies among the individual feature dimensions. Here, a multivariate autoregressive feature model is proposed to solve this problem for music genre classification. This model gives two different feature sets, the diagonal autoregressive (DAR) and multivariate autoregressive (MAR) features, which are compared against the baseline mean-variance features as well as two other temporal feature integration techniques. Reproducibility in the performance ranking of temporal feature integration methods was demonstrated using two data sets with five and eleven music genres, and by using four different classification schemes. The methods were further compared to human performance. The proposed MAR features perform better than the other features at the cost of increased computational complexity.
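The difference between the mean-variance baseline and an autoregressive summary can be sketched as follows. The per-dimension AR(1) fit below only gestures at the diagonal (DAR) idea: the model order, the least-squares estimator, and the function names are assumptions, and the full multivariate (MAR) model with cross-dimension terms is not shown.

```python
def mean_var_features(frames):
    """Baseline integration: per-dimension mean followed by per-dimension
    (population) variance over all frames in the window."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]
    return means + variances

def ar1_coefficients(frames):
    """Per-dimension least-squares AR(1) coefficient on the mean-removed
    series, i.e. the a minimizing sum_t (c[t] - a * c[t-1])^2."""
    n, dims = len(frames), len(frames[0])
    coefficients = []
    for d in range(dims):
        series = [f[d] for f in frames]
        mu = sum(series) / n
        c = [v - mu for v in series]
        num = sum(c[t] * c[t - 1] for t in range(1, n))
        den = sum(c[t - 1] ** 2 for t in range(1, n))
        coefficients.append(num / den if den > 0 else 0.0)
    return coefficients

# An alternating series and its reverse have the same mean and variance,
# but the AR coefficient records that consecutive frames flip sign.
window = [[1.0], [-1.0], [1.0], [-1.0]]
baseline = mean_var_features(window)
dynamics = ar1_coefficients(window)
```

Shuffling the frames in a window leaves the mean and variance unchanged but alters the autoregressive coefficients, which is why the latter can capture temporal dynamics.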

14.
Liu  Caifeng  Feng  Lin  Liu  Guochao  Wang  Huibing  Liu  Shenglan 《Multimedia Tools and Applications》2021,80(5):7313-7331

Music genre classification based on visual representation has been successfully explored in recent years, and there has been increasing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods employ mature CNN structures proposed for image recognition without any modification, so the learned features are not adequate for music genre classification. To address this issue, we fully exploit the low-level information in audio spectrograms and develop a novel CNN architecture in this paper. The proposed architecture takes multi-scale time-frequency information into consideration, passing more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. The experiments are evaluated on benchmark datasets including GTZAN, Ballroom, and Extended Ballroom. The results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2% respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the model trained with the proposed network is tiny, only 0.18M, so it can be deployed on mobile phones or other devices with limited computational resources. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.


15.
In this paper, music genre taxonomies are used to design hierarchical classifiers that perform better than flat classifiers. More precisely, a novel method based on sequential pattern mining techniques is proposed to extract relevant characteristics from which a vector representation of music genres is built. From this representation, the agglomerative hierarchical clustering algorithm is used to produce music genre taxonomies. Experiments are carried out on the GTZAN dataset for performance evaluation. A second evaluation was made on GTZAN augmented with Afro genres. The results show that the hierarchical classifiers obtained with the proposed taxonomies reach accuracies of 91.6% (more than 7% higher than the performance of the existing hierarchical classifiers).
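The taxonomy-building step can be sketched with a small agglomerative clustering routine. The genre vectors below are made-up toy values rather than mined sequential patterns, and average linkage with Euclidean distance is an assumed choice that the abstract does not specify.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def build_taxonomy(genre_vectors):
    """Merge the two closest clusters until one remains.
    Returns the taxonomy as nested tuples of genre names."""
    clusters = {name: [vec] for name, vec in genre_vectors.items()}
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                d = average_linkage(clusters[names[i]], clusters[names[j]])
                if best is None or d < best[0]:
                    best = (d, names[i], names[j])
        _, a, b = best
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))

# Toy genre vectors (hypothetical values, for illustration only).
taxonomy = build_taxonomy({
    "rock": [0.0, 0.0],
    "metal": [0.0, 1.0],
    "jazz": [10.0, 10.0],
    "blues": [10.0, 12.0],
})
```

Each internal node of the resulting tree can host its own classifier, so a hierarchical classifier first separates coarse groups and then distinguishes the genres within each group.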

16.
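For automatic music genre classification, a hot topic in music information retrieval, the concept of multimodal music genre classification is proposed. For the feature selection step in traditional genre classification based on low-level acoustic features, a new feature selection algorithm is implemented: a forward feature selection algorithm based on the mutual influence among features (IBFFS). The LDA (latent Dirichlet allocation) model is used, for the first time, to process music tags, converting the probability that a tag belongs to each genre into the probability that the corresponding piece of music belongs to each genre.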

17.
吴克伟, 高涛, 谢昭, 郭文斌. 《软件学报》 (Journal of Software), 2022, 33(5): 1865-1879
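Existing action recognition methods that model the overall temporal structure of a video suffer from interference by temporal noise and ambiguous information, which causes action categories to be misrecognized. To address this, a new temporal graph model with Grenander inference (TGM-GI) is proposed. First, a 3D CNN-LSTM module is built, in which the 3D CNN extracts dynamic action features and the LSTM module refines the temporal dependencies of those features. Second, on top of the deep module, a temporal graph model for action recognition is constructed using Grenander theory, and two modules are designed to handle the temporal redundancy of slow actions and the interference of abnormal actions respectively, yielding temporal structure proposals with temporal noise suppressed. Next, a Grenander measure that fuses feature constraints and semantic constraints is designed, and a temporally incremental form of the Viterbi algorithm is proposed to correct ambiguous information in the temporal patterns of actions. Finally, pattern matching based on dynamic time programming completes action recognition from the temporal patterns. On two widely recognized datasets, UCF101 and Olympic Sports, the method achieves the best recognition accuracy compared with a range of existing deep learning based methods. It outperforms the baseline 3D CNN-LSTM method; on the UCF101 dataset, recognition...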
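The final matching step aligns temporal patterns of different lengths. Assuming that the dynamic-time-based pattern matching mentioned here resembles classic dynamic time warping (an assumption; the abstract gives no details), a minimal scalar-sequence version looks like this, with hypothetical names throughout.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences, using
    absolute difference as the local cost and the standard three-way
    recurrence (insertion, deletion, match)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# A pattern and a slowed-down copy of it align with zero cost.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Because repeated frames can be absorbed at no extra cost, this kind of alignment tolerates actions performed at different speeds, which matters when slow actions introduce temporal redundancy.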

18.
何新宇, 张晓龙. 《计算机应用》 (Journal of Computer Applications), 2019, 39(6): 1680-1684
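Current pneumonia image recognition algorithms face two problems. First, the transfer learning model used as the pneumonia feature extractor was trained on a source dataset whose images differ substantially from pneumonia images, so the extracted features do not fit pneumonia images well. Second, the softmax classifier used by these algorithms does not handle high-dimensional features well, leaving room to improve recognition accuracy. To address both problems, a pneumonia image recognition model based on a deep convolutional neural network is proposed. First, a GoogLeNet Inception V3 network trained on the ImageNet dataset is used for feature extraction; second, a feature fusion layer is added and a random forest classifier is used for classification. Experiments were run on the Chest X-Ray Images standard pneumonia dataset. The results show that the model's recognition accuracy, sensitivity, and specificity reach 96.77%, 97.56%, and 94.26% respectively. Compared with the classic GoogLeNet Inception V3+Data Augmentation (GIV+DA) algorithm, the proposed model improves accuracy and sensitivity by 1.26 and 1.46 percentage points respectively, and its specificity approaches the best result of the GIV+DA algorithm.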
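The three reported metrics follow directly from the binary confusion matrix. The sketch below uses the standard definitions with made-up counts; it does not reproduce the paper's figures, and the function name is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives), and specificity
    (recall on negatives) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 8 true positives, 1 false positive,
# 9 true negatives, 2 false negatives.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Reporting sensitivity and specificity separately matters in medical screening because a missed pneumonia case and a false alarm carry very different costs.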

19.
Surface defect recognition is important for improving the surface quality of end products. Many methods in this area are based on convolutional neural networks (CNNs) because CNNs can extract features automatically. The extracted features determine recognition performance, so it is important for CNN-based methods to extract effective and sufficient features. However, feature extraction needs a large-scale dataset, which is hard to obtain. To save the cost of collecting samples and still extract effective features, ensemble methods were proposed that make full use of the features extracted by a CNN in order to guarantee good performance with limited samples. These methods, however, are confined to a single sample: they extract multi-level features from one individual sample but ignore the vast information in the dataset. Because the information in one sample is limited, this paper turns its attention to the training dataset and attempts to mine the multi-level information in the dataset for prediction. The proposed method, named prototype vectors fusion-based CNN (ProtoCNN), utilizes the prototype information in the training dataset. During training, it trains a VGG11 as the base model while generating prototype vectors for each defect class in multiple feature layers of VGG11. During prediction, the prototype vectors are fused to classify unknown samples. Experiments on three well-known datasets, NEU-CLS, a wood dataset, and a textile dataset, indicate that ProtoCNN outperforms conventional ensemble models and other models for surface defect recognition. On these datasets, ProtoCNN achieves accuracies of 99.86%, 90.01%, and 81.28% respectively, increases of 1.05%, 4.07%, and 19.53% over its base model. Finally, this paper analyzes the effectiveness and practicality of prototype vectors, showing that the proposed ProtoCNN is practical for real-world application.
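The prototype idea can be illustrated in its simplest form: one prototype per class, computed as the mean feature vector, with prediction by nearest prototype. This is a single-layer sketch under those assumptions; ProtoCNN's generation of prototypes in multiple VGG11 layers and its fusion step are not reproduced, and the names and toy features are hypothetical.

```python
def class_prototypes(features, labels):
    """Mean feature vector per class (assumed definition of a prototype)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(feature, prototypes):
    """Return the class whose prototype is closest in squared Euclidean distance."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(feature, prototypes[label]))
    return min(prototypes, key=squared_distance)

# Toy 2-D features for two hypothetical defect classes.
train_x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
train_y = ["scratch", "scratch", "pit", "pit"]
prototypes = class_prototypes(train_x, train_y)
label = predict([1.0, 1.5], prototypes)
```

A prototype summarizes every training sample of a class, so even with few samples per class the prediction draws on dataset-level information rather than on one sample alone.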

20.