
With the rise of convolutional neural networks (CNNs), neural style transfer on images has become a very active research topic: the style of one image can be transferred to another through a CNN, so that the result retains its own content while taking on the style of the other image. In this work, we propose an algorithm for audio style transfer that uses a CNN to generate new audio from a style audio clip. We use the Continuous Wavelet Transform (CWT) to convert the audio into a spectrogram, treat the spectrogram as an image representation of the audio, apply an image style transfer method to obtain a new image, and finally generate audio using iterative phase reconstruction with the Griffin-Lim algorithm. We succeeded in transferring audio such as light music but had difficulty with audio that has lyrics and with high-level characteristics such as emotion or tone. We propose several measures to improve audio quality, and extensive experimental results show that our method outperforms other methods in terms of sound quality.


Spammers often embed text in images to evade text-based spam filters, which results in a large number of advertising spam images. Spam image recognition has become one of the hotspots in Internet spam filtering research. Its goal is to address the sharp performance decline, or even outright failure, of traditional spam filtering methods when they are applied to spam images. Based on a clustering algorithm, this paper proposes a method for expanding the data samples that greatly increases the number of high-quality training samples and meets the needs of model training. We then train a convolutional neural network on the enlarged sample set to recognize spam in real time. The experimental results show that the accuracy of the model increases by more than 14% after applying the proposed data augmentation method, and by 6% compared with other data augmentation methods. Combining convolutional neural networks with the proposed data augmentation method, our spam filtering model is 7–11% more accurate than the traditional method.

Object classification is a vital part of any video analytics system and can support complex applications such as object monitoring and management. Traditional video analytics systems rely on shallow networks and cannot harness the power of distributed processing for training and inference. We propose a cloud-based video analytics system that uses an optimally tuned convolutional neural network to classify objects from video streams. The tuning of the convolutional neural network is powered by in-memory distributed computing. Classification is performed by comparing the target object with the prestored trained patterns, which produces a set of matching scores. Matching scores greater than an empirically determined threshold reveal the class of the target object. The proposed system proved robust to classification errors, with an accuracy of 97% and a precision of 96%, and can be used as a general-purpose video analytics system.
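The matching-score step described in this abstract can be illustrated with a minimal sketch. The pattern labels, the scores, and the threshold of 0.8 are hypothetical; the abstract states only that the threshold is determined empirically.

```python
def classify_by_score(scores, threshold):
    """Return the labels whose matching score exceeds the threshold, best first.

    `scores` maps a stored pattern label to its matching score for the
    target object. All names and values here are illustrative assumptions.
    """
    matches = [(label, s) for label, s in scores.items() if s > threshold]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in matches]


# Hypothetical scores for one target object against three stored patterns.
example_scores = {"car": 0.91, "person": 0.35, "truck": 0.84}
labels = classify_by_score(example_scores, threshold=0.8)
```

An empty result means that no stored pattern matched the target object strongly enough to assign a class.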

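Face tracking in real, highly variable scenes remains challenging because of illumination changes, occlusion, and scale variation. This work investigates face tracking based on a convolutional neural network (CNN). The basic CNN is modified into a Siamese network that is trained end to end on the OTB dataset: it takes a pair of image regions as input and outputs the distance between them, and this distance is used to assess how similar the regions are. A bounding box regression algorithm is added to fine-tune the tracking result. Experimental results show that the modified network outperforms the conventional CNN and achieves better face tracking.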


A large amount of research on convolutional neural networks (CNNs) has focused on flat classification in the multi-class domain. In the real world, many problems are naturally expressed as hierarchical classification problems, in which the classes to be predicted are organized in a class hierarchy. In this paper, we propose a new architecture for hierarchical classification that introduces a stack of deep linear layers trained with cross-entropy loss functions combined with a center loss function. The proposed architecture can extend any neural network model. It simultaneously optimizes loss functions that discover local hierarchical class relationships and a loss function that discovers global information from the whole class hierarchy while penalizing hierarchy violations. We show experimentally that our hierarchical classifier has advantages over traditional classification approaches in computer vision tasks. The same approach can also be applied to some CNNs for text classification.
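As a rough illustration of combining per-level cross-entropy terms with a center loss, the following sketch computes such a loss for a single sample. The abstract does not say how the terms are weighted, so the weighting factor `lam`, the two-level hierarchy, and all numbers are assumptions rather than the authors' formulation.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class given predicted probabilities."""
    return -math.log(probs[target])


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def combined_loss(level_probs, level_targets, feature, center, lam=0.1):
    """Sum of per-level cross-entropy terms plus a weighted center loss.

    `lam` is an assumed weighting factor, not a value from the abstract.
    """
    ce = sum(cross_entropy(p, t) for p, t in zip(level_probs, level_targets))
    return ce + lam * center_loss(feature, center)


# Two hierarchy levels (coarse, fine) for one sample; all numbers are made up.
loss = combined_loss(
    level_probs=[[0.5, 0.5], [0.25, 0.25, 0.5]],
    level_targets=[0, 2],
    feature=[1.0, 2.0],
    center=[0.0, 0.0],
    lam=0.1,
)
```

The cross-entropy terms push each level toward its correct class, while the center-loss term pulls the feature vector toward the center of its class.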


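Objective: Image super-resolution algorithms are widely needed and studied in practical applications. However, traditional example-based super-resolution algorithms all represent low-resolution image patches with simple image gradient features, which struggle to distinguish different low-resolution patches effectively. To address this problem, and building on traditional example-based super-resolution, we propose an algorithm in which a two-channel convolutional neural network learns the similarity between low-resolution and high-resolution image patches. Method: A deep convolutional neural network first learns an effective similarity measure between low-resolution and high-resolution image patches. The corresponding high-resolution patch is then reconstructed according to the similarity between the input low-resolution patch and the atoms of a high-resolution patch dictionary. Results: On the Set5 and Set14 datasets at 3× magnification, the algorithm achieves average peak signal-to-noise ratios (PSNR) of 32.53 dB and 29.17 dB, respectively. Conclusion: The algorithm approaches image super-resolution by learning the similarity between low-resolution and high-resolution image patches. It preserves edge information in the result image better and reduces ringing artifacts. The algorithm is well suited to super-resolution enhancement of natural scene images.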
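For readers unfamiliar with the PSNR figures quoted in this abstract, the following sketch shows the standard definition, PSNR = 10·log10(MAX² / MSE). The flat-list image representation and the toy pixel values are assumptions made to keep the example self-contained; they are not data from the paper.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat lists of pixel intensities.
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel example: every pixel is off by 5, so the MSE is 25.
value = psnr([100, 110, 120, 130], [105, 115, 125, 135])
```

A higher PSNR means the reconstructed image is closer to the reference.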

Qian, Yinlong; Dong, Jing; Wang, Wei; Tan, Tieniu. Multimedia Tools and Applications, 2018, 77(15): 19633-19657
Multimedia Tools and Applications - Traditional steganalysis methods usually rely on handcrafted features. However, with the rapid development of advanced steganography, manual design of complex...

Wang, Hanxiang; Li, Yanfen; Dang, L. Minh; Ko, Jaesung; Han, Dongil; Moon, Hyeonjoon. Multimedia Tools and Applications, 2020, 79(39-40): 29411-29431
Multimedia Tools and Applications - The rapid urbanization process is escalating the urban waste problem, and ineffective management has worsened the issue, leading to severe consequences to the...

Multimedia Tools and Applications - Eye pupil localization is one of the indispensable technologies in various computer vision applications such as virtual reality and augmented reality. In...

Multimedia Tools and Applications - Automatic classification of fruit freshness plays an important role in the agriculture industry. In this work, we propose an ensemble model that combines the...

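A convolutional neural network method for aircraft target classification in remote sensing images (total citations: 2; self-citations: 0; citations by others: 2)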
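Objective: Aircraft target classification in remote sensing images, which uses visible-light remote sensing imagery to distinguish aircraft types effectively, is important for providing military operational information. Some traditional machine learning methods exist for this problem, but they require manual feature extraction and adapt poorly to the complex backgrounds of real remote sensing images. In recent years deep convolutional neural networks have emerged; such networks learn image features automatically, generalize well, and are widely used across computer vision. However, they have rarely been applied to aircraft classification in remote sensing images. This paper aims to apply a deep convolutional neural network to that problem. Method: In the absence of a public dataset, data for 8 types of aircraft were collected from real visible-light remote sensing images and split into a training set and a test set at a ratio of roughly 4:1, and the training set was then suitably augmented. A 5-layer convolutional neural network was then designed specifically for the characteristics of remote sensing images and aircraft classification, drawing on deep learning theory for convolutional neural networks. Results: First, the network was trained on progressively augmented training sets and evaluated each time on the same test set. The experiments show that augmenting the training set benefits training: test accuracy rose from 72.4% to 97.2%. On the augmented training set, a classic traditional machine learning method, the classic convolutional neural network LeNet-5, and the network designed in this paper were each trained and then evaluated on the same test set. The designed network achieved higher classification accuracy than the other two methods, reaching 97.2% on the test set, compared with 82.3% and 88.7% for the other two. Conclusion: For aircraft target classification in remote sensing images, where deep convolutional neural networks have rarely been used, this paper designs and applies a 5-layer convolutional neural network. The experimental results show that the network adapts to the image scenes, learns features automatically, and classifies well.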

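Age and gender recognition from face images is an important face analysis task, and performing it in real, highly variable scenes remains challenging. This work improves a deep convolutional neural network (CNN): the large convolution kernel in the first layer is replaced with cascaded 3×3 kernels; skip-connected convolutional layers fuse mid-level and high-level abstract features; Batch Normalization (BN) layers are added, with a relatively high learning rate and a relatively small Dropout ratio; and 1×1 convolution kernels with Global Average Pooling replace the fully connected layers. Experiments show that the proposed method achieves good recognition rates compared with mainstream age and gender recognition methods: on the Adience dataset, age recognition accuracy reaches 89.8% and gender recognition accuracy reaches 93.3%.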
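The idea of replacing one large first-layer kernel with cascaded 3×3 kernels can be checked with a little arithmetic. The sketch below compares receptive field and weight count under assumed values: a 7×7 original kernel, stride 1, and 64 input and output channels in every layer. None of these figures comes from the abstract.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with equal input/output channels (no bias)."""
    return kernel_size * kernel_size * channels * channels


# Assumed example: a single 7x7 layer versus three cascaded 3x3 layers.
rf_cascade = stacked_receptive_field([3, 3, 3])
single = conv_weights(7, 64)
cascade = 3 * conv_weights(3, 64)
```

Under these assumptions the cascade covers the same 7×7 receptive field with fewer weights and two additional nonlinearities between the layers.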

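Objective: In recent years, as face recognition authentication has developed and spread, it has become very common for large numbers of face photos to be stored on third-party servers, which makes protecting facial privacy a pressing problem. Method: Face images are first preprocessed. The Arnold transform is then used to randomly scramble the key facial regions block by block, and the scrambled image is fed into a deep convolutional neural network. To deal with uneven blocks caused by the shooting angle of the photo, feature points are located from the eyes during preprocessing and the image is aligned accordingly, so that the eyes lie on the same horizontal line after preprocessing. For facial privacy protection and recognition of the scrambled images, this paper proposes a deep convolutional neural network model based on block-wise random scrambling. Excluding auxiliary layers, the network consists of 4 convolutional layers, 3 pooling layers, 1 fully connected layer, and 1 softmax regression layer. The server uses the deep neural network model to verify and recognize the scrambled face images directly. Results: With this algorithm the server never stores the original face template at any stage, which effectively protects the original face images through scrambling. In experiments, the deep convolutional neural network was used to recognize the processed ORL face database and reached a final recognition accuracy of 97.62%. Several groups of comparative experiments also verify the effectiveness of the method. Conclusion: Compared with methods in the literature that extract features by hand and train with decision trees and random forests, the proposed method reduces manual feature-extraction work and achieves a high recognition rate.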
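The Arnold transform mentioned in this abstract is a reversible pixel permutation. The sketch below applies the commonly used form of the map, (x, y) → ((x + y) mod N, (x + 2y) mod N), to a single square block. How the paper selects blocks, randomizes them, and chooses the iteration count is not stated in the abstract, so this shows only the basic map, with an assumed (row, column) indexing convention.

```python
def arnold_scramble(block, iterations=1):
    """Apply the Arnold cat map to a square block given as a list of rows."""
    n = len(block)
    if any(len(row) != n for row in block):
        raise ValueError("block must be square")
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Pixel at (x, y) moves to ((x + y) mod n, (x + 2y) mod n).
                scrambled[(x + y) % n][(x + 2 * y) % n] = block[x][y]
        block = scrambled
    return block


# Toy 2x2 block; real use would operate on blocks of face-image pixels.
once = arnold_scramble([[1, 2], [3, 4]])
```

Because the map is a permutation of pixel positions, repeated application eventually restores the original block, which is what makes the scrambling reversible.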

Learning effectiveness is normally analyzed using data collected through tests or questionnaires, so instant feedback is usually not available. Learners' facial emotions are positively related to their learning motivation. A system that identifies learners' facial emotions can therefore provide feedback that helps teachers understand students' learning situation and offer help or improve their teaching strategy. Studies have found that convolutional neural networks perform well in basic facial emotion recognition. Unlike traditional machine learning, convolutional neural networks do not require manually designed features; they automatically learn the necessary features from the entire image. This article improves the accuracy of the FaceLiveNet network in basic emotion recognition and proposes the Dense_FaceLiveNet framework. We use Dense_FaceLiveNet for two-phase transfer learning. First, a basic emotion recognition model trained on the relatively simple JAFFE and KDEF data is transferred to the FER2013 basic emotion dataset, obtaining an accuracy of 70.02%. Second, the FER2013 basic emotion recognition model is transferred to the learning emotion recognition model. The test accuracy reaches 91.93%, which is 12.9 percentage points higher than the 79.03% obtained without transfer learning, showing that transfer learning can effectively improve the recognition accuracy of the learning emotion recognition model. In addition, to test the generalization ability of the learning emotion recognition model, videos recorded by students from a national university in Taiwan during class were used as test data. The original learning emotion database did not account for exceptions such as covered eyebrows, closed eyes, and a hand holding the chin. To improve on this, images of these exceptions were added to the learning emotion database and the model was rebuilt, reaching a recognition accuracy of 92.42%. Comparing the output maps shows that the rebuilt model successfully learns the characteristics of images with eyebrows, chins, and closed eyes. Furthermore, after all the students' image data were combined with the original learning emotion database, the model was rebuilt and reached an accuracy of 84.59%. The result shows that the learning emotion recognition model can achieve high recognition accuracy on previously unlearned images through transfer learning. The main contribution is a two-phase transfer learning design for building the learning emotion recognition model, which overcomes the problem of having only a small amount of learning emotion data. Our experimental results demonstrate the performance improvement obtained with two-phase transfer learning.

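To address the shortage of training samples, a method for detecting lung nodules in chest X-ray images based on convolutional neural networks and transfer learning is proposed. Using the Keras deep learning framework, the classification performance of three pre-trained convolutional neural network models is compared and analyzed, and the effectiveness of transfer learning is then explored further. Validated on the public JSRT dataset, the proposed method achieves an accuracy of 93.75%, a sensitivity of 94.36%, a specificity of 92.74%, and an AUC of 98.20%. Compared with other existing studies, it achieves the highest sensitivity and a relatively low false positive rate, which verifies the effectiveness of transfer learning and the feasibility of the proposed algorithm.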
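The accuracy, sensitivity, and specificity reported in this abstract follow the usual confusion-matrix definitions, sketched below. The counts in the example are invented for illustration and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts: 100 images with nodules, 100 without.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

The false positive rate mentioned in the abstract is one minus the specificity.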

Lu, Yao; Lu, Guangming; Li, Jinxing; Zhang, Zheng; Xu, Yuanrong. Neural Computing & Applications, 2021, 33(14): 8635-8648
Neural Computing and Applications - Recently, the group convolutions are widely used in mobile convolutional neural networks (CNNs) to improve the model’s efficiency. However, the training...

Wang, Xiaowei; Cheng, Maowei; Wang, Yefu; Liu, Shaohui; Tian, Zhihong; Jiang, Feng; Zhang, Hongjun. Multimedia Tools and Applications, 2020, 79(23-24): 15813-15827

In recent years, artificial intelligence (AI) has developed rapidly, and more and more modern technologies are being applied to surveillance and monitoring. Healthcare monitoring is becoming ubiquitous in modern wearable devices such as smart watches, electrocardiogram (ECG) necklaces, and smart bands. Many sensors are attached to these devices to record and monitor physiological signals produced by activity; the recorded electrical data are then passed on for further processing to support health diagnosis, disease prevention, or automatic distress calls. Obstructive sleep apnea (OSA) is a sleep disorder that is common in adults and is regarded as an independent risk factor for circulatory problems such as ischemic heart attacks and stroke. Numerous traditional neural-network-based methods have been developed to detect OSA, but they fail to deliver the intended results because they rely on shallow networks. In this paper, we propose an effective OSA detection method based on a convolutional neural network (CNN). Our method first extracts features from Apnea-ECG recordings using RR intervals (the time from one R-wave to the next in an ECG signal). A CNN model with three convolution layers and three fully connected layers is then trained on the extracted features and applied to OSA detection. The first two convolution layers are each followed by batch normalization and a pooling layer, and a softmax is connected to the last fully connected layer to give the final decision. Experimental results on features extracted from Apnea-ECG signals show that our model performs better in terms of sensitivity, specificity, and accuracy. The technology is expected to be applicable to smart sensors, especially wearable devices.
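The RR-interval feature described in this abstract is the time between consecutive R-waves. The sketch below assumes that R-peak timestamps (in seconds) have already been detected; peak detection itself, and any further feature processing the authors apply, are outside its scope. The timestamps are made up.

```python
def rr_intervals(r_peak_times):
    """Return successive R-to-R intervals, in seconds, from R-peak timestamps."""
    return [
        later - earlier
        for earlier, later in zip(r_peak_times, r_peak_times[1:])
    ]


# Hypothetical R-peak timestamps in seconds.
peaks = [0.0, 0.5, 1.5, 2.0]
intervals = rr_intervals(peaks)
```

A sequence of n R-peaks yields n − 1 intervals, and such a sequence is the kind of feature the abstract describes feeding into the CNN.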



Skin cancer accounts for one-third of all diagnosed cancers worldwide, and its prevalence has been rising over the past decades. In recent years, dermoscopy has enhanced the ability to diagnose skin cancer. Accurate diagnosis remains challenging for dermatologists because multiple skin cancer types can look similar. Dermatologists have an average accuracy of 62% to 80% in skin cancer diagnosis. The research community has made significant progress in developing automated tools to assist dermatologists in decision making. In this work, we propose an automated computer-aided diagnosis system for multi-class skin (MCS) cancer classification with exceptionally high accuracy. The proposed method outperformed both expert dermatologists and contemporary deep learning methods for MCS cancer classification. We performed fine-tuning over the seven classes of the HAM10000 dataset and conducted a comparative study to analyse the performance of five pre-trained convolutional neural networks (CNNs) and four ensemble models. We report a maximum accuracy of 93.20% for an individual model and a maximum accuracy of 92.83% for an ensemble model. We propose using ResNeXt101 for MCS cancer classification owing to its optimized architecture and its ability to achieve higher accuracy.


Multimedia Tools and Applications - Image retargeting is the task of making images capable of being displayed on screens with different sizes. This work should be done so that high-level visual...
