Multimedia Tools and Applications - Today, image editing software has evolved greatly, making the semantic manipulation of images much easier. On the other hand, the...
Multimedia Tools and Applications - In recent years, the emergence of fully convolutional neural networks (FCNs) has delivered significant success in the field of saliency detection. Although the...
Multimedia Tools and Applications - Nowadays, convolutional neural network (CNN) based steganalysis methods have achieved great performance. However, these methods also face security problems. In...
Despite the tremendous achievements of deep convolutional neural networks (CNNs) in many computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step understanding method, namely the Salient Relevance (SR) map, which aims to shed light on how deep CNNs recognize images and learn features from areas, referred to as attention areas, therein. Our proposed method starts with a layer-wise relevance propagation (LRP) step that estimates a pixel-wise relevance map over the input image. Next, we construct a context-aware saliency map, the SR map, from the LRP-generated map; it predicts areas close to the foci of attention instead of the isolated pixels that LRP reveals. In the human visual system, region-level information is more important than pixel-level information for recognition, so our approach closely simulates human recognition. Experimental results on the ILSVRC2012 validation dataset with two well-established deep CNN models, AlexNet and VGG-16, clearly demonstrate that our approach identifies not only key pixels but also the attention areas that contribute to the underlying network's comprehension of the given images. As such, the proposed SR map constitutes a convenient visual interface which unveils the visual attention of the network and reveals which types of objects the model has learned to recognize after training. The source code is available at https://github.com/Hey1Li/Salient-Relevance-Propagation.
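A minimal sketch of the two-step idea follows, assuming a pixel-wise LRP relevance map has already been computed; the region-level aggregation here is an illustrative stand-in (simple Gaussian smoothing), not the paper's exact context-aware saliency construction.

```python
# Turn a pixel-wise LRP relevance map into a region-level "attention" map.
# The smoothing-based aggregation is an illustrative approximation.
import numpy as np
from scipy.ndimage import gaussian_filter

def salient_relevance_map(lrp_relevance: np.ndarray, sigma: float = 8.0) -> np.ndarray:
    """lrp_relevance: HxW pixel-wise relevance from an LRP backward pass."""
    r = np.maximum(lrp_relevance, 0.0)          # keep positive evidence only
    region = gaussian_filter(r, sigma=sigma)    # aggregate isolated pixels into areas
    region -= region.min()
    if region.max() > 0:
        region /= region.max()                  # normalize to [0, 1]
    return region

# Usage: sr = salient_relevance_map(lrp_map); threshold sr to obtain attention areas.
```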
Depth-image-based rendering (DIBR) is widely used in 3DTV, free-viewpoint video, and interactive 3D graphics applications. Typically, synthetic images generated by DIBR-based systems incorporate various distortions, particularly geometric distortions induced by object dis-occlusion. Ensuring the quality of synthetic images is critical to maintaining adequate system service. However, traditional 2D image quality metrics are ineffective for evaluating synthetic images as they are not sensitive to geometric distortion. In this paper, we propose a novel no-reference image quality assessment method for synthetic images based on convolutional neural networks, introducing local image saliency as prediction weights. Due to the lack of existing training data, we construct a new DIBR synthetic image dataset as part of our contribution. Experiments were conducted on both the public benchmark IRCCyN/IVC DIBR image dataset and our own dataset. Results demonstrate that our proposed metric outperforms traditional 2D image quality metrics and state-of-the-art DIBR-related metrics.
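As a sketch of the saliency-weighted prediction step, the per-patch quality scores produced by the CNN can be pooled into a single image score using local saliency as weights; variable names below are hypothetical, and the actual network and saliency model are not reproduced here.

```python
# Combine per-patch quality scores into one image score, weighting each patch
# by its local saliency.
import numpy as np

def weighted_quality_score(patch_scores: np.ndarray, patch_saliency: np.ndarray) -> float:
    """patch_scores, patch_saliency: 1-D arrays, one entry per image patch."""
    w = patch_saliency / (patch_saliency.sum() + 1e-8)
    return float(np.dot(w, patch_scores))
```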
Surface defect detection is an important way to improve the production quality of voltage-dependent resistors (VDRs). To improve the accuracy and efficiency of VDR surface quality detection, an end-to-end surface quality detection method based on deep convolutional neural networks (CNNs) was proposed. The method includes four stages: data preparation, CNN design, CNN training, and testing. First, images of VDRs were acquired from three perspectives, i.e., the front, back, and side, and then training, validation, and testing sets were obtained. Second, the proposed CNN models for VDR surface defect detection were constructed. Third, during the training stage, images with class labels from the established training sets were fed to the proposed network for training and validation. Finally, in the testing stage, test images from a total of 408 samples of two VDR models were used to test the trained network. The sensitivity, specificity, accuracy, precision, and F-measure of the proposed algorithm were compared with those of state-of-the-art methods, and the experimental results showed that the proposed method has high recognition speed and accuracy and meets the requirements of online real-time detection.
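The abstract does not specify the layer configuration, so the following is only a generic small image classifier of the kind used for such defect detection tasks; all layer sizes and the two-class output are assumptions for illustration.

```python
# A generic small CNN for VDR surface-image classification (defect vs. normal).
# Layer sizes and number of classes are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class DefectCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# logits = DefectCNN()(torch.randn(8, 3, 128, 128))  # e.g. a batch of 128x128 RGB views
```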
In recent years, AI (Artificial Intelligence) has achieved great development in modern society, and more and more modern technologies are applied to surveillance and monitoring. Healthcare monitoring is becoming ubiquitous in wearable devices such as smart watches, electrocardiogram (ECG) necklaces, and smart bands. Many sensors are attached to these smart devices to record and monitor physiological signals caused by activities; the recorded electrical data are then further processed to support health diagnosis, disease prevention, or automatic distress calls. Obstructive sleep apnea (OSA) is a sleep disorder with a high occurrence in adults and is regarded as an independent risk factor for circulatory problems such as ischemic heart attack and stroke. Numerous traditional neural-network-based methods have been developed to detect OSA, but these methods often fail to provide the intended results because they rely on shallow networks. In this paper, we propose an effective OSA detection method based on a convolutional neural network (CNN). Our method first extracts features from Apnea-ECG recordings using RR intervals (the time interval from one R-wave to the next in an ECG signal), and then a CNN model with three convolutional layers and three fully connected layers is trained on the extracted features and applied to OSA detection. The first two convolutional layers are each followed by batch normalization and a pooling layer, and a softmax layer is connected to the last fully connected layer to give the final decision. Experimental results on the extracted features of the Apnea-ECG signal show that our model achieves better results in terms of sensitivity, specificity, and accuracy. It is expected that the related technology can be applied to smart sensors, especially wearable devices.
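A sketch of the described architecture: three 1-D convolutional layers over RR-interval features, batch normalization and pooling after the first two, then three fully connected layers with a softmax output (apnea vs. normal). The channel counts and feature-sequence length are assumptions; only the layer structure comes from the abstract.

```python
import torch
import torch.nn as nn

class OSANet(nn.Module):
    def __init__(self, seq_len: int = 60):          # e.g. 60 RR-interval features per segment (assumed)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, 5, padding=2), nn.BatchNorm1d(16), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 5, padding=2), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 3, padding=1), nn.ReLU(),
        )
        flat = 64 * (seq_len // 4)                   # two pooling layers halve the length twice
        self.fc = nn.Sequential(
            nn.Linear(flat, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, rr):                           # rr: (batch, 1, seq_len)
        x = self.conv(rr).flatten(1)
        return torch.softmax(self.fc(x), dim=1)      # apnea / normal probabilities
```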
Multimedia Tools and Applications - Traditional steganalysis methods usually rely on handcrafted features. However, with the rapid development of advanced steganography, manual design of complex...
ColorCheckers are reference standards that professional photographers and filmmakers use to ensure predictable results under every lighting condition. The objective of this work is to propose a new fast and robust method for automatic ColorChecker detection. The process is divided into two steps: (1) ColorChecker localization and (2) ColorChecker patch recognition. For ColorChecker localization, we trained a detection convolutional neural network using synthetic images. The synthetic images are created with 3D models of the ColorChecker and different background images. The output of the network is a bounding box for each possible ColorChecker candidate in the input image. Each bounding box defines a cropped image which is evaluated by a recognition system, and each cropped image is canonicalized with regard to color and dimensions. Subsequently, all possible color patches are extracted and grouped with respect to the distances between their centers. Each group is evaluated as a candidate ColorChecker part, and its position in the scene is estimated. Finally, a cost function is applied to evaluate the accuracy of the estimation. The method is tested using real and synthetic images. The proposed method is fast, robust to overlaps, and invariant to affine projections. The algorithm also performs well in the case of multiple ColorChecker detections.
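As an illustration of the grouping step, detected patch centers that lie close to one another can be collected into candidate ColorChecker groups with a simple greedy distance pass; the distance threshold and the greedy scheme below are assumptions standing in for the paper's exact grouping procedure.

```python
# Greedy grouping of detected color-patch centres: a centre joins the first
# existing group that already contains a centre closer than max_dist.
import numpy as np

def group_patch_centers(centers: np.ndarray, max_dist: float) -> list:
    """centers: (N, 2) array of patch centre coordinates; returns lists of indices."""
    groups = []
    for i, c in enumerate(centers):
        placed = False
        for g in groups:
            if any(np.linalg.norm(c - centers[j]) < max_dist for j in g):
                g.append(i)
                placed = True
                break
        if not placed:
            groups.append([i])
    return groups
```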
Hyperspectral image analysis has been gaining research attention thanks to the current advances in sensor design, which have made acquiring such imagery much more affordable. Although there exist various approaches for segmenting hyperspectral images, deep learning has become the mainstream. However, such large-capacity learners are characterized by significant memory footprints, which is a serious obstacle to employing deep neural networks on board a satellite for Earth observation. In this paper, we introduce resource-frugal quantized convolutional neural networks and greatly reduce their size without adversely affecting the classification capability. Our experiments performed over two hyperspectral benchmarks showed that the quantization process can be seamlessly applied during training, and it leads to much smaller and still well-generalizing deep models.
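A minimal sketch of the idea behind applying quantization during training: weights are "fake-quantized" to few-bit values in the forward pass while gradients flow through unchanged (straight-through estimator). This is a generic illustration, not the paper's exact quantization scheme; the bit width is an assumption.

```python
# Fake-quantize a weight tensor to `bits` bits with a straight-through estimator.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max() / qmax + 1e-8
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()   # forward uses quantized weights, backward uses full precision
```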
Using remote sensing techniques to detect trees at the individual level is crucial for forest management, and finding the treetop is an important first step. However, due to the large variation in tree size and shape, traditional unsupervised treetop detectors need to be carefully designed with heuristic knowledge, which makes efficient and versatile treetop detection challenging. Deep convolutional neural networks (CNNs) have shown powerful capabilities to classify and segment images, but the volume of labelled data required for training impedes their application. Considering the strengths and limitations of the unsupervised and deep learning methods, we propose a framework that uses pseudo labels automatically generated by unsupervised treetop detectors to train the CNNs, which saves manual labelling effort. In this study, we use a digital surface model (DSM) derived from multi-view satellite imagery and a multispectral orthophoto as research data, and we train fully convolutional networks (FCNs) with pseudo labels separately generated from two unsupervised treetop detectors: the top-hat by reconstruction (THR) operation and a local maxima filter with a fixed window (FFW). The experiments show that the FCN detectors trained with pseudo labels have much better detection accuracies than the unsupervised detectors (6.5% for THR and 11.1% for FFW), especially in densely forested areas (more than 20% improvement). In addition, our comparative experiments using manually labelled samples show that the proposed treetop detection framework has the potential to significantly reduce the need for training samples while keeping comparable performance.
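A sketch of pseudo-label generation with the fixed-window local maxima filter (the FFW detector): a DSM pixel becomes a treetop pseudo label if it is the maximum within its window and exceeds a minimum height. The window size and height threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def ffw_pseudo_labels(dsm: np.ndarray, window: int = 9, min_height: float = 2.0) -> np.ndarray:
    """dsm: 2-D digital surface model; returns a binary treetop pseudo-label mask."""
    local_max = maximum_filter(dsm, size=window) == dsm   # pixel is the window maximum
    return (local_max & (dsm > min_height)).astype(np.uint8)

# The resulting mask can serve as pseudo ground truth when training the FCN detector.
```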
Multimedia Tools and Applications - In this paper, a novel multifocus image fusion algorithm based on the convolutional neural network (CNN) in the discrete wavelet transform (DWT) domain is...
Spammers often embed text into images in order to avoid filtering by text-based spam filters, which results in a large number of advertisement spam images. Spam image recognition has therefore become one of the hotspots in Internet spam filtering research. Its goal is to address the sharp performance decline, or even failure, that traditional spam filtering methods suffer when filtering image-based spam. Based on a clustering algorithm, this paper proposes a method to expand the data samples, which greatly increases the number of high-quality training samples and meets the needs of model training. We then train a convolutional neural network on the enlarged sample set to recognize spam images in real time. The experimental results show that the accuracy of the model increases by more than 14% after applying the proposed data augmentation method, and is 6% higher than with other data augmentation methods. Combining the convolutional neural network with the proposed data augmentation method, the accuracy of our spam filtering model is 7-11% higher than that of the traditional method.
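One plausible reading of clustering-based sample expansion, sketched below: cluster image feature vectors with k-means, then create extra training samples by small perturbations of images in under-populated clusters until every cluster reaches a target size. The paper's exact procedure is not detailed in the abstract; the clustering choice, perturbation, and all parameters here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def expand_by_clustering(features: np.ndarray, images: np.ndarray,
                         n_clusters: int = 10, target: int = 500) -> np.ndarray:
    """features: (N, D) image descriptors; images: (N, H, W, C) training images."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    extra = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        need = max(0, target - len(idx))            # how many synthetic samples this cluster needs
        for i in np.random.choice(idx, size=need, replace=True):
            noisy = np.clip(images[i] + np.random.normal(0, 2, images[i].shape), 0, 255)
            extra.append(noisy.astype(images.dtype))
    return np.array(extra)
```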
Object classification is a vital part of any video analytics system and can aid complex applications such as object monitoring and management. Traditional video analytics systems work on shallow networks and are unable to harness the power of distributed processing for training and inference. We propose a cloud-based video analytics system based on an optimally tuned convolutional neural network to classify objects from video streams. The tuning of the convolutional neural network is empowered by in-memory distributed computing. Object classification is performed by comparing the target object with prestored trained patterns, generating a set of matching scores; matching scores greater than an empirically determined threshold reveal the classification of the target object. The proposed system proved to be robust to classification errors, with an accuracy and precision of 97% and 96%, respectively, and can be used as a general-purpose video analytics system.
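An illustrative version of the matching step: a target object's feature vector is compared with prestored class patterns, and only scores above an empirically chosen threshold yield a classification. The cosine similarity measure, names, and threshold value are assumptions, not the paper's exact scoring function.

```python
import numpy as np

def classify_object(feature: np.ndarray, patterns: dict, threshold: float = 0.8):
    """patterns: {class_name: stored feature vector}; returns a class name or None."""
    scores = {c: float(np.dot(feature, p) /
                       (np.linalg.norm(feature) * np.linalg.norm(p) + 1e-8))
              for c, p in patterns.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None   # below threshold: no classification
```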