Due to the limitations of optical sensors, it is often hard to obtain an image at the ideal resolution. Image super-resolution (SR) technology generates a high-resolution image from the corresponding low-resolution image. Recently, deep learning (DL) based SR methods have drawn much attention due to their satisfying reconstruction results. However, these methods often neglect the diversity of image patches, which limits the reconstruction quality. To fully exploit the texture variability across different image patches, we propose a universal, flexible, and effective framework that can be adopted by any DL based method. It significantly improves SR accuracy while maintaining the running time. In the proposed framework, K-means is employed to cluster image patches into different categories, and multiple CNN branches are designed for these categories to reconstruct the SR image. Each branch is weighted according to the Euclidean distance to the cluster centers. Experimental results demonstrate that applying the proposed framework significantly improves the performance of DL based SR methods.
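The branch-weighting step above can be sketched as follows. This is a minimal illustration, assuming an inverse-distance weighting (the abstract only states that weights depend on the Euclidean distance to the cluster centers, so the exact scheme here is an assumption); `fuse_branches` blends precomputed per-branch outputs.

```python
import numpy as np

def soft_cluster_weights(patch, centers):
    """Weight each CNN branch by inverse Euclidean distance to its cluster
    center (assumed weighting; the paper's exact formula may differ)."""
    d = np.linalg.norm(centers - patch, axis=1)  # distance to each center
    w = 1.0 / (d + 1e-8)                         # closer center -> larger weight
    return w / w.sum()                           # normalize so weights sum to 1

def fuse_branches(patch, centers, branch_outputs):
    """Blend the SR outputs of the per-cluster branches with the soft weights."""
    w = soft_cluster_weights(patch, centers)
    return sum(wi * out for wi, out in zip(w, branch_outputs))
```

A patch lying exactly on a cluster center thus takes (almost) only that branch's output, while patches between clusters receive a smooth mixture.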
Similar articles:

In recent years, CNNs have been used for single image super-resolution (SR) following their success in the field of computer vision. However, during recovery there are always some high-frequency components that cannot be restored from low-resolution images by existing CNN-based methods. In this paper, we propose a CNN-based image super-resolution method that uses a two-level residual learning network to learn the residual components, i.e., the high-frequency components. We use the Super-Resolution Convolutional Neural Network (SRCNN) as the network structure at each level, so that our proposed method can recover high-resolution images with high-frequency components that cannot be obtained by existing methods. In addition, we analyze the proposed method by considering three kinds of residual learning networks, which differ in the structure and superimposed layers of the residual learning network. In the experiments, we investigate the performance of the proposed method with the various residual learning networks and the effect of image super-resolution on the image captioning task.
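The two-level residual idea reduces to adding back successive high-frequency estimates. A minimal sketch, with `level1_net` and `level2_net` as stand-ins for the SRCNN-style sub-networks (hypothetical names, not from the paper):

```python
import numpy as np

def two_level_residual_sr(lr_upscaled, level1_net, level2_net):
    """Two-level residual learning sketch: each level predicts the residual
    (high-frequency components) left over by the previous stage."""
    r1 = level1_net(lr_upscaled)   # first-level residual estimate
    stage1 = lr_upscaled + r1      # coarse reconstruction
    r2 = level2_net(stage1)        # second level refines stage-1 output
    return stage1 + r2
```

Because each level only has to model what the previous one missed, the targets are sparser and easier to learn than the full high-resolution image.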
Learning cascade regression has been shown to be an effective strategy for further enhancing the perceptual quality of the resulting high-resolution (HR) images. However, previous cascade regression-based SR methods have two obvious weaknesses: (1) edge structures cannot be preserved well when texture features are used to represent low-resolution (LR) images, and (2) the local manifold structures spanned by the LR-HR feature spaces cannot be revealed by the learned local linear mappings. To alleviate these problems, a novel example regression-based super-resolution (SR) approach called learning graph-constrained cascade regressors (LGCCR) is presented, which learns a group of multi-round residual regressors in a unique way. Specifically, we improve the edge preservation capability by synthesizing the whole HR image rather than local image patches, which facilitates extracting edge features to represent LR images. Moreover, we utilize a graph-constrained regression model to build the local linear regressors, where each local linear regressor corresponds to an anchored atom in the learned over-complete dictionary. Both quantitative and qualitative evaluations on seven benchmark databases indicate the superiority of the proposed LGCCR-based SR approach compared with other state-of-the-art SR predecessors.
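The anchored-regressor mechanism can be sketched as below: each local linear regressor is tied to one dictionary atom, and a query feature uses the regressor of its most correlated anchor. This omits the graph constraint and the multi-round cascade, so it is only an illustration of the anchoring idea, not LGCCR itself:

```python
import numpy as np

def anchored_regression(feature, anchors, regressors):
    """Anchored local linear regression sketch: pick the dictionary atom most
    correlated with the query feature, then apply that atom's regressor."""
    i = np.argmax(anchors @ feature)   # index of most correlated anchored atom
    return regressors[i] @ feature     # local linear mapping LR -> HR feature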
Iris recognition in less constrained environments is challenging, as the images taken therein contain severe noise factors. How to represent iris texture for accurate and robust recognition in such environments is still an open issue. Towards addressing this problem, this paper proposes a novel convolutional network (ConvNet) for effective iris texture representation. The key of the proposed ConvNet is an interaction block, which computes an affinity matrix among all pairwise high-level features to learn second-order relationships. The interaction block can model relationships between neighboring and long-range features, and is architecture-agnostic, suiting different deep network architectures. To further improve the robustness of the iris representation, we encode the affinity matrix based on ordinal measure. In addition, we develop a mask network corresponding to the feature learning network, which excludes noise factors during iris matching. We perform thorough ablation studies to evaluate the effectiveness of the proposed networks. Experiments show that the proposed networks outperform state-of-the-art (SOTA) methods, achieving false reject rates (FRR) of 5.49%, 10.41% and 5.80% at a 10^-6 false accept rate (FAR) on ND-IRIS-0405, CASIA-IrisV4-Thousand and CASIA-IrisV4-Lamp, respectively. The improvements in equal error rate (EER) are 0.41%, 0.72% and 0.40%, respectively, compared with the SOTA methods.
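The pairwise affinity computation at the heart of the interaction block can be sketched as a plain cosine-similarity Gram matrix. The paper's exact affinity function is not specified here, so the normalization below is an assumption:

```python
import numpy as np

def affinity_matrix(features):
    """Affinity among all pairwise feature vectors (cosine-similarity sketch
    of the interaction block). features: (N, C) array of N feature vectors;
    returns an (N, N) symmetric matrix of second-order relationships."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return f @ f.T
```

Because every pair of positions interacts, the matrix captures long-range relationships that a local convolution alone would miss.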
With the advance of deep learning, deep learning based action recognition has become an important research topic in computer vision. A skeleton sequence is often encoded into an image, such as Joint Trajectory Maps (JTM), to better exploit Convolutional Neural Networks (ConvNets). However, this encoding cannot effectively capture long-range temporal information. To solve this problem, this paper presents an effective method to encode spatial-temporal information from skeleton sequences into color texture images, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs); ConvNets are then applied to capture discriminative features from the TPSMMs for human action recognition. The TPSMMs not only capture short-term temporal information, but also embed the long-range dynamics over the period of an action. The proposed method has been verified and achieves state-of-the-art results on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets.
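The temporal-pyramid idea can be sketched as follows: level 0 spans the whole sequence (long dynamics) while deeper levels split it into shorter segments (short-term motion). The number of levels and the power-of-two split are assumptions for illustration; how each segment is rendered into a motion map is omitted:

```python
def temporal_pyramid(frames, levels=3):
    """Temporal-pyramid split sketch: level l divides the sequence into 2**l
    equal segments, so level 0 covers the whole action and deep levels keep
    short-term motion. Each segment would then be rendered as one motion map."""
    pyramid = []
    for l in range(levels):
        n = 2 ** l
        size = max(1, len(frames) // n)
        pyramid.append([frames[i * size:(i + 1) * size] for i in range(n)])
    return pyramid
```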
In this paper we propose a distributed locality sensitive hashing based framework for image super-resolution that exploits the computational and storage efficiency of the cloud. Nowadays, huge amounts of multimedia data are available on the cloud and can be utilized under a store-anywhere, access-anywhere model. Super-resolution is required in consumer electronics display devices for various reasons. The proposed framework exploits image correlation for super-resolution using locality sensitive hashing (LSH) for manifold learning. We exploit the benefits of manifold learning for image super-resolution, which is in turn a highly time-complex operation; the complexity stems from finding approximate nearest neighbors among trillions of image patches for the locally linear embedding (LLE) operation. In our approach this is mitigated by a distributed framework that internally uses hash tables to map patches in the target image against a database drawn from an internet picture collection. The proposed super-resolution framework provides promising results in comparison with existing approaches.
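The LSH lookup that replaces the brute-force nearest-neighbor search can be sketched with standard random-hyperplane hashing: similar patches tend to fall in the same bucket, so only one bucket need be scanned per query. The distributed sharding of the tables across cloud nodes is omitted:

```python
import numpy as np

def lsh_signature(vec, planes):
    """Random-hyperplane LSH: the sign pattern of the projections onto a set
    of random hyperplanes forms the hash key for a patch."""
    return tuple((planes @ vec > 0).astype(int))

def build_hash_table(patches, planes):
    """Bucket patch indices by LSH signature; a query then scans only the
    bucket matching its own signature instead of all patches."""
    table = {}
    for i, p in enumerate(patches):
        table.setdefault(lsh_signature(p, planes), []).append(i)
    return table
```

With k hyperplanes there are at most 2^k buckets, so the expected candidate set per query shrinks roughly by that factor compared with a linear scan.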
Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may produce fatal errors when facing extremely dense occlusions, due to the lack of valid information in the limited occluded input frames. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously with high temporal resolution. However, synthesizing images solely from event streams is ill-posed, since only the brightness changes are recorded and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently, and a comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our proposed method achieves state-of-the-art performance.
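The ill-posedness mentioned above is easy to see in code: integrating an event stream only yields the change in brightness, so any reconstruction needs an initial-brightness estimate (here supplied by the frames). A minimal sketch, with a simplified event format assumed for illustration:

```python
import numpy as np

def accumulate_events(events, shape, init=None):
    """Integrate an event stream into a brightness map. Without `init`, the
    result is only the net brightness *change* per pixel, which is why
    event-only reconstruction is ill-posed. events: iterable of (x, y, polarity)
    with polarity in {+1, -1} (simplified format; real streams carry timestamps)."""
    img = np.zeros(shape) if init is None else init.astype(float).copy()
    for x, y, pol in events:
        img[y, x] += pol
    return img
```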
In machine learning, image classification accuracy generally depends on the image segmentation and feature extraction methods and on the quality of the extracted features. The main focus of this paper is to determine the defective area of mangoes using an image segmentation algorithm in order to improve classification accuracy. An Enhanced Fuzzy based K-means clustering algorithm is designed to increase segmentation efficiency, and the proposed segmentation method is compared with K-means and Fuzzy C-means clustering. Geometric, texture and colour based features are used in feature extraction, and feature selection is performed by Maximally Correlated Principal Component Analysis (MCPCA). Finally, in the classification step, severely affected portions are analyzed by a Backpropagation Based Discriminant Classifier (BBDC), which is compared with BPNN and Naive Bayes classifiers. The images are classified into three classes: Class A (good quality mango), Class B (average quality mango), and Class C (poor quality mango). Evaluation on various defective and healthy mango images shows that the proposed method achieves the highest accuracy compared with existing methods.
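The fuzzy clustering underlying the segmentation step can be sketched with the standard fuzzy C-means membership update; the paper's "enhanced" variant is not specified here, so this shows only the baseline mechanism that gives each pixel a soft degree of membership in every cluster:

```python
import numpy as np

def fcm_memberships(pixels, centers, m=2.0):
    """Standard fuzzy C-means membership update (baseline sketch, not the
    paper's enhanced variant). pixels: (N, D), centers: (C, D); returns an
    (N, C) membership matrix whose rows sum to 1; m > 1 controls fuzziness."""
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2) + 1e-8
    inv = d ** (-2.0 / (m - 1.0))          # closer center -> larger membership
    return inv / inv.sum(axis=1, keepdims=True)
```

Pixels near a defect-cluster center get memberships close to 1 for that cluster, which is what lets the segmentation isolate the affected regions.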