首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The huge computational requirements and memory footprint limit the practical deployment of super resolution (SR) models. Knowledge distillation (KD) allows student networks to obtain performance improvement by learning from over-parameterized teacher networks. Previous work has attempted to solve SR distillation problem by using feature-based distillation, which ignores the supervisory role of the teacher module itself. In this paper, we introduce a cross knowledge distillation framework to compress and accelerate SR models. Specifically, we propose to obtain supervision by cascading the student into the teacher network for directly utilizing teacher’s well-trained parameters. This not only reduces the difficulty of optimization for students but also avoids designing alignment with obscure feature textures between two networks. To the best of our knowledge, we are the first work to explore the cross distillation paradigm on the SR tasks. Experiments on typical SR networks have shown the superiority of our method in generated images, PSNR and SSIM.  相似文献   

2.
In the field of security, faces are usually blurry, occluded, diverse pose and small in the image captured by an outdoor surveillance camera, which is affected by the external environment such as the camera pose and range, weather conditions, etc. It can be described as a problem of hard face detection in natural images. To solve this problem, we propose a deep convolutional neural network named feature hierarchy encoder–decoder network (FHEDN). It is motivated by two observations from contextual semantic information and the mechanism of multi-scale face detection. The proposed network is a scale-variant style architecture and single stage, which are composed of encoder and decoder subnetworks. Based on the assumption that contextual semantic information around face being auxiliary to detect faces, we introduce a residual mechanism to fuse context prior-based information into face feature and formulate the learning chain to train each encoder–decoder pair. In addition, we discuss some important factors in implement details such as the distribution of training dataset, the scale of feature hierarchy, and anchor box size, etc. They have some impact on the detection performance of the final network. Compared with some state-of-the-art algorithms, our method achieves promising performance on the popular benchmarks including AFW, PASCAL FACE, FDDB, and WIDER FACE. Consequently, the proposed approach can be efficiently implemented and routinely applied to detect faces with severe occlusion and arbitrary pose variations in unconstrained scenes. Our code and results are available on https://github.com/zzxcoder/EvaluationFHEDN.  相似文献   

3.
Although the deep CNN-based super-resolution methods have achieved outstanding performance, their memory cost and computational complexity severely limit their practical employment. Knowledge distillation (KD), which can efficiently transfer knowledge from a cumbersome network (teacher) to a compact network (student), has demonstrated its advantages in some computer vision applications. The representation of knowledge is vital for knowledge transferring and student learning, which is generally defined in hand-crafted manners or uses the intermediate features directly. In this paper, we propose a model-agnostic meta knowledge distillation method under the teacher–student architecture for the single image super-resolution task. It provides a more flexible and accurate way to help teachers transmit knowledge in accordance with the abilities of students via knowledge representation networks (KRNets) with learnable parameters. Specifically, the texture-aware dynamic kernels are generated from local information to decompose the distillation problem into texture-wise supervision for further promoting the recovery quality of high-frequency details. In addition, the KRNets are optimized in a meta-learning manner to ensure the knowledge transferring and the student learning are beneficial to improving the reconstructed quality of the student. Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing defined knowledge representation-related distillation methods and can help super-resolution algorithms achieve better reconstruction quality without introducing any extra inference complexity.  相似文献   

4.
5.
Deep network has become a new favorite for person re-identification (Re-ID), whose research focus is how to effectively extract the discriminative feature representation for pedestrians. In the paper, we propose a novel Re-ID network named as improved ReIDNet (iReIDNet), which can effectively extract the local and global multi-granular feature representations of pedestrians by a well-designed spatial feature transform and coordinate attention (SFTCA) mechanism together with improved global pooling (IGP) method. SFTCA utilizes channel adaptability and spatial location to infer a 2D attention map and can help iReIDNet to focus on the salient information contained in pedestrian images. IGP makes iReIDNet capture more effectively the global information of the whole human body. Besides, to boost the recognition accuracy, we develop a weighted joint loss to guide the training of iReIDNet. Comprehensive experiments demonstrate the availability and superiority of iReIDNet over other Re-ID methods. The code is available at https://github.com/XuRuyu66/ iReIDNet.  相似文献   

6.
The existing deraining methods based on convolutional neural networks (CNNs) have made great success, but some remaining rain streaks can degrade images drastically. In this work, we proposed an end-to-end multi-scale context information and attention network, called MSCIANet. The proposed network consists of multi-scale feature extraction (MSFE) and multi-receptive fields feature extraction (MRFFE). Firstly, the MSFE can pick up features of rain streaks in different scales and propagate deep features of the two layers across stages by skip connections. Secondly, the MRFFE can refine details of the background by attention mechanism and the depthwise separable convolution of different receptive fields with different scales. Finally, the fusion of these outputs of two subnetworks can reconstruct the clean background image. Extensive experimental results have shown that the proposed network achieves a good effect on the deraining task on synthetic and real-world datasets. The demo can be available at https://github.com/CoderLi365/MSCIANet.  相似文献   

7.
In the task of skeleton-based action recognition, CNN-based methods represent the skeleton data as a pseudo image for processing. However, it still remains as a critical issue of how to construct the pseudo image to model the spatial dependencies of the skeletal data. To address this issue, we propose a novel convolutional neural network with adaptive inferential framework (AIF-CNN) to exploit the dependencies among the skeleton joints. We particularly investigate several initialization strategies to make the AIF effective with each strategy introducing the different prior knowledge. Extensive experiments on the dataset of NTU RGB+D and Kinetics-Skeleton demonstrate that the performance is improved significantly by integrating the different prior information. The source code is available at: https://github.com/hhe-distance/AIF-CNN.  相似文献   

8.
Recently, deep learning-based methods have reached an excellent performance on License Plate (LP) detection and recognition tasks. However, it is still challenging to build a robust model for Chinese LPs since there are not enough large and representative datasets. In this work, we propose a new dataset named Chinese Road Plate Dataset (CRPD) that contains multi-objective Chinese LP images as a supplement to the existing public benchmarks. The images are mainly captured with electronic monitoring systems with detailed annotations. To our knowledge, CRPD is the largest public multi-objective Chinese LP dataset with annotations of vertices. With CRPD, a unified detection and recognition network with high efficiency is presented as the baseline. The network is end-to-end trainable with totally real-time inference efficiency (30 fps with 640 p). The experiments on several public benchmarks demonstrate that our method has reached competitive performance. The code and dataset will be publicly available at https://github.com/yxgong0/CRPD.  相似文献   

9.
Aerators are essential and crucial auxiliary devices in intensive culture, especially in industrial culture in China. In this paper, we propose a real-time expert system for anomaly detection of aerators based on computer vision technology and existing surveillance cameras. The expert system includes two modules, i.e., object region detection and working state detection. First, we present a small object region detection method based on the region proposal idea. Moreover, we propose a novel algorithm called reference frame Kanade-Lucas-Tomasi (RF-KLT) algorithm for motion feature extraction in fixed regions. Then, we describe a dimension reduction method of time series for establishing a feature dataset with obvious boundaries between classes. Finally, we use machine learning algorithms to build the feature classifier. The proposed expert system can realize real-time, robust and cost-free anomaly detection of aerators in both the actual video dataset and the augmented video dataset. Demo is available at https://youtu.be/xThHRwu_cnI.  相似文献   

10.
Images with visual pleasing bokeh effect are often unattainable for mobile cameras with compact optics and tiny sensors. To balance the aesthetic requirements on photo quality and expensive high-end SLR cameras, synthetic bokeh effect rendering has emerged as an attractive machine learning topic for engineering applications on imaging systems. However, most of bokeh rendering models either heavily relied on prior knowledge such as scene depth or were topic-irrelevant data-driven networks without task-specific knowledge, which restricted models’ training efficiency and testing accuracy. Since bokeh is closely related to a phenomenon called ”circle of confusion”, therefore, in this paper, following the principle of bokeh generation, a novel self-supervised multi-scale pyramid fusion network has been proposed for bokeh rendering. During the pyramid fusion process, structure consistencies are employed to emphasize the importance of respective bokeh components. Task-specific knowledge which mimics the ”circle of confusion” phenomenon through disk blur convolutions is utilized as self-supervised information for network training. The proposed network has been evaluated and compared with several state-of-the-art methods on a public large-scale bokeh dataset- the ”EBB!” Dataset. The experiment performance demonstrates that the proposed network has much better processing efficiency and can achieve better realistic bokeh effect with much less parameters size and running time. Related source codes and pre-trained models of the proposed model will be available soon on https://github.com/zfw-cv/MPFNet.  相似文献   

11.
Semantic segmentation aims to map each pixel of an image into its corresponding semantic label. Most existing methods either mainly concentrate on high-level features or simple combination of low-level and high-level features from backbone convolutional networks, which may weaken or even ignore the compensation between different levels. To effectively take advantages from both shallow (textural) and deep (semantic) features, this paper proposes a novel plug-and-play module, namely feature enhancement module (FEM). The proposed FEM first uses an information extractor to extract the desired details or semantics from different stages, and then enhances target features by taking in the extracted message. Two types of FEM, i.e., detail FEM and semantic FEM, can be customized. Concretely, the former type strengthens textural information to protect key but tiny/low-contrast details from suppression/removal, while the other one highlights structural information to boost segmentation performance. By equipping a given backbone network with FEMs, there might contain two information flows, i.e., detail flow and semantic flow. Extensive experiments on the Cityscapes, ADE20K and PASCAL Context datasets are conducted to validate the effectiveness of our design. The code has been released at https://github.com/SuperZ-Liu/FENet.  相似文献   

12.
To overcome the barrier of storage and computation, the hashing technique has been widely used for nearest neighbor search in multimedia retrieval applications recently. Particularly, cross-modal retrieval that searches across different modalities becomes an active but challenging problem. Although numerous of cross-modal hashing algorithms are proposed to yield compact binary codes, exhaustive search is impractical for large-scale datasets, and Hamming distance computation suffers inaccurate results. In this paper, we propose a novel search method that utilizes a probability-based index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme employs a few binary bits from the hash code as the index code. We construct an inverted index table based on the index codes, and train a neural network for ranking and indexing to improve the retrieval accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated and compared with several state-of-the-art cross-modal hashing methods. Results show the proposed method effectively boosts the performance on search accuracy, computation cost, and memory consumption in these datasets and hashing methods. The source code is available on https://github.com/msarawut/HCI.  相似文献   

13.
Recently, there has been a trend in tracking to use more refined segmentation mask instead of coarse bounding box to represent the target object. Some trackers proposed segmentation branches based on the tracking framework and maintain real-time speed. However, those trackers use a simple FCNs structure and lack of the edge information modeling. This makes performance quite unsatisfactory. In this paper, we propose an edge-aware segmentation network, which uses the complementarity between target information and edge information to provide a more refined representation of the target. Firstly, We use the high-level features of the tracking backbone network and the correlation features of the classification branch of the tracking framework to fuse, and use the target edge and target segmentation mask for simultaneous supervision to obtain an optimized high-level feature with rough edge information and target information. Secondly, we use the optimized high-level features to guide the low-level features of the tracking backbone network to generate more refined edge features. Finally, we use the refined edge features to fuse with the target features of each layer to generate the final mask. Our approach has achieved leading performance on recent pixel-wise object tracking benchmark VOT2020 and segmentation datasets DAVIS2016 and DAVIS2017 while running on 47 fps. Code is available at https://github.com/TJUMMG/EATtracker.  相似文献   

14.
Keypoint-based object detection achieves better performance without positioning calculations and extensive prediction. However, they have heavy backbone, and high-resolution is restored using upsampling that obtain unreliable features. We propose a self-constrained parallelism keypoint-based lightweight object detection network (SCPNet), which speeds inference, drops parameters, widens receptive fields, and makes prediction accurate. Specifically, the parallel multi-scale fusion module (PMFM) with parallel shuffle blocks (PSB) adopts parallel structure to obtain reliable features and reduce depth, adopts repeated multi-scale fusion to avoid too many parallel branches. The self-constrained detection module (SCDM) has a two-branch structure, with one branch predicting corners, and employing entad offset to match high-quality corner pairs, and the other branch predicting center keypoints. The distances between the paired corners’ geometric centers and the center keypoints are used for self-constrained detection. On MS-COCO 2017 and PASCAL VOC, SCPNet’s results are competitive with the state-of-the-art lightweight object detection. https://github.com/mengdie-wang/SCPNet.git.  相似文献   

15.
To increase the richness of the extracted text modality feature information and deeply explore the semantic similarity between the modalities. In this paper, we propose a novel method, named adaptive weight multi-channel center similar deep hashing (AMCDH). The algorithm first utilizes three channels with different configurations to extract feature information from the text modality; and then adds them according to the learned weight ratio to increase the richness of the information. We also introduce the Jaccard coefficient to measure the semantic similarity level between modalities from 0 to 1, and utilize it as the penalty coefficient of the cross-entropy loss function to increase its role in backpropagation. Besides, we propose a method of constructing center similarity, which makes the hash codes of similar data pairs close to the same center point, and dissimilar data pairs are scattered at different center points to generate high-quality hash codes. Extensive experimental evaluations on four benchmark datasets show that the performance of our proposed model AMCDH is significantly better than other competing baselines. The code can be obtained from https://github.com/DaveLiu6/AMCDH.git.  相似文献   

16.
This paper presents a novel No-Reference Video Quality Assessment (NR-VQA) model that utilizes proposed 3D steerable wavelet transform-based Natural Video Statistics (NVS) features as well as human perceptual features. Additionally, we proposed a novel two-stage regression scheme that significantly improves the overall performance of quality estimation. In the first stage, transform-based NVS and human perceptual features are separately passed through the proposed hybrid regression scheme: Support Vector Regression (SVR) followed by Polynomial curve fitting. The two visual quality scores predicted from the first stage are then used as features for the similar second stage. This predicts the final quality scores of distorted videos by achieving score level fusion. Extensive experiments were conducted using five authentic and four synthetic distortion databases. Experimental results demonstrate that the proposed method outperforms other published state-of-the-art benchmark methods on synthetic distortion databases and is among the top performers on authentic distortion databases. The source code is available at https://github.com/anishVNIT/two-stage-vqa.  相似文献   

17.
目前,人脸美丽预测存在数据样本少、评价指标不明确和人脸外观变化大等问题。多任务迁移学习能有效利用相关任务和源域任务额外的有用信息,知识蒸馏可将教师模型的部分知识蒸馏到学生模型,降低模型复杂性和大小。本文将多任务迁移学习与知识蒸馏相结合,用于人脸美丽预测,以大规模亚洲人脸美丽数据库(Large Scale Asia Facial Beauty Database, LSAFBD)中人脸美丽预测为主任务,以SCUT-FBP5500数据库中性别识别为辅任务。首先,构建多输入多任务的人脸美丽教师模型和学生模型;其次,训练多任务教师模型并计算其软目标;最后,结合多任务教师模型的软目标和学生模型的软、硬目标进行知识蒸馏。实验结果表明,多任务教师模型在人脸美丽预测任务中取得6823%的准确率,其结构较复杂,参数量达14793K;而多任务学生模型通过知识蒸馏后分类准确率为6739%,但其结构简单、参数量仅1366K。本方法多任务教师模型分类准确率比其他方法高,多任务学生模型分类准确率虽然略低一点,但其模型更简单、参数量更少,更有利于用更轻量的网络模型进行人脸美丽预测。   相似文献   

18.
Recently, vision transformer has gained a breakthrough in image recognition. Its self-attention mechanism (MSA) can extract discriminative tokens information from different patches to improve image classification accuracy. However, the classification token in its deep layer ignore the local features between layers. In addition, the patch embedding layer feeds fixed-size patches into the network, which inevitably introduces additional image noise. Therefore, we propose a hierarchical attention vision transformer (HAVT) based on the transformer framework. We present a data augmentation method for attention cropping to crop and drop image noise and force the network to learn key features. Second, the hierarchical attention selection (HAS) module is proposed, which improves the network's ability to learn discriminative tokens between layers by filtering and fusing tokens between layers. Experimental results show that the proposed HAVT outperforms state-of-the-art approaches and significantly improves the accuracy to 91.8% and 91.0% on CUB-200–2011 and Stanford Dogs, respectively. We have released our source code on GitHub https://github.com/OhJackHu/HAVT.git.  相似文献   

19.
Although convolutional neural networks (CNNs) have recently shown considerable progress in motion deblurring, most existing methods that adopt multi-scale input schemes are still challenging in accurately restoring the heavily-blurred regions in blurry images. Several recent methods aim to further improve the deblurring effect using larger and more complex models, but these methods inevitably result in huge computing costs. To address the performance-complexity trade-off, we propose a multi-stage feature-fusion dense network (MFFDNet) for motion deblurring. Each sub-network of our MFFDNet has the similar structure and the same scale of input. Meanwhile, we propose a feature-fusion dense connection structure to reuse the extracted features, thereby improving the deblurring effect. Moreover, instead of using the multi-scale loss function, we only calculate the loss function at the output of the last stage since the input scale of our sub-network is invariant. Experimental results show that MFFDNet maintains a relatively small computing cost while outperforming state-of-the-art motion-deblurring methods. The source code is publicly available at: https://github.com/CaiGuoHS/MFFDNet_release.  相似文献   

20.
Deep neural networks have achieved great success in a wide range of machine learning tasks due to their excellent ability to learn rich semantic features from high-dimensional data. Deeper networks have been successful in the field of image quality assessment to improve the performance of image quality assessment models. The success of deep neural networks majorly comes along with both big models with hundreds of millions of parameters and the availability of numerous annotated datasets. However, the lack of large-scale labeled data leads to the problems of over-fitting and poor generalization of deep learning models. Besides, these models are huge in size, demanding heavy computation power and failing to be deployed on edge devices. To deal with the challenge, we propose an image quality assessment based on self-supervised learning and knowledge distillation. First, the self-supervised learning of soft target prediction given by the teacher network is carried out, and then the student network is jointly trained to use soft target and label on knowledge distillation. Experiments on five benchmark databases show that the proposed method is superior to the teacher network and even outperform the state-of-the-art strategies. Furthermore, the scale of our model is much smaller than the teacher model and can be deployed in edge devices for smooth inference.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号