Similar Articles
20 similar articles found (search time: 546 ms)
1.
A colored point cloud (PC) inevitably suffers distortion during its acquisition, processing, coding and transmission, which may impair its visual quality. It is therefore necessary to design an effective tool for colored PC quality assessment (PCQA). In this paper, considering the perceptual mapping relationship between a colored PC and its corresponding projection images, we propose a novel PCQA method based on texture and geometry projection (denoted TGP-PCQA). The main idea is to obtain texture and geometry projection maps from different perspectives for evaluating the colored PC. Specifically, 4D tensor decomposition is used to obtain the combination and difference information between the reference and distorted texture projection maps, mainly characterizing the texture distortion of the colored PC. Meanwhile, edge features of the geometry projection map are calculated to measure global and local geometry distortion. All extracted features are combined to predict the overall quality of the colored PC. In addition, this paper establishes a multi-distortion colored PC database named CPCD2.0, covering compression distortions and Gaussian noise, which targets distortions affecting both the geometry and texture components. Experimental results on two open subjective evaluation databases (IRPC and SJTU-PCQA) and the self-built CPCD2.0 database show that the proposed TGP-PCQA method outperforms state-of-the-art PCQA methods. The self-built CPCD2.0 database is freely available at https://github.com/cherry0415/CPCD2.0.
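The abstract's core idea is that a colored point cloud can be scored via 2D projection maps. The following is a minimal sketch of that projection step only (not the authors' code): it orthographically projects a colored point cloud onto one plane, keeping the nearest point per pixel, to form a texture map and a geometry (depth) map. The function name and resolution are illustrative choices.

```python
# Minimal sketch (not the paper's implementation) of projecting a colored point cloud
# onto one axis-aligned plane to obtain a texture map and a geometry (depth) map.
import numpy as np

def project_colored_pc(points, colors, resolution=256):
    """points: (N, 3) float array; colors: (N, 3) in [0, 1]; returns texture and depth maps."""
    # Normalize x/y coordinates of the cloud into pixel indices.
    xy = points[:, :2]
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    idx = ((xy - mins) / (maxs - mins + 1e-8) * (resolution - 1)).astype(int)

    texture = np.zeros((resolution, resolution, 3))
    depth = np.full((resolution, resolution), -np.inf)

    # Simple z-buffer: keep the point with the largest z per pixel.
    for (u, v), z, c in zip(idx, points[:, 2], colors):
        if z > depth[v, u]:
            depth[v, u] = z
            texture[v, u] = c
    depth[depth == -np.inf] = 0.0
    return texture, depth

# Example with random data:
pts = np.random.rand(5000, 3)
cols = np.random.rand(5000, 3)
tex_map, geo_map = project_colored_pc(pts, cols)
```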

2.
Video super-resolution aims to restore the spatial resolution of a reference frame from consecutive low-resolution (LR) input frames. Existing implicit alignment-based video super-resolution methods commonly utilize convolutional LSTM (ConvLSTM) to handle sequential input frames. However, the vanilla ConvLSTM processes input features and hidden states independently and has limited ability to handle inter-frame temporal redundancy in low-resolution fields. In this paper, we propose a multi-stage spatio-temporal adaptive network (MS-STAN). A spatio-temporal adaptive ConvLSTM (STAC) module is proposed to handle input features in low-resolution fields. The STAC module exploits the correlation between input features and hidden states in the ConvLSTM unit and modulates the hidden states adaptively, conditioned on fused spatio-temporal features. A residual stacked bidirectional (RSB) architecture is further proposed to fully exploit the processing ability of the STAC unit. The proposed STAC and RSB architecture strengthen the vanilla ConvLSTM's ability to exploit inter-frame correlations, thus improving reconstruction quality. Furthermore, unlike existing methods that aggregate features from the temporal branch only once at a specified stage of the network, the proposed network is organized in a multi-stage manner, so the temporal correlation in features at different stages can be fully exploited. Experimental results on the Vimeo-90K-T and UMD10 datasets show that the proposed method performs comparably to current video super-resolution methods. The code is available at https://github.com/yhjoker/MS-STAN.
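To make the "modulates the hidden states conditioned on fused features" idea concrete, here is a minimal PyTorch sketch of a ConvLSTM cell whose hidden state is re-weighted by features fused from the input and the previous hidden state. The sigmoid modulation and layer sizes are assumptions for illustration, not the authors' STAC design.

```python
# Minimal PyTorch sketch (not the authors' STAC module) of an adaptive ConvLSTM cell.
import torch
import torch.nn as nn

class AdaptiveConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=p)   # i, f, o, g gates
        self.modulate = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)    # spatio-temporal modulation

    def forward(self, x, h, c):
        fused = torch.cat([x, h], dim=1)
        # Hypothetical adaptive step: re-weight the hidden state with fused spatio-temporal features.
        h = h * torch.sigmoid(self.modulate(fused))
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = AdaptiveConvLSTMCell(16, 32)
x = torch.randn(1, 16, 64, 64)
h = torch.zeros(1, 32, 64, 64); c = torch.zeros(1, 32, 64, 64)
h, c = cell(x, h, c)
```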

3.
This paper describes an ultra high definition (UHD) video dataset named DVL2021 for the perceptual study of video quality assessment (VQA). To our knowledge, DVL2021 is the first authentically distorted 4K (3840 × 2160) UHD video quality dataset. It contains 206 versatile 4K UHD video sequences, all collected in in-the-wild scenarios. Each sequence is captured at 50 frames per second (fps), stored in raw 10-bit 4:2:0 YUV format, and lasts 10 s. Following the subjective evaluation methodology for television picture quality specified in ITU-R BT.500-13, 32 unique participants took part in the manual annotation process, with ages ranging from teens to sixties (32.7 years on average). DVL2021 has the following merits: (1) an enormous variety of video content, (2) capture by different types of cameras, (3) complex types and multiple levels of authentic distortion, (4) broadly distributed temporal/spatial information, and (5) a wide spectrum of mean opinion score (MOS) distribution. Furthermore, we conduct a benchmark experiment by evaluating several mainstream VQA methods on DVL2021. The baseline results exceed 0.75 in terms of the Spearman rank-order correlation coefficient (SROCC). Our study provides a basis for the UHD VQA problem. DVL2021 is publicly available at https://github.com/GZHU-DVL/DVL2021.
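The benchmark figure above is reported in SROCC. As a reminder of how that metric is computed, here is a tiny sketch using SciPy; the MOS values and predictions below are placeholders, not data from DVL2021.

```python
# Minimal sketch of the SROCC metric used for the benchmark (placeholder scores).
import numpy as np
from scipy.stats import spearmanr

mos = np.array([3.2, 4.1, 2.5, 4.8, 3.9])          # subjective mean opinion scores (placeholder)
predicted = np.array([3.0, 4.3, 2.2, 4.6, 3.7])    # scores from some VQA model (placeholder)

srocc, _ = spearmanr(mos, predicted)
print(f"SROCC = {srocc:.3f}")   # values above 0.75 correspond to the reported baseline level
```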

4.
5.
Existing deraining methods based on convolutional neural networks (CNNs) have achieved great success, but residual rain streaks can still degrade images drastically. In this work, we propose an end-to-end multi-scale context information and attention network, called MSCIANet. The proposed network consists of a multi-scale feature extraction (MSFE) subnetwork and a multi-receptive-field feature extraction (MRFFE) subnetwork. First, the MSFE picks up features of rain streaks at different scales and propagates deep features of the two layers across stages through skip connections. Second, the MRFFE refines details of the background through an attention mechanism and depthwise separable convolutions with different receptive fields at different scales. Finally, fusing the outputs of the two subnetworks reconstructs the clean background image. Extensive experimental results show that the proposed network performs well on the deraining task on both synthetic and real-world datasets. The demo is available at https://github.com/CoderLi365/MSCIANet.
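As a concrete illustration of "depthwise separable convolutions with different receptive fields" combined with a simple attention fusion, here is a hedged PyTorch sketch; the dilation rates, channel counts and channel-attention design are illustrative assumptions, not the MRFFE specification.

```python
# Minimal PyTorch sketch (not the authors' MRFFE) of multi-receptive-field depthwise
# separable convolutions whose outputs are fused with simple channel attention.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch)
        self.pointwise = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MultiReceptiveField(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList([DepthwiseSeparable(ch, d) for d in (1, 2, 4)])
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(3 * ch, 3 * ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)   # concatenate the three receptive fields
        return self.fuse(feats * self.attn(feats))                # re-weight channels, then fuse

block = MultiReceptiveField(32)
out = block(torch.randn(1, 32, 64, 64))   # same spatial size, 32 channels
```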

6.
Multi-label classification with region-free labels is attracting increasing attention compared with region-based labels, since the manual region-labeling process is time-consuming. Existing methods usually employ attention-based techniques to discover conspicuous label-related regions in a weakly-supervised manner using only image-level region-free labels, but the region coverage is imprecise without exploiting the global clues of multi-level features. To address this issue, a novel Global-guided Weakly-Supervised Learning (GWSL) method for multi-label classification is proposed. GWSL first extracts multi-level features to estimate their global correlation map, which is then used to guide feature disentanglement in the proposed Feature Disentanglement and Localization (FDL) networks. Specifically, the FDL networks adaptively combine the differently correlated features and localize the fine-grained features for identifying multiple labels. The proposed method is optimized end to end under weak supervision with only image-level labels. Experimental results demonstrate that the proposed method outperforms the state of the art for multi-label learning on several publicly available image datasets. To facilitate future research, the code is available at https://github.com/Yong-DAI/GWSL.

7.
As the demand for realistic representation and its applications increases rapidly, 3D human modeling from a single RGB image has become an essential technique. Owing to the great success of deep neural networks, various learning-based approaches have been introduced for this task. However, partial occlusions still make it difficult to estimate the 3D human model accurately. In this letter, we propose a part-attentive kinematic regressor for 3D human modeling. The key idea is to predict body-part attentions based on each body center position and to estimate the parameters of the 3D human model from the corresponding attentive features through a kinematic chain-based decoder in a one-stage fashion. One important advantage is that the proposed method yields natural shapes and poses even under severe occlusions. Experimental results on benchmark datasets show that the proposed method is effective for 3D human modeling in complicated real-world environments. The code and model are publicly available at https://github.com/DCVL-3D/PKCN_release.

8.
To overcome the barriers of storage and computation, hashing techniques have recently been widely used for nearest-neighbor search in multimedia retrieval applications. In particular, cross-modal retrieval, which searches across different modalities, has become an active but challenging problem. Although numerous cross-modal hashing algorithms have been proposed to yield compact binary codes, exhaustive search is impractical for large-scale datasets, and Hamming distance computation suffers from inaccurate results. In this paper, we propose a novel search method that utilizes a probability-based index scheme over binary hash codes for cross-modal retrieval. The proposed indexing scheme employs a few bits of the hash code as the index code. We construct an inverted index table based on the index codes and train a neural network for ranking and indexing to improve retrieval accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated and compared with several state-of-the-art cross-modal hashing methods. The results show that the proposed method effectively improves search accuracy, computation cost, and memory consumption on these datasets and hashing methods. The source code is available at https://github.com/msarawut/HCI.
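The indexing idea above can be illustrated with a short sketch: use the first few bits of each hash code as a bucket key in an inverted index, then retrieve only the matching bucket as candidates for re-ranking. The prefix-based bucketing and bit count below are simplifying assumptions; the paper's scheme is probability-based and neural-ranked.

```python
# Minimal sketch (not the paper's full scheme) of an inverted index over short index codes
# taken from binary hash codes, used to fetch candidates before re-ranking.
from collections import defaultdict
import numpy as np

def build_inverted_index(codes, index_bits=8):
    """codes: (N, L) array of 0/1 hash bits; the first index_bits bits form the index code."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[tuple(code[:index_bits])].append(item_id)
    return table

def query(table, q_code, index_bits=8):
    # Return items sharing the query's index code; a learned ranker would then reorder them.
    return table.get(tuple(q_code[:index_bits]), [])

codes = np.random.randint(0, 2, size=(10000, 64))
index = build_inverted_index(codes)
candidates = query(index, codes[42])   # bucket containing item 42
```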

9.
Object detection across different scales is challenging because of the large variance in object scales. Thus, a novel detection network, the Top-Down Feature Fusion Single Shot MultiBox Detector (TDFSSD), is proposed. The network is based on the Single Shot MultiBox Detector (SSD) with a VGG-16 backbone and a novel, simple yet efficient feature fusion module, namely the Top-Down Feature Fusion Module. The module iteratively fuses higher-level features, which contain semantic information, into lower-level features, which contain boundary information. Extensive experiments have been conducted on the PASCAL VOC2007, PASCAL VOC2012, and MS COCO datasets to demonstrate the efficiency of the proposed method. The TDFSSD network is trained end to end and outperforms state-of-the-art methods across the three datasets. It achieves 81.7% and 80.1% mAP on VOC2007 and VOC2012 respectively, surpassing the reported best results of both one-stage and two-stage frameworks. Meanwhile, it achieves 33.4% mAP on MS COCO test-dev, notably 17.2% average precision (AP) on small objects. These results show the effectiveness of the proposed method for object detection. Code and model are available at https://github.com/dongfengxijian/TDFSSD.
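The fusion direction described above (semantic features flowing down into boundary-rich features) can be sketched as follows in PyTorch; the 1×1 channel reduction, bilinear upsampling and smoothing convolution are common choices assumed here, not necessarily the exact TDFSSD module.

```python
# Minimal PyTorch sketch (not the exact TDFSSD module) of top-down feature fusion:
# a deeper, semantically richer map is upsampled and merged into a shallower map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.reduce_high = nn.Conv2d(high_ch, out_ch, 1)   # align channel counts
        self.reduce_low = nn.Conv2d(low_ch, out_ch, 1)
        self.smooth = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, low, high):
        up = F.interpolate(self.reduce_high(high), size=low.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.smooth(self.reduce_low(low) + up)

fuse = TopDownFusion(low_ch=512, high_ch=1024, out_ch=256)
low = torch.randn(1, 512, 38, 38)     # e.g., a conv4_3-like SSD feature map
high = torch.randn(1, 1024, 19, 19)   # a deeper, semantically richer map
fused = fuse(low, high)               # (1, 256, 38, 38)
```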

10.
Images with a visually pleasing bokeh effect are often unattainable for mobile cameras with compact optics and tiny sensors. To reconcile aesthetic demands on photo quality with the cost of high-end SLR cameras, synthetic bokeh rendering has emerged as an attractive machine-learning topic for imaging-system applications. However, most bokeh rendering models either rely heavily on prior knowledge such as scene depth or are generic data-driven networks without task-specific knowledge, which restricts their training efficiency and testing accuracy. Since bokeh is closely related to the "circle of confusion" phenomenon, this paper, following the principle of bokeh generation, proposes a novel self-supervised multi-scale pyramid fusion network for bokeh rendering. During the pyramid fusion process, structural consistencies are employed to emphasize the importance of the respective bokeh components. Task-specific knowledge that mimics the circle-of-confusion phenomenon through disk blur convolutions is utilized as self-supervised information for network training. The proposed network has been evaluated and compared with several state-of-the-art methods on a public large-scale bokeh dataset, the "EBB!" dataset. The experimental results demonstrate that the proposed network is far more efficient and achieves a more realistic bokeh effect with a much smaller parameter size and shorter running time. Source code and pre-trained models will be made available at https://github.com/zfw-cv/MPFNet.
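The "circle of confusion" mentioned above corresponds optically to a disk-shaped blur kernel. The following is a small NumPy/SciPy sketch of such a disk blur, shown only to make the self-supervision idea concrete; the radius and helper names are illustrative and this is not the paper's training code.

```python
# Minimal sketch of a "circle of confusion" style disk blur applied to an image.
import numpy as np
from scipy.ndimage import convolve

def disk_kernel(radius):
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2 <= radius**2).astype(float)
    return kernel / kernel.sum()

def disk_blur(image, radius):
    """image: (H, W) or (H, W, C) float array; a larger radius mimics a larger circle of confusion."""
    k = disk_kernel(radius)
    if image.ndim == 2:
        return convolve(image, k, mode="reflect")
    return np.stack([convolve(image[..., c], k, mode="reflect")
                     for c in range(image.shape[-1])], axis=-1)

img = np.random.rand(128, 128, 3)
bokeh_like = disk_blur(img, radius=5)
```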

11.
Images captured under weak illumination suffer from serious quality degradation. Addressing the resulting degradations of low-light images can effectively improve both the visual quality of images and the performance of high-level vision tasks. In this study, a novel Retinex-based Real-low to Real-normal Network (R2RNet) is proposed for low-light image enhancement, comprising three subnets: a Decom-Net, a Denoise-Net, and a Relight-Net, which perform decomposition, denoising, and contrast enhancement with detail preservation, respectively. R2RNet uses not only the spatial information of the image to improve contrast but also the frequency information to preserve details, and therefore achieves more robust results across all degraded images. Unlike most previous methods trained on synthetic images, we collected the first Large-Scale Real-World paired low-/normal-light image dataset (LSRW) to satisfy the training requirements and give our model better generalization in real-world scenes. Extensive experiments on publicly available datasets demonstrate that our method outperforms existing state-of-the-art methods both quantitatively and visually. In addition, the performance of a high-level vision task (i.e., face detection) in low-light conditions can be effectively improved by using the enhanced results produced by our method. Our code and the LSRW dataset are available at https://github.com/JianghaiSCU/R2RNet.

12.
We develop a full-reference (FR) video quality assessment framework that integrates analysis of space–time slices (STSs) with frame-based image quality assessment (IQA) to form a high-performance video quality predictor. The approach first arranges the reference and test video sequences into a space–time slice representation. To characterize space–time distortions more comprehensively, a collection of distortion-aware maps is computed on each reference–test video pair. These reference–distorted maps are then processed using a standard image quality model, such as peak signal-to-noise ratio (PSNR) or Structural Similarity (SSIM). A simple learned pooling strategy combines the multiple IQA outputs into a final video quality score. This leads to an algorithm called Space–Time Slice PSNR (STS-PSNR), which we thoroughly tested on three publicly available video quality assessment databases and found to deliver significantly higher performance than state-of-the-art video quality models. Source code for STS-PSNR is freely available at http://live.ece.utexas.edu/research/Quality/STS-PSNR_release.zip.
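To illustrate what "space–time slices" means, here is a toy sketch that treats a video as a (T, H, W) volume, scores the usual x–y frames plus x–t and y–t slices with PSNR, and averages them. The plain mean stands in for the learned pooling, and the data are placeholders; this is not the released STS-PSNR code.

```python
# Minimal sketch of scoring x-y frames and x-t / y-t space-time slices with PSNR.
import numpy as np

def psnr(ref, dis, peak=255.0):
    mse = np.mean((ref.astype(float) - dis.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / (mse + 1e-12))

def sts_psnr(ref_video, dis_video):
    """ref_video, dis_video: (T, H, W) grayscale volumes; simple average over slice types."""
    scores = [
        np.mean([psnr(ref_video[t], dis_video[t]) for t in range(ref_video.shape[0])]),        # x-y frames
        np.mean([psnr(ref_video[:, y], dis_video[:, y]) for y in range(ref_video.shape[1])]),  # x-t slices
        np.mean([psnr(ref_video[:, :, x], dis_video[:, :, x]) for x in range(ref_video.shape[2])]),  # y-t slices
    ]
    return np.mean(scores)   # the paper learns a pooling; a plain mean is used here for brevity

ref = np.random.randint(0, 256, (30, 64, 64))
dis = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255)
print(sts_psnr(ref, dis))
```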

13.
Semantic segmentation aims to map each pixel of an image to its corresponding semantic label. Most existing methods either concentrate mainly on high-level features or on a simple combination of low-level and high-level features from backbone convolutional networks, which may weaken or even ignore the complementarity between different levels. To effectively take advantage of both shallow (textural) and deep (semantic) features, this paper proposes a novel plug-and-play module, namely the feature enhancement module (FEM). The FEM first uses an information extractor to extract the desired details or semantics from different stages, and then enhances the target features by taking in the extracted message. Two types of FEM, i.e., detail FEM and semantic FEM, can be customized. Concretely, the former strengthens textural information to protect key but tiny/low-contrast details from suppression or removal, while the latter highlights structural information to boost segmentation performance. Equipping a given backbone network with FEMs yields two information flows, i.e., a detail flow and a semantic flow. Extensive experiments on the Cityscapes, ADE20K and PASCAL Context datasets validate the effectiveness of our design. The code has been released at https://github.com/SuperZ-Liu/FENet.

14.
Blind video quality assessment (VQA) metrics predict the quality of videos without access to reference videos. This paper proposes a new blind VQA model based on multilevel video perception, abbreviated MVP. The model fuses three levels of video features occurring in natural video scenes to predict video quality: natural video statistics (NVS) features, global motion features, and motion temporal correlation features, which represent video scene characteristics, video motion types, and variations in video temporal correlation, respectively. During motion feature extraction, motion-compensated filtering video enhancement is adopted to highlight the motion characteristics of videos and thereby improve the perceptual correlation of the video features. Experimental results on the LIVE and CSIQ video databases show that the predicted scores of the new model are highly correlated with human perception and have low root mean square errors. MVP clearly outperforms state-of-the-art blind VQA metrics and, in particular, is competitive even against top-performing full-reference VQA metrics.

15.
Generative Adversarial Networks (GANs) have opened a new direction for tackling the image-to-image transformation problem. Different GANs use generator and discriminator networks with different losses in the objective function. Still, there remains a gap in both the quality of the generated images and their closeness to the ground-truth images. In this work, we introduce a new image-to-image transformation network named Cyclic Discriminative Generative Adversarial Networks (CDGAN) that fills this gap. The proposed CDGAN generates higher-quality and more realistic images by incorporating additional discriminator networks for the cycled images on top of the original CycleGAN architecture. CDGAN is tested on three image-to-image transformation datasets. The quantitative and qualitative results are analyzed and compared with state-of-the-art methods, and the proposed CDGAN outperforms them on all three baseline datasets. The code is available at https://github.com/KishanKancharagunta/CDGAN.

16.
Saliency prediction accuracy has improved rapidly with the development of deep learning, but inference speed remains slow because networks keep getting deeper. Hence, this paper proposes a fast saliency prediction model. Concretely, a siamese network backbone based on a tailored EfficientNetV2 accelerates inference while maintaining high performance, and a shared-parameter strategy further curbs parameter growth. Furthermore, we add multi-channel activation maps to refine fine features across different channels and low-level visual features, which improves the interpretability of the model. Extensive experiments show that the proposed model achieves competitive performance on standard benchmark datasets and prove the effectiveness of our method in striking a balance between prediction accuracy and inference speed. Moreover, the small model size allows our method to be deployed on edge devices. The code is available at https://github.com/lscumt/fast-fixation-prediction.

17.
Captured by amateur photographers, repeatedly propagated through multimedia pipelines, and compressed at different levels, real-world images usually suffer from a wide variety of hybrid distortions. In this scenario, full-reference (FR) image quality assessment (IQA) algorithms cannot deliver reliable predictions because the references themselves are of inferior quality, while existing no-reference (NR) IQA algorithms remain limited in their ability to handle different distortion types. To address this obstacle, we explore an NR-IQA metric that predicts the perceptual quality of distorted-then-compressed images using a deep neural network (DNN). First, we propose a novel two-stream DNN to handle both authentic distortions and synthetic compressions and adopt effective strategies to pre-train the two branches of the network. Specifically, we transfer knowledge learned from in-the-wild images to account for authentic distortions by using a pre-trained deep convolutional neural network (CNN) to provide meaningful initializations, and we build a CNN for synthetic compressions and pre-train it on a dataset of synthetically compressed images. Subsequently, we bilinearly pool these two sets of features as the image representation. The overall network is fine-tuned on an elaborately designed auxiliary dataset annotated by a reliable objective quality metric. Furthermore, we integrate the output of the authentic-distortion-aware branch with that of the overall network in a two-step prediction manner to boost prediction performance, which can be applied in the distorted-then-compressed scenario when the reference image is available. Extensive experimental results on several databases, especially the LIVE Wild Compressed Picture Quality Database, show that the proposed method achieves state-of-the-art performance with good generalizability and moderate computational complexity.
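The bilinear pooling step mentioned above can be sketched as an outer product of the two branch feature vectors followed by the usual signed-square-root and L2 normalization. The feature dimensions and stand-in tensors below are assumptions; this is not the paper's network.

```python
# Minimal PyTorch sketch of bilinearly pooling two feature streams into one representation.
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a, feat_b):
    """feat_a: (B, Da), feat_b: (B, Db) -> (B, Da*Db), signed-sqrt + L2 normalized."""
    outer = torch.bmm(feat_a.unsqueeze(2), feat_b.unsqueeze(1)).flatten(1)   # per-sample outer product
    outer = torch.sign(outer) * torch.sqrt(torch.abs(outer) + 1e-8)          # signed square-root
    return F.normalize(outer, dim=1)                                         # L2 normalization

authentic_feat = torch.randn(4, 128)    # stand-in for the authentic-distortion-aware branch
compression_feat = torch.randn(4, 64)   # stand-in for the synthetic-compression branch
representation = bilinear_pool(authentic_feat, compression_feat)   # (4, 8192)
```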

18.
19.
Underwater image enhancement has attracted much attention with the rise of marine resource development in recent years. Benefiting from the powerful representation capabilities of convolutional neural networks (CNNs), multiple CNN-based underwater image enhancement algorithms have been proposed in the past few years. However, almost all of them operate in the RGB color space, which is insensitive to image properties such as luminance and saturation. To address this problem, we propose the Underwater Image Enhancement Convolutional Neural Network using 2 Color Spaces (UIEC^2-Net), which efficiently and effectively integrates both the RGB and HSV color spaces in a single CNN. To the best of our knowledge, this is the first deep-learning-based underwater image enhancement method to use the HSV color space. UIEC^2-Net is an end-to-end trainable network consisting of three blocks: an RGB pixel-level block that implements fundamental operations such as denoising and color-cast removal, an HSV global-adjust block that globally adjusts underwater image luminance, color and saturation by adopting a novel neural curve layer, and an attention map block that combines the advantages of the RGB and HSV block outputs by assigning a weight to each pixel. Experimental results on synthetic and real-world underwater images show that the proposed method performs well in both subjective comparisons and objective metrics. The code is available at https://github.com/BIGWangYuDong/UWEnhancement.
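To make the RGB/HSV combination idea tangible, here is a toy sketch: a global HSV adjustment (a fixed saturation gain and value gamma standing in for the learned neural curve) is blended with an RGB-domain result using a per-pixel weight map. The conversion uses scikit-image, and all inputs and weights are placeholders, not outputs of UIEC^2-Net.

```python
# Minimal sketch (not the UIEC^2-Net architecture) of blending an RGB-domain result with an
# HSV-domain global adjustment via a per-pixel weight map.
import numpy as np
from skimage.color import rgb2hsv, hsv2rgb

def hsv_global_adjust(rgb, sat_gain=1.2, val_gamma=0.8):
    """Toy global curve: boost saturation and gamma-adjust value, then convert back to RGB."""
    hsv = rgb2hsv(rgb)
    hsv[..., 1] = np.clip(hsv[..., 1] * sat_gain, 0, 1)
    hsv[..., 2] = hsv[..., 2] ** val_gamma
    return hsv2rgb(hsv)

rgb_branch = np.random.rand(64, 64, 3)       # stand-in for the RGB pixel-level block output
hsv_branch = hsv_global_adjust(rgb_branch)   # stand-in for the HSV global-adjust block output
weight = np.random.rand(64, 64, 1)           # stand-in for the learned attention map
enhanced = weight * rgb_branch + (1 - weight) * hsv_branch
```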

20.
Adaptive image coding with perceptual distortion control (cited by 6)
This paper presents a discrete cosine transform (DCT)-based locally adaptive perceptual image coder, which discriminates between image components based on their perceptual relevance to achieve increased performance in terms of quality and bit rate. The new coder uses a locally adaptive perceptual quantization scheme based on a tractable perceptual distortion metric. Our strategy is to exploit human visual masking properties by deriving visual masking thresholds in a locally adaptive fashion. The derived masking thresholds are used to control the quantization stage by adapting the quantizer reconstruction levels to meet the desired target perceptual distortion. The proposed coding scheme is flexible in that it can easily be extended to work with any subband-based decomposition in addition to block-based transform methods. Compared with existing perceptual coding methods, the proposed method exhibits superior performance in terms of bit rate and distortion control. Coding results are presented to illustrate the performance of the scheme.
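The mechanism described above, quantizing DCT coefficients more coarsely where visual masking allows, can be sketched briefly. The masking threshold below is a hypothetical input scalar rather than a value derived from a visual model, and the uniform step scaling is a simplification of adapting quantizer reconstruction levels.

```python
# Minimal sketch of 8x8 DCT quantization whose step size is scaled by a masking threshold.
import numpy as np
from scipy.fft import dctn, idctn

def code_block(block, base_step, masking_threshold):
    """block: 8x8 float array; a larger masking threshold permits coarser quantization."""
    coeffs = dctn(block, norm="ortho")
    step = base_step * masking_threshold          # adapt the quantizer to the local threshold
    quantized = np.round(coeffs / step)
    return idctn(quantized * step, norm="ortho")  # reconstruction the decoder would produce

block = np.random.rand(8, 8) * 255
flat_region = code_block(block, base_step=4.0, masking_threshold=0.5)   # little masking -> fine steps
busy_region = code_block(block, base_step=4.0, masking_threshold=2.0)   # strong masking -> coarse steps
```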

