Similar Documents
20 similar documents found.
1.
This paper proposes a No-Reference (NR) Video Quality Assessment (VQA) method for videos subject to the distortion introduced by the High Efficiency Video Coding (HEVC) scheme. The assessment is performed without access to the bitstream. The proposed analysis is based on the transform coefficients estimated from the decoded video pixels, which are used to estimate the level of quantization. The information from this analysis is exploited to assess the video quality. In the proposed method, HEVC transform coefficients are modeled with a joint-Cauchy probability density function. To generate VQA features, the quantization step used in intra coding is estimated. The obtained HEVC features are mapped to subjective video quality scores, i.e., Mean Opinion Scores (MOS), using an Elastic Net. The performance is verified on a dataset consisting of HEVC-coded 4K UHD (3840 × 2160) video sequences at different bitrates and spanning a wide range of content. The results show that the quality scores computed by the proposed method are highly correlated with the mean subjective assessments.
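As an illustrative sketch only (the Cauchy-based feature extraction is assumed to have happened upstream, and the feature matrix and MOS labels below are random placeholders rather than the paper's data), the final Elastic Net mapping from features to MOS might look like this:

```python
# Hypothetical final-stage regressor: Elastic Net mapping per-video
# HEVC features (e.g., estimated quantization steps) to MOS.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))    # placeholder per-video feature vectors
y = rng.uniform(1.0, 5.0, 120)    # placeholder MOS labels on a 1-5 scale

model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5))
model.fit(X, y)                   # cross-validated choice of alpha and l1_ratio
print(model.predict(X[:3]))       # predicted MOS for the first three videos
```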

2.
Images captured in weak illumination suffer severe quality degradation. Correcting the various degradations of low-light images can effectively improve both their visual quality and the performance of high-level vision tasks. In this study, a novel Retinex-based Real-low to Real-normal Network (R2RNet) is proposed for low-light image enhancement, comprising three subnets: a Decom-Net, a Denoise-Net, and a Relight-Net, which perform decomposition, denoising, and contrast enhancement with detail preservation, respectively. R2RNet uses not only the spatial information of the image to improve contrast but also its frequency information to preserve details, and therefore achieves more robust results across degraded images. Unlike most previous methods, which were trained on synthetic images, we collected the first Large-Scale Real-World paired low/normal-light image dataset (LSRW dataset) to satisfy the training requirements and give our model better generalization in real-world scenes. Extensive experiments on publicly available datasets demonstrate that our method outperforms existing state-of-the-art methods both quantitatively and visually. In addition, our results show that the performance of a high-level vision task (i.e., face detection) in low-light conditions can be effectively improved by using our enhanced results. Our codes and the LSRW dataset are available at: https://github.com/JianghaiSCU/R2RNet.
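A minimal sketch of the Retinex decomposition idea behind the first subnet, assuming a toy Decom-Net of my own design (not the released R2RNet code): the network splits an image I into reflectance R and illumination L under the constraint I ≈ R · L.

```python
# Toy Retinex-style decomposition net: input image -> (reflectance, illumination).
import torch
import torch.nn as nn

class DecomNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 4, 3, padding=1))   # 3 reflectance + 1 illumination channels

    def forward(self, img):
        out = torch.sigmoid(self.body(img))
        return out[:, :3], out[:, 3:]         # reflectance R, illumination L

img = torch.rand(1, 3, 64, 64)                # placeholder low-light input
refl, illum = DecomNet()(img)
recon_loss = (refl * illum - img).abs().mean()  # enforce I ≈ R * L during training
```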

3.
We develop a full-reference (FR) video quality assessment framework that integrates analysis of space–time slices (STSs) with frame-based image quality assessment (IQA) to form a high-performance video quality predictor. The approach first arranges the reference and test video sequences into a space–time slice representation. To more comprehensively characterize space–time distortions, a collection of distortion-aware maps is computed on each reference–test video pair. These reference–distorted maps are then processed using a standard image quality model, such as peak signal-to-noise ratio (PSNR) or Structural Similarity (SSIM). A simple learned pooling strategy combines the multiple IQA outputs into a final video quality score. This leads to an algorithm called Space–Time Slice PSNR (STS-PSNR), which we thoroughly tested on three publicly available video quality assessment databases and found to deliver significantly elevated performance relative to state-of-the-art video quality models. Source code for STS-PSNR is freely available at: http://live.ece.utexas.edu/research/Quality/STS-PSNR_release.zip.
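The released code is linked above; as a rough illustration of the core idea under my own simplifying assumptions (grayscale videos, PSNR as the image quality model, uniform pooling instead of the learned pooling), a space–time slice score could be computed like this:

```python
# Toy STS-style scoring: PSNR over frames plus PSNR over space-time slices.
import numpy as np

def psnr(a, b, peak=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def sts_score(ref, dst):
    """ref, dst: grayscale videos of shape (T, H, W)."""
    frames = np.mean([psnr(r, d) for r, d in zip(ref, dst)])
    vert = np.mean([psnr(ref[:, :, x], dst[:, :, x])   # T x H slices
                    for x in range(ref.shape[2])])
    horiz = np.mean([psnr(ref[:, y, :], dst[:, y, :])  # T x W slices
                     for y in range(ref.shape[1])])
    return (frames + vert + horiz) / 3.0   # the paper learns these pooling weights

ref = np.random.randint(0, 256, (30, 64, 96))
dst = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255)
print(sts_score(ref, dst))
```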

4.
Aerators are essential auxiliary devices in intensive aquaculture, especially in industrial aquaculture in China. In this paper, we propose a real-time expert system for anomaly detection of aerators based on computer vision and existing surveillance cameras. The expert system includes two modules: object region detection and working-state detection. First, we present a small-object region detection method based on the region proposal idea. We then propose a novel algorithm, the reference frame Kanade-Lucas-Tomasi (RF-KLT) algorithm, for motion feature extraction in fixed regions, followed by a dimension reduction method for the resulting time series that establishes a feature dataset with clear boundaries between classes. Finally, we use machine learning algorithms to build the feature classifier. The proposed expert system achieves real-time, robust and cost-free anomaly detection of aerators on both the actual video dataset and the augmented video dataset. A demo is available at https://youtu.be/xThHRwu_cnI.
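A hedged sketch of the RF-KLT idea as read from the abstract: features detected once in a fixed reference frame are tracked against every later frame of a fixed region, yielding a per-frame motion signal. The file name and region coordinates below are invented for illustration.

```python
# Sketch: reference-frame KLT motion features for a fixed aerator region.
import cv2
import numpy as np

cap = cv2.VideoCapture("aerator.mp4")      # hypothetical surveillance clip
ok, ref = cap.read()
x, y, w, h = 100, 80, 160, 120             # assumed fixed region of interest
ref_gray = cv2.cvtColor(ref[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
pts0 = cv2.goodFeaturesToTrack(ref_gray, maxCorners=100,
                               qualityLevel=0.01, minDistance=5)

motion = []                                # one motion value per frame
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cur = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    # track the *reference* corners in the current frame (not frame-to-frame)
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(ref_gray, cur, pts0, None)
    good = status.ravel() == 1
    motion.append(np.linalg.norm(pts1[good] - pts0[good], axis=2).mean())
cap.release()
# `motion` is the time series later reduced and classified by the expert system.
```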

5.
Recently, deep learning-based methods have achieved excellent performance on License Plate (LP) detection and recognition tasks. However, it is still challenging to build a robust model for Chinese LPs, since there are no sufficiently large and representative datasets. In this work, we propose a new dataset named Chinese Road Plate Dataset (CRPD), containing multi-objective Chinese LP images, as a supplement to the existing public benchmarks. The images are mainly captured by electronic monitoring systems and carry detailed annotations. To our knowledge, CRPD is the largest public multi-objective Chinese LP dataset with vertex annotations. With CRPD, a unified, efficient detection and recognition network is presented as the baseline. The network is end-to-end trainable with fully real-time inference (30 fps at 640p). Experiments on several public benchmarks demonstrate that our method achieves competitive performance. The code and dataset will be publicly available at https://github.com/yxgong0/CRPD.

6.
Video quality assessment (VQA) strongly affects video processing applications. The temporal masking effect of the human visual system and the fluctuation of temporal distortion are key factors in perceptual video quality assessment, yet few existing VQA studies have considered the influence of temporal distortion fluctuation on subjectively perceived video quality. This work improves on traditional temporal analysis algorithms and demonstrates the effectiveness of modeling temporal distortion in VQA algorithms.

7.
With the continuous development of deep learning, neural networks have made great progress in license plate recognition (LPR). Nevertheless, there is still room to improve recognition performance for the low-resolution, relatively blurry images found in remote surveillance scenarios. When the recognition algorithm itself is hard to improve, we turn to super-resolution (SR) to improve the quality of license plate images and thereby provide clearer input for the subsequent recognition stage. In this paper, we propose an automatic super-resolution license plate recognition (SRLPR) network consisting of four parts: license plate detection, character detection, single-character super-resolution, and recognition. In the training stage, the LP detection model is trained first on its own, and its detection results are then used to successively train the three subsequent modules. During the test phase, the network automatically outputs the LP number for each input image. We also collect an applicable and challenging LPR dataset called SRLP, gathered from real remote traffic surveillance. Experimental results demonstrate that our method achieves better overall SR image quality and higher recognition accuracy than state-of-the-art methods. The SRLP dataset and the code for training and testing the SRLPR network are available at https://pan.baidu.com/s/1vnhRa-c-dBj6jlfBZV5w4g.

8.
Many image co-segmentation algorithms have been proposed over the last decade. In this paper, we present a new dataset for evaluating co-segmentation algorithms, containing 889 image groups of 18 images each, with pixel-wise hand-annotated ground truths. The dataset is characterized by simple, nearly single-color backgrounds. It looks simple but is actually very challenging for current co-segmentation algorithms because of four difficult cases: foregrounds easily confused with the background, transparent regions in objects, small holes in objects, and shadows. To test the usefulness of our dataset, we review the state-of-the-art co-segmentation algorithms and evaluate seven of them on it. The performance of each algorithm is compared with results previously reported on datasets with complex backgrounds. The results show that our dataset is valuable for the development of co-segmentation techniques: it is more feasible to solve the four difficulties above on simple backgrounds first and then extend the solutions to complex-background problems. Our dataset can be freely downloaded from: http://www.iscbit.org/source/MLMR-COS.zip.

9.
In skeleton-based action recognition, CNN-based methods represent the skeleton data as a pseudo image for processing. However, how to construct the pseudo image so that it models the spatial dependencies of the skeletal data remains a critical issue. To address it, we propose a novel convolutional neural network with an adaptive inferential framework (AIF-CNN) that exploits the dependencies among skeleton joints. We investigate several initialization strategies to make the AIF effective, each introducing different prior knowledge. Extensive experiments on the NTU RGB+D and Kinetics-Skeleton datasets demonstrate that performance improves significantly when the different priors are integrated. The source code is available at: https://github.com/hhe-distance/AIF-CNN.
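As background for readers unfamiliar with the CNN-based representation, here is a minimal sketch of turning a skeleton sequence into a pseudo image (my assumed layout, not the AIF-CNN code): time and joints become the spatial axes and the xyz coordinates become channels. The open issue the paper addresses is precisely how this arrangement should encode the spatial dependencies among joints.

```python
# Toy pseudo-image construction for skeleton sequences.
import numpy as np

def skeleton_to_pseudo_image(seq):
    """seq: (T, J, 3) array of J joint coordinates over T frames."""
    img = np.transpose(seq, (2, 0, 1)).astype(np.float32)  # (3, T, J) ~ (C, H, W)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)                   # normalize to [0, 1]

seq = np.random.rand(64, 25, 3)              # e.g., NTU RGB+D uses 25 joints
print(skeleton_to_pseudo_image(seq).shape)   # (3, 64, 25)
```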

10.
For the same video quality, HEVC yields 25% to 50% bitrate savings compared with its predecessor, the H.264 Advanced Video Coding standard, and thus supports resolutions up to 8K UHD. However, the bitrate reduction provided by HEVC comes with an increase in the computational cost of encoding. This complexity can become a real handicap, especially for real-time video streaming and for VANET (Vehicular Ad-Hoc Network) applications such as traffic safety and video surveillance. Both the bitrate improvement and the increased computational cost stem from the use of large, multi-sized coding, prediction and transform blocks: the H.264 coder is based on macroblocks of sizes 4 × 4, 8 × 8 and 16 × 16, while H.265 relies on Coding Tree Units (CTUs) whose blocks range over 4 × 4, 8 × 8, 16 × 16, 32 × 32 and 64 × 64. This paper proposes a fast CU (Coding Unit) size decision method based on spatial homogeneity to reduce the HEVC computational cost. Compared with the HM16.13 benchmark test model, the average coding time is reduced by around 40% for CIF/QCIF video sequences and by 35% to 43% for class A, B and C test sequences. These substantial reductions in coding time come with negligible quality loss and an average bitrate increase that does not exceed 0.89% across the three configuration modes (All Intra, Random Access and Low Delay).
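As a purely illustrative sketch of homogeneity-driven early termination (the variance test and threshold below are my stand-ins, not the paper's actual criterion): a CU is split recursively unless its luma samples are spatially homogeneous, skipping the rate-distortion search of its sub-CUs.

```python
# Toy quadtree CU-size decision with a variance-based homogeneity test.
import numpy as np

def decide_cu_sizes(luma, top, left, size, min_size=8, thresh=40.0, out=None):
    if out is None:
        out = []
    region = luma[top:top+size, left:left+size]
    if size == min_size or np.var(region) < thresh:
        out.append((top, left, size))   # homogeneous: keep CU, skip sub-CU search
        return out
    half = size // 2
    for dy in (0, half):                # otherwise split into four sub-CUs
        for dx in (0, half):
            decide_cu_sizes(luma, top+dy, left+dx, half, min_size, thresh, out)
    return out

ctu = np.random.randint(0, 256, (64, 64)).astype(np.float64)  # one 64x64 CTU
print(decide_cu_sizes(ctu, 0, 0, 64)[:4])
```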

11.
Video super-resolution aims at restoring the spatial resolution of a reference frame from consecutive low-resolution (LR) input frames. Existing implicit alignment-based video super-resolution methods commonly utilize convolutional LSTM (ConvLSTM) to handle the sequential input frames. However, vanilla ConvLSTM processes input features and hidden states independently in its operations and has limited ability to handle inter-frame temporal redundancy in the low-resolution domain. In this paper, we propose a multi-stage spatio-temporal adaptive network (MS-STAN). A spatio-temporal adaptive ConvLSTM (STAC) module is proposed to handle input features in the low-resolution domain; it exploits the correlation between input features and hidden states in the ConvLSTM unit and modulates the hidden states adaptively, conditioned on fused spatio-temporal features. A residual stacked bidirectional (RSB) architecture is further proposed to fully exploit the processing ability of the STAC unit. Together, STAC and RSB enhance the vanilla ConvLSTM's ability to exploit inter-frame correlations, thus improving reconstruction quality. Furthermore, unlike existing methods that aggregate features from the temporal branch only once at a specified stage of the network, the proposed network is organized in a multi-stage manner, so the temporal correlations in features at different stages can all be exploited. Experimental results on the Vimeo-90K-T and UDM10 datasets show that the proposed method performs competitively with current video super-resolution methods. The code is available at https://github.com/yhjoker/MS-STAN.

12.
In security applications, faces captured by outdoor surveillance cameras are usually blurry, occluded, small, and in diverse poses, affected by external conditions such as camera pose and range and the weather. This can be described as the problem of hard face detection in natural images. To solve it, we propose a deep convolutional neural network named the feature hierarchy encoder–decoder network (FHEDN), motivated by two observations concerning contextual semantic information and the mechanism of multi-scale face detection. The proposed network is a single-stage, scale-variant architecture composed of encoder and decoder subnetworks. Based on the assumption that the contextual semantic information around a face helps detect it, we introduce a residual mechanism to fuse context prior-based information into face features and formulate a learning chain to train each encoder–decoder pair. In addition, we discuss some important implementation details, such as the distribution of the training dataset, the scale of the feature hierarchy, and anchor box sizes, which influence the detection performance of the final network. Compared with some state-of-the-art algorithms, our method achieves promising performance on the popular benchmarks AFW, PASCAL FACE, FDDB, and WIDER FACE. The proposed approach can thus be efficiently implemented and routinely applied to detect faces with severe occlusion and arbitrary pose variations in unconstrained scenes. Our code and results are available at https://github.com/zzxcoder/EvaluationFHEDN.

13.
This study investigates the antecedents (i.e., relational governance) and a consequence (i.e., loyalty) of service quality from the crowdsourcer's perspective in the online crowdsourcing context. The hypotheses derived from our research model were empirically validated using an online survey of 240 crowdsourcers from Zhubajie.com in China. Results show that the relational governance elements (i.e., information exchange, conflict resolution and trust) positively affect the service quality dimensions, except for an insignificant effect of trust on interaction quality. All service quality dimensions (i.e., interaction quality, outcome quality, environment quality) significantly affect crowdsourcer loyalty. Crowdsourcing experience positively moderates the effect of environment quality on crowdsourcer loyalty but negatively moderates the effect of outcome quality. This study deepens our understanding of how to enhance service quality and crowdsourcer loyalty.

14.
This paper presents a novel No-Reference Video Quality Assessment (NR-VQA) model that utilizes proposed 3D steerable wavelet transform-based Natural Video Statistics (NVS) features as well as human perceptual features. We additionally propose a novel two-stage regression scheme that significantly improves the overall quality estimation. In the first stage, the transform-based NVS features and the human perceptual features are separately passed through the proposed hybrid regression scheme: Support Vector Regression (SVR) followed by polynomial curve fitting. The two visual quality scores predicted in the first stage are then used as features for a similar second stage, which predicts the final quality scores of distorted videos via score-level fusion. Extensive experiments were conducted on five authentic- and four synthetic-distortion databases. The results demonstrate that the proposed method outperforms other published state-of-the-art benchmark methods on the synthetic-distortion databases and is among the top performers on the authentic-distortion databases. The source code is available at https://github.com/anishVNIT/two-stage-vqa.
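A sketch of the two-stage scheme under my own assumptions (random placeholder features, in-sample fitting for brevity): each feature set goes through SVR followed by polynomial curve fitting, and the two resulting scores are fused by the same hybrid regressor in the second stage.

```python
# Toy two-stage regression: (SVR -> polynomial fit) per feature set, then fusion.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_nvs = rng.normal(size=(200, 36))      # placeholder NVS features
X_per = rng.normal(size=(200, 12))      # placeholder perceptual features
mos = rng.uniform(0, 100, 200)          # placeholder subjective scores

def hybrid_stage(X, y, degree=3):
    raw = SVR(kernel="rbf").fit(X, y).predict(X)
    coeff = np.polyfit(raw, y, degree)  # polynomial correction of SVR output
    return np.polyval(coeff, raw)

s1 = hybrid_stage(X_nvs, mos)           # stage-1 score from NVS features
s2 = hybrid_stage(X_per, mos)           # stage-1 score from perceptual features
final = hybrid_stage(np.column_stack([s1, s2]), mos)   # stage-2 score-level fusion
```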

15.
Standard no-reference video quality assessment (NR-VQA) methods are designed for a specific type of distortion and quantify the visual quality of a distorted video without its reference. In practice, their results deviate from human subjective perception. To tackle this problem, we propose a 3D deep convolutional neural network (3D CNN) that evaluates video quality without a reference by generating spatial/temporal deep features from different video clips. The 3D CNN is designed by collaboratively and seamlessly integrating the features output by VGG-Net on video frames. To prevent the adopted VGG-Net from overfitting, its parameters are transferred from the deep architecture learned on the ImageNet dataset. Extensive IQA/VQA experiments on the LIVE, TID, and CSIQ video quality databases demonstrate that the proposed IQA/VQA model performs competitively with conventional methods.
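A minimal sketch of the feature plumbing described here, under my assumptions about shapes (not the authors' architecture): per-frame VGG-16 features pretrained on ImageNet are stacked along a temporal axis so that a 3D convolution can process the clip.

```python
# Toy pipeline: ImageNet-pretrained VGG features per frame -> 3D conv over the clip.
import torch
import torch.nn as nn
from torchvision.models import vgg16

frames = torch.rand(8, 3, 224, 224)               # one 8-frame clip (placeholder)
backbone = vgg16(weights="IMAGENET1K_V1").features.eval()
with torch.no_grad():
    feats = backbone(frames)                      # (8, 512, 7, 7) per-frame features
clip = feats.permute(1, 0, 2, 3).unsqueeze(0)     # (1, 512, 8, 7, 7): C, T, H, W
head = nn.Conv3d(512, 64, kernel_size=3, padding=1)
print(head(clip).shape)                           # spatio-temporal deep features
```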

16.
Current state-of-the-art two-stage models for instance segmentation suffer from several types of imbalance. In this paper, we address the Intersection over Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, brings brand-new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, caused by a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and yields more insight into the connection between the task to solve and the layers used. Moreover, our SBR-CNN shows the same or even better improvements when adopted in conjunction with other state-of-the-art models. With a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, respectively, with 12 epochs and without extra tricks. The code is available at https://github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn.

17.
For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in visual aspects such as style, category and color. Previous works have defined visual compatibility as a binary classification task, with the items in an outfit considered either fully compatible or fully incompatible. However, this is not applicable to outfit-maker applications, where users create their own outfits and need to know which specific items may be incompatible with the rest of the outfit. To address this, we propose the Visual InCompatibility TransfORmer (VICTOR), optimized for two tasks: 1) overall compatibility as regression and 2) the detection of mismatching items; we also utilize fashion-specific contrastive language-image pre-training to fine-tune computer vision neural networks on fashion imagery. We build upon the Polyvore outfit benchmark to generate partially mismatching outfits, creating a new dataset termed Polyvore-MISFITs, which is used to train VICTOR. A series of ablation and comparative analyses shows that the proposed architecture can compete with and even surpass the current state-of-the-art on Polyvore datasets while reducing instance-wise floating-point operations by 88%, striking a balance between high performance and efficiency. We release our code at https://github.com/stevejpapad/Visual-InCompatibility-Transformer.

18.
Generative Adversarial Networks (GANs) have opened a new direction for tackling the image-to-image transformation problem. Different GANs use different losses for the generator and discriminator networks in the objective function, yet there is still a gap to fill in terms of both the quality of the generated images and their closeness to the ground-truth images. In this work, we introduce a new image-to-image transformation network named Cyclic Discriminative Generative Adversarial Networks (CDGAN) that fills the above-mentioned gap. The proposed CDGAN generates higher-quality, more realistic images by adding discriminator networks for the cycled images to the original CycleGAN architecture. CDGAN is tested on three image-to-image transformation datasets, and the quantitative and qualitative results are analyzed and compared with state-of-the-art methods; CDGAN outperforms them on all three baseline datasets. The code is available at https://github.com/KishanKancharagunta/CDGAN.
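A schematic sketch (toy stand-in networks, not the released CDGAN code) of the generator objective as the abstract describes it: CycleGAN's adversarial and cycle-consistency terms, plus additional adversarial terms from discriminators that judge the cycle-reconstructed images.

```python
# Toy CDGAN-style generator loss with extra cycled-image discriminators.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adv_loss(disc, img):                # generator tries to make disc output "real"
    logits = disc(img)
    return bce(logits, torch.ones_like(logits))

def make_disc():                        # tiny patch-style stand-in discriminator
    return nn.Sequential(nn.Conv2d(3, 1, 4, stride=2, padding=1), nn.Flatten())

G_ab = nn.Conv2d(3, 3, 3, padding=1)    # toy stand-in generators A->B, B->A
G_ba = nn.Conv2d(3, 3, 3, padding=1)
D_a, D_b = make_disc(), make_disc()               # CycleGAN's discriminators
D_cyc_a, D_cyc_b = make_disc(), make_disc()       # CDGAN's added discriminators

real_a, real_b = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
fake_b, fake_a = G_ab(real_a), G_ba(real_b)       # translated images
cyc_a, cyc_b = G_ba(fake_b), G_ab(fake_a)         # cycle-reconstructed images

loss_g = (adv_loss(D_b, fake_b) + adv_loss(D_a, fake_a)           # CycleGAN terms
          + adv_loss(D_cyc_a, cyc_a) + adv_loss(D_cyc_b, cyc_b)   # CDGAN additions
          + 10.0 * ((cyc_a - real_a).abs().mean()
                    + (cyc_b - real_b).abs().mean()))             # cycle consistency
print(loss_g.item())
```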

19.
Colored point clouds (PCs) inevitably encounter distortion during acquisition, processing, coding and transmission, which may impair their visual quality; an effective tool for colored PC quality assessment (PCQA) is therefore needed. In this paper, exploiting the perceptual mapping between a colored PC and its corresponding projection images, we propose a novel PCQA method based on texture and geometry projection (denoted TGP-PCQA). The main idea is to obtain texture and geometry projection maps from different perspectives and evaluate the colored PC from them. Specifically, 4D tensor decomposition is used to obtain the combination and difference information between the reference and distorted texture projection maps, mainly to characterize the texture distortion of the colored PC, while edge features of the geometry projection map measure global and local geometry distortion. All of the extracted features are combined to predict an overall quality score for the colored PC. In addition, we establish a multi-distortion colored PC database named CPCD2.0, covering compression distortions and Gaussian noise, which targets the influence of both the geometry and texture components. Experimental results on two open subjective evaluation databases (IRPC and SJTU-PCQA) and the self-built CPCD2.0 database show that the proposed TGP-PCQA method outperforms state-of-the-art PCQA methods. The self-built CPCD2.0 database is freely available at https://github.com/cherry0415/CPCD2.0.
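To make the projection step concrete, here is a simplified sketch under my own assumptions (a single orthographic XY view with a z-buffer; the paper uses multiple perspectives): each pixel keeps the color and depth of the nearest point, yielding one texture map and one geometry map.

```python
# Toy orthographic projection of a colored point cloud to texture/geometry maps.
import numpy as np

def project_xy(points, colors, res=256):
    """points: (N, 3) xyz; colors: (N, 3) rgb in [0, 1]."""
    xy = points[:, :2]
    uv = ((xy - xy.min(0)) / (np.ptp(xy, axis=0) + 1e-8) * (res - 1)).astype(int)
    depth = np.full((res, res), np.inf)
    tex = np.zeros((res, res, 3))      # texture projection map
    geo = np.zeros((res, res))         # geometry (depth) projection map
    for (u, v), z, c in zip(uv, points[:, 2], colors):
        if z < depth[v, u]:            # z-buffer: keep the nearest point per pixel
            depth[v, u], tex[v, u], geo[v, u] = z, c, z
    return tex, geo

pts = np.random.rand(5000, 3)
rgb = np.random.rand(5000, 3)
tex, geo = project_xy(pts, rgb)
print(tex.shape, geo.shape)            # (256, 256, 3) (256, 256)
```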

20.
Blind video quality assessment (VQA) metrics predict the quality of videos without reference videos. This paper proposes a new blind VQA model based on multilevel video perception, abbreviated MVP. The model fuses three levels of video features occurring in natural video scenes to predict video quality: natural video statistics (NVS) features, global motion features and motion temporal correlation features, representing video scene characteristics, video motion types, and variations in video temporal correlation, respectively. During motion feature extraction, motion-compensated filtering enhancement is adopted to highlight the motion characteristics of videos and thereby improve the perceptual correlation of the video features. Experimental results on the LIVE and CSIQ video databases show that the scores predicted by the new model are highly correlated with human perception and have low root mean square errors. MVP clearly outperforms state-of-the-art blind VQA metrics and demonstrates competitive performance even against top-performing full-reference VQA metrics.
