Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Video super-resolution aims to restore the spatial resolution of a reference frame from consecutive low-resolution (LR) input frames. Existing implicit alignment-based video super-resolution methods commonly use convolutional LSTM (ConvLSTM) to handle sequential input frames. However, vanilla ConvLSTM processes input features and hidden states independently and has limited ability to handle inter-frame temporal redundancy in the low-resolution domain. In this paper, we propose a multi-stage spatio-temporal adaptive network (MS-STAN). A spatio-temporal adaptive ConvLSTM (STAC) module is proposed to handle input features in the low-resolution domain. The STAC module exploits the correlation between input features and hidden states in the ConvLSTM unit and modulates the hidden states adaptively, conditioned on fused spatio-temporal features. A residual stacked bidirectional (RSB) architecture is further proposed to fully exploit the processing ability of the STAC unit. Together, the STAC module and the RSB architecture enhance the vanilla ConvLSTM’s ability to exploit inter-frame correlations, thus improving reconstruction quality. Furthermore, unlike existing methods that aggregate features from the temporal branch only once at a specified stage of the network, the proposed network is organized in a multi-stage manner, so the temporal correlation in features at different stages can be fully exploited. Experimental results on the Vimeo-90K-T and UMD10 datasets show that the proposed method performs comparably to current video super-resolution methods. The code is available at https://github.com/yhjoker/MS-STAN.

2.
Video Super-Resolution (SR) reconstruction produces video sequences with High Resolution (HR) via the fusion of several Low-Resolution (LR) video frames. Traditional methods rely on the accurate estimation of subpixel motion, which constrains their applicability to video sequences with relatively simple motions such as global translation. We propose an efficient iterative spatio-temporal adaptive SR reconstruction model based on Zernike Moment (ZM), which is effective for spatial video sequences with arbitrary motion. The model uses region correlation judgment and self-adaptive threshold strategies to improve the effect and time efficiency of the ZM-based SR method. This leads to better mining of non-local self-similarity and local structural regularity, and is robust to noise and rotation. An efficient iterative curvature-based interpolation scheme is introduced to obtain the initial HR estimation of each LR video frame. Experimental results on both spatial and standard video sequences demonstrate that the proposed method outperforms existing methods in terms of both subjective visual and objective quantitative evaluations, and greatly improves the time efficiency.

3.
This paper presents a spatiotemporal super-resolution method that enhances both the spatial resolution and the frame rate in a hybrid stereo video system. In this system, a scene is captured by two cameras to form two videos: a low-spatial-resolution, high-frame-rate video and a high-spatial-resolution, low-frame-rate video. For the low-spatial-resolution video, the low-resolution frames are spatially super-resolved from the high-resolution video using stereo matching, bilateral overlapped block motion estimation, and adaptive overlapped block motion compensation. For the low-frame-rate video, the missing frames are interpolated from the high-resolution frames obtained by fusing disparity compensation with motion-compensated frame rate up-conversion. Experimental results demonstrate that the proposed mixed spatiotemporal super-resolution method improves both subjective and objective quality more than pure spatial super-resolution or frame rate up-conversion alone.

4.
Existing learning-based super-resolution (SR) reconstruction algorithms are mainly designed for single images and ignore the spatio-temporal relationship between video frames. To bring the advantages of learning-based algorithms to video SR, this paper proposes a novel video SR reconstruction algorithm based on a deep convolutional neural network (CNN) and spatio-temporal similarity (STCNN-SR). It is a deep learning method for video SR reconstruction that considers not only the mapping between associated low-resolution (LR) and high-resolution (HR) image blocks, but also the spatio-temporal non-local complementary and redundant information between adjacent LR video frames. Reconstruction speed is improved markedly by the pre-trained end-to-end reconstruction coefficients, and video SR performance is further improved by an optimization process based on spatio-temporal similarity. Experimental results demonstrate that the proposed algorithm achieves competitive SR quality in both subjective and objective evaluations compared with other state-of-the-art algorithms.

5.
An image super-resolution algorithm for different error levels per frame.   Total citations: 1 (self-citations: 0, citations by others: 1)
In this paper, we propose an image super-resolution (resolution enhancement) algorithm that takes into account inaccurate estimates of the registration parameters and the point spread function. These inaccurate estimates, along with the additive Gaussian noise in the low-resolution (LR) image sequence, result in a different noise level for each frame. In the proposed algorithm, the LR frames are adaptively weighted according to their reliability, and the regularization parameter is estimated simultaneously. A translational motion model is assumed. The convergence property of the proposed algorithm is analyzed in detail. Our experimental results on both real and synthetic data show the effectiveness of the proposed algorithm.

6.
A fast image super-resolution algorithm using an adaptive Wiener filter.   Total citations: 1 (self-citations: 0, citations by others: 1)
A computationally simple super-resolution algorithm using a type of adaptive Wiener filter is proposed. The algorithm produces an improved-resolution image from a sequence of low-resolution (LR) video frames with overlapping fields of view. The algorithm uses subpixel registration to position each LR pixel value on a common spatial grid that is referenced to the average position of the input frames. The positions of the LR pixels are not quantized to a finite grid as with some previous techniques. The output high-resolution (HR) pixels are obtained using a weighted sum of LR pixels in a local moving window. Using a statistical model, the weights for each HR pixel are designed to minimize the mean squared error, and they depend on the relative positions of the surrounding LR pixels. Thus, these weights adapt spatially and temporally to changing distributions of LR pixels due to varying motion. Both a global and a spatially varying statistical model are considered here. Because the weights adapt to the distribution of LR pixels, the algorithm is quite robust and will not become unstable when an unfavorable distribution of LR pixels is observed. For translational motion, the algorithm has a low computational complexity and may be readily suitable for real-time and/or near real-time processing applications. With other motion models, the computational complexity goes up significantly. However, regardless of the motion model, the algorithm lends itself to parallel implementation. The efficacy of the proposed algorithm is demonstrated here in a number of experimental results using simulated and real video sequences. A computational analysis is also presented.

7.
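李方彪, 何昕, 魏仲慧, 何家维, 何丁龙. Infrared and Laser Engineering (《红外与激光工程》), 2018, 47(2): 203003-0203003(8)
Generative adversarial networks (GANs) have shown great potential in constrained image generation, which makes them suitable for image super-resolution reconstruction. However, super-resolution images reconstructed with GANs suffer from over-smoothing and a lack of high-frequency detail. To address the problem that single-frame image super-resolution methods cannot effectively exploit the spatio-temporal correlation within an image sequence, a multi-frame infrared image super-resolution reconstruction method based on generative adversarial networks (M-GANs) is proposed. First, motion compensation is applied to the low-resolution image sequence. Second, a weight representation convolutional layer performs a weight transformation on the motion-compensated image sequence. Finally, the result is fed into the generative adversarial reconstruction network, which outputs the reconstructed high-resolution image. Experimental results show that the proposed method outperforms current representative super-resolution reconstruction methods in both subjective and objective evaluations.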

8.
It has been known for some time that temporal dependence (motion) plays a key role in the super-resolution (SR) reconstruction of a single frame (or sequence of frames). While the impact of global time-invariant translations is relatively well known, the general motion case has not been studied in detail. In this paper, we discuss SR reconstruction for both motion models from a frequency-domain point of view. A noniterative algorithm for SR reconstruction is presented using spatio-temporal filtering. The concepts of motion-compensated windows and sinc interpolation kernels are utilized, resulting in a finite impulse response (FIR) filter realization. In the simulations, we assume a priori knowledge of the motion (optical flow), as is common in much of the SR reconstruction literature. The proposed process is localized in nature, which enables the selective reconstruction of desired parts of a particular frame or sequence of frames.

9.
In this paper, we address the super-resolution problem of generating a high-resolution image from low-resolution images. The proposed super-resolution method consists of three steps: image registration, singular value decomposition (SVD)-based image fusion, and interpolation. The contribution of this work is two-fold. First, we customize an image registration approach using Scale Invariant Feature Transform (SIFT), Belief Propagation, and Random Sampling Consensus (RANSAC) for super-resolution. Second, we propose SVD-based fusion to integrate the important features from the low-resolution images. The proposed image registration and fusion steps effectively maintain the important features and greatly improve the super-resolution results. Results on a variety of example images show that the proposed method successfully generates high-resolution images from low-resolution images.

10.
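This paper proposes a video super-resolution method based on a convolutional neural network with multi-scale feature residual learning. To account for the spatio-temporal correlation between video frames, the method takes five consecutive frames, pre-processed by bicubic interpolation, as the network input and reconstructs the middle frame as output; frames are reconstructed in order until the entire high-resolution video is obtained. The proposed network consists of four main parts: multi-scale feature extraction, residual learning, a sub-pixel convolution layer, and residual connections (skip-connection). Extracting multi-scale features from the video yields richer features at different scales, and residual learning helps recover high-frequency information. Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used as the loss functions to optimize the network. Experimental results show that the method improves on other methods in average evaluation metrics (PSNR +3.151 dB, SSIM +0.102) and, in subjective evaluation, effectively reduces blurring at edges in the video.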
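
The PSNR metric named in this abstract can be illustrated with a minimal sketch. It assumes 8-bit pixel values (peak 255) held in flat Python sequences; the function name `psnr` and its parameters are illustrative and do not come from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 is assumed for 8-bit images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the reference and the reconstructed pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the reconstruction is closer to the reference, which is why a gain such as the reported +3.151 dB is read as an improvement.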

11.
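To improve the spatial resolution of video, a super-resolution reconstruction method that exploits inter-frame motion information is proposed. For reconstruction of the whole video, a segmented reconstruction model based on a sliding window is proposed. Within each sliding window, adjacent frames are first registered with sub-pixel accuracy, and super-resolution reconstruction is then performed with the iterative back-projection algorithm. For registration, a method based on a four-parameter rigid-body transformation model is proposed, which estimates motion from coarse to fine through iterative solution and a Gaussian image pyramid. Simulated images and real color video were both reconstructed. The experimental results show that the registration algorithm is highly accurate, the reconstruction algorithm achieves high peak signal-to-noise ratio (PSNR) values, and the reconstructed video has better visual quality and higher resolving power. The method can be widely applied to super-resolution reconstruction of video sequences whose inter-frame motion is mainly translation and rotation.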
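
The idea behind iterative back-projection can be sketched in one dimension. This is a deliberately simplified, single-frame illustration and not the paper's method: it assumes the imaging model is plain averaging of adjacent samples and that the back-projection kernel is sample replication, and all function names are hypothetical. The paper's version additionally registers several frames before back-projecting.

```python
def downsample(hr, factor=2):
    """Assumed imaging model: average each group of `factor` HR samples."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]


def upsample(lr, factor=2):
    """Assumed back-projection kernel: replicate each LR sample `factor` times."""
    return [value for value in lr for _ in range(factor)]


def iterative_back_projection(lr, initial, factor=2, iterations=30, step=0.5):
    """Refine an HR estimate until its simulated LR version matches `lr`."""
    hr = list(initial)
    for _ in range(iterations):
        simulated = downsample(hr, factor)
        # Error between the observed LR signal and the simulated one.
        error = [obs - sim for obs, sim in zip(lr, simulated)]
        # Project the LR error back onto the HR grid and apply a damped update.
        correction = upsample(error, factor)
        hr = [h + step * c for h, c in zip(hr, correction)]
    return hr
```

With these assumed operators the LR error is multiplied by `1 - step` on every pass, so the estimate converges to a signal that is consistent with the observation.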

12.
Many scalable video compression techniques use a mixed-resolution scheme, which down-samples some frames at the encoder to produce reduced-resolution frames while keeping the other frames at full resolution, in order to achieve higher compression gain. In such a mixed-resolution video system, an image enlargement technique is required at the decoder to recover the original full-resolution frames. This article proposes a Bayesian approach that enlarges the reduced-resolution frame via its maximum a posteriori estimate, using the information from the observed reduced-resolution frame plus more detailed information extracted from available neighbouring full-resolution frames. Experiments show that the proposed approach outperforms a few conventional approaches.

13.
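齐峰岩, 鲍长春. Acta Electronica Sinica (《电子学报》), 2006, 34(4): 605-611
This paper applies the support vector machine (SVM) method to unvoiced/voiced/silence detection in speech signals, and proposes and validates a new classification algorithm that effectively separates speech into unvoiced, voiced, and silent segments at various signal-to-noise ratio (SNR) levels. First, under high-SNR conditions, the four difference parameters from the G.729B VAD are used as input features to the SVM classifier in comparative silence-classification experiments. The results are better than those of the G.729B VAD and of a conventional BP neural network, which shows that introducing this machine learning method for speech classification is feasible. The performance of the SVM with different kernel functions is also analyzed and discussed. Second, for low-SNR conditions, the paper discusses how to build a statistical model of noisy speech and extract noise-robust statistical features, and gives an adaptive estimation method for time-varying background noise. Finally, comparative experiments in different noise environments verify that the proposed algorithm outperforms other conventional algorithms at medium and low SNR.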

14.
We present a fully automatic multimodal emotion recognition system based on three novel peak frame selection approaches using the video channel. Selecting peak frames (i.e., apex frames) is an important preprocessing step for facial expression recognition, as they contain the most relevant information for classification. Two of the three proposed peak frame selection methods (i.e., MAXDIST and DEND-CLUSTER) do not employ any training or prior learning. The third method (i.e., EIFS) is based on measuring the “distance” of the expressive face from the subspace of neutral facial expression, which requires a prior learning step to model the subspace of neutral face shapes. The audio and video modalities are fused at the decision level. The subject-independent audio-visual emotion recognition system has shown promising results on two databases in two different languages (eNTERFACE and BAUM-1a).

15.
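To improve image super-resolution reconstruction, this paper introduces an attention mechanism into a multi-level residual network (Multi-level Residual Attention Network, MRAN) as the reconstruction network of CycleGAN, and proposes MRA-GAN, a super-resolution reconstruction model based on the cycle generative adversarial network (CycleGAN). In the MRA-GAN model, the reconstruction network is responsible for taking the low-resolution (LR) image...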

16.
This paper presents a novel approach for spatio-temporal video super-resolution. Whereas the task of synthesizing high-frequency information in the spatial domain can be accomplished without introducing arbitrary priors on the image model (beyond the assumption of local cross-scale self-similarity), the high degree of temporal aliasing in standard-frame-rate video requires applying motion compensation in order to correctly interpolate the video sequence along the temporal axis. As the experimental results show, the proposed technique is capable of augmenting both the frame rate of a video sequence (by any real up-converting factor) and its effective spatial resolution (by any rational magnification factor). The presented method is suitable for parallelized computing environments, such as GPUs, and provides an output quality comparable to state-of-the-art methods in both temporal interpolation and spatial super-resolution.

17.
In dictionary-based image super-resolution (SR) methods, the resolution of the input image is enhanced using a dictionary of low-resolution (LR) and high-resolution (HR) image patches. Typically, a single dictionary is learned from all the patches in the training set. Then, the input LR patch is super-resolved using its nearest LR patches and their corresponding HR patches in the dictionary. In this paper, we propose a text-image SR method using multiple class-specific dictionaries. Each dictionary is learned from the patches of images of a specific character in the training set. The input LR image is segmented into text lines and characters, and the characters are preliminarily classified. Likewise, overlapping patches are extracted from the input LR image. Then, each patch is super-resolved through anchored neighborhood regression, using n class-specific dictionaries corresponding to the top-n classification results of the character containing the patch. The final HR image is generated by aggregating all the super-resolved patches. Our method achieves significant improvements in visual image quality and OCR accuracy compared with related dictionary-based SR methods. This confirms the effectiveness of applying the preliminary character classification results and multiple class-specific dictionaries in text-image SR.

18.
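于晓, 李朝. Infrared (《红外》), 2022, 43(10): 32-42
To address the low accuracy of traditional target classification methods for infrared images, a method is proposed that optimizes a support vector machine (Support Vector Machine, SVM) using the particle swarm optimization (Particle Swarm Optimization, PSO) algorithm combined with multi-feature fusion. The method uses two kinds of features, the histogram of oriented gradients (Histogram of Oriented Gradient, HOG) and the local binary pattern (Local Binary Pattern, LBP), to describe the contours and local texture of targets in infrared images. They capture different aspects of the infrared image and are complementary in feature representation. After feature extraction, a convex hull algorithm is applied to the sample data to obtain a set of representative samples, which improves the efficiency of classification. During training of the classification model, PSO is used to optimize the SVM by searching for the optimal penalty factor and kernel parameter, which improves the accuracy of the model. Experimental results show that the multi-feature fusion model is nearly 10% more accurate than single-feature models, and that the final PSO-optimized SVM model reaches a classification accuracy of 99%.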
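
As an illustration of the local binary pattern feature named in this abstract, the sketch below computes the basic 8-bit LBP code for the center pixel of a 3x3 patch. The clockwise bit ordering starting at the top-left neighbor and the `>=` comparison are assumptions of this sketch (conventions vary between implementations), and the function name is hypothetical.

```python
def lbp_code(patch):
    """Basic 8-bit local binary pattern of the center pixel of a 3x3 patch.

    `patch` is a 3x3 nested sequence of intensities. Each neighbor that is
    greater than or equal to the center sets one bit of the code.
    """
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner (assumed order).
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code
```

A texture descriptor is then typically built by taking a histogram of these codes over an image region, which is the kind of feature vector that could be concatenated with HOG before classification.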

19.
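In action recognition, how to fully learn and exploit the correlation between the spatial and temporal features of video is especially important for the final recognition result. To address the drop in accuracy caused by traditional action recognition methods ignoring spatio-temporal feature correlation and fine features, this paper proposes a human action recognition method based on the convolutional gated recurrent unit (convolutional GRU, ConvGRU) and attentional feature fusion (AFF). First, the Xception network is used as the spatial feature extraction network for video frames, and a spatial-temporal excitation (STE) module and a channel excitation (CE) module are introduced to strengthen temporal action modeling while extracting spatial features. In addition, the conventional long short-term memory (LSTM) network is replaced with a ConvGRU network, which uses convolution to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), which strengthens recognition of fine features and raises the model's accuracy. Experimental results show recognition accuracies of 95.66% on the UCF101 dataset and 69.82% on the HMDB51 dataset. The algorithm captures more complete spatio-temporal features and compares favorably with current mainstream models.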

20.
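To address the ill-posed nature of video super-resolution reconstruction and obtain good reconstruction results, a novel video super-resolution reconstruction algorithm is proposed. The algorithm introduces a joint spatio-temporal regularization operator: incorporating the spatial smoothness of each video frame and prior information on the correlation between adjacent frames improves the quality of the solution. To choose suitable spatio-temporal regularization coefficients, an adaptive method based on the L-curve is proposed that computes them automatically. Experiments on simulated image sequences and real video sequences show that the algorithm obtains fairly accurate solutions and reconstructs high-resolution video with good visual quality.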
