Similar Documents (20 matching records)
1.
A compressed domain video saliency detection algorithm, which employs global and local spatiotemporal (GLST) features, is proposed in this work. We first partially decode a compressed video bitstream to obtain motion vectors and DCT coefficients, from which GLST features are extracted. More specifically, we extract the spatial features of rarity, compactness, and center prior from DC coefficients by investigating the global color distribution in a frame. We also extract the spatial feature of texture contrast from AC coefficients to identify regions whose local textures are distinct from those of neighboring regions. Moreover, we use the temporal features of motion intensity and motion contrast to detect visually important motions. Then, we generate spatial and temporal saliency maps by linearly combining the spatial features and the temporal features, respectively. Finally, we adaptively fuse the two saliency maps into a spatiotemporal saliency map by comparing the robustness of the spatial features with that of the temporal features. Experimental results demonstrate that the proposed algorithm provides excellent saliency detection performance at low complexity, enabling real-time detection.
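The adaptive fusion step can be sketched as follows. The abstract does not spell out the robustness measure, so the peak-to-mean ratio of each normalized map is used here as a stand-in score:

```python
import numpy as np

def fuse_saliency(spatial, temporal):
    """Adaptively fuse spatial and temporal saliency maps.

    The fusion weight of each map is proportional to its
    peak-to-mean ratio, an assumed proxy for the 'robustness'
    comparison described in the paper."""
    def normalize(m):
        m = m - m.min()
        return m / m.max() if m.max() > 0 else m

    s, t = normalize(spatial), normalize(temporal)
    ws = s.max() / (s.mean() + 1e-8)  # peaky map -> high weight
    wt = t.max() / (t.mean() + 1e-8)
    alpha = ws / (ws + wt)
    return normalize(alpha * s + (1 - alpha) * t)
```

A sharply peaked map (likely a reliable cue) dominates the fusion, while a flat, uninformative map is down-weighted.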

2.
In this study, a spatiotemporal saliency detection and salient region determination approach for H.264 videos is proposed. After Gaussian filtering in Lab color space, the phase spectrum of the Fourier transform is used to generate the spatial saliency map of each video frame. In parallel, the motion vector fields from each H.264 compressed video bitstream are backward accumulated. After normalization and global motion compensation, the phase spectrum of the Fourier transform of the moving parts is used to generate the temporal saliency map of each video frame. The spatial and temporal saliency maps of each video frame are then combined by adaptive fusion to obtain its spatiotemporal saliency map. Finally, a modified salient region determination scheme identifies the salient regions (SRs) of each video frame. The experimental results show that the proposed approach outperforms two baseline approaches.
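The spatial map follows the well-known phase-spectrum approach: keep only the phase of the 2-D FFT, invert, square, and smooth. A minimal numpy sketch, with a box blur standing in for the Gaussian post-filter:

```python
import numpy as np

def pft_saliency(gray):
    """Phase-spectrum-of-Fourier-transform (PFT) saliency map."""
    f = np.fft.fft2(gray)
    phase_only = np.exp(1j * np.angle(f))        # discard magnitude, keep phase
    sal = np.abs(np.fft.ifft2(phase_only)) ** 2  # squared reconstruction
    # 3x3 box blur in place of the Gaussian post-filter
    pad = np.pad(sal, 1, mode="edge")
    out = np.zeros_like(sal)
    for i in range(sal.shape[0]):
        for j in range(sal.shape[1]):
            out[i, j] = pad[i:i + 3, j:j + 3].mean()
    return out / out.max()
```

Regions whose spectral signature departs from the image's redundant background pop out in the phase-only reconstruction, which is what makes this a cheap per-frame saliency cue.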

3.
Providing adequate Quality of Experience (QoE) to end-users is crucial for streaming service providers. In this paper, in order to realize automatic quality assessment, a No-Reference (NR) bitstream Human-Vision-System (HVS)-based video quality assessment (VQA) model is proposed. Inspired by discoveries from the neuroscience community, which suggest there is considerable overlap between the active areas of the brain when engaging in video quality assessment and saliency detection tasks, saliency maps are used in the proposed method to improve quality assessment accuracy. To this end, saliency maps are first generated from features extracted from the HEVC bitstream. Then, saliency map statistics are employed to create a model of visual memory. Finally, a support vector regression pipeline learns an estimate of the video quality from the visual memory, saliency, and frame features. Evaluations on the SJTU dataset indicate that the proposed bitstream-based no-reference video quality assessment algorithm achieves competitive performance.

4.
In the literature, designs of H.264 to High Efficiency Video Coding (HEVC) transcoders mostly focus on inter transcoding. In this paper, a fast intra transcoding system from H.264 to HEVC based on discrete cosine transform (DCT) coefficients and intra prediction modes, called FITD, is proposed, using the intra information retrieved from an H.264 decoder. To design effective transcoding strategies, FITD not only refers to the intra prediction block sizes and modes, but also uses the DCT coefficients to help the transcoder predict the complexity of the blocks. The DCT coefficients and intra prediction information embedded in H.264 bitstreams are used to predict a coding depth map for depth limitation and early termination, simplifying the HEVC re-encoding process. Once the HEVC encoder obtains the predicted depth for a given CU and reaches it, it skips further CU branching. As a result, the number of CU branches and predictions in the HEVC re-encoder is substantially reduced, achieving fast and precise intra transcoding. The experimental results show that FITD is 1.7–2.5 times faster than the original HEVC in encoding intra frames, while the bitrate increases by 3% or less and the PSNR degradation is controlled within 0.1 dB. Compared to previous H.264 to HEVC transcoding approaches, FITD clearly achieves a better trade-off between re-encoding speed and video quality.
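The depth-limited early termination can be illustrated with a toy CU quadtree recursion; the function names and the depth-prediction callback are illustrative, not FITD's actual predictor:

```python
def encode_cu(x, y, size, depth, pred_depth, visited):
    """Recursive CU quadtree traversal with early termination.

    pred_depth(x, y) returns the depth predicted from H.264 side
    information; once reached (or at the minimum 8x8 CU), deeper
    branches are skipped entirely."""
    visited.append((x, y, size))
    if depth >= pred_depth(x, y) or size <= 8:
        return  # early termination: no further CU branching
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            encode_cu(x + dx, y + dy, half, depth + 1, pred_depth, visited)
```

With a predicted depth of 0 a 64x64 CTU is evaluated once instead of traversing all 85 nodes of the full quadtree, which is where the speed-up comes from.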

5.
A compressed video bitstream is sensitive to errors that may severely degrade the reconstructed images even when the bit error rate is small. One approach to combating such errors is error concealment at the decoder, which neither increases the bit rate nor changes the encoder. For spatial error concealment, we propose a method that preserves edge continuity and texture at low computational cost to reconstruct more visually acceptable images. For temporal error concealment, we propose a two-step algorithm based on block matching principles, which assumes smooth and uniform motion across adjacent blocks. Simulation results show that the proposed spatial and temporal methods reconstruct damaged images with better quality than other methods.
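The temporal step can be sketched as boundary matching: test candidate displacements in the previous frame and keep the one whose border best matches the intact pixels around the lost block. This is a simplified stand-in for the paper's two-step algorithm:

```python
import numpy as np

def conceal_block(prev, cur, x, y, bs=8, search=4):
    """Conceal a lost bs x bs block at (x, y) in `cur` by copying the
    candidate block from `prev` whose outer border best matches the
    surviving pixels around the loss (boundary SAD criterion)."""
    top = cur[y - 1, x:x + bs]       # intact row above the lost block
    left = cur[y:y + bs, x - 1]      # intact column to its left
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cy, cx = y + dy, x + dx
            cost = (np.abs(prev[cy - 1, cx:cx + bs] - top).sum()
                    + np.abs(prev[cy:cy + bs, cx - 1] - left).sum())
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    dy, dx = best_mv
    cur[y:y + bs, x:x + bs] = prev[y + dy:y + bs + dy, x + dx:x + bs + dx]
    return best_mv
```

The smooth-motion assumption enters through the criterion: if neighboring content moved coherently, the border that matches best also carries the correct interior.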

6.
In this paper, efficient solutions for requantization transcoding in H.264/AVC are presented. By requantizing residual coefficients in the bitstream, different error components can appear in the transcoded video stream. Firstly, a requantization error is present due to successive quantization in encoder and transcoder. In addition to the requantization error, the loss of information caused by coarser quantization will propagate due to dependencies in the bitstream. Because of the use of intra prediction and motion-compensated prediction in H.264/AVC, both spatial and temporal drift propagation arise in transcoded H.264/AVC video streams. The spatial drift in intra-predicted blocks results from mismatches in the surrounding prediction pixels as a consequence of requantization. In this paper, both spatial and temporal drift components are analyzed. As is shown, spatial drift has a determining impact on the visual quality of transcoded video streams in H.264/AVC. In particular, this type of drift results in serious distortion and disturbing artifacts in the transcoded video stream. In order to avoid the spatially propagating distortion, we introduce transcoding architectures based on spatial compensation techniques. By combining the individual temporal and spatial compensation approaches and applying different techniques based on the picture and/or macroblock type, overall architectures are obtained that provide a trade-off between computational complexity and rate-distortion performance. The complexity of the presented architectures is significantly reduced when compared to cascaded decoder–encoder solutions, which are typically used for H.264/AVC transcoding. The reduction in complexity is particularly large for the solution which uses spatial compensation only. When compared to traditional solutions without spatial compensation, both visual and objective quality results are highly improved.
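The requantization error itself is easy to demonstrate: cascading a fine encoder quantizer with a coarser transcoder quantizer can land on a different reconstruction level than quantizing once at the coarse step (step sizes 4 and 10 are arbitrary illustration values):

```python
def quantize(x, step):
    """Uniform scalar quantizer: round to the nearest reconstruction level."""
    return round(x / step) * step

v = 14.0
direct = quantize(v, 10)                # one-shot coarse quantization -> 10.0
cascade = quantize(quantize(v, 4), 10)  # encoder at step 4 (-> 16), then
                                        # transcoder at step 10 -> 20.0
```

The intermediate level 16 pushes the cascade across a coarse-quantizer decision boundary, so `direct` and `cascade` disagree; in a real stream this mismatch then drifts through intra and motion-compensated prediction.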

7.
Saliency detection in the compressed domain for adaptive image retargeting
Saliency detection plays important roles in many image processing applications, such as region-of-interest extraction and image resizing. Existing saliency detection models are built in the uncompressed domain. Since most images on the Internet are stored in a compressed format such as joint photographic experts group (JPEG), we propose a novel saliency detection model in the compressed domain. The intensity, color, and texture features of the image are extracted from discrete cosine transform (DCT) coefficients in the JPEG bit-stream. The saliency value of each DCT block is obtained based on Hausdorff distance calculation and feature map fusion. Based on the proposed saliency detection model, we further design an adaptive image retargeting algorithm in the compressed domain. The proposed image retargeting algorithm utilizes a multi-operator scheme comprising block-based seam carving and image scaling to resize images. A new definition of texture homogeneity is given to determine the number of block-based seams to remove. Thanks to the accurate saliency information derived directly from the compressed domain, the proposed image retargeting algorithm effectively preserves the visually important regions of images, efficiently removes the less crucial regions, and therefore significantly outperforms the relevant state-of-the-art algorithms, as demonstrated by the in-depth analysis in the extensive experiments.

8.
In H.263 video codec related systems, motion estimation and the Discrete Cosine Transform (DCT) account for most of the computational load. In order to reduce the complexity of the encoder and dedicate more resources to other functions, and based on a study of existing methods, an Improved All Zero Block Finding (IAZBF) method built on the statistical characteristics of DCT coefficients is proposed. Compared with existing methods, IAZBF improves detection efficiency by about 50% without introducing significant extra computation. Computed with additions and shifts instead of multiplications, IAZBF has low computational complexity, which is especially valuable on low-end processors. In addition, IAZBF upholds picture fidelity and remains compatible with the H.263 bitstream standard.
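A classical sufficient condition underlies such detectors: with an orthonormal 8x8 DCT, every coefficient is bounded in magnitude by SAD/4, so SAD < 4q guarantees the whole quantized block is zero. The sketch below uses this generic bound, not IAZBF's exact statistics:

```python
import numpy as np

def dct2(block):
    """Orthonormal 8x8 2-D DCT via the transform matrix: C @ B @ C.T."""
    n = 8
    C = np.array([[np.sqrt((1 if u == 0 else 2) / n)
                   * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                   for x in range(n)] for u in range(n)])
    return C @ block @ C.T

def all_zero_after_quant(residual, q):
    """Sufficient (never falsely positive) all-zero test: each DCT
    coefficient is bounded by SAD/4, so SAD < 4*q implies every
    coefficient falls in the quantizer's zero bin of half-width q."""
    return np.abs(residual).sum() < 4 * q
```

When the test fires, the encoder can skip the DCT and quantization for that block entirely, which is where the detection efficiency gain comes from.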

9.
Recently, saliency maps computed from input images have been used to detect interesting regions in images/videos and to focus processing on these salient regions. This paper introduces a novel, macroblock-level visual saliency guided video compression algorithm. Compression is modelled as a two-step process: salient region detection and frame foveation. Visual saliency is modelled as a combination of low-level features as well as high-level features that become important in the higher-level visual cortex. A relevance vector machine is trained on three-dimensional feature vectors pertaining to global, local, and rarity measures of conspicuity, yielding probabilistic values which form the saliency map. These saliency values are used for non-uniform bit allocation over video frames. To achieve these goals, we also propose a novel saliency-aware video compression architecture that saves a tremendous amount of computation. This architecture thresholds the mutual information between successive frames to flag frames requiring re-computation of saliency, and uses motion vectors to propagate saliency values.
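The mutual-information gate for saliency re-computation can be sketched from a joint intensity histogram of successive frames; the threshold value is illustrative:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (in bits) between two equally sized frames,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return (pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum()

def needs_saliency_update(prev, cur, threshold=0.5):
    """Low MI means the frame content changed substantially, so the
    saliency map must be recomputed rather than propagated."""
    return mutual_information(prev, cur) < threshold
```

For near-identical frames the MI approaches the frame's own entropy, so propagation via motion vectors suffices; a scene cut drives MI toward zero and triggers recomputation.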

10.
Double JPEG compression detection plays a vital role in multimedia forensics, to determine whether a JPEG image is authentic or manipulated. However, it remains a challenging task when the quality factor of the first compression is much higher than that of the second, as well as when the targeted image blocks are quite small. In this work, we present a novel end-to-end deep learning framework that takes raw DCT coefficients as input to distinguish between single and double compressed images, and performs well in both of the above cases. Our proposed framework can be divided into two stages. In the first stage, we adopt an auxiliary DCT layer with sixty-four 8 × 8 DCT kernels. Using a dedicated layer to extract DCT coefficients, instead of extracting them directly from the JPEG bitstream, allows our framework to work even when the double compressed images are stored in the spatial domain, e.g., in PGM, TIFF, or other bitmap formats. The second stage is a deep neural network with multiple convolutional blocks that extracts more effective features. We have conducted extensive experiments on three different image datasets. The experimental results demonstrate the superiority of our framework over other state-of-the-art double JPEG compression detection methods, whether hand-crafted or learned with deep networks, especially in the two cases mentioned above. Furthermore, our framework can detect triple and even multiple JPEG compressed images, which is scarce in the literature as far as we know.
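The auxiliary DCT layer amounts to 64 fixed 8x8 convolution kernels, one per DCT-II basis function; applied to a block at stride 8, they reproduce its DCT coefficients exactly. A numpy sketch of the kernel construction:

```python
import numpy as np

def dct_kernels(n=8):
    """The 64 8x8 orthonormal DCT-II basis kernels: kernel (u, v) is
    the outer product of rows u and v of the 1-D DCT matrix."""
    C = np.array([[np.sqrt((1 if u == 0 else 2) / n)
                   * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                   for x in range(n)] for u in range(n)])
    return np.stack([np.outer(C[u], C[v])
                     for u in range(n) for v in range(n)])

def block_dct(block, kernels):
    """Correlating an 8x8 block with each kernel yields its 64 DCT
    coefficients, which is exactly what the fixed conv layer computes."""
    return np.array([(k * block).sum() for k in kernels]).reshape(8, 8)
```

Because the basis is orthonormal, the coefficients also reconstruct the block exactly, so no information is lost by routing pixels through this layer.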

11.
Video text information plays an important role in semantic-based video analysis, indexing, and retrieval. Video texts are closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing, and retrieval consist of video text detection, localization, tracking, segmentation, and recognition. Video sequences are commonly stored in compressed formats, where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using discrete cosine transform (DCT) coefficients is proposed. A coarse-to-fine text detection method is used to find text blocks based on the block DCT texture intensity information. The DCT texture intensity of an 8×8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. Text block region localization and tracking are carried out using the horizontal and vertical block texture intensity projection profiles. The appearing and disappearing frames of each text line are determined by the text tracking. The final experimental results show the effectiveness of the proposed methods.

12.
Bitstream-layer models are designed to use information extracted from both packet headers and payload for real-time and non-intrusive quality monitoring of networked video. This paper proposes a content-adaptive bitstream-layer (CABL) model for coding distortion assessment of H.264/AVC networked video. Firstly, the fundamental relationship between perceived coding distortion and the quantization parameter (QP) is established. Then, considering the fact that the perceived coding distortion of a networked video significantly depends on both the spatial and temporal characteristics of the video content, spatial and temporal complexities are incorporated into the proposed model. Assuming that the residuals before the Discrete Cosine Transform (DCT) follow a Laplace distribution, the scale parameters of the distribution are estimated from the QP and the quantized coefficients using Parseval's theorem. The spatial complexity is then evaluated using the QP and the scale parameters. Meanwhile, the temporal complexity is obtained from weighted motion vectors (MV), accounting for the different extents of temporal masking in high motion and low motion regions. Both characteristics of the video content are extracted from the compressed bitstream without resorting to complete decoding. Using this content-related information, the proposed model is able to adapt to different video contents. Experimental results show that the overall performance of the CABL model significantly exceeds that of the P.1202.1 model and other coding distortion assessment models in terms of widely used performance criteria, including the Pearson Correlation Coefficient (PCC), the Spearman Rank Order Correlation Coefficient (SROCC), the Root-Mean-Squared Error (RMSE), and the Outlier Ratio (OR).
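The Laplace scale can be estimated either directly from the residuals (maximum likelihood) or, as the model does via Parseval's theorem, from the transform-domain energy, since an orthonormal DCT preserves the sum of squares. A sketch with synthetic residuals:

```python
import numpy as np

def laplace_scale_mle(residuals):
    """Maximum-likelihood scale b of a zero-mean Laplace distribution:
    b_hat = mean(|x|)."""
    return np.abs(residuals).mean()

def laplace_scale_from_energy(coeffs, n):
    """Parseval route: the orthonormal DCT preserves energy, so the
    coefficient energy estimates E[x^2] = 2*b^2, giving
    b = sqrt(E[x^2] / 2)."""
    return np.sqrt((coeffs ** 2).sum() / n / 2.0)
```

In the actual model the energy is reconstructed from the QP and the quantized coefficients rather than from raw residuals; the synthetic test below only checks that the two estimators agree on Laplace data.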

13.
Motion adaptive intra refresh for MPEG-4
The sensitivity to error of a video bitstream often changes as the amount of motion within a scene changes. Taking this into account, a scheme that attempts to optimise error robustness by varying the number of adaptive intra refresh blocks is presented. Simulation results demonstrate the benefits of the proposed technique.
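A minimal version of motion-adaptive refresh simply scales the per-frame intra refresh count with the average motion magnitude; the bounds and the linear mapping here are illustrative, not the paper's rule:

```python
def intra_refresh_count(motion_vectors, mb_count, lo=1, hi=8):
    """Choose how many macroblocks to intra-refresh this frame:
    more motion -> more error propagation risk -> more refresh.

    motion_vectors: list of (dx, dy) per macroblock.
    lo/hi: illustrative bounds on refresh blocks per frame."""
    avg = sum(abs(dx) + abs(dy)
              for dx, dy in motion_vectors) / max(len(motion_vectors), 1)
    scaled = lo + int(avg)
    return max(lo, min(hi, min(scaled, mb_count)))
```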

14.
As cloud storage becomes more popular, concerns about data leakage have been increasing. Encryption techniques can be used to protect privacy of videos stored in the cloud. However, the recently proposed sketch attack for encrypted H.264/AVC video, which is based on the macroblock bitstream size (MBS), can generate the outline images of both intra-frames and inter-frames from a video encrypted by most existing encryption schemes; thus, the protection of the original video may be considered a failure. In this paper, a novel selective encryption scheme for H.264/AVC video with improved visual security is presented. Two different scrambling strategies that do not destroy the format compatibility are proposed to change the relative positions between macroblocks in intra-frames and inter-frames respectively, which in turn substantially distort the sketched outline images so that they do not disclose meaningful information. Moreover, the sign bits of non-zero DCT coefficients are encrypted to contribute to the visual security of our scheme, and an adaptive encryption key related to the intra prediction mode and the DCT coefficient distribution of each frame is employed to provide further security. The experimental results show that our encryption scheme can achieve a better visual scrambling effect with a small adverse impact on the video file size. Furthermore, the security analysis demonstrates that our scheme can successfully resist the MBS sketch attack compared with other related schemes. The proposed method is also proven secure against some other known attacks.
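A keyed, invertible macroblock permutation illustrates the position-scrambling idea; this is a generic sketch, not the paper's separate intra/inter strategies (which must also preserve bitstream format compatibility):

```python
import numpy as np

def scramble_macroblocks(frame, key, mb=8, inverse=False):
    """Permute macroblock positions with a key-seeded permutation.

    Scrambling and descrambling use the same key; `inverse=True`
    applies the inverse permutation to restore the original layout."""
    h, w = frame.shape
    blocks = [frame[y:y + mb, x:x + mb]
              for y in range(0, h, mb) for x in range(0, w, mb)]
    perm = np.random.default_rng(key).permutation(len(blocks))
    out_blocks = [None] * len(blocks)
    for dst, src in enumerate(perm):
        if inverse:
            out_blocks[src] = blocks[dst]   # undo the permutation
        else:
            out_blocks[dst] = blocks[src]   # apply the permutation
    out = np.zeros_like(frame)
    i = 0
    for y in range(0, h, mb):
        for x in range(0, w, mb):
            out[y:y + mb, x:x + mb] = out_blocks[i]
            i += 1
    return out
```

Because the MBS sketch attack reads per-macroblock bitstream sizes in raster order, shuffling macroblock positions scrambles exactly the spatial arrangement that the attack's outline images rely on.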

15.
The emergent 3D High Efficiency Video Coding (3D-HEVC) standard is an extension of High Efficiency Video Coding (HEVC) for compressing the multi-view texture video plus depth map format. Since depth maps have different statistical properties than texture video, various new intra tools have been added to 3D-HEVC depth coding. In current 3D-HEVC, these new intra tools are used together with the conventional HEVC intra prediction modes for depth coding. This technique achieves the highest possible coding efficiency, but leads to extremely high computational complexity, which keeps 3D-HEVC from practical applications. In this paper, we propose a fast intra mode decision algorithm for depth coding in 3D-HEVC. The basic idea is to use depth map characteristics to predict the current depth prediction mode and skip specific depth intra modes rarely used in 3D-HEVC depth coding. Based on this analysis, two fast intra mode decision strategies are proposed: reducing the number of conventional intra prediction modes, and simplifying the depth modeling modes (DMMs). Experimental results demonstrate that the proposed algorithm saves 30% of the coding runtime on average while maintaining almost the same rate-distortion (RD) performance as the original 3D-HEVC encoder.

16.
New architecture for dynamic frame-skipping transcoder
Transcoding is a key technique for reducing the bit rate of a previously compressed video signal. A high transcoding ratio may result in an unacceptable picture quality when the full frame rate of the incoming video bitstream is kept. Frame skipping is often used as an efficient scheme to allocate more bits to the representative frames, so that an acceptable quality for each frame can be maintained. However, a skipped frame must still be decompressed completely, because it may serve as a reference frame for reconstructing non-skipped frames. The newly quantized discrete cosine transform (DCT) coefficients of the prediction errors must be re-computed for each non-skipped frame with reference to the previous non-skipped frame; this creates undesirable complexity and introduces re-encoding errors. In this paper, we propose new algorithms and a novel architecture for frame-rate reduction that improve picture quality and reduce complexity. The proposed architecture operates mainly in the DCT domain to achieve a low-complexity transcoder. With the direct addition of DCT coefficients and an error compensation feedback loop, re-encoding errors are reduced significantly. Furthermore, we propose a frame-rate control scheme that dynamically adjusts the number of skipped frames according to the incoming motion vectors and the re-encoding errors introduced by transcoding, so that the decoded sequence has smooth motion as well as better transcoded pictures. Experimental results show that, compared to the conventional transcoder, the new frame-skipping transcoder architecture is more robust, produces fewer requantization errors, and has reduced computational complexity.
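Direct addition in the DCT domain rests on the linearity of the transform: DCT(e1 + e2) = DCT(e1) + DCT(e2), so prediction errors from skipped frames can be merged coefficient-wise without an inverse transform back to pixels. A quick demonstration:

```python
import numpy as np

def dct2(b):
    """Orthonormal 2-D DCT of a square block via its transform matrix."""
    n = b.shape[0]
    C = np.array([[np.sqrt((1 if u == 0 else 2) / n)
                   * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                   for x in range(n)] for u in range(n)])
    return C @ b @ C.T

# Merging two prediction-error blocks directly in the transform
# domain is exact, which is what makes the DCT-domain architecture
# both cheaper and free of extra transform round-off.
```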

17.
Key frame extraction based on visual attention model
Key frame extraction is an important technique in video summarization, browsing, searching, and understanding. In this paper, we propose a novel approach to extract the most attractive key frames using a saliency-based visual attention model that bridges the gap between semantic interpretation of the video and low-level features. First, dynamic and static conspicuity maps are constructed based on motion, color, and texture features. Then, by introducing suppression factor and motion priority schemes, the conspicuity maps are fused into a saliency map that includes only true attention regions, producing an attention curve. Finally, after a time-constrained clustering algorithm groups frames with similar content, the frame with the maximum saliency value in each group is selected as a key frame. Experimental results demonstrate the effectiveness of our approach for video summarization by retrieving the meaningful key frames.
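The selection stage can be sketched as follows, with a simple similarity-threshold segmentation standing in for the time-constrained clustering:

```python
def select_key_frames(saliency, similarity, sim_threshold=0.8):
    """Pick one key frame per content group.

    saliency[i]:   attention-curve value of frame i.
    similarity[i]: similarity between frame i and frame i-1
                   (similarity[0] is unused).
    Consecutive frames stay in one group while similarity holds;
    the most salient frame of each group becomes its key frame."""
    keys, start = [], 0
    for i in range(1, len(saliency) + 1):
        if i == len(saliency) or similarity[i] < sim_threshold:
            segment = range(start, i)
            keys.append(max(segment, key=lambda f: saliency[f]))
            start = i
    return keys
```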

18.
Seongsoo Lee, ETRI Journal, 2005, 27(5): 504-510
This paper proposes a novel low-power video decoding scheme. An encoded video bitstream contains quite a large number of non-coded blocks. When the number of non-coded blocks in a frame is known at the start of frame decoding, the workload of decoding that frame can be estimated. Consequently, the supply voltage of very large-scale integration (VLSI) circuits can be lowered, and the power consumption can be reduced. In the proposed scheme, the encoder counts the number of non-coded blocks and stores this information in the frame header of the bitstream. Simulation results show that the proposed scheme reduces the power consumption to about 1/10 to 1/20 of the original.

19.
When a video bitstream compressed with macroblock-prediction-based methods is transmitted over a lossy channel, packet loss causes errors to propagate along the temporal prediction chain. Unlike conventional error concealment methods, this paper proposes reconstructing a lost region from its correlation with neighboring intact regions. To reconstruct lost regions with relatively complex structure, structure-based directional interpolation is used to recover the information around structural edges. Large textured areas are then reconstructed by placing exemplar patches with a texture-direction-guided Graph Cuts scheme. To further improve the quality of the reconstruction, a reconstruction-order compensation is introduced. As the experimental results show, the proposed reconstruction algorithm performs well on regions with complex structure.

20.
3D High Efficiency Video Coding (3D-HEVC) achieves high coding efficiency at the cost of a large amount of computation. To reduce this complexity, this paper proposes a fast intra prediction algorithm for 3D-HEVC depth maps based on deep-learning edge detection. A holistically-nested edge detection network first extracts edges from the depth map; the resulting probability edge map is then binarized with Otsu's method (maximum between-class variance) to obtain the salient edge regions. Finally, different optimizations are designed for prediction units of different sizes in different regions, reducing the mode-selection complexity of depth intra prediction by skipping the depth modeling modes and other unnecessary modes, and thus reducing the depth map coding complexity. Simulations show that, compared with the original encoder, the proposed algorithm reduces the average total coding time by about 35% and the depth map coding time by about 42%, while the average bitrate of the synthesized views increases by only 0.11%. The algorithm therefore reduces coding time with negligible quality loss.
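Otsu's method, used above to binarize the probability edge map, picks the threshold maximizing the between-class variance of the two resulting classes; a direct numpy implementation:

```python
import numpy as np

def otsu_threshold(p):
    """Otsu's method on a probability edge map with values in [0, 1]:
    quantize to 256 levels and return the threshold that maximizes
    the between-class variance w0*w1*(mu0 - mu1)^2."""
    hist, edges = np.histogram(p.ravel(), bins=256, range=(0.0, 1.0))
    hist = hist / hist.sum()
    levels = (edges[:-1] + edges[1:]) / 2.0
    best_t, best_var = levels[0], -1.0
    for k in range(1, 256):
        w0, w1 = hist[:k].sum(), hist[k:].sum()  # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (hist[:k] * levels[:k]).sum() / w0  # class means
        mu1 = (hist[k:] * levels[k:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, levels[k]
    return best_t
```

Pixels at or above the returned threshold form the salient edge region that steers the per-PU mode-skipping decisions.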

