期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Selection of the most efficient tile size in tile-based cylinder panoramic video coding and transmission

Feng Dai Yanfei Shen Yongdong Zhang Shouxun Lin 《The Visual computer》2007,23(9-11):891-896

Panoramic videos are a 360 degree representation of a certain scene. The users can navigate interactively through the scene and change their view angles. Panoramic videos are often high-resolution and consume a significant amount of bandwidth for transmission. To resolve the problem, tile-based panoramic video coding and transmission is applied in some systems. With tile-based panoramic video coding and transmission, only the tiles involved with the perspective view are transmitted and decoded. Different tile sizes will bring different transmission bit rates for same video quality. In this paper, a two path coding method with H.264/AVC for cylinder panoramic video based on a hyperbolic model is proposed. With this method, the most efficient tile size can be selected and users can build the same quality perspective view with the smallest transmission bit rate. 相似文献

2.

3D C-string: a new spatio-temporal knowledge representation for video database systems

Anthony J.T. Lee^{Author Vitae} Han-Pang ChiuAuthor VitaePing YuAuthor Vitae 《Pattern recognition》2002,35(11):2521-2537

In video database systems, one of the most important methods for discriminating the videos is by using the objects and the perception of spatial and temporal relations that exist between objects in the desired videos. In this paper, we propose a new spatio-temporal knowledge representation called 3D C-string. The knowledge structure of 3D C-string, extended from the 2D C⁺-string, uses the projections of objects to represent spatial and temporal relations between the objects in a video. Moreover, it can keep track of the motions and size changes of the objects in a video. The string generation and video reconstruction algorithms for the 3D C-string representation of video objects are also developed. By introducing the concept of the template objects and nearest former objects, the string generated by the string generation algorithm is unique for a given video and the video reconstructed from a given 3D C-string is unique too. This approach can provide us an easy and efficient way to retrieve, visualize and manipulate video objects in video database systems. Finally, some experiments are performed to show the performance of the proposed algorithms. 相似文献

3.

基于运动矢量细化的帧率上变换与HEVC结合的视频压缩算法

蔡于涵熊淑华孙伟恒 KarnPradeep 何小海《计算机科学》2020,47(2):76-82

将帧率变换技术与新型视频压缩编码标准HEVC相结合有利于提升视频的压缩效率。针对直接利用HEVC码流信息中的低帧率视频的运动矢量进行帧率上变换时效果不理想的问题,文中提出了一种基于运动矢量细化的帧率上变换与HEVC结合的视频压缩算法。首先,在编码端对原始视频进行抽帧,降低视频帧率;其次,对低帧率视频进行HEVC编解码;然后,在解码端与从HEVC码流中提取出的运动矢量相结合,利用前向-后向联合运动估计对其进行进一步的细化,使细化后的运动矢量更加接近于对象的真实运动;最后,利用基于运动补偿的帧率上变换技术将视频序列恢复至原始帧率。实验结果表明,与HEVC标准相比,所提算法在同等视频质量下可节省一定的码率。同时,与其他算法相比,在节省码率相同的情况下,所提算法重建视频的PSNR值平均可提升0.5 dB。相似文献

4.

以运动矢量残差为载体的视频隐写算法

下载免费PDF全文

段然陈丹《中国图象图形学报》2018,23(2):163-173

目的以运动矢量（MV）为载体的视频隐写算法会破坏同一帧内相邻宏块或者相邻帧相同位置宏块的运动矢量之间的相关性,从而容易被基于运动矢量时空相关性（temporal-spatial correlation）特征的隐写分析算法检测到。为了解决这个问题,在H.264/AVC的视频编解码标准下构建了一种能抵抗基于运动矢量时空相关性隐写分析的视频隐算法。方法通过分析运动矢量残差（MVD）与运动矢量时空相关性的联系,证明了保持运动矢量残差的统计特征的隐写算法能够很好地保持视频运动矢量的时空相关性;通过分析运动矢量残差的统计特征设置了一种能保持其直方图特征的嵌入规则,使用4个标记符和一个队列来记录修改载体造成的特征改变,并进行相应的补偿操作,将秘密信息嵌入到视频压缩过程中的熵编码之前的运动矢量残差中;结合可变长度的矩阵编码,有效降低了嵌入秘密信息对载体的修改量。结果实验结果表明,该算法能较好地保持运动矢量残差在隐写前后的直方图特征,具有较好的视觉不可见性,对视频峰值信噪比（PSNR）和码率影响都不超过0.5%,满载嵌入的情况下基于运动矢量时空相关性的隐写分析算法对其的检测正确率只有70%左右。结论本文算法以运动矢量残差为隐写嵌入的载体,使用保持其直方图特征的嵌入规则,结合了矩阵编码以减低对载体的修改量,能较好抵抗基于运动矢量时空相关性的隐写分析。相似文献

5.

一种基于用户播放行为序列的个性化视频推荐策略 总被引：4，自引：0，他引：4

王娜何晓明刘志强王文君李霞《计算机学报》2020,43(1):123-135

本文针对在线视频服务网站的个性化推荐问题,提出了一种基于用户播放行为序列的个性化推荐策略.该策略通过深度神经网络词向量模型分析用户播放视频行为数据,将视频映射成等维度的特征向量,提取视频的语义特征.聚类用户播放历史视频的特征向量,建模用户兴趣分布矩阵.结合用户兴趣偏好和用户观看历史序列生成推荐列表.在大规模的视频服务系统中进行了离线实验,相比随机算法、基于物品的协同过滤和基于用户的协同过滤传统推荐策略,本方法在用户观看视频的Top-N推荐精确率方面平均分别获得22.3%、30.7%和934%的相对提升,在召回率指标上分别获得52.8%、41%和1065%的相对提升.进一步地与矩阵分解算法SVD++、基于双向LSTM模型和注意力机制的Bi-LSTM+Attention算法和基于用户行为序列的深度兴趣网络DIN比较,Top-N推荐精确率和召回率也得到了明显提升.该推荐策略不仅获得了较高的精确率和召回率,还尝试解决传统推荐面临大规模工业数据集时的数据要求严苛、数据稀疏和数据噪声等问题. 相似文献

6.

联合深度视频增强的3D-HEVC帧内编码快速算法

下载免费PDF全文

黄超彭宗举苗瑾超陈芬《中国图象图形学报》2018,23(4):500-509

目的针对高效3维视频编码标准（3D-HEVC）深度视频编码复杂度高和获取不准确的两个问题,现有算法单独进行处理,并没有进行联合优化。为了同时提升深度视频编码速度和编码效率,提出一种联合深度视频增强处理和帧内快速编码的方法。方法首先,引入深度视频空域增强处理,消除深度视频中的虚假纹理信息,增强其空域相关性,为编码单元（CU）划分和预测模式选择提供进一步优化的空间;然后,针对增强处理过的深度视频的空域特征,利用纹理复杂度将CU进行分类,提前终止平坦CU的分割过程,减少了CU分割次数;最后,利用边缘强度对预测单元（PU）进行分类,跳过低边缘强度PU的深度模型模式。结果实验结果表明,与原始3D-HEVC的算法相比,本文算法平均节省62.91%深度视频编码时间,并且在相同虚拟视点质量情况下节省4.63%的码率。与当前代表性的帧内低复杂度编码算法相比,本文算法深度视频编码时间进一步减少26.10%,相同虚拟视点质量情况下,编码码率节省5.20%。结论该方法通过深度视频增强处理,保证了虚拟视点质量,提升了编码效率。对深度视频帧内编码过程中复杂度较高的CU划分和预测模式选择分别进行优化,减少了率失真代价计算次数,有效地降低了帧内编码复杂度。相似文献

7.

User-dependent interactive light field video streaming system

Wang Bing Peng Qiang Wang Eric Xiang Wei Wu Xiao 《Multimedia Tools and Applications》2022,81(2):1893-1918

The sheer size and complex structure of light field (LF) videos bring new challenges to their compression and transmission. There have been numerous LF video compression algorithms reported in the literature to date. All of these algorithms compress and transmit all the views of an LF video. However, in some interactive or selective applications where users can choose the area of interest to be displayed, these algorithms generate a significant computational load and enormous data redundancies. In this paper, we propose an interactive LF video streaming system based on a user-dependent view selection scheme and an LF video coding method, which streams only the required data. Specifically, by predicting trajectories and using projection models, the viewing area of users in a limited consecutive number of time slots is firstly calculated, and then a user-dependent view selection method is proposed to determine the selected views of users for streaming. Finally, with the novel LF video sequence formed by only the selected sets of views, an adaptive coding method is presented for different LF video sequences based on users’ gestures. Experimental results illustrate that the proposed interactive LF video streaming system can achieve the best performance compared with other comparison methods.

相似文献

8.

基于区域划分的深度视频快速编码算法

田涛彭宗举《计算机应用》2013,33(6):1706-1710

多视点彩色视频与深度视频作为多媒体系统中主流3D场景表示方式,吸引了越来越多人的关注。深度视频反映场景的几何信息,如何对其进行快速编码尤为重要。提出了一种基于区域划分的深度视频快速编码算法。首先,根据深度视频的边缘和运动属性把深度视频分为四个区域;然后,深入分析不同区域内宏块模式分布比例以及参考帧选择特性,设计了不同的宏块模式选择和参考帧搜索策略来提高编码速度;最后,测试提出算法的编码时间、码率以及合成虚拟视点的质量。实验结果表明,提出算法在保证虚拟视点图像质量和编码码率基本不变的情况下,节约了85.73%～91.06%的编码时间。相似文献

9.

基于HEVC的车辆异常事件检测

常同伟梁久祯吴秦王念兵《数据采集与处理》2018,33(2):370-378

当前传统交通事故检测和查阅主要通过人工监测的方法,这种方法效率低且实时性差,本文提出一种基于最新压缩域视频编码标准HEVC（High-efficiency video coding）的车辆异常事件检测方法。首先对HEVC码流中提取出的运动矢量信息进行运动矢量累积迭代和中值滤波的预处理,之后根据提取出的块划分信息和运动矢量信息计算运动对象的运动强度,然后根据运动强度值和八连通区域法提取出运动对象,最后根据空间距离法和运动强度判别法检测出视频序列中发生的车辆异常事件。实验证明,该方法可以准确地检测出视频序列中发生的车辆异常事件;对于有着快速移动的运动目标以及多个运动目标的视频效果更好。相似文献

10.

多模式3维视频形状编码

下载免费PDF全文

朱仲杰王玉儿蒋刚毅《中国图象图形学报》2018,23(7):953-960

目的具有立体感和高端真实感的3D视频正越来越受到学术界和产业界的关注和重视,未来在3D影视、机器视觉、远程医疗、军事航天等领域将有着广泛的应用前景。对象基3D视频是未来3D视频技术的重要发展趋势,其中高效形状编码是对象基3D视频应用中的关键问题。但现有形状编码方法主要针对图像和视频对象,面向3D视频的形状编码算法还很少。为此,基于对象基3D视频的应用需求,提出一种基于轮廓和链码表示的高效多模式3D视频形状编码方法。方法对于给定的3D视频形状序列逐帧进行对象轮廓提取并预处理后,进行对象轮廓活动性分析,将形状图像分成帧内模式编码图像和帧间预测模式编码图像。对于帧内编码图像,基于轮廓内链码方向约束和线性特征进行高效编码。对于帧间编码图像,采用基于链码表示的轮廓基运动补偿预测、视差补偿预测、联合运动与视差补偿预测等多种模式进行编码,以充分利用视点内对象轮廓的帧间时域相关性和视点间对象轮廓的空域相关性,从而达到高效编码的目的。结果实验仿真结果显示所提算法性能优于经典和现有的最新同类方法,压缩效率平均能提高9.3%到64.8%不等。结论提出的多模式3D视频形状编码方法可以有效去除对象轮廓的帧间和视点间冗余,能够进行高效编码压缩,性能优于现有同类方法,可广泛应用于对象基编码、对象基检索、对象基内容分析与理解等。相似文献

11.

全景视频双环带映射算法

林畅李国平赵海武王国中顾晓《计算机应用》2017,37(9):2631-2635

针对全景视频映射过程中局部区域变形过大、冗余数据量极高的问题,提出了一种双环带映射算法（DRP）。首先,根据球面视频的几何特点,结合人眼视度（HVS）这一视觉特性,用两个相互正交的环形区域,将球面视频分割成14个大小相近的区域;然后根据空域采样定理,采用兰索斯插值法,将这14个区域对应的球面视频内容映射为14个大小相等的矩形视频;最后,根据最新视频编码标准的特点,重新排列这14个矩形视频,得到符合编码器标准的紧凑的全景视频。实验结果表明,与经纬图映射算法（ERP）、八面体映射算法（OHP）、二十面体映射算法（ISP）相比,DRP算法在视频压缩性能方面有良好的表现;其中同最流行的ERP算法相比,码率平均降低8.61%,明显提升了视频编码效率。相似文献

12.

Activity based surveillance video content modelling

Tao Xiang Shaogang Gong 《Pattern recognition》2008,41(7):2309-2326

This paper tackles the problem of surveillance video content modelling. Given a set of surveillance videos, the aims of our work are twofold: firstly a continuous video is segmented according to the activities captured in the video; secondly a model is constructed for the video content, based on which an unseen activity pattern can be recognised and any unusual activities can be detected. To segment a video based on activity, we propose a semantically meaningful video content representation method and two segmentation algorithms, one being offline offering high accuracy in segmentation, and the other being online enabling real-time performance. Our video content representation method is based on automatically detected visual events (i.e. ‘what is happening in the scene’). This is in contrast to most previous approaches which represent video content at the signal level using image features such as colour, motion and texture. Our segmentation algorithms are based on detecting breakpoints on a high-dimensional video content trajectory. This differs from most previous approaches which are based on shot change detection and shot grouping. Having segmented continuous surveillance videos based on activity, the activity patterns contained in the video segments are grouped into activity classes and a composite video content model is constructed which is capable of generalising from a small training set to accommodate variations in unseen activity patterns. A run-time accumulative unusual activity measure is introduced to detect unusual behaviour while usual activity patterns are recognised based on an online likelihood ratio test (LRT) method. This ensures robust and reliable activity recognition and unusual activity detection at the shortest possible time once sufficient visual evidence has become available. Comparative experiments have been carried out using over 10 h of challenging outdoor surveillance video footages to evaluate the proposed segmentation algorithms and modelling approach. 相似文献

13.

Perceptual hash function for scalable video

Navajit Saikia Prabin K. Bora 《International Journal of Information Security》2014,13(1):81-93

Perceptual hash functions are important for video authentication based on digital signature verifying the originality and integrity of videos. They derive hashes from the perceptual contents of the videos and are robust against the common content-preserving operations on the videos. The advancements in the field of scalable video coding call for efficient hash functions that are also robust against the temporal, spatial and bit rate scalability features of the these coding schemes. This paper presents a new algorithm to extract hashes of scalably coded videos using the 3D discrete wavelet transform. A hash of a video is computed at the group-of-frames level from the spatio-temporal low-pass bands of the wavelet-transformed groups-of-frames. For each group-of-frames, the spatio-temporal low-pass band is divided into perceptual blocks and a hash is derived from the cumulative averages of their averages. Experimental results demonstrate the robustness of the hash function against the scalability features and the common content-preserving operations as well as the sensitivity to the various types of content differences. Two critical properties of the hash function, diffusion and confusion, are also examined. 相似文献

14.

Robust Video Face Recognition Under Pose Variation

Ya Su 《Neural Processing Letters》2018,47(1):277-291

With the abundance of video data, the interest in more effective methods for recognizing faces from surveillance videos has grown. However, most algorithms proposed in this field have an assumption that each image set lies in a single linear subspace, or a mixture of linear subspaces. As a result, 3-dimensional shape information, which leads to the nonlinear transformation of face images, is ignored. This paper proposes a robust video face recognition across pose variation in video (RVPose) based on sparse representation. The key idea is performing alignment and recognition based on sparse representation simultaneously. Moreover, by considering that multi-pose faces of the same subject possess the same texture and 3-dimensional shape, RVPose aligns a sequence of faces with pose variations simultaneously, which is reduced to a 3-dimensional shape-constrained video alignment problem. Finally, aligned video sequence is recognized based on sparse represent. Experiments conducted on public video datasets demonstrate the effectiveness of the proposed algorithm. 相似文献

15.

Exploiting stereoscopic disparity for augmenting human activity recognition performance

Ioannis Mademlis Alexandros Iosifidis Anastasios Tefas Nikos Nikolaidis Ioannis Pitas 《Multimedia Tools and Applications》2016,75(19):11641-11660

相似文献

16.

利用时空相关性的HEVC帧内编码块快速划分

下载免费PDF全文

仲伟波陈东姚旭洋冯友兵《中国图象图形学报》2018,23(2):155-162

目的为了提升高效视频编码（HEVC）的编码效率,使之满足高分辨率、高帧率视频实时编码传输的需求。由分析可知帧内编码单元（CU）的划分对HEVC的编码效率有决定性的影响,通过提高HEVC的CU划分效率,可以大大提升HEVC编码的实时性。方法通过对视频数据分析发现,视频数据具有较强的时间、空间相关性,帧内CU的划分结果也同样具有较强的时间和空间相关性,可以利用前一帧以及当前帧CU的划分结果进行预判以提升帧内CU划分的效率。据此,本文给出一种帧内CU快速划分算法,先根据视频相邻帧数据的时间相关性和帧内数据空间相关性初步确定当前编码块的编码树单元（CTU）形状,再利用前一帧同位CTU平均深度、当前帧已编码CTU深度以及对应的率失真代价值决定当前编码块CTU的最终形状。算法每间隔指定帧数设置一刷新帧,该帧采用HM16.7模型标准CU划分以避免快速CU划分算法带来的误差累积影响。结果利用本文算法对不同分辨率、不同帧率的视频进行测试,与HEVC的参考模型HM16.7相比,本文算法在视频编码质量基本不变,视频码率稍有增加的情况下平均可以节省约40%的编码时间,且高分辨率高帧率的视频码率增加幅度普遍小于低分辨率低帧率的视频码率。结论本文算法在HEVC的框架内,利用视频数据的时间和空间相关性,通过优化帧内CU划分方法,对提升HEVC编码,特别是提高高分辨率高帧率视频HEVC编码的实时性具有重要作用。相似文献

17.

视频处理与压缩技术

下载免费PDF全文

贾川民马海川杨文瀚任文琦潘金山刘东刘家瑛马思伟《中国图象图形学报》2021,26(6):1179-1200

视频处理与压缩是多媒体计算与通信领域的核心主题之一,是连接视频采集传输和视觉分析理解的关键桥梁,也是诸多视频应用的基础。当前“5G+超高清+AI”正在引发多媒体计算与通信领域的新一轮重大技术革新,视频处理与压缩技术正在发生深刻变革,亟需建立视频大数据高效紧凑表示理论和方法。为此,学术研究机构和工业界对视频大数据的视觉表示机理、视觉信息紧凑表达、视频信号重建与恢复、高层与低层视觉融合处理方法及相应硬件技术等前沿领域进行了广泛深入研究。本文从数字信号处理基础理论出发,分析了当前视频处理与压缩领域的热点问题和研究内容,包括基于统计先验模型的视频数据表示模型及处理方法、融合深度网络模型的视频处理技术、视频压缩技术以及视频压缩标准进展等领域。详细描述了视频超分辨率、视频重建与恢复、视频压缩技术等领域面临的前沿动态、发展趋势、技术瓶颈和标准化进程等内容,对国际国内研究内容和发展现状进行了综合对比与分析,并展望了视频处理与压缩技术的发展与演进方向。更高质量视觉效果和高效率视觉表达之间将不再是单独研究的个体,融合类脑视觉系统及编码机理的视频处理与压缩技术将是未来研究的重要领域之一。相似文献

18.

Perceptual importance analysis-based rate control method for HEVC

Lin HongWei Li Xiangqun Gao Mingliang Deng Keyan Xu Yongsheng 《Multimedia Tools and Applications》2022,81(9):12495-12518

High efficiency video coding (HEVC) has achieved high coding efficiency as the video coding standard. For rate control in HEVC, the conventional R-λ scheme is based on mean absolute difference in allocating bits; however, the scheme does not fully utilize the perceptual importance variation to guide rate control, thus the subjective and objective quality of coded videos has room to improve. Therefore, in this paper, we propose a rate control scheme that considers perceptual importance. We first develop a perceptual importance analysis scheme to accurately abstract the spatial and temporal perceptual importance maps of video contents. The results of the analysis are then used to guide the bit allocation. Utilizing this model, a region-level bit allocation procedure is developed to maintain video quality balance. Subsequently, a largest coding unit (LCU)-level bit allocation scheme is designed to obtain the target bit of each LCU. To achieve a more accurate bitrate, an improved R-λ model based on the Broyden-Fletcher-Goldfarb-Shanno model is utilized to update the R-λ parameter. The experimental results showed that our method not only improved subjective and objective video quality with lower bitrate errors compared to the original RC in HEVC, but also outperformed state-of-the-art methods.

相似文献

19.

改进的矩形NAM图像表示算法

伍鹏余厚全陈传波卢光跃谢凯《计算机应用》2011,31(4):1016-1018

为了提高图像的表示效率,提出了一种改进的矩形非对称逆布局模式表示模型(NAM)图像编码算法IRNAM。该算法采用双矩形子模式对灰度图像进行表示,结合位平面优化策略,并顺序存储各子模式的数据,使得子模式数目大幅减少。实验结果表明：与矩形NAM算法和其他改进的NAM算法相比, IRNAM算法使表示图像的子模式数明显减少了,从而有效地节省了数据存储空间,是一种高效的图像表示方法。相似文献

20.

Exploring probabilistic localized video representation for human action recognition 总被引：1，自引：1，他引：0

Yan Song Sheng Tang Yan-Tao Zheng Tat-Seng Chua Yongdong Zhang Shouxun Lin 《Multimedia Tools and Applications》2012,58(3):663-685

In recent years, the bag-of-words (BoW) video representations have achieved promising results in human action recognition in videos. By vector quantizing local spatial temporal (ST) features, the BoW video representation brings in simplicity and efficiency, but limitations too. First, the discretization of feature space in BoW inevitably results in ambiguity and information loss in video representation. Second, there exists no universal codebook for BoW representation. The codebook needs to be re-built when video corpus is changed. To tackle these issues, this paper explores a localized, continuous and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model. Furthermore, the probabilistic video representation naturally gives rise to an information-theoretic distance metric of videos. This makes the representation readily applicable to most discriminative classifiers, such as the nearest neighbor schemes and the kernel based classifiers. Experiments on two datasets, KTH and UCF sports, show that the proposed approach could deliver promising results. 相似文献