Similar Documents
20 similar documents found (search time: 31 ms)
1.
Text in videos contains rich semantic information, which is useful for content-based video understanding and retrieval. Although a great number of state-of-the-art methods have been proposed to detect text in images and videos, few works focus on spatiotemporal text localization in videos. In this paper, we present a spatiotemporal text localization method with improved detection efficiency and performance. Concretely, we propose a unified framework consisting of the sampling-and-recovery model (SaRM) and the divide-and-conquer model (DaCM). SaRM exploits the temporal redundancy of text to increase detection efficiency for videos, while DaCM efficiently localizes text in the spatial and temporal domains simultaneously. In addition, we construct a challenging overlaid-text video dataset named UCAS-STLData, which contains 57,070 frames with spatiotemporal ground truths. In the experiments, we comprehensively evaluate the proposed method on the publicly available overlaid-text datasets and on UCAS-STLData. Compared with state-of-the-art spatiotemporal text localization methods, it achieves a slight performance improvement together with a significant efficiency improvement.

2.
A Video Text Tracking and Segmentation Algorithm Based on Multiple Frames   Cited by: 8 (self-citations: 2, others: 6)
Text extracted from video is an important source of information for video semantic understanding and retrieval. Exploiting the temporal and spatial redundancy of static text in video, the detection results are refined using the edge bitmap of the text region as the feature, and a fast text tracking algorithm based on binary search is proposed, achieving fast and effective localization of text objects. In the segmentation stage, besides the traditional approach of enhancing text regions with a gray-level fused image, the edge bitmap is further used to filter out the background around the text regions. Experiments show that both text detection precision and segmentation quality are greatly improved.
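The binary-search tracking idea exploits the fact that a static caption persists over a contiguous run of frames, so its end frame can be found with logarithmically many detector calls. A minimal sketch, assuming a frame-level text-presence predicate (all names here are hypothetical, not from the paper):

```python
def last_frame_with_text(is_text_at, lo, hi):
    """Binary search for the last frame index in [lo, hi] at which a
    static caption, known to be present at frame `lo`, is still visible.
    Assumes the caption persists over one contiguous run of frames."""
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so the loop terminates
        if is_text_at(mid):
            lo = mid              # caption still present: search later frames
        else:
            hi = mid - 1          # caption gone: search earlier frames
    return lo
```

With a detector that verifies the caption's edge bitmap at a given frame, this finds the caption's last frame in O(log n) detector calls instead of scanning every frame.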

3.
Color video compression has become a necessity with the advent of new color video applications and the growing size of higher-definition video. New approaches achieve more compression than the traditional and standard compression techniques. Considerable progress has been made in color transfer-based compression techniques, which do not modify standard encoders but add pre-processing and post-processing steps. We present a novel, simple yet robust color transfer-based compression technique that can be integrated with standard encoders such as MPEG-2 and H.264/AVC. We achieve compression by discarding the color components of all frames in a video sequence except those of the Intra (I) frames. The colored intra frames are converted to gray-textured images: the colors of each intra frame are embedded into the low-visibility, high-frequency textures of its gray luminance image. This is done by decomposing the luminance of the intra frame with the non-decimated DWT (discrete wavelet frame transform) and replacing the bandpass subbands with the chrominance signals; the lowpass band is the same as that of the luminance signal. At the decoder output, the colored intra frames are recovered using the wavelet transform. Instead of calculating motion vectors, we identify and reuse the motion vectors already present in the decoder: the remaining luminance frames of the video are colored with the help of the identified motion vectors and the color information of the preceding frames. The proposed codec considerably improves the compression ratio achievable by the standard codec, at the cost of a slight increase in computation time at the decoder end.
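The embed/recover cycle can be illustrated with a toy version: a decimated one-level Haar DWT stands in for the paper's non-decimated wavelet frame transform, and two detail subbands carry half-resolution chroma planes. All function names are illustrative, not the authors' implementation:

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar DWT (decimated): returns LL, LH, HL, HH subbands."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # row lowpass
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # row highpass
    ll = (a[0::2] + a[1::2]) / 2.0
    lh = (a[0::2] - a[1::2]) / 2.0
    hl = (d[0::2] + d[1::2]) / 2.0
    hh = (d[0::2] - d[1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    a = np.empty((2 * h, w)); d = np.empty((2 * h, w))
    a[0::2], a[1::2] = ll + lh, ll - lh
    d[0::2], d[1::2] = hl + hh, hl - hh
    out = np.empty((2 * h, 2 * w))
    out[:, 0::2], out[:, 1::2] = a + d, a - d
    return out

def embed_chroma(y, cb, cr):
    """Replace two detail subbands of the luminance with half-resolution
    chrominance planes -- a simplified stand-in for the paper's scheme."""
    ll, lh, hl, hh = haar2d(y)
    return ihaar2d(ll, cb, cr, hh)   # gray-textured frame carrying color

def extract_chroma(gray_textured):
    """Decoder side: recover the lowpass luminance and the chroma planes."""
    ll, cb, cr, _ = haar2d(gray_textured)
    return ll, cb, cr
```

Because the Haar pair here is perfectly invertible, the chroma planes survive the round trip exactly; in the real codec the gray-textured frame additionally passes through a lossy standard encoder.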

4.
For copy-move tampering of homologous video sequences, a detection-and-recovery method is proposed that measures the correlation between image contents. The video is first converted into a sequence of consecutive image frames; each frame is divided into blocks and eight feature vectors are extracted per frame. Euclidean distance is then used to compute inter-frame correlation, and a dynamic deviation threshold, constructed by adding a deviation matrix, detects the copy-move tampered sequence down to the exact frame, thereby achieving tamper detection and recovery for the video sequence. Experiments show that the algorithm achieves good results for copy-move tamper detection and recovery on homologous video sequences.
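A simplified sketch of the frame-correlation test: the paper's eight block features and dynamic deviation threshold are replaced here by a block-mean feature grid and a fixed threshold, so this is an illustration of the idea rather than the published algorithm:

```python
import numpy as np

def frame_features(frame, grid=(2, 4)):
    """Hypothetical 8-dim feature vector: mean intensity over a 2x4 block grid."""
    h, w = frame.shape
    gh, gw = grid
    bh, bw = h // gh, w // gw
    return np.array([frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
                     for i in range(gh) for j in range(gw)])

def detect_copy_move(frames, thresh=1e-6):
    """Flag pairs of non-adjacent frames whose features (nearly) coincide,
    a telltale sign of copy-move frame duplication."""
    feats = [frame_features(f) for f in frames]
    dup = []
    for i in range(len(feats)):
        for j in range(i + 2, len(feats)):   # skip adjacent frames
            if np.linalg.norm(feats[i] - feats[j]) < thresh:
                dup.append((i, j))
    return dup
```

Natural consecutive frames are merely similar, while copied frames are (near-)identical, which is why a tight distance threshold separates the two cases.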

5.
An Adaptive Method for Caption Detection and Localization in Video Frames   Cited by: 3 (self-citations: 0, others: 3)
王勇  燕继坤  郑辉 《计算机应用》2004,24(1):134-135,139
Captions in video frames often carry high-level semantic content of the current video and are important for automatic video understanding, indexing, and retrieval. This paper proposes an adaptive method for detecting and localizing captions in video frames. Compared with previous methods that set thresholds empirically, this method is simple, adapts better to complex variations in video frames, and localizes captions faster and more accurately. Extensive experimental results demonstrate its effectiveness.

6.
In the field of multimedia retrieval from video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that combines wavelet and median-moment features with k-means clustering to select probable text blocks among the 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a mutual-nearest-neighbor-based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames, in terms of recall and precision at both the block and frame levels.

7.
Accurately tracking the video object in a video sequence is a crucial stage of video object processing, which has wide applications in different fields. In this paper, a novel video object tracking algorithm is proposed, based on an improved gradient vector flow (GVF) snake model and an intra-frame centroid tracking algorithm. Unlike the traditional GVF snake, the improved snake adopts anisotropic diffusion and a four-direction edge operator to solve the blurry-boundary and edge-shifting problems. The improved GVF snake is then employed to extract the object contour in each frame of the video sequence. To set the initial contour of the snake automatically, we design an intra-frame centroid tracking algorithm. The original video sequence is split into segments; for each segment, the initial contours of the first two frames are set by change detection based on a t-distribution significance test. Then, exploiting the redundancy between consecutive frames, the initial contours of subsequent frames are obtained from intra-frame motion vectors. Experimental results on several test video sequences indicate the validity and accuracy of the video object tracking.

8.
Li Chao, Chen Zhihua, Sheng Bin, Li Ping, He Gaoqi. Multimedia Tools and Applications (2020) 79(7-8): 4661-4679

In this paper, we introduce an approach to remove flicker from videos, where the flicker is caused by applying image-based processing methods to the original video frame by frame. First, we propose a multi-frame video flicker removal method: we utilize multiple temporally corresponding frames to reconstruct the flickering frame. Compared with traditional methods, which reconstruct the flickering frame from a single adjacent frame, reconstruction from multiple temporally corresponding frames reduces warp inaccuracy. We then optimize the method in the following respects. On the one hand, we detect the flickering frames in the video sequence with temporal consistency metrics; reconstructing only the flickering frames accelerates the algorithm greatly. On the other hand, we use only the preceding temporally corresponding frames to reconstruct the output frames. We also accelerate the flicker removal on the GPU. Qualitative experimental results demonstrate the efficiency of the proposed flicker removal method. With algorithmic optimization and GPU acceleration, the running time of our method also outperforms traditional video temporal coherence methods.
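One plausible temporal-consistency metric for flagging flickering frames: the abstract does not commit to a formula, so this mean-brightness deviation test is purely an assumption made for illustration:

```python
import numpy as np

def flicker_frames(frames, k=2.0):
    """Flag frames whose mean brightness deviates sharply from the average
    of both neighbours -- one simple temporal-consistency metric (an
    assumption; the paper does not specify its metric)."""
    means = np.array([f.mean() for f in frames])
    flagged = []
    for t in range(1, len(means) - 1):
        local = 0.5 * (means[t - 1] + means[t + 1])
        if abs(means[t] - local) > k * np.std(means):
            flagged.append(t)
    return flagged
```

Only the flagged frames would then be reconstructed from their temporally corresponding frames, which is what makes the detection step an acceleration.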


9.
To improve video communication quality in high-error-rate environments, an improved H.264-based error concealment technique is proposed. For intra-coded frames, the smoothness of the image is computed from its histogram, and a linear-interpolation scheme is then chosen adaptively. For inter-coded frames, the lost motion vectors (MVs) are estimated more accurately by exploiting the coding types of variable-size blocks and the multiple-reference-frame property, enabling temporal error concealment. Experiments show that, compared with the H.264 reference software JM11.0, this method achieves better subjective and objective quality of the recovered images.
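The intra-frame branch relies on linear interpolation. A minimal sketch of concealing a lost block from the pixel rows bordering it above and below (the block height and the distance weighting are assumptions, not taken from the paper):

```python
import numpy as np

def conceal_block(above, below, h=8):
    """Reconstruct a lost h-row block by linearly interpolating, row by
    row, between the pixel row just above and the row just below it."""
    w = (np.arange(1, h + 1) / (h + 1))[:, None]  # distance-based weights
    return (1 - w) * above[None, :] + w * below[None, :]
```

The adaptive part of the paper chooses among interpolation schemes depending on how smooth the image histogram indicates the frame to be; this sketch shows only the basic interpolation itself.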

10.
With the continuing development of networks and multimedia technology, content-based multimedia retrieval is becoming increasingly important. Compared with mature text retrieval, video retrieval is still at the research and exploration stage. An effective approach is to segment unstructured video programs into shots and index the video by each shot's key frames; shot segmentation is thus a basic step in content-based video retrieval. Among the various types of shot-detection algorithms, dissolve shots are the hardest to detect. Based on the distribution of prediction-error energy and motion vectors of the predicted frames inside a dissolve, a new algorithm for segmenting dissolve shots in the compressed domain is proposed. Compared with published algorithms of the same kind, it works directly in the compressed domain and is faster, more robust, and more accurate.

11.
Image and video matting are still challenging problems in areas with low foreground-background contrast. Video matting also has the challenge of ensuring temporally coherent mattes, because the human visual system is highly sensitive to temporal jitter and flickering. On the other hand, video provides the opportunity to use information from other frames to improve the matte accuracy on a given frame. In this paper, we present a new video matting approach that improves temporal coherence while maintaining high spatial accuracy in the computed mattes. We build sample sets of temporal and local samples that cover all the color distributions of the object and background over all previous frames. This helps guarantee spatial accuracy and temporal coherence by ensuring that proper samples are found even when they are distantly located in space or time. An explicit energy term encourages temporal consistency in the mattes derived from the selected samples. In addition, we use localized texture features to improve spatial accuracy in low-contrast regions where color distributions overlap. The proposed method achieves better spatial accuracy and temporal coherence than existing video matting methods.

12.
Huang and Hsu (1981) describe an image sequence enhancement algorithm based on computing motion vectors between successive frames and using these vectors to determine the correspondence between pixels for frame averaging. In this note, we demonstrate that it may be sufficient to use only the components of the motion vectors in the gradient direction (called the normal components) to perform the enhancement.
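The normal component is simply the projection of the motion vector onto the local gradient direction. As a sketch:

```python
import numpy as np

def normal_component(v, grad):
    """Project motion vector v onto the local gradient direction grad,
    keeping only the 'normal component' used for enhancement."""
    g2 = float(np.dot(grad, grad))
    if g2 == 0.0:                      # flat region: no gradient direction
        return np.zeros_like(grad, dtype=float)
    return (np.dot(v, grad) / g2) * np.asarray(grad, dtype=float)
```

Only motion along the gradient changes pixel intensity, which is the intuition behind dropping the tangential component for frame averaging.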

13.
To address the high false-detection rate and poor edge continuity of current shadow detection algorithms for video captured from a moving viewpoint, a real-time shadow detection algorithm based on a track-and-detect framework is proposed. First, two-pass optical-flow tracking is applied to the shadow regions overlapping between two consecutive frames, points with large forward-backward tracking error are filtered out, and Canny edge confidence ensures the accuracy of the tracked edges. Next, the newly appeared regions to be detected are obtained by optical-flow-based region partitioning. Then, to handle falsely detected texture edges...

14.
A Method for Automatic Localization, Tracking, and Recognition of Video Text   Cited by: 3 (self-citations: 0, others: 3)
Text in video data provides important semantic information. This paper proposes a method for automatic localization, tracking, and recognition of video text. First, the positions of text in video frames are detected using wavelet- and LH-based detection; then the text positions in subsequent frames are tracked by motion estimation; the text regions are enhanced by multi-frame averaging; finally, after binarization and connected-component analysis, the text characters are fed into OCR software for recognition. Experimental results show that the method is simple and practical, localizes and tracks text regions quickly, and achieves good localization precision and recognition performance.

15.
A Particle-Swarm-Optimization-Based Digital Image Stabilization Method under Noisy Conditions   Cited by: 1 (self-citations: 0, others: 1)
When a video sequence contains both random noise and random jitter, traditional digital image stabilization algorithms are disturbed by the noise and cannot effectively remove the jitter. To stabilize such noisy video sequences, a digital stabilization method based on particle swarm optimization (PSO) is proposed. First, a fitness function measuring the quality of a candidate solution is defined: the energy of the mean image of several consecutive input frames. The algorithm then uses a PSO strategy to search for the optimal motion compensation vector of the video sequence. Finally, the algorithm is tested on both synthetically shaken and real hand-held videos. Experimental results show that for test videos containing both random noise and random jitter, the algorithm not only removes the random jitter effectively but also suppresses the random noise.
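The fitness function described above, the energy of the mean image of several consecutive frames, can be sketched as follows; a PSO would then search over candidate compensation vectors to maximize it. The integer-shift motion model here is an assumption made for illustration:

```python
import numpy as np

def stabilization_fitness(frames, shifts):
    """Energy of the mean image after applying candidate integer shifts:
    well-aligned frames reinforce each other while noise averages out,
    so a higher-energy (sharper) mean image indicates a better
    motion-compensation vector."""
    shifted = [np.roll(f, (dy, dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    mean_img = np.mean(shifted, axis=0)
    return float(np.sum(mean_img ** 2))
```

Averaging misaligned frames blurs the image and lowers its energy, while averaging correctly aligned frames preserves sharp structure and suppresses zero-mean noise, which is why this single objective handles jitter and noise together.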

16.
Indexing animated objects using spatiotemporal access methods   Cited by: 5 (self-citations: 0, others: 5)
We present an approach for indexing animated objects and efficiently answering queries about their position in time and space. In particular, we consider an animated movie as a spatiotemporal evolution. A movie is viewed as an ordered sequence of frames, where each frame is a 2D space occupied by the objects that appear in that frame. The queries of interest are range queries of the form "find the objects that appear in area S between frames f_i and f_j" as well as nearest-neighbor queries such as "find the q nearest objects to a given position A between frames f_i and f_j". The straightforward approach to indexing such objects treats the frame sequence as another dimension and uses a 3D access method (such as an R-tree or its variants). This, however, assigns long "lifetime" intervals to objects that appear through many consecutive frames, and long intervals are difficult to cluster efficiently in a 3D index. Instead, we propose to reduce the problem to a partial-persistence problem: namely, we use a 2D access method that is made partially persistent. We show that this approach leads to faster query performance while still using storage proportional to the total number of changes in the frame evolution. What differentiates this problem from traditional temporal indexing approaches is that objects are allowed to move and/or change their extent continuously between frames. We present novel methods to approximate such object evolutions, and we formulate an optimization problem for which we provide an optimal solution for the case where objects move linearly. Finally, we present an extensive experimental study of the proposed methods. While we concentrate on animated movies, our approach is general and can be applied to other spatiotemporal applications as well.

17.
Detecting moving objects under a freely moving camera is difficult because the camera and object motions are mixed together and objects are often broken into separate components. To tackle this problem, we propose a fast moving-object detection method using optical flow clustering and Delaunay triangulation, as follows. First, we extract corner feature points using the Harris corner detector and compute optical flow vectors at the extracted points. Second, we cluster the optical flow vectors using the K-means clustering method and reject outlier feature points using the Random Sample Consensus (RANSAC) algorithm. Third, we classify each cluster as camera or object motion according to the scatteredness of its optical flow vectors. Fourth, we compensate the camera motion using a multi-resolution block-based motion propagation method and detect objects by background subtraction between the previous frame and the motion-compensated current frame. Finally, we merge the separately detected object components using Delaunay triangulation. Experimental results on the Carnegie Mellon University database show that the proposed method outperforms existing methods in terms of detection accuracy and processing time.

18.
Objective: Video object detection aims to localize moving objects in image sequences and assign each object a class label. It suffers from problems such as object blur and multi-object occlusion. Most existing video object detection methods build on still-image object detection and improve accuracy by enforcing spatiotemporal consistency, but because moving objects are often occluded or blurred, robustness remains limited. This paper therefore proposes a video object detection model that fuses the single shot multibox detector (SSD) with spatiotemporal features. Method: Within the single-stage SSD framework, an optical-flow network estimates the flow field between the current frame and its neighboring frames; features from several neighboring frames are used to motion-compensate the current frame's features; a feature pyramid network extracts multi-scale features to detect objects of different sizes; finally, high-level and low-level features are fused to enrich the semantic information of the low-level features. Results: On the ImageNet VID (ImageNet for video object detection) dataset the model achieves an mAP (mean average precision) of 72.0%, exceeding the TCN (temporal convolutional networks) model, the TPN+LSTM (tubelet proposal network and long short-term memory network) model, and the SSD+Siamese network model by 24.5%, 3.6%, and 2.5%, respectively; ablation experiments on different network structures further verify the model's effectiveness. Conclusion: By exploiting the temporal and spatial correlations specific to video and fusing spatiotemporal features, the model improves video object detection accuracy and effectively alleviates missed and false detections.

19.
Objective: Deepfaking is an emerging technique that uses deep learning to tamper with images and videos; face-swap tampering of face videos in particular poses a serious threat to society and individuals. Detection methods that exploit temporal or multi-frame information are still at an early research stage, and existing work often ignores how frames are extracted from the video, which matters for both detection effectiveness and efficiency. For face-swap tampered videos, this paper proposes an efficient detection framework that extracts per-frame features from multiple key frames and models inter-frame interaction. Method: A number of key frames are extracted directly from the video stream, avoiding inter-frame decoding; a convolutional neural network maps each single-frame face image into a common feature space; multiple self-attention-based encoder layers, with linear and nonlinear transformations, let each frame's features aggregate information from the other frames for learning and updating, and extract the anomalies of tampered frames in feature space; an additional indicator token aggregates global information to make the final detection decision. Results: The framework achieves detection accuracies above 96.79% on all three face-swap datasets of FaceForensics++ and 99.61% on the Celeb-DF dataset; timing comparisons further confirm the efficiency gains of using key frames as samples and the efficiency of the proposed framework. Conclusion: The proposed framework for detecting face-swap tampered videos reduces the computational cost and time of video-level detection by extracting key frames, and uses convolutional...

20.
闫善武  肖洪兵  王瑜  孙梅 《图学学报》2023,44(1):95-103
To address the problems that current video anomaly detection makes insufficient use of temporal information and ignores the diversity of normal behavior, an anomaly detection method fusing pedestrian spatiotemporal information is proposed. Built on a convolutional autoencoder, it compresses and reconstructs input frames through the encoder and decoder, and detects anomalies from the difference between the output frame and the ground truth. To strengthen feature connections between consecutive video frames, a residual temporal-shift module and a residual channel-attention module are introduced, improving the network's modeling of temporal and channel information, respectively. Considering the excessive generalization ability of convolutional neural networks (CNNs), memory-augmentation modules are inserted in the skip connections between encoder and decoder layers to limit the autoencoder's overly strong representation of anomalous frames and improve detection accuracy. In addition, a feature-discreteness loss is used to revise the objective function, effectively distinguishing different normal behavior patterns. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the method outperforms current mainstream video anomaly detection methods while meeting real-time requirements.
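The detection criterion, scoring each frame by how poorly the autoencoder reconstructs it, can be sketched as below. The min-max "regularity" normalization follows common practice in this line of work and is an assumption, not necessarily this paper's exact formula:

```python
import numpy as np

def anomaly_score(frame, recon):
    """Per-frame anomaly score: mean squared reconstruction error between
    the ground-truth frame and the autoencoder's output."""
    return float(np.mean((frame.astype(float) - recon.astype(float)) ** 2))

def regularity(scores):
    """Min-max normalise the errors; low regularity marks anomalous frames."""
    s = np.asarray(scores, dtype=float)
    return 1.0 - (s - s.min()) / (s.max() - s.min() + 1e-12)
```

Because the autoencoder is trained only on normal behavior (and here its memory modules deliberately cap its capacity), anomalous frames reconstruct badly and stand out as low-regularity dips.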
