Similar Articles
Found 20 similar articles (search time: 31 ms)
1.
This paper addresses issues in visual tracking where videos contain object intersections, pose changes, occlusions, illumination changes, motion blur, and backgrounds with similar color distributions. We apply the structural local sparse representation method to analyze the background region around the target. After that, we reduce the probability of prominent features in the background and add new information to the target model. In addition, a weighted search method is proposed to find the best candidate target region; to a certain extent, this weighted search solves the local optimization problem. The proposed scheme, designed to track a single human through complex scenarios, has been tested on several video sequences. Several existing tracking methods were applied to the same videos and the corresponding results compared. Experimental results show that the proposed tracking scheme delivers very promising performance in terms of robustness to occlusions, appearance changes, and backgrounds with similar color distributions.
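
The weighted-search idea can be pictured with a minimal sketch: candidate windows around the predicted position are scored by an appearance similarity that is down-weighted by a spatial Gaussian, which discourages locking onto a distant local optimum. The window grid, the Gaussian weighting, and `score_fn` below are illustrative assumptions, not the paper's sparse-representation score.

```python
import numpy as np

def weighted_search(score_fn, center, search_radius=20, step=4, sigma=10.0):
    """Scan candidate windows around `center` and return the best one.

    score_fn(x, y) -> appearance similarity of the window centered at (x, y);
    the score is down-weighted by a Gaussian of the distance from `center`.
    """
    cx, cy = center
    best, best_score = center, -np.inf
    for dx in range(-search_radius, search_radius + 1, step):
        for dy in range(-search_radius, search_radius + 1, step):
            x, y = cx + dx, cy + dy
            spatial_w = np.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            s = spatial_w * score_fn(x, y)
            if s > best_score:
                best, best_score = (x, y), s
    return best, best_score
```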

2.
In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both the text and its attachment surface, and cluster edge pixels into boundary layers based on color pairs and spatial positions. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms that combine the structural analysis of text strokes with color assignment and filter out background interference. Further, we design a robust string fragment classification based on Gabor text features, obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed text localization framework is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on the respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.
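
The Gabor-based text features can be illustrated with a small OpenCV filter bank that produces one magnitude feature map per orientation; the kernel size and filter parameters here are assumed, not taken from the paper.

```python
import cv2
import numpy as np

def gabor_feature_maps(gray, n_orientations=4, ksize=21, sigma=4.0,
                       lambd=10.0, gamma=0.5):
    """Filter a grayscale image with a bank of Gabor kernels at several
    orientations and return the magnitude responses as feature maps."""
    maps = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma)
        response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        maps.append(np.abs(response))
    return np.stack(maps, axis=-1)  # H x W x n_orientations
```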

3.
An edge-based video text detection algorithm is proposed. The Canny operator is first applied to detect edges in the image; edge lines that do not belong to characters are then filtered out according to the characteristics of text edge lines. Finally, exploiting the similarity among text line regions, a combined threshold is set to obtain the final text regions. Experimental results show that the algorithm not only achieves high recall for regularly arranged text, but can also accurately locate irregularly arranged and distorted text, and is insensitive to conditions such as illumination and shadow.
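
A minimal OpenCV sketch of this pipeline is given below; the size and aspect-ratio heuristics are placeholders standing in for the paper's edge-line filtering and combined threshold.

```python
import cv2

def detect_text_regions(image, low=100, high=200):
    """Edge-based text region detection: Canny edges, then filter
    connected components by size/aspect-ratio heuristics."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)
    # Close small gaps so character strokes form connected components.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if 8 <= h <= 100 and w / float(h) < 20:  # crude non-character filter
            boxes.append((x, y, w, h))
    return boxes
```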

4.
Video text information plays an important role in semantic-based video analysis, indexing, and retrieval. Video text is closely related to the content of a video. Usually, the fundamental steps of text-based video analysis, browsing, and retrieval consist of video text detection, localization, tracking, segmentation, and recognition. Video sequences are commonly stored in compressed formats, where MPEG coding techniques are often adopted. In this paper, a unified framework for text detection, localization, and tracking in compressed videos using discrete cosine transform (DCT) coefficients is proposed. A coarse-to-fine text detection method is used to find text blocks in terms of the block DCT texture intensity information; the DCT texture intensity of an 8×8 block of an intra-frame is approximately represented by seven AC coefficients. The candidate text block regions are further verified and refined. Text block region localization and tracking are carried out using horizontal and vertical block texture intensity projection profiles, and the appearing and disappearing frames of each text line are determined by the text tracking. Experimental results show the effectiveness of the proposed methods.
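
The coarse detection step can be sketched as follows. Which seven AC coefficients the paper uses is not specified here; taking the first seven in zig-zag order is an assumption.

```python
import cv2
import numpy as np

def block_dct_texture(gray):
    """Approximate per-block texture intensity from seven low-order AC
    coefficients of each 8x8 DCT block (sketch of the coarse detection)."""
    h, w = gray.shape
    h8, w8 = h // 8, w // 8
    intensity = np.zeros((h8, w8), dtype=np.float32)
    f = gray.astype(np.float32)
    for by in range(h8):
        for bx in range(w8):
            block = f[by * 8:(by + 1) * 8, bx * 8:(bx + 1) * 8]
            d = cv2.dct(block)
            # seven AC coefficients nearest the DC term (zig-zag order)
            ac = [d[0, 1], d[1, 0], d[2, 0], d[1, 1], d[0, 2], d[0, 3], d[1, 2]]
            intensity[by, bx] = np.sum(np.abs(ac))
    return intensity  # threshold this map to get candidate text blocks
```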

5.
As the human visual system is highly sensitive to motion in a scene, motion saliency is an important feature of a video sequence. Motion information is used in video compression, object segmentation, object tracking, and many other applications. Although its applications are extensive, accurately detecting motion in a given video is complex and computationally expensive for the solutions reported in the literature. Decomposing a video into a visually similar video and a residual video is a robust way to detect motion-salient regions, but existing decomposition techniques require long execution times because the standard form of the problem is NP-hard. We propose a novel algorithm that detects motion-salient regions by decomposing the input video into background and residual videos in much less time, without sacrificing the accuracy of the decomposition. In addition, the proposed algorithm is completely parallelizable, which ensures a further reduction in computation time on modern multicore processors.
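
As a rough stand-in for the paper's fast decomposition (which is not reproduced here), the sketch below uses a per-pixel temporal median as the background; it is simple, and the per-pixel work parallelizes trivially.

```python
import numpy as np

def decompose_video(frames, thresh=25):
    """Split a video (T x H x W array of grayscale frames) into a static
    background and a residual; large residuals mark motion-salient pixels."""
    video = np.asarray(frames, dtype=np.float32)
    background = np.median(video, axis=0)          # per-pixel temporal median
    residual = video - background[None, :, :]
    saliency_masks = np.abs(residual) > thresh     # per-frame motion saliency
    return background, residual, saliency_masks
```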

6.
In this paper, a novel multidimensional underwater video dehazing method is presented to restore and enhance degraded underwater videos. Underwater videos suffer from medium scattering and light absorption; the absorption of light traveling through water makes hazy underwater videos different from hazy atmospheric videos. To dehaze underwater videos, a spatial-temporal information fusion method is proposed that includes two main parts. The first is transmission estimation, based on the correlation between adjacent video frames to keep color consistency, where fast tracking and the least-squares method are used to reduce the influence of camera and object motion and of flowing water. The second is background light estimation, which keeps the atmospheric light value consistent across a video. Extensive experimental results demonstrate that the proposed algorithm has superior haze removal and color balancing capabilities.
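
A much-simplified sketch of the background light part follows: per frame, the brightest pixels of the per-pixel channel minimum (a dark-channel-style prior, an assumption here rather than the paper's estimator) are averaged, and the result is averaged across frames so the value stays consistent over the clip.

```python
import numpy as np

def background_light(frames, top_fraction=0.001):
    """Estimate one background (atmospheric) light for a whole clip."""
    per_frame = []
    for frame in frames:                    # frame: H x W x 3, float in [0, 1]
        dark = frame.min(axis=2)            # light absorption -> channel minimum
        n = max(1, int(dark.size * top_fraction))
        idx = np.argsort(dark.ravel())[-n:]     # brightest dark-channel pixels
        ys, xs = np.unravel_index(idx, dark.shape)
        per_frame.append(frame[ys, xs].mean(axis=0))
    return np.mean(per_frame, axis=0)       # one RGB background light value
```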

7.
A novel method for visual object tracking in stereo videos is proposed, which fuses an appearance-based representation of the object built on Local Steering Kernel features with 2D color-disparity histogram information. The algorithm employs Kalman filtering for object position prediction and a sampling technique for selecting candidate object regions of interest in the left and right channels. Disparity information is exploited for matching corresponding regions in the left and right video frames. As tracking evolves, any significant changes in object appearance due to scale, rotation, or deformation are identified and incorporated into the object model. These appearance changes are identified simultaneously in the left and right channel frames, ensuring a correct 3D representation of the resulting bounding box on a 3D display. The proposed framework performs stereo object tracking and is suitable for 3D movies, 3D TV content, and 3D video captured by consumer stereo cameras. Experimental results prove the effectiveness of the proposed method in tracking objects under geometric transformations, zooming, and partial occlusion, as well as in tracking slowly deforming articulated 3D objects in stereo video.
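
The position-prediction step can be illustrated with a minimal constant-velocity Kalman filter; the state layout and noise levels below are generic assumptions, not the paper's tuning.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over state [x, y, vx, vy],
    used to predict the object position before sampling candidates."""

    def __init__(self, x, y, dt=1.0, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                       # predicted (x, y)

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```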

8.
A scene text localization method based on color distance minimization and maximum color difference (MCD) is proposed. First, repeated K-means clustering and color distance minimization are used to extract text connected regions from scene images of varying complexity. Since color clustering is easily affected by illumination, an MCD-based method is used to extract additional text connected regions as a complement; by combining color with gradient information, it can overcome the influence of illumination to some extent. The resulting connected regions are grouped into text lines according to a set of character merging rules. Because candidate text lines usually contain falsely detected non-text lines, a verification step based on feature extraction and machine learning is finally applied to the candidate text lines to obtain the localization result. The method is evaluated on the public ICDAR2011 and ICDAR2013 datasets: for ICDAR2011, the recall, precision, and F-measure obtained are 0.66 and 0.77, respectively; for ICDAR2013, they are 0.65 and 0.77. Comparison with other text detection algorithms shows the feasibility and effectiveness of the method.
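
The color clustering step can be sketched with OpenCV's K-means; the number of clusters and termination criteria below are placeholder choices.

```python
import cv2
import numpy as np

def color_clusters(image, k=4):
    """Cluster pixel colors with K-means; each cluster's label map is a
    candidate layer from which text connected regions can be extracted."""
    pixels = image.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    return labels.reshape(image.shape[:2]), centers
```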

9.
褚晶辉  董越  吕卫 《电视技术》2014,38(3):188-191
Text in video is strongly correlated with the semantic content of the video; extracting and analyzing this text enables effective understanding of TV video semantics and thus security monitoring of video content. For text detection, an effective method based on the wavelet transform, corner feature maps, and statistical features is proposed, and a color-space-based text extraction method is used to obtain a binary image, which is more favorable for subsequent OCR character recognition.
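
The corner-feature-map idea can be illustrated with a Harris-based sketch (the paper's exact corner feature is not specified here); areas with dense corners, typical of text, score high.

```python
import cv2
import numpy as np

def corner_density_map(gray, block=15):
    """Corner feature map for text detection: Harris response, thresholded,
    then box-filtered so dense-corner areas score high."""
    g = np.float32(gray)
    harris = cv2.cornerHarris(g, blockSize=2, ksize=3, k=0.04)
    corners = (harris > 0.01 * harris.max()).astype(np.float32)
    return cv2.boxFilter(corners, -1, (block, block))  # local corner density
```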

10.
The quality of views synthesized by Depth Image Based Rendering (DIBR) highly depends on the accuracy of the depth map, especially its alignment with object boundaries in the texture image. In practice, misalignment of sharp depth map edges is the major cause of annoying artifacts in the disoccluded regions of synthesized views. The conventional smoothing filter approach blurs the depth map to reduce the disoccluded regions; its drawbacks are degraded 3D perception of the reconstructed 3D video and destruction of texture in background regions. Conventional edge-preserving filters use the color image to align depth edges with color edges; unfortunately, the characteristics of color edges and depth edges are very different, which causes annoying boundary artifacts in the synthesized virtual views. A recent reliability-based approach uses reliable warping information from other views to fill the holes, but it is not suitable for view synthesis in video-plus-depth based DIBR applications. In this paper, a new depth map preprocessing approach is proposed. It uses Watershed color segmentation to correct the depth map misalignment, and then extends the depth map object boundaries to cover the transitional edge regions of the color image. This approach can handle sharp depth map edges lying inside or outside the object boundaries in the 2D sense. The quality of the disoccluded regions of the synthesized views is significantly improved, and unknown depth values can also be estimated. Experimental results show that the proposed method achieves superior view synthesis performance for DIBR, especially when generating large-baseline virtual views.
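
The Watershed color segmentation step can be sketched with the standard OpenCV marker-based recipe below; the subsequent snapping of depth edges to the resulting region boundaries is paper-specific and omitted.

```python
import cv2
import numpy as np

def watershed_labels(color_image):
    """Watershed color segmentation: returns a label map whose region
    boundaries follow color edges (boundary pixels are marked -1)."""
    gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    sure_bg = cv2.dilate(binary, kernel, iterations=3)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0                 # 0 = region to be flooded
    return cv2.watershed(color_image, markers)
```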

11.
Extraction of foreground content from complex-background document images is very difficult, as the background texture and color, and the foreground font, size, color, and tilt, are not known in advance. In this work, we propose an RGB color model for input complex color document images, together with an algorithm that detects text regions using Gabor filters and then extracts the text using the color feature luminance. The proposed approach consists of three stages. In stage 1, candidate image segments containing text are detected based on Gabor features; because of the complex background, some high-frequency non-text objects in the background are also detected as text objects. In stage 2, a portion of these false text objects is dropped through connected component analysis. In stage 3, the image segments containing textual information obtained from the previous stage are binarized to extract the foreground text: the luminance feature is extracted from the input color document image, and the threshold value is derived automatically from it. The approach handles both printed and handwritten color document images with foreground text in any color, font, size, and orientation. For experimental evaluation, we considered a variety of document images with non-uniform/uniform textured and multicolored backgrounds. Segmentation of the foreground text is evaluated with a commercially available OCR; the results show better recognition accuracy of foreground characters in processed document images than in unprocessed ones.
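
The stage-3 binarization can be sketched as follows, with Otsu standing in for the paper's automatically derived luminance threshold.

```python
import cv2
import numpy as np

def binarize_by_luminance(color_segment):
    """Binarize a detected text segment using the luminance feature; the
    threshold is derived automatically (Otsu here, as a stand-in)."""
    b, g, r = cv2.split(color_segment.astype(np.float32))
    luminance = 0.299 * r + 0.587 * g + 0.114 * b   # standard luma weights
    lum8 = np.clip(luminance, 0, 255).astype(np.uint8)
    _, binary = cv2.threshold(lum8, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```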

12.
With the continuous development of video capture devices and technology, the number of videos is growing rapidly, and accurately locating a target video segment in a massive video collection is a challenging task. Given a query text, cross-modal video moment retrieval aims to find the video segment in a video library that matches the description. Most existing work focuses on matching the text with candidate video segments while ignoring the contextual information of the video, leading to insufficient representation of feature relations in video understanding. To address this, this paper proposes a cross-modal video moment retrieval method based on salient feature enhancement: a temporal adjacent network is constructed to learn the contextual information of the video, and lightweight residual channel attention is then used to highlight the salient features of video segments, improving the network's understanding of video semantics. Experimental results on the public TACoS and ActivityNet Captions datasets show that the proposed method performs video moment retrieval better than mainstream matching-based methods and methods based on video-text feature relations.
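
The lightweight residual channel attention can be pictured with an SE-style PyTorch sketch; the (batch, time, channels) feature layout and the reduction ratio are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """SE-style sketch: squeeze clip features over time, excite per-channel
    weights, and add a residual connection so salient channels are
    amplified rather than replaced."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch, time, channels)
        squeeze = x.mean(dim=1)      # global average pool over time
        weights = self.fc(squeeze).unsqueeze(1)
        return x + x * weights       # residual channel re-weighting
```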

13.
This paper presents a video context enhancement method for night surveillance. The basic idea is to extract and fuse the meaningful information of video sequences captured by a fixed camera under different illuminations. A unique characteristic of the algorithm is that it separates the image context into two classes and estimates them in different ways. One class contains basic surrounding scene information and the scene model, obtained via background modeling and object tracking in the daytime video sequence. The other class is extracted from the nighttime video and includes frequently moving regions, high-illumination regions, and high-gradient regions; the scene model and a pixel-wise difference method are used to segment these three regions. A shift-invariant discrete-wavelet-based image fusion technique is used to integrate all of this context information into the final result. Experimental results demonstrate that the proposed approach provides much more detail and meaningful information for nighttime video.
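
The fusion step maps naturally onto the stationary (shift-invariant) wavelet transform in PyWavelets; the max-magnitude detail rule and averaged approximation below are common fusion choices, not necessarily the paper's exact rules.

```python
import numpy as np
import pywt

def swt_fuse(day_img, night_img, wavelet='db2'):
    """Shift-invariant DWT fusion sketch: average the approximation
    coefficients, keep the larger-magnitude detail coefficients, invert.
    Image sides must be even for a level-1 SWT."""
    (cA1, (cH1, cV1, cD1)), = pywt.swt2(day_img.astype(float), wavelet, level=1)
    (cA2, (cH2, cV2, cD2)), = pywt.swt2(night_img.astype(float), wavelet, level=1)
    cA = (cA1 + cA2) / 2.0                      # smooth context: average
    fuse = lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b)
    details = (fuse(cH1, cH2), fuse(cV1, cV2), fuse(cD1, cD2))
    return pywt.iswt2([(cA, details)], wavelet)
```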

14.
Real-Time Depth Extraction and Multi-View Rendering Based on Kinect
王奎  安平  张艳  程浩  张兆扬 《光电子.激光》2012,(10):1949-1956
A real-time Kinect-based depth extraction algorithm and a single-texture-plus-depth multi-view rendering method are proposed. On the capture side, Kinect is used to obtain the scene texture and depth, and a fast repair algorithm is proposed for the holes in the depth map output by Kinect. On the display side, to handle the large holes produced by single-texture-plus-depth DIBR (depth image based rendering), a rendering method based on background estimation and foreground segmentation is adopted. Experimental results show that the method extracts good-quality depth maps in real time and effectively fills the large holes produced during DIBR rendering, yielding good-quality virtual viewpoint images for multiple views. With the proposed depth acquisition and rendering algorithms at its core, a depth-based stereoscopic video system is implemented; the final interleaved stereoscopic display of the virtual views shows a good 3D effect, further verifying the effectiveness of the algorithms. The system can be used for multi-view stereoscopic video recording and playback of real scenes.
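
A hedged sketch of depth-hole repair follows, using OpenCV inpainting as a stand-in for the paper's fast repair algorithm; Kinect reports holes as zero depth.

```python
import cv2
import numpy as np

def fill_depth_holes(depth16):
    """Fill zero-depth holes in a 16-bit Kinect depth map by inpainting
    from the surrounding values."""
    holes = (depth16 == 0).astype(np.uint8)
    # cv2.inpaint works on 8-bit images, so scale the 16-bit depth down.
    scale = 255.0 / max(1, int(depth16.max()))
    depth8 = (depth16 * scale).astype(np.uint8)
    filled8 = cv2.inpaint(depth8, holes, 3, cv2.INPAINT_NS)
    filled = (filled8 / scale).astype(depth16.dtype)
    filled[depth16 > 0] = depth16[depth16 > 0]   # keep measured values
    return filled
```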

15.
Player Information Extraction for Semantic Annotation in Golf Videos
In sports videos, text provides semantic information about the game, such as scores and player names. This paper presents an accurate method for extracting player information in golf. First, a new method is presented for detecting the key captions that contain player information. Since the location of these key captions is not fixed during a golf broadcast, we use the color pattern of captions and its temporal repetition property, instead of a location property, to identify the key captions. Second, a dual binarization method is presented to easily segment texts of different color polarities (i.e., dark and bright text) from the background of the key captions. Finally, the binarization results are recognized by OCR and converted to plain text; the player is recognized by comparing the plain text against a pre-stored player name database. Experiments on a large database show that our method can extract player information efficiently from golf videos.
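
The dual binarization idea can be sketched with two Otsu passes of opposite polarity; choosing between the two outputs via OCR is left out.

```python
import cv2

def dual_binarize(caption_gray):
    """Binarize a caption region twice so that both bright-on-dark and
    dark-on-bright text survive; OCR is run on both results downstream."""
    _, bright_text = cv2.threshold(caption_gray, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, dark_text = cv2.threshold(caption_gray, 0, 255,
                                 cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return bright_text, dark_text
```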

16.
裴立志  王润生 《信号处理》2010,26(11):1621-1626
To robustly track a moving object of interest in video sequences under interference from complex backgrounds, partial occlusion, and illumination changes, an improved particle filter tracking algorithm is proposed. To address the deficiencies of color information in target representation, the observation model is first improved: a target model based on the distribution of ICA features is proposed, converting the kernel-based target description into the ICA feature space. Since gray-level changes caused by illumination variation map to the same component after ICA, the model adapts effectively to illumination changes while also exploiting spatial information; this effectively resolves target loss caused by illumination changes and similar background colors and improves the robustness of tracking. Meanwhile, to counter particle degeneracy, the mean shift algorithm is embedded into the particle filtering framework: after the particles are propagated through the system model, mean shift moves each particle toward the local maximum in its neighborhood, concentrating the particles in a local region of the measurement model so that a small number of particles covers as much of the target distribution as possible. This overcomes the degeneracy of the particle filter, shortens computation time, and improves both tracking accuracy and real-time performance. Experiments show that the algorithm accurately tracks targets against complex backgrounds and does not lose the target under illumination changes and partial occlusion.
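
A generic predict-weight-resample particle filter cycle is sketched below; the ICA-feature observation model and the embedded mean-shift refinement are replaced by a caller-supplied likelihood for illustration.

```python
import numpy as np

def particle_filter_step(particles, likelihood_fn, motion_std=5.0):
    """One cycle of a minimal particle filter. particles: (N, 2) candidate
    positions; likelihood_fn(x, y) scores how well the image window at
    (x, y) matches the target model."""
    n = len(particles)
    # Predict: random-walk propagation. (The paper additionally refines
    # each propagated particle with mean shift; that step is omitted.)
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Weight each particle by the observation model.
    weights = np.array([likelihood_fn(x, y) for x, y in particles])
    weights /= weights.sum() + 1e-12
    estimate = (particles * weights[:, None]).sum(axis=0)  # weighted mean
    # Systematic resampling to counter particle degeneracy.
    positions = (np.arange(n) + np.random.uniform()) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[np.clip(idx, 0, n - 1)], estimate
```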

17.
18.
A new method for text detection and recognition in natural scene images is presented in this paper. In the detection process, color, texture, and OCR statistical features are combined in a coarse-to-fine framework to discriminate text from non-text patterns. Color features are used to group text pixels into candidate text lines; texture features capture the "dense intensity variance" property of text patterns; and statistical features from OCR (Optical Character Reader) results are employed to further reduce false alarms empirically. After detection, a restoration process based on plane-to-plane homography is applied: when an affine transformation is detected on a located text region, the background plane of the text is rectified, independently of camera parameters. Experimental results on a large dataset demonstrate that the proposed method is effective and practical.
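
The homography-based restoration can be sketched with OpenCV; the output size, and the assumption that the four corners of the text region are already known, are illustrative.

```python
import cv2
import numpy as np

def rectify_text_region(image, corners, out_w=300, out_h=80):
    """Plane-to-plane homography restoration sketch: map the four corners
    of a located (skewed) text region onto a fronto-parallel rectangle."""
    src = np.float32(corners)                       # 4 corners, clockwise
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))
```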

19.
To address the limitations of existing object segmentation algorithms under dynamic backgrounds, a video object segmentation algorithm that fuses motion cues and color information is proposed. First, a new motion trajectory classification method is designed: exploiting the low-rank property of background motion, combined with an accumulated confirmation strategy, it obtains accurate trajectory classification results. Then, an over-segmentation algorithm produces the superpixel set of the video sequence, and the color similarity between superpixels is computed. Finally, a Markov random field model is built with superpixels as nodes; the trajectory classification information and the inter-superpixel color information are jointly modeled in the MRF energy function, and the optimal label of each superpixel is obtained by energy minimization. Tests and comparisons on several publicly available video sequences show that the method accurately segments moving objects under dynamic backgrounds and achieves higher segmentation accuracy than traditional methods.
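
The over-segmentation and color-similarity steps can be sketched with scikit-image's SLIC (one possible over-segmentation choice); the Gaussian kernel over mean superpixel colors below is a typical form for the MRF pairwise term, assumed here rather than taken from the paper.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_color_affinity(image, n_segments=300, sigma_c=20.0):
    """Over-segment a frame into superpixels and compute their pairwise
    color similarity (the pairwise term fed into the MRF energy)."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    means = np.array([image[labels == i].mean(axis=0) for i in range(n)])
    diff = means[:, None, :] - means[None, :, :]
    return np.exp(-np.linalg.norm(diff, axis=2) ** 2 / (2 * sigma_c ** 2))
```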

20.
In this study, a spatiotemporal saliency detection and salient region determination approach for H.264 videos is proposed. After Gaussian filtering in Lab color space, the phase spectrum of the Fourier transform is used to generate the spatial saliency map of each video frame. In parallel, the motion vector fields from the H.264 compressed video bitstream are backward accumulated; after normalization and global motion compensation, the phase spectrum of the Fourier transform of the moving parts is used to generate the temporal saliency map of each frame. The spatial and temporal saliency maps of each frame are then combined into a spatiotemporal saliency map using adaptive fusion. Finally, a modified salient region determination scheme is used to determine the salient regions (SRs) of each frame. Experimental results show that the proposed approach outperforms two comparison approaches.
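
The spatial saliency step follows the well-known phase-spectrum recipe: keep only the phase of the Fourier transform, invert, square, and smooth. A minimal sketch (the Gaussian kernel size is assumed):

```python
import cv2
import numpy as np

def phase_spectrum_saliency(gray):
    """Spatial saliency from the phase spectrum of the Fourier transform."""
    f = np.fft.fft2(gray.astype(np.float32))
    phase_only = np.exp(1j * np.angle(f))     # unit magnitude, original phase
    recon = np.abs(np.fft.ifft2(phase_only)) ** 2
    saliency = cv2.GaussianBlur(recon, (11, 11), 3)
    return saliency / (saliency.max() + 1e-12)
```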
