Similar Documents
19 similar documents found (search time: 171 ms)
1.
A Video Text Tracking and Segmentation Algorithm Based on Multi-Frame Images   Cited by: 6 (self-citations: 2, others: 6)
Text in video is an important source of information for video semantic understanding and retrieval. Exploiting the temporal and spatial redundancy of static text in video, the detection results are refined using the edge bitmap of the text region as the feature, and a fast text tracking algorithm based on binary search is proposed, achieving fast and effective localization of text objects. In the segmentation stage, in addition to the conventional approach of enhancing text regions with a gray-level fusion image, the edge bitmap is further used to filter out the background of the text region. Experiments show that both text detection precision and segmentation quality are greatly improved.
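The binary-search tracking idea can be sketched as follows: given a frame where a static caption was detected, binary-search backward and forward for the frames where it appears and disappears. This is a minimal sketch; `has_text` is a hypothetical predicate (e.g. an edge-bitmap match against the detected region), and a single contiguous run of caption frames is assumed.

```python
def first_frame(has_text, lo, seed):
    # Smallest frame index in [lo, seed] where the caption is present,
    # assuming has_text is False before the caption appears, True after.
    while lo < seed:
        mid = (lo + seed) // 2
        if has_text(mid):
            seed = mid          # caption already visible: search earlier
        else:
            lo = mid + 1        # not yet visible: search later
    return seed

def last_frame(has_text, seed, hi):
    # Largest frame index in [seed, hi] where the caption is present.
    while seed < hi:
        mid = (seed + hi + 1) // 2
        if has_text(mid):
            seed = mid          # still visible: search later
        else:
            hi = mid - 1        # already gone: search earlier
    return seed

def track_caption(has_text, lo, hi, seed):
    """Return the (first, last) frame of a static caption detected at `seed`."""
    return first_frame(has_text, lo, seed), last_frame(has_text, seed, hi)
```

Each probe costs one region comparison, so the temporal extent is found in O(log n) comparisons instead of a linear scan.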

2.
A Video Text Recognition Method Based on Color Clustering and Multi-Frame Fusion   Cited by: 1 (self-citations: 0, others: 1)
易剑, 彭宇新, 肖建国. 《软件学报》 (Journal of Software), 2011, 22(12): 2919-2933
This paper proposes a video text recognition method based on color clustering and multi-frame fusion. First, in the text detection module, two salient properties of text regions are considered jointly: uniform color and dense edges. Using the affinity propagation clustering algorithm, color edges are adaptively decomposed into several edge sub-maps according to the complexity of edge colors in the image, so that text regions can be detected more accurately in each sub-map. Second, in the text enhancement module, blurred text regions are filtered out based on a text stroke intensity map, and text regions detected in different video frames that contain the same content are fused by combining the advantages of average fusion and minimum fusion, yielding text region images with smoother backgrounds and clearer strokes. Finally, in the text extraction module, binarization is performed on an adaptively selected color component with high text contrast, which achieves better binarization results than existing methods; in addition, noise is removed by color clustering based on the color difference between background and text, which effectively improves the recognition rate. Experimental results show that the proposed method achieves better text recognition results than existing methods.
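The multi-frame fusion step described above can be illustrated with a small numpy sketch. The exact rule for combining average fusion (which smooths noise) with minimum fusion (which suppresses bright background flicker behind dark strokes) is not specified in the abstract, so the linear blend and its weight `alpha` here are assumptions for illustration only.

```python
import numpy as np

def fuse_text_regions(frames, alpha=0.5):
    """Blend average fusion and minimum fusion of co-located text regions.

    frames : list of equally sized grayscale arrays containing the same text.
    alpha  : hypothetical mixing weight between the two fusion results.
    """
    stack = np.stack([f.astype(np.float64) for f in frames])
    avg = stack.mean(axis=0)   # average fusion: smooths random noise
    mn = stack.min(axis=0)     # minimum fusion: suppresses bright background
    return alpha * avg + (1.0 - alpha) * mn
```

In practice the regions would first be aligned by the tracking step so that strokes coincide across frames.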

3.
Application of Edge Operators in Video Object Extraction   Cited by: 4 (self-citations: 1, others: 4)
This paper proposes a video object extraction algorithm based on edge detection operators, which extracts the regions of video objects from the key frames of video sequences such as video conferencing and news, providing reference objects for subsequent motion estimation and detection. First, two common edge detection operators, the Marr operator and the Canny operator, are analyzed and each is applied to edge detection on video key frames; then the object region is scanned out of the edge image to generate a mask image; finally, the mask is ANDed with the original image to obtain the object. Experiments show that the Canny operator improves the correctness of video object extraction in key frames and provides a better reference frame for object extraction over the whole video sequence.
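The edge-map-to-mask-to-AND pipeline can be sketched in a few lines. A plain gradient magnitude stands in here for the Marr/Canny operators used in the paper, and the region-scanning step that fills the object interior is omitted, so this only masks edge pixels; `thresh` is an illustrative parameter.

```python
import numpy as np

def object_mask_from_edges(gray, thresh):
    """Edge map -> binary mask -> AND with the original frame.

    gray   : 2-D grayscale frame.
    thresh : gradient-magnitude threshold (assumed, tuned per sequence).
    Returns (masked frame, boolean mask).
    """
    gy, gx = np.gradient(gray.astype(np.float64))  # crude edge response
    mag = np.hypot(gx, gy)
    mask = mag > thresh                            # binary mask image
    return np.where(mask, gray, 0), mask           # AND with original
```

A real implementation would additionally close the mask (morphological fill) so the object interior, not just its contour, survives the AND.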

4.
Given the importance of text information in video sequences and video indexing, this paper proposes a text localization algorithm based on mixed text features. The algorithm first applies edge detection and projection to a single frame sampled every 25 frames of the video sequence to extract text blocks, then filters them with a support vector machine to eliminate non-text blocks, and finally exploits the correlation between adjacent frames of the video sequence to search for text blocks in the remaining frames. The algorithm improves detection speed while maintaining high detection accuracy.

5.
Detection of Video Matting Forgery Based on Edge Anomalies and Compressive Tracking   Cited by: 1 (self-citations: 0, others: 1)
In recent years, video authentication has become a research focus in information security. Since videos can be tampered with in many ways, a detection method for video matting forgery is proposed to judge whether the appearance of a given object in a video is genuine. Because of CFA interpolation in imaging devices, pixels exhibit a certain neighborhood correlation. Accordingly, the method extracts the frame sequence of the video, applies Sobel edge detection to each frame, and computes the deviation of pixel values in four directions at edge points to identify anomalous edge points and localize the tampered region. The tampered region is then learned, and the compressive tracking algorithm is used to quickly find the temporal range of tampered frames; after the target disappears and tracking ends, anomalous edge detection is performed again on the last frame in which the target appears, and the result is compared with the initially detected anomalous region to check for consistency. Experimental results show that the algorithm greatly improves detection efficiency with high accuracy.

6.
For moving object detection in video sequences, a new method based on edge differencing is proposed. Edge images of adjacent frames are extracted with an improved edge detection algorithm and differenced, and the difference image is thresholded with an improved Otsu method to obtain the detection result. The improved edge detection algorithm, which combines the Prewitt and Sobel operators, yields edge images rich in texture and detail, giving better edge-difference results; the improved Otsu method, which incorporates within-class variance, suppresses noise well while preserving more texture detail. Experimental results show that the proposed method extracts more complete object regions and is more robust to background noise. Compared with several recent methods of the same kind, it delivers superior moving object detection performance under background motion and illumination changes.
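The thresholding step above builds on the classic Otsu method, which picks the gray level maximizing the between-class variance of the histogram. The sketch below implements only the classic criterion; the paper's modification that additionally weighs within-class variance is omitted.

```python
import numpy as np

def otsu_threshold(img):
    """Classic Otsu threshold for an 8-bit image (values 0..255).

    Returns the gray level t maximizing the between-class variance
    sigma_b^2(t) = (mu_T*omega(t) - mu(t))^2 / (omega(t)*(1 - omega(t))).
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                   # normalized histogram
    omega = np.cumsum(p)                    # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # omega = 0 or 1 -> variance 0
    return int(np.argmax(sigma_b))
```

Applied to the edge-difference image, pixels above the returned threshold are kept as moving-object edges.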

7.
To cope with the complex backgrounds and widely varying font sizes of video frames, an improved text region localization algorithm for video frames is proposed. The algorithm designs and implements an intersection-point detection algorithm that uses edge features such as intersection-point density to remove most non-text edges, reducing the influence of background edges on text regions; the remaining edges are dilated to form candidate text regions, and text region features combined with a support vector machine are used to distinguish text regions from non-text regions. Experiments show that the algorithm extracts more than 90% of the text regions in video frames, with a text localization precision of 92.0%.

8.
To extract caption information from video images in real time, a simple and effective method is proposed. Text event detection is performed first, followed by edge detection, threshold computation, and edge size constraints; finally, non-text regions are further filtered out according to the range of text pixel density. The proposed method of superimposing horizontal and vertical edges strengthens the edges of detected text, and the size constraint on edges filters out edges that do not match text dimensions. A projection method is applied to finally determine the region containing the video captions. Last, OCR is applied to the extracted text regions to complete text extraction from the video. The combination of these methods ensures the accuracy and robustness of the proposed algorithm.

9.
Caption detection and extraction is one of the key technologies in video understanding. This paper proposes a two-stage caption detection and extraction algorithm that detects caption frames and caption regions separately, improving detection efficiency and accuracy. In the first stage, caption frames are detected: motion detection based on frame differencing gives a preliminary judgment of captions and yields a binarized image sequence; this sequence is then filtered a second time according to the dynamic characteristics of ordinary and scrolling captions to obtain the caption frames. In the second stage, caption regions are detected and extracted from the caption frames: the Sobel edge detection algorithm gives an initial detection of text regions; then constraints such as height eliminate the background, and the aspect ratio distinguishes vertical from horizontal captions, yielding all captions in the caption frame, namely static, ordinary, and scrolling captions. The method reduces the number of frames that must be examined, improving caption detection efficiency by about 11%. Comparative experiments show that, relative to using frame differencing or edge detection alone, the method improves the F-measure by about 9%.
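The region-localization stage in methods like this one typically reduces the Sobel edge map to horizontal bands via projection profile analysis: sum edge pixels per row and keep contiguous runs of dense rows. A minimal sketch, with `min_density` as an illustrative parameter:

```python
import numpy as np

def horizontal_text_bands(edge_map, min_density):
    """Return (top, bottom) row ranges whose edge density is high enough.

    edge_map    : binary 2-D array (1 = edge pixel).
    min_density : fraction of edge pixels a row must contain (assumed).
    """
    profile = edge_map.sum(axis=1) / edge_map.shape[1]  # per-row density
    active = profile >= min_density
    bands, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                       # band begins
        elif not a and start is not None:
            bands.append((start, i - 1))    # band ends
            start = None
    if start is not None:
        bands.append((start, len(active) - 1))
    return bands
```

A second, vertical projection inside each band would then give the left/right extent of each caption, and height/aspect-ratio constraints prune the rest.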

10.
Research and Implementation of Caption Detection and Extraction in Digital Video   Cited by: 12 (self-citations: 1, others: 12)
Text event detection is performed first, followed by edge detection, threshold computation, and edge size constraints; finally, non-text regions are further filtered out according to the range of text pixel density. The proposed method of superimposing horizontal and vertical edges strengthens the edges of detected text, and the size constraint on edges filters out edges that do not match text dimensions. Furthermore, the concept of pixel density α is introduced, with the requirement that the pixel density α of a text region lie within a threshold range (αmin ≤ α ≤ αmax). Non-text regions are filtered out by the pixel density α, and a projection method finally determines the region containing the video captions. The combination of these methods ensures the accuracy and robustness of the proposed algorithm. The algorithm is tested on different types of video material and compared with other methods, showing high accuracy and fast computation.
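The density test αmin ≤ α ≤ αmax is simple enough to state directly in code. A minimal sketch; the concrete bounds are thresholds the paper leaves to experimentation:

```python
def passes_density_test(region, alpha_min, alpha_max):
    """Accept a candidate region only if its text-pixel density
    alpha = (#text pixels) / (region area) lies in [alpha_min, alpha_max].

    region : binary 2-D list (1 = text pixel).
    """
    total = sum(len(row) for row in region)
    ones = sum(sum(row) for row in region)
    alpha = ones / total
    return alpha_min <= alpha <= alpha_max
```

Regions that are nearly empty (background leakage) or nearly solid (large uniform patches) fall outside the band and are discarded before projection.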

11.
Detection of both scene text and graphic text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding of the video. In this paper, we explore a new idea of classifying low-contrast and high-contrast video images in order to detect accurate boundaries of the text lines in video images. In this work, high contrast refers to sharpness while low contrast refers to dim intensity values in the video images. The method introduces heuristic rules based on a combination of filters and edge analysis for the classification purpose. The heuristic rules are derived from the fact that the number of Sobel edge components is greater than the number of Canny edge components in high-contrast video images, and vice versa for low-contrast video images. In order to demonstrate the use of this classification in video text detection, we implement a method based on Sobel edges and texture features for detecting text in video images. Experiments are conducted using video images containing both graphic text and scene text with different fonts, sizes, languages, and backgrounds. The results show that the proposed method outperforms existing methods in terms of detection rate, false alarm rate, misdetection rate, and inaccurate boundary rate.

12.
In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and then all text candidates are mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest neighbor concept. APBG is then applied on the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets, including non-horizontal and horizontal data, publicly available data (Hua’s data), and ICDAR-03 competition data (camera images), show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.

13.
To extract video objects accurately from complex backgrounds, a video object extraction algorithm fusing temporal-domain and gradient-domain information is proposed. It effectively extracts moving video objects against complex backgrounds and solves the hole problem that background subtraction introduces when foreground and background are similar. First, in the temporal domain, background subtraction and frame differencing are used separately to generate preliminary video objects, which are then processed with binary erosion and dilation from mathematical morphology. Next, in the gradient domain, the Sobel operator performs edge detection on the video objects, and the result is combined with the temporal-domain video objects to generate accurate object contour edges. Finally, a heuristic search connects the contour edge points, thereby extracting the video object. Experimental results show that the method extracts video objects from complex backgrounds fairly completely and accurately.

14.
Sports video annotation is important for sports video semantic analysis such as event detection and personalization. In this paper, we propose a novel approach for sports video semantic annotation and personalized retrieval. Different from state-of-the-art sports video analysis methods, which rely heavily on audio/visual features, the proposed approach incorporates web-casting text into sports video analysis. Compared with previous approaches, the contributions of our approach include the following. 1) The event detection accuracy is significantly improved due to the incorporation of web-casting text analysis. 2) The proposed approach is able to detect exact event boundaries and extract event semantics that are very difficult or impossible for previous approaches to handle. 3) The proposed method is able to create personalized summaries, from both general and specific points of view, related to a particular game, event, player, or team according to the user's preference. We present the framework of our approach and details of text analysis, video analysis, text/video alignment, and personalized retrieval. The experimental results on event boundary detection in sports video are encouraging and comparable to manually selected events. The evaluation shows personalized retrieval is effective in helping meet users' expectations.

15.
Text Detection in Video   Cited by: 12 (self-citations: 0, others: 12)
Text appearing in video often carries a great deal of information and is an important semantic cue for video analysis; detected and recognized text can provide indexes for content-based video retrieval. This paper briefly reviews existing text detection methods and, considering the characteristics of text appearing in video, proposes an efficient video text detection method that achieves good detection results for both Chinese and English text under ordinary image quality. Experimental results are given and related issues are discussed.

16.
A new method for detecting text in video images is proposed in this article. Variations in background complexity, font size, and color make detecting text regions in video images a difficult task. A pyramidal scheme is utilized to solve these problems. First, two downsized images are generated by bilinear interpolation from the original image. Then, the gradient difference of each pixel is calculated for the three differently sized images, including the original one. Next, three K-means clustering procedures are applied to separate all the pixels of the three gradient difference images into two clusters, text and non-text, separately. The K-means clustering results are then combined to form the text regions. Thereafter, projection profile analysis is applied to the Sobel edge map of each text region to determine the boundaries of candidate text regions. Finally, we identify text candidates through two verification phases. In the first verification phase, we verify the geometrical properties and texture of each text candidate. In the second verification phase, statistical characteristics of the text candidate are computed using a discrete wavelet transform, and principal component analysis is then used to reduce the dimensionality of these features. Finally, the optimal decision function of a support vector machine, obtained by sequential minimal optimization, is applied to determine whether the text candidates contain text or not.
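The pixel-separation step above runs K-means with k = 2 over a gradient difference image. A minimal 1-D sketch of that clustering, with min/max initialization chosen here for determinism (the paper's initialization is not specified):

```python
import numpy as np

def kmeans_two_clusters(values, iters=20):
    """1-D k-means with k = 2: split gradient-difference values into a
    low (non-text) and a high (text) cluster.

    Returns (labels, centers); label 1 marks the high-gradient cluster.
    """
    v = np.asarray(values, dtype=np.float64)
    c = np.array([v.min(), v.max()])        # deterministic initial centers
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = (np.abs(v - c[0]) > np.abs(v - c[1])).astype(int)
        # Recompute centers from the assignments.
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = v[labels == k].mean()
    return labels, c
```

In the full method this is run once per pyramid level, and pixels labeled "text" at all three levels are intersected to form the text regions.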

17.
This paper briefly introduces existing methods for detecting and extracting video captions as well as the basic theory and algorithms of independent component analysis (ICA), discusses the application of ICA to video image sequence processing, and proposes a new ICA-based method for video caption detection and extraction. Simulation results show that the method detects and extracts video captions well even under conditions where traditional methods more or less struggle: complex image backgrounds, low image resolution, and captions varying in font, size, and color.

18.
Text in video and images often plays an important role in content-based video database retrieval, web video search, image segmentation, and image inpainting. To improve the efficiency of text detection, a video text detection method based on adaptive thresholds over multiple features is presented. Building on Michael's algorithm, the method computes adaptive local thresholds from three features of text edges: strength, density, and the ratio of horizontal to vertical edges. These thresholds remove non-text regions well, extract text edges, and detect and localize text, reducing the adverse effect of the single-feature threshold in Michael's algorithm. A merging mechanism is introduced in the text localization stage, reducing the occurrence of incomplete regions. Experimental results show high precision and recall; the method can be applied to video search, image segmentation, and image inpainting.

19.
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moments with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual-nearest-neighbor-based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号