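Logistic model for video caption enhancement
Cite this article: Li Qinrui, Lyu Xueqiang, Li Zhuo, Liu Kun. Logistic model for video caption enhancement[J]. Journal of Image and Graphics (中国图象图形学报), 2014, 19(5): 683-692.
Authors: Li Qinrui, Lyu Xueqiang, Li Zhuo, Liu Kun
Affiliations: Beijing Information Science and Technology University; Beijing Information Science and Technology University; Beijing Information Science and Technology University; Beijing TRS Information Technology Co., Ltd
Funding: National Natural Science Foundation of China (grant nos. 61171159, 61271304); Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (grant no. KZ201311232037); Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities (grant no. IDHT20130519); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University (grant no. ICDD201303)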
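Abstract: Objective: To improve the OCR recognition rate of video captions on complex backgrounds, the extracted captions need to be enhanced effectively. This paper applies the Logistic model to video caption enhancement for the first time and proposes a Logistic-model-based enhancement method that fuses multi-frame information. Method: Captions are detected and tracked, and the same caption segment appearing in consecutive frames is aligned. By analyzing the information of a caption segment across frames, three features are proposed: the variation of the caption background in the time domain, the inherent characteristics of the background, and the inherent characteristics of the caption text. The three features are quantified and fused to build a Logistic model suited to caption enhancement, which is then used to enhance the video captions. Results: Experiments were run separately on special complex-background captions with shadow or stroke effects, ordinary complex-background captions, and single-background captions. The recognition accuracy of the enhanced captions in OCR software was 81.76%, 97.13%, and 98.19% respectively, an improvement over the comparison methods in every case. Conclusion: The experimental results show that the method both reduces the complexity of the caption background and raises the contrast between background and text, and can therefore effectively enhance video captions on both complex and single backgrounds.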

Keywords: complex background; caption enhancement; Logistic model; caption detection and tracking; time domain feature
Received: 2013/9/18
Revised: 2013/11/7
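The abstract describes feeding a normalized, fused feature into a Logistic curve so that most output pixels saturate at 0 or 255 and a few gray values remain as transitions. A minimal Python sketch of that mapping follows. The function names, the steepness `k`, the midpoint `x0`, and the min-max normalization within a block are illustrative assumptions; the abstract does not give the model's actual parameters or feature definitions.

```python
import math


def logistic_gray(x, k=16.0, x0=0.5):
    """Map a normalized feature x in [0, 1] to a gray level in [0, 255].

    k (steepness) and x0 (midpoint) are assumed values for illustration,
    not the manually tuned parameters used in the paper.
    """
    y = 1.0 / (1.0 + math.exp(-k * (x - x0)))
    return int(round(255 * y))


def enhance_block(features, k=16.0, x0=0.5):
    """Normalize a caption block's feature grid, then apply the Logistic curve.

    `features` is a hypothetical grid (list of rows) of fused feature
    scores, one per pixel. Min-max normalization per block is an assumption.
    """
    lo = min(min(row) for row in features)
    hi = max(max(row) for row in features)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant block
    return [[logistic_gray((v - lo) / span, k, x0) for v in row]
            for row in features]


if __name__ == "__main__":
    block = [[10, 20], [30, 50]]
    print(enhance_block(block))
```

With a steep curve, features near either end of the normalized range land on pure black or white, while values near the midpoint keep intermediate gray levels, which matches the saturation behavior the abstract attributes to the model.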

Logistic model for video caption enhancement
Li Qinrui, Lyu Xueqiang, Li Zhuo and Liu Kun. Logistic model for video caption enhancement[J]. Journal of Image and Graphics, 2014, 19(5): 683-692.
Authors: Li Qinrui, Lyu Xueqiang, Li Zhuo and Liu Kun
Affiliation:Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;Beijing TRS Information Technology Co., Ltd, Beijing 100101, China
Abstract: Objective Video captions contain abundant information about the video content, and recognizing the text in images is a prerequisite for making full use of it. Although the recognition accuracy of optical character recognition (OCR) software has improved, video captions on complex backgrounds still cannot be recognized well. To improve recognition accuracy, the extracted caption should therefore be enhanced so that the complexity of the caption background is reduced and the contrast between background and text is increased. In this paper, we propose a caption enhancement method based on the Logistic model that fuses multi-frame information. Method The Logistic curve is a common form of S-shaped curve whose two ends each converge to a constant. By counting and analyzing the distribution of pixel values in single-background captions, we establish a Logistic model whose output is used as the pixel values of the enhanced caption, so that their distribution stays broadly consistent with that of a single-background caption. Because the Logistic model converges at both ends, the majority of pixel values are assigned to 0 or 255, and a small number of gray points serve as transitions between black and white points. The enhanced caption image therefore keeps the continuity of pixel values while improving the contrast between background and text. We then detect and track the video caption and align the instances of the same caption segment that appear in consecutive frames to obtain multi-frame information for each pixel. To reduce the complexity of the background, we analyze how the background changes in the time domain, as well as the inherent characteristics of the background and of the text, and take the fusion of these characteristics as the feature of the Logistic model. This feature is normalized within each caption block, which is the unit of enhancement, and the result is used as the input of the Logistic model. Result We select 60 captioned videos from the Paike column of Youku and divide the captions into three categories: special complex-background captions containing shadow or stroke effects, common complex-background captions, and single-background captions. We implement four caption enhancement methods (the single-frame OTSU adaptive threshold method, the multi-frame averaging method, the minimum pixel value search method, and the method proposed in this paper) and apply each of them to every category of caption. We use Hanwang OCR to recognize the enhanced captions and take the recognition accuracy as the measure of enhancement quality. Experimental results show that, after enhancement with the proposed method, the recognition accuracy for the three categories of caption is 81.76%, 97.13%, and 98.19% respectively. Compared with the best results of the other three methods, the accuracy increases by 24.35%, 2.70% and 2.70% respectively. The proposed method therefore adapts to both complex-background and single-background captions, and the improvement is especially significant for complex-background captions containing shadow or stroke effects. Conclusion In this paper, we propose a caption enhancement method based on the Logistic model that fuses multi-frame information. The method reduces the complexity of the caption background and improves the contrast between background and text, and the enhanced captions can be recognized well by OCR software. However, the parameters of the Logistic model are static values obtained by manual tuning. If the parameters could be adjusted dynamically according to the characteristics of different video captions, the recognition accuracy could be improved further.
Keywords:complex background  caption enhancement  Logistic model  caption detection and tracking  time domain feature
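Two of the multi-frame baselines named in the abstract, multi-frame averaging and minimum pixel value search, can be sketched as below, assuming the caption frames are already aligned and stored as equally sized grayscale grids (lists of rows). The function names and data layout are hypothetical, and the minimum search presumes bright text on a varying background, which the abstract does not state.

```python
def average_frames(frames):
    """Per-pixel integer mean over aligned grayscale frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) // n for c in range(cols)]
            for r in range(rows)]


def minimum_frames(frames):
    """Per-pixel minimum over aligned grayscale frames.

    Assumption: text stays bright in every frame, so taking the minimum
    darkens any background pixel that was dark in at least one frame.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[min(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    # One-row frames: a stable bright text pixel followed by two
    # background pixels whose values change from frame to frame.
    frames = [
        [[255, 40, 200]],
        [[250, 120, 90]],
        [[254, 10, 160]],
    ]
    print(average_frames(frames))
    print(minimum_frames(frames))
```

In this toy input the stable pixel stays near white under both operations, while the fluctuating background pixels are pulled toward their mean or their darkest observed value.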