Visual Similarity Based Document Layout Analysis
Cite this article: Di Wen and Xiao-Qing Ding. Visual Similarity Based Document Layout Analysis[J]. Journal of Computer Science and Technology, 2006, 21(3): 459-448. DOI: 10.1007/s11390-006-0459-0
Authors: Di Wen and Xiao-Qing Ding
Affiliation: Department of Electronic Engineering & State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P.R. China
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 60472002.
Abstract: This paper proposes a visual similarity based document layout analysis (DLA) scheme that uses a clustering strategy to adapt to documents in different languages and with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first use a visual similarity testing process to find a set of representative filters and statistics that characterize typical texture patterns in document images. Texture features are then extracted with these filters and passed into a dynamic clustering procedure, called visual similarity clustering. Finally, text content is located from the clustering results. As a result, the algorithm demonstrates strong robustness and adaptability across a wide variety of documents, which traditional DLA approaches do not possess.

Keywords: document layout analysis, texture analysis, dynamic clustering, visual similarity
Received: 2005-07-14
Revised: 2005-12-23
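
The pipeline outlined in the abstract (texture features, then clustering, then text localization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' method: the abstract does not name the filters, the statistics, or the dynamic clustering procedure, so this sketch substitutes simple first-difference filters, a mean-absolute-response statistic, and plain two-cluster k-means. The function names `block_features`, `kmeans`, and `locate_text_blocks` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Feature = Tuple[float, float]


def block_features(img: Image, block: int) -> List[Tuple[Tuple[int, int], Feature]]:
    """Per block, return its top-left corner and a 2-D texture feature:
    mean absolute horizontal and vertical first differences (stand-in filters)."""
    h, w = len(img), len(img[0])
    n = block * (block - 1)  # number of neighbor pairs per direction in a block
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            dx = dy = 0.0
            for i in range(r, r + block):
                for j in range(c, c + block):
                    if j + 1 < c + block:
                        dx += abs(img[i][j + 1] - img[i][j])
                    if i + 1 < r + block:
                        dy += abs(img[i + 1][j] - img[i][j])
            feats.append(((r, c), (dx / n, dy / n)))
    return feats


def kmeans(points: List[Feature], k: int = 2, iters: int = 20):
    """Plain k-means (k >= 2) with a deterministic start: centers are picked
    evenly from the points sorted by total feature response."""
    ordered = sorted(points, key=sum)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def locate_text_blocks(img: Image, block: int = 4) -> List[Tuple[int, int]]:
    """Return corners of blocks in the cluster with the stronger texture response
    (assumed here, for illustration only, to correspond to text)."""
    feats = block_features(img, block)
    labels, centers = kmeans([f for _, f in feats], k=2)
    text_cluster = max(range(2), key=lambda c: sum(centers[c]))
    return [pos for (pos, _), l in zip(feats, labels) if l == text_cluster]


if __name__ == "__main__":
    # Synthetic 8x16 "page": blank left half, checkerboard texture on the right.
    page = [[1.0] * 8 + [float((i + j) % 2) for j in range(8)] for i in range(8)]
    print(locate_text_blocks(page, block=4))
```

On the synthetic page, the four right-hand blocks form the high-response cluster. A fixed `k=2` is a simplification; a dynamic clustering procedure, as the abstract describes, would not require the number of clusters to be set in advance.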
