A robust system for document layout analysis using multilevel homogeneity structure
Affiliation: 1. Faculty of Computer Science and Engineering, HoChiMinh City University of Technology, 268 Ly Thuong Kiet, District 10, Ho Chi Minh City, Viet Nam; 2. School of Electronics and Computer Engineering, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, South Korea
Abstract: Document layout analysis, the first step in document image modeling, is one of the main difficulties in understanding document images. In this paper, we propose a robust system that addresses this problem by combining a multilevel homogeneity structure with a hybrid methodology. The system consists of three main stages: classification, segmentation, and refinement and labeling. Unlike other page segmentation methods, the proposed system includes an efficient algorithm for detecting table regions in document images. In addition, to be useful in practice, the system is designed to work with documents in a variety of languages. The proposed method was tested on the ICDAR2015 competition dataset (RDCL-2015) and on three other published datasets in different languages. The results of these tests show that the accuracy of the proposed system is higher than that of previous methods.
This article is indexed in ScienceDirect and other databases.