A learning framework for the optimization and automation of document binarization methods
Authors:Mohamed Cheriet  Reza Farrahi Moghaddam  Rachid Hedjam
Affiliation:1. Department of Orthopedic Surgery, The Spine Hospital at New York Presbyterian Hospital, Columbia University, 5141 Broadway, 3 Field west-022, New York, NY 10034, United States;2. University of Toronto and Toronto Western Hospital, 399 Bathurst St, Toronto, ON M5T 2S8, Canada;3. University of Virginia, 1215 Lee St, Charlottesville, VA 22908, United States;4. Queen Mary Hospital, The University of Hong Kong, 102 Pok Fu Lam Road, Hong Kong;5. Norton Leatherman Spine Center, 210 E Gray St #900, Louisville, KY 40202, United States;6. The CORE Institute, 14520 W Granite Valley Dr, Sun City West, AZ 85375, United States;7. Hospital for Special Surgery, 535 East 70th Street, New York, NY 10021, United States;8. The FOCOS Hospital, 8 Teshie Street, Pantang West, Ghana;9. Johns Hopkins University, 3101 Wyman Park Dr., Baltimore, MD 21211, United States;10. University of California San Francisco, 505 Parnassus Ave. San Francisco, CA 94143, United States;11. Affiliated Drum Tower Hospital of Nanjing University Medical School, 101 Longmian Avenue, Jiangning District, Nanjing 211166, P.R. China;12. Hamamatsu University School of Medicine, 1 Chome-20-1 Handayama, Hamamatsu, Shizuoka Prefecture 431-3192, Japan;13. Rigshospitalet, National University of Denmark, Blegdamsvej 9, 2100 København, Denmark;14. Department of Orthopedic Surgery, Texas Children's Hospital and Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, United States;15. University Hospital, Queen's Medical Centre, Derby Road, Nottingham, NG7 2UH, England;16. Hospital Universitari Vall d'Hebron, Passeig de la Vall d'Hebron, 119-129, 08035 Barcelona, Spain
Abstract:Almost all binarization methods have a few parameters that must be set. These methods rarely reach their upper-bound performance unless the parameters are tuned individually for each input document image. This work introduces a learning framework for optimizing binarization methods, designed to determine the optimal parameter values for a given document image. The framework works with any binarization method, has a standard structure, and performs three main steps: (i) feature extraction, (ii) estimation of the optimal parameters, and (iii) learning the relationship between the features and the optimal parameters. For feature extraction, an approach is proposed to generate numerical feature vectors from 2D data: the statistics of various maps are extracted and then combined nonlinearly into a final feature vector. The optimal behavior is learned using support vector regression (SVR). Although the framework is independent of the binarization method, two methods are considered as typical examples in this work: the grid-based Sauvola method and Lu’s method, which placed first in the DIBCO’09 contest. Experiments on the DIBCO’09 and H-DIBCO’10 datasets, and on combinations of these datasets, show promising results.
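Step (ii) of the framework, estimating the optimal parameters for one image, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain windowed Sauvola threshold, T = m * (1 + k * (s / R - 1)), rather than the grid-based variant named in the abstract, an exhaustive search over a handful of candidate k values, and the F-measure against a ground-truth mask as the score. The toy image, the candidate list, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def sauvola_binarize(img, k, window=3, R=128.0):
    """Plain windowed Sauvola (assumed variant): a pixel is text (1) when its
    value falls below m * (1 + k * (s / R - 1)) for the local mean m and std s."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local neighbourhood, clipped at the image border.
            patch = [img[j][i]
                     for j in range(max(0, y - r), min(h, y + r + 1))
                     for i in range(max(0, x - r), min(w, x + r + 1))]
            m, s = mean(patch), pstdev(patch)
            threshold = m * (1.0 + k * (s / R - 1.0))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out

def f_measure(pred, truth):
    """Harmonic mean of precision and recall over text pixels (label 1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_parameter(img, truth, candidates):
    """Exhaustive search: return (k, score) for the best-scoring candidate."""
    scores = {k: f_measure(sauvola_binarize(img, k), truth) for k in candidates}
    k_star = max(candidates, key=lambda k: scores[k])
    return k_star, scores[k_star]

# Hypothetical 5x5 grayscale image: a dark vertical stroke on a light page.
image = [[200, 200, 50, 200, 200] for _ in range(5)]
truth = [[0, 0, 1, 0, 0] for _ in range(5)]
candidates = [0.05, 0.2, 0.35, 0.5]

k_star, score = best_parameter(image, truth, candidates)
print(f"best k = {k_star}, F-measure = {score:.3f}")
```

In the framework described above, each (feature vector, best parameter) pair obtained this way for a training image would then serve as one training sample for the regressor in step (iii).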
This article is indexed in ScienceDirect and other databases.