

A survey of document image classification: problem statement, classifier architecture and performance evaluation
Authors:Nawei Chen, Dorothea Blostein
Affiliation:(1) School of Computing, Queen’s University, K7L 3N6 Kingston, ON, Canada
Abstract:Document image classification is an important step in Office Automation, Digital Libraries, and other document image analysis applications. There is great diversity in document image classifiers: they differ in the problems they solve, in the use of training data to construct class models, and in the choice of document features and classification algorithms. We survey this diverse literature using three components: the problem statement, the classifier architecture, and performance evaluation. This brings to light important issues in designing a document classifier, including the definition of document classes, the choice of document features and feature representation, and the choice of classification algorithm and learning mechanism. We emphasize techniques that classify single-page typeset document images without using OCR results. Developing a general, adaptable, high-performance classifier is challenging due to the great variety of documents, the diverse criteria used to define document classes, and the ambiguity that arises due to ill-defined or fuzzy document classes.
Keywords:Document image classification, Document classifiers, Document classification, Document categorization, Document features, Feature representations, Class models, Classification algorithms, Learning mechanisms, Performance evaluation
This article is indexed in SpringerLink and other databases.