A categorization system for handwritten documents
Authors:Thierry Paquet  Laurent Heutte  Guillaume Koch  Clément Chatelain
Affiliation:1. Université de Rouen, LITIS EA 4108, Rouen, France
2. INSA-Rouen, LITIS EA 4108, Rouen, France
Abstract: This paper presents a complete system for categorizing handwritten documents, i.e. classifying documents according to their topic. The categorization approach detects a set of discriminative keywords and then applies the well-known tf-idf representation for document categorization. Two keyword extraction strategies are explored. The first performs full recognition of the whole document; however, its performance decreases sharply as the lexicon size increases. The second extracts only the discriminative keywords from the handwritten documents. This information extraction strategy relies on integrating a rejection model (or anti-lexicon model) into the recognition system. Experiments were carried out on a database of unconstrained handwritten documents from an industrial application that processes incoming mail. Results show that the discriminative keyword extraction system achieves better recall/precision tradeoffs than the full recognition strategy. The keyword extraction strategy also outperforms the full recognition strategy on the categorization task.
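As an illustration of the tf-idf step mentioned in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes the handwriting recognizer has already produced a list of detected keywords per document. The toy lexicon, the labels, the function names, and the nearest-neighbour cosine classifier are all hypothetical choices made for this example; the abstract does not say which classifier is used.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute idf = log(N / df) for every keyword seen in the training documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    return {w: math.log(n / df[w]) for w in df}

def vectorize(doc, idf):
    """Map a keyword list to a sparse tf-idf vector; unknown words are ignored."""
    tf = Counter(w for w in doc if w in idf)
    return {w: count * idf[w] for w, count in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def categorize(doc, train_vectors, labels, idf):
    """Return the label of the most similar training document (1-nearest neighbour)."""
    query = vectorize(doc, idf)
    scores = [cosine(query, vec) for vec in train_vectors]
    return labels[scores.index(max(scores))]

# Hypothetical keyword detections for four training letters (assumed data).
train_docs = [
    ["invoice", "refund", "refund"],
    ["invoice", "invoice"],
    ["contract", "cancel"],
    ["cancel", "cancel", "contract"],
]
labels = ["billing", "billing", "termination", "termination"]

idf = fit_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]

print(categorize(["refund", "invoice"], train_vectors, labels, idf))
print(categorize(["cancel", "hello"], train_vectors, labels, idf))
```

Words outside the training vocabulary (such as "hello" above) are simply dropped when the vector is built, which loosely mirrors the idea of keeping only discriminative keywords and discarding everything else.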
This article is indexed in SpringerLink and other databases.