

Six Ways from Sunday: Approaches to Indexing Digital Text Images
Authors:Scott J Van Jacob
Affiliation:(1) University of Notre Dame, USA
Abstract:In 1994, the Andrew W. Mellon Foundation funded a joint project undertaken by the Center for Research Libraries (CRL) and the Latin American Microfilm Project (LAMP) to scan and index over three hundred thousand pages of microfilmed Brazilian Government Documents for the Internet. Because of the collection's size, format, language, and the poor physical condition of the text, entering this overwhelmingly textual collection as full text was prohibitively expensive. Instead, the documents were scanned as images, which preserved the intellectual content of the collection but lost the dynamic searching capabilities inherent in full-text databases. A combination of indexing approaches was used to provide access to these documents. The indexes found in the documents themselves (tables of contents, pagination, and subject indexes) were recreated to give users access to the documents. A controlled vocabulary was established to index a portion of the database. Cost, user feedback, and available technologies all influenced the choice of the five indexes ultimately used. This paper describes and comments on the strengths and weaknesses of the various indexing approaches taken to access the images within this database.
Keywords:indexing and digital collections  optical data processing
This article is indexed in SpringerLink and other databases.