Six Ways from Sunday: Approaches to Indexing Digital Text Images |
| |
Authors: | Scott J. Van Jacob |
| |
Affiliation: | (1) University of Notre Dame, USA |
| |
Abstract: | In 1994, the Andrew W. Mellon Foundation funded a joint project of the Center for Research Libraries (CRL) and the Latin American Microfilm Project (LAMP) to scan and index over three hundred thousand pages of microfilmed Brazilian government documents for the Internet. Because of the collection's size, format and language, and the poor physical condition of the text, entering this overwhelmingly textual collection as full text was prohibitively expensive. Instead, the documents were scanned as images, which preserved the intellectual content of the collection but lost the dynamic searching capabilities inherent in full-text databases. A combination of indexing approaches was used to provide access to these documents. Indexes found in the documents themselves (tables of contents, pagination and subject indexes) were recreated to give users access to the documents. A controlled vocabulary was established to index a portion of the database. Cost, user feedback and available technologies all influenced the choice of the five indexes ultimately used. This paper describes and comments on the strengths and weaknesses of the various indexing approaches taken to access the images within this database. |
| |
Keywords: | indexing and digital collections; optical data processing |
This article is indexed in SpringerLink and other databases. |
|