String alignment for automated document versioning |
| |
Authors: | Wei Lee Woon, Kuok-Shoong Daniel Wong |
| |
Affiliation: | (1) Department of Civil and Building Engineering, Loughborough University, LE11 3TU Loughborough, UK; (2) Director of Project Based Learning Lab, Department of Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA |
| |
Abstract: | The automated analysis of documents is an important task given the rapid increase in availability of digital texts. Automatic
text processing systems often encode documents as vectors of term occurrence frequencies, a representation which facilitates
the classification and clustering of documents. Historically, this approach derives from the related field of data mining,
where database entries are commonly represented as points in a vector space. While this lineage has certainly contributed
to the development of text processing, there are situations where document collections do not conform to this clustered structure,
and where the vector representation may be unsuitable for text analysis. As a proof of concept, we previously presented
a framework in which the optimal alignments of documents could be used for visualising the relationships within small sets of
documents. In this paper we develop this approach further by using it to automatically generate the version histories of various
document collections. For comparison, version histories generated using conventional methods of document representation are
also produced. To facilitate this comparison, a simple procedure for evaluating the accuracy of the version histories thus
generated is proposed. |
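The contrast the abstract draws between term-frequency vectors and alignment-based comparison can be illustrated with a minimal sketch. This is not the authors' method: the function names are hypothetical, and Python's `difflib.SequenceMatcher` is used only as a convenient order-sensitive stand-in for an optimal alignment, which it does not compute exactly.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def term_frequency_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alignment_similarity(a: str, b: str) -> float:
    """Order-sensitive similarity from matching word blocks.

    Assumption: SequenceMatcher stands in for an optimal alignment;
    it is a heuristic matcher, not the alignment used in the paper.
    """
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


# Two hypothetical "versions" with identical vocabulary but different word order.
v1 = "the cat sat on the mat"
v2 = "the mat sat on the cat"

tf_score = term_frequency_cosine(v1, v2)
align_score = alignment_similarity(v1, v2)
print(tf_score, align_score)
```

The term-frequency score cannot distinguish the two strings because it discards word order, whereas the sequence-based score does register the rearrangement. Edits between successive versions of a document change exactly this kind of sequential structure.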
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|