Automatic text summarization using latent semantic analysis |
| |
Authors: | I. V. Mashechkin, M. I. Petrovskiy, D. S. Popov, D. V. Tsarev |
| |
Affiliation: | 1. Department of Computational Mathematics and Cybernetics, Moscow State University, Moscow, Russia |
| |
Abstract: | This paper considers state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to the sentences of the text, each sentence being a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in the topic space, whose dimensionality is much lower than that of the initial term space. The most important sentences are chosen on the basis of their representation in the topic space, and their number is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentence representations in the topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the DUC 2001 and DUC 2002 standard data sets. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|
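
The generic pipeline outlined in the abstract (term-by-sentence matrix, latent semantic analysis, sentence selection in the topic space) can be illustrated with a minimal Python sketch. It is not the authors' method: the raw term counts, the regular-expression tokenizer, the scoring rule (the length of each sentence vector in the truncated topic space) and the function names `term_sentence_matrix`, `lsa_sentence_scores` and `summarize` are all assumptions made for illustration. The nonnegative matrix factorization weighting proposed in the paper is not reproduced here, since the abstract does not specify it.

```python
import re

import numpy as np


def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    # Assumption: lowercase alphabetic tokens and raw counts, no stop-word removal.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            A[index[tok], j] += 1.0
    return A


def lsa_sentence_scores(A, n_topics):
    """Score each sentence by the length of its vector in the topic space."""
    # Thin SVD: A = U @ diag(s) @ Vt. Column j of diag(s) @ Vt is the
    # representation of sentence j in the topic space.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_topics, len(s))
    topic_repr = s[:k, None] * Vt[:k, :]  # keep only the k strongest topics
    return np.sqrt((topic_repr ** 2).sum(axis=0))


def summarize(sentences, n_sentences, n_topics=2):
    """Return the n_sentences highest-scoring sentences in their original order."""
    A = term_sentence_matrix(sentences)
    scores = lsa_sentence_scores(A, n_topics)
    top = np.argsort(-scores)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, n_sentences=2)` on a list of sentence strings returns a two-sentence extract, with the summary length passed in as a parameter, as the abstract describes. In practice the term counts would usually be replaced by a weighting scheme such as TF-IDF, and the number of topics would be tuned to the corpus.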