Finding division points for a time series corpus based on structural change point detection |
| |
Authors: | Hiroshi Kobayashi, Ryosuke Saga |
| |
Affiliation: | Osaka Prefecture University, Sakai, Japan |
| |
Abstract: | This paper describes a method for finding suitable points at which to divide a corpus with time series information in order to extract local and frequent keywords. Previous work proposed a corpus separation method for extracting keywords from a corpus. However, that method divides the corpus at equal intervals and therefore cannot account for topic changes. The present paper draws on the topic model, using topics extracted through latent Dirichlet allocation, to take topic change into account. It applies a structural change detection method to identify the points at which large topic changes occur and divides the corpus at those points. An experiment on newspaper articles covering five years of topics confirms that the method detects the points at which the topics of the documents change and uses them as division points, and that it outperforms previous methods in terms of recall. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases.
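
The abstract does not say which change statistic the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that each time slice of the corpus has already been reduced to a vector of topic proportions (for example by latent Dirichlet allocation) and picks the single division point that minimizes the within-segment sum of squared errors, a simple least-squares mean-shift criterion. The function names `segment_sse` and `best_division_point` and the `min_size` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def segment_sse(rows: Sequence[Sequence[float]]) -> float:
    """Sum of squared deviations of each row from the segment's mean vector."""
    n = len(rows)
    if n == 0:
        return 0.0
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    return sum((r[d] - means[d]) ** 2 for r in rows for d in range(dims))


def best_division_point(
    topics: Sequence[Sequence[float]], min_size: int = 1
) -> Tuple[Optional[int], float]:
    """Find one division point in a time-ordered list of topic vectors.

    Assumption: topics[t] holds the topic proportions of time slice t.
    Returns (k, gain), where the corpus is split into topics[:k] and
    topics[k:], and gain is the reduction in squared error relative to
    leaving the corpus undivided. k is None if no split reduces the error.
    """
    total = segment_sse(topics)
    best_k, best_cost = None, total
    for k in range(min_size, len(topics) - min_size + 1):
        cost = segment_sse(topics[:k]) + segment_sse(topics[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost


# Toy data: three slices dominated by topic 0, then three by topic 1.
toy_topics: List[List[float]] = [[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3
k, gain = best_division_point(toy_topics)
print(k)
```

On the toy data the split falls between the third and fourth slices, where the dominant topic switches. Applying the same search recursively to each resulting segment would give multiple division points; a stopping rule based on the gain would be needed and is left open here.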