Contextual correlation based thread detection in short text message streams |
| |
Authors: | Jiuming Huang Bin Zhou Quanyuan Wu Xiaowei Wang Yan Jia |
| |
Affiliation: | (1) College of Computer, National University of Defense Technology, Changsha, 410073, China |
| |
Abstract: | Short text message streams are produced by Instant Messaging and Short Message Service, which are widely used nowadays. Each
stream usually contains more than one thread. Detecting threads in these streams is helpful to various applications, such as
business intelligence, criminal investigation and public opinion analysis. Existing approaches, which are mainly based on text similarity,
encounter many challenges, including sparse feature vectors and the anomalies of short text messages. This paper introduces a novel
concept of contextual correlation, in place of traditional text similarity, into the single-pass clustering algorithm to address
the challenges of thread detection. We first analyze the contextually correlative nature of conversations in short text
message streams, and then propose an unsupervised method to compute the degree of correlation. Building on this, a single-pass
algorithm employing the contextual correlation is developed to detect threads in massive short text streams. Experiments on
large real-life online chat logs show that our approach improves the F1 measure by 11% compared with the best similarity-based
algorithm. |
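For readers unfamiliar with single-pass clustering, the following is a minimal sketch of the general technique the abstract refers to, not the authors' implementation. The abstract does not define the contextual correlation measure, so the scoring function is left pluggable; the `jaccard` token-overlap function, the `threshold` value and the name `single_pass_threads` are all assumptions made only for illustration.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in score: token overlap of two messages.

    This is a plain text-similarity placeholder, NOT the paper's
    contextual correlation, which the abstract does not specify.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def single_pass_threads(
    messages: List[str],
    score: Callable[[str, str], float] = jaccard,
    threshold: float = 0.2,  # hypothetical cut-off, chosen for the demo only
) -> List[List[str]]:
    """Assign each message, in arrival order, to the best-matching thread.

    Each message is seen exactly once: it joins the existing thread with
    the highest score if that score reaches the threshold, otherwise it
    starts a new thread.
    """
    threads: List[List[str]] = []
    for msg in messages:
        best_idx, best_score = -1, 0.0
        for i, thread in enumerate(threads):
            # Score against a thread = best score against any of its messages.
            s = max(score(msg, prev) for prev in thread)
            if s > best_score:
                best_idx, best_score = i, s
        if best_idx >= 0 and best_score >= threshold:
            threads[best_idx].append(msg)
        else:
            threads.append([msg])
    return threads
```

Calling `single_pass_threads(["lunch at noon today", "is the server down", "noon lunch works for me", "server is back up"])` groups the two lunch messages into one thread and the two server messages into another. A correlation function that takes context into account could be passed as `score` without changing the clustering loop.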
| |
Keywords: | |
This article has been indexed by SpringerLink and other databases. |
|