Finding structure in noisy text: topic classification and unsupervised clustering |
| |
Authors: | Prem Natarajan, Rohit Prasad, Krishna Subramanian, Shirin Saleem, Fred Choi, Rich Schwartz |
| |
Affiliation: | BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA |
| |
Abstract: | This paper addresses two types of classification of noisy, unstructured text such as newsgroup messages: (1) spotting messages containing topics of interest, and (2) automatic conceptual organization of messages without prior knowledge of topics of interest. In addition to applying our hidden Markov model methodology to spotting topics of interest in newsgroup messages, we present a robust methodology for rejecting messages which are off-topic. We describe a novel approach for automatically organizing a large, unstructured collection of messages. The approach applies an unsupervised topic clustering procedure to generate a hierarchical tree of topics. |
| |
This article is indexed in SpringerLink and other databases. |
|