Finding structure in noisy text: topic classification and unsupervised clustering
Authors: Prem Natarajan, Rohit Prasad, Krishna Subramanian, Shirin Saleem, Fred Choi, Rich Schwartz
Affiliation: BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA
Abstract: This paper addresses two types of classification of noisy, unstructured text such as newsgroup messages: (1) spotting messages that contain topics of interest, and (2) automatic conceptual organization of messages without prior knowledge of the topics of interest. In addition to applying our hidden Markov model methodology to spotting topics of interest in newsgroup messages, we present a robust methodology for rejecting off-topic messages. We also describe a novel approach for automatically organizing a large, unstructured collection of messages. The approach applies an unsupervised topic clustering procedure to generate a hierarchical tree of topics.
This article is indexed in SpringerLink and other databases.