
Clustering scientific documents with topic modeling

Authors:	Chyi-Kwei Yau, Alan Porter, Nils Newman, Arho Suominen

Affiliation:	1. Technology Policy and Assessment Center, Georgia Tech, Atlanta, GA, 30332-0345, USA 2. Search Technology, Inc., Norcross, GA, 30992, USA 3. IISC, Atlanta, GA, 30357, USA 4. UNU-MERIT, University of Maastricht, Maastricht, Netherlands 5. VTT Technical Research Centre of Finland, Innovations, Economy, and Policy, Itäinen Pitkäkatu 4, P.O. Box 106, 20521, Turku, Finland

Abstract:	Topic modeling is a type of statistical model for discovering the latent “topics” that occur in a collection of documents through machine learning. Currently, latent Dirichlet allocation (LDA) is a popular and common modeling approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters. To evaluate the results, we generate a collection of documents that contain academic papers from several different fields and see whether papers in the same field will be clustered together. We explore potential scientometric applications of such text analysis capabilities.
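The evaluation idea in the abstract (fit a topic model, assign each document to its dominant topic, then check whether documents from the same field land in the same cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the purity score, the toy corpus, and the function names `lda_gibbs` and `purity` are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    docs is a list of token lists. Returns one topic-proportion vector per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assignments = []                                       # topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta


def purity(clusters, labels):
    """Fraction of documents that share their cluster's majority label."""
    by_cluster = {}
    for c, label in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[label] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


# Hypothetical toy corpus: two "fields" with disjoint vocabularies.
docs = [
    "gene protein cell dna gene expression".split(),
    "cell protein dna sequence gene".split(),
    "galaxy star telescope orbit star".split(),
    "orbit galaxy telescope planet star".split(),
]
fields = ["biology", "biology", "astronomy", "astronomy"]

theta = lda_gibbs(docs, n_topics=2)
# Cluster label = index of the dominant topic in each document.
clusters = [max(range(len(t)), key=t.__getitem__) for t in theta]
score = purity(clusters, fields)
```

Purity is only one possible agreement measure; it rewards clusters dominated by a single field but does not penalize splitting one field across many clusters, so it is best read alongside the number of clusters chosen.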

Keywords:
This article is indexed in SpringerLink and other databases.