Clustering scientific documents with topic modeling |
| |
Authors: | Chyi-Kwei Yau Alan Porter Nils Newman Arho Suominen |
| |
Affiliation: | 1. Technology Policy and Assessment Center, Georgia Tech, Atlanta, GA, 30332-0345, USA 2. Search Technology, Inc., Norcross, GA, 30992, USA 3. IISC, Atlanta, GA, 30357, USA 4. UNU-MERIT, University of Maastricht, Maastricht, Netherlands 5. VTT Technical Research Centre of Finland, Innovations, Economy, and Policy, Itäinen Pitkäkatu 4, P.O. Box 106, 20521, Turku, Finland
|
| |
Abstract: | Topic modeling is a type of statistical model for discovering the latent “topics” that occur in a collection of documents through machine learning. Currently, latent Dirichlet allocation (LDA) is a popular and common modeling approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters. To evaluate the results, we generate a collection of documents that contain academic papers from several different fields and see whether papers in the same field will be clustered together. We explore potential scientometric applications of such text analysis capabilities. |
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|