Clustering scientific documents with topic modeling
Authors: Chyi-Kwei Yau, Alan Porter, Nils Newman, Arho Suominen
Affiliation:1. Technology Policy and Assessment Center, Georgia Tech, Atlanta, GA, 30332-0345, USA
2. Search Technology, Inc., Norcross, GA, 30992, USA
3. IISC, Atlanta, GA, 30357, USA
4. UNU-MERIT, University of Maastricht, Maastricht, Netherlands
5. VTT Technical Research Centre of Finland, Innovations, Economy, and Policy, Itäinen Pitkäkatu 4, P.O. Box 106, 20521, Turku, Finland
Abstract: Topic modeling is a family of statistical models that use machine learning to discover the latent “topics” occurring in a collection of documents. Latent Dirichlet allocation (LDA) is currently a widely used approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters. To evaluate the results, we assemble a collection of academic papers drawn from several different fields and check whether papers from the same field are clustered together. We also explore potential scientometric applications of such text analysis capabilities.
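The evaluation idea in the abstract (checking whether papers from the same field land in the same cluster) can be illustrated with a small sketch. The abstract does not name the metric the authors use, so cluster purity is an assumption chosen here for illustration. The function name `cluster_purity` and the toy data are hypothetical. In an LDA setting, one plausible way to obtain a cluster id for each paper would be to take its highest-weight topic, which is also an assumption.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, field_labels):
    """Return the fraction of documents whose field matches the
    majority field of the cluster they were assigned to.

    Purity is an illustrative metric chosen for this sketch; the
    abstract does not say which measure the paper uses.
    """
    if len(cluster_ids) != len(field_labels):
        raise ValueError("cluster_ids and field_labels must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count how many documents of each field fall into each cluster.
    fields_by_cluster = defaultdict(Counter)
    for cluster, field in zip(cluster_ids, field_labels):
        fields_by_cluster[cluster][field] += 1

    # For every cluster, keep only the count of its most common field.
    majority_total = sum(max(counts.values()) for counts in fields_by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: eight papers, three clusters, three fields.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
fields = ["physics", "physics", "biology", "biology", "biology",
          "economics", "economics", "physics"]
purity = cluster_purity(clusters, fields)
print(purity)
```

A purity of 1.0 would mean every cluster contains papers from a single field. Purity rises trivially as the number of clusters grows, so it is only meaningful when comparing runs with the same cluster count.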
This article is indexed in SpringerLink and other databases.