首页 | 本学科首页   官方微博 | 高级检索  
     


Using neural-network based paragraph embeddings for the calculation of within and between document similarities
Authors:Thijs  Bart
Affiliation:1.KU Leuven, ECOOM, FEB, Naamsestraat 61, 3000, Leuven, Belgium
;2.Univ Grenoble Alpes, LIG, SIGMA, 38000, Grenoble, France
;
Abstract:

Science mapping using document networks comes often with the implicit assumption that scientific papers are indivisible units with unique links to neighbour documents. Research on proximity in co-citation analysis and the study of lexical properties of sections and citation contexts indicate that this assumption doesn’t always hold. Moreover, the meaning of words and co-words depends on the context in which they appear. This study proposes the use of a neural network architecture for word and paragraph embeddings (Doc2Vec) for the measurement of similarity among those smaller units of analysis. It is shown that paragraphs in the “Introduction” and the “Discussion” Section are more similar to the abstract, that the similarity among paragraphs is related to -but not linearly- the distance between the paragraphs. The “Methodology” Section is least similar to the other sections. Abstracts of citing-cited documents are more similar than random pairs and the context in which a reference appears is most similar to the abstract of the cited document. This novel approach with higher granularity can be used for bibliometric aided retrieval and to assist in measuring interdisciplinarity through the application of network-based centrality measures.

Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号