A domain keyword analysis approach extending Term Frequency-Keyword Active Index with Google Word2Vec model

Authors:	Kai Hu, Huayi Wu, Kunlun Qi, Jingmin Yu, Siluo Yang, Tianxing Yu, Jie Zheng, Bo Liu

Affiliation:	1. The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China; 2. Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan, China; 3. Faculty of Information Engineering, China University of Geosciences (Wuhan), Wuhan, China; 4. Changjiang Spatial Information Technology Engineering CO., LTD, Wuhan, China; 5. School of Information Management, Wuhan University, Wuhan, China; 6. Faculty of Geomatics, East China Institute of Technology, Nanchang, China

Abstract:	In bibliometric research, keyword analysis of publications provides an effective way not only to investigate the knowledge structure of research domains, but also to explore developing trends within those domains. Many approaches have been proposed to identify the most representative keywords. Most of them rely on statistical regularities, syntax, grammar, or network-based characteristics to select representative keywords for domain analysis. In this paper, we argue that domain knowledge is reflected by the semantic meanings behind keywords rather than by the keywords themselves. We apply the Google Word2Vec model, a neural model that learns distributed word representations, to represent the semantic meanings of the keywords. Based on this work, we propose a new domain knowledge analysis approach, the Semantic Frequency-Semantic Active Index, which, like Term Frequency-Inverse Document Frequency, links domain and background information and identifies infrequent but important keywords. We adopt a semantic similarity measuring process before statistical computation, so that we compute the frequencies of “semantic units” rather than keyword frequencies. Semantic units are generated by clustering word vectors, and the Inverse Document Frequency is extended to a semantic inverse document frequency, so that only words in the inverse documents that reach a certain similarity are counted. Taking geographical natural hazards as the domain and natural hazards as the background discipline, we identify the domain-specific knowledge that distinguishes geographical natural hazards from other types of natural hazards. We compare the proposed method with existing methods and discuss its advantages and disadvantages, finding that introducing the semantic meaning of the keywords supports more effective domain knowledge analysis.
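The idea of counting "semantic units" instead of raw keywords can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the Semantic Active Index formula, so the score below is a plain TF-IDF-style analogue. The toy three-dimensional vectors stand in for real Word2Vec embeddings, and the greedy clustering rule, the 0.9 similarity threshold, the smoothed logarithm, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_keywords(vectors, threshold):
    """Greedy single-pass clustering (an assumed, simplified rule).

    A keyword joins the first unit whose seed keyword is at least
    `threshold` similar to it; otherwise it seeds a new unit.
    """
    units = []  # each unit is a list of keywords; unit[0] is its seed
    for word in sorted(vectors):
        for unit in units:
            if cosine(vectors[word], vectors[unit[0]]) >= threshold:
                unit.append(word)
                break
        else:
            units.append([word])
    return units


def semantic_scores(domain_docs, background_docs, vectors, threshold=0.9):
    """Score each semantic unit with a TF-IDF-style analogue (assumed formula)."""
    units = cluster_keywords(vectors, threshold)
    counts = Counter(kw for doc in domain_docs for kw in doc)
    n_docs = len(background_docs)
    scores = {}
    for unit in units:
        members = set(unit)
        # Semantic frequency: occurrences of any member keyword in the domain.
        sf = sum(counts[w] for w in unit)
        # Semantic document frequency: background documents that contain
        # at least one keyword belonging to the same unit.
        df = sum(1 for doc in background_docs if members & set(doc))
        scores[unit[0]] = sf * math.log((1 + n_docs) / (1 + df))
    return scores


# Hypothetical toy embeddings; real vectors would come from a trained model.
vectors = {
    "flood": (1.0, 0.0, 0.1),
    "flooding": (0.95, 0.05, 0.1),
    "landslide": (0.1, 1.0, 0.0),
    "gis": (0.0, 0.1, 1.0),
}
domain_docs = [["flood", "gis"], ["flooding", "gis"], ["landslide", "gis"]]
background_docs = [["flood"], ["flooding"], ["landslide"],
                   ["flood", "landslide"], ["gis"]]

scores = semantic_scores(domain_docs, background_docs, vectors)
for seed, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{seed}: {score:.3f}")
```

In this toy setup "flood" and "flooding" collapse into one unit, so their domain counts are pooled and a background document mentioning either one counts against the unit. This is the effect the abstract attributes to measuring semantic similarity before the statistical computation.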

Keywords:
This article is indexed in SpringerLink and other databases.
