
Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network

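Cite this article:	Xiaoguang SHENG, Ying WANG, Li QIAN, Ying WANG. Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3442-3450.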

Author names:	Xiaoguang SHENG, Ying WANG, Li QIAN, Ying WANG

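Author affiliations:	School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; National Science Library, Chinese Academy of Sciences, Beijing 100190; National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190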

Funding:	National Natural Science Foundation of China (61702038); National Social Science Fund of China (15CTQ006)

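Abstract:	To address the problem of accurately matching scholars to their research output, this paper proposes an author name disambiguation method based on semi-supervised learning with a graph convolutional network. The method uses the SciBERT pre-trained language model to encode paper titles and keywords into semantic representation vectors for the paper nodes, uses the papers' author and institution information to build adjacency matrices for the co-author network and the institution-association network, and collects pseudo-labels from the co-author network to form positive and negative sample sets. These serve as input to a graph convolutional network trained by semi-supervised learning, which yields paper node embeddings; clustering these embeddings disambiguates the authors. Experimental results show that the method outperforms other disambiguation methods on the experimental dataset.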
Keywords:	name disambiguation; graph convolutional neural network; BERT language model
Received:	2020-10-23
Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network

Xiaoguang SHENG, Ying WANG, Li QIAN, Ying WANG. Author Name Disambiguation Based on Semi-supervised Learning with Graph Convolutional Network[J]. Journal of Electronics & Information Technology, 2021, 43(12): 3442-3450.

Authors:	Xiaoguang SHENG, Ying WANG, Li QIAN, Ying WANG

Affiliation:	1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; 2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China; 3. Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China

Abstract:	To solve the problem of exactly matching scholars with their articles, a new author name disambiguation method is proposed based on semi-supervised learning with a graph convolutional network. In this method, the SciBERT pre-trained language model is applied to compute a semantic embedding vector for each paper from its title and keywords. The authors and organizations of the papers are used to obtain the adjacency matrices of the papers' co-author network and co-organization network. Pseudo-labels are collected from the co-author network to obtain positive and negative samples. The semantic embedding vectors, the adjacency matrices, and the positive and negative samples are used as input to a Graph Convolutional Network (GCN). Through semi-supervised learning, embedding vectors of the papers are learned and then clustered to disambiguate the author names of the papers. The experimental results show that, compared with other disambiguation methods, this method achieves better results on the experimental dataset.
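
The pipeline in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that two papers are linked in the co-author network when they share at least one co-author other than the ambiguous name, uses tiny hand-made vectors as stand-ins for SciBERT embeddings, and applies one untrained graph-convolution step of the common form ReLU(D^{-1/2}(A + I)D^{-1/2} X W). The function names `coauthor_adjacency` and `gcn_layer` are hypothetical, and the pseudo-label training and clustering steps are omitted. Plain Python is used to keep the example dependency-free.

```python
import math

def coauthor_adjacency(coauthors):
    """Link two papers if they share a co-author (assumed linking rule).

    coauthors: list of sets, one per paper, excluding the ambiguous name.
    """
    n = len(coauthors)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if coauthors[i] & coauthors[j]:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weights):
    """One graph-convolution step with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so each paper keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: a_hat[i][j] / sqrt(deg[i] * deg[j]).
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    hidden = matmul(matmul(a_norm, features), weights)
    # ReLU activation.
    return [[max(v, 0.0) for v in row] for row in hidden]

# Three papers signed with the same ambiguous name (illustrative data).
coauthors = [{"A. Li", "B. Chen"}, {"B. Chen"}, {"C. Zhao"}]
adj = coauthor_adjacency(coauthors)

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for SciBERT vectors
weights = [[1.0], [1.0]]                          # untrained, fixed for the demo
embeddings = gcn_layer(adj, features, weights)
```

In this toy input, papers 0 and 1 share a co-author, so the propagation step averages their features and they receive the same embedding, while the unconnected paper 2 stays apart. A clustering step over such embeddings would then group papers 0 and 1 under one author.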

Keywords:	Name disambiguation; Graph convolutional neural network; BERT language model