Representation learning using network embedding based on external word vectors
Citation: ZHANG Xiaokun, LIU Yan, CHEN Jing. Representation learning using network embedding based on external word vectors[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1056-1063.
Authors: ZHANG Xiaokun  LIU Yan  CHEN Jing
Affiliation: State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, Henan, China
Abstract: Most existing research on text-based information networks models only the information of the network itself. Limited by the scale of task corpora, modeling with task-related text alone is prone to semantic drift or semantic incompleteness. This paper introduces an external corpus into the modeling process, using word vectors learned from that corpus to optimize modeling, and proposes NE-EWV (network embedding based on external word vectors), which learns a feature-fusion network representation from both the semantic feature space and the structural feature space. The model's effectiveness was validated experimentally on real-world network datasets. The results show that in the link prediction task, the AUC of the model is 7%–19% higher than that of models considering only structural features, and in most cases 1%–12% higher than that of models considering both structural and text features; in the node classification task, performance is comparable to CANE, the best-performing baseline. This demonstrates that introducing external word vectors as external knowledge can effectively improve network representation ability.

Keywords: network representation learning  text-based information network  auto-encoder  external word vectors  node classification  word vectors  distributed representation  representation learning

Representation learning using network embedding based on external word vectors
Citation: ZHANG Xiaokun, LIU Yan, CHEN Jing. Representation learning using network embedding based on external word vectors[J]. CAAI Transactions on Intelligent Systems, 2019, 14(5): 1056-1063.
Authors:ZHANG Xiaokun  LIU Yan  CHEN Jing
Affiliation: State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
Abstract: Network embedding, which preserves a network’s sophisticated features, can effectively learn low-dimensional embeddings of vertices to lower computing and storage costs. Content information networks (such as Twitter), which contain rich text information, are commonly used in daily life. Most studies on content information networks are based on the information of the network itself. Distributed word vectors are becoming increasingly popular in natural language processing tasks. As a low-dimensional representation of the semantic feature space, word vectors can preserve syntactic and semantic regularities. By introducing external word vectors into the modeling process, we can exploit external syntactic and semantic features. Hence, in this paper, we propose network embedding based on external word vectors (NE-EWV), whereby a feature-fusion representation is learned from both the semantic feature space and the structural feature space. Empirical experiments were conducted on real-world content information network datasets to validate the effectiveness of the model. The results show that in the link prediction task, the AUC of the model was 7% to 19% higher than that of the model that considers only structural features, and in most cases 1% to 12% higher than that of the model that considers structural and text features. In node classification tasks, the performance is comparable with that of context-aware network embedding (CANE), which was the state-of-the-art baseline model.
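The core idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the vocabulary, dimensions, and fusion-by-concatenation are all illustrative assumptions. It shows how a semantic embedding derived from external, pre-trained word vectors can be fused with a node's structural embedding into one representation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for external pre-trained word vectors (in practice these would
# come from a large outside corpus, e.g. word2vec or GloVe vectors).
DIM_SEM = 50
external_wv = {w: rng.normal(size=DIM_SEM)
               for w in ["graph", "network", "embedding", "vertex"]}

def semantic_embedding(text, wv, dim=DIM_SEM):
    """Semantic feature of a node: average of its words' external vectors."""
    vecs = [wv[w] for w in text.split() if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def fuse(structural_vec, text, wv):
    """Fuse structural and semantic feature spaces (here: concatenation)."""
    return np.concatenate([structural_vec, semantic_embedding(text, wv)])

# Example node: a 20-dim structural embedding plus a short associated text.
node_struct = rng.normal(size=20)
node_repr = fuse(node_struct, "network embedding", external_wv)
print(node_repr.shape)  # (70,)
```

In NE-EWV itself the fusion is learned (the paper's keywords mention an auto-encoder) rather than a plain concatenation; the sketch only conveys how external word vectors supply semantic features that the task corpus alone cannot.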
Keywords:network embedding  content information network  auto-encoder  external word vectors  vertex classification  word vectors  distributed representation  representation learning