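Short text classification method fusing corpus features and graph attention network
Cite this article: YANG Shigang, LIU Yongguo. Short text classification method by fusing corpus features and graph attention network[J]. Journal of Computer Applications, 2022, 42(5): 1324-1329.
Authors: YANG Shigang, LIU Yongguo
Affiliation: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054
Funding: National Key Research and Development Program of China (2017YFC1703905); National Natural Science Foundation of China (81803851); Sichuan Applied Basic Research Program (2021YJ0184)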
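Abstract: Short text classification is an important research problem in natural language processing (NLP) and is widely applied in news classification, sentiment analysis, review analysis, and related fields. To address the data sparsity problem in short text classification, node and edge-weight features of the corpus are introduced and, building on the graph attention network (GAT), a graph attention network that fuses node and edge-weight features, NE-GAT, is proposed. First, a heterogeneous graph is constructed for each corpus; the gravity model (GM) is used to evaluate the importance of word nodes, and edge weights are obtained from the pointwise mutual information (PMI) between nodes. Second, a text-level graph is constructed for each sentence, and node importance and edge weights are incorporated into the node update process. Experimental results show that the proposed model reaches an average accuracy of 75.48% on the test sets, outperforming the graph convolutional network for text classification (Text-GCN), TL-GNN, Text-ING, and other models; compared with the original GAT, the average accuracy of the proposed model improves by 2.32 percentage points, which verifies its effectiveness.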

Keywords: short text classification; graph attention network; corpus feature; gravity model; pointwise mutual information
Received: 2021-04-06
Revised: 2021-06-18

Short text classification method by fusing corpus features and graph attention network
Shigang YANG, Yongguo LIU. Short text classification method by fusing corpus features and graph attention network[J]. Journal of Computer Applications, 2022, 42(5): 1324-1329.
Authors:Shigang YANG  Yongguo LIU
Affiliation: School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
Abstract: Short text classification is an important research problem in Natural Language Processing (NLP) and is widely used in news classification, sentiment analysis, comment analysis, and other fields. To address the problem of data sparsity in short text classification, node and edge weight features of corpora were introduced and, based on the Graph ATtention network (GAT), a new graph attention network named Node-Edge GAT (NE-GAT), which fuses node and edge weight features, was proposed. Firstly, a heterogeneous graph was constructed for each corpus, the Gravity Model (GM) was used to evaluate the importance of word nodes, and edge weights were obtained through the Pointwise Mutual Information (PMI) between nodes. Secondly, a text-level graph was constructed for each sentence, and node importance and edge weights were integrated into the node update process. Experimental results show that the average accuracy of the proposed model on the test sets reaches 75.48%, which is higher than that of models such as Text Graph Convolution Network (Text-GCN), Text-Level-Graph Neural Network (TL-GNN), and Text classification method for INductive word representations via Graph neural networks (Text-ING). Compared with the original GAT, the proposed model improves the average accuracy by 2.32 percentage points, which verifies its effectiveness.
Keywords:short text classification  Graph Attention Network (GAT)  corpus feature  Gravity Model (GM)  Pointwise Mutual Information (PMI)  
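
The abstract states that edge weights in the corpus-level graph come from the PMI between word nodes, but gives no further detail. The following is a minimal sketch of one common way to compute such weights, not the authors' implementation. It assumes co-occurrence is counted over sliding windows (the function name `pmi_edge_weights`, the `window_size` parameter, and the choice to keep only positive PMI values are all assumptions made for illustration), with PMI(a, b) = log(p(a, b) / (p(a) p(b))) estimated from window counts.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edge_weights(docs, window_size=3):
    """Return {(w1, w2): pmi} for word pairs whose PMI is positive.

    docs is a list of token lists. Probabilities are estimated as the
    fraction of sliding windows that contain a word or a word pair.
    Window size and the positive-PMI filter are illustrative assumptions.
    """
    window_count = 0
    word_windows = Counter()  # windows containing each word
    pair_windows = Counter()  # windows containing each unordered pair
    for tokens in docs:
        if not tokens:
            continue
        # A short text may be shorter than the window: use it as one window.
        n_windows = max(1, len(tokens) - window_size + 1)
        for start in range(n_windows):
            window = set(tokens[start:start + window_size])
            window_count += 1
            word_windows.update(window)
            pair_windows.update(combinations(sorted(window), 2))

    weights = {}
    for (a, b), n_ab in pair_windows.items():
        # log( p(a,b) / (p(a) * p(b)) ) with p(.) = count / window_count
        pmi = math.log(n_ab * window_count / (word_windows[a] * word_windows[b]))
        if pmi > 0:
            weights[(a, b)] = pmi
    return weights
```

Pairs are keyed in sorted order, so `("a", "b")` and `("b", "a")` refer to the same undirected edge. How these weights enter the attention-based node update is specific to NE-GAT and is not reproduced here.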