Text segmentation model based on graph convolutional network
Cite this article: Yuqi DU, Jin ZHENG, Yang WANG, Cheng HUANG, Ping LI. Text segmentation model based on graph convolutional network[J]. Journal of Computer Applications, 2022, 42(12): 3692-3699.
Authors: Yuqi DU  Jin ZHENG  Yang WANG  Cheng HUANG  Ping LI
Affiliation: School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China
Funding: National Science Fund for Distinguished Young Scholars (61625204); Southwest Petroleum University Scientific Research and Innovation Capability Improvement Program, "Qihang" Project (2019QHZ016)
Received: 2021-10-14
Revised: 2022-01-07
Abstract: Text segmentation divides a text into several relatively independent blocks according to topic relevance. Existing text segmentation models are weak at extracting fine-grained features such as paragraph structural information, semantic correlation, and context interaction. To address this, a text segmentation model based on Graph Convolutional Network (GCN), named TS-GCN (Text Segmentation-Graph Convolutional Network), was proposed. First, a text graph was constructed from the structural information and semantic logic of the text paragraphs. Then, semantic similarity attention was introduced to capture fine-grained correlations between paragraph nodes, and GCN was used to pass information between high-order neighborhoods of paragraph nodes, strengthening the model's ability to extract paragraph topic feature representations at multiple granularities. The proposed model was compared with CATS (Coherence-Aware Text Segmentation), a representative model commonly used as a benchmark for text segmentation, and with its base model TLT-TS (Two-Level Transformer model for Text Segmentation). Experimental results show that on the Wikicities dataset, TS-GCN, without any auxiliary module, achieves a Pk value 0.08 percentage points lower than that of TLT-TS; on the Wikielements dataset, its Pk value is 0.38 and 2.30 percentage points lower than those of CATS and TLT-TS, respectively. Since a lower Pk indicates fewer segmentation errors, these results show that TS-GCN achieves good segmentation performance.
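The abstract gives the pipeline only at a high level (text graph, semantic similarity attention, GCN propagation), so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes paragraphs are already embedded as vectors, that structural edges link paragraphs within a small positional window, and that attention weights are a softmax of cosine similarity over each node's neighbours. The names `build_attention_adjacency`, `gcn_layer`, and `window` are hypothetical. The row-normalised attention matrix stands in for the usual symmetric GCN normalisation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_attention_adjacency(feats, window=1):
    """Assumed graph: each paragraph links to itself and to paragraphs at most
    `window` positions away. Edge weights are a softmax of cosine similarity
    over each node's neighbours, so every row sums to 1."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in range(n) if abs(i - j) <= window]
        scores = [math.exp(cosine(feats[i], feats[j])) for j in neighbours]
        total = sum(scores)
        for j, score in zip(neighbours, scores):
            adj[i][j] = score / total
    return adj


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(adj @ feats @ weight)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    # Aggregate neighbour features using the attention-weighted adjacency.
    agg = [[sum(adj[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


# Toy example: four paragraph embeddings, two topics (illustrative values only).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
identity = [[1.0, 0.0], [0.0, 1.0]]
adj = build_attention_adjacency(feats)
hidden = gcn_layer(adj, feats, identity)        # 1-hop information
hidden2 = gcn_layer(adj, hidden, identity)      # 2-hop (higher-order) information
```

Stacking two layers lets paragraph 0 receive information from paragraph 2 even though they share no direct edge, which is one way to read "information passing between high-order neighborhoods".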
Keywords:text segmentation  Graph Convolutional Network (GCN)  attention  Natural Language Processing (NLP)  deep learning  
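The results above are reported as Pk, a standard segmentation error metric where lower is better. As a rough illustration, Pk slides a window of width k over the text and counts how often the reference and the predicted segmentation disagree on whether the two window ends fall in the same segment. The sketch below assumes segmentations are given as per-unit segment ids and that k defaults to half the mean reference segment length; the function name `pk` and the input encoding are illustrative choices, not taken from the paper.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error. `reference` and `hypothesis` are equal-length
    sequences of segment ids per unit (e.g. per sentence), such as [0, 0, 1, 1]."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n = len(reference)
    if k is None:
        # Assumed default: half the average reference segment length.
        num_segments = 1 + sum(
            reference[i] != reference[i - 1] for i in range(1, n)
        )
        k = max(1, round(n / (2 * num_segments)))
    windows = n - k
    if windows <= 0:
        raise ValueError("text is too short for the chosen window size k")
    errors = 0
    for i in range(windows):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        # An error is any window where the two segmentations disagree.
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / windows


reference = [0, 0, 0, 1, 1, 1]           # one boundary after the third unit
perfect = pk(reference, [0, 0, 0, 1, 1, 1], k=2)
no_boundary = pk(reference, [0, 0, 0, 0, 0, 0], k=2)
```

Here `perfect` is 0.0 and `no_boundary` is 0.5, because two of the four windows straddle the missed boundary. Scores such as those in the abstract are typically this fraction expressed as a percentage, which is why differences are quoted in percentage points.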