Popular science text classification model enhanced by knowledge graph

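Cite this article:	TANG Wangjing, XU Bin, TONG Meihan, HAN Meihuan, WANG Liming, ZHONG Qi. Popular science text classification model enhanced by knowledge graph[J]. Journal of Computer Applications, 2022, 42(4): 1072-1078. DOI: 10.11772/j.issn.1001-9081.2021071278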

Author names:	TANG Wangjing, XU Bin, TONG Meihan, HAN Meihuan, WANG Liming, ZHONG Qi

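Author affiliations:	Department of Computer Science and Technology, Tsinghua University, Beijing 100084; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong 518055; China Research Institute for Science Popularization, Beijing 100081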

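Funding:	This work was supported by a commissioned cooperation project of the China Research Institute for Science Popularization.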

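Abstract:	Popular science text classification is the task of assigning popular science articles to the categories of a popular science classification system. Popular science articles often exceed 1,000 words, which makes it hard for a model to focus on key information and degrades the classification performance of traditional models. To address this problem, a long text classification model for popular science that combines a knowledge graph with two-level screening was proposed to reduce interference from topic-irrelevant information and improve classification performance. First, a four-step method was used to construct a knowledge graph for the popular science domain. Then, the knowledge graph was used as a distant supervisor, and a sentence filter was trained to filter out irrelevant information. Finally, an attention mechanism was used to further screen the information in the filtered sentence set, yielding an attention-based topic classification model. Experimental results on the constructed Popular Science Classification Dataset (PSCD) show that the knowledge-enhanced text classification model based on the domain knowledge graph achieves a higher F1-Score: compared with the TextCNN model and the BERT model, its F1-Score is higher by 2.88 and 1.88 percentage points respectively, which verifies the effectiveness of the knowledge graph for information screening in long texts.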
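Keywords:	popular science text classification; knowledge graph; two-level screening; long text classification; attention
Received:	2021-07-16
Revised:	2021-09-07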
Popular science text classification model enhanced by knowledge graph

TANG Wangjing, XU Bin, TONG Meihan, HAN Meihuan, WANG Liming, ZHONG Qi. Popular science text classification model enhanced by knowledge graph[J]. Journal of Computer Applications, 2022, 42(4): 1072-1078. DOI: 10.11772/j.issn.1001-9081.2021071278

Authors:	TANG Wangjing, XU Bin, TONG Meihan, HAN Meihuan, WANG Liming, ZHONG Qi

Affiliation:	Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong 518055, China; China Research Institute for Science Popularization, Beijing 100081, China

Abstract:	Popular science text classification aims to classify popular science articles according to a popular science classification system. Popular science articles often exceed 1,000 words, which makes it hard for a model to focus on key points and leads to poor classification performance in traditional models. To address this problem, a long text classification model that combines a knowledge graph with two-level screening was proposed to reduce the interference of topic-irrelevant information and improve classification performance. First, a four-step method was used to construct a knowledge graph for the popular science domain. Then, this knowledge graph was used as a distant supervisor, and a sentence filter was trained to filter out irrelevant information. Finally, an attention mechanism was used to further screen the information in the filtered sentence set, completing the attention-based topic classification model. Experimental results on the constructed Popular Science Classification Dataset (PSCD) show that the knowledge-enhanced text classification model based on the domain knowledge graph has a higher F1-Score. Compared with the TextCNN model and the BERT (Bidirectional Encoder Representations from Transformers) model, the proposed model increases the F1-Score by 2.88 and 1.88 percentage points respectively, verifying the effectiveness of the knowledge graph for long text information screening.

Keywords:	popular science text classification; knowledge graph; two-level screening; long text classification; attention
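
The two-level screening described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge graph `TOY_KG`, the entity-count embedding `embed`, the uniform `query` vector, and all function names are assumptions made for illustration. In particular, the sentence filter here simply reuses the weak labels, whereas the paper trains a filter on them, and the attention query in the paper is presumably learned rather than fixed.

```python
import math

# Hypothetical toy knowledge graph: topic -> entity names (illustration only;
# the paper builds its graph with a four-step method not reproduced here).
TOY_KG = {
    "astronomy": {"planet", "orbit", "telescope"},
    "biology": {"cell", "protein", "enzyme"},
}


def distant_labels(sentences, kg=TOY_KG):
    """Level 1a: weakly label a sentence 1 if it contains any KG entity as a substring."""
    entities = set().union(*kg.values())
    return [int(any(e in s.lower() for e in entities)) for s in sentences]


def filter_sentences(sentences, labels):
    """Level 1b: stand-in for a trained sentence filter; keeps weakly positive sentences."""
    return [s for s, y in zip(sentences, labels) if y == 1]


def embed(sentence, kg=TOY_KG):
    """Assumed toy embedding: per-topic count of distinct KG entities found in the sentence."""
    text = sentence.lower()
    return [float(sum(e in text for e in kg[t])) for t in sorted(kg)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Level 2: dot-product attention over sentence vectors; returns weights and pooled vector."""
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]
    return weights, pooled


article = [
    "The telescope tracked a planet along its orbit.",
    "Tickets for the evening lecture sold out quickly.",
    "Each planet follows an elliptical orbit.",
]
labels = distant_labels(article)
kept = filter_sentences(article, labels)
vectors = [embed(s) for s in kept]
query = [1.0] * len(TOY_KG)  # hypothetical stand-in for a learned query vector
weights, pooled = attention_pool(vectors, query)
predicted = sorted(TOY_KG)[pooled.index(max(pooled))]
print(labels, predicted)
```

In this sketch the second sentence mentions no entity and is dropped at the first level, and the attention weights then favor the sentence with more entity mentions. A real system would replace the substring matching with entity linking and the count-based vectors with learned sentence encodings.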
This article is indexed in databases including Wanfang Data.