
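Multi-feature Fusion Based on Semantic Understanding Attention Neural Network for Chinese Text Categorization
Cite this article: XIE Jinbao, HOU Yongjin, KANG Shouqiang, LI Baiwei, ZHANG Xiao. Multi-feature Fusion Based on Semantic Understanding Attention Neural Network for Chinese Text Categorization[J]. Journal of Electronics & Information Technology, 2018, 40(5): 1258-1265.
Authors: XIE Jinbao  HOU Yongjin  KANG Shouqiang  LI Baiwei  ZHANG Xiao
Funding: Heilongjiang Province Overseas Scholars Fund (1253HQ019)
Abstract: In Chinese text classification, important features are scattered and sparse within a text, and different text features contribute differently to identifying the text's category. To address these problems, this paper proposes 3CLA, a multi-feature fusion model for Chinese text classification that combines a semantic-understanding-based attention neural network, a long short-term memory network (LSTM), and a convolutional neural network (CNN). The model first preprocesses the Chinese text by segmenting it into words and vectorizing it. After the embedding layer, the input passes through the CNN path, the LSTM path, and the attention path, which extract text features of different levels and characteristics. Finally, the features are merged by a fusion layer and classified by a softmax classifier. Text classification experiments were run on a Chinese corpus. The results show that, compared with the CNN-based and LSTM-based models, the proposed model improves recognition of Chinese text categories by up to about 8%.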

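Keywords: Chinese text classification    multi-feature fusion    attention algorithm    long short-term memory network    convolutional neural network
Received: 2017-08-17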
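
The preprocessing step named in the abstract (segmentation followed by vectorization) can be illustrated with a minimal sketch. It assumes the text has already been segmented into words by some Chinese word segmenter, which is not shown. The names `build_vocab` and `vectorize`, the reserved tokens, and the fixed length `max_len` are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical reserved tokens (assumption)


def build_vocab(segmented_docs, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for doc in segmented_docs for tok in doc)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens get the smallest ids; ties are broken by token text.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def vectorize(doc, vocab, max_len):
    """Convert one segmented document to a fixed-length list of ids."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in doc[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Toy corpus, already segmented into words (segmentation itself is assumed).
docs = [["股市", "今日", "上涨"], ["球队", "今日", "获胜", "夺冠"]]
vocab = build_vocab(docs)
encoded = [vectorize(d, vocab, max_len=5) for d in docs]
```

Each document becomes a list of five integer ids, padded with `0`, which is the kind of input an embedding layer expects.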

Multi-feature Fusion Based on Semantic Understanding Attention Neural Network for Chinese Text Categorization
XIE Jinbao,HOU Yongjin,KANG Shouqiang,LI Baiwei,ZHANG Xiao.Multi-feature Fusion Based on Semantic Understanding Attention Neural Network for Chinese Text Categorization[J].Journal of Electronics & Information Technology,2018,40(5):1258-1265.
Authors:XIE Jinbao  HOU Yongjin  KANG Shouqiang  LI Baiwei  ZHANG Xiao
Abstract: In Chinese text categorization, important features are sparsely scattered across a text, and different text features contribute differently to recognizing the text's category. To address these problems, this paper proposes 3CLA (Three Convolutional neural network paths and a Long short term memory path fused with an Attention neural network path), a multi-feature fusion model for Chinese text categorization built on a Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), and a semantic understanding attention neural network. The model first preprocesses the Chinese text by segmenting it into words and vectorizing it. The embedded input is then sent through the CNN paths, the LSTM path, and the attention path to extract text features of different levels and characteristics. Finally, the features are merged by a fusion layer and classified by a softmax classifier. Experiments on a Chinese corpus show that, compared with the CNN-based and LSTM-based models, the proposed model improves the recognition of Chinese text categories by up to about 8%.
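Keywords: Chinese text categorization; Multi-feature fusion; Attention algorithm; Long Short Term Memory (LSTM); Convolutional Neural Network (CNN)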
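
The data flow described in the abstract (embedding, then three CNN paths, an LSTM path, and an attention path, then fusion and classification) can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, the kernel widths 2, 3, and 4, all dimensions, fusion by concatenation, and the dot-product attention pooling are stand-ins chosen here, and the paper's actual semantic understanding attention mechanism and hyperparameters are not given in the abstract.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cnn_path(seq, filters, width):
    """1-D convolution over windows of `width` tokens, ReLU, max-over-time pooling."""
    windows = [sum(seq[i:i + width], []) for i in range(len(seq) - width + 1)]
    return [max(max(0.0, dot(f, w)) for w in windows) for f in filters]


def lstm_path(seq, W, hidden):
    """Single-layer LSTM; returns the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = matvec(W, x + h + [1.0])  # all four gates; the trailing 1.0 feeds the bias
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def attention_path(seq, query):
    """Generic dot-product attention pooling (a stand-in, not the paper's mechanism)."""
    weights = softmax([dot(x, query) for x in seq])
    return [sum(w * x[d] for w, x in zip(weights, seq)) for d in range(len(query))]


# Hypothetical sizes: embedding dim, LSTM hidden size, filters per CNN path, classes, vocab.
EMB, HID, NF, CLASSES, VOCAB = 4, 3, 2, 3, 10
embedding = rand_matrix(VOCAB, EMB)
conv_filters = {w: rand_matrix(NF, w * EMB) for w in (2, 3, 4)}  # three CNN paths
lstm_W = rand_matrix(4 * HID, EMB + HID + 1)
att_query = rand_matrix(1, EMB)[0]
out_W = rand_matrix(CLASSES, 3 * NF + HID + EMB)


def classify(token_ids):
    seq = [embedding[t] for t in token_ids]
    fused = []  # fusion layer: concatenate the features from all five paths
    for width, filters in conv_filters.items():
        fused += cnn_path(seq, filters, width)
    fused += lstm_path(seq, lstm_W, HID)
    fused += attention_path(seq, att_query)
    return softmax(matvec(out_W, fused))


probs = classify([6, 2, 3, 0, 0])
```

`probs` holds one probability per class. In a trained model the weights would be learned jointly, so that each path contributes the features it extracts best to the fused representation.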