Multilingual Text Classification Method Based on Bidirectional Long Short-Term Memory and Convolutional Neural Network
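Cite this article: Meng Xian-yan, Cui Rong-yi, Zhao Ya-hui, Fang Ming-zhu. Multilingual text classification method based on bidirectional long short-term memory and convolutional neural network[J]. Application Research of Computers, 2020, 37(9): 2669-2673.
Authors: Meng Xian-yan, Cui Rong-yi, Zhao Ya-hui, Fang Ming-zhu
Affiliations: Intelligent Information Processing Lab, Department of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002, China; Yanbian Korean Autonomous Prefecture Science and Technology Information Service Center, Yanji, Jilin 133002, China
Funding: China National Language Commission 13th Five-Year Research Plan Project; Yanbian University World-Class Discipline Construction Research Project in Foreign Languages and Literatures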
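Abstract: Multilingual text data is becoming increasingly abundant. This paper aims to classify texts written in different languages under a single category system and to make full use of the information in multilingual text. To that end, it proposes BiLSTM-CNN, a multilingual text classification model that combines bidirectional long short-term memory (BiLSTM) units with a convolutional neural network (CNN). For each language, a BiLSTM network extracts text features and a CNN then refines them, which yields a deeper text representation for that language. The per-language representations are then concatenated and fed into a softmax function to predict the category. Experiments on a Chinese-English-Korean parallel dataset of scientific and technical literature show that the method improves classification accuracy by 4% over the baseline method. It also classifies text in any one of the languages correctly and extends well.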

Keywords: multilingual text classification; long short-term memory; convolutional neural network
Received: 2019-04-10
Revised: 2020-08-06
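The abstract names the pipeline stages but not their equations. One conventional way to write such a per-language BiLSTM, then CNN, then concatenate-and-softmax model is sketched below. The notation and the specific choices are assumptions, not taken from the paper: $x^{(l)}_t$ is the embedding of token $t$ in language $l$, $k$ is the convolution window width, $m$ is the number of filters, and ReLU and max-pooling over time are assumed.

```latex
\begin{aligned}
\overrightarrow{h}^{(l)}_t &= \mathrm{LSTM}^{(l)}_{\rightarrow}\big(x^{(l)}_t,\ \overrightarrow{h}^{(l)}_{t-1}\big),
\qquad
\overleftarrow{h}^{(l)}_t = \mathrm{LSTM}^{(l)}_{\leftarrow}\big(x^{(l)}_t,\ \overleftarrow{h}^{(l)}_{t+1}\big),\\
h^{(l)}_t &= \big[\overrightarrow{h}^{(l)}_t;\ \overleftarrow{h}^{(l)}_t\big],\\
z^{(l)}_j &= \max_{t}\ \mathrm{ReLU}\big(w^{(l)}_j \cdot \big[h^{(l)}_t;\ \dots;\ h^{(l)}_{t+k-1}\big] + b^{(l)}_j\big),
\qquad j = 1,\dots,m,\\
\hat{y} &= \mathrm{softmax}\big(W_o\,\big[z^{(\mathrm{zh})};\ z^{(\mathrm{en})};\ z^{(\mathrm{ko})}\big] + b_o\big).
\end{aligned}
```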

Multilingual text classification method based on bi-directional long short-term memory and convolutional neural network
Meng Xian-yan, Cui Rong-yi, Zhao Ya-hui, Fang Ming-zhu. Multilingual text classification method based on bi-directional long short-term memory and convolutional neural network[J]. Application Research of Computers, 2020, 37(9): 2669-2673.
Authors: Meng Xian-yan, Cui Rong-yi, Zhao Ya-hui, Fang Ming-zhu
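Affiliation: Intelligent Information Processing Lab., Dept. of Computer Science and Technology, Yanbian University, Yanji, Jilin 133002, China; Yanbian Korean Autonomous Prefecture Science and Technology Information Service Center, Yanji, Jilin 133002, China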
Abstract: This paper aims to classify texts in different languages under the same category system and to make full use of multilingual text information. To that end, it proposed BiLSTM-CNN, a multilingual text classification model that combines bidirectional long short-term memory with a convolutional neural network. For each language, a bidirectional long short-term memory network extracted text features. A convolutional neural network then captured local text information to optimize those features, producing a distributed representation of the documents in each language. Finally, the model concatenated the representations of all languages and fed them into a softmax function to predict the category. Experiments on a parallel dataset of Chinese, English and Korean scientific and technical documents show that the proposed model improves classification accuracy by 4% over the baseline method. It also correctly classifies text in any of the languages and has good extensibility.
Keywords: multilingual text categorization; long short-term memory; convolutional neural network
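The following is a minimal, dependency-free sketch of the forward pass described in the abstract. Each language has its own BiLSTM, followed by a 1-D convolution with max-pooling over time. The per-language outputs are concatenated and passed through a softmax layer. Several details are illustrative assumptions rather than the authors' configuration:

- the hidden size, kernel width and filter count;
- ReLU with max-pooling, and random untrained weights;
- all names used (`MultilingualBiLSTMCNN`, `predict_proba`, the language codes).

Training is omitted.

```python
import math
import random

# Forward-pass sketch only: sizes, max-pooling and random weights are assumptions.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LSTM:
    """Single-direction LSTM; gate rows stacked as [input, forget, cell, output]."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = rand_matrix(4 * hid, in_dim + hid, rng)
        self.b = [0.0] * (4 * hid)

    def run(self, xs):
        n = self.hid
        h, c, outs = [0.0] * n, [0.0] * n, []
        for x in xs:
            z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
            i = [sigmoid(v) for v in z[:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outs.append(h)
        return outs

def bilstm(fwd, bwd, xs):
    """Concatenate forward and backward hidden states at each time step."""
    hf = fwd.run(xs)
    hb = bwd.run(xs[::-1])[::-1]  # run on reversed input, then realign in time
    return [a + b for a, b in zip(hf, hb)]

def conv_maxpool(H, filters, width):
    """1-D convolution over time with ReLU, then max-pooling over time."""
    feats = []
    for Wf in filters:  # Wf: flat weight list of length width * step_dim
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid starting max
        for t in range(len(H) - width + 1):
            window = [v for step in H[t:t + width] for v in step]
            best = max(best, max(0.0, sum(w * v for w, v in zip(Wf, window))))
        feats.append(best)
    return feats

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class MultilingualBiLSTMCNN:
    """Hypothetical model: one BiLSTM + filter bank per language, shared softmax head."""
    def __init__(self, languages, emb_dim, hid, n_filters, width, n_classes, seed=0):
        rng = random.Random(seed)
        self.languages, self.width = list(languages), width
        self.branches = {
            lang: (LSTM(emb_dim, hid, rng), LSTM(emb_dim, hid, rng),
                   rand_matrix(n_filters, width * 2 * hid, rng))
            for lang in self.languages
        }
        self.W_out = rand_matrix(n_classes, n_filters * len(self.languages), rng)
        self.b_out = [0.0] * n_classes

    def predict_proba(self, docs):
        """docs maps language code -> list of token embedding vectors."""
        joint = []
        for lang in self.languages:
            fwd, bwd, filters = self.branches[lang]
            joint += conv_maxpool(bilstm(fwd, bwd, docs[lang]), filters, self.width)
        logits = [a + b for a, b in zip(matvec(self.W_out, joint), self.b_out)]
        return softmax(logits)

rng = random.Random(1)
langs = ["zh", "en", "ko"]
model = MultilingualBiLSTMCNN(langs, emb_dim=8, hid=6, n_filters=4, width=3, n_classes=5)
doc = {lang: [[rng.gauss(0, 1) for _ in range(8)] for _ in range(10)] for lang in langs}
probs = model.predict_proba(doc)
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

The sketch gives each language its own BiLSTM and filter bank, following the abstract's "for each language" wording. A shared encoder across languages would be a different design.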
This article is indexed in Wanfang Data and other databases.