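A text classification method combining word statistical characteristics and semantic information
Cite this article: ZHANG Li, MA Jing. A text classification method combining word statistical characteristics and semantic information[J]. Computer Engineering & Science, 2021, 43(7): 1308-1315.
Authors: ZHANG Li  MA Jing
Affiliation: (School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China)
Funding: National Natural Science Foundation of China (71373123); Fundamental Research Funds for the Central Universities, special project for forward-looking development strategy research (NW2018004)
Abstract: To better represent the semantic information of text and improve text classification accuracy, this work improves the feature weight calculation method and fuses a feature vector with a semantic vector to represent text. Text features are first extracted from a text complex network. The statistical features of the network nodes are then used to improve TF-IDF and obtain the feature vector. Next, a semantic vector is extracted with an LSTM. Finally, the feature vector and the semantic vector are fused so that the new text representation vector is more discriminative. Experiments on online news data show that improving the feature weight calculation introduces semantic and structural information into the feature vector, and that fusing the feature vector with the semantic vector further enriches the text information and improves classification performance.

Keywords: text classification  text complex network  feature weight  LSTM
Received: 2020-03-03
Revised: 2020-07-21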

A text classification method combining word statistical characteristics and semantic information
ZHANG Li, MA Jing. A text classification method combining word statistical characteristics and semantic information[J]. Computer Engineering & Science, 2021, 43(7): 1308-1315.
Authors: ZHANG Li  MA Jing
Affiliation: (School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Abstract: To better represent the semantic information of text and improve classification accuracy, this paper improves the feature weight calculation method and combines a feature vector with a semantic vector for text representation. First, text features are extracted based on a text complex network. Second, the statistical features of the network nodes are used to improve the TF-IDF weighting algorithm and obtain the feature vector. Third, an LSTM is used to obtain the semantic vector. Finally, the feature vector is fused with the semantic vector so that the new text representation vector is more discriminative. Online news data serve as the experimental dataset. The experimental results show that the improved feature weight algorithm, by introducing semantic and structural information into the feature vector and fusing the feature vector with the semantic vector, further enriches the text information and improves text classification performance.
Keywords: text classification  text complex network  feature weight  long short-term memory (LSTM)
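
The abstract describes a pipeline of network-based feature extraction, structure-aware TF-IDF weighting, LSTM semantic encoding, and fusion. The Python sketch below is one illustrative reading of that pipeline using only the standard library; it is not the authors' implementation. The abstract does not say which node statistics are used or how they enter TF-IDF, so the use of node degree in a sliding-window co-occurrence network and the scaling factor `1 + degree / max_degree` are assumptions. Fusion by concatenation is also an assumption, and `semantic_vec` is a hypothetical placeholder for the vector a trained LSTM would produce.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_degrees(tokens, window=2):
    """Build an undirected word co-occurrence network; return each node's degree."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the next `window` words (assumed network construction).
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return {word: len(neighbors[word]) for word in set(tokens)}


def weighted_tfidf(docs, vocab, window=2):
    """TF-IDF scaled by a node-degree factor (the scaling rule is an assumption)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        degrees = cooccurrence_degrees(doc, window)
        max_degree = max(degrees.values(), default=0) or 1
        vec = []
        for word in vocab:
            if term_freq[word] == 0:
                vec.append(0.0)
                continue
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0  # smoothed IDF
            structure = 1.0 + degrees[word] / max_degree  # assumed structural weight
            vec.append(term_freq[word] / len(doc) * idf * structure)
        vectors.append(vec)
    return vectors


def fuse(feature_vec, semantic_vec):
    """Fuse by concatenation (assumed; the abstract does not name the operator)."""
    return list(feature_vec) + list(semantic_vec)


# Toy tokenized documents standing in for news text.
docs = [["stock", "market", "rises", "stock"], ["team", "wins", "match"]]
vocab = sorted({word for doc in docs for word in doc})
features = weighted_tfidf(docs, vocab)
semantic_vec = [0.1, -0.3]  # hypothetical stand-in for an LSTM hidden state
fused = fuse(features[0], semantic_vec)
```

In a fuller version the fused vector would feed a classifier, and `semantic_vec` would come from an LSTM run over the document's word embeddings.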
This article is indexed in Wanfang Data and other databases.