
Feature selection method for imbalanced text sentiment classification based on three-way decisions
Cite this article: WAN Zhichao, HU Feng, DENG Weibin. Feature selection method for imbalanced text sentiment classification based on three-way decisions[J]. Journal of Computer Applications (计算机应用), 2019, 39(11): 3127-3133. DOI: 10.11772/j.issn.1001-9081.2019050822
Authors: WAN Zhichao, HU Feng, DENG Weibin
Affiliation: 1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
Abstract: Traditional feature selection methods are severely limited in imbalanced text sentiment classification. The feature dimension is too high, the features are too sparse, and the feature distribution is imbalanced, all of which sharply reduce classification accuracy. Based on the distribution of sentiment features in imbalanced text and on the idea of three-way decisions, a Three-Way Decisions-Feature Selection method (TWD-FS) is proposed for imbalanced text sentiment classification. The method combines two supervised feature selection methods and further filters the selected feature words so that the final feature words have both maximum between-class scatter and minimum within-class scatter, which reduces the number of feature words and lowers the feature dimension. In addition, combining positive-class and negative-class sentiment features alleviates the imbalance of sentiment features and improves the classification of minority-class sentiment in imbalanced samples. Experimental results on several datasets, including the COAE2013 Chinese microblog imbalanced dataset, show that TWD-FS can effectively improve the accuracy of imbalanced text sentiment classification.
Keywords: imbalanced text; feature selection; sentiment classification; supervised; three-way decisions
Funding: National Key Research and Development Program of China (2018YFC0832100, 2018YFC0832102); National Natural Science Foundation of China (61533020, 61751312, 61309014); Chongqing Research Program of Basic Science and Frontier Technology (cstc2017jcyjAX0408).
Received: 2019-05-06
Revised: 2019-05-23
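
The abstract describes combining two supervised feature scores and then filtering the feature words with three-way decisions, but it does not name the two scores or the thresholds. The sketch below is therefore only an illustration under stated assumptions: it substitutes a chi-square statistic and a class-conditional document-frequency gap as the two scores, and uses hypothetical thresholds `alpha` and `beta` to split terms into accept, boundary, and reject regions. It should not be read as the TWD-FS algorithm from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table.

    a, b: positive / negative documents that contain the term.
    c, d: positive / negative documents that do not contain it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def three_way_regions(docs, labels, alpha=0.4, beta=0.2):
    """Assign every term to 'accept', 'boundary' or 'reject'.

    docs:   non-empty list of token sets.
    labels: parallel list of 'pos' / 'neg'; both classes must be present.
    alpha, beta: illustrative thresholds (assumptions, not from the paper).
    """
    n_pos = labels.count("pos")
    n_neg = labels.count("neg")
    vocab = set().union(*docs)

    raw = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y == "pos" and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if y == "neg" and term in doc)
        chi = chi_square(a, b, n_pos - a, n_neg - b)
        # Assumed second score: gap between class-conditional document frequencies.
        gap = abs(a / n_pos - b / n_neg)
        raw[term] = (chi, gap)

    # Scale chi-square to [0, 1] so both scores share the same thresholds.
    max_chi = max(chi for chi, _ in raw.values()) or 1.0

    regions = {}
    for term, (chi, gap) in raw.items():
        s1, s2 = chi / max_chi, gap
        if min(s1, s2) >= alpha:      # both scores agree the term is useful
            regions[term] = "accept"
        elif max(s1, s2) <= beta:     # both scores agree the term is useless
            regions[term] = "reject"
        else:                         # scores disagree or are inconclusive
            regions[term] = "boundary"
    return regions
```

Terms in the boundary region are the ones a second filtering pass would revisit. How the paper resolves them, and how it combines positive-class and negative-class features to offset the imbalance, is not stated in the abstract and is left out of this sketch.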
This article is indexed in Wanfang Data and other databases.