
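Dataset Design and Evaluation Index for Imbalanced Text Sentiment Classification
Cite this article: ZHAO Li-dong, LI De-yu, WANG Su-ge. Dataset Design and Evaluation Index for Imbalanced Text Sentiment Classification[J]. Computer Development & Applications, 2013(5): 1-4.
Authors: ZHAO Li-dong (赵立东), LI De-yu (李德玉), WANG Su-ge (王素格)
Affiliations: School of Computer and Information Technology, Shanxi University; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University
Funding: National Natural Science Foundation of China (60970014, 61272095); Natural Science Foundation of Shanxi Province (2010011021-1); Shanxi Province Key Science and Technology Program (20110321027-02)
Abstract: As research on imbalanced classification deepens, how to divide data into training and test sets has become a question worth examining. To address dataset design for imbalanced text sentiment classification, this paper uses undersampling to build two test-set schemes, one balanced and one imbalanced, states which scheme to choose under different task requirements, and discusses the evaluation metrics used to assess classifier performance. Experiments on real online review data confirm that these schemes are reasonable and applicable.

Keywords: imbalanced data  sentiment classification  experimental design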
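The abstract does not spell out the sampling procedure, so the following is only a sketch under stated assumptions of how undersampling can yield the two test-set schemes. Assumed here (not taken from the paper): a stratified hold-out that keeps the original class ratio serves as the imbalanced test set, and randomly undersampling every class in that hold-out down to the smallest class size gives the balanced test set, with one shared training set. The function name `design_splits` and its parameters `test_frac` and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def design_splits(labels, test_frac=0.2, seed=0):
    """Hypothetical sketch: return (train, imbalanced_test, balanced_test) index lists.

    Assumption (not from the paper): a stratified hold-out of test_frac per class
    forms the imbalanced test set; randomly undersampling each class in that
    hold-out to the smallest class size forms the balanced test set. Both test
    sets share the same training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, held_out = [], {}
    for cls, idx in by_class.items():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        k = max(1, int(len(idx) * test_frac))  # per-class test size keeps the original ratio
        held_out[cls] = idx[:k]
        train.extend(idx[k:])

    # Random undersampling: every class keeps only as many test items as the smallest class.
    n_min = min(len(idx) for idx in held_out.values())
    balanced = [i for idx in held_out.values() for i in rng.sample(idx, n_min)]
    imbalanced = [i for idx in held_out.values() for i in idx]
    return sorted(train), sorted(imbalanced), sorted(balanced)


if __name__ == "__main__":
    labels = ["pos"] * 80 + ["neg"] * 20  # toy 4:1 imbalance
    train, imb, bal = design_splits(labels, test_frac=0.25)
    print(len(train), len(imb), len(bal))  # → 75 25 10
```

Keeping the training set fixed is a design choice of this sketch: any difference in reported scores between the two schemes can then be attributed to test-set composition alone.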

Dataset Design and Evaluation Index for Imbalanced Text Sentiment Classification
ZHAO Li-dong,LI De-yu,WANG Su-ge.Dataset Design and Evaluation Index for Imbalanced Text Sentiment Classification[J].Computer Development & Applications,2013(5):1-4.
Authors: ZHAO Li-dong, LI De-yu, WANG Su-ge
Affiliation: 1, 2 (1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
Abstract: As research on imbalanced classification deepens, how to divide data into training and test sets has become a question worth considering. Addressing imbalanced text sentiment classification, this paper uses undersampling to construct both balanced and imbalanced test sets, and discusses how to choose an appropriate scheme and evaluation index to verify classifier performance under different task requirements. Experimental results on two real online review datasets indicate that the proposed schemes are reasonable and applicable.
Keywords: imbalanced data  sentiment classification  experimental design
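The abstract does not name the specific evaluation indices the authors settled on. As an illustration only, the sketch below computes metrics commonly used for imbalanced classification: accuracy, minority-class F1, macro-averaged F1, and G-mean (the geometric mean of per-class recalls). The function names `per_class_prf` and `imbalance_report` are hypothetical and are not the paper's method.

```python
import math


def per_class_prf(y_true, y_pred, cls):
    """Precision, recall and F1 with `cls` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def imbalance_report(y_true, y_pred, minority):
    """Hypothetical summary of common imbalance-aware metrics."""
    classes = sorted(set(y_true))
    scores = {c: per_class_prf(y_true, y_pred, c) for c in classes}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(classes)
    # G-mean: geometric mean of per-class recalls; it is 0 if any class is never recognised.
    g_mean = math.prod(r for _, r, _ in scores.values()) ** (1 / len(classes))
    return {
        "accuracy": accuracy,
        "minority_f1": scores[minority][2],
        "macro_f1": macro_f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # A trivial classifier that always predicts the majority class "pos".
    imbalanced_true = ["pos"] * 20 + ["neg"] * 5
    balanced_true = ["pos"] * 5 + ["neg"] * 5
    for y_true in (imbalanced_true, balanced_true):
        print(imbalance_report(y_true, ["pos"] * len(y_true), minority="neg"))
```

For this always-majority classifier, accuracy is 0.8 on the 4:1 test set but 0.5 on the balanced one, while minority F1 and G-mean are 0 in both. This illustrates why accuracy alone depends on how the test set is constructed and can hide complete failure on the minority class.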
This article is indexed in CNKI and other databases.