
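Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events
Cite this article: LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(12): 1074-1084.
Authors: LIN Junjie  WANG Lei  MAO Wenji
Affiliation: 1. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
Funding: Supported by the National Natural Science Foundation of China (No. 71702181, 11832001)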
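Abstract: Existing standpoint analysis methods mainly train standpoint classification models in a supervised or unsupervised manner. Training a supervised model usually requires a large amount of labeled data, whereas unsupervised models lag well behind supervised ones in performance. To reduce the need for labeled training data while preserving model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis of social media texts related to social events. In self-training, selecting high-quality samples to add to the training set during iterative training is key to improving model performance. The proposed method therefore first measures the classification confidence of each text according to user-level standpoint consistency, and then uses topic information to further filter high-quality samples for expanding the training set, so that model performance keeps improving. Experiments show that, compared with representative methods from related work and with other semi-supervised training schemes, the proposed method achieves better standpoint classification results, and that both user-level standpoint consistency and topic information help improve standpoint classification.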

Keywords: Multiple Standpoint Analysis  Semi-supervised  Self-training  User-Level Standpoint Consistency  Topic Information
Received: 2018-02-12

Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events
LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(12): 1074-1084.
Authors:LIN Junjie  WANG Lei  MAO Wenji
Affiliation:1.State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2.School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
Abstract: Existing methods for standpoint analysis mainly train standpoint classification models in a supervised or unsupervised manner. Training supervised models usually requires a large amount of labeled data, whereas unsupervised models perform considerably worse than supervised ones. To reduce the demand for labeled data in model training while maintaining model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis of social media texts related to social events. In self-training, selecting high-quality samples and adding them to the training set during iterative training plays a key role in improving the performance of classification models. The proposed method first measures the classification confidence of each text based on user-level standpoint consistency. It then leverages topic information to further select high-quality texts for expanding the training set, so that model performance keeps improving. Experimental results show that the proposed method achieves better standpoint classification performance than representative methods from related work and other semi-supervised training approaches. In addition, both the user-level standpoint consistency and the topic information used in the method contribute to improving standpoint classification performance.
Keywords:Multiple Standpoint Analysis  Semi-supervised  Self-training  User-Level Standpoint Consistency  Topic Information  
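
The self-training loop outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the confidence formula or the topic-based selection rule, so the choices here are assumptions. Specifically, the word-count classifier is a toy stand-in for a real model, confidence is assumed to be the model score multiplied by the share of the same user's posts predicted with the same label, and the topic step is assumed to keep at most `per_topic` of the most confident candidates per topic. All function and parameter names (`train`, `predict`, `self_train`, `threshold`, `per_topic`) are hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Toy stand-in classifier: word counts per standpoint label."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(text.split())
    return counts


def predict(model, text):
    """Return (label, score); score is the winning label's share of smoothed word votes."""
    votes = {label: sum(c[w] for w in text.split()) + 1 for label, c in model.items()}
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())


def self_train(labeled, unlabeled, rounds=3, threshold=0.5, per_topic=1):
    """Iteratively move confident unlabeled posts into the training set.

    labeled:   list of (text, label) pairs (must be non-empty)
    unlabeled: list of dicts with keys "text", "user", "topic"
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        preds = [predict(model, p["text"]) for p in pool]

        # ASSUMPTION: user-level consistency = share of the user's posts
        # that received the same predicted label as this post.
        by_user = defaultdict(Counter)
        for post, (label, _) in zip(pool, preds):
            by_user[post["user"]][label] += 1

        candidates = []
        for i, (post, (label, score)) in enumerate(zip(pool, preds)):
            user_counts = by_user[post["user"]]
            agreement = user_counts[label] / sum(user_counts.values())
            confidence = score * agreement
            if confidence >= threshold:
                candidates.append((confidence, i, label))

        # ASSUMPTION: topic filter = keep only the most confident
        # `per_topic` candidates within each topic.
        candidates.sort(reverse=True)
        taken, chosen = Counter(), {}
        for _, i, label in candidates:
            topic = pool[i]["topic"]
            if taken[topic] < per_topic:
                taken[topic] += 1
                chosen[i] = label

        if not chosen:  # nothing confident enough: stop early
            break
        labeled += [(pool[i]["text"], label) for i, label in chosen.items()]
        pool = [p for i, p in enumerate(pool) if i not in chosen]
    return train(labeled), labeled
```

Under these assumptions, a post whose predicted standpoint contradicts most of its author's other posts receives a low confidence and stays in the unlabeled pool, which is the intuition behind using user-level standpoint consistency as a quality signal.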