
Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events (面向社会事件的半监督自训练多方立场分析)

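Cite this article:	LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2018, 31(12): 1074-1084.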

Authors:	LIN Junjie (林俊杰), WANG Lei (王磊), MAO Wenji (毛文吉)

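Affiliations:	1. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049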

Funding:	Supported by the National Natural Science Foundation of China (No. 71702181, 11832001)

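Abstract:	Existing standpoint analysis methods mainly train standpoint classification models in a supervised or unsupervised manner. Supervised training usually requires a large amount of labeled data, while unsupervised models perform considerably worse than supervised ones. To reduce the need for labeled training data while maintaining model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis of social media texts related to social events. In self-training, selecting high-quality samples to add to the training set during iterative training is key to improving model performance. The proposed method therefore first measures the classification confidence of texts based on user-level standpoint consistency, then uses topic information to further filter high-quality samples for expanding the training set, so that model performance keeps improving. Experiments show that the proposed method achieves better standpoint classification results than representative methods in related work and other semi-supervised training approaches, and that both user-level standpoint consistency and topic information help improve standpoint classification.
Keywords:	multiple standpoint analysis; semi-supervised self-training; user-level standpoint consistency; topic information
Received:	2018-02-12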
Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events

LIN Junjie, WANG Lei, MAO Wenji. Semi-supervised Self-training for Multiple Standpoint Analysis in Social Events [J]. Pattern Recognition and Artificial Intelligence, 2018, 31(12): 1074-1084.

Authors:	LIN Junjie, WANG Lei, MAO Wenji

Affiliation:	1. State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049

Abstract:	Existing methods for standpoint analysis mainly train standpoint classification models in a supervised or unsupervised manner. Training supervised models usually requires a large amount of labeled data, whereas unsupervised models perform considerably worse than supervised ones. To reduce the demand for labeled data in model training while preserving model performance, this paper proposes a semi-supervised self-training method for multiple standpoint analysis based on social media texts related to social events. In self-training, selecting high-quality samples and adding them to the training dataset during the iterative training process plays a key role in improving the classification model. The proposed method first measures the classification confidence of texts based on user-level standpoint consistency. It then leverages topic information to further select high-quality texts for expanding the training dataset, so that model performance keeps improving. Experimental results show that the proposed method achieves better standpoint classification performance than representative methods in related work and other semi-supervised training approaches. In addition, both user-level standpoint consistency and topic information contribute to improving standpoint classification performance.

Keywords:	Multiple Standpoint Analysis; Semi-supervised Self-training; User-Level Standpoint Consistency; Topic Information
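
The self-training loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how user-level standpoint consistency or topic information is computed, so everything below is an assumption made for illustration. The word-count classifier (`train`, `predict`), the agreement threshold `min_agreement`, and the keyword-based topic filter `topic_words` are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Toy classifier (assumption): per-stance word counts from (text, stance) pairs."""
    counts = defaultdict(Counter)
    for text, stance in labeled:
        counts[stance].update(text.split())
    return counts


def predict(model, text):
    """Return (stance, normalized score) for the stance sharing the most words with text."""
    scores = {s: sum(c[w] for w in text.split()) + 1 for s, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def self_train(labeled, unlabeled, rounds=3, min_agreement=0.8,
               topic_words=frozenset()):
    """Grow the training set with pseudo-labeled texts over several rounds.

    labeled:   list of (text, stance)
    unlabeled: list of (user, text)
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        preds = [(user, text, predict(model, text)[0]) for user, text in pool]

        # Assumed consistency measure: the share of a user's texts that
        # received that user's majority stance in this round.
        by_user = defaultdict(Counter)
        for user, _, stance in preds:
            by_user[user][stance] += 1

        selected, remaining = [], []
        for user, text, stance in preds:
            majority, n = by_user[user].most_common(1)[0]
            agreement = n / sum(by_user[user].values())
            # Assumed topic filter: keep texts mentioning a topic keyword.
            on_topic = not topic_words or any(
                w in topic_words for w in text.split())
            if stance == majority and agreement >= min_agreement and on_topic:
                selected.append((text, stance))
            else:
                remaining.append((user, text))

        if not selected:  # nothing confident enough to add; stop early
            break
        labeled.extend(selected)
        pool = remaining
    return train(labeled), labeled
```

Calling `self_train(seed, pool, topic_words={"policy"})` with a few seed pairs and a list of `(user, text)` tuples returns the final model and the expanded training set. Texts from users whose predicted stances disagree, and texts that mention no topic keyword, stay out of the training set.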
