
Data stream classifier with limited labelled data (针对标记数据不足的数据流分类器)

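Cite this article:	XIONG Zhongyang, ZHOU Xingqin, ZHANG Yufang. Data stream classifier with limited labelled data[J]. Computer Engineering and Applications (计算机工程与应用), 2015, 51(6): 124-128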

Author names:	XIONG Zhongyang (熊忠阳), ZHOU Xingqin (周兴勤), ZHANG Yufang (张玉芳)

Author affiliation:	School of Computer Science, Chongqing University, Chongqing 400030, China

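Abstract:	Most data stream classification algorithms address two problems: the unbounded length of the stream and concept drift. However, these algorithms require human experts to label every instance for use as the training set, which is unrealistic when the stream arrives at high speed and must be classified quickly, because labelling instances costs time and money. In that setting, training a classifier by supervised learning yields a weak classifier because labelled data are scarce. This paper proposes a data stream classification algorithm based on active learning. The algorithm selects a small portion of all instances for manual labelling, namely the samples with low classification confidence, which greatly reduces the number of instances that must be labelled by hand. Experimental results show that, when the data stream exhibits concept drift, the algorithm can train a classifier using relatively few labelled data and still classify well.
Keywords:	data stream classification; concept drift; active learning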
Data stream classifier with limited labelled data

XIONG Zhongyang, ZHOU Xingqin, ZHANG Yufang. Data stream classifier with limited labelled data[J]. Computer Engineering and Applications, 2015, 51(6): 124-128

Authors:	XIONG Zhongyang, ZHOU Xingqin, ZHANG Yufang

Affiliation:	School of Computer Science, Chongqing University, Chongqing 400030, China

Abstract:	Most data stream classification algorithms address the problems of infinite length and concept drift. However, these algorithms require human experts to label every instance before the instances can be used as a training set. This is impractical in a high-speed data stream environment because labelling instances is both time-consuming and costly. If a classifier is then trained by supervised learning alone, the small number of labelled instances yields a weak classifier. This paper proposes a data stream classification algorithm based on active learning. The method selects only a small portion of the instances for labelling, namely those classified with low confidence, which greatly reduces the number of instances that need to be labelled. The experimental results show that the proposed method can train a classifier from a small amount of labelled data and classify concept-drifting data streams well.

Keywords:	data stream classification; concept drift; active learning
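
The selection rule described in the abstract (ask for a label only when the classifier is unsure) can be illustrated with a minimal sketch. The abstract names neither the base classifier nor the confidence measure, so everything here is an assumption made for illustration: a binary task, an online logistic regression as the learner, confidence defined as max(p, 1 - p), a threshold of 0.8, and a short warm-up in which every instance is labelled. The names OnlineLogistic, run_stream, and oracle are hypothetical and are not taken from the paper. The abstract also does not say how drift is handled; this sketch relies only on continued querying of low-confidence instances.

```python
import math
import random


class OnlineLogistic:
    """Binary logistic regression trained by stochastic gradient descent.

    Illustrative base learner only; the paper's classifier is not specified.
    """

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        """Return the estimated probability that x belongs to class 1."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Take one gradient step on a single labelled instance (y is 0 or 1)."""
        err = self.prob(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_stream(stream, oracle, n_features, threshold=0.8, warmup=20):
    """Classify a stream, requesting a label only for uncertain instances.

    A label is requested during the first `warmup` instances and afterwards
    whenever the confidence max(p, 1 - p) falls below `threshold`.
    Returns (predictions, number_of_label_requests).
    """
    model = OnlineLogistic(n_features)
    predictions, queries = [], 0
    for t, x in enumerate(stream):
        p = model.prob(x)
        predictions.append(1 if p >= 0.5 else 0)
        confidence = max(p, 1.0 - p)
        if t < warmup or confidence < threshold:
            model.update(x, oracle(t, x))  # one expert label is consumed here
            queries += 1
    return predictions, queries


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    stream = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n)]

    def oracle(t, x):
        # Hypothetical expert: the concept flips halfway through the stream.
        positive = x[0] > 0 if t < n // 2 else x[0] < 0
        return 1 if positive else 0

    preds, queries = run_stream(stream, oracle, n_features=2)
    correct = sum(p == oracle(t, x) for t, (p, x) in enumerate(zip(preds, stream)))
    print(f"labels requested: {queries} of {n}")
    print(f"accuracy: {correct / n:.3f}")
```

Raising the threshold makes the sketch request more labels, and lowering it requests fewer. The paper's actual trade-off and its results are reported in the full text, not reproduced here.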
This article is indexed in databases including CNKI and Wanfang Data.