
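A data stream classifier for scarce labelled data (针对标记数据不足的数据流分类器)
Cite this article:熊忠阳,周兴勤,张玉芳. 针对标记数据不足的数据流分类器[J]. 计算机工程与应用, 2015, 51(6): 124-128
Authors:熊忠阳  周兴勤  张玉芳
Affiliation:School of Computer Science, Chongqing University (重庆大学 计算机学院), Chongqing 400030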
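Abstract:Most data stream classification algorithms address two problems: the unbounded length of the stream and concept drift. However, these algorithms require human experts to label every instance for use as the training set, which is unrealistic when the stream arrives at high speed and must be classified quickly, because labelling instances costs time and money. In this setting, training a classifier by supervised learning yields a weak classifier because labelled data are scarce. This paper proposes a data stream classification algorithm based on active learning. The algorithm selects only a small fraction of all instances for manual labelling, namely those classified with low confidence, which greatly reduces the number of instances that must be labelled by hand. Experimental results show that, in the presence of concept drift, the algorithm can train a classifier on the data stream from few labelled data and still classify well.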

Keywords:data streams  classification  concept drift  active learning

Data stream classifier with limited labelled data
XIONG Zhongyang, ZHOU Xingqin, ZHANG Yufang. Data stream classifier with limited labelled data[J]. Computer Engineering and Applications, 2015, 51(6): 124-128
Authors:XIONG Zhongyang    ZHOU Xingqin    ZHANG Yufang
Affiliation:School of Computer Science, Chongqing University, Chongqing 400030, China
Abstract:Most algorithms for data streams address the problems of infinite length and concept drift. However, these algorithms require all instances to be labelled by human experts before they are used as the training set for a classifier. This is impractical in a high-speed data stream environment because labelling instances is both time-consuming and costly. If a classifier is trained by supervised learning alone, the small number of labelled instances yields a poor classifier. This paper proposes a classification algorithm for data streams based on active learning. The method selects only a small fraction of the instances for labelling, namely those classified with low confidence, so the number of instances that need to be labelled is greatly reduced. The experimental results show that the proposed method can use a small amount of labelled data to classify concept-drifting data streams accurately.
Keywords:data streams  classification  concept drifting  active learning
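
The selection rule described in the abstract (request a label only when the classifier's confidence is low) can be illustrated with a minimal sketch. This is not the authors' algorithm: the nearest-centroid model, the margin-based confidence score, the threshold of 0.3, the warm-up of 50 labelled instances, and the synthetic drifting stream are all assumptions made for illustration.

```python
import math
import random


class OnlineCentroidClassifier:
    """Incremental nearest-centroid classifier (a stand-in, not the paper's model)."""

    def __init__(self, decay=0.9):
        self.decay = decay      # weight kept by the old centroid on each update
        self.centroids = {}     # label -> centroid coordinates

    def predict_with_confidence(self, x):
        """Return (label, confidence in [0, 1]); (None, 0.0) until two classes are known."""
        if len(self.centroids) < 2:
            return None, 0.0
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        (d1, label), (d2, _) = dists[0], dists[1]
        # Relative margin between the two nearest centroids: 0 = tie, 1 = certain.
        confidence = (d2 - d1) / (d2 + d1) if d2 + d1 > 0 else 0.0
        return label, confidence

    def update(self, x, y):
        """Move the centroid of class y toward x (exponential forgetting handles drift)."""
        if y not in self.centroids:
            self.centroids[y] = list(x)
        else:
            old = self.centroids[y]
            self.centroids[y] = [self.decay * c + (1 - self.decay) * v for c, v in zip(old, x)]


def drifting_stream(n, seed=0):
    """Yield (x, y) from two 2-D Gaussian classes whose means move halfway through."""
    rng = random.Random(seed)
    before = {0: (0.0, 0.0), 1: (4.0, 4.0)}
    after = {0: (0.0, 4.0), 1: (4.0, 0.0)}   # hypothetical abrupt concept drift
    for t in range(n):
        y = rng.randint(0, 1)
        mx, my = (before if t < n // 2 else after)[y]
        yield (rng.gauss(mx, 1.0), rng.gauss(my, 1.0)), y


def run_active_stream(stream, threshold=0.3, warmup=50):
    """Classify each instance, then query its true label only if confidence is low."""
    clf = OnlineCentroidClassifier()
    total = queried = correct = 0
    for x, y_true in stream:
        y_pred, conf = clf.predict_with_confidence(x)
        correct += (y_pred == y_true)
        # The first `warmup` instances are always labelled to initialise the model.
        if total < warmup or conf < threshold:
            clf.update(x, y_true)   # simulates asking the human expert
            queried += 1
        total += 1
    return {"total": total, "queried": queried, "accuracy": correct / total}


if __name__ == "__main__":
    print(run_active_stream(drifting_stream(2000)))
```

In this toy setting the drifted classes land where the old model is unsure, so the low-confidence rule triggers new label requests right after the drift. A drift that leaves the model confidently wrong would not be detected by this rule alone.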
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.