
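Data Stream Classification Algorithm Based on a Hybrid Model of C4.5 and NB
Cite this article: LI Yan, ZHANG Yu-hong, HU Xue-gang. Data Stream Classification Algorithm Based on a Hybrid Model of C4.5 and NB[J]. Computer Science, 2010, 37(12): 138-142
Authors: LI Yan  ZHANG Yu-hong  HU Xue-gang
Affiliation: School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Funding: This work was supported by the National 973 Key Basic Research Development Program of China (2009CB326203), the National Natural Science Foundation of China (60975034), and the Natural Science Foundation of Anhui Province (090412044).
Abstract: Classifying noisy data streams with concept drift has become one of the active research topics in data stream mining. This paper proposes CDSMM, a data stream classification algorithm based on a hybrid model of C4.5 and Naive Bayes. It uses C4.5 as the base classifier, applies a Naive Bayes classifier to filter noise, and introduces the u-test from hypothesis testing to detect concept drift and dynamically update the model. Experimental results show that, on concept-drifting data streams with noise, CDSMM achieves higher classification accuracy than comparable algorithms.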

Keywords: data stream, concept drift, classification, noise

Classification Algorithm for Data Stream Based on Mixture Models of C4.5 and NB
LI Yan, ZHANG Yu-hong, HU Xue-gang. Classification Algorithm for Data Stream Based on Mixture Models of C4.5 and NB[J]. Computer Science, 2010, 37(12): 138-142
Authors:LI Yan  ZHANG Yu-hong  HU Xue-gang
Affiliation:(School of Computer and Information, Hefei University of Technology, Hefei 230009, China)
Abstract: Classification of noisy data streams with concept drifts has recently become one of the most popular topics in streaming data mining. A classification algorithm for mining data streams based on mixture models of C4.5 and NB, called CDSMM, is proposed. In CDSMM, decision trees built with C4.5 serve as the base classifiers, and a Naive Bayes classifier is adopted to filter noisy data. The algorithm also introduces the u-test from hypothesis testing to detect concept drifts and update the model dynamically. Extensive experiments demonstrate that CDSMM achieves higher predictive accuracy than several existing algorithms when handling noisy data streams with concept drifts.
Keywords:Data streams  Concept drifts  Classification  Noise
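
The abstract names the u-test as the drift detector but does not give the statistic. As a minimal sketch only, one common form is a one-sided two-proportion u-test (z-test) on the classifier's error rate over two consecutive data chunks. The function names, the chunk-based comparison, and the critical value 1.645 (one-sided 5% level) are assumptions made for illustration and are not taken from the paper.

```python
import math


def u_statistic(errors_prev, n_prev, errors_curr, n_curr):
    """Two-proportion u statistic for a change in error rate.

    Hypothetical helper: compares the error rate on the previous chunk
    with the error rate on the current chunk using a pooled variance.
    """
    p_prev = errors_prev / n_prev
    p_curr = errors_curr / n_curr
    pooled = (errors_prev + errors_curr) / (n_prev + n_curr)
    variance = pooled * (1.0 - pooled) * (1.0 / n_prev + 1.0 / n_curr)
    if variance == 0.0:
        # Both chunks are all-correct or all-wrong: no measurable change.
        return 0.0
    return (p_curr - p_prev) / math.sqrt(variance)


def drift_detected(errors_prev, n_prev, errors_curr, n_curr, critical=1.645):
    """Flag drift when the error rate rose significantly (one-sided test).

    The default critical value is an assumed one-sided 5% threshold.
    """
    return u_statistic(errors_prev, n_prev, errors_curr, n_curr) > critical
```

In a chunk-based stream learner, a positive result from a check like this would be the point at which the model is rebuilt or updated on recent data, while a negative result keeps the current model in place.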
This article is indexed in Wanfang Data and other databases.