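Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining
Cite this article:ZHAO Qiang-Li, JIANG Yan-Huang, LU Yu-Tong. Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining[J]. Journal of Software, 2015, 26(10): 2567-2580.
Authors:ZHAO Qiang-Li  JIANG Yan-Huang  LU Yu-Tong
Affiliations:School of Computer and Information Engineering, Hu'nan University of Commerce, Changsha 410205, Hunan, China; State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, Hunan, China; State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, Hunan, China
Funding:National Natural Science Foundation of China (61272141, 60905032, 61120106005, 61273232)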
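Abstract:Ensemble-based data stream mining is an important approach to learning from data streams with concept drift. To address the shortcomings of traditional ensemble-based data stream mining, this paper introduces the human recalling and forgetting mechanisms into data stream mining and proposes MDSM (memorizing based data stream mining), a memory-based data stream mining model. The model treats base classifiers as knowledge acquired by the system. Through the "recalling and forgetting" mechanism, base classifiers that were useful in the past are kept in a "memory repository" because of their high memory strength, which improves the stability of prediction. In addition, the base classifiers that perform well on the current data are selected from the "memory repository" to take part in the ensemble prediction, which improves adaptability to concept changes. Based on the MDSM model, an ensemble data stream mining algorithm, MAE (memorizing based adaptive ensemble), is proposed. The algorithm designs the system's forgetting mechanism with the Ebbinghaus forgetting curve and uses selective ensemble to emulate the human "recalling" mechanism. In a comparison with four typical data stream mining algorithms, the results show that MAE achieves high classification accuracy and adapts well to concept drift overall, especially to recurring concept drift and to the complex concept drift found in real applications. It adapts quickly to new concept changes and also effectively resists the impact of random concept fluctuations on system performance.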

Keywords:data stream mining  concept drift  recalling and forgetting  Ebbinghaus forgetting curve  selective ensemble
Received:2014-07-31
Revised:2014-09-03

Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining
ZHAO Qiang-Li, JIANG Yan-Huang and LU Yu-Tong. Ensemble Model and Algorithm with Recalling and Forgetting Mechanisms for Data Stream Mining[J]. Journal of Software, 2015, 26(10): 2567-2580.
Authors:ZHAO Qiang-Li, JIANG Yan-Huang and LU Yu-Tong
Affiliation:School of Computer and Information Engineering, Hu'nan University of Commerce, Changsha 410205, China; State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China; and State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
Abstract:Training an ensemble of classifiers on sequential chunks of training instances is a popular strategy for mining data streams with concept drift. To address the limitations of existing approaches, this paper introduces the human recalling and forgetting mechanisms into a data stream mining system and proposes a memorizing based data stream mining (MDSM) model. The model treats base classifiers as learned knowledge. Through the "recalling and forgetting" mechanism, the classifiers that were most useful in the past are retained in a "memory repository", which improves stability under random concept drifts. The best classifiers for the current data chunk are selected for prediction, which gives high adaptability to different concept drifts. Based on MDSM, the paper puts forward a new algorithm, MAE (memorizing based adaptive ensemble). MAE uses the Ebbinghaus forgetting curve as its forgetting mechanism and adopts ensemble pruning to emulate the "recalling" mechanism. In a comparison with four traditional data stream mining approaches, the results show that MAE achieves high and stable accuracy with moderate training time. The results also show that MAE adapts well to different kinds of concept drift, especially in applications with recurring or complex concept drifts.
Keywords:data stream mining  concept drift  recalling and forgetting  Ebbinghaus forgetting curve  ensemble pruning
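
The abstract gives no formulas, so the following is only a minimal Python sketch of how a "recalling and forgetting" step over a memory repository might look. It is not the MAE algorithm from the paper. Everything in it is an assumption made for illustration: the `Memory` class, the functions `recall_and_forget` and `vote`, the parameters `k`, `capacity` and `boost`, the use of the common exponential form of the Ebbinghaus curve R = exp(-t/S), the rule that a recalled classifier gains memory strength, and plain top-k accuracy ranking as a stand-in for ensemble pruning.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Chunk = List[Tuple[Instance, int]]  # labelled instances of one data chunk


@dataclass
class Memory:
    """One base classifier in the repository (hypothetical structure)."""
    predict: Callable[[Instance], int]
    strength: float = 1.0  # S in R = exp(-t / S); assumed to grow on recall
    last_recall: int = 0   # index of the chunk at which it was last recalled

    def retention(self, now: int) -> float:
        # Exponential forgetting curve over the time since the last recall.
        return math.exp(-(now - self.last_recall) / self.strength)


def accuracy(predict: Callable[[Instance], int], chunk: Chunk) -> float:
    # Fraction of correctly classified instances; the chunk must be non-empty.
    return sum(predict(x) == y for x, y in chunk) / len(chunk)


def recall_and_forget(repo: List[Memory], chunk: Chunk, now: int,
                      k: int = 2, capacity: int = 3,
                      boost: float = 1.0) -> List[Memory]:
    """Select k classifiers for the current chunk and prune repo in place."""
    # Recall (assumed rule): take the k classifiers most accurate on the chunk.
    ranked = sorted(repo, key=lambda m: accuracy(m.predict, chunk), reverse=True)
    recalled = ranked[:k]
    for m in recalled:
        m.strength += boost   # assumed: each recall strengthens the memory
        m.last_recall = now
    # Forget (assumed rule): keep the `capacity` entries with highest retention.
    repo.sort(key=lambda m: m.retention(now), reverse=True)
    del repo[capacity:]
    return recalled


def vote(ensemble: List[Memory], x: Instance) -> int:
    # Unweighted majority vote of the recalled classifiers.
    return Counter(m.predict(x) for m in ensemble).most_common(1)[0][0]
```

In this sketch, a classifier that keeps being recalled has its strength raised and its decay clock reset, so it stays in the repository even when it sits out a few chunks, while classifiers that are never recalled decay and are eventually dropped. That is one simple way to obtain the behaviour the abstract describes for recurring concepts.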