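Bayesian classification algorithm for dynamic data streams combined with bootstrap sampling
Cite this article: JU Chunhua, YIN Xianjun, XU Chonghuan. Bayesian classification algorithm for dynamic data streams combined with bootstrap sampling [J]. Computer Engineering and Applications (计算机工程与应用), 2011, 47(8): 118-121.
Authors: JU Chunhua (琚春华), YIN Xianjun (殷贤君), XU Chonghuan (许翀寰)
Affiliation: College of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018
Funding: National Natural Science Foundation of China; Zhejiang Science and Technology Plan Project; Key Project of the Zhejiang Provincial Natural Science Foundation; Zhejiang Provincial Natural Science Foundation Project
Abstract (translated from the Chinese): Dynamic data streams are large, change quickly, are costly to access at random, and are difficult to store in full detail, so mining them places very high demands on computing power and storage capacity. To address these characteristics, a Bayesian classification algorithm for dynamic data streams based on bootstrap sampling is designed. The algorithm processes and analyzes the stream with a sliding window model, which takes the data in each window as its basic unit. Bootstrap sampling is used to prune and optimize the attributes of the data to be classified, which resolves the multicollinearity among data attributes. Exploiting the characteristics of the Bayesian algorithm, a dynamic incremental storage tree stores the dynamic sample stream, so that an unbounded dynamic data stream is held in static, finite storage without loss of information; this addresses data storage, the biggest difficulty in dynamic data stream mining. The optimized data are classified with an all-Bayesian classifier and a k-Bayesian classifier, and both classifiers are updated in real time according to the characteristics of the stream. The algorithm overcomes the attribute-independence constraint of Bayesian classification and the limitation of traditional Bayesian methods to static data. Experimental tests show that the bootstrap-based Bayesian classification achieves high timeliness and accuracy.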

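Keywords (translated from the Chinese): data stream; bootstrap sampling; Bayesian classification; sliding window; incremental storage tree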

Bayesian classification algorithm of dynamic data stream based on bootstrap
JU Chunhua, YIN Xianjun, XU Chonghuan. Bayesian classification algorithm of dynamic data stream based on bootstrap [J]. Computer Engineering and Applications, 2011, 47(8): 118-121.
Authors: JU Chunhua, YIN Xianjun, XU Chonghuan
Affiliation: College of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
Abstract: Dynamic data streams are large, change quickly, are costly to access at random, and are difficult to store in full detail, so mining them places heavy demands on computing power and storage capacity. To address these characteristics, a Bayesian classification algorithm for dynamic data streams based on bootstrap sampling is proposed, which processes and analyzes the stream with a sliding window model. The model takes the data in each window as its basic unit of processing. The algorithm uses the bootstrap method to prune and optimize the attributes of the data to be classified, which addresses multicollinearity among the attributes. Exploiting the characteristics of the Bayesian algorithm, it stores the dynamic sample stream in a dynamic incremental storage tree, so that an unbounded dynamic data stream is held in static, finite storage without loss of information; this addresses data storage, the biggest problem in dynamic data stream mining. An all-Bayesian classifier and a k-Bayesian classifier classify the optimized data, and both are updated according to the characteristics of the stream. The algorithm overcomes the attribute-independence constraint of the Bayesian classifier and its restriction to static data. Experimental tests show that the bootstrap-based Bayesian classification algorithm achieves high timeliness and accuracy.
Keywords: data stream; bootstrap; Bayesian classification; sliding window; incremental storage tree
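
The abstract does not specify the structure of the incremental storage tree, the bootstrap-based attribute pruning, or the all-Bayesian and k-Bayesian classifiers, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea the abstract relies on: a naive Bayes model over categorical attributes can be kept as a set of counts, so each window of the stream can be folded into fixed-size storage and then discarded. The class `CountingNaiveBayes`, the helper `windows`, the use of consecutive non-overlapping windows, the Laplace smoothing constant `alpha`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
import math


class CountingNaiveBayes:
    """Naive Bayes over categorical attributes, stored only as counts (hypothetical)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # Laplace smoothing constant (assumed)
        self.class_counts = defaultdict(int)   # label -> number of samples seen
        self.value_counts = defaultdict(int)   # (attr index, value, label) -> count
        self.attr_values = defaultdict(set)    # attr index -> distinct values seen

    def update(self, window):
        """Fold one window of (attributes, label) pairs into the counts."""
        for attrs, label in window:
            self.class_counts[label] += 1
            for i, value in enumerate(attrs):
                self.value_counts[(i, value, label)] += 1
                self.attr_values[i].add(value)

    def predict(self, attrs):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(attrs):
                k = len(self.attr_values.get(i, ())) or 1
                c = self.value_counts.get((i, value, label), 0)
                score += math.log((c + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def windows(stream, size):
    """Split an iterable stream into consecutive windows of at most `size` items."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Toy stream of (attributes, label) pairs; the data is invented for illustration.
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

model = CountingNaiveBayes()
for window in windows(stream, size=2):
    model.update(window)   # the window itself can be dropped after this call

print(model.predict(("rainy", "cool")))
```

In this sketch, storage grows with the number of distinct (attribute, value, class) combinations rather than with the length of the stream, which is the property the abstract attributes to its storage tree. Forgetting old windows, pruning attributes, or relaxing the independence assumption would each need additional machinery that the abstract does not detail.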
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.