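An Efficient Data Stream Mining Algorithm Based on Binary Search Trees
Cite this article: HE Zhao-qing. An Efficient Data Stream Mining Algorithm Based on Binary Search Trees[J]. Computer Engineering & Science, 2008, 30(11): 151-154.
Author: HE Zhao-qing
Affiliation: Department of Information Technology, Hunan First Normal College, Changsha, Hunan 410205; School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073
Funding: Scientific Research Project of the Hunan Provincial Department of Education; Natural Science Foundation of Hunan Province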
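Abstract: Classification over data streams is one of the most challenging tasks in data mining. VFDT uses the Hoeffding inequality to build a high-accuracy decision tree in a single pass over a data stream; VFDTc extends VFDT so that it can handle continuous attributes. Building on VFDT and VFDTc, we design and implement VFDT-BSTree, an efficient algorithm based on binary search trees. The algorithm addresses the problems present in VFDTc and speeds up both the dynamic insertion of samples and the selection of the best split node, which in turn speeds up classification. Experimental results show that, with decision tree size and classification accuracy unchanged, VFDT-BSTree reduces execution time by 32.25% on average relative to VFDT and by 24.96% on average relative to VFDTc.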

Keywords: data streams; binary search tree; continuous attribute

An Efficient Data Stream Mining Algorithm Based on Binary Search Trees
HE Zhao-qing. An Efficient Data Stream Mining Algorithm Based on Binary Search Trees[J]. Computer Engineering & Science, 2008, 30(11): 151-154.
Authors: HE Zhao-qing
Abstract: Classification over data streams is a very challenging task in the field of data mining. VFDT is a one-pass algorithm for decision tree construction. It uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. VFDTc improves on VFDT by making it able to process continuous attributes. Based on VFDT and VFDTc, we design and implement VFDT-BSTree, an efficient algorithm based on binary search trees. The algorithm solves the problems existing in VFDTc and speeds up dynamic sample insertion and best split node selection, and thus improves the speed of classification. The experimental results show that the execution time of VFDT-BSTree is on average 32.25% less than that of VFDT and 24.96% less than that of VFDTc, while tree size and accuracy stay the same.
Keywords: data streams; binary search tree; continuous attribute
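The abstract names two ingredients without showing them: the Hoeffding bound used by VFDT, and a binary search tree that holds the values of a continuous attribute. The Python sketch below illustrates both in a generic form. It is not the VFDT-BSTree algorithm from the paper, whose details the abstract does not give. The bound is the standard form ε = sqrt(R² ln(1/δ) / (2n)); all names (`hoeffding_bound`, `Node`, `insert`, `in_order`, `best_split`) and the choice of information gain as the split measure are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


@dataclass
class Node:
    """One distinct value of a continuous attribute (hypothetical layout)."""
    key: float
    counts: Dict[str, int] = field(default_factory=dict)  # class label -> count
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: float, label: str) -> Node:
    """Insert one (attribute value, class label) sample; returns the root."""
    if root is None:
        return Node(key, {label: 1})
    node = root
    while True:
        if key == node.key:
            node.counts[label] = node.counts.get(label, 0) + 1
            return root
        branch = "left" if key < node.key else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(key, {label: 1}))
            return root
        node = child


def in_order(root: Optional[Node]) -> Iterator[Node]:
    """Yield nodes in ascending key order (iterative traversal)."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node
        node = node.right


def entropy(counts: Dict[str, int]) -> float:
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(root: Optional[Node]) -> Tuple[Optional[float], float]:
    """Return (threshold, information gain) of the best `value <= threshold`
    test, found by sweeping the tree in sorted order."""
    total: Dict[str, int] = {}
    for node in in_order(root):
        for label, c in node.counts.items():
            total[label] = total.get(label, 0) + c
    n = sum(total.values())
    base = entropy(total)
    left: Dict[str, int] = {}
    best: Tuple[Optional[float], float] = (None, 0.0)
    for node in in_order(root):
        for label, c in node.counts.items():
            left[label] = left.get(label, 0) + c
        n_left = sum(left.values())
        if n_left == n:  # the largest key cannot separate anything
            break
        right = {k: total[k] - left.get(k, 0) for k in total}
        gain = base - n_left / n * entropy(left) - (n - n_left) / n * entropy(right)
        if gain > best[1]:
            best = (node.key, gain)
    return best
```

In a VFDT-style learner, a leaf would typically be split once the gain of the best attribute exceeds the gain of the runner-up by more than `hoeffding_bound(...)`. This sketch keeps the tree unbalanced for brevity, so insertion costs O(log n) only on average.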
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.