A semi-random multiple decision-tree algorithm for mining data streams
Authors: Xue-Gang Hu (1), Pei-Pei Li (1), Xin-Dong Wu (1, 2), and Gong-Qing Wu (1)
Affiliations: (1) School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China; (2) Department of Computer Science, University of Vermont, Burlington, VT 50405, U.S.A.
Funding: This research is supported by the National Natural Science Foundation of China (Grant No. 60573174) and the Natural Science Foundation of Anhui Province of China (Grant No. 050420207).
Abstract: Mining streaming data is an active topic in data mining. When classifying data streams, traditional decision-tree algorithms such as ID3 and C4.5 are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer advantages in both respects. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams that is based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method that computes information gain to obtain the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels at tree leaves. An extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and robustness to noise.
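
The abstract's use of the Hoeffding bound to pick a minimum number of split examples can be illustrated with the standard form of the inequality. The sketch below is not taken from the paper: it assumes the usual bound epsilon = sqrt(R^2 ln(1/delta) / (2n)) for a quantity with range R, and the function names and parameter values are hypothetical. How SRMTDS actually sets R, delta, and the tolerance is not stated in this record.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the mean of n observations
    of a variable with range `value_range` is within epsilon of its true mean."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_split_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding bound is at most `epsilon`
    (obtained by solving the bound above for n). Hypothetical helper."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Assumed illustrative settings: range 1.0, confidence 1 - 1e-7, tolerance 0.05.
    n = min_split_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

Under these assumed settings, a leaf would wait for at least `n` examples before a split decision is treated as statistically reliable; a smaller tolerance or a smaller delta raises that count.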

Keywords (Chinese record): random decision trees; data streams; data acquisition; classification algorithms
Revised: 2005-01-15

Xue-Gang Hu, Pei-Pei Li, Xin-Dong Wu, and Gong-Qing Wu. A semi-random multiple decision-tree algorithm for mining data streams [J]. Journal of Computer Science and Technology, 2007, 22(5): 711-724.
Keywords: data streams; Naive Bayes; random decision trees