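Implementing the Very Fast Decision Tree Algorithm in a Data Stream Management System (in English)
Cite this article: YUAN Lei, ZHANG Yang, LI Mei, LI Xue, WANG Yong. Implementing the Very Fast Decision Tree Algorithm in a Data Stream Management System (in English)[J]. Journal of Frontier of Computer Science and Technology, 2010, 4(8): 673-682.
Authors: YUAN Lei  ZHANG Yang  LI Mei  LI Xue  WANG Yong
Affiliations: 1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100
2. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100
3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia
4. School of Computer, Northwestern Polytechnical University, Xi'an 710072
Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities
Abstract: Embedding data mining algorithms in a data stream management system (DSMS) is a new challenge for database researchers, and embedding the very fast decision tree (VFDT) in a DSMS has not yet been reported. This work implements the VFDT algorithm in Esper using the DSMS's existing mechanisms. The main idea is to translate the VFDT algorithm into Esper's data query language (Esper query language, EQL). Two methods of implementing VFDT in a DSMS are given: a plain method, which translates the VFDT algorithm directly into EQL and runs it in the DSMS (denoted DVFDT), and an improved method, which is implemented through Esper's inherent batch processing mode (denoted optimal-DVFDT). A series of experiments compares the accuracy and performance of the two methods when classifying massive data streams. The two proposed methods are also compared with a Java implementation of VFDT (denoted JVFDT) in terms of classification accuracy and running time. The results show that the VFDT algorithm implemented in the DSMS performs well, and that it also performs well on subsets of large-scale data streams.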

Keywords: data management system  VFDT algorithm  embedding  classification

Programming the VFDT Algorithm in Data Stream Management System
YUAN Lei,ZHANG Yang,LI Mei,LI Xue,WANG Yong.Programming the VFDT Algorithm in Data Stream Management System[J].Journal of Frontier of Computer Science and Technology,2010,4(8):673-682.
Authors:YUAN Lei  ZHANG Yang  LI Mei  LI Xue  WANG Yong
Affiliation:1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
2. College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia
4. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
Abstract: Integrating data stream mining algorithms with a data stream management system (DSMS) is a new challenge for data mining and database researchers, and the integration of the very fast decision tree (VFDT) algorithm with a DSMS has not been reported so far. This paper integrates the VFDT algorithm with Esper by exploiting the existing capabilities of the DSMS. It analyzes how to transform the algorithm into efficient Esper query language (EQL) and proposes two implementations: a straightforward transformation of the VFDT algorithm into EQL (denoted DVFDT), and an optimized version of DVFDT based on the inherent batch mode of Esper (denoted optimal-DVFDT). The proposed implementations are compared with a Java-based VFDT (denoted JVFDT) in terms of classification accuracy and performance. Experiments on a large volume of synthetic data show that the implementations work efficiently and accurately. In addition, the approach also performs well on sub-streams of the original data stream.
Keywords: data stream management system (DSMS)  very fast decision tree (VFDT) algorithm  integration  classification
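
The abstract does not spell out how VFDT decides when to split a node. As background only, the sketch below shows the Hoeffding-bound test that VFDT is generally described as using. It is not the authors' EQL or Java code: the function names `hoeffding_bound` and `should_split`, the parameter names, and the default tie threshold of 0.05 are all assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie threshold (hypothetical default),
    meaning the two candidates are too close to be worth waiting on.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

In a stream setting, a leaf would call `should_split` periodically as its example count `n` grows. Checking only once per batch of examples, rather than after every example, is one plausible reading of how a batch mode could reduce overhead, although the abstract does not confirm that this is what optimal-DVFDT does.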
This article is indexed by VIP, Wanfang Data, and other databases.