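A New Two-Phase Sampling Algorithm
Cite this article: MA Guang-zhi, ZHANG Yao-kun. A New Two-Phase Sampling Algorithm[J]. Computer Engineering & Science, 2007, 29(7): 64-66.
Authors: MA Guang-zhi  ZHANG Yao-kun
Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074
Abstract: Two-phase sampling algorithms draw sample data from massive data sets for use in data mining. Their efficiency is low when the data set is very large, and their sampling accuracy is low when the data set is both very large and sparse. This paper improves on the traditional two-phase sampling algorithm and proposes a new sampling algorithm, EAFAST, which adaptively adjusts its own parameters and makes full use of historical information to perform heuristic search. Experiments show that EAFAST improves both efficiency and sampling accuracy, remedying the shortcomings of the traditional algorithm.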

Keywords: sampling  two-phase  frequent itemset  pruning  accuracy
Article ID: 1007-130X(2007)07-0064-03
Revision dates: 2006-09-05; 2006-12-15

A New Two-Phase Sampling Algorithm
MA Guang-zhi, ZHANG Yao-kun. A New Two-Phase Sampling Algorithm[J]. Computer Engineering & Science, 2007, 29(7): 64-66.
Authors:MA Guang-zhi  ZHANG Yao-kun
Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract: Traditional two-phase sampling algorithms extract sample data from a very large data set for use in data mining. Their efficiency drops when the data set is very large, and their sampling accuracy drops when the data set is both very large and sparse. This paper improves on the traditional two-phase sampling algorithms and presents a new sampling algorithm, EAFAST, which adjusts its parameters adaptively and performs a heuristic search that makes full use of historical information. Experiments demonstrate that EAFAST improves efficiency and sampling accuracy simultaneously, and thus remedies the shortcomings of the traditional algorithms.
Keywords:sample  two-phase  frequent item set  trim  accuracy
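
The abstract does not spell out the steps of either the traditional algorithm or EAFAST. As a rough illustration of the general idea only, the following Python sketch assumes a generic two-phase scheme: phase 1 draws a simple random sample and records its item supports, and phase 2 greedily trims transactions so that the smaller final sample keeps supports close to those of the initial one. The function names, the L1 distance, and the greedy trimming rule are all assumptions made for illustration; EAFAST's adaptive parameter adjustment and history-based heuristic search are not reproduced here.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Return, for each item, the fraction of transactions containing it."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def distance(sup_a, sup_b):
    """L1 distance between two support maps (a missing item counts as 0)."""
    items = set(sup_a) | set(sup_b)
    return sum(abs(sup_a.get(i, 0.0) - sup_b.get(i, 0.0)) for i in items)


def two_phase_sample(dataset, initial_size, final_size, seed=0):
    """Hypothetical two-phase sampler; not the EAFAST algorithm itself."""
    rng = random.Random(seed)

    # Phase 1: simple random sample; its supports serve as the target.
    initial = rng.sample(dataset, initial_size)
    target = item_supports(initial)

    # Phase 2: repeatedly drop the transaction whose removal leaves the
    # remaining supports closest to the phase-1 supports.
    final = list(initial)
    while len(final) > final_size:
        best_idx = min(
            range(len(final)),
            key=lambda i: distance(
                item_supports(final[:i] + final[i + 1:]), target
            ),
        )
        final.pop(best_idx)
    return initial, final


if __name__ == "__main__":
    data = [("a", "b"), ("a",), ("b", "c"), ("a", "c"), ("c",), ("a", "b", "c")] * 5
    initial, final = two_phase_sample(data, initial_size=12, final_size=6)
    print(len(initial), len(final))
    print(distance(item_supports(final), item_supports(initial)))
```

The trimming loop recomputes supports for every candidate removal, so it is quadratic in the sample size; that cost is the kind of inefficiency on large data sets that the abstract attributes to the traditional approach.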
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.