
数据挖掘取样方法的衡量与选用研究
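Cite this article: 胡文瑜, 蔡文培. 数据挖掘取样方法的衡量与选用研究[J]. 福建建筑高等专科学校学报, 2011(4): 351-356.
Authors: 胡文瑜 (Hu Wenyu), 蔡文培 (Cai Wenpei)
Affiliation: Department of Computer and Information Science, Fujian University of Technology (福建工程学院), Fuzhou 350108, Fujian, China
Funding: Science and Technology Project of the Fujian Provincial Department of Education (JA08161)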
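Abstract (translated from the Chinese): Sampling is a general and effective approximation technique. In data mining research, sampling can significantly reduce the size of the data set to be processed, which allows many data mining algorithms to be applied to large-scale data sets and to data streams. After examining how statistics computes and measures the error of uniform random sampling, the article focuses on the criteria for evaluating sampling methods suited to data mining and on the factors that influence the choice of sampling method. It proposes and quantifies concepts such as "representativeness of a sampling method" and "sampling deviation", which better assess sampling quality, particularly that of biased sampling methods. Finally, it outlines follow-up work and research directions for the evaluation criteria and selection of sampling methods in data mining.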

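Keywords: data mining (数据挖掘)  uniform sampling (均匀取样)  biased sampling (偏倚取样)  sample deviation (取样偏差)  representativeness of a sample (取样代表性)  measure and selection (衡量与选用)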

Research on measure and selection of sampling methods in data mining
Hu Wenyu, Cai Wenpei. Research on measure and selection of sampling methods in data mining[J]. Journal of Fujian College of Architecture & C.E., 2011(4): 351-356.
Authors:Hu Wenyu  Cai Wenpei
Affiliation: Computer and Information Science Department, Fujian University of Technology, Fuzhou 350108, China
Abstract: Sampling is a useful and efficient approximation technique. By dramatically scaling down the data set, it enables many algorithms to be applied to huge data sets in data mining and data stream mining. Based on a review of the error statistics and measures used for random uniform sampling techniques, the paper discusses how to measure sampling methods in data mining and which factors to consider when selecting an appropriate sampling algorithm for a data mining task. The "representativeness" of a sample and the "sample deviation" are quantified to provide a more appropriate measure for biased sampling methods. Directions for further research on the measure and selection of sampling techniques for data mining models are also indicated.
Keywords: data mining  uniform sampling  biased sampling  sample deviation  representativeness of a sample  measure and selection
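
The abstract refers to the error statistics of random uniform sampling without giving formulas. As a rough illustration only, the sketch below draws a uniform sample without replacement and estimates the standard error of the sample mean using the textbook formula with a finite population correction. It is not the paper's "representativeness" or "sample deviation" measure, which this record does not define. The function names, the seed, and the toy population are assumptions made for the example.

```python
import math
import random
import statistics


def uniform_sample(population, n, seed=0):
    """Draw n items uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(population, n)


def standard_error_of_mean(sample, population_size):
    """Estimated standard error of the sample mean under simple random
    sampling without replacement: sqrt((s^2 / n) * (1 - n / N))."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (s / math.sqrt(n)) * math.sqrt(1 - n / population_size)


# Hypothetical data set: the integers 1..10000 stand in for a large table.
population = list(range(1, 10001))
sample = uniform_sample(population, 500)

est_mean = statistics.mean(sample)
true_mean = statistics.mean(population)
se = standard_error_of_mean(sample, len(population))

print(f"true mean      = {true_mean}")
print(f"sample mean    = {est_mean}")
print(f"standard error = {se:.2f}")
```

With a uniform sample, the gap between the sample mean and the true mean is expected to be on the order of the standard error. A biased sampling method offers no such guarantee, which is the situation the abstract says its additional measures are meant to address.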
This article is indexed in VIP (维普) and other databases.