Highly scalable algorithm for mining huge datasets

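Cite this article:	LIU Jun-qiang, WANG Xun, SUN Xiao-ying, LIU Ren-ping. Highly scalable algorithm for mining huge datasets[J]. Computer Engineering and Design, 2003, 24(6): 27-29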

Authors:	LIU Jun-qiang, WANG Xun, SUN Xiao-ying, LIU Ren-ping

Affiliation:	College of Computer and Information Engineering, Hangzhou University of Commerce (杭州商学院), Hangzhou, Zhejiang 310035

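Funding:	Natural Science Foundation of Zhejiang Province (602140); National 863 Program (2002AA121064); Scientific Research Program of the Zhejiang Provincial Department of Education (20020635)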

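Abstract:	Mining massive data to discover useful information and knowledge is a major challenge, yet most current mining algorithms scale poorly to massive data. For the problem of mining frequent patterns and association rules, this paper proposes a dataset reduction method and designs a corresponding buffer management model. Combined with a search that mixes breadth-first and depth-first mining, these are used to extend two algorithms, Apriori and OpportuneProject. Experiments show that the extended algorithms not only scale far better but also run markedly faster.
Keywords:	data mining; data management; frequent patterns; association rules; dataset partitioning method; databases; massive data
Article ID:	1000-7024(2003)06-0027-03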
Highly scalable algorithm for mining huge datasets

LIU Jun-qiang, WANG Xun, SUN Xiao-ying, LIU Ren-ping. Highly scalable algorithm for mining huge datasets[J]. Computer Engineering and Design, 2003, 24(6): 27-29

Authors:	LIU Jun-qiang, WANG Xun, SUN Xiao-ying, LIU Ren-ping

Abstract:	Discovering knowledge from huge datasets is a challenge, and most current algorithms do not scale well. This paper proposes a dataset reduction method and designs a buffering technique for mining frequent patterns and association rules from huge datasets. Combined with a hybrid search strategy, the dataset reduction method and buffering technique are used to extend the Apriori and OpportuneProject algorithms. Experiments show that the extended algorithms are not only much more scalable to huge datasets but also highly efficient.

Keywords:	data mining; data management; huge databases
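
The abstract names Apriori and a dataset reduction method but gives no details of either extension. As a rough illustration only, the sketch below shows textbook level-wise Apriori with one simple, assumed form of reduction: after each pass, items that appear in no frequent k-itemset are stripped from the working transactions, and transactions too short to hold a (k+1)-itemset are dropped. The function name `apriori_with_reduction` and its parameters are hypothetical. This is not the authors' algorithm, and it does not model their buffer management or hybrid breadth/depth search.

```python
from itertools import combinations


def apriori_with_reduction(transactions, min_support):
    """Level-wise frequent itemset mining over a shrinking working dataset.

    Returns a dict mapping frozenset(itemset) -> support count.
    The reduction rule used here is an assumption for illustration.
    """
    # Working copy of the data; this list is what gets reduced each pass.
    data = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in data:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Reduction (assumed rule): an item outside every frequent k-itemset
        # cannot belong to a frequent (k+1)-itemset, so strip it; then drop
        # transactions that are too short to contain any (k+1)-itemset.
        live_items = frozenset().union(*frequent)
        data = [t & live_items for t in data]
        data = [t for t in data if len(t) >= k + 1]

        # Candidate generation: join frequent k-itemsets, then keep a
        # candidate only if all of its k-subsets are frequent.
        prev = set(frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in prev for s in combinations(union, k)
            ):
                candidates.add(union)

        # Support counting over the reduced data only.
        counts = {c: 0 for c in candidates}
        for t in data:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for itemset, support in sorted(
        apriori_with_reduction(baskets, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

Because each pass scans only the reduced list, later passes touch less data than the first. That is the general intuition behind reduction-style scaling, though how far it matches the paper's design cannot be told from the abstract alone.
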
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.
