
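Research on a Frequent Itemset Mining Algorithm for High-dimensional Sparse Data
Cite this article:YAN Zhen, PI De-chang, WU Wen-hao. Research on a Frequent Itemset Mining Algorithm for High-dimensional Sparse Data[J]. Computer Science, 2011, 38(6): 183-186
Authors:YAN Zhen  PI De-chang  WU Wen-hao
Affiliations:1. College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016
2. School of Computer Science and Technology, Fudan University, Shanghai 200433
Funding:This work was supported by the National Defense Technology Basic Research program and the National High Technology Research and Development Program of China (863 Program) (2007AAO1Z404).
Abstract:Traditional mining algorithms are not suited to mining high-dimensional sparse datasets. This paper proposes FIHS, a frequent itemset mining algorithm designed for high-dimensional sparse data. FIHS introduces a new data structure for storing frequent itemsets, which both reduces storage space and lowers the cost of counting. The algorithm scans the dataset only once, avoids generating infrequent candidate itemsets by optimizing the join and prune operations, and uses AND and OR operations on the frequent K-itemsets to generate the frequent (K+1)-item...

Keywords:High-dimensional data  Sparse data  Frequent itemsets  Storage structure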

Research on Frequent Itemsets Mining Algorithm Based on High-dimensional Sparse Dataset
YAN Zhen,PI De-chang,WU Wen-hao. Research on Frequent Itemsets Mining Algorithm Based on High-dimensional Sparse Dataset[J]. Computer Science, 2011, 38(6): 183-186
Authors:YAN Zhen  PI De-chang  WU Wen-hao
Abstract:Traditional mining algorithms are not well suited to high-dimensional sparse datasets. This paper proposes a new frequent itemset mining algorithm for such data, named FIHS (Frequent mining algorithm based on High-dimensional Sparse dataset). FIHS adopts a new data structure to store frequent itemsets, which reduces both the storage space and the cost of counting. FIHS scans the dataset only once and avoids generating infrequent candidate itemsets by optimizing the join and prune operations. Moreover, frequent (K+1)-itemsets are generated from frequent K-itemsets using only AND/OR operations, and the data structure is simple to maintain. According to theoretical analysis and experiments, the algorithm offers several advantages on high-dimensional sparse datasets, such as fast mining and low memory use.
Keywords:High-dimensional data  Sparse data  Frequent itemsets  Data structure
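
The abstract does not specify the FIHS data structure, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes a vertical bitmap layout (one Python integer per item, one bit per transaction), builds those bitmaps in a single scan, and then derives (K+1)-itemsets from K-itemsets: an OR-style union forms the candidate item set, and a bitwise AND of the two bitmaps gives the transactions that contain it. The function name `mine_frequent_itemsets` and the parameter `min_support` are hypothetical.

```python
from itertools import combinations


def support(bits):
    """Number of transactions whose bit is set in the bitmap."""
    return bin(bits).count("1")


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset(items): support count} for every frequent itemset."""
    # Single scan of the dataset: bit `tid` of bitmaps[item] is set when
    # transaction `tid` contains `item`.
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)

    # Frequent 1-itemsets.
    level = {
        frozenset([item]): bits
        for item, bits in bitmaps.items()
        if support(bits) >= min_support
    }

    result = {}
    while level:
        for itemset, bits in level.items():
            result[itemset] = support(bits)

        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(level.items(), 2):
            candidate = a | b  # union of the two K-itemsets
            # Join: keep only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Prune: every K-subset of the candidate must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            bits = bits_a & bits_b  # transactions containing both a and b
            if support(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level
    return result
```

Because support is read directly from the bitmaps, no further pass over the dataset is needed after the initial scan. For sparse data a compressed or tid-list representation would likely save more space than plain integers; that choice is left open here.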