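Closed Frequent Itemsets Hierarchical Clustering Based on Items' Quantities
Cite this article: YAN Hao, ZHANG Bo, LIU Fang, LEI Zhen-ming. Closed frequent itemsets hierarchical clustering based on items' quantities[J]. Journal of Beijing University of Posts and Telecommunications, 2011, 34(6): 64-68.
Authors: YAN Hao, ZHANG Bo, LIU Fang, LEI Zhen-ming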
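Affiliations: 1. Broadband Network Traffic Monitoring Teaching and Research Center, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications
2.
3. Beijing University of Posts and Telecommunications
4. School of Information Engineering, Beijing University of Posts and Telecommunications
Funding: National Natural Science Foundation of China (61072061); Programme of Introducing Talents of Discipline to Universities (B08004)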
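Abstract: A quantity-based hierarchical clustering algorithm over closed frequent itemsets, CFIHCQ, is proposed and applied to Web usage mining. The algorithm first derives closed frequent itemsets from users' Web access data. Next, it initializes clusters with the closed frequent itemsets and assigns each user to exactly one cluster by scoring. It then builds a top-down cluster tree from the cluster labels and uses user access vectors to split sub-clusters. Finally, it prunes the cluster tree. Experiments show that the algorithm predicts users' Web access behavior well, meets real-time mining requirements on massive user data, and presents the mining results as a tree structure.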

Keywords: Web usage mining; clustering; closed frequent itemsets
Received: 2011-01-20
Revised: 2011-05-18
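The first two steps in the abstract (mining closed frequent itemsets from users' Web access data, then seeding one cluster per closed itemset and scoring each user into a single cluster) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The abstract does not give the scoring formula, so the hypothetical `assign_users` scores a label by the user's total visit count on the label's pages, considering only labels whose pages the user visited in full. The hypothetical `closed_frequent_itemsets` uses brute-force enumeration that suits toy data only, not the massive data sets the paper targets. The page-to-visit-count input format is also an assumption.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force closed frequent itemset mining (toy-scale only).

    transactions: list of sets of pages, one set per user.
    min_support: minimum number of transactions that must contain an itemset.
    Returns a dict mapping frozenset itemset -> support count.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support:
                frequent[s] = support
                found_any = True
        if not found_any:
            break  # no frequent k-itemsets implies no frequent larger ones
    # Closed: no proper superset has the same support.
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < t and frequent[t] == sup for t in frequent)
    }


def assign_users(user_counts, labels):
    """Assign each user to exactly one cluster label (hypothetical scoring).

    user_counts: dict user -> dict page -> visit count (the "quantities").
    labels: closed frequent itemsets used as initial cluster labels.
    Ties on score prefer the longer label, then the sorted page names.
    """
    assignment = {}
    for user, counts in user_counts.items():
        candidates = [lab for lab in labels if lab.issubset(counts)]
        if not candidates:
            assignment[user] = None  # user matches no cluster label
            continue
        assignment[user] = max(
            candidates,
            key=lambda lab: (sum(counts[p] for p in lab), len(lab), sorted(lab)),
        )
    return assignment


# Hypothetical toy data: page -> visit count per user.
visits = {
    "u1": {"home": 3, "news": 2},
    "u2": {"home": 1, "news": 4, "sports": 1},
    "u3": {"home": 2, "sports": 5},
}
closed = closed_frequent_itemsets([set(c) for c in visits.values()], min_support=2)
assignment = assign_users(visits, closed)
print(assignment)
```

Taking the single highest-scoring label makes each user's cluster unique, as the abstract requires; a user who matches no label is left unassigned (`None`) in this sketch.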

Closed Frequent Itemsets Hierarchical Clustering based on Items’ Quantities
YAN Hao, ZHANG Bo, LIU Fang, LEI Zhen-ming. Closed Frequent Itemsets Hierarchical Clustering based on Items’ Quantities[J]. Journal of Beijing University of Posts and Telecommunications, 2011, 34(6): 64-68.
Authors: YAN Hao, ZHANG Bo, LIU Fang, LEI Zhen-ming
Affiliation: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: A Web usage mining algorithm named Closed Frequent Itemsets Hierarchical Clustering based on Quantities (CFIHCQ) is proposed. The algorithm first obtains closed frequent itemsets from network users' Web access data. It then initializes clusters with the closed frequent itemsets and assigns each user to exactly one cluster by a scoring method. After that, it constructs a cluster tree from the cluster labels, and user access vectors are used to split sub-clusters within the tree. Finally, the cluster tree is pruned. Experimental results indicate that CFIHCQ predicts network users' Web access behavior accurately, supports real-time mining on huge data sets, and presents results in an easy-to-browse tree structure.
Keywords: Web usage mining; clustering; closed frequent itemsets
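The tree-building and pruning steps can be sketched in the same hedged way. The abstract says only that the cluster tree is built from cluster labels and later pruned, so both rules below are hypothetical: each cluster hangs under the largest cluster whose label is a proper subset of its own label, and pruning merges clusters with fewer than `min_size` users into their parent, with users that reach the root kept under the key `None`. The sub-cluster splitting by user access vectors is not modeled, since the abstract gives no details. The function names are illustrative only.

```python
def build_cluster_tree(labels):
    """Map each cluster label to its parent label (None = child of the root).

    Hypothetical rule: the parent is the largest label that is a proper
    subset of this label; ties are broken by sorted page names.
    """
    parent = {}
    for lab in labels:
        subsets = [p for p in labels if p < lab]
        parent[lab] = max(subsets, key=lambda p: (len(p), sorted(p)), default=None)
    return parent


def prune(parent, members, min_size):
    """Hypothetical pruning: merge clusters with fewer than min_size users
    into their parent; users pushed past the root are kept under None."""
    parent = dict(parent)
    members = {lab: set(users) for lab, users in members.items()}
    # Deepest (longest) labels first, so merged users can cascade upward.
    for lab in sorted(parent, key=len, reverse=True):
        if len(members.get(lab, ())) >= min_size:
            continue
        up = parent.pop(lab)
        members.setdefault(up, set()).update(members.pop(lab, set()))
        # Re-attach the pruned cluster's children to its parent.
        for child, p in parent.items():
            if p == lab:
                parent[child] = up
    return parent, members


# Hypothetical labels and members, e.g. produced by a scoring step.
h = frozenset({"home"})
hn = frozenset({"home", "news"})
hs = frozenset({"home", "sports"})
tree = build_cluster_tree([h, hn, hs])
pruned_tree, pruned_members = prune(tree, {h: set(), hn: {"u1", "u2"}, hs: {"u3"}}, min_size=2)
print(pruned_tree, pruned_members)
```

Processing longer labels first means a child is always handled before any ancestor, so users merged out of a small child can be counted toward the parent's own size test.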
This article is indexed in CNKI, Wanfang Data, and other databases.