Parallel Distributed Web Access Patterns Two-Layer Clustering
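Cite this article: JIA Xiaoli, WU Rui, WU Siying. Parallel Distributed Web Access Patterns Two-Layer Clustering[J]. Computer Engineering and Applications, 2019, 55(23): 216-221.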
Authors: JIA Xiaoli, WU Rui, WU Siying
Affiliation (all authors): School of Mathematics and Computer Science, Shanxi Normal University, Linfen, Shanxi 041004, China
Funding: National Natural Science Foundation of China; Shanxi Provincial Soft Science Foundation
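Abstract: Web log mining analyzes users' access patterns to gauge how interested they are in the pages they browse. At present, most Web log mining is frequency-based, and the information it extracts is of limited value. The clustering technique proposed here is instead based on access time: each user's browsing pattern is represented as a fuzzy vector that records whether the user visited each page and how long they stayed on it. The users' access sequences are then analyzed with several clustering methods. By combining fuzzy rough k-means with the angle cosine, a two-layer clustering technique is proposed that reduces sensitivity to the initial cluster centers, and a series of experiments demonstrates its feasibility. The experiments also use the Davies-Bouldin index to evaluate and compare the different clustering methods. Because the algorithm is still inefficient on large data sets, the two-layer clustering is parallelized with MapReduce, which improves clustering efficiency.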

Keywords: web mining; fuzzy rough clustering; web access patterns; angle cosine; parallel
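The abstract represents each user's browsing pattern as a fuzzy vector that records whether each page was visited and for how long, and it compares patterns with the angle cosine. The exact membership function is not given here, so this Python sketch assumes one: an unvisited page maps to 0, and a visited page maps to its dwell time divided by a cap `max_dwell`, clipped to 1. The function names `fuzzy_access_vector` and `angle_cosine`, the page paths, and the sample sessions are all hypothetical.

```python
import math

def fuzzy_access_vector(session, pages, max_dwell):
    """Map a session {page: dwell_seconds} to a vector in [0, 1]^len(pages).

    Hypothetical encoding: 0.0 means the page was not visited; otherwise the
    value is the dwell time scaled by max_dwell and capped at 1.0.
    """
    vec = []
    for page in pages:
        dwell = session.get(page)
        if dwell is None:
            vec.append(0.0)  # page never visited in this session
        else:
            vec.append(min(dwell / max_dwell, 1.0))  # longer stay -> higher membership
    return vec

def angle_cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

pages = ["/index", "/news", "/sports", "/contact"]
alice = fuzzy_access_vector({"/index": 30, "/news": 120}, pages, max_dwell=120)
bob = fuzzy_access_vector({"/index": 15, "/news": 60}, pages, max_dwell=120)
carol = fuzzy_access_vector({"/sports": 90}, pages, max_dwell=120)
print(angle_cosine(alice, bob), angle_cosine(alice, carol))
```

Bob's vector is a scaled copy of Alice's, so their cosine is 1 even though Bob spent half as long on each page; Carol shares no pages with Alice, so her cosine with Alice is 0. Under this assumed encoding a visit with zero recorded dwell time also maps to 0, so a real implementation might reserve a small positive floor for visited pages.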
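Fuzzy rough k-means variants generally combine fuzzy memberships with the lower and upper approximations of rough k-means: each cluster has a lower approximation (objects that certainly belong to it) and a boundary region (objects that may belong to it or to another cluster). The abstract does not give the authors' exact formulation, so this is only a sketch of the crisp rough k-means rule in the style of Lingras and West, without fuzzy membership weights. The threshold `ratio`, the weight `w_lower`, and the function names are illustrative assumptions.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _mean(points):
    return [sum(coords) / len(points) for coords in zip(*points)]

def rough_assign(points, centers, ratio=1.3):
    """Split points into per-cluster lower approximations and boundary regions.

    A point goes to the boundary of its nearest cluster and of every other
    cluster whose distance is within `ratio` times the nearest distance;
    otherwise it goes to the lower approximation of its nearest cluster.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for p in points:
        d = [_dist(p, c) for c in centers]
        near = min(range(k), key=d.__getitem__)
        # Other clusters that are almost as close as the nearest one.
        close = [j for j in range(k) if j != near and d[j] <= ratio * d[near]]
        if close:
            for j in [near] + close:
                boundary[j].append(p)
        else:
            lower[near].append(p)
    return lower, boundary

def rough_update(lower, boundary, centers, w_lower=0.7):
    """New centers: weighted mix of the lower-approximation and boundary means."""
    new_centers = []
    for j, old in enumerate(centers):
        lo, bd = lower[j], boundary[j]
        if lo and bd:
            ml, mb = _mean(lo), _mean(bd)
            new_centers.append([w_lower * a + (1 - w_lower) * b for a, b in zip(ml, mb)])
        elif lo:
            new_centers.append(_mean(lo))
        elif bd:
            new_centers.append(_mean(bd))
        else:
            new_centers.append(list(old))  # empty cluster keeps its old center
    return new_centers

points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
centers = [[0.0, 0.0], [10.0, 0.0]]
lower, boundary = rough_assign(points, centers)
print(lower, boundary)
print(rough_update(lower, boundary, centers))
```

The point at x = 5 is equally close to both centers, so it lands in both boundary regions and pulls each updated center toward the middle with weight 1 - w_lower.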
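The experiments compare clustering methods with the Davies-Bouldin index. In its common form, for k clusters with centroids c_i and average within-cluster distance S_i, DB = (1/k) * sum over i of max over j != i of (S_i + S_j) / d(c_i, c_j); lower values indicate compact, well-separated clusters. A minimal pure-Python sketch, assuming Euclidean distance and at least two clusters with distinct centroids (the function names and sample data are hypothetical):

```python
import math

def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering.

    clusters: list of non-empty lists of points (each point a list of floats).
    Assumes at least two clusters and pairwise-distinct centroids.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    centroids = [_centroid(c) for c in clusters]
    # S_i: average distance from the points of cluster i to its centroid.
    scatter = [sum(_euclid(p, centroids[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max((scatter[i] + scatter[j]) / _euclid(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k

separated = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
overlapping = [[[0.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [3.0, 2.0]]]
print(davies_bouldin(separated), davies_bouldin(overlapping))
```

Both pairs of clusters have the same internal spread, so the one whose centroids lie farther apart scores lower, which is the direction the index rewards.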
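The abstract states that the two-layer clustering is parallelized with MapReduce but does not describe the jobs. A common pattern for k-means-style algorithms, shown here as an assumption rather than the authors' design, runs one MapReduce job per iteration: mappers assign each record to its nearest center, and reducers average the records for each center to produce new centers. This sketch simulates the map, shuffle and reduce phases in plain Python instead of Hadoop; the function names are hypothetical.

```python
from collections import defaultdict

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_map(point, centers):
    """Map phase: emit (index of nearest center, (point, 1))."""
    best = min(range(len(centers)), key=lambda i: _sq_dist(point, centers[i]))
    return best, (point, 1)

def kmeans_reduce(key, values):
    """Reduce phase: values are (vector_sum, count) pairs; return their mean."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for vec_sum, n in values:
        for d in range(dim):
            sums[d] += vec_sum[d]
        count += n
    return key, [s / count for s in sums]

def mapreduce_iteration(points, centers):
    # Shuffle phase: group mapper output by key, as the framework would.
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    new_centers = list(centers)  # a center that receives no points keeps its position
    for key, values in groups.items():
        k, center = kmeans_reduce(key, values)
        new_centers[k] = center
    return new_centers

points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
print(mapreduce_iteration(points, centers))
```

Because each value is a (vector sum, count) pair, the same reduce function still works if a combiner pre-aggregates mapper output locally, which is what keeps shuffle traffic low in a real MapReduce job.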

This article is indexed in Wanfang Data and other databases.