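Analysis of Browsing Behaviour in Web Log Based on Time Density
Cite this article: ZHUANG Li-Ke, ZHANG Chang-Shui, LE Zhong-Jian. Analysis of Browsing Behaviour in Web Log Based on Time Density [J]. Computer Science, 2004, 31(4): 108-112
Authors: ZHUANG Li-Ke, ZHANG Chang-Shui, LE Zhong-Jian
Affiliations: State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084; Department of Computer Science, Jiangxi University of Finance and Economics, Nanchang 330000
Abstract: To address the problem of choosing the threshold for user session identification in Web logs, this paper presents a frequency analysis method based on time density. First, the user access frequency, scaled by a time-interval parameter, is defined as a random vector, and a cut-tail algorithm for the random vector is given. Next, a matrix relating access frequency to IP users is constructed: its columns are access frequencies, its rows are user IPs, and each entry is the access frequency within a given time interval. Cluster analysis of the column vectors is then used to examine the access behaviour of different classes of users. Finally, the session-identification threshold is estimated parametrically, and the threshold is tested and its parameters corrected through sampling.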

Keywords: Web log mining; time interval; frequency distribution; random vector; session threshold
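The abstract does not spell out the scale parameters or the exact cut-tail rule, so the Python sketch below shows only one plausible reading, under stated assumptions. Gaps between consecutive requests from a single IP are binned on a hypothetical scale (`BIN_EDGES`), the counts are normalized into a probability vector, and the cut-tail step is assumed to drop every bin after the point where cumulative probability first reaches a chosen `coverage` level. The upper edge of the last kept bin is then taken, again as an assumption, as the timeout for ordinary gap-based sessionization. All function names and the sample timestamps are hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical time-interval scale: lower edges of the bins, in seconds.
# The paper's actual scale parameters are not given in the abstract.
BIN_EDGES = [0, 10, 30, 60, 300, 600, 1800, 3600]


def interval_frequency(timestamps: List[float], edges: List[float] = BIN_EDGES) -> List[int]:
    """Count the gaps between consecutive requests of one IP per interval bin.

    Bin i holds gaps in [edges[i], edges[i + 1]); the last bin holds gaps >= edges[-1].
    """
    ts = sorted(timestamps)
    counts = [0] * len(edges)
    for earlier, later in zip(ts, ts[1:]):
        counts[bisect.bisect_right(edges, later - earlier) - 1] += 1
    return counts


def cut_tail(counts: List[int], coverage: float = 0.95) -> Tuple[List[float], int]:
    """Assumed cut-tail rule: normalize counts to probabilities, keep bins up to the
    first one where the cumulative probability reaches `coverage`, drop the rest,
    and renormalize. Returns (kept vector, index of last kept bin), or ([], -1)."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(counts)
    if total == 0:
        return [], -1
    probs = [c / total for c in counts]
    cut, cum = len(probs) - 1, 0.0
    for i, p in enumerate(probs):
        cum += p
        if cum >= coverage:
            cut = i
            break
    kept = probs[: cut + 1]
    mass = sum(kept)
    return [p / mass for p in kept], cut


def threshold_from_cut(cut: int, edges: List[float] = BIN_EDGES) -> float:
    """Hypothetical session timeout: the upper edge of the last kept bin."""
    return edges[cut + 1] if cut + 1 < len(edges) else float("inf")


def sessionize(timestamps: List[float], threshold: float) -> List[List[float]]:
    """Gap-based sessionization: a gap longer than `threshold` starts a new session."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


if __name__ == "__main__":
    ts = [0, 5, 12, 40, 100, 2000, 2003]  # made-up request times for one IP
    counts = interval_frequency(ts)  # [3, 1, 0, 1, 0, 0, 1, 0]
    vec, cut = cut_tail(counts, coverage=0.8)  # vec ~ [0.6, 0.2, 0.0, 0.2], cut == 3
    print(sessionize(ts, threshold_from_cut(cut)))  # [[0, 5, 12, 40, 100], [2000, 2003]]
```

In this toy run the 1900 s gap lands in the truncated tail, so the stream splits into two sessions at a 300 s timeout. A different `coverage` or bin scale would move the threshold; the paper estimates and corrects this parameter by sampling, which the sketch does not attempt.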

Analysis of Browsing Behaviour in Web Log Based on Time Density
ZHUANG Li-Ke, ZHANG Chang-Shui, LE Zhong-Jian. Analysis of Browsing Behaviour in Web Log Based on Time Density [J]. Computer Science, 2004, 31(4): 108-112
Authors: ZHUANG Li-Ke, ZHANG Chang-Shui, LE Zhong-Jian
Abstract: To address the choice of threshold for session recognition in Web log mining, a frequency analysis method based on time intervals is introduced. First, a user's visit frequency, measured on a scale parameterized by the time interval, is defined as a random vector, and a cut-tail algorithm for the random vector is given. Second, a frequency-by-user-IP correlation matrix is set up, with frequency taken as rows and user IPs as columns; each element of the matrix is the user's visit frequency in the corresponding time interval. IP users are then classified by measuring the similarity between column vectors, and their browsing behaviour is discussed. Finally, the session-recognition threshold is estimated parametrically and tested by further sampling.
Keywords: Web log mining; Time interval; Frequency distribution; Random vector; Threshold of session
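As a similarly hedged sketch of the matrix step, the Python code below places each IP's interval-frequency vector in a column (following the English abstract's layout: interval bins as rows, user IPs as columns) and groups IPs whose column vectors are close. The abstract does not name the similarity measure or the clustering algorithm; cosine similarity and single-pass "leader" clustering with a hypothetical `min_similarity` cutoff are stand-ins chosen for brevity, not the authors' method, and the IP addresses and frequencies are made up.

```python
import math
from typing import Dict, List, Tuple


def build_matrix(freq_by_ip: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Rows are interval bins, columns are user IPs; entry [i][j] is IP j's
    frequency in bin i. Shorter vectors (e.g. after tail truncation) are zero-padded."""
    ips = list(freq_by_ip)
    n_rows = max((len(v) for v in freq_by_ip.values()), default=0)
    matrix = [[0.0] * len(ips) for _ in range(n_rows)]
    for j, ip in enumerate(ips):
        for i, value in enumerate(freq_by_ip[ip]):
            matrix[i][j] = value
    return ips, matrix


def column(matrix: List[List[float]], j: int) -> List[float]:
    """Column vector j, i.e. one IP's interval-frequency profile."""
    return [row[j] for row in matrix]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def leader_cluster(ips: List[str], matrix: List[List[float]],
                   min_similarity: float = 0.9) -> List[List[str]]:
    """Single-pass clustering of column vectors: each IP joins the first cluster
    whose leader (first member) has cosine similarity >= min_similarity with it,
    otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for j in range(len(ips)):
        col = column(matrix, j)
        for members in clusters:
            if cosine(col, column(matrix, members[0])) >= min_similarity:
                members.append(j)
                break
        else:
            clusters.append([j])
    return [[ips[j] for j in members] for members in clusters]


if __name__ == "__main__":
    ips, m = build_matrix({  # hypothetical IPs and cut-tail frequency vectors
        "10.0.0.1": [0.6, 0.2, 0.0, 0.2],
        "10.0.0.2": [0.5, 0.3, 0.1, 0.1],
        "10.0.0.3": [0.0, 0.1, 0.3, 0.6],
    })
    print(leader_cluster(ips, m))  # [['10.0.0.1', '10.0.0.2'], ['10.0.0.3']]
```

Leader clustering is order-dependent; k-means or hierarchical clustering over the same column vectors would be natural alternatives if a stable partition is needed.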
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.