
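Research on a personalized method for identifying sessions in Web logs
Cite this article:DONG Zhi-feng, CHEN Jun-jie, FU Yu-feng. Research on a personalized method for identifying sessions in Web logs[J]. Computer Engineering and Applications, 2008, 44(8): 179-182. DOI: 10.3778/j.issn.1002-8331.2008.08.053
Authors:DONG Zhi-feng  CHEN Jun-jie  FU Yu-feng
Affiliations:College of Computer and Software, Taiyuan University of Technology, Taiyuan 030024; Systems Department, Shanxi Provincial Network Management Center, Taiyuan 030001
Funding:Natural Science Foundation of Shanxi Province of China (Grant Nos. 2006011030, 2007011050)
Abstract:Session identification is an important step in Web log mining. In view of the existing session identification methods, this paper proposes an improved, personalized identification method for each user, derived by combining multiple parameters such as page content and download time. The method identifies sessions by checking whether the access time interval falls within the range bounded by a maximum and a minimum threshold. Page importance is determined from page content and site structure, the user's normal reading time is determined from the page's information volume, and the time at which reading starts is determined from the page download time recorded in the Web log; these factors are then combined to adjust the threshold. Experimental results show that, compared with current methods that apply a single a priori threshold to all users and pages, and with methods that dynamically adjust the threshold per user page, the proposed method determines the page access time threshold more accurately and in a personalized way, and is more reasonable and effective.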

Keywords:Web mining  session identification  preprocessing  threshold
Article ID:1002-8331(2008)08-0179-04
Received:2007-07-05
Revised:2007-10-29

Research on method for session identification in Web log mining
DONG Zhi-feng,CHEN Jun-jie,FU Yu-feng. Research on method for session identification in Web log mining[J]. Computer Engineering and Applications, 2008, 44(8): 179-182. DOI: 10.3778/j.issn.1002-8331.2008.08.053
Authors:DONG Zhi-feng  CHEN Jun-jie  FU Yu-feng
Affiliation:1. College of Computer Science and Software, Taiyuan University of Technology, Taiyuan 030024, China; 2. Network Administration Center of Shanxi Province, Taiyuan 030001, China
Abstract:Session identification is an important step in Web log mining. To improve on traditional static-threshold methods, a dynamic threshold based on multiple parameters is proposed. The parameters include Web page content, download time, and others, and the method produces an individual threshold for each user. The Web log is divided into sessions by checking whether the access interval lies between a maximum and a minimum threshold. The threshold is adjusted for each user according to page weight (based on the site's structure), normal reading time (based on the amount of page content), and the time at which reading begins (based on download time). Experiments show that, compared with the traditional method that defines a uniform threshold for all Web pages and with methods that define a different threshold for each Web page, the presented approach determines the access time threshold more accurately and is more reasonable and effective.
Keywords:Web mining  session  data preprocessing  threshold
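
Illustrative sketch (not from the paper): the abstract names the inputs to the threshold (page weight, reading time derived from the amount of content, and download time) but does not give the formula that combines them. The Python sketch below therefore assumes one simple combination, download time plus weight times estimated reading time, clamped between a minimum and a maximum threshold. The names `Hit`, `page_threshold`, `split_sessions`, `chars_per_second`, `t_min`, and `t_max`, and all default values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One page request by a single user (all fields are assumed inputs)."""
    timestamp: float      # request time, in seconds
    download_time: float  # seconds needed to deliver the page
    content_chars: int    # amount of readable content on the page
    importance: float     # page weight; the scale is hypothetical


def page_threshold(hit: Hit, chars_per_second: float,
                   t_min: float, t_max: float) -> float:
    """Assumed formula: download time + weight * estimated reading time,
    clamped to the [t_min, t_max] range. Not the paper's formula."""
    reading_time = hit.content_chars / chars_per_second
    raw = hit.download_time + hit.importance * reading_time
    return max(t_min, min(t_max, raw))


def split_sessions(hits: List[Hit], chars_per_second: float = 20.0,
                   t_min: float = 30.0, t_max: float = 1800.0) -> List[List[Hit]]:
    """Split one user's time-ordered hits into sessions.

    A new session starts when the gap to the previous hit exceeds the
    threshold computed for the previous page.
    """
    sessions: List[List[Hit]] = []
    current: List[Hit] = []
    for hit in hits:
        if current:
            prev = current[-1]
            gap = hit.timestamp - prev.timestamp
            if gap > page_threshold(prev, chars_per_second, t_min, t_max):
                sessions.append(current)
                current = []
        current.append(hit)
    if current:
        sessions.append(current)
    return sessions


# Example: a long page followed by two short pages.
example_hits = [
    Hit(timestamp=0.0, download_time=2.0, content_chars=2000, importance=1.0),
    Hit(timestamp=90.0, download_time=1.0, content_chars=200, importance=1.0),
    Hit(timestamp=200.0, download_time=1.0, content_chars=200, importance=1.0),
]
example_sessions = split_sessions(example_hits)
```

In this example the 90-second gap after the long page stays within its threshold, while the 110-second gap after the short page does not, so the same fixed gap can be treated differently depending on the page just viewed. A per-user reading speed would be passed as `chars_per_second` to make the threshold personalized in the sense the abstract describes.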
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.