
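Research on a Temporal-Constraint Browsing Pattern Mining Algorithm for Web Logs
Cite this article: NING Hui, LI Hong-yu, WU Pei-lian. Research on a temporal-constraint browsing pattern mining algorithm for Web logs[J]. Journal of Harbin Institute of Technology, 2008, 40(9): 1474-1480
Authors: NING Hui, LI Hong-yu, WU Pei-lian
Affiliations: College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; Acheng College, Harbin Normal University, Harbin 150301; College of Materials Science and Engineering, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China; Harbin Normal University Research Project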
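Abstract: To mine useful user browsing patterns effectively from massive Web logs, sequential and temporal constraints are incorporated into a fast association-rule mining algorithm, yielding FPMBTC, a browsing pattern mining algorithm based on temporal constraints. The algorithm simplifies candidate-pattern generation: it scans the database only once to obtain the set of contiguous subsequences of every transaction, computes support with set intersection and difference operations, and at the same time progressively corrects the session transaction times to obtain the valid time of each browsing pattern. Because website structure and Web logs change continually, an incremental update mining algorithm is also presented. Experimental results show that, compared with related Apriori-like algorithms, FPMBTC has a shorter running time and better scalability, and the patterns it mines carry a valid time, so it is well suited to mining Web log data that change continually and have temporal characteristics. The study offers a useful reference for learning about and researching Web mining techniques, and it has theoretical and practical value for building real Web mining systems.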

Keywords: Web log mining  frequent access patterns  valid time
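The abstract names three mechanisms: one scan that collects the contiguous subsequences of every session transaction, support derived from sets of transactions, and a pattern valid time obtained by adjusting session times. The Python sketch below illustrates those ideas under stated assumptions; it is not the authors' FPMBTC code. The session tuple layout, the `max_len` cap, the helper names `contiguous_subsequences` and `mine_patterns`, and the rule that a pattern's valid time runs from its earliest to its latest supporting session are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical session layout: (session id, ordered pages, start time, end time).
Session = Tuple[str, List[str], float, float]
Pattern = Tuple[str, ...]


def contiguous_subsequences(pages: List[str], max_len: int) -> Set[Pattern]:
    """All contiguous runs of up to max_len pages, kept in visiting order."""
    subs: Set[Pattern] = set()
    for i in range(len(pages)):
        for j in range(i + 1, min(i + max_len, len(pages)) + 1):
            subs.add(tuple(pages[i:j]))
    return subs


def mine_patterns(sessions: List[Session], min_support: float, max_len: int = 4):
    """Return {pattern: (support, (valid_from, valid_to))} after one pass over sessions."""
    tidsets: Dict[Pattern, Set[str]] = defaultdict(set)  # pattern -> supporting session ids
    valid: Dict[Pattern, Tuple[float, float]] = {}
    for sid, pages, start, end in sessions:  # the only scan of the log
        for p in contiguous_subsequences(pages, max_len):
            tidsets[p].add(sid)
            # Assumed rule: stretch the pattern's valid time to cover this session.
            lo, hi = valid.get(p, (start, end))
            valid[p] = (min(lo, start), max(hi, end))
    n = len(sessions)
    return {
        p: (len(tids) / n, valid[p])
        for p, tids in tidsets.items()
        if len(tids) / n >= min_support
    }


# Usage with a toy log (times in seconds).
logs = [
    ("s1", ["/", "/news", "/news/1"], 0, 60),
    ("s2", ["/", "/news", "/sports"], 100, 180),
    ("s3", ["/sports", "/", "/news"], 200, 290),
]
freq = mine_patterns(logs, min_support=0.6)
for pattern, (support, span) in sorted(freq.items()):
    print(pattern, round(support, 2), span)
```

Because each pattern keeps the set of session IDs that contain it, support is simply the size of that set divided by the number of sessions, and the same sets can later be intersected or differenced without rereading the log.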

An algorithm for temporal constraint browsing pattern mining in Weblogs
NING Hui, LI Hong-yu, WU Pei-lian. An algorithm for temporal constraint browsing pattern mining in Weblogs[J]. Journal of Harbin Institute of Technology, 2008, 40(9): 1474-1480
Authors: NING Hui, LI Hong-yu, WU Pei-lian
Affiliation: 1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; 2. Acheng College, Harbin Normal University, Harbin 150301, China; 3. College of Materials Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
Abstract: To mine useful browsing patterns effectively from massive Web logs, this paper adds sequential and temporal constraints to a fast association-rule mining algorithm and presents FPMBTC, a browsing pattern mining algorithm based on temporal constraints. The algorithm simplifies the generation of candidate patterns. The contiguous subsequence sets of all transactions are obtained with a single scan of the database, and support is calculated with set intersection and difference operations. At the same time, the valid time of each browsing pattern is obtained by progressively correcting the session transaction times. Building on this process, an incremental update algorithm is given to handle the continual changes in website structure and Web logs. The experimental results show that, compared with Apriori-like algorithms, the algorithm needs a shorter running time and scales better, and the patterns it mines carry a valid time. The approach suits the mining of Web logs that change continually and have temporal features, and it can serve as a useful reference for learning about and researching Web mining technology.
Keywords: Weblog mining  frequent access patterns  valid time
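The incremental update is described only at a high level in the abstract. As a hedged illustration of how keeping per-pattern session-ID sets lets support be maintained without rescanning the whole log, the Python sketch below adds new sessions and removes expired ones using set intersection and difference. The class `IncrementalIndex`, its methods, and the expiry policy are hypothetical and do not reproduce the paper's incremental algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def contiguous_subsequences(pages: List[str], max_len: int = 4) -> Set[Pattern]:
    """Ordered, contiguous page runs of one session (up to max_len pages)."""
    return {
        tuple(pages[i:j])
        for i in range(len(pages))
        for j in range(i + 1, min(i + max_len, len(pages)) + 1)
    }


class IncrementalIndex:
    """Hypothetical index: pattern -> ids of sessions containing it (ids assumed unique)."""

    def __init__(self) -> None:
        self.tidsets: Dict[Pattern, Set[str]] = defaultdict(set)
        self.sessions: Dict[str, List[str]] = {}

    def add(self, sid: str, pages: List[str]) -> None:
        # Only the new session is scanned; earlier sessions are not reread.
        self.sessions[sid] = pages
        for p in contiguous_subsequences(pages):
            self.tidsets[p].add(sid)

    def expire(self, expired: Set[str]) -> None:
        # Intersection keeps only ids that are actually indexed.
        for sid in expired & set(self.sessions):
            for p in contiguous_subsequences(self.sessions.pop(sid)):
                tids = self.tidsets.get(p)
                if tids is None:
                    continue
                tids -= expired  # set difference drops outdated sessions
                if not tids:
                    del self.tidsets[p]

    def frequent(self, min_support: float) -> Dict[Pattern, float]:
        n = len(self.sessions)
        return {p: len(t) / n for p, t in self.tidsets.items() if n and len(t) / n >= min_support}


# Usage: build an index, then slide it forward in time.
idx = IncrementalIndex()
for sid, pages in [("s1", ["/", "/news"]), ("s2", ["/", "/news", "/sports"]), ("s3", ["/sports"])]:
    idx.add(sid, pages)
before = idx.frequent(0.6)
idx.expire({"s1"})
idx.add("s4", ["/sports", "/news"])
after = idx.frequent(0.6)
print(sorted(before), sorted(after))
```

The design choice here is that only the sessions entering or leaving the window are touched, which is the general benefit an incremental algorithm aims for when site structure and logs keep changing.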