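A New Model for Web Transaction Identification and Its Frequent Path Mining
Cite this article: ZHAN Li-qiang, LIU Da-xin. A new model for Web transaction identification and its frequent path mining[J]. Journal of Harbin Engineering University, 2005, 26(6): 758-762
Authors: ZHAN Li-qiang  LIU Da-xin
Affiliation: School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China (both authors)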
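Abstract: To address the shortcomings of existing Web transaction identification models, a new model for identifying Web transactions, the IPRC model, is proposed. The model classifies web pages according to the references on the main index page and the document directory structure, and uses this classification as the basis for identifying Web transactions. On this basis, an algorithm for mining frequent access patterns, WDHP, is proposed. The algorithm inherits the basic methods of the DHP algorithm, namely filtering the candidate set with a hash tree and trimming the database, and stores the database in main memory as an access path tree so that all subsequent mining is completed in memory. This not only reduces the number of database scans but also greatly lowers the time complexity of the algorithm. Experiments show that WDHP outperforms not only the DHP algorithm but also WAP, a typical memory-based algorithm.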

Keywords: frequent access pattern  AP-tree  hash table
Article ID: 1006-7043(2005)06-0758-05
Received: 2005-09-21
Revised: 2005-09-21

A new model for identifying internet transactions and mining frequent access paths
ZHAN Li-qiang, LIU Da-xin. A new model for identifying internet transactions and mining frequent access paths[J]. Journal of Harbin Engineering University, 2005, 26(6): 758-762
Authors: ZHAN Li-qiang  LIU Da-xin
Affiliation: School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Abstract: Existing models for identifying internet transactions have shortcomings. A new model is proposed that defines page categories according to the links on an index page and the directory structure of the site, and then groups internet log data into web transactions according to those categories. Based on this model, a new algorithm for mining frequent access paths is also proposed. The algorithm inherits two techniques from direct hashing and pruning (DHP): using a hash table to filter the candidate set, and trimming the database. It stores the database in main memory as a tree and finds frequent patterns on that tree. The algorithm requires fewer database scans and has a significantly lower time complexity. Experiments show that, in execution efficiency, the algorithm outperforms both DHP and web access pattern (WAP), a main-memory-based algorithm.
Keywords: frequent access pattern  access path (AP) tree  hash table  internet transaction
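
The two ideas named in the abstract, filtering candidates with a hash table and mining on an in-memory tree, can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified illustration restricted to consecutive page pairs, and every name in it (`PathNode`, `build_path_tree`, `bucket_of`, `hash_bucket_counts`, `frequent_pairs`), the choice of CRC32 as the hash function, and the sample transactions are assumptions made for this example only.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PathNode:
    """Node of a hypothetical access path tree (a prefix tree of transactions)."""
    page: str
    count: int = 0  # number of transactions whose path passes through this node
    children: dict = field(default_factory=dict)


def build_path_tree(transactions):
    """Store each transaction as a root-to-node path; common prefixes are shared."""
    root = PathNode(page="<root>")
    for pages in transactions:
        node = root
        for page in pages:
            node = node.children.setdefault(page, PathNode(page))
            node.count += 1
    return root


def bucket_of(pair, n_buckets):
    """Deterministic bucket index for a page pair (CRC32 is an assumed choice)."""
    return zlib.crc32("->".join(pair).encode("utf-8")) % n_buckets


def hash_bucket_counts(transactions, n_buckets):
    """One scan: hash each distinct consecutive pair of a transaction into a bucket."""
    buckets = [0] * n_buckets
    for pages in transactions:
        for pair in set(zip(pages, pages[1:])):
            buckets[bucket_of(pair, n_buckets)] += 1
    return buckets


def frequent_pairs(transactions, min_support, n_buckets=7):
    """Return consecutive page pairs contained in at least min_support transactions."""
    buckets = hash_bucket_counts(transactions, n_buckets)
    root = build_path_tree(transactions)
    support = defaultdict(int)

    def walk(node, seen):
        for child in node.children.values():
            next_seen = seen
            pair = (node.page, child.page)
            if node is not root and pair not in seen:
                # A bucket count is an upper bound on the support of every pair
                # hashed into it, so a light bucket rules the pair out early.
                if buckets[bucket_of(pair, n_buckets)] >= min_support:
                    # First occurrence on this path: every transaction passing
                    # through `child` contains the pair exactly here first.
                    support[pair] += child.count
                next_seen = seen | {pair}
            walk(child, next_seen)

    walk(root, frozenset())
    return {pair: s for pair, s in support.items() if s >= min_support}


if __name__ == "__main__":
    sample = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "A", "B"]]
    print(frequent_pairs(sample, min_support=2))
```

In this sketch the transaction list is scanned once to fill the hash buckets and once to build the tree; the support counting then runs on the tree alone. Because a bucket count can only overestimate a pair's support, the hash filter never discards a pair that is actually frequent.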
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.