Found 20 similar documents (search time: 8 ms)
1.
2.
Web Data Mining   Total citations: 30, self-citations: 4, citations by others: 26
Web mining is an important branch of data mining, and it is attracting growing research interest as the Internet develops rapidly. Web mining includes (1) Web content mining; (2) Web usage mining; (3) Web structure mining. In this paper we define Web mining and present an overview of the various research issues, techniques, and development efforts.
3.
Main Problems and Coping Strategies in Developing Web Usage Mining Systems   Total citations: 6, self-citations: 0, citations by others: 6
With the rapid development of the WWW, Web usage mining, like Web mining in general, has become a hot topic in academic and industrial circles. Web usage mining is generally considered to comprise three tasks: preprocessing, knowledge discovery, and pattern analysis. Although Web usage mining remains an application of traditional data mining techniques, changes in the application environment and in the data being processed have given rise to new difficulties. This paper addresses these challenges in each of the three phases and introduces some proposed solutions.
4.
5.
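Adaptive Web Sites: Challenges and Opportunities   Total citations: 6, self-citations: 0, citations by others: 6
1 Introduction. The World Wide Web has become the main medium for disseminating, exchanging, and sharing information. As the number of Web sites worldwide grows rapidly, the volume and complexity of the information on each site are also rising quickly; sites containing thousands of pages and hyperlinks are commonplace. Because of the following factors, designing and managing data-intensive Web sites is also becoming increasingly difficult: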
6.
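Describes the overall architecture of a general-purpose Web log mining system and its design and implementation, which helps users extract knowledge from the Internet and improve site design.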
7.
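Personalized Service Sites Based on Web Log Mining   Total citations: 2, self-citations: 1, citations by others: 2
Introduces the concept of a personalized site and analyzes the architecture of a Web log mining system. Association rule mining is then applied to log transaction sessions, and an Apriori-like mining algorithm is proposed based on an analysis of the characteristics of log data. The paper proposes what it describes as the most effective method for extracting association rules from the frequent itemsets produced by the Apriori-like algorithm. It also explores, in a practical application, how to select a suitable rule from among multiple matching association rules.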
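The abstract above does not give the details of its Apriori-like algorithm, so the following is only a minimal sketch of the classic Apriori idea applied to Web log sessions: find page sets that co-occur in at least `min_support` sessions, then derive rules that meet `min_confidence`. The function names, the session data, and both thresholds are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sessions, min_support):
    """Return {frozenset(pages): session_count} for page sets seen in >= min_support sessions."""
    sessions = [frozenset(s) for s in sessions]
    # Level 1: count individual pages.
    counts = Counter(frozenset([page]) for s in sessions for page in s)
    current = {items: c for items, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        counted = {c: sum(1 for s in sessions if c <= s) for c in candidates}
        current = {c: n for c, n in counted.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples derived from frequent itemsets."""
    rules = []
    for items, support in itemsets.items():
        if len(items) < 2:
            continue
        for r in range(1, len(items)):
            for antecedent in map(frozenset, combinations(items, r)):
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, items - antecedent, confidence))
    return rules


# Hypothetical sessions: each is the set of pages one visitor requested.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news"],
    ["/home", "/sports"],
    ["/news", "/sports"],
    ["/home", "/news", "/sports"],
]
itemsets = frequent_itemsets(sessions, min_support=3)
rules = association_rules(itemsets, min_confidence=0.75)
```

When several rules match a visitor's current session, one simple policy is to prefer the rule with the highest confidence; the abstract does not say which selection policy the authors adopt.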
8.
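Web Structure Mining and Its Algorithms   Total citations: 10, self-citations: 0, citations by others: 10
With the development of networks and data mining technology, Web data mining has been studied extensively. From the perspective of Web structure mining, this paper analyzes the overall structure of the Web's directed graph, as well as navigation pages, target pages, and network functions, and on that basis studies structure mining algorithms. To address problems with Hub pages such as multiple topics, irrelevant pages, and irrelevant links, it proposes an improved version of the HITS algorithm.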
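For orientation, here is a minimal sketch of the standard HITS iteration that the abstract above sets out to improve. It is the textbook baseline, not the paper's improved algorithm, whose details the abstract does not give. The graph representation (a dict from page to outgoing links), the function name, and the iteration count are assumptions.

```python
import math


def hits(graph, iterations=50):
    """Basic HITS. graph maps each page to the list of pages it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking to each page.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum the authority scores of the pages each page links to.
        hub = {n: sum(auth[t] for t in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical three-page graph: "a" and "b" both link to "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In this toy graph, "c" ends up as the only authority and "a" and "b" as equal hubs. The improvements the abstract mentions (handling multi-topic Hub pages and irrelevant pages or links) would change how links are weighted or filtered before this iteration runs.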
9.
10.
Maria Carla Calzarossa, Daniele Tessera 《Journal of Systems and Software》2008,81(12):2336-2344
The Web has become a ubiquitous tool for distributing knowledge and information and for conducting business. To exploit the huge potential of the Web as a global information repository, it is necessary to understand its dynamics. These issues are particularly important for news Web sites, as they are expected to provide fresh information on current world events to a potentially large user population. This paper presents an experimental study aimed at characterizing and modeling the evolution of a news Web site. We focused on the MSNBC Web site as it is a good representative of its category in terms of structure, news coverage and popularity. Specifically, we analyzed how often and to what extent the content of this site changed, and we identified models describing its dynamics. The study has shown that the rate of page creations and updates was characterized by some well-defined patterns that varied as a function of time of day and day of week. In contrast, the content of individual pages changed to varying extents. Most updates involved a very small fraction of a page's content, whereas very few were more extensive and spread over the whole page. By taking all these aspects into account, we derived analytical models able to accurately capture and reproduce the evolution of the news Web site.
11.
Zhong Su Qiang Yang Hongjiang Zhang Xiaowei Xu Yu-Hen Hu Shaoping Ma 《Knowledge and Information Systems》2002,4(2):151-167
A great challenge for web site designers is how to ensure users' easy access to important web pages efficiently. In this
paper we present a clustering-based approach to address this problem. Our approach to this challenge is to perform efficient
and effective correlation analysis based on web logs and construct clusters of web pages to reflect the co-visit behavior
of web site users. We present a novel approach for adapting previous clustering algorithms that are designed for databases
in the problem domain of web page clustering, and show that our new methods can generate high-quality clusters for very large
web logs when previous methods fail. Based on the high-quality clustering results, we then apply the data-mined clustering
knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web
interface adaptation that introduces index pages minimizing overall user browsing costs. The index pages are aimed at providing
short cuts for users to ensure that users get to their objective web pages fast, and we solve a previously open problem of
how to determine an optimal number of index pages. We empirically show that our approach performs better than many of the
previous algorithms based on experiments on several realistic web log files.
Received 25 November 2000 / Revised 15 March 2001 / Accepted in revised form 14 May 2001
12.
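With the development of Internet technology, Web pages have become an effective way for people to obtain information, and Web data mining has gradually become a research hotspot. Given the shortcomings of the PageRank algorithm, which is based on Web structure mining, an improved algorithm is proposed. Experimental results show that the improved algorithm performs better than the original and has practical value.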
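As background for the abstract above, the following is a minimal sketch of the original PageRank computation by power iteration. It is the baseline that the paper claims to improve on, not the improved algorithm, which the abstract does not describe. The damping factor of 0.85, the even redistribution of rank from pages without outgoing links, and the sample graph are assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Basic PageRank. graph maps each page to the list of pages it links to."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this page's damped rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its damped rank evenly over all pages.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

The scores sum to 1, and page "c", which receives links from both "a" and "b", ranks above "b".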
13.
14.
15.
16.
伍粤山 《数字社区&智能家居》2006,(6):27-28
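The Web holds massive amounts of data, and applying these data in sophisticated ways has become a research hotspot in database technology today. This paper briefly introduces the basic concepts of data mining, the steps of Web data mining, the current state and development of research in the three research areas of Web data mining, and commonly used Web data mining tools, in the hope of stimulating further discussion.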
17.
A Preliminary Study of Web Data Mining   Total citations: 1, self-citations: 0, citations by others: 1
伍粤山 《数字社区&智能家居》2006,(17)
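The Web holds massive amounts of data, and applying these data in sophisticated ways has become a research hotspot in database technology today. This paper briefly introduces the basic concepts of data mining, the steps of Web data mining, the current state and development of research in the three research areas of Web data mining, and commonly used Web data mining tools, in the hope of stimulating further discussion.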
18.
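Design and Implementation of a P2P-Based Personalized Web Search System   Total citations: 1, self-citations: 0, citations by others: 1
To address the problems of centralized Web search systems in coverage, timeliness, personalization, and scalability, this paper proposes PeerBridge, a scalable, personalized Web search system based on Peer-to-Peer (P2P) technology. PeerBridge uses a distributed hash table to organize a large number of network nodes into a structured P2P overlay network. Each peer acts as a topic-specific search engine that searches the Web for information related to a particular topic according to user interests, and peers with similar topics are grouped into topic-based peer clusters that cooperate in Web search and information sharing. Mechanisms such as topic-driven Web crawling, semantic-concept-based document classification, personalized link analysis, and topic-partitioned P2P search are used to improve PeerBridge's performance.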
19.
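Web usage mining has been a research hotspot in Web data mining in recent years. Traditional genetic algorithms for extracting association rules often use fixed chromosome crossover and mutation probabilities, which makes them prone to premature convergence and slow convergence. To address this, an improved genetic algorithm is proposed. In addition, a threshold on user page interest is added to association rule extraction, and the method is successfully applied to mining the server logs of a commercial Web site. Experiments show that the improved genetic algorithm effectively avoids premature convergence.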