Similar Documents
20 similar documents found
1.
Exploiting Regularities in Web Traffic Patterns for Cache Replacement   (Cited: 2, self: 0, others: 2)
Cohen, Kaplan. Algorithmica, 2002, 33(3): 300-334
Abstract. Caching web pages at proxies and in web servers' memories can greatly enhance performance. Proxy caching is known to reduce network load, and both proxy and server caching can significantly decrease latency. Web caching problems have different properties than traditional operating-systems caching, and cache replacement can benefit from recognizing and exploiting these differences. We address two aspects of the predictability of traffic patterns: the overall load experienced by large proxy and web servers, and the distinct access patterns of individual pages. We formalize the notion of "cache load" under various replacement policies, including LRU and LFU, and demonstrate that the trace of a large proxy server exhibits regular load. Predictable load allows for improved design, analysis, and experimental evaluation of replacement policies. We provide a simple and (near-)optimal replacement policy when each page request has an associated distribution function on the next request time of the page. Without the predictable-load assumption, no such online policy is possible, and it is known that even obtaining an offline optimum is hard. For experiments, predictable load enables comparing and evaluating cache replacement policies using partial traces, containing requests made to only a subset of the pages. Our results are based on a simpler caching model which we call the interval caching model. We relate traditional and interval caching policies under predictable load, and derive (near-)optimal replacement policies from their optimal interval-caching counterparts.
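As a hedged illustration of the replacement idea (not the paper's exact algorithm), the sketch below summarizes each page's next-request-time distribution by the median of observed inter-request gaps, an assumption of this sketch, and evicts the page predicted to be requested furthest in the future, in the spirit of Belady's rule.

```python
import statistics

class PredictiveCache:
    """Belady-style heuristic: evict the page whose predicted next
    request lies furthest in the future (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.predicted = {}  # page_id -> predicted next-request time

    def access(self, page_id, now, inter_request_gaps):
        # Summarize the page's next-request-time distribution by its median.
        self.predicted[page_id] = now + statistics.median(inter_request_gaps)
        if len(self.predicted) > self.capacity:
            victim = max(self.predicted, key=self.predicted.get)
            del self.predicted[victim]
```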

2.
An Implementation of Security Design for Web Database Applications   (Cited: 8, self: 0, others: 8)
Combining Web applications with databases makes it possible to create dynamic pages and thus build powerful commercial web sites. However, because the HTTP protocol is stateless, each Web page stands alone: pages have no enforced causal relationship between them, and a user can skip a page entirely by navigating straight to another page's URL, which poses a serious security problem for data access in Web applications. This paper discusses the problem and presents one solution: a page may be served only after a permission-verification page has approved the access. The approach is implemented in code.
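A minimal sketch of the verification-gate idea, assuming Flask purely for illustration (the paper does not name an implementation): every request to a protected path is rejected unless the session has already passed the permission-verification page.

```python
from flask import Flask, redirect, request, session

app = Flask(__name__)
app.secret_key = "change-me"  # required for session support

@app.before_request
def require_verification():
    # Block direct URL jumps: only sessions that passed the
    # verification page carry the 'authorized' flag.
    if request.path.startswith("/app/") and not session.get("authorized"):
        return redirect("/verify")

@app.route("/verify")
def verify():
    session["authorized"] = True  # real permission checks would go here
    return "verified"
```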

3.
Web site navigation is an important research area in Web data mining and is key to accurately understanding users' site-visiting behavior. Traditional site-navigation techniques can hardly capture the full degree of a user's interest in individual pages, so they find paths to pages of interest with low accuracy. To raise that accuracy, this paper proposes a Web site navigation technique based on the ant colony algorithm: network users are treated as artificial ants and their browsing interest as pheromone, and a site navigation model is built from Web log data using a positive/negative feedback mechanism and a probabilistic path-selection rule to mine navigation paths to the pages users are interested in. Simulation results show that the technique improves the accuracy of finding such paths, reflects users' browsing interest more faithfully, and is feasible for Web site navigation.
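A toy sketch of the ant-colony step described above, with browsing interest playing the role of pheromone; the probability rule and feedback constants are illustrative assumptions, not the paper's parameters.

```python
import random

def choose_next(page, links, pheromone, interest, alpha=1.0, beta=1.0):
    # Probabilistic path selection: weight each outgoing link by
    # pheromone^alpha * interest^beta.
    weights = [(pheromone.get((page, nxt), 0.1) ** alpha) *
               (interest.get(nxt, 1.0) ** beta) for nxt in links]
    return random.choices(links, weights=weights, k=1)[0]

def reinforce(pheromone, path, reward, rho=0.1):
    # Negative feedback: evaporate all trails; positive feedback:
    # deposit reward along the traversed path.
    for edge in pheromone:
        pheromone[edge] *= (1.0 - rho)
    for a, b in zip(path, path[1:]):
        pheromone[(a, b)] = pheromone.get((a, b), 0.0) + reward
```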

4.
Web prefetching is an attractive solution to reduce the network resources consumed by Web services as well as the access latencies perceived by Web users. Unlike Web caching, which exploits temporal locality, Web prefetching utilizes the spatial locality of Web objects. Specifically, Web prefetching fetches objects that are likely to be accessed in the near future and stores them in advance. In this context, a sophisticated combination of these two techniques can significantly improve the performance of the Web infrastructure. Given the several caching policies proposed in the past, the challenge is to extend them by using data mining techniques. In this paper, we present a clustering-based prefetching scheme where a graph-based clustering algorithm identifies clusters of "correlated" Web pages based on the users' access patterns. This scheme can be integrated easily into a Web proxy server, improving its performance. Through a simulation environment, using a real data set, we show that the proposed integrated framework is robust and effective in improving the performance of the Web caching environment.
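A rough sketch of identifying "correlated" page clusters from access patterns, approximating the paper's graph-based clustering with connected components over a co-occurrence threshold (an assumption of this sketch); members of a requested page's cluster would be the prefetch candidates.

```python
from collections import defaultdict

def cluster_pages(sessions, min_cooccur=2):
    # Count how often each pair of pages appears in the same session.
    edges = defaultdict(int)
    for s in sessions:
        pages = sorted(set(s))
        for i, a in enumerate(pages):
            for b in pages[i + 1:]:
                edges[(a, b)] += 1
    # Keep only sufficiently frequent pairs, then take connected components.
    graph = defaultdict(set)
    for (a, b), n in edges.items():
        if n >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(graph[v] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```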

5.
A New Web-Log-Based Method for Mining Users' Preferred Browsing Paths   (Cited: 2, self: 0, others: 2)
任永功, 付玉, 张亮, 吕君义. 《计算机科学》, 2008, 35(10): 192-196
This paper proposes a new Web-log-based method for mining users' preferred browsing paths. The method first builds, on top of a cell-array storage structure (the storage matrix), a session matrix and a path matrix whose basic elements are browsing-interest degrees. It then clusters pages on the session matrix, using the cosine of the angle between two page vectors as the page-distance measure for similar users, to obtain the related page set of those users. Finally, it uses a path-preference degree to mine the preferred browsing paths of similar users from their path matrix. Experiments show that the method is sound and effective and yields more accurate preferred paths.
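The page-distance measure is the cosine of the angle between two page vectors taken from the session matrix; a minimal worked example (with made-up interest degrees):

```python
import math

def cosine(u, v):
    # Cosine of the angle between two page vectors (columns of the
    # session matrix), used as the page-distance measure for clustering.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Example: interest degrees of two pages across three sessions.
page_a = [0.8, 0.0, 0.5]
page_b = [0.6, 0.1, 0.4]
print(cosine(page_a, page_b))
```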

6.
Research on Acquiring and Forecasting Market Quotation Data on the Web   (Cited: 1, self: 0, others: 1)
Extracting market quotation data from web pages for forecasting and analysis is of real value. This paper proposes an algorithm for extracting quotation data from the Web, built mainly on the empirical observation that quotation data usually appear as the largest data table on a page. The algorithm first identifies the largest data table automatically, then converts it into a DOM tree, and finally extracts the values of the tree's nodes. Unlike traditional algorithms, it extracts the quotation region automatically, without requiring the user to define the extraction region. A prototype system for forecasting agricultural-product prices was designed: for a given product, it automatically retrieves price data from specific web sites and forecasts monthly prices; experiments show good forecasting performance.
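A hedged sketch of the "largest data table" heuristic, assuming BeautifulSoup for parsing (the paper does not specify a library): pick the table with the most cells and read its values from the parsed tree.

```python
from bs4 import BeautifulSoup

def extract_largest_table(html):
    # Find all tables, pick the one with the most cells, and return
    # its cell values row by row from the parsed (DOM-like) tree.
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if not tables:
        return []
    largest = max(tables, key=lambda t: len(t.find_all(["td", "th"])))
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in largest.find_all("tr")
    ]
```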

7.
Existing Web caches are mostly built on traditional memory-caching algorithms, but because Web requests are heterogeneous, those traditional replacement algorithms do not work well in the Web environment. This paper examines the basis for Web cache replacement decisions, analyzes the shortcomings of previous replacement algorithms, and, taking into account the influence of document size, fetch cost, access frequency, degree of user interest, and time of last access on replacement, introduces the concept of a cached Web object's role and builds a new high-precision role-based Web cache replacement algorithm (ORB). Using proxy-server traces from NASA and DEC, the algorithm was compared in simulation with LRU, LFU, SIZE, and Hybrid; the results demonstrate that the ORB algorithm…
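An illustrative scoring rule in the spirit of ORB, combining the five factors listed above; the weights and functional form are assumptions of this sketch, not the paper's formula.

```python
def orb_score(doc, now, w_cost=1.0, w_freq=1.0, w_int=1.0, w_rec=1.0):
    # Higher score = more worth keeping. Recency decays with time
    # since the last access; size penalizes large documents.
    recency = 1.0 / (1.0 + now - doc["last_access"])
    return (w_cost * doc["cost"] +
            w_freq * doc["frequency"] +
            w_int * doc["interest"] +
            w_rec * recency) / doc["size"]

def choose_victim(cache, now):
    # Evict the cached document with the lowest score.
    return min(cache, key=lambda d: orb_score(d, now))
```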

8.
For a web site with tens of thousands of visits per day or more, page-serving speed becomes a system bottleneck. Optimizing the content-publishing system and converting pages that are not updated in real time into static pages can improve browsing speed markedly. This paper proposes a three-level caching mechanism based on Java and XML technology on top of an MVC2 architecture, caching static pages, components, and data objects separately and refreshing the cache according to hit counts; it effectively reduces the communication volume between the tiers of a multi-layer architecture and improves Web response performance.
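A minimal sketch of the three-level lookup (static page, component, data object) with hit counting to drive refreshes; written in Python for consistency with the other sketches here, although the paper's system is Java-based, and all names are illustrative.

```python
class TieredCache:
    """Illustrative three-level cache: static pages, components, data objects."""

    def __init__(self):
        self.levels = {"page": {}, "component": {}, "data": {}}
        self.hits = {}

    def get(self, level, key, build):
        store = self.levels[level]
        if key in store:
            self.hits[(level, key)] = self.hits.get((level, key), 0) + 1
            return store[key]
        value = build()  # fall through to the slower source
        store[key] = value
        return value

    def refresh_hot(self, level, key, build, threshold=100):
        # Regenerate frequently hit entries (e.g., re-render a static page).
        if self.hits.get((level, key), 0) >= threshold:
            self.levels[level][key] = build()
            self.hits[(level, key)] = 0
```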

9.
Computer Networks, 1999, 31(11-16): 1725-1736
The World-Wide Web provides remote access to pages using its own naming scheme (URLs), transfer protocol (HTTP), and cache algorithms. Using these special-purpose mechanisms not only has performance implications; it also makes it impossible for standard Unix applications to access the Web. Gecko is a system that provides access to the Web via the NFS protocol. URLs are mapped to Unix file names, giving unmodified applications access to Web pages; pages are transferred from the Gecko server to the clients using NFS instead of HTTP, significantly improving performance; and NFS's cache consistency mechanism ensures that all clients have the same version of a page. Applications access pages as they would Unix files. A client-side proxy translates HTTP requests into file accesses, allowing existing Web applications to use Gecko. Experiments performed on our prototype show that Gecko is able to provide this additional functionality at a performance level that exceeds that of HTTP.
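A toy sketch of the URL-to-filename mapping idea; the actual Gecko naming scheme is not reproduced here, and the layout below is purely illustrative.

```python
from urllib.parse import urlparse

def url_to_path(url, mount="/gecko"):
    # Illustrative layout: http://host/a/b?q=1 -> /gecko/host/a/b%3Fq=1,
    # letting unmodified applications open Web pages as files.
    parts = urlparse(url)
    path = parts.path or "/index"
    if parts.query:
        path += "%3F" + parts.query
    return f"{mount}/{parts.netloc}{path}"

print(url_to_path("http://example.com/docs/page.html"))
```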

10.
Asynchronous Web pages, which can change some of their content without reloading the whole page, have gained popularity recently. Although asynchronous pages are more difficult to store and access and complicate the browser's "Back" button operations, they support more flexible content presentation and thus enhance page performance. Ajax is a widely used technique to create such Web pages. The authors look at frameworks that help one work with Ajax.

11.
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit for search and retrieval is an individual page, even though information on a topic is often spread across multiple pages. This degrades the quality of search results, especially for long or uncorrelated (multitopic) queries (in which individual keywords rarely occur together in the same document), where a single page is unlikely to satisfy the user's information need. We propose a technique that, given a keyword query, on the fly generates new pages, called composed pages, which contain all query keywords. The composed pages are generated by extracting and stitching together relevant pieces from hyperlinked Web pages and retaining links to the original Web pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms to efficiently generate the top composed pages. The quality of our method is compared to current approaches by using user surveys. Finally, we also show how our techniques can be used to perform query-specific summarization of Web pages.
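A greedy sketch of composing a page from fragments that jointly cover all query keywords, each fragment retaining a link to its source page; the paper's link-structure and keyword-association ranking is omitted here.

```python
def compose_page(query_keywords, fragments):
    # fragments: (url, text) pieces extracted from hyperlinked pages.
    remaining = set(query_keywords)
    chosen = []
    # Greedily prefer fragments that cover many query keywords up front.
    ranked = sorted(fragments,
                    key=lambda f: len(set(query_keywords) & set(f[1].split())),
                    reverse=True)
    for url, text in ranked:
        covered = remaining & set(text.split())
        if covered:
            chosen.append((url, text))  # keep the link to the original page
            remaining -= covered
        if not remaining:
            break
    return chosen
```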

12.
Web-log mining for predictive Web caching   (Cited: 3, self: 0, others: 3)
Caching is a well-known strategy for improving the performance of Web-based systems. The heart of a caching system is its page replacement policy, which selects the pages to be replaced in a cache when a request arrives. In this paper, we present a Web-log mining method for caching Web objects and use this algorithm to enhance the performance of Web caching systems. In our approach, we develop an n-gram-based prediction algorithm that can predict future Web requests. The prediction model is then used to extend the well-known GDSF caching policy. We empirically show that the system performance is improved using the predictive-caching approach.
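A sketch of the two pieces: an order-n (here n = 2) n-gram model over request sequences, and a GDSF priority with a bonus for predicted objects; the bonus term is an assumption of this sketch, not the paper's exact extension.

```python
from collections import Counter, defaultdict

class NGramPredictor:
    """Predict the next request from the last n requests."""

    def __init__(self):
        self.model = defaultdict(Counter)

    def train(self, session, n=2):
        for i in range(len(session) - n):
            context = tuple(session[i:i + n])
            self.model[context][session[i + n]] += 1

    def predict(self, context):
        counts = self.model.get(tuple(context))
        return counts.most_common(1)[0][0] if counts else None

def gdsf_priority(clock, freq, cost, size, predicted_bonus=0.0):
    # Classic GDSF: clock + frequency * cost / size; predicted objects
    # receive an extra bonus so they survive longer in the cache.
    return clock + freq * cost / size + predicted_bonus
```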

13.
周文刚, 马占欣. 《微机发展》, 2007, 17(4): 120-124
Necessary and effective content filtering of Web pages is important for a healthy and safe network environment, and replaying the content of Web pages that users have successfully visited allows after-the-fact monitoring of network access and supplies data for improving the filtering mechanism. This paper analyzes the Web page access flow and, building on an HTTP proxy server, implements keyword filtering and semantics-based content filtering of Web pages at the application layer; by storing the pages clients have successfully visited on the proxy server's disk, it achieves content replay. Experiments show that semantic filtering distinguishes differing viewpoints in text well, with markedly higher accuracy than keyword filtering alone.
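A minimal sketch of the keyword-filtering and replay-storage pieces (the semantic filtering is not reproduced); the keyword list and storage path are hypothetical.

```python
import hashlib
import os

BLOCKED_KEYWORDS = {"banned-word"}  # hypothetical keyword list

def allow_page(text):
    # Application-layer keyword filter: reject pages containing
    # any blocked keyword.
    return not (set(text.lower().split()) & BLOCKED_KEYWORDS)

def archive_for_replay(url, body, store_dir="replay-cache"):
    # Persist pages successfully served to clients so that visits
    # can be replayed for after-the-fact auditing.
    os.makedirs(store_dir, exist_ok=True)
    name = hashlib.sha1(url.encode()).hexdigest()
    with open(os.path.join(store_dir, name), "wb") as f:
        f.write(body)
```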

14.
Web Log Mining Combining Web Site Content and Structure   (Cited: 7, self: 1, others: 7)
This paper proposes a clustering-based Web log mining method: page sets are clustered along three different dimensions (Web log transactions, Web site content, and Web site structure), and pages likely to interest a user are predicted by matching the user's access record against the page clusters and computing their relevance.
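A small sketch of the matching step: score each page cluster by its overlap with the user's access record and recommend unvisited pages from the best match; the relevance computation here is an illustrative simplification.

```python
def recommend(visited, clusters, top_k=3):
    # visited: set of pages in the user's access record;
    # clusters: list of page sets produced by the clustering step.
    def overlap(cluster):
        return len(cluster & visited) / len(cluster)
    best = max(clusters, key=overlap)
    return list(best - visited)[:top_k]
```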

15.
Proxy caches are essential to improve the performance of the World Wide Web and to reduce user-perceived latency. Appropriate cache management strategies are crucial to achieve these goals. In previous work, we introduced Web object-based caching policies. A Web object consists of the main HTML page and all of its constituent embedded files. Our studies have shown that these policies improve proxy cache performance substantially. In this paper, we propose a new Web object-based policy to manage the storage system of a proxy cache. We propose two techniques to improve the storage system performance. The first technique prefetches the related files belonging to a Web object from the disk to main memory. This prefetching improves performance, as most of the files can be provided from main memory rather than from the proxy disk. The second technique stores the Web object members in contiguous disk blocks in order to reduce disk access time. We used trace-driven simulations to study the performance improvements one can obtain with these two techniques. Our results show that the first technique by itself provides up to a 50% reduction in hit latency, the delay involved in providing a hit document by the proxy. An additional 5% improvement can be obtained by incorporating the second technique.
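A hedged sketch of the first technique: when a Web object's main page is requested, its member files are read from disk into memory ahead of use; the file layout and names below are illustrative.

```python
import os

def prefetch_object(cache_dir, main_page, members, memory_cache):
    # On a request for the main HTML page, pull the whole Web object
    # (page plus embedded files) from disk into the in-memory cache.
    for name in [main_page] + members:
        path = os.path.join(cache_dir, name)
        if name not in memory_cache and os.path.exists(path):
            with open(path, "rb") as f:
                memory_cache[name] = f.read()
```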

16.
On traditional Web sites, page layout is usually arranged by the site's builders and rarely changes. To serve network users better, this paper proposes cleaning Web log data to identify the pages each user visits within a session and then, from the logical relationships among page contents and the pages users visit frequently, deriving the user's interest-degree matrix for page content and for each sub-item. Users are classified hierarchically according to the interest-degree matrix, obtaining for each…

17.
Research on the Access-Characteristic Distributions of WWW Traffic   (Cited: 8, self: 0, others: 8)
WWW traffic takes the form of sequences of accesses, and the logs of Web servers and proxy servers record the course and characteristics of these access sequences well. Characterizing WWW traffic is the foundation for research on Web servers and Web middleware and for synthesizing Web workloads. This paper analyzes the logs of one Web server and two proxy servers, focusing on the probability distributions of Web page requests, of static Web document sizes (including transferred documents), and of the access distance of static Web documents; the results are compared with those in the related literature, and experiments confirm that when document size is used as the basis for Web cache replacement, document access frequency should also be considered.
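As a worked example of the kind of distribution check involved, the sketch below estimates the rank-frequency slope of page requests on a log-log scale; a slope near -1 is consistent with Zipf-like popularity (the paper's exact fitting procedure is not reproduced).

```python
import math
from collections import Counter

def zipf_slope(requests):
    # Least-squares slope of log(frequency) vs. log(rank).
    counts = sorted(Counter(requests).values(), reverse=True)
    if len(counts) < 2:
        return 0.0
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var
```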

18.
Computer Networks, 1999, 31(11-16): 1231-1243
This work focuses on characterizing information about Web resources and server responses that is relevant to Web caching. The approach is to study a set of URLs at a variety of sites and gather statistics about the rate and nature of changes compared with the resource type. In addition, we gather response header information reported by the servers with each retrieved resource. Results from the work indicate that there is potential to reuse more cached resources than is currently being realized, due to inaccurate or nonexistent cache directives. In terms of implications for caching, the relationships between resources used to compose a page must be considered. Embedded images are often reused, even in pages that change frequently. This result points both to the need to cache such images and to the need to discard them when they are no longer included as part of any page. Finally, while the results show that HTML resources frequently change, these changes can be in a predictable and localized manner. Separating out the dynamic portions of a page into their own resources allows the relatively static portions to be cached, while retrieval of the dynamic resources can trigger retrieval of new resources along with any invalidation of already cached resources.

19.
Predictive Prefetching on the Web and Its Potential Impact in the Wide Area   (Cited: 2, self: 0, others: 2)
The rapid increase of World Wide Web users and the development of services with high bandwidth requirements have caused a substantial increase in response times for users on the Internet. Web latency would be significantly reduced if browser, proxy, or Web server software could predict the pages that a user is most likely to request next, while the user is viewing the current page, and prefetch their content. In this paper we study predictive prefetching on a totally new Web system architecture: a system that provides two levels of caching before information reaches the clients. This work analyses prefetching on a Wide Area Network with the above-mentioned characteristics. We first provide a structured overview of predictive prefetching and show its wide applicability to various computer systems. The WAN that we refer to is the GRNET academic network in Greece. We rely on log files collected at the network's transparent cache (the primary caching point), located at GRNET's edge connection to the Internet. We present the parameters that are most important for prefetching on GRNET's architecture and provide preliminary results of an experimental study quantifying the benefits of prefetching on the WAN. Our experimental study includes the evaluation of two prediction algorithms: an n-most-popular-document algorithm and a variation of the PPM (Prediction by Partial Matching) prediction algorithm. Our analysis clearly shows that predictive prefetching can improve Web response times inside the GRNET WAN without a substantial increase in network traffic due to prefetching.
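Sketches of the two evaluated predictors, simplified to order 1 for the PPM-style model (the paper's PPM variant uses longer contexts); names are illustrative.

```python
from collections import Counter, defaultdict

def n_most_popular(log, n=10):
    # "n most popular documents" baseline: prefetch the globally
    # most requested pages.
    return [page for page, _ in Counter(log).most_common(n)]

class Order1Predictor:
    """Order-1 stand-in for the PPM-style predictor."""

    def __init__(self):
        self.successors = defaultdict(Counter)

    def train(self, session):
        for a, b in zip(session, session[1:]):
            self.successors[a][b] += 1

    def predict(self, current, k=3):
        # Most likely next pages given the current page.
        return [p for p, _ in self.successors[current].most_common(k)]
```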

20.
Hawking, D. Computer, 2006, 39(6): 86-88
In this article, we go behind the scenes and explain how this data processing "miracle" is possible. We focus on whole-of-Web search but note that enterprise search tools and portal search interfaces use many of the same data structures and algorithms. Search engines cannot and should not index every page on the Web. After all, thanks to dynamic Web page generators such as automatic calendars, the number of pages is infinite. To provide a useful and cost-effective service, search engines must reject as much low-value automated content as possible. In addition, they can ignore huge volumes of Web-accessible data, such as ocean temperatures and astrophysical observations, without harm to search effectiveness. Finally, Web search engines have no access to restricted content, such as pages on corporate intranets. What follows is not an inside view of any particular commercial engine - whose precise details are jealously guarded secrets - but a characterization of the problems that whole-of-Web search services face and an explanation of the techniques available to solve these problems.
