Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
In this paper, we address the problem of minimizing the cost of transferring a document or a file requested by a set of users geographically separated on a network of nodes. We concentrate on theoretical aspects of data migration and caching on high-speed networks. Following the information caching paradigm introduced in the literature, we present polynomial time optimal caching strategies that minimize the total monetary cost of all the service requests by the users on a high-speed network. We consider a scenario in which a large pool of customers from one or more remote sites on a network demand a document, situated at some site, for their use. We also assume that the users can request the document at different time instants. This process of distributing the requested document incurs communication costs due to the use of communication resources and caching costs of the document at some server sites before it is delivered to the users at their desired time instants. We configure the network as a fully connected topology in which the service providers manage and control the distribution of the requested document among the users. For a high-speed network, we show that a single copy of the requested document is sufficient to serve all the user requests in an optimal manner. We extend the study to a homogeneous case in which the communication costs are identical and caching costs at all the sites are identical. In this case, we demonstrate the adaptability of the algorithm in generating more than one copy when needed by the minimization process. Using these strategies, the network service providers can decide when, where, and for how long the requested documents must be cached at vantage sites to obtain an optimal solution. Illustrative examples are provided to aid understanding.

2.
Cohen et al. [5] recently initiated the theoretical study of connection caching in the world-wide web. They extensively studied uniform connection caching, where the establishment cost is uniform for all connections [5], [6]. They showed that ordinary paging algorithms can be used to derive algorithms for uniform connection caching and analyzed various algorithms such as Belady's rule, LRU and Marking strategies. In particular, in [5] Cohen et al. showed that LRU yields a (2k-1)-competitive algorithm, where k is the size of the largest cache in the network. In [6] they investigated Marking algorithms with different types of communication among nodes and presented deterministic k-competitive algorithms. In this paper we study generalized connection caching, also introduced in [5], where connections can incur varying establishment costs. This model is reasonable because the cost of establishing a connection depends, for instance, on the distance of the nodes to be connected and on the congestion in the network. Algorithms for ordinary weighted caching can be used to derive algorithms for generalized connection caching. We present tight or nearly tight analyses of the performance achieved by the currently known weighted caching algorithms when applied to generalized connection caching. In particular we give online algorithms that achieve an optimal competitive ratio of k. Our deterministic algorithm uses extra communication while maintaining open connections. We develop a generalized algorithm that trades communication for performance and achieves a competitive ratio of (1+ε)k, for any 0 < ε ≤ 1, using a bounded number of bits of communication on each open link. Additionally we consider two extensions of generalized connection caching where (1) connections have time-out values, or (2) the establishment cost of connections is asymmetric. We show that the performance ratio of our algorithms can be preserved in scenario (1). In the case of (2) we derive nearly tight upper and lower bounds on the best possible competitiveness.
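The LRU policy behind the (2k-1)-competitive bound cited above can be sketched in a few lines. This is a generic illustration (the class and method names are our own), not the paper's communication-augmented algorithm:

```python
from collections import OrderedDict

class LRUConnectionCache:
    """Toy LRU cache of open connections: on a miss at full capacity,
    the least recently used connection is closed (evicted)."""

    def __init__(self, k):
        self.k = k                 # cache capacity
        self.open = OrderedDict()  # connection id -> establishment cost

    def request(self, conn, cost=1):
        """Return True on a cache hit, False if the connection had to be
        (re-)established, possibly evicting the LRU entry."""
        if conn in self.open:
            self.open.move_to_end(conn)    # mark as most recently used
            return True
        if len(self.open) >= self.k:
            self.open.popitem(last=False)  # evict least recently used
        self.open[conn] = cost
        return False

cache = LRUConnectionCache(k=2)
hits = [cache.request(c) for c in ["a", "b", "a", "c", "b"]]
# "a" and "b" miss; "a" hits; "c" evicts "b"; "b" misses again
```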

3.
Computer Networks, 2007, 51(13): 3673-3692
Network congestion remains one of the main barriers to the continuing success of the Internet. For Web users, congestion manifests itself in unacceptably long response times. One possible remedy to the latency problem is to use caching at the client, at the proxy server, or within the Internet. However, Web documents are becoming increasingly dynamic (i.e., have short lifetimes), which limits the potential benefit of caching. The performance of a Web caching system can be dramatically increased by integrating document prefetching (a.k.a. "proactive caching") into its design. Although prefetching reduces the response time of a requested document, it also increases the network load, as some documents will be unnecessarily prefetched (due to the imprecision in the prediction algorithm). In this study, we analyze the confluence of the two effects through a tractable mathematical model that enables us to establish the conditions under which prefetching reduces the average response time of a requested document. The model accommodates both passive client and proxy caching along with prefetching. Our analysis is used to dynamically compute the "optimal" number of documents to prefetch in the subsequent client's idle (think) period. In general, this optimal number is determined through a simple numerical procedure. Closed-form expressions for this optimal number are obtained for special yet important cases. We discuss how our analytical results can be used to optimally adapt the parameters of an actual prefetching system. Simulations are used to validate our analysis and study the interactions among various system parameters.
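As a toy illustration of the trade-off analyzed above (not the paper's actual model), a greedy rule can decide how many predicted documents are worth prefetching; `hit_saving` and `miss_penalty` are assumed per-document latency and wasted-bandwidth costs:

```python
def optimal_prefetch_count(probs, hit_saving, miss_penalty):
    """Prefetch candidates in decreasing predicted-access probability p,
    and keep prefetching while the expected latency saved, p * hit_saving,
    exceeds the expected wasted-bandwidth penalty, (1 - p) * miss_penalty."""
    n = 0
    for p in sorted(probs, reverse=True):
        if p * hit_saving > (1 - p) * miss_penalty:
            n += 1
        else:
            break
    return n

# With 100 ms saved per correct prefetch and a 50 ms penalty per wasted
# one, only predictions with p > 1/3 are worth prefetching.
n = optimal_prefetch_count([0.9, 0.5, 0.2, 0.1], hit_saving=100, miss_penalty=50)
```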

4.
Increasingly, HTML documents are dynamically generated by interactive Web services. To ensure that the client is presented with the newest versions of such documents it is customary to disable client caching causing a seemingly inevitable performance penalty. In the system, dynamic HTML documents are composed of higher-order templates that are plugged together to construct complete documents. We show how to exploit this feature to provide an automatic fine-grained caching of document templates, based on the service source code. A service transmits not the full HTML document but instead a compact JavaScript recipe for a client-side construction of the document based on a static collection of fragments that can be cached by the browser in the usual manner. We compare our approach with related techniques and demonstrate on a number of realistic benchmarks that the size of the transmitted data and the latency may be reduced significantly.

5.
Building a large and efficient hybrid peer-to-peer Internet caching system
Proxy hit ratios tend to decrease as the demand and supply of Web contents are becoming more diverse. By case studies, we quantitatively confirm this trend and observe significant document duplications among a proxy and its client browsers' caches. One reason behind this trend is that the client/server Web caching model does not support direct resource sharing among clients, causing the Web contents and the network bandwidths among clients to be relatively underutilized. To address these limits and improve Web caching performance, we have extensively enhanced and deployed our browsers-aware framework, a peer-to-peer Web caching management scheme. We make the browsers and their proxy share the contents to exploit the neglected but rich data locality in browsers and reduce document duplications among the proxy and browsers' caches to effectively utilize the Web contents and network bandwidth among clients. The objective of our scheme is to improve the scalability of proxy-based caching both in the number of connected clients and in the diversity of Web documents. We show that building such a caching system with considerations of sharing contents among clients, minimizing document duplications, and achieving data integrity and communication anonymity is not only feasible but also highly effective.

6.
The sharing of caches among proxies is an important technique to reduce Web traffic, alleviate network bottlenecks, and improve response time of document requests. Most existing work on cooperative caching has been focused on serving misses collaboratively. Very few have studied the effect of cooperation on document placement schemes and its potential enhancements on cache hit ratio and latency reduction. We propose a new document placement scheme which takes into account the contentions at individual caches in order to limit the replication of documents within a cache group and increase document hit ratio. The main idea of this new scheme is to view the aggregate disk space of the cache group as a global resource of the group and uses the concept of cache expiration age to measure the contention of individual caches. The decision of whether to cache a document at a proxy is made collectively among the caches that already have a copy of this document. We refer to this new document placement scheme as the Expiration Age-based scheme (EA scheme). The EA scheme effectively reduces the replication of documents across the cache group, while ensuring that a copy of the document always resides in a cache where it is likely to stay for the longest time. We report our study on the potentials and limits of the EA scheme using both analytic modeling and trace-based simulation. The analytical model compares and contrasts the existing (ad hoc) placement scheme of cooperative proxy caches with our new EA scheme and indicates that the EA scheme improves the effectiveness of aggregate disk usage, thereby increasing the average time duration for which documents stay in the cache. The trace-based simulations show that the EA scheme yields higher hit rates and better response times compared to the existing document placement schemes used in most of the caching proxies.
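The placement rule at the heart of the EA scheme can be illustrated with a minimal sketch (the function and proxy names here are hypothetical): the single copy goes to the group cache with the largest expiration age, i.e. the least contention:

```python
def choose_cache(expiration_ages):
    """Given each group cache's expiration age (seconds a newly cached
    document is expected to survive before eviction), place the copy at
    the cache where it is likely to stay the longest."""
    return max(expiration_ages, key=expiration_ages.get)

# Toy expiration ages for three cooperating proxies
ages = {"proxy1": 120.0, "proxy2": 480.0, "proxy3": 60.0}
best = choose_cache(ages)  # proxy2 has the least contention
```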

7.
A study on traffic characterization of the Internet is essential to design the Internet infrastructure. In this paper, we first characterize WWW (World Wide Web) traffic based on the access log data obtained at four different servers. We find that the document size, the request inter-arrival time and the access frequency of WWW traffic follow heavy-tail distributions. Namely, the document size and the request inter-arrival time follow log-normal distributions, and the access frequency follows a Pareto distribution. For the request inter-arrival time, however, an exponential distribution becomes adequate if we are concerned with the busiest hours. Based on our analytic results, we next build an M/G/1/PS queuing model to discuss a design methodology of the Internet access network. The accuracy of our model is validated by comparison with trace-driven simulation. We also investigate the effect of document caching at the Proxy server on the WWW traffic characteristics. The results show that the traffic volume is actually reduced by the document replacement policies, but the traffic characteristics are not much affected. This suggests that our modeling approach can be applied to the case with document caching, which is demonstrated by simulation experiments.
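The M/G/1/PS model mentioned above has a well-known closed form for the mean response time, which can be computed directly (the parameter names below are our own):

```python
def mg1ps_mean_response(arrival_rate, mean_service_time):
    """Mean response time in an M/G/1 processor-sharing queue:
    E[T] = E[S] / (1 - rho) with rho = lambda * E[S].  A standard PS
    result: it depends on the service-time distribution only through
    its mean, which is why heavy-tailed document sizes still fit."""
    rho = arrival_rate * mean_service_time
    if rho >= 1:
        raise ValueError("queue is unstable: utilization rho >= 1")
    return mean_service_time / (1 - rho)

# 50 requests/s with a 10 ms mean transfer time -> rho = 0.5, E[T] = 20 ms
t = mg1ps_mean_response(arrival_rate=50.0, mean_service_time=0.010)
```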

8.
We study web caching with request reordering. The goal is to maintain a cache of web documents so that a sequence of requests can be served at low cost. To improve cache hit rates, a limited reordering of requests is allowed. Feder et al. (Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms, pp. 104-105, 2002), who recently introduced this problem, considered caches of size 1, i.e. a cache can store one document. They presented an offline algorithm based on dynamic programming as well as online algorithms that achieve constant factor competitive ratios. For arbitrary cache sizes, Feder et al. (Theor. Comput. Sci. 324:201-218, 2004) gave online strategies that have nearly optimal competitive ratios in several cost models.

9.
Liu Chao, Lou Chenzhe, Yu Min, Jiang Jianguo, Huang Weiqing. Journal of Cyber Security (《信息安全学报》), 2017: 14-26
Spreading malware through malicious documents is very common on the modern Internet and is among the highest risks many organizations face. PDF is the most widely used document format in the world, and countless attacks are launched through it. Detecting malicious documents with machine learning is a popular and effective approach, but the robustness of machine-learning classifiers can be called into question when they face samples carefully crafted by attackers. In computer vision, adversarial learning has been shown in many scenarios to be an effective way to improve classifier robustness. For malicious document detection, however, a comprehensive method for generating adversarial examples across various attack scenarios is still lacking. In this paper, we review the basics of the PDF file format, effective malicious-PDF detectors, and adversarial-example generation techniques. We propose an adversarial learning model for the malicious-document-detection domain to generate adversarial examples, and use the generated examples to study detection performance (and evasion effectiveness) under a multi-detector scenario. The model's key operations are correlated-feature extraction, which finds correlations between different feature spaces, and feature modification, which preserves sample stability. Finally, the attack algorithm adopts momentum-based iterative gradients to raise the success rate and efficiency of adversarial-example generation. Using several convincing datasets and rigorously defined experimental settings and metrics, we carried out adversarial-example attacks and robustness-improvement tests. The results show that the model maintains a high adversarial-example generation rate and attack success rate. Moreover, the model can be applied to other malware detectors and helps optimize detector robustness.


11.
With the explosion of multimedia content, Internet bandwidth is wasted by repeated downloads of popular content. Recently, Content-Centric Networking (CCN), or the so-called Information-Centric Networking (ICN), has been proposed for efficient content delivery. In this paper, we investigate the performance of in-network caching for Named Data Networking (NDN), which is a promising CCN proposal. First, we examine the inefficiency of LRU (Least Recently Used) which is a basic cache replacement policy in NDN. Then we formulate the optimal content assignment for two in-network caching policies. One is Single-Path Caching, which allows a request to be served from routers only along the path between a requester and a content source. The other is Network-Wide Caching, which enables a request to be served from any router holding the requested content in a network. For both policies, we use a Mixed Integer Program to optimize the content assignment models by considering the link cost, cache size, and content popularity. We also consider the impact of link capacity and routing issues on the optimal content assignment. Our evaluation and analysis present the performance bounds of in-network caching on NDN in terms of the practical constraints, such as the link cost, link capacity, and cache size.

12.
Most traditional text clustering methods use word-based text representation models, which consider only the importance of individual words while ignoring the semantic relations between words; such models also suffer from high dimensionality. To address these problems, we propose a frequent-itemset-based document clustering method (FIC). The method mines frequent word sets from the document collection with the FP-Growth algorithm and represents each document by these frequent word sets, greatly reducing the dimensionality of the text representation. A document network is then built from inter-document similarities, and a community-detection algorithm partitions the network, achieving clustering. FIC not only lowers the dimensionality of the text representation but also constructs association relations among the documents in the collection, so that documents are no longer treated as independent pairs. Experiments on two English corpora, Reuters-21578 and 20NewsGroup, and one Chinese corpus, the Sogou news dataset, show that compared with traditional clustering methods based on the vector space model, FIC effectively reduces the dimensionality of the representation, and that it achieves better clustering results than common frequent-itemset-based clustering methods.
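The frequent-word-set representation can be illustrated with a simplified stand-in for the FP-Growth mining step (this sketch enumerates word pairs by brute force, which real FP-Growth avoids, and considers pairs only rather than sets of arbitrary size):

```python
from itertools import combinations

def frequent_word_pairs(docs, min_support):
    """Count word pairs by document support and keep those occurring in
    at least min_support documents.  Documents can then be represented
    by the frequent pairs they contain instead of a full word vector."""
    counts = {}
    for doc in docs:
        words = sorted(set(doc.split()))     # unique words, stable order
        for pair in combinations(words, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p for p, c in counts.items() if c >= min_support}

docs = ["cache web proxy", "cache web server", "image neural network"]
pairs = frequent_word_pairs(docs, min_support=2)
# only ("cache", "web") appears in at least two documents
```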

13.
As a global distributed information system, the World Wide Web is used ever more widely, and the emergence of many interactive applications places high demands on response time. Server-side document prefetching, used together with caching at non-server sites, can significantly improve the response performance of Web services. However, prefetching increases the burstiness of network traffic, increases packet retransmission and loss, and enlarges queueing delay and average queue length, degrading network performance. To preserve the positive effect of prefetching on user quality of service (QoS) while reducing its negative impact on the network, we propose a smoothed prefetching approach.

14.
In order to process large numbers of explicit knowledge documents such as patents in an organized manner, automatic document categorization and search are required. In this paper, we develop a document classification and search methodology based on neural network technology that helps companies manage patent documents more effectively. The classification process begins by extracting key phrases from the document set by means of automatic text processing and determining the significance of key phrases according to their frequency in text. In order to maintain a manageable number of independent key phrases, correlation analysis is applied to compute the similarities between key phrases. Phrases with higher correlations are synthesized into a smaller set of phrases. Finally, the back-propagation network model is adopted as a classifier. The target output identifies a patent document's category based on a hierarchical classification scheme, in this case, the international patent classification (IPC) standard. The methodology is tested using patents related to the design of power hand-tools. Related patents are automatically classified using pre-trained neural network models. In the prototype system, two modules are used for patent document management. The automatic classification module helps the user classify patent documents and the search module helps users find relevant and related patent documents. The result shows an improvement in document classification and identification over previously published methods of patent document management.

15.
Liu Huiping, Guo Wei. Journal of Computer Applications (《计算机应用》), 2014, 34(9): 2645-2649
To address the covertness and information capacity of communication in modern network environments, an information hiding algorithm based on digital halftone screening is proposed, which embeds secret information into a text carrier to achieve covert communication. The algorithm first hides the secret information in a background shading pattern composed of screen dots, then fuses it with a randomly generated frequency-modulated halftone dot image, and finally overlays the watermarked background shading pattern onto the layout of the text document as an ordinary page element. Analysis and experimental results show that the method offers a large hiding capacity, embedding 72,000 Chinese characters of information in an A4 document page, with a natural and pleasing visual appearance, good concealment, high security, and small file size; it can be widely applied in modern secure network communication.

16.
Existing Web caches are mostly implemented with traditional memory-cache replacement algorithms, but because Web requests are heterogeneous, these traditional replacement algorithms do not work well in the Web environment. This paper studies the basis of Web cache replacement decisions and analyzes the shortcomings of previous replacement algorithms. Taking into account the influence on replacement of document size, fetch cost, access frequency, degree of user interest, and time of last access, it introduces the concept of the role of a Web cache object and builds a new high-precision object-role-based Web cache replacement algorithm (ORB). Using proxy server traces from NASA and DEC, the algorithm is compared in simulation with the LRU, LFU, SIZE, and Hybrid algorithms; the results show that the ORB algorithm outperforms them.
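Since the abstract does not give ORB's exact formula, the following is a hypothetical weighted score in the same spirit, combining fetch cost per byte, access frequency, and recency; the lowest-scoring object is the eviction victim (weights and field names are our own assumptions):

```python
def score(obj, now, w_cost=1.0, w_freq=1.0, w_recency=1.0):
    """Hypothetical ORB-style valuation: objects that are expensive to
    refetch per byte, frequently accessed, or recently accessed score
    higher and are kept longer."""
    age = now - obj["last_access"]
    return (w_cost * obj["fetch_cost"] / obj["size"]
            + w_freq * obj["frequency"]
            + w_recency / (1.0 + age))

cache = [
    {"id": "a", "size": 1000, "fetch_cost": 10, "frequency": 5, "last_access": 90},
    {"id": "b", "size": 500,  "fetch_cost": 1,  "frequency": 1, "last_access": 10},
]
victim = min(cache, key=lambda o: score(o, now=100))  # "b" scores lowest
```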

17.
Existing deep learning models for rectifying perspective-distorted document images suffer from poor spatial generalization, large parameter counts, and slow inference. From the viewpoint of pose estimation, this paper proposes DPENet, a lightweight document pose estimation network, to address these problems. The single document in an image is treated as a pose estimation object, and its four corners as four pose estimation points. A DSNT (differentiable spatial to numerical transform) module, which combines the advantages of fully connected regression and Gaussian heat-map regression, locates the document corners with high precision, and a perspective transform then rectifies the distorted document image accurately. DPENet adopts a lightweight design with the mobile-oriented MobileNet V2 as its backbone; the model is only 10.6 MB. In comparative experiments against three existing mainstream networks on the SmartDoc-QA dataset (using only 148 document images), DPENet achieves a higher rectification success rate (96.6%) and a lower mean displacement error (MDE, 1.28 pixels) than the other three networks, while also showing good average rectification speed. While remaining lightweight and fast, DPENet delivers a higher success rate and higher precision for rectifying distorted documents.

18.
With the recent explosion in usage of the World Wide Web, Web caching has become increasingly important. However, due to the non-uniform cost/size property of data objects in this environment, design of an efficient caching algorithm becomes an even more difficult problem compared to the traditional caching problems. In this paper, we propose the Least Expected Cost (LEC) replacement algorithm for Web caches that provides a simple and robust framework for the estimation of reference probability and fair evaluation of non-uniform Web objects. LEC evaluates a Web object based on its cost per unit size multiplied by the estimated reference probability of the object. This results in a normalized assessment of the contribution to the cost-savings ratio, leading to a fair replacement algorithm. We show that this normalization method finds optimal solution under some assumptions. Trace-driven simulations with actual Web cache logs show that LEC offers the performance of caches more than twice its size compared with other algorithms we considered. Nevertheless, it is simple, having no parameters to tune. We also show how the algorithm can be effectively implemented as a Web cache replacement module.
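The LEC valuation is stated explicitly above and is easy to sketch; the object costs, sizes, and probabilities below are made-up examples:

```python
def lec_value(cost, size, ref_prob):
    """LEC valuation as described above: cost per unit size times the
    estimated reference probability.  The cached object with the
    smallest value is the replacement victim."""
    return (cost / size) * ref_prob

objects = {
    "a": {"cost": 40.0, "size": 8.0, "ref_prob": 0.10},
    "b": {"cost": 10.0, "size": 2.0, "ref_prob": 0.50},
}
victim = min(objects, key=lambda k: lec_value(**objects[k]))
# a: (40/8)*0.1 = 0.5 ; b: (10/2)*0.5 = 2.5 -> evict "a"
```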

19.
The Indexing and Retrieval of Document Images: A Survey
The economic feasibility of maintaining large databases of document images has created a tremendous demand for robust ways to access and manipulate the information these images contain. In an attempt to move toward a paperless office, large quantities of printed documents are often scanned and archived as images, without adequate index information. One way to provide traditional database indexing and retrieval capabilities is to fully convert the document to an electronic representation which can be indexed automatically. Unfortunately, there are many factors which prohibit complete conversion including high cost, low document quality, and the fact that many nontext components cannot be adequately represented in a converted form. In such cases, it can be advantageous to maintain a copy of and use the document in image form. In this paper, we provide a survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion. We briefly discuss traditional text indexing techniques on imperfect data and the retrieval of partially converted documents. This is followed by a more comprehensive review of techniques for the direct characterization, manipulation, and retrieval of images of documents containing text, graphics, and scene images.

20.
An Analysis of Web Prefetching Models
The rapid growth of the WWW has led to network congestion and server overload. Caching is considered one of the effective ways to lighten server load, reduce network congestion, and lower client access latency, but its effect is limited. To further improve WWW performance, prefetching has been introduced. This paper first presents the basic idea of Web prefetching and the feasibility of research on it, then analyzes existing Web prefetching models, and finally gives the key properties a Web prefetching model should have.

