期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A Framework for Cache Management for Mobile Databases: Design and Evaluation

Boris Y. Chan Antonio Si Hong V. Leong 《Distributed and Parallel Databases》2001,10(1):23-57

In a mobile computing environment, database servers disseminate information to multiple mobile clients via wireless channels. Due to the low bandwidth and low reliability of wireless channels, it is important for a mobile client to cache its frequently accessed database items into its local storage. This improves performance of database queries and improves availability of database items for query processing during disconnection. In this paper, we investigate issues on caching granularity, coherence strategy, and replacement policy of caching mechanisms for a mobile environment utilizing point-to-point communication paradigm.We first illustrate that page-based caching is not suitable in the mobile context due to the lack of locality among database items. We propose three different levels of caching granularity: attribute caching, object caching, and hybrid caching, a hybrid approach of attribute and object caching. Next, we show that existing coherence strategies are inappropriate due to frequent disconnection in a mobile environment, and propose a cache coherence strategy, based on the update patterns of database items. Via a detail simulation model, we examine the performance of various levels of caching granularity with our cache coherence strategy. We observe, in general, that hybrid caching could achieve a better performance. Finally, we propose several cache replacement policies that can adapt to the access patterns of database items. For each given caching granularity, we discover that our replacement policies outperform conventional ones in most situations. 相似文献

2.

Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments

《Parallel Computing》2007,33(7-8):497-520

In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, to increase the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., single symmetric multiprocessor (SMP) machine, cluster of SMP nodes, and a Grid computing configuration). 相似文献

3.

Towards Intelligent Semantic Caching for Web Sources 总被引：2，自引：0，他引：2

Dongwon Lee Wesley W. Chu 《Journal of Intelligent Information Systems》2001,17(1):23-45

An intelligent semantic caching scheme suitable for web sources is presented. Since web sources typically have weaker querying capabilities than conventional databases, existing semantic caching schemes cannot be directly applied. Our proposal takes care of the difference between the query capabilities of an end user system and web sources. In addition, an analysis on the match types between a user's input query and cached queries is presented. Based on this analysis, we present an algorithm that finds the best matched query under different circumstances. Furthermore, a method to use semantic knowledge, acquired from the data, to avoid unnecessary access to web sources by transforming the cache miss to the cache hit is presented. To verify the effectiveness of the proposed semantic caching scheme, we first show how to generate synthetic queries exhibiting different levels of semantic localities. Then, using the test sets, we show that the proposed query matching technique is an efficient and effective way for semantic caching in web databases. 相似文献

4.

Semantic caching of Web queries

Boris Chidlovskii Uwe M. Borghoff 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):2-17

Abstract. In meta-searchers accessing distributed Web-based information repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. In this work, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive Web queries based on signature files. Our algorithms cope with both relations of semantic containment and intersection between a query and the corresponding cache items. We also develop the cache replacement strategy to treat situations when cached items differ in size and contribution when providing partial query answers. We report results of experiments and show how the caching mechanism is realized in the Knowledge Broker system. Received June 15, 1999 / Accepted December 24, 1999 相似文献

5.

Policies for Caching OLAP Queries in Internet Proxies

Loukopoulos T. Ahmad I. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(10):1124-1135

The Internet now offers more than just simple information to the users. Decision makers can now issue analytical, as opposed to transactional, queries that involve massive data (such as, aggregations of millions of rows in a relational database) in order to identify useful trends and patterns. Such queries are often referred to as On-Line-Analytical Processing (OLAP). Typically, pages carrying query results do not exhibit temporal locality and, therefore, are not considered for caching at Internet proxies. In OLAP processing, this is a major problem as the cost of these queries is significantly larger than that of the transactional queries. This paper proposes a technique to reduce the response time for OLAP queries originating from geographically distributed private LANs and issued through the Web toward a central data warehouse (DW) of an enterprise. An active caching scheme is introduced that enables the LAN proxies to cache some parts of the data, together with the semantics of the DW, in order to process queries and construct the resulting pages. OLAP queries arriving at the proxy are either satisfied locally or from the DW, depending on the relative access costs. We formulate a cost model for characterizing the respective latencies, taking into consideration the combined effects of both common Web access and query processing. We propose a cache admittance and replacement algorithm that operates on a hybrid Web-OLAP input, outperforming both pure-Web and pure-OLAP caching schemes. 相似文献

6.

Three-Level Caching for Efficient Query Processing in Large Web Search Engines 总被引：1，自引：0，他引：1

Xiaohui Long Torsten Suel 《World Wide Web》2006,9(4):369-395

Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level. We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance. Work supported by NSF CAREER Award CCR-0093400 and the New York State Center for Advanced Technology in Telecommunications (CATT) at Polytechnic University. 相似文献

7.

Proactive caching of DNS records: addressing a performance bottleneck

Edith Haim 《Computer Networks》2003,41(6):707-726

The resolution of a host name to an IP-address is a necessary predecessor to connection establishment and HTTP exchanges. Nonetheless, domain name system (DNS) resolutions often involve multiple remote name-servers and prolong Web response times. To alleviate this problem name-servers and Web browsers cache query results. Name-servers currently incorporate passive cache management where records are brought into the cache only as a result of clients’ requests and are used for the time to live (TTL) duration (a TTL value is provided with each record). We propose and evaluate different enhancements to passive caching that reduce the fraction of HTTP connection establishments that are delayed by slow DNS resolutions: (A) Renewal policies refresh selected expired cached entries by issuing unsolicited queries. Trace-based simulations using Web proxy logs demonstrated that a significant fraction of cache misses can be eliminated with a moderate increase in the number of DNS queries. (B) Simultaneous-validation transparently uses expired records. A DNS query is issued if the respective cached entry is no longer fresh, but concurrently, the expired entry is used to connect to the Web server and fetch the requested content. The content is served only if the expired records used turn out to be in agreement with the query response. 相似文献

8.

Exploiting Regularities in Web Traffic Patterns for Cache Replacement 总被引：2，自引：0，他引：2

Cohen Kaplan 《Algorithmica》2002,33(3):300-334

Abstract. Caching web pages at proxies and in web servers' memories can greatly enhance performance. Proxy caching is known to reduce network load and both proxy and server caching can significantly decrease latency. Web caching problems have different properties than traditional operating systems caching, and cache replacement can benefit by recognizing and exploiting these differences. We address two aspects of the predictability of traffic patterns: the overall load experienced by large proxy and web servers, and the distinct access patterns of individual pages. We formalize the notion of ``cache load' under various replacement policies, including LRU and LFU, and demonstrate that the trace of a large proxy server exhibits regular load. Predictable load allows for improved design, analysis, and experimental evaluation of replacement policies. We provide a simple and (near) optimal replacement policy when each page request has an associated distribution function on the next request time of the page. Without the predictable load assumption, no such online policy is possible and it is known that even obtaining an offline optimum is hard. For experiments, predictable load enables comparing and evaluating cache replacement policies using partial traces , containing requests made to only a subset of the pages. Our results are based on considering a simpler caching model which we call the interval caching model . We relate traditional and interval caching policies under predictable load, and derive (near)-optimal replacement policies from their optimal interval caching counterparts. 相似文献

9.

移动查询缓存处理的研究 总被引：5，自引：0，他引：5

吴婷婷苏武运周兴铭徐明《计算机研究与发展》2004,41(1):187-193

客户缓存为提高客户／服务器数据库系统整体性能以及客户方数据可用性提供了有效途径。移动环境下网络资源的贫乏使客户缓存的作用更为重要,语义缓存是基于客户查询语义相关建立的一类缓存,提出一个基于语义缓存的客户缓存机制,给出缓存的内容组织,提出缓存项合并策略;然后讨论了基于语义缓存的查询处理策略;最后,模拟结果表明该客户缓存机制能够提高分布式、特别是移动环境下客户服务器数据库系统的性能。相似文献

10.

Exploring optimization and caching for efficient collection operations

Venkata Krishna Suhas Nerella Swetha Surapaneni Sanjay K. Madria Thomas Weigert 《Automated Software Engineering》2014,21(1):3-40

Many large programs operate on collection types. Extensive libraries are available in many programming languages, such as the C++ Standard Template Library, which make programming with collections convenient. Extending programming languages to provide collection queries as first class constructs in the language would not only allow programmers to write queries explicitly in their programs but it would also allow compilers to leverage the wealth of experience available from the database domain to optimize such queries. This paper describes an approach to reduce the run time of programs involving explicit collection queries by performing run time query optimization that is effective for single runs of a program. In addition, it also leverages a cache to store previously computed results. The proposed approach relies on histograms built from the data at run time to estimate the selectivity of joins and predicates in order to construct query plans. Information from earlier executions of the same query during run time is leveraged during the construction of the query plans, even when the data has changed between these executions. An effective cache policy is also determined for caching the results of join (sub) queries. The cache is maintained incrementally, when the underlying collections change, and use of the cache space is optimized by a cache replacement policy. Our approach has been implemented within the Java Query Language (JQL) framework using AspectJ. Our approach demonstrated that its run time query optimization in integration with caching sub query result significantly improves the run time of programs with explicit queries over equivalent programs performing collection operations by iterating over those collections. This paper evaluates our approach using synthetic as well as real world Robocode programs by comparing it to JQL as a benchmark. Experimental results show that our approach performs better than the JQL approach with respect to the program run time. 相似文献

11.

Automation of application‐level caching in a seamless way

下载免费PDF全文

Jhonny Mertz Ingrid Nunes 《Software》2018,48(6):1218-1237

Meeting performance and scalability requirements while delivering services is a critical issue in web applications. Recently, latency and cost of Internet‐based services are encouraging the use of application‐level caching to continue satisfying users' demands and improve the scalability and availability of origin servers. Application‐level caching, in which developers manually control cached content, has been adopted when traditional forms of caching are insufficient to meet such requirements. Despite its popularity, this level of caching is typically addressed in an ad hoc way, given that it depends on specific details of the application. Furthermore, it forces application developers to reason about a crosscutting concern, which is unrelated to the application business logic. As a result, application‐level caching is a time‐consuming and error‐prone task, becoming a common source of bugs. Among all the issues involved with application‐level caching, the decision of what should be cached must frequently be adjusted to cope with the application evolution and usage, making it a challenging task. In this paper, we introduce an automated caching approach to automatically identify application‐level cache content at runtime by monitoring system execution and adaptively managing caching decisions. Our approach is implemented as a framework that can be seamlessly integrated into new and existing web applications. In addition to the reduction of the effort required from developers to develop a caching solution, an empirical evaluation showed that our approach significantly speeds up and improves hit ratios with improvements ranging from 2.78% to 17.18%. 相似文献

12.

Performance evaluations of data-centric information retrieval schemes for DTNs

P. Yang M. Chuah 《Computer Networks》2009,53(4):541-555

Mobile nodes in some challenging network scenarios, e.g. battlefield and disaster recovery scenarios, suffer from intermittent connectivity and frequent partitions. Disruption Tolerant Network (DTN) technologies are designed to enable communications in such environments. Several DTN routing schemes have been proposed. However, not much work has been done on designing schemes that provide efficient information access in such challenging network scenarios. In this paper, we explore how a content-based information retrieval system can be designed for DTNs. There are three important design issues, namely (a) how data should be replicated and stored at multiple nodes, (b) how a query is disseminated in sparsely connected networks, and (c) how a query response is routed back to the issuing node. We first describe how to select nodes for storing the replicated copies of data items. We consider the random and the intelligent caching schemes. In the random caching scheme, nodes that are encountered first by a data-generating node are selected to cache the extra copies while in the intelligent caching scheme, nodes that can potentially meet more nodes, e.g. faster nodes, are selected to cache the extra data copies. The number of replicated data copies K can be the same for all data items or varied depending on the access frequencies of the data items. In this work, we consider fixed, proportional and square-root replication schemes. Then, we describe two query dissemination schemes: (a) W-copy Selective Query Spraying (WSS) scheme and (b) L-hop Neighborhood Spraying (LNS) scheme. In the WSS scheme, nodes that can move faster are selected to cache the queries while in the LNS scheme, nodes that are within L-hops of a querying node will cache the queries. For message routing, we use an enhanced Prophet scheme where a next-hop node is selected only if its predicted delivery probability to the destination is higher than a certain threshold. We conduct extensive simulation studies to evaluate different combinations of the replication and query dissemination algorithms. Our results reveal that the scheme that performs the best is the one that uses the WSS scheme combined with binary spread of replicated data copies. The WSS scheme can achieve a higher query success ratio when compared to a scheme that does not use any data and query replication. Furthermore, the square-root and proportional replication schemes provide higher query success ratio than the fixed copy approach with varying node density. In addition, the intelligent caching approach can further improve the query success ratio by 5.3–15.8% with varying node density. Our results using different mobility models reveal that the query success ratio degrades at most 7.3% when the Community-Based model is used compared to the Random Waypoint (RWP) model [J. Broch et al., A Performance Comparison of Multihop wireless Ad hoc Network Routing Protocols, ACM Mobicom, 1998, pp. 85–97]. Compared to the RWP and the Community-Based mobility models, the UmassBusNet model from the DieselNet project [X. Zhang et al., Modeling of a Bus-based Disruption Tolerant Network Trace, Proceedings of ACM Mobihoc, 2007.] achieves much lower query success ratio because of the longer inter-node encounter time. 相似文献

13.

Comparing the strength of query types in property testing: The case of k-colorability

Ido Ben-Eliezer Tali Kaufman Michael Krivelevich Dana Ron 《Computational Complexity》2013,22(1):89-135

We study the power of four query models in the context of property testing in general graphs, where our main case study is the problem of testing k-colorability. Two query types, which have been studied extensively in the past, are pair queries and neighbor queries. The former corresponds to asking whether there is an edge between any particular pair of vertices, and the latter to asking for the i ^th neighbor of a particular vertex. We show that while for pair queries testing k-colorability requires a number of queries that is a monotone decreasing function in the average degree d, the query complexity in the case of neighbor queries remains roughly the same for every density and for large values of k. We also consider a combined model that allows both types of queries, and we propose a new, stronger, query model, related to the field of Group Testing. We give upper and lower bounds on the query complexity for one-sided error in all the models, where the bounds are nearly tight for three of the models. In some of the cases, our lower bounds extend to two-sided error algorithms. The problem of testing k-colorability was previously studied in the contexts of dense graphs and of sparse graphs, and in our proofs we unify approaches from those cases, and also provide some new tools and techniques that may be of independent interest. 相似文献

14.

Improving Data Availability for Better Access Performance: A Study on Caching Scientific Data on Distributed Desktop Workstations

Xiaosong Ma Sudharshan S. Vazhkudai Zhe Zhang 《Journal of Grid Computing》2009,7(4):419-438

Client-side data caching serves as an excellent mechanism to store and analyze the rapidly growing scientific data, motivating distributed, client-side caches built from unreliable desktop storage contributions to store and access large scientific data. They offer several desirable properties, such as performance impedance matching, improved space utilization, and high parallel I/O bandwidth. In this context, we are faced with two key challenges: (1) the finite amount of contributed cache space is stretched by the ever increasing scientific dataset sizes and (2) the transient nature of volunteered storage nodes impacts data availability. In this article, we address these challenges by exploiting the existence of external, primary copies of datasets. We propose a novel combination of prefix caching, collective download, and remote partial data recovery (RPDR), to deal with optimal cache space consumption and storage node volatility. Our evaluation, performed on our FreeLoader prototype, indicates that prefix caching can significantly improve the cache hit rate and partial data recovery is better than (or comparable to) many persistent-data availability techniques. 相似文献

15.

Improving SPARQL query performance with algebraic expression tree based caching and entity caching

Gang WU Meng-dong YANG College of Information Science Engineering Northeastern University Shenyang China MOE 《浙江大学学报:C卷英文版》2012,(4):281-294

相似文献

16.

Design and analysis of a variable bit rate caching algorithm for continuous media data

Ligang Dong Bharadwaj Veeravalli 《Multimedia Tools and Applications》2008,38(1):91-117

In this paper, the problem of caching continuous media data in a (main) memory and disk caching system is addressed. Caching schemes can significantly reduce the load on the network as well as on the servers, also the retrieval of documents from the cache requires short response time. In interval-level caching algorithms, an interval of data between two adjacent streams is the basic caching entity. In this paper, we design a novel algorithm, referred to as variable bit rate caching (VBRC) algorithm, which belongs to the interval-level caching algorithms. The proposed VBRC algorithm can be used in the system for memory caching or disk caching. VBRC can handle variable retrieval bandwidth as well as constant retrieval bandwidth . In designing the VBRC algorithm, we propose the strategies of reducing the number of switching operation, which will probably cause discontinuity of retrieving data. Also, we propose a just-in-time scheme for resource allocation in our VBRC algorithm and show that the caching performance in comparison with the reservation scheme adopted in the resource-based caching (RBC) algorithm is significantly improved. Our simulation study compares the recent and most popular generalized interval caching, RBC, and VBRC, on several influencing factors such as cache space size, cache I/O bandwidth, request arrival rate, and percentage of requests for large documents, with respect to the byte hit ratio and the number of switching operations. The simulation result confirms our analysis.

Bharadwaj VeeravalliEmail: URL: http://cnds.ece.nus.edu.sg

相似文献

17.

Performance improvement of web caching in Web 2.0 via knowledge discovery

Carlos Guerrero Isaac Lera Carlos Juiz 《Journal of Systems and Software》2013

Web 2.0 systems are more unpredictable and customizable than traditional web applications. This causes that performance techniques, such as web caching, limit their improvements. Our study was based on the hypotheses that the use of web caching in Web 2.0 applications, particularly in content aggregation systems, can be improved by adapting the content fragment designs. We proposed to base this adaptation on the analysis of the characterization parameters of the content elements and on the creation of a classification algorithm. This algorithm was deployed with decision trees, created in an off-line knowledge discovery process. We also defined a framework to create and adapt fragments of the web documents to reduce the user-perceived latency in web caches. The experiment results showed that our solution had a remarkable reduction in the user-perceived latency even losses in the cache hit ratios and in the overhead generated on the system, in comparison with other web cache schemes. 相似文献

18.

Enhanced Proxy Caching with Content Management

Kai Cheng Yahiko Kambayashi 《Knowledge and Information Systems》2002,4(2):202-218

In this paper, we propose a novel approach to enhancing web proxy caching, an approach that integrates content management with performance tuning techniques. We first develop a hierarchical model for management of web data, which consists of physical pages, logical pages and topics corresponding to different abstraction levels. Content management based on this model enables active utilization of the cached web contents. By defining priority on each abstraction level, the cache manager can make replacement decisions on topics, logical pages, and physical pages hierarchically. As a result, a cache can always keep the most relevant, popular, and high-quality content. To verify the proposed approach, we have designed a content-aware replacement algorithm, LRU-SP+. We evaluate the algorithm through preliminary experiments. The results show that content management can achieve 30% improvement of caching performance in terms of hit ratios and profit ratios (considering significance of topics) compared to content-blind schemes. Received 19 March 2001 / Revised 17 April 2001 / Accepted in revised form 25 May 2001 相似文献

19.

Accuracy vs. Lifetime: Linear Sketches for Aggregate Queries in Sensor Networks

Vasundhara Puttagunta Konstantinos Kalpakis 《Algorithmica》2007,49(4):357-385

The in–network aggregation paradigm in sensor networks provides a versatile approach for evaluating aggregate queries. Traditional approaches need a separate aggregate to be computed and communicated for each query and hence do not scale well with the number of queries. Since approximate query results are sufficient for many applications, we use an alternate approach based on summary data–structures. We consider two kinds of aggregate queries: location range queries that compute the sum of values reported by sensors in a given location range, and value range queries that compute the number of sensors that report values in a given range. We construct summary data–structures called linear sketches, over the sensor data using in–network aggregation and use them to answer aggregate queries in an approximate manner at the base–station. There is a trade–off between accuracy of the query results and lifetime of the sensor network that can be exploited to achieve increased lifetimes for a small loss in accuracy. Most commonly occurring sets of range queries are highly correlated and display rich algebraic structure. Our approach takes full advantage of this by constructing linear sketches that depend on queries. Experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks for only a small loss in accuracy of the queries. Further, our approach achieves more accurate query results than the other classical techniques using Discrete Fourier Transform and Discrete Wavelet Transform. This work was supported in part by NASA under Cooperative Agreement NCC5–315. 相似文献

20.

Lease-based consistency schemes in the web environment

Byoung-Hoon Sung-Hwa Jai-Hoon Geoffrey C. 《Future Generation Computer Systems》2009,25(1):8-19

There are many methods to maintain consistency in the distributed computing environment. Ideally, efficient schemes for maintaining consistency should take into account the following factors: lease duration of replicated data, data access pattern and system parameters. One method used to supply strong consistency in the web environment is the lease method. During the proxy’s lease time from a web server, the web server can notify the modification to the proxy by invalidation or update. In this paper, we analyze lease protocol performance by the varying update/invalidation scheme, lease duration and read rates. By using these analyses, we can choose the adaptive lease time and proper protocol (invalidation or update scheme of the modification for each proxy in the web environment). As the number of proxies for web caching increases exponentially, a more efficient method for maintaining consistency needs to be designed. We also present three-tier hierarchies on which each group and node independently and adaptively choose the proper lease time and protocol for each proxy cache. These classifications of the scheme make proxy caching adaptive to client access pattern and system parameters. 相似文献