Similar Literature
20 similar articles found (search time: 15 ms)
1.
Container-based virtualization is becoming increasingly popular in cloud computing due to its efficiency and flexibility. Resource isolation is a fundamental property of containers. Existing works have indicated that weak resource isolation can cause significant performance degradation for containerized applications, and have proposed enhancements to resource isolation. However, current studies have largely not addressed the isolation problems of the page cache, which is a key resource for containers. Containers leverage the memory cgroup to control page cache usage. Unfortunately, the existing policy introduces two major problems in a container-based environment. First, containers can use more memory than their cgroup limit allows, effectively breaking memory isolation. Second, the OS kernel has to evict page cache to make space for newly arrived memory requests, slowing down containerized applications. This paper performs an empirical study of these problems and demonstrates their performance impact on containerized applications. We then propose pCache (precise control of page cache), which addresses the problems by dividing the page cache into private and shared portions and controlling both kinds separately and precisely. To do so, pCache leverages two new techniques: fair account (f-account) and evict on demand (EoD). F-account splits the charging of shared page cache based on each container's share, preventing containers from using memory for free and enhancing memory isolation. EoD reduces unnecessary page cache evictions to avoid the performance impacts. The evaluation results demonstrate that our system can effectively enhance memory isolation for containers and achieve substantial performance improvements over the original page cache management policy.
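To make the f-account idea concrete, the following is a minimal user-space sketch of proportional shared-page charging. The `Container` class, the `charge_shared_page` helper, and the even split are hypothetical illustrations; the real pCache does this inside the kernel's memory-cgroup accounting.

```python
# Minimal sketch of proportional shared-page charging (the f-account idea).
# All names are hypothetical; real accounting happens in the kernel's
# memory cgroup code, not in user-space Python.

PAGE_SIZE = 4096

class Container:
    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit = limit_bytes
        self.charged = 0  # bytes currently charged to this container

def charge_shared_page(sharers):
    """Charge one shared page to all sharing containers, split evenly.

    Default cgroup accounting bills only the first container that touches
    a page, so later sharers use it for free; splitting the charge keeps
    every sharer accountable for its fraction.
    """
    share = PAGE_SIZE / len(sharers)
    for c in sharers:
        c.charged += share

a = Container("web", 64 << 20)
b = Container("db", 64 << 20)
charge_shared_page([a, b])
print(a.charged, b.charged)  # 2048.0 2048.0 -- each pays half a page
```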

2.
Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of the string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines, and dictionaries. The burst trie is a leading choice for such tasks, as it provides fast sorted access to strings. The burst trie, however, uses linked lists as substructures, which can result in poor use of the CPU cache and main memory. Previous research addressed this issue by replacing the linked lists with dynamic arrays, forming a cache-conscious array burst trie. Though faster, this variant can incur high instruction costs that hinder its efficiency. Thus, engineering a fast, compact, and scalable trie for strings remains an open problem. In this paper, we introduce a novel and practical solution that carefully combines a trie with a hash table, creating a variant of the burst trie called the HAT-trie. We provide a thorough experimental analysis which demonstrates that, for large sets of strings and on alternative computing architectures, the HAT-trie—and two novel variants engineered for further space efficiency—is currently the leading in-memory trie-based data structure, offering rapid, compact, and scalable storage and retrieval of variable-length strings.
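To make the trie-plus-hash-table combination concrete, here is a minimal Python sketch in the spirit of a HAT-trie: strings live in hash buckets, and an over-full bucket bursts into a trie node whose children are fresh buckets. The `Node` class and the burst threshold are illustrative assumptions, not the paper's engineered implementation.

```python
# Minimal HAT-trie-style sketch: buckets of strings burst into trie nodes.
BURST_LIMIT = 4  # illustrative threshold; real engines tune this carefully

class Node:
    def __init__(self):
        self.bucket = set()    # strings stored here while node is a leaf
        self.children = None   # dict: first char -> Node, once burst
        self.eow = False       # marks that the empty suffix ends a word here

    def insert(self, s):
        if self.children is None:
            self.bucket.add(s)
            if len(self.bucket) > BURST_LIMIT:
                self._burst()
        elif s == "":
            self.eow = True
        else:
            self.children.setdefault(s[0], Node()).insert(s[1:])

    def _burst(self):
        self.children = {}
        for s in self.bucket:
            if s == "":
                self.eow = True
            else:
                self.children.setdefault(s[0], Node()).insert(s[1:])
        self.bucket = set()

    def contains(self, s):
        if self.children is None:
            return s in self.bucket
        if s == "":
            return self.eow
        child = self.children.get(s[0])
        return child is not None and child.contains(s[1:])

root = Node()
for w in ["cat", "car", "cart", "dog", "dot", "do"]:
    root.insert(w)
print(root.contains("cart"), root.contains("ca"))  # True False
```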

3.
Recently, to relieve the performance degradation caused by the bottleneck between the CPU and main memory, cache-conscious multi-dimensional index structures have been proposed. Their ultimate goal is to reduce the space required for entries so as to widen index trees and minimize the number of cache misses. Existing index structures can be classified into two approaches according to their entry-reduction method. One approach compresses MBR keys by quantizing coordinate values to a fixed number of bits. The other stores only those sides of minimum bounding regions (MBRs) that differ from their parents'. The second approach works well when the node size is small and the number of entries is small. In this paper, we investigate existing multi-dimensional index structures for main-memory database systems through experiments under various workloads. We then propose a new index structure that exploits the properties of both techniques. We implement the existing multi-dimensional index structures and the proposed index structure, and perform various experiments to show that our approach outperforms the others.
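The first approach, key compression by quantization, can be sketched in a few lines. The grid resolution and the conservative rounding rule below are illustrative assumptions, not any specific index's scheme.

```python
# Sketch of the first entry-reduction approach: compress MBR keys by
# quantizing each coordinate to a fixed number of bits.

def quantize(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] to an integer grid cell of `bits` bits."""
    cells = 1 << bits
    return min(int((value - lo) / (hi - lo) * cells), cells - 1)

def quantize_mbr(mbr, space, bits=4):
    """Quantize conservatively: round the max corner one cell outward so
    the compressed MBR still encloses the original region."""
    xmin, ymin, xmax, ymax = mbr
    lox, loy, hix, hiy = space
    top = (1 << bits) - 1
    return (quantize(xmin, lox, hix, bits),
            quantize(ymin, loy, hiy, bits),
            min(quantize(xmax, lox, hix, bits) + 1, top),
            min(quantize(ymax, loy, hiy, bits) + 1, top))

# Four bits per coordinate instead of a 32-bit float shrinks each entry
# by ~8x, so more entries fit in one cache line and the tree gets wider.
print(quantize_mbr((0.12, 0.40, 0.55, 0.70), (0.0, 0.0, 1.0, 1.0)))
# -> (1, 6, 9, 12)
```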

4.
We address the problem of efficiently streaming a set of heterogeneous videos from a remote server through a proxy to multiple asynchronous clients so that they can experience playback with low startup delays. We determine the optimal proxy prefix-cache allocation to the videos that minimizes the aggregate network bandwidth cost. We integrate proxy caching with traditional server-based reactive transmission schemes such as batching, patching, and stream merging to develop a set of proxy-assisted delivery schemes. We quantitatively explore the impact of the choice of transmission scheme, cache allocation policy, proxy cache size, and availability of unicast versus multicast capability on the resulting transmission cost. Our evaluations show that even a relatively small prefix cache (10%-20% of the video repository) is sufficient to realize substantial savings in transmission cost. We find that carefully designed proxy-assisted reactive transmission schemes can produce significant cost savings even in a predominantly unicast environment such as the Internet.
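A back-of-the-envelope sketch of why a small prefix cache saves bandwidth, assuming a simple batching scheme in which requests arriving within a window share one stream and the server ships only the suffix the proxy lacks. All parameters (`window`, the 10% prefix) are illustrative, not the paper's optimization.

```python
# Toy cost model: batching plus a proxy prefix cache.

def server_seconds(arrivals, video_len, prefix_len, window):
    """Total seconds of video the server must transmit."""
    streams = 0
    stream_start = None
    for t in sorted(arrivals):
        if stream_start is None or t - stream_start > window:
            streams += 1               # open a new (multicast) stream
            stream_start = t
    return streams * (video_len - prefix_len)

arrivals = [0, 5, 12, 61, 64, 130]     # request times (s)
full = server_seconds(arrivals, video_len=3600, prefix_len=0, window=30)
cached = server_seconds(arrivals, 3600, prefix_len=360, window=30)  # 10% prefix
print(full, cached)  # 10800 9720 -- a 10% prefix already trims 10% here
```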

5.
The design of a high-performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times while providing energy-efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions. In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction-cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial-access cache and a parallel-access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way-prediction architectures, without the added structures and recovery complexity needed for way prediction.
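The trade-off being measured can be sketched as follows: a correct way prediction costs a single way-probe, but a misprediction triggers probes of the remaining ways plus retraining. The structure below, with probes counted as a rough energy proxy, is an illustrative model rather than the evaluated hardware.

```python
# Sketch of an instruction-cache way predictor, assuming a 4-way cache.
WAYS = 4

class WayPredictedCache:
    def __init__(self, sets):
        self.tags = [[None] * WAYS for _ in range(sets)]
        self.predict = [0] * sets   # last-hit way per set
        self.ways_probed = 0        # rough proxy for dynamic access energy

    def access(self, set_idx, tag):
        pred = self.predict[set_idx]
        self.ways_probed += 1                       # probe predicted way
        if self.tags[set_idx][pred] == tag:
            return "hit-predicted"
        for w in range(WAYS):                       # mispredict: probe rest
            if w == pred:
                continue
            self.ways_probed += 1
            if self.tags[set_idx][w] == tag:
                self.predict[set_idx] = w           # retrain predictor
                return "hit-mispredicted"
        victim = next((w for w in range(WAYS)       # fill an empty way,
                       if self.tags[set_idx][w] is None), 0)  # else way 0
        self.tags[set_idx][victim] = tag
        self.predict[set_idx] = victim
        return "miss"

c = WayPredictedCache(sets=1)
for t in ["A", "A", "B", "A"]:
    c.access(0, t)
print(c.ways_probed)  # 11 probes vs. 16 for an always-parallel 4-way lookup
```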

6.
To improve the read speed of on-chip Flash in embedded applications, this paper proposes an on-chip Flash acceleration controller based on prefetching and caching. The controller offers two acceleration schemes: a prefetch buffer and a cache. The prefetch-buffer scheme uses bit-width expansion and prefetching to accelerate the reading of sequential instructions, and employs a branch buffer to store non-sequential instructions, reducing the prefetch-miss penalty they cause. The cache scheme uses set associativity and way prediction to increase instruction reuse, reduce the number of Flash accesses, and lower system power consumption. For different application scenarios, the two schemes can be switched statically via a register or adaptively and dynamically via the software flow, yielding the best read-speed improvement. Test results on multiple benchmark programs demonstrate the feasibility and efficiency of the proposed on-chip Flash acceleration controller in optimizing performance and power.
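A rough sketch of the prefetch-buffer scheme: reads are served from a wide buffered Flash line when execution is sequential, and a small branch buffer remembers previously seen non-sequential targets to cut the prefetch-miss penalty. Line width, buffer policy, and class names are illustrative assumptions.

```python
# Minimal sketch of the prefetch-buffer idea with a branch buffer.

LINE_WORDS = 4  # bit-width expansion: one Flash read returns 4 words

class FlashAccelerator:
    def __init__(self):
        self.line_base = None          # base address of the buffered line
        self.branch_buf = {}           # addr -> line base, for jump targets
        self.flash_reads = 0

    def fetch(self, addr):
        base = addr - (addr % LINE_WORDS)
        if base == self.line_base:
            return "buffer-hit"        # sequential instruction, no Flash read
        if addr in self.branch_buf:    # non-sequential target held in buffer
            self.line_base = self.branch_buf[addr]
            return "branch-buffer-hit"
        self.flash_reads += 1          # prefetch miss: read the whole line
        self.line_base = base
        self.branch_buf[addr] = base   # remember this target for future jumps
        return "flash-read"

f = FlashAccelerator()
for pc in [0, 1, 2, 3, 4, 100, 101, 0, 1]:   # loop back to 0 at the end
    f.fetch(pc)
print(f.flash_reads)  # 3 wide reads for 9 fetches instead of 9 narrow reads
```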

7.

Context

Pointer analysis is an important building block of optimizing compilers and program analyzers for the C language. Various methods with different precision/performance trade-offs have been proposed. Among them, cycle elimination has been successfully used to improve the scalability of context-insensitive pointer analyses without losing any precision.

Objective

In this article, we present a new method for context-sensitive pointer analysis that makes effective use of cycle elimination.

Method

To obtain similar benefits from cycle elimination in a context-sensitive analysis, we propose a novel constraint-based formulation that uses sets of contexts as annotations. Our method is not based on binary decision diagrams (BDDs). Instead, we directly use invocation graphs to represent context sets and apply a hash-consing technique to deal with the exponential blow-up of contexts.
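Cycle elimination itself rests on a standard observation: inclusion constraints that form a cycle force all variables in the cycle to have equal points-to sets, so each strongly connected component of the constraint graph can be collapsed before propagation. The sketch below applies textbook Tarjan SCC detection to a toy constraint graph; the graph is illustrative, not from the paper.

```python
# Cycle elimination on an inclusion-constraint graph: variables in a
# cycle (p >= q >= ... >= p) must have equal points-to sets, so each
# strongly connected component is collapsed to one node.

def tarjan_scc(graph):
    index, low, onstack, stack = {}, {}, set(), []
    sccs, counter = [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); onstack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in onstack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop(); onstack.discard(w); scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# p >= q, q >= r, r >= p form a cycle; s only feeds into p.
constraints = {"p": ["q"], "q": ["r"], "r": ["p"], "s": ["p"]}
print(tarjan_scc(constraints))  # [['r', 'q', 'p'], ['s']]
```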

Result

Experimental results on C programs ranging from 20,000 to 290,000 lines show that applying cycle elimination to our new formulation yields a 4.5× speedup over the previous BDD-based approach.

Conclusion

We showed that cycle elimination is an effective method for improving the scalability of context-sensitive pointer analysis.

8.
In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuse. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates the cache invalidations at procedure boundaries that were assumed in previous compiler algorithms. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations on the Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.

9.
GPUs provide megabytes of registers and shared memories to maintain the contexts of thousands of threads and to enable fast data sharing among the threads of a thread block, respectively. In addition, GPUs employ an L1 cache to provide high-bandwidth service for memory requests. However, the average L1 cache capacity per thread is very limited, resulting in cache thrashing, which in turn impairs performance. Meanwhile, many registers and shared memories are unassigned to any warp or thread block. Moreover, registers and shared memories that are assigned can sit idle once their warps or thread blocks finish. Exploiting these insights, in this paper we propose Virtual-Cache, which cost-effectively increases the effective size of the L1 cache by utilizing the unassigned and released registers and shared memories as cache lines. Specifically, we leverage the unassigned registers and shared memories to serve cache requests directly. Registers assigned to a warp can work as cache lines after the warp completes execution and before they are accessed again by a newly launched warp. Likewise, the shared memories of a thread block can serve cache requests from the time the thread block finishes until they are referenced by shared-memory instructions of a relaunched thread block. The register file, shared memory, and L1 cache remain physically independent but are logically unified into a large virtual cache with redesigned cache-line management. We develop the control and data path for the register file, making it accessible for cache requests by borrowing an operand collector to serve them. We also expand the control and data path of the shared memory to serve cache requests. Our evaluation results show that Virtual-Cache improves performance by 28% over a previously proposed cache management technique for cache-sensitive applications.

10.
Recent technology advances in mobile networking have ushered in a new era of personal communication. Users can ubiquitously access the Internet via many emerging mobile appliances, such as portable notebooks, personal digital assistants (PDAs), and WAP-enabled cellular phones. While the transcoding proxy is attracting increasing attention in this environment, new caching strategies are required for such proxies. We propose an efficient cache replacement algorithm for transcoding proxies. Specifically, we formulate a generalized profit function to evaluate the profit from caching each version of an object. This generalized profit function explicitly considers several newly emerging factors in the transcoding proxy and the aggregate effect of caching multiple versions of the same object. Notably, the aggregate effect is not simply the sum of the costs of caching individual versions of an object; rather, it depends on the transcoding relationships among these versions. The notion of a weighted transcoding graph is devised to evaluate the corresponding aggregate effect efficiently. Utilizing the generalized profit function and the weighted transcoding graph, we propose an innovative cache replacement algorithm for transcoding proxies. In addition, an effective data structure is designed to facilitate the management of the multiple versions of different objects cached in the transcoding proxy. Using event-driven simulation, we show that the proposed algorithm consistently outperforms companion schemes in terms of delay-saving ratios and cache hit ratios.
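The aggregate effect can be sketched numerically on a toy weighted transcoding graph: the expected service cost with several cached versions depends on which versions can be transcoded into which, so the joint profit differs from the sum of per-version profits. All costs and the request mix below are illustrative assumptions.

```python
# Sketch of the aggregate effect on a weighted transcoding graph.

# transcode[a][b] = cost of producing version b from cached version a
transcode = {
    "full": {"full": 0, "pda": 4, "wap": 6},
    "pda":  {"pda": 0, "wap": 3},
    "wap":  {"wap": 0},
}
FETCH = 20                                       # cost of fetching from origin
demand = {"full": 0.2, "pda": 0.5, "wap": 0.3}   # request probabilities

def service_cost(cached):
    """Expected cost of serving one request given the cached versions."""
    total = 0.0
    for version, p in demand.items():
        best = FETCH                 # worst case: fetch it from the server
        for c in cached:
            if version in transcode[c]:
                best = min(best, transcode[c][version])
        total += p * best
    return total

for cached in [set(), {"pda"}, {"full"}, {"full", "pda"}]:
    print(sorted(cached), round(service_cost(cached), 2))
# [] 20.0 | ['pda'] 4.9 | ['full'] 3.8 | ['full', 'pda'] 0.9
```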

11.
TTL caching models have recently regained significant research interest due to their connection to popular caching policies such as LRU. This paper advances the state-of-the-art analysis of TTL-based cache networks by developing two exact methods with orthogonal generality and computational complexity. The first method generalizes existing results for line networks under renewal requests to the broad class of caching policies in which evictions are driven by stopping times; in addition to classical policies used in DNS and web caching, our stopping-time model captures an emerging policy implemented in SDN switches and Amazon web services. The second method further generalizes these results to feedforward networks with Markov arrival process (MAP) requests. MAPs are particularly suitable for non-line networks because they are closed not only under superposition and splitting, as is known, but also under caching operations with phase-type (PH) TTL distributions. The crucial benefit of the two closure properties is that they jointly enable the first exact analysis of TTL feedforward cache networks in great generality. Moreover, numerical results highlight that existing Poisson approximations in binary-tree topologies are subject to relative errors as large as 30%, depending on the tree depth.
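For orientation, the simplest instance the paper generalizes — a single item under Poisson requests with a fixed TTL — has well-known closed forms, which the short simulation below reproduces: a hit ratio of 1 − e^(−λT) when the TTL resets on every hit, and λT/(1 + λT) when it does not. (The paper's stopping-time and MAP machinery covers far more general cases.)

```python
# Sanity-check sketch: Poisson requests at rate lam to one item, TTL T.

import math, random

def simulate(lam, T, reset, n=200_000, seed=1):
    rng = random.Random(seed)
    t, expiry, hits = 0.0, -1.0, 0
    for _ in range(n):
        t += rng.expovariate(lam)       # next Poisson request
        if t < expiry:
            hits += 1
            if reset:
                expiry = t + T          # reset-TTL: refresh on every hit
        else:
            expiry = t + T              # miss: (re)insert the item
    return hits / n

lam, T = 1.0, 2.0
print(round(simulate(lam, T, reset=True), 3),
      round(1 - math.exp(-lam * T), 3))          # ~0.865 vs 0.865
print(round(simulate(lam, T, reset=False), 3),
      round(lam * T / (1 + lam * T), 3))         # ~0.667 vs 0.667
```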

12.
In Java, C, or C++, attempts to dereference the null value result in an exception or a segmentation fault. Hence, it is important to identify those program points where this undesired behaviour might occur, or to prove the other program points (and possibly the entire program) safe. To that purpose, null-pointer analysis of computer programs checks or infers non-null annotations for variables and object fields. With few notable exceptions, null-pointer analyses currently use run-time checks, are incorrect, or only verify manually provided annotations. In this paper, we use abstract interpretation to build, and prove correct, the first flow- and context-sensitive static null-pointer analysis for Java bytecode (and hence Java) that infers non-null annotations. It is based on Boolean formulas, implemented with binary decision diagrams. For better precision, it identifies instance or static fields that remain non-null after being initialised. Our experiments show this analysis to be faster and more precise than the correct null-pointer analysis by Hubert, Jensen and Pichardie. Moreover, our analysis deals with exceptions, which most others do not; its formulation is theoretically clean and its implementation strong and scalable. We subsequently improve the analysis by using local reasoning about fields that are not always non-null, but happen to hold a non-null value when they are accessed. This is a frequent situation, since programmers typically check a field for non-nullness before accessing it. We conclude with an example in which our analyses infer null-pointer annotations that are more precise than those other inference tools can achieve.
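The flavor of the inference — though not its BDD machinery or context-sensitivity — can be sketched with a three-value nullness lattice over a toy straight-line program; the statement encoding below is an illustrative assumption.

```python
# Toy flow-sensitive nullness analysis over a three-value lattice.
# The real analysis works on Java bytecode with Boolean formulas/BDDs;
# this straight-line sketch shows only the core idea.

from enum import Enum

class Null(Enum):
    NONNULL = 1
    NULL = 2
    MAYBE = 3   # join of the other two

def join(a, b):
    return a if a == b else Null.MAYBE

def analyze(stmts):
    """stmts: list of (var, kind, arg), kind in {'new','null','copy','deref'}."""
    state, warnings = {}, []
    for var, kind, arg in stmts:
        if kind == "new":            # x = new T()  -> definitely non-null
            state[var] = Null.NONNULL
        elif kind == "null":         # x = null
            state[var] = Null.NULL
        elif kind == "copy":         # x = y
            state[var] = state.get(arg, Null.MAYBE)
        elif kind == "deref":        # x.f -- flag a possible NPE
            if state.get(var, Null.MAYBE) != Null.NONNULL:
                warnings.append(var)
    return state, warnings

prog = [("a", "new", None), ("b", "null", None),
        ("c", "copy", "b"), ("a", "deref", None), ("c", "deref", None)]
print(analyze(prog))  # 'a' is safe; dereferencing 'c' is flagged
```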

13.
Caches are essential to bridge the gap between the high-latency main memory and the fast processor pipeline. Standard processor architectures implement two first-level caches to avoid a structural hazard in the pipeline: an instruction cache and a data cache. For tight worst-case execution time bounds it is important to classify memory accesses as either cache hits or cache misses. The addresses of instruction fetches are known statically, so static cache hit/miss classification is possible for the instruction cache. Accesses to data cached in the data cache are harder to predict statically. Several different data areas, such as the stack, global data, and heap-allocated data, share the same cache. Some addresses are known statically; others are known only at runtime. With a standard cache organization, all of these data areas must be considered by worst-case execution time analysis. In this paper we propose to split the data cache among the different data areas, so that data cache analysis can be performed individually for each area. An access to an unknown address in the heap then does not destroy the abstract cache state for the other data areas. Furthermore, we propose to use a small, highly associative cache for the heap area. We designed and implemented a static analysis for this cache and integrated it into a worst-case execution time analysis tool.

14.
Advances in semiconductor technology have made it possible to integrate billions of transistors on a single chip. Industry and academia currently favor chip multiprocessor (CMP) architectures, whose performance is strongly affected by off-chip memory accesses, so how to organize the on-chip cache hierarchy is a key question. To address this problem, we propose organizing the on-chip last-level cache as a non-inclusive cache to reduce the number of off-chip accesses. Through detailed simulation of a subset of the Splash2 benchmarks, we compare different organizations of the CMP cache hierarchy. The data show that a non-inclusive cache can reduce the average memory access time by up to 8.3%. We also point out that non-inclusive caches help save on-chip resources, and we give design recommendations for the CMP cache hierarchy once a third-level cache is integrated on chip.
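The capacity argument for non-inclusion is simple arithmetic, sketched below with illustrative sizes: an inclusive last-level cache must duplicate every line held in the private caches, while a non-inclusive one adds the private capacity to the effective on-chip total.

```python
# Back-of-the-envelope capacity comparison; sizes are illustrative,
# not from the paper.

CORES = 4
L2_PER_CORE = 1 << 20          # 1 MiB private L2 per core
LLC = 8 << 20                  # 8 MiB shared last-level cache

inclusive_unique = LLC                             # private lines duplicated
non_inclusive_unique = LLC + CORES * L2_PER_CORE   # private lines add capacity
print(inclusive_unique >> 20, non_inclusive_unique >> 20)  # 8 vs 12 MiB
```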

15.
The long short-term memory (LSTM) is not the only neural network that learns a context-sensitive language. Second-order sequential cascaded networks (SCNs) are able to induce, from a finite fragment of a context-sensitive language, the means for processing strings outside the training set. The dynamical behavior of the SCN is qualitatively distinct from that observed in LSTM networks. Differences in performance and dynamics are discussed.

16.
Hit ratio, byte hit ratio, and latency are the most important performance metrics of Web cache systems, yet they struggle to measure the access latency of Web objects of different sizes accurately and fairly. We introduce the concept of byte latency to establish a more reasonable evaluation standard for the latency of objects of different sizes. We then propose LLC, a least-latency-cost Web cache replacement algorithm, which shortens user-perceived access latency as much as possible. Experimental results show that, compared with commonly used cache replacement algorithms, LLC performs well in effectively reducing user-perceived access latency.
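An illustrative reading of the byte-latency idea: rank cached objects by the fetch delay they save per byte of cache they occupy, and evict from the bottom. The scoring rule below is a sketch inferred from the abstract, not the paper's exact LLC formula.

```python
# Minimal sketch of latency-cost-driven replacement.

def evict_order(objects):
    """objects: dict name -> (size_bytes, fetch_delay_s).
    Returns names sorted from first-to-evict to last."""
    return sorted(objects, key=lambda o: objects[o][1] / objects[o][0])

cache = {
    "logo.png":  (10_000, 0.05),   # small and cheap to re-fetch
    "video.mp4": (5_000_000, 2.0), # huge, but slow to re-fetch
    "api.json":  (2_000, 0.40),    # tiny and slow: great latency per byte
}
print(evict_order(cache))  # ['video.mp4', 'logo.png', 'api.json']
```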

17.
The current paper presents an automatic and context-sensitive system for the dynamic recognition of the pain expression among the six basic facial expressions and the neutral one, on acted and spontaneous sequences. A machine learning approach based on the Transferable Belief Model, previously used successfully to categorize the six basic facial expressions in static images [2], [61], is extended in the current paper to the automatic and dynamic recognition of the pain expression from video sequences in a hospital application context. The originality of the proposed method lies in the use of dynamic information for the recognition of the pain expression and in the combination of different sensors — permanent facial-feature behavior, transient-feature behavior, and the context of the study — within the same fusion model. Experimental results, on 2-alternative forced choices and, for the first time, on 8-alternative forced choices (i.e. the pain expression is classified among seven other facial expressions), show good classification rates even for spontaneous pain sequences. The mean classification rates on acted and spontaneous data reach 81.2% and 84.5% for the 2-alternative and 8-alternative forced choices, respectively. Moreover, the system's performance compares favorably with the human observer rate (76%), and it produces the same doubt states in the case of blended expressions.

18.
Information flow control (IFC) checks whether a program can leak secret data to public ports, or whether critical computations can be influenced from outside. But many IFC analyses are imprecise because they are flow-insensitive, context-insensitive, or object-insensitive, resulting in false alarms. We argue that IFC must better exploit modern program analysis technology, and present an approach based on program dependence graphs (PDGs). PDGs have been developed over the last 20 years as a standard device for representing information flow in a program, and today they can handle realistic programs. In particular, our dependence graph generator for full Java bytecode is used as the basis for an IFC implementation that is more precise and needs fewer annotations than traditional approaches. We explain PDGs for sequential and multi-threaded programs, and explain the precision gains due to flow-, context-, and object-sensitivity. We then augment PDGs with a lattice of security levels and introduce the flow equations for IFC. We describe algorithms for flow computation in detail and prove their correctness. We then extend the flow equations to handle declassification, and prove that our algorithm respects monotonicity of release. Finally, examples demonstrate that our implementation can check realistic sequential programs in full Java bytecode.
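The core flow computation can be sketched as a fixpoint over the PDG: each node's security level is the join of its own annotation and the levels of everything it depends on, and a leak is a HIGH level reaching a public sink. The two-point lattice and toy graph below are illustrative; the paper's algorithms handle a full lattice, declassification, and Java-scale graphs.

```python
# Sketch of lattice-based flow propagation on a program dependence graph.

from functools import reduce

LOW, HIGH = 0, 1   # two-point security lattice; join is max

def propagate(deps, source_level):
    """deps: node -> set of nodes it depends on (data or control).
    source_level: node -> initial level. Returns the fixpoint levels."""
    level = dict(source_level)
    changed = True
    while changed:                      # standard fixpoint iteration
        changed = False
        for node, preds in deps.items():
            new = reduce(max, (level.get(p, LOW) for p in preds),
                         level.get(node, LOW))
            if new != level.get(node, LOW):
                level[node] = new
                changed = True
    return level

# password -> hash -> log_msg -> print_stmt (public sink)
deps = {"hash": {"password"}, "log_msg": {"hash"}, "print_stmt": {"log_msg"}}
levels = propagate(deps, {"password": HIGH})
print(levels["print_stmt"] == HIGH)  # True: illegal flow to a public sink
```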

19.
This paper introduces analyses of write-back caches integrated into response-time analysis for fixed-priority preemptive and non-preemptive scheduling. For each scheduling paradigm, we derive four different approaches to computing the additional costs incurred due to write-backs. We show the dominance relationships between these approaches and note how they can be combined to form a single state-of-the-art approach in each case. The evaluation explores the relative performance of the different methods using a set of benchmarks, as well as making comparisons with no cache and with a write-through cache. We also explore the effect of write buffers used to hide the latency of write-through caches. We show that, depending on the depth of the buffer used and the policies employed, such buffers can give rise to domino effects. Our evaluation shows that, even ignoring domino effects, a substantial write buffer is needed to match the guaranteed performance of write-back caches.

20.
Proxy cache algorithms: design, implementation, and performance
Caching at proxy servers is one of the ways to reduce the response time perceived by World Wide Web users. Cache replacement algorithms play a central role in reducing response time by selecting a subset of documents for caching so that a given performance metric is maximized. At the same time, the cache must take extra steps to guarantee some form of consistency of the cached documents. Cache consistency algorithms enforce appropriate guarantees about the staleness of the cached documents. We describe a unified cache maintenance algorithm, LNC-R-W3-U, which integrates both cache replacement and consistency algorithms. The LNC-R-W3-U algorithm evicts documents from the cache based on the delay to fetch each document into the cache. Consequently, documents that took a long time to fetch are preferentially kept in the cache. The LNC-R-W3-U algorithm also factors into its eviction decisions the validation rate of each document, as provided by its cache consistency component. Consequently, documents that are infrequently updated, and thus seldom require validation, are preferentially retained in the cache. We describe the implementation of LNC-R-W3-U and its integration with the Apache 1.2.6 code base. Finally, we present a trace-driven experimental study of LNC-R-W3-U performance and a comparison with other previously published algorithms for cache maintenance.
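An illustrative scoring rule in the spirit of LNC-R-W3-U: retain documents whose cached copy saves the most fetch delay per byte, discounted by how often they must be validated. The exact weighting below is an assumption for the sketch, not the published formula.

```python
# Sketch of delay- and validation-aware retention scoring.

def retention_value(ref_rate, fetch_delay, size, validation_rate,
                    validation_cost=0.05):
    """Expected delay saved per byte of cache, per unit time."""
    saved = ref_rate * fetch_delay                # delay avoided by caching
    overhead = validation_rate * validation_cost  # cost of staying fresh
    return (saved - overhead) / size

docs = {  # name: (requests/s, fetch delay s, bytes, validations/s)
    "news.html":  (2.0, 0.8, 40_000, 0.5),
    "logo.gif":   (5.0, 0.1, 8_000, 0.01),
    "report.pdf": (0.05, 3.0, 900_000, 0.0),
}
ranked = sorted(docs, key=lambda d: retention_value(*docs[d]), reverse=True)
print(ranked)  # evict from the end of this list first
```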
