Similar Documents
20 similar documents found (search time: 15 ms)
1.
Effective address calculations for load and store instructions must compete with other instructions for the ALU, so extra latency may be added to data cache accesses. Fast address generation is an approach proposed to reduce cache access latency. This paper presents a fast address generator that eliminates most effective address computations by storing the computed effective addresses of previous load/store instructions in a dummy register file. Experimental results show that this fast address generator reduces the effective address computations of load and store instructions by about 74% on average for the SPECint2000 benchmarks and cuts execution times by 8.5%. Furthermore, when multiple dummy register files are deployed, it eliminates over 90% of the effective address computations of load and store instructions and improves average execution times by 9.3%.
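As a rough illustration of the reuse idea in this abstract, the Python sketch below caches previously computed effective addresses in a small table keyed by base register; the table layout, size, and replacement rule are assumptions made for illustration, not the paper's design.

    class DummyRegisterFile:
        """Remembers the last effective address computed per base register."""
        def __init__(self, num_entries=8):
            self.entries = {}          # base_reg -> (base_value, offset, effective_addr)
            self.num_entries = num_entries

        def lookup(self, base_reg, base_value, offset):
            """Return a cached effective address, or None if it must be recomputed."""
            e = self.entries.get(base_reg)
            if e and e[0] == base_value and e[1] == offset:
                return e[2]            # reuse: no ALU computation needed
            return None

        def update(self, base_reg, base_value, offset, effective_addr):
            if base_reg not in self.entries and len(self.entries) >= self.num_entries:
                self.entries.pop(next(iter(self.entries)))   # naive replacement
            self.entries[base_reg] = (base_value, offset, effective_addr)

    def access(drf, base_reg, base_value, offset, stats):
        addr = drf.lookup(base_reg, base_value, offset)
        if addr is None:
            addr = base_value + offset          # the ALU computation we try to avoid
            drf.update(base_reg, base_value, offset, addr)
            stats["computed"] += 1
        else:
            stats["reused"] += 1
        return addr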

2.
This paper presents CMP-VR (Chip-Multiprocessor with Victim Retention), an approach to improve cache performance by reducing the number of off-chip memory accesses. The objective is to retain chosen victim cache blocks on the chip for as long as possible. Some sets of the CMP's last-level cache (LLC) may be heavily used while others are not. In CMP-VR, a number of ways in every set are used as reserved storage, allowing a victim block from a heavily used set to be stored in the reserve space of another set. In this way the load of heavily used sets is distributed among the underused sets, which logically increases the associativity of the heavily used sets without increasing the actual associativity or size of the cache. Experimental evaluation using full-system simulation shows that CMP-VR has a lower off-chip miss rate than the baseline tiled CMP. Results are presented for different cache sizes and associativities for CMP-VR and the baseline configuration. The best improvements obtained are 45.5% in miss rate and 14% in cycles per instruction (CPI) for a 4 MB, 4-way set-associative LLC. The joint reduction in CPI and miss rate guarantees a performance improvement.
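The victim-retention idea can be sketched in Python as follows; the set indexing, LRU ordering, and the rule for choosing which set's reserved ways hold a victim are illustrative assumptions, not CMP-VR's actual policy.

    from collections import OrderedDict

    class VictimRetainingCache:
        """Toy model: each set has `ways` normal ways plus `reserved` ways that can
        retain victim blocks evicted from other (hotter) sets."""
        def __init__(self, num_sets=8, ways=3, reserved=1):
            self.num_sets, self.ways, self.reserved = num_sets, ways, reserved
            self.sets = [OrderedDict() for _ in range(num_sets)]      # normal ways, LRU order
            self.reserve = [OrderedDict() for _ in range(num_sets)]   # reserved ways, LRU order

        def access(self, block):
            s = block % self.num_sets
            if block in self.sets[s]:                 # ordinary hit
                self.sets[s].move_to_end(block)
                return "hit"
            result = "miss"
            for r in self.reserve:                    # hit on a retained victim?
                if block in r:
                    del r[block]
                    result = "reserve-hit"
                    break
            self.sets[s][block] = True                # (re)install in the home set
            if len(self.sets[s]) > self.ways:
                victim, _ = self.sets[s].popitem(last=False)          # LRU victim
                target = min(range(self.num_sets), key=lambda i: len(self.reserve[i]))
                self.reserve[target][victim] = True   # retain the victim on chip
                if len(self.reserve[target]) > self.reserved:
                    self.reserve[target].popitem(last=False)          # finally discarded
            return result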

3.
Cache memories reduce memory latency and traffic in computing systems. Most existing caches are implemented as board-based systems. Advancing VLSI technology will soon permit significant caches to be integrated on chip with the processors they support. In designing on-chip caches, the constraints of VLSI become significant. The primary constraints are economic limitations on circuit area and off-chip communications. The paper explores the design of on-chip instruction-only caches in terms of these constraints. The primary contribution of this work is the development of a unified economic model of on-chip instruction-only cache design which integrates the points of view of the cache designer and of the floorplan architect. With suitable data, this model permits the rational allocation of constrained resources to the achievement of a desired cache performance. Specific conclusions are that random line replacement is superior to LRU replacement, due to an increased flexibility in VLSI floorplan design; that variable set associativity can be an effective tool in regulating a chip's floorplan; and that sectoring permits area-efficient caches while avoiding high transfer widths. Results are reported on economic functionality, from chip area and transfer width to miss ratio. These results, or the underlying analysis, can be used by microprocessor architects to make intelligent decisions regarding appropriate cache organizations and resource allocations.

4.
The Journal of Supercomputing - Emerging non-volatile memories (NVMs) are known as promising alternatives to SRAMs in on-chip caches. However, their limited write endurance is a major challenge...

5.
The performance of modern machines is increasingly limited by insufficient memory bandwidth. One way to alleviate this bandwidth limitation for a given program is to minimize the aggregate data volume the program transfers from memory. In this article we present compiler strategies for accomplishing this minimization. Following a discussion of the underlying causes of bandwidth limitations, we present a two-step strategy to exploit global cache reuse: the temporal reuse across the whole program and the spatial reuse across the entire data set used in that program. In the first step, we fuse computation on the same data using a technique called reuse-based loop fusion to integrate loops with different control structures. We prove that optimal fusion for bandwidth is NP-hard and we explore the limitations of computation fusion using perfect program information. In the second step, we group data used by the same computation through the technique of affinity-based data regrouping, which intermixes the storage assignments of program data elements at different granularities. We show that the method is compile-time optimal and can be used on array and structure data. We prove that two extensions, partial and dynamic data regrouping, are NP-hard problems. Finally, we describe our compiler implementation and experiments demonstrating that the new global strategy, on average, reduces memory traffic by over 40% and improves execution speed by over 60% on two high-end workstations.
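A toy Python analogue of the two transformations described above (the actual work targets compiled loop nests; all names and data here are made up):

    import random

    n = 100_000
    a = [random.random() for _ in range(n)]

    # Before fusion: two separate loops traverse `a` twice through the memory hierarchy.
    def unfused(a):
        total = 0.0
        for x in a:          # traversal 1
            total += x
        largest = 0.0
        for x in a:          # traversal 2
            largest = max(largest, x)
        return total, largest

    # After reuse-based loop fusion: one traversal, each element is reused while it
    # is still cached, halving the data volume fetched from memory.
    def fused(a):
        total, largest = 0.0, 0.0
        for x in a:
            total += x
            largest = max(largest, x)
        return total, largest

    # Affinity-based data regrouping (illustrative): fields that are always accessed
    # together are stored contiguously instead of interleaved with rarely used fields.
    interleaved = [{"x": 0.0, "y": 0.0, "color": 0} for _ in range(n)]   # poor locality
    xs, ys = [0.0] * n, [0.0] * n     # hot fields regrouped into their own arrays
    colors = [0] * n                  # cold field kept separate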

6.
《Parallel Computing》2014,40(10):710-721
In this paper, we investigate the problem of fair storage cache allocation among multiple competing applications with diversified access rates. Commonly used cache replacement policies such as LRU and most of its variants are inherently unfair when allocating cache to heterogeneous applications: they implicitly give more cache to applications with high access rates and less cache to applications with slow access rates. However, fast applications do not always gain higher performance from the additional cache blocks, while slow applications suffer poor performance with a reduced cache size. It is therefore beneficial, in terms of both performance and fairness, to allocate cache blocks by their utility. We propose a partition-based cache management algorithm for a shared cache. The goal of the algorithm is to find an allocation such that all heterogeneous applications achieve a specified degree of fairness with as little performance degradation as possible. To achieve this goal, we present an adaptive partition framework that partitions the shared cache among competing applications and dynamically adjusts the partition sizes based on the predicted utility for both fairness and performance. We implement the algorithm in a storage simulator and evaluate its fairness and performance with various workloads. Experimental results show that, compared with LRU, the algorithm achieves a large improvement in fairness and a slight improvement in performance.
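A minimal sketch of utility-driven partitioning, assuming per-application hit curves are available; the greedy rule below is illustrative and is not the paper's algorithm.

    def partition_cache(total_blocks, hit_curves):
        """hit_curves[app][k] = expected hits for app when given k cache blocks.
        Greedily hand out blocks to whichever application gains most from one more."""
        alloc = {app: 0 for app in hit_curves}
        for _ in range(total_blocks):
            def marginal_gain(app):
                k, curve = alloc[app], hit_curves[app]
                return curve[min(k + 1, len(curve) - 1)] - curve[min(k, len(curve) - 1)]
            best = max(alloc, key=marginal_gain)
            alloc[best] += 1
        return alloc

    # Example: a fast-rate app with little reuse vs. a slow-rate app that benefits a lot.
    curves = {
        "fast_app": [0, 10, 12, 13, 13, 13, 13, 13, 13],   # flattens quickly
        "slow_app": [0, 5, 10, 15, 20, 25, 30, 35, 40],    # keeps benefiting
    }
    print(partition_cache(8, curves))   # most blocks go to slow_app despite its lower rate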

7.
The present investigation assessed the putative benefits of reducing instructions for older adults' learning of an assembly task. Young and older adults had to build a product by assembling six components. Two groups practiced following instruction methods that differed in the degree of explicit information they conveyed about the correct assembly order. After practice, retention, consolidation of performance (tested immediately after practice and on a separate day, respectively) and stability of performance (tested by introducing a concurrent second task) were assessed. Younger adults showed similar performance levels for both instruction methods. Older adults, however, showed similar retention but clearly weaker consolidation and stability of performance following less encompassing instructions. Contrary to expectations, enhancing the involvement of explicit processes allowed older adults to gain a more permanent and stable performance improvement. The findings are discussed relative to the characteristics of the assembly task.

8.
A program that accesses an out-of-bounds array element can cause unexpected behaviour that is unacceptable in safety-critical or security-critical systems. Two traditional compile-time approaches to array bound checking are flow analysis and program verification. This paper presents a new approach, IFV, that integrates flow analysis and program verification techniques. IFV is generally about as effective as program verification yet runs in about the same time as flow analysis. Its typical runtime is proportional to the product of the program size and the number of declared variables. IFV matches loops to templates, which represent commonly occurring loop patterns, to discover loop invariants automatically, which it then uses to strengthen flow analysis. With only seven templates, it handles many common array-access patterns. Patterns not verified by flow analysis are processed with verification techniques entirely automatically. This paper also describes a prototype IFV system that performs compile-time array bound checking for programs in a subset of C. © 1997 John Wiley & Sons, Ltd.
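A toy illustration of the template idea in Python, using one hypothetical template for the common pattern "for i in range(0, n): a[i]"; this is not the IFV system itself, only a sketch of matching a loop against a known pattern whose invariant is trusted.

    import ast

    # Template: "for i in range(0, n): ... a[i] ...".  If a loop matches, the
    # invariant 0 <= i < n holds, so every access a[i] is safe whenever n <= len(a).
    def matches_simple_range_loop(source):
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if (isinstance(node, ast.For)
                    and isinstance(node.iter, ast.Call)
                    and isinstance(node.iter.func, ast.Name)
                    and node.iter.func.id == "range"):
                return True   # loop fits the template; the template invariant applies
        return False

    print(matches_simple_range_loop("for i in range(0, n):\n    s = a[i]"))   # True
    print(matches_simple_range_loop("while p:\n    p = step(p)"))             # False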

9.
The choice of Web cache replacement policy affects network bandwidth demand and object hit rate, which in turn affect page load time. Two new policies implemented in the Squid cache server show marked improvement over the standard mechanism.

10.
This study examined the final grade and satisfaction level differences among students taking specific courses using three different methods: face-to-face in class, via satellite broadcasting at remote sites, and via live video-streaming at home or at work. In each case, the same course was taught by the same instructor in all three delivery methods, and an attempt was made to survey students taking the course via the three different delivery methods. MANOVA results indicated no grade or satisfaction level differences among the three populations. Self-reported computer literacy skills revealed a slight fit between the chosen delivery mode and those skills. These results provide additional evidence to support both the “no significant difference” phenomenon and the use of distance education as a viable, convenient and flexible alternative delivery mode capable of extending learning opportunities to non-traditional students.

11.
《Computer Networks》2002,38(6):779-794
This paper describes the design and use of a synthetic web proxy workload generator called ProWGen to investigate the sensitivity of web proxy cache replacement policies to five selected web workload characteristics. Three representative cache replacement policies are considered in the simulation study: a recency-based policy called least-recently-used, a frequency-based policy called least-frequently-used-with-aging, and a size-based policy called greedy-dual-size. Trace-driven simulations with synthetic workloads from ProWGen show the relative sensitivity of these cache replacement policies to three web workload characteristics: the slope of the Zipf-like document popularity distribution, the degree of temporal locality in the document referencing behaviour, and the correlation (if any) between document size and document popularity. The three replacement policies are relatively insensitive to the percentage of one-timers in the workload, and to the Pareto tail index of the heavy-tailed document size distribution. Performance differences between the three cache replacement policies are also highlighted.
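A small synthetic-workload sketch in the spirit of ProWGen; the distributions and parameter values below are illustrative assumptions rather than ProWGen's actual model.

    import random

    def zipf_popularity(num_docs, slope):
        """Zipf-like popularity: probability of the rank-r document is proportional to 1/r**slope."""
        weights = [1.0 / (rank ** slope) for rank in range(1, num_docs + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def pareto_size(tail_index, min_size=1024):
        """Heavy-tailed document size drawn from a Pareto distribution."""
        return int(min_size * random.paretovariate(tail_index))

    def generate_trace(num_requests=10_000, num_docs=1_000, slope=0.8, tail_index=1.2):
        probs = zipf_popularity(num_docs, slope)
        sizes = [pareto_size(tail_index) for _ in range(num_docs)]
        docs = random.choices(range(num_docs), weights=probs, k=num_requests)
        return [(doc_id, sizes[doc_id]) for doc_id in docs]

    trace = generate_trace()
    print(trace[:5])   # (document id, size in bytes) pairs to feed a cache simulator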

12.
Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50% and 10%, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5%. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20%.
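The reordering relies on a Hilbert curve index; below is a standard 2-D coordinate-to-curve-index mapping in Python, shown only as an illustration (the meshes in the paper are not limited to a 2-D grid, and the sorting step here is a simplification).

    def hilbert_index(order, x, y):
        """Map (x, y) in a (2**order x 2**order) grid to its distance along the Hilbert curve."""
        n = 2 ** order
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            # rotate/flip the quadrant so the sub-curve is oriented consistently
            if ry == 0:
                if rx == 1:
                    x = n - 1 - x
                    y = n - 1 - y
                x, y = y, x
            s //= 2
        return d

    # Reorder vertices so that vertices close in space get nearby memory positions.
    vertices = [(3, 1), (0, 0), (2, 2), (1, 3)]
    vertices.sort(key=lambda v: hilbert_index(4, v[0], v[1]))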

13.
We introduce Pentagons, a weakly relational numerical abstract domain useful for the validation of array accesses in byte-code and intermediate languages (IL). This abstract domain captures properties of the form x ∈ [a, b] ∧ x < y. It is more precise than the well-known Interval domain, but less precise than the Octagon domain. The goal of Pentagons is to be a lightweight numerical domain useful for adaptive static analysis, where it is used to quickly prove the safety of most array accesses, restricting the use of more precise (but also more expensive) domains to only a small fraction of the code. We implemented the abstract domain in a generic abstract interpreter for .NET assemblies. Using it, we were able to validate 83% of the array accesses in the core runtime library in a little more than 3 minutes.
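A small sketch of a pentagon-style abstract state in Python, keeping an interval per variable plus a set of strict inequalities x < y; the class and method names are invented for illustration and do not come from the paper.

    class PentagonState:
        def __init__(self):
            self.intervals = {}          # var -> (lo, hi), inclusive bounds
            self.strictly_less = set()   # pairs (x, y) meaning x < y

        def set_interval(self, var, lo, hi):
            self.intervals[var] = (lo, hi)

        def add_strict(self, x, y):
            self.strictly_less.add((x, y))

        def proves_nonnegative(self, var):
            lo, _ = self.intervals.get(var, (None, None))
            return lo is not None and lo >= 0

        def proves_less(self, x, y):
            # Either the symbolic fact x < y is known, or the intervals imply it.
            if (x, y) in self.strictly_less:
                return True
            xi, yi = self.intervals.get(x), self.intervals.get(y)
            return xi is not None and yi is not None and xi[1] < yi[0]

        def array_access_safe(self, index, length):
            return self.proves_nonnegative(index) and self.proves_less(index, length)

    # a[i] inside "for i in range(0, a_len)": 0 <= i and i < a_len, so the access is safe.
    st = PentagonState()
    st.set_interval("i", 0, 2**31 - 1)
    st.add_strict("i", "a_len")
    print(st.array_access_safe("i", "a_len"))   # True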

14.
Given the many difficulties that shape the work of rural teachers who teach in semi-isolated contexts, it is necessary to provide them with a support system that strengthens their pedagogical performance for the benefit of rural students' education. The aim of this phenomenological study is to describe and analyze how e-mentoring can strengthen the pedagogical performance of primary rural teachers in areas of Chile with difficult geographical access, exploring the subjective experiences of four teacher-mentor pairs who carried out the process through an e-mail relationship. Results show the need to treat the accompaniment as horizontal pedagogical assistance, which can be affected by the availability of technological resources, and identify the profile of an e-mentor that fosters teacher adherence to the process, including communicative style, empathy, and pedagogical and cognitive skills. Finally, this investigation makes it possible to project a viable model to support rural education where Internet access is available.

15.
The aim of this study was to evaluate the effect of a transfer technique education programme (TT) alone or in combination with physical fitness training (TTPT) compared with a control group, who followed their usual routine. Eleven clinical hospital wards were cluster randomised to either intervention (six wards) or control (five wards). The intervention cluster was individually randomised to TT (55 nurses) or TTPT (50 nurses); the control group comprised 76 nurses. The transfer technique programme was a 4-day train-the-trainers course to teach transfer technique to colleagues. The physical training consisted of supervised physical fitness training 1 h twice per week for 8 weeks. Implementing the transfer technique alone or in combination with physical fitness training among hospital nursing staff did not, when compared to the control group, show any statistically significant differences in self-reported low back pain (LBP), pain level, disability, or sick leave at a 12-month follow-up. However, the individually randomised intervention subgroup (transfer technique/physical training) significantly improved LBP-related disability (p = 0.001). Although the findings are weakened by a high withdrawal rate, teaching transfer technique to nurses in a hospital setting needs to be thoroughly considered, and other priorities such as physical training may be taken into consideration. The current study supports the findings of other studies that introducing transfer technique alone has no effect in targeting LBP. However, physical training seems to have an influence in minimising the consequences of LBP and may be important in the discussion of how to prevent LBP or its recurrence among nursing personnel.

16.
《Ergonomics》2012,55(10):1530-1548
The aim of this study was to evaluate the effect of a transfer technique education programme (TT) alone or in combination with physical fitness training (TTPT) compared with a control group, who followed their usual routine. Eleven clinical hospital wards were cluster randomised to either intervention (six wards) or control (five wards). The intervention cluster was individually randomised to TT (55 nurses) or TTPT (50 nurses); the control group comprised 76 nurses. The transfer technique programme was a 4-day train-the-trainers course to teach transfer technique to colleagues. The physical training consisted of supervised physical fitness training 1 h twice per week for 8 weeks. Implementing the transfer technique alone or in combination with physical fitness training among hospital nursing staff did not, when compared to the control group, show any statistically significant differences in self-reported low back pain (LBP), pain level, disability, or sick leave at a 12-month follow-up. However, the individually randomised intervention subgroup (transfer technique/physical training) significantly improved LBP-related disability (p = 0.001). Although the findings are weakened by a high withdrawal rate, teaching transfer technique to nurses in a hospital setting needs to be thoroughly considered, and other priorities such as physical training may be taken into consideration. The current study supports the findings of other studies that introducing transfer technique alone has no effect in targeting LBP. However, physical training seems to have an influence in minimising the consequences of LBP and may be important in the discussion of how to prevent LBP or its recurrence among nursing personnel.

17.
The effectiveness of buffer cache replacement is critical to the performance of I/O systems. In this paper, we propose a degree of inter-reference gap (DIG) based block replacement scheme. The scheme keeps the simplicity of the least recently used (LRU) scheme and does not depend on the detection of access regularities. It is based on the low inter-reference recency set (LIRS) scheme, which is currently known to be very effective, but it employs several items of history information whereas LIRS uses only one. The overhead of the proposed scheme is almost negligible. To evaluate its performance, comprehensive trace-driven simulation is used over general access patterns. Our simulation results show that the cache hit ratio (CHR) of the proposed scheme improves by as much as 65.3% (26.6% on average) compared to LRU for the same workloads, and by up to 6% compared to LIRS on the multi3 trace.
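A loose Python sketch of keeping several history items per block and evicting by inter-reference gap; the victim-selection rule here is an illustrative assumption, not the paper's DIG algorithm.

    from collections import deque

    class GapAwareCache:
        def __init__(self, capacity=4, history=3):
            self.capacity, self.history = capacity, history
            self.times = {}            # block -> deque of its last few reference times
            self.clock = 0

        def _avg_gap(self, block):
            ts = list(self.times[block])
            if len(ts) < 2:
                return float("inf")    # too little history: treat as weak reuse
            gaps = [b - a for a, b in zip(ts, ts[1:])]
            return sum(gaps) / len(gaps)

        def access(self, block):
            self.clock += 1
            hit = block in self.times
            if not hit and len(self.times) >= self.capacity:
                victim = max(self.times, key=self._avg_gap)   # widest reuse gap goes first
                del self.times[victim]
            self.times.setdefault(block, deque(maxlen=self.history)).append(self.clock)
            return hit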

18.
Regular Random k-SAT: Properties of Balanced Formulas
We consider a model for generating random k-SAT formulas in which each literal occurs approximately the same number of times in the formula clauses (regular random k-SAT). Our experimental results show that such regular random k-SAT instances are much harder than the usual uniform random k-SAT problems. This is in agreement with other results showing that more balanced instances of random combinatorial problems are often much more difficult to solve than uniformly random instances, even at phase-transition boundaries. There are almost no formal results known for such problem distributions; the balancing constraints add a dependency between variables that complicates a standard analysis. Regular random 3-SAT exhibits a phase transition as a function of the ratio α of clauses to variables, taking place at approximately α = 3.5. We show that for α > 3.78, random regular 3-SAT formulas are unsatisfiable with high probability (w.h.p.); specifically, events E_n hold with high probability if Pr(E_n) → 1 as n → ∞. We also show that the analysis of a greedy algorithm proposed by Kaporis et al. for the uniform 3-SAT model can be adapted to regular random 3-SAT. In particular, we show that for formulas with ratio α < 2.46, a greedy algorithm finds a satisfying assignment with positive probability.
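One straightforward way to generate approximately regular random k-SAT instances is sketched below; this construction is only an illustration of balancing literal occurrences, and a full generator would also reject clauses containing repeated or complementary literals.

    import random

    def regular_random_ksat(n_vars, n_clauses, k=3):
        """Each of the 2*n_vars literals appears either floor or ceil of
        k*n_clauses/(2*n_vars) times; clauses are cut from a shuffled pool."""
        literals = [v for v in range(1, n_vars + 1)] + [-v for v in range(1, n_vars + 1)]
        occurrences = k * n_clauses
        pool = [literals[i % len(literals)] for i in range(occurrences)]  # round-robin keeps counts balanced
        random.shuffle(pool)
        return [pool[i:i + k] for i in range(0, occurrences, k)]

    # Near the experimentally observed transition (clause/variable ratio around 3.5):
    formula = regular_random_ksat(n_vars=100, n_clauses=350, k=3)
    print(formula[:3])   # clauses as lists of signed variable indices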

19.
Seventeen graduate students in two classes worked on a web-based programmed instruction tutoring system as the first technical exercise in a Java™ programming course. The system taught how to write a simple Java applet that displays a text string in a browser window on the world wide web. Students completed tests of near transfer and far transfer before and after using the tutor and again after a lecture on the material. The results showed that performance improved over the pre-tutor baseline on all assessments, including the far transfer test, which required integrating information in the tutor into a rule to apply to solve a novel programming problem not explicitly taught in the tutor. Software self-efficacy also increased across four assessment occasions. These data show that programmed instruction can produce problem solving skills and can foster student confidence, based upon the documented mastery of fundamental material in a technical domain. An investigative approach that follows systematic replication, rather than null hypothesis refutation, may be best suited to assess the impact and dependability of competency-based instructional systems.

20.
The study examined the effects of a computer-aided learning (CAL) system for decimal numbers, built around cognitive conflict, on sixth graders. The system gauges students' decimal concepts; when a student holds a misconception or misleading idea, the system generates an appropriate cognitive conflict as feedback, according to the type of wrong answer, to help the student recognize the irrational part of the idea. An instruction screen for cognitive adjustment then corrects the student's original concepts. The participants were sixth-grade students from an elementary school in Taipei. The study took a quasi-experimental approach that employed a pretest-posttest, non-equivalent-group design. The two groups of students were given a pretest, a posttest, and a delayed posttest to measure the effect of the system, and were also interviewed to see how their concepts had changed. The study found that, although most of the sixth graders did not understand the basic concepts of decimal numbers very well and their misconceptions were similar to those identified by other studies, the experiments showed significant improvement following the use of the computer-aided learning system. The learning system also helped students better retain the decimal concepts they acquired. Data from the interviews further indicated that the system, built on cognitive conflict, can help students clear up their misconceptions about decimal numbers.
