
1.

Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays

Cantin J.F. Smith J.E. Lipasti M.H. Moshovos A. Falsafi B. 《Micro, IEEE》2006,26(1):70-79

Coarse-grain coherence tracking is a new technique that extends a conventional coherence mechanism and optimizes coherence enforcement. It monitors the coherence status of large regions of memory and uses that information to avoid unnecessary broadcasts and filter unnecessary cache tag lookups, thus improving system performance and reducing power consumption.
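To make the broadcast-avoidance idea concrete, here is a minimal sketch of a per-processor region table. It assumes 512-byte regions, 64-byte lines, no evictions, and only two region states ("shared" or "not shared"). The published Region Coherence Array protocol tracks a richer set of states, and the class and method names here (`Processor`, `System`, `snoop`, `load`) are hypothetical. The key property is that a miss to a region known to be cached by no other processor can go straight to memory. Any processor that later touches that region has no entry for it and must broadcast, so the owner still observes the request.

```python
from dataclasses import dataclass

REGION_BYTES = 512  # assumed region size (a multiple of the line size)
LINE_BYTES = 64     # assumed cache-line size


@dataclass
class RegionEntry:
    shared: bool         # may another processor cache lines of this region?
    line_count: int = 0  # lines of this region cached locally


class Processor:
    def __init__(self, pid, system):
        self.pid = pid
        self.system = system
        self.lines = set()  # cached line numbers (no evictions in this sketch)
        self.rca = {}       # region number -> RegionEntry (hypothetical region table)

    def snoop(self, region):
        """Handle another processor's broadcast; report whether we cache the region."""
        entry = self.rca.get(region)
        if entry is None:
            return False  # no entry: the cache tag lookup can be skipped
        entry.shared = True  # another processor now has lines here too
        return entry.line_count > 0

    def load(self, addr):
        line, region = addr // LINE_BYTES, addr // REGION_BYTES
        if line in self.lines:
            return "hit"
        entry = self.rca.get(region)
        if entry is not None and not entry.shared:
            kind = "direct"  # region known not shared: no broadcast needed
        else:
            kind = "broadcast"
            others_have = self.system.broadcast(self.pid, region)
            if entry is None:
                entry = self.rca[region] = RegionEntry(shared=others_have)
            else:
                entry.shared = others_have
        self.lines.add(line)
        entry.line_count += 1
        return kind


class System:
    def __init__(self, n_procs):
        self.procs = [Processor(i, self) for i in range(n_procs)]
        self.broadcasts = 0

    def broadcast(self, requester, region):
        self.broadcasts += 1
        # Build the full list so every other processor snoops (no short-circuit).
        replies = [p.snoop(region) for p in self.procs if p.pid != requester]
        return any(replies)
```

With two processors, processor 0's first miss to a region broadcasts and its second miss to the same region goes direct. Once processor 1 touches the region, both processors see it as shared and fall back to broadcasting.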

2.

Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling

Moshovos A. Sohi G.S. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2001,89(11):1560-1575

Semiconductor technology scaling provides faster and more plentiful transistors to build microprocessors, and applications continue to drive the demand for more powerful microprocessors. Weaving the "raw" semiconductor material into a microprocessor that offers the performance needed by modern and future applications is the role of computer architecture. This paper surveys some of the microarchitectural techniques that empower modern high-performance microprocessors. The techniques are classified into: 1) techniques meant to increase the concurrency in instruction processing, while maintaining the appearance of sequential processing and 2) techniques that exploit program behavior. The first category includes pipelining, superscalar execution, out-of-order execution, register renaming, and techniques to overlap memory-accessing instructions. The second category includes memory hierarchies, branch predictors, trace caches, and memory-dependence predictors. The paper also discusses microarchitectural techniques likely to be used in future microprocessors, including data value speculation and instruction reuse, microarchitectures with multiple sequencers and thread-level speculation, and microarchitectural techniques for tackling the problems of power consumption and reliability.
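As one small illustration of the second category (techniques that exploit program behavior), the sketch below is a textbook bimodal branch predictor built from 2-bit saturating counters. It is a generic example chosen here for illustration, not a design taken from the paper. The table size (1024 entries) and the PC-to-index hash (`(pc >> 2) % entries`) are assumptions.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch PC (illustrative)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not-taken, 2-3 predict taken

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumed hash: drop low bits, then modulo

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

For a loop branch that is taken nine times and then falls through, this predictor mispredicts only the first iteration (cold counter) and the loop exit, so it gets 8 of 10 predictions right. The hysteresis of the 2-bit counter is what keeps a single loop exit from flipping the prediction for the next run of the loop.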

3.

A Building Block for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

Zebchuk J. Moshovos A. 《Computer Architecture Letters》2007,6(2):33-36

Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed memory hierarchy enhancements for coherence traffic reduction and prefetching suggest that additional useful patterns emerge with a macroscopic, coarse-grain view. This paper presents RegionTracker, a dual-grain, on-chip cache design that exposes coarse-grain behavior while maintaining block-level communication. RegionTracker eliminates the extraneous, often imprecise coarse-grain tracking structures of previous proposals. It can be used as the building block for coarse-grain optimizations, reducing their overall cost and easing their adoption. Using full-system simulation of a quad-core chip multiprocessor and commercial workloads, we demonstrate that RegionTracker overcomes the inefficiencies of previous coarse-grain cache designs. We also demonstrate how RegionTracker boosts the benefits and reduces the cost of a previously proposed snoop reduction technique.
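The "dual-grain" idea of tracking regions while still moving individual blocks can be approximated by a sector-cache-style array. In this array, each entry is tagged by region and carries one valid bit per block. The sketch below is an assumption-laden stand-in and not RegionTracker's actual organization. It assumes 64-byte blocks, 16 blocks per region, a fully associative LRU array, and hypothetical method names. It shows how a single lookup can answer both "is this block cached?" and "which blocks of region R are cached?", the second being the kind of query a coarse-grain optimization such as snoop reduction needs.

```python
from collections import OrderedDict

BLOCK_BYTES = 64        # assumed block size
BLOCKS_PER_REGION = 16  # assumed region size: 16 blocks = 1 KiB


class RegionTaggedCache:
    """Region-tagged entries with per-block valid bits, LRU at region grain (sketch)."""

    def __init__(self, max_regions):
        self.max_regions = max_regions
        self.regions = OrderedDict()  # region tag -> bitmask of valid blocks, LRU first

    def _split(self, addr):
        block = addr // BLOCK_BYTES
        return block // BLOCKS_PER_REGION, block % BLOCKS_PER_REGION

    def access(self, addr):
        """Return True on a block hit; on a miss, fill only the requested block."""
        region, offset = self._split(addr)
        mask = self.regions.get(region)
        if mask is not None:
            self.regions.move_to_end(region)  # region becomes most recently used
            if mask >> offset & 1:
                return True
        else:
            if len(self.regions) >= self.max_regions:
                self.regions.popitem(last=False)  # evict LRU region and its blocks
            mask = 0
        self.regions[region] = mask | (1 << offset)  # block-level fill
        return False

    def region_present(self, region):
        return self.regions.get(region, 0) != 0

    def blocks_in_region(self, region):
        mask = self.regions.get(region, 0)
        return [i for i in range(BLOCKS_PER_REGION) if mask >> i & 1]
```

Evicting at region grain is a simplification made here for brevity. A real design must balance the cost of region-level tags against wasted capacity when only a few blocks of a region are in use.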

4.

Speculative Memory Cloaking and Bypassing

Andreas Moshovos Gurindar S. Sohi 《International journal of parallel programming》1999,27(6):427-456

We revisit memory hierarchy design, viewing memory as an inter-operation communication mechanism. We show how dynamically collected information about inter-operation memory communication can be used to improve memory latency. We propose two techniques: (1) Speculative Memory Cloaking, and (2) Speculative Memory Bypassing. In the first technique, we use memory dependence prediction to speculatively identify dependent loads and stores early in the pipeline. These instructions may then communicate prior to address calculation and disambiguation via a fast communication mechanism. In the second technique, we use memory dependence prediction to speculatively transform DEF-store-load-USE dependence chains within the instruction window into DEF-USE ones. As a result, dependent stores and loads are taken off the communication path, further reducing communication latency. Experimental analysis shows that our methods, on average, correctly handle 40% (integer) and 19% (floating point) of all memory loads. Moreover, our techniques result in performance improvements of 4.28% (integer) and 3.20% (floating point) over a highly aggressive, dynamically scheduled processor implementing naive memory dependence speculation. We also study the value and address locality characteristics of the values our methods correctly handle. We demonstrate that our methods are orthogonal to both address and value prediction.
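The cloaking idea can be sketched as three small tables. The first records which store last wrote each address, so that store-load dependences can be detected. The second gives a dependent store and load the same "synonym" name. The third holds the value last published under each synonym, so a predicted-dependent load can obtain a value by name before its address is known. This is a sequential toy model with hypothetical names (`MemoryCloaking`, `synonym_of`, `synonym_value`). It shows only the prediction and verification logic, not pipeline timing, and it is not claimed to match the paper's exact structures.

```python
class MemoryCloaking:
    """Toy model of dependence-predicted load/store communication (illustrative)."""

    def __init__(self):
        self.memory = {}         # architectural memory: address -> value
        self.last_store_pc = {}  # address -> PC of the last store to it (detection)
        self.synonym_of = {}     # instruction PC -> synonym naming a predicted dependence
        self.synonym_value = {}  # synonym -> latest value published by its store
        self.next_synonym = 0

    def _link(self, store_pc, load_pc):
        """Record that store_pc feeds load_pc by giving both the same synonym."""
        syn = self.synonym_of.get(store_pc)
        if syn is None:
            syn = self.synonym_of.get(load_pc)
        if syn is None:
            syn = self.next_synonym
            self.next_synonym += 1
        self.synonym_of[store_pc] = syn
        self.synonym_of[load_pc] = syn

    def store(self, pc, addr, value):
        syn = self.synonym_of.get(pc)
        if syn is not None:
            self.synonym_value[syn] = value  # publish by name; no address needed
        self.memory[addr] = value
        self.last_store_pc[addr] = pc

    def load(self, pc, addr):
        """Return (value, outcome); any cloaked value is verified against memory."""
        actual = self.memory.get(addr, 0)
        syn = self.synonym_of.get(pc)
        if syn is not None and syn in self.synonym_value:
            outcome = "correct" if self.synonym_value[syn] == actual else "mispredicted"
        else:
            outcome = "not-cloaked"
        store_pc = self.last_store_pc.get(addr)
        if store_pc is not None:
            self._link(store_pc, pc)  # train the predictor for later instances
        return actual, outcome
```

After one store/load pair to the same address trains the tables, the next instance of the load receives the store's value by synonym ("correct"). If the store later writes a different address, the cloaked value is wrong and verification flags it ("mispredicted"), which in hardware would trigger recovery.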

5.

Low-leakage asymmetric-cell SRAM

Azizi N. Najm F.N. Moshovos A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(4):701-715

We introduce a novel family of asymmetric dual-Vt static random access memory (SRAM) cell designs that reduce leakage power in caches while maintaining low access latency. Our designs exploit the strong bias toward zero at the bit level exhibited by the memory value stream of ordinary programs. Compared to conventional symmetric high-performance cells, our cells offer significant leakage reduction in the zero state and, in some cases, also in the one state, albeit to a lesser extent. A novel sense amplifier, in combination with dummy bitlines, allows read times to be on par with conventional symmetric cells. With one cell design, leakage is reduced by 7× (in the zero state) with no performance degradation, but with a stability degradation of 6%. Another cell design reduces leakage by 2× (in the zero state) with no performance or stability loss. An alternative cell design reduces leakage by 58× (in the zero state) with a performance degradation of 1%, an area increase of 2.4%, and no stability degradation.
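Why a zero-state saving pays off can be seen with a simple weighted average. If a fraction p0 of stored bits are zero, and an asymmetric cell leaks r0 times less than a symmetric cell in the zero state and r1 times less in the one state, then average leakage relative to a symmetric cell is p0/r0 + (1 - p0)/r1. The sketch below applies this formula. It assumes 32-bit words, takes the 7× zero-state figure quoted above, and conservatively assumes no saving in the one state (r1 = 1). The word stream is made up for illustration, and the function names are hypothetical.

```python
def zero_bit_fraction(words, width=32):
    """Fraction of zero bits across a stream of unsigned `width`-bit words."""
    mask = (1 << width) - 1
    total = len(words) * width
    ones = sum(bin(w & mask).count("1") for w in words)
    return (total - ones) / total


def relative_leakage(p_zero, zero_reduction, one_reduction=1.0):
    """Average cell leakage relative to a symmetric cell (1.0 means no saving)."""
    return p_zero / zero_reduction + (1.0 - p_zero) / one_reduction


# Hypothetical value stream: small integers are typical, so most bits are zero.
p = zero_bit_fraction([0, 1, 0xFF, 0])  # 119 of 128 bits are zero
estimate = relative_leakage(p, zero_reduction=7.0)
```

For this stream, p0 = 119/128, and the estimate is 17/128 + 9/128 = 0.203125, roughly a 5× average reduction. The leftover one-state term dominates, which is why designs that also save leakage in the one state are attractive.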