
1.

Sharangpani H. Arora H. 《Micro, IEEE》2000,20(5):24-43

The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style, the hardware-software interface lets the software exploit all available compile-time information and deliver it efficiently to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies.

2.

The IA-64 Itanium processor cartridge

Samaras W.A. Cherukuri N. Venkataraman S. 《Micro, IEEE》2001,21(1):82-89

The Itanium processor cartridge is a packaging optimization for electrical and thermal performance in a server environment. The 3-in. x 5-in. cartridge contains the Itanium CPU, up to 4 megabytes of level-3 (L3) cache, an innovative power delivery scheme, and an integrated vapor chamber thermal spreading lid for removing power. Cartridges and a chip set can be ganged electrically by means of a glueless bidirectional, multidrop system bus. Power is delivered through a custom connection with separate voltages for the 0.18-micron CPU and 0.25-micron custom cache devices. An I²C serial connection provides access to system management features such as temperature monitoring and cartridge identification information.

3.

Montecito: a dual-core, dual-thread Itanium processor (total citations: 2; self-citations: 0; citations by others: 2)

McNairy C. Bhatia R. 《Micro, IEEE》2005,25(2):10-20

Intel's Montecito is the first Itanium processor to feature duplicate, dual-thread cores and cache hierarchies on a single die. It features a landmark 1.72 billion transistors and server-focused technologies, and it requires only 100 watts of power. Intel's Itanium 2 processor series has regularly delivered additional performance through increased frequency and cache size, as evidenced by the 6-Mbyte and 9-Mbyte versions.

4.

Itanium 2 processor 6M: higher frequency and larger L3 cache

Rusu S. Muljono H. Cherkauer B. 《Micro, IEEE》2004,24(2):10-18

The third-generation Itanium processor targets the high-performance server and workstation market. To do so, the design team sought to provide higher performance through increased frequency and a larger L3 cache. At the same time, we had to limit the power dissipation to fit into the existing platform envelope. These considerations led to what we now call the Itanium 2 processor 6M: the latest generation of Itanium 2, which features a 6-Mbyte, 24-way set-associative on-die L3 cache. The design implements a 2-bundle 64-bit explicitly parallel instruction computing (EPIC) architecture and is fully compatible with previous implementations. Although this processor's frequency is 50 percent higher than that of the previous generation, the maximum power dissipation holds flat at 130 W to ensure the platform's backward compatibility.
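As a rough worked example of what a 6-Mbyte, 24-way set-associative cache implies, the sketch below computes the number of sets. The capacity and associativity come from the abstract; the 128-byte line size is an assumption made for illustration, and `cache_sets` is a hypothetical helper, not something taken from the paper.

```python
def cache_sets(capacity_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets = capacity / (ways * line size)."""
    set_bytes = ways * line_bytes
    if capacity_bytes % set_bytes != 0:
        raise ValueError("capacity is not a whole number of sets")
    return capacity_bytes // set_bytes


L3_CAPACITY = 6 * 1024 * 1024  # 6 Mbyte, from the abstract
L3_WAYS = 24                   # 24-way set-associative, from the abstract
LINE_BYTES = 128               # assumed line size, not stated in the abstract

sets = cache_sets(L3_CAPACITY, L3_WAYS, LINE_BYTES)
print(f"{sets} sets of {L3_WAYS} lines each")
```

Under these assumptions the set count is a power of two even though the capacity is not, which is one reason a non-power-of-two associativity such as 24 can be convenient for indexing.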

5.

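Developing a program performance analysis tool using the Itanium 2 PMU

Zhang Yufeng 《微机发展》 (Microcomputer Development) 2006,16(8):69-71

The Itanium 2 processor provides a performance monitoring unit (PMU), exposed as a set of registers, that captures microarchitectural events while a program runs. This paper describes how to build a relatively high-level performance analysis tool on top of perfmon, the interface Linux provides to the Itanium 2 PMU, so that the data supplied by the performance monitoring hardware can be processed and used in an integrated way.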

6.

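Developing a program performance analysis tool using the Itanium 2 PMU

Zhang Yufeng 《计算机技术与发展》 (Computer Technology and Development) 2006,16(8):69-71

The Itanium 2 processor provides a performance monitoring unit (PMU), exposed as a set of registers, that captures microarchitectural events while a program runs. This paper describes how to build a relatively high-level performance analysis tool on top of perfmon, the interface Linux provides to the Itanium 2 PMU, so that the data supplied by the performance monitoring hardware can be processed and used in an integrated way.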

7.

Validating the Itanium 2 exception control unit: a unit-level approach

Scafidi C. Gibson J.D. Bhatia R. 《Design & Test of Computers, IEEE》2004,21(2):94-101

Each new microprocessor endeavor strives to achieve the performance gains projected by Moore's law. Such performance arises, in part, from innovative and often complex microarchitectural features. This trend of increasing functional complexity has already exacerbated the challenge of design validation, making validation the critical path to tape-out. Traditional approaches to functional validation include both focused case writing and the development of random-code generators. In either case, this method is limited to engineering "thought" experiments: the human mind can only process a finite set of states in a seemingly infinite machine state space. In April 2000, the functional model for the Itanium 2 design was nearing tape-out quality. Engineers had written most focused cases to satisfy test plan goals; the random-code generators were mature and pounding away at the RTL model with no restrictions; and the bug rate was steadily decreasing for most units. Despite this encouraging trend, engineers were still concerned with the functional quality of the exception control unit (XPN), one of the most control-logic-intensive units on the chip.

8.

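Implementing cryptographic algorithms on the Itanium processor

Chen Xun, Jiang Jingfei, Zhang Minxuan 《计算机工程与应用》 (Computer Engineering and Applications) 2004,40(15):40-42,208

Compiling existing C implementations of block ciphers with Itanium Compiler 7.0 and analyzing the resulting assembly code shows that the compiler does not make full use of the resources the Itanium processor provides. To address this, the paper proposes a method for implementing common cryptographic algorithms efficiently on the Itanium processor, mainly by using the SIMD instructions in the Itanium instruction set to increase processing parallelism, and it discusses how to use those SIMD instructions.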
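To make the idea of SIMD parallelism within a 64-bit register concrete, here is a minimal Python model of a lane-wise add over four 16-bit lanes. It is only an illustration of the general technique, not Itanium code and not the method from the paper; `pack16`, `unpack16`, and `packed_add16` are hypothetical names introduced for this sketch.

```python
MASK64 = (1 << 64) - 1
LANE_MSB = 0x8000_8000_8000_8000  # top bit of each 16-bit lane


def pack16(lanes):
    """Pack four 16-bit values (lane 0 is least significant) into one word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add16(a, b):
    """Add four 16-bit lanes modulo 2**16 with no carry crossing lanes."""
    # Add the low 15 bits of every lane; these sums cannot overflow a lane.
    low = (a & ~LANE_MSB & MASK64) + (b & ~LANE_MSB & MASK64)
    # Fold in the top bit of each lane with XOR so its carry-out is dropped.
    return (low ^ ((a ^ b) & LANE_MSB)) & MASK64


x = pack16([0xFFFF, 1, 2, 0x8000])
y = pack16([1, 1, 1, 0x8000])
print([hex(v) for v in unpack16(packed_add16(x, y))])
```

A single word-wide operation updates all four lanes at once, which is the kind of parallelism a cipher's round function can exploit when its data is laid out in independent lanes.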

9.

Chip-level microarchitecture trends

《Micro, IEEE》2004,24(2):5-5


10.

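The multi-level branch prediction mechanism in the Itanium processor

Su Ming, Zhao Rongcai, Song Zongyu 《微计算机信息》 (Microcomputer Information) 2005,(31):98-99

Branch prediction can eliminate the cycles lost after a branch instruction and keep the pipeline from stalling. High branch prediction accuracy underpins the performance of high-performance microprocessors. This paper analyzes the multi-level branch prediction mechanism of the Itanium processor in detail and examines how the predictor at each level is implemented.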
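For readers unfamiliar with branch predictors, the sketch below shows a generic two-bit saturating-counter predictor, a common textbook building block. It is not the Itanium mechanism the paper analyzes; the class name, table size, and the synthetic loop trace are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in `trace` that were predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Synthetic trace: one loop branch, taken 9 times and then not taken, 10 times over.
trace = [(0x40, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), trace))
```

The two-bit hysteresis means a single loop exit does not flip the prediction for the next loop entry, which is why this simple scheme already handles loop branches reasonably well.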

11.

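The multi-level branch prediction mechanism in the Itanium processor

Su Ming, Zhao Rongcai, Song Zongyu 《微计算机信息》 (Microcomputer Information) 2005,(21)

Branch prediction can eliminate the cycles lost after a branch instruction and keep the pipeline from stalling. High branch prediction accuracy underpins the performance of high-performance microprocessors. This paper analyzes the multi-level branch prediction mechanism of the Itanium processor in detail and examines how the predictor at each level is implemented.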

12.

Resolving a L2-prefetch-caused parallel nonscaling on Intel Core microarchitecture

Nan Zhang 《Journal of Parallel and Distributed Computing》2011,71(7):915-924

Parallel workloads on shared-memory multi-core processors often suffer from performance degradation. Cache eviction, true/false sharing and bus contention are among the well-understood causes of this problem. This paper presents a study showing that the L2 DPL (data prefetch logic) in processors based on the Intel Core microarchitecture can be a cause of this problem as well. Through a case study of image integration, the study finds a nonscaling problem in the parallel integration of images whose size exceeds the capacity of the processor's L2 cache. An analysis of relevant performance events using the Intel VTune™ Performance Analyser finds the L2 DPL prefetch to be less effective at prefetching needed data during parallel integration than during serial integration. To resolve the problem, a novel parallel image reverse loading is developed with the purpose of reducing the number of memory accesses during the parallel integration and the associated delay. Experimental results demonstrate that the parallel integration after the parallel reverse loading shows significant speedup against the same parallel integration after serial loading.

13.

The QC-2 parallel Queue processor architecture

Ben A. Abderazek, Arquimedes Canedo, Tsutomu Yoshinaga, Masahiro Sowa 《Journal of Parallel and Distributed Computing》2008

Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.

14.

Verifying the FM9801 microarchitecture

Hunt W.A. Jr. Sawada J. 《Micro, IEEE》1999,19(3):47-55

Hardware verification accounts for a considerable portion of the costs in the microprocessor design process. Traditionally designers have verified microprocessor designs using simulation techniques that help find most design faults. However, simulation never guarantees the correct operation of the final product. Some design faults are very difficult to detect by simulation; they may slip through the verification process into manufactured chips, raising costs. We believe that verification costs can be reduced by the judicious application of formal methods, which should lower the overall costs of design.

15.

Tuning the Pentium Pro microarchitecture

Papworth D.B. 《Micro, IEEE》1996,16(2):8-15

Designing a wholly new microprocessor is difficult and expensive. To justify this effort, a major new microarchitecture must improve performance one and a half or two times over the previous-generation microarchitecture, when evaluated on equivalent process technology. In addition, semiconductor process technology continues to evolve while the processor design is in progress. The previous-generation microarchitecture increases in clock speed and performance due to compactions and conversion to newer technology. A new microarchitecture must “intercept” the process technology to achieve a compounding of process and microarchitectural speedups. This paper looks at a large microprocessor development project which reveals some of the reasoning (for goals, changes, trade-offs, and performance simulation) that lay behind its final form.

16.

Environment for PowerPC microarchitecture exploration

Moudgill M. Wellman J.-D. Moreno J.H. 《Micro, IEEE》1999,19(3):15-25

Designers face many choices when planning a new high-performance, general-purpose microprocessor. Options include superscalar organization (the ability to dispatch and execute more than one instruction at a time), out-of-order issue of instructions, speculative execution, branch prediction, and cache hierarchy. However, the interaction of multiple microarchitecture features is often counterintuitive, raising questions concerning potential performance benefits and other effects on various workloads. Complex design trade-offs require accurate and timely performance modeling, which in turn requires flexible, efficient environments for exploring microarchitecture processor performance. Workload-driven simulation models are essential for microprocessor design space exploration. A processor model must ideally: capture in sufficient detail those features that are already well defined; make evolving assumptions and approximations in interpreting the desired execution semantics for those features that are not yet well defined; and be validated against the existing specification. These requirements suggest the need for an evolving but reasonably precise specification, so that validating against such a specification provides confidence in the results. Processor model validation normally relies on behavioral timing specifications based on test cases that exercise the microarchitecture. This approach, commonly used in simulation-based functional validation methods, is also useful for performance validation. In this article, we describe a workload-driven simulation environment for PowerPC processor microarchitecture performance exploration. We summarize the environment's properties and give examples of its usage.

17.

The IXM2 parallel associative processor for AI

Higuchi T. Handa K. Takahashi N. Furuya T. Iida H. Sumita E. Oi O. Kitano H. 《Computer》1994,27(11):53-63

Describes the IXM2 associative processor and its main application in speech-to-speech translation. The IXM2 is a semantic memory system machine that began as a faithful implementation of the NETL semantic network machine and grew into a massively parallel SIMD machine that has demonstrated the power of large associative memories. Such processors can support robust performance in speech applications. In fact, the IXM2 with 73 transputers has outperformed a Cray in some language-translation tasks. We selected speech-to-speech translation as our main application because it is one of the grand challenges of massively parallel artificial intelligence. The social implications of successful automatic translation are enormous: for example, people who speak different languages could communicate in real time by using interpreting telephony.

18.

Hyperthreading technology in the netburst microarchitecture

Koufaty D. Marr D.T. 《Micro, IEEE》2003,23(2):56-65

Hyperthreading technology, which brings the concept of simultaneous multithreading to the Intel architecture, was first introduced on the Intel Xeon processor in early 2002 for the server market. In November 2002, Intel launched the technology on the Intel Pentium 4 at clock frequencies of 3.06 GHz and higher, making the technology widely available to the consumer market. This technology signals a new direction in microarchitecture development and fundamentally changes the cost-benefit tradeoffs of microarchitecture design choices. This article describes how the technology works, that is, how we make a single physical processor appear as multiple logical processors to operating systems and software. We highlight the additional structures and die area needed to implement the technology and discuss the fundamental ideas behind the technology and why we can get a 25-percent boost in performance from a technology that costs less than 5 percent in added die area. We illustrate the importance of choosing the right sharing policy for each shared resource by describing, examining, and comparing three different sharing policies: partitioned resources, threshold sharing, and full sharing. The choice of policy depends on the traffic pattern, complexity and size of the resource, potential deadlock/livelock scenarios, and other considerations. Finally, we show how this technology significantly improves performance on several relevant workloads.
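The three sharing policies named in the abstract can be contrasted with a small admission check for a shared buffer. This is one plausible reading of the terms, offered as a sketch: the abstract does not define the policies precisely, and `Policy`, `can_allocate`, and the capacity and threshold values are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    PARTITIONED = "partitioned"  # each thread owns a fixed, equal share
    THRESHOLD = "threshold"      # shared pool with a per-thread cap
    FULL = "full"                # shared pool, first come first served


def can_allocate(policy, used, thread, capacity, threshold=None):
    """Return True if `thread` may take one more entry.

    `used` lists the entries currently held by each logical processor.
    """
    if sum(used) >= capacity:
        return False  # the resource is physically full under any policy
    if policy is Policy.PARTITIONED:
        return used[thread] < capacity // len(used)
    if policy is Policy.THRESHOLD:
        if threshold is None:
            raise ValueError("threshold policy needs a per-thread cap")
        return used[thread] < threshold
    return True  # Policy.FULL: any free entry may be taken


# Thread 0 already holds 4 of 8 entries; thread 1 holds none.
used = [4, 0]
for policy in Policy:
    print(policy.value, can_allocate(policy, used, thread=0, capacity=8, threshold=6))
```

In this model, partitioning guarantees each thread its share but wastes entries when one thread is idle, full sharing uses every entry but lets one thread starve the other, and a threshold sits between the two.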

19.

Micro's top picks from microarchitecture conferences

Moore C. Rudd K.W. Lee R.B. Bose P. 《Micro, IEEE》2003,23(6):8-10


20.

VMW: a visualization-based microarchitecture workbench

Diep T.A. Shen J.P. 《Computer》1995,28(12):57-64

Superscalar processor design requires increasingly sophisticated software tools. The visualization-based microarchitecture workbench described in the paper addresses weaknesses common to most performance simulators: the lack of retargetability, visualization support, and interactive control. VMW provides a multifunction workbench for aiding designers of modern superscalar processors. It facilitates rigorous machine specification by providing specification templates at both the architecture and microarchitecture levels.