Similar literature
 Found 20 similar documents; search took 15 ms
1.
Sharangpani  H. Arora  H. 《Micro, IEEE》2000,20(5):24-43
The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of operating systems and platforms. The processor employs EPIC (explicitly parallel instruction computing) design concepts for a tighter coupling between hardware and software. In this design style the hardware-software interface lets the software exploit all available compilation time information and efficiently deliver this information to the hardware. It addresses several fundamental performance bottlenecks in modern computers, such as memory latency, memory address disambiguation, and control flow dependencies.

2.
The Itanium processor cartridge is a packaging optimization for electrical and thermal performance in a server environment. The 3-in. x 5-in. cartridge contains the Itanium CPU, up to 4 megabytes of level-3 (L3) cache, an innovative power delivery scheme, and an integrated vapor chamber thermal spreading lid for removing power. Cartridges and a chip set can be ganged electrically by means of a glueless bidirectional, multidrop system bus. Power is delivered through a custom connection with separate voltages for the 0.18-micron CPU and 0.25-micron custom cache devices. An I2C serial connection provides access to system management features such as temperature monitoring and cartridge identification information.

3.
Montecito: a dual-core, dual-thread Itanium processor   Total citations: 2, self-citations: 0, citations by others: 2
McNairy  C. Bhatia  R. 《Micro, IEEE》2005,25(2):10-20
Intel's Montecito is the first Itanium processor to feature duplicate, dual-thread cores and cache hierarchies on a single die. It features a landmark 1.72 billion transistors and server-focused technologies, and it requires only 100 watts of power. Intel's Itanium 2 processor series has regularly delivered additional performance through the increased frequency and cache as evidenced by the 6-Mbyte and 9-Mbyte versions.

4.
Rusu  S. Muljono  H. Cherkauer  B. 《Micro, IEEE》2004,24(2):10-18
The third-generation Itanium processor targets the high-performance server and workstation market. To do so, the design team sought to provide higher performance through increased frequency and a larger L3 cache. At the same time, we had to limit the power dissipation to fit into the existing platform envelope. These considerations led to what we now call the Itanium 2 processor 6M: the latest generation of Itanium 2, which features a 6-Mbyte, 24-way set-associative on-die L3 cache. The design implements a 2-bundle 64-bit explicitly parallel instruction computing (EPIC) architecture and is fully compatible with previous implementations. Although this processor's frequency is 50 percent higher than that of the previous generation, the maximum power dissipation holds flat at 130 W to ensure the platform's backward compatibility.

5.
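张宇峰 (Zhang Yufeng) 《微机发展》 (Microcomputer Development) 2006,16(8):69-71
The Itanium 2 processor provides a performance monitoring unit, exposed as a set of registers, that captures microarchitectural events while a program runs. This paper describes how to develop relatively high-level performance analysis tools on top of perfmon, the interface Linux provides to the Itanium 2 performance monitoring unit, so that the data supplied by the performance monitoring hardware can be processed and used in an integrated way.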

6.
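The Itanium 2 processor provides a performance monitoring unit, exposed as a set of registers, that captures microarchitectural events while a program runs. This paper describes how to develop relatively high-level performance analysis tools on top of perfmon, the interface Linux provides to the Itanium 2 performance monitoring unit, so that the data supplied by the performance monitoring hardware can be processed and used in an integrated way.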

7.
Each new microprocessor endeavor strives to achieve the performance gains projected by Moore's law. Such performance arises, in part, from innovative and often complex microarchitectural features. This trend of increasing functional complexity has already exacerbated the challenge of design validation, making validation the critical path to tapeout. Traditional approaches to functional validation include both focused case writing and the development of random-code generators. In either case, this method is limited to engineering "thought" experiments - the human mind can only process a finite set of states in a seemingly infinite machine state space. In April 2000, the functional model for the Itanium 2 design was nearing tape-out quality. Engineers had written most focused cases to satisfy test plan goals; the random-code generators were mature and pounding away at the RTL model with no restrictions; and the bug rate was steadily decreasing for most units. Despite this encouraging trend, engineers were still concerned with the functional quality of the exception control unit (XPN), one of the most control-logic-intensive units on the chip.

8.
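Implementation and application of an Itanium 2-based cluster system   Total citations: 2, self-citations: 0, citations by others: 2
This paper designs and implements a cluster computing system based on the Itanium 2 processor. Taking the characteristics of the Itanium 2 processor and of the cluster system into account, it optimizes a parallel meteorological application in four respects: I/O, communication, computational cost, and cache utilization of communicated data, so as to play to the cluster's strengths and avoid its weaknesses. Test results show that the cluster system is well suited to efficient parallel computation for parallel meteorological software.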

9.
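Compiling existing C implementations of block cipher algorithms with Itanium Compiler 7.0 yields assembly code, and analysis of that assembly code shows that the compiler does not make full use of the resources the Itanium processor provides. To address this, the paper proposes methods for efficiently implementing common cryptographic algorithms on the Itanium processor, chiefly by using the SIMD instructions in the Itanium instruction set to increase parallelism, and discusses how to use the Itanium processor's SIMD instructions.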

10.
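Branch prediction can eliminate the cycles lost after branch instructions and prevent pipeline stalls. High branch prediction accuracy underpins the performance of high-performance microprocessors. This paper analyzes in detail the multi-level branch prediction mechanism of the Itanium processor and examines the implementation of each level of predictor.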

11.
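To study the impact of memory faults on high-availability servers, this paper proposes a multi-level memory fault injection method for Itanium-architecture computers and designs and implements a new fault injector, HMFI. By injecting memory faults at the physical layer, the operating system kernel layer, and the process layer, it examines the target system's tolerance of memory faults. Experimental results show that the memory faults injected by HMFI can effectively verify and analyze the fault tolerance of complex computer systems.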

12.
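Branch prediction can eliminate the cycles lost after branch instructions and prevent pipeline stalls. High branch prediction accuracy underpins the performance of high-performance microprocessors. This paper analyzes in detail the multi-level branch prediction mechanism of the Itanium processor and examines the implementation of each level of predictor.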

13.
14.
Parallel workloads on shared-memory multi-core processors often suffer from performance degradation. Cache eviction, true/false sharing and bus contention are among the well-understood causes of this problem. This paper presents a study showing that the L2 DPL (data prefetch logic) in processors based on the Intel Core microarchitecture can be a cause of this problem as well. Through a case study of image integration, the study finds a nonscaling problem in the parallel integration of images whose size exceeds the capacity of the processor's L2 cache. Through an analysis of relevant performance events using Intel VTune™ Performance Analyzer, L2 DPL prefetching is found to be less effective at prefetching needed data during the parallel integration than during the serial one. To resolve the problem, a novel parallel image reverse loading is developed with the purpose of reducing the number of memory accesses during the parallel integration and the associated delay. Experimental results demonstrate that the parallel integration after the parallel reverse loading shows significant speedup over the same parallel integration after serial loading.

15.
Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.

16.
17.
Hunt  W.A.  Jr. Sawada  J. 《Micro, IEEE》1999,19(3):47-55
Hardware verification accounts for a considerable portion of the costs in the microprocessor design process. Traditionally designers have verified microprocessor designs using simulation techniques that help find most design faults. However, simulation never guarantees the correct operation of the final product. Some design faults are very difficult to detect by simulation; they may slip through the verification process into manufactured chips, raising costs. We believe that verification costs can be reduced by the judicious application of formal methods, which should lower the overall costs of design.

18.
EPIC (explicitly parallel instruction computing) architectures, exemplified by the Intel Itanium, support a number of advanced architectural features, such as explicit instruction-level parallelism, instruction predication, and speculative loads from memory. However, compiler optimizations that take advantage of these features can profoundly restructure the program's code, making it potentially difficult to reconstruct the original program logic from an optimized Itanium executable. This paper describes techniques to undo some of the effects of such optimizations and thereby improve the quality of reverse engineering such executables.

19.
Papworth  D.B. 《Micro, IEEE》1996,16(2):8-15
Designing a wholly new microprocessor is difficult and expensive. To justify this effort, a major new microarchitecture must improve performance one and a half or two times over the previous-generation microarchitecture, when evaluated on equivalent process technology. In addition, semiconductor process technology continues to evolve while the processor design is in progress. The previous-generation microarchitecture increases in clock speed and performance due to compactions and conversion to newer technology. A new microarchitecture must “intercept” the process technology to achieve a compounding of process and microarchitectural speedups. This paper looks at a large microprocessor development project which reveals some of the reasoning (for goals, changes, trade-offs, and performance simulation) that lay behind its final form.

20.
Designers face many choices when planning a new high-performance, general purpose microprocessor. Options include superscalar organization (the ability to dispatch and execute more than one instruction at a time), out-of-order issue of instructions, speculative execution, branch prediction, and cache hierarchy. However, the interaction of multiple microarchitecture features is often counterintuitive, raising questions concerning potential performance benefits and other effects on various workloads. Complex design trade-offs require accurate and timely performance modeling, which in turn requires flexible, efficient environments for exploring microarchitecture processor performance. Workload-driven simulation models are essential for microprocessor design space exploration. A processor model must ideally: capture in sufficient detail those features that are already well defined; make evolving assumptions and approximations in interpreting the desired execution semantics for those features that are not yet well defined; and be validated against the existing specification. These requirements suggest the need for an evolving but reasonably precise specification, so that validating against such a specification provides confidence in the results. Processor model validation normally relies on behavioral timing specifications based on test cases that exercise the microarchitecture. This approach, commonly used in simulation-based functional validation methods, is also useful for performance validation. In this article, we describe a workload driven simulation environment for PowerPC processor microarchitecture performance exploration. We summarize the environment's properties and give examples of its usage.
