Similar Literature
 19 similar documents found (search time: 72 ms)
1.
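Branch Prediction Design Strategies for Modern Microprocessors   Cited by: 1 (self-citations: 0, citations by others: 1)
Modern microprocessors widely use pipelining, superpipelining, superscalar, or VLIW techniques to increase instruction-level parallelism (ILP), but these techniques also lose efficiency when conditional branch instructions stall the pipeline. Modern microprocessors generally use a branch prediction unit to minimize this effect.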
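
The abstract above does not say which predictor designs the paper covers. As a minimal sketch of what a branch prediction unit does, the following Python simulates a 2-bit saturating counter, a textbook scheme chosen here purely for illustration. The function name `simulate_two_bit_predictor` and the sample branch trace are assumptions, not taken from the paper.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct predictions of one 2-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    `outcomes` is a sequence of booleans (True = branch taken).
    This is an illustrative textbook scheme, not the paper's design.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return correct


# Hypothetical trace: a loop branch taken nine times, then falling through.
loop_branch = [True] * 9 + [False]
hits = simulate_two_bit_predictor(loop_branch)
print(f"{hits} of {len(loop_branch)} predictions correct")
```

Each misprediction in a real pipeline costs the cycles needed to flush and refill the wrongly fetched instructions, which is why the hit rate of a counter like this matters.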

2.
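1 Introduction. In microprocessor design, exploiting instruction-level parallelism (ILP) to improve system performance faces severe limits. Research on superscalar microprocessors with ever wider issue has become extremely complex and of little value. However, theoretical analysis shows that exploiting data-level parallelism together with ILP can improve microprocessor performance significantly: the equivalent IPC (instructions issued per clock cycle) can be 20 to more than 40 times that of a superscalar microprocessor. Exploiting data-level parallelism in microprocessor system design therefore has both theoretical significance and practical value.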

3.
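An ILP Approach to Spatial Association Rule Mining   Cited by: 2 (self-citations: 0, citations by others: 2)
李宏 (Li Hong), 蔡之华 (Cai Zhihua). 《计算机工程与应用》 (Computer Engineering and Applications), 2003, 39(16): 188-191, 197
This article introduces an ILP approach to spatial association rule mining. ILP stands for inductive logic programming; the approach helps discover valuable knowledge in the spatial domain, study the hierarchical structure of geographic layers systematically, and handle the spatial properties of many spatial objects. The approach has been implemented in the ILP system SPADA, and the article illustrates its characteristics with several examples of applying SPADA to spatial data.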

4.
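王国栋 (Wang Guodong), 侯朝焕 (Hou Chaohuan), 马杰 (Ma Jie). 《计算机工程与设计》 (Computer Engineering and Design), 2005, 26(8): 1980-1981, 1985
In modern microprocessor design, speculation and prediction have become two important techniques for exploiting instruction-level parallelism (ILP). Porting GCC makes it possible to develop efficient, fast compilation systems that run on different system platforms. The paper analyzes GCC's support for compiler optimization and summarizes how speculation and prediction are implemented when porting GCC.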

5.
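As a 64-bit processor architecture, IA-64 offers higher instruction-level parallelism (ILP) and represents a new direction in microprocessor development. Automatic analysis and transformation of IA-64 binary instruction streams is important for implementing automatic IA-64 binary translation and reverse engineering based on descriptions of the machine and operating system. This paper outlines the characteristics of SLED and of IA-64 instructions, describes in detail how IA-64 instructions are specified in SLED and how MLTK is used to generate reverse-engineering tools automatically, and gives test results for the automatically generated disassembler.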

6.
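Research on Functional Verification Methods for Microprocessors   Cited by: 4 (self-citations: 0, citations by others: 4)
Verification is a key step in microprocessor design. This paper discusses the principles, characteristics, and applicable scenarios of microprocessor simulation, hardware emulation, formal verification, and other methods, and proposes an overall approach to multi-level functional verification of microprocessors.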

7.
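Building on the transmission and processing structure of internal label packets (ILP) in a practical integrated services access system, and addressing the delay and synchronization problems of ILP transmission over a multi-bus backplane, this paper proposes a practical solution and implementation method for adaptive bit synchronization and ILP packet synchronization over a "one-to-many" backplane bus. It also discusses the effect of idle bytes (IdleBytes) on service-carrying efficiency.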

8.
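The microprocessor is the core of a wireless sensor network node's hardware, and its choice must be compatible with the communication protocol. This design adopts the IEEE 802.15.4 communication protocol, now widely used in wireless sensors, and selects the compatible JN5121 microprocessor, which features low power consumption and low cost. The paper describes the hardware of a JN5121-based wireless sensor network, including the microprocessor's structure and performance characteristics, the peripheral circuits, and the structural design of the hardware platform. It also covers the JN5121 software development environment, the composition, connection, and control of the radio-frequency section, and the development of a physical product.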

9.
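Inductive logic programming (ILP) is an important branch of machine learning. Given a set of examples and relevant background knowledge, ILP studies how to construct logic programs consistent with them; these logic programs consist of a finite set of first-order clauses. This article describes ICCR, an algorithm that combines the strengths of several current ILP methods. ICCR merges the top-down search strategy typified by FOIL with the bottom-up search strategy typified by GOLEM, and can invent new predicates and learn recursive logic programs as needed. Comparative experiments show that, given the same examples and background knowledge, ICCR learns more accurate target logic programs than FOIL and GOLEM.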

10.
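Research and Implementation of a Dynamic VLIW Scheduling Mechanism   Cited by: 2 (self-citations: 0, citations by others: 2)
The VLIW architecture is an important means of exploiting ILP; its advantages are a regular, simple structure and low hardware complexity. However, relying entirely on the compiler for instruction scheduling limits the performance gains of VLIW. This paper proposes a dynamic VLIW scheduling mechanism based on deterministic instruction latencies: exploiting the fact that most instructions have a fixed execution time, it reschedules the execution order of instructions using run-time information to exploit ILP further. Experimental results on an FPGA show that the mechanism has linear hardware complexity.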

11.
Instruction-level parallel processing: History, overview, and perspective   Cited by: 11 (self-citations: 0, citations by others: 11)
Instruction-level parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP had become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.

12.
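Data Prefetching and Optimization Methods for Instruction-Level Parallel Compilers   Cited by: 6 (self-citations: 0, citations by others: 6)
Microprocessor chips keep growing more powerful, but memory speed falls far short of matching them, which leaves overall system performance unsatisfactory. To address this problem, compilers have developed techniques such as locality optimization and data prefetching. This paper introduces a data prefetching technique for ILP (instruction-level parallelism) optimizing compilers and a method that uses the register file to reduce the number of main-memory accesses and optimize programs. Together they can improve average memory performance and are quite effective for scientific and engineering computing applications.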

13.
Discovering and exploiting instruction level parallelism in code will be key to future increases in microprocessor performance. What technical challenges must compiler writers meet to better use ILP? Instruction level parallelism allows a sequence of instructions derived from a sequential program to be parallelized for execution on multiple pipelined functional units. If industry acceptance is a measure of importance, ILP has blossomed. It now profoundly influences the design of almost all leading edge microprocessors and their compilers. Yet the development of ILP is far from complete, as research continues to find better ways to use more hardware parallelism over a broader class of applications.

14.
The microprocessor industry has responded to memory, power and ILP walls by turning to many-core processors, increasing parallelism as the primary method to improve processor performance. These processors are expected to consist of tens or even hundreds of cores. One of these future processors is the 48-core experimental processor Single-Chip Cloud Computer (SCC). The SCC was created by Intel Labs as a platform for many-core software research.

15.
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in industry and academia exhibit a trend toward distributed resources such as partitioned register files, banked caches, multiple independent compute pipelines, and even multiple program counters. Some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point-to-point interconnects. We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. Although these networks share many of the challenges of multiprocessor networks such as scalability and deadlock avoidance, they have many unique requirements, including ultra-low latency (a few cycles versus tens of cycles) and ultra-fast operation-operand matching. This work discusses the unique properties of scalar operand networks (SONs), examines alternative ways of implementing them, and introduces the AsTrO taxonomy to distinguish between them. It discusses the design of two alternative networks in the context of the Raw microprocessor, and presents timing, area, and energy statistics for a real implementation. The paper also presents a 5-tuple performance model for SONs and analyzes their performance sensitivity to network properties for ILP workloads.

16.
Dynamic and transparent binary translation   Cited by: 1 (self-citations: 0, citations by others: 1)
High-frequency design and instruction-level parallelism (ILP) are important for high-performance microprocessor implementations. The Binary-translation Optimized Architecture (BOA), an implementation of the IBM PowerPC family, combines binary translation with dynamic optimization. The authors use these techniques to simplify the hardware by bridging a semantic gap between the PowerPC's reduced instruction set and even simpler hardware primitives. Processors like the Pentium Pro and Power4 have tried to achieve high frequency and ILP by implementing a cracking scheme in hardware: an instruction decoder in the pipeline generates multiple micro-operations that can then be scheduled out of order. BOA relies on an alternative software approach to decompose complex operations and to generate schedules, and thus offers significant advantages over purely static compilation approaches. This article explains BOA's translation strategy, detailing system issues and architecture implementation.

17.
Rsim: simulating shared-memory multiprocessors with ILP processors   Cited by: 1 (self-citations: 0, citations by others: 1)
The early 1990s saw several announcements of commercial shared-memory systems using processors that aggressively exploited instruction-level parallelism (ILP), including the MIPS R10000, Hewlett-Packard PA8000, and Intel Pentium Pro. These processors could potentially reduce memory read stalls by overlapping read latency with other operations, possibly changing the nature of performance bottlenecks in the system. The authors' experience with Rsim demonstrates that modeling ILP features is important even in shared-memory multiprocessor systems. In particular, current simple processor-based approximations cannot model significant performance effects for applications exhibiting parallel read misses. Further, recent shared-memory designs such as aggressive implementations of sequential consistency use the aggressive ILP-enhancing features of modern processors that simple processor-based simulators do not model. As microprocessor systems become more complex, the availability of shared infrastructure source code is likely to become increasingly crucial. The authors plan to release a new Rsim version shortly that will include instruction caches, TLBs, multimedia extensions, simultaneous multithreading, Rabbit fast simulation mode, and ports to Linux platforms.

18.
Multicore architectures are becoming the main design paradigm for current and future processors. The main reason is that multicore designs provide an effective way of overcoming instruction-level parallelism (ILP) limitations by exploiting thread-level parallelism (TLP). In addition, it is a power and complexity-effective way of taking advantage of the huge number of transistors that can be integrated on a chip. On the other hand, today's higher than ever power densities have made temperature one of the main limitations of microprocessor evolution. Thermal management in multicore architectures is a fairly new area. Some works have addressed dynamic thermal management in bi/quad-core architectures. This work provides insight and explores different alternatives for thermal management in multicore architectures with 16 cores. Schemes employing both energy reduction and activity migration are explored and improvements for thread migration schemes are proposed.

19.
Performance Evaluation, 2006, 63(9-10): 939-955
Increasing diversity in telecommunication workloads leads to greater complexity in communication protocols. This occurs as channel bandwidth rapidly increases. These factors result in larger computational loads for network processors that are increasingly turning to high performance microprocessor designs. This paper presents an analytical method for estimating the performance of instruction level parallel (ILP) processors executing network protocol processing applications. Instruction dependency information extracted while executing an application is used to calculate upper and lower bounds for throughput, measured in instructions per cycle (IPC). Results using UDP/TCP/IP applications show that the simulated IPC values fall between the analytically derived upper and lower bounds, validating the model. The analytical method is much less expensive than cycle-accurate simulation, but reveals similar throughput performance predictions. This allows the architectural design space for network superscalar processors to be explored more rapidly and comprehensively, to reveal the maximum IPC that is possible for a given application workload and the available hardware resources.
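
The abstract does not give the paper's formulas. As a rough sketch of how dependency information can bound IPC from above, the Python below computes a simple dataflow limit. It assumes unit-latency instructions and a fixed issue width, and covers only an upper bound. The function name `ipc_upper_bound` and the `(producer, consumer)` edge format are hypothetical and are not the paper's model.

```python
import math


def ipc_upper_bound(num_instructions, dependencies, issue_width):
    """Upper bound on IPC from a dependency graph (illustrative only).

    Assumptions (not from the paper): every instruction takes one cycle,
    `dependencies` holds (producer, consumer) index pairs with
    producer < consumer, and at most `issue_width` instructions issue
    per cycle. Requires num_instructions > 0.
    """
    predecessors = {}
    for producer, consumer in dependencies:
        predecessors.setdefault(consumer, []).append(producer)

    # depth[i] = length of the longest dependency chain ending at i.
    depth = [1] * num_instructions
    for i in range(num_instructions):
        for p in predecessors.get(i, []):
            depth[i] = max(depth[i], depth[p] + 1)

    critical_path = max(depth)
    # No schedule can finish faster than the critical path or than the
    # issue width allows, so this cycle count is a lower bound on time.
    min_cycles = max(critical_path, math.ceil(num_instructions / issue_width))
    return num_instructions / min_cycles


# Hypothetical trace: 6 instructions with chains 0 -> 1 -> 2 and 3 -> 4.
print(ipc_upper_bound(6, [(0, 1), (1, 2), (3, 4)], issue_width=4))
```

A bound like this takes one pass over the trace, which suggests why an analytical estimate can be far cheaper than cycle-accurate simulation.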
