1.

The QC-2 parallel Queue processor architecture

Ben A. Abderazek, Arquimedes Canedo, Tsutomu Yoshinaga, Masahiro Sowa 《Journal of Parallel and Distributed Computing》2008

Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique used to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor.

2.

Combination design and analysis of high-frequency instruction pairs for network processors

陈红松季振洲胡铭曾季毅《小型微型计算机系统》2006,27(2):339-342

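A network processor is a processor designed specifically for network processing. Its instruction set is the interface between hardware and software, and the design of the instruction set has a considerable effect on performance. This paper proposes a combination optimization method for high-frequency instruction pairs (HFIP). The method exploits the dynamic correlation between instructions during the execution of network processor benchmark programs, and uses the unused bits in the instruction format of the SimpleScalar simulator as the extension field for new instructions. The experimental results are analyzed quantitatively. Simulation results show that the method is reasonable and effective: it improves network processor performance while effectively reducing instruction cache power consumption, achieving a performance/power trade-off.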
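As a rough illustration of what finding high-frequency instruction pairs might involve, the sketch below counts adjacent opcode pairs in a dynamic execution trace. It is not the authors' method: the function name `top_instruction_pairs` and the sample trace are hypothetical, and the abstract does not say how pairs are actually selected or encoded.

```python
from collections import Counter


def top_instruction_pairs(trace, k=3):
    """Return the k most frequent adjacent opcode pairs in a dynamic trace."""
    # zip(trace, trace[1:]) yields every pair of consecutively executed opcodes.
    pairs = Counter(zip(trace, trace[1:]))
    return pairs.most_common(k)


# Hypothetical dynamic opcode trace (illustrative only, not benchmark data).
trace = ["lw", "addu", "sw", "lw", "addu", "bne", "lw", "addu", "sw"]
top_pairs = top_instruction_pairs(trace, k=2)
print(top_pairs)
```

Pairs that dominate such a count would be the natural candidates for fusing into a single new instruction.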

3.

The HP PA-8000 RISC CPU

Kumar A. 《Micro, IEEE》1997,17(2):27-32

The PA-8000 RISC CPU is the first of a new generation of Hewlett-Packard microprocessors. Designed for high-end systems, it is among the world's most powerful and fastest microprocessors. It features an aggressive, four-way, superscalar implementation, combining speculative execution with on-the-fly instruction reordering. The heart of the machine, the instruction reorder buffer, provides out-of-order execution capability. Our primary design objective for the PA-8000 was to attain industry-leading performance in a broad range of applications. In addition, we wanted to provide full support for 64-bit applications. To make the PA-8000 truly useful, we needed to ensure that the processor would not only achieve high benchmark performance but would sustain such performance in large, real-world applications. To achieve this goal, we designed large, external primary caches with the ability to hide memory latency in hardware. We also implemented dynamic instruction reordering in hardware to maximize instruction-level parallelism available to the execution units. The PA-8000 connects to a high-bandwidth Runway system bus, a 768-Mbyte/s split-transaction bus that allows each processor to generate multiple outstanding memory requests. The processor also provides glueless support for up to four-way multiprocessing via the Runway bus. The PA-8000 implements the new PA (Precision Architecture) 2.0, a binary-compatible extension of the previous PA-RISC architecture. All previous code executes on the PA-8000 without recompilation or translation.

4.

SLED-based description of IA-64 instructions

杨欣赵荣彩齐宁《微计算机信息》2005,21(5):210-211

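As a 64-bit processor architecture, IA-64 offers higher instruction-level parallelism (ILP) and represents a new direction in microprocessor development. Automatic analysis and transformation of IA-64 binary instruction streams is important for implementing IA-64 binary translation and reverse engineering based on machine and operating system descriptions. This paper outlines the characteristics of SLED and of IA-64 instructions, describes in detail the SLED-based description of IA-64 instructions and the design and implementation techniques for automatically generating reverse-engineering tools with MLTK, and gives test results for the automatically generated disassembler.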

5.

Design and implementation of an automatic generator for IA-64 decoders

齐宁杨克峤苏铭赵荣彩《计算机工程与设计》2007,28(3):497-499,511

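The IA-64 architecture uses a 64-bit instruction set that applies explicitly parallel instruction computing (EPIC) technology. It offers higher instruction-level parallelism (ILP), but also makes the analysis and transformation of IA-64 binary code streams more difficult. This paper presents the structure and implementation of an automatic generator for IA-64 decoders. The generator takes a SLED description of the IA-64 instruction set as input and automatically generates C code for an IA-64 instruction decoder. The generator effectively reduces decoder development time, ensures decoder correctness, and improves decoder execution efficiency. The resulting generator can be applied to IA-64 binary translation and reverse engineering.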

6.

The 82460GX server/workstation chip set

Dahlen E. Gustin J. Meredith S. Moran O. 《Micro, IEEE》2000,20(6):69-75

Designing a chip set for a new processor architecture like the IA-64 requires handling multiple aspects. They include implementing the processor interface; providing sufficient I/O bandwidth for servers; supporting accelerated graphics port (AGP) graphics; and providing sufficient memory bandwidth for the processor, I/O, and graphics. This article provides an introduction to the memory, I/O, and graphics subsystems of Intel's Itanium processor chip set and discusses several aspects of the processor bus.

7.

Research and design of an SD/MMC card driver for Linux PDA

黄昊晶《广东电脑与电讯》2007,(3):40-45

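The main aim of this paper is to study the embedded Linux software development environment, configuration methods, and hardware driver design methods, and to analyze the structure of the operating system kernel and its drivers. Drawing on Samsung's SD/MMC driver and the development manual of the 斯道 development board, it derives a method for building an MMC card development environment for embedded Linux, and designs custom MMC and SD memory card drivers for a Linux PDA based on the S3C2410, an ARM9 processor.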

8.

Automatic checking techniques for instruction descriptions

杨欣赵荣彩李崇《计算机工程与设计》2006,27(18):3344-3348,3352

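By describing an instruction set in a high-level specification language, instruction encoders and decoders are generated automatically, which automates the tedious and highly error-prone work of retargeting machine code; the correctness of the description is then checked automatically on a disassembly test platform. This is important for IA-64, a 64-bit architecture with higher instruction-level parallelism (ILP), in the automatic analysis and transformation of binary instruction streams and in implementing IA-64 binary translation and reverse engineering based on machine and operating system descriptions. The paper outlines the SLED description of IA-64 instructions and explains in detail the design and implementation techniques for automatically generating reverse-engineering tools with NJMCT.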

9.

Modeling and implementation method of a MIPS64 instruction set simulator

蔡启先刘明余祖峰《计算机工程》2010,36(18):245-246

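This paper presents, from a software programming perspective, the modeling and implementation of an instruction set simulator compatible with the MIPS32/64 instruction sets. The scheme describes the processor's hardware behavior in C++ and, through options selected at compile time, simulates embedded processors of either the MIPS32 or the MIPS64 instruction set architecture, decoding and executing all instructions except floating-point ones. Its main benefits are reusable code, good extensibility for new instructions, and compatibility with simulation of both the MIPS32 and MIPS64 instruction sets.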
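The decode-and-execute loop at the core of such a simulator can be sketched as follows. This is a minimal Python illustration, not the paper's C++ implementation: `decode_fields` and `step` are hypothetical names, only the 32-bit `ADDU` instruction is modeled, and the compile-time MIPS32/MIPS64 selection described in the abstract is omitted. The field layout follows the standard MIPS R-type encoding.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its standard R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def step(regs, word):
    """Execute one instruction on a 32-entry register list (ADDU only)."""
    f = decode_fields(word)
    if f["opcode"] == 0 and f["funct"] == 0x21:  # ADDU rd, rs, rt
        result = (regs[f["rs"]] + regs[f["rt"]]) & 0xFFFFFFFF  # 32-bit wraparound
        if f["rd"] != 0:  # register $zero is hard-wired to 0
            regs[f["rd"]] = result
    else:
        raise NotImplementedError("instruction not modeled in this sketch")
    return regs


regs = [0] * 32
regs[9] = 0xFFFFFFFF   # $t1
regs[10] = 2           # $t2
step(regs, 0x012A4021)  # addu $t0, $t1, $t2
print(hex(regs[8]))
```

A full simulator would replace the single `if` with a dispatch table keyed on `opcode` and `funct`, which is one way to obtain the instruction extensibility the abstract mentions.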

10.

Automatic generation of compiler backends

Florian Brandner Viktor Pavlu Andreas Krall 《Software》2013,43(2):207-240

11.

A reconfigurable instruction set processor integrating dynamic sampling and profiling

张惠臻王超《计算机科学》2013,40(3):31-35

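A reconfigurable instruction set processor can dynamically extend its instruction set according to the characteristics of the application, and the design of its hardware architecture and software tools differs greatly from traditional design. Based on a study of the hardware and software characteristics of reconfigurable instruction set processors, this paper proposes a reconfigurable instruction set processor architecture that integrates dynamic sampling and profiling hardware. The processor has three different operating modes. It finds program hot spots through samples taken by the profiling hardware, and uses a companion tool chain to semi-automatically generate instruction extensions, retarget the compiler, and configure the programmable hardware logic, thereby achieving hardware adaptability and software compatibility across different embedded application domains. Targeted experiments show that the sampling and profiling mechanism of the architecture is accurate and effective, and that the architecture adapts well to changing applications at the cost of limited additional hardware.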

12.

An XML-based information statistics system for complex power grid equipment and its application

周育忠张自锋石嘉豪涂亮《计算机测量与控制》2019,27(3):245-249

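To address the small statistical coverage and long processing time of traditional information statistics systems for complex power grid equipment, a new system is designed. The system introduces XML technology and optimizes both its hardware and software. The hardware design focuses on the collector, processor, memory, and display: the collector is an LT500 data collector, the processor is an ARM10 processor, the storage is EMC storage, and the statistical results are shown on a diode display. The software consists of three parts: information collection for complex power grid equipment, statistical processing of equipment information, and display of the processing results. The applicability of the system to real power networks is analyzed. To evaluate the system, it was compared experimentally with a traditional system; the results show that the designed system achieves a statistical coverage of up to 99.26% for complex power grid equipment information and takes very little time, giving it high practical value.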

13.

Software verification method in hardware/software co-verification of embedded systems

王世好王歆民刘明业《计算机研究与发展》2005,42(3):514-519

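With the development of integrated circuits and computer technology, embedded system design is becoming increasingly complex. For complex embedded system designs, verification is usually used to check the correctness of the design. Hardware verification usually builds, from the hardware design description, a hardware simulator that models the hardware's function. The common approach to software verification is to build a functional model of the processor (an instruction set simulator, ISS) that interprets, instruction by instruction, the execution of the embedded software on the target machine, produces simulated output, and drives the peripheral circuits (that is, the hardware design). The instruction set simulator models the execution of the embedded software on the target CPU at the level of low-level timing. For complex embedded system designs, ISS simulation speed usually becomes the bottleneck of co-simulation. An RTOS-based fast verification method for embedded software can effectively raise software simulation speed: it extends the RTOS to meet the needs of co-simulation and builds a hardware simulation driver that provides the communication link between the software and the hardware simulator and the synchronization control for co-simulation. The RTOS-based method builds on a compiled-code model and verifies the function of the embedded software at the system behavior level, so verification is fast. In practice, combining this method with ISS verification yields more effective and faster co-verification of embedded systems. Finally, control software was written for several typical hardware designs and hardware/software co-verification experiments were carried out; the experimental data show that the method is practical, effective, and fast.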

14.

A metaprogrammed C++ framework for hardware/software component integration and communication

《Journal of Systems Architecture》2014,60(10):816-827

With the ever growing complexity of System-on-Chip design, a considerable effort has been made to introduce higher levels of abstraction and to integrate high-level synthesis solutions to the design flow. In such design flows, a uniform communication interface is needed to enable high-level implementations of SoC components regardless of whether they are compiled as software running on a processor or synthesized to dedicated hardware IPs. This paper addresses this issue and proposes a component communication framework that defines an object-oriented remote call mechanism which allows transparent communication across hardware/software boundaries. The proposed framework relies on C++ static metaprogramming techniques to efficiently abstract communication between components implemented using high-level C++. We also define a portability layer that enables the migration of designs throughout different hardware platforms, operating systems, and tools. We assessed the performance and area footprint of our communication infrastructure through the implementation of a voice processing pipeline on top of a Network-on-Chip based architecture. Our results, when compared to previous related works with the same set of capabilities, show that our mechanisms yield small overhead in terms of software memory (up to 64% smaller), FPGA resources (up to 40% smaller), and hardware/software communication latency (up to 51% smaller).

15.

Pixel processing in a memory controller

Donovan W. Sabella P. Kabir I. Hsieh M.M. 《Computer Graphics and Applications, IEEE》1995,15(1):51-61

The SX, a programmable pixel processor implemented in a workstation memory controller chip, aims to perform as well as low-end 2D and 3D graphics processors and to surpass low-end imaging accelerators. The following features help accomplish this goal: a large internal register set; a vectorized RISC-like instruction set; fast access to both main and video memory; fast pixel operations; free operations; an unpolluted cache; and a single-chip solution. We describe the workstation configuration we used for our tests and the SX processor architecture, followed by the SX instruction set and sample algorithms. Then we present SX performance results for a wide range of operations.

16.

Research on control and data speculation optimization techniques

干戈连瑞琦张兆庆《计算机学报》2004,27(7):881-887

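Control speculation and data speculation are effective ways to increase the instruction-level parallelism of programs. To guarantee the correct execution of speculative instructions, two problems must be solved: deferring the exceptions raised by control-speculative instructions, and resolving alias ambiguity in data speculation. This requires hardware support, so most previous research was carried out on simulators and focused on describing extensions to the simulator structure. IA-64 is the first architecture to support both optimizations. On this basis, the authors implemented control and data speculation optimization for the first time, within a unified framework, in the IA-64 Open Research Compiler (ORC). Focusing on the compiler, the paper presents several core problems in speculative optimization and their solutions, including a new algorithm for maintaining the correctness of speculative code. Experimental results show that the approach is effective.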

17.

Evaluation and choice of various branch predictors for low-power embedded processor

Fan DongRui, Yang HongBo, Gao GuangRong, Zhao RongCai 《计算机科学技术学报》2003,18(6):833-838

Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both performance and power angles. This paper studies one such component: the branch predictor. As is well known, branch prediction is critical to exploiting instruction-level parallelism effectively, but it may incur additional power consumption because of the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, a dual-issue, out-of-order processor with a deep pipeline that supports the MIPS instruction set.
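To make the trade-off concrete, here is a minimal sketch of one classic point in the branch predictor design space: a bimodal predictor built from 2-bit saturating counters. It is offered as an illustration only; the abstract does not state which predictors the paper evaluates, and the class name, table size, and branch address below are assumptions.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start in the "weakly not-taken" state

    def _index(self, pc):
        # Drop the two low bits (word-aligned instructions), then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def misprediction_count(predictor, outcomes, pc=0x400100):
    """Replay the outcomes of one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken nine times, then falls through once.
outcomes = [True] * 9 + [False]
misses = misprediction_count(BimodalPredictor(), outcomes)
print(misses)
```

A larger table or a more elaborate scheme may lower the misprediction count, but the table itself consumes power on every lookup, which is the tension the abstract describes.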

18.

Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor

范东睿杨洪波高光荣赵荣彩《计算机科学技术学报》2003,18(6):0-0

Power is an important design constraint in embedded computing systems. To meet the power constraint, microarchitecture and hardware designed to achieve high performance need to be revisited from both performance and power angles. This paper studies one such component: the branch predictor. As is well known, branch prediction is critical to exploiting instruction-level parallelism effectively, but it may incur additional power consumption because of the hardware resources dedicated to branch prediction and the extra power consumed on mispredicted branches. This paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one for realizing a low-power embedded processor. The sample processor studied is a Godson-like processor, a dual-issue, out-of-order processor with a deep pipeline that supports the MIPS instruction set.

19.

Software implementation of floating-point arithmetic on a reduced-instruction-set processor

Thomas Gross 《Journal of Parallel and Distributed Computing》1985,2(4):362-375

Current single chip implementations of reduced-instruction-set processors do not support hardware floating-point operations. Instead, floating-point operations have to be provided either by a coprocessor or by software. This paper discusses issues arising from a software implementation of floating-point arithmetic for the MIPS processor, an experimental VLSI architecture. Measurements indicate that an acceptable level of performance is achieved, but this approach is no substitute for a hardware accelerator if higher-precision results are required. This paper includes instruction profiles for the basic floating-point operations and evaluates the usefulness of some aspects of the instruction set.

20.

Motorola's 88000 family architecture

Alsup M. 《Micro, IEEE》1990,10(3):48-66

The initial members of the 88000 family of high-performance 32-bit microprocessors are the 88100 processor and the 88200 cache and memory management unit (CMMU). The processor manipulates integer and floating-point data and initiates instruction and data memory transactions. The CMMU minimizes the latency of main memory requests by maintaining a cache for data transactions and a cache for memory management translations. A typical system consists of one processor and two identical cache chips, one servicing instruction fetch requests, the other servicing data read and write requests. The overall design process for the 88000 family is described, and the integer instructions are discussed. Decisions made with respect to the processor, cache, and software are examined. Some data on the use of the instruction set by the available compilers and the efficiency of the cache and memory systems are presented.