1.
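Multi-core has become the most important direction in general-purpose processor design. Because a multi-core chip contains multiple processor cores, its cache structure, thread scheduling and related aspects differ greatly from those of a traditional CPU. This paper examines the basic structural features of multi-core chips and, using the instruction-set-level full-system simulator Simics, builds a multi-core CPU simulation environment for analysis.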
2.
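The resource-constrained nature of edge-computing security and the use of various new cryptographic techniques demand high energy efficiency and heterogeneity from multi-core cryptographic processors, yet research on energy-efficiency models for heterogeneous multi-core designs is still lacking. Building on an extended Amdahl's law, this paper introduces factors including the serial/parallel characteristics of cryptographic algorithms, the heterogeneous multi-core structure, data preparation time and dynamic voltage and frequency scaling (DVFS), divides core states into idle and active, and establishes an energy-efficiency model for heterogeneous multi-core cryptographic processors. MATLAB simulation results show that when data preparation time accounts for less than 10% of the total, its negative impact on energy efficiency drops sharply; at a fixed voltage, frequency scaling changes the energy-efficiency value; and the smaller the ratio of idle to active energy consumption of the processor cores, the higher the energy efficiency. Architecturally, with the heterogeneous cores fixed, energy efficiency is highest when the number of homogeneous cores equals the maximum parallelism of the cryptographic task, and the optimal number of heterogeneous cores can be obtained by simulating the model while varying its parameters. For multi-task scheduling, pipelined and concurrent execution further improve energy efficiency. Board-level tests of a multi-core cryptographic processor chip show a correlation coefficient close to 1 between the simulation results and the measured data, and the measured effects of data preparation time, voltage/frequency scaling and other factors broadly agree with the simulation analysis, validating the proposed energy-efficiency model. Focusing on the factors that drive energy-efficiency trends, the paper offers a theoretical basis and recommendations for the heterogeneous, energy-efficient design of multi-core cryptographic processors.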
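The abstract does not give the model's equations, so the sketch below shows only the baseline it extends: classic Amdahl speedup, plus a simplified energy-aware variant in which an idle core draws a fraction `k` of an active core's power. The idle/active split mirrors the abstract, but the formula itself is an assumption, not the paper's model: data preparation time, DVFS and heterogeneous cores are all omitted, and the function names are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl speedup S(n) = 1 / ((1 - p) + p / n).

    p: fraction of the work that can run in parallel (0..1)
    n: number of identical cores
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


def perf_per_energy(p: float, n: int, k: float) -> float:
    """Simplified energy-aware extension (an assumption, not the paper's model).

    An active core draws power 1 and an idle core draws k (0 <= k <= 1).
    Serial phase: 1 core active and n - 1 idle, for time (1 - p).
    Parallel phase: n cores active, for time p / n.
    Energy is normalized so one core running the whole task uses 1, and
    performance per watt = speedup / average power = 1 / energy.
    """
    if not 0.0 <= p <= 1.0 or n < 1 or not 0.0 <= k <= 1.0:
        raise ValueError("need 0 <= p <= 1, n >= 1 and 0 <= k <= 1")
    serial_energy = (1.0 - p) * (1.0 + (n - 1) * k)
    parallel_energy = n * (p / n)  # n active cores for p / n time units
    return 1.0 / (serial_energy + parallel_energy)
```

Even this toy version reproduces one qualitative trend from the abstract: lowering the idle/active power ratio `k` raises `perf_per_energy`, because idle cores during the serial phase are the only source of wasted energy here.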
4.
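Based on the characteristics of hardware threads, this paper builds a scheduling model for a mixed set of periodic and aperiodic hardware threads and gives a mathematical criterion for when each thread can be scheduled successfully under hardware multithreading. On this basis, it proposes DR-EDF, an algorithm whose basic factor is the difference between a thread's deadline and its worst-case execution time, and presents the design principles of a hardware multithreading controller that implements DR-EDF. Finally, a hardware multithreaded processor was implemented on an FPGA; analysis of the measured results shows that this real-time scheduling algorithm for hardware multithreading improves the schedulability of urgent tasks in embedded systems without affecting the miss rate of the thread set.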
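A minimal sketch of the selection step, assuming the "DR" factor is the absolute deadline minus the WCET (the latest cycle at which a thread can still start and finish on time) and that the ready thread with the smallest factor runs next. The abstract confirms only that this difference is the basic factor; the class, field and function names are hypothetical, and the real controller is hardware, not software.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HwThread:
    name: str
    abs_deadline: int  # absolute deadline, in cycles
    wcet: int          # worst-case execution time, in cycles
    ready: bool = True


def dr_factor(t: HwThread) -> int:
    """Assumed DR factor: deadline minus WCET.

    This is the latest cycle at which the thread can start and still meet
    its deadline, so a smaller value means a more urgent thread.
    """
    return t.abs_deadline - t.wcet


def pick_next(threads: Sequence[HwThread]) -> Optional[HwThread]:
    """Select the ready thread with the smallest DR factor (None if idle)."""
    ready = [t for t in threads if t.ready]
    if not ready:
        return None
    # Ties fall back to plain EDF order, then to name for determinism.
    return min(ready, key=lambda t: (dr_factor(t), t.abs_deadline, t.name))
```

For example, with thread A (deadline 100, WCET 10) and thread B (deadline 120, WCET 50), plain EDF would run A first, while this rule picks B because it must start by cycle 70 rather than 90.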
6.
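Extending the instruction set and adding new functional units are effective ways to improve processor performance. To take full advantage of new architectural extensions, existing applications must be recompiled with fresh optimizations. For cross-architecture optimization, binary translation has proven to be an effective technique. Combining trace techniques with dynamic binary translation and optimization, this paper proposes a multi-level dynamic optimization framework that needs no static re-optimizing compilation: multi-level dynamic optimization and scheduling of the extended instructions are applied while the program runs. Simulation results show that the framework effectively forms large instruction-scheduling windows, accurately selects hot code and the optimizations to apply to it, and effectively improves the performance of legacy applications, while also being flexible to implement and highly extensible.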
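The abstract does not say how hot code is selected. A common approach in dynamic binary translators is to count executions per basic block and promote a block to a more aggressive optimization level each time its counter crosses a threshold. The sketch below assumes that scheme with hypothetical thresholds and names; trace formation and the optimizations themselves are omitted.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


class TieredHotspotDetector:
    """Counter-based hot-code detection with several optimization tiers.

    Each time a basic block (keyed by its start address) is entered, its
    counter is incremented. When the counter crosses the next threshold,
    the block is promoted one tier, signalling that a more aggressive
    (and more expensive) optimization pass should be applied to it.
    """

    def __init__(self, thresholds: Sequence[int] = (50, 1000)):
        self.thresholds = sorted(thresholds)  # hypothetical default values
        self.counts: Counter = Counter()
        self.tier: Dict[int, int] = {}

    def on_block_entry(self, addr: int) -> Optional[int]:
        """Record one execution; return the new tier if the block was promoted."""
        self.counts[addr] += 1
        n = self.counts[addr]
        level = sum(1 for th in self.thresholds if n >= th)
        if level > self.tier.get(addr, 0):
            self.tier[addr] = level
            return level
        return None
```

With thresholds `(3, 5)`, six entries into the same block return `None, None, 1, None, 2, None`: the block is promoted to tier 1 on its third execution and to tier 2 on its fifth.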
7.
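Allocation of compute and register resources is a key problem in automatic parallel mapping for reconfigurable processors. Addressing resource allocation for a reconfigurable block-cipher instruction-set processor, this paper builds an operator-scheduling parameter model and a processor-resource parameter model and studies the constraints between the parallel scheduling of block ciphers and resource consumption. On this basis, it proposes an automatic mapping algorithm based on greedy heuristics, list scheduling and linear scan, achieving automatic mapping of block ciphers onto the reconfigurable block-cipher instruction-set processor. Experiments that vary the available resources verify the effectiveness of the parallel mapping, and a side-by-side comparison of AES-128 mapping results demonstrates the algorithm's advantages. The proposed automatic mapping algorithm offers guidance for research on parallel computation of block ciphers on reconfigurable processors.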
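Of the techniques named above, list scheduling is the one that decides which operators run in parallel under a resource limit. Below is a generic sketch of resource-constrained list scheduling, assuming identical, non-pipelined compute units and a longest-path-to-sink priority. It is not the paper's algorithm: the parameter models, the greedy heuristic and the linear-scan register allocation are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Sequence


def list_schedule(
    ops: Sequence[str],
    deps: Dict[str, List[str]],
    latency: Dict[str, int],
    n_units: int,
) -> Dict[str, int]:
    """Resource-constrained list scheduling of an acyclic operator graph.

    ops:      operator names
    deps:     op -> list of predecessor ops whose results it consumes
    latency:  op -> cycles; each op occupies one unit for its whole latency
    n_units:  identical compute units available per cycle
    Returns op -> start cycle.
    """
    if n_units < 1:
        raise ValueError("n_units must be >= 1")
    succs: Dict[str, List[str]] = {o: [] for o in ops}
    for o in ops:
        for p in deps.get(o, []):
            succs[p].append(o)

    # Priority = longest latency-weighted path from the op to any sink.
    prio: Dict[str, int] = {}

    def path_len(o: str) -> int:
        if o not in prio:
            prio[o] = latency[o] + max((path_len(s) for s in succs[o]), default=0)
        return prio[o]

    for o in ops:
        path_len(o)

    start: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    cycle = 0
    while len(start) < len(ops):
        # An op is ready once all its predecessors have finished.
        ready = [
            o for o in ops
            if o not in start
            and all(p in finish and finish[p] <= cycle for p in deps.get(o, []))
        ]
        ready.sort(key=lambda o: -prio[o])  # stable: ties keep input order
        busy = sum(1 for o in start if start[o] <= cycle < finish[o])
        for o in ready[: max(0, n_units - busy)]:
            start[o] = cycle
            finish[o] = cycle + latency[o]
        cycle += 1
    return start
```

On a toy graph where `c` consumes `a` and `b` and `d` consumes `c` (all latency 1), two units give start cycles `a=0, b=0, c=1, d=2`, while one unit serializes them to `0, 1, 2, 3`; varying `n_units` like this is a small-scale analogue of the abstract's resource-variation experiments.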
9.
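A retargetable compiler is essential to the design of an application-specific instruction-set processor, ASIP (Application Specific Instruction Processor). Using the open-source ORC (Open Research Compiler) framework, this paper designs a retargetable compiler targeting a proposed ASIP architecture model, with optimizations tailored to that architecture in the instruction-scheduling and register-allocation phases. Experimental results show that the compiler is highly retargetable and that the instruction-scheduling and register-allocation optimizations are effective.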
11.
Analysis of Parallel Multithreaded Processor Architecture  Total citations: 1, self-citations: 0, citations by others: 1
Zhao Qingmin (赵庆敏) 《Microelectronics & Computer》2005,22(5):185-187
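A parallel multithreaded architecture (PMA) processor consists of multiple logical processors, and a large number of pipeline control units are shared by all of them. In each cycle, the processor fetches multiple instructions from multiple threads for scheduling and execution. A further feature is that it supports instruction-level and thread-level parallelism at the same time. This paper analyzes how the PMA works and presents a processor model.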
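The abstract says only that each cycle fetches multiple instructions from multiple threads; it does not give the thread-selection policy. The sketch below assumes a simple rotating round-robin policy, where each cycle up to `threads_per_cycle` threads each supply up to `width` instructions. The function name and both parameters are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def smt_fetch(
    streams: Dict[str, List[str]],
    threads_per_cycle: int,
    width: int,
) -> List[List[Tuple[str, str]]]:
    """Simulate round-robin fetch from several hardware threads.

    Each cycle, up to `threads_per_cycle` threads with pending instructions
    each supply up to `width` instructions, starting from a pointer that
    rotates by one thread per cycle for fairness. Returns the fetch bundle
    of every cycle as (thread, instruction) pairs.
    """
    if threads_per_cycle < 1 or width < 1:
        raise ValueError("threads_per_cycle and width must be >= 1")
    queues = {t: deque(instrs) for t, instrs in streams.items()}
    order = list(streams)
    ptr = 0
    cycles: List[List[Tuple[str, str]]] = []
    while any(queues.values()):
        bundle: List[Tuple[str, str]] = []
        picked = 0
        for i in range(len(order)):
            if picked == threads_per_cycle:
                break
            t = order[(ptr + i) % len(order)]
            q = queues[t]
            if q:  # skip threads with nothing left to fetch
                for _ in range(min(width, len(q))):
                    bundle.append((t, q.popleft()))
                picked += 1
        cycles.append(bundle)
        ptr = (ptr + 1) % len(order)
    return cycles
```

With three threads, two threads per cycle and one instruction per thread, the first cycle fetches from T0 and T1, the next starts at T1, skips it if it is empty and takes T2 and T0. Because fetch slots shift to whichever threads still have work, a stalled or finished thread does not leave the shared front end idle.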
12.
Mladen Berekovic Tim Niggemeier 《Journal of Signal Processing Systems》2008,50(2):201-229
A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (simultaneous multithreading, SMT). Consistent application of fine clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the scheduling window, and leads to a distributed architecture model in which independent thread processing units, arithmetic logic units, register files and memories are distributed across the chip and communicate with each other through a special network. A special communication protocol replaces the broadcasting and associative comparison of destination tags in a centralized instruction scheduler with explicit operand transfer instructions, thus decentralizing control of the data flow to the greatest extent. As a result, the processor cycle time depends neither on the issue bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the architecture's performance scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time. The performance and scalability of the proposed microarchitecture are demonstrated with critical signal processing kernels from the MPEG-4 video coding standard on a cycle-true simulator.
13.
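A realizable processor architecture for multithreaded Java is proposed. The architecture consists of multiple application-specific processing units, each of which executes a single thread at a time.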
15.
Lodi A. Toma M. Campi F. Cappelli A. Canegallo R. Guerrieri R. 《IEEE Journal of Solid-State Circuits》2003,38(11):1876-1886
This paper describes a new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath. The reconfigurable unit is tightly coupled with the processor, featuring an application-specific instruction-set extension. Mapping computation-intensive algorithmic portions onto the reconfigurable unit allows more efficient processing, improving both timing performance and power consumption. A test chip has been implemented in a standard 0.18-µm CMOS technology. Tests on a signal processing algorithmic benchmark showed speedups ranging from 4.3× to 13.5× and energy consumption reduced by up to 92%.
16.
Harada Y. Hioe W. Takagi K. Kawabe U. 《IEEE Transactions on Applied Superconductivity》1994,4(2):97-106
A novel processor with a micro-pipelined architecture is proposed for latch-type Josephson logic devices. The processor is segmented into several operating stages activated by a multi-phase power system. Independent register groups are allocated to each stage to support pipelined processing of several instruction streams. This architecture allows the construction of a fine-pipeline-pitch processor capable of MIMD processing. A 12-bit micro-pipelined Josephson processor containing an ALU, a multiplier and 16 registers is described. Driven by a 3-phase AC power system, it can process 4 instruction streams simultaneously. A pipeline pitch of 3.3 GHz is expected using conventional Josephson device technology. A 4-bit processor design for 12-bit data length is also discussed.
17.
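This paper defines the microarchitecture of a digital signal processing system suited to motor vector-control algorithms, including its instruction-set definition, its memory model and its mode of interaction with the host CPU. The design has two notable advantages: fixing some of the multiple operands effectively shortens the instruction encoding and raises code density, and executing multi-cycle instructions in the background improves the parallel efficiency of the ALU. The paper gives the instruction cycle counts of a typical FOC (field-oriented control) algorithm implemented on the DSP (Digital Signal Processor) instruction set, as well as the circuit implementation of the corresponding architecture. Finally, using the ARM Cortex-M0 and several mainstream DSPs as comparison baselines, measured experimental data demonstrate the architecture's high energy efficiency: at a relatively modest cost in circuit area, it greatly improves the operating efficiency of embedded systems with an integrated DSP.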
18.
A scalable processor architecture for multi-threaded Java™ applications is presented. The proposed architecture consists of multiple application-specific processing elements, each able to execute a single thread at a time. The architecture is evaluated by implementing a portable and scalable Java machine on an FPGA board for demonstration.
19.
MICROTHREAD BASED (MTB) COARSE GRAINED FAULT TOLERANCE SUPERSCALAR PROCESSOR ARCHITECTURE  Total citations: 1, self-citations: 0, citations by others: 1
Fu Zhongchuan Chen Hongsong Cui Gang 《Journal of Electronics (China)》2006,23(3):461-466
Fault tolerance in microprocessor systems has become a popular topic of architecture research. Much work has been done at different levels to achieve reliability against soft errors, and several fault-tolerant architectures have been proposed. However, little attention has been paid to thread-level fault tolerance in superscalar processors. This letter introduces the microthread concept into the superscalar processor fault-tolerance domain, puts forward a novel fault-tolerant architecture, the MicroThread Based (MTB) coarse-grained transient fault tolerance superscalar processor architecture, and then discusses some implementation details.
20.
N. Vassiliadis N. Kavvadias G. Theodoridis S. Nikolaidis 《International Journal of Electronics》2013,100(6):421-438
In this paper, the architecture of an embedded processor extended with a tightly coupled coarse-grain reconfigurable functional unit (RFU) is proposed. The efficient integration of the RFU with the control unit and the datapath of the processor eliminates the communication overhead between them. To speed up execution, the RFU exploits instruction-level parallelism (ILP) and spatial computation. In addition, the proposed integration of the RFU efficiently exploits the pipeline structure of the processor, leading to further performance improvements. Furthermore, a development framework for the introduced architecture is presented. The framework is fully automated, hiding all reconfigurable-hardware-related issues from the user. The hardware model of the architecture was synthesized in a 0.13-µm process, and its area and delay were estimated and reported. A set of benchmarks is used to evaluate the architecture and the development framework. Experimental results show performance improvements in addition to potential energy reduction.