Similar Documents
1.
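Multi-core design has become the most important direction in general-purpose processor design. Because a multi-core chip contains multiple processor cores, its cache structure, thread scheduling, and other aspects differ substantially from those of a traditional CPU. This paper examines the basic structural characteristics of multi-core chips and, for analysis, builds a multi-core CPU simulation environment on Simics, an instruction-set-level system simulator.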

2.
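The resource constraints of edge-computing security and the adoption of various new cryptographic techniques demand high energy efficiency and heterogeneity from multi-core cryptographic processors, yet energy-efficiency models for heterogeneous multi-core designs have received little study. Building on an extended Amdahl's law, this paper incorporates the serial/parallel characteristics of cryptographic workloads, the heterogeneous multi-core structure, data-preparation time, and dynamic voltage and frequency scaling (DVFS), distinguishes idle and active core states, and establishes an energy-efficiency model for heterogeneous multi-core cryptographic processors. MATLAB simulations show that: when data-preparation time accounts for less than 10% of the total, its negative impact on energy efficiency drops sharply; at a fixed voltage, frequency scaling changes the energy-efficiency value; and the smaller the ratio of idle to active core energy consumption, the higher the energy efficiency. Architecturally, with the heterogeneous cores fixed, energy efficiency is highest when the number of homogeneous cores equals the maximum parallelism of the cryptographic task, and the optimal number of heterogeneous cores can be found by simulating the model over its varying parameters. For multi-task scheduling, pipelined and concurrent execution further improve energy efficiency. Board-level tests of a multi-core cryptographic processor chip show a correlation coefficient close to 1 between simulated and measured data, and the measured effects of data-preparation time and voltage/frequency scaling agree closely with the simulation analysis, validating the proposed model. Focusing on the factors that drive trends in energy efficiency, the paper offers a theoretical basis and recommendations for heterogeneous, energy-efficient multi-core cryptographic processor design.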
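The abstract does not give the extended model itself (data-preparation time, heterogeneity, and DVFS terms are not spelled out). As a minimal sketch of the starting point only, the snippet below combines the classic Amdahl speedup with a simple idle/active power split, where `k` is an assumed idle-to-active power ratio and `perf_per_watt` is a hypothetical helper, not the authors' formula. It reproduces one qualitative trend the abstract reports: the smaller `k`, the higher the performance per watt. Algebraically the result simplifies to 1 / (1 + (n - 1)k(1 - f)).

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Classic Amdahl speedup for a parallel fraction f on n identical cores."""
    return 1.0 / ((1.0 - f) + f / n)


def perf_per_watt(f: float, n: int, k: float) -> float:
    """Performance per watt, normalized so one active core scores 1.0.

    Assumed power model: an active core draws 1 unit, an idle core draws k.
    Serial phase: 1 core active and n - 1 idle, for time (1 - f).
    Parallel phase: all n cores active, for time f / n.
    """
    time = (1.0 - f) + f / n
    energy = (1.0 - f) * (1.0 + (n - 1) * k) + (f / n) * n
    avg_power = energy / time
    return amdahl_speedup(f, n) / avg_power


# Lower idle/active power ratio k gives higher efficiency for the same f and n.
for k in (0.0, 0.2, 0.5):
    print(k, round(perf_per_watt(0.9, 8, k), 3))
```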

3.
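This paper analyzes multi-sensor visual information processing algorithms and, based on a parallel-computing parameter model of a reconfigurable processor, proposes a method for simulating parallel computation. In a multi-core processor environment, each thread runs on its own core and threads execute concurrently. The method uses concurrent threads to emulate how the processing elements (PEs) of a reconfigurable array compute, calling OpenMP to run multiple threads in parallel and thereby simulating the reconfigurable processor's computation on a multi-core computer. This allows an algorithm to be mapped onto a multi-core environment, with compute cores standing in for PEs, before any concrete PE interconnection scheme exists. Analyzing the algorithm's concurrent execution efficiency on the multi-core computer then guides optimization of how visual information algorithms are mapped onto the reconfigurable array.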
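The idea of letting one software thread stand in for one PE can be sketched as follows. The paper uses OpenMP; this sketch swaps in Python's `concurrent.futures.ThreadPoolExecutor` as an illustrative substitute, and `pe_op` (a pixel threshold) and `run_pe_array` are hypothetical. Because of Python's global interpreter lock, the sketch shows the mapping and a correctness check against a sequential reference, not a real speedup measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def pe_op(pixel: int) -> int:
    # Hypothetical PE operation: binarize one pixel against a fixed threshold.
    return 255 if pixel > 128 else 0


def run_pe_array(image_rows, num_pes):
    """Emulate a PE array: each worker thread stands in for one PE and
    processes one image row per task. pool.map returns results in row order."""
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        return list(pool.map(lambda row: [pe_op(p) for p in row], image_rows))


def sequential(image_rows):
    # Reference result used to check that the parallel mapping is correct.
    return [[pe_op(p) for p in row] for row in image_rows]


image = [[0, 200, 129], [128, 255, 64]]
assert run_pe_array(image, num_pes=2) == sequential(image)
```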

4.
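Based on the characteristics of hardware threads, this paper builds a scheduling model for mixed sets of periodic and aperiodic threads and gives a mathematical criterion for when each thread in a hardware-multithreaded schedule can be scheduled successfully. On this basis it proposes DR-EDF, an algorithm whose basic factor is the difference between a thread's deadline and its worst-case execution time, and describes the design principles of a hardware multithreading controller that implements DR-EDF. Finally, a hardware-multithreaded processor was implemented on an FPGA; measurements show that this real-time scheduling algorithm for hardware multithreading improves the schedulability of urgent tasks in embedded systems without increasing the thread set's deadline-miss rate.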
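The abstract states only that DR-EDF ranks threads by the difference between deadline and worst-case execution time (WCET). A minimal sketch under that reading: each job gets the static key `deadline - wcet` (smaller is more urgent), and a preemptive, unit-time simulator on a single execution slot counts deadline misses. `Job`, `dr_key`, and `simulate` are hypothetical names; the paper's hardware controller and its schedulability criterion are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    release: int   # time the job becomes ready
    wcet: int      # worst-case execution time, in time units
    deadline: int  # absolute deadline
    remaining: int = field(init=False)

    def __post_init__(self):
        self.remaining = self.wcet


def dr_key(job: Job):
    # Assumed DR-EDF priority: deadline minus WCET; smaller means more urgent.
    # Ties fall back to the earlier deadline, then the name, for determinism.
    return (job.deadline - job.wcet, job.deadline, job.name)


def simulate(jobs, horizon):
    """Preemptive, unit-time simulation on one execution slot.
    Returns the per-tick schedule and the names of jobs that missed deadlines.
    Assumes horizon is at least the latest deadline."""
    schedule, finish = [], {}
    for t in range(horizon):
        ready = [j for j in jobs if j.release <= t and j.remaining > 0]
        if not ready:
            schedule.append(None)  # idle tick
            continue
        job = min(ready, key=dr_key)
        job.remaining -= 1
        schedule.append(job.name)
        if job.remaining == 0:
            finish[job.name] = t + 1
    missed = [j.name for j in jobs
              if j.name not in finish or finish[j.name] > j.deadline]
    return schedule, missed


jobs = [Job("A", release=0, wcet=3, deadline=10), Job("B", release=1, wcet=2, deadline=4)]
schedule, missed = simulate(jobs, horizon=6)
```

Here B preempts A at t = 1 because its key (4 - 2 = 2) is smaller than A's (10 - 3 = 7), and both jobs meet their deadlines.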

5.
Today's Electronics (今日电子), 2004, (11): 108-108
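The ZSP540 core embeds 4 MACs and 6 ALUs and can process up to 5 instructions per clock cycle. It can run existing software written for other ZSP cores, and it comes with the Z.Turbo accelerator, which lets SoC designers accelerate the processor through instruction-set extensions or attached coprocessors.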

6.
Tang Yuxing, Deng Kun, Zhou Xingming  Acta Electronica Sinica (电子学报), 2005, 33(11): 1946-1951
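Extending the instruction set and adding new functional units are effective ways to improve processor performance. To take full advantage of such architectural extensions, however, existing applications must be recompiled with new optimizations. For cross-architecture optimization, binary translation has proven to be an effective technique. Combining trace techniques with dynamic binary translation and optimization, this paper proposes a multi-level dynamic optimization framework that requires no static re-optimizing compilation and instead applies multi-level dynamic optimization and extended-instruction scheduling while the program runs. Simulation results show that the framework effectively forms large instruction-scheduling windows, accurately selects hot code and suitable optimizations, and improves the performance of legacy applications; it is also flexible to implement and scales well.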
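The abstract does not say how hot code is detected or when a block moves to a higher optimization level. A common approach in dynamic optimizers is counter-based tiering, sketched below under that assumption: each block (or trace head) is promoted one level whenever its execution count crosses the next threshold. `TieredProfiler` and the threshold values are hypothetical.

```python
from collections import Counter


class TieredProfiler:
    """Hypothetical counter-based hot-code detector for a multi-level optimizer.
    A block (or trace head) moves up one optimization level each time its
    execution count reaches the next threshold."""

    def __init__(self, thresholds=(50, 500)):
        self.thresholds = thresholds  # counts needed to reach level 1, 2, ...
        self.counts = Counter()
        self.level = {}               # block -> current optimization level

    def execute(self, block):
        """Record one execution; return the new level if the block was promoted."""
        self.counts[block] += 1
        lvl = self.level.get(block, 0)
        if lvl < len(self.thresholds) and self.counts[block] >= self.thresholds[lvl]:
            self.level[block] = lvl + 1
            return lvl + 1  # a real system would (re)optimize the block here
        return None


p = TieredProfiler(thresholds=(2, 4))
events = [p.execute(b) for b in ["A", "B", "A", "A", "A"]]
```

Keeping the thresholds increasing means only code that stays hot pays for the more expensive optimization levels.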

7.
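Allocation of compute and register resources is a key problem in automatic parallel mapping for reconfigurable processors. Addressing resource allocation for a reconfigurable block-cipher instruction-set processor, this paper builds a parameter model of operator scheduling and a parameter model of processor resources, and studies the constraints between parallel scheduling of block ciphers and resource consumption. On this basis it proposes an automatic mapping algorithm based on a greedy strategy, list scheduling, and linear scan, achieving automatic mapping of block ciphers onto the reconfigurable block-cipher instruction-set processor. Experiments that vary the available resources verify the effectiveness of the parallel mapping, and a comparison of AES-128 mapping results with other work demonstrates the algorithm's advantages. The proposed algorithm offers guidance for research on parallel computation of block ciphers on reconfigurable processors.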
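The abstract names list scheduling but gives no operator latencies or priority function. The sketch below is the textbook resource-constrained list scheduler under simplifying assumptions (unit-latency operators, identical function units, critical-path priority); `list_schedule` and the toy graph are hypothetical. It illustrates the trade-off the abstract studies: fewer units lengthen the schedule.

```python
def list_schedule(preds, num_units):
    """Resource-constrained list scheduling of a DAG of unit-latency operators.

    preds: dict op -> list of predecessor ops; every op must be a key,
           and the graph must be acyclic.
    num_units: number of identical function units available per cycle.
    Returns dict op -> issue cycle.
    """
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    height = {}

    def h(op):
        # Priority: length of the longest path from op to a sink (critical path).
        if op not in height:
            height[op] = 1 + max((h(s) for s in succs[op]), default=0)
        return height[op]

    issued, cycle, pending = {}, 0, set(preds)
    while pending:
        # An op is ready once every predecessor issued in an earlier cycle.
        ready = [op for op in pending
                 if all(p in issued and issued[p] < cycle for p in preds[op])]
        if not ready:
            raise ValueError("dependency graph has a cycle")
        ready.sort(key=lambda op: (-h(op), op))  # highest priority first
        for op in ready[:num_units]:
            issued[op] = cycle
            pending.remove(op)
        cycle += 1
    return issued


dag = {"a": [], "b": [], "c": [], "d": [],
       "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
two_units = list_schedule(dag, 2)   # 4 cycles
four_units = list_schedule(dag, 4)  # 3 cycles
```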

9.
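A retargetable compiler is essential to the design of an application-specific instruction-set processor, ASIP (Application Specific Instruction Processor). Using the open-source ORC (Open Research Compiler) framework, this paper designs a retargetable compiler targeting a proposed ASIP architecture model and adds optimizations tailored to that architecture in the instruction-scheduling and register-allocation phases. Experimental results show that the compiler is highly retargetable and that the instruction-scheduling and register-allocation optimizations are effective.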

10.
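Task scheduling is one of the most important problems for heterogeneous multi-core processors; a good scheduling algorithm lets the system reach its full performance and efficiency. To address shortcomings of the genetic algorithm, this paper proposes an improved genetic algorithm for task scheduling on heterogeneous multi-core processors. When generating the initial population, it combines the Sufferage algorithm with random generation, and when generating individuals randomly it uses the Hamming distance to control how much individuals differ, improving the quality of the initial population while preserving its diversity. Results show that the improved algorithm produces a higher-quality initial population, raises the starting point of the search, and yields good schedules.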
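The initialization step described above can be sketched as follows, assuming independent tasks, an expected-time-to-compute matrix `etc`, and a chromosome that maps each task to a core. The Sufferage function here is a simplified one-task-per-round variant, and `initial_population`, `min_dist`, and the retry limit are hypothetical; the paper's exact thresholds and GA operators are not given in the abstract.

```python
import random


def sufferage(etc):
    """Simplified Sufferage heuristic for independent tasks.

    etc[t][m] is the expected execution time of task t on core m.
    Each round, the unassigned task that would suffer most if denied its
    best core (second-best minus best completion time) is assigned first.
    Returns a chromosome: list mapping task index -> core index.
    """
    n_tasks, n_cores = len(etc), len(etc[0])
    core_ready = [0.0] * n_cores
    assign = [None] * n_tasks
    unassigned = set(range(n_tasks))
    while unassigned:
        pick = None  # (sufferage value, task, best core)
        for t in sorted(unassigned):
            ct = sorted((core_ready[m] + etc[t][m], m) for m in range(n_cores))
            value = ct[1][0] - ct[0][0] if n_cores > 1 else 0.0
            if pick is None or value > pick[0]:
                pick = (value, t, ct[0][1])
        _, t, m = pick
        assign[t] = m
        core_ready[m] += etc[t][m]
        unassigned.remove(t)
    return assign


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def initial_population(etc, size, min_dist, seed=0, max_tries=10_000):
    """Seed with the Sufferage schedule, then add random chromosomes only if
    they differ from every existing member in at least min_dist genes."""
    rng = random.Random(seed)
    n_tasks, n_cores = len(etc), len(etc[0])
    population = [sufferage(etc)]
    for _ in range(max_tries):
        if len(population) >= size:
            break
        cand = [rng.randrange(n_cores) for _ in range(n_tasks)]
        if all(hamming(cand, p) >= min_dist for p in population):
            population.append(cand)
    return population
```

The heuristic individual gives the search a good starting point, while the distance check keeps random individuals from clustering around each other.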

11.
Analysis of Parallel Multithreaded Processor Architecture   (total citations: 1; self-citations: 0; citations by others: 1)
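A processor with the parallel multithreaded architecture (PMA) consists of multiple logical processors, and many of its pipeline control components are shared by all of them. In each cycle, the processor fetches several instructions from multiple threads and schedules them for execution. Another feature is that it supports instruction-level and thread-level parallelism at the same time. This paper analyzes how the PMA works and presents a processor model.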

12.
A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads [simultaneous multithreading (SMT)]. Consistent application of fine clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the scheduling window, and leads to a distributed architecture model in which independent thread processing units, arithmetic logic units, register files, and memories are distributed across the chip and communicate with each other through a special network. A special communication protocol replaces the broadcasting and associative comparison of destination tags in a centralized instruction scheduler with explicit operand transfer instructions, thus decentralizing the control of the data flow to the greatest extent. As a result, the processor cycle time depends neither on the issue bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the performance of the architecture scalable with both the number of function units and the number of thread units without any impact on the processor's cycle time. The performance and scalability of the proposed microarchitecture are demonstrated with critical signal processing kernels from the MPEG-4 video coding standard on a cycle-true simulator.
Tim Niggemeier

13.
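A realizable processor architecture for multithreaded Java is proposed. The architecture consists of multiple application-specific processing units, each of which executes a single thread at a time.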

14.
Guan Maolin, He Yi, Yang Qianming, Zhang Chunyuan, Wu Nan  Acta Electronica Sinica (电子学报), 2012, 40(7): 1379-1385
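To address the problems that VLIW code size poses for the capacity and power consumption of instruction memory in stream architectures, this paper analyzes the instruction characteristics of stream processors and proposes a new field-partitioned VLIW compression technique. On this basis, it designs a distributed on-chip instruction memory for the stream architecture and proposes a SIMD pipelined execution mode. Experimental results show that the technique reduces off-chip instruction memory accesses by 38% and on-chip instruction memory space requirements by about 65%; the distributed instruction memory reduces on-chip instruction memory area by about 37%, lowering MASA's total system area by 8.92%, and cuts instruction memory power consumption by 61%.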
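The abstract does not describe how the VLIW word is partitioned or encoded. As an illustration of the general idea only, the sketch below assumes a generic per-field dictionary scheme: each fixed-width word is split into fields, each field gets its own table of distinct values, and a word is stored as a tuple of short indices. `compress_fields` and `decompress_fields` are hypothetical, and decoder hardware cost is ignored.

```python
def compress_fields(words, field_widths):
    """Field-partitioned dictionary compression of fixed-width VLIW words.

    Each word is split into fields (most significant first), and each field
    gets its own dictionary of distinct values; a word becomes a tuple of
    per-field dictionary indices.
    Returns (encoded, dictionaries, compressed size in bits).
    """
    total = sum(field_widths)
    dicts = [{} for _ in field_widths]
    encoded = []
    for w in words:
        idxs, shift = [], total
        for i, width in enumerate(field_widths):
            shift -= width
            value = (w >> shift) & ((1 << width) - 1)
            # New values get the next free index in this field's dictionary.
            idxs.append(dicts[i].setdefault(value, len(dicts[i])))
        encoded.append(tuple(idxs))
    index_bits = [max(1, (len(d) - 1).bit_length()) for d in dicts]
    table_bits = sum(len(d) * width for d, width in zip(dicts, field_widths))
    return encoded, dicts, len(words) * sum(index_bits) + table_bits


def decompress_fields(encoded, dicts, field_widths):
    """Invert compress_fields: look up each index and reassemble the word."""
    inverse = [{idx: val for val, idx in d.items()} for d in dicts]
    words = []
    for idxs in encoded:
        w = 0
        for i, width in enumerate(field_widths):
            w = (w << width) | inverse[i][idxs[i]]
        words.append(w)
    return words
```

With four 16-bit words split into two 8-bit fields that each take only two distinct values, the encoded program plus tables needs 40 bits instead of 64; savings depend entirely on how repetitive each field is.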

15.
This paper describes a new architecture for embedded reconfigurable computing, based on a very-long instruction word (VLIW) processor enhanced with an additional run-time configurable datapath. The reconfigurable unit is tightly coupled with the processor, featuring an application-specific instruction-set extension. Mapping computation-intensive algorithmic portions onto the reconfigurable unit allows more efficient processing, improving both timing performance and power consumption. A test chip has been implemented in a standard 0.18-µm CMOS technology. Tests on a signal processing algorithmic benchmark showed speedups ranging from 4.3× to 13.5× and energy consumption reductions of up to 92%.

16.
A novel processor with a micro-pipelined architecture is proposed for latch-type Josephson logic devices. The processor is divided into several operating stages activated by a multi-phase power system. Independent register groups are allocated to each stage to support pipelined processing of several instruction streams. This architecture makes it possible to build a processor with a fine pipeline pitch that is capable of MIMD processing. A 12-bit micro-pipelined Josephson processor containing an ALU, a multiplier, and 16 registers is described. Driven by a 3-phase AC power system, it can process 4 instruction streams simultaneously. A pipeline pitch of 3.3 GHz is expected using conventional Josephson device technology. A 4-bit processor design for 12-bit data length is also discussed.

17.
Yue Mengyun, Bai Bing  Acta Electronica Sinica (电子学报), 2020, 48(10): 2041-2046
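This paper defines the microarchitecture of a digital signal processing system suited to motor vector-control algorithms, including its instruction set, memory model, and mode of interaction with the host CPU. The design has two notable advantages: fixing some of the multiple operands shortens the instruction encoding and raises code density, and executing multi-cycle instructions in the background improves ALU parallel efficiency. The paper reports the instruction cycle counts of a typical FOC (field-oriented control) algorithm implemented on the DSP (Digital Signal Processor) instruction set, as well as the circuit implementation of the architecture. Finally, using the ARM Cortex-M0 and several mainstream DSPs as baselines, measured data demonstrate the architecture's high energy efficiency: at a modest cost in circuit area, it greatly improves the execution efficiency of embedded systems that integrate the DSP.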
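For orientation, the inner loop of a typical FOC algorithm includes the Clarke and Park transforms (alongside PI regulators and the inverse transforms). The sketch below shows these two standard transforms in their amplitude-invariant form; which FOC kernels the paper's instruction set actually targets is not stated in the abstract. Each transform is a handful of multiply-adds plus sine and cosine values, the kind of work a MAC-oriented DSP instruction set is built for.

```python
import math


def clarke(ia, ib):
    """Amplitude-invariant Clarke transform (abc -> alpha-beta).
    Assumes balanced phase currents, so ic = -(ia + ib) is not needed."""
    alpha = ia
    beta = (ia + 2.0 * ib) / math.sqrt(3.0)
    return alpha, beta


def park(alpha, beta, theta):
    """Park transform (alpha-beta -> d-q) for rotor electrical angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    d = alpha * c + beta * s
    q = -alpha * s + beta * c
    return d, q


# Balanced three-phase currents aligned with the rotor map to d = 1, q = 0.
theta = 0.7
ia = math.cos(theta)
ib = math.cos(theta - 2 * math.pi / 3)
d, q = park(*clarke(ia, ib), theta)
```

Mapping the rotating three-phase currents to constant d and q values is what lets FOC regulate torque and flux with simple PI controllers.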

18.
A scalable processor architecture for multithreaded Java™ applications is presented. The proposed architecture consists of multiple application-specific processing elements, each able to execute a single thread at a time. The architecture is evaluated by implementing a portable and scalable Java machine on an FPGA board for demonstration.

19.
Fault tolerance in microprocessor systems has become a popular topic of architecture research. Much work has been done at different levels to achieve reliability against soft errors, and several fault-tolerant architectures have been proposed. However, little attention has been paid to thread-level fault tolerance in superscalar processors. This letter introduces the microthread concept into the domain of superscalar processor fault tolerance, proposes a novel fault-tolerant architecture, the MicroThread Based (MTB) coarse-grained transient-fault-tolerant superscalar processor architecture, and discusses some implementation details.

20.
In this paper, the architecture of an embedded processor extended with a tightly coupled, coarse-grain reconfigurable functional unit (RFU) is proposed. The efficient integration of the RFU with the control unit and the datapath of the processor eliminates the communication overhead between them. To speed up execution, the RFU exploits instruction-level parallelism (ILP) and spatial computation. The proposed integration also efficiently exploits the pipeline structure of the processor, leading to further performance improvements. Furthermore, a development framework for the introduced architecture is presented. The framework is fully automated, hiding all issues related to the reconfigurable hardware from the user. The hardware model of the architecture was synthesized in a 0.13-µm process, and the resulting area and delay estimates are presented. A set of benchmarks is used to evaluate the architecture and the development framework. Experimental results show performance improvements as well as potential energy reduction.
