Found 20 similar documents (search time: 109 ms)
1.
2.
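To study vulnerabilities in modern processor microarchitectures and devise corresponding defenses, this work analyzes the memory order buffer (MOB), which manages the execution order of memory-access instructions. The analysis finds that store-to-load forwarding bypasses the data of a dependent store instruction directly to the load instruction, while speculative loading executes non-dependent load instructions early. Both improve efficiency, but they can also cause execution errors and the resulting stalls. For the existing MOB optimization mechanisms on the Intel Coffee Lake microarchitecture, the work analyzes how the four execution modes of the MOB and their execution times can be used to construct several attacks, including transient attacks, covert channels, and recovery of the private key of a cryptographic algorithm. The timing differences caused by the MOB are used to recover the addresses of memory instructions, and these addresses can leak the index values of a T-table AES implementation. A key-recovery experiment against AES-128 in OpenSSL 3.0.0 on an Intel i5-9400 processor shows that 30,000 samples recover one key byte with a probability of 63.6%, and that, owing to the characteristics of the MOB, the exploit is stealthier than traditional cache timing leakage.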
3.
4.
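This work designs and implements a remote multi-channel intelligent controller for household appliances based on the PSTN (public switched telephone network), built around the AT89C51 microcontroller and the MT8870 dual-tone multi-frequency (DTMF) decoder IC. The device provides ring detection, password verification, and instruction decoding and execution. It offers multi-channel switch control interfaces and digital control interfaces, and carries out the corresponding control according to the user's remote commands. Practical testing shows that the device works properly and implements all of the designed functions, giving it some practical value.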
5.
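Pipeline design of a 64-bit MIPS instruction processor  Total citations: 2, self-citations: 1, citations by others: 1
This paper introduces the pipeline design of a CPU that uses the 64-bit MIPS instruction set. As the core of an SoC, the CPU's performance depends mainly on instruction execution efficiency, and pipelining greatly increases instruction execution speed and so improves CPU performance. The CPU uses a five-stage pipeline. The paper analyzes the factors that disrupt normal pipeline execution and describes the control mechanisms adopted in the actual design, completing the pipeline control design of a relatively high-performance CPU core.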
6.
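The Montgomery algorithm is improved, and an RSA cryptographic coprocessor suited to smart-card applications and implemented as a RISC microprocessor is presented. The core of the device uses a parallel pipelined structure of two 32-bit multipliers. Its functional units operate concurrently, and instruction execution is also pipelined. At a clock frequency of 10 MHz, encrypting a 1024-bit plaintext takes only 3 ms on average, and decryption takes 177 ms on average.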
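For readers unfamiliar with the Montgomery algorithm named in the abstract above, the following is a minimal sketch of textbook Montgomery reduction and a square-and-multiply exponentiation built on it. It is not the improved variant or the two-multiplier pipelined datapath the abstract describes; the function names (`montgomery_setup`, `redc`, `mont_pow`) and the choice R = 2^k with k equal to the bit length of the modulus are assumptions made for illustration.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n and R = 2**k > n."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    r = 1 << k
    # n_prime satisfies n * n_prime == -1 (mod R); pow(n, -1, r) needs Python 3.8+
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u


def mont_pow(base: int, exp: int, n: int) -> int:
    """Compute base**exp mod n using Montgomery multiplication."""
    k = n.bit_length()                  # R = 2**k is the smallest power of two above n
    r, n_prime = montgomery_setup(n, k)
    x = (base % n) * r % n              # base in Montgomery form
    acc = r % n                         # 1 in Montgomery form
    while exp:
        if exp & 1:
            acc = redc(acc * x, n, k, n_prime)
        x = redc(x * x, n, k, n_prime)
        exp >>= 1
    return redc(acc, n, k, n_prime)     # convert back to an ordinary residue
```

The point of the technique is that `redc` replaces a division by `n` with shifts and masks by a power of two, which is why it suits hardware multipliers; the result of `mont_pow` can be checked against Python's built-in three-argument `pow`.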
7.
8.
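Design study of a 32-bit floating-point embedded MCU  Total citations: 1, self-citations: 2, citations by others: 1
This paper describes the design and implementation of a 32-bit floating-point embedded MCU based on a RISC architecture. The MCU contains 128 kbit of SRAM and uses a Harvard architecture, a four-stage instruction pipeline, a 32-bit instruction word, and a 43-bit internal data word. Multiple fast registers inside the MCU, together with hard-wired logic in place of microprogrammed control, speed up the microprocessor and improve instruction execution efficiency. The design also writes registers synchronously and reads them asynchronously to avoid data hazards.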
9.
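A one-dimensional multimedia-processing SIMD array based on the PIM parallel computer architecture is proposed, and a controller for the array is implemented. The main components and instruction format of the controller are given, and control of the PE array is described. Finally, instruction-execution simulation results for the architecture are presented.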
10.
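When writing and debugging system software, the following situations often arise: the programmer attends only to the result of each instruction and overlooks the changes in the corresponding flag bits after it executes; or, once the MCU system is in operation, external interference corrupts the address signals on the three buses of the MCU core and the program runs out of control. Either can cause the program to run away, and in the worst case the program enters an infinite loop that paralyzes the whole system. How to intercept a program flow that has run out of control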
11.
《电子学报:英文版》2017,(6):1154-1160
Multimedia applications contain multi-branch loops, which map inefficiently onto traditional single instruction, multiple data (SIMD) structures. To address this, we propose a multi-instruction-stream extension method for traditional SIMD structures. The main idea is to dispatch multiple instruction streams to multiple lanes simultaneously. Compared with traditional SIMD, whose lanes receive a single unified instruction stream but execute conditionally through a lane mask vector, the multi-instruction-stream extension gives each lane the ability to receive and execute the instructions of one particular branch path. It can therefore map the multi-branch loops in applications efficiently. The design is implemented in Verilog and integrated into the FT-Matrix vector-SIMD chip. Application profiling results show that the proposed method incurs only 2.61% area overhead while achieving a performance gain of about 1.8x to 2.4x.
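The contrast drawn in the abstract above, between lane-mask execution and per-lane instruction streams, can be illustrated with a deliberately idealized cost model. This sketch is an assumption made for illustration, not the paper's evaluation method: it counts one cycle per instruction, ignores dispatch limits and synchronization, and the function names and path lengths are hypothetical.

```python
def masked_simd_cycles(lane_paths, path_lengths):
    """Lane-mask SIMD: every branch path taken by any lane is issued in turn,
    with non-participating lanes masked off, so the costs add up."""
    return sum(path_lengths[p] for p in set(lane_paths))


def multi_stream_cycles(lane_paths, path_lengths):
    """Multi-stream model: each lane runs only its own path, so the loop body
    finishes when the lane with the longest path finishes."""
    return max(path_lengths[p] for p in lane_paths)


# Hypothetical loop body with three branch paths (instruction counts are made up).
path_lengths = {"a": 4, "b": 6, "c": 5}
lane_paths = ["a", "b", "a", "c"]      # branch path chosen by each of four lanes

print(masked_simd_cycles(lane_paths, path_lengths))
print(multi_stream_cycles(lane_paths, path_lengths))
```

Under this model the masked version pays for every distinct path taken while the multi-stream version pays only for the longest one, which is the intuition behind the reported speedup; the real gain depends on how many streams the hardware can dispatch at once.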
12.
13.
A Biased Random Instruction Generation Environment for Architectural Verification of Pipelined Processors  Total citations: 2, self-citations: 0, citations by others: 2
Ta-Chung Chang Vikram Iyengar Elizabeth M. Rudnick 《Journal of Electronic Testing》2000,16(1-2):13-27
Architectural verification is a critical aspect of the microprocessor design cycle. In this paper, we present a design verification environment centered around a biased random instruction generator for simulation-based architectural verification of pipelined microprocessors. The instruction generator uses biases specified by the user to generate instruction sequences for simulation. These biases are not hard-coded and can thus be changed depending on the specific areas in the design and type of design errors being targeted. Correctness checking is achieved using assertion checking and end-of-state comparison with a high-level architectural model. Several architectural-level errors are introduced into a behavioral model of the DLX processor to investigate the processor's response in the presence of design errors. Simulation experiments conducted using the behavioral model show that biased random instruction sequences provide higher coverage of RTL conditional branches and design errors than random instruction sequences or manually-generated test programs. Furthermore, instruction sequences containing a high percentage of read-after-write (RAW) and control dependencies are the most useful.
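As a rough illustration of what a biased random instruction generator does, the sketch below draws opcodes according to user-supplied weights and forces a read-after-write (RAW) dependency with a configurable probability. It is not the generator described in the abstract above: the opcode names, the three-field instruction tuple, and the parameters `raw_prob` and `num_regs` are all assumptions chosen to keep the example small.

```python
import random


def generate_program(biases, raw_prob, length, num_regs=8, seed=0):
    """Return a list of (opcode, dest_reg, src_reg) tuples.

    biases   -- dict mapping opcode name to a non-negative weight (user-tunable)
    raw_prob -- probability that an instruction reads the register written
                by the previous instruction (a RAW dependency)
    """
    rng = random.Random(seed)           # seeded so a failing test can be replayed
    opcodes = list(biases)
    weights = [biases[op] for op in opcodes]
    program = []
    last_dest = None
    for _ in range(length):
        op = rng.choices(opcodes, weights=weights, k=1)[0]
        dest = rng.randrange(num_regs)
        if last_dest is not None and rng.random() < raw_prob:
            src = last_dest             # biased choice: create a RAW hazard
        else:
            src = rng.randrange(num_regs)
        program.append((op, dest, src))
        last_dest = dest
    return program


# Hypothetical bias table that favours loads and adds and disables branches.
program = generate_program({"ADD": 5, "LW": 3, "SW": 2, "BEQZ": 0},
                           raw_prob=0.7, length=20)
```

Because the weights live in a plain dictionary rather than in the generator itself, they can be retuned per test campaign, which mirrors the abstract's point that the biases are not hard-coded.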
14.
Roy S. Ranganathan N. Katkoori S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(11):1640-1649
Power gating is a technique commonly used for leakage reduction in integrated circuits. In microprocessors, power gating is implemented by using sleep transistors to selectively deactivate circuit modules that remain idle for sustained periods of time during program execution. In this work, we develop a new framework for power gating the functional units in embedded system microprocessors without degradation in performance. The proposed framework includes an efficient algorithm for idle time estimation, appropriate insertion of sleep instructions within the code, and a method for reactivating the sleeping units only when needed without the use of wakeup instructions. We introduce the notion of loop hierarchy trees (LHTs) to represent the partial ordering of the nested loops within the program. From the control flow graph (CFG) representation of the source program, a forest of LHTs is constructed and is used to identify the maximal subgraphs representing the long idle periods for the functional units. For each subgraph thus identified, a sleep instruction is introduced in the program with a list of corresponding functional units to be deactivated. When an instruction is decoded, the functional units needed for that instruction are automatically activated by the control unit such that the units are ready before the instruction reaches the execute stage. This eliminates the need for wakeup instructions to be inserted into the object code reducing the overheads. In our implementation, the ARM processor architecture was modified and resynthesized to include power gating by developing a CMOS cell library of functional units with the above capabilities. Experimental results are reported for a set of 12 benchmarks chosen from the MiBench suite, which indicate that, on average, our technique reduces the leakage energy in functional units by 31.1% for integer benchmarks and 26.8% for floating-point benchmarks.
15.
Dobberpuhl D.W. Witek R.T. Allmon R. Anglin R. Bertucci D. Britton S. Chao L. Conrad R.A. Dever D.E. Gieseke B. Hassoun S.M.N. Hoeppner G.W. Kuchler K. Ladd M. Leary B.M. Madden L. McLellan E.J. Meyer D.R. Montanaro J. Priore D.A. Rajagopalan V. Samudrala S. Santhanam S. 《Solid-State Circuits, IEEE Journal of》1992,27(11):1555-1567
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU is described. The chip is fabricated in a 0.75-μm CMOS technology utilizing three levels of metalization and optimized for 3.3-V operation. The die size is 16.8 mm×13.9 mm and contains 1.68 M transistors. The chip includes separate 8-kbyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types. It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation.
16.
17.
18.
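To address the problem that common heuristic algorithms ignore the distinction between an instruction and an instruction instance, this paper improves an existing heuristic algorithm, yielding GreedyHeur: the weight of each instruction is derived from the heuristic function values of its instruction instances, and instruction instances are selected greedily according to the priority relations among instructions. To address the problem that heuristic algorithms cannot find the optimal solution, the paper introduces the population-based differential evolution algorithm and, combining it with the greedy strategy, proposes the ISDE (Instruction Selection Based on Differential Evolution) algorithm. Through a simple encoding and an efficient fitness evaluation mechanism, ISDE quickly and iteratively searches for the optimal instruction combination. Experimental results show that GreedyHeur and ISDE quickly and effectively find better candidate instruction combinations than existing heuristic algorithms.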
19.
Arquimedes Canedo Ben A. Abderazek Masahiro Sowa 《Journal of Signal Processing Systems》2010,59(1):45-55
Embedded systems are characterized by the need for code with a small memory footprint. A popular architectural modification
to improve code density in RISC embedded processors is to use a reduced bit-width instruction set. This approach reduces the
length of the instructions to improve code size. However, because the reduced instructions can address fewer registers, these
architectures suffer a slight performance degradation, as more reduced instructions are required to execute a given task. On
the other hand, 0-operand computers such as stack and queue machines implicitly access their source and destination operands,
making instructions naturally short. Queue machines offer a highly parallel computation model, unlike the stack model. This
paper proposes a novel alternative for reducing code size by using a queue-based reduced instruction set while retaining the
high parallelism characteristics in programs. We introduce an efficient code generation algorithm to generate programs for
our reduced instruction set. Our algorithm successfully constrains the code to the reduced instruction set with the addition
of only 4% extra code, on average. We show that our proposed technique is able to generate about 16% more compact code than
MIPS16, 26% over ARM/Thumb, and 50% over MIPS32 code. Furthermore, we show that our compiler is able to extract about the
same parallelism as fully optimized RISC code.
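The 0-operand queue computation model mentioned in the abstract above can be sketched in a few lines. This is only the basic model, in which a program is the level-order traversal of the expression tree (deepest level first), operands are taken from the front of a FIFO queue, and results are appended to the rear. It is not the paper's code generation algorithm or its reduced instruction set, and the tree encoding and function names are assumptions.

```python
from collections import deque
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def queue_program(tree):
    """Flatten an expression tree into a queue-machine program.

    A tree is either an int operand or a tuple (op, left, right).
    The program lists nodes level by level, deepest level first,
    left to right within each level.
    """
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        frontier = [child for node in frontier if isinstance(node, tuple)
                    for child in node[1:]]
    return [node[0] if isinstance(node, tuple) else node
            for level in reversed(levels) for node in level]


def run_queue(program):
    """Execute a program: operators dequeue two operands and enqueue the result."""
    queue = deque()
    for item in program:
        if isinstance(item, str):
            left = queue.popleft()
            right = queue.popleft()
            queue.append(OPS[item](left, right))
        else:
            queue.append(item)      # operand: enqueue at the rear
    return queue.popleft()


# (1 + 2) * 3 - 4 as a tree; no instruction names a register or stack slot.
expr = ("-", ("*", ("+", 1, 2), 3), 4)
result = run_queue(queue_program(expr))
```

Since no instruction carries operand addresses, each one can be encoded very compactly, and all nodes on the same tree level are independent of one another, which is where the model's parallelism comes from.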
20.
Ji-Hong Yan Cheng Wu 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2001,31(3):361-365
This paper presents an optimization scheduling approach for concurrent design projects, in which activities may be executed in more than one operating mode and renewable as well as nonrenewable resources exist. Research on the development of a scheduling approach for concurrent scheduling is expected to shorten development lead time, minimize cost, and eliminate unnecessary redesign periods. In this paper, an integrated criterion function is proposed to ensure optimal concurrent scheduling and effective utilization of resources along with fluent delivery of information. The criterion function adequately takes into account key factors such as time order, resources, lead time, and overlapping time of activities, which allow concurrent activities to execute successfully. In addition, two cruxes in concurrent engineering (role allocation, and the prerelease and feedback revision process) are discussed in detail. The example is part of a certain product development process, and the scheduling results demonstrate that the proposed algorithm is feasible.