Similar literature
 Found 20 similar documents; search took 15 ms
1.
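Predicated execution lets a tiled processor make full use of its many execution units and exploit instruction-level parallelism. However, the hyperblocks it forms also raise the cost of branch mispredictions, so improving branch predictor performance is critical. This paper proposes a predicated-execution technique whose decisions are driven by profiling information: it uses profile data to estimate execution cycles before and after predication, and on that basis decides whether to predicate each branch. The technique raises the branch predictor hit rate by 0.68% to 3.50% and system performance by 1.67% to 8.33%. In addition, representing predicated instructions with select instructions eliminates the problem of multiple register definitions in the renaming stage.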
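The profile-driven decision described in this abstract can be illustrated with a small cost comparison. This is a minimal sketch under stated assumptions, not the paper's actual estimator: the `BranchProfile` fields, the default 20-cycle misprediction penalty, and the model in which a predicated region costs the sum of both paths are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BranchProfile:
    """Hypothetical per-branch profile record (field names are assumptions)."""
    taken_prob: float       # profiled probability that the branch is taken
    mispredict_rate: float  # profiled fraction of executions that mispredict
    taken_cycles: int       # estimated cycles of the taken-path block
    fallthru_cycles: int    # estimated cycles of the fall-through block


def cycles_with_branch(p: BranchProfile, penalty: int) -> float:
    """Expected cycles when the branch is kept and predicted."""
    expected_path = (p.taken_prob * p.taken_cycles
                     + (1 - p.taken_prob) * p.fallthru_cycles)
    return expected_path + p.mispredict_rate * penalty


def cycles_predicated(p: BranchProfile) -> float:
    """Estimated cycles after predication.

    Assumption: both paths execute back to back. A machine with many
    execution units may overlap them, so this estimate is pessimistic.
    """
    return p.taken_cycles + p.fallthru_cycles


def should_predicate(p: BranchProfile, penalty: int = 20) -> bool:
    """Predicate only when the estimate after predication is lower."""
    return cycles_predicated(p) < cycles_with_branch(p, penalty)


hard = BranchProfile(taken_prob=0.5, mispredict_rate=0.4,
                     taken_cycles=3, fallthru_cycles=3)
easy = BranchProfile(taken_prob=0.9, mispredict_rate=0.01,
                     taken_cycles=10, fallthru_cycles=10)
print(should_predicate(hard), should_predicate(easy))
```

Under this model a short, frequently mispredicted branch is predicated, while a long, well-predicted one is left as a branch.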

2.
Chen, Peter C. Y.; Wonham, W. M. Real-Time Systems, 2002, 23(3): 183-208
In this article, a method for scheduling a processor for non-preemptive execution of periodic tasks is presented. This method is based on the formal framework of supervisory control of timed discrete-event systems. It is shown that, with this method, the problem of determining schedulability and the problem of finding a scheduling algorithm are dual since a solution to the former necessarily implies a solution to the latter and vice versa. Furthermore, the solution to the latter thus obtained is complete in the sense that it contains all safe sequences of task execution with the guarantee that no deadline is missed. Examples are described to illustrate this method. Implications of the results and the computational complexity associated with this method are discussed.

3.
Worst Case Execution Time Analysis for a Processor with Branch Prediction   Total citations: 4, self-citations: 0, citations by others: 4
Colin, Antoine; Puaut, Isabelle. Real-Time Systems, 2000, 18(2-3): 249-274
The fundamental requirement for hard real-time systems is that task deadlines never be missed. As a consequence, knowing tasks' worst case execution times (WCET) is crucial for such systems. Taking into account modern architectural features makes it possible to determine tighter WCET bounds than with program analysis that ignores such features. While the effects of caches and pipelines on WCET analysis have been extensively studied, to our knowledge the effect of branch prediction on WCET evaluation has not been studied yet. This paper describes a method for statically bounding the number of timing penalties due to erroneous branch predictions. The proposed method is based on static program analysis and branch target buffer modelling. It consists of collecting information on branch target buffer evolution by considering all possible execution paths of a program. The collected information can then be used to classify control transfer instructions so that their worst case branching cost can be estimated and incorporated into the program WCET. A method is also given to tightly predict the WCET of loops whose number of iterations depends on counter variables of outer loops. Experimental results show that the timing penalty due to wrong branch predictions estimated by the proposed technique is close to the real one, which demonstrates the practical applicability of our method.

4.
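To meet the demand of embedded devices for small area and high performance, a 32-bit synthesizable out-of-order processor based on the open-source RISC-V instruction set is designed. The processor incorporates key techniques such as branch prediction and dependency handling, and supports the RISC-V base integer, multiply/divide, and compressed instruction sets. It adopts a three-stage pipeline with in-order single issue, out-of-order execution, and out-of-order write-back, and uses a Harvard architecture with the AHB bus protocol so that instructions and data can be accessed in parallel. In...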

5.
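Research on the FPGA Implementation of a High-Throughput Floating-Point FFT Processor   Total citations: 3, self-citations: 0, citations by others: 3
Limited by the long pipeline latency of floating-point operations and by the number of ports on FPGA on-chip RAM, conventional FFT processors usually achieve a throughput of only one complex result per cycle. This paper designs and implements on an FPGA a high-throughput IEEE 754 single-precision floating-point FFT processor. By improving the structure of the butterfly unit and reorganizing accesses to the FPGA on-chip RAM, the processor outputs about two complex results per cycle on average, roughly twice the throughput of a conventional FFT processor. A 1024-point FFT can be completed in (512+10)*10=5220 cycles.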
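The cycle count quoted above can be reproduced with a short calculation. This is a sketch of one plausible reading, not a statement of the authors' design: it assumes a radix-2 FFT with log2(N) stages and N/2 butterflies per stage, one butterfly (two complex outputs) per cycle, and it treats the "+10" as a fixed per-stage overhead. The abstract does not say what the 10 stands for, so `per_stage_overhead` is a hypothetical parameter.

```python
def fft_cycles(n_points: int, per_stage_overhead: int = 10) -> int:
    """Estimate cycles for an n-point radix-2 FFT, one butterfly per cycle.

    per_stage_overhead is an assumed fixed cost per stage (for example
    pipeline fill latency); the source does not define it.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    stages = n_points.bit_length() - 1      # log2(n_points)
    butterflies_per_stage = n_points // 2   # each yields two complex outputs
    return (butterflies_per_stage + per_stage_overhead) * stages


print(fft_cycles(1024))  # (512 + 10) * 10
```

With these assumptions the 1024-point case gives 10 stages of 512 butterflies each, matching the (512+10)*10 expression in the abstract.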

6.
Statutor names an evolving Prolog program which is being developed as a knowledge-based tutoring system in the legal domain. The system utilises direct manipulation of graphical objects as a means of eliciting complex responses from the user and for providing graphical representations of complex answers to the user. It is a marriage of good interface practice and knowledge-based programming techniques which has presented a number of interesting prospects for tutoring. Perhaps the most interesting of these is the possibility of a dialog in which the student is asked to construct an argument in order to establish the truth of a particular proposition, the system then doing the same, and feedback and student modelling information being derived from a comparison of the two argument structures. This technique is not restricted in its significance to the legal domain but is applicable wherever knowledge of a subject matter can be expressed or tested by the construction of an argument. Finally, the system demonstrates the reusability of declarative knowledge by including additional modules (an expert system shell and an authoring system) which utilise the same knowledge bases as the main Statutor program itself.

7.
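This article discusses the characteristics, application scope, development, and current status of several mainstream classes of embedded processors, namely embedded microcontrollers, general-purpose microprocessors, digital signal processors, and systems-on-chip, together with the new development trends that build on them, and points out the coming trend toward technology convergence.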

8.
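This article discusses the characteristics, application scope, development, and current status of several mainstream classes of embedded processors, namely embedded microcontrollers, general-purpose microprocessors, digital signal processors, and systems-on-chip, together with the new development trends that build on them, and points out the coming trend toward technology convergence.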

9.
10.
The execution of long-running batch programs imposes severe reliability constraints on a computing system, since a failure is more likely to occur during such a run and, once it occurs, would destroy all the processing performed thus far. This paper studies the execution delay and machine resources consumed in supporting the running of large batch programs in a computing environment interrupted by failures. The effect of checkpoints and their optimal insertion are also considered. The results are applicable to an arbitrary failure law.

11.
Adams, L.; Ou, M. IEEE Micro, 1997, 17(4): 44-48
The need to produce an inexpensive and effective hard disk drive controller in a short time frame persuaded the Palmchip design team to try a new approach: combining the processor and the main disk drive controller into a single part. Successfully embedding the ARM7TDMI processor in our GreenLite ASIC proved to be a tremendous learning experience. We discovered ways to solve some of the problems we encountered and to ease the use of the 7TDMI core itself.

12.
Conditional branches incur a severe performance penalty in wide-issue, deeply pipelined processors. Speculative execution (1, 2) and predicated execution (3–9) are two mechanisms that have been proposed for reducing this penalty. Speculative execution can completely eliminate the penalty associated with a particular branch, but requires accurate branch prediction to be effective. Predicated execution does not require accurate branch prediction to eliminate the branch penalty, but is not applicable to all branches and can increase the latencies within the program. This paper examines the performance benefit of using both mechanisms to reduce the branch execution penalty. Predicated execution is used to handle the hard-to-predict branches and speculative execution is used to handle the remaining branches. The hard-to-predict branches within the program are determined by profiling. We show that this approach can significantly reduce the branch execution penalty suffered by wide-issue processors.
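The division of labour described in this abstract, predication for hard-to-predict branches and speculation for the rest, might be sketched as a simple classification over profile counts. This is an illustrative sketch only: the profile layout, the `predicable` flag, and the 10% misprediction threshold are assumptions, not values taken from the paper.

```python
def choose_mechanism(profile, threshold=0.10):
    """Assign each branch to 'predicate' or 'speculate'.

    profile maps a branch id to (executions, mispredictions, predicable).
    The threshold is a hypothetical cut-off for "hard to predict".
    """
    plan = {}
    for branch, (executed, mispredicted, predicable) in profile.items():
        rate = mispredicted / executed if executed else 0.0
        # Predication is not applicable to every branch, so a hard-to-predict
        # branch that cannot be predicated still falls back to speculation.
        if predicable and rate > threshold:
            plan[branch] = "predicate"
        else:
            plan[branch] = "speculate"
    return plan


example_profile = {
    "b1": (1000, 300, True),   # hard to predict, can be predicated
    "b2": (1000, 10, True),    # easy to predict
    "b3": (1000, 400, False),  # hard to predict, cannot be predicated
}
print(choose_mechanism(example_profile))
```

A real compiler would also weigh the added latency of predicated code, which this sketch ignores.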

13.
14.
This paper presents the application of the Visual Servoing approach to a mobile robot which must execute coordinated motions in a known indoor environment. In this work, we are interested in the execution and control of basic motions like Go to an object by using the mobile robot Hilare2Bis. We use a diagonal matrix for the gain to improve the visual servoing behaviour and the potential field formalism to avoid obstacles. Namely, the robot is controlled according to the position of some features in an image. Such a path will be executed by a nonholonomic mobile robot, which has only two degrees of freedom (two wheels), and three configuration parameters (X, Y, θ); a camera is mounted on the robot close to the end effector of an arm, controlled to add at least a new degree of freedom (pl).

15.
This paper presents an algorithm for synchronization placement when using a SPMD execution model, where synchronizations are enforced only when there exists a cross-processor data dependence. In this paper, we investigate two scheduling techniques, loop-based and data-based, both of which use a SPMD model. Using scheduling information from previous stages in the compilation process, a new technique to determine potential cross-processor data dependences is presented. Given the minimum number of cross-processor data dependences that must be satisfied, a new optimization is used so as to minimize the number of synchronization points needed to satisfy them. This algorithm has been successfully implemented in an experimental compiler. Initial experimental data show this technique to be very effective, outperforming existing methods.

16.
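Research on On-Chip Synchronization Mechanisms and Evaluation Methods for Many-Core Processors   Total citations: 1, self-citations: 0, citations by others: 1
Synchronization mechanisms are key to correct execution and cooperative communication in on-chip multi-core and many-core processors, and their efficiency matters greatly for processor performance. For on-chip many-core architectures, this work proposes and implements two coarse-grained synchronization mechanisms and one fine-grained mechanism: synchronization supported by dedicated on-chip hardware, primitive-based on-chip mutual-exclusion synchronization, and fine-grained synchronization based on full/empty bits. It also proposes evaluation criteria and methods for coarse-grained synchronization and designs quantitative evaluation programs. Using the simulator of the homogeneous on-chip many-core processor Godson-T and the commercial AMD Opteron on-chip multi-core processor as platforms, it evaluates and compares the performance of the proposed hardware-supported mechanism against the primitive-based mechanism. The results show that hardware support markedly improves synchronization performance on on-chip many-core processors, and that in traditional primitive-based synchronization most of the performance loss is waiting time caused by load imbalance and by serialized operations at synchronization points.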

17.
The performance of a multiprocessor architecture is determined both by the way the program is partitioned into processes and by the way these processes are allocated to different processors. In the fine-grain dataflow model, where each process consists of a single instruction, decomposition of a program into processes is achieved automatically by compilation. This paper investigates the effectiveness of fine-grain decomposition in the context of the prototype dataflow machine now in operation at the University of Manchester. The current machine is a uniprocessor, known as the Single-Ring Dataflow Machine, comprising a single processing element which contains several units connected together in a pipelined ring. A Multi-ring Dataflow Machine (MDM) containing several such processing elements connected together via an interprocessor switching network, is currently under investigation. The paper describes a method of allocating dataflow instructions to processing elements in the MDM, and examines the influence of this method on selection of a switching network. Results obtained from simulation of the MDM are presented. They show that programs are executed efficiently when their parallelism is matched to the parallelism of the machine hardware.

18.
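Design of an Area-Array CCD Acquisition System Based on the NIOS Processor   Total citations: 3, self-citations: 3, citations by others: 0
A system is designed that uses the NIOS II soft-core processor, built with Altera's SOPC Builder, to acquire images from an area-array CCD. The system drives the RJ21P3AHOPT area-array CCD with the LR38642 integrated chip module, which combines a timing generator (LR38617), a vertical driver amplifier (LR36685), and an analog front end (IR3Y48A1). After high-speed A/D conversion inside the LR38642, the digital signal of each pixel is transferred, under control of the NIOS II processor, through a custom acquisition IP core and the Avalon bus to off-chip SDRAM. Experiments show that the system is compact and highly controllable, and that it can be used in high-performance digital cameras and in real-time image acquisition and processing.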

19.
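In today's embedded devices, touch screens are widely used as the human-machine interface. This paper discusses the hardware design of a touch-screen module built with the ADS7846 and UCB1400 controller chips on a development platform based on the PXA255 processor, as well as the software driver for the ADS7846 under the Linux operating system.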

20.
Incoherent noise is manifest in measurements of expectation values when the underlying ensemble evolves under a classical distribution of unitary processes. While many incoherent processes appear decoherent, there are important differences. The distribution functions underlying incoherent processes are either static or slowly varying with respect to control operations and so the errors introduced by these distributions are refocusable. The observation and control of incoherence in small Hilbert spaces is well known. Here we explore incoherence during an entangling operation, such as is relevant in quantum information processing. As expected, it is more difficult to separate incoherence and decoherence over such processes. However, by studying the fidelity decay under a cyclic entangling map we are able to identify distinctive experimental signatures of incoherence. This is demonstrated both through numerical simulations and experimentally in a three qubit nuclear magnetic resonance implementation.
