Similar Articles
1.
Increases in instruction-level parallelism are needed to exploit the potential parallelism available in future wide-issue architectures. Predicated execution is an architectural mechanism that increases instruction-level parallelism by removing branches and allowing multiple control paths to execute simultaneously, committing only the instructions from the correct path. For the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis must be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA), which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction. These two predicated-code optimizations, applied during instruction scheduling, reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time by 12% to 68%.
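To make predicated execution concrete, the following is a minimal Python sketch of if-conversion: both arms of a branch are computed under complementary predicates, and only results whose guard is true are committed. The instruction tuple format and the `run_predicated` helper are hypothetical; the sketch does not implement PSSA or its Full-Path Predicates.

```python
from typing import Callable, Dict, List, Tuple

# A predicated instruction: (guard predicate name, destination register, operation).
# This format is a hypothetical toy, not a real ISA.
Instr = Tuple[str, str, Callable[[Dict[str, int]], int]]


def run_predicated(block: List[Instr], regs: Dict[str, int],
                   preds: Dict[str, bool]) -> Dict[str, int]:
    """Execute every instruction, but commit only those whose guard is true."""
    committed = dict(regs)
    for guard, dest, op in block:
        value = op(committed)     # every path computes its result
        if preds[guard]:          # only the correct path commits it
            committed[dest] = value
    return committed


def if_converted_abs_diff(a: int, b: int) -> int:
    # Original control flow: if a > b: r = a - b  else: r = b - a
    regs = {"a": a, "b": b, "r": 0}
    p = a > b                                  # compare defines p and its complement
    preds = {"p": p, "not_p": not p}
    block: List[Instr] = [
        ("p", "r", lambda r: r["a"] - r["b"]),
        ("not_p", "r", lambda r: r["b"] - r["a"]),
    ]
    return run_predicated(block, regs, preds)["r"]


print(if_converted_abs_diff(7, 3), if_converted_abs_diff(3, 7))
```

The branch disappears: both instructions are independent of any control decision and can be scheduled side by side, which is the extra parallelism the abstract refers to.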

2.
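Dependences between instructions are the main obstacle that keeps instruction scheduling from being effective and thereby limits instruction-level parallelism. Register renaming is an important technique for dealing with control and data dependences. We study and implement a register renaming technique applied during instruction scheduling. It achieves speedups of about 5% on 164.gzip and about 3% on 186.crafty.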
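As a rough illustration of why renaming helps a scheduler, the sketch below gives every register definition in a basic block a fresh version name, so that only true (read-after-write) dependences remain and write-after-read and write-after-write conflicts disappear. It assumes an unbounded supply of names and a hypothetical `(dest, srcs)` instruction form; a real implementation must also respect register pressure and live-out values, and this is not the paper's implementation.

```python
from typing import Dict, List, Tuple

# (destination register, source registers) in a hypothetical three-address form.
Instr = Tuple[str, List[str]]


def rename(block: List[Instr]) -> List[Instr]:
    """Give every write a fresh name so that only true (RAW) dependences remain."""
    current: Dict[str, str] = {}   # architectural name -> latest renamed version
    counter: Dict[str, int] = {}
    out: List[Instr] = []
    for dest, srcs in block:
        # Sources read the most recent version of each register (keeps RAW edges).
        new_srcs = [current.get(s, s) for s in srcs]
        # Each definition gets a new version, removing WAR and WAW conflicts.
        counter[dest] = counter.get(dest, 0) + 1
        new_dest = f"{dest}_{counter[dest]}"
        current[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out


block = [
    ("r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("r4", ["r1"]),        # r4 = f(r1)   (RAW on r1)
    ("r1", ["r5"]),        # r1 = g(r5)   (WAR with the line above, WAW with the first)
    ("r6", ["r1"]),        # r6 = h(r1)
]
for ins in rename(block):
    print(ins)
```

After renaming, the third instruction writes `r1_2` and no longer conflicts with the first two, so a scheduler is free to hoist it.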

3.
This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the transformed problems using an enumerative scheduler is faster, and the number of problems solved to optimality within a bounded time is increased. Furthermore, heuristic scheduling of the transformed problems often yields improved schedules for hard problems. The basic node-based transformation runs in O(ne) time, where n is the number of nodes and e is the number of edges in the graph. A generalized subgraph-based transformation runs in O(n^2 e) time. The transformations are implemented within the GNU Compiler Collection (GCC) and are evaluated experimentally using the SPEC CPU2000 floating-point benchmarks targeted to various processor models. The results show that the transformations are fast and improve the results of both heuristic and optimal scheduling.
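For background, the transformations above act on the data-dependency graph consumed by a local scheduler. A minimal critical-path list scheduler over such a graph might look like the sketch below: it is the kind of heuristic scheduler whose results the transformations aim to improve, not the transformations themselves. The single-issue machine model and the `(pred, succ, latency)` edge format are assumptions.

```python
from typing import Dict, List, Tuple

# Data-dependency graph edges: (pred, succ, latency). Hypothetical single-issue
# machine: at most one instruction starts per cycle. The graph must be acyclic.
Edge = Tuple[str, str, int]


def critical_path(nodes: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Longest latency-weighted path from each node to any sink (its priority)."""
    succs: Dict[str, List[Tuple[str, int]]] = {n: [] for n in nodes}
    for u, v, lat in edges:
        succs[u].append((v, lat))
    memo: Dict[str, int] = {}

    def height(n: str) -> int:
        if n not in memo:
            memo[n] = max((lat + height(v) for v, lat in succs[n]), default=0)
        return memo[n]

    return {n: height(n) for n in nodes}


def list_schedule(nodes: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Each cycle, issue the ready node with the largest critical-path height
    (ties broken by program order). Returns the issue cycle of each node."""
    prio = critical_path(nodes, edges)
    preds: Dict[str, List[Tuple[str, int]]] = {n: [] for n in nodes}
    for u, v, lat in edges:
        preds[v].append((u, lat))
    start: Dict[str, int] = {}
    cycle = 0
    while len(start) < len(nodes):
        ready = [n for n in nodes if n not in start and
                 all(p in start and start[p] + lat <= cycle for p, lat in preds[n])]
        if ready:
            best = max(ready, key=lambda n: (prio[n], -nodes.index(n)))
            start[best] = cycle
        cycle += 1
    return start


print(list_schedule(["a", "b", "c", "d"],
                    [("a", "c", 3), ("b", "c", 1), ("c", "d", 1)]))
```

The long-latency node `a` is issued first because it heads the critical path; cycle 2 stays empty while `c` waits for `a`'s result.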

4.
Conventional schedulers schedule operations in dependence order and never revisit or undo a scheduling decision on any operation. In contrast, backtracking schedulers may unschedule operations and can often generate better schedules. This paper develops and evaluates the backtracking approach to fill branch delay slots. We first present the structure of a generic backtracking scheduling algorithm and prove that it terminates. We then describe two more aggressive backtracking schedulers and evaluate their effectiveness. We conclude that aggressive backtracking-based instruction schedulers can effectively improve schedule quality by eliminating branch delay slots with a small amount of additional computation.

5.
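As a visual specification language, Petri nets are increasingly used to evaluate and analyze real-time systems. This paper proposes a Petri-net-based model of distributed real-time systems and describes the local schedulers and message schedulers in that model. A corresponding distributed real-time scheduling simulator can be built from the model, so that early in system development the simulator can be used to verify whether the timing constraints of system tasks are reliably met under a given local scheduling policy and message scheduling policy. The model can also be easily converted into a rapid prototype of the system.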
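A Petri-net model like this is ultimately executed by the firing rule: a transition is enabled when each of its input places holds a token, and firing moves tokens from inputs to outputs. The sketch below shows that rule for an untimed place/transition net with unit arc weights (each place listed at most once per arc list); the place and transition names are hypothetical, and the timing annotations a real-time model would need are deliberately left out.

```python
from typing import Dict, List, Tuple

# transition -> (input places, output places); arc weight 1 assumed.
Net = Dict[str, Tuple[List[str], List[str]]]


def enabled(net: Net, marking: Dict[str, int], t: str) -> bool:
    inputs, _ = net[t]
    return all(marking.get(p, 0) >= 1 for p in inputs)


def fire(net: Net, marking: Dict[str, int], t: str) -> Dict[str, int]:
    """Return the new marking after firing t (the old marking is not modified)."""
    if not enabled(net, marking, t):
        raise ValueError(f"transition {t} is not enabled")
    inputs, outputs = net[t]
    m = dict(marking)
    for p in inputs:
        m[p] -= 1
    for p in outputs:
        m[p] = m.get(p, 0) + 1
    return m


# Hypothetical node: a local task runs on the CPU, then its message uses the link.
net: Net = {
    "run_task": (["task_ready", "cpu_idle"], ["msg_queued", "cpu_idle"]),
    "send_msg": (["msg_queued", "link_idle"], ["link_idle", "delivered"]),
}
m0 = {"task_ready": 1, "cpu_idle": 1, "link_idle": 1}
m1 = fire(net, m0, "run_task")
m2 = fire(net, m1, "send_msg")
print(m2)
```

A local scheduler and a message scheduler would correspond to policies for choosing which enabled transition fires next when several compete for `cpu_idle` or `link_idle`.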

6.
The On-Line Multiprocessor Scheduling Problem with Known Sum of the Tasks
In this paper we investigate a semi on-line multiprocessor scheduling problem. The problem is the classical on-line multiprocessor problem where the total sum of the tasks is known in advance. We show an asymptotic lower bound on the performance ratio of any algorithm (as the number of processors gets large), and present an algorithm which has performance ratio at most for any number of processors. When compared with known general lower bounds, this result indicates that the information on the sum of tasks substantially improves the performance ratio of on-line algorithms.
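Knowing the total processing time S in advance means an algorithm knows the lower bound S/m on the optimal makespan before the first job arrives. The sketch below contrasts Graham's greedy list scheduling with a hypothetical threshold rule that uses S; the factor `c` and the rule itself are assumptions for illustration only, they are not the algorithm analyzed in the paper, and no performance guarantee is claimed for them.

```python
from typing import List


def greedy(jobs: List[float], m: int) -> float:
    """Graham's list scheduling: each job goes to the least-loaded machine."""
    loads = [0.0] * m
    for p in jobs:
        i = loads.index(min(loads))
        loads[i] += p
    return max(loads)


def threshold_known_sum(jobs: List[float], m: int, c: float = 1.5) -> float:
    """Hypothetical semi-online rule using the known total S = sum(jobs):
    put a job on the most-loaded machine that stays within c*S/m,
    otherwise fall back to the least-loaded machine."""
    target = c * sum(jobs) / m    # S is known before the first job arrives
    loads = [0.0] * m
    for p in jobs:
        fits = [i for i in range(m) if loads[i] + p <= target]
        if fits:
            i = max(fits, key=lambda k: loads[k])
        else:
            i = loads.index(min(loads))
        loads[i] += p
    return max(loads)


def lower_bound(jobs: List[float], m: int) -> float:
    """max(S/m, largest job): no schedule can have a smaller makespan."""
    return max(sum(jobs) / m, max(jobs))


jobs = [1, 1, 1, 1, 4]
print(greedy(jobs, 2), threshold_known_sum(jobs, 2), lower_bound(jobs, 2))
```

On this input the greedy rule spreads the small jobs evenly and then has nowhere good to put the large one, while the rule that knows S keeps one machine empty; the ratio of an algorithm's makespan to `lower_bound` is what a performance-ratio analysis bounds over all inputs.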

7.
Shie Mannor, Ron Meir. Machine Learning, 2002, 48(1-3): 219-251
We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve, for any distribution on the data set, a weighted empirical error on the training set that is bounded from above by 1/2 − γ for some γ > 0. Moreover, for the weak learner to be useful in terms of generalization, γ must be sufficiently far from zero. While the existence of weak learners is essential to the success of boosting algorithms, a proof of their existence based on a geometric point of view has hitherto been lacking. In this work we show that, under certain natural conditions on the data set, a linear classifier is indeed a weak learner. Our results can be directly applied to generalization error bounds for boosting, leading to closed-form bounds. We also provide a procedure for dynamically determining the number of boosting iterations required to achieve low generalization error. The bounds established in this work are based on the theory of geometric discrepancy.
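The weak-learning condition can be checked numerically for one distribution at a time: compute the weighted empirical error of a linear classifier and its edge γ = 1/2 − error. The sketch below does this; the helper names are hypothetical, labels are assumed to be in {−1, +1}, and passing the check for one weighting says nothing about the "for every distribution" requirement that the paper addresses geometrically.

```python
from typing import List, Sequence


def weighted_error(weights: Sequence[float], labels: Sequence[int],
                   preds: Sequence[int]) -> float:
    """Weighted empirical error; weights are normalized to a distribution here."""
    total = sum(weights)
    return sum(w for w, y, h in zip(weights, labels, preds) if y != h) / total


def linear_predict(w: Sequence[float], b: float,
                   xs: List[Sequence[float]]) -> List[int]:
    """Sign of the linear function w.x + b (a score of exactly 0 maps to +1)."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1 for x in xs]


def edge(weights: Sequence[float], labels: Sequence[int],
         preds: Sequence[int]) -> float:
    """gamma such that error = 1/2 - gamma; weak learning needs gamma > 0."""
    return 0.5 - weighted_error(weights, labels, preds)


xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
preds = linear_predict((1.0,), -1.5, xs)          # threshold at x = 1.5
print(edge([1, 2, 3, 4], [-1, -1, 1, 1], preds))  # separable labels: large edge
print(edge([1, 2, 3, 4], [-1, 1, -1, 1], preds))  # same classifier, no edge
```

The second call shows why γ must stay away from zero: with these labels and weights the threshold classifier is no better than a coin flip.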

8.
Microarchitecture of the Godson-2 Processor
The Godson project is the first attempt to design high-performance general-purpose microprocessors in China. This paper introduces the microarchitecture of the Godson-2 processor, a 64-bit, 4-issue, out-of-order RISC processor that implements a 64-bit MIPS-like instruction set. The adoption of aggressive out-of-order execution techniques (such as register mapping, branch prediction, and dynamic scheduling) and cache techniques (such as non-blocking caches, load speculation, and dynamic memory disambiguation) helps the Godson-2 processor achieve high performance even at a modest clock frequency. The Godson-2 processor has been physically implemented in a six-metal-layer 0.18 μm CMOS technology using an automatic placement and routing flow, with the help of some hand-crafted library cells and macros. The chip measures 6,700 μm by 6,200 μm, and the clock cycle at the typical corner is 2.3 ns.
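Branch prediction is one of the out-of-order techniques listed above. The abstract does not say which predictor Godson-2 uses, so the sketch below shows only the textbook bimodal scheme, a table of 2-bit saturating counters indexed by the branch address, as a generic illustration. The table size and the `pc >> 2` indexing (which assumes fixed 4-byte instructions, as in MIPS-style ISAs) are assumptions.

```python
from typing import List


class BimodalPredictor:
    """2-bit saturating counters indexed by low PC bits.
    Counter values 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, entries: int = 1024) -> None:
        self.mask = entries - 1                 # entries must be a power of two
        self.table: List[int] = [1] * entries   # start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop-closing branch: taken nine times, then falls through once.
bp = BimodalPredictor()
misses = 0
for taken in [True] * 9 + [False]:
    if bp.predict(0x400100) != taken:
        misses += 1
    bp.update(0x400100, taken)
print(misses)
```

The two-bit hysteresis means a single loop exit does not flip the prediction, so the next execution of the loop starts out predicted correctly.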

9.
Research on Scheduling Design for Multiprocessor Systems
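This paper presents several typical parallel computer architectures in use today, together with their processor allocation and scheduling strategies, and focuses on the main thread scheduling algorithms for shared-memory symmetric multiprocessors.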

10.
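To determine the schedulability of avionics systems that contain only purely periodic task sets and follow the two-level scheduling model of the ARINC 653 multi-partition architecture, a partition-based scheduling analysis tool for avionics systems is proposed. The tool simulates the scheduling of the task set in each partition by maintaining a clock variable, determines the simulation interval from the properties of purely periodic task sets and of time-slice allocation in partitioned avionics systems, and uses an optimized scheduling analysis algorithm to check both the correctness of the partition-level time-slice allocation and the schedulability of the task set in each partition. Tests and case studies show that the tool determines the schedulability of the partition-level and task-level scheduling models automatically, accurately, and quickly, and can draw the scheduling process as a Gantt chart, making it more intuitive and efficient than existing tools.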
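The abstract does not give the tool's algorithm, but the general approach it describes, advancing a clock through a static partition window table and simulating task scheduling inside each window, can be sketched as follows. Assumptions, all hypothetical: integer ticks, synchronous release at time 0, deadlines equal to periods, preemptive rate-monotonic priorities within each partition, and a window table that repeats every major frame. Because every period divides the hyperperiod, simulating one hyperperiod is sufficient; when all periods and the major frame are harmonic, the hyperperiod is simply the largest of them. Gantt-chart output is not shown.

```python
from functools import reduce
from math import gcd
from typing import Dict, List, Optional, Tuple

# windows: partition -> list of (offset, length) inside the major frame
# tasks:   partition -> list of (period, wcet); deadline = period
Windows = Dict[str, List[Tuple[int, int]]]
Tasks = Dict[str, List[Tuple[int, int]]]


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def schedulable(major_frame: int, windows: Windows, tasks: Tasks) -> bool:
    """Tick-by-tick simulation of two-level scheduling over one hyperperiod."""
    periods = [p for ts in tasks.values() for p, _ in ts]
    horizon = reduce(lcm, periods, major_frame)
    owner: List[Optional[str]] = [None] * major_frame   # None = idle tick
    for part, ws in windows.items():
        for off, length in ws:
            for t in range(off, off + length):
                owner[t] = part
    # remaining[(partition, i)] = work left in the current job of task i
    remaining = {(p, i): 0 for p, ts in tasks.items() for i in range(len(ts))}
    for now in range(horizon):
        # Release new jobs; leftover work from the previous job is a deadline miss.
        for p, ts in tasks.items():
            for i, (period, wcet) in enumerate(ts):
                if now % period == 0:
                    if remaining[(p, i)] > 0:
                        return False
                    remaining[(p, i)] = wcet
        part = owner[now % major_frame]
        if part is None or part not in tasks:
            continue
        # Rate monotonic inside the partition: shortest period runs first.
        ready = [i for i in range(len(tasks[part])) if remaining[(part, i)] > 0]
        if ready:
            i = min(ready, key=lambda k: tasks[part][k][0])
            remaining[(part, i)] -= 1
    # Jobs whose deadline is the end of the hyperperiod must also be finished.
    return all(r == 0 for r in remaining.values())


frame = {"A": [(0, 5)], "B": [(5, 5)]}
print(schedulable(10, frame, {"A": [(10, 3), (20, 4)], "B": [(20, 6)]}))
print(schedulable(10, frame, {"A": [(10, 3)], "B": [(20, 11)]}))
```

In the second call partition B receives only 10 ticks per 20-tick period but needs 11, so the simulation reports a miss; a real tool would also check that the window table itself is well formed (no overlaps, windows inside the frame).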

11.

Instruction reuse is a microarchitectural technique that exploits dynamic instruction repetition to remove redundant computations at run time. In this paper we examine instruction reuse of integer ALU and load instructions in network processing applications and attempt to answer the following questions: (1) How much instruction repetition can be reused in packet processing applications? (2) Can the temporal locality of network traffic be exploited to reduce interference in the Reuse Buffer and improve reuse? (3) What is the effect of reuse on microarchitectural features such as resource contention and memory accesses? We use an execution-driven simulation methodology to evaluate instruction reuse and find that, for the benchmarks considered, 1% to 50% of the dynamic instructions are reused, yielding performance improvements between 1% and 20%. To further improve reuse, we propose a flow aggregation scheme and an architecture that exploits it. This scheme is mostly applicable to header processing applications and exploits temporal locality in packet data to uncover more reuse. As a side effect, instruction reuse reduces memory traffic and improves performance.
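The core of a value-based reuse buffer is a small table keyed by the static instruction (its PC) and its operand values: if the same instruction executes again with the same operands, the stored result is reused instead of recomputed. The sketch below models that for ALU operations only; the capacity, FIFO replacement, and class name are assumptions, and reusing loads would additionally require invalidating entries when memory is written, which is not modeled.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (PC, operand values)


class ReuseBuffer:
    """Value-based reuse buffer with FIFO replacement (ALU results only)."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.entries: Dict[Key, int] = {}
        self.hits = 0
        self.lookups = 0

    def execute(self, pc: int, operands: Tuple[int, ...],
                op: Callable[..., int]) -> int:
        self.lookups += 1
        key = (pc, operands)
        if key in self.entries:
            self.hits += 1                  # result reused, no recomputation
            return self.entries[key]
        result = op(*operands)
        if len(self.entries) >= self.capacity:
            # dicts keep insertion order, so the first key is the oldest entry
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = result
        return result


# Packets from the same flows repeat the same header fields.
rb = ReuseBuffer(capacity=4)
for src_port in [80, 443, 80, 80, 443]:
    rb.execute(0x100, (src_port, 0xFF), lambda a, b: a & b)
print(rb.hits, rb.lookups)
```

Grouping packets of the same flow together, as the flow aggregation scheme aims to do, makes such repeated keys arrive while they are still resident in a small buffer.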


12.
Analysis of Parallel I/O Scheduling Algorithms for RAID
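Because more and more applications are I/O-bound, storage systems play an increasingly important role. The RAID disk array is the most common storage device for high-performance I/O. This paper analyzes the I/O execution time and disk utilization of parallel I/O scheduling algorithms for RAID, providing a basis for configuring high-performance arrays appropriately.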

13.
On the Complexity of Adjacent Resource Scheduling
We study the problem of scheduling resource(s) for jobs in an adjacent manner (ARS). The problem relates to fixed-interval scheduling on the one hand and to two-dimensional strip packing on the other. It is also closely related to multiprocessor scheduling. A distinguishing characteristic is the resource-adjacency constraint. As an application of ARS, we consider an airport where passengers check in for their flights by joining queues in front of one or more desks, where their luggage is checked, and so forth. To keep these operations smooth, the airport maintains a clear order in the waiting lines: a number n(f) of adjacent desks is assigned exclusively to flight f during a fixed time interval I(f). For each flight in a given planning horizon of discrete time periods, one seeks a feasible assignment to adjacent desks, and the objective is to minimize the total number of desks involved. The paper explores two problem variants and relates them to other scheduling problems. The basic, rectangular version of ARS is a special case of multiprocessor scheduling. The other problem is more general and does not fit into any existing scheduling model. After presenting an integer linear program for ARS, we discuss the complexity of both problems and of special cases. The decision version of the rectangular problem remains strongly NP-complete. The other problem is already strongly NP-complete for two time periods. The paper also identifies a number of cases that are solvable in polynomial time.
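Since the problem is strongly NP-complete, a simple heuristic is a natural baseline. The sketch below is a hypothetical first-fit rule, not a method from the paper: flights are processed by start period, and each receives the lowest-numbered block of n(f) adjacent desks that is free throughout I(f). The input format `(n(f), first period, last period)` is an assumption, and the result is feasible but not optimal in general.

```python
from typing import Dict, Set, Tuple

# flight -> (n(f) adjacent desks, first period of I(f), last period of I(f))
Flights = Dict[str, Tuple[int, int, int]]


def first_fit_desks(flights: Flights) -> Dict[str, int]:
    """Return the first desk index of the block assigned to each flight."""
    busy: Dict[int, Set[int]] = {}   # period -> occupied desks
    assignment: Dict[str, int] = {}
    for f, (n, t0, t1) in sorted(flights.items(), key=lambda kv: kv[1][1]):
        start = 0
        # Slide the block upward until desks start..start+n-1 are free in every period.
        while any(d in busy.get(t, set())
                  for t in range(t0, t1 + 1) for d in range(start, start + n)):
            start += 1
        for t in range(t0, t1 + 1):
            busy.setdefault(t, set()).update(range(start, start + n))
        assignment[f] = start
    return assignment


def desks_used(flights: Flights, assignment: Dict[str, int]) -> int:
    """Objective value: highest desk index in use, plus one."""
    return max(assignment[f] + flights[f][0] for f in flights)


fl = {"A": (2, 0, 2), "B": (1, 1, 3), "C": (2, 3, 4)}
a = first_fit_desks(fl)
print(a, desks_used(fl, a))
```

Flight C reuses desks 0 and 1 once A's interval has ended, which is exactly the time-sharing that the adjacency constraint makes hard to optimize in general.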

14.
Design and Performance Analysis of the Godson-2 Processor
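This paper describes the design of the Godson-2 processor and its performance test results. Godson-2 uses a four-issue superscalar, superpipelined architecture, with 64 KB each of on-chip level-1 instruction and data cache and up to 8 MB of off-chip level-2 cache. To make full use of the pipeline, Godson-2 implements advanced out-of-order execution techniques such as branch prediction, register renaming, and dynamic scheduling, as well as dynamic memory access mechanisms such as non-blocking cache access and load speculation. The Godson-2 processor is implemented in a 0.18 μm CMOS process; its maximum operating frequency at nominal voltage is 500 MHz, and its measured power consumption at 500 MHz is 3 to 5 W. Its peak single-precision floating-point rate is 2 billion operations per second, and its double-precision rate is 1 billion operations per second. Its measured SPEC CPU2000 performance is 8 to 10 times that of Godson-1, and its overall performance has reached the level of the Pentium III. Prototype chips currently run a complete 64-bit Chinese-language Linux operating system, the full-featured Mozilla browser, a multimedia player, and the OpenOffice office suite smoothly, meeting the requirements of most desktop applications.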
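Assuming the peak floating-point figures are quoted at the 500 MHz maximum frequency, they imply the following per-cycle throughput. This is only arithmetic on the numbers above and says nothing about how the floating-point units are organized:

$$
\frac{2\times10^{9}\ \text{single-precision ops/s}}{500\times10^{6}\ \text{cycles/s}} = 4\ \text{ops/cycle},
\qquad
\frac{1\times10^{9}\ \text{double-precision ops/s}}{500\times10^{6}\ \text{cycles/s}} = 2\ \text{ops/cycle}.
$$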

15.
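In recent years, network coding has emerged as a means of increasing the throughput of communication systems. In a multicast network, each forwarding node encodes packets from different information flows according to dynamically changing network conditions, which relieves congestion at local nodes and improves the performance of the whole communication system. The main contribution of this paper is to introduce a dynamic network coding scheduling algorithm into typical wireless communication networks to increase the network coding gain and system throughput, and to study how adaptive techniques should be used in an adaptive wireless communication system so that the dynamic network coding scheduling algorithm is as effective as possible. The sender at each node should choose suitable adaptive techniques to improve the performance of the wireless communication system. MATLAB simulations show that, for a wireless network using adaptive techniques, the dynamic network coding scheduling algorithm is of greater practical significance for improving system performance.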
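The coding gain that such scheduling algorithms try to exploit is easiest to see in the textbook two-way relay example: instead of forwarding each packet separately, the relay broadcasts their XOR, and each endpoint cancels the packet it already knows. This cuts the exchange from four transmissions to three. The sketch below is only that classic illustration (packet contents are made up), not the paper's dynamic scheduling algorithm or its adaptive techniques.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length (pad them first)")
    return bytes(x ^ y for x, y in zip(a, b))


# Alice and Bob exchange packets through a relay.
# Uncoded: A->R, B->R, R->B, R->A (4 transmissions).
# Coded:   A->R, B->R, R broadcasts A xor B (3 transmissions).
pkt_alice = b"HELLO"
pkt_bob = b"WORLD"
coded = xor_bytes(pkt_alice, pkt_bob)     # the relay's single broadcast
bob_gets = xor_bytes(coded, pkt_bob)      # Bob cancels his own packet
alice_gets = xor_bytes(coded, pkt_alice)  # Alice does the same
print(bob_gets, alice_gets)
```

A dynamic scheduler decides when such coding opportunities exist, for example whether the relay should wait for a matching packet from the opposite direction or forward immediately.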

16.
Ramesh Chandra, Xue Liu, Lui Sha. Real-Time Systems, 2003, 24(2): 153-169
Many real-time systems, such as manufacturing plants, have long life cycles. To enable the realization of technological innovations and to mitigate the risk and cost of bringing new control technologies into functioning systems, flexible and reliable real-time software architectures such as Simplex have been developed. There is also an emerging trend that integrates the design of controllers and schedulers. For example, algorithms that identify the optimal frequencies of control tasks subject to schedulability constraints have been developed, and the notion of feedback schedulers has been investigated. However, the optimization of the performance of flexible and reliable architectures with analytically redundant software controllers has remained an open problem. In fact, a direct application of the existing optimization methods would not yield the optimal frequencies. In this paper, we present a method that correctly finds the optimal frequencies for systems using analytically redundant controllers. We also show that the proposed method is robust against inaccuracies in the estimation of failure rates of the controllers.
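For context on the "existing optimization methods" mentioned above, one common formulation (assumed here, not taken from this paper) models each control task's performance loss as a_i·exp(−b_i·f_i) and requires the utilization Σ C_i·f_i to stay within a bound U. The problem is convex, and the KKT conditions give f_i = max(0, ln(a_i·b_i / (λ·C_i)) / b_i) for a multiplier λ found by bisection. The sketch below solves that baseline problem only; it ignores minimum-frequency limits and does not model analytically redundant controllers or failure rates, which is the paper's contribution.

```python
import math
from typing import List


def optimal_frequencies(a: List[float], b: List[float], c: List[float],
                        u: float, iters: int = 200) -> List[float]:
    """Minimize sum a_i*exp(-b_i*f_i) subject to sum c_i*f_i <= u, f_i >= 0."""
    def freqs(lam: float) -> List[float]:
        # Stationarity: a_i*b_i*exp(-b_i*f_i) = lam*c_i, clamped at f_i = 0.
        return [max(0.0, math.log(ai * bi / (lam * ci)) / bi)
                for ai, bi, ci in zip(a, b, c)]

    def load(lam: float) -> float:
        return sum(ci * fi for ci, fi in zip(c, freqs(lam)))

    # load() decreases as lam grows; it is zero at the upper end of the bracket.
    lo, hi = 1e-12, max(ai * bi / ci for ai, bi, ci in zip(a, b, c))
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # bisect in log space: lam can span many decades
        if load(mid) > u:
            lo = mid
        else:
            hi = mid
    return freqs(hi)               # hi always satisfies the utilization bound


f = optimal_frequencies(a=[4.0, 1.0], b=[1.0, 1.0], c=[1.0, 1.0], u=2.0)
print(f)
```

With these made-up parameters the task with the larger loss weight gets the higher frequency, and the two frequencies differ by exactly ln 4, as the closed-form condition predicts.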

17.
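The SDTA instruction set architecture is a VLIW architecture based on transport triggering. Exploiting the characteristics of the SDTA instruction set, this paper implements the natural logarithm function ln(x) efficiently by applying instruction scheduling optimizations such as loop unrolling and loop simplification, strength reduction, procedure integration, machine idioms, and instruction merging. Experimental results show that on the Neuron processor this ln(x) not only achieves high numerical precision but also takes only about 33% of the cycles of the natural logarithm function in the gcc 3.2.2 math library.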
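The abstract does not describe the evaluation algorithm itself. A common software approach, shown here as an assumption and written in Python for readability rather than as SDTA code, is range reduction followed by a short series: write x = m·2^e, recentre m into [√½, √2), and evaluate ln m = 2·atanh(s) with s = (m − 1)/(m + 1). The fixed-trip-count loop is the kind of code that loop unrolling targets, and computing successive odd powers by repeated multiplication with s² is a simple form of strength reduction.

```python
import math

LN2 = 0.6931471805599453          # ln(2) in double precision
SQRT_HALF = 0.7071067811865476    # sqrt(1/2)


def ln(x: float, terms: int = 12) -> float:
    """ln(x) for finite x > 0 via range reduction and an atanh series:
    ln x = e*ln2 + 2*(s + s^3/3 + s^5/5 + ...), with s = (m-1)/(m+1)."""
    if not 0.0 < x < math.inf:
        raise ValueError("ln is only defined here for finite x > 0")
    m, e = math.frexp(x)          # x = m * 2**e exactly, with m in [0.5, 1)
    if m < SQRT_HALF:             # recentre m into [sqrt(1/2), sqrt(2))
        m *= 2.0
        e -= 1
    s = (m - 1.0) / (m + 1.0)     # |s| <= about 0.172, so the series converges fast
    s2 = s * s
    total, power = 0.0, s
    for k in range(terms):        # fixed trip count: a natural target for unrolling
        total += power / (2 * k + 1)
        power *= s2               # next odd power by one multiply, no pow() call
    return e * LN2 + 2.0 * total


for x in (0.1, 1.0, 2.0, 10.0):
    print(x, ln(x), math.log(x))
```

Twelve terms are enough for double precision because the largest neglected term is on the order of |s|^25 / 25, far below the rounding error of the result.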

18.
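Many network applications require switching nodes to guarantee bounded packet forwarding delay, and scheduling periodic flows is an important means of providing this guarantee. How to schedule optimally when the traffic load exceeds capacity is an important problem in this area. This paper first defines two optimization problems for scheduling periodic flows in a switch, based on two performance metrics: switch throughput and call blocking rate. To analyze the complexity of these optimal scheduling problems, it defines a restricted Max-2-SAT problem and proves that this problem is NP-complete. Then, via a polynomial-time reduction from this problem to optimal periodic-flow scheduling in a switch, it proves that optimal switch scheduling is strongly NP-complete even when only flows with nested periods 1 and 2 are present. Using this result, it further proves that optimal scheduling with arbitrary nested periods is also NP-hard.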
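The abstract does not specify the paper's restricted Max-2-SAT variant. For reference, the unrestricted decision problem (is there a truth assignment satisfying at least k clauses of a 2-CNF formula?) can be stated as the brute-force check below. The literal encoding is hypothetical, and the exhaustive search over 2^n assignments is only usable for tiny n, which is the point an NP-completeness result makes.

```python
from itertools import product
from typing import List, Tuple

# A literal is a nonzero int: +i means x_i, -i means NOT x_i (variables are 1-based).
Clause = Tuple[int, int]


def satisfied(clauses: List[Clause], assign: Tuple[bool, ...]) -> int:
    """Number of clauses made true by the assignment."""
    def lit(v: int) -> bool:
        return assign[abs(v) - 1] if v > 0 else not assign[abs(v) - 1]
    return sum(1 for x, y in clauses if lit(x) or lit(y))


def max2sat_decision(clauses: List[Clause], n: int, k: int) -> bool:
    """Exhaustive search: does some assignment satisfy at least k clauses?"""
    return any(satisfied(clauses, a) >= k
               for a in product([False, True], repeat=n))


# All four 2-clauses over x1, x2: every assignment satisfies exactly three.
cl = [(1, 2), (-1, 2), (1, -2), (-1, -2)]
print(max2sat_decision(cl, 2, 3), max2sat_decision(cl, 2, 4))
```

A reduction of the kind described above maps each clause to a scheduling constraint so that satisfying many clauses corresponds to admitting many periodic flows.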

19.
Hau-San, Kent K.T., Horace H.S. Pattern Recognition, 2004, 37(12): 2307-2322
Classification of 3D head models based on their shape attributes, for subsequent indexing and retrieval, is important in many applications, such as hierarchical content-based retrieval of head models for virtual scene composition and automatic annotation of these characters in such scenes. While simple feature representations are preferred for efficient classification, these features may not be adequate for distinguishing between subtly different head model classes. In view of this, we propose an optimization approach based on a genetic algorithm (GA) in which the original model representation is transformed so that the classification rate is significantly enhanced while the efficiency and simplicity of the original representation are retained. Specifically, based on the Extended Gaussian Image (EGI) representation for 3D models, which summarizes surface normal orientation statistics, we treat these orientations as random variables and search for an optimal transformation of these variables using genetic optimization. The resulting transformed distributions of these random variables are then used as the modified classifier inputs. Experiments show that the optimized transformation significantly improves classification results for a wide variety of class structures. More importantly, the transformation can be realized indirectly by bin removal and bin count merging in the original histogram, thus retaining the advantage of the original EGI representation.
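An EGI can be approximated as an area-weighted histogram of surface normal directions, and the transformation described above then amounts to dropping some bins and summing others. The sketch below shows both steps under stated assumptions: face normals and face areas are given, bins are a uniform azimuth/elevation grid (real EGI implementations often use a tessellated sphere instead), and the `groups` argument that decides which bins to merge is hypothetical; the GA search that would choose it is not shown.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def egi_histogram(normals: List[Vec], areas: List[float],
                  n_az: int = 8, n_el: int = 4) -> List[List[float]]:
    """Area-weighted 2D histogram of normal directions, indexed [elevation][azimuth]."""
    hist = [[0.0] * n_az for _ in range(n_el)]
    for (x, y, z), area in zip(normals, areas):
        r = math.sqrt(x * x + y * y + z * z)
        az = math.atan2(y, x) % (2 * math.pi)             # [0, 2*pi)
        el = math.asin(max(-1.0, min(1.0, z / r)))        # [-pi/2, pi/2]
        i = min(int(az / (2 * math.pi) * n_az), n_az - 1)
        j = min(int((el + math.pi / 2) / math.pi * n_el), n_el - 1)
        hist[j][i] += area
    return hist


def merge_bins(hist: List[List[float]],
               groups: List[List[Tuple[int, int]]]) -> List[float]:
    """Sum the bins in each group of (row, col) indices; unlisted bins are removed."""
    return [sum(hist[j][i] for j, i in g) for g in groups]


# Unit cube: six faces of area 1 with axis-aligned normals.
cube = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
h = egi_histogram(cube, [1.0] * 6)
print(sum(map(sum, h)), merge_bins(h, [[(2, 0), (3, 0)], [(0, 0)]]))
```

Because the transformed feature is still a list of summed bin counts, a classifier trained on it keeps the cost and simplicity of the original histogram, which is the advantage the abstract emphasizes.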
