首页 | 本学科首页   官方微博 | 高级检索  
文章检索
  按 检索   检索词:      
出版年份:   被引次数:   他引次数: 提示:输入*表示无穷大
  收费全文   11篇
  完全免费   7篇
  自动化技术   18篇
  2013年   1篇
  2011年   3篇
  2010年   1篇
  2009年   2篇
  2007年   1篇
  2004年   1篇
  2000年   1篇
  1999年   1篇
  1996年   1篇
  1995年   1篇
  1994年   3篇
  1992年   1篇
  1991年   1篇
排序方式: 共有18条查询结果,搜索用时 46 毫秒
1.
消除VLIW结构上的循环体间冗余流相关   总被引:2,自引:1,他引:1       下载免费PDF全文
容红波  汤志忠 《软件学报》2000,11(1):126-132
数据相关是并行处理的基本依据.该文指出,VLIW(very long instruction word)特有的锁步性质使其数据相关性分析具有与众不同的特点.同一体差上的流相关形成一个线序集合,多体差上的特征流相关之间也存在包含关系.据此,提出一种用于VLIW的消除循环体间冗余流相关的方法.该方法是完备的,可以去除所有冗余的体间流相关,从而减轻循环调度的负担.文章给出判定单体差和多体差存在冗余的充分必要条件,以及消除冗余的线性复杂度的算法.这种方法具有普遍意义,可作为VLIW上软件流水和多指令流调度的基础.  相似文献
2.
A performance-based parallel loop scheduling on grid environments   总被引:2,自引:2,他引:0  
The effectiveness of loop self-scheduling schemes has been shown on traditional multiprocessors in the past and computing clusters in the recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, a performance-based approach, taking the two characteristics above into consideration, is proposed to schedule parallel loop iterations on grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload which can be scheduled statically, thus alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach can reduce the completion time for applications with regular or irregular workloads. Consequently, we claim that parallel loop scheduling can benefit applications on grid environments.  相似文献
3.
多重循环的软件流水技术   总被引:1,自引:1,他引:0       下载免费PDF全文
汤志忠  王雷  钱江 《软件学报》1996,7(7):422-427
为了解决多重循环的指令级并行编译问题,本文提出了反刍方法,以一种新的思维方式处理多重循环,将其视为一个程序流整体,有效地开发了多重循环的并行度.另外,本文还给出了实现反刍方法的基本步骤以及相应的硬件支持.最后,通过一些初步实验的结果验证了本算法的有效性,并讨论了其时间和空间效益,分析了其主要特点.  相似文献
4.
Decomposed software pipelining: A new perspective and a new approach   总被引:1,自引:0,他引:1  
Software pipelining is an efficient instruction-level loop scheduling technique, but existing software pipelining approaches have not been widely used in practical and commercial compilers. This is mainly because resource constraints and the cyclic data dependencies make software pipelining very complicated and difficult to apply. In this paper we present a new perspective on software pipelining in which it is decomposed into two subproblems—one is free from cyclic data dependencies and can be effectively solved by the list scheduling technique, and the other is free from resource constraints and can be easily solved by classical polynomial-time algorithms of graph theory. Based on this new perspective, we develop a new instruction-level loop scheduling approach, call DEcomposed Software Pipelining (DESP).  相似文献
5.
This paper uses timed petri net to model and analyze the problem of instructionlevel loop scheduling with resource constraints,which has been proven to be an NP complete problem.First,we present a new timed Petri net model to integrate functional unit allocation,register allocation and spilling into a unified theoretical framework.Then we develop a state subgraph,called Register Allocation Solution Graph,which can effectively describe the major behavior of our new model.the main property of this state subgraph is that the number of all its nodes is polynomial.Finally we present and prove that the optimum loop schedules can be found with polynomial computation complexity,for almost all practical loop programs.Our work lightens a new idea of finding the optimum loop schedules.  相似文献
6.
This paper presents SUPPLE (SUPort for Parallel Loop Execution), an innovative run-time support for the execution of parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies upon a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived from the owner computes rule, and moves data and iterations among processors only if a load imbalance actually occurs. SUPPLE always tries to overlap communications with useful computations by reordering loop iterations and prefetching remote ones in the case of workload imbalance. The SUPPLE approach has been validated by many experimental results obtained by running a multi-dimensional flame simulation kernel on a 64-node Cray T3D. We have fed the benchmark code with several synthetic input data sets built on the basis of a load imbalance model. We have compared our results with those obtained with a CRAFT Fortran implementation of the benchmark.  相似文献
7.
OpenMP作为共享存储并行编程标准,以其良好的易用性、支持增量并行等特点成为并行程序设计的主流模型之一.OpenMP标准是针对UMA共享存储结构制定的,其循环调度机制只考虑了负载平衡而无须考虑数据分布.然而在机群OpenMP系统中,数据局部性是影响性能的关键因素.针对OpenMP标准中静态调度策略不适合机群计算的缺点,提出了一个充分体现拥有者计算原则的LBS调度算法,并通过扩展制导的方式在机群OpenMP系统(OpenMP/JIAJIA)上加以实现.测试结果表明,LBS算法对于机群OpenMP系统很有效.  相似文献
8.
Global software pipelining is a complex but efficient compilation technique to exploit instruction-level parallelism for loops with branches.This paper presents a novel global software pipelining technique,called Trace Software Pipelining,targeted to the instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines.Trace software pipelining applies a global code scheduling technique to compact the original loop body.The resulting loop is called a trace software pipelined (TSP) code.The trace softwrae pipelined code can be directly executed with special architectural support or can be transformed into a globally software pipelined loop for the current VLIW and superscalar processors.Thus,exploiting parallelism across all iterations of a loop can be completed through compacting the original loop body with any global code scheduling technique.This makes our new technique very promising in practical compilers.Finally,we also present the preliminary experimental results to support our new approach.  相似文献
9.
Load balance is important because it may affect the speedup attained through the concurrent execution of loop iterations on a parallel processor. We study loop load balance in the context of the well-known Perfect benchmarks. Several static and dynamic characteristics of the Perfect benchmark DOALL loops are observed and interpreted. Thelate arrival of processors is noted as a major source of load imbalance. This observation suggested the idea ofprocessor preallocation. An analytic cost model is presented and the advantages of processor preallocation are demonstrated by experimental evaluation on a CRAY Y-MP8 under the Unicos operating system.  相似文献
10.
In this paper, we consider the optimal loop scheduling and minimum storage allocation problems based on the argument-fetching dataflow architecture model. Under the argument-fetching model, the result generated by a node is stored in a unique location which is addressable by its successors. The main contribution of this paper includes: for loops containing no loop-carried dependences, we prove that the problem of allocating minimum storage required to support rate-optimal loop scheduling can be solved in polynomial time. The polynomial time algorithm is based on the fact that the constraint matrix in the formulation is totally unimodular. Since the instruction processing unit of an argument-fetching dataflow architecture is very much like a conventional processor architecture without a program counter, the solution of the optimal loop storage allocation problem for the former will also be useful for the latter.  相似文献
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号