期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

James C. Dehnert Ross A. Towle 《The Journal of supercomputing》1993,7(1-2):181-227

The Cydra 5 is a VLIW minisupercomputer with hardware designed to accelerate a broad class of inner loops, presenting unique challenges to its compilers. We discuss the organization of its Fortran/77 compiler and several of the key approaches developed to fully exploit the hardware. These include the intermediate representation used; the preparation, overlapped scheduling, and register allocation of inner loops; the speculative execution model used to control global code motion; and the machine model and local instruction scheduling approach. 相似文献

2.

基于路径分组与数据相关松弛的软件流水

容红波汤志忠《软件学报》2001,12(4):544-555

软件流水是循环调度的重要方法.有分支循环的流水依然是个难题.现有算法可以分为4类:循环线性化、路径分离、整体调度和路径选择.它们都未能和谐地解决两个对立问题:转移时间最小化和最差约束问题.提出了基于路径分组和数据相关松弛的软件流水框架,试图无矛盾地解决上述问题.其主要思想是:(1)路径分组,即按照路径的执行概率和转移概率将路径分组,力求最小化转移时间;(2)数据相关松弛,力求避免最差约束,即当循环有多条路径时,有些相关在循环执行中并不一定有实例,理想的策略是仅当它有实例时才遵守.初步实验和定性分析表明,此相似文献

3.

软件流水的开销模型和决策框架 总被引：1，自引：0，他引：1

下载免费PDF全文

李文龙林海波汤志忠《软件学报》2004,15(7):1005-1011

软件流水是一种重要的指令调度技术,它通过重叠地执行不同的循环体来提高指令级并行性(instruction level parallelism,简称ILP).模调度是一类被广泛采用的软件流水调度算法.软件流水并非一种无损的优化方法,它具有一定的开销,比如延长了编译时间、增加了寄存器压力等.而且,受到体系结构、调度算法以及程序特性的限制,进行软件流水并不一定能达到理想的加速比,有时反而会引起性能下降.提出了一种面向程序特性的软件流水开销模型,对此模型下的软件流水开销进行了量化分析,并提出了一种基于相关性分析的相似文献

4.

Construction of speculative optimization algorithms

A. A. Belevantsev S. S. Gaisaryan V. P. Ivannikov 《Programming and Computer Software》2008,34(3):138-153

In modern processors, instructions to perform operations are often produced before it becomes known that this is required. Such an expedient, which is called speculative execution, helps to reveal parallelism at the instruction level. In the EPIC architectures, the speculative execution is completely controlled by the compiler, which makes it possible to avoid using complex hardware mechanisms for supporting speculative instruction production. Moreover, the idea of the speculative execution can be used by the compiler in machine-independent optimizations. The paper describes a scheme of construction of the speculative optimization that is based on the selection of properties of the control flow and data flow that are important from the optimization standpoint and on the estimation of the probabilities of their fulfillment. The probabilities found are used for searching and constructing advantageous speculative and bookkeeping transformations. For optimizations that include only speculative movements of instructions upwards along the control flow graph, on the basis of the suggested scheme, a method has been developed that includes algorithms for finding probabilities of data and control dependences, for estimating benefit of speculative movements, and for constructing a recovery code. On the basis of this method, an algorithm for the speculative scheduling of instructions for the Intel Itanium architecture has been developed and implemented. Specific features of its implementation and experimental results are described. 相似文献

5.

高级综合中VHDL描述向Petri网转换方法的研究 总被引：1，自引：0，他引：1

刁岚松孟晗刘明业杨凯《计算机辅助设计与图形学学报》2003,15(9):1105-1111

提出一种基于执行路径的Petri网生成算法,该算法提取VHDL源描述中的功能和时序信息,生成与源描述完全等价的Petri网结构．算法采用条件树结构保存条件,语句执行条件和Petri网迁移条件都依据条件树生成．生成的Petri网能够准确地保存源描述中的I／O时序信息,形成调度过程中I／O操作处理的基础．从该结构出发,能够方便地实现各种I／O模式的调度。相似文献

6.

IA-64中软件流水的寄存器需求研究 总被引：1，自引：0，他引：1

林海波李文龙汤志忠《计算机研究与发展》2004,41(1):22-27

软件流水是开发循环程序指令级并行性的重要方法之一，IA-64是支持软件流水的EPIC体系结构，通过对NAS Benchmarks中可软件流水循环所需的寄存器进行量化分析，提出了一种限制循环展开因子的启发式算法，有效地解决了因可用寄存器不足而导致软件流水失败的问题，并提高了应用程序的执行速度。相似文献

7.

Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

R. Govindarajan N. S. S. Narasimha Rao E. R. Altman Guang R. Gao 《International journal of parallel programming》2000,28(1):1-46

Instruction scheduling methods which use the concepts developed by the classical pipeline theory have been proposed for architectures involving deeply pipelined function units. These methods rely on the construction of state diagrams (or automatons) to (i) efficiently represent the complex resource usage pattern; and (ii) analyze legal initiation sequences, i.e., those which do not cause a structural hazard. In this paper, we propose a state-diagram based approach for modulo scheduling or software pipelining, an instruction scheduling method for loops. Our approach adapts the classical pipeline theory for modulo scheduling, and, hence, the resulting theory is called Modulo-Scheduled pipeline (MS-pipeline) theory. The state diagram, called the Modulo-Scheduled (MS) state diagram is helpful in identifying legal initiation or latency sequences, that improve the number of instructions initiated in a pipeline. An efficient method, called Co-scheduling, which uses the legal initiation sequences as guidelines for constructing software pipelined schedules has been proposed in this paper. However, the complexity of the constructed MS-state diagram limits the usefulness of our Co-scheduling method. Further analysis of the MS-pipeline theory, reveals that the space complexity of the MS-state diagram can be significantly reduced by identifying primary paths. We develop the underlying theory to establish that the reduced MS-state diagram consisting only of primary paths is complete; i.e., it retains all the useful information represented by the original state diagram as far as scheduling of operations is concerned. Our experiments show that the number of paths in the reduced state diagram is significantly lower—by 1 to 3 orders of magnitude—compared to the number of paths in the original state diagram. The reduction in the state diagram facilitate the Co-scheduling method to consider multiple initiations sequences, and hence obtain more efficient schedules. We call the resulting method, enhanced Co-scheduling. The enhanced Co-scheduling method produced efficient schedules when tested on a set of 1153 benchmark loops. Further the schedules produced by this method are significantly better than those produced by Huff's Slack Scheduling method, a competitive software pipelining method, in terms of both the initiation interval of the schedules and the time taken to construct them. 相似文献

8.

反投机技术研究

孙维新赵荣彩苏铭齐宁《计算机工程与应用》2007,43(1):74-78

投机优化技术作为一种先进的现代编译技术,有效地提高了指令执行的并行性。然而,在逆向工程中,有时要实现代码的跨平台移植,而投机优化技术又受硬件平台的制约;有时需要优化代码的结构,使程序的逻辑结构易于理解;这些都要求消除这种与硬件息息相关的优化技术。论文基于IA-64平台,提出了一种反投机处理算法,对ICC编译器编译后的可执行二进制代码进行处理,消除代码中的投机优化,将其转换成等价的没有投机优化的指令序列,这样使反投机后的代码更容易理解,而且在逆向工程中摆脱了硬件的限制。测试表明该反投机技术可以对ICC编译后的代码进行有效处理。相似文献

9.

IA-64软件流水的反流水算法研究

崔平非庞建民赵荣彩崔雪冰《计算机应用》2006,26(8):1919-1921

软件流水是一种开发循环程序指令级并行性的技术, 它通过并行执行连续的多个迭代来加快循环的执行速度。而在逆向工程中,软件流水却为逆向翻译带来了困难。为此,基于IA-64平台,提出了一种反流水算法,针对循环中包含软件流水的汇编代码进行处理,将其反向转换成语义等价的串行代码,并通过实验验证了该算法的有效性,为在二进制翻译中处理软件流水代码奠定了基础。相似文献

10.

面向密码流处理器的AES算法软件流水实现方法

王寿成徐进辉严迎建李功丽贾永旺《计算机应用》2017,37(6):1620-1624

针对轮函数在分组密码实现过程中耗时过长的问题,提出了面向可重构密码流处理器（RCSP）的高级加密标准（AES）算法软件流水实现方法。该方法将轮函数操作划分为若干流水段,不同流水段对应不同的并行密码资源,通过并行执行多个轮函数的不同流水段,从而开发指令级并行性提高轮函数执行速度,进而提升分组密码的执行性能。在RCSP的单簇、双簇和四簇运算资源下分析了AES算法的流水线划分过程和软件流水映射方法,实验结果表明,该软件流水实现方法使得单分组或多分组不同数据分块的操作并行执行,不仅能够提升单分组串行执行性能,还能够通过开发分组间的并行性来提高多分组并行执行性能。相似文献

11.

符号执行中高语句覆盖率的路径调度

下载免费PDF全文

李朝君蒋凡《计算机工程与应用》2010,46(14):66-68

符号执行和约束求解相结合的软件测试方法采用深度优先搜索的路径调度算法会造成测试路径聚居性问题,实际软件中存在路径爆炸,使得采用该算法的测试语句覆盖率低下。提出一种新的PSHC路径调度算法。先将路径分为前缀和后缀两部分,每次测试总是试图寻找这样的路径,该路径与已存在的路径具有最短的相同前缀,并且包含尽可能多的尚未被访问过的基本块作为其后缀。基于Phoenix漏洞发掘工具的实验结果表明,PSHC算法可以迅速提高测试的语句覆盖率到100%,有效解决由于深度优先搜索的路径聚居性导致的测试代码的局部性问题,PSHC算法产生的路径数与循环深度无关,软件规模越大,该算法的表现越好。相似文献

12.

一种具有时间多样性的虚拟机软件保护方法

房鼎益赵媛王怀军顾元祥许广莲《软件学报》2015,26(6):1322-1339

软件核心算法防逆向保护,是软件研发乃至软件产业发展的迫切需求,也是当前软件安全研究领域的热点之一.虚拟机软件保护作为一种保护强度高、商业应用广的技术,已被用于软件核心算法保护,并在很大程度上能够抵御攻击者的逆向分析.但这种保护方法难以抵御累积攻击,无法提供更加持久的保护.时间多样性是指一个软件在不同时间被执行时,执行路径不同,主要用于抵御累积攻击.将时间多样性与虚拟机软件保护相结合,提出了一种具有时间多样性的虚拟机软件保护方法,称为TDVMP.在TDVMP中,通过构造多条相异的执行路径,使得被保护软件在不同次执行时,能够动态选取不同执行路径,从而极大地增加了攻击者进行累积的核心算法逆向分析攻击的难度.同时,对于TDVMP设计中的关键问题,比如多执行路径的构造与选择等进行了详细讨论.此外,提出了时间多样性保护效果的评价指标,并给出了其度量及计算方法.以所实现的原型系统为基础,通过一组具有一定实用价值的实例,对所提出的方法进行了测试、实验.结果表明,TDVMP对于软件核心算法防逆向保护是有效且实用的. 相似文献

13.

Scheduling multiprocessor tasks with genetic algorithms 总被引：4，自引：0，他引：4

Correa R.C. Ferreira A. Rebreyend P. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(8):825-837

In the multiprocessor scheduling problem, a given program is to be scheduled in a given multiprocessor system such that the program's execution time is minimized. This problem being very hard to solve exactly, many heuristic methods for finding a suboptimal schedule exist. We propose a new combined approach, where a genetic algorithm is improved with the introduction of some knowledge about the scheduling problem represented by the use of a list heuristic in the crossover and mutation genetic operations. This knowledge-augmented genetic approach is empirically compared with a “pure” genetic algorithm and with a “pure” list heuristic, both from the literature. Results of the experiments carried out with synthetic instances of the scheduling problem show that our knowledge-augmented algorithm produces much better results in terms of quality of solutions, although being slower in terms of execution time 相似文献

14.

Power-aware scheduling for AND/OR graphs in real-time systems 总被引：2，自引：0，他引：2

Zhu D. Mosse D. Melhem R. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(9):849-864

Power aware computing has become popular, recently and many techniques have been proposed to manage processor energy consumption for traditional real-time applications. In this paper, we are concerned mainly with the AND/OR model of real-time applications that have different execution paths consisting of different tasks. The contribution of this paper is twofold. First, we propose a greedy slack stealing algorithm to deal with applications represented by AND/OR graphs and prove its correctness in terms of meeting the timing constraints. Then, using statistical information about the applications, we propose a few variations of speculative scheduling algorithms that intend to save energy by reducing the number of speed changes (and, thus, the overhead) while ensuring that the application meets its timing constraints. Some practical issues are also considered, such as shared memory access contention and idle energy consumption. The performance of the algorithms is analyzed with respect to processor energy savings. The results surprisingly show that the greedy slack stealing scheme is better than some speculative schemes and that the greedy scheme is good enough when a reasonable minimal speed exists in the system or when there are only a few (four to six) voltage/speed levels. 相似文献

15.

多核片上系统的高效软硬件划分及调度算法

韩红蕾刘文菊武继刚李慧《计算机工程与科学》2011,33(9):57

软硬件划分与调度是软硬件协同设计的关键环节,是经典的组合优化问题。本文针对调度与软硬件划分问题提出一种高效的启发式算法。调度算法根据任务的出度及软件计算时间对任务赋予不同的优先级,出度越大,优先级越高,出度相同的情况下,软件计算时间越大,优先级越高。划分算法首先寻找关键路径,然后将关键路径上具有最高受益面积比的任务交由硬件去实现。每次迭代更新当前关键路径的调度长度及剩余硬件面积。继续循环,直到剩余的硬件面积不再满足关键路径上的任何一个软件任务所需的硬件面积的要求为止,这样使得硬件面积的使用率比较高。实验表明,该算法对已有算法的改进可达到38%。相似文献

16.

On the Boosting of Instruction Scheduling by Renaming

Wang L. Yang Ted C. 《The Journal of supercomputing》2001,19(2):173-197

Speculative execution is the execution of instructions before it is known whether these instructions should be executed. In the speculative execution for instruction level parallelism (ILP) processors, the concept of shadow register provides a hardware solution to maintain semantics of a program from the pollution of boosted instructions that are incorrectly predicted. In a recent study, Chang and Lai proposed a special register file based on shadow register, named conjugate register file (CRF), to support multilevel boosting in speculative execution. They also proposed a scheduling heuristic named frequency-driven scheduling to incorporate with CRF for execution. However, the ability of boosting is still constrained since the concept of register pair will force the results produced speculatively be stored in dedicated locations. Moreover, when the parallelism potential increases to tens through the advancement of hardware techniques, the heavy demand on register usage and the complexity of register file may well become a serious bottleneck for the exploitation of ILP.In this paper, the algorithm of frequency-driven scheduling is modified by replacing the function of hardware CRF with the technique of variable renaming during compilation. The new scheduling technique, named LESS, can exploit the parallelism efficiently with limited number of registers. Moreover, since the technique can benefit ILP without any special hardware support, it can be incorporated with any other ILP architecture without changing its instruction set architecture (ISA).Simulation results show that the performance achievable by LESS is better than other existing methods. For example, under the ILP model with an issue rate of 8, the speculative execution can achieve an increase of 34% in parallelism, as compared to 18% in CRF scheme. 相似文献

17.

Heuristics for scheduling I/O operations 总被引：1，自引：0，他引：1

Jain R. Somalwar K. Werth J. Browne J.C. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(3):310-320

The I/O bottleneck in parallel computer systems has recently begun receiving increasing interest. Most attention has focused on improving the performance of I/O devices using fairly low level parallelism in techniques such as disk striping and interleaving. Widely applicable solutions, however, will require an integrated approach which addresses the problem at multiple system levels, including applications, systems software, and architecture. We propose that within the context of such an integrated approach, scheduling parallel I/O operations will become increasingly attractive and can potentially provide substantial performance benefits. We describe a simple I/O scheduling problem and present approximate algorithms for its solution. The costs of using these algorithms in terms of execution time, and the benefits in terms of reduced time to complete a batch of I/O operations, are compared with the situations in which no scheduling is used, and in which an optimal scheduling algorithm is used. The comparison is performed both theoretically and experimentally. We have found that, in exchange for a small execution time overhead, the approximate scheduling algorithms can provide substantial improvements in I/O completion times 相似文献

18.

Optimizing large join queries using a graph-based approach 总被引：4，自引：0，他引：4

Chiang Lee Chi-Sheng Shih Yaw-Huei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(2):298-315

Although many query tree optimization strategies have been proposed in the literature, there still is a lack of a formal and complete representation of all possible permutations of query operations (i.e., execution plans) in a uniform manner. A graph-theoretic approach presented in the paper provides a sound mathematical basis for representing a query and searching for an execution plan. In this graph model, a node represents an operation and a directed edge between two nodes indicates the older of executing these two operations in an execution plan. Each node is associated with a weight and so is an edge. The weight is an expression containing optimization required parameters, such as relation size, tuple size, join selectivity factors. All possible execution plans are representable in this graph and each spanning tree of the graph becomes an execution plan. It is a general model which can be used in the optimizer of a DBMS for internal query representation. On the basis of this model, we devise an algorithm that finds a near optimal execution plan using only polynomial time. The algorithm is compared with a few other popular optimization methods. Experiments show that the proposed algorithm is superior to the others under most circumstances 相似文献

19.

A coarse-grained reconfigurable computing architecture with loop self-pipelining

Yong Dou GuiMing Wu JinHui Xu XingMing Zhou 《中国科学F辑(英文版)》2009,52(4):575-587

Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also introduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of an RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors. Supported by the National Natural Science Foundation of China (Grant No. 60633050, 60621003) and the National High Technology Research and Development Program of China (Grant No. 2007AA01Z06) 相似文献

20.

Using parallelization and hardware concurrency to improve the performance of a genetic algorithm

Vijay Tirumalai Kenneth G. Ricks Keith A. Woodbury 《Concurrency and Computation》2007,19(4):443-462

Genetic algorithms (GAs) are powerful tools for solving many problems requiring the search of a solution space having both local and global optima. The main drawback for GAs is the long execution time normally required for convergence to a solution. This paper discusses three different techniques that can be applied to GAs to improve overall execution time. A serial software implementation of a GA designed to solve a task scheduling problem is used as the basis for this research. The execution time of this implementation is then improved by exploiting the natural parallelism present in the algorithm using a multiprocessor. Additional performance improvements are provided by implementing the original serial software GA in dedicated reconfigurable hardware using a pipelined architecture. Finally, an advanced hardware implementation is presented in which both pipelining and duplicated hardware modules are used to provide additional concurrency leading to further performance improvements. Copyright © 2006 John Wiley & Sons, Ltd. 相似文献