首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Data dependence concepts are reviewed, concentrating on and extending previous work on direction vectors. A bit vector representation of direction vectors is discussed. Various program restructuring transformations, such as loop circulation (a form of loop interchanging), reversal, skewing, sectioning (strip mining), combing, and rotation, are discussed in terms of their effects on the execution of the program, the required dependence tests for legality, and the effects of each transformation on the dependence graph. The bit vector representation of direction vectors is used to develop simple and efficient bit vector operations for the dependence tests and to generate the modified direction vector for each transformation. Finally, a simple method to interchange complex convex loop limits is given, which is useful when several loop restructuring operations are being applied in sequence.This work was supported by NSF Grant CCR-8906909 and DARPA Grant MDA972-88-J-1004.  相似文献   

2.
容红波  汤志忠 《软件学报》2001,12(4):544-555
软件流水是循环调度的重要方法.有分支循环的流水依然是个难题.现有算法可以分为4类:循环线性化、路径分离、整体调度和路径选择.它们都未能和谐地解决两个对立问题:转移时间最小化和最差约束问题.提出了基于路径分组和数据相关松弛的软件流水框架,试图无矛盾地解决上述问题.其主要思想是:(1)路径分组,即按照路径的执行概率和转移概率将路径分组,力求最小化转移时间;(2)数据相关松弛,力求避免最差约束,即当循环有多条路径时,有些相关在循环执行中并不一定有实例,理想的策略是仅当它有实例时才遵守.初步实验和定性分析表明,此  相似文献   

3.
计算机图形学中裁剪算法的效率会直接影响到图形计算、处理、显示的速度和用户体验.本文在Java环境下,围绕降低循环中冗余计算、重用对象、多线程控制等方面,对传统的矢量图裁剪算法进行改进.改进后的算法在执行效率和内存消耗上均优于传统的矢量图裁剪算法.该算法的应用可以有效的提高工程图形的生成效率.  相似文献   

4.
刘金硕  黄朔  邓娟 《计算机工程》2022,48(12):16-23
当使用高分辨率的图像作为图像处理算法的输入时会降低算法运行速度,将算法并行化可提升执行效率,但手动将串行程序转换为并行程序则较为繁琐,并且现有自动并行翻译工具性能不稳定,同时翻译后的程序是单一并行模式。面向基于面片的三维多视角立体视觉(PMVS)算法,提出一种从C到CUDA的自动两级并行翻译方法。使用ANTLR自动解析源C代码,通过分析数据依赖关系和循环数组私有化来识别可并行化的循环结构,将算法翻译成CPU多线程和GPU两级并行结构的代码。在算法执行过程中,将输入图像在CPU和GPU上分别进行处理,降低了算法总执行时间。实验结果表明,该方法的计算加速比随着输入图像分辨率的增加逐渐提高,最高约达到32,相比于PPCG和OpenACC自动并行翻译方法提升明显。  相似文献   

5.
6.
Supernode transformation is a technique to decrease the communication overhead by partitioning and scheduling a loop nest to a multi-processor system. This is achieved by grouping a number of iterations in a perfectly nested loop with regular dependences as a $supernode$ . Previous work has been focusing on finding the optimal supernode size and shape as well as an optimal execution schedule for multi-processor systems with unbounded resources. This paper emphasizes on the actual implementation strategies of supernode transformations on multi-core systems with limited resources. Using an example, the longest common subsequence (LCS) problem, we present and compare three different multithreading implementations. A formula for the total execution time of each method is presented. The techniques are benchmarked on a 12-core and a 4-core machine. On the 12-core machine our first technique, which yields increased data locality, speeds up the unaltered sequential loop nest 16.7 times. Combining this technique with skewing the loop by changing the linear schedule scores a 42.6 speedup. A more sophisticated method that executes entire rows of the loop nest in one thread scores a 59.5 speedup. Concepts presented and discussed in this paper on the LCS problem serve as basic foundation for implementations at regular dependence algorithms.  相似文献   

7.
迁移工作流是一种多执行主体并发协同工作的工作流范型,其自组织、自协调特性决定了迁移实例的执行状态容易受到外部环境和内部执行状态的影响而发生异常。对异常状态的检测以及发生故障后的恢复是保障其可靠性的关键。然而目前已有的监控处理机制不能有效支持对此问题的处理,因此,提出一种基于层次监控模型的迁移实例失败的协调恢复机制,实现对多迁移实例执行状态的监控以及执行失败后实现状态的一致性恢复。初步的实验表明,该机制能有效监控具有并发协同行为的执行主体,并保障恢复的一致性。  相似文献   

8.
The parallel processing system -Harray- for scientific computations is introduced. The special features of the -Harray- system described are (1) the Controlled Dataflow (CD flow) mechanism, (2) the preceding activation scheme with graph unfolding, and (3) the visual environment for dataflow program development.The CD flow mechanism, controlling the sequence of execution in two levels—dataflow execution in each processor and control flow execution between processors—is adapted in the -Harray- system. Though dataflow computers are expected to extract parallelism fully from a program, they have many problems, such as the difficulty of controlling the sequence of execution. To solve these problems, the CD flow mechanism is adopted.The preceding activation scheme makes it possible to bypass control dependencies in a program, such as IF-GOTO statements which decrease the parallelism in a program. The flow graph of a program is unfolded to decrease the control dependency and to increase the parallelism.The visual environment helps programmers in the writing and debugging of a dataflow program. The environment consists of a graphical editor of a dataflow graph, and a debugger.These special features of the -Harray- system and its execution mechanism are described.  相似文献   

9.
循环倾斜是程序优化中一种循环变换的手段,它改变空间迭代形式,将循环存在的跨迭代的并行用传统的并行标识出来,使得循环可以并行执行。但是循环倾斜后,并行执行的数据在内存中是离散的,而且每次迭代执行的次数是不一致的。为了更有效地利用SIMD,本文提出一种基于全局数据重组的循环倾斜优化方法。首先分析循环倾斜优化,针对数据离散的问题实现全局数据重组,改善数据局部性,循环易于向量化操作;针对迭代执行次数不一致问题,实现非满载向量操作,使尾循环得以向量执行。最后选择wavefront程序进行测试,优化后,程序计算可以获得平均10.73倍的加速效果。  相似文献   

10.
摘 要: 多维分类根据数据实例的特征向量将数据实例在多个维度上进行分类,具有广泛的应用前景。在多维分类算法的模型学习过程中,海量的训练数据使得准确的分类算法需要很长的模型训练时间。为了提高多维分类的执行效率,同时保持高的预测准确性,本文提出了一种基于贝叶斯网络的多维分类学习方法。首先,将多维分类问题描述为条件概率分布问题。其次,根据类别向量之间的依赖关系建立了条件树贝叶斯网络模型。最后,根据训练数据集对条件树贝叶斯网络模型的结构和参数进行学习,并提出了一种多维分类预测算法。大量的真实数据集实验表明,本文提出的方法与当前最好的多维分类算法MMOC相比,在保持高准确性的同时将模型的训练时间降低了两个数量级。因此,本文提出的方法更适用于海量数据的多维分类应用中。  相似文献   

11.
李斌  周清雷 《计算机应用研究》2013,30(11):3418-3420
针对软件水印鲁棒性差、水印分存算法执行效率低的问题, 提出了一种基于混沌优化的分存软件水印方案。该方案通过引入混沌系统, 将水印信息矩阵分割、混沌置乱, 形成分存水印; 水印嵌入时, 将分存水印一一编码为DPPCT拓扑图, 并将hash处理后的水印信息分别填充于各个DPPCT的info域; 水印嵌入后, 利用混沌加密, 保护全部代码, 防止逆向工程等手段对软件水印的破坏。理论分析和实验表明, 该方案可有效地抵抗各种语义保持变换攻击, 减少程序负载, 提高水印的鲁棒性及执行效率。  相似文献   

12.
张磊  陶彬贤  钱巨 《计算机科学》2013,40(1):139-143
指针的动态性使得程序分析中一个指针变量往往被认为有多个可能的指向目标,构成多个指向关系。现有的依赖图构建方法虽然较全面地考虑了指针的多指向性,但并未考虑指向关系之间的可组合性,因此精度上仍存在许多不足。为此,提出了一种利用无效指向组合优化依赖图构建的方法,新方法可以排除现有方法所不能识别的伪依赖,从而有效地提高依赖图的构建精度。  相似文献   

13.
将软件进行多线程改进,可以解决软件并行性问题,能够显著提升软件的运行效率。但如果软件改进的方法不当很容易造成系统不稳定。该丈简要介绍了线程与进程的特点与差异,对在Linux操作系统环境下软件多线程与多进程的执行效率进行了对比,分析了产生这种执行效率差异的原因以及多线程与多进程技术应用在软件各方面改进时的优劣,并提出了实施软件改进的策略与实现方法。  相似文献   

14.
工作流程中的结构冲突会导致工作流管理系统无法正常运行,因此需要在工作流程付诸实施之前验证工作流的正确性,检测出其中的冲突。目前,尚无一个完美的算法既能检测出无环工作流图的冲突,也能检测出带环工作流图中的结构冲突,为此提出了一种新的基于图搜索的工作流图验证算法,利用巧妙方法将流程图中的环转化为无环流程子图,高效地检测出了有环工作流图和无环工作流图的结构冲突。  相似文献   

15.
Data dependence and its application to parallel processing   总被引:8,自引:0,他引:8  
Data dependence testing is required to detect parallelism in programs. In this paper data dependence concepts and data dependence direction vectors are reviewed. Data dependence computation in parallel and vector constructs as well as serialdo loops is covered. Several transformations that require data dependence are given as examples, such as vectorization (translating serial code into vector code), concurrentization (translating serial code into concurrent code for multiprocessors), scalarization (translating vector or concurrent code into serial code for a scalar uniprocessor), loop interchanging and loop fusion. The details of data dependence testing including several data dependence decision algorithms are given.  相似文献   

16.
随着生产工艺的提高,芯片上能集成越来越多的晶体管,多线程技术也逐步成为一种主流的处理器体系结构技术.提出一种融合同时多线程技术和微线程技术的新型体系结构同时多微线程(simultaneous multi-microthreading,SMMT),并给出同时多微线程体系结构的实现方案.SMMT有效结合同时多线程技术硬件代价小和微线程技术能够加速单进程应用的优点,通过软硬件协同的方式充分挖掘单进程程序的微线程级并行性.通过在设计的龙芯2号同时多微线程处理器上进行性能评测,结果表明,同时多微线程体系结构能够有效地加速单进程的程序,以很小的硬件代价显著地提高了处理器的性能.  相似文献   

17.
This paper describes several loop transformation techniques for extracting parallelism from nested loop structures. Nested loops can then be scheduled to run in parallel so that execution time is minimized. One technique is called selective cycle shrinking, and the other is called true dependence cycle shrinking. It is shown how selective shrinking is related to linear scheduling of nested loops and how true dependence shrinking is related to conflict-free mappings of higher dimensional algorithms into lower dimensional processor arrays. Methods are proposed in this paper to find the selective and true dependence shrinkings with minimum total execution time by applying the techniques of finding optimal linear schedules and optimal and conflict-free mappings proposed by W. Shang and A.B. Fortes  相似文献   

18.
Current trend of research on multithreading processors is toward the chip multithreading (CMT), which exploits thread level parallelism (TLP) and improves performance of softwares built on traditional threading components, e.g., Pthread. There exist commercially available processors that support simultaneous multithreading (SMT) on multicore processors. But they are basically based on the conventional sequential execution model, and execute multiple threads in parallel under the control of OS that handles interruptions. Moreover, there exist few languages or programming techniques to utilize the multicore processors effectively. We are taking another approach to develop a multithreading processor, which is dedicated to TLP. Our processor, named Fuce, is based on the continuation-based multithreading. A thread is defined as a block of sequentially ordered instructions which are executed without interruption. Every thread execution is triggered only by the event called continuation. This paper first introduces the continuation-based multithread execution model and its processor architecture then gives multithreaded programming techniques and the continuation-based multithreading language system CML. Last, the performance of the Fuce processor is evaluated by means of the clock-level software simulation.  相似文献   

19.
The design of specialized processing array architectures, capable of executing any given arbitrary algorithm, is proposed. An approach is adopted in which the algorithm is first represented in the form of a dataflow graph and then mapped onto the specialized processor array. The processors in this array execute the operations included in the corresponding nodes (or subsets of nodes) of the dataflow graph, while regular interconnections of these elements serve as edges of the graph. To speed up the execution, the proposed array allows the generation of computation fronts and their cancellation at a later time, depending on the arriving data operands; thus it is called a data-driven array. The structure of the basic cell and its programming are examined. Some design details are presented for two selected blocks, the instruction memory and the flag array. A scheme for mapping a dataflow graph (program) onto a hexagonally connected array is described and analyzed. Two distinct performance measures-mapping efficiency and array utilization-and some performance results are discussed  相似文献   

20.
基于最佳并行度的任务依赖图调度   总被引:4,自引:0,他引:4  
杜建成  黄皓  陈道蓄  谢立 《软件学报》1999,10(10):1038-1046
基于最佳并行度的任务依赖图调度策略充分利用编译时刻所得到的全局信息,采用横向和纵向任务合并,处理节点预分配,静态调度和动态调度相结合、集中式调度和分层调度相结合等措施,是一种简单的、具有较高效率的实用化调度方案.该调度方案能够在尽量压缩调度长度的情况下节约系统资源.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号