
1.

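Liu Lei, Li Jing, Chen Li, Feng Xiaobing 《Computer Engineering (计算机工程)》2014,(3):99-102,112

Speculative parallelization is an important technique for parallelizing legacy sequential code, but earlier speculative parallelization runtime systems suffer from many performance problems, such as unbalanced task distribution, frequent communication, high conflict cost, and excessive overhead caused by frequent process startup and termination. To address this, a process-based speculative parallelization runtime system is proposed. It adopts an implicit single-program multiple-data (SPMD) model for partitioning and executing parallel tasks. By implementing a speculative task scheduling policy that reuses processes, together with a delegated correctness-checking technique, it reduces the overhead of starting and terminating speculative processes and of communication, and raises the utilization of speculative processes. In addition, a daemon process executes cooperatively with the speculative processes to ensure that the program still runs correctly when a speculative process encounters an exception. Experimental results show that this process-based speculative runtime system improves performance by 231% over comparable systems.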

2.

The Static Parallelization of Loops and Recursions

Lengauer Christian Gorlatch Sergei Herrmann Christoph 《The Journal of supercomputing》1997,11(4):333-353

We demonstrate approaches to the static parallelization of loops and recursions on the example of the polynomial product. Phrased as a loop nest, the polynomial product can be parallelized automatically by applying a space-time mapping technique based on linear algebra and linear programming. One can choose a parallel program that is optimal with respect to some objective function like the number of execution steps, processors, channels, etc. However, at best, linear execution time complexity can be attained. Through phrasing the polynomial product as a divide-and-conquer recursion, one can obtain a parallel program with sublinear execution time. In this case, the target program is not derived by an automatic search but given as a program skeleton, which can be deduced by a sequence of equational program transformations. We discuss the use of such skeletons, compare and assess the models in which loops and divide-and-conquer recursions are parallelized and comment on the performance properties of the resulting parallel implementations.
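The two formulations contrasted in this abstract can be sketched as follows. This is only an illustration, not the paper's space-time mapping or its program skeleton: the function names and the `cutoff` parameter are hypothetical, and the recursive version assumes both inputs have the same power-of-two length. The four subproducts in the recursion do not depend on one another, which is where a parallel implementation could run them concurrently.

```python
def poly_mul_loop(a, b):
    """Polynomial product as a loop nest (coefficients stored lowest degree first)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def poly_mul_dc(a, b, cutoff=1):
    """Polynomial product as a divide-and-conquer recursion.

    Assumption for this sketch: len(a) == len(b) and the length is a power of two.
    """
    n = len(a)
    if n <= cutoff:
        return poly_mul_loop(a, b)
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    # The four recursive calls are mutually independent, so they could be
    # evaluated in parallel; here they simply run one after another.
    ll = poly_mul_dc(a_lo, b_lo, cutoff)
    lh = poly_mul_dc(a_lo, b_hi, cutoff)
    hl = poly_mul_dc(a_hi, b_lo, cutoff)
    hh = poly_mul_dc(a_hi, b_hi, cutoff)
    # Combine: a*b = ll + (lh + hl) * x^h + hh * x^(2h)
    c = [0] * (2 * n - 1)
    for k, v in enumerate(ll):
        c[k] += v
    for k, v in enumerate(lh):
        c[k + h] += v
    for k, v in enumerate(hl):
        c[k + h] += v
    for k, v in enumerate(hh):
        c[k + 2 * h] += v
    return c
```

Both functions return the same coefficient list; only the structure that a parallelizer would exploit differs.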

3.

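Workload-Based Conditional Parallelization in Parallelizing Compilers

Hou Yongsheng, Zhao Rongcai, Zhang Ping, Han Feng 《Microcomputer Information (微计算机信息)》2005,21(4):220-221

A parallelizing compiler improves a program's run-time performance by uncovering the parallelism in serial code. However, when the ratio of parallelizable work to the number of parallel threads is small, parallel execution may actually reduce overall performance. Building on the SUIF infrastructure, this work studies a precise method for computing workload and implements a workload-based conditional parallelization technique, which effectively improves the execution performance of parallel programs.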
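The idea of conditional parallelization in this abstract can be sketched as a run-time guard: estimate the work available per thread and fall back to serial execution when it is too small to repay the parallel overhead. The threshold `MIN_WORK_PER_THREAD`, the cost model (trip count times a per-iteration cost), and the function names are all assumptions made for illustration; the paper's SUIF-based workload computation is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical threshold: minimum estimated work units each thread should
# receive before parallel execution is considered worthwhile.
MIN_WORK_PER_THREAD = 10_000


def should_parallelize(trip_count, cost_per_iteration, num_threads,
                       min_work_per_thread=MIN_WORK_PER_THREAD):
    """Return True when the estimated work per thread reaches the threshold."""
    work = trip_count * cost_per_iteration
    return num_threads > 1 and work / num_threads >= min_work_per_thread


def conditional_map(func, items, cost_per_iteration, num_threads=4):
    """Apply func to every item, in parallel only if the guard allows it."""
    items = list(items)
    if should_parallelize(len(items), cost_per_iteration, num_threads):
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            return list(pool.map(func, items))
    # Too little work per thread: run the loop serially.
    return [func(x) for x in items]
```

A compiler would emit both loop versions and the guard; the sketch makes the same decision at call time.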

4.

Nested Parallelization with OpenMP

Dieter an Mey Samuel Sarholz Christian Terboven 《International journal of parallel programming》2007,35(5):459-476

OpenMP is widely accepted as a de facto standard for shared memory parallel programming in Fortran, C and C++. Nested parallelization has been included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.

5.

Automatic Parallelization of Recursive Procedures

Manish Gupta Sayak Mukhopadhyay Navin Sinha 《International journal of parallel programming》2000,28(6):537-562

Parallelizing compilers have traditionally focussed mainly on parallelizing loops. This paper presents a new framework for automatically parallelizing recursive procedures that typically appear in divide-and-conquer algorithms. We present compile-time analysis, using powerful, symbolic array section analysis, to detect the independence of multiple recursive calls in a procedure. This allows exploitation of a scalable form of nested parallelism, where each parallel task can further spawn off parallel work in subsequent recursive calls. We describe a runtime system which efficiently supports this kind of nested parallelism without unnecessarily blocking tasks. We have implemented this framework in a parallelizing compiler, which is able to automatically parallelize programs like quicksort and mergesort, written in C. For cases where even the advanced compile-time analysis we describe is not able to prove the independence of procedure calls, we propose novel techniques for speculative runtime parallelization, which are more efficient and powerful in this context than analogous techniques proposed previously for speculatively parallelizing loops. Our experimental results on an IBM G30 SMP machine show good speedups obtained by following our approach.
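The kind of nested parallelism this abstract describes, where independent recursive calls each spawn further parallel work, can be sketched on mergesort. This is a hand-written illustration, not the paper's compiler or runtime system: the independence of the two halves is assumed by construction (they are disjoint slices) rather than proved by array section analysis, and `spawn_depth` is a hypothetical parameter that bounds how many levels create threads. In CPython the global interpreter lock prevents real speedup for this CPU-bound code, so the sketch shows the task structure only.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def parallel_mergesort(data, spawn_depth=2):
    """Sort data; the top spawn_depth recursion levels run their halves concurrently."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    if spawn_depth > 0:
        result = {}

        def sort_left():
            # The spawned task may itself spawn further tasks (nested parallelism).
            result["left"] = parallel_mergesort(data[:mid], spawn_depth - 1)

        worker = threading.Thread(target=sort_left)
        worker.start()
        right = parallel_mergesort(data[mid:], spawn_depth - 1)
        worker.join()
        left = result["left"]
    else:
        left = parallel_mergesort(data[:mid], 0)
        right = parallel_mergesort(data[mid:], 0)
    return merge(left, right)
```

Limiting the spawn depth keeps the thread count small (at most three extra threads for `spawn_depth=2`), which loosely mirrors the need for a runtime that avoids flooding the machine with tasks.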

6.

Run-Time Support for the Automatic Parallelization of Java Programs

Bryan Chan Tarek S. Abdelrahman 《The Journal of supercomputing》2004,28(1):91-117


7.

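Several Optimizations and a Parallel Implementation of the Smith-Waterman Algorithm

Zhou Cheng, Yu Songnian 《Computer Engineering and Applications (计算机工程与应用)》2003,39(23):89-91

The Smith-Waterman algorithm is one of the most widely used algorithms for sequence similarity comparison and is suited to finding locally similar sequence pairs. Because of its high accuracy, it remains in use today. Speeding up the Smith-Waterman algorithm and finding ways to optimize it is a problem on which researchers around the world are expending considerable effort. Starting from parallelization of the algorithm and taking full advantage of recent advances in high-performance computer systems, this paper proposes several optimization ideas for the Smith-Waterman algorithm and implements them on a cluster.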
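For orientation, the serial recurrence that such work starts from can be sketched as below. This computes only the best local alignment score with a linear gap penalty; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are illustrative assumptions, and none of the paper's optimizations or its cluster implementation are shown.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another; that property is a common starting point for parallel versions.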

8.

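A Discussion of Parallelizing 3D Scenes on Multi-core Processors

Wu Weixin, Lu Da 《Computer and Modernization (计算机与现代化)》2010,(3):15-18

Improving the running speed of 3D scenes has long been a major challenge for developers. With the emergence and spread of multi-core processors aimed at mainstream applications, it has become possible to improve a program's parallelism by using the multiple cores a processor provides without writing multithreaded code by hand. This paper introduces the virtual-reality development tool OpenGL and OpenMP, a parallel programming interface for shared-memory systems; analyzes the general process by which OpenGL renders a 3D scene; and, taking texture mapping as an example, examines how OpenMP can be used in OpenGL programs to increase parallelism.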

9.

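A Parallelization Refactoring Method for Matlab-Based Legacy Systems

Fan Fengfeng, Zhang Yanyuan, Lin Yi 《Computer and Modernization (计算机与现代化)》2012,(5):23-26

With the spread of multi-core CPU architectures, growing application complexity, and ballooning data sets, the serial code in Matlab-based legacy systems cannot fully exploit the potential performance of the system and cannot meet current demands for processing large data sets. Matlab's parallel computing model provides parallel support for data-intensive processing tasks. Starting from extension of the system architecture and parallelization of the business code, this paper analyzes the key points and methods of the parallelization refactoring process for legacy systems; experimental data from refactoring a case-study application demonstrate the performance gain of the refactored system when processing large data sets.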

10.

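Parallel Design of the Two-Dimensional DCT Based on TBB

Chen Rongxin, Yang Yuebin 《Microcomputer Applications (微计算机应用)》2011,32(11)

Threading Building Blocks (TBB) simplifies parallel design and supports efficient implementation of multi-core parallelism. This paper presents a parallelization method for the two-dimensional DCT targeting multi-core computing and implements it on the TBB platform. For the time-consuming cosine computation, it applies table lookup and block-wise computation as optimizations, and it discusses how to set the granularity. Experimental results in a multi-core environment show that the optimized parallelization method effectively improves execution performance, achieves good speedup, and is scalable.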
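The table-lookup idea mentioned in this abstract can be sketched as follows: the cosine values are computed once into a table and reused by every row and column transform. This is a rough stand-in written in Python rather than with TBB (a C++ library), so the thread pool merely imitates a parallel loop over rows and columns; the function names, the orthonormal DCT-II scaling, and the `workers` parameter are assumptions, and the paper's block-wise computation and granularity tuning are not shown.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_cos_table(n):
    """Precompute cos(pi * (2x + 1) * k / (2n)) for all k and x, so it is evaluated only once."""
    return [[math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
            for k in range(n)]


def dct_1d(vec, table):
    """Orthonormal 1D DCT-II of vec using the precomputed cosine table."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * c for v, c in zip(vec, table[k])))
    return out


def dct_2d(block, workers=4):
    """2D DCT of a square block: a 1D pass over rows, then a 1D pass over columns."""
    n = len(block)
    table = make_cos_table(n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Rows are independent of one another, and so are columns.
        rows = list(pool.map(lambda r: dct_1d(r, table), block))
        cols = list(pool.map(lambda c: dct_1d(c, table), zip(*rows)))
    # cols[j][k] holds coefficient (k, j); transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

For a constant 8x8 block of ones, only the DC coefficient is nonzero under this scaling, which gives a quick sanity check.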

11.

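Parallel Optimization of Video Encoding and Decoding Based on Parallel Studio

Yang Chuan, Yang Bin, Li Gang, Li Jie 《Microcomputer Applications (微计算机应用)》2010,31(3)

As computer technology continues to advance, expectations for the real-time performance of multimedia technology have risen, particularly for the time efficiency of video encoding and decoding. Moreover, as multi-core CPUs and related technologies become widespread, the performance shortcomings of existing non-parallel programs have become apparent, so parallelizing traditional programs is urgent. Taking the currently popular video codec algorithm H.263 as an example, this paper uses a concrete video conferencing system to analyze the performance of the traditional serial codec algorithm, uses the Intel Parallel Studio parallelization analysis tools to locate the algorithm's run-time bottleneck, and then applies Intel Threading Building Blocks to parallelize and optimize the codec algorithm, with good results.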

12.

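Parallelization of LU Decomposition on the Godson-Tvl Many-core Architecture

Long Guoping, Fan Dongrui 《Chinese Journal of Computers (计算机学报)》2009,32(11)

With advances in integrated circuit technology, many-core architectures have become a computing platform of growing interest. LU decomposition is one of the core algorithms widely used in scientific and engineering computing. Although a great deal of parallelization work exists for traditional parallel architectures, little work yet takes the characteristics of new many-core architectures into account. This paper systematically studies the parallelization of LU decomposition on a many-core architecture from three angles: load balancing, latency tolerance, and a performance analysis model. Its contributions are as follows. First, to address the difficulty of achieving good load balance with the two-dimensional wrap-around (卷帘) load distribution scheme, it proposes a new zigzag ("之"-shaped) distribution scheme; experiments show that performance improves by 20% over the former without any optimization and by 40% after optimization. Second, it proposes an analytical model of speedup and uses experiments to quantify the gap between measured and theoretical speedup, finding that when on-chip storage is used sensibly to optimize memory access latency and the matrix blocking parameters are chosen appropriately, the measured speedup can come fairly close to the theoretical value. The experiments also identify two main reasons why measured performance falls short of the theoretical prediction: limited memory bandwidth and contention for on-chip network resources.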

13.

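Training Instance Selection in Instance-Based Learning for Parallel Workload Distribution

Long Shun, Lin Yongting, Wang Huijin 《Journal of Computer Research and Development (计算机研究与发展)》2008,45(Z1):228-232

An adaptive parallel task workload distribution algorithm based on instance-based learning can estimate an application's computational load from its static features and select a good workload distribution scheme so that its multithreaded parallel execution approaches or even reaches the optimum; it is characterized by low cost and high efficiency. Through a series of experiments, this paper analyzes how the choice of training instances affects the results of instance-based-learning optimization and summarizes some useful lessons, with a view to further improving the algorithm's performance.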

14.

Unique Sets Oriented Parallelization of Loops with Non-uniform Dependences

Ju J.; Chaudhary V. 《Computer Journal》1997,40(6):322-339


15.

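Parallelization and Performance Optimization of the Ray Tracing Program PBRT

Fu Xiong, Wang Ruchuan 《Microcomputer Development (微机发展)》2008,(10):5-8

With the emergence and rapid development of multi-core processors, parallelizing classic serial programs so that they make better use of multi-core architectures and run faster has become a notable problem in multi-core application research. Taking the parallelization of the ray tracing program PBRT as an example, this paper studies in depth the design and implementation of the parallel model, correctness verification, and post-parallelization performance optimization in the parallelization of serial programs. The optimized parallel PBRT achieves a speedup of nearly 3.5 with 4 threads, demonstrating that the proposed parallelization and performance optimization are effective.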

16.

Transparent Speculative Parallelization of Discrete Event Simulation Applications Using Global Variables

Alessandro Pellegrini Sebastiano Peluso Francesco Quaglia Roberto Vitali 《International journal of parallel programming》2016,44(6):1200-1247

Parallelizing (compute-intensive) discrete event simulation (DES) applications is a classical approach for speeding up their execution and for making very large/complex simulation models tractable. This has been historically achieved via parallel DES (PDES) techniques, which are based on partitioning the simulation model into distinct simulation objects (somehow resembling objects in classical object-oriented programming), whose states are disjoint, which are executed concurrently and rely on explicit event-exchange (or event-scheduling) primitives as the means to support mutual dependencies and notification of their state updates. With this approach, the application developer is necessarily forced to reason about state separation across the objects, and thus is not allowed to rely on shared information, such as global variables, within the application code. This implicitly leads to the shift of the user-exposed programming model to one where sequential-style global variable accesses within the application code are not allowed. In this article we remove this limitation by providing support for managing global variables in the context of DES code developed in ANSI-C, which gets automatically parallelized. Particularly, we focus on speculative (also termed optimistic) PDES systems that run on top of multi-core machines, where simulation objects can concurrently process their events with no guarantee of causal consistency and actual violations of causality rules are recovered through rollback/recovery schemes. In compliance with the nature of speculative processing, in our proposal global variables are transparently mapped to multi-versions, so as to avoid any form of safety predicate verification upon their updates. Consistency is ensured via the introduction of a new rollback/recovery scheme based on detecting global variables’ reads on non-correct versions. At the same time, efficiency in the execution is guaranteed by managing multi-version variables’ lists via non-blocking algorithms. Furthermore, the whole approach is fully transparent, as it is based on automated instrumentation of the application software (particularly ELF objects). Hence the programmer is exposed to the classical (and easy to code) sequential-style programming scheme while accessing any global variable. An experimental assessment of our proposal, based on a suite of case study applications, run on top of an off-the-shelf Linux machine equipped with 32 CPU-cores and 64 GB of RAM, is also presented.

17.

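An Optimized Three-Sequence Alignment Algorithm and Its Parallel Implementation

Wang Tao, Yu Songnian, Yan He 《Computer Engineering and Applications (计算机工程与应用)》2005,41(11):62-65,131

Sequence alignment algorithms are applied in many different fields. An important current application is the alignment of macromolecules, for example DNA and protein sequences. In many cases it is necessary to align three sequences. David R. Powell proposed an optimized three-sequence alignment algorithm that uses linear gap penalties. The algorithm was first proposed by Ukkonen and is based on two-sequence alignment with simple scoring. This paper improves it by introducing a "checkpoint method" and, taking full advantage of recent advances in high-performance computing, parallelizes the algorithm and implements it on a cluster.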

18.

Generation of Efficient Nested Loops from Polyhedra

Fabien Quilleré Sanjay Rajopadhye Doran Wilde 《International journal of parallel programming》2000,28(5):469-498

Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a trade-off between code size and control code simplification/optimization. Previous methods of doing code generation are based on loop splitting; however, they have nonoptimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the tradeoff between code size and control overhead.

19.

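Gamma Description and Parallelization of the Eulerian Path Algorithm for Biological Sequence Assembly

Liao Wenzhao, Tong Weiqin, Cai Lizhi 《Mini-Micro Systems (小型微型计算机系统)》2004,25(4):707-711

Sequence assembly is an important step in genome sequencing and an important research topic in bioinformatics. Reference [2] applied the Eulerian path method to sequence assembly, which largely resolves the repeat problem found in traditional sequence assembly software and thereby improves assembly accuracy; however, only serial implementations of the method have been studied so far, and assembly speed is unsatisfactory. In this paper, we use the parallel Gamma model to formally describe the Eulerian method for sequence assembly and present a parallel implementation scheme for the Gamma program.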

20.

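Automatic Parallelization Techniques for CFD Explicit Difference Programs

Zhang Jiangtao, Wang Zhenghua, Che Yonggang 《Computer Engineering (计算机工程)》2002,28(7):102-103,247

This paper studies automatic parallelization techniques for explicit CFD computation programs. From an application perspective, it makes full use of the characteristics of explicit difference schemes and mimics the manual parallelization process, concentrating on the core techniques of automatic parallelization: domain partitioning, dependence analysis, and synchronous communication and its optimization.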