期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Finding Optimal Ordering of Sparse Matrices for Column-Oriented Parallel Cholesky Factorization

Lin Wen-Yang 《The Journal of supercomputing》2003,24(3):259-277

In this paper, we consider the problem of finding fill-preserving sparse matrix orderings for parallel factorization. That is, given a large sparse symmetric and positive definite matrix A that has been ordered by some fill-reducing ordering, we want to determine a reordering that is appropriate in terms of preserving the sparsity and minimizing the cost to perform the Cholesky factorization in parallel. Past researches on this problem all are based on the elimination tree model, in which each node represents the task for factoring a column, and thus, can be seen as a coarse-grained task dependence model. To exploit more parallelism, Joseph Liu proposed a medium-grained task model, called the column task graph, and showed that it is amenable to the shared-memory supercomputers. Based on the column task graph, we devise a greedy reordering algorithm, and show that our algorithm can find the optimal ordering among the class of all fill-preserving orderings of the given sparse matrix A. 相似文献

2.

Scheduling loops with partial loop-carried dependencies

Frdric Brgier Marie-Christine Counilh Jean Roman 《Parallel Computing》2000,26(13-14)

This paper deals with task scheduling, where each task is one particular iteration of a DO loop with partial loop-carried dependencies. Independent iterations of such loops can be scheduled in an order different from the one of classical serial execution, so as to increase program performance.The approach that we present is based both on the use of a directive added to the High Performance Fortran (HPF2) language, which specifies the dependencies between iterations, and on inspector/executor support, implemented in the CoLUMBO library, which builds the task graph and schedules tasks associated with iterations. We validate our approach by showing results achieved on an IBM SP2 for a sparse Cholesky factorization algorithm applied to real problems. 相似文献

3.

Supernodal sparse Cholesky factorization on graphics processing units

Dan Zou Yong Dou Song Guo Rongchun Li Lin Deng 《Concurrency and Computation》2014,26(16):2713-2726

Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and the concerns of low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU system. These approaches increase GPU utilization, thus resulting in substantial computational time reduction. Comparisons are made with the existing parallel solvers by using problems arising from practical applications. The experiment results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on supernodal method for CPU‐GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

4.

同构计算环境中一种快速有效的静态任务调度算法 总被引：9，自引：1，他引：9

李庆华韩建军 Abbas A.Essa 《计算机研究与发展》2005,42(1):118-125

快速有效的调度任务是多处理器计算环境中的一个关键问题．目前任务调度算法中刻画任务依赖关系最流行的模型是DAG,在以前的文献中,提出了一种新的更实际、更普遍的TTIG模型及其相应的MATE算法(基于同构计算环境)．延伸了TTIG模型,并提出基于同构系统的新的算法及两种启发式方法(GBHA1和GBHA2)．GBHA以组的形式尽量消除图中回路,因而能获得任务图的全局信息,具有更好的调度性能．在模拟实验中,将此算法与MATE和其他同构环境中基于DAG的有效调度算法,在不同测试条件下进行了比较,结果显示GBHA在性能上明显优于MATE,与基于DAG模型的调度算法比较而言,在性能方面各有千秋,但在算法时间复杂度方面具有显著的优势．相似文献

5.

基于任务-资源分配图优化选取的网格依赖任务调度 总被引：3，自引：0，他引：3

陈廷伟张斌郝宪文《计算机研究与发展》2007,44(10):1741-1750

任务调度是网格应用系统获得高性能的关键.网格计算中一个大型的应用程序往往被分解为具有依赖关系的多个任务.在资源个体差异较大、广域互连的网格环境下任务间的依赖关系对传统的调度策略提出了新的挑战.任务调度的主要工作是为任务分配资源以及确定任务的执行次序,将依赖任务的可能的资源分配方案表示为任务-资源分配图（T-RAG）,在该图的基础上提出了基于T-RAG优化选取的依赖任务调度模型,将依赖任务调度问题转化为图的优化选取问题,解析最优任务-资源分配图可以同时确定资源分配方案和任务的执行次序即为最优调度方案.最后,实现了基于该模型的任务调度算法,该算法与ILHA算法的对比分析表明,在资源差异较大及任务间存在大量数据传输的情况下所提出的算法更优. 相似文献

6.

云中多媒体应用中基于混合DAG的最优任务调度研究

郭雅琼宋建新《计算机科学》2015,42(Z11):413-416

云计算的平台优势使得它在多媒体应用中得到广泛使用。由于多媒体服务的多样性和异构性,如何将多媒体任务有效地调度至虚拟机进行处理成为当前多媒体应用的研究重点。对此,研究了云中多媒体最优任务调度问题,首先引入有向无环图来模拟任务中的优先级及任务之间的依赖性,分别对串行、并行、混合结构任务调度模型进行任务调度研究,根据有限资源成本将关键路径中任务节点融合,提出一种实用的启发式近似最优调度方法。实验结果表明,所提调度方法能够以最短的执行时间在有限的资源成本下完成最优的任务分配。相似文献

7.

智能建筑室内环境分布式可计算WSN任务调度研究

高治军王洪玉王鑫韩忠华《传感技术学报》2014,27(3):378-382

针对智能建筑室内环境下并行计算的动态任务调度问题,构建了基于分布式CPS思想的无线传感器网络(WSN)模型,并分别设计了基于可计算复杂性的任务分配策略和基于动态调度算法的任务调度策略。通过先将任务分配成若干个子任务,采用多带图灵机输入任务,由合适的计算节点进行计算,形成有向无环图,再按调度优先级排列任务,形成任务调度序列表,依序处理任务,从而达到了将任务分配、调度和执行相结合的目的。实验结果表明该策略可有效减少智能建筑室内环境分布式可计算WSN分布运行时任务之间的通讯时间和等待时间,同时提高了任务调度的成功率,最终优化系统的运行效率。相似文献

8.

A scalable approach to solving dense linear algebra problems on hybrid CPU‐GPU systems

Fengguang Song Jack Dongarra 《Concurrency and Computation》2015,27(14):3702-3723

Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU‐GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double‐precision Cholesky factorization and QR factorization. Our approach demonstrates a performance comparable to Intel MKL on shared‐memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared‐memory systems with multiple GPUs. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

9.

基于资源预测的网格任务调度模型 总被引：1，自引：0，他引：1

程宏兵《计算机应用》2010,30(9):2530-2534

跨越虚拟组织中多个域(或集群)的网格任务调度由于资源的不确定性（如动态性和异构性）而成为网格应用中亟待解决的问题。提出了一种有效的基于资源预测的网格任务调度模型——RPTS,该模型利用加权最小二乘方法进行参数估计的自回归滑动平均(ARMA)预测方法对网格环境下的主机负载进行预测。利用上述资源预测结果和一类数据并行性网格任务的建模结果,对它们进行预处理、匹配并调度执行。RPTS充分考虑了网格环境下资源的动态性和异构性,为解决网格环境下任务调度问题提供了一种较好的方法。与其他一些网格任务调度方法进行了一系列的仿真实验,结果表明RPTS模型具有任务执行时间最短和稳定性较好的特点。相似文献

10.

一种基于两级DAG模型的MapReduce工作流异构调度算法

王宇新王飞王冠郭禾《计算机工程与科学》2019,41(8):1353-1359

MapReduce编程模型被广泛应用于大数据处理平台,而一个有效的任务调度算法对模型的运行效率至关重要。将MapReduce工作流的Map和Reduce阶段分别拆解为若干个有先后序限定关系的作业,每个作业再拆解为多个任务。之后基于计算集群的可用资源和任务异构性,构建面向作业和任务的2级有向无环图(DAG)模型,同时提出基于2级优先级排序的异构调度算法2-MRHS。算法的第1阶段进行优先级排序,即对作业和任务分别进行优先权值计算,再汇总得到任务的调度队列;第2阶段进行任务分配,即基于最快完成时间将每个任务所包含的数据块子任务分配给最适合的计算结点。采用大批量随机生成的DAG模型进行实验,结果表明与其他相关算法相比,本文算法有更短的调度长度(makespan)且更加稳定。相似文献

11.

网格原子化作业的二分图调度方法

李建勋郭建华李维乾曹茂生《计算机科学》2015,42(3):233-236, 251

对于网格系统中计算力调度等问题,结合有向无环作业图DATG和无向节点图UNG,采用并行集APS建立了一种基于二分图的网格调度算法BGS,并在惩罚策略、负载均衡、复活机制的引导下,使系统的调度动态地逐步趋向优化.实验结果表明:该算法能够更加适应网格资源的变化,降低作业负载,提高作业的并行化程度,并能根据系统负载合理地利用节点资源. 相似文献

12.

Contention-aware scheduling with task duplication 总被引：1，自引：0，他引：1

Oliver Sinnen^{Author Vitae} Andrea ToAuthor VitaeManpreet KaurAuthor Vitae 《Journal of Parallel and Distributed Computing》2011,71(1):77-86

Finding an efficient schedule for a task graph on several processors is a trade-off between maximising concurrency and minimising interprocessor communication. Task duplication is a technique that has been employed to reduce or avoid interprocessor communication. Certain tasks are duplicated on several processors to produce the data locally and avoid the communication among processors. Most of the algorithms using task duplication have been proposed for the classic scheduling model, which allows concurrent communication and ignores contention for communication resources. It is increasingly recognised that this classic model is unrealistic and does not permit creating accurate and efficient schedules. The recently proposed contention model introduces contention awareness into task scheduling by assigning the edges of the task graph to the links of the communication network. It is intuitive that scheduling under such a model benefits even more from task duplication, yet no such algorithm has been proposed as it is not trivial to duplicate tasks under the contention model. This paper proposes a contention-aware task duplication scheduling algorithm. We investigate the fundamentals for task duplication in the contention model and propose an algorithm that is based on state-of-the-art techniques found in task duplication and contention-aware algorithms. An extensive experimental evaluation demonstrates the significant improvements to the speedup of the produced schedules. 相似文献

13.

Highly scalable parallel algorithms for sparse matrix factorization

Gupta A. Karypis G. Kumar V. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):502-520

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Gray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems-both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Gray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer 相似文献

14.

一种满足可靠性和能效的云工作流调度方法

殷越隋丽娜《计算机应用与软件》2021,38(1):13-20,62

为了同步解决云工作流调度时的失效和高能耗问题,提出一种基于可靠性和能效的工作流调度算法.算法为了在截止时间的QoS约束下最大化系统可靠性并最小化调度能耗,将工作流调度过程划分为四个阶段:计算任务优先级、工作流任务聚簇、截止时间子分配和任务调度.算法在满足执行次序的情况下对任务进行拓扑排序,并以通信代价最小为目标对任务进... 相似文献

15.

基于图染色理论和遗传蜂群算法的并行测试任务调度

吴勇王雪赵焕义《计算机应用》2015,35(5):1280-1283

针对并行测试中任务优化调度这一关键性问题,提出了一种图染色理论和遗传蜂群算法相结合的任务调度优化算法.首先,建立了基于图染色理论的并行测试任务关系模型,用图来描述测试任务占用仪器资源的情况;然后, 在测试任务关系模型的基础上,将遗传算法特有的交叉、变异操作与人工蜂群(ABC)算法相结合搜索最优解,能够有效避免算法早熟并且加速算法收敛;最终得到并行度最大的任务分组方案.经仿真验证,所提方法能有效地实现并行测试,提高自动测试系统的测试效率. 相似文献

16.

混合蚁群优化算法求解卫星数传调度问题

孙兵陈祥国《计算机应用研究》2012,29(11):4064-4068

为了求解卫星数传调度问题,提出了混合蚁群优化算法。算法设计了基于任务数传操作的解构造图,提出了基于解构造图的任务调度序列和资源分配序列概率决策模型,采用基于随机加权的混合策略综合利用问题的启发式信息。算法通过基于混沌变异的列信息素向量更新策略增强解构造的多样性,通过具有补偿机制的全局信息素更新策略来保证算法的收敛性。利用STK工具设计了五个调度场景,并利用计算机生成各场景的数传任务。仿真实验结果表明,该算法是可行、有效的,收敛性和解多样性较好。相似文献

17.

A two-phase scheduling method with the consideration of task clustering for earth observing satellites

Guohua Wu Jin LiuManhao Ma Dishan Qiu 《Computers & Operations Research》2013

Satellite observation scheduling plays a significant role in improving the efficiency of satellite observation systems. Although extensive scheduling algorithms have been proposed for the satellite observation scheduling problem (SOSP), the task clustering strategy has not been taken into account up to now. This paper presents a novel two-phase based scheduling method with the consideration of task clustering for solving SOSP. This method comprises two phases: a task clustering phase and a task scheduling phase. In the task clustering phase, we construct a task clustering graph model and use an improved minimum clique partition algorithm to obtain cluster-tasks. In the task scheduling phase, based on overall tasks and obtained cluster-tasks, we construct an acyclic directed graph model and utilize a hybrid ant colony optimization coming with a mechanism of local search, called ACO–LS, to produce optimal or near optimal schedules. Extensive experimental simulations demonstrate the efficiency of the proposed scheduling method. 相似文献

18.

嵌入式并行系统中基于任务优化的调度算法

下载免费PDF全文

刘勇尹增山杨根庆《计算机工程》2008,34(2):11-13

在嵌入式并行计算系统中,任务调度是决定系统性能的关键。多任务调度中,启发式调度法是一种设计简单且性能良好的调度方法。目前的调度算法大多是基于任务复制的,没有充分考虑前驱任务与其后继任务间的相关性。该文提出了一种基于相关任务优化(DTO)的调度算法,通过分析已用处理机的负载和空闲时间,尽量减少系统的调度长度和处理机数目。算法分析结果表明,DTO算法在性能上优于其他算法,对嵌入式并行计算系统中的多任务调度是一个较好的选择。相似文献

19.

异构计算环境下基于优先队列划分的调度算法

王兰张龙信满君丰周立前李肯立《小型微型计算机系统》2020,(2):303-309

人工智能的飞速发展对高性能计算提出了更高的要求,异构计算环境下任务调度问题一直是高性能计算中的关键问题.本文提出一种基于优先队列划分的调度算法(PQDSA),该算法根据DAG(有向无循环图)任务集的入口节点数量确定优先队列数,通过任务的通信开销和计算开销划分任务队列,进而将关键节点任务分配给合适的队列,以产生效果较佳的任务调度队列,从而提高任务间的并行性,降低任务集的完工时间.与此同时,进一步基于插入策略将任务调度到处理器上,使任务调度更加高效地执行.PQDSA算法可以减少任务间的时间消耗,提高处理器的调度效率.通过与两个经典算法的性能对比,实验结果表明本文提出的PQDSA算法在任务完工时间和调度效率方面都要明显优于对比的算法. 相似文献

20.

优化并行图重写计算粒度的编译时部分调度策略

田新民王鼎兴《计算机学报》1992,15(11):838-847

并行图重写计算的有效实现需要压缩重写任务的频繁生成、切换和同步开销.为此本文提出了一种编译时重写粒度优化技术——编译时部分调度.其核心思想是基于对重写结点的全序性质和执行语义的分析,编译时构造保持原有执行语义的粗粒度顺序重写体。在本文定义的形式框架下,我们建立了编译时部分调度的安全条件,并给出了严格的证明.实验研究结果表明编译时部分调度能有效地增大重写粒度,重写任务数压缩了30—60％,并且计算的安全性得到了保证. 相似文献