Similar literature
19 similar articles found.
1.
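A reservation-based backfilling algorithm with largest-area-first scheduling
The traditional backfilling algorithm builds on first-come-first-served (FCFS) scheduling and backfills small jobs onto idle CPUs to raise CPU utilization. It favors small jobs, and large jobs can starve from waiting too long. When the number of idle CPUs cannot satisfy the requirements for backfilling a small job, some CPUs still sit idle, which limits further gains in CPU utilization. The algorithm proposed in this paper uses the two-dimensional area formed by a job's required CPU count and its estimated runtime as the scheduling priority criterion. It introduces two-level priorities and a reservation algorithm to eliminate starvation of large jobs, and it reduces the CPU count of backfilled jobs while correspondingly increasing their estimated runtime, to further improve CPU utilization. Experiments show that, compared with traditional backfilling, the algorithm improves user fairness, shortens the average job response time, and raises CPU utilization.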
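As a rough illustration of the area-based priority described in this abstract, the sketch below ranks queued jobs by the product of requested CPUs and estimated runtime, largest first. The `Job` fields and the `order_by_area` helper are hypothetical names introduced here; the paper's two-level priorities, reservation step, and reshaping of backfilled jobs are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    cpus: int           # number of CPUs the job requests
    est_runtime: float  # user-estimated runtime, in arbitrary time units


def area(job: Job) -> float:
    """Two-dimensional 'area' of a job: requested CPUs times estimated runtime."""
    return job.cpus * job.est_runtime


def order_by_area(queue: list[Job]) -> list[Job]:
    """Return the queue sorted by area, largest first.

    Python's sort is stable, so jobs with equal area keep their arrival order.
    """
    return sorted(queue, key=area, reverse=True)


queue = [Job("a", 2, 10.0), Job("b", 8, 5.0), Job("c", 4, 5.0)]
ranked = order_by_area(queue)
```

Here job `b` (area 40) is ranked ahead of `a` and `c` (area 20 each), which stay in arrival order.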

2.
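A parallel job scheduling system manages the job queue of a high-performance computing system. Its core function is to select the next job to run whenever a scheduling event occurs. The simplest scheduling algorithm is first-come-first-served (FCFS), but its drawback is very low resource utilization. A common remedy is EASY Backfilling. However, EASY has two shortcomings of its own: it requires users to estimate job runtimes, and it favors small jobs. To address both, this paper designs a new scheduling method, Priority-based Preemptive Scheduling (PPS), implements simulation systems for both algorithms, and compares PPS with EASY in terms of performance and fairness; the results demonstrate the effectiveness of PPS.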

3.
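方程, 王凤儒. 《计算机应用》 (Journal of Computer Applications), 2005, 25(B12): 349-353
This paper discusses the parallel scheduling of multiple groups of jobs in a distributed system, and proposes scheduling efficiency, a metric describing the rate at which jobs progress, together with a new parallel scheduling algorithm (BCPSA). Using scheduling efficiency as the basis for scheduling decisions, the algorithm pursues balanced progress across the job groups in order to use processor time effectively. A static compression algorithm is also applied to further compress the schedule length and raise processor utilization. Experiments show that the algorithm achieves a shorter schedule length and higher processor utilization.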

4.
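张硕, 何发智, 周毅, 鄢小虎. 《计算机应用》 (Journal of Computer Applications), 2016, 36(12): 3274-3279
This paper studies improvements to the parallel particle swarm optimization (PSO) algorithm on graphics processing units (GPUs) based on the Compute Unified Device Architecture (CUDA). Given the characteristics of the CUDA hardware architecture, Blocks are executed serially, and the Warp is the basic unit that a streaming multiprocessor (SM) schedules and executes. To make full use of the parallelism of the threads within a Block, a GPU-parallel PSO algorithm based on adaptive Warps is proposed: each dimension of a particle is mapped to a thread; using the GPU's Warp-level parallelism, each particle is adaptively mapped to one or more Warps depending on its dimensionality; and one or more particles are adaptively mapped to each Block. The method is compared with an existing coarse-grained parallel method (one particle per thread) and a fine-grained parallel method (one particle per Block). Experimental results show that, relative to these two methods, the proposed method improves the speedup over the CPU by up to 40.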

5.
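Traditional parallel Join algorithms lack the necessary fault tolerance, and uneven data partitioning often makes a single blocked thread the bottleneck of the entire task. To address these problems, this paper analyzes how each phase of an in-memory join affects Join performance and proposes an approach that exploits the dynamic mechanisms of MapReduce, avoiding the uneven distribution of data and tasks and the fault-tolerance problems of traditional parallel join algorithms. The algorithm uses the MapReduce programming framework and encapsulates block tags to reduce the computational overhead of tagging and sorting during MapReduce Join execution, which markedly improves performance. Experimental results show that, on a shared-memory architecture, the algorithm performs significantly better than existing algorithms.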

6.
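To improve the processing performance of the XQuery language, a scheduling algorithm suited to shared-memory multithreaded environments is proposed for the task scheduling problem in parallel XQuery implementations. Guided by a new scheduling strategy, it exploits the task parallelism, data parallelism, and pipeline parallelism present in XQuery to improve the efficiency of parallel program execution. For pipelined parallel execution, a pipeline local-parallelism automaton model is established, which raises system resource utilization by using the idle waiting time between pipeline stages. Experiments verify the feasibility and effectiveness of the algorithm.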

7.
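Improving the efficiency of heterogeneous HPL (high-performance Linpack) requires fully exploiting the computing power of both the accelerators and the general-purpose CPUs. The accelerators integrate more compute cores and carry out most of the computation, while the general-purpose CPUs handle task scheduling and also take part in the computation. Given a sensible task partition and balanced load, optimizing CPU-side computing performance is especially important for overall efficiency. Optimizing BLAS (basic linear algebra subprograms) functions for the architectural characteristics of a specific platform often makes fuller use of the general-purpose CPUs and improves overall system efficiency. The BLIS (BLAS-like library instantiation software) library is an open-source BLAS framework that is easy to develop with, easy to port, and modular. Based on the architecture of the heterogeneous platform and the characteristics of the HPL algorithm, the BLAS functions at every level called on the CPU side are optimized using the three-level cache, vector instructions, and multithreaded parallelism, and auto-tuning is applied to optimize the matrix blocking parameters, resulting in the HygonBLIS library. Compared with MKL, overall HPL performance in the heterogeneous environment improves by 11.8%.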

8.
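Most current parallel Delaunay mesh generation algorithms make insufficient use of shared-memory structures and cannot exploit the multi-level architecture of supercomputers. To address this, a parallel Delaunay mesh generation algorithm based on the algorithm-parallel pattern is proposed that makes full use of shared memory. Point insertion is parallelized by efficiently partitioning the candidate point set, which enlarges the set of points that can be inserted in parallel after each selection. The algorithm is implemented with the OpenMP parallel model and compared with the serial open-source software Triangle. Experimental results show that the algorithm can partition the candidate point set into mutually non-conflicting subsets for parallel processing, achieving good parallel efficiency while preserving mesh quality.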

9.
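李荣胜, 赵文峰, 徐惠民. 《计算机应用》 (Journal of Computer Applications), 2010, 30(10): 2771-2773
In commercial grid and cloud computing environments, jobs have attributes such as arrival time, computation amount, budget, and deadline, and distinguishing the importance and urgency of jobs is one of the key problems for a scheduling system. Existing job priorities consider only one or some of these attributes. Taking all four attributes into account, this paper defines a job priority based on value density and relative deadline, proposes a grid job scheduling algorithm based on value density and relative deadline, and combines it with the backfilling algorithm (EASY backfilling) to improve resource utilization. Simulation results show that the priority based on value density and relative deadline reflects the importance and urgency of jobs well, and that backfilling markedly improves resource utilization for some priority policies but has little effect for others.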

10.
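During the design phase of a spacecraft model, large numbers of simulation analyses must be run on high-performance computing systems; expensive license resources are in extremely short supply, and job computing efficiency is low. This work addresses the shortcoming that existing job dispatch mechanisms in high-performance computing systems do not dynamically take into account the idle state of high-performance compute hosts. Based on the resource scheduling software Platform LSF and the characteristics of spacecraft simulation analysis, it proposes a new approach and designs and implements a new secondary scheduling algorithm driven by the CPU factor, which is used to distinguish the relative running speeds of different machines. Simulation results show that the algorithm effectively improves job computing efficiency and shortens the time that license resources are occupied. A practical case shows that the algorithm has potential for wider adoption: it raises license utilization to some extent and meets the practical need for cost control and lean resource use in spacecraft simulation analysis.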

11.
The utilization of parallel computers depends on how jobs are packed together: if the jobs are not packed tightly, resources are lost due to fragmentation. The problem is that the goal of high utilization may conflict with goals of fairness or even progress for all jobs. The common solution is to use backfilling, which combines a reservation for the first job in the interest of progress with packing of later jobs to fill in holes and increase utilization. However, backfilling considers the queued jobs one at a time, and thus might miss better packing opportunities. We propose the use of dynamic programming to find the best packing possible given the current composition of the queue, thus maximizing the utilization on every scheduling step. Simulations of this algorithm, called lookahead optimizing scheduler (LOS), using trace files from several IBM SP parallel systems, show that LOS indeed improves utilization, and thereby reduces the mean response time and mean slowdown of all jobs. Moreover, it is actually possible to limit the lookahead depth to about 50 jobs and still achieve essentially the same results. Finally, we experimented with selecting among alternative sets of jobs that achieve the same utilization. Surprising results indicate that choosing the set at the head of the queue does not necessarily guarantee best performance. Instead, repeatedly selecting the set with the maximal overall expected slowdown boosts performance when compared to all other alternatives checked.
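To make the packing idea concrete, here is a minimal sketch, assuming the problem is reduced to choosing a subset of queued jobs whose processor counts fill as many free processors as possible right now. This is a plain subset-sum dynamic program, not the LOS algorithm itself: it ignores the reservation for the first job, runtimes, and the lookahead limit. The function name `best_packing` and its arguments are hypothetical.

```python
def best_packing(sizes: list[int], free: int) -> tuple[int, list[int]]:
    """Pick queued jobs (by index) that use as many of `free` processors as possible.

    sizes[i] is the processor count requested by the i-th queued job.
    Returns (processors used, indices of the chosen jobs).
    """
    # best[c] holds one set of job indices whose sizes sum to exactly c,
    # or None if no such set has been found yet.
    best: list[list[int] | None] = [None] * (free + 1)
    best[0] = []
    for i, size in enumerate(sizes):
        # Walk capacities downward so that each job is used at most once.
        for c in range(free, size - 1, -1):
            if best[c] is None and best[c - size] is not None:
                best[c] = best[c - size] + [i]
    used = max(c for c in range(free + 1) if best[c] is not None)
    return used, best[used]


# Considering jobs one at a time in queue order would start the 6-processor
# job and then find that neither 5-processor job fits in the remaining 4.
used, chosen = best_packing([6, 5, 5], free=10)
```

In this example the dynamic program selects the two 5-processor jobs and fills all 10 processors, where a one-at-a-time pass in queue order would use only 6.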

12.
Bayesian inference is one of the most important methods for estimating phylogenetic trees in bioinformatics. Due to the potentially huge computational requirements, several parallel algorithms for Bayesian inference have been implemented to run on CPU-based clusters, multicore CPUs, or small clusters of CPUs and GPUs. To the best of our knowledge, however, none of the existing methods is able to simultaneously and fully utilize both CPUs and GPUs for the computations, leaving idle either the CPU part or the GPU part of modern heterogeneous supercomputers. Aiming at an optimized utilization of heterogeneous computing resources, which is a promising hardware architecture for future bioinformatics applications, we present a new hybrid parallel algorithm and implementation of Bayesian phylogenetic inference, which combines MPI, OpenMP, and CUDA programming. The novelty of our algorithm, denoted as oMC3, is its ability to use CPU cores simultaneously with GPUs for the computations, while ensuring a fair work division between the two types of hardware components. We have implemented oMC3 based on MrBayes, which is one of the most popular software packages for Bayesian phylogenetic inference. Numerical experiments show that oMC3 obtains 2.5× speedup over nMC3, which is a cutting-edge GPU implementation of MrBayes, on a single server consisting of two GPUs and sixteen CPU cores. Moreover, oMC3 scales nicely when 128 GPUs and 1536 CPU cores are in use.

13.
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). However, predictions have not been incorporated into production schedulers, partially due to a misconception (that we resolve) claiming inaccuracy actually improves performance, but mainly because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. We solve this problem by divorcing kill-time from the runtime prediction and correcting predictions adaptively as needed if they are proved wrong. The end result is a surprisingly simple scheduler, which requires minimal deviations from current practices (e.g., using FCFS as the basis) and behaves exactly like EASY as far as users are concerned; nevertheless, it achieves significant improvements in performance, predictability, and accuracy. Notably, this is based on a very simple runtime predictor that just averages the runtimes of the last two jobs by the same user; counterintuitively, our results indicate that using recent data is more important than mining the history for similar jobs. All the techniques suggested in this paper can be used to enhance any backfilling algorithm and are not limited to EASY.
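The predictor mentioned in this abstract, averaging the runtimes of the last two jobs by the same user, can be sketched as follows. The class name `RecentTwoPredictor`, its methods, and the fallback to the user's own estimate when fewer than two runtimes are known are assumptions made for illustration; the adaptive correction of wrong predictions described in the abstract is not shown.

```python
from collections import defaultdict, deque


class RecentTwoPredictor:
    """Predict a job's runtime as the mean of the user's last two job runtimes."""

    def __init__(self) -> None:
        # Per-user history; deque(maxlen=2) silently drops older runtimes.
        self._history: dict[str, deque] = defaultdict(lambda: deque(maxlen=2))

    def record(self, user: str, runtime: float) -> None:
        """Store the actual runtime of a job that has just finished."""
        self._history[user].append(runtime)

    def predict(self, user: str, user_estimate: float) -> float:
        """Return the predicted runtime for the user's next job."""
        recent = self._history[user]
        if len(recent) < 2:
            # Assumption: without two past runtimes, use the user's estimate.
            return user_estimate
        return sum(recent) / 2


predictor = RecentTwoPredictor()
first = predictor.predict("alice", user_estimate=100.0)
for runtime in (10.0, 30.0, 50.0):
    predictor.record("alice", runtime)
later = predictor.predict("alice", user_estimate=100.0)
```

With no history the prediction is the user's estimate of 100.0; after three recorded jobs only the last two (30.0 and 50.0) count, giving 40.0.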

14.
In cloud computing, scheduling plays an important role while processing enormous jobs. Parallel jobs need parallel processing capabilities, which leads to CPU underutilization, mainly due to synchronization and communication among parallel processes. Researchers have introduced several algorithms for scheduling parallel jobs, namely Conservative Migration Consolidation supported Backfilling (CMCBF) and Aggressive Migration Consolidation supported Backfilling (AMCBF). The greatest challenge for an existing scheduling algorithm is to improve data center utilization without affecting job responsiveness. Hence, this work proposes an Effective Multiphase Scheduling Approach (EMSA) to process the jobs. In EMSA, the jobs are initially preprocessed and batched together to avoid starvation and to mitigate unwanted delay. Next, an Associate Priority Method is proposed, which prioritizes the batched jobs to minimize the number of migrations. Finally, the prioritized jobs are scheduled using the Priority Scheduling with BackFilling algorithm to utilize the intermediate idle nodes. Moreover, virtualization technology partitions the computing capacity of the Virtual Machine (VM) into a two-tier VM, with a foreground VM (FVM) and a background VM (BVM), to improve node utilization. Hence, the Priority Scheduling with Consolidation based BackFilling algorithm is deployed in a two-tier VM that processes the jobs by utilizing the VMs effectively. Experimental results show that the proposed work performs better than other existing algorithms, increasing resource utilization by 8%.

15.
This paper deals with a problem of finding valid solutions to systems of polynomial constraints. Although there have been several quite successful algorithms based on domain subdivision to resolve this problem, some major issues are still demanding further research. Prime obstacles in developing an efficient subdivision-based polynomial constraint solver are the exhaustive, although hierarchical, search of the zero-set in the parameter domain, which is computationally demanding, and their scalability in terms of the number of variables. In this paper, we present a hybrid parallel algorithm for solving systems of multivariate constraints by exploiting both the CPU and the GPU multicore architectures. We dedicate the CPU for the traversal of the subdivision tree and the GPU for the multivariate polynomial subdivision. By decomposing the constraint solving technique into two different components, hierarchy traversal and polynomial subdivision, each of which is more suitable to CPUs and GPUs, respectively, our solver can fully exploit the availability of hybrid, multicore architectures of CPUs and GPUs. Furthermore, our GPU-based subdivision method takes advantage of the inherent parallelism in the multivariate polynomial subdivision. We demonstrate the efficacy and scalability of the proposed parallel solver through several examples in geometric applications, including Hausdorff distance queries, contact point computations, surface–surface intersections, ray trap constructions, and bisector surface computations. In our experiments, the proposed parallel method achieves up to two orders of magnitude improvement in performance compared to the state-of-the-art subdivision-based CPU solver.

16.
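方民权, 张卫民, 高畅, 方建滨. 《软件学报》 (Journal of Software), 2015, 26(S2): 247-256
The maximum noise fraction rotation (MNF rotation) method for dimensionality reduction of hyperspectral remote sensing images is computationally intensive and time-consuming. This paper studies parallel schemes and performance optimization for the MNF algorithm on multi-core CPU and many-core MIC (many integrated cores) platforms. Through hotspot analysis, it proposes parallel schemes and multiple optimization strategies for hotspots such as filtering, covariance matrix computation, and the MNF transform, quantitatively analyzes the effect of each optimization, and designs an implementation based on MKL (math kernel library) routines and evaluates its performance. It then designs and implements two parallel algorithms: C-MNF on multi-core CPUs and M-MNF on CPU/MIC. Experimental results show that C-MNF achieves speedups of 58.9 to 106.4 on multi-core CPUs, while M-MNF on the CPU/MIC heterogeneous system performs best, with a speedup of up to 137x.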

17.
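Load balancing in distributed and parallel systems is an important factor affecting system performance. This paper proposes a prediction-based dynamic load balancing algorithm. Using local load information, the algorithm predicts when a node will become idle and issues a task request before the node reaches the idle state, thereby keeping every node in the system busy, raising the utilization of system resources, and improving system performance.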

18.
Parallel processing is essential for large-scale analytics. Principal Component Analysis (PCA) is a well known model for dimensionality reduction in statistical analysis, which requires a demanding number of I/O and CPU operations. In this paper, we study how to compute PCA in parallel. We extend a previous sequential method to a highly parallel algorithm that can compute PCA in one pass on a large data set based on summarization matrices. We also study how to integrate our algorithm with a DBMS; our solution is based on a combination of parallel data set summarization via user-defined aggregations and calling the MKL parallel variant of the LAPACK library to solve Singular Value Decomposition (SVD) in RAM. Our algorithm is theoretically shown to achieve linear speedup, linear scalability on data size, quadratic time on dimensionality (but in RAM), spending most of the time on data set summarization, despite the fact that SVD has cubic time complexity on dimensionality. Experiments with large data sets on multicore CPUs show that our solution is much faster than the R statistical package as well as solving PCA with SQL queries. Benchmarking on multicore CPUs and a parallel DBMS running on multiple nodes confirms linear speedup and linear scalability.

19.
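The top-ranked supercomputers in the world today are all based on hybrid architectures combining large numbers of CPUs and GPUs. They can achieve very high performance on certain special problems, such as FFT-based image processing or N-body particle computations. For partial differential equation problems discretized by finite differences (or mesh-based finite elements), however, obtaining good performance on CPU/GPU clusters remains a challenge. This paper proposes and tests a hybrid algorithm for this class of cluster architecture. Scalability is achieved through a domain decomposition algorithm, while GPU performance is obtained with a smoothed-aggregation-based algebraic multigrid method, which avoids incomplete factorization algorithms that perform poorly on GPUs. The numerical experiments use 32 CPU/GPUs to solve a partial differential equation with up to thirty million unknowns after finite difference discretization.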
