Found 20 similar documents; search took 83 ms
1.
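袁再龙 (Yuan Zailong), 《计算机测量与控制》 (Computer Measurement & Control), 2014, 22(6): 1941-1943
To achieve efficient distributed parallel computing on large-scale computer clusters, a parallel computing model for heterogeneous nodes is designed based on an improved graph partitioning model and a quantum genetic algorithm. First, the traditional graph partitioning model is introduced and its shortcomings are analyzed. The model is then improved with respect to graph directedness, communication cost calculation, and load balance degree, yielding an improved graph partitioning model. Finally, with the goals of minimizing communication cost and optimizing resource load balance, an encoding scheme is designed and a quantum genetic algorithm is applied on the improved model to obtain the optimal task partitioning scheme. Simulation experiments show that the method effectively realizes parallel computation of tasks and, compared with other methods, has lower communication cost and better load balance, making it highly feasible.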
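The abstract names two objectives, communication cost and load balance, without giving their formulas. The following is a minimal sketch of how a candidate partition of a directed task graph could be scored, assuming communication cost is the total data volume on edges that cross nodes and imbalance is the slowest node's time divided by the mean time. The function name, the input layout, and both definitions are illustrative assumptions, not the paper's model.

```python
from collections import defaultdict


def partition_metrics(task_cost, edges, assignment, node_speed):
    """Score one task-to-node assignment (hypothetical helper).

    task_cost:  {task: compute cost}
    edges:      {(src, dst): data volume} for a directed task graph
    assignment: {task: node}
    node_speed: {node: relative speed}, which models heterogeneous nodes
    Returns (communication cost, load imbalance).
    """
    # Communication cost: total volume on directed edges whose endpoints
    # are placed on different nodes.
    comm = sum(volume for (src, dst), volume in edges.items()
               if assignment[src] != assignment[dst])

    # Per-node execution time: assigned work divided by node speed.
    work = defaultdict(float)
    for task, node in assignment.items():
        work[node] += task_cost[task]
    times = [work[node] / speed for node, speed in node_speed.items()]

    # Imbalance: slowest node over the mean; 1.0 means perfectly balanced.
    imbalance = max(times) / (sum(times) / len(times))
    return comm, imbalance


# Three tasks on one fast and one slow node.
comm, imbalance = partition_metrics(
    task_cost={"a": 4, "b": 2, "c": 2},
    edges={("a", "b"): 3, ("b", "c"): 1},
    assignment={"a": "fast", "b": "slow", "c": "slow"},
    node_speed={"fast": 2.0, "slow": 1.0},
)
print(comm, round(imbalance, 3))  # 3 1.333
```

A search procedure such as the quantum genetic algorithm mentioned in the abstract would call a scoring function of this kind as its fitness evaluation; the search itself is not sketched here.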
3.
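Structural Analysis of the Heterogeneous Parallel Distributed Computing System PVM. Total citations: 2; self-citations: 0; citations by others: 2
This paper describes the composition and characteristics of PVM, a heterogeneous parallel distributed computing system, analyzes in detail its software structure, workflow, and message communication mechanism, and points out its shortcomings.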
4.
5.
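General-purpose GPUs used in high-performance computing offer powerful parallel computing capability, and data analysis systems that use a general-purpose GPU as the main processor can deliver better performance than traditional databases. In big data scenarios, how to distribute the workload sensibly between processors according to CPU and GPU resources is a problem in urgent need of a solution. A load balancing strategy for CPU-GPU heterogeneous data analysis systems is proposed. The strategy decomposes the workload using a pipeline model and builds a load balancing model on the pipeline that assigns the workload to the heterogeneous processors appropriately, reducing total system execution time and improving performance. Experimental results show that the proposed pipeline-based load balancing model adapts to different data volumes under different query requests and performs well.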
6.
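A very important problem in cluster systems is keeping the load as balanced as possible. Most current load balancing algorithms target homogeneous cluster systems and do not extend well. Heterogeneous cluster systems are studied, and a dynamic load balancing algorithm for heterogeneous server clusters is proposed; it achieves good results, especially under heavy load.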
7.
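While a job runs in a distributed big data processing framework, large amounts of data are transferred over the network, and the time needed to move data between nodes has become one of the main costs of job execution. When nodes have heterogeneous bandwidth, traditional data partitioning methods are inefficient because of bandwidth-bottleneck nodes. To address this, a model of data transfer between nodes is established. With the goal of reducing data transfer time, the model computes the optimal data distribution ratio for each node from each node's uplink and downlink bandwidth and initial data volume. Based on this model, a bandwidth-based data partitioning method is designed that makes each node allocate data according to the optimal distribution ratio. Finally, the method is implemented in the Apache Flink framework and validated experimentally. The results show that, under heterogeneous bandwidth, the bandwidth-based data partitioning method effectively reduces the time required for data partitioning.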
8.
9.
10.
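Computing power networks connect computing nodes over the network to overcome the limits of single-point computing power, and in recent years they have developed rapidly and been applied in more and more business domains. Popular live video streaming relies on transmitting and transcoding large numbers of video frames, so exploring computing power networks for efficient video distribution is of real practical significance. Compared with traditional large-scale data processing, video applications place higher demands on guaranteed transmission latency and bandwidth. However, the computing power of nodes offered by different cloud services varies, and the state of network links between nodes changes frequently, which makes it very challenging to select the nodes with the best combined transmission and transcoding performance for low-latency, high-bandwidth video distribution. To this end, an efficient video distribution scheme based on the cooperation of heterogeneous computing power nodes is designed. It uses reinforcement learning to plan video transmission paths and select transcoding nodes appropriately; applies priority queue scheduling to different video distribution tasks while adaptively adjusting resources to reduce bursty contention for node resources; and adopts a hierarchical log synchronization fault-tolerance mechanism to quickly restore data consistency after node failures. Finally, distributed nodes across multiple cloud services are deployed to build a complete video distribution system. Extensive ultra-high-definition live video experiments show that the scheme clearly outperforms existing video distribution methods.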
11.
12.
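A Structured Parallel Programming Framework for Heterogeneous Computing. Total citations: 1; self-citations: 0; citations by others: 1
With the arrival of the artificial intelligence era, heterogeneous computing plays an increasingly important role in deep learning, scientific computing, and other fields. One current bottleneck in applying heterogeneous computing systems is the lack of efficient software development frameworks. Existing programming frameworks such as OpenCL and CUDA, which support GPUs, DSPs, and FPGAs, are based on C/C++ and traditional parallel programming methods. As a result, software development efficiency is low, reasoning about and debugging software is difficult, and cooperation and scheduling among computing devices are hard to handle flexibly. A structured parallel programming framework for heterogeneous computing platforms, based on a scripting language, is proposed. It provides structured parallel programming interfaces, supports mapping computing tasks onto heterogeneous computing devices, and facilitates reasoning about and verifying parallel programs. A structured scheduling algorithm based on a genetic algorithm is designed and implemented to make full use of the computing power of heterogeneous systems and to improve their software development efficiency. Experimental results show that the proposed framework achieves a speedup of 1.5 to 2.5 times over a single processor on a CPU+GPU platform.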
13.
14.
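Parallel component technology has improved the development efficiency of parallel software, but existing parallel component technology lacks support for heterogeneous multi-core platforms. To improve the execution performance of parallel component programs on heterogeneous platforms, the CCA (Common Component Architecture) parallel component model is extended to support heterogeneous CCA parallel components, and a heterogeneous CCA parallel component model is proposed. A manager-worker pattern is used to schedule the computing tasks inside heterogeneous CCA parallel components onto heterogeneous multi-core platforms for accelerated execution. A compilation system and a runtime framework supporting the extended CCA parallel component model are implemented on top of the CCA component toolkit. Experiments on two heterogeneous multi-core processors, CELL BE and GPU, show that the proposed method performs better than the original CCA component programs. Applying the proposed parallel component model in parallel program development can improve the performance of parallel programs.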
15.
《国际计算机数学杂志》 (International Journal of Computer Mathematics), 2012, 89(3): 607-618
The permeability of a 3D geological fracture network is determined by triangulating the fractures and solving the 2D Darcy's equation in each fracture. Here, the numerical modelling aims to simulate a great number of networks made up of a great number of fractures, i.e. from 10^3 to 10^6 fractures. Parallel computing allows us to solve very large linear systems, improving the realism of simulations. Several algorithms for simulating fluid flow have been proposed for cases of significant matrix permeability. In the case of a weakly permeable matrix, the flow is focused in the fractures having a strong permeability, and fluids percolate through networks of interconnected fractures. In this paper, we present a complete parallel algorithm for solving flow equations in fracture networks. We consider an impervious matrix. The different parts of the algorithm are detailed. Numerical examples using the mixed finite element (MFE) method for various fracture networks illustrate the efficiency and robustness of the proposed algorithm. To the best of our knowledge, results for parallel simulation of fluid flow in discrete-fractured media with impervious matrix using the MFE method are the first to appear in the literature.
16.
The purpose of content-based image retrieval (CBIR) is to retrieve, from real data stored in a database, information that is relevant to a query. In remote sensing applications, the wealth of spectral information provided by latest-generation (hyperspectral) instruments has quickly introduced the need for parallel CBIR systems able to effectively retrieve features of interest from ever-growing data archives. To address this need, this paper develops a new parallel CBIR system that has been specifically designed to be run on heterogeneous networks of computers (HNOCs). These platforms have quickly become a standard computing architecture in remote sensing missions due to the distributed nature of data repositories. The proposed heterogeneous system first extracts an image feature vector able to characterize image content with sub-pixel precision using spectral mixture analysis concepts, and then uses the obtained feature as a search reference. The system is validated using a complex hyperspectral image database, and implemented on several networks of workstations and a Beowulf cluster at NASA's Goddard Space Flight Center. Our experimental results indicate that the proposed parallel system can efficiently retrieve hyperspectral images from complex image databases by adapting to the underlying parallel platform on which it is run, regardless of the heterogeneity in the compute nodes and communication links that form that platform. Copyright © 2009 John Wiley & Sons, Ltd.
17.
18.
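付朝江 (Fu Chaojiang), 《计算机工程与应用》 (Computer Engineering and Applications), 2011, 47(27): 52-54
Elasto-plastic domain decomposition finite element parallel computation is studied in an MPI-based cluster environment. An algorithm is proposed that integrates the stress-strain relation using third- and fourth-order Runge-Kutta methods. The substep size is adjusted automatically during integration to control the integration error. A substructure preconditioned conjugate gradient parallel solver using the minimal residual smoothing method is developed. The algorithm is implemented in a parallel environment based on a workstation cluster. The results show that the algorithm achieves good parallel speedup and efficiency and is an effective parallel solution algorithm.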
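The automatic substep control mentioned in this abstract can be illustrated with a short sketch. The paper pairs third- and fourth-order Runge-Kutta formulas; since their coefficients are not given here, the sketch swaps in a different but standard technique, step doubling with the classical fourth-order method, to estimate the local error. The function names, the scalar state, and the step-size constants are assumptions made for illustration.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate_adaptive(f, t0, y0, t_end, tol=1e-8, h0=0.1):
    """Integrate from t0 to t_end, resizing substeps to keep the
    estimated local error of each accepted substep below tol."""
    t, y, h = t0, y0, h0
    while t < t_end:
        last = h >= t_end - t          # this substep would reach the end
        if last:
            h = t_end - t
        full = rk4_step(f, t, y, h)    # one step of size h
        mid = rk4_step(f, t, y, h / 2)
        half = rk4_step(f, t + h / 2, mid, h / 2)  # two steps of size h/2
        # Richardson error estimate for a fourth-order method.
        err = abs(half - full) / 15
        if err <= tol:                 # accept the more accurate result
            t = t_end if last else t + h
            y = half
        # Grow or shrink the substep; the change is limited to [0.2, 2].
        factor = 0.9 * (tol / err) ** 0.2 if err > 0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return y


# Usage: dy/dt = -y with y(0) = 1, integrated to t = 1.
print(integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0))
```

In a stress integration the state would be a stress vector and the error measure a norm rather than an absolute value; the accept-or-shrink logic stays the same.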
19.
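The basic principles of the random walk algorithm are described, and the termination condition of the algorithm for a given allowable error and confidence probability is analyzed theoretically. The application of the random walk algorithm to circuit analysis is discussed, and its performance is analyzed with application examples. The time complexity of the algorithm and the main factors affecting its execution time are discussed, with emphasis on its parallel characteristics, and a new method that uses parallel computing techniques to improve its performance is proposed. Experimental comparison with the serial algorithm shows that parallel computing is an effective way to speed up the random walk algorithm and is more widely applicable than existing methods.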
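As a rough illustration of a random walk with an error-and-confidence stopping rule, the sketch below estimates the voltage at one node of a chain of equal resistors whose two ends are held at fixed voltages. It is a toy setup chosen for this example, not the circuit or the termination condition analyzed in the paper: the stopping rule is the usual normal-approximation confidence interval, and the function name and parameters are hypothetical.

```python
import math
import random


def node_voltage(start, n, v_left=0.0, v_right=1.0, tol=0.02, z=1.96,
                 min_walks=100, rng=None):
    """Estimate the voltage at interior node `start` of a chain with
    nodes 0..n joined by equal resistors, ends held at v_left and v_right.

    Each walk steps to a neighbor with equal probability (equal
    conductances) until it hits an end node, and scores that end's
    voltage. Walks stop once the half-width of the confidence interval,
    z * sqrt(var / walks), drops to tol (z = 1.96 is about 95%).
    Returns (estimate, number of walks).
    """
    rng = rng or random.Random()
    total = total_sq = 0.0
    walks = 0
    while True:
        pos = start
        while 0 < pos < n:
            pos += rng.choice((-1, 1))
        v = v_left if pos == 0 else v_right
        total += v
        total_sq += v * v
        walks += 1
        if walks >= min_walks:
            mean = total / walks
            var = max(total_sq / walks - mean * mean, 0.0)
            if z * math.sqrt(var / walks) <= tol:
                return mean, walks


# Node 3 of a 10-resistor chain; the exact voltage is 3/10 = 0.3.
estimate, walks = node_voltage(3, 10, rng=random.Random(0))
print(estimate, walks)
```

Because the walks are independent, they can be divided among workers and their running sums merged, which is the property the abstract relies on for parallel speedup.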
20.
Rajarethinam Madhura, Vaidyanathan Rhymend Uthariaraj, Benjamin Lydia Elizabeth, 《Software》, 2023, 53(2): 390-412
Task scheduling in heterogeneous distributed computing systems plays a crucial role in reducing the makespan and maximizing resource utilization. The diverse nature of the devices in such systems makes scheduling tasks more complex. To overcome this problem, this article proposes a new list-based static task scheduling algorithm, Deadline-Aware-Longest-Path-of-all-Predecessors (DA-LPP). In the prioritization phase of the DA-LPP algorithm, the path length of the current task from all its predecessors at each level is computed, and the longest of these path lengths is assigned as the rank of the task. This strategy emphasizes the tasks on the critical path. This prioritization phase leads to an observable reduction in the makespan of the applications. In the processor selection phase, the DA-LPP algorithm implements an improved insertion-based policy that uses the unoccupied free time slots of the processors, which improves resource utilization. In addition, a least-computation-cost allocation approach is followed to minimize the overall computation cost of the processors, and a parental prioritization policy is incorporated to further reduce the scheduling length. To demonstrate the robustness of the proposed algorithm, a synthetic graph generator is used to generate a wide variety of graphs. Apart from the synthetic graphs, real-world application graphs such as Montage, LIGO, Cybershake, and Epigenomic are also used to assess the performance of the DA-LPP algorithm. Experimental results show that the DA-LPP algorithm improves on other algorithms such as DQWS, DUCO, DCO, and EPRD in terms of scheduling length ratio, makespan reduction rate, and resource reduction rate.
The results reveal that, for a 1000-task set with a deadline equal to twice the critical path length, the scheduling length ratio of the DA-LPP algorithm is better than that of DQWS by 35%, DUCO by 23%, DCO by 26%, and EPRD by 17%.
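The prioritization idea, ranking a task by the longest path reaching it from its predecessors, can be sketched as follows. The abstract does not give the exact rank formula, so this sketch assumes that a path's length adds each predecessor's computation cost and the communication cost on each edge. The function name and data layout are hypothetical, and the deadline handling and processor selection phases of DA-LPP are not covered.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path_ranks(cost, edges):
    """Rank each task by the longest path reaching it (hypothetical helper).

    cost:  {task: average computation cost}
    edges: {(pred, succ): communication cost} of a task DAG
    """
    preds = {task: {} for task in cost}
    for (pred, succ), comm in edges.items():
        preds[succ][pred] = comm

    ranks = {}
    # static_order() yields every task after all of its predecessors.
    order = TopologicalSorter({t: set(p) for t, p in preds.items()})
    for task in order.static_order():
        # Longest path over all predecessors; entry tasks get rank 0.
        ranks[task] = max(
            (ranks[p] + cost[p] + comm for p, comm in preds[task].items()),
            default=0.0,
        )
    return ranks


ranks = longest_path_ranks(
    cost={"A": 2, "B": 3, "C": 1, "D": 4},
    edges={("A", "B"): 1, ("A", "C"): 5, ("B", "D"): 2, ("C", "D"): 1},
)
print(ranks)  # {'A': 0.0, 'B': 3.0, 'C': 7.0, 'D': 9.0}
```

In this example task D is reached through C with length 9 rather than through B with length 8, so the path through C determines its rank.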