首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 156 毫秒
1.
凭借着高性能,低功耗的特性,多核处理器已经占据了目前的主要市场.提出一种多核处理平台上基于任务图模型的调度策略.建立了多核平台上任务图的空间与时间并行调度模型;针对任务图的空间并行与时间并行调度模型提出了并行节点合并、分配的优化算法与流水线并行的优化算法.最后,提出将优化的空间与时间并行调度技术相结合的并行调度策略.通过实验验证,本文提出的算法比其他多核并行调度算法降低了处理器核心间的通信与同步开销,提高了系统的计算效率与吞吐量.  相似文献   

2.
分布存储系统上一种新的并行调度算法   总被引:3,自引:0,他引:3  
在一般的分布存储系统上各个处理器可能不同且资源共享,导致了并行任务在各个处理器上的执行时间具有很大的随机性,主要根据系统及并行任务特性等引进特征参数,采用计算与通信重叠等方法设计出了一种新的并行调度算法,即使在多用户环境下应用此算法不仅能达到极高的负载平衡,充分利用系统资源而且能有效地提高并行效率及加速比。实验结果表明,提出的新的并行调度算法与已有的类似调度算法相比能更加有效地利用系统资源及提高并行效率。  相似文献   

3.
傅立国  姚远  丁锐 《计算机应用》2014,34(4):1014-1018
不规则计算在大规模并行应用中广泛存在。在面向分布存储结构的自动并行化过程中,较难在编译时为不规则循环生成并行代码。并行代码中的通信代码对程序运行结果的正确性以及加速效果有着严重的影响。通过分析程序的数组重分布图,使用部分冗余的通信方式来维持不规则数组访问的生产者消费者关系,可以在编译时为一类常见的不规则循环自动生成有效的通信代码。该方法使用计算分解和数组引用的访问表达式求解不规则数组在各处理器的本地定义集作为通信的数据集,分析针对此类不规则循环划分的通信策略,继而生成相应的通信代码。实验测试的结果取得了预期的加速效果,验证了方法的有效性。  相似文献   

4.
在舰载通信的研究中,大型舰载实时通信系统性能的好坏关系到舰载通信的高效性.舰载实时通信系统为了保证实时性,当有新的通信任务进入时,需重新计算并调整所有任务连接的开始时间,当旧任务还没完全退出时,会产生通信任务冲突.传统的大型舰载实时通信为了解决这冲突带来的问题,往往采用延迟的策略,虽然延迟能避免冲突,但是对延迟过程没有进行细分,造成所有任务都要参与延迟,效率较低.提出一种大型舰载实时通信系统通信优化模型,采用支持向量数据描述的数据检测算法,将到超球体中心距离高于球体半径的数据当成冲突数据,调整新加入工作窗口休眠模式的启动时间,降低休眠工作窗口间数据调度的竞争力,提高处在休眠状态工作窗口的数据调度优先级,将采用节能特性数据调度方法中调度器的输入平滑算法,描述成两目标的非线性最优化问题,经简单的循环运算获取具有最高优先级的休眠窗口,优先调度休眠窗口中的冲突数据.仿真结果表明,所提模型不仅拥有较高的冲突调度效率,而且平均吞吐率以及准确率均达到了令人满意的效果.  相似文献   

5.
对于分布内存体系结构的并行计算机而言,如何对计算和数据进行合理划分以增加数据本地化减少处理器间的通信是提高其并行性能的关键,但在数据划分过程中,重分布通信有时不可避免,如何进行合理的数据和计算划分以减少通信并最大限度的利用程序的并行性是并行编译中的一个重要问题。该文主要讨论了一种支持数据重分布的自动进行计算和数据划分的算法。  相似文献   

6.
丁锐  赵荣彩  韩林 《计算机科学》2012,39(3):290-294
计算和数据自动划分是并行化编译中一种自动分配计算和数据到各个处理机的优化技术,划分的结果直接影响程序并行的性能。数组是划分处理的主要对象之一,一些数组分布后的收益不高,但带来的并行约束却能对其它数组的划分产生干扰,导致大量数据重分布通信的产生。现有的划分算法中没有约定数组分布的优先次序,因此无法限制这些数组并行约束的传播,降低了优化编译器后端自动生成并行代码的性能。提出了一种基于主导值的计算和数据自动划分算法:将划分过程中数组对程序并行性的影响量化为主导值,并依据主导值的大小约定数组分布的优先次序,限制干扰数组并行约束的传播速度,提高划分结果的合理性。实验结果表明,算法能够获得良好的划分效果。  相似文献   

7.
丁锐  赵荣彩  韩林 《软件学报》2013,24(12):2843-2858
划分是一种自动分配计算和数据到各个处理器的编译技术,是分布存储结构下并行编译的核心问题.以往的划分研究较少从生命期的角度考虑数据分解问题,分解在数组的不同生命期中不一致时会产生冗余通信.为解决上述问题,提出了一种数据分解算法,通过定义-引用图来表示数组的数据流信息,并使用分解映射表为数组不同的生命期建立各自的数据分解.对矩阵求逆等9 个实际用例的实验结果表明,与以往不区分生命期的划分研究相比,使用所提算法能够在寻找数据分解时对并行收益做出更准确的评估,减少了通信冗余,从而提升了自动生成的并行代码的加速比.  相似文献   

8.
针对传统的加权循环队列调度在移动消息推送平台中会增加系统开销、延长消息发送时间等问题,提出一种基于动态权值的加权循环调度策略.该策略在原有加权循环调度算法基础上,采用了动态权值策略,使得推送系统不再需要对消息的发送情况作额外的记录,有效降低了系统开销.对提出的算法进行了模拟实验,实验结果表明,改进的策略减少了消息发送的整体时延,提高了移动推送平台的消息发送效率.  相似文献   

9.
划分是把程序中不同的计算和数据分配到并行处理系统的不同处理机来充分利用并行系统的计算资源、提高程序处理速度的一种优化技术.划分的效果对程序在并行系统上的执行效率将产生至关重要的影响,因此划分问题一直是并行领域研究的一个热点.但是应用程序的一些特性,如非紧密嵌套循环、一条语句对非只读数组的多次引用间存在重叠、不同语句对同一数组不同步长的引用,给有效解决划分问题设置了极大的障碍.已有的划分算法无法对具有这些特征的程序进行自动划分.虽然在对具有这些特征的程序进行手工优化过程中,存在一些直观上的划分策略,但这些策略无法应用到编译器中来指导编译器完成对程序的自动划分.文中根据这类程序的特点,提出了一种基于代表元的划分算法.该算法通过使用程序中对划分计算产生实际影响的数组引用作为代表元素构造各种划分的限制条件,完成程序的划分.同时通过寻找最大一致性数据划分方向有效减少了程序划分过程中的数据重组织通信.该算法已经在AFT2004中实现,并对应用程序获得了很好的效果.  相似文献   

10.
针对目前分布对象中间件消息通信机制中不支持时间解耦和服务质量控制的问题,提出一种能够支持消息异步传递和时间解耦的异步通信模型,通过在客户端引入消息转换层来完成异步消息的提取、包装和转换,将原始的请求转换为一种可路由的消息,然后设计一种软件路由代理来实现异步消息的传递与转发.重点论述了分布对象中间件异步通信消息的转换机制.  相似文献   

11.
Many scientific applications require array redistribution when the programs run on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution, otherwise the performance of the programs will degrade considerably. The redistribution overheads consist of two parts: index computation and inter-processor communication. If there is no communication scheduling in a redistribution routine, the inter-processor communication will incur a larger communication idle time when there exists node contention and/or difference among message lengths during one particular communication step. In order to solve this problem, in this paper, we propose an efficient scheduling scheme that not only minimizes the number of communication steps and eliminates node contention, but also minimizes the difference of message lengths in each communication step. Thus, the communication idle time is reduced in redistribution routines.  相似文献   

12.
13.
Irregular array redistribution has been paid attention recently since it can distribute different size of data segment to heterogeneous processors according to their computational ability. It’s also the reason why it has been kept an eye on load balance. High Performance Fortran Version 2 (HPF2) provides GEN_BLOCK distribution format which facilitates generalized block distributions. In this paper, we present a two-phase degree-reduction (TPDR) method for scheduling HPF2 irregular array redistribution. Using a bipartite communication graph, the first phase of TPDR schedules communication links adjacent to processors that with degree greater than two. A communication step will be scheduled follow each degree-reduction iteration. The second phase of TPDR schedules remaining messages of all processors that with degree-2 and degree-1 using an adjustable coloring mechanism. An extended algorithm based on TPDR is also presented in this paper. Effectiveness of the proposed methods not only avoids node contention but also shortens the overall communication cost. The proposed methods are also practicable due to low algorithmic complexity. To evaluate the performance of our methods, we have implemented both algorithms along with the divide-and-conquer algorithm and two scheduling mechanism. The simulation results show improvement of total communication costs.
Chao-Yang LanEmail:
  相似文献   

14.
For many parallel applications on distributed memory systems, array re-decomposition is usually required to enhance data locality and reduce the communication overheads. How to effectively schedule messages to improve the performance of array re-decomposition has received much attention in recent years. This paper is devoted to develop efficient scheduling algorithms using the compiling information provided by array distribution patterns, array alignment patterns and the periodic property of array accesses. Our algorithms not only avoid inter-processor contention, but also reduces real communication cost and communication generation time. The experimental results show that the performance of array redecomposition can be significantly improved using our algorithms  相似文献   

15.
Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes O(max(P, Q)) time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation  相似文献   

16.
This paper proposes a scheduling algorithm to solve the problem of task scheduling in a cloud computing system with time‐varying communication conditions. This algorithm converts the scheduling problem with communication changes into a directed acyclic graph (DAG) scheduling problem for existing fuzzy communication task nodes, that is, the scheduling problem for a communication‐change DAG (CC‐DAG). The CC‐DAG contains both computation task nodes and communication task nodes. First, this paper proposes a weighted time‐series network bandwidth model to solve the indefinite processing time (cost) problem for a fuzzy communication task node. This model can accurately predict the processing time of a fuzzy communication task node. Second, to address the scheduling order problem for the computation task nodes, a dynamic pre‐scheduling search strategy (DPSS) is proposed. This strategy computes the essential paths for the pre‐scheduling of the computation task nodes based on the actual computation costs (times) of the computation task nodes and the predicted processing costs (times) of the fuzzy communication task nodes during the scheduling process. The computation task node with the longest essential path is scheduled first because its completion time directly influences the completion time of the task graph. Finally, we demonstrate the proposed algorithm via simulation experiments. The experimental results show that the proposed DPSS produced remarkable performance improvement rate on the total execution time that ranges between 11.5% and 21.2%. In view of the experimental results, the proposed algorithm provides better quality scheduling solution that is suitable for scientific application task execution in the cloud computing environment than HEFT, PEFT, and CEFT algorithms.  相似文献   

17.
The authors present a compile-time scheduling heuristic called dynamic level scheduling, which accounts for interprocessor communication overhead when mapping precedence-constrained, communicating tasks onto heterogeneous processor architectures with limited or possibly irregular interconnection structures. This technique uses dynamically-changing priorities to match tasks with processors at each step, and schedules over both spatial and temporal dimensions to eliminate shared resource contention. This method is fast, flexible, widely targetable, and displays promising performance  相似文献   

18.
One of the main obstacles in obtaining high performance from heterogeneous distributed computing (HDC) system is the inevitable communication overhead. This occurs when tasks executing on different computing nodes exchange data or the assigned sub-task size is very small. In this paper, we present adaptive pre-task assignment (APA) strategy for heterogeneous distributed raytracing system. In this strategy, the master assigns pre-task to the each node. The size of sub-task for each node is proportional to the node’s performance. One of the main features of this strategy is that it reduces the inter-processes communication, the cost overhead of the node’s idle time and load imbalance, which normally occurs in traditional runtime task scheduling (RTS) strategies. Performances of the RTS and APA strategies are evaluated on manager/master and workers model of HDC system. The experimental results of our proposed (APA) strategy have shown a significant improvement in the performance over RTS strategy.  相似文献   

19.
We develop a message scheduling scheme for efficiently realizing all-to-all personalized communication (AAPC) on Ethernet switched clusters with one or more switches. To avoid network contention and achieve high performance, the message scheduling scheme partitions AAPC into phases such that 1) there is no network contention within each phase and 2) the number of phases is minimum. Thus, realizing AAPC with the contention-free phases computed by the message scheduling algorithm can potentially achieve the minimum communication completion time. In practice, phased AAPC schemes must introduce synchronizations to separate messages in different phases. We investigate various synchronization mechanisms and various methods for incorporating synchronizations into the AAPC phases. Experimental results show that the message scheduling-based AAPC implementations with proper synchronization consistently achieve high performance on clusters with many different network topologies when the message size is large  相似文献   

20.
陈丽  李治军  姜守旭 《软件学报》2014,25(10):2362-2372
车联网信道资源稀缺及车载节点间的间歇性短暂链接,给车载节点通过无线接入点(AP)接入互联网进行内容下载带来了巨大挑战.AP覆盖范围内的资源分配与Internet链接空洞区域的传输调度相互依赖,共同影响其下载性能,而现有文献往往将二者孤立开来分别进行研究.为了提高下载性能,将二者作为一个整体,从全局优化的角度研究内容下载的效率问题,并将其形式化为下载数据量最大的结合非冲突调度的资源分配问题.但是,在证明该问题是 NP-难的基础上,提出结合链接空洞区域的传输调度的资源分配近似算法(JAS)来解决该问题.该算法将整个链接空洞区域节点间链接的时空变化模型化为拓扑图序列,并基于此构建其传输冲突图序列,在 AP 通信覆盖区域基于图序列计算优化的资源分配节点集进行资源分配,以期达到扩展AP通信范围、填补Internet链接空洞的目的.模拟实验结果表明,JAS 算法与现有方法相比显著提高了文件下载量及传输的成功率.此外,还对影响内容下载性能的相关因素进行了分析.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号