首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 73 毫秒
1.
Benchmarking and Comparison of the Task Graph Scheduling Algorithms   总被引:2,自引:0,他引:2  
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather meaningless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories according to their assumptions and functionalities. We then propose a set of benchmarks that are based on diverse structures and are not biased toward a particular scheduling technique. We have implemented 15 scheduling algorithms and compared them on a common platform by using the proposed benchmarks, as well as by varying important problem parameters. We interpret the results based upon the design philosophies and principles behind these algorithms, drawing inferences why some algorithms perform better than others. We also propose a performance measure called scheduling scalability (SS) that captures the collective effectiveness of a scheduling algorithm in terms of its solution quality, the number of processors used, and the running time.  相似文献   

2.
长期以来,speedup一直被视为衡量并地处理性能的主要指标之。不论是并行计算机系统的设计者,还是并行算法的设计者,均非常重视speedup指标。那么,speedup能否像人们仃的那样正确的描述并行处理的性能呢?迄今为止,人们对这一问题尚缺乏认识。本文从speedup的定义出发,结合实例,全面分析了speedup度量并行处理的性能存在的问题,以及可能导致的错误,还讨论了speedup的适应条件。  相似文献   

3.
LilyTask任务并行环境中基于任务关系的初始任务分配算法   总被引:5,自引:0,他引:5  
邸楠  王韬  李晓明 《计算机学报》2005,28(5):892-899
LilyTask是一个基于任务并行的并行程序设计环境,它引入了任务间关系的概念.任务间会由于这种任务依赖关系而产生等待,为了减少这种等待开销,LilyTask系统在预编译阶段分析这些数据依赖关系,并做出相应的静态任务分配.该文给出在LilyTask任务并行环境中的一族新的基于任务关系图的静态任务分配的算法——WCP算法,并在实际测试中与另外两个著名的静态分配算法ETF和MCP算法作了比较,测试结果说明WCP算法在任务计算开销与通信开销不能准确给出的情况下有更好的分配效果.  相似文献   

4.
This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for the computational intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance in which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with the asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics.  相似文献   

5.
本文通过分析行列式求值的特点和行列式求值的并行处理技术与10阶行列式对PGR系统测试的结果,阐述了行列式求值在多处理机系统性能评价中的应用,并对多处理机系统的加速比进行了分析。  相似文献   

6.
    
Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high‐performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high‐performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD‐style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well‐known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi‐baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user‐friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

7.
群机系统上单并发任务簇的近优分配算法   总被引:8,自引:0,他引:8       下载免费PDF全文
缩短程序的序的时间是并行处理的首要目标,有效的任务分配算法是实现这一目标的关键,对群机4系统来说更是如此。文中针对并行语言中常用的并行范式-单并发任务簇提出了近优分配算法OPTA,并在群机系统上做了与MH算法的比照实验,结果表明较MH算法缩短程序执行时间10%左右.  相似文献   

8.
并行处理系统中的一种新的任务调度算法模型   总被引:2,自引:0,他引:2       下载免费PDF全文
本文提出一种新的任务调度算法,是利用改进的启发式群聚算法,对MARL-LO算法进行了改进,弥补了MARY-LO算法的不足,并增加了一些动态控制功能,较好地解决了n个处理器的分配问题。  相似文献   

9.
文章论述和分析了任务粒度、并行度和并行通信方式三者之间的关系,以及DAG任务图中的任务映射方法。任务粒度的增加会减少通信,降低并行度,但在采用不同的通信方式时,反而会增加通信开销。在任务图确定的情况下,盲目地增加处理机数目并不能提高任务的并行度,该文提出了一种映射方法以最大限度地利用好处理机资源。  相似文献   

10.
Building large-scale parallel computer systems for time-critical applications is a challenging task since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called thebisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended as a bisectional network by adding additional links. These additional links add to the network some rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The important property of partitioning is exploited to propose a redundant task allocation and a task redistribution strategy under realtime constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, calledfault control points (FCPs), are distributed throughout the entire system in an optimal fashion and provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. For a failure-repair system environment the performance of the proposed system has been evaluated and compared with a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.  相似文献   

11.
Photon mapping is a global illumination algorithm which is composed of two steps: photon tracing and photon searching. During photon searching step, each shading point needs to search the photon-tree t...  相似文献   

12.
基于Shared-Nothing的并行Hash连接算法效率分析   总被引:3,自引:0,他引:3       下载免费PDF全文
李庆华  睢海燕  邓冲 《软件学报》2000,11(3):386-392
该文研究了基于Shared-Nothing结构的几种常用并行连接算法,分析了影响查询响应时间的各种因素.在此基础上,以多种硬件成分作为参数建立一个代价分析模型.使用该模型计算并行Hash算法在每个处理机上的平均任务执行时间和总的查询响应时间,并比较了几种算法在不同硬件配置下的执行效率.所提出的模型和分析方法为评价和选取并行连接算法提供了一种可行的途径.  相似文献   

13.
面向实时嵌入式操作系统的进程机制   总被引:1,自引:0,他引:1  
周昕  傅鹂  黄海伦 《计算机工程》2010,36(15):51-54
面向通信领域的嵌入式程序必须在资源受限的硬件环境中应对不断增加的通信业务,单纯依靠商用嵌入式操作系统的任务机制已不能提供足够的业务并行度和吞吐量。针对该问题,基于嵌入式操作系统任务机制提出一种更小粒度的进程解决方案,相对于任务对象,使用进程作为执行单元不仅内存资源占用少,且进程之间切换速度快,系统可以支持大量进程并行。该进程机制能够提供有效的系统监测和故障诊断手段,从而保证系统的健壮性。  相似文献   

14.
Object-oriented database management systems (OODBMSs) provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality and continuously evolving functionalities, efficient implementations are important for these OODBMSs to support the present and future applications, particularly when the databases are very large. In this paper, we present several parallel, multi-wavefront algorithms based on two processing approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. A heuristic method is presented for partitioning an object-oriented database (OODB). The main consideration for partitioning the database is load balancing. This method also tries to reduce the communication time by reducing the length of the path that wavefronts need to be propagated. Multiple wavefront algorithms based on the two approaches for tree-structured queries have been implemented on an nCUBE 2 parallel computer. The implementation of the query processor allows multiple queries to be executed simultaneously. This implementation provides an environment for evaluating the algorithms and the heuristic method for partitioning the database. The evaluation results are presented in this paper.Recommended by: Patrick Valduriez  相似文献   

15.
针对Trace驱动的并行性能模拟问题,提出基于Trace信息指导的映射方法CO-LP3M。CO-LP3M利用从Trace中提取的目标应用程序的通信特征,以宿主机物理进程间通信次数最小化为目标,兼顾计算负载均衡,生成并行模拟任务到宿主机的映射。对Jacobi3D和HPL两个程序进行实验改为:对HPL程序进行实验(注:此处本来是两个程序的,后来为了缩减篇幅就删掉了其中的一个),结果表明CO-LP3M可有效提高并行模拟性能,相对于常见的映射方式,模拟性能最多提高14.7%。在此基础上给出CO-LP3M的扩展技术SCO-LP3M。  相似文献   

16.
在异构计算环境中,有效的任务调度对于获得高性能是十分重要的。现在虽然已经有许多异构处理器调度算法,但它们或者不具有良好的效果,或者算法代价太高。提出了一种新的基于表的调度算法APS。APS利用有向无环图来计算任务优先级,并采用基于调度的策略分配任务到不同处理器,以获得任务最少完工时间。将APS和LMT,HEFT,CPOP算法做比较之后得出:在大多数情况下APS算法都能获得更好性能。  相似文献   

17.
    
Scientific applications represent a dominant sector of compute‐intensive applications. Using massively parallel processing systems increases the feasibility to automate such applications because of the cooperation among multiple processors to perform the designated task. This paper proposes a parallel hidden Markov model (HMM) algorithm for 3D magnetic resonance image brain segmentation using two approaches. In the first approach, a hierarchical/multilevel parallel technique is used to achieve high performance for the running algorithm. This approach can speed up the computation process up to 7.8× compared with a serial run. The second approach is orthogonal to the first and tries to help in obtaining a minimum error for 3D magnetic resonance image brain segmentation using multiple processes with different randomization paths for cooperative fast minimum error convergence. This approach achieves minimum error level for HMM training not achievable by the serial HMM training on a single node. Then both approaches are combined to achieve both high accuracy and high performance simultaneously. For 768 processing nodes of a Blue Gene system, the combined approach, which uses both methods cooperatively, can achieve high‐accuracy HMM parameters with 98% of the error level and 2.6× speedup compared with the pure accuracy‐oriented approach alone. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

18.
并行自动测试系统中,并行测试程序的运行管理是系统设计中的关键问题。由于该类系统的强异构性特征和特殊的运行监管要求,常规的基于多计算机系统的任务管理机制不适于并行自动测试系统。以事务工作流描述并行测试任务运行管理模型,在此基础上,提出了基于事务工作流的并行测试任务运行管理机制,最后,给出了运行管理调度的算法实现思路。  相似文献   

19.
20.
    
Gordon Lyon  Raghu Kacker  Arnaud Linz 《Software》1995,25(12):1299-1314
Code scalability, crucial on any parallel system, determines how well parallel code avoids becoming a bottleneck as its host computer is made larger. The scalability of computer code can be estimated by statistically designed experiments that empirically approximate a multivariate Taylor expansion of the code's execution response function. Each suspected code bottleneck corresponds to a first-order term in the expansion, the coefficient for that term indicating how sensitive execution is to changes in the suspect location. However, it is the expansion coefficients for second-order interactions between code segments and the number of processors that are fundamental to discovering which program elements impede parallel speedup. A new, unified view of these second-order coefficients yields an informal relative scalability test of high utility in code development. Discussion proceeds through actual examples, including a straightforward illustration of the test applied to SLALOM, a complex, multiphase benchmark. A quick graphical shortcut makes the scalability test readily accessible.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号