期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A static mapping heuristics to map parallel applications to heterogeneous computing systems

Ranieri Baraglia Renato Ferrini Pierluigi Ritrovato 《Concurrency and Computation》2005,17(13):1579-1605

In order to minimize the execution time of a parallel application running on a heterogeneously distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well‐known NP‐hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph‐based mapping algorithm, called Heterogeneous Multi‐phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous computing distributed system by using a local search technique together with a tabu search meta‐heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and considering an affinity parameter, that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

2.

An efficient grid scheduling strategy for data parallel applications

Kashif Hesham Khan Kalim Qureshi Mostafa Abd-El-Barr 《The Journal of supercomputing》2014,68(3):1487-1502

Scheduling large-scale application in heterogeneous grid systems is a fundamental NP-complete problem that is critical to obtain good performance and execution cost. To achieve high performance in a grid system it requires effective task partitioning, resource management and load balancing. The heterogeneous and dynamic nature of a grid, as well as the diverse demands of applications running on the grid, makes grid scheduling a major task. Existing schedulers in wide-area heterogeneous systems require a large amount of information about the application and the grid environment to produce reasonable schedules. However, this required information may not be available, may be too expensive to collect, or may increase the runtime overhead of the scheduler such that the scheduler is rendered ineffective. We believe that no one scheduler is appropriate for all grid systems and applications. This is because while data parallel applications in which further data partitioning is possible can be further improved by efficient management of resources, smart selection of resources and load balancing can be possible, in functional/not-dividable-task parallel applications such partitioning is either not possible or difficult or expensive in term of performance. In this paper, we propose a scheduler for data parallel applications (SDPA) which offers an efficient task partitioning and load balancing strategy for data parallel applications in grid environment. The proposed SDPA offers two major features: maintaining job priority even if insufficient number of free resources is available and pre-task assignment to cut the idle time of nodes. The SDPA selects nodes smartly according to the nature of task and the nodes’ resources availability. Simulation results conducted reveal that SDPA achieves performance improvement over reported strategies in the reviewed literature in terms of execution time, throughput and waiting time. 相似文献

3.

Dynamic software updates for parallel high‐performance applications

Dong Kwan Kim Eli Tilevich Calvin J. Ribbens 《Concurrency and Computation》2011,23(4):415-434

Despite using multiple concurrent processors, a typical high‐performance parallel application is long‐running, taking hours, even days to arrive at a solution. To modify a running high‐performance parallel application, the programmer has to stop the computation, change the code, redeploy, and enqueue the updated version to be scheduled to run, thus wasting not only the programmer's time, but also expensive computing resources. To address these inefficiencies, this article describes how dynamic software updates (DSU) can be used to modify a parallel application on the fly, thus saving the programmer's time and using expensive computing resources more productively. The net effect of updating parallel applications dynamically can reduce the total time that elapses between posing a problem and arriving at a solution, otherwise known as time‐to‐discovery. To explore the benefits of dynamic updates for high performance applications, this article takes a two‐pronged approach. First, we describe our experiences of building and evaluating a system for dynamically updating applications running on a parallel cluster. We then review a large body of literature describing the existing state of the art in DSU and point out how this research can be applied to high‐performance applications. Our experimental results indicate that DSU have the potential to become a powerful tool in reducing time‐to‐discovery for high‐performance parallel applications. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

4.

Elastic superposition task mapping for NoC-based reconfigurable systems

《Microprocessors and Microsystems》2017

With technology progress, more and more applications are integrated into a single chip. This requires a large number of processing elements (PEs) in a system, such that computation can be effectively enhanced through parallel processing. To support more efficient parallel processing, the Network-on-Chip (NoC) is being increasingly adopted as an interconnection architecture. Nevertheless, for NoC-based reconfigurable systems, the issue of mapping tasks to the PEs becomes more complex, due to the characteristic of hardware reconfiguration. This work proposes a novel Elastic Superposition Mapping (ESM) that introduces a useful PE reservation heuristic along with dynamic cross-application superposition. The ESM can provide a great elasticity for an NoC-based reconfigurable system to map more applications. Thus, the task load on PE will increase. Experiments show that, compared to the state-of-the-art mapping methods, 7% to 49% more applications can be executed, the average task load on PE can be increased by 5.5% to 56%, and the application waiting time can be reduced by 11% to 54%. 相似文献

5.

RFID并行化定位方法研究与实现

张博李军怀王志晓张璟《计算机工程与应用》2014,50(15):181-186

针对典型RFID室内定位算法普遍存在计算量大、实时性差等问题,通过对基于虚拟信号强度的RFID室内定位算法中路径损耗指数N、虚拟标签RSSI估计以及定位过程等并行化特点分析,从任务分解、任务映射和任务合并等方面给出了并行化计算的解决方案。同时,在虚拟参考标签RSSI值计算和定位匹配过程中,提出了基于区域划分的并行定位处理方法。实验结果表明该方法具有较高的实时性和加速比,而且与串行化算法相比具有较高的稳定性。相似文献

6.

A data-driven paradigm for mapping problems

《Parallel Computing》2015

We present a new data-driven paradigm for solving mapping problems on parallel computers. This paradigm targets at mapping data modules, instead of task modules, onto multiple processing cores. By dependency analysis of data modules, we devise a data movement matrix to reduce the need of manipulating task program modules at the expenses of handling data modules. To visualize and quantify the complex maneuver, we adopt the parallel activities trace graphs introduced earlier. To demonstrate the procedure and algorithmic values of our paradigm, we test it on the Strassen matrix multiplication and Cholesky matrix inversion algorithms. Mapping tasks has been more widely studied while mapping data is a new approach that appears to be more efficient for data-intensive applications that are becoming prevalent for today's parallel computers with millions of cores. 相似文献

7.

基于粒子群优化的异构多处理器任务调度算法

李静梅张博王雪《计算机应用研究》2012,29(10):3621-3624

为提高异构多处理器任务调度的执行效率,充分发挥多处理器并行性能,提出一种基于粒子群优化的异构多处理器任务调度算法——FPSOTTS算法。该算法以求得任务最短完成时间为目标,首先通过建立新的编码方式和粒子更新公式实现粒子搜索空间到离散空间的映射,使连续的粒子群优化算法适用于离散的异构多处理器任务调度问题;同时通过引入禁忌算法进行局部搜索,克服粒子群算法的早熟收敛现象,避免陷入局部最优。实验结果表明,FPSOTTS算法的执行效率优于Min-min算法和遗传算法,有效地降低任务的执行时间。FP-SOTTS算法很好地解决了异构多处理器任务调度问题,并且适合于大规模并行任务调度。相似文献

8.

Mapping of time-consuming multitask applications on a cloud system by multiobjective Differential Evolution

《Parallel Computing》2015

Cloud computing is on-demand provisioning of virtual resources aggregated together so that by specific contracts users can lease access to their combined power.Here we hypothesize a new form of service contract by means of which users do not explicitly require resources, but simply supply information about their time-consuming multitask applications and specify their needs through some quality of service (QoS) parameters. The individuation of the virtual machines (VMs) onto which map and execute them is left to the cloud manager. Unfortunately the task/node mapping, already known as NP-hard for conventional parallel systems, becomes more challenging when application tasks must be run on VMs hosted on heterogeneous and shared cloud nodes, and when it must comply with QoS requests too. To support this new cloud service, a novel mapper tool, based on a multiobjective Differential Evolution algorithm, is proposed. Such a tool defines the mapping of the tasks on the VMs with the aim to exploit as much as possible the available cloud resources without penalizing the execution time of the submitted applications and, at the same time, to respect users’ QoS requests.To reveal the robustness of this evolutionary tool, an experimental analysis on artificial time-consuming parallel applications, modeled as task interaction graphs, has been effected. 相似文献

9.

Mapping pipeline skeletons onto heterogeneous platforms

Anne Benoit Yves Robert 《Journal of Parallel and Distributed Computing》2008

Mapping applications onto parallel platforms is a challenging problem, that becomes even more difficult when platforms are heterogeneous — nowadays a standard assumption. A high-level approach to parallel programming not only eases the application developer’s task, but it also provides additional information which can help realize an efficient mapping of the application. 相似文献

10.

计算网格中基于时间均衡的并行粗粒度任务调度算法

胡艳丽张维明肖卫东汤大权《小型微型计算机系统》2008,29(1):124-129

考虑网格资源异构、自治、动态等特性,讨论本地用户具有强占优先权情况下的任务调度问题,提出了TBBS(Time-Balancing Based Scheduling Algorithm)算法.建立调度优化模型,以期望完成时间最小为目标选择执行任务的最佳资源组合.以时间均衡策略将任务分解并调度到资源上执行,减少了子任务同步时因等待而产生的延时,获得较好的并行计算性能.采用重复调度策略,适应计算网格中资源的特性. 相似文献

11.

Three‐dimensional thinning algorithms on graphics processing units and multicore CPUs

J. Jimnez J. Ruiz de Miras 《Concurrency and Computation》2012,24(14):1551-1571

Three‐dimensional curve skeletons are a very compact representation of three‐dimensional objects with many uses and applications in ﬁelds such as computer graphics, computer vision, and medical imaging. An important problem is that the calculation of the skeleton is a very time‐consuming process. Thinning is a widely used technique for calculating the curve skeleton because of the properties it ensures and the ease of implementation. In this paper, we present parallel versions of a thinning algorithm for eﬃcient implementation in both graphics processing units and multicore CPUs. The parallel programming models used in our implementations are Compute Uniﬁed Device Architecture (CUDA) and Open Computing Language (OpenCL). The speedup achieved with the optimized parallel algorithms for the graphics processing unit achieves 106.24x against the CPU single‐process version and more than 19x over the CPU multithreaded version. Copyright © 2011 John Wiley & Sons, Ltd. 相似文献

12.

Parallel computation on multilayer cluster grids

R. Zarrelli M. Petrone 《Concurrency and Computation》2007,19(13):1865-1878

Cluster architectures are increasingly used to solve high‐performance computing applications. To build more computational power, sets of clusters, interconnected by high‐speed networks, can be used in an elaboration to form a cluster grid. In this type of architecture, it is difficult to exploit all the internal resources of a cluster, because each one can be shielded by a firewall and is usually configured with machines where there is only one visible IP front‐end node that hides all its internal nodes from the external world. The exploitation of resources is even more complicated if we consider the general case where each internal node of a cluster can be a front‐end node of an another cluster. This type of architecture has been defined as a multilayer cluster grid. In this paper, a Parallel Virtual Machine (PVM) extension is presented which, through a middleware solution based on the H2O distributed metacomputing framework, permits the building of a parallel virtual machine in a multilayer cluster grid environment. In addition, the existing code written for PVM can be executed into this environment without modifications. Copyright © 2007 John Wiley & Sons, Ltd. 相似文献

13.

Parallel Implementation of a Class of Adaptive Signal Processing Applications

M. Lee W. Liu V. K. Prasanna 《Algorithmica》2001,30(4):645-684

Recently, High Performance Computing (HPC) platforms have been employed to realize many computationally demanding applications in signal and image processing. These applications require real-time performance constraints to be met. These constraints include latency as well as throughput. In order to meet these performance requirements, efficient parallel algorithms are needed. These algorithms must be engineered to exploit the computational characteristics of such applications. In this paper we present a methodology for mapping a class of adaptive signal processing applications onto HPC platforms such that the throughput performance is optimized. We first define a new task model using the salient computational characteristics of a class of adaptive signal processing applications. Based on this task model, we propose a new execution model. In the earlier linear pipelined execution model, the task mapping choices were restricted. The new model permits flexible task mapping choices, leading to improved throughput performance compared with the previous model. Using the new model, a three-step task mapping methodology is developed. It consists of (1) a data remapping step, (2) a coarse resource allocation step, and (3) a fine performance tuning step. The methodology is demonstrated by designing parallel algorithms for modern radar and sonar signal processing applications. These are implemented on IBM SP2 and Cray T3E, state-of-the-art HPC platforms, to show the effectiveness of our approach. Experimental results show significant performance improvement over those obtained by previous approaches. Our code is written using C and the Message Passing Interface (MPI). Thus, it is portable across various HPC platforms. Received April 8, 1998; revised February 2, 1999. 相似文献

14.

Security‐aware scheduling model for computational grid

Rekha Kashyap Deo Prakash Vidyarthi 《Concurrency and Computation》2012,24(12):1377-1391

Grid applications with stringent security requirements introduce challenging concerns because the schedule devised by nonsecurity‐aware scheduling algorithms may suffer in scheduling security constraints tasks. To make security‐aware scheduling, estimation and quantification of security overhead is necessary. The proposed model quantifies security, in the form of security levels, on the basis of the negotiated cipher suite between task and the grid‐node and incorporates it into existing heuristics MinMin and MaxMin to make it security‐aware MinMin(SA) and MaxMin(SA). It also proposes SPMaxMin (Security Prioritized MinMin) and its comparison with three heuristics MinMin(SA), MaxMin(SA), and SPMinMin on heterogeneous grid/task environment. Extensive computer simulation results reveal that the performance of the various heuristics varies with the variation in computational and security heterogeneity. Its analysis over nine heterogeneous grid/task workload situations indicates that an algorithm that performs better for one workload degrades in another. It is conspicuous that for a particular workload one algorithm gives better makespan while another gives better response time. Finally, a security‐aware scheduling model is proposed, which adapts itself to the dynamic nature of the grid and picks the best suited algorithm among the four analyzed heuristics on the basis of job characteristics, grid characteristics, and desired performance metric. Copyright © 2011 John Wiley & Sons, Ltd. 相似文献

15.

A layered method for process and code mapping

Stephen Taylor Evyatar Av-Ron Ehud Shapiro 《New Generation Computing》1987,5(2):185-205

相似文献

16.

云环境下的一种并行任务划分方法研究

牛伟伟张千《计算机时代》2014,(6):15-18

并行任务划分一直是高性能计算的研究重点。结合地震资料数据处理的应用云环境,以任务运行时间估计模型作为优化目标函数,提出了一种改进的粒子群优化算法,用以解决地震资料任务划分问题。仿真实验证明,改进后的算法增强了全局搜索能力,提高了收敛速度和收敛精度,有效提高了云环境下任务的执行效率。相似文献

17.

Grid applications on distributed memory architectures: Implementation and evaluation

Karl Solchenbach 《Parallel Computing》1988,7(3):341-356

It was shown in the paper of Solchenbach and Trottenberg (in this special issue) that grid algorithms are inherently parallel and that parallel grid algorithms for regular grids can be efficiently implemented on dm-mp systems using the concept of grid partitioning.

In this paper, we demonstrate that grid applications can be implemented quite easily on dm-mp systems if a hardware-independent process system exists and convenient tools (such as the SUPRENUM mapping and communications library) are available.

The evaluation of parallel grid algorithms shows that the multiprocessor speedup and efficiency for single grid applications depends on the communication/calculation performance ratio of the hardware, on the communication/calculation ratio of the algorithms, and on the process size. The efficiency of parallel multigrid algorithms additionally depends on the number of nodes. 相似文献

18.

一种低管理开销的网格计算模型

下载免费PDF全文

何凯杨学刚周庶恺杨愚鲁《计算机工程与应用》2008,44(8):123-127

针对利用Internet上大量空闲计算资源来解决大规模分布式计算问题这一需求,提出了一种低管理开销的网格计算模型。在该模型中,不存在任何节点来管理动态变化的资源,而与之相适应的信息机制、任务调度算法和有限任务复制算法在没有管理节点存在的情况下,以较低的开销使系统在动态的环境中达到自然的协调,实现大规模的分布计算。开发的仿真软件验证了该模型的有效性,并对相关结果进行了初步的性能分析;仿真结果表明,该模型在动态的环境中负载分布合理,资源的计算能力能得到充分利用,为高效地完成参数扫描、蒙特卡罗模拟等大规模易并行计算提供了一个可行的方法。相似文献

19.

Performance modeling of parallel applications for grid scheduling

H.A. Sathish 《Journal of Parallel and Distributed Computing》2008,68(8):1135-1145

Grids consist of both dedicated and non-dedicated clusters. For effective mapping of parallel applications on grid resources, a grid metascheduler has to evaluate different sets of resources in terms of predicted execution times for the applications when executed on the sets of resources. In this work, we have developed a comprehensive set of performance modeling strategies for predicting execution times of parallel applications on both dedicated and non-dedicated environments. Our strategies adapt to changing network and CPU loads on the grid resources. We have evaluated our strategies on 8, 16, 24 and 32-node clusters with random loads and load traces from a grid system. Our strategies give less than 30% average percentage prediction errors in all cases, which, to our knowledge, is the best reported for non-dedicated environments. We also found that grid scheduling using predictions of execution times from our performance modeling techniques will lead to perfect mapping of applications to resources in many cases. 相似文献

20.

Reclaiming the energy of a schedule: models and algorithms

Guillaume Aupy Anne Benoit Fanny Dufoss Yves Robert 《Concurrency and Computation》2013,25(11):1505-1523

We consider a task graph to be executed on a set of processors. We assume that the mapping is given, say by an ordered list of tasks to execute on each processor, and we aim at optimizing the energy consumption while enforcing a prescribed bound on the execution time. Although it is not possible to change the allocation of a task, it is possible to change its speed. Rather than using a local approach such as backfilling, we consider the problem as a whole and study the impact of several speed variation models on its complexity. For continuous speeds, we give a closed‐form formula for trees and series–parallel graphs, and we cast the problem into a geometric programming problem for general directed acyclic graphs. We show that the classical dynamic voltage and frequency scaling (DVFS) model with discrete modes leads to an NP‐complete problem, even if the modes are regularly distributed (an important particular case in practice, which we analyze as the incremental model). On the contrary, the Vdd‐hopping model that allows to switch between different supply voltages (V_DD) while executing a task leads to a polynomial solution. Finally, we provide an approximation algorithm for the incremental model, which we extend for the general DVFS model. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献