1.

TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities

Hector Ortega-Arranz Yuri Torres Arturo Gonzalez-Escribano Diego R. Llanos 《International journal of parallel programming》2015,43(5):939-960

2.

uBench: exposing the impact of CUDA block geometry in terms of performance

Yuri Torres Arturo Gonzalez-Escribano Diego R. Llanos 《The Journal of supercomputing》2013,65(3):1150-1163

The choice of thread-block size and shape is one of the most important decisions a user makes when writing a parallel program for any CUDA architecture, because thread-block geometry has a significant impact on the overall performance of the program. Unfortunately, programmers do not have enough information about the subtle interactions between this choice of parameters and the underlying hardware. This paper presents uBench, a complete suite of micro-benchmarks designed to explore the performance impact of (1) the thread-block geometry choice criteria, and (2) the GPU hardware resources and configurations. Each micro-benchmark has been designed to be as simple as possible, so that it focuses on a single effect derived from the hardware and the thread-block parameter choice. As an example of the capabilities of this benchmark suite, this paper shows an experimental evaluation and comparison of the Fermi and Kepler architectures. Our study reveals that, in spite of the new hardware details introduced by Kepler, the principles underlying the block geometry selection criteria are similar for both architectures.

3.

Efficient heterogeneous programming with FPGAs using the Controller model

Gabriel Rodriguez-Canal Yuri Torres Francisco J. Andújar Arturo Gonzalez-Escribano 《The Journal of supercomputing》2021,77(12):13995-14010

The Controller model is a heterogeneous parallel programming model implemented as a library. It transparently manages the coordination, communication and kernel...

4.

Using the Xeon Phi Platform to Run Speculatively-Parallelized Codes

Alvaro Estebanez Diego R. Llanos Arturo Gonzalez-Escribano 《International journal of parallel programming》2017,45(2):225-241

Intel Xeon Phi accelerators are one of the newest devices used in the field of parallel computing. However, there are comparatively few studies of their performance with most of the existing parallelization techniques. One such technique is thread-level speculation, which optimistically tries to extract parallelism from loops without requiring a compile-time analysis that guarantees the loop can be executed in parallel. In this article we evaluate the performance delivered by an Intel Xeon Phi coprocessor when a state-of-the-art software library for thread-level speculative parallelization is used to execute well-known benchmarks. We describe both the internal characteristics of the Xeon Phi platform and the particularities of the thread-level speculation library used as a benchmark. Our results show that the Xeon Phi delivers a relatively good speedup, in terms of scalability, compared with a shared-memory architecture. However, its computational units have relatively low computing power when specific vectorization and SIMD instructions are not fully exploited. As a result, this first generation of Xeon Phi architectures is not competitive, in terms of absolute performance, with conventional multicore systems for the execution of speculatively parallelized code.

5.

The BonaFide C Analyzer: automatic loop-level characterization and coverage measurement

Sergio Aldea Diego R. Llanos Arturo Gonzalez-Escribano 《The Journal of supercomputing》2014,68(3):1378-1401

The advent of multicore technologies has increased interest in parallelization techniques for existing sequential applications. These techniques require detecting loops that are good candidates for parallelization and classifying all variables of these loops according to their use, a task that is surprisingly hard to carry out manually. In this paper, we introduce the BonaFide C Analyzer, an XML-based framework that combines static analysis of source code with profiling information to generate complete reports on all loops in a C application, including loop coverage, loop suitability for parallelization, a classification of all variables inside loops based on their accesses, and other hurdles that restrict parallelization. This information makes it possible to analyze how particular language constructs are used in real-world applications, and helps the programmer to parallelize the code. To show the features of the framework, we present the results of an in-depth loop characterization of C applications that are part of the SPEC CPU2006 benchmark suite. Our study shows that 47.72% of the loops present in the analyzed applications are potentially parallelizable with existing parallel programming models such as OpenMP, while an additional 37.7% of loops could be run in parallel with the help of runtime speculative parallelization techniques.

6.

A multi-device version of the HYFMGPU algorithm for hyperspectral scenes registration

Jorge Fernández-Fabeiro Álvaro Ordóñez Arturo Gonzalez-Escribano Dora B. Heras 《The Journal of supercomputing》2019,75(3):1551-1564

Hyperspectral image registration is a relevant task for real-time applications like environmental disasters management or search and rescue scenarios. Traditional...

7.

Toward a BLAS library truly portable across different accelerator types

Eduardo Rodriguez-Gutiez Ana Moreton-Fernandez Arturo Gonzalez-Escribano Diego R. Llanos 《The Journal of supercomputing》2019,75(11):7101-7124

Scientific applications are some of the most computationally demanding software pieces. Their core is usually a set of linear algebra operations, which may represent...

8.

Comprehensive Evaluation of a New GPU-based Approach to the Shortest Path Problem

Hector Ortega-Arranz Yuri Torres Arturo Gonzalez-Escribano Diego R. Llanos 《International journal of parallel programming》2015,43(5):918-938

9.

New Data Structures to Handle Speculative Parallelization at Runtime

Alvaro Estebanez Diego R. Llanos Arturo Gonzalez-Escribano 《International journal of parallel programming》2016,44(3):407-426

10.

Extending a hierarchical tiling arrays library to support sparse data partitioning

Javier Fresno Arturo Gonzalez-Escribano Diego R. Llanos 《The Journal of supercomputing》2013,64(1):59-68

Layout methods for dense and sparse data are often seen as two separate problems, each with its own particular techniques. However, they are based on the same basic concepts. This paper studies how to integrate automatic data-layout and partition techniques for both dense and sparse data structures. In particular, we show how to include support for sparse matrices or graphs in Hitmap, a library for hierarchical tiling and automatic mapping of arrays. The paper shows that it is possible to offer a single interface for working with both dense and sparse data structures. Thus, the programmer can use a single, homogeneous programming style, which reduces development effort and simplifies the use of sparse data structures in parallel computations. Our experimental evaluation shows that this integration of techniques can be done effectively without compromising performance.