Similar Documents
20 similar documents found (search time: 15 ms).
1.
In this paper, we present a new MapReduce framework, called Grex, designed to leverage general-purpose graphics processing units (GPUs) for parallel data processing. Grex provides several new features. First, it supports a parallel split method to tokenize input data of variable sizes, such as words in e-books or URLs in web documents, in parallel using GPU threads. Second, Grex evenly distributes data to map/reduce tasks to avoid data partitioning skews. In addition, Grex provides a new memory management scheme to enhance the performance by exploiting the GPU memory hierarchy. Notably, all these capabilities are supported via careful system design without requiring any locks or atomic operations for thread synchronization. The experimental results show that our system is up to 12.4× and 4.1× faster than two state-of-the-art GPU-based MapReduce frameworks for the tested applications.
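
As an aside, the parallel split idea summarized above can be sketched in CUDA as follows. This is not Grex's actual code; the kernel, buffer names, and delimiter set are illustrative assumptions showing how one thread per input byte can mark token boundaries so that variable-sized tokens are found without a serial scan.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Assumed delimiter set for the illustration.
    __device__ bool is_delim(char c) {
        return c == ' ' || c == '\n' || c == '\t';
    }

    // One thread per byte: a token starts where a non-delimiter follows a delimiter
    // (or the beginning of the buffer).
    __global__ void mark_token_starts(const char* text, int n, int* is_start) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        bool prev_is_delim = (i == 0) || is_delim(text[i - 1]);
        is_start[i] = (!is_delim(text[i]) && prev_is_delim) ? 1 : 0;
    }

    int main() {
        const char h_text[] = "one two  three";
        const int n = sizeof(h_text) - 1;
        char* d_text; int* d_start;
        cudaMalloc(&d_text, n);
        cudaMalloc(&d_start, n * sizeof(int));
        cudaMemcpy(d_text, h_text, n, cudaMemcpyHostToDevice);
        mark_token_starts<<<(n + 255) / 256, 256>>>(d_text, n, d_start);
        int h_start[16];
        cudaMemcpy(h_start, d_start, n * sizeof(int), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i) printf("%d", h_start[i]);
        printf("\n");
        cudaFree(d_text); cudaFree(d_start);
        return 0;
    }

A prefix sum over is_start (for example with Thrust) would then assign contiguous token indices, which is what makes an even distribution of tokens to map tasks possible.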

2.
3.
This paper addresses the speedup of the numerical solution of shallow-water systems in 2D domains by using modern graphics processing units (GPUs). A first-order well-balanced finite volume numerical scheme for 2D shallow-water systems is considered. The potential data parallelism of this method is identified and the scheme is efficiently implemented on GPUs for one-layer shallow-water systems. Numerical experiments performed on several GPUs show the high efficiency of the GPU solver in comparison with a highly optimized implementation of a CPU solver.
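
To make the data parallelism of such a scheme concrete, the following CUDA kernel sketches a first-order finite-volume update for the 1D shallow-water equations with a Rusanov flux, one thread per cell. The paper's scheme is 2D and well-balanced (including bed-slope source terms), so this is only a structural illustration under those simplifying assumptions; host setup and boundary handling are omitted.

    // Shallow-water flux F(U) for U = (h, hu): F = (hu, hu*u + 0.5*g*h^2).
    __device__ void flux(float h, float hu, float g, float* f0, float* f1) {
        float u = (h > 1e-8f) ? hu / h : 0.0f;
        *f0 = hu;
        *f1 = hu * u + 0.5f * g * h * h;
    }

    // One thread per cell: first-order update with Rusanov (local Lax-Friedrichs) fluxes.
    __global__ void fv_step(const float* h, const float* hu,
                            float* h_new, float* hu_new,
                            int n, float g, float dt_dx) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= 0 || i >= n - 1) return;                 // boundary cells handled elsewhere

        float fl0, fl1, fc0, fc1, fr0, fr1;
        flux(h[i - 1], hu[i - 1], g, &fl0, &fl1);
        flux(h[i],     hu[i],     g, &fc0, &fc1);
        flux(h[i + 1], hu[i + 1], g, &fr0, &fr1);

        // Local wave speeds |u| + sqrt(g*h) for the dissipation term.
        float ul = (h[i-1] > 1e-8f) ? hu[i-1] / h[i-1] : 0.0f;
        float uc = (h[i]   > 1e-8f) ? hu[i]   / h[i]   : 0.0f;
        float ur = (h[i+1] > 1e-8f) ? hu[i+1] / h[i+1] : 0.0f;
        float a_left  = fmaxf(fabsf(ul) + sqrtf(g * h[i-1]), fabsf(uc) + sqrtf(g * h[i]));
        float a_right = fmaxf(fabsf(uc) + sqrtf(g * h[i]),   fabsf(ur) + sqrtf(g * h[i+1]));

        // Numerical fluxes at the left (i-1/2) and right (i+1/2) interfaces.
        float Fl0 = 0.5f * (fl0 + fc0) - 0.5f * a_left  * (h[i]  - h[i-1]);
        float Fl1 = 0.5f * (fl1 + fc1) - 0.5f * a_left  * (hu[i] - hu[i-1]);
        float Fr0 = 0.5f * (fc0 + fr0) - 0.5f * a_right * (h[i+1]  - h[i]);
        float Fr1 = 0.5f * (fc1 + fr1) - 0.5f * a_right * (hu[i+1] - hu[i]);

        h_new[i]  = h[i]  - dt_dx * (Fr0 - Fl0);
        hu_new[i] = hu[i] - dt_dx * (Fr1 - Fl1);
    }

Each cell update reads only its immediate neighbours, which is why the one-thread-per-cell mapping suits the GPU so well.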

4.
Techniques for applying graphics processing units (GPUs) to general-purpose, non-graphics computations, proposed in recent years by ATI (AMD FireStream, 2006) and NVIDIA (CUDA: Compute Unified Device Architecture, 2007), have given an impetus to developing algorithms and software packages for solving problems of diffractive optics on the GPU. Computations based on the widespread Ray Tracing method were among the first to be implemented on the GPU. The method attracted the attention of the CUDA architects, who proposed a GPU-based implementation of it at the NVISION08 conference (2008). The potential of this direction lies both in research into the general issues of mapping the Ray Tracing method onto the GPU architecture (involving the use of various grid domains and trees) and in the development of dedicated software packages (the RTE and Linzik projects). In this work, special attention is given to an overview of techniques for GPU-aided implementation of the FDTD (finite-difference time-domain) method, which provides an instrument for solving problems of micro- and nano-optics using rigorous electromagnetic theory. The review of related papers ranges from the initial research (based on the use of textures) to complete software solutions (such as FDTD Software and FastFDTD).

5.
INTRANS is a man-computer interactive graphics system, intended for analysis of urban and transportation planning problems. It is designed to operate primarily under time sharing on IBM 360/370 computers. The paper describes the functional design and the structure of the data management of INTRANS.

The data management is designed to answer the specific needs of planning applications: large data-sets, comparative analysis of several alternatives, and interface with batch processing computer programs. At the same time, the system is designed to operate within an environment of extremely limited resources of core and computing time.


6.
7.
The progress made in accelerating simulations of fluid flow using GPUs, and the challenges that remain, are surveyed. The review first provides an introduction to GPU computing and programming, and discusses various considerations for improved performance. Case studies comparing the performance of CPU- and GPU-based solvers for the Laplace and incompressible Navier–Stokes equations are performed in order to demonstrate the potential improvement even with simple codes. Recent efforts to accelerate CFD simulations using GPUs are reviewed for laminar, turbulent, and reactive flow solvers. Also, GPU implementations of the lattice Boltzmann method are reviewed. Finally, recommendations for implementing CFD codes on GPUs are given and remaining challenges are discussed, such as the need to develop new strategies and redesign algorithms to enable GPU acceleration.
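
For reference, the "simple codes" used in such Laplace case studies are typically Jacobi sweeps along the lines of the sketch below (one thread per grid node; an illustration, not code taken from the surveyed papers).

    // Jacobi sweep for the 2D Laplace equation on an nx-by-ny grid (row-major).
    // Boundary nodes hold Dirichlet values and are left untouched.
    __global__ void jacobi_laplace(const float* u_old, float* u_new, int nx, int ny) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // column index
        int j = blockIdx.y * blockDim.y + threadIdx.y;   // row index
        if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
        int idx = j * nx + i;
        u_new[idx] = 0.25f * (u_old[idx - 1] + u_old[idx + 1] +
                              u_old[idx - nx] + u_old[idx + nx]);
    }

A host loop would launch this with, for example, dim3 block(16, 16) and dim3 grid((nx + 15) / 16, (ny + 15) / 16), swapping u_old and u_new between iterations until the residual drops below a tolerance.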

8.
Road network microsimulation is computationally expensive, and existing state-of-the-art commercial tools use task parallelism and coarse-grained data parallelism on multi-core processors to achieve improved performance. An alternative is to use Graphics Processing Units (GPUs) and fine-grained data parallelism. This paper describes a GPU-accelerated, agent-based microsimulation model of a road network transport system. Its performance on a procedurally generated grid network is evaluated against that of an equivalent multi-core CPU simulation. To utilise GPU architectures effectively, the paper describes an approach for graph traversal of neighbouring information, which is vital to achieving high computational performance. The graph traversal approach has been integrated within a GPU agent-based simulation framework as a generalised message traversal technique for graph-based communication. Speed-ups of up to 43× are demonstrated, with improved performance scaling behaviour. Simulation of over half a million vehicles and nearly two million detectors at a rate 25× faster than real time is obtained on a single GPU.
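
The graph-based message traversal described above can be pictured with the following hedged CUDA sketch: the road network's edge connectivity is stored in CSR form, messages from agents are grouped per edge, and each vehicle thread scans its own edge plus the edges connected to it. All array names, the Message layout, and the gap computation are assumptions for illustration, not the framework's API.

    struct Message { float position; float speed; };   // assumed per-agent message

    // One thread per vehicle: find the nearest leading vehicle by reading messages
    // from the current edge and from connected edges (CSR adjacency of the road graph).
    __global__ void read_neighbour_messages(const int* edge_adj_ptr, const int* edge_adj,
                                            const int* msg_start, const int* msg_count,
                                            const Message* messages,
                                            const int* vehicle_edge, const float* vehicle_pos,
                                            const float* edge_length,
                                            float* nearest_gap, int n_vehicles) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= n_vehicles) return;
        int e = vehicle_edge[v];
        float gap = 1e30f;

        // Messages from vehicles ahead on the same edge.
        for (int m = msg_start[e]; m < msg_start[e] + msg_count[e]; ++m) {
            float d = messages[m].position - vehicle_pos[v];
            if (d > 0.0f && d < gap) gap = d;
        }
        // Messages from vehicles on connected downstream edges.
        float to_edge_end = edge_length[e] - vehicle_pos[v];
        for (int k = edge_adj_ptr[e]; k < edge_adj_ptr[e + 1]; ++k) {
            int e2 = edge_adj[k];
            for (int m = msg_start[e2]; m < msg_start[e2] + msg_count[e2]; ++m) {
                float d = to_edge_end + messages[m].position;
                if (d < gap) gap = d;
            }
        }
        nearest_gap[v] = gap;
    }

Grouping messages per edge (msg_start/msg_count) is what turns the traversal into short, bounded scans over neighbouring edges rather than a search over all agents.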

9.

Biologically-inspired computer vision is a research area that offers prominent directions in a wide variety of fields. Several processing algorithms inspired by natural vision enable the detection of moving objects in video sequences. One example is lateral interaction in accumulative computation (LIAC), a classical bio-inspired method that has been applied to numerous environments and applications. LIAC is the computer-vision implementation of two biologically-inspired methods known as algorithmic lateral interaction and accumulative computation. The method has traditionally reached high precision but unfortunately requires long computing times. This paper introduces a proposal based on graphics processing units to speed up the original sequential code. In this way, not only is excellent accuracy maintained, but real-time performance is also obtained. A speed-up of 67× of the parallel implementation over its sequential counterpart is achieved for several tested video sequences.


10.
In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). We develop asynchronous iteration algorithms in CUDA and compare them with parallel implementations of synchronous relaxation methods on CPU- and GPU-based systems. For a set of test matrices from UFMC we investigate convergence behavior, performance, and tolerance to hardware failure. We observe that even our most basic asynchronous relaxation scheme can efficiently leverage the GPU's computing power and, despite its lower convergence rate compared to Gauss–Seidel relaxation, is still able to provide solution approximations of a certain accuracy in considerably less time than Gauss–Seidel running on CPUs or GPU-based Jacobi. Hence, it overcompensates for the slower convergence by exploiting the scalability and the good fit of asynchronous schemes to highly parallel GPU architectures. Further, by enhancing the most basic asynchronous approach with hybrid schemes (using multiple iterations within the "subdomain" handled by a GPU thread block), we manage not only to recover the loss of global convergence but often to accelerate convergence by up to two times, while keeping the execution time of a global iteration practically the same. Combined with the advantageous properties of asynchronous iteration methods with respect to hardware failure, this identifies the high potential of asynchronous methods for exascale computing.
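
The flavour of such an asynchronous scheme can be conveyed with the hedged CUDA sketch below (not the authors' kernels): each thread repeatedly relaxes one unknown of Ax = b using whatever values of x are currently visible in global memory, with no synchronization between sweeps; the inner loop crudely mimics performing several local iterations before results are exchanged globally. CSR storage and all names are assumptions.

    // Asynchronous relaxation sketch for Ax = b, with A stored in CSR format.
    // Reads of x[j] may return stale values written by other threads; that is the point.
    __global__ void async_relax(const int* row_ptr, const int* col_idx, const double* val,
                                const double* b, double* x, int n, int local_iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        for (int it = 0; it < local_iters; ++it) {
            double diag = 0.0, sigma = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                int j = col_idx[k];
                if (j == i) diag = val[k];
                else        sigma += val[k] * x[j];    // possibly stale neighbour value
            }
            x[i] = (b[i] - sigma) / diag;              // component-wise Jacobi-type update
        }
    }

Repeated host-side launches of this kernel play the role of global iterations; as in classical asynchronous iteration theory, convergence requires suitable conditions on A (for example, diagonal dominance).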

11.
Molecular dynamics (MD) methods compute the trajectory of a system of point particles in response to a potential function by numerically integrating Newton's equations of motion. Extending these basic methods with rigid body constraints enables composite particles with complex shapes such as anisotropic nanoparticles, grains, molecules, and rigid proteins to be modeled. Rigid body constraints are added to the GPU-accelerated MD package HOOMD-blue, version 0.10.0. The software can now simulate systems of particles, rigid bodies, or mixed systems in microcanonical (NVE), canonical (NVT), and isothermal-isobaric (NPT) ensembles. It can also apply the FIRE energy minimization technique to these systems. In this paper, we detail the massively parallel scheme that implements these algorithms and discuss how our design is tuned for the maximum possible performance. Two case studies are included to demonstrate the performance attained: patchy spheres and tethered nanorods. In typical cases, HOOMD-blue on a single GTX 480 executes 2.5–3.6 times faster than LAMMPS executing the same simulation on any number of CPU cores in parallel. Simulations with rigid bodies may now be run with larger systems and for longer time scales on a single workstation than was previously possible even on large clusters.

12.
In many biomedical research laboratories, data analysis and visualization algorithms are typically prototyped in an interpreted programming language. If performance becomes an issue, they are ported to C and integrated with the interpreted systems, without fully utilizing object-oriented software development. This paper presents an overview of Scopira, an open-source C++ framework suitable for biomedical data analysis and visualization. Scopira provides high-performance, end-to-end application development features in the form of an extensible C++ library. This library provides general programming utilities, numerical matrices and algorithms, parallelization facilities, and graphical user interface elements.

13.
With advances in experimental devices and approaches, scientific data can be collected more easily, and some data sets are huge in size. The floating centroids method (FCM) has been proven to be a high-performance neural network classifier. However, the FCM has difficulty learning from large data sets, which restricts its practical application. In this study, a parallel floating centroids method (PFCM) based on the Compute Unified Device Architecture is proposed to speed up the FCM, especially for large data sets. This method performs all stages as a batch in one block. Blocks and threads are responsible for evaluating classifiers and performing subtasks, respectively. Experimental results indicate that both speed and accuracy are improved by this novel approach.
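
The block/thread division of labour described in the abstract corresponds roughly to the pattern sketched below (an assumption-laden illustration, not the PFCM code): each block scores one candidate classifier, threads stride over the samples, and a shared-memory reduction accumulates the number of correct predictions. The classify() stand-in is a hypothetical linear rule, not the real floating-centroids decision.

    // Hypothetical stand-in for the real FCM decision rule: a simple linear score.
    __device__ int classify(const float* params, const float* sample, int dim) {
        float s = 0.0f;
        for (int k = 0; k < dim; ++k) s += params[k] * sample[k];
        return s > 0.0f ? 1 : 0;
    }

    // Block = one candidate classifier, thread = a strided subset of samples.
    __global__ void score_classifiers(const float* params, int params_per_classifier,
                                      const float* samples, const int* labels,
                                      int n_samples, int dim, int* correct_out) {
        extern __shared__ int partial[];                 // one counter per thread
        const float* my_params = params + blockIdx.x * params_per_classifier;

        int count = 0;
        for (int s = threadIdx.x; s < n_samples; s += blockDim.x)
            if (classify(my_params, samples + s * dim, dim) == labels[s])
                ++count;
        partial[threadIdx.x] = count;
        __syncthreads();

        // Tree reduction within the block (blockDim.x assumed to be a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (threadIdx.x < stride)
                partial[threadIdx.x] += partial[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) correct_out[blockIdx.x] = partial[0];
    }

Launched, for instance, as score_classifiers<<<n_classifiers, 256, 256 * sizeof(int)>>>(...), a single kernel call would evaluate a whole batch of candidate classifiers at once, which is where a speed-up over a sequential loop would come from.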

14.
We present a Graphics Processing Unit (GPU) implementation of the level set method for topology optimization. The solution of three-dimensional topology optimization problems with millions of elements becomes computationally tractable with this GPU implementation and NVIDIA supercomputer-grade GPUs. We demonstrate this by solving the inverse homogenization problem for the design of isotropic materials with maximized bulk modulus. We trace the maximum bulk modulus optimization results to very high porosities to demonstrate the detail achievable with a high computational resolution. By utilizing a parallel GPU implementation rather than a sequential CPU implementation, similar increases in tractable computational resolution would be expected for other topology optimization problems.

15.
The development of a scalable preprocessing tool is a key step in accelerating the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool called ParTransgrid is developed to translate general grid formats such as the CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in the Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partitioning, and parallel mesh migration, which are linked together to resolve the run-time and memory-consumption bottlenecks of increasingly large grids. An inverted-index search strategy combined with a multi-master-slave communication paradigm is proposed to improve pairwise face-matching efficiency and reduce communication overhead when constructing the distributed sparse graph in the parallel mesh partitioning phase. We also present a simplified owner-update rule that speeds up the migration of raw partition boundaries and the building of the shared face/node communication mapping lists between new sub-meshes by an order of magnitude. Experimental results reveal that ParTransgrid scales easily to billion-scale grid CFD applications, and the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes.

16.
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study, we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms (MPI+OpenMP and MLP), and discuss OpenMP implementation issues that affect the performance of multi-level parallel applications.

17.
In the development of products involving fluids, computational fluid dynamics (CFD) is increasingly applied to investigate the flow associated with various product operating conditions or product designs. When CFD is heavily used, batch simulation is usually conducted, which cannot respond to changes in the flow regime when the fluid domain changes. To overcome this limitation, a rule-based intelligent CFD simulation system for steam simulation is proposed that analyzes a specific product design and generates a corresponding robust simulation model with accurate results. The rules used in the system are based on physical knowledge and CFD best practices, which makes the system easy to apply to other scenarios by changing the relevant knowledge base. Fluid physics features and dynamic physics features are used to model the intelligent functions of the system. Incorporating CAE boundary features, the CFD analysis view is fulfilled, which maintains information consistency in a multi-view feature modeling environment. The prototype software tool is developed in Python 3 with separated logic and settings. The effectiveness of the proposed system is demonstrated by case studies of a disk-type gate valve and a pipe reducer in a piping system.

18.
A new modular code called BOUT++ is presented, which simulates 3D fluid equations in curvilinear coordinates. Although aimed at simulating Edge Localised Modes (ELMs) in tokamak x-point geometry, the code is able to simulate a wide range of fluid models (magnetised and unmagnetised) involving an arbitrary number of scalar and vector fields, in a wide range of geometries. Time evolution is fully implicit, and 3rd-order WENO schemes are implemented. Benchmarks are presented for linear and non-linear problems (the Orszag-Tang vortex) showing good agreement. Performance of the code is tested by scaling with problem size and processor number, showing efficient scaling to thousands of processors. Linear initial-value simulations of ELMs using reduced ideal MHD are presented, and the results are compared to the ELITE linear MHD eigenvalue code. The resulting mode structures and growth rates are found to be in good agreement (γ_BOUT++ = 0.245 ω_A, γ_ELITE = 0.239 ω_A, with Alfvénic timescale 1/ω_A = R/V_A). To our knowledge, this is the first time dissipationless, initial-value simulations of ELMs have been successfully demonstrated.

19.
20.
Many visual tasks on modern personal devices such as smartphones rely heavily on graphics processing units (GPUs) for a fluent user experience. Because most GPUs for embedded systems are non-preemptive by nature, it is important to schedule GPU resources efficiently across multiple GPU tasks. We present a novel spatial resource sharing (SRS) technique for GPU tasks, called budget-reservation spatial resource sharing (BR-SRS) scheduling, which limits the number of GPU processing cores assigned to a job based on the job's priority. Such priority-driven resource assignment can prevent a high-priority foreground GPU task from being delayed by background GPU tasks. The BR-SRS scheduler is invoked only twice, at the arrival and completion of jobs, so the scheduling overhead is minimized as well. We evaluated the performance of our scheduling scheme on an Android-based smartphone and found that the proposed technique significantly improves the performance of high-priority tasks compared to previous temporal budget-based multi-task scheduling.

