Similar Literature
20 similar documents found (search time: 15 ms)
1.
Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend with new functionality. However, in general, the use of Matlab in combination with a general-purpose multi-core architecture (CPU) offers limited performance when tackling the sparse linear algebra operations underlying the numerical methods involved in control theory. In this paper we extend Lyapack to leverage the computational power of graphics processors (GPUs). The experimental evaluation of a new CUDA-enabled solver for the Lyapunov equation, a crucial operation appearing in control theory problems, shows a significant runtime reduction when compared with the original CPU version of Lyapack, while retaining the usability of a Matlab-based implementation.
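As a rough illustration of the operation this solver targets, the sketch below solves a small continuous-time Lyapunov equation AX + XAᵀ + Q = 0 by Kronecker vectorization. This is not Lyapack's algorithm (Lyapack relies on iterative low-rank methods suited to large sparse problems); it is a naive dense O(n⁶) method, and the function name is chosen here for illustration only.

```python
import numpy as np

def solve_lyapunov_dense(A, Q):
    """Solve A X + X A^T + Q = 0 via the column-stacking identity
    (I kron A + A kron I) vec(X) = -vec(Q).  Dense and O(n^6):
    only useful for tiny sanity-check examples."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(A, I)
    x = np.linalg.solve(K, -Q.flatten(order="F"))
    return x.reshape((n, n), order="F")

# Residual check on a small stable A.
A = np.array([[-2.0, 0.0], [1.0, -3.0]])
Q = np.eye(2)
X = solve_lyapunov_dense(A, Q)
residual = A @ X + X @ A.T + Q
```

For a stable A the residual should vanish to machine precision, which makes this a convenient reference when validating a faster GPU solver on small inputs.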

2.
For virtual scenes containing large numbers of rigid bodies, we propose a real-time simulation algorithm implemented on the GPU that exploits the GPU's parallel computing power to process rigid-body interactions and update rigid-body states in real time. Depth peeling is used to discretize each rigid body into a set of uniformly sized particles, so that rigid-body interactions in each frame are realized through particle-particle interactions; the interaction between colliding particle pairs is computed with the discrete element method. A uniform grid is used to partition the simulation domain and accelerate collision detection. Experiments show that the proposed algorithm substantially improves the speed of simulating large numbers of rigid bodies.
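The uniform-grid broad phase described above can be sketched in plain Python as a serial stand-in for the GPU kernel. The function name and the choice of cell size (one particle diameter) are assumptions made here for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict

def find_colliding_pairs(positions, radius):
    """Uniform-grid broad phase: hash each particle into a cubic cell of
    side 2*radius, then test distances only against particles in the same
    or one of the 26 neighboring cells."""
    cell = 2.0 * radius
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(i)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in members:
                        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                            if i < j:  # report each pair once
                                px, py, pz = positions[i]
                                qx, qy, qz = positions[j]
                                d2 = (px-qx)**2 + (py-qy)**2 + (pz-qz)**2
                                if d2 < (2 * radius) ** 2:
                                    pairs.add((i, j))
    return pairs

pairs = find_colliding_pairs(
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 10.0, 10.0)], radius=0.4)
```

On a GPU, each cell (or each particle) becomes an independent work item, which is what makes this partitioning effective for large particle counts.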

3.
P systems are inherently parallel and non-deterministic theoretical computing devices defined inside the field of Membrane Computing. Many P system simulators have been presented in this area, but they are inefficient since they cannot handle the parallelism of these devices. Nowadays, we are witnessing the consolidation of the GPUs as a parallel framework to compute general purpose applications. In this paper, we analyse GPUs as an alternative parallel architecture to improve the performance in the simulation of P systems, and we illustrate it by using the case study of a family of P systems that provides an efficient and uniform solution to the SAT problem. Firstly, we develop a simulator that fully simulates the computation of the P system, demonstrating that GPUs are well suited to simulate them. Then, we adapt this simulator to the GPU architecture idiosyncrasies, improving the performance of the previous simulator.

4.
Visualized Optimization Design Using Parallel Coordinates   (cited 1 time: 0 self-citations, 1 by others)
We apply a visualization method to the optimization design of chemical engineering processes: multidimensional vectors are visualized with parallel coordinates, and on this basis a "scan line" algorithm is proposed for linear regression of multidimensional vectors in n-dimensional space. In optimization problems, this algorithm can reduce the problem to one or two dimensions, so that conventional visualization methods based on two- or three-dimensional Cartesian coordinates can be used for optimization design. Combining this method with a genetic algorithm not only yields good optimization results, but also provides "extra information" that helps users understand the visualization-based optimization process and accept its results.

5.
With the rapid growth of population and economic development in northwest China, over-exploitation of water resources has led to severe deterioration of watershed ecosystems. In this study, we developed a hydrological information platform, the Watershed Datacenter System (WDC), for sharing, managing, analyzing and visualizing the diverse range of hydrologic data collected at the watershed scale. The platform helps investigators and geotechnical experts conduct data-intensive watershed research. The WDC is developed with the Entity Framework 6 (EF6) approach, based on the Model-View-Controller (MVC) architecture pattern, together with several other technologies such as the ArcGIS API and responsive web design. The Observations Data Model (ODM), hydrological models as a service, Web services and time-series analysis tools are seamlessly integrated into the WDC with the help of the open-source HIS (Hydrologic Information System) from CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.). The results demonstrate that the WDC offers quite a few advantageous features for managing and analyzing data for watershed research.

6.
Image segmentation is an important process that facilitates image analysis, such as object detection. Because of its importance, many different algorithms have been proposed in the last decade to enhance image segmentation techniques. Clustering algorithms are among the most popular in image segmentation. The proposed algorithms differ in their accuracy and computational efficiency. This paper studies the most famous and new clustering algorithms and provides an analysis of their feasibility for parallel implementation. We have studied four algorithms: fuzzy C-mean, type-2 fuzzy C-mean, interval type-2 fuzzy C-mean, and modified interval type-2 fuzzy C-mean. We have implemented them in a sequential (CPU-only) and a parallel hybrid CPU–GPU version. Speedup gains of 6× to 20× were achieved in the parallel implementation over the sequential implementation. We detail in this paper our discoveries on the portions of the algorithms that are highly parallel, so as to help the image processing community, especially if these algorithms are to be used in real-time processing where efficient computation is critical.
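A minimal NumPy sketch of the baseline fuzzy C-means iteration studied in the paper (a sequential reference only; the per-point distance and membership updates are the embarrassingly parallel portions a GPU version would offload). The function signature and hyperparameter defaults are choices made here for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: alternate membership and center updates.
    X: (n, d) data; c: number of clusters; m: fuzzifier (> 1).
    Returns (centers (c, d), memberships U (c, n))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # distance of every point to every center, parallel across points
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
    return centers, U

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
centers, U = fuzzy_c_means(X, c=2)
```

On two well-separated 1-D clusters the centers converge to the cluster means, which makes this a handy correctness oracle for a GPU port.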

7.
Detecting self-intersections within a triangular mesh model is fundamentally a quadratic problem in terms of its computational complexity, since in principle all triangles must be compared with all others. We reflect the 2D nature of this process by storing the triangles as multiple 1D textures in texture memory, and then exploit the massive parallelism of graphics processing units (GPUs) to perform pairwise comparisons, using a pixel shader. This approach avoids the creation and maintenance of auxiliary geometric structures, such as a bounding volume hierarchy (BVH); but nevertheless we can plug in auxiliary culling schemes, and use stencils to indicate triangle pairs that do not need to be compared. To overcome the readback bottleneck between GPU and CPU, we use a hierarchical encoding scheme. We have applied our technique to detecting self-intersections in extensively deformed models, and we achieve an order of magnitude increase in performance over CPU-based techniques such as [17].

8.
The Journal of Supercomputing - Border tracking in binary images is an important kernel for many applications. There are very efficient sequential algorithms, most notably, the algorithm proposed...

9.
Classification using Ant Programming is a challenging data mining task which demands a great deal of computational resources when handling data sets of high dimensionality. This paper presents a new parallelization approach of an existing multi-objective Ant Programming model for classification, using GPUs and the NVIDIA CUDA programming model. The computational costs of the different steps of the algorithm are evaluated and it is discussed how best to parallelize them. The features of both the CPU parallel and GPU versions of the algorithm are presented. An experimental study is carried out to evaluate the performance and efficiency of the interpreter of the rules, and reports the execution times and speedups regarding variable population size, complexity of the rules mined and dimensionality of the data sets. Experiments measure the original single-threaded and the new multi-threaded CPU and GPU times with different number of GPU devices. The results are reported in terms of the number of Giga GP operations per second of the interpreter (up to 10 billion GPops/s) and the speedup achieved (up to 834× vs CPU, 212× vs 4-threaded CPU). The proposed GPU model is demonstrated to scale efficiently to larger datasets and to multiple GPU devices, which allows the expansion of its applicability to significantly more complicated data sets, previously unmanageable by the original algorithm in reasonable time.

10.
11.
Carbon allotropes exhibit an enormous range of properties due to their varied hybridizations. Different hybridizations are correlated with different geometrical structures. We address the visualization of both the geometry and the electronic density for different cases including diamonds with defects and mixed diamond/graphite/amorphous samples, in order to explore this connection more deeply.

12.
In the context of realistic image synthesis, many stochastic methods have been proposed to sample direct and indirect radiance. We present new ways to use graphics hardware to sample direct and indirect lighting in a scene. Jittered sampling of light sources can easily be implemented in a fragment program to obtain soft shadow samples. Using a voxel representation of the scene, indirect illumination can be computed using hemispherical jittered sampling. These algorithms have been implemented in our rendering framework but can be used in other contexts like radiosity or final gathering of the photon map.

13.
Multimedia Tools and Applications - Medical images have an undeniably integral role in the process of diagnosing and treating a very large number of ailments. Processing such images (for...

14.
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving the memory bandwidth; (2) loop unrolling for hiding the memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also show an analytical model to understand the reconstruction performance on multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512³-voxel volume from 360 512²-pixel projections. This performance is 41% higher than the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024³-voxel volume.
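The paper's analytical model is not reproduced in the abstract, but a generic bandwidth-bound throughput model of the same flavor can be sketched as follows. The assumption that one backprojection pass reads and writes the whole volume once, and all concrete numbers in the example call, are illustrative choices made here, not figures from the paper.

```python
def projections_per_second(volume_voxels, bytes_per_voxel,
                           peak_bandwidth_gbs, efficiency, num_gpus=1):
    """Bandwidth-bound estimate of backprojection throughput.
    Assumes each projection pass reads and writes the full volume once
    (bytes/projection = 2 * voxels * bytes_per_voxel) and that the volume
    is partitioned across GPUs, so throughput scales linearly."""
    bytes_per_projection = 2 * volume_voxels * bytes_per_voxel
    sustained_bytes_per_s = peak_bandwidth_gbs * 1e9 * efficiency * num_gpus
    return sustained_bytes_per_s / bytes_per_projection

# Hypothetical device: 100 GB/s peak, 83% achieved, 512^3 float volume.
pps1 = projections_per_second(512**3, 4, 100.0, 0.83, num_gpus=1)
pps2 = projections_per_second(512**3, 4, 100.0, 0.83, num_gpus=2)
```

Models like this make it easy to check whether a measured throughput is memory-bound (close to the estimate) or still limited by compute or latency.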

15.
In the past few years, the increase in Internet usage has been substantial. High network bandwidth and the large number of threats pose challenges to current network intrusion detection systems, which manage high volumes of network traffic and perform complicated packet processing. Pattern matching is a computationally intensive process included in network intrusion detection systems. In this paper, we present an efficient graphics processing unit (GPU)-based network packet pattern-matching algorithm by leveraging the computational power of GPUs to accelerate pattern-matching operations and subsequently increase the overall processing throughput. According to the experimental results, the proposed algorithm achieved a maximal traffic processing throughput of over 2 Gbit/s. The results demonstrate that the proposed GPU-based algorithm can effectively enhance the performance of network intrusion detection systems.
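The abstract does not specify the matching algorithm, so the sketch below only illustrates why pattern matching maps well to GPUs: every byte offset of a payload can be checked independently, making each offset a natural GPU work item. The serial loop here stands in for that parallel kernel; the function name is chosen for illustration.

```python
def match_positions(payload: bytes, patterns):
    """Naive multi-pattern matching over a packet payload.  Each byte
    offset is an independent work item (one GPU thread per offset in a
    data-parallel implementation); here a plain loop stands in for the
    kernel.  Returns (offset, pattern) hits in scan order."""
    hits = []
    for pos in range(len(payload)):
        for p in patterns:
            if payload.startswith(p, pos):
                hits.append((pos, p))
    return hits

hits = match_positions(b"abcabc", [b"abc", b"ca"])
```

Production systems replace the inner loop with an automaton (e.g. Aho-Corasick) so the per-offset work is constant, but the per-offset independence that GPUs exploit is the same.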

16.
The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge. While GPUs are well known to be capable of providing significant performance improvements, the programming complexity vastly increases. To this end, we have extended a user transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain are convolutions, which are typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general as well as those specific to convolution operations are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best performing implementation of 2D convolution in the spatial domain available to date.
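The adaptive tiling scheme itself is not detailed in the abstract; the fixed-tile NumPy sketch below only illustrates the tile-plus-halo structure that tiled GPU convolution kernels use (each tile stages its output region plus a (kh-1, kw-1) halo, the analogue of a shared-memory load). Function names and the tile size are illustrative choices.

```python
import numpy as np

def conv2d_valid(img, k):
    """Direct 'valid' 2D correlation, used as the reference."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def conv2d_tiled(img, k, tile=8):
    """Tiled version: process the output in tile x tile blocks, each
    reading its input region plus a halo of (kh-1, kw-1) pixels --
    the same data each GPU thread block would stage in shared memory."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for ti in range(0, oh, tile):
        for tj in range(0, ow, tile):
            h = min(tile, oh - ti)
            w = min(tile, ow - tj)
            halo = img[ti:ti + h + kh - 1, tj:tj + w + kw - 1]
            out[ti:ti + h, tj:tj + w] = conv2d_valid(halo, k)
    return out

rng = np.random.default_rng(1)
img = rng.random((20, 17))
k = rng.random((3, 3))
```

"Adaptive" tiling, as the name suggests, would additionally pick the tile shape per kernel size and device, rather than fixing it as done here.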

17.
Solving the Saint-Venant equations by using numerical schemes like finite difference and finite element methods leads to some unwanted oscillations in the water surface elevation. The reason for these oscillations lies in the method used for the approximation of the nonlinear terms. One of the ways of smoothing these oscillations is by adding artificial viscosity into the scheme. In this paper, by using a suitable discretization, we first solve the one-dimensional Saint-Venant equations by a finite element method and eliminate the unwanted oscillations without using an artificial viscosity. Second, our main discussion is concentrated on numerical stabilization of the solution in detail. In fact, we first convert the systems resulting from the discretization to systems relating to just water surface elevation. Then, by using M-matrix properties, the stability of the solution is shown. Finally, two numerical examples of critical and subcritical flows are given to support our results.  
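The M-matrix argument mentioned above typically rests on sign and dominance conditions of the discretization matrix. The sketch below checks one standard sufficient condition (positive diagonal, nonpositive off-diagonal entries, weak row diagonal dominance with strict dominance in at least one row); for an irreducible matrix these conditions imply a nonsingular M-matrix. Irreducibility is assumed, not verified, and this is a generic check, not the paper's specific system.

```python
import numpy as np

def is_mmatrix_sufficient(A, tol=1e-12):
    """Sufficient (not necessary) M-matrix check:
    - diagonal entries strictly positive,
    - off-diagonal entries nonpositive,
    - weakly diagonally dominant rows, strict in at least one row.
    For irreducible A these conditions imply a nonsingular M-matrix."""
    A = np.asarray(A, dtype=float)
    diag = np.diag(A)
    off = A - np.diag(diag)            # off-diagonal part (zero diagonal)
    if np.any(diag <= 0) or np.any(off > tol):
        return False
    rowdom = diag - np.abs(off).sum(axis=1)
    return bool(np.all(rowdom >= -tol) and np.any(rowdom > tol))

# A 1-D second-difference matrix is the classic example.
T = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
```

Checks like this are useful when verifying that a chosen discretization preserves the sign structure that the stability proof depends on.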

18.
Class-attribute interdependence maximization (CAIM) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it may take a long time when run on high-dimensional large-scale data, with a large number of attributes and/or instances. This paper presents a solution to this problem by introducing a graphics processing unit (GPU)-based implementation of the CAIM algorithm that significantly speeds up the discretization process on big complex data sets. The GPU-based implementation is scalable to multiple GPU devices and enables the use of concurrent kernel execution capabilities of modern GPUs. The CAIM GPU-based model is evaluated and compared with the original CAIM using single and multi-threaded parallel configurations on 40 data sets with different characteristics. The results show great speedup, up to 139 times faster using four GPUs, which makes discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 h to less than 2 min.
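To make the workload concrete, the sketch below computes the CAIM criterion for a given quanta matrix (class counts per candidate interval), following the commonly cited form of the criterion: the mean over intervals of max_r² / M_r, where max_r is the largest class count in interval r and M_r the interval's total. The full algorithm, greedily adding the boundary that maximizes this value, repeats this evaluation many times, which is the part the GPU implementation parallelizes. The formula statement should be checked against the original CAIM paper.

```python
import numpy as np

def caim_criterion(quanta):
    """CAIM value of a discretization given its quanta matrix
    (rows = classes, columns = intervals): mean over intervals of
    max_r^2 / M_r.  Higher is better (purer intervals)."""
    q = np.asarray(quanta, dtype=float)
    max_r = q.max(axis=0)          # dominant class count per interval
    M_r = q.sum(axis=0)            # total count per interval
    return float(np.mean(max_r ** 2 / M_r))
```

A perfectly pure two-interval split of two balanced classes scores max_r = M_r per interval, while a 50/50 mix scores only a quarter of that per interval, so the criterion rewards class-pure intervals as intended.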

19.
20.
We present a novel, hybrid parallel continuous collision detection (HPCCD) method that exploits the availability of multi‐core CPU and GPU architectures. HPCCD is based on a bounding volume hierarchy (BVH) and selectively performs lazy reconstructions. Our method works with a wide variety of deforming models and supports self‐collision detection. HPCCD takes advantage of hybrid multi‐core architectures – using the general‐purpose CPUs to perform the BVH traversal and culling while GPUs are used to perform elementary tests that reduce to solving cubic equations. We propose a novel task decomposition method that leads to a lock‐free parallel algorithm in the main loop of our BVH‐based collision detection to create a highly scalable algorithm. By exploiting the availability of hybrid, multi‐core CPU and GPU architectures, our proposed method achieves more than an order of magnitude improvement in performance using four CPU‐cores and two GPUs, compared to using a single CPU‐core. This improvement results in an interactive performance, up to 148 fps, for various deforming benchmarks consisting of tens or hundreds of thousand triangles.
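The "elementary tests that reduce to solving cubic equations" are the coplanarity conditions of vertex-face and edge-edge pairs over one frame; the sketch below extracts the real roots of such a cubic restricted to the frame interval [0, 1] as candidate contact times. It uses a general polynomial root finder rather than the specialized GPU solver the paper employs, and the function name is an illustrative choice.

```python
import numpy as np

def contact_times(a, b, c, d, eps=1e-9):
    """Real roots of a*t^3 + b*t^2 + c*t + d = 0 inside [0, 1]:
    candidate contact times of a vertex-face or edge-edge pair whose
    coplanarity condition over the frame reduces to this cubic."""
    roots = np.roots([a, b, c, d])          # companion-matrix root finder
    times = [r.real for r in roots
             if abs(r.imag) < eps and -eps <= r.real <= 1 + eps]
    return sorted(min(max(t, 0.0), 1.0) for t in times)

# Cubic with roots 0.25, 0.5 and 2: only the first two fall in the frame.
ts = contact_times(1.0, -2.75, 1.625, -0.25)
```

Each root found this way is then confirmed by an in-plane overlap test at that instant; a pair with no root in [0, 1] cannot collide during the frame.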


Copyright©北京勤云科技发展有限公司  京ICP备09084417号