Similar Literature
20 similar documents found (search time: 15 ms)
1.
Lyapack is a package for the solution of large-scale sparse problems arising in control theory. The package has a modular design, and is implemented as a Matlab toolbox, which renders it easy to utilize, modify and extend with new functionality. However, in general, the use of Matlab in combination with a general-purpose multi-core architecture (CPU) offers limited performance when tackling the sparse linear algebra operations underlying the numerical methods involved in control theory. In this paper we extend Lyapack to leverage the computational power of graphics processors (GPUs). The experimental evaluation of a new CUDA-enabled solver for the Lyapunov equation, a crucial operation appearing in control theory problems, shows a significant runtime reduction when compared with the original CPU version of Lyapack, while retaining the usability of a Matlab-based implementation.
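As a rough illustration of the operation this solver targets, the sketch below solves a small continuous-time Lyapunov equation AX + XAᵀ + Q = 0 by Kronecker vectorization. This is not Lyapack's algorithm (Lyapack relies on iterative low-rank methods suited to large sparse problems); it is a naive dense O(n⁶) method, and the function name is chosen here for illustration only.

```python
import numpy as np

def solve_lyapunov_dense(A, Q):
    """Solve A X + X A^T + Q = 0 via the column-stacking identity
    (I kron A + A kron I) vec(X) = -vec(Q).  Dense and O(n^6):
    only useful for tiny sanity-check examples."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A) + np.kron(A, I)
    x = np.linalg.solve(K, -Q.flatten(order="F"))
    return x.reshape((n, n), order="F")

# Residual check on a small stable A.
A = np.array([[-2.0, 0.0], [1.0, -3.0]])
Q = np.eye(2)
X = solve_lyapunov_dense(A, Q)
residual = A @ X + X @ A.T + Q
```

For a stable A the residual should vanish to machine precision, which makes this a convenient reference when validating a faster GPU solver on small inputs.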

2.
For virtual scenes containing large numbers of rigid bodies, we propose a real-time simulation algorithm implemented on the GPU that exploits the GPU's parallel computing power to process rigid-body interactions and update rigid-body states in real time. Depth peeling is used to discretize each rigid body into a set of uniformly sized particles, so that rigid-body interactions in each frame are realized through particle-particle interactions; the interaction between colliding particle pairs is computed with the discrete element method. A uniform grid is used to partition the simulation domain and accelerate collision detection. Experiments show that the proposed algorithm substantially improves the speed of simulating large numbers of rigid bodies.
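The uniform-grid broad phase described above can be sketched in plain Python as a serial stand-in for the GPU kernel. The function name and the choice of cell size (one particle diameter) are assumptions made here for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict

def find_colliding_pairs(positions, radius):
    """Uniform-grid broad phase: hash each particle into a cubic cell of
    side 2*radius, then test distances only against particles in the same
    or one of the 26 neighboring cells."""
    cell = 2.0 * radius
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(i)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in members:
                        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                            if i < j:  # report each pair once
                                px, py, pz = positions[i]
                                qx, qy, qz = positions[j]
                                d2 = (px-qx)**2 + (py-qy)**2 + (pz-qz)**2
                                if d2 < (2 * radius) ** 2:
                                    pairs.add((i, j))
    return pairs

pairs = find_colliding_pairs(
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 10.0, 10.0)], radius=0.4)
```

On a GPU, each cell (or each particle) becomes an independent work item, which is what makes this partitioning effective for large particle counts.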

3.
P systems are inherently parallel and non-deterministic theoretical computing devices defined inside the field of Membrane Computing. Many P system simulators have been presented in this area, but they are inefficient since they cannot handle the parallelism of these devices. Nowadays, we are witnessing the consolidation of the GPUs as a parallel framework to compute general purpose applications. In this paper, we analyse GPUs as an alternative parallel architecture to improve the performance in the simulation of P systems, and we illustrate it by using the case study of a family of P systems that provides an efficient and uniform solution to the SAT problem. Firstly, we develop a simulator that fully simulates the computation of the P system, demonstrating that GPUs are well suited to simulate them. Then, we adapt this simulator to the GPU architecture idiosyncrasies, improving the performance of the previous simulator.

4.
Visualized Optimization Design Using Parallel Coordinates   (cited 1 time: 0 self-citations, 1 by others)
We apply a visualization method to the optimization design of chemical engineering processes: multidimensional vectors are visualized with parallel coordinates, and on this basis a "scan line" algorithm is proposed for linear regression of multidimensional vectors in n-dimensional space. In optimization problems, this algorithm can reduce the problem to one or two dimensions, so that conventional visualization methods based on two- or three-dimensional Cartesian coordinates can be used for optimization design. Combining this method with a genetic algorithm not only yields good optimization results, but also provides "extra information" that helps users understand the visualization-based optimization process and accept its results.

5.
With the rapid growth of population and economic development in northwest China, over-exploitation of water resources has led to severe deterioration of watershed ecosystems. In this study, we developed a hydrological information platform, the Watershed Datacenter System (WDC), for sharing, managing, analyzing and visualizing the diverse range of hydrologic data collected at the watershed scale. The platform helps investigators and geotechnical experts conduct data-intensive watershed research. The WDC is developed with the Entity Framework 6 (EF6) approach, based on the Model-View-Controller (MVC) architecture pattern, together with several other technologies such as the ArcGIS API and responsive web design. The Observations Data Model (ODM), hydrological models as a service, Web services and time-series analysis tools are seamlessly integrated into the WDC with the help of the open-source HIS (Hydrologic Information System) from CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.). The results demonstrate that the WDC offers quite a few advantageous features for managing and analyzing data for watershed research.

6.
Image segmentation is an important process that facilitates image analysis, such as object detection. Because of its importance, many different algorithms have been proposed in the last decade to enhance image segmentation techniques. Clustering algorithms are among the most popular in image segmentation. The proposed algorithms differ in their accuracy and computational efficiency. This paper studies the most famous and new clustering algorithms and provides an analysis of their feasibility for parallel implementation. We have studied four algorithms: fuzzy C-mean, type-2 fuzzy C-mean, interval type-2 fuzzy C-mean, and modified interval type-2 fuzzy C-mean. We have implemented them in a sequential (CPU-only) and a parallel hybrid CPU–GPU version. Speedup gains of 6× to 20× were achieved in the parallel implementation over the sequential implementation. We detail in this paper our discoveries on the portions of the algorithms that are highly parallel, so as to help the image processing community, especially if these algorithms are to be used in real-time processing where efficient computation is critical.
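A minimal NumPy sketch of the baseline fuzzy C-means iteration studied in the paper (a sequential reference only; the per-point distance and membership updates are the embarrassingly parallel portions a GPU version would offload). The function signature and hyperparameter defaults are choices made here for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: alternate membership and center updates.
    X: (n, d) data; c: number of clusters; m: fuzzifier (> 1).
    Returns (centers (c, d), memberships U (c, n))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # distance of every point to every center, parallel across points
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
    return centers, U

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
centers, U = fuzzy_c_means(X, c=2)
```

On two well-separated 1-D clusters the centers converge to the cluster means, which makes this a handy correctness oracle for a GPU port.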

7.
Detecting self-intersections within a triangular mesh model is fundamentally a quadratic problem in terms of its computational complexity, since in principle all triangles must be compared with all others. We reflect the 2D nature of this process by storing the triangles as multiple 1D textures in texture memory, and then exploit the massive parallelism of graphics processing units (GPUs) to perform pairwise comparisons, using a pixel shader. This approach avoids the creation and maintenance of auxiliary geometric structures, such as a bounding volume hierarchy (BVH); but nevertheless we can plug in auxiliary culling schemes, and use stencils to indicate triangle pairs that do not need to be compared. To overcome the readback bottleneck between GPU and CPU, we use a hierarchical encoding scheme. We have applied our technique to detecting self-intersections in extensively deformed models, and we achieve an order of magnitude increase in performance over CPU-based techniques such as [17].

8.
The Journal of Supercomputing - Border tracking in binary images is an important kernel for many applications. There are very efficient sequential algorithms, most notably, the algorithm proposed...

9.
Classification using Ant Programming is a challenging data mining task which demands a great deal of computational resources when handling data sets of high dimensionality. This paper presents a new parallelization approach of an existing multi-objective Ant Programming model for classification, using GPUs and the NVIDIA CUDA programming model. The computational costs of the different steps of the algorithm are evaluated and it is discussed how best to parallelize them. The features of both the CPU parallel and GPU versions of the algorithm are presented. An experimental study is carried out to evaluate the performance and efficiency of the interpreter of the rules, and reports the execution times and speedups regarding variable population size, complexity of the rules mined and dimensionality of the data sets. Experiments measure the original single-threaded and the new multi-threaded CPU and GPU times with different number of GPU devices. The results are reported in terms of the number of Giga GP operations per second of the interpreter (up to 10 billion GPops/s) and the speedup achieved (up to 834× vs CPU, 212× vs 4-threaded CPU). The proposed GPU model is demonstrated to scale efficiently to larger datasets and to multiple GPU devices, which allows the expansion of its applicability to significantly more complicated data sets, previously unmanageable by the original algorithm in reasonable time.

10.
11.
Carbon allotropes exhibit an enormous range of properties due to their varied hybridizations. Different hybridizations are correlated with different geometrical structures. We address the visualization of both the geometry and the electronic density for different cases including diamonds with defects and mixed diamond/graphite/amorphous samples, in order to explore this connection more deeply.

12.
In the context of realistic image synthesis, many stochastic methods have been proposed to sample direct and indirect radiance. We present new ways to use graphics hardware to sample direct and indirect lighting in a scene. Jittered sampling of light sources can easily be implemented in a fragment program to obtain soft shadow samples. Using a voxel representation of the scene, indirect illumination can be computed using hemispherical jittered sampling. These algorithms have been implemented in our rendering framework but can be used in other contexts like radiosity or final gathering of the photon map.

13.
Multimedia Tools and Applications - Medical images have an undeniably integral role in the process of diagnosing and treating a very large number of ailments. Processing such images (for...

14.
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving the memory bandwidth; (2) loop unrolling for hiding the memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also show an analytical model to understand the reconstruction performance on multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512³-voxel volume from 360 512²-pixel projections. This performance is 41% higher than the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024³-voxel volume.
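The paper's analytical model is not reproduced in the abstract, but a generic bandwidth-bound throughput model of the same flavor can be sketched as follows. The assumption that one backprojection pass reads and writes the whole volume once, and all concrete numbers in the example call, are illustrative choices made here, not figures from the paper.

```python
def projections_per_second(volume_voxels, bytes_per_voxel,
                           peak_bandwidth_gbs, efficiency, num_gpus=1):
    """Bandwidth-bound estimate of backprojection throughput.
    Assumes each projection pass reads and writes the full volume once
    (bytes/projection = 2 * voxels * bytes_per_voxel) and that the volume
    is partitioned across GPUs, so throughput scales linearly."""
    bytes_per_projection = 2 * volume_voxels * bytes_per_voxel
    sustained_bytes_per_s = peak_bandwidth_gbs * 1e9 * efficiency * num_gpus
    return sustained_bytes_per_s / bytes_per_projection

# Hypothetical device: 100 GB/s peak, 83% achieved, 512^3 float volume.
pps1 = projections_per_second(512**3, 4, 100.0, 0.83, num_gpus=1)
pps2 = projections_per_second(512**3, 4, 100.0, 0.83, num_gpus=2)
```

Models like this make it easy to check whether a measured throughput is memory-bound (close to the estimate) or still limited by compute or latency.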

15.
In the past few years, the increase in Internet usage has been substantial. High network bandwidth and the large number of threats pose challenges to current network intrusion detection systems, which manage high volumes of network traffic and perform complicated packet processing. Pattern matching is a computationally intensive process included in network intrusion detection systems. In this paper, we present an efficient graphics processing unit (GPU)-based network packet pattern-matching algorithm by leveraging the computational power of GPUs to accelerate pattern-matching operations and subsequently increase the overall processing throughput. According to the experimental results, the proposed algorithm achieved a maximal traffic processing throughput of over 2 Gbit/s. The results demonstrate that the proposed GPU-based algorithm can effectively enhance the performance of network intrusion detection systems.
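The abstract does not specify the matching algorithm, so the sketch below only illustrates why pattern matching maps well to GPUs: every byte offset of a payload can be checked independently, making each offset a natural GPU work item. The serial loop here stands in for that parallel kernel; the function name is chosen for illustration.

```python
def match_positions(payload: bytes, patterns):
    """Naive multi-pattern matching over a packet payload.  Each byte
    offset is an independent work item (one GPU thread per offset in a
    data-parallel implementation); here a plain loop stands in for the
    kernel.  Returns (offset, pattern) hits in scan order."""
    hits = []
    for pos in range(len(payload)):
        for p in patterns:
            if payload.startswith(p, pos):
                hits.append((pos, p))
    return hits

hits = match_positions(b"abcabc", [b"abc", b"ca"])
```

Production systems replace the inner loop with an automaton (e.g. Aho-Corasick) so the per-offset work is constant, but the per-offset independence that GPUs exploit is the same.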

16.
The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge. While GPUs are well known to be capable of providing significant performance improvements, the programming complexity vastly increases. To this end, we have extended a user transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain are convolutions, which are typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general as well as those specific to convolution operations are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best performing implementation of 2D convolution in the spatial domain available to date.
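The adaptive tiling scheme itself is not detailed in the abstract; the fixed-tile NumPy sketch below only illustrates the tile-plus-halo structure that tiled GPU convolution kernels use (each tile stages its output region plus a (kh-1, kw-1) halo, the analogue of a shared-memory load). Function names and the tile size are illustrative choices.

```python
import numpy as np

def conv2d_valid(img, k):
    """Direct 'valid' 2D correlation, used as the reference."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def conv2d_tiled(img, k, tile=8):
    """Tiled version: process the output in tile x tile blocks, each
    reading its input region plus a halo of (kh-1, kw-1) pixels --
    the same data each GPU thread block would stage in shared memory."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for ti in range(0, oh, tile):
        for tj in range(0, ow, tile):
            h = min(tile, oh - ti)
            w = min(tile, ow - tj)
            halo = img[ti:ti + h + kh - 1, tj:tj + w + kw - 1]
            out[ti:ti + h, tj:tj + w] = conv2d_valid(halo, k)
    return out

rng = np.random.default_rng(1)
img = rng.random((20, 17))
k = rng.random((3, 3))
```

"Adaptive" tiling, as the name suggests, would additionally pick the tile shape per kernel size and device, rather than fixing it as done here.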

17.
Solving the Saint-Venant equations by using numerical schemes like finite difference and finite element methods leads to some unwanted oscillations in the water surface elevation. The reason for these oscillations lies in the method used for the approximation of the nonlinear terms. One of the ways of smoothing these oscillations is by adding artificial viscosity into the scheme. In this paper, by using a suitable discretization, we first solve the one-dimensional Saint-Venant equations by a finite element method and eliminate the unwanted oscillations without using an artificial viscosity. Second, our main discussion is concentrated on numerical stabilization of the solution in detail. In fact, we first convert the systems resulting from the discretization to systems relating to just water surface elevation. Then, by using M-matrix properties, the stability of the solution is shown. Finally, two numerical examples of critical and subcritical flows are given to support our results.  
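The M-matrix argument mentioned above typically rests on sign and dominance conditions of the discretization matrix. The sketch below checks one standard sufficient condition (positive diagonal, nonpositive off-diagonal entries, weak row diagonal dominance with strict dominance in at least one row); for an irreducible matrix these conditions imply a nonsingular M-matrix. Irreducibility is assumed, not verified, and this is a generic check, not the paper's specific system.

```python
import numpy as np

def is_mmatrix_sufficient(A, tol=1e-12):
    """Sufficient (not necessary) M-matrix check:
    - diagonal entries strictly positive,
    - off-diagonal entries nonpositive,
    - weakly diagonally dominant rows, strict in at least one row.
    For irreducible A these conditions imply a nonsingular M-matrix."""
    A = np.asarray(A, dtype=float)
    diag = np.diag(A)
    off = A - np.diag(diag)            # off-diagonal part (zero diagonal)
    if np.any(diag <= 0) or np.any(off > tol):
        return False
    rowdom = diag - np.abs(off).sum(axis=1)
    return bool(np.all(rowdom >= -tol) and np.any(rowdom > tol))

# A 1-D second-difference matrix is the classic example.
T = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
```

Checks like this are useful when verifying that a chosen discretization preserves the sign structure that the stability proof depends on.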

18.
Class-attribute interdependence maximization (CAIM) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it may take a long time when run on high-dimensional large-scale data, with a large number of attributes and/or instances. This paper presents a solution to this problem by introducing a graphics processing unit (GPU)-based implementation of the CAIM algorithm that significantly speeds up the discretization process on big complex data sets. The GPU-based implementation is scalable to multiple GPU devices and enables the use of concurrent kernel execution capabilities of modern GPUs. The CAIM GPU-based model is evaluated and compared with the original CAIM using single and multi-threaded parallel configurations on 40 data sets with different characteristics. The results show great speedup, up to 139 times faster using four GPUs, which makes discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 h to less than 2 min.
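To make the workload concrete, the sketch below computes the CAIM criterion for a given quanta matrix (class counts per candidate interval), following the commonly cited form of the criterion: the mean over intervals of max_r² / M_r, where max_r is the largest class count in interval r and M_r the interval's total. The full algorithm, greedily adding the boundary that maximizes this value, repeats this evaluation many times, which is the part the GPU implementation parallelizes. The formula statement should be checked against the original CAIM paper.

```python
import numpy as np

def caim_criterion(quanta):
    """CAIM value of a discretization given its quanta matrix
    (rows = classes, columns = intervals): mean over intervals of
    max_r^2 / M_r.  Higher is better (purer intervals)."""
    q = np.asarray(quanta, dtype=float)
    max_r = q.max(axis=0)          # dominant class count per interval
    M_r = q.sum(axis=0)            # total count per interval
    return float(np.mean(max_r ** 2 / M_r))
```

A perfectly pure two-interval split of two balanced classes scores max_r = M_r per interval, while a 50/50 mix scores only a quarter of that per interval, so the criterion rewards class-pure intervals as intended.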

19.
20.
We present a novel, hybrid parallel continuous collision detection (HPCCD) method that exploits the availability of multi‐core CPU and GPU architectures. HPCCD is based on a bounding volume hierarchy (BVH) and selectively performs lazy reconstructions. Our method works with a wide variety of deforming models and supports self‐collision detection. HPCCD takes advantage of hybrid multi‐core architectures – using the general‐purpose CPUs to perform the BVH traversal and culling while GPUs are used to perform elementary tests that reduce to solving cubic equations. We propose a novel task decomposition method that leads to a lock‐free parallel algorithm in the main loop of our BVH‐based collision detection to create a highly scalable algorithm. By exploiting the availability of hybrid, multi‐core CPU and GPU architectures, our proposed method achieves more than an order of magnitude improvement in performance using four CPU‐cores and two GPUs, compared to using a single CPU‐core. This improvement results in an interactive performance, up to 148 fps, for various deforming benchmarks consisting of tens or hundreds of thousand triangles.
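The "elementary tests that reduce to solving cubic equations" are the coplanarity conditions of vertex-face and edge-edge pairs over one frame; the sketch below extracts the real roots of such a cubic restricted to the frame interval [0, 1] as candidate contact times. It uses a general polynomial root finder rather than the specialized GPU solver the paper employs, and the function name is an illustrative choice.

```python
import numpy as np

def contact_times(a, b, c, d, eps=1e-9):
    """Real roots of a*t^3 + b*t^2 + c*t + d = 0 inside [0, 1]:
    candidate contact times of a vertex-face or edge-edge pair whose
    coplanarity condition over the frame reduces to this cubic."""
    roots = np.roots([a, b, c, d])          # companion-matrix root finder
    times = [r.real for r in roots
             if abs(r.imag) < eps and -eps <= r.real <= 1 + eps]
    return sorted(min(max(t, 0.0), 1.0) for t in times)

# Cubic with roots 0.25, 0.5 and 2: only the first two fall in the frame.
ts = contact_times(1.0, -2.75, 1.625, -0.25)
```

Each root found this way is then confirmed by an in-plane overlap test at that instant; a pair with no root in [0, 1] cannot collide during the frame.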


Copyright©北京勤云科技发展有限公司  京ICP备09084417号