Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
Hyperspectral unmixing is essential for efficient hyperspectral image processing. Nonnegative matrix factorization with a minimum volume constraint (MVC-NMF) is one of the most widely used methods for unsupervised unmixing of hyperspectral images without the pure-pixel assumption. However, the MVC-NMF model is unstable, and the traditional solution based on the projected gradient algorithm (PG-MVC-NMF) converges slowly and with low accuracy. In this paper, a novel parallel method is proposed for minimum-volume-constrained hyperspectral image unmixing on a CPU–GPU heterogeneous platform. First, an optimized unmixing model of minimum logarithmic-volume regularized NMF is introduced and solved based on a second-order approximation of the objective function and the alternating direction method of multipliers (SO-MVC-NMF). Then, a parallel algorithm for the optimized MVC-NMF (PO-MVC-NMF) is proposed for the CPU–GPU heterogeneous platform, taking advantage of the parallel processing capabilities of GPUs and the logic-control abilities of CPUs. Experimental results on both simulated and real hyperspectral images indicate that the proposed algorithm is more accurate and robust than the traditional PG-MVC-NMF, and the total speedup of PO-MVC-NMF over PG-MVC-NMF exceeds 50 times.
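As a rough illustration of the underlying factorization, the sketch below runs plain multiplicative-update NMF on a hyperspectral data matrix. It is only a baseline for Y ≈ EA; the minimum-volume regularizer, the second-order/ADMM solver, and the GPU parallelization described above are not reproduced, and all names and shapes are assumptions.

```python
import numpy as np

def nmf_unmix(Y, p, iters=200, eps=1e-9):
    """Plain NMF baseline for unmixing: Y (bands x pixels) ~ E (bands x p) @ A (p x pixels).
    Multiplicative updates keep E and A nonnegative; no volume constraint is applied."""
    rng = np.random.default_rng(0)
    bands, pixels = Y.shape
    E = rng.random((bands, p)) + eps     # endmember signatures
    A = rng.random((p, pixels)) + eps    # abundance maps
    for _ in range(iters):
        A *= (E.T @ Y) / (E.T @ E @ A + eps)   # abundance update
        E *= (Y @ A.T) / (E @ A @ A.T + eps)   # endmember update
    return E, A
```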

2.
Solving block-tridiagonal systems is one of the key issues in numerical simulations of many scientific and engineering problems. For most block-tridiagonal matrices, the non-zero elements are concentrated in the blocks on the main diagonal, while the blocks above and below the main diagonal contain few non-zero elements. We therefore present a solution method that mixes direct and iterative methods: within each iteration, the subsystems associated with the diagonal blocks are solved by direct methods. Because the approximate solutions obtained by the direct methods are closer to the exact solutions, the convergence of the block-tridiagonal system of linear equations is accelerated. Direct methods perform well on small-scale equations, and the sub-equations can be solved in parallel. We present an improved algorithm that solves the sub-equations by thread blocks on the GPU and stores intermediate data in shared memory, significantly reducing memory-access latency. Furthermore, we analyze a cloud-resource scheduling model and obtain ten block-tridiagonal matrices produced by simulating a cloud-computing system. The performance of solving these block-tridiagonal systems of linear equations is improved using our method.
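A minimal sketch of the mixed direct/iterative idea, assuming a block-Jacobi outer iteration in which each diagonal block is solved directly; the GPU thread-block implementation and shared-memory staging are not shown, and the data layout is hypothetical.

```python
import numpy as np

def block_jacobi_tridiag(D, L, U, b, iters=100):
    """Block-Jacobi iteration for a block-tridiagonal system.
    D[i]: diagonal block of row i; L[k]: sub-diagonal block of row k+1;
    U[k]: super-diagonal block of row k; b[i]: right-hand-side block."""
    n = len(D)
    x = [np.zeros_like(bi) for bi in b]
    for _ in range(iters):
        x_new = []
        for i in range(n):
            rhs = b[i].copy()
            if i > 0:
                rhs -= L[i - 1] @ x[i - 1]            # coupling to the block row below
            if i < n - 1:
                rhs -= U[i] @ x[i + 1]                # coupling to the block row above
            x_new.append(np.linalg.solve(D[i], rhs))  # direct solve of the diagonal block
        x = x_new                                     # all sub-solves are independent -> parallelizable
    return x
```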

3.
Incorporating a GPU architecture into a CMP, which is more efficient for certain types of applications, is a popular architectural trend in recent processors. This heterogeneous mix of architectures will use an on-chip interconnect to access shared resources such as last-level cache tiles and memory controllers. The configuration of this on-chip network will likely have a significant impact on resource distribution, fairness, and overall performance.

4.

In recent decades, the socio-demographic evolution of the population has substantially changed mobility demand, posing new challenges in minimizing urban congestion and reducing environmental impact. In this scenario, understanding how different modes of transport can efficiently share (partially or totally) a common infrastructure is crucial for urban development. To this aim, we present a stochastic model-based analysis of critical intersections shared by tram traffic and private traffic, combining a microscopic model of the former with a macroscopic model of the latter. Advanced simulation tools are typically used for such analyses, exercising various traffic scenarios. However, simulation is not an exhaustive approach, and some critical, possibly rare, events may be missed. For this reason, we instead adopt analytical solution techniques and tools that support a complete, exhaustive analysis and can therefore account for rare events as well. Transient analysis of the overall traffic model using the method of stochastic state classes supports the evaluation of relevant performance measures, namely the probability of traffic congestion over time and the average number of private vehicles in the queue over time. A sensitivity analysis is performed with respect to multiple parameters, notably the arrival rate of private vehicles, the frequency of tram rides, and the time needed to recover from traffic congestion.
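For intuition only, the sketch below computes a transient congestion probability for a heavily simplified birth-death queue of private vehicles via the matrix exponential of its generator. This is not the stochastic state classes method used in the paper, and all rates, the capacity, and the congestion threshold are made-up values.

```python
import numpy as np
from scipy.linalg import expm

def congestion_probability(lam=0.4, mu=0.5, K=30, threshold=20, times=(60.0, 300.0, 600.0)):
    """P(queue length >= threshold) at the given times for an M/M/1/K queue
    started empty: transient solution p(t) = p(0) @ expm(Q t)."""
    Q = np.zeros((K + 1, K + 1))
    for n in range(K + 1):
        if n < K:
            Q[n, n + 1] = lam            # a private vehicle joins the queue
        if n > 0:
            Q[n, n - 1] = mu             # a vehicle clears the intersection
        Q[n, n] = -Q[n].sum()
    p0 = np.zeros(K + 1)
    p0[0] = 1.0                          # initially empty queue
    return {t: float((p0 @ expm(Q * t))[threshold:].sum()) for t in times}
```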


5.
Heterogeneous architectures comprising a multi-core CPU and many-core GPU(s) are increasingly being used within cluster and cloud environments. In this paper, we study the problem of optimizing the overall throughput of a set of applications deployed on a cluster of such heterogeneous nodes. We consider two different scheduling formulations. In the first formulation, we consider jobs that can be executed on either the GPU or the CPU of a single node. In the second formulation, we consider jobs that can be executed on the CPU, the GPU, or both, of any number of nodes in the system. We have developed scheduling schemes addressing both problems. In our evaluation, we first show that the schemes proposed for the first formulation outperform a blind round-robin scheduler and approximate the performance of an ideal scheduler that requires an impractical exhaustive exploration of all possible schedules. Next, we show that the scheme proposed for the second formulation outperforms the best of the existing schemes for heterogeneous clusters, TORQUE and MCT, by up to 42%. Additionally, we evaluate the robustness of our proposed scheduling policies under inaccurate inputs to account for real execution scenarios. We show that, with up to 20% inaccuracy in the input, the degradation in performance is marginal (less than 7%) on average.
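A minimal sketch of an MCT-style greedy scheduler for the first formulation (each job runs entirely on the CPU or the GPU of one node), assuming per-device runtime estimates are given as input. It illustrates the kind of baseline the paper compares against, not the proposed policies.

```python
def mct_schedule(jobs, nodes):
    """jobs: list of (job_id, cpu_time, gpu_time); nodes: list of node ids.
    Greedily assigns each job to the (node, device) with the earliest finish time."""
    ready = {(n, dev): 0.0 for n in nodes for dev in ("cpu", "gpu")}
    assignment = {}
    for job_id, cpu_t, gpu_t in jobs:
        finish, where = min(
            (avail + (cpu_t if dev == "cpu" else gpu_t), (n, dev))
            for (n, dev), avail in ready.items()
        )
        ready[where] = finish            # device becomes busy until the job finishes
        assignment[job_id] = where
    return assignment

# Example: mct_schedule([("j1", 10.0, 2.0), ("j2", 4.0, 6.0)], nodes=[0, 1])
```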

6.
Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular “color” LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven-velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared-memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed on a GPU. The heterogeneous solution provides a speedup of 2.6× compared to the multi-core CPU solution and 1.8× compared to the GPU-only solution, owing to the concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media on a typical GPU-accelerated workstation.

7.
8.
Neural Computing and Applications - Seismic catalogs are vital to understanding and analyzing the progress of active fault systems. The background seismicity rate in a seismic catalog, strongly...

9.
10.
Parallel Computing, 2014, 40(5–6): 70–85
QR factorization is a computational kernel of scientific computing. How can the latest computing hardware be used to accelerate this task? We investigate this topic by proposing a dense QR factorization algorithm with adaptive block sizes on a hybrid system that contains a central processing unit (CPU) and a graphics processing unit (GPU). To maximize the utilization of both the CPU and the GPU, we develop an adaptive scheme that chooses the block size at each iteration. The decision is based on statistical surrogate models of performance and an online monitor, which avoids unexpected occasional performance drops. We modify the highly optimized CPU–GPU based QR factorization in MAGMA to implement the proposed schemes. Numerical results suggest that our approaches are efficient and can lead to near-optimal block sizes. The proposed algorithm can be extended to other one-sided factorizations, such as LU and Cholesky factorizations.
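To make the blocked structure concrete, here is a hedged NumPy sketch of blocked QR in which the panel width is picked per iteration by a pluggable choose_block function, a stand-in for the paper's statistical surrogate model and online monitor; the MAGMA CPU–GPU implementation is not reproduced.

```python
import numpy as np

def blocked_qr(A, choose_block=lambda remaining: min(64, remaining)):
    """Blocked QR: factor one panel at a time, then update the trailing matrix.
    choose_block picks the next panel width and is where an adaptive,
    performance-model-driven choice would plug in."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    j = 0
    while j < n:
        nb = choose_block(n - j)                         # adaptive block size for this step
        Qp, Rp = np.linalg.qr(R[j:, j:j + nb], mode="complete")
        R[j:, j:j + nb] = Rp                             # triangularized panel
        R[j:, j + nb:] = Qp.T @ R[j:, j + nb:]           # trailing-matrix update (GEMM-heavy, GPU-friendly)
        Q[:, j:] = Q[:, j:] @ Qp                         # accumulate the orthogonal factor
        j += nb
    return Q, R
```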

11.
Similarity search in high-dimensional spaces is a pivotal operation for several database applications, including online content-based multimedia services. With the increasing popularity of multimedia applications, these services are facing new challenges regarding (1) the very large and growing volumes of data to be indexed/searched and (2) the necessity of reducing the response times as observed by end-users. In addition, the nature of the interactions between users and online services creates fluctuating query request rates throughout execution, which requires a similarity search engine to adapt to better use the computation platform and minimize response times. In this work, we address these challenges with Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries for very large multimedia databases. Hypercurves executes in hybrid CPU–GPU environments and is able to attain massive query-processing rates through the cooperative use of these devices. Hypercurves also changes its CPU–GPU task partitioning dynamically according to the observed load, aiming for optimal response times. In our empirical evaluation, dynamic task partitioning reduced query response times by approximately 50% compared to the best static task partition. Due to a probabilistic proof of equivalence to the sequential kNN algorithm, the CPU–GPU execution of Hypercurves in distributed (multi-node) environments can be aggressively optimized, attaining superlinear scalability while still guaranteeing, with high probability, results at least as good as those from the sequential algorithm.
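The dynamic CPU–GPU partitioning idea can be sketched in a few lines: split each incoming query batch in proportion to the recently observed throughput of each device and keep updating those estimates. The smoothing factor and function names are illustrative, not Hypercurves' actual policy.

```python
def split_batch(queries, cpu_rate, gpu_rate):
    """Divide a batch of kNN queries in proportion to observed queries/second."""
    total = cpu_rate + gpu_rate
    cut = round(len(queries) * cpu_rate / total) if total > 0 else len(queries) // 2
    return queries[:cut], queries[cut:]          # (cpu_share, gpu_share)

def update_rate(old_rate, processed, elapsed_s, alpha=0.3):
    """Exponentially weighted moving average of a device's throughput."""
    observed = processed / max(elapsed_s, 1e-9)
    return (1.0 - alpha) * old_rate + alpha * observed
```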

12.
A two-dimensional multiple-histogram method for the isothermal–isobaric ensemble is discussed in detail, implemented for isothermal–isobaric Monte Carlo simulations of molecular clusters, and employed in a case study on phase changes in pure water clusters containing 15 through 21 water molecules. Full phase diagrams of these clusters are reported in the temperature–pressure plane over a broad range of temperatures (T = 30–800 K) and pressures (P = 10^3–10^9 Pa). The main focus of the work is on the structural transformation occurring in the solid phase of these clusters and leading from cluster structures with all molecules on the cluster surface to cage-like structures with one molecule inside, and on how the transformation is influenced by increased pressure and temperature.
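A compact sketch of 2D (energy–volume) multiple-histogram reweighting for the isothermal–isobaric ensemble is given below, assuming reduced units and pre-binned histograms. The self-consistent equations are the standard WHAM form; the array names, units, and convergence test are assumptions rather than the paper's implementation.

```python
import numpy as np

def wham_2d(hists, betas, pressures, E_bins, V_bins, iters=1000, tol=1e-7):
    """hists: (S, nE, nV) counts from S runs at (betas[i], pressures[i]);
    E_bins, V_bins: bin-center energies and volumes (reduced units).
    Returns the unnormalized density of states Omega(E, V)."""
    betas = np.asarray(betas, float)
    pressures = np.asarray(pressures, float)
    N = hists.sum(axis=(1, 2))                                    # samples per run
    E, V = np.meshgrid(E_bins, V_bins, indexing="ij")             # (nE, nV) grids
    H = E[None, :, :] + pressures[:, None, None] * V[None, :, :]  # E + P_i V per run
    f = np.zeros(len(betas))                                      # dimensionless free energies
    for _ in range(iters):
        w = np.exp(f[:, None, None] - betas[:, None, None] * H)   # per-run bin weights
        omega = hists.sum(axis=0) / np.einsum("s,sij->ij", N, w)
        f_new = -np.log(np.einsum("ij,sij->s", omega,
                                  np.exp(-betas[:, None, None] * H)))
        f_new -= f_new[0]                                         # fix the arbitrary offset
        if np.max(np.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    return omega
```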

13.
The automatic generation of 3D finite element meshes (FEM) is still a bottleneck for the simulation of large fluid dynamics problems. Although there are now several algorithms that can generate good meshes without user intervention, in cases where the geometry changes during the calculation and thousands of meshes must be constructed, the computational cost of this process can exceed that of the FEM solution itself. There has been a great deal of work on FEM parallelization, and the algorithms perform well on different parallel architectures, but to date there has been little success in parallelizing mesh generation methods. This paper presents a massive parallelization scheme for re-meshing with tetrahedral elements using the local modification algorithm. This method is frequently used to improve the quality of elements once the mesh has been generated, but we show it can also be applied as a regeneration process, starting from the distorted and invalid mesh of the previous step. The parallelization is carried out using OpenCL and OpenMP in order to test the method on a multi-CPU architecture and also on graphics processing units (GPUs). Finally, we present the speedup and quality results obtained on meshes with hundreds of thousands of elements and different parallel APIs.
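As a small, self-contained piece of the picture, the sketch below evaluates a per-tetrahedron quality measure (signed volume over mean edge length cubed) in a vectorized way; this is the kind of embarrassingly parallel kernel a local-modification pass repeatedly evaluates, but the metric and names are illustrative, not the paper's algorithm.

```python
import numpy as np

def tet_quality(nodes, tets):
    """nodes: (N, 3) vertex coordinates; tets: (M, 4) vertex indices.
    Returns one quality value per tetrahedron; negative values flag inverted
    (invalid) elements that a re-meshing pass would need to repair."""
    p = nodes[tets]                                        # (M, 4, 3)
    a, b, c = p[:, 1] - p[:, 0], p[:, 2] - p[:, 0], p[:, 3] - p[:, 0]
    vol = np.einsum("ij,ij->i", a, np.cross(b, c)) / 6.0   # signed volumes
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    edges = np.stack([np.linalg.norm(p[:, i] - p[:, j], axis=1) for i, j in pairs], axis=1)
    lrms = np.sqrt((edges ** 2).mean(axis=1))              # RMS edge length
    return vol / np.maximum(lrms ** 3, 1e-30)
```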

14.
Modern graphics processing units (GPUs) have been widely utilized in magnetohydrodynamic (MHD) simulations in recent years. Due to the limited memory of a single GPU, distributed multi-GPU systems need to be explored for large-scale MHD simulations. However, the data transfer between GPUs bottlenecks the efficiency of simulations on such systems. In this paper we propose a novel GPU Direct–MPI hybrid approach to address this problem and enhance overall performance. Our approach consists of two strategies: (1) we exploit GPU Direct 2.0 to speed up data transfers between multiple GPUs within a single node and reduce the total number of message passing interface (MPI) communications; (2) we design Compute Unified Device Architecture (CUDA) kernels instead of using memory copies to speed up the fragmented data exchange in the three-dimensional (3D) decomposition. 3D decomposition is usually not preferred on distributed multi-GPU systems because of the low efficiency of this fragmented data exchange. Our approach makes 3D decomposition practical on distributed multi-GPU systems; as a result, it reduces the memory usage and computation time of each partition of the computational domain. Experimental results show twice the FLOPS compared to the common 2D-decomposition, MPI-only implementation. The proposed approach has been developed into an efficient implementation for MHD simulations on distributed multi-GPU systems, called the MGPU–MHD code. The code realizes the GPU parallelization of a total variation diminishing (TVD) algorithm for solving the multidimensional ideal MHD equations, extending our work from single-GPU computation (Wong et al., 2011) to multiple GPUs. Numerical tests and performance measurements are conducted on the TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology. Our code achieves 2 TFLOPS in double precision for the problem with 1200^3 grid points using 216 GPUs.
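The advantage of 3D over 2D decomposition can be seen with simple halo arithmetic: the sketch below estimates per-GPU ghost-cell counts for a given process grid. The 1200^3 grid and 216 GPUs echo the test above, but the example process grids and single-cell halo width are assumptions.

```python
import numpy as np

def halo_cells(N, dims, width=1):
    """Approximate ghost cells per subdomain for an N^3 grid split over a
    (px, py, pz) process grid; axes with more than one process exchange two faces."""
    local = [N / d for d in dims]                          # local subdomain extents
    halo = 0.0
    for ax, d in enumerate(dims):
        if d > 1:
            face = np.prod([local[i] for i in range(3) if i != ax])
            halo += 2 * width * face                       # two halo faces per split axis
    return halo

# 216 GPUs: 2D "pencil" decomposition vs 3D "cube" decomposition of a 1200^3 grid
print(halo_cells(1200, (1, 12, 18)), halo_cells(1200, (6, 6, 6)))
```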

15.
Numerical simulations have been performed on the pressure-driven rarefied flow through channels with a sudden contraction–expansion of 2:1:2 using the isothermal two- and three-dimensional lattice Boltzmann method (LBM). In the LBM, a Bosanquet-type effective viscosity and a modified second-order slip boundary condition are used to account for the rarefaction effect on gas viscosity, covering the slip and transition flow regimes, that is, a wider range of Knudsen number. First, the in-house LBM code is verified by comparing the computed pressure distribution and flow pattern with experimental ones measured by others. The verified code is then used to study the effects of the outlet Knudsen number Kn_o, the driving pressure ratio P_i/P_o, and the Reynolds number Re, varied in the ranges of 0.001–1.0, 1.15–5.0, and 0.02–120, respectively, on the pressure distributions and flow patterns, as well as to document the differences between continuum and rarefied flows. Results are discussed in terms of the distributions of local pressure, Knudsen number, centerline velocity, and Mach number. The variations of flow patterns and vortex length with Kn_o and Re are also documented. Moreover, a critical Knudsen number Kn_oc = 0.1 is identified, below and above which the behaviors of the nonlinear pressure profile and velocity distribution, and the variations of vortex length with Re upstream and downstream of the constriction, differ from those of continuum flows.
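For reference, a Bosanquet-type effective viscosity and a generic second-order slip velocity take the simple forms sketched below; the coefficients a, C1, and C2 are placeholders, not the values calibrated in this study.

```python
def effective_viscosity(mu0, Kn, a=2.0):
    """Bosanquet-type correction: apparent viscosity decreases as the Knudsen number grows."""
    return mu0 / (1.0 + a * Kn)

def slip_velocity(Kn, dudn_wall, d2udn2_wall, C1=1.0, C2=0.5):
    """Generic second-order slip boundary condition evaluated at the wall."""
    return C1 * Kn * dudn_wall - C2 * Kn ** 2 * d2udn2_wall
```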

16.
17.
A TH-PPM-UWB system for image transmission is presented in this paper, followed by a performance analysis over an ideal AWGN (additive white Gaussian noise) channel. The study indicates that the choice of parameters has a great influence on the BER (bit error rate) of the system, especially the number of pulses per information bit. The simulation results of image transmission demonstrate that a larger number of pulses per information bit leads to better BER performance.
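The effect of the number of pulses per information bit can be reproduced with a toy Monte Carlo sketch for binary PPM over AWGN, assuming unit-energy pulses, a fixed per-pulse SNR, and an ideal correlator receiver; the time-hopping and image-transmission details of the actual TH-PPM-UWB system are not modeled.

```python
import numpy as np

def ber_binary_ppm(Ns, snr_per_pulse_db=0.0, bits=200_000, seed=1):
    """Estimate BER when each bit is repeated over Ns PPM pulses and the
    correlator outputs of the two positions are summed before the decision."""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_per_pulse_db / 10.0)
    sigma = np.sqrt(1.0 / (2.0 * snr))                 # per-pulse noise std (assumption)
    b = rng.integers(0, 2, bits)                       # transmitted bits
    r0 = (b == 0) * Ns + rng.normal(0.0, sigma, (bits, Ns)).sum(axis=1)
    r1 = (b == 1) * Ns + rng.normal(0.0, sigma, (bits, Ns)).sum(axis=1)
    return float(np.mean((r1 > r0) != (b == 1)))

# e.g. ber_binary_ppm(1) > ber_binary_ppm(4): more pulses per bit -> lower BER
```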

18.
In this paper, a lattice Boltzmann model for the Korteweg–de Vries (KdV) equation with higher-order accuracy of the truncation error is presented using the higher-order moment method. In contrast to previous lattice Boltzmann models, our method offers wide flexibility in selecting the equilibrium distribution function. The higher-order moment method is based on a series of lattice Boltzmann equations obtained using a multi-scale technique and the Chapman–Enskog expansion. We can also control the stability of the scheme by modulating some special moments to design the dispersion and dissipation terms. A numerical example shows that the higher-order moment method can raise the truncation-error accuracy of the lattice Boltzmann scheme.

19.
20.
This paper presents a new optimization method for coupled vehicle–bridge systems subjected to uneven road surface excitation. The vehicle system is simplified as a multiple rigid-body model and the single-span bridge is modeled as a simply supported Bernoulli–Euler beam. The pseudo-excitation method transforms the random surface roughness into the superposition of a series of deterministic pseudo-harmonic excitations, which enables convenient and accurate computation of first and second order sensitivity information. The precise integration method is used to compute the vertical random vibrations for both the vehicle and the bridge. The sensitivities are used to find the optimal solution, with vehicle ride comfort taken as the objective function. Optimization efficiency and computational accuracy are demonstrated numerically.
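The pseudo-excitation step itself is compact enough to sketch: the random road input with PSD S_x(ω) is replaced by the deterministic pseudo-excitation sqrt(S_x(ω))·e^{iωt}, and the response PSD is the squared magnitude of the pseudo-response. Below, a single-DOF base-excited oscillator stands in for the full vehicle–bridge model, and the road PSD and parameter values are illustrative assumptions.

```python
import numpy as np

def response_psd(omegas, m=300.0, c=1500.0, k=2.0e4,
                 S_road=lambda w: 1e-5 / (1.0 + w ** 2)):
    """Displacement response PSD of a base-excited SDOF oscillator via the
    pseudo-excitation method: S_y(w) = |H(w) * sqrt(S_x(w))|**2."""
    Sy = []
    for w in omegas:
        x_pseudo = np.sqrt(S_road(w))                          # pseudo-excitation amplitude
        H = (k + 1j * c * w) / (k - m * w ** 2 + 1j * c * w)   # base-excitation FRF
        Sy.append(abs(H * x_pseudo) ** 2)                      # response PSD value
    return np.array(Sy)

# e.g. response_psd(np.linspace(0.1, 30.0, 300))
```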

