Similar documents
Found 20 similar documents (search time: 15 ms)
1.
The p-median problem is a well-known NP-hard problem, and many heuristics have been proposed for it in the literature. In this paper, we exploit a GPGPU parallel computing platform to present a new genetic algorithm, implemented in CUDA and based on a pseudo-Boolean formulation of the p-median problem. We tested the effectiveness of our algorithm using a Tesla K40 (2880 CUDA cores) on 290 benchmark instances from OR-Library, the discrete location problems benchmark library, and benchmarks introduced in recent publications. The algorithm found optimal solutions for all instances except two OR-Library instances, pmed30 and pmed40, for which approximations better than 99.9% were obtained.
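The pseudo-Boolean view treats each facility as a 0/1 open/closed decision. A minimal sketch of the objective such a formulation optimizes, on hypothetical toy data and with brute-force search standing in for the paper's CUDA genetic algorithm:

```python
def pmedian_cost(dist, medians):
    """Cost of a candidate median set: each client is served by the
    nearest open median (a 0/1 choice per facility, hence pseudo-Boolean)."""
    return sum(min(row[j] for j in medians) for row in dist)

# Toy symmetric distance matrix (hypothetical data); choose p = 2 medians
# by brute force here, whereas a GA would evolve the 0/1 facility vector.
dist = [
    [0, 2, 7, 5],
    [2, 0, 4, 6],
    [7, 4, 0, 3],
    [5, 6, 3, 0],
]
best = min(
    (frozenset({i, j}) for i in range(4) for j in range(i + 1, 4)),
    key=lambda m: pmedian_cost(dist, m),
)
```

On this instance the optimum places medians so that every client is within distance 3 of an open facility, for a total cost of 5.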

2.

Different statistical approaches have been proposed in recent years for finding differentially methylated DNA regions, starting from the outputs of DNA methylation analysis tools. However, these approaches do not allow an interactive and flexible exploration of these regions, and they add a high computational workload when used with large datasets. In this paper, we propose a new approach that transforms DNA methylation results into a methylation signal and applies the Haar wavelet transform to that signal so the methylation results can be displayed at different scales. Additionally, we propose the parallelization of the Haar wavelet transform on the GPU, as well as GPU-based visualization of the methylation signal. The performance evaluation results show that this is the first proposal that allows interactive visualization of different methylation signals at different resolution levels, so that it can be used to visually detect differentially methylated regions accurately, in a user-friendly and flexible way, and with a very low computational workload.
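The multiscale view comes from repeatedly halving the signal with the Haar transform. A minimal sequential sketch (the paper does this on the GPU; the methylation values below are hypothetical):

```python
def haar_step(signal):
    """One Haar level: pairwise averages (the coarser signal) and
    pairwise differences (the detail coefficients)."""
    avgs = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    diffs = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avgs, diffs

# Hypothetical per-position methylation levels; length must be a power of 2.
signal = [1.0, 0.5, 0.0, 0.5, 1.0, 1.0, 0.0, 0.0]
levels = []
while len(signal) > 1:
    signal, detail = haar_step(signal)
    levels.append((list(signal), detail))
```

Each entry of `levels` is one zoom level: the averages give the signal at that resolution, and large detail coefficients flag positions where methylation changes sharply between neighboring regions.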


3.
Room impulse response (RIR) simulation based on the image-source method is widely used in room acoustics research. Computing the RIR on a computer requires digitizing the sound propagation delay into discrete samples, and accounting carefully for the digitization error greatly increases the already massive computational load of the image-source method. Many real-time audio applications therefore simply round the propagation delay to the nearest sample. This approximation, however, degrades the phase precision required by applications such as microphone arrays, especially when the sampling frequency is low. In this paper, a more precise image-source model is studied that reduces the digitization error by applying a Hanning-windowed ideal low-pass filter. We analyze its parallel calculation procedure and propose to use a Graphics Processing Unit (GPU) to accelerate the calculation. The calculation procedure is divided into many parallel threads and arranged according to the GPU architecture and its optimization criteria. We evaluate the calculation speeds of different RIRs using a general 5-core CPU, an ordinary GPU (GTX750), and an advanced GPU (K20C). The results show that, with similarly precise RIR results, the speedup ratios of the GTX750 and K20C over the general CPU reach 20 and 120, respectively.
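The key idea is that a fractional propagation delay is represented not by one rounded sample but by a short windowed-sinc pulse centered at the true delay. A minimal sketch of one such tap (the filter half-width and delay below are hypothetical values, not taken from the paper):

```python
import math

def windowed_sinc_tap(delay_samples, n, half_width=16):
    """Tap n of a Hanning-windowed ideal low-pass filter centered on a
    fractional delay, reducing the error of rounding to whole samples."""
    t = n - delay_samples
    if abs(t) > half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)  # ideal low-pass
    window = 0.5 * (1 + math.cos(math.pi * t / half_width))          # Hanning window
    return sinc * window

# Place one image source with a fractional delay of 100.3 samples.
delay = 100.3
rir = [0.0] * 256
for n in range(256):
    rir[n] += windowed_sinc_tap(delay, n)
```

The pulse peaks at sample 100 but its shape encodes the 0.3-sample remainder, which is exactly the phase information that plain rounding destroys.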

4.
In this paper we consider the parallelization of the generation and iterative solution of coupled linear systems modelling the interaction of an acoustic field in a fluid medium with an elastic structure immersed in the fluid. The particular case studied is that of a hollow steel sphere in water. The aim of the work is to speed up the generation and solution of the systems. We describe the methods used, which involve special sparse storage arrangements and a novel application of a sparse approximate inverse preconditioning technique, and present results showing that the methods are very effective in terms of speeding up the generation and iterative solution of the systems. Copyright © 2007 John Wiley & Sons, Ltd.

5.
We introduce a GPU-based parallel vertex substitution (pVS) algorithm for the p-median problem using the CUDA architecture by NVIDIA. pVS is based on the best-profit search algorithm, an implementation of vertex substitution (VS) that is known to produce reliable solutions for p-median problems. In our approach, each candidate solution in the entire search space is allocated to a separate thread, rather than dividing the search space into parallel subsets. This strategy maximizes the usage of the GPU's parallel architecture and results in a significant speedup and robust solution quality. Computationally, pVS reduces the worst-case complexity from sequential VS's O(p · n²) to O(p · (n − p)) on each thread by parallelizing the computational tasks in the GPU implementation. We tested the performance of pVS on two sets of numerous test cases (including 40 network instances from OR-Library) and compared the results against a CPU-based sequential VS implementation. Our results show that pVS achieved a speed gain of 10 to 57 times over the traditional VS on all test network instances.
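Vertex substitution repeatedly swaps one open median for one closed vertex when that lowers the cost. A minimal sequential sketch of a single best-profit pass on a hypothetical 4-node instance (pVS evaluates these swaps in parallel, one candidate per GPU thread):

```python
def pmedian_cost(dist, medians):
    """Each client is assigned to its nearest open median."""
    return sum(min(row[j] for j in medians) for row in dist)

def best_substitution(dist, medians):
    """One pass of vertex substitution: try every (median out, vertex in)
    swap and return the most profitable resulting set and its cost."""
    best_cost, best_set = pmedian_cost(dist, medians), set(medians)
    for out in list(medians):
        for inn in set(range(len(dist))) - set(medians):
            candidate = (set(medians) - {out}) | {inn}
            cost = pmedian_cost(dist, candidate)
            if cost < best_cost:
                best_cost, best_set = cost, candidate
    return best_set, best_cost

# Hypothetical instance: starting from medians {2, 3} (cost 9),
# one substitution pass finds a median set of cost 5.
dist = [[0, 2, 7, 5], [2, 0, 4, 6], [7, 4, 0, 3], [5, 6, 3, 0]]
medians, cost = best_substitution(dist, {2, 3})
```

The inner double loop over p open medians and n − p closed vertices is exactly the O(p · (n − p)) candidate set that pVS distributes across threads.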

6.
Since sequential languages such as Fortran and C are more machine-independent than current parallel languages, it is highly desirable to develop powerful parallelization tools which can generate parallel codes, automatically or semi-automatically, targeting different parallel architectures. Array data-flow analysis is known to be crucial to the success of automatic parallelization. Such an analysis should be performed interprocedurally and symbolically, and it often needs to handle the predicates represented by IF conditions. Unfortunately, such a powerful program analysis can be extremely time-consuming if it is not carefully designed; how to bring the efficiency of this analysis to a practical level remains an issue largely untouched to date. This paper presents techniques for efficient interprocedural array data-flow analysis and documents experimental results of its implementation in a research parallelizing compiler. Our techniques are based on guarded array regions, and the resulting tool runs faster, by one to two orders of magnitude, than other similarly powerful tools.

7.
This article describes the application of our theory of parallelization of implicit ADI schemes to parabolized flows. A parallel multi-domain version of a turbulent developing flow in a straight duct (case A) and of a viscous flow in a curved duct (case B) are presented. Semi-implicit and explicit methods for determining boundary values of the auxiliary ADI functions on the interfaces between sub-domains are utilized. Numerical runs show that the proposed algorithm is valid in regions with rapidly varying fields of the governing variables (the near-entrance region for case A, the region 30°<θ<60° for case B) as well as in regions with slow axial modification of the flowfield. The algorithm is suitable both for small transverse velocity (case A) and for transverse velocity of the order of the streamwise velocity (case B). A simplified version of our theoretical model of parallel efficiency is developed and used for optimal multi-domain partitioning. Computer runs of the multi-domain code are performed on a Meiko CS and on a DEC Alpha farm with PVM communication software. The predictions of parallel efficiency obtained by the model compare well with those of actual computer runs. The parallelization parameters obtained are quite different for the two MIMD machines considered, which confirms the importance of an a priori estimate of an algorithm's parallelization efficiency and of a correct choice of parallel computer.

8.
9.
We present a geometry compression scheme for restricted quadtree meshes and use this scheme for the compression of adaptively triangulated digital elevation models (DEMs). A compression factor of 8–9 is achieved by employing a generalized strip representation of quadtree meshes to incrementally encode vertex positions. In combination with adaptive error-controlled triangulation, this allows us to significantly reduce bandwidth requirements in the rendering of large DEMs that have to be paged from disk. The compression scheme is specifically tailored for GPU-based decoding, since it minimizes dependent memory access operations. We can thus trade CPU operations and CPU–GPU data transfer for GPU processing, resulting in twice as fast streaming of DEMs from main memory into GPU memory. A novel storage format for decoded DEMs on the GPU facilitates a sustained rendering throughput of about 300 million triangles per second. Due to these properties, the proposed scheme enables rendering that scales with the display resolution, independent of the data size. For a maximum screen-space error below 1 pixel it achieves frame rates of over 100 fps, even on high-resolution displays. We validate the efficiency of the proposed method by presenting experimental results on scanned elevation models of several hundred gigabytes.

10.
The focus of this paper is to present the results of our investigation and evaluation of various shared-memory parallelizations of the data association problem in multitarget tracking. The multitarget tracking algorithm developed was for a sparse air traffic surveillance problem and is based on an Interacting Multiple Model (IMM) state estimator embedded into the (2D) assignment framework. The IMM estimator imposes a computational burden in terms of both space and time complexity, since more than one filter model is used to calculate state estimates, covariances, and likelihood functions. In fact, contrary to conventional wisdom, for sparse multitarget tracking problems we show that the assignment (or data association) problem is not the major computational bottleneck. Instead, the interface to the assignment problem, namely computing the rather numerous gating tests, IMM state estimates, covariance calculations, and likelihood function evaluations (used as cost coefficients in the assignment problem), is the major source of the workload. Using a measurement database based on two FAA air traffic control radars, we show that a "coarse-grained" (dynamic) parallelization across the numerous tracks found in a multitarget tracking problem is robust and scalable, and demonstrates computational performance superior to previously proposed "fine-grained" (static) parallelizations within the IMM.

11.
New parallel objective-function determination methods for the job shop scheduling problem are proposed in this paper, considering the makespan and the sum of job execution times as criteria; the proposed methods can, however, also be applied to other popular objective functions such as job tardiness or flow time. The Parallel Random Access Machine (PRAM) model is applied for the theoretical analysis of algorithm efficiency. The methods require fine-grained parallelization; the proposed approach is therefore especially suited to parallel computing systems with fast shared memory (e.g. GPGPU, General-Purpose computing on Graphics Processing Units).
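For reference, a minimal sequential sketch of evaluating the two criteria named above for a job shop schedule. This is plain list scheduling in job order on hypothetical data, not the paper's PRAM-parallel method:

```python
def objectives(jobs):
    """Evaluate makespan and sum of job completion times for a schedule
    built by processing jobs in list order; each job is a list of
    (machine, duration) operations executed in sequence."""
    machine_free = {}   # earliest free time per machine
    job_completion = []
    for ops in jobs:
        t = 0  # completion time of this job's previous operation
        for machine, dur in ops:
            start = max(t, machine_free.get(machine, 0))
            t = start + dur
            machine_free[machine] = t
        job_completion.append(t)
    return max(job_completion), sum(job_completion)

# Two hypothetical jobs, two machines: job 0 runs on machine 0 then 1,
# job 1 runs on machine 1 then 0.
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
makespan, total_completion = objectives(jobs)
```

The parallel methods in the paper compute such objective values concurrently for many candidate schedules, which is why fast shared memory matters.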

12.
Calculating with graphs and relations has many applications in the analysis of software systems, for example, the detection of design patterns or patterns of problematic design and the computation of design metrics. These applications require an expressive query language, in particular, for the detection of graph patterns, and an efficient evaluation of the queries even for large graphs. In this paper, we introduce RML, a simple language for querying and manipulating relations based on predicate calculus, and CrocoPat, an interpreter for RML programs. RML is general because it enables the manipulation not only of graphs (i.e., binary relations), but of relations of arbitrary arity. CrocoPat executes RML programs efficiently because it internally represents relations as binary decision diagrams, a data structure that is well-known as a compact representation of large relations in computer-aided verification. We evaluate RML by giving example programs for several software analyses and CrocoPat by comparing its performance with calculators for binary relations, a Prolog system, and a relational database management system.

13.
The LPJ dynamic global vegetation and hydrology model with river routing is implemented on a compute cluster to reduce the overall computation time. To achieve this, a parallel algorithm had to be developed for the river-routing part of the LPJ code. The run time of the parallel LPJ model scales well with the number of parallel tasks, even achieving super-linear speedup for 8–128 tasks; the sequential part of the model code is estimated to be only 0.16%. This offers the opportunity, for example, to apply the model to finding optimal climate change mitigation/adaptation paths, which requires a multitude of subsequent simulation runs. The algorithm can also be used for networks with a topology different from river-routing networks.

14.
He  Wei-Jia  Yang  Ming-Lin  Wang  Wu  Sheng  Xin-Qing 《The Journal of supercomputing》2021,77(2):1502-1516
A many-core parallel approach of the multilevel fast multipole algorithm (MLFMA) based on the Athread parallel programming model is presented on the homegrown...

15.
16.
17.
This paper shows that the uniform k-way partitioning problem can be transformed into the max-cut problem using a graph modification technique. An iterative algorithm for partitioning graphs, based on the Kernighan–Lin (KL) method, is presented that exploits this problem-equivalence property. The algorithm handles nodes of various sizes without any performance degradation. The computing time per pass of the algorithm is O(kN²), where N is the number of nodes in the given graph. In practice, only a small number of passes are typically needed, leading to a fast approximation algorithm for k-way partitioning. Experimental results show that the proposed algorithm outperforms the KL algorithm in solution quality, and the performance gap widens as the size differences between nodes increase.
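To make the max-cut target concrete, here is a minimal greedy flip local search for max-cut on a hypothetical weighted graph. This is a simple baseline illustrating the objective, not the paper's KL-based algorithm:

```python
def maxcut_local_search(n, edges):
    """Move a node to the other side whenever that increases the total
    weight of cut edges; stop at a local optimum."""
    side = [0] * n  # start with every node on side 0

    def gain(v):
        """Change in cut weight if node v switches sides."""
        g = 0
        for a, b, w in edges:
            if v in (a, b):
                other = b if a == v else a
                g += w if side[v] == side[other] else -w
        return g

    improved = True
    while improved:
        improved = False
        for v in range(n):
            if gain(v) > 0:
                side[v] ^= 1
                improved = True
    cut = sum(w for a, b, w in edges if side[a] != side[b])
    return side, cut

# Hypothetical instance: a 4-cycle with unit weights; the optimal cut
# alternates sides and cuts all four edges.
n, edges = 4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
side, cut = maxcut_local_search(n, edges)
```

The paper's transformation makes k-way partitioning quality expressible as exactly this kind of cut objective on a modified graph.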

18.
We consider the routing open shop problem being a generalization of two classical discrete optimization problems: the open shop scheduling problem and the metric traveling salesman problem. The jobs are located at nodes of some transportation network, and the machines travel on the network to execute the jobs in the open shop environment. The machines are initially located at the same node (depot) and must return to the depot after completing all the jobs. It is required to find a non-preemptive schedule with the minimum makespan. The problem is NP-hard even on the two-node network with two machines. We present new polynomial-time approximation algorithms with worst-case performance guarantees.

19.
In this paper two heuristic algorithms are presented for the weighted set covering problem. The first algorithm uses a simple, polynomial procedure to construct feasible covering solutions. The procedure is shown to possess a worst case performance bound that is a function of the size of the problem. The second algorithm is a solution improvement procedure that attempts to form reduced cost composite solutions from available feasible covering solutions. Computational results are presented for both algorithms on several large set covering problems generated from airline crew scheduling data.
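A standard polynomial construction procedure for weighted set covering is the greedy ratio rule: repeatedly pick the set with the lowest weight per newly covered element. A minimal sketch on hypothetical data, in the spirit of (but not necessarily identical to) the paper's first algorithm:

```python
def greedy_weighted_cover(universe, sets, weights):
    """Greedy construction heuristic: pick the set minimizing
    weight / (number of newly covered elements) until all covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = min(
            (i for i in range(len(sets)) if uncovered & sets[i]),
            key=lambda i: weights[i] / len(uncovered & sets[i]),
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical instance.
universe = {1, 2, 3, 4, 5}
sets = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
weights = [3.0, 1.0, 4.0, 1.0]
picked = greedy_weighted_cover(universe, sets, weights)
```

An improvement procedure like the paper's second algorithm would then try to replace subsets of `picked` with cheaper combinations drawn from other feasible covers.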

20.
The job grouping problem consists of assigning a set of jobs, each with a specific set of tool requirements, to machines with a limited tool capacity in order to minimize the number of machines needed. Traditionally, a formulation has been used that assigns jobs to machines. However, such a formulation contains a lot of symmetry since the machines are identical and they can be permuted in any feasible solution. We propose a new formulation for this problem, based on the asymmetric representatives formulation (ARF) idea. This formulation eliminates the symmetry between the identical machines. We further propose various symmetry breaking constraints, including variable reduction and lexicographic ordering constraints, which can be added to the traditional formulation. These formulations are tested on a data set from the literature and newly generated data sets using a state-of-the-art commercial solver, which includes symmetry breaking features.
