首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents the porting of 2D and 3D Navier–Stokes equations solvers for unstructured grids, from the CPU to the graphics processing unit (GPU; NVIDIA’s Ge-Force GTX 280 and 285), using the CUDA language. The performance of the GPU implementations, with single, double or mixed precision arithmetic operations, is compared to that of the CPU code.Issues regarding the optimal handling of the unstructured grid topology on the GPU, particularly for vertex-centered CFD algorithms, are discussed. Restructuring the existing codes was necessary in order to maximize the parallel efficiency of the GPU implementations. The mixed precision implementation, in which the left-hand-side operators are computed with single precision, was shown to bridge the gap between the single and double precision speed-ups. Based on the different speed-ups and prediction accuracy of the aforementioned GPU implementations of the Navier–Stokes equations solver, a hierarchical optimization method which is suitable for GPUs is proposed and demonstrated in inviscid and turbulent 2D flow problems. The search for the optimal solution(s) splits into two levels, both relying upon evolutionary algorithms (EAs) though with different evaluation tools each. The low level EA uses the very fast single precision GPU implementation with relaxed convergence criteria for the inexpensive evaluation of candidate solutions. Promising solutions are regularly broadcast to the high level EA which uses the mixed precision GPU implementation of the same flow solver. Single- and two-objective aerodynamic shape optimization problems are solved using the developed software.  相似文献   

2.
In this paper, we develop two robust moving horizon state observer (MHSO) algorithms which are capable of handling system non-linear uncertainties and physical state constraints. The algorithms employ open-loop and closed-loop prediction strategies to convert the design into multi-parameter quadratic programming (mp-QP), and also utilize the novel rewinding optimization to eliminate the conflict between open-loop prediction and closed-loop implementation. Based on the optimal solutions to mp-QP problems, MHSO is obtained by a series of offline linear/affline observation polices, and computational complexity is reduced dramatically. The convergence of observation errors, as one of challenges for robust MHSO, is also solved by introducing two auxiliary tuning parameters, the arrival weighting and the arrival observer gain. Finally, a simulation example demonstrates that our algorithms are practical and effective.  相似文献   

3.
The performance of any algorithm will largely depend on the setting of its algorithm-dependent parameters. The optimal setting should allow the algorithm to achieve the best performance for solving a range of optimization problems. However, such parameter tuning itself is a tough optimization problem. In this paper, we present a framework for self-tuning algorithms so that an algorithm to be tuned can be used to tune the algorithm itself. Using the firefly algorithm as an example, we show that this framework works well. It is also found that different parameters may have different sensitivities and thus require different degrees of tuning. Parameters with high sensitivities require fine-tuning to achieve optimality.  相似文献   

4.
Many problems in computer graphics and vision can be formulated as a nonlinear least squares optimization problem, for which numerous off-the-shelf solvers are readily available. Depending on the structure of the problem, however, existing solvers may be more or less suitable, and in some cases the solution comes at the cost of lengthy convergence times. One such case is semi-sparse optimization problems, emerging for example in localized facial performance reconstruction, where the nonlinear least squares problem can be composed of hundreds of thousands of cost functions, each one involving many of the optimization parameters. While such problems can be solved with existing solvers, the computation time can severely hinder the applicability of these methods. We introduce a novel iterative solver for nonlinear least squares optimization of large-scale semi-sparse problems. We use the nonlinear Levenberg-Marquardt method to locally linearize the problem in parallel, based on its first-order approximation. Then, we decompose the linear problem in small blocks, using the local Schur complement, leading to a more compact linear system without loss of information. The resulting system is dense but its size is small enough to be solved using a parallel direct method in a short amount of time. The main benefit we get by using such an approach is that the overall optimization process is entirely parallel and scalable, making it suitable to be mapped onto graphics hardware (GPU). By using our minimizer, results are obtained up to one order of magnitude faster than other existing solvers, without sacrificing the generality and the accuracy of the model. We provide a detailed analysis of our approach and validate our results with the application of performance-based facial capture using a recently-proposed anatomical local face deformation model.  相似文献   

5.
During the last years, GPU manycore devices have demonstrated their usefulness to accelerate computationally intensive problems. Although arriving at a parallelization of a highly parallel algorithm is an affordable task, the optimization of GPU codes is a challenging activity. The main reason for this is the number of parameters, programming choices, and tuning techniques available, many of them related with complex and sometimes hidden architecture details. A useful strategy to systematically attack these optimization problems is to characterize the different kernels of the application, and use this knowledge to select appropriate configuration parameters. The All-Pair Shortest-Path (APSP) problem is a well-known problem in graph theory whose objective is to find the shortest paths between any pairs of nodes in a graph. This problem can be solved by highly parallel and computational intensive tasks, being a good candidate to be exploited by manycore devices. In this paper, we use kernel characterization criteria to optimize an APSP algorithm implementation for NVIDIA GPUs. Our experimental results show that the combined use of proper configuration policies, and the concurrent kernels capability of new CUDA architectures, leads to a performance improvement of up to 62 % with respect to one of the possible configurations recommended by CUDA, considered as baseline.  相似文献   

6.
In this paper, we present an efficient method implemented on Graphics Processing Unit (GPU), DEMCMC-GPU, for multi-objective continuous optimization problems. The DEMCMC-GPU kernel is the DEMCMC algorithm, which combines the attractive features of Differential Evolution (DE) and Markov Chain Monte Carlo (MCMC) to evolve a population of Markov chains toward a diversified set of solutions at the Pareto optimal front in the multi-objective search space. With parallel evolution of a population of Markov chains, the DEMCMC algorithm is a natural fit for the GPU architecture. The implementation of DEMCMC-GPU on the pre-Fermi architecture can lead to a ~25 speedup on a set of multi-objective benchmark function problems, compare to the CPU-only implementation of DEMCMC. By taking advantage of new cache mechanism in the emerging NVIDIA Fermi GPU architecture, efficient sorting algorithm on GPU, and efficient parallel pseudorandom number generators, the speedup of DEMCMC-GPU can be aggressively improved to ~100.  相似文献   

7.
Many software engineering problems have been addressed with search algorithms. Search algorithms usually depend on several parameters (e.g., population size and crossover rate in genetic algorithms), and the choice of these parameters can have an impact on the performance of the algorithm. It has been formally proven in the No Free Lunch theorem that it is impossible to tune a search algorithm such that it will have optimal settings for all possible problems. So, how to properly set the parameters of a search algorithm for a given software engineering problem? In this paper, we carry out the largest empirical analysis so far on parameter tuning in search-based software engineering. More than one million experiments were carried out and statistically analyzed in the context of test data generation for object-oriented software using the EvoSuite tool. Results show that tuning does indeed have impact on the performance of a search algorithm. But, at least in the context of test data generation, it does not seem easy to find good settings that significantly outperform the “default” values suggested in the literature. This has very practical value for both researchers (e.g., when different techniques are compared) and practitioners. Using “default” values is a reasonable and justified choice, whereas parameter tuning is a long and expensive process that might or might not pay off in the end.  相似文献   

8.
Algorithms for coplanar camera calibration   总被引:5,自引:0,他引:5  
Abstract. Coplanar camera calibration is the process of determining the extrinsic and intrinsic camera parameters from a given set of image and world points, when the world points lie on a two-dimensional plane. Noncoplanar calibration, on the other hand, involves world points that do not lie on a plane. While optimal solutions for both the camera-calibration procedures can be obtained by solving a set of constrained nonlinear optimization problems, there are significant structural differences between the two formulations. We investigate the computational and algorithmic implications of such underlying differences, and provide a set of efficient algorithms that are specifically tailored for the coplanar case. More specifically, we offer the following: (1) four algorithms for coplanar calibration that use linear or iterative linear methods to solve the underlying nonlinear optimization problem, and produce sub-optimal solutions. These algorithms are motivated by their computational efficiency and are useful for real-time low-cost systems. (2) Two optimal solutions for coplanar calibration, including one novel nonlinear algorithm. A constraint for the optimal estimation of extrinsic parameters is also given. (3) A Lyapunov type convergence analysis for the new nonlinear algorithm. We test the validity and performance of the calibration procedures with both synthetic and real images. The results consistently show significant improvements over less complete camera models. Received: 30 September 1998 / Accepted: 12 January 2000  相似文献   

9.
Data clustering is related to the split of a set of objects into smaller groups with common features. Several optimization techniques have been proposed to increase the performance of clustering algorithms. Swarm Intelligence (SI) algorithms are concerned with optimization problems and they have been successfully applied to different domains. In this work, a Swarm Clustering Algorithm (SCA) is proposed based on the standard K-Means and on K-Harmonic Means (KHM) clustering algorithms, which are used as fitness functions for a SI algorithm: Fish School Search (FSS). The motivation is to exploit the search capability of SI algorithms and to avoid the major limitation of falling into locally optimal values of the K-Means algorithm. Because of the inherent parallel nature of the SI algorithms, since the fitness function can be evaluated for each individual in an isolated manner, we have developed the parallel implementation on GPU of the SCAs, comparing the performances with their serial implementation. The interest behind proposing SCA is to verify the ability of FSS algorithm to deal with the clustering task and to study the difference of performance of FSS-SCA implemented on CPU and on GPU. Experiments with 13 benchmark datasets have shown similar or slightly better quality of the results compared to standard K-Means algorithm and Particle Swarm Algorithm (PSO) algorithm. There results of using FSS for clustering are promising.  相似文献   

10.
GPU上计算流体力学的加速   总被引:1,自引:0,他引:1  
本文将计算流体力学中的可压缩的纳维叶-斯托克斯(Navier-Stokes),不可压缩的Navier-Stokes和欧拉(Euler)方程移植到NVIDIA GPU上.模拟了3个测试例子,2维的黎曼问题,方腔流问题和RAE2822型的机翼绕流.相比于CPU,我们在GPU平台上最高得到了33.2倍的加速比.为了最大程度提...  相似文献   

11.
We consider the problem of scheduling jobs on two parallel identical machines where an optimal schedule is defined as one that gives the smallest makespan (the completion time of the last job) among the set of schedules with optimal total flowtime (the sum of the completion times of all jobs). We propose an algorithm to determine optimal schedules for the problem, and describe a modified multifit algorithm to find an approximate solution to the problem in polynomial computational time. Results of a computational study to compare the performance of the proposed algorithms with a known heuristic shows that the proposed heuristic and optimization algorithms are quite effective and efficient in solving the problem.Scope and purposeMultiple objective optimization problems are quite common in practice. However, while solving scheduling problems, optimization algorithms often consider only a single objective function. Consideration of multiple objectives makes even the simplest multi-machine scheduling problems NP-hard. Therefore, enumerative optimization techniques and heuristic solution procedures are required to solve multi-objective scheduling problems. This paper illustrates the development of an optimization algorithm and polynomially bounded heuristic solution procedures for the scheduling jobs on two identical parallel machines to hierarchically minimize the makespan subject to the optimality of the total flowtime.  相似文献   

12.
Real-world simulation optimization (SO) problems entail complex system modeling and expensive stochastic simulation. Existing SO algorithms may not be applicable for such SO problems because they often evaluate a large number of solutions with many simulation calls. We propose an integrated solution method for practical SO problems based on a hierarchical stochastic modeling and optimization (HSMO) approach. This method models and optimizes the studied system at increasing levels of accuracy by hierarchical sampling with a selected set of principal parameters. We demonstrate the efficiency of HSMO using the example problem of Brugge oil field development under geological uncertainty.  相似文献   

13.
The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, ranging from deep neural networks to solvers for hard combinatorial problems. However, the proper evaluation of new algorithm configuration (AC) procedures (or configurators) is hindered by two key hurdles. First, AC scenarios are hard to set up, including the target algorithm to be optimized and the problem instances to be solved. Second, and even more significantly, they are computationally expensive: a single configurator run involves many costly runs of the target algorithm. Here, we propose a benchmarking approach that uses surrogate scenarios, which are computationally cheap while remaining close to the original AC scenarios. These surrogate scenarios approximate the response surface corresponding to true target algorithm performance using a regression model. In our experiments, we construct and evaluate surrogate scenarios for hyperparameter optimization as well as for AC problems that involve performance optimization of solvers for hard combinatorial problems. We generalize previous work by building surrogates for AC scenarios with multiple problem instances, stochastic target algorithms and censored running time observations. We show that our surrogate scenarios capture overall important characteristics of the original AC scenarios from which they were derived, while being much easier to use and orders of magnitude cheaper to evaluate.  相似文献   

14.
Much has been said about the fusion of bio-inspired optimization algorithms and Deep Learning models for several purposes: from the discovery of network topologies and hyperparametric configurations with improved performance for a given task, to the optimization of the model’s parameters as a replacement for gradient-based solvers. Indeed, the literature is rich in proposals showcasing the application of assorted nature-inspired approaches for these tasks. In this work we comprehensively review and critically examine contributions made so far based on three axes, each addressing a fundamental question in this research avenue: (a) optimization and taxonomy (Why?), including a historical perspective, definitions of optimization problems in Deep Learning, and a taxonomy associated with an in-depth analysis of the literature, (b) critical methodological analysis (How?), which together with two case studies, allows us to address learned lessons and recommendations for good practices following the analysis of the literature, and (c) challenges and new directions of research (What can be done, and what for?). In summary, three axes – optimization and taxonomy, critical analysis, and challenges – which outline a complete vision of a merger of two technologies drawing up an exciting future for this area of fusion research.  相似文献   

15.
针对联机分析处理(OLAP)中事实表与多个维表之间的星形连接执行代价较高的问题,提出了一种在先进的多核中央处理器(CPU)和图形处理器(GPU)上的星形连接优化方法。首先,对于多核CPU和GPU平台的星形连接中的物化代价问题,提出了基于向量索引的CPU和GPU平台上的向量化星形连接算法;然后,通过面向CPU cache和GPU shared memory大小的向量划分来提出基于向量粒度的星形连接操作,从而优化星形连接中向量索引的物化代价;最后,提出了基于压缩向量的星形连接算法,将定长向量索引压缩为变长的二元向量索引,从而在低选择率时提高cache内向量索引的存储访问效率。实验结果表明,在CPU平台上向量化星形连接算法相对于常规的行式或列式连接性能提升了40%以上,在GPU平台上向量化星形连接算法相对于常规星形连接算法性能提升超过了15%;与当前主流的内存数据库和GPU数据库相比,优化的星形连接算法性能相对于最优内存数据库Hyper性能提升了130%,相对于最优的GPU数据库OmniSci性能提升了80%。可见基于向量索引的向量化星形连接优化技术有效地提高了多表连接性能,与传统优化技术相比,基于向量索引的向量化处理提高了较小cache上的数据存储访问效率,压缩向量进一步提升了向量索引在cache内的访问效率。  相似文献   

16.
Scientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, makes easier the use of parallel block iterative algorithm and provides an interface for parallel decision making algorithms. The ultimate goal is to develop a single sign on integrated multi-service environment providing an easy access to different kind of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures combining the benefits of large clusters, Grid or cloud, when needed.  相似文献   

17.
Phillips  Stein  Torng  Wein 《Algorithmica》2008,32(2):163-200
Abstract. We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP . We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms, that have poor performance from an absolute worst-case perspective, are optimal for the problems in question when allowed moderately more resources. For optimization of average flow time, these are the first results of any sort, for any NP -hard version of the problem, that indicate that it might be possible to design good approximation algorithms.  相似文献   

18.
In recent years, graphical processing unit(GPU)-accelerated intelligent algorithms have been widely utilized for solving combination optimization problems, which are NP-hard. These intelligent algorithms involves a common operation, namely reduction, in which the best suitable candidate solution in the neighborhood is selected. As one of the main procedures, it is necessary to optimize the reduction on the GPU. In this paper, we propose an enhanced warp-based reduction on the GPU. Compared with existing block-based reduction methods, our method exploit efficiently the potential of implementation at warp level, which better matches the characteristics of current GPU architecture. Firstly, in order to improve the global memory access performance, the vectoring accessing is utilized. Secondly, at the level of thread block reduction, an enhanced warp-based reduction on the shared memory are presented to form partial results. Thirdly, for the configuration of the number of thread blocks, the number of thread blocks can be obtained by maximizing the size of thread block and the maximum size of threads per stream multi-processor on GPU. Finally, the proposed method is evaluated on three generations of NVIDIA GPUs with the better performances than previous methods.  相似文献   

19.
Simplex type algorithms perform successive pivoting operations (or iterations) in order to reach the optimal solution. The choice of the pivot element at each iteration is one of the most critical step in simplex type algorithms. The flexibility of the entering and leaving variable selection allows to develop various pivoting rules. In this paper, we have proposed some of the most well-known pivoting rules for the revised simplex algorithm on a CPU–GPU computing environment. All pivoting rules have been implemented in MATLAB and CUDA. Computational results on randomly generated optimal dense linear programs and on a set of benchmark problems (Netlib-optimal, Kennington, Netlib-infeasible, Mészáros) are also presented. These results showed that the proposed GPU implementations of the pivoting rules outperform the corresponding CPU implementations.  相似文献   

20.
The paper deals with applications of numerical methods for optimal shape design of composite materials structures and devices. We consider two different physical models described by specific partial differential equations (PDEs) for real-life problems. The first application relates microstructural biomorphic ceramic materials for which the homogenization approach is invoked to formulate the macroscopic problem. The obtained homogenized equation in the macroscale domain is involved as an equality constraint in the optimization task. The second application is connected to active microfluidic biochips based on piezoelectrically actuated surface acoustic waves (SAWs). Our purpose is to find the best material-and-shape combination in order to achieve the optimal performance of the materials structures and, respectively, an improved design of the novel nanotechnological devices. In general, the PDEs constrained optimization routine gives rise to a large-scale nonlinear programming problem. For the numerical solution of this problem we use one-shot methods with proper optimization algorithms and inexact Newton solvers. Computational results for both applications are presented and discussed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号