1.

An effective Parallel Multistart Tabu Search for Quadratic Assignment Problem on CUDA platform (total citations: 1; self-citations: 0; citations by others: 1)

Michał Czapiński 《Journal of Parallel and Distributed Computing》2013

Similar articles

2.

An Evolutionary Parallel Tabu Search approach for distribution systems reinforcement planning

《Advanced Engineering Informatics》2002,16(3):205-215

In this paper a new meta-heuristic optimisation technique is proposed. The method is based on the Parallel Tabu Search (PTS) algorithm and is applied to the optimal reinforcement planning of electrical distribution systems through the installation of photovoltaic plants, parallel cables, capacitor banks and transformers. The issue is a combinatorial optimisation problem; the objective function is a non-linear expression of a large number of variables. In these cases, meta-heuristics have proved to work well and one of the most efficient is the Tabu Search algorithm. For large-scale problems, parallelisation improves Tabu Search computational efficiency as well as its exploration ability. In this paper, an enhanced version of PTS, Evolutionary Parallel Tabu Search (EPTS), is proposed. It performs reproduction operators on sub-neighbourhoods, directing the search towards more promising areas of the search space. The problem of distribution systems reinforcement planning has been studied in detail and the results of the application show that the EPTS outperforms the PTS and Particle Swarm Optimisation algorithms. The algorithm's performance is also tested on mathematical test functions and other properties of the proposed algorithm are examined.

3.

Modern Parallel Optimization Algorithms Based on GPU

Zhang Qingke, Yang Bo, Wang Lin, Zhu Fuxiang 《Computer Science (计算机科学)》2012,39(4):304-311

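To address the high time complexity that modern optimization algorithms face when solving relatively complex problems, a GPU-based parallel processing approach is introduced. The paper first explains, at a high level, the parallel programming model based on the Compute Unified Device Architecture (CUDA), and then gives GPU-based parallel implementations on the CUDA architecture of five typical modern optimization algorithms (simulated annealing, tabu search, genetic algorithms, particle swarm optimization, and artificial neural networks). By comparing and analyzing statistical results from experimental cases tested in different environments, it points out the advantages of the GPU-based single-instruction multiple-thread parallel optimization strategy and its future development trends.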

4.

Feature tracking and matching in video using programmable graphics hardware (total citations: 2; self-citations: 0; citations by others: 2)

Sudipta N. Sinha Jan-Michael Frahm Marc Pollefeys Yakup Genc 《Machine Vision and Applications》2011,22(1):207-217

This paper describes novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems. While significant acceleration over standard CPU implementations is obtained by exploiting parallelism provided by modern programmable graphics hardware, the CPU is freed up to run other computations in parallel. Our GPU-based KLT implementation tracks about a thousand features in real-time at 30 Hz on 1,024 × 768 resolution video, which is a 20 times improvement over the CPU. The GPU-based SIFT implementation extracts about 800 features from 640 × 480 video at 10 Hz, which is approximately 10 times faster than an optimized CPU implementation.

5.

How GPUs Work (total citations: 1; self-citations: 0; citations by others: 1)

David Luebke Greg Humphreys 《Computer》2007,40(2):96-100

GPUs have moved away from the traditional fixed-function 3D graphics pipeline toward a flexible general-purpose computational engine. Today, GPUs can implement many parallel algorithms directly using graphics hardware. Well-suited algorithms that leverage all the underlying computational horsepower often achieve tremendous speedups. Truly, the GPU is the first widely deployed commodity desktop parallel computer.

6.

Efficient graphics processing unit based layered decoders for quasicyclic low‐density parity‐check codes

Rongchun Li Yong Dou Dan Zou Shi Wang Ying Zhang 《Concurrency and Computation》2015,27(1):29-46

Since the layered low‐density parity‐check (LDPC) decoding algorithm was proposed, one can exploit the diversity gain to achieve performance comparable to traditional two‐phase message passing (TPMP) decoding, but with decoding convergence about twice as fast as TPMP. In order to reduce the decoding time of the layered LDPC decoder, a graphics processing unit (GPU) is exploited as the modem processor so that the decoding procedure can be processed in parallel using numerous threads in the GPU. In this paper, we present the parallel algorithms and efficient implementations on the GPU for two different layered message passing schemes, row‐layered and column‐layered decoding. In the experiments, the quasicyclic LDPC codes for WiFi (802.11n) and WiMAX (802.16e) are decoded by the proposed layered LDPC decoders. The experimental results show that our decoder has good bit error ratio (BER) performance, comparable to that of the TPMP decoder. The peak throughput is 712 Mbps, which is about two orders of magnitude faster than that of the CPU implementation and comparable to dedicated hardware solutions. Compared to the fastest existing GPU‐based implementation, the presented decoder achieves a performance improvement of 2.3 times. Copyright © 2013 John Wiley & Sons, Ltd.

7.

Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics

《Journal of molecular graphics & modelling》2013

Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10–50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and its time-step being one order of magnitude smaller than that of classical MD, all of which pose significant computational challenges for reaching spatio-temporal scales of nanometers and nanoseconds. Recent advances in graphics processing units (GPUs) provide not only highly favorable performance for GPU-enabled MD programs compared with CPU implementations but also an opportunity to cope with the computing power and memory demands that ReaxFF MD places on computer hardware. In this paper, we present the algorithms of GMD-Reax, the first GPU-enabled ReaxFF MD program, with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with an NVIDIA C2050 GPU for coal pyrolysis simulation systems with 1378 to 27,283 atoms. GMD-Reax achieved speedups of up to 12 times over Duin et al.'s FORTRAN codes in LAMMPS on 8 CPU cores and up to 6 times over the LAMMPS C codes based on PuReMD, in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations.

8.

Optimized Design of the Fixed Network in Mobile Communication Systems

Zhang Weihao, Zhang Yingjie, Zhang Jing 《Computing Technology and Automation (计算技术与自动化)》2004,23(3):93-97

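Search techniques such as genetic algorithms, simulated annealing, tabu search, and random walk algorithms have been widely applied to global optimization. This paper focuses on an experimental analysis comparing the performance of these search algorithms when used to optimize the design of the fixed-network part of a cellular network. After discussing how each algorithm's characteristic parameters affect its performance, comparative experiments on these algorithms were carried out in MatLab under the same constraints. The results show that, under the given problem model and the assumed network node locations, tabu search and genetic algorithms provide better and more stable solutions.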

9.

Pharmacy duty scheduling problem

Özgür Özpeynirci Ebru Ağlamaz 《International Transactions in Operational Research》2016,23(3):459-480

In this study, we define the pharmacy duty scheduling problem, which requires a subset of pharmacies to be on duty on national holidays, at weekends, and at nights, in order to be able to satisfy the emergency medicine needs. We model the pharmacy duty scheduling problem as a multiperiod p‐median problem with special side constraints, and analyze the computational complexity. We propose a Tabu Search heuristic and develop lower bound (LB) algorithms. We test the performance of mathematical models, Tabu Search heuristic, and the LBs on randomly generated instances. We analyze the current system in İzmir, the third largest city in Turkey, with a population of 3.5 million, and apply solution methods. Our results show that the proposed Tabu Search algorithm suggests improvements on the current system.

10.

A two-stage Ant Colony optimization algorithm to minimize the makespan on unrelated parallel machines—part II: enhancements and experimentations

Jean-Paul Arnaout Rami Musa Ghaith Rabadi 《Journal of Intelligent Manufacturing》2014,25(1):43-53

In a previous paper (Arnaout et al in J Intell Manuf 21:693–701, 2010), an Ant Colony optimization (ACO I) algorithm was introduced for minimizing the schedule's makespan on unrelated parallel machines with sequence-dependent setup times. Optimal solutions for small instances of this problem were obtained by solving a mixed integer program. However, for larger instances (up to 10 machines and 120 jobs), heuristic and approximate algorithms were necessary to reach solutions in reasonable computational times. ACO I's performance was evaluated by comparing its solutions to solutions obtained using Tabu Search and MetaRaPS (metaheuristic for Randomized Priority Search). While the results indicated that ACO I outperformed the other heuristics, they also showed that MetaRaPS had a better performance when all ratios of N/M (jobs to machines ratio) were considered. In this paper, we introduce an enhanced ACO which will be referred to as ACO II and compare its performance to other existing and new algorithms including ACO I, MetaRaPS, and SA. The extensive and expanded experiments conducted prove the superiority of the enhanced ACO II.

11.

CLUS_GPU-BLASTP: accelerated protein sequence alignment using GPU-enabled cluster

Sita Rani O. P. Gupta 《The Journal of supercomputing》2017,73(10):4580-4595

Basic Local Alignment Search Tool (BLAST) is one of the most frequently used algorithms for bioinformatics applications. In this paper, an accelerated implementation of protein BLAST, i.e., CLUS_GPU-BLASTP, for processing multiple query sequences in parallel on a graphical processing unit (GPU)-enabled high-performance cluster is proposed. The experimental setup consisted of a high-performance GPU-enabled cluster. Each compute node of the cluster consisted of two hex-core Intel Xeon 2.93 GHz processors with 50 GB RAM and 12 MB cache. Each compute node was also equipped with an NVIDIA M2050 GPU. In comparison with the famous GPU-BLAST, our BLAST implementation is 2.1 times faster on a single compute node. On a cluster of 12 compute nodes, our implementation gave a speedup of 13.2X. In comparison with standard single-threaded NCBI-BLAST, our implementation achieves a speedup ranging from 7.4X to 8.2X.

12.

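Applied Research on General-Purpose Computing Techniques for Graphics Hardware (total citations: 2; self-citations: 0; citations by others: 2)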

Zhang Yang, Zhu Changqian, He Taijun 《Journal of Computer Applications (计算机应用)》2005,25(9):2192-2195

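In research on graphics-hardware acceleration of general-purpose computing, a computation model under the OpenGL framework is synthesized. The performance of this computing structure is tested experimentally, and several methods for improving computational performance are analyzed. On this basis, a GPU-based method for computing the two-dimensional discrete cosine transform (DCT) in parallel is presented. In a single rendering pass on the GPU, the method applies the DCT to 8×8 pixel blocks across one to four color channels of an image simultaneously. Experiments show that, on the hardware used, the GPU-accelerated parallel DCT is several hundred times faster than a CPU implementation of the same algorithm.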
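
For readers unfamiliar with the transform named in this abstract, the following is a minimal CPU-side sketch of a two-dimensional DCT on a single 8×8 block, written in plain Python. It is not the paper's GPU/OpenGL method; the orthonormal DCT-II scaling and the function names `dct_matrix`, `dct2_block` and `idct2_block` are assumptions chosen for illustration.

```python
import math

N = 8  # block size mentioned in the abstract


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C (row k = frequency, column i = sample)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2_block(block):
    """2-D DCT of a square block: C * X * C^T (rows, then columns)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2_block(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

With this scaling, a constant 8×8 block of value 10 transforms to a single DC coefficient of 80 with all other coefficients numerically zero. Each block is independent of the others, which is the property that makes per-block parallel evaluation on a GPU natural.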

13.

CUDA‐quicksort: an improved GPU‐based implementation of quicksort

Emanuele Manca Andrea Manconi Alessandro Orro Giuliano Armano Luciano Milanesi 《Concurrency and Computation》2016,28(1):21-43

Sorting is a very important task in computer science and becomes a critical operation for programs making heavy use of sorting algorithms. General‐purpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU‐based implementations of the quicksort were presented in literature: the GPU‐quicksort, a compute‐unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA‐quicksort, an iterative GPU‐based implementation of the sorting algorithm. CUDA‐quicksort has been designed starting from GPU‐quicksort. Unlike GPU‐quicksort, it uses atomic primitives to perform inter‐block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA‐quicksort is up to four times faster than GPU‐quicksort and up to three times faster than CDP‐quicksort. An in‐depth analysis of the performance between CUDA‐quicksort and GPU‐quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using the CUDA dynamic parallelism, we implemented a recursive version of the CUDA‐quicksort. Experimental results show that CUDA‐quicksort is faster than the CDP‐quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright © 2015 John Wiley & Sons, Ltd.

14.

Potential of minicomputer-array processor system for nonlinear finite-element analysis

Gregg A. Strohkorb Ahmed K. Noor 《Computers & Structures》1984,18(4):703-718

A study is made of the potential of using a minicomputer-array processor system for efficient solution of large-scale nonlinear finite-element problems. A PRIME 750 is used as the host computer, and a software simulator residing on the PRIME is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory, parallel and pipeline processing are reviewed and the interplay between various hardware components is examined. Effective use of the minicomputer-array processor system for nonlinear analysis requires the following: (a) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (b) reduction of I/O operations; and (c) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor. Results of the study of the two benchmarks indicate that these two problems would run faster on a PRIME 750 coupled with the AP-120B than on the PRIME 750 alone. The 1715 degree-of-freedom problem would run about five times faster, and the 3230 degree-of-freedom problem would run about ten times faster. New advances in array-processor hardware are outlined, and possible improvements in the computational algorithms are discussed. The combination of the two can significantly enhance the effectiveness of the minicomputer-array processor system for large-scale nonlinear analysis.

15.

Probabilistic GRASP-Tabu Search algorithms for the UBQP problem

Yang Wang Zhipeng Lü Fred Glover Jin-Kao Hao 《Computers & Operations Research》2013

This paper presents two algorithms combining GRASP and Tabu Search for solving the Unconstrained Binary Quadratic Programming (UBQP) problem. We first propose a simple GRASP-Tabu Search algorithm working with a single solution and then reinforce it by introducing a population management strategy. Both algorithms are based on a dedicated randomized greedy construction heuristic and a tabu search procedure. We show extensive computational results on two sets of 31 large random UBQP instances and one set of 54 structured instances derived from the MaxCut problem. Comparisons with state-of-the-art algorithms demonstrate the efficacy of our proposed algorithms in terms of both solution quality and computational efficiency. It is noteworthy that the reinforced GRASP-Tabu Search algorithm is able to improve the previous best known results for 19 MaxCut instances.
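
As a rough illustration of the kind of procedure this abstract refers to, here is a minimal one-flip tabu search for maximizing x^T Q x over binary vectors, in plain Python. It is a generic textbook-style sketch, not the authors' GRASP-Tabu Search: the random starting solution (in place of a GRASP construction), the fixed tabu tenure, the symmetric-Q assumption, and all function names are assumptions made for illustration.

```python
import random


def ubqp_value(q, x):
    """Objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_gain(q, x, i):
    """Change in the objective if bit i is flipped (assumes Q is symmetric)."""
    s = sum(q[i][j] * x[j] for j in range(len(x)) if j != i)
    delta = q[i][i] + 2 * s          # effect of switching bit i from 0 to 1
    return delta if x[i] == 0 else -delta


def tabu_search(q, iters=200, tenure=3, seed=0):
    """One-flip tabu search; returns (best_x, best_value)."""
    rng = random.Random(seed)
    n = len(q)
    x = [rng.randint(0, 1) for _ in range(n)]   # random start, not GRASP
    value = ubqp_value(q, x)
    best_x, best_value = x[:], value
    tabu_until = [0] * n                        # bit i is tabu while tabu_until[i] >= it
    for it in range(1, iters + 1):
        candidates = []
        for i in range(n):
            gain = flip_gain(q, x, i)
            # Aspiration: a tabu move is allowed if it yields a new best value.
            if tabu_until[i] < it or value + gain > best_value:
                candidates.append((gain, i))
        if not candidates:
            continue                            # every move is tabu this round
        gain, i = max(candidates)               # best admissible move, even if worsening
        x[i] = 1 - x[i]
        value += gain
        tabu_until[i] = it + tenure
        if value > best_value:
            best_x, best_value = x[:], value
    return best_x, best_value
```

Accepting the best admissible move even when it worsens the objective, while forbidding recently flipped bits, is what lets the search leave local optima; the incremental `flip_gain` avoids recomputing the full quadratic form at every step.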

16.

Real-time multi-camera video analytics system on GPU

Puren Guler Deniz Emeksiz Alptekin Temizel Mustafa Teke Tugba Taskaya Temizel 《Journal of Real-Time Image Processing》2016,11(3):457-472

In this article, a parallel implementation of a real-time intelligent video surveillance system on a Graphics Processing Unit (GPU) is described. The system is based on background subtraction and composed of motion detection, camera sabotage detection (moved camera, out-of-focus camera and covered camera detection), abandoned object detection, and object-tracking algorithms. As the algorithms have different characteristics, their GPU implementations have different speed-up rates. Test results show that when all the algorithms run concurrently, parallelization on the GPU makes the system up to 21.88 times faster than its central processing unit counterpart, enabling real-time analysis of a higher number of cameras.

17.

GPU-Based Parallel Decoding Algorithm for H.264

Chen Peng, Cao Jianwei, Chen Qingkui 《Computer Engineering (计算机工程)》2014,(1):283-286

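To address parallel decoding of H.264 video streams, a cooperative CPU/GPU computing algorithm is proposed. Using the Compute Unified Device Architecture (CUDA) language as the GPU programming model, the inverse DCT and intra-frame prediction are accelerated on the GPU. While maintaining high computational accuracy, CUDA hybrid programming is used to improve the system's computing performance. Using the CUDA language provided by NVIDIA, the inverse DCT and intra-frame prediction are implemented in parallel on the GPU during decoding; the parallel algorithm is compared with a CPU-only implementation, and the speedup of the parallel decoding algorithm is verified with different numbers of video streams. Experimental results show that the algorithm substantially improves the efficiency of video stream encoding and decoding, with an average speedup of 10 times over the CPU-only implementation.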

18.

Solid modeling of polyhedral objects by Layered Depth-Normal Images on the GPU

Charlie C.L. Wang Yuen-Shan Leung 《Computer aided design》2010,42(6):535-544

We introduce a novel solid modeling framework taking advantage of the architecture of parallel computing on modern graphics hardware. Solid models in this framework are represented by an extension of the ray representation, Layered Depth-Normal Images (LDNI), which inherits the good properties of Boolean simplicity, localization and domain decoupling. The drawback of the ray representation, its computational intensity, has been overcome by newly developed parallel algorithms running on graphics hardware equipped with a Graphics Processing Unit (GPU). The LDNI for a solid model whose boundary is represented by a closed polygonal mesh can be generated efficiently with the help of hardware accelerated sampling. The parallel algorithm for computing Boolean operations on two LDNI solids runs well on modern graphics hardware. A parallel algorithm is also introduced in this paper to convert LDNI solids to sharp-feature preserved polygonal mesh surfaces, which can be used in downstream applications (e.g., finite element analysis). Unlike GPU-based techniques for rendering CSG-trees of solid models (Hable and Rossignac (2007, 2005) [1] and [2]), we compute and store the shape of objects in solid modeling completely on graphics hardware. This largely eliminates the communication bottleneck between the graphics memory and the main memory.

19.

A GPGPU based program to solve the TDSE in intense laser fields through the finite difference approach

Cathal Ó Broin L.A.A. Nikolopoulos 《Computer Physics Communications》2014

We present a general-purpose computing on graphics processing units (GPGPU) based computational program and framework for the electronic dynamics of atomic systems under intense laser fields. We present our results using the case of hydrogen; however, the code is trivially extensible to tackle problems within the single-active electron (SAE) approximation. Building on our previous work, we introduce CLTDSE, the first available GPGPU based implementation of the Taylor, Runge–Kutta and Lanczos based methods created specifically with strong field ab-initio simulations in mind. The code makes use of finite difference methods and the OpenCL framework for GPU acceleration. The specific example system used is the classic test system, hydrogen. After introducing the standard theory and the specific quantities which are calculated, the code, including installation and usage, is discussed in depth. This is followed by some examples and a short benchmark between an 8 hardware thread (i.e. logical core) Intel Xeon CPU and an AMD 6970 GPU, where the parallel algorithm runs 10 times faster on the GPU than on the CPU.

20.

Vehicle routing problem with stochastic travel times including soft time windows and service costs

Duygu Taş Nico Dellaert Tom van Woensel Ton de Kok 《Computers & Operations Research》2013

This paper studies a vehicle routing problem with soft time windows and stochastic travel times. A model is developed that considers both transportation costs (total distance traveled, number of vehicles used and drivers' total expected overtime) and service costs (early and late arrivals). We propose a Tabu Search method to solve this model. An initialization algorithm is developed to construct feasible routes by taking into account the travel time stochasticity. Solutions provided by the Tabu Search algorithm are further improved by a post-optimization method. We conduct our computational experiments for well-known problem instances. Results show that our Tabu Search method performs well by obtaining very good final solutions in a reasonable amount of time.