Similar literature
20 similar records found
1.
The p-median problem is a well-known NP-hard problem. Many heuristics have been proposed in the literature for this problem. In this paper, we exploit a GPGPU parallel computing platform to present a new genetic algorithm implemented in CUDA and based on a pseudo-Boolean formulation of the p-median problem. We have tested the effectiveness of our algorithm using a Tesla K40 (2880 CUDA cores) on 290 different benchmark instances obtained from OR-Library, the discrete location problems benchmark library, and benchmarks introduced in recent publications. The algorithm succeeded in finding optimal solutions for all instances except for two OR-Library instances, namely pmed 30 and pmed 40, where better than 99.9% approximations were obtained.
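
The abstract above does not reproduce the CUDA kernels; as a rough illustration of the fitness evaluation such a GA parallelizes (one candidate facility set per GPU thread), the following NumPy sketch scores a whole population of candidate p-median solutions. All names, shapes, and the toy data are illustrative assumptions, not the authors' code.

    import numpy as np

    def p_median_cost(dist, candidates):
        """dist: (n_clients, n_sites) distance matrix.
        candidates: (pop_size, p) integer array; each row is one candidate
        set of p open facilities (site indices).
        Returns the p-median objective of every candidate: the sum, over
        clients, of the distance to the nearest open facility."""
        costs = np.empty(len(candidates))
        for k, sites in enumerate(candidates):   # on the GPU, one thread would handle one k
            costs[k] = dist[:, sites].min(axis=1).sum()
        return costs

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        xy = rng.random((50, 2))                 # 50 points; sites and clients coincide
        dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
        pop = np.array([rng.choice(50, size=5, replace=False) for _ in range(8)])
        print(p_median_cost(dist, pop))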

2.
许建  林泳  秦勇  黄翰 《计算机应用研究》2013,30(9):2656-2659
To improve the scalability of collaborative filtering and speed up its execution, a GPU (graphics processing unit)-based parallel collaborative filtering algorithm is proposed for high-speed parallel processing. GPUs follow a single-instruction, multiple-data execution model and are well suited to computations with simple control flow and very large data volumes, which is exactly the profile of collaborative filtering. The algorithm is implemented with the Compute Unified Device Architecture (CUDA). Experiments show that, on a low- to mid-range GPU, the algorithm achieves a speed-up of more than 40 times over a collaborative filtering implementation on a high-end quad-core CPU, greatly improving scalability while also delivering excellent accuracy.
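
No code accompanies the abstract; the sketch below illustrates, with NumPy on the CPU, the kind of data-parallel step a GPU collaborative filtering implementation typically accelerates: computing all item-item cosine similarities of a rating matrix, where every output entry is independent and therefore SIMD-friendly. The similarity measure and variable names are assumptions.

    import numpy as np

    def item_cosine_similarity(ratings):
        """ratings: (n_users, n_items) matrix, 0 = unrated.
        Returns the (n_items, n_items) cosine similarity matrix.
        Each entry depends only on two columns, so on a GPU every
        (i, j) pair can be computed by its own thread."""
        norms = np.linalg.norm(ratings, axis=0)
        norms[norms == 0.0] = 1.0            # avoid division by zero for empty columns
        gram = ratings.T @ ratings           # dot products of item columns
        return gram / np.outer(norms, norms)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        R = rng.integers(0, 6, size=(100, 20)).astype(float)
        S = item_cosine_similarity(R)
        print(S.shape, S[0, :5])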

3.
韩玉  闫镔  宇超群  李磊  李建新 《计算机应用》2012,32(5):1407-1410
To address the long reconstruction time of the FDK algorithm, a graphics processing unit (GPU)-based parallel acceleration of FDK is proposed. Execution efficiency is improved through a well-designed thread assignment scheme, extraction and pre-computation of the voxel-independent intermediate variables used in the back-projection parameter calculation, and careful reduction of the number of global-memory accesses. Simulation results show that, without sacrificing reconstruction quality, the fully optimized parallel FDK algorithm reconstructs a 256³ volume in 0.5 s and a 512³ volume in 2.5 s, a substantial improvement over recent published results.
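
The kernel itself is not given in the abstract; the following NumPy sketch shows the per-voxel FDK back-projection step that a GPU version assigns to one thread per voxel, with the angle-dependent but voxel-independent quantities (cos β, sin β, the source distance) computed once per projection, as the abstract describes. The geometry conventions, nearest-neighbour sampling, and omission of the ramp filtering are simplifying assumptions.

    import numpy as np

    def fdk_backproject(proj, betas, D, grid):
        """proj: (n_angles, n_v, n_u) filtered cone-beam projections.
        betas: projection angles (rad); D: source-to-axis distance.
        grid: (N, 3) voxel centre coordinates. Returns (N,) volume values."""
        vol = np.zeros(len(grid))
        n_v, n_u = proj.shape[1:]
        for p, beta in enumerate(betas):
            cb, sb = np.cos(beta), np.sin(beta)   # voxel-independent, computed once per angle
            x, y, z = grid[:, 0], grid[:, 1], grid[:, 2]
            s = x * cb + y * sb                   # rotated in-plane coordinates
            t = -x * sb + y * cb
            U = D - s                             # distance factor for this voxel
            u = D * t / U                         # detector column coordinate
            v = D * z / U                         # detector row coordinate
            w = (D / U) ** 2                      # FDK distance weight
            iu = np.clip(np.round(u + n_u / 2).astype(int), 0, n_u - 1)
            iv = np.clip(np.round(v + n_v / 2).astype(int), 0, n_v - 1)
            vol += w * proj[p, iv, iu]            # nearest-neighbour sampling for brevity
        return vol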

4.
By introducing an investment limit into the classical constrained p-median problem, a constrained p-median problem with investment is proposed; it is better suited to facility location in fields such as transportation and logistics. Based on a detailed analysis of the mathematical model of this problem, a local search strategy tailored to it is first proposed; this local search is then combined with a Lagrangian heuristic and an ant colony algorithm to obtain a Lagrangian hybrid ant colony algorithm for the problem. Experimental results show that the constrained p-median problem with investment can produce different location plans according to the available investment, and that the proposed hybrid ant colony algorithm considerably improves the solution accuracy of both the ant colony algorithm and the Lagrangian heuristic, with good convergence.
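
As a small illustration of the extra investment constraint described above, the sketch below evaluates one candidate set of open facilities: it first checks that the opening costs fit within the budget and then assigns each client to the nearest open facility that still has capacity. The greedy assignment and all names are illustrative assumptions, far simpler than the paper's Lagrangian hybrid ant colony algorithm.

    import numpy as np

    def evaluate(open_sites, dist, demand, capacity, open_cost, budget):
        """open_sites: indices of facilities proposed to open.
        Returns (feasible, total_distance) for one candidate solution."""
        if open_cost[open_sites].sum() > budget:            # investment constraint
            return False, np.inf
        remaining = capacity[open_sites].astype(float)
        total = 0.0
        for c in np.argsort(dist[:, open_sites].min(axis=1)):   # clients, nearest-first
            order = np.argsort(dist[c, open_sites])             # open facilities by distance
            for j in order:
                if remaining[j] >= demand[c]:
                    remaining[j] -= demand[c]
                    total += dist[c, open_sites[j]]
                    break
            else:
                return False, np.inf                             # capacity exceeded
        return True, total

    if __name__ == "__main__":
        rng = np.random.default_rng(7)
        xy = rng.random((30, 2))
        dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
        demand = rng.integers(1, 5, size=30)
        capacity = np.full(30, 40)
        open_cost = rng.uniform(5, 15, size=30)
        print(evaluate(np.array([2, 9, 21]), dist, demand, capacity, open_cost, budget=40.0))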

5.
The efficiency of metaheuristic algorithms depends significantly on the number of fitness value evaluations performed on candidate solutions. In addition to various intelligent techniques used to obtain better results, parallelization of calculations can substantially improve the solutions in cases where the problem is NP-hard and requires many evaluations. This study proposes a new parallel tabu search method for solving the Maximum Vertex Weight Clique Problem (MVWCP) on Non-Uniform Memory Access (NUMA) architectures using the OpenMP parallel programming paradigm. Achieving scalability on NUMA architectures presents significant challenges due to the high complexity of their memory systems, which can lead to performance loss. However, our proposed Tabu-NUMA algorithm provides up to 18× speed-up with 64 cores for ten basic problem instances in the DIMACS-W and BHOSLIB-W benchmarks, and it improves the performance of the serial Multi Neighborhood Tabu Search (MN/TS) algorithm for 38 problem instances in those benchmarks. We further evaluate our algorithm on larger datasets with thousands of edges and vertices from Network Data Repository benchmark problem instances, and we report significant improvements in terms of speed-up. Our results confirm that the Tabu-NUMA algorithm is among the best recent algorithms for solving MVWCP on NUMA architectures.
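
The OpenMP code is not reproduced in the abstract; the sketch below shows the kind of independent move evaluation that such a tabu search repeats many times and that can be spread across NUMA cores: scoring every candidate "add" move of the current clique in MVWCP. The adjacency-matrix representation and names are assumptions.

    import numpy as np

    def add_move_gains(A, w, clique):
        """A: (n, n) boolean adjacency matrix, w: (n,) vertex weights,
        clique: list of vertices in the current clique.
        Returns, for every vertex, the weight gain of adding it to the
        clique, or -inf if it cannot be added. Each entry is independent,
        which is what parallel threads can exploit."""
        ok = A[:, clique].all(axis=1)      # adjacent to every clique member
        ok[clique] = False                 # already inside the clique
        return np.where(ok, w, -np.inf)

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        A = rng.random((8, 8)) < 0.6
        A = np.triu(A, 1)
        A = A | A.T                        # symmetric, no self-loops
        w = rng.integers(1, 10, size=8)
        print(add_move_gains(A, w, [0, 3]))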

6.
A GPU-based parallel elevation interpolation algorithm is proposed to accelerate the rendering of massive numbers of scattered points on a 3D terrain surface. Terrain grid elevation data are organized as an elevation texture that serves as the basis for rendering the scattered points, and GPU shader programs written in GLSL dynamically control the graphics pipeline to realize view-dependent parallel elevation interpolation. Experiments show that, compared with a traditional in-memory interpolation algorithm, the proposed GPU-based algorithm raises the number of scattered points that can be rendered on a 3D terrain from the order of millions to the order of tens of millions.
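
The paper's interpolation runs in GLSL shaders; the NumPy sketch below performs the corresponding bilinear lookup into a regular elevation grid (the "elevation texture") for a batch of scattered points, which is what each shader invocation would do for one point. The grid layout, coordinate convention, and bilinear scheme are assumptions.

    import numpy as np

    def interpolate_height(elev, x, y):
        """elev: (H, W) regular elevation grid (one height per grid node).
        x, y: arrays of point coordinates in grid units (0..W-1, 0..H-1).
        Returns bilinearly interpolated heights, one per point."""
        x0 = np.clip(np.floor(x).astype(int), 0, elev.shape[1] - 2)
        y0 = np.clip(np.floor(y).astype(int), 0, elev.shape[0] - 2)
        fx, fy = x - x0, y - y0
        h00 = elev[y0, x0]
        h10 = elev[y0, x0 + 1]
        h01 = elev[y0 + 1, x0]
        h11 = elev[y0 + 1, x0 + 1]
        return (h00 * (1 - fx) * (1 - fy) + h10 * fx * (1 - fy)
                + h01 * (1 - fx) * fy + h11 * fx * fy)

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        elev = rng.random((512, 512)).astype(np.float32)
        px = rng.uniform(0, 511, 1_000_000)     # a million scattered points
        py = rng.uniform(0, 511, 1_000_000)
        print(interpolate_height(elev, px, py)[:5])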

7.
8.
The p-center problem is one of choosing p facilities from a set of candidates to satisfy the demands of n clients in order to minimize the maximum cost between a client and the facility to which it is assigned. In this article, PBS, a population-based meta-heuristic for the p-center problem, is described. PBS is a genetic-algorithm-based meta-heuristic that uses phenotype crossover and directed mutation operators to generate new starting points for a local search. For larger p-center instances, PBS is able to effectively utilize a number of computer processors. It is shown empirically that PBS has comparable performance to state-of-the-art exact and approximate algorithms for a range of p-center benchmark instances.
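
PBS itself is not reproduced here; the sketch below only evaluates the p-center objective (the maximum client-to-nearest-centre distance) for one candidate set of p centres, i.e. the quantity that the crossover, mutation, and local search in such a metaheuristic try to reduce. Names and the toy data are illustrative.

    import numpy as np

    def p_center_cost(dist, centers):
        """dist: (n_clients, n_sites) distance matrix.
        centers: indices of the p chosen facilities.
        Returns the maximum, over clients, of the distance to the
        nearest chosen facility (the p-center objective)."""
        return dist[:, centers].min(axis=1).max()

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        pts = rng.random((40, 2))
        dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
        print(p_center_cost(dist, [3, 17, 29]))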

9.
Three new GPU-parallel adaptive-neighborhood simulated annealing algorithms are proposed: a GPU-parallel genetic simulated annealing algorithm, an annealing algorithm with multiple Markov chains running in parallel, and a block-based GPU-parallel simulated annealing algorithm. Performance is further improved on the GPU side through coalesced memory accesses, avoidance of bank conflicts, and reduction techniques. Experiments on 11 typical benchmark functions show that the three GPU-parallel annealing algorithms achieve better accuracy and faster convergence than the nonu-SA algorithm.
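
As a rough CPU-side sketch of the multiple-parallel-Markov-chain variant mentioned above, the code below runs many independent annealing chains at once on a benchmark function by vectorising the propose/accept step over chains; on the GPU each chain (or block of chains) would get its own threads. The test function, neighbourhood, and cooling schedule are assumptions.

    import numpy as np

    def sphere(X):
        return (X ** 2).sum(axis=1)           # benchmark objective, one value per chain

    def parallel_sa(n_chains=64, dim=10, iters=2000, t0=1.0, alpha=0.999, step=0.5):
        rng = np.random.default_rng(5)
        X = rng.uniform(-5, 5, size=(n_chains, dim))    # one state per Markov chain
        fX = sphere(X)
        T = t0
        for _ in range(iters):
            Y = X + rng.normal(scale=step, size=X.shape)      # propose for all chains at once
            fY = sphere(Y)
            # Metropolis rule, vectorised over chains (improvements always accepted)
            accept = rng.random(n_chains) < np.exp(-np.maximum(fY - fX, 0.0) / T)
            X[accept], fX[accept] = Y[accept], fY[accept]
            T *= alpha                                        # geometric cooling
        return fX.min()

    if __name__ == "__main__":
        print(parallel_sa())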

10.
The competitive decision algorithm is a new optimization method built on an analysis of the competition mechanisms and decision-making principles found in the natural world, and in human society in particular; it exploits the fact that competition drives optimization and that decisions determine outcomes. Using the principles of the competitive decision algorithm and its general model, the minimum vertex cover problem of graphs is solved.

11.
A DNA surface computing model for the minimum vertex cover problem of graphs
DNA computing, based on biochemical reactions, offers enormous parallel computing power, and DNA computers have an inherent advantage over silicon computers in solving NP problems. Using a fluorescence-labeling strategy, a new DNA surface computing model for the minimum vertex cover problem of graphs is presented. The model first immobilizes the DNA molecules encoding the solution space on a solid support and then carries out the corresponding biochemical reactions to obtain all solutions of the minimum vertex cover problem. The new algorithm uses fluorescence quenching and observes the fluorescence to eliminate non-solutions; it features simple encoding and readout and a low error rate.

12.
Hardware/software partitioning is an essential step in hardware/software co-design. For large problems, it is difficult to obtain both good solution quality and acceptable running time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel with a compacted neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically. A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS. To further minimize the transfer overhead between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which takes into account that candidates may not satisfy the given constraint. Experiments show that GPTS outperforms state-of-the-art tabu search implementations and is competitive with other methods for HW/SW partitioning. The gains from the proposed parallelization are significant even on an ordinary GPU platform.
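
The GPTS kernels are not listed in the abstract; as a simplified picture of the per-move work that a GPU tabu evaluation distributes (one candidate move per thread), the sketch below scores every single-node HW/SW flip of a partition under a hardware-area constraint. The additive cost model, the constraint, and all names are assumptions.

    import numpy as np

    def evaluate_flip_moves(assign, sw_time, hw_time, hw_area, area_limit):
        """assign: (n,) boolean, True = node currently mapped to hardware.
        Move i flips node i between hardware and software.
        Returns (cost, feasible) arrays, one entry per candidate move.
        Every move is scored independently, so each can go to one GPU thread."""
        n = len(assign)
        flipped = np.tile(assign, (n, 1))
        flipped[np.arange(n), np.arange(n)] ^= True      # flip node i in row i
        cost = np.where(flipped, hw_time, sw_time).sum(axis=1)   # total execution time
        area = np.where(flipped, hw_area, 0.0).sum(axis=1)
        return cost, area <= area_limit

    if __name__ == "__main__":
        rng = np.random.default_rng(6)
        n = 10
        assign = rng.random(n) < 0.5
        sw_t, hw_t, hw_a = rng.random(n) * 10, rng.random(n) * 2, rng.random(n) * 5
        print(evaluate_flip_moves(assign, sw_t, hw_t, hw_a, area_limit=15.0))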

13.
To address the low efficiency of current parallel Prim minimum spanning tree algorithms, and building on an analysis of existing parallel Prim algorithms, a compressed adjacency list graph representation suited to GPU architectures is proposed, a GPU-based min-reduction data-parallel primitive is developed, and a parallel minimum spanning tree algorithm based on the ideas of Prim's algorithm is designed and implemented on NVIDIA GPUs. The algorithm uses the primitive to shorten the search time of the key step and thus achieves high efficiency. Experiments show that the algorithm has a clear performance advantage over a traditional CPU implementation and over a version that does not use the primitive.
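
The min-reduction primitive is GPU code and is not shown above; the NumPy sketch below performs the step it accelerates: a minimum reduction over all edges leaving the current tree to find the next edge Prim's algorithm should add. The CSR-style arrays mirror the compressed adjacency list mentioned in the abstract; names and the toy graph are assumptions.

    import numpy as np

    def lightest_crossing_edge(row_ptr, col_idx, weight, in_tree):
        """CSR graph: the edges of vertex u are col_idx[row_ptr[u]:row_ptr[u+1]]
        with weights weight[...]. Returns (u, v, w) of the minimum-weight edge
        from the current tree to an outside vertex, i.e. the value a GPU
        min-reduction would produce in one Prim iteration."""
        n = len(row_ptr) - 1
        src = np.repeat(np.arange(n), np.diff(row_ptr))   # source vertex of every edge
        crossing = in_tree[src] & ~in_tree[col_idx]
        if not crossing.any():
            return None
        e = np.flatnonzero(crossing)[np.argmin(weight[crossing])]
        return src[e], col_idx[e], weight[e]

    if __name__ == "__main__":
        # tiny undirected graph stored as a directed CSR (both directions present)
        row_ptr = np.array([0, 2, 4, 6])
        col_idx = np.array([1, 2, 0, 2, 0, 1])
        weight  = np.array([4., 1., 4., 2., 1., 2.])
        in_tree = np.array([True, False, False])
        print(lightest_crossing_edge(row_ptr, col_idx, weight, in_tree))  # (0, 2, 1.0)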

14.
15.
Ramsey numbers and an approximation algorithm for the vertex cover problem
We show two results. First we derive an upper bound for the special Ramsey numbers r_k(q), where r_k(q) is the largest number of nodes a graph can have if it contains no odd cycle of length at most 2k+1 and no independent set of size q+1. The proof of the bound is constructive and yields an algorithm for computing an independent set of that size. Second, using this algorithm, we describe an O(|V|·|E|) time bounded approximation algorithm for the vertex cover problem, whose worst-case ratio is 2 - 1/(k+1) for all graphs with at most (2k+3)^k(2k+2) nodes (e.g. 1.8 if |V| ≤ 146000).

16.
A sequential algorithm with complexity O(M² + n) for the integer knapsack problem is presented. M is the capacity of the knapsack, and n the number of objects. The algorithm admits an efficient parallelization on a p-processor ring machine. The corresponding parallel algorithm is O(M²/p + n). The parallel algorithm is compared with a version of the well-known Lee algorithm adapted to the integer knapsack problem. Computational results on both a local area network and a transputer are reported.

17.
In 2005, Demange and Paschos proposed in [M. Demange, V.Th. Paschos, On-line vertex-covering, Theoret. Comput. Sci. 332 (2005) 83-108] an online algorithm (denoted LR here) for the classical vertex cover problem. They showed that, for any graph of maximum degree Δ, LR constructs a vertex cover whose size is at most Δ times the optimal one (this bound is tight in the worst case). Very recently, two of the present authors have shown in [F. Delbot, C. Laforest, A better list heuristic for vertex cover, Inform. Process. Lett. 107 (2008) 125-127] that LR has interesting properties (it is a good “list algorithm” and it can easily be distributed). In addition, LR behaves well experimentally in spite of its Δ approximation (or competitive) ratio and the fact that it can be executed without knowledge of the full instance at the beginning. In this paper we analyze it more deeply and show that LR has good “average” performance: we prove that its mean approximation ratio is strictly less than 2 for any graph and is equal to 1 + e⁻² ≈ 1.13 on paths. LR is therefore a very interesting algorithm for constructing small vertex covers, despite its bad worst-case behavior.
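
The abstract does not restate the LR rule; the sketch below is a paraphrase of the online vertex-covering rule in Demange and Paschos' setting (an arriving vertex joins the cover exactly when it has an already-revealed neighbour not in the cover, since that edge could otherwise never be covered), offered only as an assumed illustration of how such a list algorithm operates, not as the authors' code.

    def online_vertex_cover(order, adj):
        """order: the list in which vertices are revealed.
        adj: dict mapping each vertex to its set of neighbours.
        An arriving vertex joins the cover iff it sees at least one
        already-revealed neighbour that is not in the cover."""
        seen, cover = set(), set()
        for v in order:
            if any(u in seen and u not in cover for u in adj[v]):
                cover.add(v)
            seen.add(v)
        return cover

    if __name__ == "__main__":
        # a path a-b-c-d; paths are the case with mean ratio 1 + e^-2 quoted above
        adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
        print(online_vertex_cover(["a", "b", "c", "d"], adj))   # {'b', 'd'}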

18.
A parallel two-list algorithm for the knapsack problem
An n-element knapsack problem has 2^n possible solutions to search over, so an exhaustive search requires 2^n trials. Because of this exponential time, the problem is considered very hard. In the past decade, much effort has been devoted to finding techniques that could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2^(n/8)) processors to solve the knapsack problem in O(2^(n/2)) time; that is, the cost of Chang et al.'s parallel algorithm is O(2^(5n/8)). In this paper, we propose a parallel algorithm that improves Chang et al.'s algorithm by reducing the time complexity to O(2^(3n/8)) with the same O(2^(n/8)) processors. Thus, the proposed parallel algorithm has a cost of O(2^(n/2)). It is an improvement over the previous literature. We believe that the proposed parallel algorithm is pragmatically feasible at a time when multiprocessor systems are becoming more and more popular.
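
The improved parallel scheme is not reproduced in the abstract; for orientation, the sketch below is the classic sequential two-list idea on which such algorithms build: split the items into two halves, enumerate the 2^(n/2) subsets of each half, and combine the two lists to find the best feasible pair. It is a plain Python baseline under assumed item data, not the authors' parallel algorithm.

    from bisect import bisect_right
    from itertools import combinations

    def enumerate_half(items):
        """All (weight, value) sums of subsets of `items`."""
        out = []
        for r in range(len(items) + 1):
            for comb in combinations(items, r):
                out.append((sum(w for w, _ in comb), sum(v for _, v in comb)))
        return out

    def two_list_knapsack(items, capacity):
        """items: list of (weight, value); 0/1 knapsack by the two-list method."""
        half = len(items) // 2
        A = enumerate_half(items[:half])
        B = sorted(enumerate_half(items[half:]))          # sort half B by weight
        weights_b = [w for w, _ in B]
        best_val, run = [], 0
        for _, v in B:                                    # best_val[i] = max value in B[:i+1]
            run = max(run, v)
            best_val.append(run)
        best = 0
        for wa, va in A:                                  # for each A entry, find the
            i = bisect_right(weights_b, capacity - wa) - 1  # heaviest B entry that still fits
            if i >= 0:
                best = max(best, va + best_val[i])
        return best

    if __name__ == "__main__":
        items = [(3, 4), (4, 5), (5, 6), (2, 3), (7, 9), (1, 1)]
        print(two_list_knapsack(items, 10))   # 13, e.g. items of weights 3+4+2+1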

19.
The dynamic lot-sizing model (DLS) is one of the most frequently used models in production and inventory systems because lot decisions can greatly affect the performance of the system. The practicality of DLS algorithms is hindered by the huge amount of computing resources required to solve these models, even for modest problems. This study develops a parallel algorithm to solve the lot-sizing problem efficiently. Given that n is the size of the problem, the complexity of the proposed parallel algorithm is O(n²/p) with p processors. Numerical experiments are provided to verify the complexity of the proposed algorithm. The empirical results demonstrate that the speedup of this parallel algorithm approaches linearity, which means that the proposed algorithm can take full advantage of distributed computing power as the size of the problem increases.
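
The parallel algorithm is not given in the abstract; for reference, the sketch below is the classic O(n²) dynamic program for uncapacitated dynamic lot sizing (a Wagner-Whitin style recursion). The candidate "last order" periods examined in the inner loop are independent of one another, which is the kind of work a parallel implementation can spread over p processors; the cost structure and names are assumptions.

    def lot_sizing(demand, setup_cost, hold_cost):
        """Uncapacitated dynamic lot sizing, O(n^2) dynamic program.
        demand[t]: demand of period t+1; setup_cost: fixed cost of an order;
        hold_cost: cost of carrying one unit for one period.
        f[t] = cheapest plan covering the first t periods."""
        n = len(demand)
        INF = float("inf")
        f = [0.0] + [INF] * n
        for t in range(1, n + 1):
            for j in range(1, t + 1):   # last order is placed in period j and covers j..t
                carry = sum(hold_cost * (i - j) * demand[i - 1] for i in range(j, t + 1))
                f[t] = min(f[t], f[j - 1] + setup_cost + carry)
        return f[n]

    if __name__ == "__main__":
        print(lot_sizing([20, 50, 10, 50, 50], setup_cost=100, hold_cost=1))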

20.
In this work we describe some parallel algorithms for solving nonlinear systems using CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). The proposed algorithms are based on both the Fletcher–Reeves version of the nonlinear conjugate gradient method and a polynomial preconditioner based on block two-stage methods. Several parallelization strategies and different storage formats for sparse matrices are discussed. The reported numerical experiments analyze the behavior of these algorithms in a fine-grained parallel environment compared with a thread-based environment.
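
The CUDA kernels and the block two-stage preconditioner are not reproduced here; the sketch below is only the plain Fletcher–Reeves nonlinear conjugate gradient iteration with a simple backtracking line search, so the vector operations being parallelised are visible. The test problem, line search, and stopping rule are assumptions.

    import numpy as np

    def fletcher_reeves(f, grad, x0, tol=1e-8, max_iter=500):
        """Minimise f by the Fletcher-Reeves nonlinear conjugate gradient method."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            alpha, fx = 1.0, f(x)           # backtracking (Armijo) line search along d
            while f(x + alpha * d) > fx + 1e-4 * alpha * g.dot(d):
                alpha *= 0.5
                if alpha < 1e-12:
                    break
            x_new = x + alpha * d
            g_new = grad(x_new)
            beta = g_new.dot(g_new) / g.dot(g)   # Fletcher-Reeves coefficient
            d = -g_new + beta * d
            x, g = x_new, g_new
        return x

    if __name__ == "__main__":
        # Rosenbrock function as a small nonlinear test problem
        f = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2
        grad = lambda v: np.array([-2 * (1 - v[0]) - 400 * v[0] * (v[1] - v[0] ** 2),
                                   200 * (v[1] - v[0] ** 2)])
        print(fletcher_reeves(f, grad, [-1.2, 1.0]))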
