共查询到20条相似文献,搜索用时 15 毫秒
1.
The subset‐sum problem is a well‐known non‐deterministic polynomial‐time complete (NP‐complete) decision problem. This paper proposes a novel and efficient implementation of a parallel two‐list algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It is not easy to effectively implement the three stages of the algorithm on a GPU. Ways to achieve better performance, reasonable task distribution between CPU and GPU, effective GPU memory management, and CPU–GPU communication cost minimization are discussed. The generation stage of the algorithm adopts a typical recursive divide‐and‐conquer strategy. Because recursion cannot be well supported by current GPUs with compute capability less than 3.5, a new vector‐based iterative implementation mechanism is designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation, this paper improves the three stages of the algorithm. The experimental results show that the GPU implementation has much better performance than the CPU implementation and can achieve high speedup on different GPU cards. The experimental results also illustrate that the improved algorithm can bring significant performance benefits for the GPU implementation. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献
2.
We explore the comparative performance of the Cray XMT and XMT‐2 massively multithreaded supercomputers. We use benchmarks to evaluate memory accesses for various types of loops. We also compare the performance of these machines on matrix multiply and on three previously implemented dynamic programming algorithms. It is shown that the relative performance of these machines is dependent on the size (number of processors) of the configuration, as well as the size of the problem being evaluated. In particular, small configurations of the original XMT can sometimes show slightly better performance than larger configurations of the XMT‐2, for the same problem size. We note that, under heavy memory load, performance of loops can saturate well before the maximum number of processors available. This suggests that it may not always be useful to use the maximum number of processors for a specific run. We also show that manual restructuring of nested loops, including decreasing the parallelism, can result in major improvements in performance. The results in this paper indicate that careful exploration of the space of problem sizes, number of processors used, and choices of loop parallelization can yield substantial improvements in performance. These improvements can be very significant for production codes that run for extended periods of time. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献
3.
Improved low-density subset sum algorithms 总被引:7,自引:0,他引:7
Matthijs J. Coster Antoine Joux Brian A. LaMacchia Andrew M. Odlyzko Claus-Peter Schnorr Jacques Stern 《Computational Complexity》1992,2(2):111-128
The general subset sum problem is NP-complete. However, there are two algorithms, one due to Brickell and the other to Lagarias and Odlyzko, which in polynomial time solve almost all subset sum problems of sufficiently low density. Both methods rely on basis reduction algorithms to find short non-zero vectors in special lattices. The Lagarias-Odlyzko algorithm would solve almost all subset sum problems of density<0.6463 ... in polynomial time if it could invoke a polynomial-time algorithm for finding the shortest non-zero vector in a lattice. This paper presents two modifications of that algorithm, either one of which would solve almost all problems of density<0.9408 ... if it could find shortest non-zero vectors in lattices. These modifications also yield dramatic improvements in practice when they are combined with known lattice basis reduction algorithms. 相似文献
4.
Roman Kolpakov;Mikhail Posypkin; 《Concurrency and Computation》2024,36(18):e8144
In the paper, we compute some lower bounds on time of parallel solving of the subset sum problem on a big number of processors by several versions of dynamic programming algorithm Balsub proposed before by Pisinger. Based on these lower bounds, we propose a version of Balsub which could be possibly effectively parallelized. 相似文献
5.
蚁群优化算法应用于复杂问题的求解是非常耗时的。文章在MATLAB环境下实现了一个基于GPU+CPU的并行MAX-MIN蚁群系统,并将其应用于旅行商问题的求解。让全部蚂蚁共享一个伪随机数矩阵,一个信息素矩阵,一个禁忌矩阵和一个概率矩阵,并运用了一个全新的基于这些矩阵的随机选择算法—AIR(All-In-Roulette)。文章还介绍了如何使用这些矩阵来构造并行蚁群优化算法,并与相应串行算法进行了比较。计算结果表明新的并行算法比相应串行算法要高效很多。 相似文献
6.
Saniyah S. Bokhari 《Concurrency and Computation》2012,24(18):2241-2254
The subset‐sum problem is a well‐known NP‐complete combinatorial problem that is solvable in pseudo‐polynomial time, that is, time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. We show how this problem can be parallelized on three contemporary architectures, that is, a 128‐processor Cray Extreme Multithreading (XMT) massively multithreaded machine, a 16‐processor IBM x3755 shared memory machine, and a 240‐core NVIDIA FX 5800 graphics processing unit (GPU). We show that it is straightforward to parallelize this algorithm on the Cray XMT primarily because of the word‐level locking that is available on this architecture. For the other two machines, we present an alternating word algorithm that can implement an efficient solution. Our results show that the GPU performs well for problems whose tables fit within the device memory. Because GPUs typically have memories in the order of 10GB, such architectures are best for small problem sizes that have tables of size approximately 1010. The IBM x3755 performs very well on medium‐sized problems that fit within its 64‐GB memory but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 1011 bits. The Cray XMT shows very good scaling for large problems and demonstrates sustained performance as the problem size increases. However, this machine has poor scaling for small problem sizes; it performs best for problem sizes of 1012 bits or more. The results in this paper illustrate that the subset‐sum problem can be parallelized well on all three architectures, albeit for different ranges of problem sizes. The performance of these three machines under varying problem sizes show the strengths and weaknesses of the three architectures. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献
7.
将串行动态二表算法应用于并行三表算法的设计中,提出一种求解背包、精确的可满足性和集覆盖等背包类NP完全问题的并行三表六子表算法.基于EREW-PRAM模型,该算法可使用O(2n/8)的处理机在O(27n/16)的时间和O(213n/48)的空间求解n维背包类问题,其时间-空间-处理机折衷为O(25n/6).与现有文献的性能对比分析表明,该算法极大地提高了并行求解背包类问题的时间-空间-处理机折衷性能.由于该算法能够破解更高维数的背包类公钥和数字水印系统,其结论在密钥分析领域具有一定的理论和实际意义. 相似文献
8.
多重群体遗传算法在多选择背包问题中的应用 总被引:2,自引:0,他引:2
叶宇风 《计算机工程与设计》2005,26(12):3442-3443,3464
在解决多选择背包问题中,引入了多重群体遗传算法作为求解方法,根据此问题的特点,制定了具体的杂交、变异方法,设计了遗传算法。在算法中以目标函数加惩罚函数为适应值评价函数,采用新陈代谢的跨世代选择策略,以更好地保持进化过程中的遗传多样性。实践表明,引入了多重群体遗传算法之后,求解此问题效率有明显的改善与提高。 相似文献
9.
将串行动态二表算法应用于并行三表算法的设计中,提出一种求解背包、精确的可满足性和集覆盖等背包类NP完全问题的并行三表六子表算法.基于EREW-PRAM模型,该算法可使用O(2n/8)的处理机在O(27n/16)的时间和O(213n/48)的空间求解n维背包类问题,其时间-空间-处理机折衷为O(25n/6).与现有文献的性能对比分析表明,该算法极大地提高了并行求解背包类问题的时间-空间-处理机折衷性能.由于该算法能够破解更高维数的背包类公钥和数字水印系统,其结论在密钥分析领域具有一定的理论和实际意义. 相似文献
10.
Jos Rui Figueira Gabriel Tavares Margaret M. Wiecek 《Computers & Operations Research》2010,37(4):700-711
The paper presents a generic labeling algorithm for finding all nondominated outcomes of the multiple objective integer knapsack problem (MOIKP). The algorithm is based on solving the multiple objective shortest path problem on an underlying network. Algorithms for constructing four network models, all representing the MOIKP, are also presented. Each network is composed of layers and each network algorithm, working forward layer by layer, identifies the set of all permanent nondominated labels for each layer. The effectiveness of the algorithms is supported with numerical results obtained for randomly generated problems for up to seven objectives while exact algorithms reported in the literature solve the multiple objective binary knapsack problem with up to three objectives. Extensions of the approach to other classes of problems including binary variables, bounded variables, multiple constraints, and time-dependent objective functions are possible. 相似文献
11.
12.
Paolo Detti 《Information Processing Letters》2009,109(11):582-584
A polynomial algorithm for the multiple bounded knapsack problem with divisible item sizes is presented. The complexity of the algorithm is O(n2+nm), where n and m are the number of different item sizes and knapsacks, respectively. It is also shown that the algorithm complexity reduces to O(nlogn+nm) when a single copy exists of each item. 相似文献
13.
This paper introduces new problem-size reduction heuristics for the multidimensional knapsack problem. These heuristics are based on solving a relaxed version of the problem, using the dual variables to formulate a Lagrangian relaxation of the original problem, and then solving an estimated core problem to achieve a heuristic solution to the original problem. We demonstrate the performance of these heuristics as compared to legacy heuristics and two other problem reduction heuristics for the multi-dimensional knapsack problem. We discuss problems with existing test problems and discuss the use of an improved test problem generation approach. We use a competitive test to highlight the performance of our heuristics versus the legacy heuristic approaches. We also introduce the concept of computational versus competitive problem test data sets as a means to focus the empirical analysis of heuristic performance. 相似文献
14.
《Concurrency and Computation》2017,29(8)
The availability of heterogeneous CPU+GPU systems has opened the door to new opportunities for the development of parallel solutions to tackle complex biological problems. The reconstruction of evolutionary histories among species represents a grand computational challenge, which can be addressed by exploiting this kind of hardware designs. In this research, we study the application of heterogeneous computing with OpenCL to accelerate one of the most well‐known objective functions for inferring phylogenies, the phylogenetic parsimony function. For this purpose, we undertake the design of CPU and GPU kernel implementations of this relevant function, proposing a heterogeneous CPU+GPU multidevice approach that distributes multiple parsimony evaluations among processing devices. Experiments on 6 real nucleotide data sets and comparisons with other parallel implementations give account of the benefits of the proposal in this paper, obtaining significant parallel results by combining CPU and GPU capabilities in accordance with the characteristics of the input data. 相似文献
15.
AVS标准中整数DCT变换的CUDA并行算法 总被引:1,自引:0,他引:1
随着图形处理器(GPU)的处理能力的不断增强,图形处理器越来越多的运用在计算密集型的数据处理中.AVS标准视频压缩算法中一些步骤存在典型的并行特性,高清、超清视频压缩的串行算法执行时间开销较大,难以满足实时编码的需要,因此利用GPU的并行处理能力和CUDA的编程框架对AVS标准中的整数DCT变换算法进行了并行实现.经过实验测试,并行算法与串行算法相比具有较高的加速比. 相似文献
16.
借鉴人工免疫系统的记忆、动态识别等功能,提出一种约束动态免疫算法(CDIOA),并用于高维约束动态背包问题的求解。通过随机约束选择策略选择可行及非可行抗体,非可行抗体参与群体的进化;利用抗体修正策略确保进化群中有一定比例可行抗体,提高算法搜索功能;设计环境识别模块判断环境变化与否,建立环境记忆池保存较优秀记忆细胞,记忆细胞参与相似(相同)环境初始群的产生,加速算法在相似环境搜索速度。建立三种不同环境的动态背包问题作为标准测试实例,将CDIOA与已有的四种动态优化算法进行测试比较,结果表明:CDIOA对各测试问题在不同环境表现出较好的收敛性能,在相似环境能快速跟踪最优值。 相似文献
17.
传统的多目标进化算法多是基于Pareto最优概念的类随机搜索算法,求解速度较慢,特别是当问题维度变高,需要群体规模较大时,上述问题更加凸显。这一问题已经获得越来越多研究人员以及从业人员的关注。实验仿真中可以发现,构造非支配集和保持群体多样性这两部分工作占用了算法99%以上的执行时间。解决上述问题的一个有效方法就是对这一部分算法进行并行化改造。本文提出了一种基于CUDA平台的并行化解决方案,采用小生境技术实现共享适应度来维持候选解集的多样性,将多目标进化算法的实现全部置于GPU端,区别于以往研究中非支配排序的部分工作以及群体多样性保持的全部工作仍在CPU上执行。通过对ZDT系列函数的仿真结果,可以看出本文算法性能远远优于NSGA-Ⅱ和NPGA。最后通过求解油品调和过程这一有约束多目标优化问题,可以看出在解决化工应用中的有约束多目标优化问题时,该算法依然表现出优异的加速效果。 相似文献
18.
19.
Given a set of positive integers S, the minimum generating set problem consists in finding a set of positive integers T with a minimum cardinality such that every element of S can be expressed as the sum of a subset of elements in T. It constitutes a natural problem in combinatorial number theory and is related to some real-world problems, such as planning radiation therapies.We present a new formulation to this problem (based on the terminology for the multiple knapsack problem) that is used to design an evolutionary approach whose performance is driven by three search strategies; a novel random greedy heuristic scheme that is employed to construct initial solutions, a specialized crossover operator inspired by real-parameter crossovers and a restart mechanism that is incorporated to avoid premature convergence. Computational results for problem instances involving up to 100,000 elements show that our innovative genetic algorithm is a very attractive alternative to the existing approaches. 相似文献
20.
Branch‐and‐bound (B&B) algorithms are attractive methods for solving to optimality combinatorial optimization problems using an implicit enumeration of a dynamically built tree‐based search space. Nevertheless, they are time‐consuming when dealing with large problem instances. Therefore, pruning tree nodes (subproblems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires to perform the bounding operation, which consists of applying a lower bound function to the subproblems generated during the exploration process. Preliminary experiments performed on the Flow‐Shop scheduling problem (FSP) have shown that the bounding operation consumes over 98% of the execution time of the B&B algorithm. In this paper, we investigate the use of graphics processing unit (GPU) computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators. The proposed approach enables data access optimization. Extensive experiments have been carried out on well‐known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU‐based single core execution using an Intel Core i7‐970 processor without GPU, speedups higher than 100 times faster are achieved for large problem instances. At an equivalent peak performance, GPU‐accelerated B&B is twice faster than its multi‐core counterpart. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献