Similar References
20 similar references found (search time: 205 ms)
1.
熊壬浩  刘羽 《计算机应用》2015,35(7):1843-1848
To address the poor time performance of the serial A* algorithm, an algorithm based on parallel search and fast insertion (PSFI) is proposed. First, common parallel heuristic search algorithms on shared-memory platforms are surveyed. The serial algorithm is then improved by a delayed single-table search (DSTS) method and a new data structure, and on this basis a parallel algorithm for shared-memory platforms is designed and implemented with OpenMP. Tests on the 24-puzzle show that the improved serial and parallel algorithms reduce the running time to 1/140 and 1/450 of that of the original algorithm, respectively; compared with the Parallel Best-NBlock-First (PBNF) algorithm, the parallel algorithm raises the speedup to 3.2. Moreover, the improved algorithms are strict best-first search algorithms, which guarantees solution quality, and they are easy to implement.
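For orientation, the serial core that such work parallelizes is the best-first A* loop sketched below in Python. This is a minimal sketch under stated assumptions: the toy graph, heuristic, and lazy-deletion detail are illustrative, not the PSFI or DSTS algorithms from the paper.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal serial best-first A*. `neighbors(n)` yields (successor, cost)
    pairs and `h(n)` is an admissible heuristic; stale heap entries are
    skipped lazily instead of being deleted. For non-comparable node types,
    add a tie-breaking counter to the heap entries."""
    open_heap = [(h(start), 0, start)]             # entries are (f, g, node)
    g_best = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g                               # optimal cost to the goal
        if g > g_best.get(node, float("inf")):
            continue                               # stale entry, skip
        for succ, cost in neighbors(node):
            g2 = g + cost
            if g2 < g_best.get(succ, float("inf")):
                g_best[succ] = g2
                heapq.heappush(open_heap, (g2 + h(succ), g2, succ))
    return None

# Toy usage on the integer line with unit steps, goal at 3:
print(a_star(0, 3, lambda n: [(n - 1, 1), (n + 1, 1)], lambda n: abs(3 - n)))
```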

2.
Branch-and-bound algorithms in a system with a two-level memory hierarchy were evaluated. An efficient implementation depends on the disparity between the numbers of subproblems expanded by depth-first and best-first searches, as well as on the relative speeds of the main and secondary memories. A best-first search should be used when it expands far fewer subproblems than a depth-first search and the secondary memory is relatively slow. In contrast, a depth-first search should be used when the number of subproblems it expands is close to that of a best-first search. For cases in between, the choice is less clear, and these cases are studied. Two strategies are proposed and analyzed: a specialized virtual-memory system that matches the architectural design to the characteristics of the existing algorithm, and a modified branch-and-bound algorithm that can be tuned to the characteristics of the problem and the architecture. The latter strategy illustrates that designing a better algorithm is sometimes more effective than tuning the architecture alone.
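The trade-off comes down to the frontier discipline: the two searches differ only in how the next subproblem is chosen. A hedged generic sketch in Python (the callbacks expand, bound, is_leaf and value are hypothetical placeholders, not the paper's implementation):

```python
import heapq, itertools

def branch_and_bound(root, expand, bound, is_leaf, value, best_first=True):
    """Generic minimizing branch-and-bound. With best_first=True the frontier
    is a priority queue keyed on the lower bound; with False it is a LIFO
    stack, i.e. a depth-first search. All callbacks are placeholders."""
    incumbent = float("inf")
    tie = itertools.count()                 # tie-breaker: never compare nodes
    frontier = [(bound(root), next(tie), root)]
    while frontier:
        lb, _, node = heapq.heappop(frontier) if best_first else frontier.pop()
        if lb >= incumbent:
            continue                        # bound cannot beat the incumbent
        if is_leaf(node):
            incumbent = min(incumbent, value(node))
            continue
        for child in expand(node):
            b = bound(child)
            if b < incumbent:
                entry = (b, next(tie), child)
                if best_first:
                    heapq.heappush(frontier, entry)
                else:
                    frontier.append(entry)
    return incumbent
```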

3.
Branch-and-bound algorithms are organized, intelligently structured searches for solutions in a combinatorially large problem space. In this paper, we propose an approximate stochastic model of branch-and-bound algorithms with a best-first search. We estimate the average memory space required and predict the average number of subproblems expanded before the process terminates. Both measures are exponential with sublinear exponents. In addition, we compare the number of subproblems expanded in a best-first search to that expanded in a depth-first search. Depth-first search is found to have computational complexity comparable to best-first search when the lower-bound function is either very accurate or very inaccurate; otherwise, best-first search is usually better. The results are useful in studying the efficient evaluation of branch-and-bound algorithms in a virtual-memory environment. They also confirm that approximations are very effective in reducing the total number of iterations.

4.
In this paper a parallel algorithm for branch-and-bound problems is sketched. It is designed to run on MIMD machines and exploits coarse-grain parallelism. Due to the irregular and unpredictable behavior of branch-and-bound algorithms, it is hard to obtain a good load balance. Another design issue is the minimization of communication overhead. Our algorithm addresses these problems with a dynamic load-balancing strategy and a dynamic criterion for deciding when communication is actually useful. After sketching a general parallel algorithm for branch-and-bound problems, we concentrate on a particular instance: the well-known knapsack problem. The performance of the algorithm on this problem is measured with a simulator for a multiprocessor system.
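For the knapsack instance, a common bounding device is the fractional relaxation; a minimal sketch follows, assuming items are pre-sorted by value/weight ratio (this is illustrative and not necessarily the bound used in the paper):

```python
def fractional_bound(values, weights, capacity, taken_value, taken_weight, k):
    """Upper bound for a 0/1 knapsack node that has decided items 0..k-1:
    greedily fill the remaining capacity with items k, k+1, ..., allowing
    one fractional item. Assumes items sorted by value/weight ratio."""
    bound, room = taken_value, capacity - taken_weight
    for v, w in zip(values[k:], weights[k:]):
        if w <= room:
            bound, room = bound + v, room - w
        else:
            return bound + v * room / w   # take a fraction of item k and stop
    return bound
```

In a maximizing branch-and-bound, a node is pruned when this upper bound does not exceed the incumbent value.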

5.
Speedup and efficiency, two measures of performance for pipelined computers, are now also used to evaluate the performance of parallel algorithms on multiprocessor systems. However, these measures consider only the computation time and the number of processors used; they do not account for the number of communication links in the system. The author defines two new measures, cost effectiveness and time-cost effectiveness, for evaluating the performance of a parallel algorithm on a multiprocessor system. From these two measures, two characterization factors for multiprocessor systems are defined and used to analyze some well-known multiprocessor systems. It is found that, for a given penalty function, every multiprocessor architecture has an optimal number of processors that produces maximum profit. If too many processors are used, the higher cost of the system reduces the profit obtained from the faster solution; if too few are used, the penalty paid for taking longer to obtain the solution reduces the profit.
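For reference, the classical definitions that these new measures extend (standard formulas only; the cost-effectiveness and time-cost-effectiveness definitions themselves are given in the paper and are not reproduced here):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

Here T_1 is the serial running time and T_p the time on p processors. The paper's measures additionally charge for communication links and a time penalty, which is what yields an interior optimum in the number of processors.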

6.
In this paper, we address the problem of multiple sequence alignment (MSA) for very large numbers of protein sequences on mesh-based multiprocessor architectures. As the problem has been conclusively shown to be computationally complex, we employ the divisible load paradigm (also referred to as divisible load theory, DLT) to handle such large numbers of sequences. We design an efficient computational engine that conducts MSAs by exploiting the parallelism embedded in the computational steps of multiple sequence alignment algorithms. Specifically, we consider the standard Smith–Waterman (SW) algorithm in our implementation; however, our approach is by no means restricted to the SW class of algorithms, and the treatment is generic to a class of similar dynamic programming problems. Our approach is recursive in the sense that the quality of solutions can be refined continuously until an acceptable level is achieved. After the first phase of computation, we apply a heuristic scheme that renders the final MSA solution. We conduct rigorous simulation experiments using several hundred homologous protein sequences derived from the Rattus norvegicus and Mus musculus databases of olfactory receptors, and we quantify performance using the speed-up metric. We compare our algorithms to serial, single-machine approaches, and we test our findings against the conventional equal load partitioning (ELP) strategy commonly used in the parallel processing literature. Based on our extensive simulation study, we observe that the DLT paradigm offers excellent speed-up characteristics and provides avenues for its use in several other biological sequence processing problems. This study is a first attempt at using the DLT paradigm to devise efficient strategies for large-scale multiple protein sequence alignment on mesh-based multiprocessor systems.
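As context for the computational kernel, here is a minimal serial Smith–Waterman local-alignment score in Python. The scoring parameters are illustrative assumptions; broadly, it is a dynamic programming table of this kind that a parallel engine partitions across processors.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Serial Smith-Waterman local alignment score with linear gap penalty.
    The scoring parameters are illustrative, not the paper's settings."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # small demo pair
```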

7.
The paper presents a Conservative Time Window (CTW) algorithm for parallel simulation of discrete event systems. The physical system to be simulated is partitioned into n disjoint sub-systems, each of which is represented by an object. The CTW algorithm identifies a time window for each object such that events occurring in each window are independent of events in other windows and can therefore be processed concurrently. The CTW algorithm was implemented on a shared-memory multiprocessor, a Sequent Symmetry S81 with 16 processors. We measured the performance of the CTW algorithm on two types of network topologies: feed-forward networks and networks with feedback loops. We used three metrics: speed-up, the average number of independent windows detected by the algorithm, and the average number of events occurring in each window. We also investigated the impact of various event scheduling policies on performance. The results show that the CTW algorithm performs well in many cases.

8.
We present a parallel algorithm for computing the correlation dimension (D2) from a time series generated by a dynamical system, using the method of correlation integrals, which essentially requires computing the distances among a set of points in the state space. The parallelization is suitable for coarse-grained multiprocessor systems with distributed memory and is carried out using a virtually shared memory model. The algorithm simultaneously gives all the correlation integrals at the various state-space dimensions needed to estimate D2. Two versions are discussed: the first computes all distances between points; the second computes only distances less than a fixed ϵ, employing a box-assisted approach and linked lists for an efficient search of neighbouring points. The algorithms, coded in Fortran 77, are tested on a heterogeneous network of workstations consisting of various DEC Alphas of different powers, interconnected by Ethernet; the Network Linda parallel environment is used. A detailed performance analysis is carried out using a generalization of speed-up and efficiency for heterogeneous systems. The algorithms are fully asynchronous and so intrinsically balanced; in almost all situations they achieve unit efficiency. The second version greatly reduces the computational work, making it possible to tackle D2 estimation even for medium- and high-dimensional systems, where an extremely large number of points is involved. The algorithms can also be employed in other applicative contexts requiring the efficient computation of distances among a large set of points, and the proposed performance-analysis method can be applied to similar problems. © 1998 John Wiley & Sons, Ltd.
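The quantity being parallelized is simple to state serially. A Python sketch of the correlation integral follows (the brute-force, all-pairs version; the paper's box-assisted variant and the parallel distribution of the pair loop are not reproduced):

```python
import math

def correlation_integral(points, eps):
    """Fraction of point pairs closer than eps: the Grassberger-Procaccia
    correlation integral C(eps). `points` is a list of equal-length tuples
    of state-space coordinates."""
    n, close = len(points), 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < eps:
                close += 1
    return 2.0 * close / (n * (n - 1))

# D2 is then estimated from the slope of log C(eps) versus log eps
# over a suitable scaling range of eps values.
```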

9.
武继刚  计永昶  陈国良 《软件学报》2000,11(12):1572-1580
Branch-and-bound is a standard technique for solving combinatorial optimization problems and is widely used in operations research and combinatorics. For general best-first parallel branch-and-bound algorithms on shared memory, a lower bound of Ω(m/p + h log p) on the running time is given, where p is the number of available processors, h the number of expanded nodes, and m the number of live nodes in the state space. By organizing the shared memory as p cubic heaps, a new general parallel branch-and-bound algorithm on the PRAM-EREW model is proposed, and it is proved that for h < p2^p this algorithm is the fastest and asymptotically optimal parallel branch-and-bound algorithm. Finally, simulation results on the 0-r knapsack problem are reported.

10.
This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message-passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirements are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and last partitions and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a large number of banded linear systems using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measured values are in good agreement with the theoretical values from the complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is well approximated by the expression (p + 6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row- and column-interleaved organization.


11.
We consider the multiprocessor scheduling problem in which independent jobs are scheduled on identical parallel machines, with the objective of minimizing the normalized sum of squared workload deviations (NSSWD) criterion in order to balance workloads. NSSWD and other criteria for the related problem of number partitioning are presented from a statistical viewpoint, which yields some insightful connections with statistical measures of dispersion. A new local search algorithm is also developed. The algorithm first generates and merges a set of partial solutions to obtain a feasible solution for the multiprocessor scheduling problem; a set of interchange procedures is then used to improve the solution. The effectiveness of this approach is evaluated by solving a large number of benchmark instances.
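A hedged sketch of the interchange idea on an assumed objective: accept job moves that reduce the sum of squared deviations of machine loads from the mean, i.e. the un-normalized core of NSSWD (the exact normalization and the paper's partial-solution merging step are not reproduced):

```python
def balance_pass(jobs, assign, machines):
    """One first-improvement pass of move-based local search for workload
    balancing. assign[i] is the machine of job i (modified in place).
    Objective (an assumption of this sketch): sum of squared deviations of
    machine loads from the mean load. Returns True if any move helped."""
    loads = [0.0] * machines
    for size, m in zip(jobs, assign):
        loads[m] += size
    mean = sum(loads) / machines          # unchanged by moves: total is fixed
    improved = False
    for i, size in enumerate(jobs):
        src = assign[i]
        for dst in range(machines):
            if dst == src:
                continue
            # Change in squared deviation if job i moves from src to dst.
            delta = ((loads[src] - size - mean) ** 2
                     + (loads[dst] + size - mean) ** 2
                     - (loads[src] - mean) ** 2
                     - (loads[dst] - mean) ** 2)
            if delta < -1e-9:
                loads[src] -= size
                loads[dst] += size
                assign[i] = dst
                improved = True
                break                      # next job, loads already updated
    return improved
```

Repeating such passes until no move helps yields a local optimum; provided the NSSWD normalization is constant for a fixed instance, it does not change which moves improve the solution.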

12.
Randomized algorithms employ randomness in their solution method. We show that the performance of randomized algorithms is less affected by the factors that prevent most parallel deterministic algorithms from attaining their theoretical speedup bounds. A major reason is that mapping randomized algorithms onto multiprocessors involves very little scheduling or communication overhead. Furthermore, reliability is enhanced because the failure of a single processor leads only to degradation, not failure, of the algorithm. We present results of an extensive simulation performed on a multiprocessor simulator running a randomized branch-and-bound algorithm. The particular case we consider is the knapsack problem, chosen for its ease of formulation. We observe the largest speedups in precisely those problems that take large amounts of time to solve. This work has been supported by the U.S. Army Research Office under Contract No. DAAG 29-85-K-0236.

13.
Artificial Intelligence, 2006, 170(4-5): 385-408
Recent work shows that the memory requirements of A* and related graph-search algorithms can be reduced substantially by storing only nodes that are on or near the search frontier, using special techniques to prevent node regeneration, and recovering the solution path by a divide-and-conquer technique. When this approach is used to solve graph-search problems with unit edge costs, we show that a breadth-first search strategy can be more memory-efficient than a best-first strategy. We also show that a breadth-first strategy allows a technique for preventing node regeneration that is easier to implement and more widely applicable. The breadth-first heuristic search algorithms introduced in this paper include a memory-efficient implementation of breadth-first branch-and-bound search and a breadth-first iterative-deepening A* algorithm based on it. Computational results show that they outperform other systematic search algorithms in solving a range of challenging graph-search problems.
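The memory argument is easy to see in a sketch: breadth-first search needs only a layer or two of the graph, whereas best-first search must keep its whole ordered Open list. A minimal Python illustration follows, returning the solution depth only; path recovery is what the paper's divide-and-conquer technique adds, and the undirected-graph assumption below is this sketch's, used to justify keeping just two layers.

```python
def breadth_first_depth(start, goal, neighbors):
    """Layered BFS that stores only the previous and current layers rather
    than the full visited set. Valid for undirected unit-cost graphs, where
    a duplicate of a frontier node can only lie in the same or the previous
    layer. Returns the solution depth, not the path."""
    previous, frontier, depth = set(), {start}, 0
    while frontier:
        if goal in frontier:
            return depth
        nxt = set()
        for node in frontier:
            for succ in neighbors(node):
                if succ not in previous and succ not in frontier:
                    nxt.add(succ)          # regeneration of older layers
                                           # is impossible in this setting
        previous, frontier = frontier, nxt
        depth += 1
    return None
```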

14.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule for a feasible number of processors. The important features of the algorithm are: (i) it achieves a speedup Sp = O(N/3) and an efficiency Ep ≈ 2/3 using P = [N/2] processors, where N is the number of equations; (ii) it contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method remains the same as that of its sequential counterpart; and (iii) it may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
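For reference, the textbook serial form whose row updates the paper divides into noninterfering tasks; a minimal sketch (dense storage and the stopping rule are illustrative choices):

```python
def sor_sweep(A, b, x, omega):
    """One in-place SOR sweep for Ax = b, with A dense and assumed
    diagonally dominant. omega in (0, 2) is the relaxation factor;
    omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x

# Iterate sweeps until the residual ||b - Ax|| falls below a tolerance.
```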

15.
Metric-space similarity search has proven suitable in a number of application domains, such as multimedia retrieval and computational biology, to name a few. These applications usually work on very large databases that are often indexed to speed up on-line searches. To achieve efficient throughput, it is essential to exploit the intrinsic parallelism of the search query processing algorithms. Many strategies have been proposed in the literature to parallelize these algorithms on either shared- or distributed-memory multiprocessor systems. Lately, GPUs have been used to implement brute-force parallel search strategies instead of index data structures, since indexing poses difficulties for efficient exploitation of GPU resources. In this paper we propose single- and multi-GPU metric-space techniques that efficiently exploit GPU-tailored index data structures for parallel similarity search in large databases. The experimental results show that our proposal outperforms previous index-based sequential and OpenMP search strategies.

16.
We report significant speed-up for seismic migration running in parallel on network-connected IBM RISC/6000 workstations. A sustained performance of 15 MFLOPS is obtained on a single entry-level model 320, and speed-ups as high as 5 are obtained for six workstations connected by Ethernet or token ring. Our parallel software uses remote procedure calls provided by NCS (Network Computing System). We have run over a dozen workstations in parallel, but speed-ups become limited by the network data rate. Fiber-optic communication should allow much greater speed-ups, and we describe our preliminary results with the fiber-optic serial link adapter of the RISC/6000. We also present a simple theoretical model that agrees well with our measurements and allows speed-up to be predicted from the ratio of computation to communication, which can be determined empirically before the program is parallelized. We conclude with a brief discussion of alternative software approaches and programming models for network-connected parallel systems. In particular, our program was recently ported to PVM and Linda, and preliminary measurements yield speed-ups very close to those described here.
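A hedged reconstruction of this kind of model (the exact functional form used in the paper may differ): with per-run computation time T_comp split across p machines and unoverlapped communication time T_comm, T_p = T_comp/p + T_comm, so the predicted speed-up depends only on the ratio r = T_comp/T_comm:

```python
def predicted_speedup(p, comp_to_comm):
    """Predicted speed-up on p workstations under a fixed-overhead model:
    T_p = T_comp / p + T_comm, so S = T_comp / T_p = p*r / (r + p) with
    r = T_comp / T_comm. A generic model assumed here, not quoted."""
    r = comp_to_comm
    return p * r / (r + p)

for p in (1, 2, 6, 12):
    print(p, round(predicted_speedup(p, 30.0), 2))
```

With r = 30 (purely illustrative) the model gives a speed-up of about 5 at p = 6, the order reported above.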

17.
Cellular network design is a major issue in second-generation GSM mobile telecommunication systems. In this paper, a new model of the problem in its full practical complexity, based on multiobjective constrained combinatorial optimization, is used. We propose an evolutionary algorithm that approximates the Pareto frontier of the problem, which removes the need for a cellular network designer to rank or weight objectives a priori. A specific coding scheme and genetic operators have been designed, and advanced intensification and diversification search techniques, such as elitism and adaptive sharing, have been used. Three complementary hierarchical parallel models have been designed to improve solution quality and robustness, to speed up the search, and to solve large instances of the problem. The obtained Pareto fronts and speed-ups on different parallel architectures show the efficiency and scalability of the parallel model. Performance evaluation of the algorithm has been carried out on realistic benchmarks; the results show the impact of the proposed parallel models and the introduced search mechanisms.

18.
The class of NP-hard problems contains many problems of considerable practical importance. The complexity of these problems is so overwhelming that exhaustive search of the solution space is not possible even with massively parallel search techniques, particularly for problems of super-exponential complexity such as flow-shop and job-shop scheduling. This two-part paper discusses a novel, inherently parallel heuristic solution technique. Part I presents this technique, known as parallel dynamic interaction (PDI), as a general solution framework applicable to a number of computationally intractable problems, and gives details of its general methodology. From a parallel processing standpoint, PDI is interesting because it is inherently based upon the dynamic interplay between simultaneously executing subproblems. As such, PDI has no direct serial analog and is not directly amenable to conventional parallel-processing speedup analysis. This may indicate that parallel processing can offer opportunities beyond simply speeding up the solution of sequentially specified algorithms. From a practical problem-solving perspective, PDI shows promise as a method capable of generating high-quality solutions to exponentially and super-exponentially hard problems. For large problems, PDI is often able to find a near-optimal solution many orders of magnitude faster than a conventional parallel branch-and-bound search finds a solution of comparable quality.

19.
We consider the problem of scheduling a number of jobs on a number of unrelated parallel machines in order to minimize the makespan. We develop three heuristic approaches: a genetic algorithm, a tabu search algorithm, and a hybridization of these heuristics with a truncated branch-and-bound procedure. The hybridization is made in order to accelerate the search toward near-optimal solutions; the branch-and-bound procedure checks whether the solutions obtained by the meta-heuristics can be scheduled within a tight upper bound. We compare the performance of these heuristics on a standard dataset from the literature, and also examine the influence of the different heuristic parameters. The computational experiments reveal that the hybrid heuristics are able to compete with the best known results from the literature.

20.
The increasing availability of parallel hardware encourages the design and adoption of parallel algorithms. In this article, we present a study analyzing the impact that different communication policies have on the solution quality reached by a parallel homogeneous multi-colony ACO algorithm for the traveling salesman problem. We empirically test different configurations of each algorithm on a distributed-memory parallel architecture and analyze the results with a fixed-effects model of the analysis of variance. We consider several factors that influence the performance of a multi-colony ACO algorithm: the number of colonies, migration schedules, communication strategies on different interconnection topologies, and the use of local search. We show that the importance of the communication strategy decreases with increasing search effort and stronger local search, and that the relative effectiveness of one communication strategy versus another changes when local search is added.
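For orientation, one colony's basic loop in Python (standard Ant System for the TSP; the parameters are illustrative, and the paper's communication strategies, i.e. running several colonies and exchanging best-so-far tours on a migration schedule, are indicated only in a comment):

```python
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_system(dist, ants=10, iters=100, alpha=1.0, beta=3.0, rho=0.5, q=1.0):
    """Single-colony Ant System for the TSP (textbook form; positive
    symmetric distances assumed). A multi-colony algorithm would run
    several of these and exchange best tours on a migration schedule."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]                         # pheromone
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
           for i in range(n)]                                   # visibility
    best_len, best_tour = float("inf"), None
    for _ in range(iters):
        tours = []
        for _ in range(ants):
            start = random.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                weights = [(j, (tau[i][j] ** alpha) * (eta[i][j] ** beta))
                           for j in unvisited]
                r, acc = random.random() * sum(w for _, w in weights), 0.0
                for j, w in weights:                # roulette-wheel choice
                    acc += w
                    if acc >= r:
                        break
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour_length(tour, dist), tour))
        tours.sort()
        if tours[0][0] < best_len:
            best_len, best_tour = tours[0]
        for i in range(n):                          # evaporation
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for length, tour in tours:                  # deposit on used edges
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_len, best_tour

# Tiny symmetric demo instance:
D = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(ant_system(D, ants=5, iters=50))
```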
