1.
2.
Wilson Rivera 《Artificial Intelligence Review》2001,16(2):153-168
Genetic algorithms, search algorithms based on the genetic processes observed in natural evolution, have been used to solve difficult problems in many different disciplines. When applied to very large-scale problems, genetic algorithms exhibit high computational cost and degradation of the quality of the solutions because of the increased complexity. One of the most relevant research trends in genetic algorithms is the implementation of parallel genetic algorithms with the goal of obtaining quality of solutions efficiently. This paper first reviews the state-of-the-art in parallel genetic algorithms. Parallelization strategies and emerging implementations are reviewed and relevant results are discussed. Second, this paper discusses important issues regarding scalability of parallel genetic algorithms.
3.
In this paper we present deterministic parallel algorithms for the coarse-grained multicomputer (CGM) and bulk synchronous parallel (BSP) models for solving the following well-known graph problems: (1) list ranking, (2) Euler tour construction in a tree,
(3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction
and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, and (7) 2-edge connectivity
and biconnectivity (testing and component computation). The algorithms require O(log p) communication rounds with linear sequential work per round (p = number of processors, N = total input size). Each processor creates, during the entire algorithm, messages of total size O(log(p) (N/p)).
The algorithms assume that the local memory per processor (i.e., N/p) is larger than p^ε, for some fixed ε > 0. Our results imply BSP algorithms with O(log p) supersteps, O(g log(p) (N/p)) communication time, and O(log(p) (N/p)) local computation time.
It is important to observe that the number of communication rounds/supersteps obtained in this paper is independent of the problem size, and grows only logarithmically with respect to p. With growing problem size, only the sizes of the messages grow but the total number of messages remains unchanged. Due
to the considerable protocol overhead associated with each message transmission, this is an important property. The result
for Problem (1) is a considerable improvement over those previously reported. The algorithms for Problems (2)—(7) are the
first practically relevant parallel algorithms for these standard graph problems.
Received July 5, 2000; revised April 16, 2001.
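As a point of reference for Problem (1), list ranking assigns each node of a linked list its distance to the tail. The sketch below uses classical pointer jumping, not the CGM/BSP algorithm of the abstract above; the function name `list_rank` and the successor-array encoding are assumptions made for illustration. The inner loop is written sequentially, whereas a parallel machine would update all nodes in a single step.

```python
from typing import List, Optional


def list_rank(succ: List[Optional[int]]) -> List[int]:
    """Return, for each node, its distance to the tail of the list.

    succ[i] is the index of node i's successor, or None for the tail.
    (Hypothetical encoding chosen for this sketch.)
    """
    n = len(succ)
    # Every non-tail node starts exactly one hop from its successor.
    rank = [0 if s is None else 1 for s in succ]
    nxt = list(succ)

    # Each round doubles the distance spanned by every live pointer,
    # so roughly log2(n) rounds are enough.
    while any(p is not None for p in nxt):
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):  # conceptually executed for all i at once
            p = nxt[i]
            if p is not None:
                new_rank[i] = rank[i] + rank[p]  # absorb successor's span
                new_nxt[i] = nxt[p]              # jump over the successor
        rank, nxt = new_rank, new_nxt
    return rank


# The list 3 -> 0 -> 2 -> 1, with node 1 as the tail.
print(list_rank([2, None, 1, 0]))
```

Reading from copies of `rank` and `nxt` within a round mirrors the synchronous semantics a parallel step would have: every node sees the values from the previous round only.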
4.
We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP type model. We first describe a simple version which requires, with high probability, log(3p) + log ln(n) = Õ(log p + log log n) communication rounds (h-relations with h = Õ(n/p)) and Õ(n/p) local computation. We then outline an improved version that requires, with high probability, only r ≤ (4k+6) log(2/3p) + 8 = Õ(k log p) communication rounds, where k = min{i ≥ 0 | ln^(i+1) n ≤ (2/3p)^(2i+1)}. Note that k is an extremely small number. For n and p ≥ 4, the value of k is at most 2. Hence, for a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For n ≤ 1,500,000 and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. For n ≤ 10010100 and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118; and we conjecture that the actual number of communication rounds required will not exceed 50. Our algorithm has a considerably smaller number of communication rounds than the list ranking algorithm used in Reid-Miller’s empirical study of parallel list ranking on the Cray C-90.(1) To our knowledge, Reid-Miller’s algorithm(1) was the fastest list ranking implementation so far. Therefore, we expect that our result will have considerable practical relevance.
5.
Abstract. External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory
of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation.
In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited
success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge
by the ACM Working Group on Storage I/O for Large-Scale Computing.
In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The techniques obtained can accommodate one or multiple processors
on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine.
When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
6.
This paper describes the design of the Abstract Library for Parallel Search (ALPS), a framework for implementing scalable, parallel algorithms based on tree search. ALPS is specifically designed to support data-intensive algorithms, in which large amounts of data are required to describe each node in the search tree. Implementing such algorithms in a scalable manner is challenging both because of data storage requirements and communication overhead. ALPS incorporates a number of new ideas to address this challenge. The paper also describes the design of two other libraries forming a hierarchy built on top of ALPS. The first is the Branch, Constrain, and Price Software (BiCePS) library, a framework that supports the implementation of parallel branch and bound algorithms in which the bounds are obtained by solving some sort of relaxation, usually Lagrangian. In this layer, the notion of global data objects associated with the variables and constraints is introduced. These global objects provide a connection between the various subproblems in the search tree, but they pose further difficulties for designing scalable algorithms. The other library is the BiCePS linear integer solver (BLIS), a concretization of BiCePS, in which linear programming is used to obtain bounds in each search tree node.
7.
Abstract. The traditional worst-case analysis often fails to predict the actual behavior of the running time of geometric algorithms in practical situations. One reason is that worst-case scenarios are often very contrived and do not occur in practice. To avoid this, models are needed that describe the properties that realistic inputs have, so that the analysis can take these properties into account. We try to bring some structure to this emerging research direction. In particular, we present the following results: • We show the relations between various models that have been proposed in the literature. • For several of these models, we give algorithms to compute the model parameter(s) for a given (planar) scene; these algorithms can be used to verify whether a model is appropriate for typical scenes in some application area. • As a case study, we give some experimental results on the appropriateness of some of the models for one particular type of scene often encountered in geographic information systems, namely certain triangulated irregular networks.
8.
Given two finite sets of points in a plane, the polygon separation problem is to construct a separating convex k-gon with smallest k. In this paper, we present a parallel algorithm for the polygon separation problem. The algorithm runs in O(log n) time on a CREW PRAM with n processors, where n is the number of points in the two given sets. The algorithm is cost-optimal, since Ω(n log n) is a lower bound for the time needed by any sequential algorithm. We apply this algorithm to the problem of finding a convex polygon, with the minimal number of edges, for which a given convex region is its digital image. The algorithm in this paper constructs one such polygon with possibly two more edges than the minimal one. The research is sponsored by NSERC Operating Grant OGPIN 007.
9.
10.
Memetic Algorithms for Parallel Code Optimization
Discovering the optimum number of processors and the distribution of data on distributed memory parallel computers for a given
algorithm is a demanding task. A memetic algorithm (MA) is proposed here to find the best number of processors and the best
data distribution method to be used for each stage of a parallel program. Steady state memetic algorithm is compared with
transgenerational memetic algorithm using different crossover operators and hill-climbing methods. A self-adaptive MA is also
implemented, based on a multimeme strategy. All the experiments are carried out on computationally intensive, communication
intensive, and mixed problem instances. The MA performs successfully for the illustrative problem instances.
11.
We describe a scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation against large data sets such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with comparable convergence rates. Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data. These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, using both a memory-contained sparse formulation as well as a separate implementation in which the mining data is accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
12.
In this paper a parallel algorithm is given that, given a graph G=(V,E), decides whether G is a series parallel graph, and, if so, builds a decomposition tree for G of series and parallel composition rules. The algorithm uses O(log |E| log* |E|) time and O(|E|) operations on an EREW PRAM, and O(log |E|) time and O(|E|) operations on a CRCW PRAM. The results hold for undirected as well as for directed graphs.
Algorithms with the same resource bounds are described for the recognition of graphs of treewidth two, and for constructing
tree decompositions of treewidth two. Hence efficient parallel algorithms can be found for a large number of graph problems
on series parallel graphs and graphs with treewidth two. These include many well-known problems like all problems that can
be stated in monadic second-order logic.
Received July 15, 1997; revised January 29, 1999, and June 23, 1999.
13.
A Fast Parallel Clustering Algorithm for Large Spatial Databases
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
14.
Scalable Parallel Algorithms for FPT Problems
Faisal N. Abu-Khzam Michael A. Langston Pushkar Shanbhag Christopher T. Symons 《Algorithmica》2006,45(3):269-284
Algorithmic methods based on the theory of fixed-parameter tractability are combined with powerful computational platforms
to launch systematic attacks on combinatorial problems of significance. As a case study, optimal solutions to very large instances
of the NP-hard vertex cover problem are computed. To accomplish this, an efficient sequential algorithm and various forms
of parallel algorithms are devised, implemented, and compared. The importance of maintaining a balanced decomposition of the
search space is shown to be critical to achieving scalability. Target problems need only be amenable to reduction and decomposition.
Applications in high throughput computational biology are also discussed.
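To make the fixed-parameter idea concrete, the following is a minimal sketch of the textbook bounded search tree for vertex cover with parameter k. It is not the implementation compared in the paper above, which also relies on reduction steps omitted here; the function name `vertex_cover` and the edge-list representation are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def vertex_cover(edges: List[Edge], k: int) -> Optional[Set[int]]:
    """Return a vertex cover of size at most k, or None if none exists."""
    if not edges:
        return set()   # nothing left to cover
    if k == 0:
        return None    # edges remain but the budget is spent
    u, v = edges[0]
    # Any cover must contain u or v, so branch on both choices.
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


path = [(0, 1), (1, 2), (2, 3)]
print(vertex_cover(path, 1))  # no single vertex covers the path
print(vertex_cover(path, 2))
```

The recursion depth is at most k and each level branches two ways, so the tree has at most 2^k leaves regardless of graph size. The two branches are independent subproblems, which is what makes the search space amenable to decomposition across processors.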
15.
Earley's algorithm has been commonly used for the parsing of general context-free languages and the error-correcting parsing in syntactic pattern recognition. The time complexity for parsing is O(n^3). This paper presents a parallel Earley's recognition algorithm in terms of an "X*" operator. By restricting the input context-free grammar to be ε-free, the parallel algorithm can be executed on a triangular-shape VLSI array. This array system has an efficient way of moving data to the right place at the right time. Simulation results show that this system can recognize a string with length n in 2n + 1 system time. We also present a parallel parse-extraction algorithm, a complete parsing algorithm, and an error-correcting recognition algorithm. The parallel complete parsing algorithm has been simulated on a processor array which is similar to the triangular VLSI array. For an input string of length n the processor array will give the correct right-parse at system time 2n + 1 if the string is accepted. The error-correcting recognition algorithm has also been simulated on a triangular VLSI array. This array recognizes an erroneous string of length n in time 2n + 1 and gives the correct error count. These parallel algorithms are especially useful for syntactic pattern recognition.
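For orientation, here is a minimal sketch of the sequential Earley recognizer that the parallel formulation above starts from. It does not model the X* operator or the VLSI array. Like the abstract, it assumes a grammar without empty productions; the function name `earley_recognize` and the dictionary encoding of the grammar are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: nonterminal -> list of right-hand sides.
Grammar = Dict[str, List[Tuple[str, ...]]]
# An item is (lhs, rhs, dot position, origin index).
Item = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    """Return True if tokens can be derived from start.

    Assumes the grammar has no empty right-hand sides.
    """
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < n and tokens[i] == sym:
                    # Scanner: consume a matching terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every item that was waiting for lhs.
                # Without empty productions, origin < i, so chart[origin]
                # is already complete when it is read here.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


grammar = {"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}
print(earley_recognize(grammar, "E", ["a", "+", "a"]))
```

The chart has n + 1 item sets, each holding O(n) items per dotted rule, and the completer may scan an earlier set for each item, which is where the cubic sequential bound quoted above comes from.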
16.
Ismail Chabini & Sridevi Ganugapati 《International Transactions in Operational Research》2002,9(3):279-302
The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster‐than‐real‐time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst‐case running‐time complexity. This implies that no algorithm with a better worst‐case computational complexity can be discovered. Consequently, in order to derive faster algorithms to solve all‐to‐one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially‐available high‐performance computing platforms to develop parallel implementations of sequential algorithms is an example of such an avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared‐memory and two message‐passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message‐passing environment based on the parallel virtual machine (PVM) library and a multi‐threading environment based on the SUN Microsystems Multi‐Threads (MT) library. We also develop a time‐based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ‘ideal’ theoretical parallel machine.
Performances of the implementations are analyzed and evaluated using large transportation networks, and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared‐memory machine containing eight processors. Satisfactory speed‐ups in the running time of sequential algorithms are achieved, in particular for shared‐memory machines. Numerical results indicate that shared‐memory computers constitute the most appropriate type of parallel computing platforms for the computation of dynamic shortest paths for real‐time ITS applications.
17.
Performance Analysis of Coarse-Grained Parallel Genetic Algorithms
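Conclusions about the factors that affect the performance of parallel genetic algorithms lack theoretical force when they rest on experiments alone. By analyzing the speedup formula of the coarse-grained parallel genetic algorithm, this paper identifies the key factors that affect parallel genetic algorithm performance and rejects the migration rate as a reasonable metric for evaluating that performance. Experiments further verify the correctness of these conclusions. The results provide a reliable basis for improving the parallelization efficiency of genetic algorithms.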
18.
Cao Xiaopeng 《Computer and Digital Engineering》2011,39(8):118-122,142
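To address the problem of computational efficiency in pattern recognition, this paper studies the GPU architecture and adopts a GPU-based parallel computing approach. Experiments with a face recognition algorithm on the GPU achieved a high speedup, leading to the conclusion that GPU-based parallel computing can effectively relieve the computational bottleneck in pattern recognition.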
19.
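This paper first introduces the basic concepts of computational geometry and discusses its four basic problems: geometric searching, intersection, proximity, and convex hull problems. It then focuses on the convex hull construction problem, presenting its best sequential algorithm and the corresponding parallel algorithms. Next, a number of sequential and parallel algorithms in computational geometry are analyzed and compared. Finally, the author offers views on designing parallel algorithms for computational geometry on the new generation of parallel computer systems.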