Similar Documents
20 similar documents found (search time: 31 ms).
1.
We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF methods, achieving at most O(N log N) complexity, were developed using multiquadric and polyharmonic basis functions. In contrast, the present method uses Gaussians with a small variance with respect to the domain, but with sufficient overlap. This is a common choice in particle methods for fluid simulation, our main target application. The fast decay of the Gaussian basis function allows rapid convergence of the iterative solver even when the subdomains in the RASM are very small. At the same time, we show that the accuracy of the interpolation can achieve machine precision. The present method was implemented in parallel using the PETSc library (developer version). Numerical experiments demonstrate its capability on problems of RBF interpolation with more than 50 million data points, timing at 106 s (19 iterations for an error tolerance of 10^-15) on 1024 processors of a Blue Gene/L (700 MHz PowerPC processors). The parallel code is freely available under an open-source model.
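A minimal serial sketch of the interpolation problem described above, using SciPy's GMRES on a dense Gaussian kernel matrix; the paper's parallel machinery (PETSc, the RASM preconditioner, the fast matrix-vector product) is not reproduced, and names such as `centers` and `sigma` are illustrative choices, not the authors' API.

```python
# Serial Gaussian RBF interpolation solved iteratively; illustrative only.
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(200, 2))               # data sites = RBF centers
values = np.sin(4 * np.pi * centers[:, 0]) * centers[:, 1]   # sampled target values
sigma = 0.05                                                 # small variance w.r.t. the unit domain

# Dense Gaussian interpolation matrix A[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2)).
diff = centers[:, None, :] - centers[None, :, :]
A = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

# Solve A w = f iteratively; the fast decay of the Gaussian keeps A well
# localized, which is what makes cheap domain-decomposition preconditioners work.
weights, info = gmres(A, values, atol=1e-10)
print("GMRES info:", info)  # 0 means converged

def interpolate(x):
    """Evaluate s(x) = sum_j w_j exp(-|x - c_j|^2 / (2 sigma^2))."""
    d2 = np.sum((centers - x)**2, axis=-1)
    return np.dot(weights, np.exp(-d2 / (2.0 * sigma**2)))

print(interpolate(centers[0]), values[0])  # should agree to solver tolerance
```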

2.
The P4-tidy graphs were introduced by I. Rusu to generalize some already known classes of graphs with few induced P4s (cographs, P4-sparse graphs, P4-lite graphs). Here, we propose an extension of R. Lin and S. Olariu's work on cographs (1994, J. Parallel Distrib. Comput. 22, 26–36), using the modular decomposition. As an application, we show how to obtain a parallel maximum matching algorithm for the family of P4-tidy graphs (represented by a parse tree) in O(log n) time with O(n/log n) processors in the EREW-PRAM model, for n-vertex graphs.

3.
Diagnosis of reliability is an important topic for interconnection networks. Under the classical PMC model, Dahbura and Masson [5] proposed a polynomial time algorithm with time complexity O(N^2.5) to identify all faulty nodes in an N-node network. This paper addresses the fault diagnosis of so-called bijective connection (BC) graphs, which include hypercubes, twisted cubes, locally twisted cubes, crossed cubes, and Möbius cubes. Utilizing a helpful structure proposed by Hsu and Tan [20], called the extending star by Lin et al. [24], and noting the existence of a structured Hamiltonian path within any BC graph, we present a fast diagnostic algorithm that identifies all faulty nodes in O(N) time, where N = 2^n, n ≥ 4, is the total number of nodes in the n-dimensional BC graph. As a result, this algorithm is significantly superior to the Dahbura–Masson algorithm when applied to BC graphs.

4.
We present a divide and conquer based algorithm for optimal quantum compression/decompression, using O(n (log^4 n) log log n) elementary quantum operations. Our result provides the first quasi-linear time algorithm for asymptotically optimal (in size and fidelity) quantum compression and decompression. We also outline the quantum gate array model to bring about this compression in a quantum computer. Our method uses various classical algorithmic tools to significantly improve on the previous best known bound of O(n^3) for this operation.

5.
We present a parallel algorithm for solving the next element search problem on a set of line segments, using a BSP-like model referred to as the coarse grained multicomputer (CGM). The algorithm requires O(1) communication rounds (h-relations with h = O(n/p)), O((n/p) log n) local computation, and O((n/p) log p) memory per processor, assuming n/p ≥ p. Our result implies solutions to the point location, trapezoidal decomposition, and polygon triangulation problems. A simplified version for axis-parallel segments requires only O(n/p) memory per processor, and we discuss an implementation of this version. As in a previous paper by Devillers and Fabri (Int. J. Comput. Geom. Appl. 6 (1996), 487–506), our algorithm is based on a distributed implementation of segment trees, which are of size O(n log n). This paper improves on op. cit. in several ways: (1) it studies the more general next element search problem, which also solves, e.g., planar point location; (2) the algorithms require only O((n/p) log n) local computation instead of O(log p · (n/p) log n); (3) the algorithms require only O((n/p) log p) local memory instead of O((n/p) log n).

6.
Given a set S of n points in the plane, we consider the problem of partitioning S into two subsets such that the maximum of their diameters is minimized. We present a parallel algorithm to solve this problem that runs in O(log n) time on the CREW PRAM with O(n^2) processors.
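To pin down the objective being minimized, here is an exhaustive baseline that tries every bipartition; it is exponential in n and bears no relation to the paper's O(log n) CREW PRAM algorithm, serving only to check small instances. Function names are invented for the example.

```python
# Brute-force min-max diameter bipartition; for tiny inputs only.
from itertools import combinations
from math import dist

def diameter(points):
    """Largest pairwise distance in a set (0 for fewer than two points)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)

def min_max_diameter_partition(points):
    """Try every 2-colouring (point 0 fixed in part A) and keep the best split."""
    n = len(points)
    best_cost, best_mask = float("inf"), 0
    for mask in range(0, 1 << n, 2):      # even masks keep point 0 in part A
        a = [points[i] for i in range(n) if not (mask >> i) & 1]
        b = [points[i] for i in range(n) if (mask >> i) & 1]
        cost = max(diameter(a), diameter(b))
        if cost < best_cost:
            best_cost, best_mask = cost, mask
    return best_cost, best_mask

print(min_max_diameter_partition([(0, 0), (0, 1), (5, 0), (5, 1)]))
```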

7.
We present a simple parallel algorithm for the single-source shortest path problem in planar digraphs with nonnegative real edge weights. The algorithm runs on the EREW PRAM model of parallel computation in O((n^(2ε) + n^(1−ε)) log n) time, performing O(n^(1+ε) log n) work for any 0 < ε < 1/2. The strength of the algorithm is its simplicity, making it easy to implement and presumably quite efficient in practice. The algorithm improves upon the work of all previous parallel algorithms. Our algorithm is based on a region decomposition of the input graph and uses a well-known parallel implementation of Dijkstra's algorithm. The logarithmic factor in both the work and the time can be eliminated by plugging in a less practical, sequential planar shortest path algorithm together with an improved parallel implementation of Dijkstra's algorithm.
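The building block the abstract refers to is Dijkstra's algorithm; a standard sequential binary-heap version is sketched below for reference. The region decomposition and the parallel implementation are the paper's contribution and are not shown here.

```python
# Sequential Dijkstra with a binary heap and lazy deletion of stale entries.
import heapq

def dijkstra(adj, source):
    """adj: {u: [(v, w), ...]} with nonnegative weights; returns distance dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"s": [("a", 2.0), ("b", 5.0)], "a": [("b", 1.0)], "b": []}
print(dijkstra(g, "s"))  # {'s': 0.0, 'a': 2.0, 'b': 3.0}
```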

8.
The well-known Goldbach Conjecture (GC) states that any sufficiently large even number can be represented as a sum of two odd primes. Although not yet proved, it has been checked for integers up to 10^14. Using two stronger versions of the conjecture, we offer a simple and fast method for recognition of a gray box group G known to be isomorphic to Sn (or An) with known n ≥ 20, i.e. for construction of an isomorphism from G to Sn (or An). Correctness and rigorous worst case complexity estimates rely heavily on the conjectures, and yield times of O([ρ + ν + μ] n log^2 n) or O([ρ + ν + μ] n log n / log log n), depending on which of the stronger versions of the GC is assumed to hold. Here, ρ is the complexity of generating a uniform random element of G, ν is the complexity of finding the order of a group element in G, and μ is the time necessary for group multiplication in G. A rigorous lower bound and a probabilistic approach to the time complexity of the algorithm are discussed in the Appendix.
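To make the role of the conjecture concrete, the sketch below verifies the binary Goldbach property for small even numbers with a sieve; the paper assumes the conjecture rather than checking it, so this is purely illustrative.

```python
# Check that every even n in [6, 10000] is a sum of two odd primes.
def prime_sieve(limit):
    """Boolean sieve of Eratosthenes up to `limit` inclusive."""
    s = [True] * (limit + 1)
    s[0:2] = [False, False]
    for i in range(2, int(limit**0.5) + 1):
        if s[i]:
            s[i*i::i] = [False] * len(s[i*i::i])
    return s

def goldbach_witness(n, sieve):
    """Return a pair of odd primes (p, n - p) summing to even n, or None."""
    for p in range(3, n // 2 + 1, 2):
        if sieve[p] and sieve[n - p]:
            return p, n - p
    return None

sieve = prime_sieve(10_000)
for n in range(6, 10_001, 2):
    assert goldbach_witness(n, sieve) is not None, n
print(goldbach_witness(100, sieve))  # e.g. (3, 97)
```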

9.
In this paper we describe scalable parallel algorithms for building the convex hull and a triangulation of n coplanar points. These algorithms are designed for the coarse grained multicomputer model: p processors with O(n/p) ⪢ O(1) local memory each, connected to some arbitrary interconnection network. They scale over a large range of values of n and p, assuming only that n ≥ p^(1+ε) (ε > 0), and require time O((Tsequential/p) + Ts(n, p)), where Ts(n, p) refers to the time of a global sort of n data on a p-processor machine. Furthermore, they involve only a constant number of global communication rounds. Since computing either the 2D convex hull or a triangulation requires time Tsequential = Θ(n log n), these algorithms either run in optimal time, Θ((n log n)/p), or in sort time, Ts(n, p), for the interconnection network in question. These results become optimal when Tsequential/p dominates Ts(n, p), or for interconnection networks like the mesh for which optimal sorting algorithms exist.
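The sequential baseline being parallelized is the Θ(n log n) planar convex hull; below is Andrew's monotone-chain version, with no CGM communication rounds, purely as a point of comparison.

```python
# Andrew's monotone-chain convex hull: sort, then build lower and upper chains.
def convex_hull(points):
    """Return hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    lower, upper = [], []
    for p in pts:                          # lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                # upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]         # drop duplicated endpoints

print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]  -- interior point (1, 1) excluded
```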

10.
The distance transform (DT) is an image computation tool which can be used to extract information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity of computing the distance transform depends strongly on the distance metric: the more exact the distance transform, the worse the execution time. Nowadays, thousands of images often have to be processed in a limited time, and computing distance transforms sequentially in real time is hardly feasible. To provide efficient distance transform computation, it is therefore highly desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N^2) time sequential algorithm to compute the chessboard distance transform (CDT) of an N×N image, which is a DT using the chessboard distance metric. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of size N×N can be computed in O(log N) time on the EREW PRAM model using O(N^2/log N) processors, in O(log log N) time on the CRCW PRAM model using O(N^2/log log N) processors, and in O(log N) time on the hypercube computer using O(N^2/log N) processors. Following the mapping proposed by Lee and Horng, an algorithm for the medial axis transform is also efficiently derived; the medial axis transform of a 2D binary image array of size N×N can be computed within the same bounds on each of these models. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only increase (add-one) and minimum operations are employed, so the algorithms are especially efficient in practical applications.
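A plain sequential chessboard DT can be written as the classic two-pass chamfer-style propagation sketched below (not the paper's diagonal propagation, and none of its parallel structure); with the chessboard metric, all neighbour steps cost 1 and the two passes are exact.

```python
# Two-pass sequential chessboard (Chebyshev) distance transform, O(N^2).
INF = float("inf")

def chessboard_dt(image):
    """image: 2D list of 0/1; returns Chebyshev distance to nearest 1-pixel."""
    h, w = len(image), len(image[0])
    d = [[0 if image[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate from the upper-left half-neighbourhood.
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + 1)
    # Backward pass: propagate from the lower-right half-neighbourhood.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx in ((1, 1), (1, 0), (1, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + 1)
    return d

print(chessboard_dt([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
# [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```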

11.
L. Chen, Algorithmica, 1997, 17(3): 266–280
Based on Tucker's work, we present an accurate proof of the characterization of proper circular arc graphs and obtain the first efficient parallel algorithm which not only recognizes proper circular arc graphs but also constructs proper circular arc representations. The algorithm runs in O(log^2 n) time with O(n^3) processors on a Common CRCW PRAM. The sequential algorithm can be implemented to run in O(n^2) time, which is optimal if the input graph is given as an adjacency matrix. Portions of this paper appear in preliminary form in the Proceedings of the 1989 Workshop on Algorithms and Data Structures [2] and the Proceedings of the 1994 International Symposium on Algorithms and Computation [5].

12.
Congruence closure algorithms for deduction in ground equational theories are ubiquitous in many (semi-)decision procedures used for verification and automated deduction. In many of these applications one needs an incremental algorithm that is moreover capable of recovering, among the thousands of input equations, the small subset that explains the equivalence of a given pair of terms. In this paper we present an algorithm satisfying all these requirements. First, building on ideas from abstract congruence closure algorithms, we present a very simple and clean incremental congruence closure algorithm and show that it runs in the best known time O(n log n). After that, we introduce a proof-producing union-find data structure that is then used for extending our congruence closure algorithm, without increasing the overall O(n log n) time, in order to produce a k-step explanation for a given equation in almost optimal time (quasi-linear in k). Finally, we show that the previous algorithms can be smoothly extended, while still obtaining the same asymptotic time bounds, in order to support the interpreted function symbols successor and predecessor, which have been shown to be very useful in applications such as microprocessor verification.
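The proof-producing union-find idea can be illustrated compactly: record, for every union, the input equation that caused it, and replay the equations along the proof-forest path when asked to explain an equality. The sketch below uses a linear BFS for explanations, so it only illustrates the interface, not the paper's near-optimal quasi-linear bound; class and method names are invented for the example.

```python
# A toy proof-producing union-find: unions are tagged with their causing equation.
from collections import deque

class ProofUnionFind:
    def __init__(self):
        self.parent = {}     # union-find parents (with path halving)
        self.edges = {}      # proof forest: node -> [(neighbour, equation)]

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b, equation):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return                         # already equal; nothing to record
        self.parent[ra] = rb
        self.edges.setdefault(a, []).append((b, equation))
        self.edges.setdefault(b, []).append((a, equation))

    def explain(self, a, b):
        """Return the input equations on the proof-forest path from a to b."""
        prev, queue = {a: None}, deque([a])
        while queue:
            u = queue.popleft()
            if u == b:
                break
            for v, eq in self.edges.get(u, []):
                if v not in prev:
                    prev[v] = (u, eq)
                    queue.append(v)
        path, u = [], b
        while prev.get(u):
            u, eq = prev[u]
            path.append(eq)
        return list(reversed(path))

uf = ProofUnionFind()
uf.union("x", "y", "eq1: x = y")
uf.union("y", "z", "eq2: y = z")
print(uf.explain("x", "z"))   # ['eq1: x = y', 'eq2: y = z']
```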

13.
In this paper, we present a parallel sorting algorithm using the technique of multi-way merge. This algorithm, when implemented on a t-dimensional mesh having n^t nodes (t > 2), sorts n^t elements in O((t^2 − 3t + 2)n) time, thus offering a better order of time complexity than the [((t^2 − t)n log n)/2 + O(nt)]-time algorithm of P. F. Corbett and I. D. Scherson (1992, IEEE Trans. Parallel Distrib. Systems 3, 626–632). Further, the proposed algorithm can also be implemented on a Multi-Mesh network (1999, D. Das, M. De, and B. P. Sinha, IEEE Trans. Comput. 48, 536–551) to sort N elements in 54N^(1/4) + o(N^(1/4)) steps, which is an improvement over the 58N^(1/4) + o(N^(1/4)) steps needed by the algorithm in (1997, M. De, D. Das, M. Ghosh, and B. P. Sinha, IEEE Trans. Comput. 46, 1132–1137).
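The serial kernel of such a scheme is a k-way merge; a heap-based version is sketched below (the standard library's heapq.merge does the same job). The mesh routing that makes the paper's algorithm fast is not modeled here.

```python
# Heap-based k-way merge of sorted runs: O(log k) per emitted element.
import heapq

def multiway_merge(runs):
    """Merge k sorted lists into one sorted list."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)   # smallest head among all runs
        out.append(value)
        if j + 1 < len(runs[i]):            # refill from the run we consumed
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

print(multiway_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))  # [1, 2, ..., 9]
```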

14.
We present new, efficient algorithms for some fundamental computations with finite-dimensional (but not necessarily commutative) associative algebras over finite fields. For a semisimple algebra A we show how to compute a complete Wedderburn decomposition of A as a direct sum of simple algebras, an isomorphism between each simple component and a full matrix algebra, and a basis for the centre of A. If A is given by a generating set of matrices in F^(m×m), then our algorithm requires about O(m^3) operations in F, in addition to the cost of factoring a polynomial in F[x] of degree O(m) and the cost of generating a small number of random elements from A. We also show how to compute a complete set of orthogonal primitive idempotents in any associative algebra over a finite field in this same time.

15.
We prove new lower bounds for nearest neighbor search in the Hamming cube. Our lower bounds are for randomized, two-sided error algorithms in Yao's cell probe model, and take the form of a tradeoff among the number of cells, the size of a cell, and the search time. For example, suppose we are searching among n points in the d-dimensional cube using poly(n, d) cells, each containing poly(d, log n) bits. We get a lower bound of Ω(d/log n) on the search time, a significant improvement over the recent Ω(log d) bound of Borodin et al. This should be contrasted with the upper bound of O(log log d) for approximate search (and O(1) for a decision version of the problem; our lower bounds hold in that case). By previous results, the bounds for the cube imply similar bounds for nearest neighbor search in high dimensional Euclidean space, and for other geometric problems.

16.
Clustering is a basic operation in image processing and computer vision, and it plays an important role in unsupervised pattern recognition and image segmentation. Among the many clustering methods, single-link hierarchical clustering is one of the most popular. In this paper, exploiting the advantages of both optical transmission and electronic computation, we design efficient parallel hierarchical clustering algorithms on arrays with reconfigurable optical buses (AROB). We first design three efficient basic operations: the matrix multiplication of two N×N matrices, finding the minimum spanning tree of a graph with N vertices, and identifying the connected component containing a specified vertex. Based on these three operations, an O(log N) time parallel hierarchical clustering algorithm is proposed using N^3 processors. Furthermore, if four-port connectivity of the AROB is allowed, two constant time clustering algorithms can also be derived using N^4 and N^3 processors, respectively. These results improve on previously known algorithms developed on various parallel computational models.
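Single-link clustering is equivalent to building a minimum spanning tree and cutting its longest edges; the sequential Kruskal-style sketch below shows this correspondence, without any of the paper's optical-bus parallelism. Names are invented for the example.

```python
# Single-link clusters via Kruskal's MST: stop merging at k components.
from itertools import combinations
from math import dist

def single_link_clusters(points, k):
    """Return the k single-link clusters of a 2D point list."""
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    components = len(points)
    for w, i, j in edges:                   # add cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj and components > k:
            parent[ri] = rj
            components -= 1
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)]
print(single_link_clusters(pts, 3))
# three clusters (order may vary): {(0,0),(0,1)}, {(5,5),(5,6)}, {(9,0)}
```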

17.
Fix a finite commutative ring R. Let u and v be power series over R, with v(0) = 0. This paper presents an algorithm that computes the first n terms of the composition u(v), given the first n terms of u and v, in n^(1+o(1)) ring operations. The algorithm is very fast in practice when R has small characteristic.
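For orientation, the naive cubic-cost composition is sketched below over the integers; the paper's point is precisely that this can be driven down to n^(1+o(1)) ring operations, which the sketch does not attempt.

```python
# Naive truncated power-series composition via Horner's rule, O(n^3) coefficient ops.
def compose(u, v, n):
    """First n coefficients of u(v(x)), assuming v[0] == 0 (coefficient lists)."""
    assert v[0] == 0, "composition needs v(0) = 0"
    result = [0] * n
    # Horner: u(v) = u[0] + v*(u[1] + v*(u[2] + ...)), truncated mod x^n.
    for coeff in reversed(u[:n]):
        new = [0] * n                       # new = result * v + coeff, mod x^n
        for i, a in enumerate(result):
            if a:
                for j, b in enumerate(v[:n - i]):
                    if b:
                        new[i + j] += a * b
        new[0] += coeff
        result = new
    return result

# u(x) = 1 + x + x^2, v(x) = x + x^2:  u(v) = 1 + x + 2x^2 + 2x^3 + x^4
print(compose([1, 1, 1], [0, 1, 1], 5))  # [1, 1, 2, 2, 1]
```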

18.
The geobucket data structure is a suitable intermediate representation of polynomials for performing large numbers of polynomial additions in the face of interspersed lead-term extractions. A sum involving N terms has worst-case running time O(N log N), matching or surpassing the performance of lists and binomial heaps. This makes the geobucket a good choice for performing reductions in Gröbner basis computations.
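A toy geobucket fits in a few lines: bucket i holds at most 4^i terms, and an addition that overflows a bucket is carried into the next one, so merges mostly happen between like-sized operands. The sketch below (invented names, sorted term lists over the integers) shows the add/canonicalize cycle; lead-term extraction, the operation the structure is designed around, would scan the bucket heads and is omitted.

```python
# Minimal geobucket over (exponent, coefficient) term lists, sorted by exponent.
def merge_terms(p, q):
    """Merge two term lists sorted by decreasing exponent, combining like terms."""
    out, i, j = [], 0, 0
    while i < len(p) and j < len(q):
        if p[i][0] > q[j][0]:
            out.append(p[i]); i += 1
        elif p[i][0] < q[j][0]:
            out.append(q[j]); j += 1
        else:
            c = p[i][1] + q[j][1]
            if c:
                out.append((p[i][0], c))    # drop cancelled terms
            i += 1; j += 1
    return out + p[i:] + q[j:]

class Geobucket:
    def __init__(self):
        self.buckets = []                   # buckets[i] holds at most 4**i terms

    def add(self, terms):
        i = 0
        while True:
            if i == len(self.buckets):
                self.buckets.append([])
            terms = merge_terms(self.buckets[i], terms)
            self.buckets[i] = []
            if len(terms) <= 4 ** i:        # fits here: stop carrying upward
                self.buckets[i] = terms
                return
            i += 1                          # overflow: carry into next bucket

    def canonical(self):
        """Collapse all buckets into a single sorted term list."""
        total = []
        for b in self.buckets:
            total = merge_terms(total, b)
        return total

g = Geobucket()
g.add([(3, 1), (0, 2)])      # x^3 + 2
g.add([(3, -1), (1, 5)])     # -x^3 + 5x
print(g.canonical())         # [(1, 5), (0, 2)], i.e. 5x + 2
```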

19.
We modify the concept of LLL-reduction of lattice bases in the sense of Lenstra, Lenstra and Lovász (Factoring polynomials with rational coefficients, Math. Ann. 261 (1982) 515–534) towards a faster reduction algorithm, organizing LLL-reduction in segments of the basis. Our SLLL-bases approximate the successive minima of the lattice in nearly the same way as LLL-bases. For integer lattices of dimension n given by a basis of length 2^O(n), SLLL-reduction runs in O(n^(5+ε)) bit operations for every ε > 0, compared to O(n^(7+ε)) for the original LLL and to O(n^(6+ε)) for the LLL-algorithms of Schnorr (A more efficient algorithm for lattice reduction, Journal of Algorithms 9 (1988) 47–62) and Storjohann (Faster Algorithms for Integer Lattice Basis Reduction, TR 249, Department of Computer Science, ETH Zurich, July 1996). We present an even faster algorithm for SLLL-reduction via iterated subsegments, running in O(n^3 log n) arithmetic steps. Householder reflections are shown to provide better accuracy than Gram–Schmidt for orthogonalizing LLL-bases in floating point arithmetic.
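The closing claim about orthogonalization is easy to see numerically: classical Gram–Schmidt loses orthogonality on an ill-conditioned basis, while Householder-based QR (what numpy.linalg.qr computes) does not. The sketch below is standalone floating-point numerics, not SLLL-reduction.

```python
# Loss of orthogonality: classical Gram-Schmidt vs Householder QR.
import numpy as np

def classical_gram_schmidt(B):
    """Orthonormalize the columns of B (no re-orthogonalization)."""
    Q = np.zeros_like(B, dtype=float)
    for j in range(B.shape[1]):
        v = B[:, j].astype(float)           # copy of the j-th column
        for i in range(j):
            v -= (Q[:, i] @ B[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((60, 60)))
B = U @ np.diag(np.logspace(0, -12, 60)) @ U.T   # condition number ~1e12

Q_cgs = classical_gram_schmidt(B)
Q_hh, _ = np.linalg.qr(B)                        # Householder reflections
I = np.eye(60)
print(np.linalg.norm(Q_cgs.T @ Q_cgs - I))       # large: orthogonality lost
print(np.linalg.norm(Q_hh.T @ Q_hh - I))         # tiny, near machine precision
```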

20.
Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets. O(log^3 n) running time algorithms using O(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich(3) presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set in O(log n log log n) time using O(n) processors. Recently Merks(4) described an algorithm for triangulating point sets which runs in O(log n) time using O(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions, based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs in O(log^2 n) time using O(n/log n) processors, and thus has O(n log n) as the product of the running time and the number of processors, i.e., an optimal speed-up.
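As a sequential point of reference only, a planar point set can be triangulated with a library call such as SciPy's Delaunay wrapper; the paper's subject is the parallel O(log^2 n) algorithm, which no library call reproduces.

```python
# Sequential baseline: triangulate a small planar point set with scipy.
import numpy as np
from scipy.spatial import Delaunay

pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.4, 0.6]])
tri = Delaunay(pts)
print(tri.simplices)   # each row is an index triple naming one triangle
```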
